Polyglot Performance Patterns: Writing Fast, Safe Code Across Python, Ruby, Java, and C

Capstone: High-Throughput Component with a C Core and Multi-Language Bindings

Chapter 17

What You Are Building in This Capstone

This capstone walks through designing and implementing a high-throughput component whose hot path lives in C, while exposing stable, ergonomic APIs to Python, Ruby, and Java. The goal is not to re-teach general performance methodology or cross-language theory, but to show a concrete, end-to-end build that you can adapt: a C “core” library with a narrow ABI, a streaming interface, and bindings that feel native in each language.

The example component is a streaming “record normalizer” that ingests byte chunks, splits them into newline-delimited records, validates a small schema, and emits normalized records. It is intentionally simple but representative: it exercises incremental parsing, stateful streaming, error reporting, and high call volume. The core idea is that the C layer owns the state machine and the tight loops, while each language binding focuses on: (1) converting inputs to bytes without copying when possible, (2) feeding chunks, (3) draining outputs, and (4) mapping errors to idiomatic exceptions.

Core Design: A Small C ABI That Scales to Many Languages

Principles for the C Core API

  • Opaque handles: callers hold a pointer-sized handle; internal structs remain private to allow evolution without breaking bindings.
  • Streaming, not “one big call”: accept partial input and produce partial output; this reduces peak memory and fits IO pipelines.
  • Explicit buffers: the C core never returns pointers to internal memory that the caller must free unless the ownership rules are unambiguous.
  • Error codes + message retrieval: language bindings can map codes to exceptions while still exposing details.
  • Versioned ABI: a function to query ABI version and feature flags helps multi-language packaging and upgrades.

The C Header (Public ABI)

#ifndef HTP_CORE_H
#define HTP_CORE_H

#include <stddef.h>
#include <stdint.h>

#ifdef __cplusplus
extern "C" {
#endif

typedef struct htp_ctx htp_ctx;

typedef enum {
  HTP_OK = 0,
  HTP_EINVAL = 1,
  HTP_EOOM = 2,
  HTP_ESTATE = 3,
  HTP_EBADREC = 4,
  HTP_EOVERFLOW = 5
} htp_status;

typedef struct {
  const uint8_t* data;
  size_t len;
} htp_bytes;

typedef struct {
  uint32_t abi_version;
  uint32_t features;
} htp_info;

htp_info htp_get_info(void);

htp_ctx* htp_create(void);
void htp_destroy(htp_ctx* ctx);

void htp_reset(htp_ctx* ctx);

/* Feed input bytes. The core copies only what it must to complete a record. */
htp_status htp_feed(htp_ctx* ctx, const uint8_t* data, size_t len);

/* Signal end-of-stream to flush a final record if allowed. */
htp_status htp_finish(htp_ctx* ctx);

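/* Drain normalized output into a caller-provided buffer.
   Returns HTP_OK and sets *written to the number of bytes copied (at most out_cap).
   Output is a byte stream: a small buffer may end mid-record, so call again
   until *written == 0, which means no output is currently available.
*/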
htp_status htp_drain(htp_ctx* ctx, uint8_t* out, size_t out_cap, size_t* written);

/* Error details for the last non-OK status. The string is owned by ctx:
   do not free it, and copy it before the next call on the same ctx. */
const char* htp_last_error(htp_ctx* ctx);

#ifdef __cplusplus
}
#endif

#endif

This ABI is intentionally “boring”: pointers, sizes, enums, and a couple of structs. That makes it bindable from almost anywhere. The streaming split between feed and drain avoids returning heap-allocated output that would require cross-runtime freeing conventions.

Implementing the C Core: State, Buffers, and Deterministic Output

Internal State Layout

Internally, the context holds: an input accumulator for partial records, an output ring buffer (or a simple growable buffer with read/write indices), and the last error string. The key is to keep ownership and lifetimes entirely within the C library.

/* htp_core.c (private) */
#include "htp_core.h"
#include <stdlib.h>
#include <string.h>

#define HTP_ABI_VERSION 1

struct htp_ctx {
  uint8_t* in_buf;
  size_t in_len;
  size_t in_cap;

  uint8_t* out_buf;
  size_t out_r;
  size_t out_w;
  size_t out_cap;

  char last_err[256];
  int finished;
};

static void set_err(htp_ctx* ctx, const char* msg) {
  if (!ctx) return;
  strncpy(ctx->last_err, msg, sizeof(ctx->last_err) - 1);
  ctx->last_err[sizeof(ctx->last_err) - 1] = '\0';
}

htp_info htp_get_info(void) {
  htp_info i;
  i.abi_version = HTP_ABI_VERSION;
  i.features = 0;
  return i;
}

Output Buffer Strategy

To keep bindings simple, htp_drain copies bytes into a caller-provided buffer. Internally you can implement the output as a circular buffer. For clarity, the following uses a linear buffer with read/write indices and compaction when needed.

static htp_status ensure_cap(uint8_t** buf, size_t* cap, size_t need) {
  if (*cap >= need) return HTP_OK;
  size_t new_cap = (*cap == 0) ? 4096 : *cap;
  while (new_cap < need) {
    if (new_cap > (SIZE_MAX / 2)) return HTP_EOVERFLOW;
    new_cap *= 2;
  }
  uint8_t* p = (uint8_t*)realloc(*buf, new_cap);
  if (!p) return HTP_EOOM;
  *buf = p;
  *cap = new_cap;
  return HTP_OK;
}

static void out_compact(htp_ctx* ctx) {
  if (ctx->out_r == 0) return;
  if (ctx->out_r == ctx->out_w) {
    ctx->out_r = ctx->out_w = 0;
    return;
  }
  memmove(ctx->out_buf, ctx->out_buf + ctx->out_r, ctx->out_w - ctx->out_r);
  ctx->out_w -= ctx->out_r;
  ctx->out_r = 0;
}

Record Normalization Logic

Assume each record is key=value with ASCII key, and value is trimmed; output is key\tvalue\n. The core scans for newline, validates, normalizes, and appends to the output buffer. The binding languages do not need to understand the record format; they just stream bytes in and bytes out.

static int is_key_char(uint8_t c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '_' || c == '-';
}

static htp_status emit(htp_ctx* ctx, const uint8_t* data, size_t len) {
  out_compact(ctx);
  htp_status st = ensure_cap(&ctx->out_buf, &ctx->out_cap, ctx->out_w + len);
  if (st != HTP_OK) return st;
  memcpy(ctx->out_buf + ctx->out_w, data, len);
  ctx->out_w += len;
  return HTP_OK;
}

static htp_status normalize_record(htp_ctx* ctx, const uint8_t* rec, size_t len) {
  /* strip trailing \r */
  if (len > 0 && rec[len - 1] == '\r') len--;
  if (len == 0) return HTP_OK; /* ignore empty lines */

  size_t eq = 0;
  while (eq < len && rec[eq] != '=') eq++;
  if (eq == 0 || eq == len) {
    set_err(ctx, "bad record: expected key=value");
    return HTP_EBADREC;
  }
  for (size_t i = 0; i < eq; i++) {
    if (!is_key_char(rec[i])) {
      set_err(ctx, "bad record: invalid key character");
      return HTP_EBADREC;
    }
  }

  /* trim spaces around value */
  size_t v0 = eq + 1;
  while (v0 < len && (rec[v0] == ' ' || rec[v0] == '\t')) v0++;
  size_t v1 = len;
  while (v1 > v0 && (rec[v1 - 1] == ' ' || rec[v1 - 1] == '\t')) v1--;

  htp_status st;
  st = emit(ctx, rec, eq);
  if (st != HTP_OK) return st;
  st = emit(ctx, (const uint8_t*)"\t", 1);
  if (st != HTP_OK) return st;
  st = emit(ctx, rec + v0, v1 - v0);
  if (st != HTP_OK) return st;
  st = emit(ctx, (const uint8_t*)"\n", 1);
  return st;
}

Feeding Chunks and Splitting Lines

htp_feed appends incoming bytes to the input accumulator only when necessary. A simple approach is: scan the incoming chunk for newlines; for each complete record, normalize directly from either the chunk or the accumulator. If a record spans chunks, store the partial bytes in in_buf.

htp_ctx* htp_create(void) {
  htp_ctx* ctx = (htp_ctx*)calloc(1, sizeof(htp_ctx));
  if (!ctx) return NULL;
  set_err(ctx, "");
  return ctx;
}

void htp_destroy(htp_ctx* ctx) {
  if (!ctx) return;
  free(ctx->in_buf);
  free(ctx->out_buf);
  free(ctx);
}

void htp_reset(htp_ctx* ctx) {
  if (!ctx) return;
  ctx->in_len = 0;
  ctx->out_r = ctx->out_w = 0;
  ctx->finished = 0;
  set_err(ctx, "");
}

htp_status htp_feed(htp_ctx* ctx, const uint8_t* data, size_t len) {
  if (!ctx || (!data && len != 0)) return HTP_EINVAL;
  if (ctx->finished) {
    set_err(ctx, "cannot feed after finish");
    return HTP_ESTATE;
  }

  size_t i = 0;
  while (i < len) {
    size_t start = i;
    while (i < len && data[i] != '\n') i++;

    if (i < len && data[i] == '\n') {
      /* complete line in [start, i) */
      if (ctx->in_len == 0) {
        htp_status st = normalize_record(ctx, data + start, i - start);
        if (st != HTP_OK) return st;
      } else {
        /* append and normalize from accumulator */
        htp_status st = ensure_cap(&ctx->in_buf, &ctx->in_cap, ctx->in_len + (i - start));
        if (st != HTP_OK) return st;
        memcpy(ctx->in_buf + ctx->in_len, data + start, i - start);
        ctx->in_len += (i - start);
        st = normalize_record(ctx, ctx->in_buf, ctx->in_len);
        /* Clear the accumulator even on error so a rejected record is not
           glued onto the next chunk. */
        ctx->in_len = 0;
        if (st != HTP_OK) return st;
      }
      i++; /* skip \n */
    } else {
      /* trailing partial */
      size_t part = len - start;
      htp_status st = ensure_cap(&ctx->in_buf, &ctx->in_cap, ctx->in_len + part);
      if (st != HTP_OK) return st;
      memcpy(ctx->in_buf + ctx->in_len, data + start, part);
      ctx->in_len += part;
      break;
    }
  }
  return HTP_OK;
}

htp_status htp_finish(htp_ctx* ctx) {
  if (!ctx) return HTP_EINVAL;
  ctx->finished = 1;
  if (ctx->in_len == 0) return HTP_OK;
  /* allow final record without newline */
  htp_status st = normalize_record(ctx, ctx->in_buf, ctx->in_len);
  if (st != HTP_OK) return st;
  ctx->in_len = 0;
  return HTP_OK;
}

htp_status htp_drain(htp_ctx* ctx, uint8_t* out, size_t out_cap, size_t* written) {
  if (!ctx || !written || (!out && out_cap != 0)) return HTP_EINVAL;
  size_t avail = (ctx->out_w >= ctx->out_r) ? (ctx->out_w - ctx->out_r) : 0;
  size_t n = (avail < out_cap) ? avail : out_cap;
  if (n > 0) memcpy(out, ctx->out_buf + ctx->out_r, n);
  ctx->out_r += n;
  *written = n;
  return HTP_OK;
}

const char* htp_last_error(htp_ctx* ctx) {
  if (!ctx) return "";
  return ctx->last_err;
}
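
Because the streaming contract is small, a pure-Python model of it can serve as an oracle when testing the C core. The sketch below is not part of the library: ModelNormalizer, normalize_record, and BadRecord are hypothetical names, and the rules are transcribed from the C listing above as read here (split on \n, strip one trailing \r, validate the key alphabet, trim spaces and tabs around the value). Unlike htp_drain, the model's drain returns everything at once.

```python
import re

KEY_RE = re.compile(rb"[A-Za-z0-9_-]+")  # same alphabet as is_key_char in the C core


class BadRecord(ValueError):
    """Hypothetical stand-in for the HTP_EBADREC status."""


def normalize_record(rec: bytes) -> bytes:
    """Normalize one record (newline already removed) to key, TAB, value, LF."""
    if rec.endswith(b"\r"):
        rec = rec[:-1]          # strip a single trailing carriage return
    if not rec:
        return b""              # empty lines are ignored
    key, sep, value = rec.partition(b"=")
    if not sep or not KEY_RE.fullmatch(key):
        raise BadRecord("bad record: expected key=value with a valid key")
    return key + b"\t" + value.strip(b" \t") + b"\n"


class ModelNormalizer:
    """Reference model of feed/finish/drain; favors clarity over speed."""

    def __init__(self):
        self._pending = b""     # partial record carried across feed() calls
        self._out = bytearray()
        self._finished = False

    def feed(self, data: bytes) -> None:
        if self._finished:
            raise RuntimeError("cannot feed after finish")
        # Everything before the last newline is complete; the tail is partial.
        *complete, self._pending = (self._pending + data).split(b"\n")
        for rec in complete:
            self._out += normalize_record(rec)

    def finish(self) -> None:
        self._finished = True
        rec, self._pending = self._pending, b""
        self._out += normalize_record(rec)  # final record may lack a newline

    def drain(self) -> bytes:
        out, self._out = bytes(self._out), bytearray()
        return out


m = ModelNormalizer()
m.feed(b"a= 1\nb=")   # second record is split across two chunks
m.feed(b"two")
m.finish()
print(m.drain())       # prints b'a\t1\nb\ttwo\n'
```

Comparing the bytes produced by this model with the bytes drained from the C core, for the same chunks, is one inexpensive way to catch chunk-boundary bugs.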

Build Artifacts: One Core, Many Packages

Compiling the C Core as a Shared Library

For multi-language distribution, you typically want a shared library (.so, .dylib, .dll) plus headers for native builds. Keep the exported symbol set minimal and stable.

# Linux example
cc -O3 -fPIC -shared -o libhtp_core.so htp_core.c

# macOS example
cc -O3 -fPIC -dynamiclib -o libhtp_core.dylib htp_core.c

In a real project, use a build system (CMake/Meson) to produce consistent outputs, set symbol visibility, and generate platform-specific names. The important capstone takeaway is to treat the C core as the single source of truth for behavior and to keep the ABI narrow.

Python Binding (ctypes): Zero-Copy Inputs, Buffered Outputs

Binding Strategy

Using ctypes keeps the example dependency-light. The binding wraps the opaque handle, converts Python bytes/bytearray/memoryview to a pointer+length, and provides a generator-like interface that yields normalized output chunks.

# htp_py.py
import ctypes as C

lib = C.CDLL("./libhtp_core.so")

class Ctx(C.Structure):
    pass

lib.htp_create.restype = C.POINTER(Ctx)
lib.htp_destroy.argtypes = [C.POINTER(Ctx)]
lib.htp_feed.argtypes = [C.POINTER(Ctx), C.POINTER(C.c_uint8), C.c_size_t]
lib.htp_feed.restype = C.c_int
lib.htp_finish.argtypes = [C.POINTER(Ctx)]
lib.htp_finish.restype = C.c_int
lib.htp_drain.argtypes = [C.POINTER(Ctx), C.POINTER(C.c_uint8), C.c_size_t, C.POINTER(C.c_size_t)]
lib.htp_drain.restype = C.c_int
lib.htp_last_error.argtypes = [C.POINTER(Ctx)]
lib.htp_last_error.restype = C.c_char_p

class HtpError(Exception):
    pass

class Normalizer:
    def __init__(self):
        self._ctx = lib.htp_create()
        if not self._ctx:
            raise MemoryError("htp_create failed")
        self._out = (C.c_uint8 * 65536)()

    def close(self):
        if self._ctx:
            lib.htp_destroy(self._ctx)
            self._ctx = None

    def _check(self, st):
        if st != 0:
            msg = lib.htp_last_error(self._ctx).decode("utf-8", "replace")
            raise HtpError(f"status={st}: {msg}")

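    def feed(self, data):
        mv = memoryview(data)
        n = mv.nbytes
        if n == 0:
            return
        if mv.readonly or not mv.c_contiguous:
            # from_buffer needs a writable buffer, so read-only inputs such as
            # bytes (and non-contiguous views) are copied once here.
            buf = (C.c_uint8 * n).from_buffer_copy(mv.tobytes())
        else:
            # Writable contiguous buffers (bytearray, memoryview) are borrowed
            # without a copy.
            buf = (C.c_uint8 * n).from_buffer(mv)
        self._check(lib.htp_feed(self._ctx, buf, n))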

    def finish(self):
        self._check(lib.htp_finish(self._ctx))

    def drain(self):
        written = C.c_size_t(0)
        while True:
            self._check(lib.htp_drain(self._ctx, self._out, len(self._out), C.byref(written)))
            n = written.value
            if n == 0:
                break
            yield bytes(self._out[:n])

Step-by-step usage in Python: create the wrapper, feed chunks, drain periodically, then finish and drain remaining output.

n = Normalizer()
try:
    n.feed(b"a= 1\n")
    n.feed(b"b=two")
    for chunk in n.drain():
        print(chunk)
    n.finish()
    for chunk in n.drain():
        print(chunk)
finally:
    n.close()
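
The steps above can be folded into a small generator so that callers only supply chunks and consume output. This is a sketch rather than part of the binding: normalize_stream is a hypothetical helper, and it assumes an object with feed(), finish(), drain() (returning an iterable of bytes), and close(), which matches the Normalizer wrapper shown earlier.

```python
from typing import Iterable, Iterator


def normalize_stream(normalizer, chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Feed each chunk, yield whatever output is ready, then flush at the end.

    The normalizer is closed when the generator finishes or is abandoned.
    """
    try:
        for chunk in chunks:
            normalizer.feed(chunk)
            yield from normalizer.drain()
        normalizer.finish()                 # flush a final record without newline
        yield from normalizer.drain()
    finally:
        normalizer.close()
```

With the wrapper above, a file could then be processed with something like normalize_stream(Normalizer(), iter(lambda: f.read(65536), b"")), where f is a file opened in binary mode.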

Ruby Binding (FFI): Idiomatic Wrapper with Explicit Lifecycle

Binding Strategy

Ruby’s ffi gem can bind to the shared library without compiling a native extension. The wrapper should manage the handle lifecycle and expose a simple API. Because Ruby strings are mutable and have encodings, treat inputs as binary (ASCII-8BIT) and pass pointers safely.

# htp_rb.rb
require 'ffi'

module HTP
  extend FFI::Library
  ffi_lib './libhtp_core.so'

  class Ctx < FFI::Struct
  end

  attach_function :htp_create, [], :pointer
  attach_function :htp_destroy, [:pointer], :void
  attach_function :htp_feed, [:pointer, :pointer, :size_t], :int
  attach_function :htp_finish, [:pointer], :int
  attach_function :htp_drain, [:pointer, :pointer, :size_t, :pointer], :int
  attach_function :htp_last_error, [:pointer], :string

  class Error < StandardError; end

  class Normalizer
    def initialize
      @ctx = HTP.htp_create
      raise NoMemoryError, 'htp_create failed' if @ctx.null?
      @out = FFI::MemoryPointer.new(:uint8, 65536)
      ObjectSpace.define_finalizer(self, self.class.finalize(@ctx))
    end

    def self.finalize(ctx)
      proc { HTP.htp_destroy(ctx) unless ctx.null? }
    end

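    def close
      return if @ctx.nil? || @ctx.null?
      # The finalizer captured the original pointer; remove it so the GC
      # does not call htp_destroy a second time after an explicit close.
      ObjectSpace.undefine_finalizer(self)
      HTP.htp_destroy(@ctx)
      @ctx = FFI::Pointer::NULL
    end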

    def check(st)
      return if st == 0
      raise Error, "status=#{st}: #{HTP.htp_last_error(@ctx)}"
    end

    def feed(str)
      s = str.dup
      s.force_encoding(Encoding::BINARY)
      ptr = FFI::MemoryPointer.from_string(s)
      # from_string adds a NUL terminator; pass explicit length without it
      check(HTP.htp_feed(@ctx, ptr, s.bytesize))
    end

    def finish
      check(HTP.htp_finish(@ctx))
    end

    def drain
      chunks = []
      written = FFI::MemoryPointer.new(:size_t)
      loop do
        check(HTP.htp_drain(@ctx, @out, 65536, written))
        n = written.read_size_t
        break if n == 0
        chunks << @out.get_bytes(0, n)
      end
      chunks
    end
  end
end

Step-by-step usage in Ruby: feed binary strings, drain to an array of chunks, and close explicitly in long-running processes.

n = HTP::Normalizer.new
n.feed("a= 1\n")
n.feed("b=two")
puts n.drain
n.finish
puts n.drain
n.close

Java Binding (JNI): DirectByteBuffer for Efficient Transfer

Binding Strategy

For Java, JNI is the common baseline. The binding uses a long field to store the native pointer. For IO-like usage, accept ByteBuffer inputs (preferably direct buffers) and write output into a caller-provided direct buffer. This avoids per-call array copies when the caller can supply direct buffers.

Java API Skeleton

// HtpNormalizer.java
public final class HtpNormalizer implements AutoCloseable {
  static { System.loadLibrary("htp_core_jni"); }

  private long ctx;

  public HtpNormalizer() {
    this.ctx = nativeCreate();
    if (this.ctx == 0) throw new OutOfMemoryError("nativeCreate failed");
  }

  private static native long nativeCreate();
  private static native void nativeDestroy(long ctx);
  public native void reset();
  public native void feed(java.nio.ByteBuffer in, int len);
  public native void finish();
  public native int drain(java.nio.ByteBuffer out); // returns bytes written
  private native String lastError();

  @Override
  public void close() {
    if (ctx != 0) {
      nativeDestroy(ctx);
      ctx = 0;
    }
  }

  void checkStatus(int st) {
    if (st != 0) throw new RuntimeException("status=" + st + ": " + lastError());
  }
}

JNI C Implementation Skeleton

/* htp_core_jni.c */
#include "htp_core.h"
#include <jni.h>

JNIEXPORT jlong JNICALL Java_HtpNormalizer_nativeCreate(JNIEnv* env, jclass cls) {
  (void)env; (void)cls;
  htp_ctx* ctx = htp_create();
  return (jlong)(uintptr_t)ctx;
}

JNIEXPORT void JNICALL Java_HtpNormalizer_nativeDestroy(JNIEnv* env, jclass cls, jlong p) {
  (void)env; (void)cls;
  htp_destroy((htp_ctx*)(uintptr_t)p);
}

static htp_ctx* get_ctx(JNIEnv* env, jobject self) {
  jclass c = (*env)->GetObjectClass(env, self);
  jfieldID f = (*env)->GetFieldID(env, c, "ctx", "J");
  jlong p = (*env)->GetLongField(env, self, f);
  return (htp_ctx*)(uintptr_t)p;
}

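JNIEXPORT void JNICALL Java_HtpNormalizer_feed(JNIEnv* env, jobject self, jobject inBuf, jint len) {
  htp_ctx* ctx = get_ctx(env, self);
  uint8_t* in = (uint8_t*)(*env)->GetDirectBufferAddress(env, inBuf);
  if (!in || len < 0) {
    /* Heap buffers have no native address. A fallback could copy them;
       here we fail loudly instead of silently dropping the input. */
    jclass iae = (*env)->FindClass(env, "java/lang/IllegalArgumentException");
    if (iae) (*env)->ThrowNew(env, iae, "feed requires a direct ByteBuffer and len >= 0");
    return;
  }
  /* Note: this is the buffer's base address; position() is not applied. */
  int st = htp_feed(ctx, in, (size_t)len);
  if (st != HTP_OK) {
    const char* msg = htp_last_error(ctx);
    jclass ex = (*env)->FindClass(env, "java/lang/RuntimeException");
    if (ex) (*env)->ThrowNew(env, ex, msg);
  }
}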

JNIEXPORT jint JNICALL Java_HtpNormalizer_drain(JNIEnv* env, jobject self, jobject outBuf) {
  htp_ctx* ctx = get_ctx(env, self);
  uint8_t* out = (uint8_t*)(*env)->GetDirectBufferAddress(env, outBuf);
  jlong cap = (*env)->GetDirectBufferCapacity(env, outBuf);
  size_t written = 0;
  int st = htp_drain(ctx, out, (size_t)cap, &written);
  if (st != HTP_OK) {
    const char* msg = htp_last_error(ctx);
    jclass ex = (*env)->FindClass(env, "java/lang/RuntimeException");
    (*env)->ThrowNew(env, ex, msg);
    return 0;
  }
  return (jint)written;
}

Step-by-step usage in Java: allocate direct buffers, feed input, drain output in a loop. The caller controls buffer sizes and can integrate with NIO channels.

try (HtpNormalizer n = new HtpNormalizer()) {
  java.nio.ByteBuffer in = java.nio.ByteBuffer.allocateDirect(64 * 1024);
  java.nio.ByteBuffer out = java.nio.ByteBuffer.allocateDirect(64 * 1024);

  in.put("a= 1\n".getBytes(java.nio.charset.StandardCharsets.US_ASCII));
  in.flip();
  n.feed(in, in.remaining());

  out.clear();
  int w = n.drain(out);
  out.limit(w);
  // consume out

  n.finish();
}

Packaging and Compatibility: Keeping the Multi-Language Surface Stable

ABI Version Checks in Bindings

Because you will ship the same C core to multiple ecosystems, add a lightweight runtime check: bindings call htp_get_info and verify abi_version. This prevents subtle crashes when a binding expects a different ABI.

/* Example C call pattern from bindings: htp_get_info().abi_version == 1 */
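
From Python, the same check might look like the sketch below. It assumes a ctypes-based binding like the one shown earlier; HtpInfo, check_abi, load_core, and EXPECTED_ABI are illustrative names, not part of the library. The structure mirrors htp_info from the public header, which ctypes can receive by value.

```python
import ctypes as C

EXPECTED_ABI = 1  # assumption: the ABI version this binding was written against


class HtpInfo(C.Structure):
    # Mirrors htp_info from the public header: two uint32_t fields.
    _fields_ = [("abi_version", C.c_uint32), ("features", C.c_uint32)]


def check_abi(info: HtpInfo, expected: int = EXPECTED_ABI) -> None:
    """Raise ImportError if the loaded core reports a different ABI version."""
    if info.abi_version != expected:
        raise ImportError(
            f"htp_core ABI mismatch: library reports {info.abi_version}, "
            f"binding expects {expected}"
        )


def load_core(path: str) -> C.CDLL:
    """Load the shared library and fail fast on an ABI mismatch."""
    lib = C.CDLL(path)
    lib.htp_get_info.argtypes = []
    lib.htp_get_info.restype = HtpInfo   # struct returned by value
    check_abi(lib.htp_get_info())
    return lib
```

Running the check once at import time means a mismatched library is rejected before any context is created.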

Error Mapping and Diagnostics

Keep the C status codes stable and small. Bindings should map them to exceptions, but preserve the original status integer and the C error message. In practice, you will want to include enough context in last_err to debug malformed input (for example, record number, byte offset, or a short snippet). The capstone pattern is: C core stores the last error message in the context; bindings fetch it immediately when a call fails.
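
One possible shape for this mapping in Python is sketched below. The exception class names and raise_for_status are hypothetical (a richer variant of the HtpError class in the binding shown earlier); the integer values are copied from the htp_status enum in the public header.

```python
class HtpError(Exception):
    """Base error that keeps the raw status code and the C-side message."""

    def __init__(self, status: int, message: str):
        super().__init__(f"status={status}: {message}")
        self.status = status
        self.message = message


class HtpInvalidArgument(HtpError):
    pass


class HtpOutOfMemory(HtpError):
    pass


class HtpStateError(HtpError):
    pass


class HtpBadRecord(HtpError):
    pass


class HtpOverflow(HtpError):
    pass


# Status values copied from the htp_status enum (HTP_EINVAL .. HTP_EOVERFLOW).
_STATUS_TO_EXC = {
    1: HtpInvalidArgument,
    2: HtpOutOfMemory,
    3: HtpStateError,
    4: HtpBadRecord,
    5: HtpOverflow,
}


def raise_for_status(status: int, message: str = "") -> None:
    """Do nothing for HTP_OK (0); otherwise raise the mapped exception."""
    if status == 0:
        return
    # Unknown codes fall back to the base class so new statuses still surface.
    raise _STATUS_TO_EXC.get(status, HtpError)(status, message)
```

Callers can catch HtpBadRecord to skip malformed input while letting other failures propagate, and the status attribute remains available for logging.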

Threading Expectations

Make the threading contract explicit in the C API documentation: a single htp_ctx is not thread-safe; callers may create one context per thread. Bindings should not share a single instance across threads unless they add their own synchronization. This keeps the C core simple and avoids surprising contention.
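
In Python, the one-context-per-thread rule can be followed with thread-local storage. The helper below is a sketch under that assumption; PerThreadContext is a hypothetical name, and factory stands for any zero-argument callable that creates a wrapper, such as the Normalizer class from the binding.

```python
import threading


class PerThreadContext:
    """Hands each thread its own instance from `factory`, created lazily."""

    def __init__(self, factory):
        self._factory = factory          # e.g. the Normalizer class
        self._local = threading.local()  # attribute storage private to each thread

    def get(self):
        inst = getattr(self._local, "inst", None)
        if inst is None:
            inst = self._local.inst = self._factory()
        return inst
```

Each thread remains responsible for closing the instance it received; this helper only guarantees that instances are never shared between threads.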

Integration Pattern: A Unified Streaming Adapter in Each Language

To make the component easy to adopt, each binding should expose the same conceptual interface even if the idioms differ: feed(bytes), finish(), and drain(). This lets you write similar pipeline stages across Python, Ruby, and Java, and it makes cross-language test vectors reusable: the same input chunks should produce the same output bytes regardless of the host language.

A practical step-by-step approach to integrating this component into a larger system is: (1) define the C ABI and freeze it early, (2) implement the C core with a streaming state machine and deterministic output, (3) write a small “golden vectors” file of input chunks and expected output, (4) implement bindings that pass those vectors, (5) package the shared library per platform and ensure the bindings locate it reliably at runtime, (6) add a compatibility check via htp_get_info so mismatches fail fast.
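
Steps (3) and (4) can be sketched as a small runner. The vector layout here (name, list of input chunks, expected output bytes) is an assumption made for illustration, as are the names run_vector, check_vectors, and VECTORS. The runner also replays each input one byte at a time, since output should not depend on where chunk boundaries fall.

```python
def run_vector(make_normalizer, chunks, expected):
    """Return True if feeding `chunks` yields exactly `expected` bytes."""
    n = make_normalizer()
    out = bytearray()
    try:
        for chunk in chunks:
            n.feed(chunk)
            for piece in n.drain():
                out += piece
        n.finish()
        for piece in n.drain():
            out += piece
    finally:
        n.close()
    return bytes(out) == expected


def check_vectors(make_normalizer, vectors):
    """Run every vector as written and again split into 1-byte chunks."""
    failures = []
    for name, chunks, expected in vectors:
        data = b"".join(chunks)
        one_byte = [data[i:i + 1] for i in range(len(data))]
        for label, variant in (("as-written", chunks), ("1-byte", one_byte)):
            if not run_vector(make_normalizer, variant, expected):
                failures.append((name, label))
    return failures


# Hypothetical vectors; expected bytes follow the key\tvalue\n rule from this chapter.
VECTORS = [
    ("trim", [b"a= 1\n", b"b=two"], b"a\t1\nb\ttwo\n"),
    ("crlf", [b"k=v\r\n"], b"k\tv\n"),
]
```

Passing the binding's wrapper class as make_normalizer, for example check_vectors(Normalizer, VECTORS), should return an empty list when the binding agrees with the vectors. The same vectors can be stored in a language-neutral file and replayed from Ruby and Java.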

Now answer the exercise about the content:

Why does the component use a streaming split between feed and drain instead of returning heap-allocated output from a single call?

Separating feed and drain lets the caller provide buffers, so the core does not hand out heap-allocated memory that bindings must free. It also supports partial input/output, which reduces peak memory and integrates well with streaming IO.
