What You Are Building in This Capstone
This capstone walks through designing and implementing a high-throughput component whose hot path lives in C, while exposing stable, ergonomic APIs to Python, Ruby, and Java. The goal is not to re-teach general performance methodology or cross-language theory, but to show a concrete, end-to-end build that you can adapt: a C “core” library with a narrow ABI, a streaming interface, and bindings that feel native in each language.
The example component is a streaming “record normalizer” that ingests byte chunks, splits them into newline-delimited records, validates a small schema, and emits normalized records. It is intentionally simple but representative: it exercises incremental parsing, stateful streaming, error reporting, and high call volume. The core idea is that the C layer owns the state machine and the tight loops, while each language binding focuses on: (1) converting inputs to bytes without copying when possible, (2) feeding chunks, (3) draining outputs, and (4) mapping errors to idiomatic exceptions.
Core Design: A Small C ABI That Scales to Many Languages
Principles for the C Core API
- Opaque handles: callers hold a pointer-sized handle; internal structs remain private to allow evolution without breaking bindings.
- Streaming, not “one big call”: accept partial input and produce partial output; this reduces peak memory and fits IO pipelines.
- Explicit buffers: output is copied into caller-provided buffers, so the C core never hands back memory that the caller must free. The one borrowed pointer, the string returned by htp_last_error, remains owned by the context.
- Error codes + message retrieval: language bindings can map codes to exceptions while still exposing details.
- Versioned ABI: a function to query ABI version and feature flags helps multi-language packaging and upgrades.
The C Header (Public ABI)
#ifndef HTP_CORE_H
#define HTP_CORE_H
#include <stddef.h>
#include <stdint.h>
#ifdef __cplusplus
extern "C" {
#endif
typedef struct htp_ctx htp_ctx;
typedef enum {
    HTP_OK = 0,
    HTP_EINVAL = 1,
    HTP_EOOM = 2,
    HTP_ESTATE = 3,
    HTP_EBADREC = 4,
    HTP_EOVERFLOW = 5
} htp_status;
typedef struct {
    const uint8_t* data;
    size_t len;
} htp_bytes;
typedef struct {
    uint32_t abi_version;
    uint32_t features;
} htp_info;
htp_info htp_get_info(void);
htp_ctx* htp_create(void);
void htp_destroy(htp_ctx* ctx);
void htp_reset(htp_ctx* ctx);
/* Feed input bytes. The core copies only what it must to complete a record. */
htp_status htp_feed(htp_ctx* ctx, const uint8_t* data, size_t len);
/* Signal end-of-stream to flush a final record if allowed. */
htp_status htp_finish(htp_ctx* ctx);
/* Drain normalized records into caller-provided buffer.
Returns HTP_OK and sets *written to bytes written.
If no output is available, *written = 0.
*/
htp_status htp_drain(htp_ctx* ctx, uint8_t* out, size_t out_cap, size_t* written);
/* Error details for last non-OK status. */
const char* htp_last_error(htp_ctx* ctx);
#ifdef __cplusplus
}
#endif
#endif
This ABI is intentionally “boring”: pointers, sizes, enums, and a couple of structs. That makes it bindable from almost anywhere. The streaming split between feed and drain avoids returning heap-allocated output that would require cross-runtime freeing conventions.
Implementing the C Core: State, Buffers, and Deterministic Output
Internal State Layout
Internally, the context holds: an input accumulator for partial records, an output ring buffer (or a simple growable buffer with read/write indices), and the last error string. The key is to keep ownership and lifetimes entirely within the C library.
/* htp_core.c (private) */
#include "htp_core.h"
#include <stdlib.h>
#include <string.h>
#define HTP_ABI_VERSION 1
struct htp_ctx {
    uint8_t* in_buf;
    size_t in_len;
    size_t in_cap;
    uint8_t* out_buf;
    size_t out_r;
    size_t out_w;
    size_t out_cap;
    char last_err[256];
    int finished;
};
static void set_err(htp_ctx* ctx, const char* msg) {
    if (!ctx) return;
    strncpy(ctx->last_err, msg, sizeof(ctx->last_err) - 1);
    ctx->last_err[sizeof(ctx->last_err) - 1] = '\0';
}
htp_info htp_get_info(void) {
    htp_info i;
    i.abi_version = HTP_ABI_VERSION;
    i.features = 0;
    return i;
}
Output Buffer Strategy
To keep bindings simple, htp_drain copies bytes into a caller-provided buffer. Internally you can implement the output as a circular buffer. For clarity, the following uses a linear buffer with read/write indices and compaction when needed.
static htp_status ensure_cap(uint8_t** buf, size_t* cap, size_t need) {
    if (*cap >= need) return HTP_OK;
    size_t new_cap = (*cap == 0) ? 4096 : *cap;
    while (new_cap < need) {
        if (new_cap > (SIZE_MAX / 2)) return HTP_EOVERFLOW;
        new_cap *= 2;
    }
    uint8_t* p = (uint8_t*)realloc(*buf, new_cap);
    if (!p) return HTP_EOOM;
    *buf = p;
    *cap = new_cap;
    return HTP_OK;
}
static void out_compact(htp_ctx* ctx) {
    if (ctx->out_r == 0) return;
    if (ctx->out_r == ctx->out_w) {
        ctx->out_r = ctx->out_w = 0;
        return;
    }
    memmove(ctx->out_buf, ctx->out_buf + ctx->out_r, ctx->out_w - ctx->out_r);
    ctx->out_w -= ctx->out_r;
    ctx->out_r = 0;
}
Record Normalization Logic
Assume each record is key=value with ASCII key, and value is trimmed; output is key\tvalue\n. The core scans for newline, validates, normalizes, and appends to the output buffer. The binding languages do not need to understand the record format; they just stream bytes in and bytes out.
static int is_key_char(uint8_t c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '_' || c == '-';
}
static htp_status emit(htp_ctx* ctx, const uint8_t* data, size_t len) {
    /* Nothing to copy; this also avoids calling memcpy on a NULL out_buf. */
    if (len == 0) return HTP_OK;
    /* Reclaim already-drained space only when the tail lacks room,
       instead of moving pending output on every call. */
    if (ctx->out_cap - ctx->out_w < len) out_compact(ctx);
    if (len > SIZE_MAX - ctx->out_w) {
        set_err(ctx, "output size overflow");
        return HTP_EOVERFLOW;
    }
    htp_status st = ensure_cap(&ctx->out_buf, &ctx->out_cap, ctx->out_w + len);
    if (st != HTP_OK) {
        set_err(ctx, "output buffer allocation failed");
        return st;
    }
    memcpy(ctx->out_buf + ctx->out_w, data, len);
    ctx->out_w += len;
    return HTP_OK;
}
static htp_status normalize_record(htp_ctx* ctx, const uint8_t* rec, size_t len) {
    /* strip trailing \r */
    if (len > 0 && rec[len - 1] == '\r') len--;
    if (len == 0) return HTP_OK; /* ignore empty lines */
    /* locate the first '='; it must exist and the key must be non-empty */
    size_t eq = 0;
    while (eq < len && rec[eq] != '=') eq++;
    if (eq == 0 || eq == len) {
        set_err(ctx, "bad record: expected key=value");
        return HTP_EBADREC;
    }
    for (size_t i = 0; i < eq; i++) {
        if (!is_key_char(rec[i])) {
            set_err(ctx, "bad record: invalid key character");
            return HTP_EBADREC;
        }
    }
    /* trim spaces and tabs around value */
    size_t v0 = eq + 1;
    while (v0 < len && (rec[v0] == ' ' || rec[v0] == '\t')) v0++;
    size_t v1 = len;
    while (v1 > v0 && (rec[v1 - 1] == ' ' || rec[v1 - 1] == '\t')) v1--;
    /* emit key, tab, trimmed value, newline */
    htp_status st;
    st = emit(ctx, rec, eq);
    if (st != HTP_OK) return st;
    st = emit(ctx, (const uint8_t*)"\t", 1);
    if (st != HTP_OK) return st;
    st = emit(ctx, rec + v0, v1 - v0);
    if (st != HTP_OK) return st;
    st = emit(ctx, (const uint8_t*)"\n", 1);
    return st;
}
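For cross-checking the C logic, the record rule can be restated as a small Python reference model. This is only a sketch: the name `normalize_record_ref` and the use of `ValueError` are assumptions made for illustration and are not part of the C API.

```python
import string

# Allowed key bytes, mirroring is_key_char in the C core.
KEY_BYTES = frozenset((string.ascii_letters + string.digits + "_-").encode("ascii"))


def normalize_record_ref(rec: bytes) -> bytes:
    """Reference model of one record: b'key=value' -> b'key\\tvalue\\n'."""
    if rec.endswith(b"\r"):          # tolerate CRLF line endings
        rec = rec[:-1]
    if not rec:                      # empty lines produce no output
        return b""
    eq = rec.find(b"=")
    if eq <= 0:                      # no '=' at all, or an empty key
        raise ValueError("bad record: expected key=value")
    key, value = rec[:eq], rec[eq + 1:]
    if any(b not in KEY_BYTES for b in key):
        raise ValueError("bad record: invalid key character")
    # Only spaces and tabs are trimmed, matching the C loops.
    return key + b"\t" + value.strip(b" \t") + b"\n"
```

A model like this is slow but easy to read, which makes it a reasonable oracle when generating expected outputs for tests of the C core.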
Feeding Chunks and Splitting Lines
htp_feed appends incoming bytes to the input accumulator only when necessary. A simple approach is: scan the incoming chunk for newlines; for each complete record, normalize directly from either the chunk or the accumulator. If a record spans chunks, store the partial bytes in in_buf.
htp_ctx* htp_create(void) {
    htp_ctx* ctx = (htp_ctx*)calloc(1, sizeof(htp_ctx));
    if (!ctx) return NULL;
    set_err(ctx, "");
    return ctx;
}
void htp_destroy(htp_ctx* ctx) {
    if (!ctx) return;
    free(ctx->in_buf);
    free(ctx->out_buf);
    free(ctx);
}
void htp_reset(htp_ctx* ctx) {
    if (!ctx) return;
    ctx->in_len = 0;
    ctx->out_r = ctx->out_w = 0;
    ctx->finished = 0;
    set_err(ctx, "");
}
htp_status htp_feed(htp_ctx* ctx, const uint8_t* data, size_t len) {
    if (!ctx || (!data && len != 0)) {
        set_err(ctx, "invalid argument to htp_feed");
        return HTP_EINVAL;
    }
    if (ctx->finished) {
        set_err(ctx, "cannot feed after finish");
        return HTP_ESTATE;
    }
    /* On any error the rest of this chunk is not consumed; callers should
       treat the context as failed and call htp_reset() before reuse. */
    size_t i = 0;
    while (i < len) {
        size_t start = i;
        while (i < len && data[i] != '\n') i++;
        if (i < len) {
            /* complete line in [start, i) */
            if (ctx->in_len == 0) {
                htp_status st = normalize_record(ctx, data + start, i - start);
                if (st != HTP_OK) return st;
            } else {
                /* append and normalize from accumulator */
                htp_status st = ensure_cap(&ctx->in_buf, &ctx->in_cap, ctx->in_len + (i - start));
                if (st != HTP_OK) {
                    set_err(ctx, "input buffer allocation failed");
                    return st;
                }
                memcpy(ctx->in_buf + ctx->in_len, data + start, i - start);
                ctx->in_len += (i - start);
                st = normalize_record(ctx, ctx->in_buf, ctx->in_len);
                ctx->in_len = 0; /* the record is consumed whether or not it was valid */
                if (st != HTP_OK) return st;
            }
            i++; /* skip \n */
        } else {
            /* trailing partial */
            size_t part = len - start;
            htp_status st = ensure_cap(&ctx->in_buf, &ctx->in_cap, ctx->in_len + part);
            if (st != HTP_OK) {
                set_err(ctx, "input buffer allocation failed");
                return st;
            }
            memcpy(ctx->in_buf + ctx->in_len, data + start, part);
            ctx->in_len += part;
            break;
        }
    }
    return HTP_OK;
}
htp_status htp_finish(htp_ctx* ctx) {
    if (!ctx) return HTP_EINVAL;
    ctx->finished = 1;
    if (ctx->in_len == 0) return HTP_OK;
    /* allow final record without newline */
    htp_status st = normalize_record(ctx, ctx->in_buf, ctx->in_len);
    ctx->in_len = 0; /* consumed even if invalid, so it is not parsed twice */
    return st;
}
htp_status htp_drain(htp_ctx* ctx, uint8_t* out, size_t out_cap, size_t* written) {
    if (!ctx || !written || (!out && out_cap != 0)) {
        set_err(ctx, "invalid argument to htp_drain");
        return HTP_EINVAL;
    }
    size_t avail = (ctx->out_w >= ctx->out_r) ? (ctx->out_w - ctx->out_r) : 0;
    size_t n = (avail < out_cap) ? avail : out_cap;
    if (n > 0) memcpy(out, ctx->out_buf + ctx->out_r, n);
    ctx->out_r += n;
    *written = n;
    return HTP_OK;
}
const char* htp_last_error(htp_ctx* ctx) {
    if (!ctx) return "";
    return ctx->last_err;
}
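The chunk handling in htp_feed and htp_finish can be modeled in a few lines of Python, which helps when reasoning about records that straddle chunk boundaries. The class name `LineAssembler` is hypothetical, and unlike the C code it always copies into its accumulator: it models behavior, not performance.

```python
class LineAssembler:
    """Models how feed/finish split a chunked byte stream into lines."""

    def __init__(self):
        self._partial = bytearray()   # plays the role of in_buf
        self._finished = False

    def feed(self, chunk: bytes) -> list:
        """Return the lines completed by this chunk, without newlines."""
        if self._finished:
            raise RuntimeError("cannot feed after finish")
        self._partial += chunk
        # Everything before the last newline is complete; the rest waits.
        *lines, tail = bytes(self._partial).split(b"\n")
        self._partial = bytearray(tail)
        return lines

    def finish(self) -> list:
        """Flush a final line that had no trailing newline, if any."""
        self._finished = True
        tail, self._partial = bytes(self._partial), bytearray()
        return [tail] if tail else []
```

Feeding `b"a= 1\nb="` and then `b"two"` yields one complete line from the first call, nothing from the second, and the line `b"b=two"` from `finish()`, which is the same sequence of records the C core passes to normalize_record.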
Build Artifacts: One Core, Many Packages
Compiling the C Core as a Shared Library
For multi-language distribution, you typically want a shared library (.so, .dylib, .dll) plus headers for native builds. Keep the exported symbol set minimal and stable.
# Linux example
cc -O3 -fPIC -shared -o libhtp_core.so htp_core.c
# macOS example
cc -O3 -fPIC -dynamiclib -o libhtp_core.dylib htp_core.c
In a real project, use a build system (CMake/Meson) to produce consistent outputs, set symbol visibility, and generate platform-specific names. The important capstone takeaway is to treat the C core as the single source of truth for behavior and to keep the ABI narrow.
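On the binding side, the platform-specific names have to be resolved at load time. The sketch below shows one way a Python binding might do that. The helper names are hypothetical, and the Windows file name is an assumption, since the text only shows Linux and macOS builds.

```python
import sys
from pathlib import Path


def core_library_name(platform: str = sys.platform) -> str:
    """Return the expected file name of the shared core for a platform."""
    if platform.startswith("linux"):
        return "libhtp_core.so"
    if platform == "darwin":
        return "libhtp_core.dylib"
    if platform in ("win32", "cygwin"):
        return "htp_core.dll"   # assumed name; no Windows build is shown
    raise OSError(f"unsupported platform: {platform}")


def core_library_path(package_dir: Path, platform: str = sys.platform) -> Path:
    """Resolve the library next to the binding instead of relying on the CWD."""
    path = package_dir / core_library_name(platform)
    if not path.is_file():
        raise FileNotFoundError(f"shared library not found: {path}")
    return path
```

A binding could then pass `str(core_library_path(Path(__file__).parent))` to `ctypes.CDLL`, which avoids depending on the process working directory.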
Python Binding (ctypes): Zero-Copy Inputs, Buffered Outputs
Binding Strategy
Using ctypes keeps the example dependency-light. The binding wraps the opaque handle, converts Python bytes/bytearray/memoryview to a pointer+length, and provides a generator-like interface that yields normalized output chunks.
# htp_py.py
import ctypes as C
lib = C.CDLL("./libhtp_core.so")
class Ctx(C.Structure):
    pass
lib.htp_create.argtypes = []
lib.htp_create.restype = C.POINTER(Ctx)
lib.htp_destroy.argtypes = [C.POINTER(Ctx)]
lib.htp_destroy.restype = None  # void function
lib.htp_feed.argtypes = [C.POINTER(Ctx), C.POINTER(C.c_uint8), C.c_size_t]
lib.htp_feed.restype = C.c_int
lib.htp_finish.argtypes = [C.POINTER(Ctx)]
lib.htp_finish.restype = C.c_int
lib.htp_drain.argtypes = [C.POINTER(Ctx), C.POINTER(C.c_uint8), C.c_size_t, C.POINTER(C.c_size_t)]
lib.htp_drain.restype = C.c_int
lib.htp_last_error.argtypes = [C.POINTER(Ctx)]
lib.htp_last_error.restype = C.c_char_p
class HtpError(Exception):
    pass
class Normalizer:
    def __init__(self):
        self._ctx = lib.htp_create()
        if not self._ctx:
            raise MemoryError("htp_create failed")
        self._out = (C.c_uint8 * 65536)()
    def close(self):
        if self._ctx:
            lib.htp_destroy(self._ctx)
            self._ctx = None
    def _check(self, st):
        if st != 0:
            msg = lib.htp_last_error(self._ctx).decode("utf-8", "replace")
            raise HtpError(f"status={st}: {msg}")
    def feed(self, data):
        if isinstance(data, bytes):
            # bytes is immutable: borrow its buffer for the call, no copy
            buf, n = C.cast(C.c_char_p(data), C.POINTER(C.c_uint8)), len(data)
        else:
            mv = memoryview(data)
            n = mv.nbytes
            if mv.readonly or not mv.contiguous:
                # from_buffer needs a writable, contiguous buffer; copy otherwise
                buf = (C.c_uint8 * n).from_buffer_copy(mv.tobytes())
            else:
                buf = (C.c_uint8 * n).from_buffer(mv)  # zero-copy view
        self._check(lib.htp_feed(self._ctx, buf, n))
    def finish(self):
        self._check(lib.htp_finish(self._ctx))
    def drain(self):
        written = C.c_size_t(0)
        while True:
            self._check(lib.htp_drain(self._ctx, self._out, len(self._out), C.byref(written)))
            n = written.value
            if n == 0:
                break
            # copy the first n bytes out of the reusable output buffer
            yield C.string_at(self._out, n)
Step-by-step usage in Python: create the wrapper, feed chunks, drain periodically, then finish and drain remaining output.
n = Normalizer()
try:
    n.feed(b"a= 1\n")
    n.feed(b"b=two")
    for chunk in n.drain():
        print(chunk)
    n.finish()
    for chunk in n.drain():
        print(chunk)
finally:
    n.close()
Ruby Binding (FFI): Idiomatic Wrapper with Explicit Lifecycle
Binding Strategy
Ruby’s ffi gem can bind to the shared library without compiling a native extension. The wrapper should manage the handle lifecycle and expose a simple API. Because Ruby strings are mutable and have encodings, treat inputs as binary (ASCII-8BIT) and pass pointers safely.
# htp_rb.rb
require 'ffi'
module HTP
  extend FFI::Library
  ffi_lib './libhtp_core.so'
  class Ctx < FFI::Struct
  end
  attach_function :htp_create, [], :pointer
  attach_function :htp_destroy, [:pointer], :void
  attach_function :htp_feed, [:pointer, :pointer, :size_t], :int
  attach_function :htp_finish, [:pointer], :int
  attach_function :htp_drain, [:pointer, :pointer, :size_t, :pointer], :int
  attach_function :htp_last_error, [:pointer], :string
  class Error < StandardError; end
  class Normalizer
    def initialize
      @ctx = HTP.htp_create
      raise NoMemoryError, 'htp_create failed' if @ctx.null?
      @out = FFI::MemoryPointer.new(:uint8, 65536)
      ObjectSpace.define_finalizer(self, self.class.finalize(@ctx))
    end
    def self.finalize(ctx)
      proc { HTP.htp_destroy(ctx) unless ctx.null? }
    end
    def close
      return if @ctx.nil? || @ctx.null?
      # Drop the finalizer first; it captured the same pointer and would
      # otherwise destroy the handle a second time at garbage collection.
      ObjectSpace.undefine_finalizer(self)
      HTP.htp_destroy(@ctx)
      @ctx = FFI::Pointer::NULL
    end
    def check(st)
      return if st == 0
      raise Error, "status=#{st}: #{HTP.htp_last_error(@ctx)}"
    end
    def feed(str)
      s = str.dup
      s.force_encoding(Encoding::BINARY)
      ptr = FFI::MemoryPointer.from_string(s)
      # from_string adds a NUL terminator; pass explicit length without it
      check(HTP.htp_feed(@ctx, ptr, s.bytesize))
    end
    def finish
      check(HTP.htp_finish(@ctx))
    end
    def drain
      chunks = []
      written = FFI::MemoryPointer.new(:size_t)
      loop do
        check(HTP.htp_drain(@ctx, @out, 65536, written))
        n = written.read_size_t
        break if n == 0
        chunks << @out.get_bytes(0, n)
      end
      chunks
    end
  end
end
Step-by-step usage in Ruby: feed binary strings, drain to an array of chunks, and close explicitly in long-running processes.
n = HTP::Normalizer.new
n.feed("a= 1\n")
n.feed("b=two")
puts n.drain
n.finish
puts n.drain
n.close
Java Binding (JNI): DirectByteBuffer for Efficient Transfer
Binding Strategy
For Java, JNI is the common baseline. The binding uses a long field to store the native pointer. For IO-like usage, accept ByteBuffer inputs (preferably direct buffers) and write output into a caller-provided direct buffer. This avoids per-call array copies when the caller can supply direct buffers.
Java API Skeleton
// HtpNormalizer.java
public final class HtpNormalizer implements AutoCloseable {
    static { System.loadLibrary("htp_core_jni"); }
    private long ctx;
    public HtpNormalizer() {
        this.ctx = nativeCreate();
        if (this.ctx == 0) throw new OutOfMemoryError("nativeCreate failed");
    }
    private static native long nativeCreate();
    private static native void nativeDestroy(long ctx);
    public native void reset();
    public native void feed(java.nio.ByteBuffer in, int len);
    public native void finish();
    public native int drain(java.nio.ByteBuffer out); // returns bytes written
    private native String lastError();
    @Override
    public void close() {
        if (ctx != 0) {
            nativeDestroy(ctx);
            ctx = 0;
        }
    }
    void checkStatus(int st) {
        if (st != 0) throw new RuntimeException("status=" + st + ": " + lastError());
    }
}
JNI C Implementation Skeleton
/* htp_core_jni.c */
#include "htp_core.h"
#include <jni.h>
/* Throw a Java exception by class name. If the class lookup fails, the
   error raised by FindClass stays pending instead. */
static void throw_new(JNIEnv* env, const char* cls_name, const char* msg) {
    jclass ex = (*env)->FindClass(env, cls_name);
    if (ex) (*env)->ThrowNew(env, ex, msg);
}
JNIEXPORT jlong JNICALL Java_HtpNormalizer_nativeCreate(JNIEnv* env, jclass cls) {
    (void)env; (void)cls;
    htp_ctx* ctx = htp_create();
    return (jlong)(uintptr_t)ctx;
}
JNIEXPORT void JNICALL Java_HtpNormalizer_nativeDestroy(JNIEnv* env, jclass cls, jlong p) {
    (void)env; (void)cls;
    htp_destroy((htp_ctx*)(uintptr_t)p);
}
static htp_ctx* get_ctx(JNIEnv* env, jobject self) {
    jclass c = (*env)->GetObjectClass(env, self);
    jfieldID f = (*env)->GetFieldID(env, c, "ctx", "J");
    jlong p = (*env)->GetLongField(env, self, f);
    return (htp_ctx*)(uintptr_t)p;
}
JNIEXPORT void JNICALL Java_HtpNormalizer_feed(JNIEnv* env, jobject self, jobject inBuf, jint len) {
    htp_ctx* ctx = get_ctx(env, self);
    /* This is the buffer's base address; the ByteBuffer position is ignored. */
    uint8_t* in = inBuf ? (uint8_t*)(*env)->GetDirectBufferAddress(env, inBuf) : NULL;
    if (!in) {
        /* A heap ByteBuffer would need a copying fallback (omitted here);
           reject it rather than drop the data silently. */
        throw_new(env, "java/lang/IllegalArgumentException", "feed requires a direct ByteBuffer");
        return;
    }
    if (len < 0 || (jlong)len > (*env)->GetDirectBufferCapacity(env, inBuf)) {
        throw_new(env, "java/lang/IllegalArgumentException", "len out of range");
        return;
    }
    int st = htp_feed(ctx, in, (size_t)len);
    if (st != HTP_OK) throw_new(env, "java/lang/RuntimeException", htp_last_error(ctx));
}
JNIEXPORT jint JNICALL Java_HtpNormalizer_drain(JNIEnv* env, jobject self, jobject outBuf) {
    htp_ctx* ctx = get_ctx(env, self);
    uint8_t* out = outBuf ? (uint8_t*)(*env)->GetDirectBufferAddress(env, outBuf) : NULL;
    if (!out) {
        throw_new(env, "java/lang/IllegalArgumentException", "drain requires a direct ByteBuffer");
        return 0;
    }
    jlong cap = (*env)->GetDirectBufferCapacity(env, outBuf);
    size_t written = 0;
    int st = htp_drain(ctx, out, (size_t)cap, &written);
    if (st != HTP_OK) {
        throw_new(env, "java/lang/RuntimeException", htp_last_error(ctx));
        return 0;
    }
    return (jint)written;
}
Step-by-step usage in Java: allocate direct buffers, feed input, drain output in a loop. The caller controls buffer sizes and can integrate with NIO channels.
try (HtpNormalizer n = new HtpNormalizer()) {
    java.nio.ByteBuffer in = java.nio.ByteBuffer.allocateDirect(64 * 1024);
    java.nio.ByteBuffer out = java.nio.ByteBuffer.allocateDirect(64 * 1024);
    in.put("a= 1\n".getBytes(java.nio.charset.StandardCharsets.US_ASCII));
    in.flip();
    n.feed(in, in.remaining());
    int w;
    do {
        out.clear();
        w = n.drain(out);
        out.limit(w);
        // consume out
    } while (w > 0);
    n.finish();
    // drain again here to collect a final record that had no trailing newline
}
Packaging and Compatibility: Keeping the Multi-Language Surface Stable
ABI Version Checks in Bindings
Because you will ship the same C core to multiple ecosystems, add a lightweight runtime check: bindings call htp_get_info and verify abi_version. This prevents subtle crashes when a binding expects a different ABI.
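In the ctypes binding, such a check might look like the sketch below. The names `HtpInfo`, `EXPECTED_ABI`, and `check_abi` are hypothetical; the struct layout is taken from the htp_info definition in the public header, and `lib` is assumed to be the object returned by `ctypes.CDLL`.

```python
import ctypes


class HtpInfo(ctypes.Structure):
    """Mirrors the htp_info struct from the public header."""
    _fields_ = [("abi_version", ctypes.c_uint32),
                ("features", ctypes.c_uint32)]


EXPECTED_ABI = 1   # the ABI version this binding was written against


def check_abi(lib, expected: int = EXPECTED_ABI) -> HtpInfo:
    """Fail fast if the loaded core reports a different ABI version."""
    lib.htp_get_info.argtypes = []
    lib.htp_get_info.restype = HtpInfo   # the struct is returned by value
    info = lib.htp_get_info()
    if info.abi_version != expected:
        raise ImportError(
            f"htp_core ABI {info.abi_version} != expected {expected}")
    return info
```

Running the check once at import time, before any context is created, turns a mismatched library into a clear import failure rather than a crash later in the hot path.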
/* Example C call pattern from bindings: htp_get_info().abi_version == 1 */
Error Mapping and Diagnostics
Keep the C status codes stable and small. Bindings should map them to exceptions, but preserve the original status integer and the C error message. In practice, you will want to include enough context in last_err to debug malformed input (for example, record number, byte offset, or a short snippet). The capstone pattern is: C core stores the last error message in the context; bindings fetch it immediately when a call fails.
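One possible shape for this mapping in Python is sketched below. The exception class names and `raise_for_status` are hypothetical; the numeric codes are those of the htp_status enum. This `HtpError` differs from the simpler one in the ctypes wrapper in that it stores the status and message as attributes.

```python
class HtpError(Exception):
    """Base error that keeps the raw status code and the C message."""

    def __init__(self, status: int, message: str):
        super().__init__(f"status={status}: {message}")
        self.status = status
        self.message = message


class HtpInvalidArgument(HtpError):
    pass


class HtpOutOfMemory(HtpError):
    pass


class HtpStateError(HtpError):
    pass


class HtpBadRecord(HtpError):
    pass


class HtpOverflow(HtpError):
    pass


# Keys follow the htp_status enum in the public header.
_STATUS_TO_ERROR = {
    1: HtpInvalidArgument,   # HTP_EINVAL
    2: HtpOutOfMemory,       # HTP_EOOM
    3: HtpStateError,        # HTP_ESTATE
    4: HtpBadRecord,         # HTP_EBADREC
    5: HtpOverflow,          # HTP_EOVERFLOW
}


def raise_for_status(status: int, message: str = "") -> None:
    """Raise the mapped exception for any non-zero status."""
    if status == 0:
        return
    # Unknown codes fall back to the base class so nothing is swallowed.
    raise _STATUS_TO_ERROR.get(status, HtpError)(status, message)
```

Because every subclass derives from `HtpError`, callers can catch the base class for generic handling and still inspect `err.status` when they need the exact code.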
Threading Expectations
Make the threading contract explicit in the C API documentation: a single htp_ctx is not thread-safe; callers may create one context per thread. Bindings should not share a single instance across threads unless they add their own synchronization. This keeps the C core simple and avoids surprising contention.
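A binding can make the one-context-per-thread rule easy to follow with a small thread-local holder. The sketch below is an assumption about how that might look; `PerThread` is a hypothetical helper, and `factory` would typically be the binding's `Normalizer` class.

```python
import threading


class PerThread:
    """Hands each thread its own instance created by `factory`."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def get(self):
        """Return this thread's instance, creating it on first use."""
        inst = getattr(self._local, "inst", None)
        if inst is None:
            inst = self._local.inst = self._factory()
        return inst
```

This avoids locking entirely, at the cost of one context per thread. The helper does not close instances when a thread exits, so long-running programs still need an explicit cleanup step.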
Integration Pattern: A Unified Streaming Adapter in Each Language
To make the component easy to adopt, each binding should expose the same conceptual interface even if the idioms differ: feed(bytes), finish(), and drain(). This lets you write similar pipeline stages across Python, Ruby, and Java, and it makes cross-language test vectors reusable: the same input chunks should produce the same output bytes regardless of the host language.
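In Python, the shared interface can be written down as a structural type, with a generic pipeline stage built on top of it. The names `StreamNormalizer` and `normalize_stream` are hypothetical; any object with the three methods, such as the ctypes wrapper shown earlier, would fit.

```python
from typing import Iterable, Iterator, Protocol


class StreamNormalizer(Protocol):
    """The conceptual interface every binding is expected to expose."""

    def feed(self, data: bytes) -> None: ...
    def finish(self) -> None: ...
    def drain(self) -> Iterable[bytes]: ...


def normalize_stream(normalizer: StreamNormalizer,
                     chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Feed every chunk, yielding output as soon as it is available."""
    for chunk in chunks:
        normalizer.feed(chunk)
        yield from normalizer.drain()
    normalizer.finish()               # flush a final unterminated record
    yield from normalizer.drain()
```

Because the stage is a generator, it can sit between a file or socket reader and a writer without buffering the whole stream in memory.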
A practical step-by-step approach to integrating this component into a larger system is: (1) define the C ABI and freeze it early, (2) implement the C core with a streaming state machine and deterministic output, (3) write a small “golden vectors” file of input chunks and expected output, (4) implement bindings that pass those vectors, (5) package the shared library per platform and ensure the bindings locate it reliably at runtime, (6) add a compatibility check via htp_get_info so mismatches fail fast.
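Steps (3) and (4) can be sketched as a small checker that replays one golden vector under every two-chunk split, so that chunk-boundary bugs show up early. The helper names are hypothetical, and `make_normalizer` is assumed to be any zero-argument callable returning an object with feed, finish, and drain (and optionally close).

```python
def split_at(data: bytes, cuts) -> list:
    """Split data into consecutive chunks at the given byte offsets."""
    edges = [0, *sorted(cuts), len(data)]
    return [data[a:b] for a, b in zip(edges, edges[1:])]


def run_chunks(make_normalizer, chunks) -> bytes:
    """Push chunks through a fresh normalizer and collect all output."""
    n = make_normalizer()
    out = bytearray()
    try:
        for chunk in chunks:
            n.feed(chunk)
            out += b"".join(n.drain())
        n.finish()
        out += b"".join(n.drain())
    finally:
        close = getattr(n, "close", None)   # release native handles if any
        if close is not None:
            close()
    return bytes(out)


def check_vector(make_normalizer, data: bytes, expected: bytes) -> None:
    """Require identical output for every way of cutting data in two."""
    for cut in range(len(data) + 1):
        got = run_chunks(make_normalizer, split_at(data, [cut]))
        assert got == expected, f"mismatch with cut at byte {cut}: {got!r}"
```

The same vectors and the same splitting strategy can be ported to the Ruby and Java test suites, so that all three bindings are held to byte-identical output.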