Why Data Representation Dominates Performance (and Safety)
Once you have a correct algorithm and you are not wasting time on avoidable allocations, the next large lever is how your data is represented in memory and how it is laid out for access. “Representation” means the types you choose (integers vs floats, strings vs bytes, structs vs maps) and the invariants you encode (ranges, nullability, ownership). “Layout” means how those values are arranged: contiguous arrays vs pointer-linked nodes, packed vs padded structs, row-major vs columnar storage, and whether you store “array of structs” (AoS) or “struct of arrays” (SoA). These choices affect CPU cache behavior, branch prediction, vectorization, and the amount of work needed to validate and safely manipulate data.
This chapter focuses on patterns you can apply across Python, Ruby, Java, and C to make data access fast and predictable while also reducing the surface area for bugs (overflow, invalid states, encoding errors, and unsafe aliasing). We will avoid rehashing benchmarking, profiling, memory models, allocation behavior, and garbage collection pressure; instead, we will concentrate on representation and layout decisions that stand on their own.
Core Principles
1) Prefer Contiguous, Homogeneous Storage for Hot Data
Modern CPUs are optimized for sequential access. When your hot loop walks contiguous memory, you get fewer cache misses and better prefetching. When your data is homogeneous (same type/size), the runtime and JIT can optimize more aggressively, and you can often use SIMD/vector operations.
- Good: arrays of primitives (C arrays, Java
int[], Pythonarray/memoryview, Ruby packed strings used as byte buffers). - Risky for speed: lists/arrays of objects where each element is a pointer to separately allocated objects (Python list of ints, Ruby Array of Integers, Java
Integer[]).
2) Encode Invariants in Types and Layout
Safety improves when invalid states are unrepresentable or at least hard to construct. Examples: using unsigned types for non-negative values, using fixed-width integers for protocol fields, using enums for small sets of states, and using length-prefixed buffers to avoid scanning for terminators.
- In C, this means careful use of
uint32_t,size_t, and structs with explicit fields. - In Java, it means preferring primitives and using
recordor small immutable classes to preserve invariants. - In Python and Ruby, it means using
dataclasses/Struct, validating at boundaries, and using bytes-like objects for binary data.
3) Separate “Hot” Fields from “Cold” Fields
Many objects contain a mix of frequently accessed fields (hot) and rarely used metadata (cold). If you store them together, every access drags cold bytes into cache. A common pattern is to store hot fields in tight arrays and keep cold data in side tables keyed by index or ID.
Continue in our app.
You can listen to the audiobook with the screen off, receive a free certificate for this course, and also have access to 5,000 other free online courses.
Or continue reading below...Download the app
This is a layout decision that can be applied even in high-level languages: you can keep parallel arrays (SoA) for hot numeric fields and a dictionary/hash for occasional metadata.
4) Choose AoS vs SoA Intentionally
AoS (array of structs) stores each record contiguously: [x,y,z][x,y,z].... SoA (struct of arrays) stores each field contiguously: [x,x,x...] [y,y,y...] [z,z,z...]. AoS is convenient and good when you touch most fields per element. SoA is often faster when you process one or two fields across many elements (vectorization, fewer cache lines).
Practical Pattern: AoS vs SoA for Coordinate Updates
Consider updating positions with velocities for many particles. If your update loop touches only x and vx (and similarly for y), SoA can be faster.
C Example
AoS:
typedef struct { float x, y; float vx, vy; } Particle; void step_aos(Particle* p, size_t n, float dt) { for (size_t i = 0; i < n; i++) { p[i].x += p[i].vx * dt; p[i].y += p[i].vy * dt; } }SoA:
typedef struct { float* x; float* y; float* vx; float* vy; } Particles; void step_soa(Particles p, size_t n, float dt) { for (size_t i = 0; i < n; i++) { p.x[i] += p.vx[i] * dt; p.y[i] += p.vy[i] * dt; } }Step-by-step decision process:
- List the fields used in the hot loop (here: all four fields, but accessed in a streaming way).
- Estimate whether you often process subsets of fields (e.g., sometimes only
xfor projections). If yes, SoA tends to win. - Check alignment and padding in AoS: structs may include padding that increases memory bandwidth needs.
- In C, consider whether SoA enables vectorization (compiler can more easily auto-vectorize contiguous arrays).
Java Example
Java AoS often becomes “array of object references,” which is not AoS in memory. If you write Particle[] where Particle is a class, you get an array of pointers to heap objects. For hot numeric loops, prefer SoA with primitive arrays:
final class Particles { final float[] x, y, vx, vy; Particles(int n) { x = new float[n]; y = new float[n]; vx = new float[n]; vy = new float[n]; } void step(float dt) { for (int i = 0; i < x.length; i++) { x[i] += vx[i] * dt; y[i] += vy[i] * dt; } } }Safety note: keep arrays the same length and encapsulate them so callers cannot accidentally mismatch indices. Expose methods that operate on an index rather than exposing raw arrays.
Python Example
Python lists store references, so a list of floats is not contiguous float storage. For numeric hot paths, use array, memoryview, or a library that provides packed numeric arrays. Here is a standard-library approach using array:
from array import array def step(x, y, vx, vy, dt): for i in range(len(x)): x[i] += vx[i] * dt y[i] += vy[i] * dt # setup n = 100000 x = array('f', [0.0]) * n y = array('f', [0.0]) * n vx = array('f', [1.0]) * n vy = array('f', [1.0]) * n step(x, y, vx, vy, 0.016)Step-by-step: (1) identify numeric fields, (2) replace list-of-floats with array('f') for packed floats, (3) keep arrays encapsulated in a class to preserve invariants, (4) use memoryview if you need slicing without copying.
Ruby Example
Ruby Arrays also store references. For packed numeric storage, Ruby does not have a built-in primitive array like Java, but you can represent numeric data in a packed binary string and unpack when needed, or use String as a byte buffer for fixed-width fields. For floats, packing/unpacking each iteration is expensive; the pattern is most useful for binary protocols and IO buffers. For compute-heavy numeric loops, Ruby typically relies on native extensions or specialized libraries, but the representation principle still applies: keep hot data in contiguous bytes when you can.
Binary Data: Bytes, Endianness, and Bounds Safety
When dealing with file formats, network protocols, or serialization, representation choices determine both speed and safety. Parsing text (JSON, CSV) is flexible but expensive. Parsing binary with fixed-width fields is fast but demands careful bounds checks and endianness handling.
Step-by-step: Designing a Safe Binary Record Layout
- Define a fixed header with a magic number and version to reject incompatible data early.
- Use fixed-width integer types for fields with known ranges (e.g.,
uint32for counts). - Store lengths explicitly (length-prefixed) rather than relying on terminators.
- Decide endianness (often little-endian) and document it.
- Validate all lengths before reading payloads to avoid out-of-bounds reads and denial-of-service.
C: Parsing with Explicit Bounds
#include <stdint.h> #include <string.h> typedef struct { uint32_t len; uint32_t type; } Header; int parse_record(const uint8_t* buf, size_t n, Header* out, const uint8_t** payload) { if (n < sizeof(Header)) return 0; Header h; memcpy(&h, buf, sizeof(Header)); // If data is little-endian on the wire, convert here if needed. if (h.len > n - sizeof(Header)) return 0; *out = h; *payload = buf + sizeof(Header); return 1; }Safety points: never cast a pointer to a struct and dereference it unless you are sure about alignment and packing; use memcpy to avoid undefined behavior. Always validate len against remaining buffer size before computing pointers.
Java: ByteBuffer with Explicit Order
import java.nio.ByteBuffer; import java.nio.ByteOrder; final class Record { final int len; final int type; final ByteBuffer payload; Record(int len, int type, ByteBuffer payload) { this.len = len; this.type = type; this.payload = payload; } } static Record parse(ByteBuffer buf) { buf.order(ByteOrder.LITTLE_ENDIAN); if (buf.remaining() < 8) return null; int len = buf.getInt(); int type = buf.getInt(); if (len < 0 || len > buf.remaining()) return null; ByteBuffer payload = buf.slice(); payload.limit(len); buf.position(buf.position() + len); return new Record(len, type, payload); }Speed and safety: ByteBuffer gives bounds checks and avoids manual pointer arithmetic. Using slice() creates a view without copying, but you must manage positions carefully to avoid confusing state; encapsulate parsing in one place.
Python: struct + memoryview for Zero-Copy Slices
import struct def parse(buf: bytes): mv = memoryview(buf) if len(mv) < 8: return None len_, type_ = struct.unpack_from('<II', mv, 0) if len_ > len(mv) - 8: return None payload = mv[8:8+len_] return (len_, type_, payload)memoryview allows slicing without copying. struct.unpack_from reads from the buffer directly. Always validate len_ before slicing.
Ruby: unpack with Length Checks
def parse(buf) return nil if buf.bytesize < 8 len, type = buf.unpack('V2') # little-endian uint32 return nil if len > buf.bytesize - 8 payload = buf.byteslice(8, len) [len, type, payload] endUse byteslice to avoid encoding surprises and to operate on raw bytes. Validate lengths before slicing.
Strings vs Bytes: Encoding as a Representation Choice
Text handling is a frequent source of both performance issues and security bugs. The key representation decision is whether your data is “text” (characters) or “bytes” (opaque). Mixing them leads to repeated transcoding, incorrect length calculations, and vulnerabilities in boundary checks.
- Use bytes for protocol payloads, hashes, compressed data, and encrypted blobs.
- Decode to text at the boundary where you need to interpret characters, and validate encoding once.
- When you need to count bytes (e.g., network limits), count bytes, not characters.
Python Boundary Pattern
def handle_request(raw: bytes): # raw is bytes from socket; keep as bytes for parsing headers # decode only the specific header value that is defined as UTF-8 name_bytes = extract_name_field(raw) name = name_bytes.decode('utf-8', errors='strict') validate_name(name) return nameRuby Boundary Pattern
def handle_request(raw) raw = raw.b # force ASCII-8BIT (bytes) name_bytes = extract_name_field(raw) name = name_bytes.force_encoding('UTF-8') raise 'bad encoding' unless name.valid_encoding? validate_name(name) name endIn Ruby, .b and ASCII-8BIT help keep data as bytes. Convert to UTF-8 only when you need text semantics, and validate encoding explicitly.
Bit Packing and Field Compression (When It’s Worth It)
Sometimes the fastest representation is smaller, because it reduces memory bandwidth and improves cache residency. Bit packing can store multiple small fields in one machine word. This can be a win when you process huge arrays of records and the fields have small ranges (flags, small enums, bounded counters). The trade-off is complexity and the risk of bugs in masking/shifting.
Step-by-step: Safe Bit Packing
- Write down the range of each field and compute required bits.
- Allocate bit positions and document them as constants.
- Provide accessor functions (get/set) rather than open-coded shifts everywhere.
- Validate inputs before packing (range checks).
- In higher-level languages, keep packed representation internal and expose a clear API.
C Example: Packing Two 12-bit Values and 8 Flags
#include <stdint.h> enum { A_BITS = 12, B_BITS = 12, F_BITS = 8 }; enum { A_SHIFT = 0, B_SHIFT = A_SHIFT + A_BITS, F_SHIFT = B_SHIFT + B_BITS }; static inline uint32_t pack(uint32_t a, uint32_t b, uint32_t f) { if (a >= (1u << A_BITS)) return 0; if (b >= (1u << B_BITS)) return 0; if (f >= (1u << F_BITS)) return 0; return (a << A_SHIFT) | (b << B_SHIFT) | (f << F_SHIFT); } static inline uint32_t get_a(uint32_t x) { return (x >> A_SHIFT) & ((1u << A_BITS) - 1u); }Safety: range checks prevent silent truncation. Returning 0 on error is not always safe if 0 is a valid value; consider an out-parameter with a success flag.
Java Example: Packed int with Accessors
final class Packed { static final int A_BITS = 12, B_BITS = 12, F_BITS = 8; static final int A_MASK = (1 << A_BITS) - 1; static final int B_MASK = (1 << B_BITS) - 1; static int pack(int a, int b, int f) { if ((a & ~A_MASK) != 0) throw new IllegalArgumentException(); if ((b & ~B_MASK) != 0) throw new IllegalArgumentException(); if ((f & ~((1 << F_BITS) - 1)) != 0) throw new IllegalArgumentException(); return a | (b << A_BITS) | (f << (A_BITS + B_BITS)); } static int a(int x) { return x & A_MASK; } }Java’s defined shift behavior and exceptions make it easier to keep safe. Keep these operations centralized to avoid scattered bit logic.
Struct Layout, Padding, and Alignment (C) and Their Cross-Language Echo
In C, struct layout is a direct performance and correctness concern. Field order can introduce padding; misaligned access can be slower or even fault on some platforms. Even if you are not writing C, understanding these effects helps when you define binary formats consumed by multiple languages.
Step-by-step: Minimizing Padding
- Group fields by decreasing alignment requirement (e.g., 8-byte, then 4-byte, then 2-byte, then 1-byte).
- Use fixed-width types for on-disk/on-wire formats (
uint32_t,uint16_t). - Do not rely on compiler-specific packing for external formats; define serialization explicitly.
C Example: Field Reordering
typedef struct { uint8_t tag; uint64_t id; uint32_t count; } Bad; typedef struct { uint64_t id; uint32_t count; uint8_t tag; } Better;Bad likely inserts padding after tag to align id. Better reduces padding by placing the 8-byte field first.
Cross-language note: if you serialize struct bytes directly, other languages may not match your padding/alignment. Prefer explicit serialization (write each field in a defined endianness) rather than dumping raw structs.
Nullability and Sentinel Values
Representing “missing” values is both a safety and performance decision. Options include: nullable references, sentinel values (like -1), separate validity bitsets, or separate arrays for present values. Each has trade-offs.
- Nullable references are convenient but can add branches and risk
NullPointerException/AttributeError. - Sentinels are fast but dangerous if the sentinel can collide with valid data.
- Validity bitsets (one bit per element) are compact and can be processed efficiently; they also make missingness explicit.
Java: Sentinel vs Optional
Using Optional in hot data structures can add overhead. For dense numeric arrays, a sentinel or a parallel boolean array is often clearer and faster:
final class IntColumn { final int[] values; final boolean[] present; IntColumn(int n) { values = new int[n]; present = new boolean[n]; } int getOrDefault(int i, int def) { return present[i] ? values[i] : def; } }Safety: the presence array prevents accidental use of uninitialized values and makes missingness explicit without relying on a magic number.
Python/Ruby: Explicit Missingness
In Python and Ruby, None/nil are common, but in hot loops they add branches and type checks. For numeric columns, consider parallel arrays: one packed numeric array plus a bytearray/String bitmask for presence. Keep the representation internal and provide methods that hide the complexity.
Copying vs Views: Representing Slices Safely
Another representation choice is whether a “slice” is a copy or a view. Views avoid copying and can be faster, but they require careful lifetime and mutation rules.
- Java
ByteBuffer.slice()is a view: changes affect the same underlying buffer. - Python
memoryviewis a view: it keeps the underlying object alive and can expose writable access. - Ruby
byteslicereturns a new string in many cases (implementation-dependent), but it is still safer to treat slices as separate values unless you know the sharing behavior. - C pointer+length pairs are views; you must ensure the backing storage remains valid.
Step-by-step: Safe View Design
- Represent a view as (pointer, length) or (buffer, offset, length) rather than relying on terminators.
- Make mutability explicit: provide separate APIs for read-only vs writable views.
- Ensure the backing storage outlives the view (ownership/lifetime rule).
- Validate offsets and lengths at creation time, not repeatedly in inner loops.
C Example: Slice Type
typedef struct { const uint8_t* p; size_t n; } Slice; int slice_from(const uint8_t* buf, size_t n, size_t off, size_t len, Slice* out) { if (off > n) return 0; if (len > n - off) return 0; out->p = buf + off; out->n = len; return 1; }This centralizes bounds checks and makes it harder to accidentally compute an out-of-range pointer.
Choosing Representations at Module Boundaries
A reliable way to keep both speed and safety is to standardize representations at boundaries: parse/validate once, then operate on a compact internal form.
Step-by-step Boundary Workflow
- At input: accept flexible external forms (JSON/text, user objects), but immediately validate and convert.
- Internally: store in a representation optimized for access (packed arrays, IDs instead of strings, normalized encoding).
- At output: convert back to external form only when needed.
Examples of internal normalization:
- Replace repeated strings with integer IDs (dictionary encoding) and store IDs in an
intarray. - Store timestamps as integer epoch nanoseconds/milliseconds rather than strings.
- Store booleans as bitsets when you have millions of them.
Python Example: Dictionary Encoding for Repeated Strings
def encode(values): # values: list[str] table = {} ids = [] for s in values: i = table.get(s) if i is None: i = len(table) table[s] = i ids.append(i) return table, idsEven without specialized libraries, this reduces repeated string comparisons in later stages and makes hot loops operate on integers.
Java Example: Intern Table + int[]
import java.util.*; final class DictEncoded { final Map<String,Integer> dict = new HashMap<>(); final ArrayList<String> rev = new ArrayList<>(); int id(String s) { Integer i = dict.get(s); if (i != null) return i; int nid = rev.size(); rev.add(s); dict.put(s, nid); return nid; } }Store the resulting IDs in an int[] for processing. Safety: keep the mapping encapsulated so IDs are always valid indices into rev.
Checklist: Representation and Layout Decisions You Can Apply Immediately
- Hot numeric data: use contiguous primitive storage (C arrays, Java primitive arrays, Python
array/memoryview, Ruby byte buffers for binary). - AoS vs SoA: choose based on access pattern; SoA often wins for columnar processing.
- Binary formats: fixed-width fields, explicit endianness, length-prefixed payloads, validate lengths before slicing.
- Text vs bytes: keep bytes as bytes; decode once at boundaries with strict validation.
- Missing values: avoid ambiguous sentinels; consider parallel presence arrays/bitsets.
- Views: represent slices as (buffer, offset, length); validate once; make mutability explicit.
- Struct padding (C): reorder fields to reduce padding; never serialize raw structs for cross-language interchange.