
Polyglot Performance Patterns: Writing Fast, Safe Code Across Python, Ruby, Java, and C



Garbage Collection Pressure and Allocation-Aware Design

Chapter 5



Why GC Pressure Matters in Real Systems

Garbage collection (GC) pressure is the operational cost you impose on a managed runtime by creating, retaining, and discarding objects in ways that force the collector to work harder or more often than necessary. Even when total heap usage looks “fine,” GC pressure can still be high if your program allocates at a high rate, creates many short-lived objects, or triggers frequent promotions into older generations. The visible symptoms are often latency spikes, throughput drops, CPU time spent in GC, and unpredictable tail behavior under load.

Allocation-aware design is the practice of shaping APIs, data structures, and hot-path code so that you allocate less, allocate more predictably, and retain objects only as long as needed—without sacrificing safety or clarity. This chapter focuses on design patterns that reduce GC work across Python, Ruby, Java, and C (where “GC pressure” maps to allocator pressure and fragmentation). The goal is not to avoid allocation at all costs, but to make allocation a deliberate choice in the parts of the system where it matters.

GC Pressure vs. “Memory Usage”

It is common to equate “high memory usage” with “GC problems,” but GC pressure is more about churn and retention patterns than raw size. A service can use a large but stable heap and run smoothly, while another uses a smaller heap but allocates aggressively and triggers constant collections. Think of pressure as the rate at which you force the runtime to scan, copy, mark, compact, or sweep objects, plus the overhead of maintaining metadata (write barriers, remembered sets, card tables, etc.).

  • High allocation rate: many objects created per request or per iteration.
  • High temporary object count: intermediate strings, arrays, iterators, lambdas/closures, boxed numbers.
  • Promotion pressure: objects that survive young-generation collections and move to older regions, increasing later GC cost.
  • Heap fragmentation / allocator contention: especially relevant in C and also in runtimes with native allocations (Python/Ruby extensions, Java direct buffers).

Allocation-Aware Design: A Practical Workflow

The most reliable way to reduce GC pressure is to treat allocation as a first-class design constraint in hot paths. The following workflow is intentionally mechanical so you can apply it repeatedly.

Step 1: Identify Hot Paths and Allocation Hotspots

You already know how to measure and profile; here we focus on what to look for once you have a suspect code path. In allocation-aware design, the “hot path” is not only CPU-hot but allocation-hot: code that creates many objects per unit of work. Typical allocation hotspots include the items below; a small profiling sketch follows the list.

  • Request parsing and validation
  • Serialization/deserialization (JSON, XML, protobuf wrappers)
  • String formatting and logging
  • Collection transformations (map/filter/flatMap chains)
  • Per-element callbacks in tight loops
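
If you want to confirm which lines are allocation-hot, CPython's standard-library tracemalloc module can rank them by bytes allocated. A minimal sketch, where process_batch and sample_input stand in for your own hot-path function and workload:

import tracemalloc

tracemalloc.start()
process_batch(sample_input)  # run the suspect code path
snapshot = tracemalloc.take_snapshot()

# Top five source lines by total bytes allocated during the run.
for stat in snapshot.statistics('lineno')[:5]:
    print(stat)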

Step 2: Classify Allocations by Necessity

For each hotspot, classify allocations into:

  • Essential: objects that represent real domain data that must exist (e.g., a parsed record).
  • Incidental: objects created as byproducts of convenience (temporary lists, substrings, wrapper objects, iterators).
  • Accidental: allocations caused by API misuse (boxing, repeated conversions, copying when a view would do).

Your biggest wins typically come from eliminating incidental and accidental allocations while keeping essential allocations understandable and safe.
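
To make the classification concrete, here is a small Python sketch with each allocation labeled; the record format and the deliberately wasteful round-trip are illustrative:

def summarize(records):
    out = []
    for rec in records:
        fields = rec.split(',')        # incidental: temporary list plus substrings
        uid = int(fields[0])           # essential: a domain value we must keep
        uid = int(str(uid))            # accidental: pointless string round-trip
        out.append((uid, fields[1]))   # essential: the result we return
    return out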

Step 3: Choose a Strategy

Common strategies include:

  • Reuse: keep and reset objects instead of re-creating them (buffers, builders, arrays).
  • Batch: process in chunks to reduce per-item overhead and amortize allocations.
  • Stream: avoid materializing intermediate collections; produce results incrementally.
  • Represent differently: use primitives/arrays/structs instead of object graphs; use indices instead of substrings.
  • Move allocation boundaries: allocate once per request instead of per element; allocate at startup instead of per call.

Patterns That Reduce GC Pressure (Cross-Language)

1) Avoid Intermediate Collections in Transform Pipelines

Chaining transformations is expressive but often allocates intermediate arrays/lists and iterator objects. Prefer single-pass loops in hot paths.

Python: list comprehensions allocate; generator expressions can reduce intermediate lists but still allocate generator frames and may be slower if you later materialize anyway. For hot loops, a single explicit loop can minimize temporaries.

# Less allocation-friendly: two passes, one intermediate list
filtered = [x for x in xs if x > 0]
squared = [x * x for x in filtered]

# More allocation-aware: one pass, one output list
squared = []
append = squared.append
for x in xs:
    if x > 0:
        append(x * x)

Ruby: chained map/select creates arrays each step. Prefer each with manual push, or use filter_map (Ruby 2.7+) to combine filter and map in one pass.

# Allocation-heavy
squared = xs.select { |x| x > 0 }.map { |x| x * x }

# Allocation-aware
squared = []
xs.each do |x|
  next unless x > 0
  squared << x * x
end
# Or: squared = xs.filter_map { |x| x > 0 ? x * x : nil }

Java: streams can allocate lambdas, spliterators, and intermediate objects; they can be fine, but in tight loops prefer for-loops and primitive arrays or IntStream to avoid boxing.

// Allocation-heavy: boxing if using Stream<Integer>
List<Integer> out = xs.stream()
    .filter(x -> x > 0)
    .map(x -> x * x)
    .toList();

// Allocation-aware: count once, then fill a primitive array (no boxing)
int[] squared = new int[countPositives(xs)];  // countPositives: helper returning the number of positives
int j = 0;
for (int x : xs) {
    if (x > 0) squared[j++] = x * x;
}

C: the analog is avoiding repeated heap allocations in loops; preallocate output buffers and write into them.

size_t j = 0;
for (size_t i = 0; i < n; i++) {
    int x = xs[i];
    if (x > 0) out[j++] = x * x;
}

2) Prefer Builders/Buffers Over Repeated Concatenation

String concatenation in loops often creates many transient strings. Use a builder/buffer pattern.

Python: repeated += on strings in loops creates new strings; prefer list accumulation and ''.join, or io.StringIO for streaming writes.

parts = []
append = parts.append
for item in items:
    append(item.name)
    append(':')
    append(str(item.value))
    append('\n')
text = ''.join(parts)

Ruby: use a single mutable string with << (which mutates) rather than + (which allocates a new string).

s = +''
items.each do |item|
  s << item.name << ':' << item.value.to_s << "\n"
end

Java: use StringBuilder (or StringBuffer if synchronized is required). Avoid + in loops.

StringBuilder sb = new StringBuilder(1024);
for (Item item : items) {
    sb.append(item.name()).append(':').append(item.value()).append('\n');
}
String text = sb.toString();

C: use a preallocated char buffer with tracked length, or a growable buffer strategy that doubles capacity.

size_t len = 0;
for (size_t i = 0; i < n; i++) {
    int w = snprintf(buf + len, cap - len, "%s:%d\n", items[i].name, items[i].value);
    if (w < 0 || (size_t)w >= cap - len) break;  /* error or truncation: stop before len overruns cap */
    len += (size_t)w;
}

3) Control Object Shape: Flatten Data Where It Pays Off

Object graphs (objects containing objects containing lists of objects) are convenient but increase GC scanning and pointer chasing. In hot paths, consider flatter representations that reduce object count and improve locality.

  • Java: prefer primitive arrays (int[], long[]) over List<Integer> to avoid boxing and per-element object overhead.
  • Python: consider array module, memoryview, or struct packing for numeric-heavy workloads; for general records, __slots__ can reduce per-instance overhead and incidental allocations for attribute dictionaries.
  • Ruby: prefer arrays/hashes carefully; avoid creating many tiny objects for numeric-heavy tasks; consider packing into strings (binary) only when it’s a clear win and you can keep code safe.
  • C: use structs-of-arrays or arrays-of-structs depending on access patterns; avoid per-element heap allocation.

Flattening is a trade-off: it can reduce GC pressure but may reduce readability. Apply it only where allocation churn is a proven cost.
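
As a sketch of the Python bullet above: __slots__ removes the per-instance attribute dictionary, and the array module stores numbers unboxed in one contiguous buffer. Point3 and the workload are illustrative:

from array import array

class Point3:
    __slots__ = ('x', 'y', 'z')  # no per-instance __dict__ allocation

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

# Flatter still: one buffer of C doubles instead of thousands of objects.
coords = array('d')  # typecode 'd' = C double
for i in range(1000):
    coords.extend((i * 0.5, i * 1.5, i * 2.5))  # x, y, z interleaved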

4) Reuse Scratch Space (But Keep It Safe)

Many algorithms need temporary buffers: parsing, encoding, sorting, hashing, formatting. Allocating these buffers per call creates churn. Reuse can be done per-thread, per-request, or via object pools. The key is to avoid accidental sharing and to reset state correctly.

Python: reuse bytearray or io.BytesIO objects within a request scope. Avoid global mutable buffers unless you have strict single-threaded guarantees.

def encode_records(records):
    buf = bytearray()
    out = []
    for r in records:
        buf.clear()
        buf.extend(r.id.to_bytes(8, 'big'))
        buf.extend(r.payload)
        out.append(bytes(buf))  # one allocation for the final bytes
    return out

Ruby: reuse a mutable string buffer and call clear. Be careful to duplicate when storing results, because later mutations would otherwise affect stored references.

buf = +''
out = []
records.each do |r|
  buf.clear
  buf << [r.id].pack('Q>')
  buf << r.payload
  out << buf.dup
end

Java: reuse StringBuilder or byte buffers. For byte buffers, consider ThreadLocal caches for scratch arrays, but cap sizes to avoid retaining huge buffers indefinitely.

private static final ThreadLocal<byte[]> TL =
    ThreadLocal.withInitial(() -> new byte[4096]);

static byte[] scratch(int min) {
    byte[] b = TL.get();
    if (b.length < min) {
        b = new byte[min];
        TL.set(b);  // note: the cached buffer grows monotonically; cap or skip caching very large sizes
    }
    return b;
}

C: reuse stack buffers when sizes are bounded; otherwise maintain a reusable heap buffer per worker thread. Always track capacity and avoid buffer overruns.

5) Reduce Hidden Allocations in APIs

Many allocations are “hidden” behind friendly APIs: slicing strings, splitting, regex matches, converting between types, and building exceptions. Allocation-aware design means choosing APIs that expose control over allocation.

  • Python: str.split allocates a list and substrings; for large inputs, consider incremental parsing with indices or re.finditer to avoid materializing all tokens at once (see the sketch after this list).
  • Ruby: String#split allocates arrays and substrings; prefer scanning with indices or scan with a block to process matches as they arrive.
  • Java: String.split uses regex and allocates heavily; prefer manual parsing or indexOf loops. Also be mindful of autoboxing in generic collections.
  • C: avoid repeated malloc for tokens; parse in-place by writing NUL terminators into a mutable buffer or store spans (pointer+length) referencing the original buffer.
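
A sketch of the Python point above: re.finditer yields one match object at a time instead of materializing every token the way str.split does. The token pattern is illustrative:

import re

TOKEN = re.compile(r'[^,]+')  # comma-separated tokens

def each_token(text):
    # Yields one substring per match; no full list of tokens is built.
    for m in TOKEN.finditer(text):
        yield m.group(0)

for tok in each_token('a,b,c'):
    print(tok)  # replace with your per-token work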

Case Study: Allocation-Aware Line Parsing

Consider a hot path that parses lines like key=value and updates a map. A naive approach often allocates multiple substrings per line. An allocation-aware approach uses indices/spans and converts only what is needed.

Python: Index-Based Parsing

def parse_lines(lines):
    out = {}
    for line in lines:
        line = line.strip()
        if not line or line[0] == '#':
            continue
        i = line.find('=')
        if i < 0:
            continue
        key = line[:i]
        val = line[i+1:]
        out[key] = val
    return out

This still allocates key and val substrings. If keys are repeated and values are small, you can reduce churn by interning keys (carefully) or by parsing into a reusable structure per request. In Python, aggressive interning can increase retention, so apply only when keys are from a small bounded set.
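
One way to intern carefully, as a sketch: cache keys in a small bounded dict so repeated keys share one string object instead of each line producing a fresh substring. The cache bound is an assumption to tune:

_key_cache = {}
_KEY_CACHE_MAX = 256  # assumed bound; keys beyond it are returned as-is

def canonical_key(k):
    cached = _key_cache.get(k)
    if cached is not None:
        return cached              # reuse the existing string object
    if len(_key_cache) < _KEY_CACHE_MAX:
        _key_cache[k] = k          # first sighting becomes canonical
    return k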

Ruby: Avoid Regex Split

def parse_lines(lines)
  out = {}
  lines.each do |line|
    line = line.strip
    next if line.empty? || line.getbyte(0) == '#'.ord
    i = line.index('=')
    next unless i
    key = line[0, i]
    val = line[(i + 1)..-1]
    out[key] = val
  end
  out
end

Using index avoids regex overhead. You still allocate substrings; if you only need to compare keys against a known set, you can avoid allocating new key strings by comparing slices (more complex) or by normalizing keys once and caching them.

Java: Manual Scan and Minimal Objects

static void parseLines(List<String> lines, Map<String, String> out) {
    for (String line : lines) {
        int n = line.length();
        int start = 0;
        while (start < n && Character.isWhitespace(line.charAt(start))) start++;
        if (start == n) continue;
        if (line.charAt(start) == '#') continue;
        int eq = line.indexOf('=', start);
        if (eq < 0) continue;
        int end = n;
        while (end > start && Character.isWhitespace(line.charAt(end - 1))) end--;
        String key = line.substring(start, eq).trim();
        String val = line.substring(eq + 1, end).trim();
        out.put(key, val);
    }
}

This allocates key and val. If the input is large and you want to reduce allocations further, parse from a char[] or byte[] buffer and store keys/values in a compact structure, or use a custom map keyed by slices. That is more complex but can be worthwhile in parsers and protocol handlers.

C: Span-Based Tokenization

typedef struct { const char* p; size_t n; } span;

int split_kv(const char* line, size_t len, span* key, span* val) {
    size_t i = 0;
    while (i < len && (line[i] == ' ' || line[i] == '\t')) i++;
    if (i == len || line[i] == '#') return 0;
    size_t eq = i;
    while (eq < len && line[eq] != '=') eq++;
    if (eq == len) return 0;
    size_t end = len;
    while (end > i && (line[end-1] == '\n' || line[end-1] == '\r' ||
                       line[end-1] == ' ' || line[end-1] == '\t')) end--;
    key->p = line + i;
    key->n = eq - i;
    val->p = line + (eq + 1);
    val->n = end - (eq + 1);
    return 1;
}

This avoids allocating new strings entirely by referencing the original buffer. The trade-off is lifetime management: the backing buffer must remain valid while spans are used. This is a core allocation-aware technique in C and can inspire similar “slice” approaches in other languages (e.g., using indices rather than substrings).

Managing Retention: Don’t Accidentally Keep the World Alive

GC pressure is not only about allocation rate; it also increases when objects are retained longer than intended, causing more live data to be scanned and potentially promoted. Allocation-aware design includes designing ownership boundaries so temporary objects do not leak into long-lived structures.

Common Retention Traps

  • Caches without bounds: unbounded memoization or maps keyed by high-cardinality inputs.
  • Accidental references: closures capturing large objects; listeners/subscribers never removed; global registries.
  • Buffer retention: keeping a reference to a large byte array/string when you only need a small slice of it.

In managed languages, a small object that references a large buffer can keep that buffer alive. In C, the analog is keeping pointers to large allocations and never freeing them, or fragmenting the heap with many differently sized allocations.
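
A Python sketch of the buffer-retention trap and its fix: a memoryview pins the whole backing buffer for as long as it lives, so copy the small piece you need and release the rest. The 16 MiB size is illustrative:

big = bytearray(16 * 1024 * 1024)  # illustrative large buffer

# Trap: this 16-byte view keeps the entire 16 MiB buffer alive.
header_view = memoryview(big)[:16]

# Fix: copy the small piece, release the view, drop the big buffer.
header = bytes(header_view)
header_view.release()
del big  # the large buffer can now be collected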

Design Tactics

  • Bound caches with size limits and eviction policies; prefer approximate caches when exactness is not required (a sketch follows this list).
  • Copy small, drop large: if you only need a small portion of a large buffer long-term, copy that portion into a right-sized allocation and release the large buffer.
  • Explicit unsubscribe/close: ensure lifecycle hooks remove references (Ruby blocks, Python callbacks, Java listeners).
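
A minimal sketch of the first tactic using the standard-library functools.lru_cache; the maxsize of 1024 is an assumption to tune (maxsize=None would recreate the unbounded-cache trap):

from functools import lru_cache

@lru_cache(maxsize=1024)  # bounded: old entries are evicted, not retained forever
def normalized(header_name):
    return header_name.strip().lower()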

Object Pools: When They Help and When They Hurt

Pooling reuses objects to reduce allocation. It can reduce GC pressure, but it can also increase retention (pooled objects stay alive), add contention, and complicate correctness. Use pooling for objects that are expensive to allocate or initialize, or for large buffers that would otherwise churn.

Guidelines for Safe Pooling

  • Pool big, not tiny: pooling small short-lived objects often backfires because modern GCs handle them efficiently; pooling can increase live set size.
  • Reset thoroughly: ensure no references to request-specific data remain in a returned object.
  • Prefer thread-local pools to reduce contention, but cap sizes to avoid hoarding memory.
  • Measure tail latency: pooling can reduce GC pauses but introduce lock contention or cache misses.

Java example: bounded buffer pool

final class ByteArrayPool {
    private final int bufSize;
    private final java.util.concurrent.ArrayBlockingQueue<byte[]> q;

    ByteArrayPool(int bufSize, int capacity) {
        this.bufSize = bufSize;
        this.q = new java.util.concurrent.ArrayBlockingQueue<>(capacity);
    }

    byte[] acquire() {
        byte[] b = q.poll();
        return (b != null) ? b : new byte[bufSize];
    }

    void release(byte[] b) {
        if (b == null || b.length != bufSize) return;
        q.offer(b);  // drops the buffer if the pool is full (bounded retention)
    }
}

Python/Ruby note: pooling user-level objects can be less effective due to interpreter overhead and because many objects are small; focus on pooling large buffers (bytearray, strings) or using libraries that already manage buffers efficiently.
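
A sketch of pooling under those caveats: a small bounded pool of fixed-size bytearray scratch buffers. The sizes are assumptions; the deque's maxlen caps how much the pool can hoard:

from collections import deque

class BufferPool:
    # Bounded pool of fixed-size bytearray scratch buffers.
    def __init__(self, buf_size=4096, capacity=8):
        self._zero = bytes(buf_size)         # allocated once, reused to reset buffers
        self._pool = deque(maxlen=capacity)  # bounded: excess buffers are dropped

    def acquire(self):
        try:
            return self._pool.pop()
        except IndexError:
            return bytearray(len(self._zero))

    def release(self, buf):
        if len(buf) != len(self._zero):
            return                  # wrong size: let the GC reclaim it
        buf[:] = self._zero         # reset so no request data leaks between users
        self._pool.append(buf)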

Native Allocations and “GC Pressure” Outside the GC

Even in GC languages, not all memory is managed by the GC. Native allocations can create pressure elsewhere: allocator contention, fragmentation, and RSS growth. Examples include Python/Ruby C extensions, Java direct buffers, memory-mapped files, and compression libraries.

Design Implications

  • Prefer reuse of native buffers when interacting with native libraries.
  • Be explicit about ownership: ensure native resources are released promptly (close/free patterns).
  • Avoid per-call native allocations in hot loops; batch calls or use streaming APIs.
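
A Python sketch of native-buffer reuse: readinto fills a caller-owned buffer in place instead of allocating a fresh bytes object per read. The chunk size and checksum are illustrative:

def checksum_file(path, chunk_size=64 * 1024):
    buf = bytearray(chunk_size)  # one buffer reused for the whole file
    view = memoryview(buf)
    total = 0
    with open(path, 'rb', buffering=0) as f:  # raw I/O: no extra internal buffering
        while True:
            n = f.readinto(buf)  # fills buf in place, returns the byte count
            if not n:
                break
            total = (total + sum(view[:n])) & 0xFFFFFFFF
    return total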

Checklist: Allocation-Aware API Design

When you design an API that will be used in hot paths, make allocation behavior visible and controllable.

  • Provide “into” variants: methods that write into a caller-provided buffer/collection (Java: appendTo(StringBuilder); Python: accept a list to append into; C: write into a provided buffer); see the sketch after this checklist.
  • Separate parsing from allocation: allow parsing to produce indices/spans first, then allocate only if the caller needs owned strings/objects.
  • Offer streaming interfaces: iterate results rather than returning a fully materialized list.
  • Document ownership and lifetimes: especially for C spans and for reused buffers in any language.
  • Avoid surprising conversions: don’t implicitly box primitives, stringify objects, or copy buffers unless requested.
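
A sketch of the first checklist item in Python: the core function appends into a caller-provided list, and a convenience wrapper allocates only for callers that want it. Names are illustrative:

def render_row(item, out):
    # 'Into' variant: appends fragments to a caller-provided list.
    out.append(item.name)
    out.append(':')
    out.append(str(item.value))
    out.append('\n')

def render_row_str(item):
    # Convenience wrapper for callers that don't care about allocation.
    parts = []
    render_row(item, parts)
    return ''.join(parts)

Hot-path callers can reuse one list across many rows and join once at the end.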

Practical Refactoring Recipe: From Allocation-Heavy to Allocation-Aware

Step-by-step

  • 1) Inline the pipeline: replace chained transformations with a single loop in the hotspot.
  • 2) Pre-size outputs when you can estimate size (Java arrays, Ruby arrays with Array.new, Python lists by appending but avoiding repeated concatenation).
  • 3) Replace repeated concatenation with builders/buffers.
  • 4) Remove hidden allocations: avoid regex split, avoid boxing, avoid converting types repeatedly.
  • 5) Introduce scratch reuse scoped to a request or thread; add caps to avoid retaining huge buffers.
  • 6) Validate retention boundaries: ensure temporary data does not escape into long-lived structures.

Apply the recipe to one hotspot at a time. Allocation-aware design is most effective when it is localized: keep the rest of the codebase idiomatic and readable, and concentrate low-allocation techniques where they pay for themselves.
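
A compressed before/after sketch applying steps 1 through 4 to a hypothetical report formatter, assuming well-formed key,value lines:

# Before: chained pipeline, hidden split allocations, repeated concatenation.
def report_before(lines):
    rows = [l.split(',') for l in lines]                   # intermediate list of lists
    kept = [r for r in rows if not r[0].startswith('#')]   # second intermediate list
    text = ''
    for r in kept:
        text += r[0] + '=' + r[1] + '\n'                   # new string every iteration
    return text

# After: one pass, index-based parsing, builder-style accumulation.
def report_after(lines):
    parts = []
    append = parts.append
    for l in lines:
        if l.startswith('#'):
            continue
        i = l.find(',')
        if i < 0:
            continue
        append(l[:i])
        append('=')
        append(l[i + 1:])
        append('\n')
    return ''.join(parts)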

Now answer the exercise about the content:

Which change best reflects allocation-aware design for a hot path that currently creates many short-lived temporary objects?


Allocation-aware design reduces GC pressure by cutting incidental/accidental allocations in hotspots, e.g., avoiding intermediate collections and using builders or reusable scratch space. Heap growth or unbounded caches may worsen retention and tail latency.

Next chapter

Data Representation and Layout for Speed and Safety
