Why preprocessing pipelines matter in real-time robotics
In robotics, preprocessing is not about making images “look nicer”; it is about making downstream perception (detection, tracking, segmentation, pose estimation) more stable under real-world variation while staying within strict latency and compute budgets. Every operation you add increases per-frame processing time, which can reduce frame rate, increase end-to-end delay, and degrade closed-loop control (e.g., overshoot, oscillation, or missed obstacles).
A good pipeline is therefore designed like a control component: define what instability you must remove (jitter, noise, lens artifacts, lighting flicker), apply the minimum set of operations that fixes it, measure cost, and validate against corner cases.
Latency, frame rate, and control responsiveness
Key timing terms you should track
- Per-frame compute time: time spent in preprocessing + inference + postprocessing.
- End-to-end latency: time from photon capture to control action (camera exposure + sensor readout + transport + buffering + compute + actuator command).
- Frame age: how old the image is when used by the controller.
- Jitter: variability in latency; often more harmful than a slightly higher but consistent latency.
Rule of thumb for closed-loop systems: a stable controller prefers predictable latency. If your preprocessing occasionally spikes (e.g., due to large-kernel morphology), the controller may react to stale information.
Budgeting: start from the control loop
Work backwards from the control frequency. If your robot runs a 50 Hz control loop (20 ms period), and you want vision updates at 25–30 Hz, you might target an end-to-end vision latency under ~40–60 ms with low jitter. That budget includes camera exposure and transport, so preprocessing often needs to be in the single-digit milliseconds on embedded hardware.
| Component | Typical impact | What to watch |
|---|---|---|
| Resize / crop | Often reduces total cost | Interpolation choice, memory copies |
| Undistort/rectify | Moderate to high cost | Remap implementation, map caching |
| Smoothing | Low to moderate | Kernel size, separable filters |
| Sharpening | Low to moderate | Noise amplification |
| Morphology | Moderate to high | Kernel size/shape, iterations |
| Temporal filtering | Low compute, adds delay | Introduced lag, responsiveness |
Pipeline design approach: minimum operations, measured cost
Step 1: Define the downstream detector’s sensitivities
List what breaks your detector/tracker in the field. Examples:
- Small objects missed because input resolution is too low.
- False positives due to sensor noise in low light.
- Edge-based features unstable due to motion blur or aliasing.
- Binary segmentation noisy due to speckle; needs morphology.
- Tracking jitter due to frame-to-frame noise; needs temporal smoothing.
Translate each failure into a candidate preprocessing step. Avoid “standard” pipelines; pick only what addresses a documented failure mode.
Step 2: Choose an initial minimal pipeline
Start with the cheapest operations that give the biggest stability gains:
- Crop ROI (reduce pixels processed).
- Resize to the detector’s native input size (avoid extra resizes later).
- Optional light smoothing if noise causes instability.
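A minimal sketch of that starting point, assuming an OpenCV/NumPy stack in Python; the ROI rectangle, model input size, and kernel size are placeholders you would tune for your own camera and detector:

```python
import cv2

MODEL_INPUT = (640, 640)      # hypothetical detector input size (width, height)
ROI = (0, 200, 1280, 520)     # placeholder ROI: x, y, width, height

def preprocess(frame, smooth=False):
    """Minimal pipeline: crop ROI, resize once to the model input, optional light blur."""
    x, y, w, h = ROI
    roi = frame[y:y + h, x:x + w]                           # crop first: fewer pixels downstream
    out = cv2.resize(roi, MODEL_INPUT, interpolation=cv2.INTER_AREA)
    if smooth:
        out = cv2.GaussianBlur(out, (3, 3), 0)              # only if noise causes instability
    return out
```

Cropping before resizing keeps every later stage cheaper, and resizing exactly once to the model input avoids redundant interpolation passes.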
Step 3: Measure cost and latency, not just FPS
Measure:
- Average and worst-case per-stage time.
- End-to-end latency (timestamp at capture and at decision).
- Jitter (standard deviation, max spikes).
Keep measurements on the target hardware. Desktop profiling can mislead due to different memory bandwidth and accelerators.
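A sketch of this kind of instrumentation using only the Python standard library; the capture and decision timestamps are assumed to come from the same monotonic clock:

```python
import time
import statistics

stage_ms = []    # per-stage compute times
e2e_ms = []      # capture-to-decision latencies

def timed(stage_fn, frame):
    """Run one pipeline stage and record its wall time in milliseconds."""
    t0 = time.perf_counter()
    out = stage_fn(frame)
    stage_ms.append((time.perf_counter() - t0) * 1e3)
    return out

def record_latency(capture_ts, decision_ts):
    """End-to-end latency: decision time minus capture time (same clock for both)."""
    e2e_ms.append((decision_ts - capture_ts) * 1e3)

def summarize(samples):
    """Report average, worst case, and jitter rather than a single FPS number."""
    return statistics.mean(samples), max(samples), statistics.pstdev(samples)
```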
Step 4: Validate against corner cases
Build a small “torture set” of scenarios: fast motion, low light, glare, repetitive textures, partial occlusions, and maximum expected robot speed. For each change to preprocessing, verify:
- Detector stability improves (fewer flickers, fewer missed frames).
- Latency stays within budget (including worst-case spikes).
- Control behavior improves (less oscillation, fewer late reactions).
Common preprocessing steps (robotics lens)
1) Resizing: match the model, control aliasing, reduce compute
Resizing is often the highest-leverage knob because it changes the number of pixels every later stage must touch. Use it intentionally:
- Downscale to reduce compute and increase FPS, but watch for small-object loss.
- Upscale rarely helps; it increases compute without adding information.
Interpolation choice matters:
- INTER_AREA (or equivalent) is good for downscaling (reduces aliasing).
- INTER_LINEAR is a common default, often fastest on many platforms.
- INTER_NEAREST is fastest but can introduce jagged edges and unstable features.
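With OpenCV in Python, the flag is just an argument to `cv2.resize`; a small sketch, where the stand-in frame and target sizes are arbitrary:

```python
import cv2
import numpy as np

frame = np.zeros((720, 1280, 3), dtype=np.uint8)    # stand-in for a captured frame

# Downscaling: INTER_AREA averages source pixels, which limits aliasing.
small = cv2.resize(frame, (640, 360), interpolation=cv2.INTER_AREA)

# INTER_NEAREST is the cheapest option but yields jagged, frame-to-frame unstable edges.
fast = cv2.resize(frame, (640, 360), interpolation=cv2.INTER_NEAREST)
```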
Practical step-by-step (design decision):
- Pick the smallest resolution that still meets detection requirements (test with your smallest target object at farthest distance).
- Measure inference + preprocessing time at that resolution.
- If small objects fail, consider ROI cropping before increasing full-frame resolution.
2) Cropping ROI: spend pixels where it matters
ROI cropping reduces compute and can reduce false positives by removing irrelevant regions (sky, ceiling, robot chassis). In robotics, ROI is often defined by geometry (e.g., ground plane) or task constraints (e.g., conveyor belt area).
Static ROI is cheapest: fixed rectangle(s). Dynamic ROI (based on last detection or tracking) can be powerful but risks losing targets when tracking fails.
Practical step-by-step:
- Start with a conservative static ROI that covers all expected target locations.
- Verify that cropping does not remove targets during turns, bumps, or camera vibration.
- If using dynamic ROI, add a fallback: periodically run full-frame detection (e.g., every N frames) or expand ROI when confidence drops.
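A sketch of the dynamic-ROI-with-fallback idea in Python; the threshold, period, and margin are illustrative values, and `last_box` is assumed to be (x, y, width, height) from the previous detection:

```python
FULL_FRAME_EVERY_N = 15    # run full-frame detection periodically (assumed period)
CONF_THRESHOLD = 0.4       # expand to full frame when confidence drops below this

def select_roi(frame_shape, frame_idx, last_box, last_conf, margin=0.5):
    """Dynamic ROI around the last detection, with a periodic full-frame fallback."""
    h, w = frame_shape[:2]
    if last_box is None or last_conf < CONF_THRESHOLD or frame_idx % FULL_FRAME_EVERY_N == 0:
        return 0, 0, w, h                                    # fall back to the full frame
    x, y, bw, bh = last_box
    mx, my = int(bw * margin), int(bh * margin)              # grow the box by a safety margin
    x0, y0 = max(0, x - mx), max(0, y - my)
    x1, y1 = min(w, x + bw + mx), min(h, y + bh + my)
    return x0, y0, x1 - x0, y1 - y0
```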
3) Rectification/undistortion: use only when it stabilizes geometry
Undistortion/rectification can improve geometric consistency (straight lines, consistent scale across the image), which helps:
- Feature tracking and visual odometry.
- Stereo matching (rectification is often essential).
- Metric measurements from image geometry.
But it can be expensive because it involves per-pixel remapping. To keep it real-time:
- Precompute maps and reuse them (avoid recomputing per frame).
- Use hardware acceleration if available (GPU/ISP/VPU).
- Undistort only the ROI if your library supports remapping a subregion.
Practical step-by-step:
- Test downstream performance with and without undistortion.
- If needed, implement remap with cached maps.
- Profile worst-case time; remap cost scales with pixel count, so combine with ROI/resize.
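With OpenCV in Python, map caching looks roughly like this; the camera matrix, distortion coefficients, and image size are placeholders standing in for your calibration results:

```python
import cv2
import numpy as np

# K and dist come from calibration; the values below are placeholders.
K = np.array([[900.0, 0.0, 640.0],
              [0.0, 900.0, 360.0],
              [0.0, 0.0, 1.0]])
dist = np.array([-0.3, 0.1, 0.0, 0.0, 0.0])
size = (1280, 720)   # (width, height)

# Precompute the remap tables once; reuse them for every frame.
map1, map2 = cv2.initUndistortRectifyMap(K, dist, None, K, size, cv2.CV_16SC2)

def undistort(frame):
    """Per-frame cost is a single cached remap, not a map recomputation."""
    return cv2.remap(frame, map1, map2, interpolation=cv2.INTER_LINEAR)
```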
4) Smoothing (denoising): stabilize detections without adding lag
Smoothing reduces high-frequency noise that can cause flickering detections or unstable edges. Common choices:
- Gaussian blur: good general-purpose smoothing; separable implementations are efficient.
- Median blur: effective for salt-and-pepper noise; can be slower for large kernels.
- Bilateral filter: preserves edges but often too slow for tight budgets.
Robotics trade-off: too much smoothing can erase small features and reduce responsiveness to sudden changes.
Practical step-by-step:
- Start with a small Gaussian kernel (e.g., 3×3 or 5×5).
- Check whether false positives and feature jitter decrease.
- Increase kernel only if needed; re-measure latency and detection of small objects.
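The two common starting points, sketched with OpenCV in Python (the stand-in frame is arbitrary):

```python
import cv2
import numpy as np

frame = np.zeros((480, 640, 3), dtype=np.uint8)    # stand-in for a captured frame

smoothed = cv2.GaussianBlur(frame, (3, 3), 0)      # small separable kernel: cheap, mild
despeckled = cv2.medianBlur(frame, 3)              # better suited to salt-and-pepper noise
```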
5) Sharpening: recover edge contrast, but avoid amplifying noise
Sharpening can help when images are slightly blurred (motion, focus, or downscaling). A common approach is unsharp masking (original + scaled high-pass). In robotics, sharpening is most useful when your detector relies on edges or texture, but it can amplify noise and create ringing artifacts that confuse thresholding or feature detectors.
Practical step-by-step:
- Only add sharpening if you can demonstrate improved downstream metrics (e.g., higher detection confidence, fewer missed edges).
- Apply mild sharpening after smoothing (if both are used) to avoid boosting noise.
- Profile: sharpening is usually not the bottleneck, but it adds another full-frame pass.
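Unsharp masking reduces to one blur and one weighted sum; a sketch in OpenCV (Python), where `amount` and `sigma` are illustrative defaults:

```python
import cv2

def unsharp_mask(img, amount=0.5, sigma=1.0):
    """original + amount * (original - blurred); mild values limit ringing and noise boost."""
    blurred = cv2.GaussianBlur(img, (0, 0), sigma)   # kernel size derived from sigma
    return cv2.addWeighted(img, 1.0 + amount, blurred, -amount, 0)
```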
6) Morphological operations: clean up binary masks efficiently
Morphology is common after thresholding or segmentation to remove speckles, fill holes, and connect components:
- Erosion: removes small blobs/noise, shrinks regions.
- Dilation: fills gaps, expands regions.
- Opening (erode then dilate): removes small noise.
- Closing (dilate then erode): fills small holes.
Robotics lens: morphology can be expensive with large kernels or multiple iterations. It can also change object geometry (biasing size/shape), which matters for grasping or precise alignment.
Practical step-by-step:
- Use the smallest kernel that fixes the issue (often 3×3).
- Prefer one iteration with a slightly larger kernel over many iterations (measure both).
- Apply morphology on a reduced-resolution mask when possible, then map results back if needed.
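A sketch of a typical cleanup pass, plus the reduced-resolution variant, using OpenCV in Python; the stand-in mask and the 3×3 kernel are placeholders:

```python
import cv2
import numpy as np

mask = np.zeros((480, 640), dtype=np.uint8)                   # stand-in binary mask from thresholding
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))

opened = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)       # remove speckles
cleaned = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)   # fill small holes

# Cheaper variant: clean a half-resolution mask, then scale the result back up.
small = cv2.resize(mask, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_NEAREST)
small = cv2.morphologyEx(small, cv2.MORPH_OPEN, kernel)
restored = cv2.resize(small, (mask.shape[1], mask.shape[0]), interpolation=cv2.INTER_NEAREST)
```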
7) Temporal filtering: reduce flicker, but manage added delay
Temporal filtering uses information across frames to stabilize outputs. It can be applied to:
- Pixel values (temporal smoothing of the image).
- Binary masks (majority vote over last N frames).
- Detections (smooth bounding boxes, confidence scores).
Key trade-off: temporal filtering reduces noise but introduces lag. In closed-loop control, lag can cause the robot to react late (e.g., braking after passing the obstacle).
Common methods:
- Exponential moving average (EMA) on detection outputs: low compute, minimal memory.
- Fixed window average/median: stronger smoothing, more delay and memory.
- Simple tracking filter (e.g., constant-velocity model) on object position: improves stability and can predict through brief dropouts.
Practical step-by-step (EMA on a scalar like confidence or x-position):
```
// y_t = filtered value, x_t = new measurement, alpha in (0,1] (higher = more responsive)
y_t = alpha * x_t + (1 - alpha) * y_{t-1}
```

- Start with a relatively responsive alpha (e.g., 0.5–0.8).
- Measure control behavior: does the robot stop/turn late?
- Lower alpha only if flicker still causes unstable actions.
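If EMA alone still flickers or lags, a constant-velocity (alpha-beta style) filter on the tracked position is a small step up; a sketch in Python, with gains chosen arbitrarily for illustration:

```python
class ConstantVelocityFilter:
    """Tracks position and velocity; can predict through brief detection dropouts."""

    def __init__(self, x0, gain_pos=0.4, gain_vel=0.2):
        self.x, self.v = x0, 0.0
        self.gain_pos, self.gain_vel = gain_pos, gain_vel

    def predict(self, dt):
        self.x += self.v * dt            # coast on the velocity estimate (use when a frame drops)
        return self.x

    def update(self, z, dt):
        self.predict(dt)
        residual = z - self.x            # innovation between measurement and prediction
        self.x += self.gain_pos * residual
        self.v += self.gain_vel * residual / dt
        return self.x
```

The `predict` step lets the controller coast through a brief dropout instead of reacting to a stale or missing measurement.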
Putting it together: example pipelines and when to use them
Pipeline A: low-latency object detection for navigation
- Crop ROI (road/aisle region)
- Resize to model input
- Optional small Gaussian blur (3×3) if noise causes flicker
Why: minimizes passes over the image; avoids expensive remap/morphology unless proven necessary.
Pipeline B: binary segmentation cleanup for line following or lane marking
- Crop ROI (lower half of image)
- Resize down (if acceptable)
- Threshold/segmentation (downstream step)
- Opening (3×3) to remove speckles
- Closing (3×3) to fill small gaps
- Temporal smoothing on the estimated line position (EMA)
Why: morphology stabilizes the mask; temporal smoothing stabilizes steering commands.
Pipeline C: geometry-sensitive tracking (needs rectification)
- Rectify/undistort with cached remap maps
- Crop ROI
- Resize
- Optional mild sharpening if features are weak
- Temporal filter on pose/track state (predict + update)
Why: rectification first ensures later measurements are consistent; ROI/resize contain the cost.
Synchronization, timestamping, and buffering in real-time systems
Timestamp frames at the right point
For closed-loop control, you need to know when the photons were captured, not when your code received the frame. Prefer a hardware-provided capture timestamp if available. If not, timestamp as early as possible in the acquisition thread, before any buffering or conversion.
- Capture timestamp: best for aligning with IMU/odometry.
- Receive timestamp: can hide transport delays and queueing.
Buffering strategies and their consequences
Buffers smooth bursty compute, but they also create stale frames. Common patterns:
- Queue all frames: maximizes throughput but can build latency under load (bad for control).
- Keep latest only (drop old frames): minimizes latency, improves responsiveness (often best for control).
- Small bounded queue: compromise; prevents unbounded lag while reducing drops.
In many robotics controllers, it is better to drop frames than to act on old frames. A 10 Hz perception result that is fresh can be safer than a 30 Hz stream that is 300 ms behind.
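A "keep latest only" buffer needs little more than a lock and a condition variable; a Python sketch (names are illustrative):

```python
import threading

class LatestFrameBuffer:
    """Holds only the newest frame; readers get the freshest data or block until one arrives."""

    def __init__(self):
        self._frame = None
        self._cond = threading.Condition()

    def put(self, frame):
        with self._cond:
            self._frame = frame          # silently overwrite: older frames are dropped
            self._cond.notify()

    def get(self):
        with self._cond:
            while self._frame is None:
                self._cond.wait()
            frame, self._frame = self._frame, None
            return frame
```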
Detecting and handling stale frames
Implement a maximum acceptable frame age. If the current frame is older than a threshold, you can:
- Skip processing and wait for a newer frame.
- Run a cheaper fallback pipeline (e.g., smaller resize, no morphology).
- Switch the controller to a conservative mode (slow down, increase safety margin).
Practical check:
```
frame_age_ms = now_ms() - frame.capture_timestamp_ms
if frame_age_ms > MAX_AGE_MS:
    drop_frame()
```
Synchronizing vision with other sensors
When fusing camera with IMU/odometry, misalignment can look like perception noise. Practical notes:
- Use consistent time bases (monotonic clock) across processes.
- Record timestamps for: capture, preprocessing start/end, inference start/end, and publish time.
- If using approximate synchronization, bound the allowed time difference and log when it is exceeded.
Compute-aware implementation tips
Reduce memory bandwidth and copies
On embedded systems, memory movement can cost more than arithmetic.
- Prefer in-place operations when safe.
- Keep images in the format expected by the next stage (avoid repeated color conversions).
- Fuse operations when possible (e.g., crop+resize in one step).
Prefer predictable runtime
For control, consistent timing beats occasional high FPS.
- Avoid algorithms with data-dependent runtime spikes when possible.
- Use fixed kernel sizes and bounded iterations for morphology.
- Pin threads or set real-time priorities where appropriate (system-dependent).
Measure per-stage timing with instrumentation
Instrument each stage and log percentiles (p50/p90/p99), not just averages. A pipeline that averages 10 ms but spikes to 40 ms will create control issues.
| Stage | Metric to log | Why |
|---|---|---|
| Acquire | capture_ts, receive_ts | Detect transport/driver delays |
| Preprocess | start/end, p99 | Find jitter sources |
| Inference | start/end, p99 | Budget main compute block |
| Publish/Control | decision_ts | Compute end-to-end latency |
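A small helper for that kind of logging, sketched in Python with NumPy for the percentile math; stage names would match the table above:

```python
import numpy as np
from collections import defaultdict

class StageTimer:
    """Collects per-stage durations (ms) and reports percentiles, not just averages."""

    def __init__(self):
        self.samples = defaultdict(list)

    def add(self, stage, duration_ms):
        self.samples[stage].append(duration_ms)

    def report(self):
        for stage, vals in self.samples.items():
            p50, p90, p99 = np.percentile(vals, [50, 90, 99])
            print(f"{stage}: p50={p50:.1f} ms  p90={p90:.1f} ms  "
                  f"p99={p99:.1f} ms  max={max(vals):.1f} ms")
```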