Image Fundamentals for Robotics: Pixels, Color, and Noise

Chapter 2

Estimated reading time: 10 minutes


Images as Data: Arrays, Channels, and Coordinates

(1) What the concept is

A robot camera image is a grid of samples. In code, it is typically represented as a 2D array for grayscale or a 3D array for color:

  • Grayscale: shape (H, W) where each element is an intensity value.
  • Color: shape (H, W, C) where C is the number of channels (commonly 3 for RGB/BGR).

Each pixel is addressed by integer coordinates. Most robotics libraries use (x, y) to mean (column, row), while array indexing uses [row, col]:

  • x increases to the right (columns).
  • y increases downward (rows).
  • The origin is usually the top-left pixel: (x=0, y=0).

Be careful with conventions: some math/graphics contexts place the origin at the bottom-left, but typical camera images in robotics software use top-left.

(2) How it appears in real robot data

Common symptoms of coordinate confusion include bounding boxes drawn in the wrong location, masks shifted by one axis, or swapped width/height. Another frequent issue is mixing (x, y) points with array indexing:

  • You compute a feature at (x=120, y=50) but access img[x, y] instead of img[y, x].
  • You resize an image but forget to scale coordinates of detections and keypoints.

(3) Quick checks and simple mitigations

  • Sanity overlay: draw a crosshair at a known point (e.g., image center) and verify it lands where expected.
  • Print shapes: log H, W, and channel count before processing.
  • Coordinate helper: standardize on a single internal convention (e.g., always store points as (x, y), always index arrays as [y, x]).
# Pseudocode (Python-like) for safe access and annotation helpers
H, W = img.shape[:2]               # rows (height) first, then columns (width)

def get_pixel(img, x, y):
    # Points are stored as (x, y) = (column, row); arrays are indexed [row, col]
    return img[y, x]

def in_bounds(x, y, W, H):
    return 0 <= x < W and 0 <= y < H

cx, cy = W // 2, H // 2            # image center as an (x, y) point
if in_bounds(cx, cy, W, H):
    val = get_pixel(img, cx, cy)
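
If a frame is resized, detection and keypoint coordinates must be scaled by the same factors; a minimal sketch (the helper scale_point and the image sizes are illustrative, not part of any library):

# Scale an (x, y) point from an old image size to a new one
def scale_point(x, y, old_w, old_h, new_w, new_h):
    return x * new_w / old_w, y * new_h / old_h

# Example: a keypoint found on a 640x480 frame, mapped onto a 320x240 preview
kx, ky = scale_point(120, 50, 640, 480, 320, 240)    # -> (60.0, 25.0)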

Grayscale vs RGB: Channels and What They Mean

(1) What the concept is

Grayscale stores one intensity per pixel. RGB stores three intensities per pixel (red, green, blue). In practice, many robotics pipelines use BGR ordering (common in OpenCV), which is the same data but with channel order swapped.


Grayscale is often derived from RGB using a weighted combination (because human perception and sensor responses differ by wavelength). The key point for robotics: grayscale reduces data size and can be robust for shape/edge tasks, while RGB preserves color cues useful for segmentation and recognition.

(2) How it appears in real robot data

  • Edge/feature detection: often works well on grayscale, especially under stable lighting.
  • Color-based segmentation: requires color channels; grayscale loses discriminative information (e.g., red vs green objects with similar brightness).
  • Channel-order bugs: a red object appears blue in visualization or segmentation fails because thresholds were tuned for RGB but data is BGR.

(3) Quick checks and simple mitigations

  • Verify channel order: inspect a known colored object (e.g., a red marker). If it appears blue, swap channels.
  • Keep raw and derived: store the original color frame and a derived grayscale frame so you can choose per algorithm.
  • Normalize consistently: decide whether your pipeline uses [0,255] uint8 or [0,1] float and stick to it.
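
A minimal sketch of these checks with OpenCV and NumPy, assuming the incoming frame is a BGR uint8 array named frame_bgr:

# Channel-order and scaling checks (OpenCV/NumPy)
import cv2
import numpy as np

rgb  = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)    # swap channel order for RGB-tuned code
gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)   # weighted combination of B, G, R

# Decide on one scaling convention and keep it consistent
gray_f = gray.astype(np.float32) / 255.0             # [0, 1] float version
print(frame_bgr.shape, gray.shape, gray_f.dtype)     # log H, W, channels, dtype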

Color Spaces in Robotics: RGB, HSV, and YCbCr

(1) What the concept is

A color space is a way to represent color numerically. RGB mixes brightness and chromaticity in each channel, which can make thresholding sensitive to lighting changes. Alternative spaces separate these factors:

  • HSV (Hue, Saturation, Value): hue represents “color type,” saturation represents colorfulness, and value approximates brightness.
  • YCbCr: Y is luma (brightness), while Cb/Cr are chroma components (blue-difference and red-difference). This separation can improve robustness to illumination changes for some tasks.

In robotics, HSV is commonly used for quick color segmentation, while YCbCr is often used when you want to reduce sensitivity to lighting intensity by focusing on chroma.

(2) How it appears in real robot data

  • HSV segmentation: a robot picking colored parts can threshold hue ranges to isolate objects, but may struggle when saturation drops (washed-out colors) or when highlights shift hue near specular reflections.
  • YCbCr robustness: in environments with changing brightness (e.g., moving from shadow to sun), using chroma channels can keep color classification more stable than raw RGB thresholds.
  • Hue wrap-around: hue is circular (e.g., red may appear near both low and high hue values), so thresholds may need two ranges.

(3) Quick checks and simple mitigations

  • Inspect channel images: visualize H, S, V (or Y, Cb, Cr) as grayscale images to see where your object stands out.
  • Use saturation/value gates: in HSV, require a minimum saturation to avoid classifying gray/white regions as a color.
  • Prefer chroma for lighting changes: in YCbCr, threshold in Cb/Cr and treat Y separately (or ignore it) when brightness varies.

Practical segmentation workflow (step-by-step):

  1. Convert the image to HSV (or YCbCr).

  2. Visualize histograms of relevant channels over a region containing the target object.

  3. Choose thresholds (e.g., hue range + minimum saturation).

  4. Apply morphological cleanup (erode/dilate) to remove speckles and fill holes.

  5. Validate across multiple lighting conditions and camera exposures.

# HSV thresholding with OpenCV (hue ranges over 0-179, not 0-360)
import cv2
import numpy as np

hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
# h1, h2, s_min, v_min: thresholds chosen for the target object
mask = cv2.inRange(hsv, (h1, s_min, v_min), (h2, 255, 255))                 # 255 inside range, 0 outside
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN,  np.ones((3, 3), np.uint8))   # remove speckles
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))   # fill small holes
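
Because hue is circular, a color such as red may need two ranges combined; a short sketch under the same assumptions (the exact hue bounds are illustrative):

# Red straddles the hue wrap-around in OpenCV's 0-179 hue range
mask_lo  = cv2.inRange(hsv, (0,   s_min, v_min), (10,  255, 255))
mask_hi  = cv2.inRange(hsv, (170, s_min, v_min), (179, 255, 255))
mask_red = cv2.bitwise_or(mask_lo, mask_hi)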

Bit Depth and Dynamic Range: What Your Sensor Can Represent

(1) What the concept is

Bit depth is how many discrete intensity levels each pixel can take:

  • 8-bit: 256 levels (0–255). Common for standard video streams.
  • 10/12/16-bit: 1024/4096/65536 levels. Common for machine vision cameras and raw sensor outputs.

Dynamic range is the span between the darkest and brightest measurable signals before clipping (black crush or white saturation). Higher dynamic range helps in scenes with both shadows and bright highlights.

Bit depth and dynamic range are related but not identical: you can store many levels (high bit depth) but still clip highlights if the sensor saturates or exposure is too high.

(2) How it appears in real robot data

  • Clipping: bright reflective parts become flat white regions; dark areas become flat black. Downstream detectors lose texture and edges.
  • Banding: in low-light with 8-bit images, smooth gradients can become step-like, affecting thresholding and optical flow.
  • Auto-exposure side effects: exposure changes frame-to-frame, causing apparent intensity changes that confuse background subtraction or intensity-based tracking.

(3) Quick checks and simple mitigations

  • Histogram inspection: check if many pixels pile up at 0 or max value (clipping).
  • Lock exposure when possible: for consistent perception, fix exposure/gain/white balance during a task, or constrain auto-exposure ranges.
  • Use higher bit depth when available: for precision tasks (inspection, low-light), capture 10/12/16-bit and convert carefully for algorithms expecting 8-bit.
Symptom | Likely cause | Effect on detection | Mitigation
Large white blobs | Overexposure / saturation | Lost edges/texture; false contours | Reduce exposure/gain; add polarizer; adjust lighting
Large black regions | Underexposure | Missed features; noisy shadows | Increase exposure; add illumination; denoise
Frame-to-frame brightness jumps | Aggressive auto-exposure | Unstable thresholds and tracking | Lock exposure or smooth exposure changes
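
As noted in the quick checks, higher-bit-depth frames should be rescaled deliberately rather than cast; a minimal sketch, assuming a 12-bit image delivered as a uint16 NumPy array named raw12 (the name is illustrative):

# Careful 12-bit to 8-bit conversion, plus a clipping check
import numpy as np

# raw12: uint16 array carrying 12-bit data (valid values 0..4095)
img8 = np.round(raw12.astype(np.float32) * (255.0 / 4095.0)).astype(np.uint8)

# A plain cast (raw12.astype(np.uint8)) silently wraps values above 255
clip_frac = np.mean((raw12 == 0) | (raw12 == 4095))   # fraction of clipped pixels
print("clipped fraction: %.2f%%" % (100 * clip_frac))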

Noise in Robot Images: Shot Noise, Read Noise, and What They Do

(1) What the concept is

Noise is random variation in pixel values not caused by the scene. Two common sources:

  • Shot noise: caused by the discrete arrival of photons. Its absolute level grows roughly with the square root of the signal, so the relative noise (noise-to-signal ratio) is worst in low light. It often appears as graininess, especially in darker regions.
  • Read noise: introduced by sensor electronics during readout and amplification. It can be noticeable in low-light and high-gain settings, sometimes with fixed-pattern components.

Noise matters because many vision algorithms assume that intensity changes correspond to real edges or textures. Noise creates false edges, unstable keypoints, and flickering segmentation masks.

(2) How it appears in real robot data

  • Low-light navigation: feature detectors may find many spurious points; optical flow becomes jittery.
  • Color segmentation: noisy chroma channels produce speckled masks, causing false positives.
  • Depth/IR cameras: while not purely “image noise,” similar effects appear as speckle and missing pixels, which can break downstream detection if not handled.

(3) Quick checks and simple mitigations

  • Temporal-stability check: point the camera at a static scene and watch pixel values over a few frames; if they fluctuate heavily, you are noise-limited.
  • Exposure control: more light (or longer exposure) reduces relative shot noise; lowering gain reduces read-noise amplification.
  • Denoising: apply light denoising before sensitive steps (thresholding, feature extraction). Choose filters based on what you need to preserve:
  • Gaussian blur: simple, reduces high-frequency noise but softens edges.
  • Median filter: good for salt-and-pepper noise and preserves edges better than Gaussian in some cases.
  • Bilateral filter: reduces noise while preserving edges, but is slower.
# Denoise before thresholding (OpenCV)
import cv2
img_d = cv2.medianBlur(img, 3)          # or cv2.GaussianBlur(img, (5, 5), 0)
mask = img_d > T                        # T: intensity threshold chosen for the task
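
The temporal-stability check above can be made quantitative from a short static-scene recording; a minimal sketch, assuming frames is a list of grayscale uint8 arrays captured while nothing in the scene moves:

# Per-pixel temporal noise over a static scene
import numpy as np

stack = np.stack(frames).astype(np.float32)        # shape (N, H, W)
temporal_std = stack.std(axis=0)                   # per-pixel std over time
print("median temporal std:", np.median(temporal_std))
# A median std of several gray levels on a static scene suggests you are noise-limited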

Compression Artifacts: When the Codec Becomes Part of Your Data

(1) What the concept is

Many robots receive camera data through compressed streams (e.g., MJPEG, H.264/H.265). Compression reduces bandwidth but introduces artifacts:

  • Blocking: visible square blocks, especially in flat regions.
  • Ringing: ripples near sharp edges.
  • Color bleeding: chroma subsampling reduces color resolution, smearing color boundaries.

These artifacts are not random like noise; they are structured and can systematically bias detection.

(2) How it appears in real robot data

  • Edge-based detection: blocking creates artificial edges, increasing false positives.
  • Small object detection: fine details get smoothed away; tiny targets disappear.
  • Color segmentation: chroma subsampling can shift or blur color boundaries, causing masks to leak into the background.

(3) Quick checks and simple mitigations

  • Compare raw vs compressed: if possible, capture a short sequence uncompressed and compare detection outputs.
  • Increase bitrate / quality: for perception-critical tasks, allocate bandwidth to reduce artifacts.
  • Avoid repeated re-encoding: keep the stream in one codec stage; don’t decode and re-encode multiple times.
  • Prefer intra-frame for vision debugging: MJPEG can be easier to reason about than long-GOP video when diagnosing artifacts (at the cost of bandwidth).
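
One way to see where compression artifacts concentrate is to re-encode a raw frame at different JPEG qualities and inspect the differences; a minimal sketch with OpenCV, assuming frame is an uncompressed BGR array and the quality values are illustrative:

# JPEG round-trip comparison to localize compression artifacts
import cv2

def jpeg_roundtrip(frame, quality):
    ok, buf = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, quality])
    return cv2.imdecode(buf, cv2.IMREAD_UNCHANGED)

diff_low  = cv2.absdiff(frame, jpeg_roundtrip(frame, 50))
diff_high = cv2.absdiff(frame, jpeg_roundtrip(frame, 95))
# Large differences near edges and color boundaries point to ringing and chroma bleeding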

Practical Debug Toolkit: Fast Diagnostics Before You Tune Algorithms

Histogram inspection (step-by-step)

  1. Compute histogram of intensity (grayscale) or Y channel (YCbCr) for a representative frame.

  2. Check for clipping: spikes at 0 or max indicate under/overexposure.

  3. Check spread: a very narrow histogram suggests low contrast, which can hurt edge detection and thresholding.

  4. Repeat on a region-of-interest (ROI) containing the object to see if it separates from background.
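
A minimal sketch of steps 1-3 with NumPy, assuming gray is a grayscale uint8 frame:

# Histogram-based clipping and contrast check
import numpy as np

hist, _ = np.histogram(gray, bins=256, range=(0, 256))
total = gray.size
print("clipped dark: %.1f%%" % (100.0 * hist[0] / total))
print("clipped bright: %.1f%%" % (100.0 * hist[255] / total))

# Spread: a narrow 5th-95th percentile range indicates low contrast
p5, p95 = np.percentile(gray, [5, 95])
print("5th-95th percentile span:", p95 - p5)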

Color-space conversion checks

  • Verify expected ranges: some libraries represent hue as 0–179 (e.g., OpenCV 8-bit images) instead of 0–360, and channel scaling differs across implementations.
  • Threshold on the right channel: if lighting changes are the problem, try chroma channels (Cb/Cr) or use HSV with saturation gating.

Noise and artifact triage

  • Static-scene test: point the robot at a still scene and record 2–3 seconds. If detections flicker, suspect noise, auto-exposure, or compression.
  • Downstream sensitivity test: run your detector on (a) original, (b) lightly blurred, (c) denoised frames. If performance improves dramatically with mild denoising, noise is a major factor.
  • Bandwidth test: temporarily raise stream quality/bitrate; if detection improves, compression artifacts are limiting you.

Common pitfalls checklist

  • Swapped axes: using (x,y) as [x,y] indexing.
  • Wrong channel order: treating BGR as RGB.
  • Inconsistent scaling: mixing uint8 [0–255] with float [0–1] without conversion.
  • Auto settings drift: auto-exposure/auto-white-balance changing mid-task.
  • Thresholds tuned in one lighting condition: not robust across environments.

Now answer the exercise about the content:

A robot computes a feature point at (x=120, y=50) in image coordinates, but the overlay appears in the wrong location. Which fix best matches common robotics image conventions for accessing the pixel value at that point?

Answer: robotics images usually store points as (x, y) = (column, row), but array indexing is [row, col]. The correct access is therefore img[y, x]; using img[x, y] swaps the axes and misplaces overlays.

Next chapter: Lenses, Field of View, Focus, and Distortion in Robot Cameras
