What a Camera Actually Measures
A camera is a 2D array of light sensors. Each pixel converts incoming light into an electrical signal that is digitized into an image. In robotics, the “raw measurement” is not an object label or a distance; it is image intensity (brightness) and, for color cameras, color channel intensities (typically RGB or a Bayer mosaic that is later demosaiced).
Two key ideas help connect pixels to physical meaning:
- Radiometry (how much light): pixel values depend on scene illumination, surface reflectance, lens aperture, exposure time, and sensor gain.
- Geometry (where light comes from): each pixel corresponds to a viewing direction through the lens; with a camera model and calibration, pixel coordinates can be mapped to rays in 3D.
From pixels to robotic measurements
Robotics pipelines convert images into measurements such as:
- Features: keypoints (corners/blobs) and descriptors used for matching and tracking.
- Motion cues: optical flow or feature tracks to estimate camera motion and scene motion.
- Depth cues: stereo disparity, structure-from-motion, or monocular depth inference (model-based or learned).
- Object detections: bounding boxes, instance masks, pose estimates, and semantic labels.
All of these are derived measurements whose accuracy depends on camera modeling, settings, noise/artifacts, and calibration.
Conceptual Camera Models (Geometry You Can Compute With)
Pinhole model: the core idea
The pinhole model approximates the camera as a single point (the camera center) where rays pass through and intersect an image plane. A 3D point in camera coordinates (X, Y, Z) projects to normalized image coordinates (x, y) = (X/Z, Y/Z). These are then mapped to pixel coordinates using the intrinsic parameters.
Intrinsics: focal length and principal point
In pixel units, the mapping is often written:
u = fx * (X/Z) + cx
v = fy * (Y/Z) + cy
- fx, fy: focal lengths in pixels (related to physical focal length and pixel size).
- cx, cy: principal point (where the optical axis hits the image).
Practical interpretation:
- Larger fx, fy (narrower field of view) makes objects appear larger and can improve angular precision, but reduces coverage.
- cx, cy matter for metric tasks (e.g., projecting detections into 3D); small errors can bias measurements.
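As a quick illustration, here is a minimal Python sketch of this projection; the intrinsic values (fx, fy, cx, cy) are made-up example numbers, not calibration results.

```python
# Minimal sketch of pinhole projection: camera-frame 3D point -> pixel coordinates.
# The intrinsics below are illustrative placeholders, not real calibration values.

def project_point(X, Y, Z, fx=600.0, fy=600.0, cx=320.0, cy=240.0):
    """Project a 3D point in camera coordinates (metres) to pixel coordinates (u, v)."""
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    x, y = X / Z, Y / Z            # normalized image coordinates
    return fx * x + cx, fy * y + cy

# A point 2 m ahead and 0.1 m to the right projects 30 px right of the principal point:
print(project_point(0.1, 0.0, 2.0))  # (350.0, 240.0)
```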
Lens distortion: why straight lines bend
Real lenses deviate from the pinhole model. The most common effects:
- Radial distortion (barrel/pincushion): increases with distance from the image center.
- Tangential distortion: due to lens misalignment.
Distortion is not just a cosmetic issue. If you compute angles, epipolar geometry, or depth from disparity, unmodeled distortion can create systematic errors.
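If you use OpenCV, distortion can be removed from detected pixel coordinates before any geometric computation. In this sketch the camera matrix K and distortion coefficients are placeholder values; in practice they come from intrinsic calibration (covered below).

```python
# Minimal sketch: removing radial/tangential distortion from pixel coordinates with OpenCV.
import numpy as np
import cv2

K = np.array([[600.0, 0.0, 320.0],     # placeholder intrinsics, not a real calibration
              [0.0, 600.0, 240.0],
              [0.0,   0.0,   1.0]])
dist = np.array([-0.25, 0.08, 0.001, -0.0005, 0.0])  # k1, k2, p1, p2, k3 (placeholders)

pts = np.array([[[50.0, 40.0]], [[600.0, 430.0]]], dtype=np.float32)  # distorted pixels

# Passing P=K returns undistorted *pixel* coordinates instead of normalized ones.
undistorted = cv2.undistortPoints(pts, K, dist, P=K)
print(undistorted.reshape(-1, 2))
```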
Practical Camera Settings (Radiometry and Timing)
Exposure time (integration time)
Exposure time controls how long each pixel collects photons.
- Long exposure: brighter image, better in low light, but more motion blur.
- Short exposure: less blur, but darker image and more noise amplification (if gain is increased).
Gain / ISO (analog/digital amplification)
Gain amplifies the sensor signal. It can make images look brighter without changing exposure time, but it also amplifies noise and can reduce effective dynamic range.
Frame rate and latency
Frame rate determines how often you get measurements; latency determines how old the image is when used by the robot. Higher frame rate can improve tracking and control responsiveness, but increases bandwidth and compute load.
Rolling shutter vs global shutter
- Global shutter: all pixels expose at the same time. Best for fast motion and accurate geometry.
- Rolling shutter: rows expose at slightly different times. During motion, the image can warp (skew/“jello”), which corrupts feature tracks and can bias motion/depth estimation.
Rule of thumb: if the robot or camera moves quickly (drones, fast arms, mobile robots over rough terrain), global shutter is often worth the cost.
Noise and Artifacts You Must Expect (and Design Around)
Motion blur
Motion blur occurs when the image changes during exposure. It smears edges and reduces the repeatability of keypoints and descriptors.
- Symptoms: fewer stable features, poor matching, degraded optical flow.
- Mitigation: shorten exposure, add light, use global shutter, stabilize camera, or accept lower frame rate with controlled exposure.
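A rough blur estimate helps choose exposure: the smear in pixels is roughly the focal length in pixels times the angular rate times the exposure time, and similarly for translation scaled by depth. The numbers in this sketch are illustrative assumptions.

```python
# Back-of-the-envelope sketch: estimating motion blur in pixels to guide exposure choice.
# All numbers are illustrative assumptions, not measured values.
import math

fx = 600.0           # focal length in pixels
exposure_s = 0.005   # 5 ms exposure

# Rotational blur: camera yawing at 90 deg/s
omega = math.radians(90.0)                 # rad/s
blur_rot_px = fx * omega * exposure_s      # ~4.7 px of smear

# Translational blur: moving 1 m/s sideways past an object 2 m away
v, Z = 1.0, 2.0
blur_trans_px = fx * (v / Z) * exposure_s  # ~1.5 px of smear

print(f"rotational blur ~ {blur_rot_px:.1f} px, translational blur ~ {blur_trans_px:.1f} px")
```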
Low-light noise
In low light, fewer photons arrive, increasing shot noise; raising gain adds read noise and quantization effects. The result is grainy images and unstable feature detection.
- Symptoms: flickering detections, drifting tracks, false corners.
- Mitigation: brighter optics (lower f-number), larger pixels/sensor, controlled illumination, denoising (with care), or longer exposure if motion allows.
Lens distortion and imperfect rectification
Even with calibration, distortion compensation can be imperfect if calibration is poor or if focus/zoom changes. For stereo, small rectification errors can create depth bias.
White balance shifts and color inconsistencies
Auto white balance changes the mapping from sensor channels to output RGB over time. This can break color-based segmentation and cause appearance changes that confuse learned detectors.
- Mitigation: lock white balance and exposure when consistent appearance matters (e.g., inspection, mapping).
Synchronization issues (multi-sensor and multi-camera)
Robotic measurements often combine camera data with other sensors or multiple cameras. If timestamps are inconsistent or frames are not captured simultaneously, you can get:
- Incorrect triangulation in stereo (depth errors).
- Bias in motion estimation when images are paired with the wrong timestamps.
- Inconsistent object tracking across cameras.
Mitigation involves hardware triggering, precise timestamping, and verifying end-to-end latency (capture → transfer → processing).
Calibration: Turning Images into Metric Measurements
Intrinsic calibration (camera-only parameters)
Intrinsic calibration estimates parameters that map rays to pixels:
- Focal lengths fx, fy
- Principal point cx, cy
- Distortion coefficients (radial/tangential)
Why it matters: any pipeline that turns pixels into angles, rays, or 3D points depends on intrinsics. Errors show up as systematic scale/angle bias and depth inaccuracies.
Extrinsic calibration (camera-to-robot transform)
Extrinsics define the rigid transform between the camera frame and the robot frame (or another reference frame): rotation and translation. This is essential when you want to:
- Project detections into the robot/world frame.
- Fuse camera-derived measurements with robot kinematics.
- Use the camera for navigation relative to the robot body.
Small extrinsic errors can cause large position errors at distance. For example, a small angular misalignment can shift projected points by several centimeters or more at distances of a few meters; a quick check follows.
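A back-of-the-envelope check of that claim, using an assumed 0.5 degree mounting error:

```python
# Position error from a small extrinsic angular misalignment:
# error ~ distance * angle_in_radians (small-angle approximation)
import math

angle_deg = 0.5                       # assumed half-degree mounting error
for distance_m in (0.5, 2.0, 5.0):
    error_m = distance_m * math.radians(angle_deg)
    print(f"{distance_m:4.1f} m away -> ~{error_m * 100:.1f} cm offset")
# 0.5 m -> ~0.4 cm, 2.0 m -> ~1.7 cm, 5.0 m -> ~4.4 cm
```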
How calibration quality impacts metric accuracy
Calibration is not “set and forget” if the physical setup changes. Accuracy degrades when:
- The lens focus changes (some lenses shift intrinsics slightly with focus).
- The camera mount flexes or is bumped (extrinsics change).
- Temperature changes cause mechanical drift.
Practical implication: if you need metric accuracy (e.g., grasping, measurement, mapping), treat calibration as part of the system’s maintenance and validation.
Step-by-step: a practical calibration workflow
1) Prepare the setup
- Mount the camera rigidly; avoid wobble and cable strain.
- Decide whether you will lock focus and zoom (recommended for repeatability).
- Fix camera settings if possible: disable auto exposure/auto white balance for consistent calibration images.
2) Collect intrinsic calibration images
- Use a calibration target (checkerboard/AprilTag grid).
- Capture many views: different positions, tilts, and distances; cover the whole image (corners and edges matter for distortion).
- Avoid motion blur; ensure sharp corners.
3) Solve and validate intrinsics
- Compute intrinsics and distortion.
- Validate by undistorting images: straight edges in the scene should look straight; the calibration target should align well after reprojection.
- Check reprojection error and also visually inspect edge regions (low error can still hide localized issues if coverage was poor).
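A minimal OpenCV sketch of steps 2-3; the board size, square size, and file paths are assumptions for illustration.

```python
# Sketch: intrinsic calibration from checkerboard images (assumed paths and board geometry).
import glob
import cv2
import numpy as np

pattern = (9, 6)          # inner corners per row/column of the checkerboard (assumed)
square_size = 0.025       # 25 mm squares (assumed)

# 3D corner positions of the board in its own plane (Z = 0)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_size

obj_points, img_points = [], []
for path in glob.glob("calib_images/*.png"):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("RMS reprojection error (px):", rms)

# Visual check: undistort one view and confirm straight edges look straight.
img = cv2.imread("calib_images/view_00.png")
cv2.imwrite("undistorted_00.png", cv2.undistort(img, K, dist))
```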
4) Collect data for extrinsic calibration
- Define the robot frame you care about (base, end-effector, etc.).
- Capture images of a known target while the robot moves through multiple poses, or use a hand-eye calibration procedure if the camera is mounted on the robot.
- Ensure the robot pose data and images are time-aligned.
5) Solve and validate extrinsics
- Project known points/targets into the image and check alignment.
- Project image detections into the robot frame and verify distances/angles against ground truth measurements.
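A minimal sketch of this validation, assuming the extrinsics are expressed as a rotation R and translation t from the robot frame to the camera frame; all numeric values are placeholders.

```python
# Sketch: project a point known in the robot frame into the image and compare against
# where it was actually detected. K, R, t, and the detection are placeholder values.
import numpy as np

K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0,   0.0,   1.0]])

R = np.eye(3)                           # robot-to-camera rotation (from calibration)
t = np.array([0.0, 0.0, 0.5])           # robot-to-camera translation in metres (placeholder)

p_robot = np.array([0.1, 0.05, 1.0])    # a known point in the robot frame (metres)
p_cam = R @ p_robot + t                 # transform into the camera frame
uv = (K @ p_cam)[:2] / p_cam[2]         # pinhole projection to pixels

detected_uv = np.array([362.0, 261.0])  # where the target was actually detected (placeholder)
print("projected:", uv, "reprojection error (px):", np.linalg.norm(uv - detected_uv))
```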
6) Operational checks
- Record a short dataset and verify that metric outputs (e.g., depth from stereo, pose from markers) remain stable across time and motion.
- Re-check after any mechanical change.
Extracting Measurements: Common Robotics Outputs
Feature detection and tracking
Features convert images into sparse, trackable points. A typical pipeline:
- Detect keypoints (e.g., corners) in each frame.
- Compute descriptors and match across frames, or track directly using optical flow.
- Reject outliers (e.g., with geometric consistency checks).
Practical notes:
- Textureless surfaces yield few features; repetitive patterns cause ambiguous matches.
- Motion blur and rolling shutter reduce track quality.
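A minimal OpenCV sketch of this pipeline using ORB features and RANSAC-based outlier rejection; the frame file names are placeholders.

```python
# Sketch: detect, describe, match, and geometrically filter features between two frames.
import cv2
import numpy as np

img1 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)   # keypoints + binary descriptors
kp2, des2 = orb.detectAndCompute(img2, None)

# Match descriptors (Hamming distance for binary ORB descriptors)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# Geometric consistency check: keep matches that agree with a fundamental matrix
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
print(f"{int(inlier_mask.sum())} inlier matches out of {len(matches)}")
```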
Optical flow and motion cues
Optical flow estimates apparent pixel motion between frames. It can support:
- Visual odometry (camera motion estimation).
- Obstacle motion detection.
- Control tasks (e.g., keeping a target centered).
Flow is sensitive to lighting changes and requires sufficient frame rate to keep inter-frame motion small.
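A minimal OpenCV sketch of sparse Lucas-Kanade optical flow between two frames; file names and parameters are illustrative.

```python
# Sketch: track corners from the previous frame into the current one with pyramidal LK.
import cv2
import numpy as np

prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Pick good points to track in the previous frame
p0 = cv2.goodFeaturesToTrack(prev, maxCorners=300, qualityLevel=0.01, minDistance=7)

# Track them into the current frame
p1, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, p0, None,
                                           winSize=(21, 21), maxLevel=3)

good_old = p0[status.flatten() == 1].reshape(-1, 2)
good_new = p1[status.flatten() == 1].reshape(-1, 2)
flow = good_new - good_old                       # per-feature pixel motion
print("median flow magnitude (px):", np.median(np.linalg.norm(flow, axis=1)))
```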
Depth cues
- Stereo: depth from disparity; depends heavily on calibration, synchronization, and rectification.
- Structure-from-motion: depth from camera motion; needs stable feature tracking and good geometric modeling.
- Monocular learned depth: can provide dense depth-like outputs but metric scale and generalization depend on training and scene conditions.
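For rectified stereo, depth follows from Z = f * B / d (focal length in pixels times baseline, divided by disparity). A small sketch with assumed values shows how depth precision degrades at small disparities.

```python
# Sketch: depth from stereo disparity on a rectified pair. fx and baseline are placeholder
# values; both come from stereo calibration.
fx = 600.0        # rectified focal length in pixels
baseline = 0.12   # distance between the two cameras in metres

for disparity_px in (1.0, 5.0, 30.0):
    Z = fx * baseline / disparity_px
    print(f"disparity {disparity_px:5.1f} px -> depth {Z:6.2f} m")
# A 1 px disparity error near 1 px disparity changes depth by tens of metres,
# while near 30 px it changes depth by only a few centimetres.
```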
Object detection and pose estimation
Detectors output bounding boxes or masks; pose estimators may output 6D pose for known objects. Practical considerations:
- Exposure/white balance stability improves consistency.
- Higher resolution can improve small-object detection but increases compute.
- Latency matters: a correct detection that arrives too late can be useless for control.
Camera Selection Criteria for Robotics
| Criterion | What it affects | Practical guidance |
|---|---|---|
| Resolution | Spatial detail, small-object detection, feature density | Choose enough pixels to resolve the smallest relevant object/feature; don’t overshoot if compute is limited. |
| Frame rate | Tracking stability, control responsiveness, motion estimation | Higher is better for fast motion; ensure exposure can be short enough at that frame rate. |
| Field of view (FoV) | Coverage vs angular precision | Wide FoV helps awareness but increases distortion and reduces pixel-per-degree; narrow FoV improves precision but can lose context. |
| Shutter type | Geometric correctness under motion | Prefer global shutter for fast-moving platforms or accurate metric tasks; rolling shutter can be acceptable for slow motion and cost-sensitive designs. |
| Low-light performance | Noise, blur trade-offs | Look for larger sensor/pixels, good quantum efficiency, and lenses with wide aperture; avoid relying solely on high gain. |
| Dynamic range | Handling bright/dark regions simultaneously | Important for outdoor scenes, windows, shiny parts; consider HDR modes if they don’t introduce motion artifacts. |
| Compute requirements | Real-time feasibility | Higher resolution/frame rate increases bandwidth and processing; plan for GPU/accelerators or use ROI/cropping and efficient models. |
| Interface & bandwidth | Dropped frames, latency | Ensure the link (USB3, MIPI, GigE) supports sustained throughput with headroom. |
Step-by-step: choosing a camera for a real-time robot
1) Define the measurement you need
- Navigation and tracking: prioritize frame rate, shutter type, and low latency.
- Inspection and detection of small defects: prioritize resolution, optics quality, and lighting control.
- Metric depth (stereo/SfM): prioritize calibration stability, global shutter (often), and synchronization.
2) Set performance targets
- Maximum robot speed and expected angular rate.
- Minimum object size at maximum distance.
- Maximum acceptable end-to-end latency for control.
3) Translate targets into specs
- Pick FoV and resolution to meet pixel coverage needs.
- Pick frame rate so inter-frame motion stays trackable.
- Pick shutter type based on motion and metric accuracy needs.
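A rough sketch of this translation step, using assumed targets (object size, range, field of view, angular rate) purely for illustration.

```python
# Sketch: turn performance targets into resolution and frame-rate numbers.
# All targets below are illustrative assumptions.
import math

min_object_m = 0.05        # smallest object to resolve
max_distance_m = 4.0       # at this range
pixels_on_object = 10      # pixels needed across the object for reliable detection
hfov_deg = 70.0            # candidate horizontal field of view
max_angular_rate_dps = 90  # fastest expected camera rotation (deg/s)
max_flow_px = 30           # inter-frame motion the tracker can tolerate

# Required horizontal resolution for the chosen FoV
scene_width_m = 2 * max_distance_m * math.tan(math.radians(hfov_deg / 2))
required_width_px = pixels_on_object * scene_width_m / min_object_m
print(f"need ~{required_width_px:.0f} px of horizontal resolution")

# Required frame rate so inter-frame motion stays trackable
px_per_deg = required_width_px / hfov_deg
required_fps = max_angular_rate_dps * px_per_deg / max_flow_px
print(f"need ~{required_fps:.0f} fps to keep flow under {max_flow_px} px")
```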
4) Check lighting and exposure feasibility
- Compute whether you can use short exposure without excessive gain in your worst lighting.
- If not, plan illumination (IR/visible) or choose a more sensitive sensor/lens.
5) Budget compute and bandwidth
- Estimate data rate: width × height × bytes_per_pixel × fps (a small sketch follows this list).
- Prototype the vision pipeline and measure actual latency and dropped frames.
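A minimal sketch of that estimate for an assumed 1080p RGB stream at 30 fps.

```python
# Sketch: raw data-rate estimate, width × height × bytes_per_pixel × fps.
width, height = 1920, 1080
bytes_per_pixel = 3          # 8-bit RGB
fps = 30

bytes_per_second = width * height * bytes_per_pixel * fps
print(f"{bytes_per_second / 1e6:.0f} MB/s, {bytes_per_second * 8 / 1e9:.2f} Gbit/s")
# ~187 MB/s (~1.49 Gbit/s) before compression; compare against the sustained
# throughput of the chosen interface (USB3, MIPI, GigE) with headroom.
```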
6) Validate with a realistic test
- Test in representative motion, lighting, and vibration conditions.
- Verify that calibration remains stable and that derived measurements (tracks, depth, detections) meet accuracy requirements.