Camera Calibration and Coordinate Frames for Robotics Integration

Chapter 5

Estimated reading time: 11 minutes

Why Calibration Matters for Robotics Integration

In robotics, a camera is not just an image source; it is a sensor that must report measurements in the same geometric language as the robot. Calibration is the process of estimating (1) the camera’s intrinsic parameters (how 3D rays map to pixels) and (2) the camera’s extrinsic parameters (where the camera is mounted relative to the robot and/or the world). When these are wrong, pixel measurements turn into incorrect distances, incorrect poses, and misaligned overlays between perception and motion.

Camera Intrinsics: Mapping Rays to Pixels

Intrinsic matrix (pinhole model)

Most robotics pipelines start with the pinhole model. A 3D point in the camera coordinate frame (Xc, Yc, Zc) projects to normalized image coordinates (x, y) as:

x = Xc / Zc
y = Yc / Zc

Then it maps to pixel coordinates (u, v) using the intrinsic matrix K:

[u]   [fx  0  cx] [x]
[v] = [ 0 fy  cy] [y]
[1]   [ 0  0   1] [1]
  • fx, fy: focal lengths in pixel units (not millimeters). They encode both lens focal length and pixel size.
  • cx, cy: principal point (where the optical axis hits the sensor), typically near the image center but not exactly.
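To make the mapping concrete, here is a minimal NumPy sketch of the pinhole projection; the fx, fy, cx, cy values are hypothetical placeholders, not from any particular camera.

import numpy as np

# Hypothetical intrinsics for a 1280x720 camera (placeholder values).
K = np.array([[910.0,   0.0, 640.0],
              [  0.0, 905.0, 360.0],
              [  0.0,   0.0,   1.0]])

def project_point(p_cam):
    # Project a camera-frame 3D point to pixels (pinhole, no distortion).
    Xc, Yc, Zc = p_cam
    if Zc <= 0:
        raise ValueError("point is behind the camera")
    x, y = Xc / Zc, Yc / Zc              # normalized image coordinates
    u, v, _ = K @ np.array([x, y, 1.0])  # apply intrinsics
    return u, v

# A point 2 m ahead and 0.1 m to the right lands right of the principal point:
print(project_point((0.1, 0.0, 2.0)))    # approximately (685.5, 360.0)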

Distortion parameters

Real lenses bend rays, so the ideal pinhole projection is corrected by a distortion model. The most common is radial + tangential distortion:

  • Radial: k1, k2, k3, ... (barrel/pincushion effects)
  • Tangential: p1, p2 (decentering due to lens/sensor misalignment)

In practice, you do not hand-derive these; you estimate them from calibration images and then use them to undistort images or to project points accurately during pose estimation.
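As a sketch of how this looks with OpenCV, assuming K and the distortion coefficients come from a prior calibration (the numeric values below are placeholders):

import cv2
import numpy as np

K = np.array([[910.0, 0.0, 640.0],
              [0.0, 905.0, 360.0],
              [0.0, 0.0, 1.0]])
dist = np.array([-0.28, 0.09, 0.0006, -0.0004, -0.01])  # k1, k2, p1, p2, k3

# Undistort a full image:
img = cv2.imread("frame.png")
undistorted = cv2.undistort(img, K, dist)

# Or undistort sparse points (e.g., detected corners); P=K maps back to pixel coords:
pts = np.array([[[100.0, 50.0]]], dtype=np.float32)  # shape (N, 1, 2)
pts_undist = cv2.undistortPoints(pts, K, dist, P=K)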

What intrinsics affect in robotics

  • Depth from geometry (e.g., monocular size-based ranging, PnP pose): wrong fx, fy scales distance estimates.
  • Pose estimation (fiducials, 3D-2D correspondences): wrong principal point or distortion biases rotation/translation.
  • Straight-line assumptions: distortion not modeled correctly makes straight edges curve, breaking line-based mapping or corridor detection.

Camera Extrinsics: Pose Relative to the Robot

Extrinsics as a rigid transform

Extrinsics describe the camera pose relative to another frame (commonly the robot base). This is a 3D rigid transform consisting of rotation R and translation t:

p_target = R * p_source + t

Using homogeneous coordinates, the transform is:

T_target_source = [ R  t ]
                  [ 0  1 ]

For robotics integration, a typical need is T_base_camera (or its inverse T_camera_base). Once known, any point measured in the camera frame can be expressed in the robot base frame for planning and control.
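A minimal NumPy sketch of this, with a hypothetical mounting offset and the rotation left as identity purely for brevity (a real camera mount almost always involves a non-trivial rotation between camera and base axes):

import numpy as np

def make_transform(R, t):
    # Build a 4x4 homogeneous transform from rotation R (3x3) and translation t (3,).
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical mounting: camera 0.2 m forward and 0.5 m above the base origin.
T_base_camera = make_transform(np.eye(3), np.array([0.2, 0.0, 0.5]))

p_camera = np.array([0.0, 0.0, 1.5, 1.0])  # homogeneous point 1.5 m along camera Z
p_base = T_base_camera @ p_camera
print(p_base[:3])                          # [0.2, 0.0, 2.0] with the identity rotation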

Extrinsics vs. intrinsics: common confusion

  • Intrinsics: properties of the camera+lens+sensor mapping to pixels (do not change when you move the camera).
  • Extrinsics: where the camera is mounted (change if you bump the mount, re-seat the camera, or adjust tilt).

Coordinate Frames and Conventions in Robotics

Three frames you will use constantly

  • Camera frame ({C}): origin at the camera optical center. Axes depend on convention (see below).
  • Robot base frame ({B}): fixed to the robot chassis (e.g., center of axle or base_link). Used by odometry, kinematics, and planners.
  • World/map frame ({W}): a global reference (map, motion-capture world, or SLAM map). Used for navigation goals and global localization.

Axis conventions (be explicit)

Two common camera conventions appear in robotics and vision libraries:

  • Computer vision convention (often OpenCV): X right, Y down, Z forward (into the scene).
  • Robotics convention (often ROS REP-103 for body frames): X forward, Y left, Z up. Camera optical frames in ROS are often defined with Z forward, X right, Y down, which matches the vision convention but differs from base frames.

Integration errors frequently come from mixing these conventions. Always document which frame each transform maps from and to, and verify with a physical sanity check (e.g., move an object to the robot’s left and confirm the sign of the transformed coordinate).

Transform chaining

Robotics systems chain transforms to move measurements between frames. For example, to express a 3D point observed by the camera in the world frame:

p_W = T_WB * T_BC * p_C

Where:

  • T_BC is the camera pose in the base frame (extrinsic mounting calibration).
  • T_WB is the robot base pose in the world (from localization/odometry/SLAM).

Keep a consistent naming scheme such as T_target_source to avoid accidental inversions.
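A short sketch of the chain in NumPy; both transforms are placeholder values standing in for outputs of localization and extrinsic calibration:

import numpy as np

T_WB = np.eye(4); T_WB[:3, 3] = [5.0, 2.0, 0.0]  # base pose in world (hypothetical)
T_BC = np.eye(4); T_BC[:3, 3] = [0.2, 0.0, 0.5]  # camera pose in base (hypothetical)

p_C = np.array([0.0, 0.0, 3.0, 1.0])             # point 3 m along the camera axis
p_W = T_WB @ T_BC @ p_C                          # chain: camera -> base -> world
print(p_W[:3])

# Going the other way (world point into the camera frame) uses the inverses:
p_C_back = np.linalg.inv(T_BC) @ np.linalg.inv(T_WB) @ p_W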

Practical Calibration Workflow (Intrinsics + Extrinsics)

Step 0: Decide what you are calibrating

  • Intrinsics calibration: per camera + lens + resolution + focus setting. Changing resolution, digital zoom, or focus can change intrinsics.
  • Extrinsics calibration: per robot build and mount. Any physical shift changes extrinsics.
  • Hand–eye calibration (if applicable): camera relative to an arm end-effector or moving mechanism; requires additional motion data. (If your camera is fixed on the base, you typically just need a static T_BC.)

Step 1: Collect calibration images

Use a checkerboard or a ChArUco board (ArUco markers + chessboard corners). ChArUco is often more robust when parts of the board are occluded or at steep angles.

  • Capture 20–60 images (more if wide-angle).
  • Vary pose: near/far, left/right, top/bottom, and tilted views so corners cover the whole image area.
  • Avoid motion blur; ensure the board is flat and not warped.
  • Use the same resolution and camera settings you will use in the robot application.

Step 2: Detect feature points (checkerboard corners / ChArUco corners)

The calibration tool detects 2D image points and associates them with known 3D points on the board (in the board’s coordinate frame); a minimal detection sketch follows the tips below. Practical tips:

  • Reject frames where detection is partial or corners are poorly localized.
  • Ensure corners are not saturated or heavily shadowed.
  • For ChArUco, verify marker IDs are correct; wrong IDs corrupt correspondences.
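A minimal detection sketch, assuming opencv-contrib-python and the pre-4.7 aruco API (these function names changed in newer OpenCV releases); the board dimensions and the corner-count threshold are illustrative:

import cv2

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_5X5_100)
board = cv2.aruco.CharucoBoard_create(7, 5, 0.04, 0.03, dictionary)  # 7x5 squares, 4 cm / 3 cm

gray = cv2.cvtColor(cv2.imread("calib_01.png"), cv2.COLOR_BGR2GRAY)
marker_corners, marker_ids, _ = cv2.aruco.detectMarkers(gray, dictionary)

if marker_ids is not None and len(marker_ids) > 0:
    # Interpolate chessboard corners from the detected markers.
    n, charuco_corners, charuco_ids = cv2.aruco.interpolateCornersCharuco(
        marker_corners, marker_ids, gray, board)
    if n > 8:  # reject frames with too few well-localized corners
        print(f"kept frame with {n} ChArUco corners")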

Step 3: Estimate intrinsics and distortion

Run a standard calibration routine (e.g., OpenCV calibrateCamera for checkerboard, or aruco.calibrateCameraCharuco for ChArUco). The output typically includes:

  • K (intrinsic matrix)
  • distortion coefficients
  • per-image board pose estimates (useful for diagnostics)

Store results with metadata: camera serial number, resolution, date, focus setting, and lens configuration.
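A sketch of the checkerboard path with cv2.calibrateCamera, assuming obj_points (per-image (N, 3) board-frame corner arrays) and img_points (per-image (N, 2) detections) were collected in Step 2; the variable names are illustrative:

import cv2
import numpy as np

# image_size is (width, height); the point arrays must be float32.
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, image_size, None, None)

print("RMS reprojection error (px):", ret)
print("K =\n", K)
print("dist =", dist.ravel())  # k1, k2, p1, p2, k3 by default

# Persist with metadata (see the versioning section later in this chapter).
np.savez("intrinsics_cam0_1280x720.npz", K=K, dist=dist)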

Step 4: Validate using reprojection error (and interpret it correctly)

Reprojection error measures how well the estimated parameters can reproject the known 3D board points back onto the observed 2D corners.

  • Mean error (pixels): lower is better, but “good” depends on resolution and corner quality.
  • Look for patterns: if error is small in the center but large at edges, distortion modeling or coverage is insufficient.
  • Check per-image error: a few bad images can skew results; remove outliers and recalibrate.

Do not rely solely on a single average number; inspect error distribution across the image and across frames.
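A sketch of the per-image diagnostic, reusing the calibrateCamera outputs from Step 3; the 2x-median outlier rule at the end is an illustrative heuristic, not a standard threshold:

import cv2
import numpy as np

errors = []
for objp, imgp, rvec, tvec in zip(obj_points, img_points, rvecs, tvecs):
    projected, _ = cv2.projectPoints(objp, rvec, tvec, K, dist)
    err = np.linalg.norm(imgp.reshape(-1, 2) - projected.reshape(-1, 2), axis=1)
    errors.append(err.mean())

for i, e in enumerate(errors):
    print(f"image {i:02d}: mean reprojection error {e:.3f} px")

# Flag frames that are likely outliers, then remove them and recalibrate.
median = np.median(errors)
print("candidate outliers:", [i for i, e in enumerate(errors) if e > 2 * median])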

Step 5: Real-scene verification (sanity checks)

After intrinsics are estimated, verify them in the environment where the robot operates:

  • Straight-line check: undistort an image of a scene with straight edges (door frames, floor tiles, shelving). Lines should appear straight across the frame, especially near edges.
  • Known-distance check: place two points a known distance apart (e.g., markers on a wall) and estimate distance using your perception method (PnP with a known target, or triangulation if stereo). Compare to ground truth.
  • Overlay check: project a known 3D object model or a measured point cloud into the image; misalignment indicates calibration or frame issues.

Step 6: Estimate extrinsics (camera-to-base transform)

There are several practical ways to obtain T_BC:

  • Mechanical measurement (rough): measure translation and approximate rotation from CAD or tape measure + inclinometer. Often insufficient for precise manipulation but can work for coarse navigation.
  • Target-based calibration (recommended): place a calibration target at a known pose in the robot base frame (or measure it carefully), detect it in the camera, estimate the target pose in the camera frame (T_CT below) via PnP, then solve for T_BC.
  • In-situ alignment: adjust T_BC to minimize error between camera-derived landmarks and robot/world landmarks (requires a reliable world reference).

A common target-based approach:

  1. Define a target frame {T} rigidly attached to a board.
  2. Measure or constrain T_BT (target pose in base frame) by placing the board at a known location relative to the robot.
  3. Detect board corners in the image and run PnP to estimate T_CT (target pose in camera frame).
  4. Compute T_BC using transform relations. With consistent notation: T_BC = T_BT * (T_CT)^{-1}.

Repeat for multiple placements and average/optimize to reduce noise.
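A sketch of steps 3 and 4 in code, assuming board_points_3d ((N, 3) target-frame corners), corners_2d ((N, 2) detections), intrinsics K and dist, and the measured T_BT are all available from the setup above:

import cv2
import numpy as np

def to_homogeneous(rvec, tvec):
    # Convert an OpenCV rvec/tvec pair into a 4x4 homogeneous transform.
    R, _ = cv2.Rodrigues(rvec)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = tvec.ravel()
    return T

# Step 3: PnP yields the target pose in the camera frame, T_CT.
ok, rvec, tvec = cv2.solvePnP(board_points_3d, corners_2d, K, dist)
T_CT = to_homogeneous(rvec, tvec)

# Step 4: combine with the measured target pose in the base frame, T_BT.
T_BC = T_BT @ np.linalg.inv(T_CT)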

Step 7: Verify extrinsics with a physical motion test

  • Place a distinct object on the floor at a known position in front of the robot.
  • Estimate its 3D position in the camera frame (from depth sensor, stereo, or known planar constraints).
  • Transform it into the base frame using T_BC.
  • Check whether the object’s base-frame coordinates match tape-measured ground truth (especially left/right sign and forward distance).

How Calibration Errors Propagate into Robotics Tasks

Obstacle distance estimation

Even when you are not explicitly computing 3D pose, calibration affects distance estimates:

  • Monocular size-based distance: if you estimate distance from apparent size, an error in fx scales distance roughly proportionally (e.g., 2% error in fx can yield ~2% distance bias under simple models).
  • Ground-plane projection: many mobile robots estimate obstacle distance by projecting image points onto the ground plane using camera height and pitch. Small pitch errors in extrinsics can create large distance errors far from the robot because the ray intersects the ground at a shallow angle; see the worked example after this list.
  • Depth + extrinsics: if you have depth (RGB-D), wrong T_BC shifts obstacles in the base frame, causing the planner to think obstacles are closer/farther or left/right than they are.
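To put a number on the ground-plane sensitivity mentioned above, a small worked example with hypothetical geometry (camera height and ray angles are illustrative):

import math

# A ray that leaves the camera at angle theta below horizontal, from height h,
# hits the ground at distance d = h / tan(theta).
h = 0.6                                    # camera height in meters
theta_true = math.radians(5.0)             # true ray angle below horizontal
theta_biased = math.radians(4.5)           # extrinsic pitch off by only 0.5 degrees

d_true = h / math.tan(theta_true)          # ~6.86 m
d_biased = h / math.tan(theta_biased)      # ~7.62 m
print(f"true {d_true:.2f} m, estimated {d_biased:.2f} m, "
      f"error {d_biased - d_true:.2f} m")  # ~0.77 m from a 0.5 degree pitch bias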

Fiducial marker pose (PnP) sensitivity

Fiducial markers (e.g., ArUco) often use PnP to estimate T_CMarker. Errors arise from:

  • Intrinsics: wrong cx, cy biases rotation; wrong distortion warps corner locations; wrong fx, fy scales translation.
  • Marker size: incorrect physical marker size directly scales the estimated translation.
  • Extrinsics: even if T_CMarker is correct, converting to base/world (T_BC, T_WB) can misplace the marker, breaking docking or alignment behaviors.

Practical symptom: the marker pose looks stable in the image but the robot “misses” the docking station by a consistent offset—often an extrinsics rotation/translation bias.

Compounding transforms in world integration

When you compute p_W = T_WB * T_BC * p_C, each component contributes uncertainty:

  • T_WB uncertainty from localization drift or SLAM.
  • T_BC uncertainty from mounting and extrinsic calibration.
  • p_C uncertainty from pixel noise, depth noise, and intrinsic calibration.

Even small angular errors in T_BC can dominate at longer ranges because they rotate rays. This is why careful extrinsic calibration is crucial for long-range perception and precise navigation near obstacles.
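A quick numeric illustration: a point at range r is displaced laterally by roughly r * sin(angle_error), so the same rotation error costs more at distance (the angles and ranges below are arbitrary examples):

import math

for r in (1.0, 5.0, 10.0):                      # range in meters
    for deg in (0.5, 1.0, 2.0):                 # rotation error in T_BC, degrees
        offset = r * math.sin(math.radians(deg))
        print(f"range {r:4.1f} m, {deg:.1f} deg error -> {100 * offset:5.1f} cm offset")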

Maintaining Calibration Over Time (Robust Robotics Practice)

Mechanical rigidity and repeatability

  • Use a rigid mount with minimal flex; avoid long cantilevered brackets.
  • Use thread-locking compounds where appropriate; mark screw positions with paint to detect loosening.
  • Add mechanical locating features (dowel pins, keyed mounts) so reassembly is repeatable.

Temperature and vibration effects

  • Temperature changes can slightly alter focus and effective intrinsics, especially with plastic lenses or non-locked focus rings.
  • Vibration can shift extrinsics over time; periodic checks are essential for mobile robots.

Lens and focus changes

Any of the following can invalidate intrinsics:

  • Changing lens, adding/removing filters, or changing zoom.
  • Changing focus (even small changes can affect distortion and focal length in pixel units).
  • Changing resolution or applying digital scaling/cropping in the driver.

Operational rule: treat each unique combination of (camera, lens, resolution, focus/zoom) as a separate intrinsic calibration profile.

Operational checks and recalibration triggers

  • Quick undistort check: periodically image a scene with straight edges; if curvature appears, re-check intrinsics.
  • Fiducial consistency check: place a marker at a known location; verify estimated pose in base/world is within tolerance.
  • After maintenance: recalibrate extrinsics after any camera remounting, collision, or bracket adjustment.

Versioning and traceability

Store calibration as versioned artifacts:

  • Intrinsic YAML/JSON with K, distortion, resolution, timestamp, camera ID.
  • Extrinsic transform with frame IDs (base, camera), rotation representation (quaternion or RPY), timestamp, and how it was obtained.

This allows you to correlate behavior regressions (e.g., docking failures) with calibration changes and to roll back to a known-good configuration.
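A minimal sketch of such an artifact written as JSON from Python; the schema and every field value below are illustrative, not a standard format:

import json
import time

artifact = {
    "camera_id": "SN-12345",
    "resolution": [1280, 720],
    "K": [[910.0, 0.0, 640.0], [0.0, 905.0, 360.0], [0.0, 0.0, 1.0]],
    "distortion": [-0.28, 0.09, 0.0006, -0.0004, -0.01],
    "T_base_camera": {
        "parent_frame": "base_link",
        "child_frame": "camera_optical",
        "translation_xyz": [0.20, 0.00, 0.50],
        "rotation_quaternion_xyzw": [0.5, -0.5, 0.5, -0.5],
    },
    "method": "ChArUco, 45 images",
    "mean_reprojection_error_px": 0.21,
    "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
}

with open("calibration_cam0_v3.json", "w") as f:
    json.dump(artifact, f, indent=2)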

Practical Reference: Common Data You Should Record

Item | Why it matters | Example
Camera ID / serial | Ensures calibration matches hardware | SN: 12345
Resolution | Intrinsics depend on it | 1280×720
Intrinsics K | Pixel-to-ray mapping | fx=910, fy=905, cx=640, cy=360
Distortion | Corrects edge warping | k1, k2, p1, p2, k3
T_BC | Camera mounting pose | Translation + quaternion
Method + dataset | Reproducibility | ChArUco, 45 images
Validation metrics | Detect drift | Mean reprojection error

Now answer the exercise about the content:

When converting a 3D point measured in the camera frame into the world frame, which transform chain is consistent with the stated naming convention?

Answer: To express a camera-frame point in the world frame, first map from camera to base using T_BC, then from base to world using T_WB, giving p_W = T_WB * T_BC * p_C.

Next chapter

Filtering and Preprocessing Pipelines for Real-Time Vision
