System overview: from camera frame to steering command
Line following is a closed-loop vision task: each camera frame produces an estimate of where the line is relative to the robot, and the controller converts that estimate into steering (and sometimes speed) commands. A practical pipeline has two parallel goals: (1) detect the line reliably under real-world variation, and (2) produce stable geometric measurements (lateral offset and heading) that behave smoothly over time.
- Perception output: lateral error (how far the line is from the robot’s desired path) and heading error (how misaligned the robot is relative to the line direction).
- Control output: steering angle (Ackermann) or differential wheel speeds (skid-steer), optionally speed modulation based on curvature/confidence.
- Constraints: limited compute, motion blur, latency, and intermittent visibility (intersections, worn tape, shadows).
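Concretely, the per-frame loop looks like the sketch below; `detect_line`, `compute_errors`, and `errors_to_command` are placeholder names for the perception and control stages covered in the rest of this section.

# Minimal per-frame loop sketch; the three functions are placeholders for the
# stages described below (segmentation/fitting, error estimation, control).
while running:
    frame = camera.read()                           # capture
    detection = detect_line(frame)                  # threshold + fit within ROI
    e_lat, e_head, conf = compute_errors(detection)
    v, omega = errors_to_command(e_lat, e_head, conf)
    motors.command(v, omega)                        # diff-drive or Ackermann output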
Camera mounting and viewpoint for line following
Mounting choices and their trade-offs
- Forward-looking (shallow pitch): sees farther ahead, helps anticipate curves and intersections, but makes the line thinner and more sensitive to perspective and shadows.
- Downward-looking (steeper pitch): simplifies geometry and segmentation, line appears thicker and more consistent, but reduces look-ahead distance (harder at higher speeds).
Practical mounting guidelines
- Height: choose a height that yields a line width of roughly 5–15 pixels in the ROI at the typical working distance; anything thinner increases noise sensitivity.
- Pitch: aim so the bottom of the image contains the near field (where control is most sensitive) and the mid-image contains look-ahead (for heading).
- Roll: minimize roll; even small roll biases lateral error. If roll exists, compensate by rotating the image or adjusting the ROI.
- Vibration: use rigid mounting and, if needed, short exposure or mechanical damping; vibration shows up as jitter in heading estimates.
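To sanity-check a mounting height before building anything, a rough pinhole estimate works: at distance Z along the optical axis, tape of physical width W spans about f_px * W / Z pixels, where f_px is the focal length in pixels. The numbers below are illustrative, and a steep pitch adds foreshortening that this approximation ignores.

# Rough pinhole estimate of tape width in pixels (illustrative numbers).
f_px = 600.0           # focal length in pixels, e.g., from camera calibration
tape_width_m = 0.019   # 19 mm tape
distance_m = 0.40      # camera-to-line distance along the optical axis
line_width_px = f_px * tape_width_m / distance_m   # about 28 px here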
ROI selection: focus compute where it matters
A region of interest (ROI) reduces false positives and computation. For line following, a common approach is a trapezoidal ROI covering the floor area where the line is expected.
Step-by-step ROI design
- Start with a bottom band: e.g., bottom 30–50% of the image where the line is closest and largest.
- Add look-ahead: include a mid-height band to estimate heading from the line direction.
- Use a trapezoid mask: narrow at the top, wide at the bottom to match perspective and exclude irrelevant areas.
- Dynamic ROI (optional): center the ROI around the previously detected line position to improve robustness and speed.
Keep two ROIs if helpful: a near ROI for lateral error and a far ROI for heading. This separation often improves stability because near pixels dominate offset while far pixels dominate direction.
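A minimal OpenCV sketch of a trapezoidal mask split into near and far bands; the corner fractions and the 75% split row are assumptions to tune for your mount.

import cv2
import numpy as np

def make_trapezoid_mask(h, w):
    # Corner fractions are tuning assumptions: wide at the bottom, narrow at mid-image.
    pts = np.array([[int(0.05 * w), h - 1], [int(0.95 * w), h - 1],
                    [int(0.65 * w), int(0.5 * h)], [int(0.35 * w), int(0.5 * h)]], np.int32)
    mask = np.zeros((h, w), np.uint8)
    cv2.fillConvexPoly(mask, pts, 255)
    return mask

h, w = frame.shape[:2]
roi_mask = make_trapezoid_mask(h, w)
near_roi = roi_mask.copy(); near_roi[:int(0.75 * h), :] = 0   # bottom band: lateral error
far_roi = roi_mask.copy();  far_roi[int(0.75 * h):, :] = 0    # mid band: heading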
Color/brightness normalization for stable thresholding
Even with fixed camera settings, floors vary in reflectance and shadows. Before thresholding, normalize brightness and reduce illumination sensitivity.
Practical normalization options
- Use HSV and normalize V: apply a mild contrast stretch or CLAHE on the V channel to reduce shadow impact while keeping color separation.
- White/black tape cases: for white tape on dark floor, normalize V and threshold high V; for black tape on bright floor, threshold low V.
- Specular highlights: clamp extreme V values or apply a small median blur to reduce sparkles on glossy floors.
Keep normalization lightweight to preserve real-time performance and avoid amplifying noise. If your controller is sensitive to jitter, prefer gentle normalization over aggressive equalization.
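One lightweight option along these lines, assuming OpenCV: a small median blur on V followed by CLAHE with a conservative clip limit.

import cv2

hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
h_ch, s_ch, v_ch = cv2.split(hsv)
v_ch = cv2.medianBlur(v_ch, 3)         # knock down specular sparkles
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
v_ch = clahe.apply(v_ch)               # gentle local contrast normalization
hsv = cv2.merge((h_ch, s_ch, v_ch))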
Thresholding in HSV: segment the line
HSV thresholding is a common way to isolate colored tape (e.g., blue, red, green). For white/black lines, HSV still helps because you can combine constraints on saturation (S) and value (V).
Step-by-step HSV thresholding
- Convert BGR/RGB to HSV.
- Choose thresholds: define `H_min..H_max`, `S_min..S_max`, `V_min..V_max` for the line color.
- Apply ROI mask: threshold only inside the ROI.
- Morphology: use opening (remove specks) then closing (fill small gaps) with a kernel sized to the expected line width in pixels.
# OpenCV (Python): segment a colored tape line in HSV within the ROI mask
import cv2
import numpy as np

hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (Hmin, Smin, Vmin), (Hmax, Smax, Vmax))
mask = cv2.bitwise_and(mask, roi_mask)                                       # keep only ROI pixels
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))    # remove specks
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))   # fill small gaps
Threshold tuning tips
- Shadows: widen V range downward; keep S constraint to avoid including gray floor.
- Worn tape: widen S and V ranges slightly; rely on shape/continuity later to reject clutter.
- Different floors: maintain a calibration routine that records HSV stats of the line under current lighting and updates thresholds within safe bounds.
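A sketch of one way to implement the calibration idea in the last item, assuming you keep the binary mask from the most recent high-confidence detection as the pixel sample:

import numpy as np

def update_v_bounds(hsv, last_good_mask, v_lo, v_hi, floor=40, ceil=255):
    # Re-estimate V bounds from pixels of the last confident detection,
    # clamped to safe limits so one bad frame cannot drag the thresholds away.
    v_pixels = hsv[..., 2][last_good_mask > 0]
    if v_pixels.size < 100:                  # too few samples: keep current bounds
        return v_lo, v_hi
    lo = np.percentile(v_pixels, 5) - 10     # small margin around the observed range
    hi = np.percentile(v_pixels, 95) + 10
    return int(max(floor, lo)), int(min(ceil, hi))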
Edge detection as a complementary cue
When color segmentation is unreliable (e.g., white tape on light floor), edges can help. You can run edge detection on a normalized grayscale image and then fit a line to edge pixels. Often, the best approach is to combine cues: use HSV mask to limit where you look for edges.
Practical approach
- Compute grayscale in ROI, apply mild blur.
- Run Canny edge detection.
- Optionally AND edges with a relaxed HSV mask to reduce false edges from texture.
# Edge cue: Canny on normalized grayscale, gated by a relaxed color mask
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
gray_roi = cv2.bitwise_and(gray, gray, mask=roi_mask)
gray_roi = cv2.GaussianBlur(gray_roi, (5, 5), 0)
edges = cv2.Canny(gray_roi, t1, t2)                   # t1, t2: hysteresis thresholds
edges = cv2.bitwise_and(edges, relaxed_color_mask)    # keep edges supported by color
Line fitting: Hough transform vs contour-based fitting
Option A: Hough transform (good for clear edges)
Hough is effective when the line produces strong, continuous edges. Use probabilistic Hough to get line segments and then select/merge the ones consistent with your expected geometry.
- Input: edge image (Canny output).
- Output: segments `(x1, y1, x2, y2)`.
- Selection: prefer segments near the previous line position, with plausible slope, and with sufficient length.
segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=50,
                           minLineLength=30, maxLineGap=20)
# Choose best segment(s) by score: length + proximity to last estimate + slope constraint
Option B: Contour-based centerline (good for thick tape masks)
If you have a solid binary mask of the tape, contour-based methods are often more stable than Hough. You can find the largest contour in the ROI, compute its centroid for lateral error, and fit a line to its pixels for heading.
Step-by-step
- Find connected components or contours in the binary mask.
- Filter by area, aspect ratio, and position (reject small blobs).
- Pick the best candidate: largest area or closest to previous centroid.
- Compute centroid `(c_x, c_y)` using image moments.
- Fit a line using least squares (e.g., `fitLine`) to the contour points to get the direction.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
candidates = [c for c in contours if cv2.contourArea(c) > A_min]
line_blob = select_best(candidates, prev_cx)        # e.g., largest, or nearest to last centroid
m = cv2.moments(line_blob)
cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]   # centroid from image moments
vx, vy, x0, y0 = cv2.fitLine(line_blob, cv2.DIST_L2, 0, 0.01, 0.01).flatten()  # direction (vx,vy) and point (x0,y0)
Which should you choose?
| Situation | Prefer | Why |
|---|---|---|
| Colored tape with clean segmentation | Contour-based | Stable centroid and direction from dense pixels |
| Thin painted line with strong contrast edges | Hough | Works directly on edges even if mask is weak |
| Textured floor causing many edges | Contour-based + strong ROI | Mask reduces clutter; contours reject scattered edges |
Estimating lateral offset and heading from image measurements
Lateral error (pixel domain)
A simple and effective lateral error is the horizontal difference between the line position and the image center at a chosen reference row (usually near the bottom of the ROI).
- Pick a reference y-coordinate `y_ref` (near field).
- Compute the line x-position at `y_ref` (from the centroid, the fitted line, or a scanline peak).
- Pixel lateral error: `e_x = x_line(y_ref) - x_center`.
If you use a fitted line with point-direction form (x0,y0) and (vx,vy), compute intersection with y=y_ref:
# Solve y_ref = y0 + t*vy => t = (y_ref - y0)/vy
x_ref = x0 + ((y_ref - y0)/vy)*vx
e_x = x_ref - x_center
Heading error (image domain)
Heading error estimates how rotated the line is relative to the robot’s forward direction in the image. If the line direction vector is (vx, vy), the angle in image coordinates is:
theta_line = atan2(vx, vy)  # swapped argument order: angle is measured from the image's vertical (forward) axis, with image y pointing down
Interpretation: if the line leans to the right as it goes away, the robot typically needs to steer right (sign depends on your coordinate convention). Validate sign by placing the robot slightly left of the line and checking that the computed command steers toward the line.
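One detail worth handling explicitly: fitLine returns a direction vector whose sign is arbitrary, so (vx, vy) and (-vx, -vy) describe the same line and the heading estimate can flip between frames. Normalizing the vector before taking the angle keeps the sign convention stable:

import math

# fitLine's direction is sign-ambiguous; force vy >= 0 so the angle convention
# (0 = line vertical in the image) stays consistent frame to frame.
if vy < 0:
    vx, vy = -vx, -vy
theta_line = math.atan2(vx, vy)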
Converting pixels to metric errors (optional but useful)
You can control directly in pixel units, but converting to meters can make tuning more portable across cameras/resolutions. A practical approximation uses a scale factor at the reference row:
- Measure how many pixels correspond to a known width on the floor at `y_ref` (e.g., the tape width).
- Compute `meters_per_pixel(y_ref)` and convert `e_lat` to meters; heading error is already in radians and needs no conversion.
e_lat_m = e_x * meters_per_pixel_at_yref
Even if you keep heading in radians and lateral in pixels, ensure consistent scaling in the controller gains.
From vision errors to control signals
Differential drive (skid-steer) mapping
A common approach is to compute an angular velocity command from a weighted sum of lateral and heading errors, then convert to left/right wheel speeds.
# e_lat: lateral error (pixels or meters), e_head: heading error (radians)
omega = K_lat * e_lat + K_head * e_head
v = v_base * speed_schedule(confidence, curvature)
v_left = v - (wheel_base/2) * omega
v_right = v + (wheel_base/2) * omega
Ackermann steering mapping
For car-like robots, map errors to a steering angle. Keep steering bounded and consider reducing speed when curvature is high or confidence is low.
delta = clamp(K_lat * e_lat + K_head * e_head, -delta_max, delta_max)
v = v_base * speed_schedule(confidence, |delta|)
Look-ahead based control (more stable at speed)
Instead of using only near-field offset, compute a target point on the line at some look-ahead distance in the image (higher y in the ROI). Steer to minimize the angle to that point. This reduces oscillation because the robot aims smoothly toward where the line is going.
- Choose `y_look` in the far ROI.
- Compute `x_look = x_line(y_look)`.
- Define a target vector from the image center to `(x_look, y_look)` and convert it to a steering command (sketched below).
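A minimal sketch of that conversion, pure-pursuit style; `x_line_at` is a placeholder for evaluating the fitted line at a given row, and `K_look` is a gain to tune.

import math

y_look = int(0.55 * h)                  # far-ROI row; tune for your mount
x_look = x_line_at(y_look)              # placeholder: fitted-line x at y_look
dx = x_look - x_center                  # horizontal offset of the target point
dy = (h - 1) - y_look                   # rows between image bottom and target
angle_to_target = math.atan2(dx, dy)    # 0 when the target is dead ahead
delta = clamp(K_look * angle_to_target, -delta_max, delta_max)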
Controller tuning with vision latency in mind
Vision introduces delay: exposure time, processing time, and actuation update rate. Delay reduces stability margin and can cause oscillation if gains are too aggressive.
Measure latency and update rate
- Frame-to-command latency: timestamp camera capture and the moment you publish motor command; compute average and worst-case.
- Control rate: ensure the controller runs at a consistent rate; jitter behaves like variable delay.
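A simple way to collect these numbers: stamp each frame at capture and log the delta when the command is published. `camera.read_with_timestamp` and `motors.command` stand in for whatever your driver actually exposes.

import time

latencies = []
while running:
    frame, t_capture = camera.read_with_timestamp()   # placeholder capture API
    # ... perception + control ...
    motors.command(v, omega)
    latencies.append(time.monotonic() - t_capture)
print(f"mean {1e3 * sum(latencies) / len(latencies):.1f} ms, "
      f"worst {1e3 * max(latencies):.1f} ms")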
Tuning procedure (practical)
- Start slow: set a low base speed so the robot can correct without overshoot.
- Use heading first: increase `K_head` until the robot aligns with the line direction without oscillation.
- Add lateral correction: increase `K_lat` until it converges to the line center; if it oscillates, reduce `K_lat` or increase look-ahead.
- Account for delay: if latency is high, reduce gains and/or reduce speed; consider filtering errors with a small low-pass filter to reduce jitter-driven oscillation.
- Speed scheduling: increase speed only when confidence is high and curvature is low; reduce speed near intersections or when the line is partially lost.
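One plausible shape for the `speed_schedule` used in the control snippets above (with `|delta|` serving as the curvature proxy in the Ackermann case); all thresholds here are illustrative starting points.

def speed_schedule(confidence, curvature, c_low=0.3, k_max=2.0):
    # Scale the base speed down as confidence drops or curvature rises.
    if confidence < c_low:
        return 0.2                                   # crawl when unsure
    curve_factor = max(0.0, 1.0 - abs(curvature) / k_max)
    return min(1.0, confidence) * (0.4 + 0.6 * curve_factor)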
Simple filtering that helps control
Apply temporal smoothing to the measured errors, but keep it light to avoid adding more delay.
e_lat_f = (1-a)*e_lat_f + a*e_lat
e_head_f = (1-a)*e_head_f + a*e_head
# a in [0.1, 0.4] often works; tune based on noise and latency
Robustness techniques for real environments
Handling intersections and branches
At T or X intersections, the “largest blob” may suddenly change shape, and Hough may return multiple strong segments. Decide behavior explicitly rather than hoping the detector picks the right one.
- Detect intersection: sudden increase in mask area, multiple competing line directions, or a wide horizontal component.
- Policy: go straight, turn left/right, or follow a predefined route based on higher-level navigation.
- Temporal consistency: prefer the candidate whose position/direction is closest to the previous estimate unless an intersection is detected.
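The detection cues above reduce to a small heuristic, assuming you track the mask area and the blob's bounding-box width between frames (thresholds are starting points, not tuned values):

def looks_like_intersection(mask_area, prev_area, bbox_w, expected_w_px):
    # Cues from above: sudden area jump, or a blob much wider than the tape.
    area_jump = prev_area > 0 and mask_area > 1.8 * prev_area
    too_wide = bbox_w > 3.0 * expected_w_px
    return area_jump or too_wide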
Worn tape and gaps
- Close small gaps: morphological closing sized to bridge typical wear gaps.
- Use continuity: fit a line to all inlier pixels using RANSAC-style rejection of outliers (or robust fitting) so missing segments do not dominate.
- Confidence score: based on inlier count, contour area, or segment length; reduce speed when confidence drops.
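For the robust-fitting item, OpenCV's fitLine already supports M-estimator distances, which gives much of the benefit of RANSAC-style outlier rejection in a single call; `DIST_HUBER` is one reasonable choice.

import cv2

# Huber weighting down-weights outlier pixels from clutter and wear gaps.
vx, vy, x0, y0 = cv2.fitLine(points, cv2.DIST_HUBER, 0, 0.01, 0.01).flatten()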
Shadows and lighting gradients
- Prefer S over V for colored tape: shadows reduce V but often keep hue/saturation relatively informative.
- Adaptive V thresholds: compute V statistics in the ROI and adjust V bounds within limits.
- Shadow edges: if using edges, restrict to areas supported by color mask or by expected line width.
Varying floor textures and clutter
- Stronger ROI: exclude regions where texture is heavy (e.g., near walls) and focus on the expected path corridor.
- Shape constraints: enforce plausible line width in pixels and reject blobs that are too wide/narrow.
- Model-based tracking: maintain a predicted line position from last frame and search locally (reduces false positives from texture).
Confidence estimation and loss handling
Always compute a confidence value and use it in both control and safety logic.
- Confidence examples: contour area above a threshold, number of inlier pixels, Hough segment length, consistency with the last frame (small jump in `e_lat` and `e_head`).
- Degrade gracefully: when confidence drops, reduce speed and rely more on short-term heading memory than on noisy measurements.
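A sketch that blends those cues into one score in [0, 1]; the weights and normalizers (`A_good`, `N_good`, `jump_max`) are assumptions to tune.

def confidence(area, A_good, inliers, N_good, jump_px, jump_max=40.0):
    # Each cue is normalized to [0, 1] and saturates; weights are tunable.
    c_area = min(1.0, area / A_good)
    c_fit = min(1.0, inliers / N_good)
    c_consistent = max(0.0, 1.0 - jump_px / jump_max)   # penalize large jumps
    return 0.4 * c_area + 0.3 * c_fit + 0.3 * c_consistent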
Practical validation: test patterns, metrics, and fail-safes
Test patterns to validate perception and control
- Straight line: verify steady-state lateral error near zero and minimal oscillation.
- Gentle curve (large radius): verify heading estimation and look-ahead behavior.
- Sharp curve (small radius): test speed scheduling and steering saturation.
- Broken line / worn segments: test gap handling and confidence-based slowdown.
- Intersection (T/X): test intersection detection and branch policy.
- Shadow band across the line: test normalization and threshold robustness.
- Texture patch: place patterned mat near the line to test false positives.
Metrics to record
- Tracking error: RMS and max of lateral error (pixels or meters) over a run.
- Heading error: RMS and max of heading error (radians or degrees).
- Overshoot: peak lateral error after a step disturbance (e.g., start offset from the line).
- Settling time: time to return within a tolerance band (e.g., ±5 px or ±1 cm).
- Line-loss rate: fraction of frames with confidence below threshold; also measure longest continuous loss duration.
- Latency stats: average and 95th percentile frame-to-command delay.
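Most of these reduce to a few lines over logged per-frame arrays; a sketch assuming you log lateral error, heading error, confidence, and latency per frame:

import numpy as np

def run_metrics(e_lat, e_head, conf, latency, c_min=0.3):
    e_lat, e_head = np.asarray(e_lat), np.asarray(e_head)
    return {
        "lat_rms": float(np.sqrt(np.mean(e_lat ** 2))),
        "lat_max": float(np.max(np.abs(e_lat))),
        "head_rms": float(np.sqrt(np.mean(e_head ** 2))),
        "loss_rate": float(np.mean(np.asarray(conf) < c_min)),
        "latency_p95_s": float(np.percentile(latency, 95)),
    }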
Fail-safe behaviors when the line is lost
- Immediate slowdown: if confidence drops below `C_low`, reduce speed to a safe crawl.
- Short-term dead reckoning: for a brief window (e.g., 0.2–0.5 s), keep steering based on the last reliable heading to bridge small gaps.
- Search behavior: if loss persists, execute a controlled scan (small alternating turns) while keeping speed low, and expand ROI gradually.
- Stop condition: if the line is not reacquired within a timeout or the robot approaches a boundary, stop and signal for assistance.
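These four behaviors compose naturally as a small state machine; a sketch of the transitions, with the coast window drawn from the range suggested above and the 3 s search timeout an assumption to tune.

def loss_policy(confidence, t_since_good, c_low=0.3):
    # Escalate from normal tracking to coasting, searching, then a full stop.
    if confidence >= c_low:
        return "TRACK"        # normal vision-based control
    if t_since_good < 0.4:
        return "COAST"        # hold last reliable heading at crawl speed
    if t_since_good < 3.0:
        return "SEARCH"       # slow alternating scan, gradually widen ROI
    return "STOP"             # timeout: halt and signal for assistance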