Feature Detection and Tracking: Corners, Edges, and Optical Flow

Chapter 7

Estimated reading time: 9 minutes

What Local Features Are (and Why Robots Care)

Local features are distinctive patterns in a small neighborhood of an image that can be detected reliably and matched or tracked over time. Instead of reasoning about every pixel, a robot can focus on a sparse set of informative points or structures that are easier to follow across frames.

Robots use local features because they support three core capabilities:

  • Localization: match features against a map or previous keyframes to estimate where the robot is.
  • Tracking: keep a stable lock on a target (a tool, a docking marker, a person) by following features on it.
  • Motion estimation: infer camera/robot motion from how features move between frames (a foundation of visual odometry).

In practice, features come in different “shapes”:

  • Edges: intensity changes along a line (useful for boundaries and structure).
  • Corners: intensity changes in two directions (often the most trackable points).
  • Descriptors: compact signatures that let you re-identify a feature later, even if the view changes.

Edge Detection: Sobel and Canny

Sobel: Fast Gradient-Based Edges

Sobel estimates image gradients in the horizontal and vertical directions. The gradient magnitude highlights strong intensity changes, which often correspond to object boundaries.

Key outputs:

  • Gx, Gy: horizontal and vertical gradients
  • |G| = sqrt(Gx^2 + Gy^2): edge strength
  • atan2(Gy, Gx): gradient direction (the edge runs perpendicular to it)

Robotics use: quick structural cues (walls, lane markings, shelf edges) and a pre-step for line extraction.
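
A minimal OpenCV sketch of these outputs (the filename and kernel size are illustrative placeholders, not requirements):

```python
import cv2
import numpy as np

# Load a grayscale image; "scene.png" is a placeholder filename.
img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

# Horizontal and vertical gradients with 3x3 Sobel kernels.
gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)

# Edge strength and gradient direction, as defined above.
magnitude = np.sqrt(gx**2 + gy**2)
direction = np.arctan2(gy, gx)
```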

Canny: Cleaner Edges with Better Localization

Canny is a multi-stage edge detector designed to produce thin, well-localized edges while suppressing noise responses. It typically includes gradient computation, non-maximum suppression (thin edges), and hysteresis thresholding (connect strong edges through weak-but-consistent ones).

Practical knobs:

  • Low/high thresholds: control edge connectivity and noise sensitivity.
  • Optional smoothing: reduces spurious edges but can blur fine detail.

Robotics use: stable boundaries for mapping indoor structure, detecting object outlines, and supporting geometric fitting (lines, rectangles).
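
A hedged OpenCV sketch of these knobs; the blur and threshold values below are common starting points, not tuned recommendations:

```python
import cv2

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)  # placeholder filename

# Optional smoothing: suppresses spurious edges at the cost of fine detail.
blurred = cv2.GaussianBlur(img, (5, 5), 1.4)

# Low/high hysteresis thresholds; a 1:2 to 1:3 ratio is a common default.
edges = cv2.Canny(blurred, 50, 150)
```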

Corner Detection: Harris and Shi–Tomasi

Corners are generally better than pure edges for tracking because they have a unique local appearance in two directions. Along a straight edge, many points look similar if you slide along the edge (the “aperture problem”), making motion ambiguous. Corners reduce that ambiguity.

Harris Corner Detector

Harris measures how much the image content changes when a small window is shifted in different directions. If shifting the window produces a large intensity change in both x and y, the point is corner-like.

Intuition via the structure tensor (second-moment matrix):

M = [ sum(Ix^2)  sum(IxIy) ]
    [ sum(IxIy)  sum(Iy^2) ]

Harris response (one common form):

R = det(M) - k * (trace(M))^2

Interpretation:

  • Flat region: small gradients → small eigenvalues → low response
  • Edge: one strong direction → one large eigenvalue, one small → moderate/low response
  • Corner: strong in both directions → two large eigenvalues → high response
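
A minimal sketch using OpenCV's built-in Harris detector; blockSize (the structure-tensor window), ksize (the Sobel aperture), k, and the 0.01 keep-fraction are conventional starting values, not tuned ones:

```python
import cv2
import numpy as np

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)  # placeholder filename

# Per-pixel Harris response R = det(M) - k * trace(M)^2 over a small window.
response = cv2.cornerHarris(np.float32(img), blockSize=2, ksize=3, k=0.04)

# Keep points whose response is within a fraction of the strongest corner.
corners = np.argwhere(response > 0.01 * response.max())
```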

Shi–Tomasi (Good Features to Track)

Shi–Tomasi refines the corner score by using the minimum eigenvalue of M as the quality measure. It often produces points that track more reliably in practice.

Why it’s popular in robotics pipelines:

  • Directly tuned for trackability
  • Works well with optical flow trackers
  • Produces a controllable number of strong points
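
In OpenCV, Shi–Tomasi is exposed as goodFeaturesToTrack. A sketch, with parameter values that are starting points only:

```python
import cv2

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)  # placeholder filename

# maxCorners caps the number of points; qualityLevel is relative to the
# best corner's minimum eigenvalue; minDistance spreads points out.
points = cv2.goodFeaturesToTrack(
    img, maxCorners=200, qualityLevel=0.01, minDistance=10
)
```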

Descriptors: ORB as a Practical Option

Detection finds where features are; descriptors help you recognize them again later (across frames, viewpoints, or after temporary occlusion). For many robotics systems, a practical balance is needed: robust enough for real scenes, fast enough for real time.

ORB in a Nutshell

ORB (Oriented FAST and Rotated BRIEF) is widely used because it is efficient and works well on CPU.

  • Keypoint detection: FAST-like corner detection with a multi-scale pyramid.
  • Orientation: assigns a dominant direction to make the descriptor rotation-aware.
  • Descriptor: BRIEF-style binary tests → compact binary string.

Binary descriptors enable fast matching using Hamming distance, which is computationally attractive for embedded robots.
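
A sketch of ORB detection plus Hamming-distance matching in OpenCV (the filenames and nfeatures value are placeholders):

```python
import cv2

img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=500)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Hamming distance is the natural metric for binary descriptors;
# crossCheck keeps only mutually best matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
```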

When to Use Descriptors vs. Pure Tracking

  • Optical flow tracking: best for short-term, frame-to-frame motion (high rate, small displacements).
  • Descriptor matching: best for re-detection, loop closure, or recovering after occlusion/blur.

Tracking via Optical Flow: Lucas–Kanade

Optical flow estimates how image points move between consecutive frames. The Lucas–Kanade (LK) method assumes small motion and approximately constant brightness in a local neighborhood, then solves for the displacement that best explains the observed intensity changes.

Core Assumptions (and What They Mean for Robots)

  • Small inter-frame motion: high frame rate or slow motion improves tracking.
  • Brightness constancy: sudden lighting changes can break tracks.
  • Local coherence: nearby pixels share similar motion (works well on rigid objects).

Pyramidal Lucas–Kanade for Larger Motion

To handle larger displacements, LK is often run on an image pyramid: estimate motion at low resolution (large apparent motion becomes smaller), then refine at higher resolutions.

Practical knobs:

  • Window size: larger windows handle noise and texture-poor areas but may blur motion boundaries.
  • Pyramid levels: more levels handle faster motion but cost time.
  • Termination criteria: iterations vs. accuracy trade-off.
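
These knobs map directly onto OpenCV's pyramidal LK implementation. A sketch with illustrative values:

```python
import cv2

prev_img = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)  # placeholders
curr_img = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

p0 = cv2.goodFeaturesToTrack(prev_img, maxCorners=200,
                             qualityLevel=0.01, minDistance=10)

# winSize, maxLevel, and criteria correspond to the window size,
# pyramid levels, and termination criteria discussed above.
p1, status, err = cv2.calcOpticalFlowPyrLK(
    prev_img, curr_img, p0, None,
    winSize=(21, 21),
    maxLevel=3,
    criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01),
)
```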

Step-by-Step Pipeline: From Features to Motion Cues

This structure is common in robotics systems that need real-time motion cues and stable tracking.

Step 1 — Detect Candidate Features

Pick a detector based on the task:

  • Tracking-focused: Shi–Tomasi corners (often paired with LK).
  • Matching-focused: ORB keypoints (with ORB descriptors).
  • Structure-focused: Canny edges (for boundaries and geometry fitting).

Implementation tip: enforce spatial diversity by limiting features per grid cell so you don’t get all points on one textured patch.
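
One way to enforce that diversity is a hypothetical helper like the one below. It assumes the input points arrive sorted best-first (as goodFeaturesToTrack returns them); the grid size and per-cell cap are arbitrary choices:

```python
import numpy as np

def limit_per_cell(points, image_shape, grid=(8, 8), max_per_cell=5):
    """Keep at most max_per_cell points in each grid cell (illustrative)."""
    h, w = image_shape[:2]
    counts = np.zeros(grid, dtype=int)
    kept = []
    for x, y in points.reshape(-1, 2):
        row = min(int(y * grid[0] / h), grid[0] - 1)
        col = min(int(x * grid[1] / w), grid[1] - 1)
        if counts[row, col] < max_per_cell:
            counts[row, col] += 1
            kept.append((x, y))
    return np.array(kept, dtype=np.float32).reshape(-1, 1, 2)
```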

Step 2 — Filter by Quality and Geometry

Not all detected points are worth tracking. Common filters:

  • Quality threshold: keep only points above a corner score or response threshold.
  • Non-maximum suppression: avoid clusters of nearly identical points.
  • Border margin: discard points too close to image edges (they may leave the frame quickly).
  • Minimum distance: ensure features are spread out for better motion estimation.

Step 3 — Track Across Frames

Use pyramidal Lucas–Kanade to track each feature from frame t to t+1. The tracker returns:

  • New positions for each feature
  • Status flag (tracked or lost)
  • Error/score (how consistent the match was)

Practical step: immediately drop tracks with bad status or high error to prevent contaminating later estimation.
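
A sketch of this step wrapped in a single hypothetical helper; the error threshold is scene-dependent and purely illustrative:

```python
import cv2

def track_and_filter(prev_img, curr_img, p0, max_err=20.0):
    """Track points with pyramidal LK and drop bad tracks immediately."""
    p1, status, err = cv2.calcOpticalFlowPyrLK(prev_img, curr_img, p0, None)
    good = (status.reshape(-1) == 1) & (err.reshape(-1) < max_err)
    return p0.reshape(-1, 2)[good], p1.reshape(-1, 2)[good]
```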

Step 4 — Reject Outliers with RANSAC

Even with good tracking, some correspondences will be wrong due to occlusion, repeated textures, reflections, or motion boundaries. RANSAC robustly fits a motion model while ignoring outliers.

Choose a model based on expected motion:

  • 2D translation/affine: for small planar motion or image stabilization tasks.
  • Homography: when most points lie on a plane (e.g., floor, wall, tabletop) or camera rotates around its center.
  • Essential/Fundamental matrix: for general 3D scenes and camera motion (used in visual odometry).

RANSAC workflow:

  • Randomly sample minimal point sets
  • Fit the model
  • Count inliers (points consistent with the model within a threshold)
  • Keep the model with the most inliers and refine using all inliers
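
OpenCV bundles this workflow into its robust fitting functions. A sketch using a homography model (the synthetic points exist only so the snippet runs standalone; the 3.0-pixel reprojection threshold is a typical starting value):

```python
import cv2
import numpy as np

# In a real pipeline these come from tracking; here they are synthetic.
prev_pts = np.random.rand(50, 2).astype(np.float32) * 640
curr_pts = prev_pts + np.float32([5, 2])  # fake uniform shift

# Fit a homography with RANSAC; mask marks inliers (1) vs. outliers (0).
H, mask = cv2.findHomography(prev_pts, curr_pts, cv2.RANSAC, 3.0)
inlier_ratio = mask.sum() / len(mask)
```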

Step 5 — Compute Motion Cues

Once you have inlier tracks, you can compute useful cues:

  • Average flow vector: rough camera motion direction in the image.
  • Rotation vs. translation hints: patterns like radial expansion (forward motion) or uniform shift (lateral motion).
  • Time-to-contact proxy: rapid expansion of features can indicate approaching obstacles.
  • Stability metrics: inlier ratio and reprojection error indicate confidence.

| Output | How it's computed | Robotics use |
| --- | --- | --- |
| Inlier tracks | LK + RANSAC filtering | Reliable tracking and motion estimation |
| 2D motion model | Affine/homography fit | Video stabilization, target lock stabilization |
| Epipolar geometry | Essential/fundamental matrix | Visual odometry, pose change estimation |
| Flow statistics | Mean/variance of inlier flow | Obstacle cues, motion anomaly detection |

Robotics Connections

Visual Odometry Basics (Feature-Based)

A common feature-based visual odometry loop uses corners or ORB features to estimate camera motion between frames:

  • Detect features (often ORB or Shi–Tomasi)
  • Track (LK) or match (ORB descriptors)
  • Use RANSAC to estimate a geometric model (often essential matrix in general 3D scenes)
  • Recover relative motion (up to scale for monocular setups) and integrate over time

Practical note: feature tracking (LK) is often used for short-term motion because it’s fast; descriptors are often used to re-associate features when tracking fails or when revisiting areas.
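
A sketch of the geometric core of that loop, assuming matched pixel coordinates and a known intrinsic matrix K (the values below are placeholders, not a real calibration):

```python
import cv2
import numpy as np

# Camera intrinsics; placeholder focal length and principal point.
K = np.array([[700.0, 0, 320], [0, 700.0, 240], [0, 0, 1]])

def relative_motion(prev_pts, curr_pts, K):
    """Estimate relative rotation R and unit-scale translation t."""
    E, mask = cv2.findEssentialMat(prev_pts, curr_pts, K,
                                   method=cv2.RANSAC, threshold=1.0)
    # recoverPose disambiguates the four possible decompositions of E.
    _, R, t, mask = cv2.recoverPose(E, prev_pts, curr_pts, K, mask=mask)
    return R, t  # for monocular setups, t is known only up to scale
```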

Stabilizing a Target Lock

Suppose a robot must keep a camera centered on a moving object (e.g., a docking handle). A robust approach is to track multiple features on the target region:

  • Initialize by detecting corners inside a bounding box
  • Track corners with LK
  • Use RANSAC to fit a 2D transform (translation/affine) between frames
  • Update the target center from the transform and drive gimbal/robot control

This multi-point approach is more stable than tracking a single point because individual points can be lost or corrupted, while the consensus motion remains usable.
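
A sketch of the update step, assuming prev_pts and curr_pts are the tracked corners inside the target box (the helper name and structure are illustrative):

```python
import cv2
import numpy as np

def update_target_center(prev_pts, curr_pts, center):
    """Move the target center by the consensus 2D motion of its features."""
    # estimateAffinePartial2D fits translation/rotation/scale with RANSAC.
    M, inliers = cv2.estimateAffinePartial2D(prev_pts, curr_pts,
                                             method=cv2.RANSAC)
    if M is None:
        return center  # fit failed; keep the last known center
    new_center = M @ np.array([center[0], center[1], 1.0])
    return float(new_center[0]), float(new_center[1])
```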

Motion-Based Obstacle Cues

Even without full 3D reconstruction, optical flow can provide obstacle-related signals:

  • Expansion: features spreading outward from a focus point can indicate forward motion toward a surface.
  • Flow magnitude spikes: unusually large flow in a region can indicate a nearby object or relative motion (e.g., a crossing pedestrian).
  • Motion boundaries: sharp changes in flow can indicate object edges or independently moving objects.

In practice, these cues are often combined with geometric filtering (RANSAC) to separate dominant ego-motion from independently moving objects.
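
As one concrete (and deliberately rough) example, an expansion cue can be estimated from how inlier features spread away from their centroid between frames; the helper below is illustrative and uncalibrated:

```python
import numpy as np

def expansion_rate(prev_pts, curr_pts):
    """Ratio > 1.0 suggests features are spreading (possible approach)."""
    c0, c1 = prev_pts.mean(axis=0), curr_pts.mean(axis=0)
    r0 = np.linalg.norm(prev_pts - c0, axis=1).mean()
    r1 = np.linalg.norm(curr_pts - c1, axis=1).mean()
    return r1 / max(r0, 1e-6)
```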

Limitations and Failure Modes (What to Watch For)

Low Texture

Blank walls, uniform floors, fog, or defocus produce few reliable corners. Edge detectors may still find boundaries, but tracking becomes sparse and unstable. Mitigations include seeking textured regions, using larger LK windows (with care), or switching to complementary sensors.

Repetitive Patterns

Grids, tiles, and fences can create ambiguous matches: many locations look similar. This can cause incorrect correspondences that pass naive checks. RANSAC helps, but if most points are ambiguous, even robust fitting can fail. Enforcing spatial diversity and using stronger descriptors can reduce risk.

Fast Motion and Motion Blur

LK assumes small motion; fast camera movement can exceed the pyramid’s capture range, and blur destroys the local structure needed for corners and descriptors. Mitigations: higher frame rate, shorter exposure, more pyramid levels, and re-detection when track count drops.

Illumination Changes

Brightness constancy is violated by flicker, shadows, specular highlights, and auto-exposure changes. Symptoms include drifting tracks and sudden track loss. Practical mitigations: track with robust error thresholds, refresh features frequently, and prefer descriptors or normalized patches when lighting varies.

Occlusions and Dynamic Scenes

Features can disappear behind objects or move independently (people, vehicles). Without outlier rejection, these points corrupt motion estimation. RANSAC and inlier ratio monitoring are essential; when inlier ratios collapse, reinitialize detection and consider segmenting moving objects.

Exercise

A robot tracks many corner features between frames using Lucas–Kanade, but some tracks are wrong due to occlusion and repeated textures. What is the main purpose of applying RANSAC next?

Answer: RANSAC repeatedly fits a motion model from random minimal samples and keeps the model with the most inliers, filtering out bad tracks caused by occlusion, reflections, or repeated patterns.

Next chapter

From Pixels to Objects: Segmentation and Object Detection Concepts
