
Visual Odometry

Deep Dive

Visual Odometry (VO) is the process of estimating a robot’s egomotion (position and orientation) using only camera images. Unlike full SLAM, VO focuses solely on motion estimation without building a persistent map—it outputs incremental 6-DoF pose updates, similar to wheel encoders but using vision.

VO vs SLAM vs VIO

System                           Map Building   IMU Fusion   Drift Correction
Visual Odometry                  No             No           No
Visual SLAM                      Yes            Optional     Yes (loop closure)
Visual-Inertial Odometry (VIO)   No             Yes          Partial (IMU helps)
Visual Odometry: Sensors → [VO] → Pose only
Visual SLAM: Sensors → [SLAM] → Pose + Map + Loop Closure

Camera Configurations

Monocular: a single-camera setup.

Pros: Minimal hardware, low cost, lightweight
Cons: Scale ambiguity (cannot determine absolute distances), requires motion for initialization

Use case: Drones where weight matters, simple robots
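
Why scale is unobservable (a standard pinhole-model argument; the notation is assumed here, not from this page): the projection equation

\lambda \, x = K (R X + t)

is unchanged when the scene points and the translation are scaled together, since K(R(sX) + st) = s \cdot K(RX + t) for any s > 0, and the projective scale \lambda absorbs the factor s. A single camera therefore recovers translation only up to an unknown global scale.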

Algorithmic Approaches

Feature-based methods extract and track distinctive keypoints across frames.

Pipeline:

  1. Detect keypoints (ORB, FAST, SIFT)
  2. Compute descriptors
  3. Match features between frames
  4. Estimate motion from correspondences

Examples: ORB-SLAM (odometry component), RTAB-Map

Pros: Robust to moderate lighting changes, efficient
Cons: Fails in textureless environments (blank walls)
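
A minimal two-frame sketch of the pipeline above using OpenCV. The intrinsics matrix K and the image filenames are placeholder assumptions; parameters would need tuning on real data.

import cv2
import numpy as np

# Placeholder pinhole intrinsics -- replace with your camera's calibration
K = np.array([[700.0,   0.0, 320.0],
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])

img1 = cv2.imread('frame0.png', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('frame1.png', cv2.IMREAD_GRAYSCALE)

# 1-2. Detect keypoints and compute binary ORB descriptors
orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# 3. Brute-force matching (Hamming distance suits ORB's binary descriptors)
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# 4. RANSAC-estimate the essential matrix, then decompose into R, t
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                  prob=0.999, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
print('R =', R)
print('t =', t)  # unit-norm: monocular VO recovers translation only up to scale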

Feature Extractors Comparison

Extractor   Speed     Accuracy   Notes
FAST        Fastest   Lower      No scale invariance
ORB         Fast      Good       Best speed/accuracy tradeoff
SURF        Medium    High       Scale/rotation invariant
SIFT        Slow      Highest    Most robust, computationally expensive
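
To compare extractors empirically, OpenCV exposes them behind a common interface. A small sketch (SIFT ships with OpenCV 4.4+; SURF requires opencv-contrib and is omitted here; 'frame.png' is a placeholder):

import cv2

img = cv2.imread('frame.png', cv2.IMREAD_GRAYSCALE)
# FAST is detect-only (no descriptor); ORB and SIFT also provide descriptors
for name, det in [('FAST', cv2.FastFeatureDetector_create()),
                  ('ORB',  cv2.ORB_create()),
                  ('SIFT', cv2.SIFT_create())]:
    kps = det.detect(img, None)
    print(f'{name}: {len(kps)} keypoints')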

Processing Pipeline

┌────────────────────────────────────────────────────────────┐
│                  Visual Odometry Pipeline                  │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  1. IMAGE ACQUISITION                                      │
│  ┌──────────────┐   ┌──────────────┐                       │
│  │   Camera     │──►│  Undistort   │                       │
│  │   Frame(s)   │   │  + Rectify   │                       │
│  └──────────────┘   └──────────────┘                       │
│                             │                              │
│  2. FEATURE PROCESSING      ▼                              │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐    │
│  │   Feature    │──►│   Feature    │──►│   Outlier    │    │
│  │  Detection   │   │   Matching   │   │  Rejection   │    │
│  │  (ORB, FAST) │   │  (BF/FLANN)  │   │   (RANSAC)   │    │
│  └──────────────┘   └──────────────┘   └──────────────┘    │
│                                                │           │
│  3. MOTION ESTIMATION                          ▼           │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐    │
│  │    Pose      │◄──│    Bundle    │◄──│  Essential/  │    │
│  │   Output     │   │  Adjustment  │   │  Fundamental │    │
│  │   (6-DoF)    │   │  (optional)  │   │    Matrix    │    │
│  └──────────────┘   └──────────────┘   └──────────────┘    │
│                                                            │
└────────────────────────────────────────────────────────────┘

Visual-Inertial Odometry (VIO)

VIO fuses camera images with IMU measurements for more robust tracking.

Coupling Approaches

Approach          Description                                                             Examples
Loosely-coupled   Independent visual and inertial estimates, fused via a Kalman filter   EKF fusion (e.g., robot_localization, below)
Tightly-coupled   Joint optimization over all states (state of the art)                  VINS-Mono, OpenVINS, MSCKF, ROVIO

IMU Preintegration

IMU measurements are summarized into single constraints between keyframes, reducing computational cost while maintaining accuracy. Essential for real-time operation.
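
As a sketch, the widely used on-manifold formulation (Forster et al.; notation assumed here) compresses gyroscope readings \tilde{\omega}_k and accelerometer readings \tilde{a}_k, with biases b^g and b^a and sample period \Delta t, into relative increments between keyframes i and j:

\Delta R_{ij} = \prod_{k=i}^{j-1} \mathrm{Exp}\big((\tilde{\omega}_k - b^g)\,\Delta t\big)

\Delta v_{ij} = \sum_{k=i}^{j-1} \Delta R_{ik}\,(\tilde{a}_k - b^a)\,\Delta t

\Delta p_{ij} = \sum_{k=i}^{j-1} \Big[ \Delta v_{ik}\,\Delta t + \tfrac{1}{2}\,\Delta R_{ik}\,(\tilde{a}_k - b^a)\,\Delta t^2 \Big]

These increments do not depend on the global pose or velocity, so they are computed once per keyframe pair and only relinearized when the bias estimates change.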

┌────────────────┐
│ Stereo Camera  │────┐
└────────────────┘    │    ┌─────────────┐
                      ├───►│  Tightly-   │───► Fused Pose
┌────────────────┐    │    │  Coupled    │     (6-DoF)
│      IMU       │────┘    │  Optimizer  │
│   (200+ Hz)    │         └─────────────┘
└────────────────┘

Challenges

Common failure modes and limitations:

  • Drift: pose error grows over time without loop closure to correct it
  • Scale ambiguity: monocular setups cannot recover absolute scale
  • Low texture: featureless surfaces (blank walls) starve feature tracking
  • Lighting changes: strong illumination shifts break feature matching
  • Fast motion: motion blur and large inter-frame displacement cause tracking loss

Isaac ROS Visual SLAM (cuVSLAM)

NVIDIA’s GPU-accelerated solution supports both VO and full SLAM modes.

Key Features

  • Stereo visual-inertial odometry (SVIO)
  • Multi-camera support (up to 32 cameras / 16 stereo pairs)
  • Automatic IMU fallback when visual tracking fails
  • Sub-1% trajectory error on KITTI benchmark

Performance (Jetson Orin AGX)

  • Stereo mode: 2.7 ms per frame
  • Stereo-Inertial: 30 FPS camera, 200 Hz IMU

Launch Example

# Launch cuVSLAM with RealSense camera
ros2 launch isaac_ros_visual_slam isaac_ros_visual_slam_realsense.launch.py
# Check odometry output
ros2 topic echo /visual_slam/tracking/odometry

Output topics:

  • /visual_slam/tracking/odometry — nav_msgs/Odometry (6-DoF pose)
  • /visual_slam/vis/observations_cloud — Feature point cloud
  • /visual_slam/status — Tracking status
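
A minimal rclpy sketch for consuming the odometry topic above (the node and callback names are arbitrary; the topic name is cuVSLAM's, as listed):

import rclpy
from rclpy.node import Node
from nav_msgs.msg import Odometry

class VoListener(Node):
    def __init__(self):
        super().__init__('vo_listener')
        # Odometry topic published by cuVSLAM, as listed above
        self.create_subscription(
            Odometry, '/visual_slam/tracking/odometry', self.on_odom, 10)

    def on_odom(self, msg: Odometry):
        p = msg.pose.pose.position
        self.get_logger().info(f'pose: x={p.x:.2f} y={p.y:.2f} z={p.z:.2f}')

def main():
    rclpy.init()
    rclpy.spin(VoListener())

if __name__ == '__main__':
    main()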

ROS 2 Integration

RTAB-Map Visual Odometry

# Stereo odometry
ros2 launch rtabmap_ros stereo_odometry.launch.py \
  left_image_topic:=/stereo/left/image_rect \
  right_image_topic:=/stereo/right/image_rect \
  left_camera_info_topic:=/stereo/left/camera_info \
  right_camera_info_topic:=/stereo/right/camera_info

Sensor Fusion with robot_localization

Combine VO with wheel odometry and IMU for robust state estimation:

┌────────────────┐
│ Wheel Encoders │────┐
└────────────────┘    │    ┌─────────────┐
                      ├───►│    EKF /    │───► Fused Odometry
┌────────────────┐    │    │     UKF     │
│ Visual Odometry│────┤    └─────────────┘
└────────────────┘    │
┌────────────────┐    │
│      IMU       │────┘
└────────────────┘
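
A hedged sketch of wiring this up with robot_localization's EKF node in a ROS 2 Python launch file. The /wheel/odometry and /imu/data topic names are placeholders; /visual_slam/tracking/odometry is cuVSLAM's topic from above. Each 15-element *_config selects, in order: x, y, z, roll, pitch, yaw, their velocities, and linear accelerations.

from launch import LaunchDescription
from launch_ros.actions import Node

def generate_launch_description():
    return LaunchDescription([
        Node(
            package='robot_localization',
            executable='ekf_node',
            name='ekf_filter_node',
            parameters=[{
                'frequency': 30.0,
                'two_d_mode': True,
                # Wheel odometry: trust planar velocities (placeholder topic)
                'odom0': '/wheel/odometry',
                'odom0_config': [False, False, False,
                                 False, False, False,
                                 True,  True,  False,
                                 False, False, True,
                                 False, False, False],
                # Visual odometry: trust planar pose (cuVSLAM topic from above)
                'odom1': '/visual_slam/tracking/odometry',
                'odom1_config': [True,  True,  False,
                                 False, False, True,
                                 False, False, False,
                                 False, False, False,
                                 False, False, False],
                # IMU: trust orientation and yaw rate (placeholder topic)
                'imu0': '/imu/data',
                'imu0_config': [False, False, False,
                                True,  True,  True,
                                False, False, False,
                                False, False, True,
                                False, False, False],
            }],
        ),
    ])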

Applications

Domain                Use Case            Why VO
Drones/UAVs           Autonomous flight   GPS-denied environments
Ground robots         Navigation, AGVs    Wheel slip compensation
Autonomous vehicles   Self-driving        GPS complement, tunnels
AR/VR                 Head tracking       Low latency, 6-DoF
Underwater robots     Inspection          GPS unavailable
