
Visual Odometry

Deep Dive

Visual Odometry (VO) is the process of estimating a robot’s egomotion (position and orientation) using only camera images. Unlike full SLAM, VO focuses solely on motion estimation without building a persistent map—it outputs incremental 6-DoF pose updates, similar to wheel encoders but using vision.

VO vs SLAM vs VIO

System                           Map Building   IMU Fusion   Drift Correction
Visual Odometry                  No             No           No
Visual SLAM                      Yes            Optional     Yes (loop closure)
Visual-Inertial Odometry (VIO)   No             Yes          Partial (IMU helps)
Visual Odometry: Sensors → [VO] → Pose only
Visual SLAM: Sensors → [SLAM] → Pose + Map + Loop Closure

Camera Configurations

Monocular: a single-camera setup.

Pros: Minimal hardware, low cost, lightweight
Cons: Scale ambiguity (cannot determine absolute distances), requires motion for initialization

Use case: Drones where weight matters, simple robots
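
Why scale is unobservable (a standard pinhole-model argument; the notation is assumed here, not from this page): the projection equation

\lambda \, x = K (R X + t)

is unchanged when the scene points and the translation are scaled together, since K(R(sX) + st) = s \cdot K(RX + t) for any s > 0, and the projective scale \lambda absorbs the factor s. A single camera therefore recovers translation only up to an unknown global scale.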

Algorithmic Approaches

Feature-based methods extract and track distinctive keypoints across frames.

Pipeline:

  1. Detect keypoints (ORB, FAST, SIFT)
  2. Compute descriptors
  3. Match features between frames
  4. Estimate motion from correspondences

Examples: ORB-SLAM (odometry component), RTAB-Map

Pros: Robust to moderate lighting changes, efficient
Cons: Fails in textureless environments (blank walls)
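
A minimal two-frame sketch of the pipeline above using OpenCV. The intrinsics matrix K and the image filenames are placeholder assumptions; parameters would need tuning on real data.

import cv2
import numpy as np

# Placeholder pinhole intrinsics -- replace with your camera's calibration
K = np.array([[700.0,   0.0, 320.0],
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])

img1 = cv2.imread('frame0.png', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('frame1.png', cv2.IMREAD_GRAYSCALE)

# 1-2. Detect keypoints and compute binary ORB descriptors
orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# 3. Brute-force matching (Hamming distance suits ORB's binary descriptors)
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# 4. RANSAC-estimate the essential matrix, then decompose into R, t
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                  prob=0.999, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
print('R =', R)
print('t =', t)  # unit-norm: monocular VO recovers translation only up to scale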

Feature Extractors Comparison

Extractor   Speed     Accuracy   Notes
FAST        Fastest   Lower      No scale invariance
ORB         Fast      Good       Best speed/accuracy tradeoff
SURF        Medium    High       Scale/rotation invariant
SIFT        Slow      Highest    Most robust, computationally expensive
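
To compare extractors empirically, OpenCV exposes them behind a common interface. A small sketch (SIFT ships with OpenCV 4.4+; SURF requires opencv-contrib and is omitted here; 'frame.png' is a placeholder):

import cv2

img = cv2.imread('frame.png', cv2.IMREAD_GRAYSCALE)
# FAST is detect-only (no descriptor); ORB and SIFT also provide descriptors
for name, det in [('FAST', cv2.FastFeatureDetector_create()),
                  ('ORB',  cv2.ORB_create()),
                  ('SIFT', cv2.SIFT_create())]:
    kps = det.detect(img, None)
    print(f'{name}: {len(kps)} keypoints')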

Processing Pipeline

┌────────────────────────────────────────────────────────────┐
│                  Visual Odometry Pipeline                  │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  1. IMAGE ACQUISITION                                      │
│  ┌──────────────┐   ┌──────────────┐                       │
│  │   Camera     │──►│  Undistort   │                       │
│  │   Frame(s)   │   │  + Rectify   │                       │
│  └──────────────┘   └──────────────┘                       │
│                             │                              │
│  2. FEATURE PROCESSING      ▼                              │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐    │
│  │   Feature    │──►│   Feature    │──►│   Outlier    │    │
│  │  Detection   │   │   Matching   │   │  Rejection   │    │
│  │  (ORB, FAST) │   │  (BF/FLANN)  │   │   (RANSAC)   │    │
│  └──────────────┘   └──────────────┘   └──────────────┘    │
│                                                │           │
│  3. MOTION ESTIMATION                          ▼           │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐    │
│  │    Pose      │◄──│    Bundle    │◄──│  Essential/  │    │
│  │   Output     │   │  Adjustment  │   │  Fundamental │    │
│  │   (6-DoF)    │   │  (optional)  │   │    Matrix    │    │
│  └──────────────┘   └──────────────┘   └──────────────┘    │
│                                                            │
└────────────────────────────────────────────────────────────┘

Visual-Inertial Odometry (VIO)

VIO fuses camera images with IMU measurements for more robust tracking.

Coupling Approaches

Approach          Description                                                             Examples
Loosely-coupled   Independent visual and inertial estimates, fused via a Kalman filter   EKF fusion (e.g., robot_localization, below)
Tightly-coupled   Joint optimization over all states (state of the art)                  VINS-Mono, OpenVINS, MSCKF, ROVIO

IMU Preintegration

IMU measurements are summarized into single constraints between keyframes, reducing computational cost while maintaining accuracy. Essential for real-time operation.
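
As a sketch, the widely used on-manifold formulation (Forster et al.; notation assumed here) compresses gyroscope readings \tilde{\omega}_k and accelerometer readings \tilde{a}_k, with biases b^g and b^a and sample period \Delta t, into relative increments between keyframes i and j:

\Delta R_{ij} = \prod_{k=i}^{j-1} \mathrm{Exp}\big((\tilde{\omega}_k - b^g)\,\Delta t\big)

\Delta v_{ij} = \sum_{k=i}^{j-1} \Delta R_{ik}\,(\tilde{a}_k - b^a)\,\Delta t

\Delta p_{ij} = \sum_{k=i}^{j-1} \Big[ \Delta v_{ik}\,\Delta t + \tfrac{1}{2}\,\Delta R_{ik}\,(\tilde{a}_k - b^a)\,\Delta t^2 \Big]

These increments do not depend on the global pose or velocity, so they are computed once per keyframe pair and only relinearized when the bias estimates change.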

┌────────────────┐
│ Stereo Camera  │────┐
└────────────────┘    │    ┌─────────────┐
                      ├───►│  Tightly-   │───► Fused Pose
┌────────────────┐    │    │  Coupled    │     (6-DoF)
│      IMU       │────┘    │  Optimizer  │
│   (200+ Hz)    │         └─────────────┘
└────────────────┘

Challenges

Common failure modes and limitations:

  • Drift: pose error grows over time without loop closure to correct it
  • Scale ambiguity: monocular setups cannot recover absolute scale
  • Low texture: featureless surfaces (blank walls) starve feature tracking
  • Lighting changes: strong illumination shifts break feature matching
  • Fast motion: motion blur and large inter-frame displacement cause tracking loss

Isaac ROS Visual SLAM (cuVSLAM)

NVIDIA’s GPU-accelerated solution supports both VO and full SLAM modes.

Key Features

  • Stereo visual-inertial odometry (SVIO)
  • Multi-camera support (up to 32 cameras / 16 stereo pairs)
  • Automatic IMU fallback when visual tracking fails
  • Sub-1% trajectory error on KITTI benchmark

Performance (Jetson Orin AGX)

  • Stereo mode: 2.7 ms per frame
  • Stereo-Inertial: 30 FPS camera, 200 Hz IMU

Launch Example

# Launch cuVSLAM with RealSense camera
ros2 launch isaac_ros_visual_slam isaac_ros_visual_slam_realsense.launch.py
# Check odometry output
ros2 topic echo /visual_slam/tracking/odometry

Output topics:

  • /visual_slam/tracking/odometry — nav_msgs/Odometry (6-DoF pose)
  • /visual_slam/vis/observations_cloud — Feature point cloud
  • /visual_slam/status — Tracking status
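
A minimal rclpy sketch for consuming the odometry topic above (the node and callback names are arbitrary; the topic name is cuVSLAM's, as listed):

import rclpy
from rclpy.node import Node
from nav_msgs.msg import Odometry

class VoListener(Node):
    def __init__(self):
        super().__init__('vo_listener')
        # Odometry topic published by cuVSLAM, as listed above
        self.create_subscription(
            Odometry, '/visual_slam/tracking/odometry', self.on_odom, 10)

    def on_odom(self, msg: Odometry):
        p = msg.pose.pose.position
        self.get_logger().info(f'pose: x={p.x:.2f} y={p.y:.2f} z={p.z:.2f}')

def main():
    rclpy.init()
    rclpy.spin(VoListener())

if __name__ == '__main__':
    main()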

ROS 2 Integration

RTAB-Map Visual Odometry

# Stereo odometry
ros2 launch rtabmap_ros stereo_odometry.launch.py \
  left_image_topic:=/stereo/left/image_rect \
  right_image_topic:=/stereo/right/image_rect \
  left_camera_info_topic:=/stereo/left/camera_info \
  right_camera_info_topic:=/stereo/right/camera_info

Sensor Fusion with robot_localization

Combine VO with wheel odometry and IMU for robust state estimation:

┌────────────────┐
│ Wheel Encoders │────┐
└────────────────┘    │    ┌─────────────┐
                      ├───►│    EKF /    │───► Fused Odometry
┌────────────────┐    │    │     UKF     │
│ Visual Odometry│────┤    └─────────────┘
└────────────────┘    │
┌────────────────┐    │
│      IMU       │────┘
└────────────────┘
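
A hedged sketch of wiring this up with robot_localization's EKF node in a ROS 2 Python launch file. The /wheel/odometry and /imu/data topic names are placeholders; /visual_slam/tracking/odometry is cuVSLAM's topic from above. Each 15-element *_config selects, in order: x, y, z, roll, pitch, yaw, their velocities, and linear accelerations.

from launch import LaunchDescription
from launch_ros.actions import Node

def generate_launch_description():
    return LaunchDescription([
        Node(
            package='robot_localization',
            executable='ekf_node',
            name='ekf_filter_node',
            parameters=[{
                'frequency': 30.0,
                'two_d_mode': True,
                # Wheel odometry: trust planar velocities (placeholder topic)
                'odom0': '/wheel/odometry',
                'odom0_config': [False, False, False,
                                 False, False, False,
                                 True,  True,  False,
                                 False, False, True,
                                 False, False, False],
                # Visual odometry: trust planar pose (cuVSLAM topic from above)
                'odom1': '/visual_slam/tracking/odometry',
                'odom1_config': [True,  True,  False,
                                 False, False, True,
                                 False, False, False,
                                 False, False, False,
                                 False, False, False],
                # IMU: trust orientation and yaw rate (placeholder topic)
                'imu0': '/imu/data',
                'imu0_config': [False, False, False,
                                True,  True,  True,
                                False, False, False,
                                False, False, True,
                                False, False, False],
            }],
        ),
    ])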

Applications

Domain                Use Case            Why VO
Drones/UAVs           Autonomous flight   GPS-denied environments
Ground robots         Navigation, AGVs    Wheel slip compensation
Autonomous vehicles   Self-driving        GPS complement, tunnels
AR/VR                 Head tracking       Low latency, 6-DoF
Underwater robots     Inspection          GPS unavailable
