Visual Odometry
Visual Odometry (VO) is the process of estimating a robot’s egomotion (position and orientation) using only camera images. Unlike full SLAM, VO focuses solely on motion estimation without building a persistent map—it outputs incremental 6-DoF pose updates, similar to wheel encoders but using vision.
VO vs SLAM vs VIO
| System | Map Building | IMU Fusion | Drift Correction |
|---|---|---|---|
| Visual Odometry | No | No | No |
| Visual SLAM | Yes | Optional | Yes (loop closure) |
| Visual-Inertial Odometry (VIO) | No | Yes | Partial (IMU helps) |
```
Visual Odometry:  Sensors → [VO]   → Pose only
Visual SLAM:      Sensors → [SLAM] → Pose + Map + Loop Closure
```

Camera Configurations
Monocular
Single camera setup.
Pros: Minimal hardware, low cost, lightweight
Cons: Scale ambiguity (cannot determine absolute distances), requires motion for initialization
Use case: Drones where weight matters, simple robots
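The scale ambiguity follows directly from the pinhole projection: scaling the whole scene and the camera translation by the same factor produces the exact same images. A small numpy sketch (the intrinsics and points are fabricated for illustration):

```python
import numpy as np

K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])  # assumed pinhole intrinsics

def project(points, t):
    """Project 3-D points onto a camera translated by t (no rotation)."""
    cam = points - t                  # world -> camera coordinates
    uv = (K @ cam.T).T
    return uv[:, :2] / uv[:, 2:]      # perspective divide

pts = np.array([[0.5, 0.2, 4.0], [-1.0, 0.3, 6.0], [0.1, -0.4, 5.0]])
t = np.array([0.2, 0.0, 0.0])
s = 3.0  # any global scale factor

# The scene scaled by s with motion scaled by s projects to identical
# pixels, so a single camera cannot recover absolute scale.
same = np.allclose(project(pts, t), project(s * pts, s * t))
print(same)  # True
```

This is why monocular VO reports translation only up to an unknown scale factor unless some external metric cue (IMU, known object size) is added.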
Stereo
Two cameras with known baseline.
Pros: Metric scale from triangulation, more robust
Cons: Higher cost, calibration required, degrades to monocular at long distances
Use case: Ground robots, autonomous vehicles
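With a calibrated stereo pair, metric depth follows directly from disparity via Z = f·B/d. A toy sketch (the focal length and baseline values are assumptions):

```python
def depth_from_disparity(disparity_px, focal_px=700.0, baseline_m=0.12):
    """Metric depth from stereo disparity: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# 10 px of disparity at f = 700 px, B = 0.12 m gives ~8.4 m of depth.
print(depth_from_disparity(10.0))
# Distant points produce tiny disparities, so depth error grows rapidly
# with range; this is why stereo degrades to monocular far away.
print(depth_from_disparity(0.5))  # ~168 m
```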
RGB-D
RGB camera + depth sensor.
Pros: Direct depth measurement, simplified algorithm
Cons: Limited range (~4 m for structured light), affected by sunlight
Use case: Indoor robots, manipulation
Algorithmic Approaches
Extract and track distinctive keypoints across frames.
Pipeline:
- Detect keypoints (ORB, FAST, SIFT)
- Compute descriptors
- Match features between frames
- Estimate motion from correspondences
Examples: ORB-SLAM (odometry component), RTAB-Map
Pros: Robust to moderate lighting changes, efficient
Cons: Fails in textureless environments (blank walls)
Use raw pixel intensities instead of features.
Pipeline:
- Select pixels with high gradient
- Minimize photometric error between frames
- Joint optimization of pose and depth
Examples: LSD-SLAM, DSO (Direct Sparse Odometry)
Pros: Works in low-texture environments, can produce dense depth
Cons: Sensitive to lighting changes, exposure variations
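The core idea (select high-gradient pixels, then find the motion that minimizes photometric error on raw intensities) can be sketched in a few lines of numpy; here a 1-D translation search stands in for the real 6-DoF pose optimization:

```python
import numpy as np

rng = np.random.default_rng(1)
img1 = rng.random((60, 80))
img2 = np.roll(img1, 3, axis=1)   # simulate camera motion as a 3 px shift

# Direct sparse methods track only pixels with strong intensity gradient.
gx = np.abs(np.gradient(img1, axis=1))
mask = gx > np.percentile(gx, 80)

def photometric_error(shift):
    """Sum of squared intensity differences for a candidate motion."""
    diff = np.roll(img1, shift, axis=1) - img2
    return float(np.sum(diff[mask] ** 2))

# Brute-force search over candidate motions stands in for the
# Gauss-Newton minimization used by LSD-SLAM/DSO on the same objective.
best = min(range(-5, 6), key=photometric_error)
print(best)  # 3: the simulated shift is recovered
```

Because the objective compares raw intensities, any exposure or lighting change between the frames shifts the error surface, which is exactly the sensitivity noted above.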
Feature Extractors Comparison
| Extractor | Speed | Accuracy | Notes |
|---|---|---|---|
| FAST | Fastest | Lower | No scale invariance |
| ORB | Fast | Good | Best speed/accuracy tradeoff |
| SURF | Medium | High | Scale/rotation invariant |
| SIFT | Slow | Highest | Most robust, computationally expensive |
Processing Pipeline
```
Visual Odometry Pipeline

1. IMAGE ACQUISITION
   Camera Frame(s) ──► Undistort + Rectify
                              │
2. FEATURE PROCESSING         ▼
   Feature Detection ──► Feature Matching ──► Outlier Rejection
   (ORB, FAST)           (BF/FLANN)           (RANSAC)
                              │
3. MOTION ESTIMATION          ▼
   Essential/Fundamental ──► Bundle Adjustment ──► Pose Output
   Matrix                    (optional)            (6-DoF)
```

Visual-Inertial Odometry (VIO)
VIO fuses camera images with IMU measurements for more robust tracking.
Coupling Approaches
| Approach | Description | Examples |
|---|---|---|
| Loosely-coupled | Visual and inertial pipelines run independently; their pose estimates are fused afterwards (e.g., in a Kalman filter) | ethzasl MSF |
| Tightly-coupled | Joint optimization over visual and inertial states (state-of-the-art) | ROVIO, VINS-Mono, OpenVINS, MSCKF |
IMU Preintegration
IMU measurements are summarized into single constraints between keyframes, reducing computational cost while maintaining accuracy. Essential for real-time operation.
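A simplified numeric sketch of the idea, collapsing a burst of IMU samples into one relative-motion constraint (gravity, biases, noise propagation, and proper on-manifold rotation updates are omitted; real preintegration handles all of these):

```python
import numpy as np

def preintegrate(gyro, accel, dt):
    """Collapse a sequence of IMU samples into a single relative-motion
    constraint (delta rotation, velocity, position) between two keyframes.
    Simplified: no gravity, no bias, no noise propagation."""
    R = np.eye(3)       # accumulated rotation
    v = np.zeros(3)     # accumulated velocity change
    p = np.zeros(3)     # accumulated position change
    for w, a in zip(gyro, accel):
        p = p + v * dt + 0.5 * (R @ a) * dt**2
        v = v + (R @ a) * dt
        wx = np.array([[0, -w[2], w[1]],
                       [w[2], 0, -w[0]],
                       [-w[1], w[0], 0]])
        R = R @ (np.eye(3) + wx * dt)   # first-order rotation update
    return R, v, p

# 1 s of a 200 Hz IMU under constant 1 m/s^2 forward acceleration:
n, dt = 200, 1.0 / 200
dR, dv, dp = preintegrate(np.zeros((n, 3)), np.tile([1.0, 0, 0], (n, 1)), dt)
print(dv[0], dp[0])  # delta-v ~ 1 m/s, delta-p ~ 0.5 m
```

Downstream, an optimizer treats (dR, dv, dp) as a single edge between two keyframe states instead of re-integrating 200 raw measurements at every iteration, which is where the computational savings come from.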
```
┌────────────────┐
│ Stereo Camera  │────┐
└────────────────┘    │    ┌─────────────┐
                      ├───►│  Tightly-   │───► Fused Pose
┌────────────────┐    │    │  Coupled    │     (6-DoF)
│ IMU (200+ Hz)  │────┘    │  Optimizer  │
└────────────────┘         └─────────────┘
```

Challenges
Isaac ROS Visual SLAM (cuVSLAM)
NVIDIA’s GPU-accelerated solution supports both VO and full SLAM modes.
Key Features
- Stereo visual-inertial odometry (SVIO)
- Multi-camera support (up to 32 cameras / 16 stereo pairs)
- Automatic IMU fallback when visual tracking fails
- Sub-1% trajectory error on KITTI benchmark
Performance (Jetson Orin AGX)
- Stereo mode: 2.7 ms per frame
- Stereo-Inertial: 30 FPS camera, 200 Hz IMU
Launch Example
```bash
# Launch cuVSLAM with a RealSense camera
ros2 launch isaac_ros_visual_slam isaac_ros_visual_slam_realsense.launch.py
```
```bash
# Check odometry output
ros2 topic echo /visual_slam/tracking/odometry
```

Output topics:
- `/visual_slam/tracking/odometry`: nav_msgs/Odometry (6-DoF pose)
- `/visual_slam/vis/observations_cloud`: feature point cloud
- `/visual_slam/status`: tracking status
ROS 2 Integration
RTAB-Map Visual Odometry
```bash
# Stereo odometry
ros2 launch rtabmap_ros stereo_odometry.launch.py \
    left_image_topic:=/stereo/left/image_rect \
    right_image_topic:=/stereo/right/image_rect \
    left_camera_info_topic:=/stereo/left/camera_info \
    right_camera_info_topic:=/stereo/right/camera_info
```

Sensor Fusion with robot_localization
Combine VO with wheel odometry and IMU for robust state estimation:
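With robot_localization, this fusion is configured per sensor: each input gets a topic name and a 15-element boolean vector selecting which state components (x y z, roll pitch yaw, their velocities, and linear accelerations) to absorb. A configuration sketch; the wheel-odometry and IMU topic names are assumptions, only the cuVSLAM topic comes from this page:

```yaml
ekf_filter_node:
  ros__parameters:
    frequency: 30.0
    two_d_mode: false
    odom0: /wheel/odometry                  # assumed wheel-encoder topic
    odom0_config: [true,  true,  false,     # x, y, z
                   false, false, true,      # roll, pitch, yaw
                   false, false, false,     # vx, vy, vz
                   false, false, false,     # vroll, vpitch, vyaw
                   false, false, false]     # ax, ay, az
    odom1: /visual_slam/tracking/odometry   # cuVSLAM visual odometry
    odom1_config: [true,  true,  true,
                   true,  true,  true,
                   false, false, false,
                   false, false, false,
                   false, false, false]
    imu0: /imu/data                         # assumed IMU topic
    imu0_config: [false, false, false,
                  true,  true,  false,
                  false, false, false,
                  true,  true,  true,
                  false, false, false]
```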
```
┌────────────────┐
│ Wheel Encoders │────┐
└────────────────┘    │
┌────────────────┐    │    ┌─────────────┐
│ Visual Odometry│────┼───►│  EKF / UKF  │───► Fused Odometry
└────────────────┘    │    └─────────────┘
┌────────────────┐    │
│ IMU            │────┘
└────────────────┘
```

Applications
| Domain | Use Case | Why VO |
|---|---|---|
| Drones/UAVs | Autonomous flight | GPS-denied environments |
| Ground robots | Navigation, AGVs | Wheel slip compensation |
| Autonomous vehicles | Self-driving | GPS complement, tunnels |
| AR/VR | Head tracking | Low latency, 6-DoF |
| Underwater robots | Inspection | GPS unavailable |
Sources
- Isaac ROS Visual SLAM — cuVSLAM documentation, multi-camera support, KITTI benchmarks
- cuVSLAM Concepts — GPU-accelerated VO/SLAM architecture
- Visual Odometry Tutorial (Scaramuzza) — Foundational VO theory and algorithms
- RTAB-Map ROS 2 — Visual odometry strategies and ROS 2 integration
- OpenVINS — Open-source VIO with active ROS 2 support
- robot_localization — EKF/UKF sensor fusion for ROS 2