
SLAM

Deep Dive

SLAM (Simultaneous Localization and Mapping) is the computational problem of constructing a map of an unknown environment while simultaneously tracking the robot’s location within it. It’s fundamental to autonomous navigation.

The SLAM Problem

                   ┌─────────────┐
  Sensors ────────►│    SLAM     │────────► Map
  (camera,         │  Algorithm  │────────► Robot Pose (x, y, θ)
   LiDAR,          └─────────────┘
   IMU)                   ▲
                          │
            Odometry (wheel encoders, IMU)

Inputs

  • Sensor observations: What the robot sees (images, point clouds, depth)
  • Odometry: Motion estimates from wheels/IMU (often noisy)

Outputs

  • Map: Representation of the environment
  • Pose: Robot’s position and orientation in the map
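
In probabilistic terms, full SLAM estimates the joint posterior over the whole trajectory and the map given all observations and odometry. Using standard notation (assumed here, not defined elsewhere on this page) with poses x, map m, observations z, and odometry u:

p(x_{1:t}, m \mid z_{1:t}, u_{1:t})

Online SLAM keeps only the current pose and marginalizes out the past ones:

p(x_t, m \mid z_{1:t}, u_{1:t}) = \int \cdots \int p(x_{1:t}, m \mid z_{1:t}, u_{1:t}) \, dx_1 \cdots dx_{t-1}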

Types of SLAM

Visual SLAM

Uses cameras as the primary sensor.

Approaches:

  • Feature-based: Extract and track keypoints (ORB-SLAM, VINS)
  • Direct: Use raw pixel intensities (LSD-SLAM, DSO)
  • Deep learning: Learned features and depth (DROID-SLAM)

Pros: Rich information, low-cost sensors, works indoors/outdoors
Cons: Sensitive to lighting, texture-poor environments

NVIDIA Solution: Isaac ROS Visual SLAM (cuVSLAM)
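
As a concrete (if greatly simplified) illustration of the feature-based approach above, the sketch below extracts ORB keypoints from two frames and matches them with OpenCV. The image file names are placeholders, and this is not how cuVSLAM is implemented internally:

import cv2

# Load two consecutive grayscale frames (placeholder file names)
img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

# Detect ORB keypoints and compute binary descriptors
orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force Hamming matching with Lowe's ratio test
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

print(f"{len(good)} putative correspondences for frame-to-frame tracking")

These correspondences are what the tracking stage of the frontend (described below) consumes.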

Map Representations

Type           | Description                        | Use Case
Occupancy Grid | 2D/3D grid of occupied/free cells  | Navigation, path planning
Point Cloud    | Set of 3D points                   | 3D reconstruction, dense mapping
Feature Map    | Sparse 3D landmarks                | Visual localization
Mesh           | Triangulated surface               | Simulation, visualization
TSDF           | Truncated Signed Distance Function | Real-time 3D fusion
Neural         | Learned implicit representation    | NeRF-based mapping
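
To make the first row concrete, here is a toy occupancy grid in NumPy using the common log-odds update; the grid size, cell indices, and increments are illustrative values only:

import numpy as np

# 2D occupancy grid in log-odds form (0.0 means unknown, i.e. p = 0.5)
grid = np.zeros((100, 100))
L_OCC, L_FREE = 0.85, -0.4   # log-odds increments (assumed tuning values)

def update_cell(grid, i, j, hit):
    """Bayesian log-odds update for a single cell."""
    grid[i, j] += L_OCC if hit else L_FREE
    grid[i, j] = np.clip(grid[i, j], -5.0, 5.0)  # avoid saturation

def probability(grid):
    """Convert log-odds back to occupancy probabilities."""
    return 1.0 - 1.0 / (1.0 + np.exp(grid))

update_cell(grid, 50, 50, hit=True)    # cell where a range return landed
update_cell(grid, 50, 49, hit=False)   # cell the beam passed through
print(probability(grid)[50, 48:51])

Real implementations ray-trace every beam and update all traversed cells; the per-cell rule is the same.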

The SLAM Pipeline

1. FRONTEND (Real-time)

   ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
   │   Feature    │ ──► │   Tracking   │ ──► │  Local Map   │
   │  Extraction  │     │  (Frame-to-  │     │    Update    │
   │              │     │    Frame)    │     │              │
   └──────────────┘     └──────────────┘     └──────────────┘

2. BACKEND (Optimization)

   ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
   │    Loop      │ ──► │    Bundle    │ ──► │  Global Map  │
   │   Closure    │     │  Adjustment  │     │  Correction  │
   │  Detection   │     │   / Pose     │     │              │
   │              │     │    Graph     │     │              │
   └──────────────┘     └──────────────┘     └──────────────┘

Frontend

  • Runs at sensor rate (30+ Hz)
  • Extracts features, tracks motion
  • Builds local map incrementally
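
A minimal sketch of the tracking step, assuming matched pixel coordinates from two frames (for example the ORB matches shown earlier) and a known 3x3 intrinsic matrix K. OpenCV's five-point solver recovers the relative rotation and a translation direction; with a single camera the translation scale is unobservable:

import cv2
import numpy as np

def track_frame(pts_prev, pts_curr, K):
    """Estimate relative camera motion from 2D-2D correspondences.

    pts_prev, pts_curr: Nx2 float arrays of matched pixel coordinates.
    K: 3x3 camera intrinsic matrix.
    """
    # Essential matrix with RANSAC to reject outlier matches
    E, inliers = cv2.findEssentialMat(pts_prev, pts_curr, K,
                                      method=cv2.RANSAC, threshold=1.0)
    # Decompose into rotation R and unit-length translation t
    _, R, t, _ = cv2.recoverPose(E, pts_prev, pts_curr, K, mask=inliers)
    return R, t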

Backend

  • Runs asynchronously (1-10 Hz)
  • Detects loop closures (been here before?)
  • Optimizes full trajectory and map

Loop Closure

The key to drift-free SLAM:

Start ──► ──► ──► ──► ──► ──► ──┐
                                │   "I've been here!"
  ◄── ◄── ◄── ◄── Loop Closure ◄┘

When the robot recognizes a previously visited location, it can correct accumulated drift by adding a constraint in the pose graph.
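
A pose graph makes this concrete: odometry factors chain consecutive poses, and a single loop-closure factor between the last pose and the start pulls the drifted trajectory back into shape. The sketch below uses the GTSAM Python bindings purely for illustration (cuVSLAM has its own backend); all numbers are made up:

import gtsam
import numpy as np

graph = gtsam.NonlinearFactorGraph()
noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.2, 0.2, 0.1]))

# Anchor the first pose at the origin
graph.add(gtsam.PriorFactorPose2(0, gtsam.Pose2(0, 0, 0), noise))

# Odometry factors: drive a square, one 2 m step and 90° turn per edge
odom = gtsam.Pose2(2.0, 0.0, np.pi / 2)
for i in range(4):
    graph.add(gtsam.BetweenFactorPose2(i, i + 1, odom, noise))

# Loop closure: pose 4 observes that it is back at pose 0
graph.add(gtsam.BetweenFactorPose2(4, 0, gtsam.Pose2(0, 0, 0), noise))

# Initial guess with accumulated drift
initial = gtsam.Values()
guesses = [(0, 0, 0), (2.1, 0.1, 1.6), (2.2, 2.1, 3.2),
           (0.1, 2.3, -1.5), (0.2, 0.3, 0.1)]
for i, (x, y, th) in enumerate(guesses):
    initial.insert(i, gtsam.Pose2(x, y, th))

# Optimize the pose graph; the loop-closure factor corrects the drift
result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()
print(result.atPose2(4))  # ends up close to the origin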

SLAM on NVIDIA Jetson

cuVSLAM

NVIDIA’s GPU-accelerated Visual SLAM library in Isaac ROS. It supports multi-camera setups (up to 32 cameras) with IMU fusion and reports strong accuracy and runtime results on the KITTI benchmark.

nvblox

Real-time 3D reconstruction using TSDF fusion. Supports multi-sensor input (3D LiDAR + up to 3 cameras). Builds meshes and occupancy grids for Nav2 integration.
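
The core of TSDF fusion is a per-voxel weighted running average of truncated signed distances to the observed surface. A toy single-voxel version of that update (the generic TSDF rule, not nvblox's actual code; the truncation distance is an assumed value):

import numpy as np

TRUNCATION = 0.10  # truncation distance in meters (assumed value)

def tsdf_update(tsdf, weight, signed_dist, new_weight=1.0):
    """Fuse one depth observation into a voxel's (tsdf, weight) pair."""
    d = np.clip(signed_dist, -TRUNCATION, TRUNCATION)  # truncate the distance
    fused = (tsdf * weight + d * new_weight) / (weight + new_weight)
    return fused, weight + new_weight

# Example: a voxel a few centimeters in front of the surface, seen twice
tsdf, w = 0.0, 0.0
tsdf, w = tsdf_update(tsdf, w, 0.03)
tsdf, w = tsdf_update(tsdf, w, 0.05)
print(tsdf, w)  # ≈ 0.04 after two observations

A marching-cubes pass over the zero crossing of the fused field is what turns the voxel grid into a mesh.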

Isaac ROS Visual SLAM Example

Terminal window
# Launch cuVSLAM with RealSense camera
ros2 launch isaac_ros_visual_slam isaac_ros_visual_slam_realsense.launch.py
# Visualize in RViz
ros2 launch isaac_ros_visual_slam isaac_ros_visual_slam_rviz.launch.py

Output topics:

  • /visual_slam/tracking/odometry — Robot pose
  • /visual_slam/vis/observations_cloud — Feature point cloud
  • /visual_slam/vis/landmarks_cloud — Map landmarks
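
A minimal rclpy node that consumes the pose output, assuming the odometry topic carries nav_msgs/Odometry messages:

import rclpy
from rclpy.node import Node
from nav_msgs.msg import Odometry

class PoseListener(Node):
    """Prints the estimated pose as it arrives from the SLAM node."""

    def __init__(self):
        super().__init__("pose_listener")
        self.create_subscription(
            Odometry, "/visual_slam/tracking/odometry", self.on_odom, 10)

    def on_odom(self, msg: Odometry):
        p = msg.pose.pose.position
        self.get_logger().info(f"x={p.x:.2f} y={p.y:.2f} z={p.z:.2f}")

def main():
    rclpy.init()
    rclpy.spin(PoseListener())

if __name__ == "__main__":
    main()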

Challenges

Evaluation Metrics

Metric                          | Description
ATE (Absolute Trajectory Error) | Global accuracy of the estimated trajectory
RPE (Relative Pose Error)       | Local drift over fixed intervals
Loop Closure Recall             | % of true loops detected
Map Consistency                 | How well the map aligns with itself
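
As an example, ATE is usually computed by rigidly aligning the estimated trajectory to ground truth (Horn/Kabsch alignment) and taking the RMSE of the remaining translation errors. A simplified NumPy sketch, assuming the two trajectories are already time-synchronized row by row:

import numpy as np

def ate_rmse(est, gt):
    """Absolute Trajectory Error (RMSE) after rigid alignment.

    est, gt: Nx3 arrays of estimated and ground-truth positions.
    """
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    P, Q = est - mu_e, gt - mu_g
    # Kabsch: closed-form rotation aligning the estimate to ground truth
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_g - R @ mu_e
    err = gt - (est @ R.T + t)
    return np.sqrt((err ** 2).sum(axis=1).mean())

Benchmark datasets such as KITTI provide the ground-truth trajectories needed for these metrics.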
