
SLAM

Deep Dive

SLAM (Simultaneous Localization and Mapping) is the computational problem of constructing a map of an unknown environment while simultaneously tracking the robot’s location within it. It’s fundamental to autonomous navigation.

The SLAM Problem

                   ┌─────────────┐
  Sensors ────────►│    SLAM     │────────► Map
  (camera,         │  Algorithm  │────────► Robot Pose (x, y, θ)
   LiDAR,          └─────────────┘
   IMU)                   ▲
                          │
            Odometry (wheel encoders, IMU)

Inputs

  • Sensor observations: What the robot sees (images, point clouds, depth)
  • Odometry: Motion estimates from wheels/IMU (often noisy)

Outputs

  • Map: Representation of the environment
  • Pose: Robot’s position and orientation in the map
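
In probabilistic terms, full SLAM estimates the joint posterior over the whole trajectory and the map given all observations and odometry. Using standard notation (assumed here, not defined elsewhere on this page) with poses x, map m, observations z, and odometry u:

p(x_{1:t}, m \mid z_{1:t}, u_{1:t})

Online SLAM keeps only the current pose and marginalizes out the past ones:

p(x_t, m \mid z_{1:t}, u_{1:t}) = \int \cdots \int p(x_{1:t}, m \mid z_{1:t}, u_{1:t}) \, dx_1 \cdots dx_{t-1}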

Types of SLAM

Visual SLAM

Uses cameras as the primary sensor.

Approaches:

  • Feature-based: Extract and track keypoints (ORB-SLAM, VINS)
  • Direct: Use raw pixel intensities (LSD-SLAM, DSO)
  • Deep learning: Learned features and depth (DROID-SLAM)

Pros: Rich information, low-cost sensors, works indoors/outdoors
Cons: Sensitive to lighting, texture-poor environments

NVIDIA Solution: Isaac ROS Visual SLAM (cuVSLAM)
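
As a concrete (if greatly simplified) illustration of the feature-based approach above, the sketch below extracts ORB keypoints from two frames and matches them with OpenCV. The image file names are placeholders, and this is not how cuVSLAM is implemented internally:

import cv2

# Load two consecutive grayscale frames (placeholder file names)
img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

# Detect ORB keypoints and compute binary descriptors
orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force Hamming matching with Lowe's ratio test
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

print(f"{len(good)} putative correspondences for frame-to-frame tracking")

These correspondences are what the tracking stage of the frontend (described below) consumes.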

Map Representations

Type           | Description                        | Use Case
Occupancy Grid | 2D/3D grid of occupied/free cells  | Navigation, path planning
Point Cloud    | Set of 3D points                   | 3D reconstruction, dense mapping
Feature Map    | Sparse 3D landmarks                | Visual localization
Mesh           | Triangulated surface               | Simulation, visualization
TSDF           | Truncated Signed Distance Function | Real-time 3D fusion
Neural         | Learned implicit representation    | NeRF-based mapping
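
To make the first row concrete, here is a toy occupancy grid in NumPy using the common log-odds update; the grid size, cell indices, and increments are illustrative values only:

import numpy as np

# 2D occupancy grid in log-odds form (0.0 means unknown, i.e. p = 0.5)
grid = np.zeros((100, 100))
L_OCC, L_FREE = 0.85, -0.4   # log-odds increments (assumed tuning values)

def update_cell(grid, i, j, hit):
    """Bayesian log-odds update for a single cell."""
    grid[i, j] += L_OCC if hit else L_FREE
    grid[i, j] = np.clip(grid[i, j], -5.0, 5.0)  # avoid saturation

def probability(grid):
    """Convert log-odds back to occupancy probabilities."""
    return 1.0 - 1.0 / (1.0 + np.exp(grid))

update_cell(grid, 50, 50, hit=True)    # cell where a range return landed
update_cell(grid, 50, 49, hit=False)   # cell the beam passed through
print(probability(grid)[50, 48:51])

Real implementations ray-trace every beam and update all traversed cells; the per-cell rule is the same.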

The SLAM Pipeline

1. FRONTEND (Real-time)

   ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
   │   Feature    │ ──► │   Tracking   │ ──► │  Local Map   │
   │  Extraction  │     │  (Frame-to-  │     │    Update    │
   │              │     │    Frame)    │     │              │
   └──────────────┘     └──────────────┘     └──────────────┘

2. BACKEND (Optimization)

   ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
   │    Loop      │ ──► │    Bundle    │ ──► │  Global Map  │
   │   Closure    │     │  Adjustment  │     │  Correction  │
   │  Detection   │     │   / Pose     │     │              │
   │              │     │    Graph     │     │              │
   └──────────────┘     └──────────────┘     └──────────────┘

Frontend

  • Runs at sensor rate (30+ Hz)
  • Extracts features, tracks motion
  • Builds local map incrementally
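
A minimal sketch of the tracking step, assuming matched pixel coordinates from two frames (for example the ORB matches shown earlier) and a known 3x3 intrinsic matrix K. OpenCV's five-point solver recovers the relative rotation and a translation direction; with a single camera the translation scale is unobservable:

import cv2
import numpy as np

def track_frame(pts_prev, pts_curr, K):
    """Estimate relative camera motion from 2D-2D correspondences.

    pts_prev, pts_curr: Nx2 float arrays of matched pixel coordinates.
    K: 3x3 camera intrinsic matrix.
    """
    # Essential matrix with RANSAC to reject outlier matches
    E, inliers = cv2.findEssentialMat(pts_prev, pts_curr, K,
                                      method=cv2.RANSAC, threshold=1.0)
    # Decompose into rotation R and unit-length translation t
    _, R, t, _ = cv2.recoverPose(E, pts_prev, pts_curr, K, mask=inliers)
    return R, t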

Backend

  • Runs asynchronously (1-10 Hz)
  • Detects loop closures (been here before?)
  • Optimizes full trajectory and map

Loop Closure

The key to drift-free SLAM:

Start ──► ──► ──► ──► ──► ──► ──┐
                                │   "I've been here!"
  ◄── ◄── ◄── ◄── Loop Closure ◄┘

When the robot recognizes a previously visited location, it can correct accumulated drift by adding a constraint in the pose graph.
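
A pose graph makes this concrete: odometry factors chain consecutive poses, and a single loop-closure factor between the last pose and the start pulls the drifted trajectory back into shape. The sketch below uses the GTSAM Python bindings purely for illustration (cuVSLAM has its own backend); all numbers are made up:

import gtsam
import numpy as np

graph = gtsam.NonlinearFactorGraph()
noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.2, 0.2, 0.1]))

# Anchor the first pose at the origin
graph.add(gtsam.PriorFactorPose2(0, gtsam.Pose2(0, 0, 0), noise))

# Odometry factors: drive a square, one 2 m step and 90° turn per edge
odom = gtsam.Pose2(2.0, 0.0, np.pi / 2)
for i in range(4):
    graph.add(gtsam.BetweenFactorPose2(i, i + 1, odom, noise))

# Loop closure: pose 4 observes that it is back at pose 0
graph.add(gtsam.BetweenFactorPose2(4, 0, gtsam.Pose2(0, 0, 0), noise))

# Initial guess with accumulated drift
initial = gtsam.Values()
guesses = [(0, 0, 0), (2.1, 0.1, 1.6), (2.2, 2.1, 3.2),
           (0.1, 2.3, -1.5), (0.2, 0.3, 0.1)]
for i, (x, y, th) in enumerate(guesses):
    initial.insert(i, gtsam.Pose2(x, y, th))

# Optimize the pose graph; the loop-closure factor corrects the drift
result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()
print(result.atPose2(4))  # ends up close to the origin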

SLAM on NVIDIA Jetson

cuVSLAM

NVIDIA’s GPU-accelerated Visual SLAM library in Isaac ROS. It supports multi-camera setups (up to 32 cameras) with IMU fusion and reports strong accuracy and runtime results on the KITTI benchmark.

nvblox

Real-time 3D reconstruction using TSDF fusion. Supports multi-sensor input (3D LiDAR + up to 3 cameras). Builds meshes and occupancy grids for Nav2 integration.
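
The core of TSDF fusion is a per-voxel weighted running average of truncated signed distances to the observed surface. A toy single-voxel version of that update (the generic TSDF rule, not nvblox's actual code; the truncation distance is an assumed value):

import numpy as np

TRUNCATION = 0.10  # truncation distance in meters (assumed value)

def tsdf_update(tsdf, weight, signed_dist, new_weight=1.0):
    """Fuse one depth observation into a voxel's (tsdf, weight) pair."""
    d = np.clip(signed_dist, -TRUNCATION, TRUNCATION)  # truncate the distance
    fused = (tsdf * weight + d * new_weight) / (weight + new_weight)
    return fused, weight + new_weight

# Example: a voxel a few centimeters in front of the surface, seen twice
tsdf, w = 0.0, 0.0
tsdf, w = tsdf_update(tsdf, w, 0.03)
tsdf, w = tsdf_update(tsdf, w, 0.05)
print(tsdf, w)  # ≈ 0.04 after two observations

A marching-cubes pass over the zero crossing of the fused field is what turns the voxel grid into a mesh.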

Isaac ROS Visual SLAM Example

Terminal window
# Launch cuVSLAM with RealSense camera
ros2 launch isaac_ros_visual_slam isaac_ros_visual_slam_realsense.launch.py
# Visualize in RViz
ros2 launch isaac_ros_visual_slam isaac_ros_visual_slam_rviz.launch.py

Output topics:

  • /visual_slam/tracking/odometry — Robot pose
  • /visual_slam/vis/observations_cloud — Feature point cloud
  • /visual_slam/vis/landmarks_cloud — Map landmarks
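
A minimal rclpy node that consumes the pose output, assuming the odometry topic carries nav_msgs/Odometry messages:

import rclpy
from rclpy.node import Node
from nav_msgs.msg import Odometry

class PoseListener(Node):
    """Prints the estimated pose as it arrives from the SLAM node."""

    def __init__(self):
        super().__init__("pose_listener")
        self.create_subscription(
            Odometry, "/visual_slam/tracking/odometry", self.on_odom, 10)

    def on_odom(self, msg: Odometry):
        p = msg.pose.pose.position
        self.get_logger().info(f"x={p.x:.2f} y={p.y:.2f} z={p.z:.2f}")

def main():
    rclpy.init()
    rclpy.spin(PoseListener())

if __name__ == "__main__":
    main()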

Challenges

Evaluation Metrics

Metric                          | Description
ATE (Absolute Trajectory Error) | Global accuracy of the estimated trajectory
RPE (Relative Pose Error)       | Local drift over fixed intervals
Loop Closure Recall             | % of true loops detected
Map Consistency                 | How well the map aligns with itself
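
As an example, ATE is usually computed by rigidly aligning the estimated trajectory to ground truth (Horn/Kabsch alignment) and taking the RMSE of the remaining translation errors. A simplified NumPy sketch, assuming the two trajectories are already time-synchronized row by row:

import numpy as np

def ate_rmse(est, gt):
    """Absolute Trajectory Error (RMSE) after rigid alignment.

    est, gt: Nx3 arrays of estimated and ground-truth positions.
    """
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    P, Q = est - mu_e, gt - mu_g
    # Kabsch: closed-form rotation aligning the estimate to ground truth
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_g - R @ mu_e
    err = gt - (est @ R.T + t)
    return np.sqrt((err ** 2).sum(axis=1).mean())

Benchmark datasets such as KITTI provide the ground-truth trajectories needed for these metrics.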
