
Jetson Thor


Jetson Thor is NVIDIA’s most powerful edge AI computing platform, purpose-built for humanoid robots, autonomous vehicles, and advanced robotics requiring datacenter-class AI at the edge. Announced at GTC 2024, Thor delivers up to 2,070 FP4 TFLOPS with native Transformer Engine support.

Why Thor?

Thor represents a generational leap designed specifically for the foundation model era:

  • Transformer Engine: Native FP4/FP8 support for running large vision-language-action (VLA) models
  • Unified memory: Up to 128GB shared between CPU and GPU
  • Multi-modal AI: Run perception, planning, and control models simultaneously (see the sketch after this list)
  • Humanoid-ready: Designed for the compute demands of next-gen robots
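
Concurrent multi-model execution maps naturally onto CUDA streams. Below is a minimal sketch of the pattern, assuming PyTorch on the device; perception and planner are toy stand-ins for illustration, not shipped networks:

import torch

# Hypothetical stand-ins for perception and planning networks
perception = torch.nn.Conv2d(3, 64, 3).cuda().half()
planner = torch.nn.Linear(1024, 256).cuda().half()
cams = torch.randn(6, 3, 224, 224, device='cuda', dtype=torch.float16)
state = torch.randn(64, 1024, device='cuda', dtype=torch.float16)

# Issue each model on its own CUDA stream so the GPU can overlap them
s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()
with torch.no_grad():
    with torch.cuda.stream(s1):
        features = perception(cams)
    with torch.cuda.stream(s2):
        plan = planner(state)
torch.cuda.synchronize()  # wait for both streams before consuming results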

Specifications

| Spec | T5000 | T4000 |
| --- | --- | --- |
| AI Performance | 2,070 FP4 TFLOPS (sparse) | 1,200 FP4 TFLOPS |
| GPU Architecture | NVIDIA Blackwell | NVIDIA Blackwell |
| GPU Cores | 2,560 CUDA, 96 Tensor (5th gen) | 1,536 CUDA, 64 Tensor |
| Transformer Engine | Yes (FP4/FP8) | Yes (FP4/FP8) |
| CPU | 14-core Arm Neoverse V3AE @ 2.6 GHz | 12-core Arm Neoverse V3AE |
| Memory | 128GB LPDDR5X unified | 64GB LPDDR5X unified |
| Memory Bandwidth | 273 GB/s | 273 GB/s |
| Power | 40W - 130W (configurable) | 40W - 70W |
| Process | 4nm | 4nm |
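
On a flashed board, the module's shape is visible through standard CUDA queries. A minimal sketch, assuming the JetPack build of PyTorch; exact names and numbers depend on the release:

import torch

# Report what the CUDA runtime sees on a flashed Thor module
props = torch.cuda.get_device_properties(0)
print(f"GPU:                {props.name}")
print(f"SM count:           {props.multi_processor_count}")
print(f"Unified memory:     {props.total_memory / 2**30:.0f} GiB")
print(f"Compute capability: {props.major}.{props.minor}")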

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                     Jetson Thor SoC (T5000)                      │
├──────────────────────────────────────────────────────────────────┤
│ ┌────────────────────┐  ┌──────────────────────────────────────┐ │
│ │      Arm CPU       │  │         NVIDIA Blackwell GPU         │ │
│ │   Neoverse V3AE    │  │  ┌──────────┐   ┌───────────────┐    │ │
│ │ 14 cores @ 2.6 GHz │  │  │   CUDA   │   │  Transformer  │    │ │
│ │                    │  │  │  Cores   │   │    Engine     │    │ │
│ └────────────────────┘  │  │  2,560   │   │    FP4/FP8    │    │ │
│                         │  └──────────┘   └───────────────┘    │ │
│ ┌────────────────────┐  │  ┌──────────┐   ┌───────────────┐    │ │
│ │   Safety Island    │  │  │  Tensor  │   │   RT Cores    │    │ │
│ │   Lockstep cores   │  │  │  Cores   │   │  Ray tracing  │    │ │
│ │   ASIL-D capable   │  │  │ 96 (5th) │   │               │    │ │
│ └────────────────────┘  └──┴──────────┴───┴───────────────┴────┘ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │           128GB LPDDR5X Unified Memory (273 GB/s)            │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ┌─────────────┐  ┌─────────────┐  ┌────────────────────────────┐ │
│ │  NVDLA v3   │  │   PVA v3    │  │    Video: 8K60 decode      │ │
│ │    (2x)     │  │    (2x)     │  │           4K120 encode     │ │
│ └─────────────┘  └─────────────┘  └────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘

Key Innovations

  • Blackwell GPU: Latest architecture with 5th-gen Tensor Cores
  • Transformer Engine: Hardware-accelerated FP4/FP8 with dynamic switching for LLM/VLA inference
  • Safety Island: Dedicated lockstep cores for functional safety (ASIL-D)
  • Unified Memory: CPU and GPU share a single pool of up to 128GB, so data moves between them without copies (see the sketch after this list)
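
The zero-copy claim is straightforward to demonstrate with CUDA managed memory. A minimal sketch, assuming Numba's CUDA support is installed:

import numpy as np
from numba import cuda

# One allocation in the shared LPDDR5X pool: host writes, device updates in place
buf = cuda.managed_array(1_000_000, dtype=np.float32)
buf[:] = np.arange(1_000_000, dtype=np.float32)  # written by the CPU

@cuda.jit
def scale(a, factor):
    i = cuda.grid(1)
    if i < a.size:
        a[i] *= factor

scale[4096, 256](buf, np.float32(2.0))  # GPU kernel, no cudaMemcpy anywhere
cuda.synchronize()
print(buf[:4])  # CPU reads the GPU's results from the same buffer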

Target Applications

Humanoid Robots

Full-body control, real-time VLA models, multi-camera perception at 800+ TOPS

Autonomous Vehicles

Level 4/5 autonomy with functional safety, sensor fusion, redundant compute

Industrial Manipulation

High-DOF arms, force feedback, real-time path planning with foundation models

Medical Robotics

Surgical assistance, diagnostic AI, safety-critical applications

Software Stack

┌───────────────────────────────────────────────────────────┐
│                      Your Application                     │
├───────────────────────────────────────────────────────────┤
│  Isaac Lab │ Isaac ROS 4.0 │ Omniverse │ cuMotion │ OSMO  │
├───────────────────────────────────────────────────────────┤
│  TensorRT 10.13 │ cuDNN 9.12 │ CUDA 13.0 │ Triton Server  │
├───────────────────────────────────────────────────────────┤
│                      JetPack 7.1 SDK                      │
├───────────────────────────────────────────────────────────┤
│          Linux Kernel 6.8 LTS + Ubuntu 24.04 LTS          │
└───────────────────────────────────────────────────────────┘
# JetPack 7.1 - Latest Thor SDK (Jetson Linux 38.4)
# Ubuntu 24.04, CUDA 13.0, TensorRT 10.13
# Transformer Engine support included
# Flash Thor developer kit
sudo ./flash.sh jetson-thor-devkit internal
# Install full SDK
sudo apt update
sudo apt install nvidia-jetpack
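
With the stack installed, models typically deploy through the TensorRT layer of the diagram above. The following is a sketch of compiling an exported model with the TensorRT Python API, not a verified recipe: policy.onnx is a hypothetical export, and the precision flag is chosen per model.

import tensorrt as trt

# Build a serialized TensorRT engine from a hypothetical ONNX policy export
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network()
parser = trt.OnnxParser(network, logger)
with open("policy.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # or a lower precision where the model supports it
engine_bytes = builder.build_serialized_network(network, config)
with open("policy.engine", "wb") as f:
    f.write(engine_bytes)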

Transformer Engine for Robotics

Thor’s Transformer Engine enables running foundation models at the edge:

import torch
import transformer_engine.pytorch as te

# GR00T-style VLA policy built from Transformer Engine layers
class RobotPolicy(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # TE modules run in FP8 when the forward pass is wrapped in te.fp8_autocast
        self.vision_encoder = te.Linear(768, 1024)
        self.transformer = te.TransformerLayer(
            hidden_size=1024,
            ffn_hidden_size=4096,
            num_attention_heads=16,
        )
        self.action_head = te.Linear(1024, 32)  # Joint commands

    def forward(self, images, proprioception):
        x = self.vision_encoder(images)
        x = self.transformer(x)
        return self.action_head(x)
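
FP8 execution is opt-in at run time: wrap the forward pass in te.fp8_autocast. A minimal usage sketch, assuming images arrive as pre-encoded 768-dim tokens (proprioception fusion is omitted, as in the model above):

policy = RobotPolicy().cuda()
tokens = torch.randn(64, 1, 768, device='cuda')  # (seq, batch, feature)
with torch.no_grad(), te.fp8_autocast(enabled=True):
    actions = policy(tokens, None)
print(actions.shape)  # torch.Size([64, 1, 32])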

Thor vs Orin Comparison

| Aspect | Jetson Thor (T5000) | Jetson AGX Orin |
| --- | --- | --- |
| AI Performance | 2,070 FP4 TFLOPS | 275 TOPS |
| GPU Architecture | Blackwell | Ampere |
| Transformer Engine | Yes (FP4/FP8) | No |
| Max Memory | 128GB | 64GB |
| Memory Bandwidth | 273 GB/s | 204 GB/s |
| Power Range | 40-130W | 15-60W |
| Foundation Models | Native support | Limited |
| Target | Humanoids, L4/5 AV | AMRs, drones, industrial |

Development Workflow

  1. Simulate in Isaac Sim: Train and validate with Omniverse digital twin
  2. Develop on DGX Spark: Use desktop supercomputer for model development
  3. Deploy to Thor: Seamless transition with JetPack 7.1 compatibility
  4. Scale with OSMO: Orchestrate fleets across edge and cloud

Getting Started

1. Order Developer Kit

Thor developer kits are available through NVIDIA partners. Each kit includes:

  • Jetson Thor module (128GB)
  • Developer carrier board
  • Power supply (200W)
  • Cooling solution

2. Flash and Setup

# Download JetPack 7.1 from NVIDIA
# Use SDK Manager or command line
sudo ./flash.sh jetson-thor-devkit internal
# After boot, verify
tegrastats
nvidia-smi

3. Run Benchmark

import time

import torch
import transformer_engine.pytorch as te

# Verify Transformer Engine FP8 execution with a simple GEMM benchmark.
# (torch.cuda.amp.autocast does not accept FP8 dtypes; FP8 runs through
# Transformer Engine modules inside te.fp8_autocast instead.)
device = torch.device('cuda')
layer = te.Linear(4096, 4096, params_dtype=torch.float16).to(device)
x = torch.randn(32, 1024, 4096, device=device, dtype=torch.float16)

with torch.no_grad(), te.fp8_autocast(enabled=True):
    for _ in range(10):  # warm-up iterations, excluded from timing
        layer(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(100):
        y = layer(x)
    torch.cuda.synchronize()
print(f"FP8 throughput: {100 / (time.time() - start):.1f} iter/s")
