Tutorial: policy replay¶
This is the original replay path in the repo: record a run, then replay the recorded joint trajectory back in sim.
Looking for the layered LeRobot workflow?
This page is the original umbrella tutorial for record → replay in sim. For the three-step workflow that covers exporting to LeRobot datasets, driving a pre-trained checkpoint in sim, and handing a sim-validated skill off to real hardware, start here:
- LeRobot Export — record an episode and export it into a LeRobot v3 dataset; inspect the parquet metadata with standard tooling.
- LeRobot Policy Replay — run a public ACT checkpoint through `LeRobotPolicyAdapter` + `run_policy` on a non-bundled robot.
- Sim-to-Real Handoff — what carries over to real hardware, what doesn't, and a concrete SO-101 backend skeleton.
The page below stays useful for the `ReplayTrajectoryPolicy` path: given an `events.jsonl` from `LocalRecorder`, it open-loop replays the recorded joint trajectory. If you want the simplest possible policy path, this is it: no training, no checkpoint loading, just replaying the trajectory you already recorded.
1. Record an episode¶
Recording with `LocalRecorder` writes a run directory such as `runs/20260418-094533-1a2b3c4d/` containing `events.jsonl`. See recording & export for the `events.jsonl` schema.
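The `events.jsonl` file is line-delimited JSON, one event per line in recording order. A minimal sketch of writing and reading such a trace — the field names `t`, `robot_joints`, and `gripper` are assumptions here; the authoritative schema is in the recording & export docs:

```python
import json
import tempfile
from pathlib import Path

# Two illustrative events; field names are assumptions, not the real schema.
events_in = [
    {"t": 0.000, "robot_joints": [0.0, -0.5, 0.3], "gripper": 0.0},
    {"t": 0.005, "robot_joints": [0.0, -0.49, 0.31], "gripper": 0.0},
]

run_dir = Path(tempfile.mkdtemp())
with open(run_dir / "events.jsonl", "w") as f:
    for event in events_in:
        f.write(json.dumps(event) + "\n")

# Reading it back: one JSON object per line, in recording order.
events = [json.loads(line)
          for line in (run_dir / "events.jsonl").read_text().splitlines()]
print(len(events))  # 2
```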
2. (Optional) Export to LeRobot v3¶
uv pip install -e 'packages/robosandbox-core[lerobot]'
robo-sandbox export-lerobot \
runs/20260418-094533-1a2b3c4d \
/tmp/my_dataset
This writes a LeRobot v3 dataset at `/tmp/my_dataset/` with `data/chunk-000/episode_000000.parquet` + `meta/` + `videos/`. Pass this to any LeRobot-compatible training loop.
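If you want to sanity-check the export before handing it to a training loop, a quick layout check over the directories listed above is enough. `validate_layout` below is a hypothetical helper, not part of robosandbox; the temp-dir setup just simulates an exported dataset for illustration:

```python
import tempfile
from pathlib import Path

# Directories this command is documented to produce.
EXPECTED = ["data/chunk-000", "meta", "videos"]

def validate_layout(root: Path) -> bool:
    """Return True if the LeRobot v3 top-level layout is present."""
    return all((root / sub).is_dir() for sub in EXPECTED)

# Simulate the exported dataset in a temp dir for illustration.
root = Path(tempfile.mkdtemp())
for sub in EXPECTED:
    (root / sub).mkdir(parents=True)
(root / "data/chunk-000/episode_000000.parquet").touch()

print(validate_layout(root))  # True
```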
3. Replay the trajectory¶
The bundled `ReplayTrajectoryPolicy` treats `events.jsonl` as an open-loop action trace and drives the sim through it tick by tick.
From the CLI:
robo-sandbox run --policy runs/20260418-094533-1a2b3c4d \
--task pick_cube_franka \
--max-steps 1000
What happens under the hood:
1. `load_policy(path)` inspects the directory. An `events.jsonl` present → wraps in `ReplayTrajectoryPolicy`.
2. The task's scene is loaded into `MuJoCoBackend` and settled under gravity.
3. `run_policy(sim, policy, max_steps, success=task.success)` loops observe → act → step.
4. The task's success criterion runs against the final observation and is printed at the end.
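The steps above can be sketched as a plain loop. The stub classes stand in for `MuJoCoBackend` and a real policy, and `run_policy_sketch` paraphrases the documented behavior — it is not robosandbox's actual implementation:

```python
import numpy as np

class StubSim:
    """Stand-in for MuJoCoBackend: holds joint state, steps on an action."""
    def __init__(self, n_dof=7):
        self.joints = np.zeros(n_dof)

    def observe(self):
        return {"robot_joints": self.joints.copy()}

    def step(self, action):
        self.joints = action[:-1]  # last entry is the gripper command

class ZeroPolicy:
    """Trivial policy: command all joints (and gripper) to zero."""
    def act(self, obs):
        return np.zeros(len(obs["robot_joints"]) + 1)

def run_policy_sketch(sim, policy, max_steps, success=None):
    obs = sim.observe()
    for _ in range(max_steps):
        sim.step(policy.act(obs))  # observe -> act -> step
        obs = sim.observe()
    verdict = success(obs) if success else None
    return {"success": verdict, "steps": max_steps, "final_obs": obs}

result = run_policy_sketch(
    StubSim(), ZeroPolicy(), max_steps=10,
    success=lambda obs: bool(np.all(obs["robot_joints"] == 0)))
print(result["success"], result["steps"])  # True 10
```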
Example output:
[run --policy] task: pick_cube_franka
[run --policy] policy: runs/20260418-094533-1a2b3c4d
[run --policy] verdict: success
[run --policy] steps: 1000
[run --policy] final_reason: policy_completed_1000_steps
[run --policy] wall: 18.3s
4. Wire your own policy¶
Anything with `act(obs: Observation) -> np.ndarray` of shape `(n_dof + 1,)` (joints + gripper in `[0, 1]`) satisfies the `Policy` protocol:
import numpy as np

from robosandbox.policy import Policy, run_policy

class MyAwesomePolicy:
    def __init__(self, checkpoint: str):
        self._model = load_my_model(checkpoint)

    def act(self, obs):
        joints, gripper = self._model.infer(obs.rgb, obs.robot_joints)
        return np.concatenate([joints, [gripper]])

result = run_policy(sim, MyAwesomePolicy("ckpt.pt"),
                    max_steps=1000, success=task.success)
# {"success": True, "steps": 1000, "initial_obs": ..., "final_obs": ...}
If you want the CLI to understand your own checkpoint directory, extend
robosandbox.policy.load_policy to dispatch on your checkpoint
format (LeRobot, torchscript, onnx, whatever):
# in your own package
from pathlib import Path

from robosandbox.policy import load_policy as _core_load_policy

def load_policy(path):
    p = Path(path)
    if (p / "config.json").exists():
        return MyAwesomePolicy(p)
    return _core_load_policy(p)  # fall through to replay
policy.json alternative¶
If a directory does not auto-match, add a `policy.json`:
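A minimal sketch of what such a file might contain — `action_lookahead` is the only key documented on this page, so treat this fragment as illustrative rather than a complete schema:

```json
{
  "action_lookahead": 2
}
```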
`action_lookahead > 1` skips ahead that many recorded rows per `act()` call — useful to replay a 200 Hz recording at 100 Hz.
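How a lookahead of 2 halves the replay rate can be shown with a plain list standing in for the recorded rows — this slicing is a paraphrase of the documented behavior, not robosandbox's code:

```python
# Ten recorded actions; advance by `action_lookahead` rows per act() call.
trajectory = list(range(10))
action_lookahead = 2

replayed = trajectory[::action_lookahead]
print(replayed)  # [0, 2, 4, 6, 8]
```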
Action semantics¶
`Policy.act(obs)` returns a flat `(n_dof + 1,)` vector:

- first `n_dof` entries — target joint positions
- last entry — gripper in `[0, 1]` (0 = open, 1 = closed)

This matches `MuJoCoBackend.step(target_joints=..., gripper=...)`. Values outside this range are clamped by the sim, not the policy.
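Assembling that flat action vector, with `np.clip` mirroring the clamping the sim applies (the policy itself is free to emit out-of-range values):

```python
import numpy as np

n_dof = 7
target_joints = np.zeros(n_dof)
gripper = 1.4  # out of range on purpose

action = np.concatenate([target_joints, [gripper]])
assert action.shape == (n_dof + 1,)

# The sim, not the policy, clamps the gripper command into [0, 1].
clamped_gripper = float(np.clip(action[-1], 0.0, 1.0))
print(clamped_gripper)  # 1.0
```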
Tips¶
- `verdict: unknown` in the CLI means the task didn't declare a success criterion. That's fine for free-form exploratory runs.
- Policy runs forever — `ReplayTrajectoryPolicy` repeats its last action after the trajectory ends. Use `--max-steps` to cap it.
- Sim lag — `run_policy` does one sim step per `act()` call. At a 200 Hz sim timestep, 1000 steps = 5 sim seconds.
See also¶
- Recording & export — `LocalRecorder` layout + `events.jsonl` schema.
- Real-robot bridge — the same `Policy` protocol runs against `RealRobotBackend`.
- CLI: `robo-sandbox run --policy`.