# Skills & agents
Most of the control flow in RoboSandbox lives in three abstractions:
- **Skills** — the action vocabulary (`pick`, `place_on`, …). Each is a callable with a JSON schema.
- **Planner** — turns a natural-language task + current observation into a list of `SkillCall`s.
- **Agent** — ReAct loop: plan → execute → (on failure) replan.
## Skill protocol
```python
class Skill(Protocol):
    name: str
    description: str
    parameters_schema: dict  # JSON schema

    def __call__(self, ctx, **kwargs) -> SkillResult: ...
```
There is no base-class requirement here: any object matching this shape works. The `VLMPlanner` converts `parameters_schema` directly into an OpenAI tool definition; the stub planner dispatches by name.
`ctx` is an `AgentContext` carrying `sim`, `perception`, `grasp`, `motion`, and (optionally) `recorder`. Skills do I/O through `ctx`.
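To make the protocol concrete, here is a hedged sketch of a toy skill that matches the shape above. The `SkillResult` fields follow the `(success, reason, reason_detail, artifacts)` signature described later on this page, but the dataclass itself, the `Wave` skill, and the `ctx.scene_objects` attribute are illustrative stand-ins, not the real RoboSandbox classes:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SkillResult:
    """Sketch of the result record described on this page."""
    success: bool
    reason: str = ""
    reason_detail: str = ""
    artifacts: dict[str, Any] = field(default_factory=dict)


class Wave:
    """Toy skill: no base class, just the right attributes."""

    name = "wave"
    description = "Wave the gripper above an object."
    parameters_schema = {
        "type": "object",
        "properties": {"object": {"type": "string"}},
        "required": ["object"],
    }

    def __call__(self, ctx, **kwargs) -> SkillResult:
        obj = kwargs["object"]
        # `ctx.scene_objects` is an assumed attribute for this sketch.
        if obj not in ctx.scene_objects:
            return SkillResult(False, reason="not_found", reason_detail=obj)
        return SkillResult(True, artifacts={"waved_at": obj})
```

Because dispatch is duck-typed, a plain class like this can be handed to the agent alongside the built-in skills.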
## Skills shipped in v0.1
| Skill | Signature | What it does |
|---|---|---|
| `pick` | `pick(object: str)` | Locate object, plan top-down grasp, execute, close gripper. |
| `place_on` | `place_on(target: str)` | Move above target, release. |
| `push` | `push(object: str, direction: str)` | Cartesian push in forward / back / left / right. |
| `home` | `home()` | Return to the home pose from the robot sidecar. |
| `pour` | `pour(target: str)` | Tilt end-effector over target. |
| `tap` | `tap(object: str)` | Touch the top of an object with the fingertip. |
| `open_drawer` | `open_drawer(drawer: str)` | Grasp handle, pull toward base. |
| `close_drawer` | `close_drawer(drawer: str)` | Push drawer back in. |
| `stack` | `stack(object: str, target: str)` | Pick + place in one. |
Every skill returns `SkillResult(success, reason, reason_detail, artifacts)`. Failure reasons are short structured strings such as `unreachable`, `not_found`, or `missed_grasp`, so the replan loop has something useful to work with.
Skills register at the `robosandbox.skills` entry point — see `packages/robosandbox-core/pyproject.toml`. A plugin package can add new skills without core changes.
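As a sketch of what a plugin's registration might look like — the entry-point group name comes from this page, while the package and class names below are hypothetical:

```toml
# Hypothetical plugin pyproject.toml fragment (PEP 621 entry points).
[project.entry-points."robosandbox.skills"]
wave = "my_plugin.skills:Wave"
```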
## Planner protocol
```python
class Planner(Protocol):
    def plan(
        self,
        task: str,
        obs: Observation,
        prior_attempts: list[dict],
    ) -> tuple[list[SkillCall], int]:
        """Returns (plan, n_model_calls). Empty plan == 'already done'."""
```
Two implementations ship with core:
### VLMPlanner
Calls an OpenAI-compatible chat endpoint with tool calling + image input. Converts each skill's `parameters_schema` into a tool definition, and passes an RGB frame of the current observation plus a text summary of `scene_objects` keys/xyz.

Used by `robo-sandbox run --vlm-provider {openai,ollama,custom}` and the `llm_guided.py` example.

If the model responds with prose instead of tool calls, `VLMPlanner` retries once with a "tool calls only" nudge.
### StubPlanner
This is the deterministic, zero-dependency planner. It is just enough to exercise the agent loop without involving a model. It handles:
- `pick (up) the <obj>`
- `pick (up) the <obj> (and|then|,) (put|place) (it) on (the) <obj2>`
- `stack <obj> on (top of) <obj2>`
- `push the <obj> forward|back|left|right`
- `pour <obj> into <obj2>`
- `tap/press/poke/touch the <obj>`
- `open/close the <drawer>`
- `(go) home`
Object names are fuzzy-matched against scene object IDs (exact → substring → word-overlap).
Used by the browser viewer (no API key required), the benchmark runner, and tests. If your prompt hits the pattern grammar, it works immediately; for anything outside that grammar, switch to `VLMPlanner`.
## Agent loop
```
IDLE → PLAN → EXECUTE (one skill at a time) → EVALUATE
                          success │               │ failure
                                  ▼               ▼
                            next in plan        REPLAN ─► (max N times)
                                  │               │
                                  ▼               ▼
                                DONE            FAILED
```
```python
from robosandbox.agent.agent import Agent
from robosandbox.agent.context import AgentContext
from robosandbox.agent.planner import StubPlanner
from robosandbox.skills.pick import Pick
from robosandbox.skills.home import Home

skills = [Pick(), Home()]
agent = Agent(ctx=ctx, skills=skills, planner=StubPlanner(skills))

episode = agent.run("pick up the red cube")
# episode.success, episode.steps, episode.replans, episode.plan, ...
```
The replan behaviour is:

- On any `SkillResult(success=False)`, the agent collects the failure `(step_idx, skill, args, reason, reason_detail)` into `prior_attempts`.
- The planner is called again with `prior_attempts` so a VLM can avoid repeating the same move. The stub planner ignores it (it's deterministic).
- The loop terminates on `replans >= max_replans` with `final_reason="replan_exhausted"`.
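The loop above can be sketched schematically. This is not the real `Agent` code — the `run` signature, the `skills_by_name` dict, and the string return values are stand-ins — but it shows how `prior_attempts` and the replan budget interact:

```python
def run(task, planner, skills_by_name, ctx, obs, max_replans=3):
    """Schematic plan → execute → replan loop (illustrative only)."""
    prior_attempts = []
    replans = 0
    while True:
        plan, _n_model_calls = planner.plan(task, obs, prior_attempts)
        if not plan:                     # empty plan == "already done"
            return "done"
        for step_idx, call in enumerate(plan):
            result = skills_by_name[call.name](ctx, **call.args)
            if not result.success:
                # Record the failure so the next plan() call can see it.
                prior_attempts.append({
                    "step_idx": step_idx,
                    "skill": call.name,
                    "args": call.args,
                    "reason": result.reason,
                    "reason_detail": result.reason_detail,
                })
                break                    # stop executing, go replan
        else:
            return "done"                # every step succeeded
        replans += 1
        if replans >= max_replans:
            return "replan_exhausted"
```

The key point is that execution is per-skill: one failed `SkillResult` aborts the remainder of the current plan rather than blindly executing the rest.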
## Composing your own
Custom planner: any object with a `plan(task, obs, prior_attempts) -> (list[SkillCall], int)` method. See `examples/custom_skill.py` for a `TapStub` that emits one skill call unconditionally.
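A minimal planner in that spirit might look like the sketch below. The `SkillCall` record is sketched here as a plain dataclass (the real class ships with robosandbox), and the planner name and hard-coded arguments are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class SkillCall:
    """Sketch of a planned skill invocation."""
    name: str
    args: dict = field(default_factory=dict)


class OneShotPlanner:
    """Emits a single skill call unconditionally, TapStub-style."""

    def plan(self, task, obs, prior_attempts):
        if prior_attempts:
            # Already failed once: return an empty plan and give up.
            return [], 0
        # Second tuple element is the model-call count: 0, no model used.
        return [SkillCall("tap", {"object": "red_cube"})], 0
```

No registration is needed for planners — you pass the instance straight to `Agent(..., planner=OneShotPlanner())`.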
Custom skill: see the tutorial; its `Tap` example is under 40 lines.
## Related
- Perception & grasping — how skills find objects and compute gripper poses.
- Recording & export — what the agent writes to disk during a run.
- CLI reference — `robo-sandbox run` flags.