Tutorial: custom skill¶
Skills are any object matching the Skill protocol. No base class, no
registry required — just match the shape, pass an instance to
Agent(skills=[...]).
The protocol¶
class Skill(Protocol):
name: str
description: str
parameters_schema: dict # JSON schema for arguments
def __call__(self, ctx, **kwargs): ... # returns SkillResult
name— lowercase identifier; matches what a planner emits.description— one-line string the VLM sees in its tool definition.parameters_schema— standard JSON schema. Converted directly to an OpenAI tool definition.__call__— receivesctx: AgentContextplus keyword args parsed from the planner'sSkillCall.arguments. ReturnsSkillResult.
Tap skill — minimal working example¶
from robosandbox.agent.context import AgentContext
from robosandbox.motion.ik import UnreachableError
from robosandbox.skills._common import execute_trajectory
from robosandbox.types import Pose, SkillResult
class Tap:
"""Touch the top of an object with the end-effector, no gripper."""
name = "tap"
description = "Tap the named object by touching its top with the end-effector."
parameters_schema = {
"type": "object",
"properties": {
"object": {"type": "string", "description": "Scene object id to tap."},
},
"required": ["object"],
}
def __call__(self, ctx: AgentContext, object: str) -> SkillResult:
obs = ctx.sim.observe()
target_pose = obs.scene_objects.get(object)
if target_pose is None:
return SkillResult(False, reason="not_found", reason_detail=object)
tx, ty, tz = target_pose.xyz
above = Pose(xyz=(tx, ty, tz + 0.12), quat_xyzw=(1.0, 0.0, 0.0, 0.0))
contact = Pose(xyz=(tx, ty, tz + 0.025), quat_xyzw=(1.0, 0.0, 0.0, 0.0))
try:
for pose in (above, contact, above):
now = ctx.sim.observe()
traj = ctx.motion.plan(
ctx.sim,
start_joints=now.robot_joints,
target_pose=pose,
constraints={"orientation": "z_down"},
)
execute_trajectory(ctx, traj, gripper=0.0)
except UnreachableError as e:
return SkillResult(False, reason="unreachable", reason_detail=str(e))
return SkillResult(True, reason="tapped", artifacts={"object": object})
Routing through a planner¶
Via StubPlanner¶
StubPlanner already recognises "tap/press/poke/touch the <obj>"
and emits SkillCall("tap", {"object": <obj>}) when a tap skill is
in the skill list.
from robosandbox.agent.agent import Agent
from robosandbox.agent.planner import StubPlanner
from robosandbox.skills.home import Home
skills = [Tap(), Home()]
agent = Agent(ctx=ctx, skills=skills, planner=StubPlanner(skills))
episode = agent.run("tap the apple")
Via a tiny custom planner¶
For more control — or to test the skill in isolation — any object
with a plan(task, obs, prior_attempts) -> (list[SkillCall], int)
method is a valid planner:
from robosandbox.agent.planner import SkillCall
class TapStub:
def plan(self, task, obs, prior_attempts):
return ([SkillCall(name="tap", arguments={"object": "apple"})], 0)
agent = Agent(ctx=ctx, skills=[Tap(), Home()], planner=TapStub())
agent.run("anything — ignored")
Via VLMPlanner¶
Drop the Tap() instance into the skill list — VLMPlanner converts
parameters_schema to an OpenAI tool definition automatically.
Description text matters here: it's what the model reads.
Registering as a plugin¶
If you want your skill available to downstream users without them
copying code, ship it in a package with a
robosandbox.skills entry point in pyproject.toml:
Core registers its built-ins the same way — see
packages/robosandbox-core/pyproject.toml.
Error handling conventions¶
SkillResult.reason is a structured string the replan loop reads:
reason |
When to raise |
|---|---|
not_found |
Object isn't in the scene / perception returned nothing. |
unreachable |
Motion planner couldn't IK the target. |
missed_grasp |
Gripper closed but object is no longer held. |
bad_arguments |
(Emitted by Agent on TypeError.) |
unknown_skill |
(Emitted by Agent when planner calls a name it doesn't have.) |
Pass structured reasons so prior-attempt feedback is informative to the VLM on replan.
Worked reference¶
Full runnable script: examples/custom_skill.py. Builds the scene,
runs Tap() in isolation, then runs the same skill through the
Agent loop.
Run:
Expected final line:
See also¶
- Skills & agents for the protocol, the agent state machine, and built-in skill list.
- Perception & grasping
for what
ctx.perception/ctx.grasp/ctx.motionexpose.