Workspace & Protocols

Workspace directory structure, all Markdown protocol file formats, semantics, and read/write relationships.

Workspace Topology

PhyAgentOS stores all state in local Markdown files. In single-machine mode, all files live in one directory; Fleet mode introduces a shared workspace.

flowchart TB
    subgraph SINGLE["Single-Machine Mode"]
        direction TB
        W["~/.PhyAgentOS/workspace/"]
        W --> E1["ENVIRONMENT.md"]
        W --> E2["EMBODIED.md"]
        W --> E3["ACTION.md"]
        W --> E4["LESSONS.md"]
        W --> E5["TASK.md"]
    end
    subgraph FLEET["Fleet Mode"]
        direction TB
        SW["shared_workspace/"]
        SW --> S1["ENVIRONMENT.md"]
        SW --> S2["TASK.md"]
        SW --> S3["ORCHESTRATOR.md"]
        SW --> S4["LESSONS.md"]
        SW --> S5["ROBOTS.md"]
        RW_A["robot_a/"]
        RW_A --> R1_A["ACTION.md"]
        RW_A --> R2_A["EMBODIED.md"]
        RW_B["robot_b/"]
        RW_B --> R1_B["ACTION.md"]
        RW_B --> R2_B["EMBODIED.md"]
    end
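
| File | Single Mode | Fleet: Shared | Fleet: Per-Robot |
|------|-------------|---------------|------------------|
| ENVIRONMENT.md | workspace/ | shared_workspace/ | |
| EMBODIED.md | workspace/ | | robot_X/ |
| ACTION.md | workspace/ | | robot_X/ |
| TASK.md | workspace/ | shared_workspace/ | |
| LESSONS.md | workspace/ | shared_workspace/ | |
| ORCHESTRATOR.md | | shared_workspace/ | |
| ROBOTS.md | | shared_workspace/ | |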

Single mode: all files in one directory, created by paos onboard.
Fleet mode: the shared workspace holds global state (environment, tasks, orchestration, lessons), while each robot's dedicated workspace holds its own ACTION.md and EMBODIED.md.
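
The placement rules above can be sketched as a small path resolver. This is a minimal illustration, not part of PhyAgentOS: resolve_protocol_path is a hypothetical helper, and it assumes that in Fleet mode the per-robot directories (robot_a/, robot_b/) sit next to shared_workspace/ under a common root, which the diagram does not actually specify.

```python
from pathlib import Path

# Files kept in the shared workspace in Fleet mode (per the table above).
SHARED_FILES = {"ENVIRONMENT.md", "TASK.md", "LESSONS.md", "ORCHESTRATOR.md", "ROBOTS.md"}
# Files kept in each robot's own workspace in Fleet mode.
PER_ROBOT_FILES = {"ACTION.md", "EMBODIED.md"}
# Files that only appear in Fleet mode.
FLEET_ONLY_FILES = {"ORCHESTRATOR.md", "ROBOTS.md"}


def resolve_protocol_path(name, root, robot_id=None, fleet=False):
    """Return where a protocol file lives (hypothetical helper, assumed layout)."""
    root = Path(root)
    if name not in SHARED_FILES | PER_ROBOT_FILES:
        raise ValueError(f"unknown protocol file: {name}")
    if not fleet:
        if name in FLEET_ONLY_FILES:
            raise ValueError(f"{name} exists only in Fleet mode")
        # Single mode: everything sits in one workspace directory.
        return root / "workspace" / name
    if name in SHARED_FILES:
        return root / "shared_workspace" / name
    if robot_id is None:
        raise ValueError(f"{name} is per-robot; robot_id is required")
    # Assumption: robot directories are siblings of shared_workspace/.
    return root / robot_id / name
```

For example, resolving ACTION.md for robot_a in Fleet mode yields a path under robot_a/, while ENVIRONMENT.md resolves to shared_workspace/ regardless of the robot.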

ENVIRONMENT.md — Source of Truth

The single authoritative source for environment state. Updated by the Watchdog after every action execution. The Agent reads it before every reasoning cycle.

JSON Example

{
  "schema_version": "v2.0",
  "updated_at": "2025-04-01T12:00:00Z",
  "scene_graph": {
    "nodes": [
      {"id": "table_01", "type": "furniture", "position": [1.2, 0.0, 0.8]},
      {"id": "apple_01", "type": "object", "position": [1.2, 0.75, 0.8], "status": "on_table"}
    ],
    "edges": [
      {"from": "apple_01", "to": "table_01", "relation": "on"}
    ]
  },
  "robots": [
    {
      "robot_id": "franka_001",
      "pose": [0.5, 0.0, 0.0],
      "joint_state": {"joint_1": 0.0, "joint_2": -0.3},
      "gripper": "open",
      "holding": null
    }
  ],
  "objects": [
    {"id": "apple_01", "name": "Red Apple", "category": "fruit", "position": [1.2, 0.75, 0.8]}
  ],
  "perception": {
    "camera_rgb": "artifacts/camera/front_001.jpg",
    "depth": "artifacts/camera/front_001_depth.npy"
  }
}

| Field | Type | Description |
|-------|------|-------------|
| schema_version | string | Protocol version (v1 / v2). v2 adds perception and scene_graph.edges |
| updated_at | ISO 8601 | Timestamp of the last file write |
| scene_graph | object | Scene graph: nodes record all entities, edges record spatial relationships |
| robots | array | Pose and joint state of all robots in the scene |
| objects | array | Position and properties of all interactable objects |
| perception | object | Perception data references (RGB/depth image paths). Added in v2 |
| map | object | Optional: occupancy grid map data |
| tf | object | Optional: coordinate frame transform tree |
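
As an illustration of how a reader might consume this structure, the sketch below queries the scene graph and robot state from a trimmed copy of the example. It assumes the JSON payload has already been extracted from ENVIRONMENT.md (how the JSON is embedded in the Markdown file is not described here), and objects_on and held_object are hypothetical helper names.

```python
import json

# Trimmed copy of the JSON example above.
ENVIRONMENT_JSON = """
{
  "schema_version": "v2.0",
  "scene_graph": {
    "nodes": [
      {"id": "table_01", "type": "furniture", "position": [1.2, 0.0, 0.8]},
      {"id": "apple_01", "type": "object", "position": [1.2, 0.75, 0.8], "status": "on_table"}
    ],
    "edges": [
      {"from": "apple_01", "to": "table_01", "relation": "on"}
    ]
  },
  "robots": [
    {"robot_id": "franka_001", "gripper": "open", "holding": null}
  ]
}
"""


def objects_on(env, support_id):
    """Return ids of nodes linked to support_id by an "on" edge."""
    edges = env.get("scene_graph", {}).get("edges", [])
    return [e["from"] for e in edges
            if e.get("to") == support_id and e.get("relation") == "on"]


def held_object(env, robot_id):
    """Return what the given robot is holding, or None if its gripper is empty."""
    for robot in env.get("robots", []):
        if robot.get("robot_id") == robot_id:
            return robot.get("holding")
    raise KeyError(robot_id)


env = json.loads(ENVIRONMENT_JSON)
print(objects_on(env, "table_01"))     # ['apple_01']
print(held_object(env, "franka_001"))  # None
```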

v1 vs v2 Differences

| Feature | v1 | v2 |
|---------|----|----|
| Scene graph relations | nodes list only | nodes + edges (spatial relations) |
| Perception data | none | perception field, supports RGB/depth refs |
| Multi-robot | single robot object | robots array, supports Fleet |
| Map data | none | optional map and tf fields |

EMBODIED.md — Capability Profile

A Markdown file describing the robot's physical capabilities. Copied by the Watchdog at startup from hal/profiles/*.md into the workspace.

Example

# EMBODIED — Franka Emika Panda

## Identity
- **Robot Model**: Franka Emika Panda
- **DOF**: 7
- **End Effector**: Parallel Jaw Gripper
- **Driver**: rekep_real

## Sensors
- [x] RGB-D Camera (Intel RealSense D435)
- [x] Force-Torque Sensor
- [x] Joint Encoders (7x)

## Supported Actions
| Action Type       | Description                     | Parameters              |
|-------------------|--------------------------------|--------------------------|
| move_to           | Cartesian-space move to target  | target_pose: [x,y,z,r,p,y] |
| pick_up           | Grasp an object by ID           | object_id: string        |
| place             | Place held object at location   | target_position: [x,y,z] |
| target_navigation | Visual navigation to a target   | target_label: string     |
| real_execute      | Execute a natural-language ReKep task | nl_task: string     |

## Physical Constraints
- **Max Reach**: 0.855 m
- **Max Payload**: 3.0 kg
- **Workspace Volume**: ~1.5 m³

The Critic must validate every action against EMBODIED.md: any action that exceeds the workspace bounds or payload limits is rejected.
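
A minimal sketch of such a check follows, using the limits from the example profile. It is only an illustration: check_action is a hypothetical function, and measuring reach as the straight-line distance from the robot base to the target is an assumption, since the Critic's actual geometry check is not described here.

```python
import math

# Limits taken from the "Physical Constraints" section of the example profile.
MAX_REACH_M = 0.855
MAX_PAYLOAD_KG = 3.0


def check_action(base_position, target_position, payload_kg=0.0):
    """Return a list of rejection reasons; an empty list means the action passes.

    Assumption: reach is the Euclidean distance from robot base to target.
    """
    reasons = []
    distance = math.dist(base_position, target_position)
    if distance > MAX_REACH_M:
        reasons.append(
            f"target is {distance:.3f} m from the base, beyond max reach {MAX_REACH_M} m"
        )
    if payload_kg > MAX_PAYLOAD_KG:
        reasons.append(
            f"payload {payload_kg} kg exceeds max payload {MAX_PAYLOAD_KG} kg"
        )
    return reasons
```

With the base at [0.5, 0.0, 0.0], a target at [0.8, 0.3, 0.5] is about 0.66 m away and passes, while a target at [1.8, 0.75, 0.8] is well beyond 0.855 m and is rejected.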

ACTION.md — Action Queue

The JSON queue through which the Agent dispatches actions to Track B. The Watchdog polls this file and picks up pending actions.

JSON Example

{
  "queue": [
    {
      "action_id": "act_001",
      "action_type": "move_to",
      "params": {
        "target_pose": [0.8, 0.3, 0.5, 0.0, 1.57, 0.0]
      },
      "status": "completed",
      "robot_id": "franka_001",
      "created_at": "2025-04-01T12:00:00Z",
      "completed_at": "2025-04-01T12:00:02Z"
    },
    {
      "action_id": "act_002",
      "action_type": "pick_up",
      "params": {
        "object_id": "apple_01"
      },
      "status": "pending",
      "robot_id": "franka_001",
      "created_at": "2025-04-01T12:00:03Z"
    }
  ]
}

action_type Enumeration

| Type | Description | Supported Drivers |
|------|-------------|-------------------|
| move_to | Cartesian-space motion to target pose | All physical drivers |
| pick_up | Grasp object by object_id | rekep_sim, rekep_real |
| place | Place held object at target position | rekep_sim, rekep_real |
| target_navigation | Navigate to a visual target using perception feedback | simulation, go2_edu |
| real_execute | Execute a natural-language ReKep grasping task | rekep_real |
| strategy | Scripted strategy action (model-free) | simulation |

status State Machine

pending → running → completed | failed

The Agent writes each action with status pending. The Watchdog picks it up and changes the status to running, then to completed or failed when execution ends. On failure, the failure reason and exception stack trace are also written.
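
The transitions above can be enforced with a small lookup table. The sketch below is illustrative only: advance is a hypothetical helper, and the error field name is an assumption, since the field that holds the failure reason is not named here.

```python
# Allowed status transitions for an entry in the ACTION.md queue.
ALLOWED = {
    "pending": {"running"},
    "running": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}


def advance(action, new_status, error=None):
    """Return a copy of action with its status advanced, or raise ValueError."""
    current = action["status"]
    if new_status not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new_status}")
    updated = dict(action, status=new_status)
    if new_status == "failed":
        # Hypothetical field name for the failure reason.
        updated["error"] = error
    return updated
```

Returning a copy rather than mutating in place keeps the original queue entry available until the new state has been written back to the file.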

SESSIONS.md — Runtime Session Queue

In the V2 architecture, this session-level protocol replaces ACTION.md. The runtime Watchdog reads this file to schedule execution sessions.

YAML Example (from templates/SESSIONS.md)

sessions:
  - session_id: sess_pick_apple_001
    target_ref: "sim_franka_tabletop"
    skill_ref: "rekep_pick"
    status: pending
    priority: high
    timeouts:
      session: 120
      skill: 60
    retry:
      max_attempts: 3
      backoff: "exponential"
    routing:
      target_adapter: "SimTargetAdapter"
      skill_runtime: "ReKepPolicyRuntime"
    execution:
      params:
        object_id: "apple_01"
        gripper: "parallel_jaw"
    created_at: "2025-04-01T12:00:00Z"

| Field | Type | Description |
|-------|------|-------------|
| session_id | string | Unique session ID, format sess_<skill>_<seq> |
| target_ref | string | References a target id registered in TARGETS.md |
| skill_ref | string | References a skill id registered in SKILLS.md |
| status | enum | State: pending → running → succeeded / failed |
| priority | enum | Scheduling priority: high / normal / low |
| timeouts | object | session overall timeout and skill per-step timeout (seconds) |
| retry | object | max_attempts and backoff strategy |
| routing | object | Specifies the target_adapter and skill_runtime to use |
| execution | object | Execution parameters, e.g. params (key-value pairs passed to the skill) |

Priority Scheduling Rules

  • High-priority sessions execute before normal and low ones
  • Sessions with the same priority are processed FIFO by created_at timestamp
  • Sessions targeting the same robot (same target_ref) run serially; sessions for different robots may run in parallel
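
The three rules above can be sketched as a sort followed by a per-target filter. This is an illustrative sketch, not the runtime Watchdog's scheduler: order_sessions and next_batch are hypothetical names, sessions are assumed to be already parsed into dicts, and created_at values are assumed to be UTC ISO 8601 strings in one format so that string comparison matches time order.

```python
PRIORITY_RANK = {"high": 0, "normal": 1, "low": 2}


def order_sessions(sessions):
    """Sort pending sessions by priority, then FIFO by created_at."""
    pending = [s for s in sessions if s["status"] == "pending"]
    return sorted(pending, key=lambda s: (PRIORITY_RANK[s["priority"]], s["created_at"]))


def next_batch(sessions):
    """Pick sessions that can start now: at most one per target_ref.

    Targets that already have a running session are skipped, so sessions
    for the same robot stay serial while different robots run in parallel.
    """
    busy = {s["target_ref"] for s in sessions if s["status"] == "running"}
    batch = []
    for session in order_sessions(sessions):
        if session["target_ref"] not in busy:
            batch.append(session)
            busy.add(session["target_ref"])
    return batch
```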

TARGETS.md — Runtime Target Registry

Registers all available rollout targets (simulation environments or real-robot instances). The runtime Watchdog uses this file to route sessions.

YAML Example (from templates/TARGETS.md)

targets:
  - id: sim_franka_tabletop
    type: sim
    enabled: true
    backend: mujoco
    supported_skills:
      - rekep_pick
      - rekep_place
      - openvla_manipulation
    adapter:
      class: SimTargetAdapter
      obs_format: mujoco_standard
    perception:
      cameras:
        - name: front
          resolution: [640, 480]
          fps: 30
        - name: wrist
          resolution: [640, 480]
          fps: 30
    config:
      scene_xml: /path/to/tabletop.xml

  - id: real_franka_lab
    type: real_robot
    enabled: true
    supported_skills:
      - rekep_pick
    adapter:
      class: RealRobotTargetAdapter
      driver: franka
    perception:
      cameras:
        - name: front
          device: "/dev/video0"
          resolution: [1280, 720]
    config:
      ip: "192.168.1.100"
      control_mode: "joint_position"

| Field | Type | Description |
|-------|------|-------------|
| id | string | Unique target ID, referenced by target_ref in SESSIONS.md |
| type | enum | sim or real_robot |
| backend | string | Simulation backend (mujoco, maniskill, isaac_sim) |
| enabled | bool | Set to false to temporarily take a target offline |
| supported_skills | array | List of skill ids this target supports |
| adapter | object | Specifies the class name and obs_format for observations |
| perception | object | cameras array config (resolution, FPS, device path) |
| config | object | Target-specific config (scene file, IP address, etc.) |

SKILLS.md — Runtime Skill Registry

Registers all available skills, defining each skill's runtime requirements, policy client, and environment contract.

YAML Example (from templates/SKILLS.md)

skills:
  - id: rekep_pick
    category: manipulation
    description: "ReKep-based grasping skill with natural-language target description"
    runtime: ReKepPolicyRuntime
    supported_target_types:
      - sim
      - real_robot
    policy_client:
      type: http
      endpoint: "http://localhost:8765/predict"
    requires:
      sensors:
        - rgb_camera
        - depth_camera
        - joint_states
      environment_outputs:
        - scene_graph
        - perception_data
      strict_environment_contract: true

  - id: openvla_manipulation
    category: manipulation
    description: "OpenVLA-based general-purpose manipulation skill"
    runtime: VLAPolicyRuntime
    supported_target_types:
      - sim
    policy_client:
      type: local
      checkpoint: "/models/openvla-7b"
    requires:
      sensors:
        - rgb_camera
      environment_outputs:
        - scene_graph
      strict_environment_contract: true

| Field | Type | Description |
|-------|------|-------------|
| id | string | Unique skill ID, referenced by skill_ref in SESSIONS.md |
| category | string | Skill category: manipulation, navigation, perception |
| runtime | string | SkillRuntime class name that defines the algorithm for executing this skill |
| supported_target_types | array | Supported target types: sim, real_robot |
| policy_client | object | Policy client config: type (http/local), endpoint or checkpoint |
| requires.sensors | array | Required sensor list (rgb_camera, depth_camera, joint_states) |
| requires.environment_outputs | array | Data types the runtime needs from the environment |
| requires.strict_environment_contract | bool | If true, missing any required sensor causes execution to be rejected |
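
Because SESSIONS.md refers to TARGETS.md and SKILLS.md by id, a session can be cross-checked against both registries before it runs. The sketch below shows one way to do that. validate_session is a hypothetical function, the registries are assumed to be already parsed from YAML into lists of dicts, and the specific checks are inferred from the field descriptions rather than taken from the runtime.

```python
def validate_session(session, targets, skills):
    """Return a list of problems found when resolving a session's references."""
    problems = []
    target = next((t for t in targets if t["id"] == session["target_ref"]), None)
    skill = next((s for s in skills if s["id"] == session["skill_ref"]), None)
    if target is None:
        problems.append(f"unknown target_ref: {session['target_ref']}")
    if skill is None:
        problems.append(f"unknown skill_ref: {session['skill_ref']}")
    if target is None or skill is None:
        return problems
    if not target.get("enabled", False):
        problems.append(f"target {target['id']} is disabled")
    if skill["id"] not in target.get("supported_skills", []):
        problems.append(f"target {target['id']} does not support skill {skill['id']}")
    if target["type"] not in skill.get("supported_target_types", []):
        problems.append(f"skill {skill['id']} does not support target type {target['type']}")
    return problems
```

Against the example registries, a session pairing sim_franka_tabletop with rekep_pick passes, while one pairing real_franka_lab with openvla_manipulation fails on both the supported_skills and the supported_target_types checks.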

TASK.md — Long-Horizon Task Decomposition

The Agent breaks down long-horizon user instructions into sub-tasks and tracks progress. The Critic evaluates overall completion against this file.

Example

# Task: Clear the Table

| Sub-Task | Action              | Target Device | Status     | Depends On | Result |
|----------|---------------------|---------------|------------|------------|--------|
| 1        | Navigate to table   | franka_001    | ✅ done    | —          | Arrived at table_01 |
| 2        | Grasp apple         | franka_001    | ✅ done    | 1          | Apple picked via ReKep |
| 3        | Place in basket     | franka_001    | ⏳ running | 2          | Needs precise positioning |
| 4        | Grasp cup           | franka_001    | ⬜ pending | 3          | — |
| 5        | Place on cup holder | franka_001    | ⬜ pending | 4          | — |

**Overall Progress**: 2/5 (40%)

The table format tracks each sub-task's ID, action, target device, status, dependencies, and result. The Agent updates status columns after each action; the Critic verifies progress against the user's original goal.
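
To illustrate how the table can be consumed programmatically, the sketch below parses the rows, recomputes the progress figure, and lists pending sub-tasks whose dependencies are done. The function names are hypothetical, and the parsing relies on assumptions drawn from the example: status cells contain the words done, running, or pending, and the Depends On column holds sub-task numbers.

```python
import re


def parse_task_table(text):
    """Parse Markdown table rows in text into a list of dicts keyed by header."""
    header, rows = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells
        elif all(c and set(c) <= set("-:") for c in cells):
            continue  # separator row such as |-----|-----|
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def progress(rows):
    """Return (done, total) counts."""
    done = sum(1 for r in rows if "done" in r["Status"])
    return done, len(rows)


def ready_subtasks(rows):
    """Return ids of pending sub-tasks whose dependencies are all done."""
    done_ids = {r["Sub-Task"] for r in rows if "done" in r["Status"]}
    ready = []
    for r in rows:
        if "pending" not in r["Status"]:
            continue
        # Non-numeric cells (the "no dependency" placeholder) yield no deps.
        deps = [d for d in re.split(r"[,\s]+", r["Depends On"]) if d.isdigit()]
        if all(d in done_ids for d in deps):
            ready.append(r["Sub-Task"])
    return ready
```

Applied to the example above, progress returns (2, 5), matching the 40% figure, and ready_subtasks returns an empty list because sub-task 4 still waits on sub-task 3, which is running.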

ORCHESTRATOR.md — Global Dashboard

In Fleet mode, the Orchestrator maintains global task assignment and robot scheduling plans in this file.

# Orchestrator Dashboard

## Active Missions
| Mission        | Assigned To  | Priority | Status  |
|----------------|--------------|----------|---------|
| Clear the table| franka_001   | high     | running |
| Patrol area    | go2_edu_001  | normal   | running |

## Robot Pool
| Robot         | Status | Last Heartbeat | Current Mission |
|---------------|--------|----------------|-----------------|
| franka_001    | busy   | 12:00:15       | Clear the table |
| go2_edu_001   | busy   | 12:00:12       | Patrol area     |

## Pending Queue
| Mission         | Requirements            | Priority | Queued At |
|-----------------|------------------------|----------|-----------|
| Water delivery  | mobile + manipulation  | normal   | 12:00:20  |

Active Missions

Currently running missions with priority and assigned robot.

Robot Pool

Live status, heartbeat, and current mission for each robot.

Pending Queue

Missions waiting for a robot to become available, with required capabilities.

LESSONS.md — Failure Memory

A log of Critic rejections. The Critic writes rejection reasons here; the Agent searches this file before planning to avoid repeating known failure patterns.

# LESSONS

## 2025-04-01 12:00:05 — Grasp Failed: Object Out of Workspace
- **Action**: pick_up apple_01
- **Reason**: Target position [1.8, 0.75, 0.8] exceeds Franka max reach (0.855m)
- **Critic Rejection**: EMBODIED.md Physical Constraints validation failed
- **Fix**: Execute move_to first to bring the robot closer

## 2025-04-01 11:55:00 — Navigation Collision: Path Blocked by Obstacle
- **Action**: target_navigation to kitchen_counter
- **Reason**: Direct path blocked by a chair
- **Critic Rejection**: No obstacle consideration; suggested adding intermediate waypoints
- **Fix**: Decompose into multi-segment navigation via hallway_midpoint

Self-evolution core: LESSONS.md is PhyAgentOS's failure experience database. The Agent uses the search_lessons tool to retrieve historical failure patterns and avoid repeating mistakes.

Who Reads, Who Writes

Read/write permission matrix for each component relative to protocol files. R = Read, W = Write.

File Planner (Agent) Critic Watchdog (Track B) Orchestrator Runtime Watchdog
ENVIRONMENT.md R R W R R
EMBODIED.md R R W (at startup)
ACTION.md W R R + W
SESSIONS.md W W R + W
TARGETS.md R R R
SKILLS.md R R R
TASK.md W R R
ORCHESTRATOR.md W
LESSONS.md R W R
ROBOTS.md R R R

File Lifecycle

A complete step-by-step timeline from paos onboard to state write-back.

sequenceDiagram
    actor User
    participant CLI as paos onboard
    participant WD as Watchdog (Track B)
    participant DRV as Driver
    participant ENV as ENVIRONMENT.md
    participant EMB as EMBODIED.md
    participant AGT as Agent (Track A)
    participant CRT as Critic
    participant ACT as ACTION.md
    participant LSN as LESSONS.md
    User->>CLI: paos onboard
    CLI->>ENV: create (empty template)
    CLI->>EMB: create (empty template)
    CLI->>ACT: create (empty template)
    CLI->>LSN: create (empty template)
    User->>WD: start watchdog
    WD->>DRV: load driver
    DRV->>WD: observe() → initial state
    WD->>ENV: write initial scene graph
    WD->>EMB: copy from hal/profiles/*.md
    User->>AGT: paos agent → give task
    loop Planner-Critic Loop
        AGT->>ENV: read state
        AGT->>EMB: read capabilities
        AGT->>LSN: search past failures
        AGT->>CRT: propose action
        CRT->>EMB: validate against constraints
        alt rejected
            CRT->>LSN: write rejection reason
            CRT-->>AGT: reject + feedback
        else approved
            CRT->>ACT: write action (status: pending)
        end
    end
    WD->>ACT: poll → pick up pending action
    WD->>ACT: mark running
    WD->>DRV: execute(action)
    DRV->>WD: observe() → new state
    WD->>ENV: write updated scene graph
    WD->>ACT: mark completed / failed

1. paos onboard — Initialization

Creates ~/.PhyAgentOS/config.json and the workspace/ directory, then generates empty templates: ENVIRONMENT.md, EMBODIED.md, ACTION.md, LESSONS.md.

2. Watchdog Startup — Environment Initialization

The HAL Watchdog starts the driver, calls driver.observe() to get the initial scene state, and writes it to ENVIRONMENT.md. It also copies the matching profile from hal/profiles/ into the workspace as EMBODIED.md.

3. Agent Startup — Planner-Critic Loop

Each turn: Planner reads ENVIRONMENT.md + EMBODIED.md + TASK.md + LESSONS.md → generates action plan → Critic validates against EMBODIED.md → if approved, writes ACTION.md (status: pending).

4. Watchdog Polling — Execute Action

The Watchdog polls ACTION.md and picks up a pending action → changes its status to running → calls driver.execute(action) → upon completion changes the status to completed/failed.

5. State Write-Back — Update ENVIRONMENT.md

The Watchdog calls driver.observe() after each execution and writes the latest scene graph to ENVIRONMENT.md. The Agent reads the updated state in the next cycle.
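
Steps 4 and 5 together form one Watchdog cycle, sketched below against in-memory data instead of files. Only driver.execute(action) and driver.observe() come from the description above. StubDriver, watchdog_step, and the error and traceback field names are hypothetical, and a real Watchdog would read and write ACTION.md and ENVIRONMENT.md rather than Python dicts.

```python
import traceback


class StubDriver:
    """Stand-in driver for illustration; real drivers talk to a robot or simulator."""

    SUPPORTED = {"move_to", "pick_up", "place"}

    def execute(self, action):
        if action["action_type"] not in self.SUPPORTED:
            raise RuntimeError(f"unsupported action_type: {action['action_type']}")

    def observe(self):
        # A real driver would return the current scene state.
        return {"scene_graph": {"nodes": [], "edges": []}}


def watchdog_step(action_doc, driver):
    """Run the first pending action and return the state to write back.

    Returns None when the queue holds no pending action.
    """
    for action in action_doc["queue"]:
        if action["status"] != "pending":
            continue
        action["status"] = "running"
        try:
            driver.execute(action)
            action["status"] = "completed"
        except Exception as exc:
            action["status"] = "failed"
            # Hypothetical field names for the failure reason and stack trace.
            action["error"] = str(exc)
            action["traceback"] = traceback.format_exc()
        # State is observed after every execution, whether it succeeded or not.
        return driver.observe()
    return None
```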

6. Critic Rejection — Write to LESSONS.md

If the Critic rejects an action (violates EMBODIED.md constraints or no safe path), the rejection reason and context are written to LESSONS.md. The Agent's search_lessons tool retrieves this in later cycles.