Documentation / Architecture

PhyAgentOS Framework Design

Imagine telling a robot "put the apple on the table into the basket." That sentence doesn't become motor commands directly — it goes through six stages: understanding, planning, validation, formatting, execution, and state feedback. PhyAgentOS chains these six stages into a clear, observable pipeline instead of a black box.

Cognitive Orchestration
Workspace Protocol
Safety Validation
HAL Execution
Feedback Loop

One-liner: How Data Flows

Don't rush into the source directory. Burn this main line into your brain first: a user's sentence → Agent's plan → written to file → Watchdog reads → drives execution → state written back to file. The entire loop revolves around a few Markdown files.

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#fffaf3', 'primaryTextColor': '#1d2a30', 'primaryBorderColor': '#20323a', 'lineColor': '#66757d', 'secondaryColor': '#f5efe6', 'tertiaryColor': '#ffffff', 'fontFamily': 'Inter, system-ui, sans-serif', 'fontSize': '14px', 'textColor': '#1d2a30', 'nodeBkg': '#fffaf3', 'nodeBorder': '#20323a', 'nodeTextColor': '#1d2a30', 'clusterBkg': '#f5efe6', 'clusterBorder': '#20323a', 'titleColor': '#1d2a30', 'edgeLabelBackground': '#ffffff', 'edgeLabelColor': '#66757d', 'arrowheadColor': '#66757d'}}}%% flowchart LR classDef cognitive fill:#dff0e6,stroke:#5f9e77,stroke-width:2px,color:#2d5a3f classDef safety fill:#f8e5b7,stroke:#cc951e,stroke-width:2px,color:#7a5a0a classDef protocol fill:#f6d0c3,stroke:#ca6a49,stroke-width:2px,color:#7a3a28 classDef execute fill:#dbe8f7,stroke:#4b7eb4,stroke-width:2px,color:#2a4a6a classDef file fill:#ffffff,stroke:#66757d,stroke-width:1.5px,color:#1d2a30 subgraph A["🧠 Cognitive Layer"] USER([User Input]):::cognitive AGENT[AgentLoop]:::cognitive TOOL[Embodied Tools]:::cognitive end subgraph B["🛡️ Validation Layer"] CRITIC{Critic}:::safety end subgraph C["📄 Protocol Layer"] EMBODIED([EMBODIED.md]):::file ACTION([ACTION.md]):::protocol ENV([ENVIRONMENT.md]):::file end subgraph D["🔧 Execution Layer"] WATCH[Watchdog]:::execute DRIVER[Driver]:::execute ROBOT([Robot]):::execute end USER --> AGENT --> TOOL --> CRITIC EMBODIED -.-> CRITIC CRITIC -->|Pass| ACTION CRITIC -.->|Reject| ENV ACTION --> WATCH --> DRIVER --> ROBOT ROBOT -.->|Update| ENV

Six Steps: From Sentence to Robot Action

Below is the breakdown in actual execution order. If you get stuck debugging, check these six steps to locate the issue: is it in the understanding layer? The validation layer? Or did the execution layer fail to write back state?

1. User speaks 2. Agent assembles context 3. Decides action 4. Critic validates 5. Write ACTION.md 6. Watchdog executes

Steps 1–3: Cognitive Layer

The user sends a task via CLI or gateway. AgentLoop assembles conversation history and workspace state into a prompt, letting the LLM decide which tool to call. When the physical world is involved, it typically hits execute_robot_action or target_navigation.

Step 4: Critic Validation

The Critic examines EMBODIED.md and ENVIRONMENT.md to check whether the action is legal and reachable. If not, it rejects the action and writes the reason to LESSONS.md.

Step 5: Persist to File

After validation, the action is not RPC'd directly to hardware. Instead, it's formatted as JSON and written to ACTION.md. This action queue is the only interface from the cognitive layer to the execution layer.

Step 6: Execute & Write Back

The HAL Watchdog polls ACTION.md, finds a pending action, and hands it to the Driver. After execution, the Driver writes the latest environment state back to ENVIRONMENT.md, closing the loop.

Files Are Not Logs — They Are the Interface

Many systems use files for logging. PhyAgentOS uses files as protocol. Different processes don't call each other's functions directly — they read the same Markdown. This makes the entire system like a glass house: you can open any file at any time to see the current state.

%%{init: {'theme': 'default', 'themeVariables': { 'primaryColor': '#fffaf3', 'primaryTextColor': '#1d2a30', 'primaryBorderColor': '#20323a', 'lineColor': '#66757d', 'secondaryColor': '#f5efe6', 'tertiaryColor': '#ffffff', 'fontFamily': 'Inter, system-ui, sans-serif', 'textColor': '#1d2a30', 'nodeBkg': '#ffffff', 'nodeBorder': '#20323a', 'nodeTextColor': '#1d2a30', 'clusterBkg': '#f5efe6', 'clusterBorder': '#20323a', 'titleColor': '#1d2a30', 'edgeLabelBackground': '#ffffff', 'edgeLabelColor': '#66757d', 'arrowheadColor': '#66757d'}}}%% flowchart TB classDef sharedFile fill:#f6d0c3,stroke:#ca6a49,stroke-width:2.5px,color:#7a3a28,font-weight:500 classDef robotFile fill:#dff0e6,stroke:#5f9e77,stroke-width:2.5px,color:#2d5a3f,font-weight:500 classDef agentRole fill:#dbe8f7,stroke:#4b7eb4,stroke-width:2px,color:#2a4a6a classDef criticRole fill:#f8e5b7,stroke:#cc951e,stroke-width:2px,color:#7a5a0a classDef watchdogRole fill:#ece3f6,stroke:#7f62ac,stroke-width:2px,color:#5a3d7a classDef wsBox fill:#fffaf3,stroke:#20323a,stroke-width:1.5px,color:#1d2a30 subgraph SHARED["📁 Shared Workspace"] direction TB E([ENVIRONMENT.md]):::sharedFile R([ROBOTS.md]):::sharedFile L([LESSONS.md]):::sharedFile T([TASK.md]):::sharedFile O([ORCHESTRATOR.md]):::sharedFile end subgraph ROBOT["🤖 Robot Workspace"] direction TB A([ACTION.md]):::robotFile M([EMBODIED.md]):::robotFile end AG([Agent]):::agentRole CR([Critic]):::criticRole WD([Watchdog]):::watchdogRole AG -."read environment".->E AG -."read task".->T AG ==>|"write action"| A CR -."read constraints".->M CR -."read state".->E WD -."poll action".->A WD ==>|"update environment"| E WD -."install config".->M
FileOne-linerWritten byRead by
ENVIRONMENT.mdSource of truth for current environment: scene graph, robot poses, navigation stateWatchdog / PerceptionAgent / Critic
EMBODIED.mdWhat this robot can and cannot doWatchdog (from profile)Critic / Agent
ACTION.mdAction queue sent to the robotEmbodiedActionToolHAL Watchdog
LESSONS.mdFailure experience to avoid repeating mistakesCritic rejection logicAgent / You
ROBOTS.mdMulti-robot index with capability summariesEmbodimentRegistryPlanner / Ops

Three-Layer Structure

Memorizing the code directory is useless — building a layered mental model is more important. When you see a piece of code, ask yourself: is this "thinking" for the robot, or "running errands"?

Cognitive Layer

PhyAgentOS/agent, providers, cli

Responsible for understanding tasks, planning actions, calling tools, and making safety judgments. Never touches robot SDKs directly.

Protocol Layer

workspace / workspaces

Uses Markdown to carry shared state, allowing Agent, Watchdog, and Critic to read the same context.

Execution Layer

hal/, drivers, navigation, perception

Responsible for connecting hardware, executing actions, and refreshing environment state. No high-level planning.

Single vs Fleet: Same Codebase, Two Modes

Single

One robot, one workspace. The most comfortable debugging mode — running a local simulator also uses this pattern.

Fleet

Multiple robots share one ENVIRONMENT.md, but each has its own ACTION.md and EMBODIED.md.

Runtime Session Loop

Beyond the single-action ACTION.md loop, PhyAgentOS now has a session-centered runtime. It reads SESSIONS.md, selects a pending session by priority, runs preflight health checks, executes a skill runtime against a rollout target, and writes results back. The serial WatchdogSupervisor chooses high priority before normal before low.

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#fffaf3', 'primaryTextColor': '#1d2a30', 'primaryBorderColor': '#20323a', 'lineColor': '#66757d', 'secondaryColor': '#f5efe6', 'tertiaryColor': '#ffffff', 'fontFamily': 'Inter, system-ui, sans-serif', 'fontSize': '14px', 'textColor': '#1d2a30', 'nodeBkg': '#fffaf3', 'nodeBorder': '#20323a', 'nodeTextColor': '#1d2a30', 'clusterBkg': '#f5efe6', 'clusterBorder': '#20323a', 'titleColor': '#1d2a30', 'edgeLabelBackground': '#ffffff', 'edgeLabelColor': '#66757d', 'arrowheadColor': '#66757d'}}}%% flowchart LR classDef file fill:#f6d0c3,stroke:#ca6a49,stroke-width:2px,color:#7a3a28 classDef runtime fill:#ece3f6,stroke:#7f62ac,stroke-width:2px,color:#5a3d7a classDef target fill:#dbe8f7,stroke:#4b7eb4,stroke-width:2px,color:#2a4a6a classDef output fill:#dff0e6,stroke:#5f9e77,stroke-width:2px,color:#2d5a3f S([SESSIONS.md]):::file T([TARGETS.md]):::file K([SKILLS.md]):::file SUP[WatchdogSupervisor]:::runtime PF{Preflight}:::runtime SK[SkillRuntime]:::runtime TGT[RolloutTarget]:::target PC[PolicyClient]:::target ART([episode.json]):::output ENV([ENVIRONMENT.md]):::output S --> SUP T --> SUP K --> SUP SUP --> PF PF -->|Pass| SK PF -.->|Reject| S SK --> TGT SK --> PC TGT --> ART SK -.->|Perception| ENV

SESSIONS.md

Session queue with target/skill routing, priority, timeouts, and retry policy. Replaces ACTION.md for the runtime path.

WatchdogSupervisor

Componentized serial supervisor: selects pending sessions, runs preflight, orchestrates skill runtime, writes retry/failure state back.

Artifacts

Episode summaries, perception outputs, and environment state are written to artifacts/runtime/ and ENVIRONMENT.md.

Why Markdown as Protocol?

Because humans need to read it, and models need to read it too. You can open an ENVIRONMENT.md and read it directly, and an LLM can stuff it into a prompt. If it were binary protobuf, you'd cry during debugging.

  • Human-readable: When something goes wrong, just cat the file — no need to attach a debugger.
  • Model-readable: Markdown naturally fits into LLM context.
  • Loose coupling between processes: Agent and Watchdog don't need RPC — they just agree on file formats.