One-liner: How Data Flows
Don't dive into the source tree yet. Memorize this main path first: the user's sentence → the Agent's plan → written to a file → read by the Watchdog → executed by a driver → state written back to a file. The entire loop revolves around a handful of Markdown files.
Six Steps: From Sentence to Robot Action
Below is the breakdown in actual execution order. If you get stuck debugging, check these six steps to locate the issue: is it in the understanding layer? The validation layer? Or did the execution layer fail to write back state?
Steps 1–3: Cognitive Layer
The user sends a task via CLI or gateway. AgentLoop assembles conversation history and workspace state into a prompt, letting the LLM decide which tool to call. When the physical world is involved, it typically hits execute_robot_action or target_navigation.
Step 4: Critic Validation
The Critic examines EMBODIED.md and ENVIRONMENT.md to check whether the action is legal and reachable. If not, it rejects the action and writes the reason to LESSONS.md.
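The check described above can be sketched roughly as follows. This is a minimal illustration, assuming EMBODIED.md has already been parsed into a set of supported action types and ENVIRONMENT.md into a set of reachable targets. The names `critique` and `record_lesson`, the action fields `type` and `target`, and the lesson line format are all hypothetical, not the project's actual schema.

```python
from pathlib import Path
from typing import Optional, Set


def critique(action: dict, capabilities: Set[str], reachable: Set[str]) -> Optional[str]:
    """Return a rejection reason, or None if the action passes (hypothetical rules)."""
    if action.get("type") not in capabilities:
        return f"action type {action.get('type')!r} is not listed in EMBODIED.md"
    target = action.get("target")
    if target is not None and target not in reachable:
        return f"target {target!r} is not reachable according to ENVIRONMENT.md"
    return None


def record_lesson(lessons_path: Path, action: dict, reason: str) -> None:
    """Append the rejection reason to LESSONS.md as one bullet (assumed format)."""
    with lessons_path.open("a", encoding="utf-8") as f:
        f.write(f"- Rejected {action.get('type')}: {reason}\n")
```

A caller would run `critique` first and call `record_lesson` only when it returns a reason, so LESSONS.md accumulates rejections and nothing else.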
Step 5: Persist to File
After validation, the action is not RPC'd directly to hardware. Instead, it's formatted as JSON and written to ACTION.md. This action queue is the only interface from the cognitive layer to the execution layer.
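As a rough sketch of this step, assume each queued action is stored in ACTION.md as a fenced JSON block carrying a `status` field. The function name `enqueue_action`, the field names, and the block layout are assumptions for illustration; the real file format may differ.

```python
import json
from pathlib import Path

FENCE = "`" * 3  # Markdown code-fence delimiter, built here so this snippet stays fence-safe


def enqueue_action(action_file: Path, action: dict) -> None:
    """Append one pending action to ACTION.md as a fenced JSON block (assumed layout)."""
    entry = dict(action, status="pending")  # "status" is a hypothetical field
    block = f"{FENCE}json\n{json.dumps(entry, indent=2)}\n{FENCE}\n\n"
    with action_file.open("a", encoding="utf-8") as f:
        f.write(block)
```

Appending instead of overwriting keeps earlier entries in place, which matches the idea of the file acting as a queue.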
Step 6: Execute & Write Back
The HAL Watchdog polls ACTION.md, finds a pending action, and hands it to the Driver. After execution, the Driver writes the latest environment state back to ENVIRONMENT.md, closing the loop.
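The polling side might look something like the sketch below. It assumes actions are stored in ACTION.md as fenced JSON blocks with a `status` field, and it models the Driver as a plain callable that returns the new ENVIRONMENT.md text. Both are simplifications for illustration; `poll_once`, `watch`, and the `interval_s` default are hypothetical names and values.

```python
import json
import re
import time
from pathlib import Path
from typing import Callable

FENCE = "`" * 3  # Markdown code-fence delimiter
BLOCK_RE = re.compile(FENCE + r"json\n(.*?)\n" + FENCE, re.DOTALL)


def poll_once(workspace: Path, driver: Callable[[dict], str]) -> int:
    """Execute every pending action once and return how many were run."""
    action_file = workspace / "ACTION.md"
    if not action_file.exists():
        return 0
    bodies = BLOCK_RE.findall(action_file.read_text(encoding="utf-8"))
    actions = [json.loads(body) for body in bodies]
    executed = 0
    for action in actions:
        if action.get("status") != "pending":
            continue
        env_markdown = driver(action)  # hypothetical driver: returns fresh environment state
        (workspace / "ENVIRONMENT.md").write_text(env_markdown, encoding="utf-8")
        action["status"] = "done"
        executed += 1
    if executed:
        # Rewrite the queue so finished actions are not picked up again.
        # Note: this sketch keeps only the JSON blocks and drops any other text.
        action_file.write_text(
            "".join(f"{FENCE}json\n{json.dumps(a, indent=2)}\n{FENCE}\n\n" for a in actions),
            encoding="utf-8",
        )
    return executed


def watch(workspace: Path, driver: Callable[[dict], str], interval_s: float = 0.5) -> None:
    """Poll forever; interval_s is an illustrative default, not a project setting."""
    while True:
        poll_once(workspace, driver)
        time.sleep(interval_s)
```

Keeping `poll_once` separate from the infinite loop makes the behavior easy to exercise in a test against a temporary directory.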
Files Are Not Logs — They Are the Interface
Many systems use files for logging. PhyAgentOS uses files as the protocol. Processes don't call each other's functions directly; they read the same Markdown files. This makes the whole system a glass house: you can open any file at any time and see the current state.
| File | One-liner | Written by | Read by |
|---|---|---|---|
| ENVIRONMENT.md | Source of truth for current environment: scene graph, robot poses, navigation state | Watchdog / Perception | Agent / Critic |
| EMBODIED.md | What this robot can and cannot do | Watchdog (from profile) | Critic / Agent |
| ACTION.md | Action queue sent to the robot | EmbodiedActionTool | HAL Watchdog |
| LESSONS.md | Failure experience to avoid repeating mistakes | Critic rejection logic | Agent / You |
| ROBOTS.md | Multi-robot index with capability summaries | EmbodimentRegistry | Planner / Ops |
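
One way to keep this ownership table in mind is to encode it as data. The sketch below simply restates the table as a Python dictionary and adds a hypothetical `may_write` helper. Component names are shortened (for example, "Critic rejection logic" becomes "Critic"), and nothing here is claimed to exist in the codebase.

```python
# Writers and readers per protocol file, transcribed from the table above.
FILE_ROLES = {
    "ENVIRONMENT.md": {"writers": {"Watchdog", "Perception"}, "readers": {"Agent", "Critic"}},
    "EMBODIED.md": {"writers": {"Watchdog"}, "readers": {"Critic", "Agent"}},
    "ACTION.md": {"writers": {"EmbodiedActionTool"}, "readers": {"HAL Watchdog"}},
    "LESSONS.md": {"writers": {"Critic"}, "readers": {"Agent", "You"}},
    "ROBOTS.md": {"writers": {"EmbodimentRegistry"}, "readers": {"Planner", "Ops"}},
}


def may_write(component: str, filename: str) -> bool:
    """Return True if the table lists `component` as a writer of `filename`."""
    return component in FILE_ROLES.get(filename, {}).get("writers", set())
```

For example, `may_write("Agent", "ENVIRONMENT.md")` is false, which reflects the rule that the cognitive layer reads environment state but does not produce it.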
Three-Layer Structure
Memorizing the directory tree matters far less than building a layered mental model. When you see a piece of code, ask yourself: is this "thinking" for the robot, or "running errands"?
Cognitive Layer
PhyAgentOS/agent, providers, cli
Responsible for understanding tasks, planning actions, calling tools, and making safety judgments. Never touches robot SDKs directly.
Protocol Layer
workspace / workspaces
Uses Markdown to carry shared state, allowing Agent, Watchdog, and Critic to read the same context.
Execution Layer
hal/, drivers, navigation, perception
Responsible for connecting hardware, executing actions, and refreshing environment state. No high-level planning.
Single vs Fleet: Same Codebase, Two Modes
Single
One robot, one workspace. The most comfortable debugging mode — running a local simulator also uses this pattern.
Fleet
Multiple robots share one ENVIRONMENT.md, but each has its own ACTION.md and EMBODIED.md.
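The difference between the two modes comes down to where each file lives. The sketch below assumes a `robots/<robot_id>/` subdirectory for per-robot files in fleet mode; that layout and the `workspace_files` helper are hypothetical and only illustrate the sharing rule.

```python
from pathlib import Path
from typing import Dict, Optional


def workspace_files(root: Path, robot_id: Optional[str] = None) -> Dict[str, Path]:
    """Map protocol file names to paths for single mode (robot_id=None) or fleet mode."""
    # "robots/<robot_id>" is an assumed layout, not the project's documented one.
    robot_dir = root if robot_id is None else root / "robots" / robot_id
    return {
        "ENVIRONMENT.md": root / "ENVIRONMENT.md",  # shared by the whole fleet
        "ACTION.md": robot_dir / "ACTION.md",       # one action queue per robot
        "EMBODIED.md": robot_dir / "EMBODIED.md",   # one capability profile per robot
    }
```

With two robot IDs, the returned ENVIRONMENT.md paths are identical while the ACTION.md and EMBODIED.md paths differ, which is the fleet rule stated above.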
Runtime Session Loop
Beyond the single-action ACTION.md loop, PhyAgentOS now has a session-centered runtime. It reads SESSIONS.md, selects a pending session by priority, runs preflight health checks, executes a skill runtime against a rollout target, and writes the results back. The WatchdogSupervisor runs sessions serially and picks them in priority order: high first, then normal, then low.
SESSIONS.md
Session queue with target/skill routing, priority, timeouts, and retry policy. Replaces ACTION.md for the runtime path.
WatchdogSupervisor
Componentized serial supervisor: selects pending sessions, runs preflight, orchestrates skill runtime, writes retry/failure state back.
Artifacts
Episode summaries, perception outputs, and environment state are written to artifacts/runtime/ and ENVIRONMENT.md.
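The priority rule for picking the next session can be sketched as below. The `Session` dataclass, its field names, and the tie-breaking by queue order are assumptions made for illustration; only the ordering of high, normal, and low comes from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Lower rank is served first: high before normal before low.
PRIORITY_RANK = {"high": 0, "normal": 1, "low": 2}


@dataclass
class Session:
    session_id: str
    priority: str = "normal"
    status: str = "pending"  # hypothetical status values: "pending", "done", ...


def select_next(sessions: List[Session]) -> Optional[Session]:
    """Pick the pending session with the highest priority, or None if none is pending."""
    pending = [s for s in sessions if s.status == "pending"]
    if not pending:
        return None
    # min() returns the first minimal element, so ties keep queue order (an assumption).
    return min(pending, key=lambda s: PRIORITY_RANK[s.priority])
```

Because the supervisor is serial, a single selection function like this is enough; there is no need to reason about two sessions being picked at once.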
Why Markdown as Protocol?
Because humans need to read it, and models need to read it too. You can open ENVIRONMENT.md and read it directly, and the same text can be dropped into an LLM prompt unchanged. If it were binary protobuf, you'd cry during debugging.
- Human-readable: When something goes wrong, just cat the file — no need to attach a debugger.
- Model-readable: Markdown naturally fits into LLM context.
- Loose coupling between processes: Agent and Watchdog don't need RPC — they just agree on file formats.