PhyAgentOS: Workspace and Protocols

Workspace Topology

PhyAgentOS stores all state in local Markdown files. In single-robot mode every file lives in one directory; Fleet mode introduces a shared workspace.

flowchart TB
    subgraph SINGLE["Single-robot mode"]
        direction TB
        W["~/.PhyAgentOS/workspace/"]
        W --> E1["ENVIRONMENT.md"]
        W --> E2["EMBODIED.md"]
        W --> E3["ACTION.md"]
        W --> E4["LESSONS.md"]
        W --> E5["TASK.md"]
    end
    subgraph FLEET["Fleet mode"]
        direction TB
        SW["shared_workspace/"]
        SW --> S1["ENVIRONMENT.md"]
        SW --> S2["TASK.md"]
        SW --> S3["ORCHESTRATOR.md"]
        SW --> S4["LESSONS.md"]
        RW_A["robot_a/"]
        RW_A --> R1_A["ACTION.md"]
        RW_A --> R2_A["EMBODIED.md"]
        RW_B["robot_b/"]
        RW_B --> R1_B["ACTION.md"]
        RW_B --> R2_B["EMBODIED.md"]
    end

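Single-robot mode: all files live in one directory, created by paos onboard.
Fleet mode: the shared workspace holds global state (environment, tasks, scheduling), while each robot's dedicated workspace holds its own ACTION.md and EMBODIED.md.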
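
The split between shared and per-robot files can be sketched as a small path resolver. This is only an illustration: the helper name `resolve_protocol_file` is hypothetical, and the assumption that `shared_workspace/` and the per-robot directories sit side by side under one root is not stated in the topology above.

```python
from pathlib import Path

# Files kept in the shared workspace in Fleet mode, per the topology above.
SHARED_FILES = {"ENVIRONMENT.md", "TASK.md", "ORCHESTRATOR.md", "LESSONS.md"}
# Files kept in each robot's own workspace in Fleet mode.
PER_ROBOT_FILES = {"ACTION.md", "EMBODIED.md"}


def resolve_protocol_file(name, root, robot_id=None):
    """Return the path of a protocol file (hypothetical helper).

    robot_id=None means single-robot mode: everything lives under `root`.
    Otherwise `root` is ASSUMED to contain shared_workspace/ and <robot_id>/.
    """
    root = Path(root)
    if robot_id is None:
        return root / name
    if name in SHARED_FILES:
        return root / "shared_workspace" / name
    if name in PER_ROBOT_FILES:
        return root / robot_id / name
    raise ValueError(f"unknown protocol file: {name}")
```

For example, `resolve_protocol_file("TASK.md", "/fleet", "robot_a")` points into the shared workspace, while `ACTION.md` for the same robot resolves under `robot_a/`.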

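ENVIRONMENT.md: Environment Source of Truth

The single authoritative source for the state of the environment. The Watchdog updates it after every action execution, and the Agent reads it before every inference step.

JSON example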

{
  "schema_version": "v2.0",
  "updated_at": "2025-04-01T12:00:00Z",
  "scene_graph": {
    "nodes": [
      {"id": "table_01", "type": "furniture", "position": [1.2, 0.0, 0.8]},
      {"id": "apple_01", "type": "object", "position": [1.2, 0.75, 0.8], "status": "on_table"}
    ],
    "edges": [
      {"from": "apple_01", "to": "table_01", "relation": "on"}
    ]
  },
  "robots": [
    {
      "robot_id": "franka_001",
      "pose": [0.5, 0.0, 0.0],
      "joint_state": {"joint_1": 0.0, "joint_2": -0.3},
      "gripper": "open",
      "holding": null
    }
  ],
  "objects": [
    {"id": "apple_01", "name": "red apple", "category": "fruit", "position": [1.2, 0.75, 0.8]}
  ],
  "perception": {
    "camera_rgb": "artifacts/camera/front_001.jpg",
    "depth": "artifacts/camera/front_001_depth.npy"
  }
}

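| Field | Type | Description |
|-------|------|-------------|
| `schema_version` | string | Protocol version (v1 / v2). v2 adds `perception` and `scene_graph.edges` |
| `updated_at` | ISO 8601 | Timestamp of the last write to the file |
| `scene_graph` | object | Scene graph: `nodes` lists every entity, `edges` records spatial relations |
| `robots` | array | Pose and joint state of every robot in the scene |
| `objects` | array | Position and attributes of every interactable object in the scene |
| `perception` | object | References to perception data (RGB/depth image paths). Added in v2 |
| `map` | object | Optional: occupancy grid map data |
| `tf` | object | Optional: coordinate frame transform tree |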

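v1 vs. v2 differences

| Feature | v1 | v2 |
|---------|----|----|
| Scene graph relations | `nodes` list only | `nodes` + `edges` (spatial relations) |
| Perception data | None | `perception` field with RGB/depth references |
| Multiple robots | Single `robot` object | `robots` array, supports Fleet |
| Map data | None | Optional `map` and `tf` fields |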
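
To make the differences concrete, here is a minimal sketch of lifting a v1 document into the v2 shape. It is not the project's migration code: the function name is made up, and it assumes the v1 single-robot object is stored under the key `robot`, as the table suggests.

```python
import copy


def upgrade_env_v1_to_v2(env_v1):
    """Return a v2-shaped copy of a v1 ENVIRONMENT document (illustrative)."""
    env = copy.deepcopy(env_v1)
    env["schema_version"] = "v2.0"
    # v1 has a node-only scene graph; v2 adds an edge list for spatial relations.
    graph = env.setdefault("scene_graph", {"nodes": []})
    graph.setdefault("edges", [])
    # v1 describes one robot object (ASSUMED key: "robot"); v2 uses a `robots` array.
    if "robot" in env:
        env["robots"] = [env.pop("robot")]
    env.setdefault("robots", [])
    # v2 adds the `perception` block of image references.
    env.setdefault("perception", {})
    return env
```

The input is deep-copied so that the v1 document is left untouched.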

EMBODIED.md: Capability Profile

A Markdown file describing the robot's physical capabilities. At startup the Watchdog copies it from hal/profiles/*.md into the workspace.

Example

# EMBODIED — Franka Emika Panda

## Identity
- **Robot Model**: Franka Emika Panda
- **DOF**: 7
- **End Effector**: Parallel Jaw Gripper
- **Driver**: rekep_real

## Sensors
- [x] RGB-D Camera (Intel RealSense D435)
- [x] Force-Torque Sensor
- [x] Joint Encoders (7x)

## Supported Actions
| Action Type       | Description                                 | Parameters                                  |
|-------------------|---------------------------------------------|---------------------------------------------|
| move_to           | Cartesian-space move to a target point      | target_pose: [x, y, z, roll, pitch, yaw]    |
| pick_up           | Grasp the specified object                  | object_id: string                           |
| place             | Place the object at the specified position  | target_position: [x, y, z]                  |
| target_navigation | Visual navigation to a target object        | target_label: string                        |
| real_execute      | Execute a natural-language ReKep task       | nl_task: string                             |

## Physical Constraints
- **Max Reach**: 0.855 m
- **Max Payload**: 3.0 kg
- **Workspace Volume**: ~1.5 m³

When validating an action, the Critic must check it against EMBODIED.md: any action that exceeds the workspace range or the payload limit is rejected.
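
A minimal sketch of such a check follows. It is a simplification under stated assumptions: reach is measured as the straight-line distance from the robot's base position to the target, and the function name `check_action` is hypothetical. The actual Critic may apply a richer workspace model.

```python
import math

# Values copied from the Physical Constraints section of the example profile.
MAX_REACH_M = 0.855
MAX_PAYLOAD_KG = 3.0


def check_action(base_position, target_position, payload_kg=0.0):
    """Return (ok, reason) for a proposed motion (hypothetical Critic check)."""
    # ASSUMPTION: reach is the Euclidean distance from base to target.
    distance = math.dist(base_position, target_position)
    if distance > MAX_REACH_M:
        return False, f"target is {distance:.3f} m away, max reach is {MAX_REACH_M} m"
    if payload_kg > MAX_PAYLOAD_KG:
        return False, f"payload {payload_kg} kg exceeds {MAX_PAYLOAD_KG} kg"
    return True, "ok"
```

With a base at `[0.5, 0.0, 0.0]`, a target at `[1.8, 0.75, 0.8]` is rejected for reach, while `[0.8, 0.3, 0.5]` passes.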

ACTION.md: Action Queue

The JSON queue through which the Agent dispatches actions to Track B. The Watchdog polls this file and picks up pending actions.

JSON example

{
  "queue": [
    {
      "action_id": "act_001",
      "action_type": "move_to",
      "params": {
        "target_pose": [0.8, 0.3, 0.5, 0.0, 1.57, 0.0]
      },
      "status": "completed",
      "robot_id": "franka_001",
      "created_at": "2025-04-01T12:00:00Z",
      "completed_at": "2025-04-01T12:00:02Z"
    },
    {
      "action_id": "act_002",
      "action_type": "pick_up",
      "params": {
        "object_id": "apple_01"
      },
      "status": "pending",
      "robot_id": "franka_001",
      "created_at": "2025-04-01T12:00:03Z"
    }
  ]
}

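action_type enum

| Type | Description | Applicable drivers |
|------|-------------|--------------------|
| `move_to` | Cartesian-space motion to a target pose | All physical drivers |
| `pick_up` | Grasp the object with the given object_id | rekep_sim, rekep_real |
| `place` | Place the currently held object at the target position | rekep_sim, rekep_real |
| `target_navigation` | Navigate to a visual target (using perception feedback) | simulation, go2_edu |
| `real_execute` | Execute a natural-language ReKep grasping task | rekep_real |
| `strategy` | Scripted strategy action (model-free) | simulation |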

status state machine

pending → running → completed | failed

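The Agent writes each action as pending; the Watchdog picks it up and marks it running; when execution finishes, the Watchdog marks it completed or failed. On failure it also records the failure reason and the exception stack trace.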
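
The state machine can be expressed as a transition table that rejects illegal jumps. This is a sketch only; in particular, the `error` field used to store the failure reason is a hypothetical name, since the text does not say where that information is written.

```python
# Allowed status transitions for one ACTION.md queue entry.
TRANSITIONS = {
    "pending": {"running"},
    "running": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}


def advance(action, new_status, error=None):
    """Move `action` to `new_status` in place, rejecting illegal jumps."""
    current = action["status"]
    if new_status not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {new_status}")
    action["status"] = new_status
    if new_status == "failed":
        # `error` is a hypothetical field name for the failure reason.
        action["error"] = error or "unknown"
    return action
```

Terminal states have empty transition sets, so a completed or failed action cannot be re-queued by accident.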

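SESSIONS.md: Runtime Session Queue

The session-level protocol that replaces ACTION.md in the V2 architecture. The Runtime Watchdog reads this file to schedule execution sessions.

YAML example (from templates/SESSIONS.md)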

sessions:
  - session_id: sess_pick_apple_001
    target_ref: "sim_franka_tabletop"
    skill_ref: "rekep_pick"
    status: pending
    priority: high
    timeouts:
      session: 120
      skill: 60
    retry:
      max_attempts: 3
      backoff: "exponential"
    routing:
      target_adapter: "SimTargetAdapter"
      skill_runtime: "ReKepPolicyRuntime"
    execution:
      params:
        object_id: "apple_01"
        gripper: "parallel_jaw"
    created_at: "2025-04-01T12:00:00Z"

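| Field | Type | Description |
|-------|------|-------------|
| `session_id` | string | Unique session ID, format `sess_<skill>_<sequence number>` |
| `target_ref` | string | References a target id registered in TARGETS.md |
| `skill_ref` | string | References a skill id registered in SKILLS.md |
| `status` | enum | Status: `pending` → `running` → `succeeded` / `failed` |
| `priority` | enum | Priority: `high` / `normal` / `low` |
| `timeouts` | object | Overall `session` timeout and per-step `skill` timeout (seconds) |
| `retry` | object | `max_attempts` (maximum number of retries) and `backoff` strategy |
| `routing` | object | The `target_adapter` and `skill_runtime` to use |
| `execution` | object | Execution parameters, such as `params` (key-value pairs passed to the skill) |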

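Priority scheduling rules

- `high`-priority sessions run before `normal` and `low` ones.
- Sessions with the same priority are processed FIFO by `created_at` timestamp.
- Sessions for the same robot (same `target_ref`) run serially; sessions for different robots may run in parallel.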
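
The three rules above can be sketched as a sort plus a per-target filter. The function names are hypothetical, and ranking `normal` ahead of `low` is an assumption, since the rules only state where `high` stands.

```python
# ASSUMPTION: normal outranks low; the rules only place high first.
PRIORITY_RANK = {"high": 0, "normal": 1, "low": 2}


def schedule(sessions):
    """Return pending sessions in execution order (illustrative only)."""
    pending = [s for s in sessions if s["status"] == "pending"]
    # ISO 8601 UTC timestamps in one format sort correctly as plain strings.
    return sorted(pending, key=lambda s: (PRIORITY_RANK[s["priority"]], s["created_at"]))


def next_batch(sessions):
    """Pick at most one pending session per target_ref that is not already busy."""
    busy = {s["target_ref"] for s in sessions if s["status"] == "running"}
    batch = []
    for s in schedule(sessions):
        if s["target_ref"] not in busy:
            batch.append(s)
            busy.add(s["target_ref"])  # keep sessions on one target serial
    return batch
```

Everything returned by `next_batch` targets a different robot, so the batch could be started in parallel.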

TARGETS.md: Runtime Target Registry

Registers every available rollout target (a simulation environment or a real robot). The Runtime Watchdog routes sessions based on this file.

YAML example (from templates/TARGETS.md)

targets:
  - id: sim_franka_tabletop
    type: sim
    enabled: true
    backend: mujoco
    supported_skills:
      - rekep_pick
      - rekep_place
      - openvla_manipulation
    adapter:
      class: SimTargetAdapter
      obs_format: mujoco_standard
    perception:
      cameras:
        - name: front
          resolution: [640, 480]
          fps: 30
        - name: wrist
          resolution: [640, 480]
          fps: 30
    config:
      scene_xml: /path/to/tabletop.xml

  - id: real_franka_lab
    type: real_robot
    enabled: true
    supported_skills:
      - rekep_pick
    adapter:
      class: RealRobotTargetAdapter
      driver: franka
    perception:
      cameras:
        - name: front
          device: "/dev/video0"
          resolution: [1280, 720]
    config:
      ip: "192.168.1.100"
      control_mode: "joint_position"

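| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique target ID; `target_ref` in SESSIONS.md refers to this value |
| `type` | enum | `sim` or `real_robot` |
| `backend` | string | Simulation backend (`mujoco`, `maniskill`, `isaac_sim`) |
| `enabled` | bool | Whether the target is enabled. Set to `false` to take it offline temporarily |
| `supported_skills` | array | List of skill ids this target supports |
| `adapter` | object | Adapter `class` name plus adapter settings such as the `obs_format` observation format |
| `perception` | object | `cameras` array configuration (resolution, FPS, device path) |
| `config` | object | Target-specific configuration (scene file, IP address, etc.) |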

SKILLS.md: Runtime Skill Registry

Registers every available skill and defines each skill's runtime requirements, policy client, and environment contract.

YAML example (from templates/SKILLS.md)

skills:
  - id: rekep_pick
    category: manipulation
    description: "ReKep-based grasping skill that supports natural-language goal descriptions"
    runtime: ReKepPolicyRuntime
    supported_target_types:
      - sim
      - real_robot
    policy_client:
      type: http
      endpoint: "http://localhost:8765/predict"
    requires:
      sensors:
        - rgb_camera
        - depth_camera
        - joint_states
      environment_outputs:
        - scene_graph
        - perception_data
      strict_environment_contract: true

  - id: openvla_manipulation
    category: manipulation
    description: "General-purpose manipulation skill based on OpenVLA"
    runtime: VLAPolicyRuntime
    supported_target_types:
      - sim
    policy_client:
      type: local
      checkpoint: "/models/openvla-7b"
    requires:
      sensors:
        - rgb_camera
      environment_outputs:
        - scene_graph
      strict_environment_contract: true

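| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique skill ID; `skill_ref` in SESSIONS.md refers to this value |
| `category` | string | Skill category: `manipulation`, `navigation`, `perception` |
| `runtime` | string | SkillRuntime class name; defines the algorithm runtime that executes the skill |
| `supported_target_types` | array | Supported target types: `sim`, `real_robot` |
| `policy_client` | object | Policy client configuration: `type` (http/local) and `endpoint` or `checkpoint` |
| `requires.sensors` | array | Required sensors (rgb_camera, depth_camera, joint_states) |
| `requires.environment_outputs` | array | Data types the runtime needs from the environment |
| `requires.strict_environment_contract` | bool | If true, execution is refused when any required sensor is missing |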
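
Because SESSIONS.md, TARGETS.md, and SKILLS.md reference each other by id, a session can be cross-checked against both registries before it is routed. The sketch below works on already-parsed dictionaries; `validate_session` and its messages are hypothetical, and whether the Runtime Watchdog performs exactly these checks is an assumption.

```python
def validate_session(session, targets, skills):
    """Return a list of problems; an empty list means the session is routable."""
    target = next((t for t in targets if t["id"] == session["target_ref"]), None)
    skill = next((k for k in skills if k["id"] == session["skill_ref"]), None)
    if target is None:
        return [f"unknown target_ref {session['target_ref']}"]
    if skill is None:
        return [f"unknown skill_ref {session['skill_ref']}"]
    problems = []
    if not target.get("enabled", False):
        problems.append("target is disabled")
    if skill["id"] not in target.get("supported_skills", []):
        problems.append("target does not list this skill")
    if target["type"] not in skill.get("supported_target_types", []):
        problems.append("skill does not support this target type")
    return problems
```

Using the example registries, `rekep_pick` on `sim_franka_tabletop` yields no problems, whereas `openvla_manipulation` on `real_franka_lab` fails both the skill-list and the target-type check.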

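TASK.md: Long-Horizon Task Decomposition

The Agent decomposes the user's long-horizon instruction into subtasks and records progress here; the Critic uses this file to assess overall completion.

Example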

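# Task: Tidy the tabletop

| Subtask | Action | Goal | Status | Notes |
|---------|--------|------|--------|-------|
| 1 | Navigate to the table | Reach the front of table_01 | ✅ done | |
| 2 | Pick up the apple | Grasp apple_01 | ✅ done | Uses ReKep |
| 3 | Place in the fruit basket | Put the apple into the basket | ⏳ running | Needs precise positioning |
| 4 | Pick up the cup | Grasp cup_01 | ⬜ pending | |
| 5 | Put on the cup rack | Place on cup_holder | ⬜ pending | |

**Overall progress**: 2/5 (40%)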
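
The progress line can be derived from the table itself. The following sketch assumes the five-column layout of the example, with a numeric index in the first column and the status in the fourth; `task_progress` is a hypothetical helper, not part of PhyAgentOS.

```python
def task_progress(markdown):
    """Count (done, total) subtasks in a TASK.md table laid out like the example."""
    done = total = 0
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Data rows start with a numeric subtask index; header and rule rows do not.
        if len(cells) >= 4 and cells[0].isdigit():
            total += 1
            if "done" in cells[3]:
                done += 1
    return done, total
```

Applied to the example table, two of the five rows carry a `done` status, which matches the 2/5 (40%) shown.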

ORCHESTRATOR.md: Global Scheduling Panel

In Fleet mode, the Orchestrator maintains the global task assignment and the robot scheduling plan in this file.

# Scheduling panel

## Active tasks
| Task | Assigned to | Priority | Status |
|------|-------------|----------|--------|
| Clear the tabletop | franka_001 | high | running |
| Patrol monitoring | go2_edu_001 | normal | running |

## Resource pool
| Robot | Current state | Last heartbeat | Current task |
|-------|---------------|----------------|--------------|
| franka_001 | busy | 12:00:15 | Clear the tabletop |
| go2_edu_001 | busy | 12:00:12 | Patrol monitoring |

## Unassigned
| Task | Requirements | Priority | Queued at |
|------|--------------|----------|-----------|
| Water delivery | mobile+manipulation | normal | 12:00:20 |

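LESSONS.md: Failure Experience Log

An experience log that the Critic writes to whenever it rejects an action. Before executing, the Agent searches this file to avoid repeating known failure modes.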

# LESSONS

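## 2025-04-01 12:00:05 - Grasp failure: object outside the workspace
- **Action**: pick_up apple_01
- **Cause**: target position [1.8, 0.75, 0.8] exceeds the Franka's maximum reach (0.855 m)
- **Critic rejection**: EMBODIED.md Physical Constraints check failed
- **Fix**: first run move_to to bring the robot closer to the target

## 2025-04-01 11:55:00 - Navigation collision: path blocked by an obstacle
- **Action**: target_navigation to kitchen_counter
- **Cause**: a chair blocks the direct path
- **Critic rejection**: obstacles were not considered; adding intermediate waypoints is suggested
- **Fix**: split into multi-segment navigation via hallway_midpoint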

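Core of self-evolution: LESSONS.md is PhyAgentOS's library of failure experience. The Agent uses the search_lessons tool to retrieve past failure patterns and avoid repeating the same mistakes.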
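
How search_lessons matches entries is not described here. As one possible reading, the sketch below splits the file on `## ` headings and keeps entries that contain every query word; the signature and the keyword-matching strategy are assumptions, and the real tool may rank or match differently.

```python
def search_lessons(markdown, query):
    """Return lesson entries whose text contains every query word (naive sketch)."""
    entries, current = [], []
    for line in markdown.splitlines():
        if line.startswith("## "):
            # A level-2 heading opens a new lesson entry.
            if current:
                entries.append("\n".join(current))
            current = [line]
        elif current:
            current.append(line)
    if current:
        entries.append("\n".join(current))
    words = query.lower().split()
    return [e for e in entries if all(w in e.lower() for w in words)]
```

Searching the example file for `pick_up apple_01` would return only the grasp-failure entry.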

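Read/Write Quick Reference

Read/write permission matrix of each component over the protocol files. R = read, W = write, - = no access.

| File | Planner (Agent) | Critic | Watchdog (Track B) | Orchestrator | Runtime Watchdog |
|------|-----------------|--------|--------------------|--------------|------------------|
| `ENVIRONMENT.md` | R | R | W | R | R |
| `EMBODIED.md` | R | R | W (at startup) | - | - |
| `ACTION.md` | W | R | R + W | - | - |
| `SESSIONS.md` | W | - | - | W | R + W |
| `TARGETS.md` | R | - | - | R | R |
| `SKILLS.md` | R | - | - | R | R |
| `TASK.md` | R + W | R | - | R | - |
| `ORCHESTRATOR.md` | - | - | - | W | - |
| `LESSONS.md` | R | W | - | R | - |
| `ROBOTS.md` | R | - | R | R | - |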
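
A matrix like this lends itself to a lookup table. The sketch below encodes three rows of it and answers whether a component may read or write a file; the component keys and the `may` helper are hypothetical names chosen for illustration.

```python
# A subset of the matrix above; "RW" stands for "R + W", a missing key for no access.
PERMISSIONS = {
    "ENVIRONMENT.md": {"planner": "R", "critic": "R", "watchdog": "W",
                       "orchestrator": "R", "runtime_watchdog": "R"},
    "ACTION.md": {"planner": "W", "critic": "R", "watchdog": "RW"},
    "LESSONS.md": {"planner": "R", "critic": "W", "orchestrator": "R"},
}


def may(component, mode, filename):
    """True if `component` holds `mode` ("R" or "W") on `filename`."""
    return mode in PERMISSIONS.get(filename, {}).get(component, "")
```

For instance, `may("critic", "W", "LESSONS.md")` holds, while `may("planner", "W", "ENVIRONMENT.md")` does not.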

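File Lifecycle

The full sequence from paos onboard to state write-back.

1. paos onboard: initialization

Creates ~/.PhyAgentOS/config.json and the workspace/ directory, and generates empty templates: ENVIRONMENT.md, EMBODIED.md, ACTION.md, LESSONS.md.

2. Watchdog startup: environment initialization

The HAL Watchdog starts the driver, calls driver.observe() to obtain the initial scene state, and writes it to ENVIRONMENT.md. It also copies the matching EMBODIED.md from hal/profiles/.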

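3. Agent startup: entering the Planner-Critic loop

Each round: the Planner reads ENVIRONMENT.md + EMBODIED.md + TASK.md + LESSONS.md → generates an action plan → the Critic validates it against EMBODIED.md → if it passes, the action is written to ACTION.md (status: pending).

4. Watchdog polling: executing actions

The Watchdog polls ACTION.md and picks up a pending action → marks it running → the driver runs driver.execute(action) → when execution finishes, the action is marked completed/failed.

5. State write-back: updating ENVIRONMENT.md

After every execution, driver.observe() is called and the Watchdog writes the latest scene graph to ENVIRONMENT.md. The Agent reads the updated state in its next loop iteration.

6. Critic rejection: writing to LESSONS.md

If the Critic rejects an action (because it violates an EMBODIED.md constraint or no safe path exists), it writes the rejection reason and context to LESSONS.md. The Agent's search_lessons tool retrieves this record in later rounds.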
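
Steps 4 and 5 can be sketched as a single polling iteration over an in-memory queue. Only driver.execute() and driver.observe() come from the text; `FakeDriver`, `watchdog_tick`, and the `error` field are stand-ins invented for this example, and reading or writing the actual Markdown files is left out.

```python
class FakeDriver:
    """Stand-in driver; only execute() and observe() are named in the text."""

    def execute(self, action):
        if action["action_type"] == "pick_up" and not action["params"].get("object_id"):
            raise RuntimeError("missing object_id")

    def observe(self):
        return {"scene_graph": {"nodes": [], "edges": []}}


def watchdog_tick(queue, driver):
    """Run the first pending action, then return a fresh observation (or None)."""
    action = next((a for a in queue if a["status"] == "pending"), None)
    if action is None:
        return None  # nothing to do on this poll
    action["status"] = "running"
    try:
        driver.execute(action)
        action["status"] = "completed"
    except Exception as exc:  # record the reason instead of crashing the loop
        action["status"] = "failed"
        action["error"] = str(exc)  # hypothetical field name
    # The caller would write this observation to ENVIRONMENT.md (step 5).
    return driver.observe()
```

Observing after both success and failure keeps ENVIRONMENT.md current even when an action aborts halfway.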