PhyAgentOS — Troubleshooting

Debugging Methodology

When something goes wrong, verify each layer in order. Don't skip steps — each layer depends on the one before it.

1. Can it import? Run `python -c "from hal.drivers.simulation import SimulationDriver"` and verify that all dependencies resolve.

2. Can it start? Launch the component on its own. Does it reach a ready state without crashing immediately?

3. Can it execute? Feed it a minimal valid input. Does it produce output without errors?

4. Can it write back? Verify that the output lands in the correct state file in the expected format.
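
The first layer is easy to script, so that a failed import reports a one-line reason instead of a full traceback. The following is a minimal sketch: the helper name `can_import` is hypothetical and not part of PhyAgentOS; only the module path and class name come from the import check described above.

```python
import importlib
from typing import Optional, Tuple


def can_import(module_name: str, attr: Optional[str] = None) -> Tuple[bool, str]:
    """Try to import a module (and optionally one attribute); return (ok, detail)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # ModuleNotFoundError is a subclass of ImportError, so both land here.
        return False, f"cannot import {module_name}: {exc}"
    if attr is not None and not hasattr(module, attr):
        return False, f"{module_name} imported, but it has no attribute {attr!r}"
    return True, "ok"


if __name__ == "__main__":
    # Layer 1 of the methodology: the same import as the one-liner check.
    ok, detail = can_import("hal.drivers.simulation", "SimulationDriver")
    print("import check:", "PASS" if ok else "FAIL", "-", detail)
```

Run it from the project root. If this layer fails, there is little point in testing the later layers until the import is fixed.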

Installation Issues

Symptom	Cause	Solution
`pip install` fails	Python version below 3.11	Check with `python --version`. Install Python ≥ 3.11.
`paos: command not found`	Editable install not applied or in wrong environment	Run `pip install -e .` again in the project root. Verify with `which paos`.
Conda environment conflicts	Dependency version clashes from old env	Create a fresh conda environment: `conda create -n phyagentos python=3.11` then reinstall.
`ModuleNotFoundError: No module named 'hal'`	PYTHONPATH doesn't include the project root	Run commands from the project root directory, or set `export PYTHONPATH=.` before running them.
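
The checks in this table can be run in one pass. The sketch below uses only the Python standard library; the function names `python_ok` and `environment_report` are hypothetical helpers, not part of PhyAgentOS. It reports whether the interpreter meets the 3.11 minimum, whether `paos` is on the PATH, and whether the `hal` package can be located from the current directory.

```python
import importlib.util
import shutil
import sys
from typing import Dict, Tuple

MIN_PYTHON = (3, 11)  # minimum version stated in the table above


def python_ok(version: Tuple[int, ...] = tuple(sys.version_info[:3])) -> bool:
    """Return True if the given (major, minor, ...) version meets the minimum."""
    return tuple(version[:2]) >= MIN_PYTHON


def environment_report() -> Dict[str, bool]:
    """Collect the three installation checks into one dictionary."""
    return {
        "python>=3.11": python_ok(),
        "paos on PATH": shutil.which("paos") is not None,
        # find_spec returns None when a top-level package cannot be located.
        "hal importable": importlib.util.find_spec("hal") is not None,
    }


if __name__ == "__main__":
    for check, passed in environment_report().items():
        print(f"{'PASS' if passed else 'FAIL'}  {check}")
```

Run it inside the environment you installed into. A `FAIL` on `paos on PATH` next to a `PASS` on `hal importable` usually points at the editable-install row above.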

Watchdog Issues

Symptom	Cause	Solution
Driver not found	Driver name typo or plugin not registered	Verify driver name. For plugins, run the deploy script and check `PhyAgentOS_plugin.toml`.
Profile not installed	`get_profile_path()` returns None or invalid path	Check the driver's `get_profile_path()` implementation. The EMBODIED.md file must exist at the returned path.
Connection timeout	Driver `connect()` failed — network or hardware issue	Check network connectivity to the robot. Verify IP/port in driver config. Try pinging the robot.
ACTION.md format error	Invalid JSON structure or `status` not `"pending"`	Verify JSON is valid. The `status` field must be `"pending"` for the Watchdog to consume the action.
Port conflicts	Multiple Watchdogs on the same port	Run only one Watchdog instance per workspace. For multiple instances, use separate workspaces or pass `--driver-config` with a different port for each instance.
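
The ACTION.md row can be checked mechanically. The sketch below assumes that the JSON payload has already been extracted from ACTION.md as a string and that `status` is a top-level field of a single JSON object. Both are assumptions about the file layout, and `action_problems` is a hypothetical helper name.

```python
import json
from typing import List


def action_problems(payload: str) -> List[str]:
    """Return the problems that would keep the Watchdog from consuming the action."""
    try:
        action = json.loads(payload)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno}, column {exc.colno})"]
    if not isinstance(action, dict):
        return ["top-level JSON value is not an object"]
    status = action.get("status")
    if status != "pending":
        # Assumption: "status" sits at the top level of the action object.
        return [f'status is {status!r}, expected "pending"']
    return []


if __name__ == "__main__":
    print(action_problems('{"status": "pending"}'))  # prints []
```

An empty list means the payload passes both conditions named in the table. Anything else names the first condition that failed.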

Agent Issues

Symptom	Cause	Solution
LLM API call fails	Missing or invalid `api_key` in config.json	Verify `api_key` is set in `~/.PhyAgentOS/config.json`. Check your LLM provider's API status.
Tool not available	Tool not configured in tools config	Check the `tools` section in config.json. Ensure the tool name matches the registered tool class.
Critic rejects all actions	EMBODIED.md doesn't match robot capabilities, or LESSONS.md blocking patterns exist	Check EMBODIED.md for accuracy — joint ranges, payload limits, workspace boundaries. Check LESSONS.md for accumulated rejection patterns. Try resetting LESSONS.md if it's polluted.
Agent stuck / no response	Watchdog not running or not consuming ACTION.md	Verify the Watchdog is running and consuming actions. Check ACTION.md status — if stuck at `executing`, the Watchdog may be hung on a long-running action.
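
To confirm that an `api_key` is actually set, the config file can be scanned rather than read by eye. This sketch makes no assumption about where in `~/.PhyAgentOS/config.json` the key lives: it walks the whole structure and lists every location where a non-empty `api_key` string appears. The helper name `api_key_paths` is hypothetical.

```python
import json
from pathlib import Path
from typing import Any, List

CONFIG_PATH = Path.home() / ".PhyAgentOS" / "config.json"  # path from the table above


def api_key_paths(node: Any, prefix: str = "") -> List[str]:
    """Return dotted paths of every non-empty string stored under a key named "api_key"."""
    found: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            if key == "api_key" and isinstance(value, str) and value.strip():
                found.append(path)
            else:
                found.extend(api_key_paths(value, path))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(api_key_paths(item, f"{prefix}[{index}]"))
    return found


if __name__ == "__main__":
    if not CONFIG_PATH.exists():
        print(f"{CONFIG_PATH} does not exist")
    else:
        config = json.loads(CONFIG_PATH.read_text(encoding="utf-8"))
        paths = api_key_paths(config)
        print("api_key set at:", paths if paths else "nowhere")
```

The script prints only the locations, never the key values, so its output is safe to paste into a bug report. It cannot tell whether a key is valid. For that, check your LLM provider's API status as described above.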

Fleet Issues

Symptom	Cause	Solution
Multiple Watchdog port conflict	Two Watchdog instances sharing a network port	Use `--driver-config` to assign unique ports per instance. Each robot should have its own workspace.
ROBOTS.md not updating	Watchdog hasn't refreshed, or the shared workspace directory isn't writable	Restart the affected Watchdog. Check write permissions on the shared workspace directory.
Wrong ACTION.md target	`robot_id` in action parameters doesn't match intended robot	Verify `robot_id` in each action matches the intended robot's config. Check ORCHESTRATOR.md for task assignment.
Shared state not visible	ENVIRONMENT.md in shared workspace is stale or locked	Check ENVIRONMENT.md in the shared workspace. Verify each Watchdog is writing to its `robots.<robot_id>` key.
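
Before starting a second Watchdog, it can help to confirm that the port you plan to assign is not already taken. The sketch below assumes the driver listens on a TCP port on the local machine, which may not hold for every driver; `port_is_free` is a hypothetical helper. The port number itself comes from whatever you pass via `--driver-config`.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can bind to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": another process holds the port.
            return False
    return True
```

The result is only a snapshot. Another process can still claim the port between this check and the Watchdog start, so treat a `True` as a hint rather than a guarantee.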

Runtime Issues

Symptom	Cause	Solution
Preflight fails	Sensor config YAML paths incorrect or calibration files missing	Verify paths in `configs/runtime/sensors/*.yaml` point to real files. Check calibration files exist.
Session marked `rejected`	TARGETS.md perception config incompatible, or skill requirements unmet	Check `perception` section in TARGETS.md. Verify SKILLS.md declares the sensors the skill needs. Check preflight error details.
No `episode.json` generated	Session didn't complete	Look up the session's final status in SESSIONS.md, then check LOG.md for error messages.
Perception outputs missing	Pipeline doesn't cover required output channels	Verify the perception YAML config includes all required output channels. Check the model's output keys match the pipeline config.
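
For the preflight row, missing files can be found by scanning a parsed sensor config for path-like values. This is a rough sketch under several assumptions: the config has already been loaded into nested dicts and lists (for example with a YAML parser), file references are plain strings, and the suffixes in `PATH_SUFFIXES` are a guess at the relevant file types. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Any, Iterator, List

PATH_SUFFIXES = (".yaml", ".yml", ".json", ".npz")  # assumed file types, adjust as needed


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string value found anywhere in a nested dict/list structure."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def missing_files(config: Any, base_dir: Path) -> List[str]:
    """List path-like string values in config that do not exist relative to base_dir."""
    missing = []
    for value in iter_strings(config):
        # Absolute paths are kept as-is; relative ones are resolved against base_dir.
        if value.endswith(PATH_SUFFIXES) and not (base_dir / value).is_file():
            missing.append(value)
    return missing
```

Call `missing_files` once for each file under `configs/runtime/sensors/`, passing the directory that relative paths are resolved against. Which directory that is depends on how the runtime resolves paths, so check the preflight error details if the results look wrong.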

File Inspection Checklist

When you can't identify the root cause, inspect each state file in order. This is the most reliable fallback debugging method.

ENVIRONMENT.md: Is the robot state correct? Look for stale poses, a wrong `connection_state`, or missing sensor data. If the state is frozen, the Watchdog may have stopped updating it.
ACTION.md: Are there pending actions? If the queue is growing, the Watchdog isn't consuming them. Check the Watchdog terminal for errors.
EMBODIED.md: Does the profile match the actual robot? Joint ranges, control modes, and payload limits must reflect reality. A mismatched profile causes the Critic to reject valid actions.
LESSONS.md: Any recent rejections? Each rejection includes a timestamp, a `robot_id`, and a reason. Scan for patterns: repeated rejections of the same action type indicate a systemic config issue.
SESSIONS.md (runtime path only): Check the session statuses. If all are `rejected`, the preflight or config is the problem. If sessions time out, check the `execute_timeout` values.
LOG.md (runtime path only): Look at the most recent entries. LOG.md records every session outcome with error details.

Pro tip: Use `watch -n 1 'cat ACTION.md'` to monitor a file in real time. Run it alongside separate Watchdog and Agent terminals to see the full loop.
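
A frozen state file is often easier to spot from its modification time than from its contents. The sketch below reports how long ago each state file was last written. It assumes the files sit directly in the workspace directory, which may differ in your setup, and `file_ages` is a hypothetical helper name.

```python
import time
from pathlib import Path
from typing import Dict, Iterable, Optional

STATE_FILES = ("ENVIRONMENT.md", "ACTION.md", "EMBODIED.md", "LESSONS.md")


def file_ages(workspace: Path, names: Iterable[str] = STATE_FILES) -> Dict[str, Optional[float]]:
    """Map each file name to seconds since its last modification, or None if missing."""
    now = time.time()
    ages: Dict[str, Optional[float]] = {}
    for name in names:
        path = workspace / name
        ages[name] = now - path.stat().st_mtime if path.is_file() else None
    return ages


if __name__ == "__main__":
    # Assumption: the current directory is the workspace.
    for name, age in file_ages(Path(".")).items():
        print(f"{name:16} {'missing' if age is None else f'{age:.0f}s since last write'}")
```

If ENVIRONMENT.md has not been written for much longer than the others while the Watchdog is supposedly running, that matches the frozen-state case in the checklist.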

Log Locations

Component	Log Location	What to look for
Watchdog	Terminal output (stdout)	Action consumption cycles, driver connect/disconnect events, execution errors
Agent	Terminal output (stdout)	LLM response times, tool call traces, Critic validation results
Runtime (LOG.md)	`<workspace>/LOG.md`	Session outcome table, error messages, return values
Gateway	Terminal output (stdout)	Channel connection/disconnection, message routing logs
Channels	Depends on channel implementation	Refer to the channel's own logging configuration (file or stdout)