Debugging Methodology
When something goes wrong, verify each layer in order. Don't skip steps — each layer depends on the one before it.
1. Can it import? Run python -c "from hal.drivers.simulation import SimulationDriver" to verify that all dependencies resolve.
2. Can it start? Launch the component on its own. Does it reach a ready state without immediately crashing?
3. Can it execute? Feed it a minimal valid input. Does it produce output without errors?
4. Can it write back? Verify the output lands in the correct state file with the expected format.
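
The import layer can also be checked from a script, which helps when the same check needs to run against several modules. This is a minimal sketch: check_import is a hypothetical helper and not part of PhyAgentOS, and only the module and class names in the usage section come from the command above.

```python
import importlib
from typing import Optional, Tuple


def check_import(module_name: str, attr: Optional[str] = None) -> Tuple[bool, str]:
    """Return (ok, detail) for importing module_name and, optionally, one attribute."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # ImportError, or any error raised while the module loads
        return False, f"{type(exc).__name__}: {exc}"
    if attr is not None and not hasattr(module, attr):
        return False, f"{module_name} has no attribute {attr!r}"
    return True, "ok"


if __name__ == "__main__":
    # Module and class names are taken from the import command above.
    ok, detail = check_import("hal.drivers.simulation", "SimulationDriver")
    print("import layer:", "PASS" if ok else f"FAIL ({detail})")
```

A failure at this layer usually names the missing dependency in the detail string, so fix that before moving on to the start, execute, and write-back checks.
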
Installation Issues
| Symptom | Cause | Solution |
|---|---|---|
| pip install fails | Python version below 3.11 | Check with python --version. Install Python ≥ 3.11. |
| paos: command not found | Editable install not applied or in wrong environment | Run pip install -e . again in the project root. Verify with which paos. |
| Conda environment conflicts | Dependency version clashes from old env | Create a fresh conda environment: conda create -n phyagentos python=3.11 then reinstall. |
| ModuleNotFoundError: hal | PYTHONPATH doesn't include project root | Ensure you run commands from the project root directory, or set export PYTHONPATH=. |
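
The first two rows of the table can be checked from inside the active environment, which confirms that the interpreter and the paos entry point belong to the same install. This is a sketch under stated assumptions: python_ok and command_on_path are hypothetical helper names, and only the 3.11 minimum and the paos command name come from the table.

```python
import shutil
import sys
from typing import Sequence, Tuple

MIN_PYTHON = (3, 11)  # minimum version stated in the table above


def python_ok(version: Sequence[int] = sys.version_info,
              minimum: Tuple[int, int] = MIN_PYTHON) -> bool:
    """True if the (major, minor) part of version meets the minimum."""
    return tuple(version[:2]) >= minimum


def command_on_path(name: str) -> bool:
    """True if an executable called name is found on PATH (what `which` reports)."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("Python >= 3.11:", python_ok())
    print("paos on PATH:", command_on_path("paos"))
```

If the interpreter path printed here is not inside the environment you expected, the editable install most likely went into a different environment.
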
Watchdog Issues
| Symptom | Cause | Solution |
|---|---|---|
| Driver not found | Driver name typo or plugin not registered | Verify driver name. For plugins, run the deploy script and check PhyAgentOS_plugin.toml. |
| Profile not installed | get_profile_path() returns None or invalid path | Check the driver's get_profile_path() implementation. The EMBODIED.md file must exist at the returned path. |
| Connection timeout | Driver connect() failed: network or hardware issue | Check network connectivity to the robot. Verify IP/port in driver config. Try pinging the robot. |
| ACTION.md format error | Invalid JSON structure or status not "pending" | Verify JSON is valid. The status field must be "pending" for the Watchdog to consume the action. |
| Port conflicts | Multiple Watchdogs on the same port | Only one Watchdog instance per workspace. Use different workspaces or --driver-config with different ports per instance. |
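
For the ACTION.md format error row, the two conditions named in the table (valid JSON, status equal to "pending") can be tested directly. This is only a sketch: how the JSON is embedded in ACTION.md is not shown here, so the hypothetical action_problems helper takes the already-extracted JSON text, and it assumes status is a top-level field of a JSON object.

```python
import json
from typing import List


def action_problems(raw_json: str) -> List[str]:
    """List reasons the action may not be consumed; an empty list means none were found."""
    try:
        action = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno}, column {exc.colno})"]
    if not isinstance(action, dict):
        # Assumption: an action is a JSON object, not a list or scalar.
        return ["top-level JSON value is not an object"]
    status = action.get("status")
    if status != "pending":
        return [f'status is {status!r}, expected "pending"']
    return []


if __name__ == "__main__":
    print(action_problems('{"status": "pending"}'))
    print(action_problems('{"status": "executing"}'))
```

An empty result only rules out the two causes listed in the table; it does not validate the rest of the action's fields.
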
Agent Issues
| Symptom | Cause | Solution |
|---|---|---|
| LLM API call fails | Missing or invalid api_key in config.json | Verify api_key is set in ~/.PhyAgentOS/config.json. Check your LLM provider's API status. |
| Tool not available | Tool not configured in tools config | Check the tools section in config.json. Ensure the tool name matches the registered tool class. |
| Critic rejects all actions | EMBODIED.md doesn't match robot capabilities, or LESSONS.md blocking patterns exist | Check EMBODIED.md for accuracy — joint ranges, payload limits, workspace boundaries. Check LESSONS.md for accumulated rejection patterns. Try resetting LESSONS.md if it's polluted. |
| Agent stuck / no response | Watchdog not running or not consuming ACTION.md | Verify the Watchdog is running and consuming actions. Check ACTION.md status — if stuck at executing, the Watchdog may be hung on a long-running action. |
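
To confirm that api_key is set without printing the secret, the config file can be scanned for the field. This is a hedged sketch: the layout of config.json is not documented in this section, so the hypothetical find_api_keys helper searches every nesting level for a field named api_key instead of assuming a fixed location.

```python
import json
from pathlib import Path
from typing import Any, List


def find_api_keys(node: Any, path: str = "") -> List[str]:
    """Return 'location: state' for every api_key field found anywhere in the config."""
    found: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            here = f"{path}.{key}" if path else key
            if key == "api_key":
                # Report only whether a non-blank string is present, never the value.
                state = "set" if isinstance(value, str) and value.strip() else "EMPTY"
                found.append(f"{here}: {state}")
            else:
                found.extend(find_api_keys(value, here))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(find_api_keys(item, f"{path}[{index}]"))
    return found


if __name__ == "__main__":
    config_path = Path.home() / ".PhyAgentOS" / "config.json"  # path from the table above
    if config_path.exists():
        report = find_api_keys(json.loads(config_path.read_text()))
        print("\n".join(report) if report else "no api_key field found")
    else:
        print(f"{config_path} does not exist")
```

A key reported as set can still be invalid or expired, so a passing result here should be followed by a check of the provider's API status.
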
Fleet Issues
| Symptom | Cause | Solution |
|---|---|---|
| Multiple Watchdog port conflict | Two Watchdog instances sharing a network port | Use --driver-config to assign unique ports per instance. Each robot should have its own workspace. |
| ROBOTS.md not updating | Watchdog hasn't refreshed or write permissions issue | Restart the affected Watchdog. Check write permissions on the shared workspace directory. |
| Wrong ACTION.md target | robot_id in action parameters doesn't match intended robot | Verify robot_id in each action matches the intended robot's config. Check ORCHESTRATOR.md for task assignment. |
| Shared state not visible | ENVIRONMENT.md in shared workspace is stale or locked | Check ENVIRONMENT.md in the shared workspace. Verify each Watchdog is writing to its robots.<robot_id> key. |
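
For the write-permission check in the ROBOTS.md row, attempting a real write is more reliable than reading permission bits, especially on network or shared mounts. This is a minimal sketch with a hypothetical helper name, can_write; the workspace path passed to it is yours to supply, and the check should be run as the same user that runs the Watchdog.

```python
import tempfile
from pathlib import Path


def can_write(directory: str) -> bool:
    """True if a temporary file can actually be created in directory."""
    path = Path(directory)
    if not path.is_dir():
        return False
    try:
        # The temporary file is removed automatically when the block exits.
        with tempfile.TemporaryFile(dir=path):
            pass
    except OSError:
        return False
    return True


if __name__ == "__main__":
    print("current directory writable:", can_write("."))
```
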
Runtime Issues
| Symptom | Cause | Solution |
|---|---|---|
| Preflight fails | Sensor config YAML paths incorrect or calibration files missing | Verify paths in configs/runtime/sensors/*.yaml point to real files. Check calibration files exist. |
| Session marked rejected | TARGETS.md perception config incompatible, or skill requirements unmet | Check perception section in TARGETS.md. Verify SKILLS.md declares the sensors the skill needs. Check preflight error details. |
| No episode.json generated | Session didn't complete; check status in SESSIONS.md | Look at SESSIONS.md for the session's final status. Check LOG.md for error messages. |
| Perception outputs missing | Pipeline doesn't cover required output channels | Verify the perception YAML config includes all required output channels. Check the model's output keys match the pipeline config. |
File Inspection Checklist
When you can't identify the root cause, inspect each state file in order. This is the most reliable fallback debugging method.
- ENVIRONMENT.md — Is the robot state correct? Look for stale poses, wrong connection_state, or missing sensor data. If state is frozen, the Watchdog may have stopped updating.
- ACTION.md — Are there pending actions? If the queue is growing, the Watchdog isn't consuming them. Check the Watchdog terminal for errors.
- EMBODIED.md — Does the profile match the actual robot? Joint ranges, control modes, and payload limits must reflect reality. A mismatched profile causes the Critic to reject valid actions.
- LESSONS.md — Any recent rejections? Each rejection includes a timestamp, robot_id, and reason. Scan for patterns — repeated rejections of the same action type indicate a systemic config issue.
- SESSIONS.md (runtime path only): Check session statuses. If all are rejected, the preflight or config is the problem. If sessions show timeout, check execute_timeout values.
- LOG.md (runtime path only): Look at the most recent entries. LOG.md records every session outcome with error details.
Pro tip: Use watch -n 1 'cat ACTION.md' to monitor a file in real time. Combine with separate Watchdog and Agent terminals to see the full loop.
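
A frozen state file, as described for ENVIRONMENT.md above, can be detected by comparing its modification time with the current time. This is a sketch under stated assumptions: is_stale and seconds_since_update are hypothetical helpers, and the 10-second threshold in the usage section is a placeholder, since the Watchdog's update interval is not given here.

```python
import os
import time
from typing import Optional


def seconds_since_update(path: str, now: Optional[float] = None) -> float:
    """Seconds elapsed since the file at path was last modified."""
    current = time.time() if now is None else now
    return current - os.path.getmtime(path)


def is_stale(path: str, max_age_s: float, now: Optional[float] = None) -> bool:
    """True if the file has not been modified within the last max_age_s seconds."""
    return seconds_since_update(path, now) > max_age_s


if __name__ == "__main__":
    state_file = "ENVIRONMENT.md"  # run from the workspace directory
    if os.path.exists(state_file):
        # 10 seconds is a placeholder threshold, not a documented value.
        print(f"{state_file} stale:", is_stale(state_file, max_age_s=10.0))
    else:
        print(f"{state_file} not found in the current directory")
```

A stale result points at the Watchdog, so check its terminal output before inspecting the remaining files.
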
Log Locations
| Component | Log Location | What to look for |
|---|---|---|
| Watchdog | Terminal output (stdout) | Action consumption cycles, driver connect/disconnect events, execution errors |
| Agent | Terminal output (stdout) | LLM response times, tool call traces, Critic validation results |
| Runtime (LOG.md) | <workspace>/LOG.md | Session outcome table, error messages, return values |
| Gateway | Terminal output (stdout) | Channel connection/disconnection, message routing logs |
| Channels | Depends on channel implementation | Refer to the channel's own logging configuration (file or stdout) |