feat: add subagent supervision, Docker isolation, final SKILL.md wiring #27
No reviewers
Labels
No labels
community-feedback
enhancement
skill-upgrade
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
magnus/agent-skills!27
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "feat/supervision-docker-skill-wiring"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Phase 3 — Closes #22
references/subagent-experiment-supervision.mdreferences/docker-experiment-isolation.mdscripts/Dockerfile— test image for the full suiteSKILL.mdupdates:Test results: 22/22 passing (supervision + Docker build)
First-Pass Code Review — Jasper (automated)
Overall
Solid PR with well-structured references and thorough test coverage. The supervision architecture and Docker isolation docs are production-ready in terms of content. A few issues to address before merge.
Security
fix_pip_installuses log-derived module name directly inpip install— If an attacker controls experiment log content, they could inject arbitrary package names. Mitigation: validate the extracted module name against a known-safe pattern (e.g.,^[a-zA-Z_][a-zA-Z0-9_.-]*$).subprocess.run(..., check=False)means install failures are silently swallowed. Addcheck=Trueor at minimum log the failure.Logic Errors
apply_fix()dispatcher — The supervision loop callsapply_fix(fix, experiment_cmd)but noapply_fixfunction is defined in the diff. The fix functions are namedfix_reduce_batch_size,fix_gradient_clipping, etc. A dispatch dict mapping fix names to functions is needed.fix_pip_installhas incompatible signature — Returnsstrwhile all other fix functions returnlist(modified cmd). This would break a genericapply_fixdispatcher. It also doesn't modify the command at all — only installs the package and returns a message string, but the retry loop expects a modified command list.supervise_experiment— The monitoring loop openslog_pathin write mode ("w") which truncates the file. If the subprocess is slow to start writing (e.g., long Python import time), a fast supervisor cycle could read an empty truncated log. Not a critical bug but worth noting.Style & Quality
test_supervision_protocol.sh— The failure-count test has a fragile||chain: ifgrep -c '| \*\*'matches markdown formatting incidentally, the test passes regardless of count. Consider a more precise regex for the failure table row detection.set -euo pipefailon all scripts, structured pitfall tables, cross-referencing between files.Suggestions
apply_fixdispatch dict to the supervision codefix_pip_installsignature with the other fix functionsscripts/apply_fix.pyas a standalone utilityReview by Jasper (automated first-pass)
Jasper (automated review)
First-pass code review for PR #27.
Security
subagent-experiment-supervision.mdshows a Telegram notification pattern with inline placeholders. Recommend env-var-based pattern instead.fix_fallback_cpu()mutates global os.environ — Modifyingos.environ["CUDA_VISIBLE_DEVICES"]is a global side effect. Usesubprocess.Popen(env=...)with a modified copy instead.Logic / Correctness
--storage-opt size=50GBis storage-driver-dependent — Only supported on devicemapper/btrfs/zfs. On overlay2 (default on modern Linux), not supported atdocker runtime. Consider noting this limitation.NUM_GPUSis never defined infor trial in {1..8}; do GPU_ID=$((trial % NUM_GPUS)).start_timeis captured once before the retry loop, so timeout counts across all attempts including time spent applying fixes. May be intentional but should be documented.Style / Docs
detect-compute.pyreferenced but not in this PR — SKILL.md lists it under Available Resources but it is not included here, creating a broken reference.fix_reduce_batch_size()limited CLI variant coverage — Only handles--batch-sizeand--batch_size. Many frameworks use-b,--batch, or env vars. The fallback (appending--batch_size 16) may not work with all scripts.Review by Jasper (automated review bot). Findings for reference — no action required on already-merged PR.