feat: add data-scientist skill — PhD-level data science expertise for agents #21
Reference
magnus/agent-skills!21
Summary
New skill: `data-scientist`. It packages PhD-level expertise in data science, statistics, and machine learning as an Agent Skills-compatible skill.
Files
- `data-scientist/SKILL.md`: Decision framework, core competencies, statistical philosophy, 7-question classifier, communication standards
- `data-scientist/references/*.md`: 5 reference documents covering statistical methodology, experimental design, causal inference (DAGs + potential outcomes), regression modeling, Bayesian workflow
- `data-scientist/scripts/*.py`: 5 automation scripts with Python default and `--engine r` flag for R output
- `data-scientist/assets/*.md`: Analysis report and experimental plan templates
- `AGENTS.md`: Added trigger table entry
- `README.md`: Added skill index entry
Key design decisions
- `--engine r` flag
First-Pass Review: data-scientist skill
Overall, this is a well-constructed skill. The structure follows Agent Skills conventions, the SKILL.md is comprehensive at ~3K tokens, and the 5 scripts cover real needs for a data science analysis workflow. A few things to clean up before merge:
1. AGENTS.md table formatting (lines 60-62)
The three new/changed rows use `||` (double pipe) instead of `|` (single pipe) at the start. All other rows in the table, including the header and the surrounding unchanged rows, use a single `|`. The double pipe creates an extra empty column in the rendered markdown table.
Fix: Change `||` to `|` on lines 60-62.
2. power-analysis.py: Dead code in solve_power_prop (line 88)
`p_bar = (p1 + ratio * p2) / (1 + ratio)` is computed but never used in the function body. If the intent was a pooled-variance formula, consider using it; otherwise remove the dead variable.
3. power-analysis.py: `/ 1` no-op (line 91)
In `p1 * (1 - p1) / 1`, dividing by 1 is a no-op that makes the formula harder to read. The `ratio` adjustment already handles unequal groups. Recommend simplifying to `p1 * (1 - p1)`.
4. Scripts: missing executable permissions
All 5 scripts have `#!/usr/bin/env python3` shebangs but lack executable bits (`chmod +x`). Consider adding them and committing the mode change.
5. (Minor) supports-metadata naming
The `metadata.skills` list uses domain labels (`statistics`, `machine-learning`) that are not actual skill names in this repo. This is valid per spec (they are node labels, not cross-references), but worth noting for consistency with other skills in this repo.
Review by Jasper (automated review)
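For the executable-permissions point in this review, a minimal sketch of how the missing bits could be detected and added from Python on a POSIX system. The helper names `missing_exec_bit` and `add_exec_bits` are hypothetical and not part of this PR, and the demo uses a throwaway temporary file rather than the actual scripts.

```python
import os
import stat
import tempfile


def missing_exec_bit(paths):
    """Return the paths whose owner-executable bit is not set."""
    return [p for p in paths if not os.stat(p).st_mode & stat.S_IXUSR]


def add_exec_bits(path):
    """Add execute permission for user, group, and other (like chmod a+x)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


if __name__ == "__main__":
    # Demo on a temporary file; the file name is illustrative only.
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "example.py")
        with open(script, "w") as fh:
            fh.write("#!/usr/bin/env python3\n")
        os.chmod(script, 0o644)              # start without any execute bits
        print(missing_exec_bit([script]))    # the script is reported
        add_exec_bits(script)
        print(missing_exec_bit([script]))    # nothing left to report
```

Changing the mode on disk is only half of the fix: the change still has to be committed so that the repository records the executable bit.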
First-Pass Review: data-scientist skill
Overall, this is a well-constructed skill. The structure follows Agent Skills conventions, the SKILL.md is comprehensive at ~3K tokens, and the 5 scripts cover real needs. A few things to address:
1. AGENTS.md table formatting: Three rows use `||` instead of `|`.
2. power-analysis.py: Dead `p_bar` variable + confusing `/ 1` no-op.
3. Scripts: Missing executable permissions on all 5 scripts.
4. metadata.skills: Domain labels (statistics, machine-learning) not actual skill names.
See inline comments for line-level details.
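As a rough illustration of the table-formatting item, the check below flags table rows that begin with a double pipe. It is a sketch only: the function name `rows_with_double_pipe` and the sample rows are made up for the example and are not taken from `AGENTS.md`.

```python
def rows_with_double_pipe(markdown_text):
    """Return 1-based line numbers of lines that start with '||'."""
    return [
        lineno
        for lineno, line in enumerate(markdown_text.splitlines(), start=1)
        if line.lstrip().startswith("||")
    ]


# Hypothetical table: the last row has the stray leading pipe.
sample = "\n".join([
    "| Trigger | Skill |",
    "|---------|-------|",
    "| stats question | data-scientist |",
    "|| regression help | data-scientist |",
])

if __name__ == "__main__":
    print(rows_with_double_pipe(sample))
```

A check like this could run over the table before merge; the fix itself is still the one-character change from `||` to `|` on each flagged row.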
Formatting: Double pipe `||` creates an extra empty column in the rendered markdown table. All other rows use single `|`. Change `||` → `|` on this and the two similar rows below.
Same issue: `||` should be `|` for consistent table rendering.
Same issue: `||` should be `|`.
Power analysis script review: two minor issues.
Dead code: `p_bar` is computed here but never used in the function body. If the intent was a pooled-variance formula, consider using it instead of the unpooled version. Otherwise remove this line.
No-op: In `p1 * (1 - p1) / 1`, dividing by 1 has no effect and makes the formula harder to read. The `ratio` parameter already handles unequal group sizes. Recommend simplifying to `p1 * (1 - p1)`.
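To make the two power-analysis comments concrete, here is a minimal sketch of an unpooled two-proportion sample-size calculation with the dead `p_bar` line removed and the `/ 1` dropped. It is not the code from `power-analysis.py`: the function name, signature, defaults, and the reading of `ratio` as n2 / n1 are assumptions, and the textbook two-sided z-test formula is used in place of whatever `solve_power_prop` actually does.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80, ratio=1.0):
    """Sample sizes (n1, n2) for a two-sided two-proportion z-test.

    Uses the unpooled variance. `ratio` is assumed to mean n2 / n1.
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ to define an effect size")
    if ratio <= 0:
        raise ValueError("ratio must be positive")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power

    # Group 1 contributes p1 * (1 - p1) with no divisor; only the group 2
    # term is scaled by ratio, which is what handles unequal group sizes.
    variance = p1 * (1 - p1) + p2 * (1 - p2) / ratio

    n1 = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return ceil(n1), ceil(n1 * ratio)


if __name__ == "__main__":
    print(sample_size_two_proportions(0.50, 0.60))
    print(sample_size_two_proportions(0.50, 0.60, ratio=2.0))
```

If a pooled-variance formula was the original intent, `p_bar` would belong in the null-hypothesis variance term rather than being deleted; the sketch shows only the unpooled variant the review describes.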