magnus/agent-skills

feat: add data-scientist skill — PhD-level data science expertise for agents #21

Merged

magnus merged 1 commit from feature/data-scientist-skill into main

2026-05-22 16:36:38 -04:00

jasper commented

2026-05-22 16:35:38 -04:00

Contributor

Summary

New skill: data-scientist — PhD-level expertise in data science, statistics, and machine learning, packaged as an Agent Skills compatible skill.

Files

data-scientist/SKILL.md — Decision framework, core competencies, statistical philosophy, 7-question classifier, communication standards
data-scientist/references/*.md — 5 reference documents: statistical methodology, experimental design, causal inference (DAGs + potential outcomes), regression modeling, Bayesian workflow
data-scientist/scripts/*.py — 5 automation scripts with Python default and --engine r flag for R output
data-scientist/assets/*.md — Analysis report and experimental plan templates
AGENTS.md — Added trigger table entry
README.md — Added skill index entry

Key design decisions

Decision framework classifies questions into 7 types (advice, analysis, research, design, review, methodology, clarify) with appropriate rigor per type
Causal inference covers both graphical (DAGs/Pearl) and potential outcomes (Rubin) traditions
Scripts support Python + R via --engine r flag
No personal infrastructure references — ready for global distribution
SKILL.md is 220 lines / ~3092 tokens (under spec limits)
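The `--engine r` flag described above could be wired up roughly as follows. This is a minimal sketch, not the code in this PR: `build_parser`, `render_snippet`, and the `--effect-size` option are hypothetical names, and it assumes "R output" means the script emits an R snippet rather than running R itself.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI skeleton: Python output by default, R output on request."""
    parser = argparse.ArgumentParser(description="Illustrative script skeleton")
    parser.add_argument(
        "--engine",
        choices=["python", "r"],
        default="python",
        help="language of the generated snippet (default: python)",
    )
    # Hypothetical option, used only to give the snippet something to render.
    parser.add_argument("--effect-size", type=float, default=0.5)
    return parser


def render_snippet(engine: str, effect_size: float) -> str:
    """Return a code snippet as text for the chosen engine (assumed behavior)."""
    if engine == "r":
        return f"pwr::pwr.t.test(d = {effect_size}, power = 0.8, sig.level = 0.05)"
    return (
        "from statsmodels.stats.power import TTestIndPower\n"
        f"TTestIndPower().solve_power(effect_size={effect_size}, power=0.8, alpha=0.05)"
    )


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(render_snippet(args.engine, args.effect_size))
```

Restricting `--engine` with `choices` makes an unsupported engine fail at argument parsing instead of deep inside the script.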

jasper added 1 commit

2026-05-22 16:35:38 -04:00

feat: add data-scientist skill 487f8923dc

PhD-level data science expertise with decision framework, five reference
documents (statistical methodology, experimental design, causal inference,
regression modeling, Bayesian workflow), five automation scripts (power
analysis, assumption diagnostics, model comparison, effect size calculator,
experimental design generator), and two report templates.

Python default with --engine r flag for R output. Dual language support.

magnus merged commit 975f295c94 into main

2026-05-22 16:36:38 -04:00

magnus referenced this pull request from a commit

2026-05-22 16:36:39 -04:00

Merge pull request 'feat: add data-scientist skill — PhD-level data science expertise for agents' (#21) from feature/data-scientist-skill into main

jasper reviewed

2026-05-22 16:39:03 -04:00

jasper left a comment

First-Pass Review: data-scientist skill

Overall, this is a well-constructed skill. The structure follows Agent Skills conventions, the SKILL.md is comprehensive at ~3K tokens, and the 5 scripts cover real needs for a data science analysis workflow. A few things to clean up before merge:

1. AGENTS.md table formatting (lines 60-62)

The three new/changed rows use || (double pipe) instead of | (single pipe) at the start. All other rows in the table — including the header and the surrounding unchanged rows — use single |. The double pipe creates an extra empty column in the rendered markdown table.

Fix: Change || to | on lines 60-62.
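For illustration, the fix can also be applied mechanically. The sketch below assumes the only defect is a doubled pipe at the start of a row. The sample table content is made up, and `fix_leading_double_pipe` is a hypothetical helper, not something in this repo.

```python
import re


def fix_leading_double_pipe(markdown: str) -> str:
    """Replace a leading '||' with '|' on each line; other lines pass through unchanged."""
    fixed = []
    for line in markdown.splitlines():
        # Only a doubled pipe at the very start of the row (after optional indent) is touched.
        fixed.append(re.sub(r"^(\s*)\|\|", r"\1|", line))
    return "\n".join(fixed)


# Made-up table content, for demonstration only.
sample = (
    "| Trigger | Skill |\n"
    "|---|---|\n"
    "|| statistics question | data-scientist |"
)

if __name__ == "__main__":
    print(fix_leading_double_pipe(sample))
```

A row that intentionally starts with an empty cell is normally written `| | ... |` with a space between the pipes, so it is not matched by this pattern.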

2. power-analysis.py: Dead code in solve_power_prop (line 88)

p_bar = (p1 + ratio * p2) / (1 + ratio) is computed but never used in the function body. If the intent was a pooled-variance formula, consider using it; otherwise remove the dead variable.

3. power-analysis.py: `/ 1` no-op (line 91)

p1 * (1 - p1) / 1 — dividing by 1 is a no-op that makes the formula harder to read. The ratio adjustment already handles unequal groups. Recommend simplifying to p1 * (1 - p1).
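To make items 2 and 3 concrete, here is one way the cleaned-up calculation might look. This is a sketch under assumptions, since the body of `solve_power_prop` is not quoted in this review: it assumes a two-sided two-proportion z-test with unpooled variance, and that `ratio` means n2 / n1. The function name `sample_size_two_proportions` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80, ratio=1.0):
    """Per-group sample sizes (n1, n2) for a two-sided two-proportion z-test.

    Uses the unpooled variance; ratio is assumed to mean n2 / n1.
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ to solve for a sample size")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # No '/ 1' on the first term: group 1 is the reference group,
    # and only the second term is scaled by the allocation ratio.
    variance = p1 * (1 - p1) + p2 * (1 - p2) / ratio
    n1 = ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)
    n2 = ceil(ratio * n1)
    return n1, n2


if __name__ == "__main__":
    print(sample_size_two_proportions(0.10, 0.15))
```

If a pooled-variance formula was intended instead, `p_bar` would enter the term multiplied by the alpha quantile. In the unpooled form above it is not needed at all.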

4. Scripts: missing executable permissions

All 5 scripts have #!/usr/bin/env python3 shebangs but lack executable bits (chmod +x). Consider adding and committing.
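A minimal sketch of setting the bits from Python, assuming a POSIX filesystem. `make_executable` is a hypothetical helper and the directory argument is whatever path holds the scripts.

```python
import stat
from pathlib import Path


def make_executable(directory: str, pattern: str = "*.py") -> list[str]:
    """Add user/group/other execute bits to matching files; return the names changed."""
    changed = []
    for path in sorted(Path(directory).glob(pattern)):
        mode = path.stat().st_mode
        new_mode = mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
        if new_mode != mode:
            path.chmod(new_mode)
            changed.append(path.name)
    return changed


# Example call (path taken from this PR's layout):
#   make_executable("data-scientist/scripts")
```

Git records the executable bit as part of the file mode, so the change still has to be committed. Where the working tree cannot represent the bit, `git update-index --chmod=+x <file>` sets it directly in the index.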

5. (Minor) supports-metadata naming

The metadata.skills list uses domain labels (statistics, machine-learning) that are not actual skill names in this repo. This is valid per spec (they are node labels, not cross-references), but worth noting for consistency with other skills in this repo.

Review by Jasper (automated review)

jasper reviewed

2026-05-22 16:39:10 -04:00

jasper left a comment

First-Pass Review: data-scientist skill

Overall, this is a well-constructed skill. The structure follows Agent Skills conventions, the SKILL.md is comprehensive at ~3K tokens, and the 5 scripts cover real needs. A few things to address:

1. AGENTS.md table formatting: Three rows use || instead of |.
2. power-analysis.py: Dead p_bar variable + confusing / 1 no-op.
3. Scripts: Missing executable permissions on all 5 scripts.
4. metadata.skills: Domain labels (statistics, machine-learning) not actual skill names.

See inline comments for line-level details.

AGENTS.md

Formatting: Double pipe || creates an extra empty column in the rendered markdown table. All other rows use single |. Change || → | on this and the two similar rows below.

Same issue — || should be | for consistent table rendering.

Same issue — || should be |.

jasper reviewed

2026-05-22 16:39:15 -04:00

jasper left a comment

Power analysis script review — two minor issues.

data-scientist/scripts/power-analysis.py

Dead code: p_bar is computed here but never used in the function body. If the intent was a pooled-variance formula, consider using it instead of the unpooled version. Otherwise remove this line.

No-op: p1 * (1 - p1) / 1 divides by 1, which has no effect and makes the formula harder to read. The ratio parameter already handles unequal group sizes. Recommend simplifying to p1 * (1 - p1).
