Data-Scientist Skill: Researched PyTorch + Scikit-Learn + DS Coding Workflow References #23
Reference
magnus/agent-skills#23
Researched References: PyTorch + Scikit-Learn + Data Science Coding Workflow
The data-scientist skill currently provides strong methodology guidance (which test to use, how to structure an analysis) but lacks integrated code-level expertise — the agent has no researched reference to reach for when it needs to write a PyTorch training loop, compose an sklearn pipeline, or set up an experiment directory.
Deliverables
Three new reference documents, researched from current documentation and best practices (not generated from parametric knowledge):
1. references/pytorch-integration.md
Researched from pytorch.org/docs/stable and current best practices. Covers:
- torch.device, .to(device), accelerator pattern, MPS support detection
- Dataset, collate functions, worker configuration, shuffling, pin_memory
- state_dict, full model save, torch.save/torch.load, checkpointing with optimizer state
- torch.cuda.amp.autocast + GradScaler
- DistributedDataParallel overview (when needed)
- torchvision.models feature extraction vs fine-tuning, layer freezing
Source validation: Verify each pattern against current PyTorch 2.x docs. Include version compatibility notes.
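As an illustration of the device-selection and checkpointing items listed above, here is a minimal sketch. It is not taken from the reference document itself. The helper names `pick_device`, `save_checkpoint`, and `load_checkpoint`, as well as the checkpoint dictionary keys, are hypothetical, and the sketch assumes a PyTorch version that provides `torch.backends.mps` and the `weights_only` argument of `torch.load` (1.13 or later).

```python
import os
import tempfile

import torch
from torch import nn


def pick_device() -> torch.device:
    """Prefer CUDA, then Apple MPS, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


def save_checkpoint(path, model, optimizer, epoch):
    # Store state_dicts (tensors and plain Python values) rather than
    # pickling the model and optimizer objects themselves.
    torch.save(
        {
            "epoch": epoch,                              # hypothetical key names
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
        },
        path,
    )


def load_checkpoint(path, model, optimizer, device):
    # weights_only=True restricts unpickling to tensors and basic containers.
    ckpt = torch.load(path, map_location=device, weights_only=True)
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["epoch"]


device = pick_device()
model = nn.Linear(4, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

with tempfile.TemporaryDirectory() as tmp:
    ckpt_path = os.path.join(tmp, "checkpoint.pt")
    save_checkpoint(ckpt_path, model, optimizer, epoch=3)

    # Rebuild a fresh model/optimizer pair and restore both from disk.
    restored = nn.Linear(4, 2).to(device)
    restored_opt = torch.optim.SGD(restored.parameters(), lr=0.1)
    resumed_epoch = load_checkpoint(ckpt_path, restored, restored_opt, device)
```

Saving the optimizer state alongside the model weights is what lets training resume where it stopped, since optimizers such as SGD with momentum or Adam keep per-parameter buffers.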
2. references/sklearn-integration.md
Researched from scikit-learn.org/stable and current best practices. Covers:
- Pipeline, make_pipeline, named steps, set_params for grid search
- ColumnTransformer, make_column_selector
- StandardScaler, OneHotEncoder, OrdinalEncoder, SimpleImputer, iterative imputation
- GridSearchCV, RandomizedSearchCV, custom ParameterGrid, nested CV
- BaseEstimator + TransformerMixin protocol
- joblib.dump/load, version compatibility between sklearn versions
- FunctionTransformer
- CalibratedClassifierCV for probability calibration
Source validation: Verify against current sklearn 1.6+ API. Note deprecations and removals.
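To make the pipeline and grid-search items above concrete, here is a small sketch, not drawn from the reference document. It assumes a purely numeric feature matrix, generated synthetically here with `make_classification`, and shows how `make_pipeline` derives step names that are then used as `step__parameter` keys in `GridSearchCV`. The choice of estimator and the grid values are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data so the example is self-contained.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# make_pipeline names each step after its lowercased class name.
pipeline = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    LogisticRegression(max_iter=1000),
)
step_names = list(pipeline.named_steps)

# Grid keys use the "<step name>__<parameter>" convention.
param_grid = {"logisticregression__C": [0.1, 1.0, 10.0]}

search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

best_c = search.best_params_["logisticregression__C"]
```

Because the imputer and scaler sit inside the pipeline, they are refit on each training fold during cross-validation, which avoids leaking statistics from the validation fold.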
3. references/data-science-coding-workflow.md
Researched from DS project conventions and reproducibility best practices. Covers:
Source validation: Reference established case studies and DS project templates (Cookiecutter Data Science, MLflow examples, DVC documentation).
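The topic list for this reference is not reproduced in the issue, but the description mentions setting up an experiment directory. As a rough sketch of what that could look like using only the standard library: the function name `init_experiment`, the subfolder layout, and the `config.json` filename are all hypothetical and are not taken from Cookiecutter Data Science, MLflow, or DVC.

```python
import json
import random
import tempfile
from pathlib import Path


def init_experiment(root: Path, name: str, config: dict, seed: int) -> Path:
    """Create a run directory and record the config and seed used."""
    run_dir = root / name
    # Hypothetical layout: one subfolder each for checkpoints and figures.
    for sub in ("checkpoints", "figures"):
        (run_dir / sub).mkdir(parents=True, exist_ok=True)

    # Seeds only the stdlib RNG; NumPy and PyTorch need their own seeding calls.
    random.seed(seed)

    # Persist the exact settings next to the outputs they produced.
    (run_dir / "config.json").write_text(
        json.dumps({"seed": seed, **config}, indent=2, sort_keys=True)
    )
    return run_dir


with tempfile.TemporaryDirectory() as tmp:
    run_dir = init_experiment(
        Path(tmp), "baseline", {"lr": 0.01, "epochs": 5}, seed=42
    )
    saved = json.loads((run_dir / "config.json").read_text())
    created = sorted(p.name for p in run_dir.iterdir())
```

Writing the seed and hyperparameters into the run directory means a result can later be traced back to the settings that produced it without relying on memory or shell history.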
Files Modified
- data-scientist/SKILL.md: add all three references to the Available Resources section; update compatibility to explicitly call out PyTorch/sklearn integration
- data-scientist/README.md: add reference summaries if it's being kept in sync
Out of Scope
Researched, Not Generated
Each reference must be validated against current documentation (pytorch.org, scikit-learn.org, established blog posts from authoritative sources). The goal is to provide the agent with information that is more current than its training cutoff and more specific to the DS campaign workflow than general knowledge.
Triage — Jasper (automated)
Label: skill-upgrade (adds researched reference documents to the existing data-scientist skill).
Assessment: Well-scoped issue. The three proposed references address a genuine gap: the skill currently has strong methodology guidance but no code-level references for PyTorch training loops, sklearn pipelines, or DS project workflows.
Notes:
Suggested approach:
Size: Medium (~3 reference docs + minor SKILL.md updates)
Triage
Label: skill-upgrade (well-scoped enhancement to the data-scientist skill adding three researched reference documents for code-level expertise).
Assessment: Well-structured issue with clear deliverables, source validation requirements, and explicit out-of-scope boundaries. The "researched, not generated" constraint is correctly emphasized.
Recommendation: Ready for implementation. Each reference should be validated against live documentation (pytorch.org, scikit-learn.org, DS project templates) rather than generated from parametric knowledge.
— Jasper (automated)
Delivered via PR #26:
66/66 validation tests passing. Every API name cross-checked against current docs.
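The PR's validation suite is not shown in this thread. As one rough idea of how a name-level cross-check might be partly automated, the hypothetical helper `resolve_api_name` below tests whether a dotted name resolves to a module or attribute in the installed environment. It is demonstrated on standard-library names so that it runs anywhere. Such a check only confirms that a name exists in the installed version; it does not confirm that the usage matches the current documentation.

```python
import importlib


def resolve_api_name(dotted: str) -> bool:
    """Return True if `dotted` resolves to an importable module or attribute."""
    parts = dotted.split(".")
    # Find the longest importable module prefix, then walk the remaining
    # components as attributes.
    for i in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:i]))
        except ImportError:
            continue
        try:
            for attr in parts[i:]:
                obj = getattr(obj, attr)
        except AttributeError:
            return False
        return True
    return False


names = ["json.dumps", "os.path.join", "json.does_not_exist"]
results = {name: resolve_api_name(name) for name in names}
```

The same helper could be pointed at names such as `sklearn.pipeline.make_pipeline` in an environment where the library is installed.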
Closing.