Awesome Claude Code & Data Science AI/ML Skills Suite
One-stop technical guide for automated data profiling and EDA, ML pipeline scaffolds, SHAP-based feature engineering, model evaluation dashboards, A/B test design, and time-series anomaly detection — with links to code and templates.
What this collection delivers (quick overview)
This repo bundles production-ready templates and Claude-assisted code snippets to accelerate common data science tasks: automated exploratory data analysis (EDA), repeatable ML pipeline scaffolding, model evaluation dashboards, SHAP-driven feature engineering, statistical A/B test design, and time-series anomaly detection. Each module is designed to be modular, testable, and easy to integrate into CI/CD workflows.
Expect pragmatic examples (Pandas, scikit-learn, PyTorch/LightGBM hooks, Streamlit dashboards), unit-testable functions, and opinionated defaults for metrics, data validation, and logging. The goal is repeatability: what you run locally should translate to staging and production with minimal friction.
Explore the code and ready-made templates on GitHub: awesome Claude code and skills. Use the examples as scaffolds for your team’s ML lifecycle and for rapid prototyping of robust workflows.
Automated data profiling and EDA: make data quality first-class
Automated data profiling reduces the friction of getting from raw tables to actionable insights. A robust EDA module performs column-type inference, missing-value summarization, cardinality checks, and distribution plots automatically. When integrated into your CI, automated EDA flags upstream data regressions before they reach model training.
In practice, create a pipeline step that runs lightweight profiling on ingest: calculate summary statistics (mean, median, std), quantiles, null ratios, unique counts, and simple correlations. Store the results as JSON or Parquet to enable historical data-drift analysis and visual dashboards. Use fast, vectorized operations to keep runtime low on large datasets.
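The profiling step described above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual script: the function name `profile_frame`, the chosen quantiles, and the toy DataFrame are assumptions made for the example.

```python
import json

import pandas as pd


def profile_frame(df: pd.DataFrame) -> dict:
    """Return per-column summary statistics as a JSON-serializable dict."""
    profile = {}
    for name in df.columns:
        col = df[name]
        stats = {
            "dtype": str(col.dtype),
            "null_ratio": float(col.isna().mean()),
            "unique_count": int(col.nunique(dropna=True)),
        }
        # Numeric columns also get location, spread, and quantiles.
        is_numeric = pd.api.types.is_numeric_dtype(col)
        if is_numeric and not pd.api.types.is_bool_dtype(col):
            stats["mean"] = float(col.mean())
            stats["median"] = float(col.median())
            stats["std"] = float(col.std())
            stats["quantiles"] = {
                str(q): float(col.quantile(q)) for q in (0.05, 0.25, 0.75, 0.95)
            }
        profile[name] = stats
    return profile


# Toy data (hypothetical); in a pipeline this would be the ingested table.
example = pd.DataFrame({"age": [31, 45, None, 52], "plan": ["a", "b", "b", "a"]})
report = profile_frame(example)
print(json.dumps(report, indent=2))
```

Writing one such report per ingest, keyed by date or batch ID, is what makes later drift comparisons possible.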
Combine automated profiling with rule-based and model-based validators. Rules detect schema breaks and extreme outliers; model-based validators (e.g., simple isolation forests or distance-based detectors) find structural changes. The repo includes a template EDA script and a Streamlit page that converts profiling outputs into an interactive report — see the automated data profiling EDA examples for copy-paste-ready patterns.
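As a rough sketch of combining the two validator types, the example below pairs a rule-based schema check with scikit-learn's IsolationForest. The schema, column names, and synthetic data are hypothetical, and the repo's own templates may be organized differently.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical expected schema for an incoming batch.
EXPECTED_SCHEMA = {"amount": "float64", "items": "int64"}


def check_schema(df: pd.DataFrame, expected: dict) -> list:
    """Rule-based validator: report missing columns and dtype mismatches."""
    errors = []
    for column, dtype in expected.items():
        if column not in df.columns:
            errors.append(f"missing column: {column}")
        elif str(df[column].dtype) != dtype:
            errors.append(f"{column}: expected {dtype}, got {df[column].dtype}")
    return errors


def flag_outliers(baseline: pd.DataFrame, batch: pd.DataFrame, seed: int = 0):
    """Model-based validator: fit on baseline rows, flag unusual batch rows."""
    columns = list(EXPECTED_SCHEMA)
    model = IsolationForest(n_estimators=100, random_state=seed)
    model.fit(baseline[columns])
    return model.predict(batch[columns]) == -1  # True marks a suspected outlier


# Synthetic baseline and a batch whose last row is deliberately extreme.
rng = np.random.default_rng(0)
baseline = pd.DataFrame(
    {"amount": rng.normal(50.0, 5.0, 500), "items": rng.integers(1, 6, 500)}
)
batch = pd.DataFrame({"amount": [48.0, 52.5, 900.0], "items": [2, 3, 40]})

schema_errors = check_schema(batch, EXPECTED_SCHEMA)
flags = flag_outliers(baseline, batch)
print(schema_errors, flags)
```

Running the cheap schema check first keeps the model-based detector from being fed data it cannot interpret.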
Machine learning pipeline scaffold and model evaluation dashboard
Scaffolding a repeatable ML pipeline means separating concerns: ingestion, validation, preprocessing/feature engineering, modeling, evaluation, and deployment. Use a lightweight orchestration layer (Makefiles, Airflow, or Prefect) and containerized tasks for reproducibility. Keep transformations versioned and deterministic by expressing them as scikit-learn Pipeline and ColumnTransformer objects (or equivalent declarative transformers) rather than ad hoc scripts.
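One way to keep preprocessing and modeling in a single versionable object is sketched below with scikit-learn. The column names, the choice of LogisticRegression, and the toy data are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ["age", "spend"]  # hypothetical column names
CATEGORICAL = ["plan"]


def build_pipeline() -> Pipeline:
    """Bundle preprocessing and the model so they are fit and shipped together."""
    numeric = Pipeline(
        [("impute", SimpleImputer(strategy="median")), ("scale", StandardScaler())]
    )
    categorical = Pipeline(
        [("encode", OneHotEncoder(handle_unknown="ignore"))]
    )
    preprocess = ColumnTransformer(
        [("num", numeric, NUMERIC), ("cat", categorical, CATEGORICAL)]
    )
    return Pipeline(
        [("preprocess", preprocess), ("model", LogisticRegression(max_iter=1000))]
    )


# Toy training data with one missing numeric value.
train = pd.DataFrame(
    {
        "age": [25, 32, np.nan, 51, 46, 29, 60, 38],
        "spend": [10.0, 20.5, 15.0, 80.0, 75.5, 12.0, 90.0, 30.0],
        "plan": ["free", "free", "free", "pro", "pro", "free", "pro", "free"],
    }
)
labels = [0, 0, 0, 1, 1, 0, 1, 0]
pipeline = build_pipeline().fit(train, labels)

# A category unseen during training is encoded as all zeros instead of failing.
new_rows = pd.DataFrame({"age": [40], "spend": [55.0], "plan": ["enterprise"]})
probabilities = pipeline.predict_proba(new_rows)
print(probabilities)
```

Because the fitted pipeline carries its own imputation and encoding state, serializing it once avoids training/serving skew between environments.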
Model evaluation should be automated and multi-dimensional: track cross-validation metrics, holdout performance, confusion matrices, calibration curves, per-group fairness checks, and latency/resource usage. A model evaluation dashboard—built with Streamlit, Dash, or Grafana—lets data scientists and stakeholders quickly verify production-readiness and inspect failure modes.
A minimal evaluation checklist includes (1) cross-validated ROC AUC or log loss, (2) per-class precision/recall, (3) calibration error, (4) feature importance and explanation artifacts, and (5) a small set of production smoke tests. You can find a complete dashboard template and prebuilt eval scripts in the repo's dashboard folder: model evaluation dashboard.
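The first three checklist items could be computed along these lines. This is a sketch on synthetic data: the `evaluate` helper is hypothetical, and the Brier score is used here as a simple stand-in for a calibration error metric.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, precision_recall_fscore_support
from sklearn.model_selection import cross_val_score, train_test_split


def evaluate(model, X, y, seed: int = 0) -> dict:
    """Compute cross-validated AUC, per-class precision/recall, and Brier score."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # (1) Cross-validated ROC AUC on the training split only.
    cv_auc = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc")
    model.fit(X_train, y_train)
    # (2) Per-class precision and recall on the holdout split.
    precision, recall, _, _ = precision_recall_fscore_support(
        y_test, model.predict(X_test), zero_division=0
    )
    # (3) Brier score of the positive-class probability (lower is better).
    brier = brier_score_loss(y_test, model.predict_proba(X_test)[:, 1])
    return {
        "cv_roc_auc_mean": float(cv_auc.mean()),
        "precision_per_class": [float(p) for p in precision],
        "recall_per_class": [float(r) for r in recall],
        "brier_score": float(brier),
    }


# Synthetic binary classification problem for demonstration.
X, y = make_classification(n_samples=600, n_features=8, n_informative=4, random_state=0)
summary = evaluate(LogisticRegression(max_iter=1000), X, y)
print(summary)
```

Returning a plain dict keeps the result easy to log, store next to the model artifact, or render in a dashboard.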
Feature engineering with SHAP and explainability
SHAP (SHapley Additive exPlanations) should be part of the feature-engineering toolbox, not just a post-hoc explanation. Use SHAP value distributions to detect weak or noisy features and to drive automated selection: rank features by mean absolute SHAP and consider interactions where SHAP interaction values are large.
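Ranking by mean absolute SHAP reduces to a column-wise aggregation once the attribution matrix exists. In the sketch below the matrix is hand-written so the example has no dependency on the shap package; in practice it would come from a SHAP explainer, and the feature names are hypothetical.

```python
import numpy as np


def rank_features_by_shap(shap_values, feature_names):
    """Rank features by mean absolute SHAP value, largest first.

    shap_values: 2D array-like of shape (n_samples, n_features).
    """
    values = np.asarray(shap_values, dtype=float)
    if values.ndim != 2 or values.shape[1] != len(feature_names):
        raise ValueError("shap_values must be (n_samples, n_features)")
    importance = np.abs(values).mean(axis=0)
    order = np.argsort(importance)[::-1]
    return [(feature_names[i], float(importance[i])) for i in order]


# Hand-written attribution matrix standing in for real explainer output.
toy_shap = [
    [0.50, -0.10, 0.00],
    [-0.30, 0.20, 0.01],
    [0.40, -0.05, 0.00],
]
ranking = rank_features_by_shap(toy_shap, ["tenure", "spend", "noise"])
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Features at the bottom of the ranking are candidates for removal, though it is worth confirming with a retrain that dropping them does not hurt validation metrics.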
When engineering new features, iterate with unit tests: create synthetic cases where a desired behavior is verifiable (e.g., monotonic relationship) and assert that SHAP attributions reflect the signal. This practice prevents regressions in pipelines that do automated feature synthesis or transformations driven by heuristics.
Integrate SHAP artifacts into your evaluation dashboard so product owners can inspect why a model makes certain predictions. For models where latency matters, precompute SHAP summaries (global explanations and typical local explanations) and store them with prediction logs to keep online inference fast while preserving interpretability.
Statistical A/B test design and model comparison
Designing an A/B test for model changes requires more than comparing overall accuracy. Define primary and secondary KPIs, calculate required sample size using expected effect size, baseline conversion, desired power (commonly 80–90%), and significance level (often 0.05). Account for multiple comparisons and pre-specify stopping rules to avoid p-hacking.
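For a conversion-rate KPI, the sample-size calculation can be sketched with the standard two-proportion z-test approximation. The function name and the example rates are assumptions; other KPIs (means, ratios) need a different formula.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline_rate, expected_rate, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-proportion z-test."""
    if baseline_rate == expected_rate:
        raise ValueError("expected_rate must differ from baseline_rate")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    pooled = (baseline_rate + expected_rate) / 2
    # Variance under the null (pooled) and under the alternative (unpooled).
    null_term = z_alpha * math.sqrt(2 * pooled * (1 - pooled))
    alt_term = z_power * math.sqrt(
        baseline_rate * (1 - baseline_rate) + expected_rate * (1 - expected_rate)
    )
    effect = expected_rate - baseline_rate
    return math.ceil((null_term + alt_term) ** 2 / effect**2)


# Hypothetical scenario: 10% baseline conversion, hoping to detect 12%.
n_single = sample_size_per_group(0.10, 0.12)
# Bonferroni correction when three KPIs are tested at once.
n_corrected = sample_size_per_group(0.10, 0.12, alpha=0.05 / 3)
print(n_single, n_corrected)
```

The corrected figure is larger because a stricter per-test significance level needs more data to reach the same power.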
Use sequential testing or Bayesian approaches when rollouts are continuous. Fixed-horizon frequentist tests inflate the false-positive rate if results are checked repeatedly before the planned sample size is reached, so interim looks need conservative corrections. Bayesian lift estimation can provide interval estimates and posterior probabilities that are more intuitive for product decisions. Log metrics at the user level and include covariates in adjustment models when randomization is imperfect.
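A simple version of Bayesian lift estimation for conversion counts uses Beta posteriors and Monte Carlo draws. The sketch below assumes a uniform Beta(1, 1) prior and made-up counts; both are illustrative choices rather than recommendations.

```python
import numpy as np


def bayesian_lift(conv_a, n_a, conv_b, n_b, draws=100_000, seed=0):
    """Estimate P(B > A) and a 95% credible interval for relative lift."""
    rng = np.random.default_rng(seed)
    # Beta(1, 1) prior updated with observed successes and failures.
    post_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, draws)
    post_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, draws)
    lift = post_b / post_a - 1.0
    low, high = np.percentile(lift, [2.5, 97.5])
    return {
        "prob_b_beats_a": float((post_b > post_a).mean()),
        "lift_interval": (float(low), float(high)),
    }


# Hypothetical counts: control converts 400/4000, variant converts 480/4000.
result = bayesian_lift(400, 4000, 480, 4000)
print(result)
```

The posterior probability answers the question product owners usually ask ("how likely is the variant better?") directly, without reference to a fixed horizon.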
For comparing models, prefer paired methods where possible: bootstrap paired differences or use stratified evaluation across segments. A/B test design should feed directly into production monitors: treat the experiment as a deployment candidate and continue to collect post-launch telemetry for at least one full business cycle.
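A paired bootstrap over per-example errors might look like the following sketch. It assumes both models were scored on the same evaluation rows; the error values here are synthetic and the function name is hypothetical.

```python
import numpy as np


def paired_bootstrap_diff(errors_a, errors_b, n_boot=5000, seed=0):
    """Bootstrap the mean paired difference (A minus B) with a 95% interval."""
    a = np.asarray(errors_a, dtype=float)
    b = np.asarray(errors_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both models must be scored on the same examples")
    diffs = a - b
    rng = np.random.default_rng(seed)
    # Resample example indices so each pair stays together.
    idx = rng.integers(0, len(diffs), size=(n_boot, len(diffs)))
    boot_means = diffs[idx].mean(axis=1)
    low, high = np.percentile(boot_means, [2.5, 97.5])
    return float(diffs.mean()), (float(low), float(high))


# Synthetic per-example errors: model B is slightly better on average.
rng = np.random.default_rng(1)
loss_a = rng.gamma(2.0, 0.25, 500)
loss_b = loss_a - 0.05 + rng.normal(0.0, 0.1, 500)
mean_diff, interval = paired_bootstrap_diff(loss_a, loss_b)
print(mean_diff, interval)
```

An interval that excludes zero suggests the difference is not just evaluation noise; pairing removes the variance that both models share on hard examples.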
Time-series anomaly detection and drift monitoring
Time-series anomaly detection needs contextual awareness: separate trend, seasonality, and residuals before flagging anomalies. Use decomposition (STL), rolling statistics, or model-based residual analysis (ARIMA/Prophet/LSTM) to produce clean residuals on which anomaly detectors operate. Thresholding should be adaptive to account for seasonality and business cycles.
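The idea of scoring residuals rather than raw values can be illustrated with rolling statistics. The sketch below uses a rolling median for trend and per-phase medians for seasonality as a simple stand-in for STL; the period, threshold, and synthetic series are assumptions.

```python
import numpy as np
import pandas as pd


def detect_anomalies(series: pd.Series, period: int = 24, threshold: float = 5.0):
    """Flag points whose residual robust z-score exceeds the threshold."""
    # Trend: centered rolling median over one full period (edges stay NaN).
    trend = series.rolling(period, center=True, min_periods=period).median()
    detrended = series - trend
    # Seasonality: median of the detrended values at each position in the cycle.
    phase = np.arange(len(series)) % period
    seasonal = detrended.groupby(phase).transform("median")
    residual = detrended - seasonal
    # Robust z-score based on the median absolute deviation (MAD).
    center = residual.median()
    mad = (residual - center).abs().median()
    if not mad > 0:
        return pd.Series(False, index=series.index)
    robust_z = (residual - center) / (1.4826 * mad)
    return robust_z.abs() > threshold  # NaN edges compare as False


# Synthetic hourly series: trend + daily cycle + noise, with one injected spike.
rng = np.random.default_rng(0)
t = np.arange(24 * 14)
values = 100 + 0.05 * t + 10 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 1, t.size)
values[200] += 15
flags = detect_anomalies(pd.Series(values))
print(flags[flags].index.tolist())
```

Note that the first and last half-period are not scored because the centered window is incomplete there; a production detector would need a one-sided baseline for the most recent points.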
Combine statistical detectors (z-score on residuals), machine-learning detectors (autoencoders, isolation forest), and rules (sudden drop in volume). Create an ensemble of detectors and a scoring function that weights signals by recent reliability to reduce false positives. Alerting should include context: recent trends, related features, and likely root-cause candidates.
Drift monitoring is complementary: monitor feature distributions and model input statistics for population shifts. Keep a windowed baseline and, on a continuous schedule, compute a distance between it and the current window, such as Mahalanobis distance, KL divergence, or Earth Mover’s Distance. Tie drift alerts to retraining policies and retrain only when validation metrics indicate meaningful performance degradation.
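For a single numeric feature, two of the drift measures named above can be computed as sketched here. The KL divergence is estimated on shared histogram bins in the direction KL(current || baseline); the bin count, smoothing constant, and synthetic samples are assumptions.

```python
import numpy as np
from scipy.stats import entropy, wasserstein_distance


def drift_report(baseline, current, bins: int = 20) -> dict:
    """Compare a current window against a baseline window for one feature."""
    baseline = np.asarray(baseline, dtype=float)
    current = np.asarray(current, dtype=float)
    # Shared bin edges so both histograms are directly comparable.
    edges = np.histogram_bin_edges(np.concatenate([baseline, current]), bins=bins)
    p_counts, _ = np.histogram(baseline, bins=edges)
    q_counts, _ = np.histogram(current, bins=edges)
    eps = 1e-9  # smoothing to avoid division by zero in empty bins
    p = (p_counts + eps) / (p_counts + eps).sum()
    q = (q_counts + eps) / (q_counts + eps).sum()
    return {
        "kl_divergence": float(entropy(q, p)),  # KL(current || baseline)
        "earth_movers_distance": float(wasserstein_distance(baseline, current)),
    }


# Synthetic windows: one drawn from the baseline distribution, one shifted.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)
stable = drift_report(reference, rng.normal(0.0, 1.0, 5000))
shifted = drift_report(reference, rng.normal(1.0, 1.0, 5000))
print(stable, shifted)
```

Alert thresholds for these scores are best calibrated empirically, for example from the spread of scores between historical windows that are known to be healthy.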
Production tips, observability, and commonsense engineering
Operationalize with observability in mind: log inputs, predictions, SHAP summaries, and evaluation metrics. Use lightweight telemetry and structured logs for reproducible debugging. Prefer idempotent tasks, small stateless services for inference, and a single source of truth for model artifacts and metadata (model registry or artifact storage).
Automate data and model validation gates into CI: fail the pipeline if new data violates schema or if model performance drops below a defined threshold on a validation slice. Add canary releases with gradual traffic ramps and rollback triggers based on automated health checks.
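A performance gate of this kind can be as small as a function that compares metrics to thresholds and returns a non-zero exit code. The metric names, threshold values, and candidate scores below are hypothetical; in CI, the wrapper script would pass the returned code to `sys.exit`.

```python
import sys

# Hypothetical minimum acceptable values on the validation slice.
THRESHOLDS = {"roc_auc": 0.80, "recall": 0.60}


def gate_failures(metrics: dict, thresholds: dict) -> list:
    """Return a human-readable message for every violated threshold."""
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: metric missing")
        elif value < minimum:
            failures.append(f"{name}: {value:.3f} < {minimum:.3f}")
    return failures


def run_gate(metrics: dict) -> int:
    """Print failures to stderr and return a process exit code (0 means pass)."""
    failures = gate_failures(metrics, THRESHOLDS)
    for message in failures:
        print(f"GATE FAILED {message}", file=sys.stderr)
    return 1 if failures else 0


# Example candidate model metrics; recall is below its threshold.
exit_code = run_gate({"roc_auc": 0.84, "recall": 0.55})
print("exit code:", exit_code)
```

Treating a missing metric as a failure is deliberate: a gate that silently passes when evaluation did not run defeats its purpose.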
Keep a playbook for incidents: how to revert a model, how to isolate data issues, and how to warm a fallback model. The repo includes templates for runbooks and remediation scripts to reduce MTTR — scan the "ops" folder for example automation and dashboard configs.
- Run profiling: python scripts/profile.py --input data.csv --output profiles.json
- Build pipeline: make build && make test
- Launch dashboard: streamlit run dashboard/app.py
Resources and links
Clone the repository to get templates, tests, and dashboard examples: Data Science AI ML skills suite. The repo organizes modules into eda, pipeline, explainability, experiments, and ops directories to make adoption straightforward.
If you want a ready-to-run demo, check the dashboard folder and the Streamlit entry points. The code uses standard dependencies (Pandas, scikit-learn, shap, lightgbm, streamlit) so it’s easy to reproduce in a virtualenv or Docker container.
Refer to the included examples to see how SHAP artifacts are persisted alongside model artifacts and how evaluation reports are generated automatically after training runs — those examples are linked directly in the repo: feature engineering with SHAP.
FAQ (top three user questions)
How do I set up an automated EDA pipeline for my dataset?
Start with a lightweight profiler that extracts types, null counts, unique counts, quantiles, and basic correlations. Persist outputs (JSON/Parquet) and integrate the step into your ingestion pipeline. Add rule-based checks (schema/thresholds) and a simple model-based validator to catch structural changes. Automate alerts and store a history for drift analysis.
How can SHAP be used for feature engineering and selection?
Use aggregated SHAP values (mean absolute SHAP) to rank features by contribution. Inspect SHAP interactions to discover candidate pairwise features. Validate engineered features with unit tests and monitor their SHAP contributions over time; remove features whose importance degrades or that introduce instability.
What's the recommended approach for designing A/B tests and evaluating model performance?
Predefine primary KPIs and compute required sample size for desired power and significance. Use paired comparisons when possible, adjust for multiple tests, and pre-specify stopping rules. For evaluation, automate cross-validation, per-segment metrics, calibration checks, and include fairness and resource usage as part of your acceptance criteria.