Build & Defend a Real ML Classifier: Logistic Regression and Decision Trees on Messy Financial Data
“Build your first real ML model. Evaluate it. Defend it.”
Take a raw, imbalanced loan CSV from pandas to a peer-reviewed model card, using scikit-learn 1.4, stratified cross-validation, and a precision-recall tradeoff your manager can act on.
One-time · Lifetime access · Certificate included
- 6 modules of content
- 52 concept slides
- 18 practical exercises
- 24 quiz questions
- Capstone project
- LearnAspire certificate
Learning Outcomes
What you'll learn
The day after you finish
The day after completing this course, you will be able to open a raw financial CSV at work and build a scikit-learn Pipeline with imputation, encoding, and either a LogisticRegression or a DecisionTreeClassifier. You will evaluate it with stratified k-fold cross-validation, produce a full classification_report() and a learning curve plot, and hand your manager a written model card that states which model to deploy and why, all without asking anyone for help.
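As a rough illustration of that workflow, the sketch below builds a Pipeline with imputation, encoding, and LogisticRegression, then scores it with stratified 5-fold cross-validation. It is a minimal sketch, not the course's own code: the data is synthetic, and the column names (`income`, `loan_amount`, `employment_type`), the missing-value rates, and the choice of F1 as the scoring metric are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for a loan CSV; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "income": rng.uniform(20_000, 90_000, n),
    "loan_amount": rng.uniform(2_000, 30_000, n),
    "employment_type": rng.choice(["salaried", "self_employed", "contract"], n),
})

# Label the riskiest 18% (by noisy loan-to-income ratio) as defaults.
score = df["loan_amount"] / df["income"] + rng.normal(0, 0.1, n)
y = (score > np.quantile(score, 0.82)).astype(int)

# Make the data "messy": knock out some values after the labels are built.
df.loc[rng.random(n) < 0.10, "income"] = np.nan
df.loc[rng.random(n) < 0.05, "employment_type"] = np.nan

numeric = ["income", "loan_amount"]
categorical = ["employment_type"]

preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Stratified folds keep the default rate roughly constant in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, df, y, cv=cv, scoring="f1")
print("F1 per fold:", np.round(scores, 3))
```

Because the imputers and encoder live inside the Pipeline, they are refit on each training fold, so no statistics from a validation fold leak into preprocessing. Swapping in `DecisionTreeClassifier` only means changing the `clf` step.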
Who this is for
- Primary: Backend Python developer or data analyst with 2–3 years of experience who writes clean application code and SQL daily but has never shipped a trained model to production
- Secondary: Junior ML engineer in their first 12 months who completed an introductory ML course but can't yet make defensible model selection decisions on messy real-world data
- Tertiary: Data engineering or analytics lead who needs to evaluate model-building work done by their team and understand what 'good' ML evaluation evidence looks like
Prerequisites
- Fluent in Python 3: functions, list comprehensions, imports, debugging stack traces, and reading third-party library documentation without hand-holding
- Comfortable with pandas DataFrames and SQL: can filter, group, join, and inspect datasets; understands mean, variance, and correlation at the level of writing a data quality report
- No prior ML experience required, but you must know what a for loop is without being told
Curriculum
6 modules · full breakdown
Part of: Data & ML Engineering Path
Capstone Project
The CapitalRoute Model Card: A Peer-Reviewed, Production-Ready Classification Report
Maya submits her final deliverable to the CapitalRoute risk committee. Using the complete CapitalRoute loan dataset (4,200 rows, 18% default rate), learners produce a single Jupyter notebook containing: data loading and leakage audit, a full sklearn Pipeline for both a tuned LogisticRegression (C optimised via GridSearchCV) and a tuned DecisionTreeClassifier (max_depth and min_samples_leaf optimised via GridSearchCV with StratifiedKFold(5)), stratified 5-fold CV results for both models, a precision-recall curve comparison plot, a learning curve plot for the final chosen model, and a structured 5-section model card written in Markdown inside the notebook. The peer-review step uses a graded rubric: reviewers check for leakage-free features, correct stratified splitting, proper CV methodology, honest metric reporting, and a business-grounded recommendation.
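The tuning step described above might look roughly like the following. This is a sketch under stated assumptions: the CapitalRoute dataset is not reproduced here, so `make_classification` generates a synthetic stand-in with 4,200 rows and roughly 18% positives, and the grid values, the F1 scoring metric, and the shuffled folds are illustrative choices rather than the course's settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan data: 4,200 rows, roughly 18% positives.
X, y = make_classification(
    n_samples=4200, n_features=10, n_informative=5,
    weights=[0.82, 0.18], random_state=0,
)

# Hold out a stratified test set that the search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Each candidate is a (pipeline, parameter grid) pair; grid values are hypothetical.
candidates = {
    "logreg": (
        Pipeline([
            ("scale", StandardScaler()),
            ("clf", LogisticRegression(max_iter=1000)),
        ]),
        {"clf__C": [0.01, 0.1, 1.0, 10.0]},
    ),
    "tree": (
        Pipeline([("clf", DecisionTreeClassifier(random_state=0))]),
        {"clf__max_depth": [3, 5, 8], "clf__min_samples_leaf": [5, 20, 50]},
    ),
}

results = {}
for name, (pipe, grid) in candidates.items():
    search = GridSearchCV(pipe, grid, cv=cv, scoring="f1")
    search.fit(X_train, y_train)
    results[name] = search
    print(name, search.best_params_, round(search.best_score_, 3))
```

Using the same `cv` object for both searches means the two models are compared on identical folds, which makes the cross-validated scores easier to defend in a review.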
What you'll deliver
A Jupyter notebook (loan_default_model_card.ipynb) that runs cleanly from top to bottom on Python 3.11 + scikit-learn 1.4.2 + pandas 2.1.4, containing: the full ML Pipeline, GridSearchCV results for both models, a matplotlib precision-recall comparison plot, a learning curve plot, a scikit-learn classification_report() for the chosen model on the held-out test set, and a Markdown model card section stating the dataset, features, CV F1 scores, precision-recall tradeoff rationale, and a one-paragraph deployment recommendation with stated risk.
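One way to turn the precision-recall tradeoff into a reportable decision is sketched below: pick a probability threshold from out-of-fold predictions on the training data, then run `classification_report()` once on the held-out test set. The data is again a synthetic stand-in, and the 80% recall target and the class names `repaid` and `default` are hypothetical assumptions for the example, not requirements stated by the course.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, precision_recall_curve
from sklearn.model_selection import StratifiedKFold, cross_val_predict, train_test_split

# Synthetic stand-in: 4,200 rows, roughly 18% positives.
X, y = make_classification(
    n_samples=4200, n_features=10, n_informative=5,
    weights=[0.82, 0.18], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Out-of-fold probabilities: each training row is scored by a model that never saw it.
oof = cross_val_predict(clf, X_train, y_train, cv=cv, method="predict_proba")[:, 1]
precision, recall, thresholds = precision_recall_curve(y_train, oof)

# precision and recall have one more entry than thresholds; ignore that final point.
target_recall = 0.80  # hypothetical business requirement
meets_target = recall[:-1] >= target_recall

# Among thresholds that meet the recall target, take the one with the best precision.
best = int(np.argmax(np.where(meets_target, precision[:-1], -1.0)))
threshold = thresholds[best]
oof_recall = recall[best]

# Refit on all training data and report once on the untouched test set.
clf.fit(X_train, y_train)
y_pred = (clf.predict_proba(X_test)[:, 1] >= threshold).astype(int)
report = classification_report(y_test, y_pred, target_names=["repaid", "default"])
print(f"threshold={threshold:.3f}, out-of-fold recall={oof_recall:.3f}")
print(report)
```

Choosing the threshold on out-of-fold predictions rather than on the test set keeps the final report an honest estimate, which is the kind of evidence a model card reviewer is likely to look for.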