Python ML Engineering for Java Developers: Ship a Fraud Detection Pipeline with scikit-learn, pandas, and FastAPI
“Your Java Brain Is an Asset — Ship Your First ML Pipeline”
Build, evaluate, serialize, and serve a production-grade RandomForest fraud classifier on imbalanced real-world data — without ever Googling DataFrame syntax again.
One-time · Lifetime access · Certificate included
- ✓ 6 modules of content
- ✓ 48 concept slides
- ✓ 18 practical exercises
- ✓ 24 quiz questions
- ✓ Capstone project
- ✓ LearnAspire certificate
Learning Outcomes
What you'll learn
The day after you finish
The day after completing this course, Maya Chen (or you) can open a Jupyter notebook or a Python script and work the whole path from database to deployed model. Connect to a Postgres transaction database via SQLAlchemy and load 500K rows into pandas. Build a Pipeline with ColumnTransformer and SMOTE (SMOTE is a resampler, so it needs the Pipeline class from imbalanced-learn; scikit-learn's own Pipeline does not accept resampling steps). Tune it with RandomizedSearchCV and print a classification report on a stratified split showing per-class precision and recall. Serialize the fitted Pipeline to fraud_pipeline_v1.pkl with joblib, then run a FastAPI server that accepts a JSON transaction payload and returns a fraud probability. Finally, push the entire repo to GitHub before the morning standup.
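The train, report, and serialize steps described above can be sketched on a tiny synthetic table. This is a minimal illustration, not the course's actual code: the column names (`amount`, `account_age_days`, `channel`, `is_fraud`) are hypothetical, the data is random rather than loaded from Postgres, and `class_weight="balanced"` stands in for SMOTE so that the example needs only scikit-learn, pandas, and joblib.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the transaction table; all column names are hypothetical.
rng = np.random.default_rng(42)
n = 2_000
df = pd.DataFrame({
    "amount": rng.lognormal(mean=3.0, sigma=1.0, size=n),
    "account_age_days": rng.integers(1, 3_000, size=n).astype(float),
    "channel": rng.choice(["web", "mobile", "pos"], size=n),
})
# About 5% positives (not 0.3%) so this small sample contains both classes.
df["is_fraud"] = (rng.random(n) < 0.05).astype(int)
# Blank out roughly 4% of one numeric column to exercise the imputer.
df.loc[df.sample(frac=0.04, random_state=0).index, "amount"] = np.nan

numeric = ["amount", "account_age_days"]
categorical = ["channel"]

# Impute numeric columns, one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# class_weight="balanced" is a stand-in for SMOTE in this sketch.
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("clf", RandomForestClassifier(
        n_estimators=50, class_weight="balanced", random_state=0)),
])

X = df[numeric + categorical]
y = df["is_fraud"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

pipeline.fit(X_train, y_train)
print(classification_report(y_test, pipeline.predict(X_test), zero_division=0))

# Serialize preprocessing and classifier together, then reload and score one row.
model_path = Path(tempfile.mkdtemp()) / "fraud_pipeline_v1.pkl"
joblib.dump(pipeline, model_path)
reloaded = joblib.load(model_path)
fraud_probability = reloaded.predict_proba(X_test.iloc[[0]])[0, 1]
```

Because the labels here are random, the printed report carries no meaning; the point is the shape of the workflow. Serializing the whole Pipeline, not just the classifier, means the serving code never has to re-implement imputation or encoding.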
Who this is for
- Primary: Backend or full-stack Java developer with 3–6 years of Spring Boot and Kubernetes experience who has completed an ML MOOC but has never shipped production Python code
- Secondary: Data engineer with a JVM background who needs to own model training and evaluation pipelines rather than just data movement
- Tertiary: ML team lead or engineering manager evaluating whether a transitioning Java developer can own an end-to-end Python ML deliverable
Prerequisites
- 3+ years of Java in production — OOP, generics, collections, and exception handling are assumed fluent
- Familiarity with REST API design (Spring MVC or JAX-RS) — FastAPI endpoint structure is taught by contrast, not from scratch
- Relational database fluency (SQL joins, ResultSets, or JPA) — pandas merge patterns are mapped to these directly
- Completion of any ML MOOC covering gradient descent, regularization, and bias-variance tradeoff — hyperparameter concepts are not re-explained
- Git and GitHub basics — every module produces a committed artifact; branching strategy is not taught
- Python environment setup: Python 3.12 installed, pip or conda available — environment creation is covered in Module 1 but not debated
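The relational-database prerequisite says pandas merge patterns map directly onto SQL joins. A minimal sketch of that mapping, using two small hypothetical tables (`transactions` and `customers` are illustrative names, not the course dataset):

```python
import pandas as pd

# Hypothetical tables; one transaction refers to a customer that does not exist.
transactions = pd.DataFrame({
    "txn_id": [1, 2, 3],
    "customer_id": [10, 11, 99],
    "amount": [25.0, 310.5, 42.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 11],
    "country": ["DE", "US"],
})

# SQL equivalent:
#   SELECT t.*, c.country
#   FROM transactions t
#   LEFT JOIN customers c ON t.customer_id = c.customer_id
joined = transactions.merge(
    customers, on="customer_id", how="left", indicator=True)

# indicator=True adds a "_merge" column; "left_only" marks rows with no match,
# similar to filtering on "c.customer_id IS NULL" after a LEFT JOIN.
unmatched = joined[joined["_merge"] == "left_only"]
```

The `how` argument plays the role of the join type (`"inner"`, `"left"`, `"right"`, `"outer"`), and unmatched rows surface as `NaN` in the joined columns where SQL would return `NULL`.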
Curriculum
6 modules · full breakdown
📚 Part of: Python and AI for Java Developers · Course 2 of 3
Capstone Project
Kestrel Finance Fraud Detection API: End-to-End Pipeline from Postgres to FastAPI
Maya has two days before the board demo. She must produce a working fraud classifier that ingests Kestrel Finance's synthetic 500K-row Postgres transaction log (0.3% fraud prevalence, 6 numeric features, 3 categorical features, 4% missing values). The model is a tuned RandomForestClassifier inside a scikit-learn-compatible Pipeline with ColumnTransformer preprocessing. SMOTE oversampling is applied only to the training portion of each cross-validation fold, so no synthetic rows leak into the validation data. She evaluates the model with 5-fold stratified cross-validation and a held-out test set, then exposes it as a FastAPI endpoint. Everything is version-pinned, Dockerized, and pushed to a public GitHub repository with a CI badge.
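Resampling inside the folds is the step that most often goes wrong, so it is worth a sketch. The example below is an illustration under stated assumptions, not the capstone solution: it uses plain random oversampling via `sklearn.utils.resample` as a stand-in for SMOTE, and synthetic data from `make_classification` with about 3% positives so every fold contains fraud cases. With imbalanced-learn installed, placing `SMOTE` as a step in `imblearn.pipeline.Pipeline` gives the same fold-safe behavior without the manual loop.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import StratifiedKFold
from sklearn.utils import resample

# Synthetic imbalanced data: 6 numeric features, roughly 3% positive class.
X, y = make_classification(
    n_samples=2_000, n_features=6, weights=[0.97, 0.03], random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_recalls = []

for train_idx, val_idx in cv.split(X, y):
    X_train, y_train = X[train_idx], y[train_idx]

    # Oversample the minority class using TRAINING rows only.
    minority = np.flatnonzero(y_train == 1)
    majority = np.flatnonzero(y_train == 0)
    extra = resample(
        minority,
        replace=True,
        n_samples=len(majority) - len(minority),
        random_state=0,
    )
    balanced = np.concatenate([majority, minority, extra])

    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X_train[balanced], y_train[balanced])

    # The validation fold keeps its original class balance, so the score
    # reflects what the model would see on real traffic.
    fold_recalls.append(recall_score(y[val_idx], model.predict(X[val_idx])))

print("fraud recall per fold:", np.round(fold_recalls, 3))
```

The failure mode this avoids is oversampling the full dataset before splitting: copies (or, with SMOTE, near-neighbors) of the same fraud rows then land on both sides of the split and inflate the validation metrics.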
What you'll deliver
A GitHub repository containing:
- fraud_pipeline_v1.pkl: a joblib-serialized fitted Pipeline with preprocessing and classifier
- classification_report.txt: sklearn output showing per-class precision, recall, F1, and support on the held-out test set
- confusion_matrix.png: a seaborn heatmap of the test-set confusion matrix with fraud/legitimate labels
- A running FastAPI service (main.py + Dockerfile) accepting POST /predict with a typed Pydantic schema and returning fraud_probability and predicted_class
- feature_engineering_decisions.md: a one-page technical memo explaining the four feature engineering choices made, the RandomizedSearchCV parameter search space, and why each final hyperparameter value was selected over alternatives