Data Engineering with Python, dbt-core, and Airflow: Build a Production ELT Pipeline from PostgreSQL to Snowflake
"From SQL queries to scheduled pipelines, without a DevOps team."
Design, test, and schedule a daily ELT pipeline that extracts from PostgreSQL, transforms with dbt-core 1.8, orchestrates via Airflow 2.9, and loads validated data into Snowflake: version-controlled, failure-tolerant, and demonstrable to a hiring manager.
One-time · Lifetime access · Certificate included
- 6 modules of content
- 48 concept slides
- 18 practical exercises
- 24 quiz questions
- Capstone project
- LearnAspire certificate
Learning Outcomes
What you'll learn
The day after you finish
The day after completing this course, you will open your terminal, run `docker-compose up` to start your local Airflow 2.9 environment, and manually trigger the cartflow_daily_elt DAG for yesterday's date. You will watch it extract from PostgreSQL via SQLAlchemy, execute your dbt incremental models, pass all schema.yml tests, load the validated mart tables into Snowflake, and send a Slack message confirming success. Then you will push the final version to your public GitHub repo and paste the link into your resume or send it to your engineering team.
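The run order described above can be sketched in plain Python. This is only an illustration of the fail-fast sequencing, not Airflow code: the function `run_daily_elt`, the step names, and the `notify` callback (standing in for a Slack message) are all hypothetical, and the placeholder steps do no real work.

```python
from datetime import date, timedelta
from typing import Callable

# A step is a (name, callable) pair; the callable receives the run date.
Step = tuple[str, Callable[[date], None]]


def run_daily_elt(run_date: date, steps: list[Step], notify: Callable[[str], None]) -> bool:
    """Run each step in order; stop at the first failure and report the outcome."""
    for name, step in steps:
        try:
            step(run_date)
        except Exception as exc:  # any failure halts the downstream steps
            notify(f"cartflow_daily_elt FAILED at '{name}' for {run_date}: {exc}")
            return False
    notify(f"cartflow_daily_elt succeeded for {run_date}")
    return True


if __name__ == "__main__":
    yesterday = date.today() - timedelta(days=1)
    # Placeholder steps (assumption): each lambda stands in for real work.
    demo_steps: list[Step] = [
        ("extract_postgres", lambda d: None),
        ("dbt_run", lambda d: None),
        ("dbt_test", lambda d: None),
        ("load_snowflake", lambda d: None),
    ]
    run_daily_elt(yesterday, demo_steps, notify=print)
```

In a real orchestrator each step would be a task with its own retries and logs; the point here is only that a failed test step prevents the load step from running.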
Who this is for
- Primary: Data analyst with 2-4 years of SQL experience who has been handed ownership of a data pipeline with no data engineering support
- Secondary: Junior backend developer who writes SQL against production databases and needs to build scheduled data products without a dedicated data team
- Tertiary: Analytics engineer or BI developer who needs to understand how dbt models get triggered, tested, and monitored in a real orchestration layer
Prerequisites
- SQL proficiency at the level of multi-table JOINs, window functions, and aggregate queries; you write SQL daily and do not need syntax reminders
- Python fundamentals: functions, loops, file I/O, and working with pandas DataFrames; you have used pandas to clean or reshape data at least once
- Git basics: you can clone a repo, commit changes, and push to GitHub; no branching strategy required
- No prior Airflow, dbt, Snowflake, or orchestration experience required; the course starts from zero on all three tools
Curriculum
6 modules · full breakdown
Part of: Data & ML Engineering Path
Capstone Project
CartFlow Commerce Production ELT Pipeline β Public GitHub Repository
Build and document a complete, runnable ELT pipeline for CartFlow Commerce. The pipeline extracts incremental order and customer_segment records from PostgreSQL 16 using SQLAlchemy 2.0, stages them to Snowflake via COPY INTO, and applies three dbt-core 1.8 incremental models (stg_orders, stg_customers, fct_daily_order_summary) with full schema.yml test coverage. A single Airflow 2.9 DAG orchestrates the entire run and sends Slack failure alerts. The pipeline must also handle at least two real data quality failure scenarios, a late-arriving shipment record and a null customer_id batch, each with documented resolution logic. All code runs inside Docker Desktop using the official Airflow docker-compose.yaml. The repo includes a pinned requirements.txt, a dbt docs output committed as static HTML, a Great Expectations 0.18 data quality report for the fct_daily_order_summary table, and a README that explains the incremental strategy, schema design decisions, and how to run the full pipeline from a cold clone.
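As a rough illustration of the two failure scenarios, the sketch below shows one possible approach: re-reading a short lookback window so late-arriving rows are picked up, and quarantining rows with a null customer_id instead of loading them. It is not the course's resolution logic. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL and SQLAlchemy, and the `orders` table layout, the `updated_at` column, the two-day lookback, and all function names are assumptions made for the example.

```python
import sqlite3
from datetime import date, datetime, timedelta

COLUMNS = ("order_id", "customer_id", "updated_at")  # hypothetical schema


def extraction_window(run_date: date, lookback_days: int = 2) -> tuple[str, str]:
    """Return [start, end) timestamps covering run_date plus a lookback.

    Re-reading a few earlier days is one way to catch late-arriving rows.
    """
    start = datetime.combine(run_date - timedelta(days=lookback_days), datetime.min.time())
    end = datetime.combine(run_date + timedelta(days=1), datetime.min.time())
    return start.isoformat(sep=" "), end.isoformat(sep=" ")


def extract_orders(conn: sqlite3.Connection, run_date: date, lookback_days: int = 2) -> list[dict]:
    """Fetch only the rows whose updated_at falls inside the window."""
    start, end = extraction_window(run_date, lookback_days)
    cursor = conn.execute(
        "SELECT order_id, customer_id, updated_at FROM orders "
        "WHERE updated_at >= ? AND updated_at < ? ORDER BY order_id",
        (start, end),
    )
    return [dict(zip(COLUMNS, row)) for row in cursor.fetchall()]


def split_null_customers(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate loadable rows from rows that need manual review."""
    valid = [r for r in rows if r["customer_id"] is not None]
    quarantined = [r for r in rows if r["customer_id"] is None]
    return valid, quarantined


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, updated_at TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, 10, "2024-03-08 23:30:00"),   # late-arriving, inside the lookback
            (2, None, "2024-03-10 12:00:00"), # null customer_id
            (3, 12, "2024-03-10 18:00:00"),
        ],
    )
    valid, quarantined = split_null_customers(extract_orders(conn, date(2024, 3, 10)))
    print(len(valid), "valid,", len(quarantined), "quarantined")
```

Because the lookback re-reads rows that earlier runs already extracted, the downstream load has to be idempotent, which is the role the incremental dbt models would play in the real pipeline.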
What you'll deliver
A public GitHub repository with: (1) an Airflow DAG file (cartflow_daily_elt.py) that passes `airflow dags test` with no import errors, (2) a dbt project with at least three models and passing `dbt test` output committed as a log file, (3) a Great Expectations checkpoint report showing row count, null rate, and uniqueness assertions on fct_daily_order_summary, (4) a dbt docs index.html committed to /docs, and (5) a README of at least 600 words covering architecture decisions, failure handling, and how to reproduce the pipeline locally.
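Some of these deliverables can be checked mechanically before you share the link. The following is a minimal sketch, assuming a repo layout that the course description does not specify: the `dags/` folder and the `README.md` filename are guesses, and `check_deliverables` is a hypothetical helper. It only checks that files exist and counts README words; it does not run `airflow dags test` or `dbt test`.

```python
from pathlib import Path

# Paths relative to the repo root. The dags/ location and README.md
# filename are assumptions; adjust them to match your own layout.
REQUIRED_FILES = (
    "dags/cartflow_daily_elt.py",
    "docs/index.html",
    "requirements.txt",
    "README.md",
)
MIN_README_WORDS = 600


def check_deliverables(repo_root) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    root = Path(repo_root)
    problems = [f"missing: {rel}" for rel in REQUIRED_FILES if not (root / rel).is_file()]

    readme = root / "README.md"
    if readme.is_file():
        # Whitespace-separated tokens are a rough but adequate word count.
        words = len(readme.read_text(encoding="utf-8").split())
        if words < MIN_README_WORDS:
            problems.append(f"README.md has {words} words, needs at least {MIN_README_WORDS}")
    return problems


if __name__ == "__main__":
    for problem in check_deliverables("."):
        print(problem)
```

Run from the repo root, it prints one line per gap and nothing when every check passes.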
Portfolio value
Design and operationalize a production ELT pipeline (CartFlow Commerce) that extracts incrementally from PostgreSQL via SQLAlchemy, transforms with dbt-core incremental models and schema tests, orchestrates via Airflow with observability, and handles real data quality failures, proving readiness to own data infrastructure from ingestion through analytics.