AI & Machine Learning · Technical Course · LearnAspire Certified

Data Engineering with Python, dbt-core, and Airflow: Build a Production ELT Pipeline from PostgreSQL to Snowflake

"From SQL queries to scheduled pipelines, without a DevOps team."

Design, test, and schedule a daily ELT pipeline that extracts from PostgreSQL, transforms with dbt-core 1.8, orchestrates via Airflow 2.9, and loads validated data into Snowflake: version-controlled, failure-tolerant, and demonstrable to a hiring manager.

Intermediate · 12h · 6 modules · 48 slides · 18 exercises · 24 quiz Qs
Launch price: ₹2,999 (regular price ₹7,999), 63% off for a limited time.

One-time payment · Lifetime access · Certificate included

7-day money-back guarantee
  • ✓ 6 modules of content
  • ✓ 48 concept slides
  • ✓ 18 practical exercises
  • ✓ 24 quiz questions
  • ✓ Capstone project
  • ✓ LearnAspire certificate

Learning Outcomes

What you'll learn

→ You will be able to write a SQLAlchemy 2.0 extraction script that pulls incremental order and customer records from a PostgreSQL 16 source using server-side cursors, handles connection failures with retry logic, and writes Parquet staging files ready for Snowflake ingestion
→ You will be able to build a dbt-core 1.8 project with incremental models, ref() dependencies, and schema.yml tests (not_null, unique, accepted_values, relationships) that catch real data quality failures (null customer_ids, duplicate order keys, invalid status codes) before they reach the BI layer
→ You will be able to diagnose and fix the three most common production pipeline failures: a SQLAlchemy connection pool timeout, a dbt incremental merge conflict on a type-mismatched key, and an Airflow TaskInstance entering a zombie state, using actual terminal tracebacks and the Airflow 2.9 UI
→ You will be able to author an Airflow 2.9 DAG that sequences extraction, dbt run, dbt test, and Snowflake load tasks with task-level retry logic, an on_failure_callback that posts to a Slack webhook, and a catchup=False schedule that prevents backfill storms after downtime
→ You will be able to deliver a public GitHub repository containing a fully documented, version-controlled ELT pipeline (with a dbt docs site, a data quality report, and a README that explains every schema design decision) that you can link on your resume as evidence of production-pattern thinking
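
To make the "retry logic" in the first outcome concrete, here is a minimal sketch of exponential backoff using only the standard library. It is not the course's implementation: `with_retries` and `flaky_fetch` are hypothetical names, and the demo retries on the built-in `ConnectionError`. In a real SQLAlchemy script you would presumably pass `sqlalchemy.exc.OperationalError` in `retry_on` and wrap the code that opens the connection and fetches a batch.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            # Wait 1s, 2s, 4s, ... plus up to 10% jitter so retries do not align.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")


# Hypothetical stand-in for "open a connection and fetch a batch":
# it fails twice, then succeeds.
calls = {"n": 0}


def flaky_fetch() -> list:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated dropped connection")
    return [("order-1",), ("order-2",)]


# The demo swaps in a no-op sleep so it finishes instantly.
rows = with_retries(flaky_fetch, sleep=lambda seconds: None)
print(rows, "after", calls["n"], "attempts")
```

Retrying only on a named set of exception types is deliberate: a bad query or a permissions error should fail immediately rather than be retried.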

The day after you finish

The day after completing this course, you will open your terminal, run `docker-compose up` to start your local Airflow 2.9 environment, and trigger the cartflow_daily_elt DAG manually for yesterday's date. You will watch it extract from PostgreSQL via SQLAlchemy, execute your dbt incremental models, pass all schema.yml tests, load the validated mart tables into Snowflake, and send a Slack message confirming success. Then you will push the final version to your public GitHub repo and paste the link into your resume or send it to your engineering team.
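
The Slack message at the end of that run could come from a task-level callback along these lines. This is a sketch under assumptions, not the course's code: `build_slack_payload` and `post_to_slack` are hypothetical helpers, the context keys (`task_instance`, `ds`, `exception`) follow the dictionary Airflow passes to task callbacks, and the task id `load_snowflake` and the date are made up for the demo. The demo fakes the context with `SimpleNamespace` so it runs without Airflow and never calls the webhook.

```python
import json
import urllib.request
from types import SimpleNamespace
from typing import Any, Dict


def build_slack_payload(context: Dict[str, Any], status: str) -> Dict[str, str]:
    """Turn an Airflow-style callback context into a Slack webhook payload."""
    ti = context["task_instance"]
    lines = [
        f"{status.upper()}: {ti.dag_id}.{ti.task_id}",
        f"logical date: {context.get('ds', 'unknown')}",
    ]
    exc = context.get("exception")  # only present when a task has failed
    if exc is not None:
        lines.append(f"error: {exc!r}")
    return {"text": "\n".join(lines)}


def post_to_slack(webhook_url: str, payload: Dict[str, str]) -> None:
    """POST the payload as JSON; Slack incoming webhooks accept {"text": ...}."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()


# Fake context standing in for what Airflow would pass to the callback.
fake_context = {
    "task_instance": SimpleNamespace(
        dag_id="cartflow_daily_elt", task_id="load_snowflake"  # hypothetical task id
    ),
    "ds": "2024-06-01",
}
payload = build_slack_payload(fake_context, status="succeeded")
print(payload["text"])
```

Inside a DAG, a thin wrapper such as `lambda context: post_to_slack(url, build_slack_payload(context, "failed"))` could be passed as `on_failure_callback`. Keeping the payload builder separate from the HTTP call makes the message format testable offline.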

Who this is for

  • Primary: Data analyst with 2–4 years of SQL experience who has been handed ownership of a data pipeline with no data engineering support
  • Secondary: Junior backend developer who writes SQL against production databases and needs to build scheduled data products without a dedicated data team
  • Tertiary: Analytics engineer or BI developer who needs to understand how dbt models get triggered, tested, and monitored in a real orchestration layer

Prerequisites

  • SQL proficiency at the level of multi-table JOINs, window functions, and aggregate queries: you write SQL daily and do not need syntax reminders
  • Python fundamentals: functions, loops, file I/O, and working with pandas DataFrames. You have used pandas to clean or reshape data at least once
  • Git basics: you can clone a repo, commit changes, and push to GitHub. No branching strategy required
  • No prior Airflow, dbt, Snowflake, or orchestration experience required: the course starts from zero on all three tools

Curriculum

6 modules · full breakdown

Part of: Data & ML Engineering Path

Step 1: ML Basics → Step 2: Data Pipelines → Step 3: RAG & Search → Step 4: AI Systems
← Previous: Step 1 β€” ML BasicsNext in path: Step 3 β€” RAG & Search β†’
πŸ†

Capstone Project

CartFlow Commerce Production ELT Pipeline: Public GitHub Repository

Build and document a complete, runnable ELT pipeline for CartFlow Commerce that extracts incremental order and customer_segment records from PostgreSQL 16 using SQLAlchemy 2.0, stages them to Snowflake via COPY INTO, applies three dbt-core 1.8 incremental models (stg_orders, stg_customers, fct_daily_order_summary) with full schema.yml test coverage, and orchestrates the entire run as a single Airflow 2.9 DAG with Slack failure alerts. The pipeline handles at least two real data quality failure scenarios (a late-arriving shipment record and a null customer_id batch) with documented resolution logic. All code runs inside Docker Desktop using the official Airflow docker-compose.yaml. The repo includes a pinned requirements.txt, a dbt docs output committed as static HTML, a Great Expectations 0.18 data quality report for the fct_daily_order_summary table, and a README that explains the incremental strategy, schema design decisions, and how to run the full pipeline from a cold clone.
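
One common way to implement the incremental extraction and the null customer_id handling described above is a high-watermark query followed by a quarantine split. The sketch below illustrates that idea only. It uses the standard library's `sqlite3` as a stand-in for PostgreSQL so that it runs anywhere, and the `orders` columns, the `updated_at` watermark column, and the function names are all assumptions rather than the capstone's actual schema.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[int, object, str, str]


def extract_incremental(conn: sqlite3.Connection, last_watermark: str) -> Tuple[List[Row], str]:
    """Fetch rows changed since the last run and return the new watermark."""
    rows = conn.execute(
        "SELECT order_id, customer_id, status, updated_at "
        "FROM orders WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Advance the watermark only if something was extracted.
    new_watermark = rows[-1][3] if rows else last_watermark
    return rows, new_watermark


def split_null_customers(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Separate loadable rows from rows quarantined for a missing customer_id."""
    valid = [r for r in rows if r[1] is not None]
    quarantined = [r for r in rows if r[1] is None]
    return valid, quarantined


# Hypothetical sample data; ISO-8601 strings sort correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, status TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 10, "shipped", "2024-06-01T08:00:00"),
        (2, None, "placed", "2024-06-02T09:30:00"),
        (3, 11, "placed", "2024-06-02T10:15:00"),
    ],
)

rows, watermark = extract_incremental(conn, "2024-06-01T23:59:59")
valid, quarantined = split_null_customers(rows)
print(len(valid), "valid,", len(quarantined), "quarantined, watermark", watermark)
```

A strict `>` comparison can miss late-arriving records whose timestamp falls before the stored watermark. A frequently used mitigation is to re-read a lookback window and let the downstream merge deduplicate on the key.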

What you'll deliver

A public GitHub repository with: (1) an Airflow DAG file (cartflow_daily_elt.py) that passes `airflow dags test` with no import errors, (2) a dbt project with at least three models and passing `dbt test` output committed as a log file, (3) a Great Expectations checkpoint report showing row count, null rate, and uniqueness assertions on fct_daily_order_summary, (4) a dbt docs index.html committed to /docs, and (5) a README of at least 600 words covering architecture decisions, failure handling, and how to reproduce the pipeline locally
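
The three assertions named for the data quality report (row count, null rate, uniqueness) reduce to simple checks. The plain-Python sketch below shows what each one measures. It is not the Great Expectations API, and the column names `order_date` and `revenue` are hypothetical, since the fct_daily_order_summary schema is not specified here.

```python
from typing import Any, Dict, List


def quality_report(
    rows: List[Dict[str, Any]], key: str, not_null: str, min_rows: int = 1
) -> Dict[str, Any]:
    """Compute row count, null rate on one column, and key uniqueness."""
    total = len(rows)
    nulls = sum(1 for r in rows if r.get(not_null) is None)
    duplicates = total - len({r[key] for r in rows})
    return {
        "row_count": total,
        "row_count_ok": total >= min_rows,
        "null_rate": nulls / total if total else 0.0,
        "duplicate_keys": duplicates,
        "unique_ok": duplicates == 0,
    }


# Hypothetical sample with one null revenue and one duplicated date.
sample = [
    {"order_date": "2024-06-01", "order_count": 120, "revenue": 8450.0},
    {"order_date": "2024-06-02", "order_count": 98, "revenue": None},
    {"order_date": "2024-06-02", "order_count": 98, "revenue": 7011.5},
]
report = quality_report(sample, key="order_date", not_null="revenue")
for name, value in report.items():
    print(f"{name}: {value}")
```

In the capstone, a Great Expectations checkpoint would express the same checks declaratively. A hand-rolled version like this one is mainly useful for understanding what the report's numbers mean.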

Portfolio value

Design and operationalize a production ELT pipeline (CartFlow Commerce) that extracts incrementally from PostgreSQL via SQLAlchemy, transforms with dbt-core incremental models and schema tests, orchestrates via Airflow with observability, and handles real data quality failures, proving readiness to own data infrastructure from ingestion through analytics.