AI & Machine Learning · 💻 Technical Course · LearnAspire Certified

5 Production AI Agents with LangGraph 0.2, CrewAI 0.80 & LangSmith: Deploy, Observe, and Govern Real IT Agents

"Five agents deployed. Production-grade. Running in your org."

Build and deploy five containerised, observable AI agents — incident triage, infra audit, security posture, knowledge retrieval, and CI code review — running inside your organisation's existing infrastructure within one working day.

Advanced · 12h · 6 modules · 41 slides · 18 exercises · 24 quiz Qs
Free · No credit card required
Sign in to Enroll for Free →
7-day money-back guarantee
  • ✓ 6 modules of content
  • ✓ 41 concept slides
  • ✓ 18 practical exercises
  • ✓ 24 quiz questions
  • ✓ Capstone project
  • ✓ LearnAspire certificate

Learning Outcomes

What you'll learn

→ You will be able to design and implement a LangGraph 0.2 StateGraph with TypedDict-annotated reducers, conditional edge routing, and max_iterations circuit-breaker logic that prevents infinite loops under real PagerDuty alert payloads
→ You will be able to write OpenAI tool-calling schema v2 definitions that survive model version updates by validating against a JSON Schema registry at agent startup, and recover gracefully when the LLM hallucinates a tool name — including the exact ValueError trace and its correction
→ You will be able to deploy a CrewAI 0.80 multi-agent crew with async task dependencies, shared Terraform state context, and deterministic task sequencing that avoids the race condition that silently drops audit findings when two agents write to the same output buffer
→ You will be able to wire a PostgreSQL-backed conversation memory store using LangChain 0.3's RunnableWithMessageHistory that survives container restarts, with TTL-based eviction and a schema migration path that does not require downtime
→ You will be able to instrument all five agents with LangSmith traces, per-run token cost attribution mapped to organisational cost centres, and a Grafana dashboard consuming LangSmith's REST export — satisfying a finance team's requirement for LLM spend reporting without exposing raw API keys in logs
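The loop-guard pattern in the first outcome can be sketched without LangGraph itself. Below is a stdlib-only illustration of a TypedDict state whose node appends to a message list (reducer semantics) and whose conditional router ends the run once an iteration cap trips; the node name `triage_step` and the cap value are illustrative, not taken from the course.

```python
from typing import TypedDict, List

MAX_ITERATIONS = 5  # circuit breaker: hard cap on agent loop turns (illustrative value)

class TriageState(TypedDict):
    messages: List[str]   # reducer semantics: each step appends, never overwrites
    iterations: int

def triage_step(state: TriageState) -> TriageState:
    # stand-in for an LLM/tool-calling node; appends its output to the state
    return {
        "messages": state["messages"] + [f"analysis pass {state['iterations'] + 1}"],
        "iterations": state["iterations"] + 1,
    }

def route(state: TriageState) -> str:
    # conditional edge: loop back until the circuit breaker trips
    if state["iterations"] >= MAX_ITERATIONS:
        return "end"
    return "triage"

def run(state: TriageState) -> TriageState:
    # minimal driver emulating StateGraph execution for this two-node graph
    while route(state) == "triage":
        state = triage_step(state)
    return state

final = run({"messages": [], "iterations": 0})
```

In LangGraph proper the `route` function would be registered via `add_conditional_edges`; the point here is only that the breaker is a pure function of state, so it survives real, unpredictable alert payloads.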
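The recovery path in the second outcome — catching a hallucinated tool name before execution — reduces to validating the model's choice against a registry at dispatch time. A hedged sketch; the registry contents and the fallback payload are invented for illustration.

```python
import json

# hypothetical tool registry, keyed by the names declared in the tool-calling schema
TOOL_REGISTRY = {
    "fetch_alert": lambda alert_id: {"alert_id": alert_id, "status": "triggered"},
    "create_runbook": lambda summary: {"runbook": summary},
}

def dispatch(tool_name: str, arguments: str):
    """Validate the model's tool choice before executing it."""
    if tool_name not in TOOL_REGISTRY:
        # the failure surfaced to the caller when the LLM hallucinates a name
        raise ValueError(f"unknown tool {tool_name!r}; known: {sorted(TOOL_REGISTRY)}")
    return TOOL_REGISTRY[tool_name](**json.loads(arguments))

def dispatch_with_recovery(tool_name: str, arguments: str):
    try:
        return dispatch(tool_name, arguments)
    except ValueError:
        # recovery: return a structured error the agent loop can retry on,
        # instead of crashing the whole run
        return {"error": f"tool {tool_name!r} not registered", "retry": True}

result = dispatch_with_recovery("fetch_alrt", '{"alert_id": "PD-123"}')
```

Feeding the structured error back to the model as a tool result is what lets it correct a misspelled tool name on the next turn.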

The day after you finish

The day after completing this course, you will open your organisation's PagerDuty account, point the LangGraph incident triage agent at a real alert queue using your own API key stored in Docker secrets (not os.environ), trigger a test incident, and watch LangSmith capture the full execution trace — including tool-call latency, token spend, and the generated resolution runbook. You can then share that trace URL with your team lead as evidence the agent is observable and ready for on-call rotation evaluation.

Who this is for

  • Primary: Solution architects and senior IT engineers (5–12 years) who have used LLM APIs and prompt engineering but have never deployed a production AI agent with tool calling, persistent memory, and cost governance
  • Secondary: Platform engineers and DevOps leads responsible for evaluating and operationalising AI tooling inside an existing CI/CD or ITSM stack
  • Tertiary: Engineering managers and CTOs who will review, fund, or approve AI agent deployments and need to understand production-readiness criteria

Prerequisites

  • Python 3.10+ proficiency: comfortable writing async functions, TypedDict schemas, decorators, and reading tracebacks without external help
  • Hands-on experience with at least one LLM API (OpenAI, Anthropic, or Bedrock): has made tool-calling or function-calling API requests and read the raw JSON response
  • Docker fluency: can write a Dockerfile, build an image, run a container with volume mounts and environment variables, and read docker-compose logs
  • Familiarity with at least one of: FastAPI, PagerDuty API, GitHub Actions, Terraform CLI, or AWS CLI — the course integrates all five but does not teach HTTP fundamentals
  • Has read LangChain or LangGraph documentation and found it insufficient for production deployment decisions — this course assumes that baseline, not replaces it

Curriculum

6 modules Β· full breakdown

🤖 Part of: AI Engineering Path

Step 1 — Foundations
→ Step 2 — Core Skills
→ Step 3 — RAG
→ Step 4 — LangGraph RAG
→ Step 5 — Agent Systems
→ Step 6 — Production
→ Step 7 — MCP
→ Step 8 — Enterprise
← Previous: Step 4 — LangGraph RAG · Next in path: Step 6 — Production →
🏆

Capstone Project

Arvex Agent Platform: Five-Agent Production Deployment with Unified Observability and Cost Governance

Deploy all five agents — LangGraph incident triage, CrewAI infrastructure audit, ReAct security posture, PostgreSQL-backed knowledge retrieval, and GitHub Actions code review — behind a single FastAPI gateway running in Docker Compose, instrumented with LangSmith, with per-agent token budgets enforced via a shared cost-governance middleware, a unified /health endpoint reporting agent status, and a one-page ADR for each agent documenting the tool schema decisions, memory backend trade-offs, and fallback behaviour. The deployment targets the Arvex Technologies constraint set: $2,000/month aggregate LLM budget, no dedicated MLOps team, existing PostgreSQL and Redis infrastructure, and a requirement that no agent can page the on-call engineer more than once per incident without human confirmation.
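The per-agent budget enforcement described above amounts to a small accounting layer in front of each agent. A hedged, in-memory sketch — a real deployment would persist the counters in the existing Redis infrastructure, and the split of the $2,000/month aggregate across agents shown here is invented.

```python
class BudgetExceeded(Exception):
    """Raised when a charge would push an agent past its monthly budget."""

class CostGovernor:
    """Tracks cumulative LLM spend per agent and refuses calls past the budget.
    In-memory for illustration only."""

    def __init__(self, budgets_usd: dict[str, float]):
        self.budgets = budgets_usd
        self.spent = {agent: 0.0 for agent in budgets_usd}

    def charge(self, agent: str, cost_usd: float) -> None:
        # reject before spending, so a run is never billed past its cap
        if self.spent[agent] + cost_usd > self.budgets[agent]:
            raise BudgetExceeded(f"{agent} would exceed ${self.budgets[agent]:.2f}/month")
        self.spent[agent] += cost_usd

# illustrative split of the $2,000/month aggregate budget across the five agents
governor = CostGovernor({
    "incident_triage": 600.0,
    "infra_audit": 400.0,
    "security_posture": 400.0,
    "knowledge_retrieval": 350.0,
    "code_review": 250.0,
})
governor.charge("incident_triage", 1.75)  # one run's attributed token cost
```

Wired into the FastAPI gateway as middleware on the invoke endpoints, a `BudgetExceeded` would translate to an HTTP 429 with the agent's remaining budget in the body.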

What you'll deliver

  • A Docker Compose stack (docker-compose.prod.yml) containing all five agent services
  • A FastAPI gateway with JWT-authenticated /invoke/{agent_id} endpoints
  • A LangSmith project dashboard with per-agent cost attribution
  • A PostgreSQL init.sql schema for the knowledge agent memory store
  • A GitHub Actions workflow file (.github/workflows/code-review-agent.yml) triggering the code review agent on pull_request events
  • Five Architecture Decision Records (one per agent) in Markdown format, ready for internal RFC submission
  • A requirements.txt with pinned versions for every dependency