5 Production AI Agents with LangGraph 0.2, CrewAI 0.80 & LangSmith: Deploy, Observe, and Govern Real IT Agents
"Five agents deployed. Production-grade. Running in your org."
Build and deploy five containerised, observable AI agents (incident triage, infrastructure audit, security posture, knowledge retrieval, and CI code review) running inside your organisation's existing infrastructure within one working day.
- 6 modules of content
- 41 concept slides
- 18 practical exercises
- 24 quiz questions
- Capstone project
- LearnAspire certificate
Learning Outcomes
What you'll learn
The day after you finish
The day after completing this course, you will open your organisation's PagerDuty account, point the LangGraph incident triage agent at a real alert queue using your own API key stored in Docker secrets (not os.environ), and trigger a test incident. You will then watch LangSmith capture the full execution trace, including tool-call latency, token spend, and the generated resolution runbook, and share that LangSmith trace URL with your team lead as evidence the agent is observable and ready for on-call rotation evaluation.
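Reading the API key from a Docker secret instead of os.environ can be sketched as follows. This is a minimal illustration, not course code: the secret name `pagerduty_api_key` and the helper `read_secret` are assumed names for this example.

```python
from pathlib import Path


def read_secret(name: str, secrets_dir: str = "/run/secrets") -> str:
    """Read a Docker secret mounted as a file, e.g. /run/secrets/pagerduty_api_key.

    Docker Compose mounts each declared secret as a read-only file under
    /run/secrets, so the key never appears in `docker inspect` output the
    way an environment variable would.
    """
    secret_path = Path(secrets_dir) / name
    return secret_path.read_text().strip()


# Hypothetical usage when wiring up the PagerDuty tool:
# pagerduty_key = read_secret("pagerduty_api_key")
```

The same helper works for any of the five agents' credentials; only the secret name changes.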
Who this is for
- Primary: Solution architects and senior IT engineers (5–12 years) who have used LLM APIs and prompt engineering but have never deployed a production AI agent with tool calling, persistent memory, and cost governance
- Secondary: Platform engineers and DevOps leads responsible for evaluating and operationalising AI tooling inside an existing CI/CD or ITSM stack
- Tertiary: Engineering managers and CTOs who will review, fund, or approve AI agent deployments and need to understand production-readiness criteria
Prerequisites
- Python 3.10+ proficiency: comfortable writing async functions, TypedDict schemas, decorators, and reading tracebacks without external help
- Hands-on experience with at least one LLM API (OpenAI, Anthropic, or Bedrock): has made tool-calling or function-calling API requests and read the raw JSON response
- Docker fluency: can write a Dockerfile, build an image, run a container with volume mounts and environment variables, and read docker-compose logs
- Familiarity with at least one of: FastAPI, PagerDuty API, GitHub Actions, Terraform CLI, or AWS CLI; the course integrates all five but does not teach HTTP fundamentals
- Has read LangChain or LangGraph documentation and found it insufficient for production deployment decisions; this course assumes that baseline rather than replacing it
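As a yardstick for the Python prerequisite, you should be able to read a typed state schema and an async node like the following without help. `TriageState` and `classify_alert` are illustrative names for this sketch, not part of the course codebase.

```python
import asyncio
from typing import TypedDict


class TriageState(TypedDict):
    """Illustrative agent state: the dict shape a LangGraph-style graph passes between nodes."""
    alert_id: str
    severity: str
    notes: list[str]


async def classify_alert(state: TriageState) -> TriageState:
    """A minimal async 'node': inspects the incoming state and returns an updated copy."""
    severity = "high" if "prod" in state["alert_id"] else "low"
    return {
        "alert_id": state["alert_id"],
        "severity": severity,
        "notes": state["notes"] + [f"classified as {severity}"],
    }


# Example invocation:
# asyncio.run(classify_alert({"alert_id": "prod-42", "severity": "", "notes": []}))
```

If the TypedDict annotations or the `asyncio.run` call above are unfamiliar, spend an hour with the Python typing and asyncio docs before starting Module 1.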
Curriculum
6 modules · full breakdown
Part of: AI Engineering Path
Capstone Project
Arvex Agent Platform: Five-Agent Production Deployment with Unified Observability and Cost Governance
Deploy all five agents (LangGraph incident triage, CrewAI infrastructure audit, ReAct security posture, PostgreSQL-backed knowledge retrieval, and GitHub Actions code review) behind a single FastAPI gateway running in Docker Compose, instrumented with LangSmith, with per-agent token budgets enforced via a shared cost-governance middleware, a unified /health endpoint reporting agent status, and a one-page ADR for each agent documenting the tool schema decisions, memory backend trade-offs, and fallback behaviour. The deployment targets the Arvex Technologies constraint set: $2,000/month aggregate LLM budget, no dedicated MLOps team, existing PostgreSQL and Redis infrastructure, and a requirement that no agent can page the on-call engineer more than once per incident without human confirmation.
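The per-agent cost-governance idea can be sketched as a small budget tracker that the gateway middleware would consult before dispatching each request. The class name, token price, and even budget split below are assumptions for illustration, not figures prescribed by the capstone.

```python
class TokenBudget:
    """Tracks one agent's spend against a monthly dollar cap.

    A gateway middleware would call charge() after each LLM response and
    allow() before forwarding the next request, returning HTTP 429 once an
    agent has exhausted its share of the aggregate budget.
    """

    def __init__(self, monthly_cap_usd: float, usd_per_1k_tokens: float = 0.01):
        self.cap = monthly_cap_usd
        self.rate = usd_per_1k_tokens
        self.spent = 0.0

    def charge(self, tokens: int) -> None:
        """Record the dollar cost of a completed LLM call."""
        self.spent += (tokens / 1000) * self.rate

    def allow(self) -> bool:
        """True while the agent is still under its cap."""
        return self.spent < self.cap


# Hypothetical even split of the $2,000/month aggregate across the five agents:
budgets = {agent: TokenBudget(monthly_cap_usd=400.0)
           for agent in ["triage", "audit", "security", "knowledge", "review"]}
```

In the capstone the real middleware would also persist `spent` (e.g. in Redis) so budgets survive container restarts; this sketch keeps it in memory for clarity.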
What you'll deliver
- A Docker Compose stack (docker-compose.prod.yml) containing all five agent services
- A FastAPI gateway with JWT-authenticated /invoke/{agent_id} endpoints
- A LangSmith project dashboard with per-agent cost attribution
- A PostgreSQL init.sql schema for the knowledge agent memory store
- A GitHub Actions workflow file (.github/workflows/code-review-agent.yml) triggering the code review agent on pull_request events
- Five Architecture Decision Records (one per agent) in Markdown format, ready for internal RFC submission
- A requirements.txt with pinned versions for every dependency
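The unified /health endpoint's aggregation can be sketched as a pure function the gateway would expose. The status names and roll-up rules here are one reasonable convention for this sketch, not a requirement of the capstone spec.

```python
def aggregate_health(agent_statuses: dict[str, str]) -> dict:
    """Combine per-agent statuses into one /health payload.

    Convention assumed here: 'ok' everywhere -> overall 'ok'; any agent
    reporting 'down' -> overall 'down'; anything else (e.g. 'degraded')
    -> overall 'degraded'.
    """
    values = agent_statuses.values()
    if any(s == "down" for s in values):
        overall = "down"
    elif all(s == "ok" for s in values):
        overall = "ok"
    else:
        overall = "degraded"
    return {"status": overall, "agents": agent_statuses}
```

The gateway would call each agent service's own health check, feed the results into this function, and return the payload with an HTTP 503 whenever the overall status is not "ok".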