Build a Voice Agent from scratch

▎ What you'll build

You leave with the real thing - not notes.

✓ A working real-time voice agent you can speak to and that answers back in a natural voice

✓ A full STT to LLM to TTS pipeline wired up with streaming, not batch

✓ Working interruption / barge-in so you can cut the agent off mid-sentence

✓ At least one tool / function call mid-call

✓ A runnable repo you take home plus a clear path to put it on a phone number via Twilio/SIP

▎ The 3 hours, block by block

Hands-on the whole way.

Block 1

Anatomy of a voice agent

The cascading stack: STT (Deepgram/Whisper) to LLM to TTS (ElevenLabs/Cartesia), and the speech-to-speech alternative (OpenAI Realtime)
Where latency hides - time-to-first-token, time-to-first-audio
Get accounts, keys, and a starter repo running
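
To make "time-to-first-token" concrete, here is a minimal sketch of how you might time any streaming source. It is not workshop code: `fake_llm_stream` and `measure_stream` are hypothetical names, and the fake stream is only a stand-in for a real streaming LLM or TTS response. The same timing idea applies to time-to-first-audio if the stream yields audio chunks instead of text.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def fake_llm_stream(delay_s: float = 0.05) -> Iterator[str]:
    """Hypothetical stand-in for a streaming LLM response (not a real API)."""
    for token in ["Hello", ",", " how", " can", " I", " help", "?"]:
        time.sleep(delay_s)  # simulate network + generation delay per token
        yield token


def measure_stream(stream: Iterable[str]) -> Tuple[Optional[float], float, str]:
    """Consume a stream and return (time_to_first_item, total_time, text) in seconds."""
    start = time.perf_counter()
    first: Optional[float] = None
    parts = []
    for item in stream:
        if first is None:
            # Latency the user actually feels: how long until anything arrives
            first = time.perf_counter() - start
        parts.append(item)
    total = time.perf_counter() - start
    return first, total, "".join(parts)


ttft, total, text = measure_stream(fake_llm_stream())
print(f"time to first token: {ttft * 1000:.0f} ms, total: {total * 1000:.0f} ms")
```

With streaming, the next stage can start as soon as the first item arrives, which is why the first-item time matters more than the total.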

Block 2

Build the pipeline

Stand up streaming STT to LLM to TTS with Pipecat or LiveKit Agents
Get turn-taking and interruption working with VAD and endpointing
Add one function / tool call
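
As a rough illustration of endpointing, the sketch below decides that a turn has ended once a run of silence follows speech. It assumes a VAD (Silero VAD, for example) has already produced one speech / non-speech decision per audio frame. The `Endpointer` class, the 20 ms frame size, and the 400 ms silence threshold are illustrative assumptions, not Pipecat or LiveKit APIs or defaults.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Endpointer:
    """Hypothetical endpointer: ends a turn after `silence_ms` of silence following speech."""
    frame_ms: int = 20      # assumed duration of one VAD frame
    silence_ms: int = 400   # assumed silence needed to end the turn
    _heard_speech: bool = False
    _silent_for: int = 0

    def push(self, is_speech: bool) -> bool:
        """Feed one VAD decision; return True on the frame where the turn ends."""
        if is_speech:
            self._heard_speech = True
            self._silent_for = 0  # any speech resets the silence timer
            return False
        if not self._heard_speech:
            return False  # leading silence before the user has said anything
        self._silent_for += self.frame_ms
        if self._silent_for >= self.silence_ms:
            # Turn is over: reset state so the next utterance starts fresh
            self._heard_speech = False
            self._silent_for = 0
            return True
        return False


def find_turn_ends(vad_flags: List[bool], ep: Endpointer) -> List[int]:
    """Return the frame indices at which the endpointer declares end of turn."""
    return [i for i, flag in enumerate(vad_flags) if ep.push(flag)]


# Synthetic VAD output: silence, speech, a short pause, more speech, long silence
flags = [False] * 5 + [True] * 10 + [False] * 5 + [True] * 10 + [False] * 25
turn_ends = find_turn_ends(flags, Endpointer())
print(turn_ends)
```

The short 100 ms pause in the middle does not end the turn; only the long trailing silence does. A lower threshold makes the agent respond faster but cut in on pauses more often, which is the trade-off you tune when working on turn-taking.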

Block 3

Make it real and ship-aware

How to put the agent on a phone number via Twilio/SIP (a conceptual walkthrough)
Build vs buy - frameworks like Pipecat/LiveKit vs turnkey Vapi/Retell
Debugging, measuring latency, what breaks in production, Q&A

Who it's for

Software engineers who can read and write code and want hands-on voice AI, not slides
Founders evaluating whether to build a voice product in-house or buy
Anyone shipping conversational or phone-based AI who needs to understand real latency and turn-taking trade-offs

What to bring

A laptop with Python 3.10+ (or Node), a terminal, and a code editor
A working microphone and headphones - headphones prevent echo during testing
API keys created beforehand - an LLM key, an STT key (Deepgram), a TTS key (ElevenLabs or Cartesia); most have free trial credit
Optional - a Twilio account for the phone-number portion

▎ By the end

What's true when you walk out.

✓ You have a voice agent running on your own machine that you can hold a real back-and-forth conversation with

✓ You understand cascading vs speech-to-speech architectures and can reason about which to use

✓ You can name where latency comes from and the concrete levers to control it

▎ Tools you'll touch

Pipecat · LiveKit Agents · OpenAI Realtime API · Deepgram · Whisper · ElevenLabs · Cartesia · Silero VAD · Twilio/SIP · Python

▎ Who teaches

Your instructor

Workshop instructor

I've spent 22+ years building software for enterprises - full-stack apps, backend systems, and lately RAG pipelines and agentic AI solutions. I've shipped the hard stuff for big companies. These workshops are that experience, distilled into one hands-on room so you can ship your own.

▎ Questions

Before you sign up.

Do I need machine learning experience?

No. If you can clone a repo, run a command, and read code, you can keep up. We use hosted models via APIs - no model training required.

Will the agent actually work or is it a toy demo?

It is a real, runnable pipeline with streaming and interruption - the same architecture production agents use. It will not be hardened for scale in 3 hours, but it is an honest foundation and you keep the code.

Why build from scratch when platforms like Vapi exist?

To understand what is actually happening so you can make an informed build-vs-buy call. We cover where turnkey platforms win and where building wins.

Sat 22 Aug 2026 · 3 hours · 10 AM - 1 PM · Chennai · 10 seats. Drop your email and we'll tell you the moment booking opens.