
How We Built Voice AI Agents That Actually Answer Business Calls

A technical look at building production voice AI using Twilio ConversationRelay, LLM orchestration, and agentic tool-calling — from architecture decisions to sub-second response latency.

Voice AI, LLM, Twilio, Architecture

The Problem

Every small business owner knows the pain: a customer calls after hours, gets voicemail, and calls your competitor instead. Traditional IVR systems ("Press 1 for sales, press 2 for support...") frustrate callers and feel robotic. Hiring a 24/7 receptionist costs $3,000-5,000/month. We built HelloCalls to solve this — an AI voice agent platform where businesses deploy intelligent phone agents that answer calls naturally, book appointments, qualify leads, and route calls by intent. Here's how the technology actually works.

The Architecture

At its core, the system connects three things in real-time: speech recognition (what the caller says), an LLM (deciding what to respond), and text-to-speech (speaking the response back). The challenge is doing this fast enough that the conversation feels natural.

Twilio ConversationRelay

We use Twilio's ConversationRelay protocol, a bidirectional WebSocket connection that handles the telephony layer. When a call comes in:
  1. Twilio answers and opens a WebSocket to our server
  2. Caller speech is transcribed to text (STT) on Twilio's side
  3. We receive the text, send it to our LLM pipeline
  4. Our LLM generates a response (streamed)
  5. We send the response text back through the WebSocket
  6. Twilio converts it to speech (TTS) and plays it to the caller
This happens in under a second. The key insight: streaming the LLM response rather than waiting for the complete answer. The caller starts hearing the response while the LLM is still generating it.
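To make steps 3 to 5 concrete, here is a minimal sketch of the server-side message handling. It assumes ConversationRelay's JSON messages as we understand them (an incoming `prompt` message carrying `voicePrompt`, outgoing `text` messages carrying `token` and `last`); check the field names against Twilio's current documentation. `fake_llm_stream` and `handle_message` are hypothetical names for illustration, not our production code, and the WebSocket transport itself is left out.

```python
import json
from typing import Callable, Iterator


def fake_llm_stream(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for a streaming LLM call; yields one token at a time."""
    for word in f"You said: {prompt}".split(" "):
        yield word + " "


def handle_message(
    raw: str, llm_stream: Callable[[str], Iterator[str]] = fake_llm_stream
) -> Iterator[str]:
    """Turn one incoming WebSocket message into zero or more outgoing messages."""
    msg = json.loads(raw)
    if msg.get("type") != "prompt":
        return  # setup, interrupt, dtmf and other types are ignored in this sketch
    # Forward each token as soon as it arrives so speech can start early.
    for token in llm_stream(msg.get("voicePrompt", "")):
        yield json.dumps({"type": "text", "token": token, "last": False})
    # An empty final token marks the end of this turn.
    yield json.dumps({"type": "text", "token": "", "last": True})
```

Because `handle_message` is a generator, each outgoing message can be written to the socket the moment it is produced, which is what lets the caller hear the start of a reply before the model has finished.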

The LLM Gateway

We don't rely on a single LLM provider. We built an intelligent routing layer that scores providers based on:
  • Cost (30% weight) — voice calls burn tokens fast
  • Health (30%) — if a provider is returning errors, skip it
  • Latency (20%) — for voice, speed is everything
  • Task match (15%) — some models handle tool-calling better
  • Capabilities (5%) — streaming support, context window
The router tries the best-scored provider first and automatically fails over if there's an error. In practice, this means Groq handles most voice calls (ultra-low latency), with OpenRouter as fallback for complex reasoning tasks.
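The scoring and failover logic can be sketched roughly as follows. The weights come from the list above; everything else (the `Provider` class, the 0 to 1 normalisation of each metric, the `route` function) is an assumption made for illustration rather than the actual gateway code.

```python
from dataclasses import dataclass
from typing import Callable

# Weights from the list above; they sum to 1.0.
WEIGHTS = {
    "cost": 0.30,
    "health": 0.30,
    "latency": 0.20,
    "task_match": 0.15,
    "capabilities": 0.05,
}


@dataclass
class Provider:
    name: str
    metrics: dict[str, float]  # assumed: each metric normalised to 0..1, higher is better
    call: Callable[[str], str]  # hypothetical function that sends a prompt to the provider


def score(provider: Provider) -> float:
    """Weighted sum of the provider's normalised metrics."""
    return sum(w * provider.metrics.get(key, 0.0) for key, w in WEIGHTS.items())


def route(prompt: str, providers: list[Provider]) -> tuple[str, str]:
    """Try providers from best score to worst, failing over on any error."""
    errors = []
    for provider in sorted(providers, key=score, reverse=True):
        try:
            return provider.name, provider.call(prompt)
        except Exception as exc:  # any provider error triggers failover
            errors.append(f"{provider.name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

In a real router the metrics would be refreshed continuously from observed error rates and response times, so a provider that starts failing drops down the ranking on its own.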

Agentic Tool-Calling

The AI doesn't just talk — it takes action. During a live call, the agent can:
  • Book appointments with calendar validation (check availability, respect business hours, prevent double-booking)
  • Capture leads with contact details and intent classification
  • Transfer calls to specific departments or phone numbers
  • Route by intent using a hybrid approach: fast keyword matching first, LLM classification as fallback with confidence scoring
  • Query knowledge bases to answer business-specific questions
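The hybrid intent routing in the list above can be sketched like this. The keyword table, the 0.7 confidence threshold, and the `llm_classify` callback are all hypothetical; the point is only the order of operations: cheap keyword match first, model call only when needed.

```python
import re
from typing import Callable, Optional

# Hypothetical keyword table; a real one would be configured per business.
KEYWORDS = {
    "booking": ("appointment", "book", "schedule"),
    "billing": ("invoice", "payment", "refund"),
}


def keyword_intent(text: str) -> Optional[str]:
    """Return an intent if any of its keywords appears as a whole word."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    for intent, keywords in KEYWORDS.items():
        if words.intersection(keywords):
            return intent
    return None


def classify(
    text: str,
    llm_classify: Callable[[str], tuple[str, float]],
    threshold: float = 0.7,  # assumed cut-off, not a value from the article
) -> tuple[str, str]:
    """Return (intent, source). The LLM is only consulted if keywords miss."""
    intent = keyword_intent(text)
    if intent is not None:
        return intent, "keyword"
    label, confidence = llm_classify(text)
    if confidence >= threshold:
        return label, "llm"
    # Low confidence: fall back to a generic intent instead of guessing.
    return "general", "fallback"
```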
Each tool is defined with a schema that the LLM understands. When the model decides to use a tool, we execute it server-side and feed the result back into the conversation.
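As an illustration, here is what one tool definition and its server-side execution might look like. The schema uses the JSON Schema function format accepted by OpenAI-compatible chat APIs; the tool name, the 9:00 to 17:00 business hours, and the in-memory `booked` set are assumptions for the sketch, standing in for a real calendar backend.

```python
import json
from datetime import datetime, time

# Tool schema in the function-calling format used by OpenAI-compatible APIs.
BOOK_APPOINTMENT_TOOL = {
    "type": "function",
    "function": {
        "name": "book_appointment",
        "description": "Book an appointment if the slot is free and within business hours.",
        "parameters": {
            "type": "object",
            "properties": {
                "start": {"type": "string", "description": "ISO 8601 start time"},
                "name": {"type": "string", "description": "Caller's name"},
            },
            "required": ["start", "name"],
        },
    },
}

OPEN, CLOSE = time(9, 0), time(17, 0)  # assumed business hours
booked: set[datetime] = set()  # in-memory stand-in for a calendar


def book_appointment(start: str, name: str) -> dict:
    slot = datetime.fromisoformat(start)
    if not (OPEN <= slot.time() < CLOSE):
        return {"ok": False, "reason": "outside business hours"}
    if slot in booked:
        return {"ok": False, "reason": "slot already booked"}
    booked.add(slot)
    return {"ok": True, "start": slot.isoformat(), "name": name}


TOOLS = {"book_appointment": book_appointment}


def execute_tool_call(name: str, arguments_json: str) -> str:
    """Run the tool the model asked for and return a JSON result to feed back."""
    handler = TOOLS.get(name)
    if handler is None:
        return json.dumps({"ok": False, "reason": f"unknown tool {name}"})
    return json.dumps(handler(**json.loads(arguments_json)))
```

Returning a structured failure reason rather than raising lets the model explain the problem to the caller and offer another time.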

What We Learned

Latency is everything. In a phone call, even 200 ms of extra delay feels wrong. We optimized every layer: connection pooling to LLM providers, streaming responses, pre-warming TTS engines.

Interruption handling matters. Humans don't wait for the AI to finish speaking before they respond. ConversationRelay handles "barge-in": when the caller starts talking, the AI stops and listens.

Industry-specific prompts are essential. A dental office receptionist AI needs different knowledge than a plumber's dispatch. We built 20+ industry templates with pre-configured system prompts, tool configurations, and knowledge modules.

Multi-language support is non-negotiable for Canada. We added auto-language detection: the AI identifies the caller's language within the first few seconds and switches to respond in kind. 15+ languages supported.
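On the barge-in point, the telephony side stops playback, but the server should also stop generating tokens nobody will hear. A minimal sketch of that idea with `asyncio` follows; the `CallSession` class and the token timings are hypothetical, and it assumes the server is notified of the interruption (for example by an interrupt message over the WebSocket).

```python
import asyncio
from typing import Optional


class CallSession:
    """Tracks the in-flight response for one call so it can be cancelled."""

    def __init__(self) -> None:
        self.sent: list[str] = []  # tokens that were forwarded to the caller
        self._speaking: Optional[asyncio.Task] = None

    async def _speak(self, tokens: list[str]) -> None:
        for token in tokens:
            self.sent.append(token)
            await asyncio.sleep(0.05)  # simulates waiting for the next LLM token

    def start_response(self, tokens: list[str]) -> None:
        self._speaking = asyncio.create_task(self._speak(tokens))

    async def on_interrupt(self) -> None:
        """Caller started talking: cancel the response that is still streaming."""
        task = self._speaking
        if task is not None and not task.done():
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass  # expected: we cancelled it ourselves
        self._speaking = None


async def demo() -> list[str]:
    session = CallSession()
    session.start_response(["Our ", "hours ", "are ", "nine ", "to ", "five."])
    await asyncio.sleep(0.12)  # caller interrupts partway through
    await session.on_interrupt()
    await asyncio.sleep(0.1)  # nothing more is sent after the interrupt
    return session.sent
```

Running `asyncio.run(demo())` returns only the first few tokens, since the rest of the response is dropped once the caller interrupts.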

The Result

HelloCalls is live on the App Store and Google Play. Businesses can deploy a voice AI agent in minutes, not months. The same engineering approach (real-time systems, multi-provider resilience, agentic tool-calling) is available in every product we build. If you're building something that needs voice AI, or any AI integration into your existing product, let's talk.

Charles Adewoye

AI Solutions Architect & Founder
