Track: WebRTC and Real-Time Applications
Building a Low-Latency Voice Assistant Using LLMs: Insights and Challenges
At Telnyx, we provide APIs that enable dynamic phone call interactions. With the rise of Large Language Models (LLMs), we integrated them into our voice flows to enhance customer applications. Our Voice Assistant combines transcription, response generation, and speech synthesis. Initially, we faced significant latency and poor interruption handling. Through optimizations including LLM streaming, service colocation, improved transcription, and a custom text-to-speech system, we reduced latency to 900-1,000 ms. We also improved the user experience with advanced end-of-speech detection and noise handling. In this presentation, we will explore our progress, the challenges we faced, and the solutions we implemented to build a high-performance Voice Assistant. Today, our system delivers low-latency, high-quality voice interactions, allowing customers to focus on their business logic.
Presentation Video