Track: WebRTC and Real-Time Applications Keynote
Engaging a New Species: How Multimodal LLMs Are Set to Transform RTC Infrastructure
The advent of multimodal large language models has introduced a new kind of participant into real-time communication (RTC) infrastructure built for humans. While these models lack ears, they can still listen; though they have no mouth, they can speak. This raises a compelling question: how will the interfaces for integrating these models differ from the traditional microphones and speakers we use today? And how will these models, as new "customers" of RTC infrastructure, behave compared to humans? For example, codecs designed around human auditory perception may need to evolve. LLMs could "speak" at speeds far beyond human capability, or process several seconds of speech in a single second, provided the audio arrives faster than real time rather than paced for human listening. What new requirements will these capabilities place on RTC infrastructure, and how will it adapt? With over 700 billion minutes of real-time audio and video running through Agora’s RTC infrastructure annually, we are finely tuned for human-to-human communication. We invite you to join this keynote as we explore the potential of real-time communication with large language models and examine the possibilities and challenges of this new form of conversation.
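To make the faster-than-real-time point concrete, here is a minimal sketch of the contrast between pacing audio at playback speed for a human listener and delivering the same buffered audio to a machine listener in a single burst. This is our illustration only, not Agora's API or anything from the talk; the frame sizes and the `deliver`/`consume` names are hypothetical.

```python
# Illustrative sketch: the same 5-second clip, delivered two ways.
# All names and parameters here are assumptions, not a real RTC SDK.
import time

FRAME_MS = 20           # a common RTC audio frame duration
CLIP_SECONDS = 5        # a 5-second utterance
# Fake 20 ms PCM frames standing in for encoded audio packets.
frames = [b"\x00" * 320] * (CLIP_SECONDS * 1000 // FRAME_MS)

def consume(frame: bytes) -> None:
    """Stand-in for "the model listens"; an LLM front end could
    ingest frames far faster than real-time playback."""
    pass

def deliver(frames: list[bytes], paced: bool) -> float:
    """Feed frames to the consumer, pacing at real time only if `paced`."""
    start = time.monotonic()
    for frame in frames:
        consume(frame)
        if paced:
            time.sleep(FRAME_MS / 1000)  # human-style real-time pacing
    return time.monotonic() - start

print(f"paced: {deliver(frames, paced=True):.2f}s")   # ~5 s, human listening rate
print(f"burst: {deliver(frames, paced=False):.2f}s")  # milliseconds
```

The pacing, jitter buffering, and perceptual coding that today's pipelines impose exist for human ears; a machine consumer that tolerates bursts changes what the transport needs to guarantee.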
|
Presentation Video |
Presentation Notes |
ZHAO-ZHONG-ENGAGING-A-NEW-SPECIES-MULTIMODAL-LLMS2.pdf |