Tuesday October 8, 2024 | ||||
time | Room 1 | Room 2 | ||
8:00AM | Breakfast | |||
9:00AM | Keynote Conference Greetings Carol Davids Single Presenter Conference greetings! This is the 20th annual RTC Conference. The conference greetings will include (1) the physical layer - where the rooms are and where you can find breakfast, lunch, and snacks; (2) the inter-networking layer - including the networking opportunities provided by the IEEE and IIT booths (get student resumes here), the Tuesday evening social hour, and the poster sessions; (3) the media channels - what you will find in the various tracks and keynotes; and shout-outs to our generous sponsors and exhibitors who make this conference possible! Bring your 'coffee and' to the West Ballroom for a brief conference orientation, then proceed to the first talk of your choice in either Room 1 or Room 2. | |||
9:30AM | Programmable Real-Time Networks Cloud Based Decision Making for an Autonomous Vehicle Sofia Yang Single Presenter Suresh Borkar Single Presenter The autonomous vehicle is a representative example of an advanced, intelligent application built on real-time networks. The 5G wireless Vehicle-to-Everything (V2X) network utilizes distributed storage and computing capabilities enabled by the Cloud, Mobile Edge Computing (MEC), and the (local) vehicle-resident computer. This presentation treats two scenarios: (1) avoiding a collision with a stationary obstruction or a moving vehicle ahead, and (2) traffic-light handling. Fusion of the vehicle-resident camera and radar serves as the source of sensor data. The brake system is the primary actuator. The algorithms are simulated using one of two resources, the onboard computer or the Cloud, thus bounding the analysis for the MEC resource situation. A graphical simulation is implemented for validation and to demonstrate proof of concept for the two scenarios. The simulation module is built using animation APIs from the Python library Matplotlib. Results for the two options, the vehicle-based onboard computer and the network cloud, are compared and discussed, and pictorial representations of the vehicle as it approaches the target are presented. | Research Track Blockchain Sharding in IEEE 802.11ax Networks Rana Hegazy Single Presenter Blockchain technology has extended beyond cryptocurrency to secure systems and applications. Sharding, a blockchain technique, increases system throughput and decreases delay while enhancing system security. In this presentation, the Medium Access Control protocol for Internet of Things devices connected wirelessly through IEEE 802.11ax is updated to include the validation process required by Ethereum sharding. The simulation results, in which nodes compete to transmit their data, align with the theoretical equations presented in the paper.
Lastly, the results show a linear improvement in system performance as the number of shards increases. | ||
10:00AM | Programmable Real-Time Networks Are Cloud Hyperscalers Disrupting the Telecom Market Besma Smida Panelist Prasenjit Banerjee Moderator Raja K Kolli Panelist Shruthi Sreenivasa Murthy Panelist Venkat Chintha Panelist Cloud providers are expanding their services into areas traditionally dominated by network operators, such as content delivery networks (CDNs), edge computing, and 5G services. Major cloud providers, with their vast resources and advanced technologies, can leverage their market influence to compete directly with network operators, potentially eroding the latter's market share. In this panel discussion, we will ask our experts practicing in the field of wireless network communication the following pressing questions and learn their perspectives and much more: - What approach can balance the risk of depending on public cloud providers for critical infrastructure, and how can it be effectively accomplished? - Does edge computing via CDN help reduce latency? - Can telecom providers leverage edge computing to process IoT data locally, ensuring faster response times and reducing the burden on central servers? - Is edge computing via public cloud infrastructure the answer for applications requiring high bandwidth, such as video streaming and gaming? - Can security and privacy be at risk when it comes to location-based services using a public cloud infrastructure? | Research Track Towards Trustworthy Neural Network Intrusion Detection for Web SQL Injection Qianru Zhou Single Presenter Due to the pervasion of artificial intelligence, neural networks have been widely used in various domains. However, trust issues have arisen over the decisions made by neural network models, because they are opaque and cannot explain their decisions. Numerous explainable artificial intelligence methods have been proposed to address this problem, but most of them provide only vague explanations.
In security-centered domains such as cybersecurity, which rely on binary string analysis, even a single bit's misinterpretation can cause tremendous misunderstanding and misleading conclusions. Thus, formal and rigorous explanations are imperative. This paper proposes a rigorous, explainable Web SQL injection intrusion detection approach based on neural network models. Prime implicant explanations that are 100% loyal to the model are extracted. Explanation performance is presented and compared in detail with the current explainable AI methodology SHAP in terms of precision and time overhead. It is evident that the proposed explainable neural network model is tractable and scalable. | ||
10:30AM | Research Track Performance Evaluation of 5G Enabled Smart Surveillance: Case of Adama City Hailu K. Belay Single Presenter This study examines in detail the performance of smart security systems with 5G capabilities, in the context of Adama City's development strategy for smart cities. Installing extensive cabling can become logistically challenging, especially in large or dynamic spaces, and connected systems may experience latency problems that impair their capacity to monitor in real time. Using empirical data and fundamental signal-processing formulations at various frequency resources, performance has been assessed in terms of spectrum and energy efficiency through a targeted analysis that uses Adama City as a case study. The results show that, despite frequency-related challenges, cameras operating at higher 5G frequencies have greater capacity than those operating at sub-6 GHz. The study also shows that adaptively focusing such cameras' beams based on the distance of the last cell-edge user, rather than the maximum cell radius, consumes less energy than traditional fixed power ramping. | |||
11:00AM | Break | |||
11:15AM | Programmable Real-Time Networks Keynote Artificial Intelligence in Mobility Network Management - A Service Provider's Experience Jennifer Yates Single Presenter Driving towards a vision of autonomous networks is enabling service providers to scale efficiently with network expansion and to continually improve customer experience and deliver new services. Artificial intelligence (AI), machine learning (ML), and closed-loop network control are at the heart of this, enabling computers to make intelligent decisions and automatically respond to changing network conditions without human engagement. This presentation will focus on how AI, ML, and closed-loop network control are being used today at the mobility network edge to enhance customer experience, optimize network performance, deliver more environmentally friendly networks, and drive network efficiencies. | |||
12:00PM | Lunch | |||
1:00PM | Programmable Real-Time Networks AI – RAN Eric Hagerson Single Presenter Artificial intelligence has rapidly improved as AI and machine learning models become more complex, accurate, and useful. As a result, AI is increasingly being applied to more complex operations. Recently, there has been growing talk about leveraging AI in wireless networks, as carriers continue to deploy advanced 5G technologies and development of 6G is already underway. As part of these discussions, one area of focus has been how AI can greatly enhance Radio Access Networks. This potential paradigm shift in telecommunications could allow for dynamic, automated management of RANs, thereby allowing carrier networks to respond to multiple inputs in real time to ensure optimal network performance. AI-RAN promises greater efficiencies, from both a cost and network perspective, as well as increased flexibility, and it has the potential to dramatically enhance real-world network experiences. This session will present an overview of AI-RAN and discuss recently announced plans by T-Mobile to develop and integrate AI-RAN technology into its 5G network. | Research Track Zero-Knowledge Proofs in Speaker Verification Jeovane Honorio Alves Single Presenter As blockchain technology continues to grow in popularity, decentralized finance (DeFi) has attracted attention from everyday users. However, with the rise of DeFi come significant security and privacy challenges, especially the risk of losing funds due to compromised private keys. Enhancing security with user-friendly authentication methods is essential to address these concerns. Voice authentication has the potential to provide an additional security layer for blockchain systems, but implementing it in DeFi environments without compromising decentralization and privacy is complex.
Moreover, advances in deep learning and voice cloning introduce new risks for voice-based systems. In this talk, we introduce ZK Verify Voice Authentication, a novel solution that integrates zero-knowledge proofs (zk-SNARKs) to ensure privacy-preserving speaker verification for the XRP Ledger (XRPL). By using voice embeddings as digital signatures and storing only their hashes on the blockchain, our system allows users to prove their identity without revealing sensitive voice data. This approach strengthens security and privacy for DeFi participants while maintaining ease of use, offering a secure and accessible solution for the broader blockchain ecosystem. | ||
1:30PM | Programmable Real-Time Networks Changes in the Messaging Environment - RCS Deployment by Apple and Impacts to Industry Andy Rollins Single Presenter For more than two decades, Short Message Service (SMS) and Multimedia Messaging Service (MMS) have been the basis of text communication for wireless customers. Rich Communication Services (RCS) has been in standards for a decade; however, due to multiple issues, it has not really taken off. With the iOS 18 launch, Apple adds support for RCS, enabling iPhone users to enjoy a richer messaging experience with Android users. This talk will discuss the current status of RCS support in the industry and the impacts observed up to the presentation. | Research Track Secure platform for remote medical learning using WebRTC and facial recognition authentication Siré Eugène ZABOLO Single Presenter This article introduces an innovative distance medical learning platform designed to address the challenges of medical training in poorly equipped regions. Using WebRTC technology, this solution gives final-year medical students access to advanced teaching resources, such as real-time observation of surgical operations. Security is ensured by a dual authentication system, combining AI-based facial recognition (utilizing DeepFace) with traditional password identification. The architecture incorporates WebRTC, Socket.io, and the Internet of Things (IoT) to facilitate real-time communication and the dissemination of medical content. Uvicorn/Gunicorn servers are used to ensure a robust and scalable infrastructure. The platform aims to overcome geographical barriers and provide practical and interactive learning opportunities while guaranteeing the confidentiality of sensitive medical data. This approach represents a significant advancement in democratizing access to quality medical education, opening new perspectives for medical education in the digital age. | ||
2:00PM | Programmable Real-Time Networks Quantum Cryptography for Authentication in 5G and 6G Networks Dr. Biswaranjan Senapati, FBCS Single Presenter Join us in this talk to explore recent research on the application of quantum cryptography in 5G and 6G networks, particularly in the context of authentication, and its implications for smart grid, satellite communication, and bullet-proof communication. Notable technical developments throughout the evolution of mobile networks have revolutionized communication and information access. Every generation of networks, from the launch of 2G to the anticipated arrival of 6G, has significantly increased speed, capacity, security, and functionality. To highlight the innovations and structural changes brought about by each generation of networks, this talk examines the fundamental core network components of 2G, 3G, 4G, 5G, and the soon-to-be 6G networks. We will discuss security considerations, a recent comparison of 2G to 6G enabling technologies, and the application of quantum computing in industrial applications, particularly in manufacturing, healthcare, and government. We will outline the components of 6G and quantum technologies, including distributed ledger technology (DLT), physical layer security, distributed AI/ML, visible light communication (VLC), THz, and quantum computing. Additionally, we will discuss the evolution of network security from 2G to 6G and the application of quantum computing. | Research Track Toward Real-Time Video Streaming Over WebRTC Data Channels to Support Supplementary Video Codecs and Formats in the Browser David Diaz Single Presenter Although the broad availability of WebRTC in browsers enables many peer-to-peer media streaming applications, its limited codec and video format support constrains its use for applications requiring wide color gamuts, deep color, HDR, and other advanced video capabilities.
In this paper, we present an implementation of an alternative solution that uses WebRTC data channels to enable streaming and playback of video codecs and formats not supported by browser WebRTC implementations. By encapsulating video frames in MP4 fragments and pushing them from source to recipient via data channel, we take advantage of the Media Source Extensions available in the browser to offer broader video codec and format support. We present results from using this method, which we find to be a viable alternative to traditional WebRTC video streams for unsupported formats and advanced video capabilities given favorable network conditions and configuration, and we evaluate its performance and caveats. | ||
2:30PM | Break | |||
2:45PM | Keynote VoiceTech Training Machine Learning Classification Models for Creating Real-Time Data Points of Medical Conditions David vonThenen Single Presenter Nikki-Rae Alkema, PT, DPT Single Presenter As machine learning (ML) continues to revolutionize medical diagnostics, this talk explores the transformative potential of leveraging AI technologies to assist healthcare professionals in creating tools that collect data points for recognizing medical conditions such as Parkinson's Disease. Through a collaborative presentation blending expertise in data science and human movement analysis, we will discuss methodologies and insights crucial to training machine learning models. This session will explore the underpinnings of image, video, and audio recognition techniques for multi-modal applications. Our focus will be on the practical implications for healthcare, emphasizing the integration of AI into clinical workflows to enhance diagnostic accuracy and efficiency. As the world moves further into the digital space, areas like telehealth must embrace advancements and tools to assist medical professionals. This discussion will bridge the gap between theory and practice, demonstrating the future possibilities of AI in improving patient care. Expect live demonstrations showcasing applications of ML models in clinical scenarios. Attendees will leave equipped with actionable insights and access to comprehensive code resources, empowering them to implement recognition solutions in their own domains, subjects, and areas of interest. | |||
3:30PM | Break | |||
3:45PM | VoiceTech How Susceptible are LLMs to Logical Fallacies? Dan Pluth Single Presenter This work investigates the rational thinking capability of Large Language Models (LLMs) in multi-round argumentative debates by exploring the impact of fallacious arguments on their logical reasoning performance. More specifically, we present Logic Competence Measurement Benchmark (LOGICOM), a diagnostic benchmark to assess the robustness of LLMs against logical fallacies. LOGICOM involves two agents: a persuader and a debater engaging in a multi-round debate on a controversial topic, where the persuader tries to convince the debater of the correctness of its claim. First, LOGICOM assesses the potential of LLMs to change their opinions through reasoning. Then, it evaluates the debater’s performance in logical reasoning by contrasting the scenario where the persuader employs logical fallacies against one where logical reasoning is used. We use this benchmark to evaluate the performance of GPT-3.5 and GPT-4 using a dataset containing controversial topics, claims, and reasons supporting them. Our findings indicate that both GPT-3.5 and GPT-4 can adjust their opinion through reasoning. However, when presented with logical fallacies, GPT-3.5 and GPT-4 are erroneously convinced 41% and 69% more often, respectively, compared to when logical reasoning is used. Finally, we introduce a new dataset containing over 5k pairs of logical vs. fallacious arguments. The source code is publicly available. | Research Track Enhancing Real-Time Multilingual Communication in Virtual Meetings Through Optimized WebRTC Broadcasting Dr. Biswaranjan Senapati, FBCS Single Presenter Communication and collaboration are pivotal in today’s globalized and remote work environments. Traditional video conferencing solutions often struggle with diverse linguistic needs and high latency, leading to communication challenges. 
This paper presents a solution to overcome these limitations by optimizing WebRTC broadcasting for real-time, high-quality, low-latency multilingual communication in virtual meetings. Recent improvements in WebRTC standards and the demise of Flash technology have sped up the adoption of WebRTC for a range of media-based uses, such as broadcasting and video conferences. However, WebRTC's lack of standardized signaling mechanisms has given rise to a variety of methods for building effective networks for users. The goal of this research is to develop a scalable architecture for video broadcasting to numerous users by optimizing peer-to-peer networks while addressing multilingual translation challenges. A modified partial mesh model with location-based signaling has been proposed in response to scalability issues with traditional mesh-based networks. Peers create connections in this model based on proximity, which lowers latency and boosts reliability. In comparison to standard models, the experimental results show that the modified partial mesh network significantly reduces broadcasters' bandwidth consumption. Scalability issues are acknowledged; a hybrid strategy combining server-based solutions with the partial mesh model is examined as a workable way to address them and scale beyond a certain user threshold. Overall, this paper focuses on efficient and scalable WebRTC broadcasting solutions, enhancing real-time multilingual communication in virtual meetings and addressing the challenges faced by traditional video conferencing systems. | ||
4:15PM | VoiceTech Building Multiple Natural Language Processing Models to Work In Concert Together David vonThenen Single Presenter 1.5 billion messages are sent in Slack every week. At Zoom's peak, 300 million virtual meetings occurred on their platform daily. Facebook hosts 260 million conversations on any given day. The amount of information and data exchanged on platforms like Facebook, TikTok, and ChatGPT is almost incomprehensible. These conversations are transforming social networks into conversation data brokers used to identify trends, associations, and changes in the world. To collect this data, we must first build Natural Language Processing (NLP) models to break down these conversations and classify what's being said to understand their context. This session will focus on creating and collecting datasets, using those datasets to develop machine learning models, and then covering strategies for leveraging multiple machine learning models for data mining. We will cover how to obtain and process conversation data from multiple audio and video input sources and how to use the NLP models created in this session to extract information or metadata (e.g., sentence classification, entity recognition, etc.). During this talk, we will have live demos and provide code/resources for everything covered in this session. | Research Track Poster Session As we celebrate the 20th anniversary of the Real Time Communications Conference (RTC), we are excited to include a Poster Session in this year's program! This session is designed to highlight the research results and/or projects of students, researchers, and industry practitioners. There will be cash prizes awarded to the most highly rated poster presentations!
This is a good opportunity to engage face-to-face with conference participants as they share their research work, methods, and plans for early-stage projects, ongoing work, or final projects, and to receive valuable questions, observations, and comments from our speakers, sponsors, and attendees. It is also a good way to meet future research partners, workmates, or employers. | ||
4:45PM | VoiceTech TranscribeX: LLM-Enhanced ASR Transcription Grace LeFevre Single Presenter When transcribing telephony audio, Automatic Speech Recognition (ASR) engines often produce noisy output with high word error rates (WER). This impacts the efficacy of downstream analyses that process the transcribed text (intent determination, sentiment analysis, etc.). In this talk, we present two experiments demonstrating how a Large Language Model (LLM) can be used to improve the quality of telephony-based transcripts. In Experiment 1, we introduce an LLM choice method: providing an LLM with two or more ASR-generated transcripts for an audio file and instructing it to select the best transcription. In addition, the LLM is prompted with information about the domain and comparative ASR performance. Tested on an internal dataset of customer experience surveys, this approach yields a 1.7% WER improvement over the best-performing ASR. The method's usefulness can be further maximized by focusing on documents with the highest ASR disagreement, achieving WER improvements of up to 5% on data subsets with high ASR disagreement. In Experiment 2, we further test the LLM choice method on a dataset taken from the same domain but with a different distribution. We find that the method is less effective on this dataset overall but still useful for some documents. In particular, short documents in this dataset benefit from the LLM choice method, with a 2% WER improvement over the best-performing ASR for 1-5 word transcripts. These experiments provide a proof of concept that ASR transcriptions of telephony audio can be improved via an LLM enhancement approach like the LLM choice method we propose. However, maximizing the performance gains from this approach requires a targeted improvement strategy specific to the domain and distribution of a dataset. | |||
5:15PM | Reception | |||
6:30PM |
Wednesday October 9, 2024 | |||
time | Room 1 | Room 2 | |
8:00AM | Breakfast | ||
9:00AM | Internet of Things Makak: Community-Driven Microclimate Sensor Development for Wild Rice Conservation Blaine Rothrock Single Presenter Current environmental challenges have profound local consequences and often benefit from the collection of fine-grained microclimate data. Advances in wireless sensor networks and the Internet of Things have led to technologies nominally suited to support remote sensing; however, in practice, long-running deployments of in-field environmental sensors are rare, and a community-driven approach even more so. In this presentation I will detail the development of a sensor for the conservation of manoomin, the Ojibwe word for wild rice. Manoomin grows in the western Great Lakes region of North America and is affected by various environmental factors including climate change, agricultural development, and pollution; however, the specifics of these effects are not fully understood. Manoomin has served as a pillar of culture and sustenance for the Ojibwe for generations. Having data to support traditional ecological knowledge is critical to influencing policy change in the conservation of this vital asset. We have developed a prototype sensor deployed within the wild rice beds of lakes in Wisconsin and Minnesota in direct collaboration with Ojibwe nations and the Great Lakes Indian Fish and Wildlife Commission (GLIFWC). In addition, we have informed our design space with an interview study of 13 microclimate and field-ecology experts. This sensor, Makak, the Ojibwe word for "container", measures humidity, surface and depth temperature, and relative water level, and explores the detection of boat wake - all metrics deemed important by community experts. Additionally, the device is low-cost, delivers near real-time data over LTE-M with the ability to validate on site via BLE, and promotes data sovereignty. Makak is currently in its first season of deployment with nine devices in the field.
We wish to share our experience of community-driven IoT development, communicate lessons learned and technical considerations, and advocate for similar approaches to ecological applications of IoT. | WebRTC and Real-Time Applications Delivering stereoscopic, wraparound and Damien Stolarz Single Presenter Our team develops high-resolution, high-color, ultra-low-latency, surround-sound conferencing software used for secure remote editing and finishing of films and TV shows. We were tasked with porting our software to support Apple Vision Pro, so that our studio clients could use it for their secure pre-release content workflows, which tend to be technically challenging due to bitrate, latency, and input device requirements. We'll present the techniques we used and the various challenges we overcame to stream stereoscopic and 360-degree video over WebRTC to the Apple Vision Pro using the Janus SFU. We'll demonstrate the hardware and briefly dive into some of the considerations in delivering effective "spatial" content over WebRTC. We will address: • How we ported libwebRTC to AVP, including developing a custom Audio Device Module, camera capture (CameraSession, CameraCapturer, and CameraPreview components), and custom renderers • Different stereo/dimensional formats (over/under, side-by-side, 360 spherical projection) and making them work with studio standard format capture devices • Developing multiple custom renderers to support different color spaces and pixel formats in 2D, immersive, 180/360, and spatial modes using a MetalLayer-based renderer and a Compositor service | |
9:30AM | Internet of Things Enhancing API Security in IoT Applications Prasenjit Banerjee Single Presenter As IoT continues to transform industries, securing APIs becomes critical to protecting data integrity and system functionality. This presentation will address the unique challenges and best practices for API security in IoT applications, covering topics such as implementing OAuth 2.0 and role-based access control for secure authentication and authorization, ensuring TLS/SSL and end-to-end encryption for data protection, preventing DoS attacks through effective rate limiting, and maintaining data integrity with rigorous input validation and data sanitization techniques. Additionally, it will explore real-time API monitoring and anomaly detection using machine learning, and principles of secure API design illustrated with industry case studies. Attendees will gain practical insights and strategies to enhance API security in their IoT projects, enabling them to safeguard their systems against evolving threats. | WebRTC and Real-Time Applications Building a Low-Latency Voice Assistant Using LLMs: Insights and Challenges Enzo Piacenza Single Presenter At Telnyx, we provide APIs enabling dynamic phone call interactions. With the rise of Large Language Models (LLMs), we integrated them into our voice flows, enhancing customer applications. Our Voice Assistant combines transcription, response generation, and speech synthesis. Initially, we faced significant latency issues and poor interruption handling. Through various optimizations, including LLM streaming, service colocation, improved transcription, and a custom text-to-speech system, we reduced latency to 900-1,000ms. We also improved the user experience with advanced end-of-speech detection and noise handling. During this presentation, we will explore our progress, the challenges we faced, and the innovative solutions we implemented to build a high-performance Voice Assistant.
Today, our system delivers low-latency, high-quality voice interactions, allowing customers to focus on their business logic. | |
10:00AM | Internet of Things Street-level urban heat forecast and mapping using IoT-based weather sensors Peiyuan Li Single Presenter Many real-time IoT applications exist in manufacturing, transportation, and agriculture, with less effort on climate and weather. Accurately modeling urban microclimates is challenging due to the high surface heterogeneity of urban land cover and the vertical structure of street morphology. The use of machine learning (ML) techniques and data from street-level IoT-based smart sensors have become popular in recent years as an emerging approach for urban climate studies. By providing real-time and precise weather conditions, these sensors can drastically improve the predictive capabilities of current weather forecast models for urban heat. This real-time monitoring and enhanced predictive capability enable more accurate and timely responses to urban climate challenges. In this study, we developed a modeling protocol that leverages state-of-the-art climate modeling techniques, high-precision lidar-based urban morphology, and real-time data streams from IoT sensors. The protocol was tested in Chicago to map air temperature at a hyper-local neighborhood scale. Additionally, we investigated sensitivity by comparing results from two machine learning algorithms: Gaussian Process Regression and Graph Neural Network, based on the nature of point-scale measurements by IoT sensors. We further tested the model’s reliability on out-of-sample locations to explore implications for feature engineering, data quality control, and strategic data collection. The improved predictive capabilities also contribute to better urban management and decision-making, enabling city planners to immediately optimize strategies of traffic management, emergency response, and public health monitoring. 
This study aims to help urban climate modelers effectively leverage emerging street-level observations in real time, gain dynamic insights into next-generation urban climate modeling, and guide observation efforts to build a holistic understanding of urban microclimate dynamics. | WebRTC and Real-Time Applications Voice and Conversational AI In Production with RTT less than 300ms Varun Singh Single Presenter Large Language Models (LLMs) are transforming voice interactions, enabling multi-turn conversations that are engaging and practical. In this talk, we'll explore how at Daily we integrate WebRTC with LLMs for real-time voice-to-voice communication using our open standard, RTVI-AI. RTVI-AI defines real-time APIs for applications such as voice chats with LLMs, enterprise workflows in healthcare, video avatars, and voice-driven user interfaces. Daily's open-source voice engine integrates speech-to-text, LLMs, and text-to-speech, optimized for low-latency performance: currently at 500ms, aiming for 300ms. This presentation will demonstrate how we leverage these technologies to create seamless, real-time voice interactions for various use cases. | |
10:30AM | Break | ||
10:45AM | Next Generation Emergency Communications Services Keynote Plotting a Rational, Managed Evolution to the Next Generation Adan K. Pope Single Presenter In this keynote session, Adan Pope takes an outside-in approach to innovation, exploring the waves and hype cycles of technology evolution and their impact on the next generation of emergency communications. Adan will address the challenges faced when adopting and adapting new capabilities from a fast-paced and tumultuous Industry 4.0 machine (IoT, AI/ML, AR/VR, cyber, simulation and automation) powered by the long-term evolution of telecommunications (5G/6G, satellite, quantum). Adan will discuss the value of creating an industry point of view for your organization that will drive a rational, managed technology evolution based on empathy for the public/societal needs and transforming the “customer” experience more than chasing shiny objects. | ||
11:30AM | Lunch | ||
12:30PM | Next Generation Emergency Communications Services Why is native multimedia for Public Safety so important? James Kinney Single Presenter How can an agency utilize native video or emerging technologies like "IMS Datachannel" to save lives? NENA has a working group on this topic, "Native Multimedia to Public Safety". In this talk, you will learn how this type of communication differs from over-the-top delivery, and how thinking outside the box with video, for example, can give call takers new tools to share and gather information, ultimately saving more lives. A list of resources from related NENA work will be provided. | WebRTC and Real-Time Applications Going Live with Conversational AI: Overcoming Hurdles to Reliability and Scalability Qianze Zhang Single Presenter As conversational AI comes closer to achieving truly natural real-time interactions via voice and video, a few major challenges remain in rolling this functionality out to the public, including the last mile, latency, time-to-market, and the cost to scale. This presentation delves into what is necessary to bridge the gap between laboratory performance and reliable, cost-effective real-time conversational AI experiences in diverse real-world environments. The first challenge lies in the inherent instability and fluctuations of everyday internet connections—particularly in the last mile. Bandwidth fluctuations, signal dropouts, congestion, and packet loss make it difficult to deliver a reliable and consistent connection for real-time communication between humans and AI. Another key network-related challenge is latency. The average turn-taking latency in a typical conversation between people in the same room is 208ms, which should be the goal for conversational AI. That said, achieving that level of ultra-low latency over the internet is complex and requires an end-to-end approach to optimizing latency across devices, network, and infrastructure. 
Scalability and cost-effectiveness, particularly for video-enabled AI conversations, represent another major hurdle. A hybrid device-cloud architecture that leverages the strengths of both can help deliver high-quality AI interactions while optimizing resource utilization. Finally, how can developers meet the demand for rapid deployment and iteration in the fast-paced AI landscape? An open-source approach can help accelerate development, foster collaboration, and increase adoption of AI systems. Attend this presentation to gain insights into the intersection of real-time communication and human-computer interaction, along with a better understanding of how to address the major challenges to deploying reliable conversational AI in your application. | |
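The 208ms turn-taking figure quoted in the abstract above is easiest to appreciate against a stage-by-stage budget. Every number below is a hypothetical placeholder, not a measurement from the talk; the sketch only shows how quickly the stages of a typical speech-to-text, LLM, and text-to-speech pipeline add up past the human benchmark.

```python
# Illustrative end-to-end latency budget for one voice-AI conversational
# turn. All figures are invented for illustration; real deployments vary
# widely by model, region, and network conditions.
budget_ms = {
    "last-mile uplink": 30,
    "end-of-speech detection": 100,     # often a dominant, overlooked term
    "speech-to-text (streaming final)": 80,
    "LLM time-to-first-token": 150,
    "text-to-speech first audio": 80,
    "last-mile downlink": 30,
}

total = sum(budget_ms.values())
print(f"total {total} ms vs 208 ms human turn-taking benchmark")
for stage, ms in sorted(budget_ms.items(), key=lambda kv: -kv[1]):
    print(f"  {stage:35s} {ms:4d} ms")
```

Even these optimistic placeholder numbers overshoot the human benchmark by more than 2x, which is why the abstract argues for optimizing devices, network, and infrastructure together rather than any single stage.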
1:00PM | Next Generation Emergency Communications Services Cybersecurity Standards Framework in NG9-1-1 Brandon Abley Single Presenter This session will provide an overview of the standards-based cybersecurity framework in NG9-1-1. This includes features of the i3 Standard for Next Generation 9-1-1, NG-PSAP, the newly completed NG-SEC v2 standard, and the PSAP Credentialing Agency (PCA) initiative. The update on the PCA Public Key Infrastructure (PKI) will cover the deployment process and modifications to the overall business model. | WebRTC and Real-Time Applications How to Measure WebRTC Call Quality and Make Sense of WebRTC Stats Pratim Mallick Single Presenter This talk covers how to quantitatively measure WebRTC call quality using metrics from both the publisher and subscriber sides. It describes a formula we derived that combines metrics such as packet loss, jitter, and the ratio of expected to actual bitrate sent to map network connection quality onto a Mean Opinion Score (MOS) ranging from 0 to 5. A second formula accounts for end-user issues such as audio concealments and video freezes and pauses to quantify the audio-video experience of a WebRTC user. This talk will dive deeply into the internal stats of WebRTC and explain the most important ones needed to understand the quality of WebRTC calls. | |
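The speaker's actual MOS formula is not given in the abstract, but the general shape of such a mapping can be sketched. Everything below is a hypothetical illustration: the coefficients are invented, and only the three inputs named in the abstract (packet loss, jitter, expected-to-actual bitrate ratio) are used.

```python
# Hypothetical network-quality MOS in the spirit of the talk's description.
# The speaker's real formula and weights are not public; these coefficients
# are illustrative. Inputs correspond to values obtainable from WebRTC
# getStats(): packet loss fraction, jitter in ms, and achieved/target bitrate.

def network_mos(loss_fraction, jitter_ms, bitrate_ratio):
    """Map three network impairments onto a 0-5 Mean Opinion Score."""
    score = 5.0
    score -= 25.0 * loss_fraction                  # loss hurts the most
    score -= 0.02 * jitter_ms                      # jitter strains playout
    score -= 2.0 * max(0.0, 1.0 - bitrate_ratio)   # starved bitrate
    return max(0.0, min(5.0, score))               # clamp to the MOS range

# A clean call versus a congested one
good = network_mos(loss_fraction=0.005, jitter_ms=10, bitrate_ratio=1.0)
bad = network_mos(loss_fraction=0.08, jitter_ms=60, bitrate_ratio=0.6)
print(f"good call MOS {good:.2f}, congested call MOS {bad:.2f}")
```

The second formula the abstract mentions, covering audio concealments and video freezes, would follow the same pattern with playout-side stats instead of network-side ones.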
1:30PM | Next Generation Emergency Communications Services A self-contained system for obtaining the floor number of a 9-1-1 call from a mobile telephone in a multi-story building Cary Davids Single Presenter When a person dials 9-1-1 from a multi-story building using a mobile phone, their floor and room number are not available unless they can give that information orally to the operator. This has been an outstanding problem for many years, since more than 80% of 9-1-1 calls are placed using mobile phones. GPS and cell-tower triangulation can usually provide the street address of the building, and this information can be passed to the First Responder. However, the floor number is not obtainable with these methods. We have developed a system for finding the floor number of a 9-1-1 caller in a multi-story building, using the difference in barometric pressure between the caller's mobile phone and the mobile phone or tablet of the First Responder as they enter the building in question. Accuracies of ±1 floor are easily obtainable using an average floor-to-floor distance in the building. A video of an iPhone application using this method will be shown. | WebRTC and Real-Time Applications Beyond audio and video: expanding the scope of WebRTC Jerod Venema Single Presenter Audio and video streaming is now an expectation. Like telecoms, WebRTC providers are becoming "pipes" for communication, which is not the highest value for the end customer. How can we as an industry expand our view of what we "are" to include genAI, LLMs, and more, taking advantage of the data flowing through our pipes rather than just owning the pipe itself? I aim to suggest that pure WebRTC providers will still have a small niche, but the opportunity in WebRTC lies in expanding beyond audio and video (or even data) into doing things WITH the data on behalf of the end customer. | |
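The physics behind the floor-finding talk above is compact enough to sketch. Near the surface, the hydrostatic relation ΔP ≈ ρgΔh converts the pressure difference between two phones into a height difference (roughly 12 Pa per metre of altitude). The app's actual implementation is not public; the constants, readings, and the 3.5 m floor height below are illustrative assumptions.

```python
# Sketch of the floor-from-pressure idea described in the abstract.
# Constants and readings are illustrative; a real app would also need to
# calibrate sensor bias between the two phones and track weather drift.

RHO_AIR = 1.2   # kg/m^3, approximate air density near sea level
G = 9.81        # m/s^2

def floor_offset(p_responder_hpa, p_caller_hpa, floor_height_m=3.5):
    """Floors between the responder (at ground level) and the caller.

    Lower pressure at the caller's phone means the caller is higher up.
    """
    dp_pa = (p_responder_hpa - p_caller_hpa) * 100.0  # hPa -> Pa
    dh_m = dp_pa / (RHO_AIR * G)                      # hydrostatic relation
    return round(dh_m / floor_height_m)

# Responder in the lobby reads 1013.25 hPa; caller's phone reads 1008.90 hPa
floors = floor_offset(1013.25, 1008.90)
print(f"caller is ~{floors} floors above the responder")
```

Phone barometers resolve pressure to about 1 Pa, far finer than the ~40 Pa separating adjacent floors, which is consistent with the ±1 floor accuracy the abstract reports once an average floor-to-floor distance is assumed.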
2:00PM | Break | ||
2:15PM | WebRTC and Real-Time Applications Keynote Engaging a New Species: How Multimodal LLMs Are Set to Transform RTC Infrastructure Dr. Shawn Zhong Single Presenter Tony Bin Zhao Single Presenter The advent of multimodal large language models has introduced a new kind of participant into human real-time communication (RTC) infrastructure. While these models lack ears, they can still listen; though they have no mouth, they can speak. This raises a compelling question: how will the interfaces for integrating these models differ from the traditional microphones and speakers we use today? And how will these models, as new "customers" of RTC infrastructure, behave compared to humans? For example, the codecs designed for human auditory processing may need to evolve. LLMs could "speak" at speeds far beyond human capability or process several seconds of speech in just one second, provided the data arrives simultaneously. What new requirements will these capabilities place on RTC infrastructure, and how will it adapt? With over 700 billion minutes of real-time audio and video running through Agora's RTC infrastructure annually, we are finely tuned for human-to-human communication. We invite you to join this keynote as we explore the potential of real-time communication using large language models and examine the possibilities and challenges in this innovative form of conversation. | ||
3:00PM | Break | ||
3:15PM | Next Generation Emergency Communications Services Unpacking the Complexities of NG911 Brandon Abley Panelist Carol Davids Moderator Eric Hagerson Panelist James Kinney Panelist Mark Fletcher Panelist Next-generation 911 (NG911) is often perceived as a single entity, but its true nature is far more complex. The primary function is to route emergency calls and data from various networks—commercial, Wi-Fi, cellular, or even legacy landlines—managed by different organizations, both public and private, to Emergency Services. Additionally, a new type of emergency communication has emerged with the rise of the Internet of Things (IoT), where any smart device equipped with a specialized application can sense and report an emergent event, leveraging multiple data networks. This intricate network ecosystem can lead to inconsistencies in data collection, validation, and credibility—all critical elements directly impacting life safety decisions. Every entity involved—from origination networks and NG911 Service Providers to Public Safety Answering Points (PSAPs), call takers, dispatchers, and ultimately field responders—depends on the entire network infrastructure to deliver timely, accurate, and reliable data to perform their roles effectively. Join this esteemed panel of experts as they explore the development, challenges, and opportunities associated with each component of this transformative architecture. They will discuss the current state of NG911, the impact of the innovative solutions their companies are implementing, and the evolving regulatory environment that guides these advancements. Don't miss this engaging session on the future of NG911 and how industry leaders are navigating its complexities. 
| WebRTC and Real-Time Applications Advanced Reference Structures for Scalable Video in Real-Time Applications Erik Språng Single Presenter Video bitstreams for RTC, especially in multi-way conferencing, often have temporal and/or spatial scalability, which allows a back-end component to selectively forward a suitable frame rate and resolution to each receiver based on their current circumstances. This presentation outlines how such features are accomplished by setting up "reference structures", i.e. how video frames use the reconstructed state from previous frames in such a way as to allow some frames to be skipped or to facilitate efficient transmission of multiple resolutions. We'll also cover the fixed set of structures that are supported by the WebRTC API today via "scalability modes". Then we'll look at what is coming down the line from the latest W3C working group discussions. In particular, WebCodecs and RtpTransport will allow an application to unbundle video encoding from the WebRTC monolith and create fully custom per-frame adaptive reference structures, unlocking many new use cases. This is somewhat of a paradigm shift, and we'd love to hear feedback from the developer community! | |
3:45PM | WebRTC and Real-Time Applications Zero Code and Self Evolving Applications James Kinney Single Presenter Quick technological review and demo of UnityForge, a commercial research project on a zero-code application framework for generating collaborative enterprise applications using natural language. | ||
4:15PM |