Track: VoiceTech |
TranscribeX: LLM-Enhanced ASR Transcription |
Automatic Speech Recognition (ASR) engines often produce noisy transcripts of telephony audio, with high word error rates (WER). This reduces the effectiveness of downstream analyses that process the transcribed text (intent determination, sentiment analysis, etc.). In this talk, we present two experiments demonstrating how a Large Language Model (LLM) can be used to improve the quality of telephony-based transcripts. In Experiment 1, we introduce an LLM choice method: the LLM is given two or more ASR-generated transcripts of the same audio file and instructed to select the best one. The prompt also includes information about the domain and the comparative performance of the ASR engines. Tested on an internal dataset of customer experience surveys, this approach yields a 1.7% WER improvement over the best-performing ASR. The gains are larger when the method is targeted at the documents on which the ASR engines disagree most, with WER improvements of up to 5% on high-disagreement subsets. In Experiment 2, we test the LLM choice method on a dataset from the same domain but with a different distribution. The method is less effective on this dataset overall but still useful for some documents. In particular, short documents benefit: transcripts of 1-5 words show a 2% WER improvement over the best-performing ASR. These experiments provide a proof of concept that ASR transcriptions of telephony audio can be improved with an LLM enhancement approach such as the LLM choice method we propose. However, maximizing the performance gains requires a targeted improvement strategy specific to the domain and distribution of the dataset. |
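To make the quantities in the abstract concrete, the following is a minimal Python sketch of word error rate, one possible measure of ASR disagreement, and a choice-style prompt. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how disagreement is measured or how the prompt is worded, so the `disagreement` metric, the prompt text, and all function names here are hypothetical.

```python
from typing import List, Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over word tokens (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0
    return word_edit_distance(ref, hyp) / len(ref)


def disagreement(hyp_a: str, hyp_b: str) -> float:
    """Assumed disagreement measure: edit distance between two ASR outputs,
    normalized by the longer transcript so the score is symmetric."""
    a, b = hyp_a.lower().split(), hyp_b.lower().split()
    longest = max(len(a), len(b))
    return word_edit_distance(a, b) / longest if longest else 0.0


def build_choice_prompt(candidates: List[str], domain: str, asr_notes: str) -> str:
    """Assemble an illustrative prompt asking an LLM to pick one candidate.
    The wording is a placeholder, not the prompt used in the talk."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(candidates, start=1))
    return (
        f"Domain: {domain}\n"
        f"Known ASR performance: {asr_notes}\n"
        "Candidate transcripts of the same audio:\n"
        f"{numbered}\n"
        "Reply with the number of the most accurate transcript."
    )
```

In a pipeline along these lines, `disagreement` could be used to route only high-disagreement documents to the LLM, and `wer` could score the selected transcript against a human reference. The threshold for "high disagreement" would need tuning per dataset, consistent with the abstract's point that gains depend on domain and distribution.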
Presentation Video |
Presentation Notes |
LEFEVRE_TranscribeXLLMEnhancedASRTranscription.pdf |