Track: VoiceTech |
Digits Micro-Model: Enhancing Digit Recognition with Domain-Specific ASR |
Digit recognition is central to processing payment information, phone numbers, and other numerical data, where recognition errors degrade the user experience and can cause mistakes in critical tasks. The primary goal of this project is therefore to train a domain-specific Kaldi Automatic Speech Recognition (ASR) model that recognizes numbers of up to five digits. Recent advances in ASR have largely focused on large-scale, domain-general models. In very constrained domains, however, a domain-specific "micro" model may outperform general-purpose models, and it is lightweight by comparison. Using a general ASR model such as Whisper or Amazon Transcribe for digit recognition is akin to cracking a peanut with a sledgehammer: the results will likely be sufficient, but there are more effective approaches. For this reason, we train a Kaldi model on open-source single-digit utterances and test its ability to recognize variable-length digit strings with a maximum length of five. To achieve robust digit recognition, we also curate a dataset that covers numbers of various lengths and captures the different ways people pronounce the same number. For instance, the number 653 may be articulated as "six hundred and fifty-three," "six fifty-three," or even "sixty-five three." This diversity in length and pronunciation style helps the model handle the numeric representations encountered in real-world scenarios. Our dataset comprises 14,000 instances collected from three diverse data sources, providing a representative collection of real-world numeric patterns. Through this project, we aim to contribute to the advancement of domain-specific ASR models and to more efficient and accurate digit recognition in critical applications. |
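To make the pronunciation diversity described above concrete, the following minimal Python sketch enumerates a few ways a digit string of up to five digits might be spoken: digit by digit, in one- and two-digit chunks, and as a full cardinal with or without "and". It is only an illustration and not the data pipeline used in this project; the function names (`two_digit`, `cardinal`, `chunkings`, `spoken_variants`) and the particular set of reading styles are assumptions made for the example.

```python
# Illustrative sketch only: all names and reading styles here are hypothetical,
# not taken from the project's actual dataset construction.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def two_digit(n):
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def cardinal(n, use_and=True):
    """Spell out an integer in the range 0..99999 as a full cardinal."""
    joiner = " and " if use_and else " "
    if n < 100:
        return two_digit(n)
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        head = f"{ONES[hundreds]} hundred"
        return head if rest == 0 else head + joiner + two_digit(rest)
    thousands, rest = divmod(n, 1000)
    head = f"{two_digit(thousands)} thousand"
    if rest == 0:
        return head
    if rest < 100:
        return head + joiner + two_digit(rest)
    return head + " " + cardinal(rest, use_and)


def chunkings(digits):
    """Yield every split of the string into chunks of one or two digits."""
    if not digits:
        yield []
        return
    for size in (1, 2):
        if len(digits) < size:
            continue
        head = digits[:size]
        if size == 2 and head[0] == "0":
            continue  # "05" is not read as a two-digit number
        for rest in chunkings(digits[size:]):
            yield [head] + rest


def spoken_variants(digits):
    """Return a set of plausible spoken forms for a 1-5 digit string."""
    if not (digits.isdigit() and 1 <= len(digits) <= 5):
        raise ValueError("expected a string of one to five digits")
    variants = {" ".join(two_digit(int(chunk)) for chunk in chunks)
                for chunks in chunkings(digits)}
    # Full cardinals only make sense when there are no leading zeros.
    if len(digits) == 1 or digits[0] != "0":
        variants.add(cardinal(int(digits), use_and=True))
        variants.add(cardinal(int(digits), use_and=False))
    return variants


if __name__ == "__main__":
    for form in sorted(spoken_variants("653")):
        print(form)
```

Under these assumptions, a list of this kind could serve as a checklist of reading styles that a transcript set or lexicon should cover; real speech contains further variants (for example "oh" for zero or "double five") that the sketch deliberately leaves out.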
|
Presentation Video |
Presentation Notes |
CHHABLANI-DIGIT-MICRO-MODEL.pptx |