This paper was accepted to the IEEE Spoken Language Technology Workshop (SLT) 2024.
Neural contextual biasing allows speech recognition models to leverage contextually relevant information, leading to improved transcription accuracy. However, the biasing mechanism is typically based on a cross-attention module between the audio and a catalogue of biasing entries, so its computational complexity can impose severe practical limitations on the size of the biasing catalogue and, consequently, on the accuracy gains. This work proposes an approximation to cross-attention scoring based on vector quantization that enables compute- and memory-efficient use of large biasing catalogues. We propose to use this technique together with a retrieval-based contextual biasing approach. First, an efficient quantized retrieval module shortlists biasing entries by grounding them in the audio. Then, the retrieved entries are used for biasing. Since the proposed approach is agnostic to the biasing method, we experiment with full cross-attention, LLM prompting, and a combination of the two. We show that retrieval-based shortlisting allows the system to efficiently leverage biasing catalogues of several thousand entries, resulting in up to 71% relative error-rate reduction in personal entity recognition. At the same time, the proposed approximation algorithm reduces compute time by 20% and memory usage by 85-95% for lists of up to one million entries, compared with standard dot-product cross-attention.
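The abstract does not specify implementation details, but the core idea — replacing per-entry dot-product cross-attention scores with scores looked up from a small vector-quantization codebook, then running exact scoring only on a retrieved shortlist — can be illustrated with a minimal NumPy sketch. All dimensions, the random embeddings, and the untrained codebook below are hypothetical placeholders, not the paper's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: audio frames, a large biasing catalogue, embedding dim,
# and a small VQ codebook.
num_frames, num_entries, dim, num_codes = 50, 10_000, 64, 256

audio = rng.standard_normal((num_frames, dim))     # audio-frame encodings
entries = rng.standard_normal((num_entries, dim))  # biasing-entry embeddings

# Stand-in codebook; in practice the centroids would be learned/trained.
codebook = rng.standard_normal((num_codes, dim))
# Assign each entry to its best-matching code (dot-product nearest code).
codes = np.argmax(entries @ codebook.T, axis=1)    # (num_entries,)

# Approximate cross-attention: score audio against the codebook once, then
# look up each entry's score via its code. Compute is O(frames * codes)
# plus a cheap gather, instead of O(frames * entries) dot products.
code_scores = audio @ codebook.T                   # (num_frames, num_codes)
approx_scores = code_scores[:, codes]              # (num_frames, num_entries)

# Shortlist the top-k entries by their best score over frames, then do
# exact dot-product scoring only on that shortlist.
k = 100
shortlist = np.argsort(approx_scores.max(axis=0))[-k:]
exact_scores = audio @ entries[shortlist].T        # (num_frames, k)

print(exact_scores.shape)
```

The shortlist of `k` entries could then be fed to any downstream biasing method (full cross-attention or LLM prompting, per the abstract), since the shortlisting step is agnostic to how the retrieved entries are ultimately used.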

