*Equal Contributors
Identifying errors (i.e., miscues) made while reading aloud is commonly approached post hoc by comparing automatic speech recognition (ASR) transcriptions to the target reading text. However, post-hoc methods perform poorly when ASR inaccurately transcribes verbatim speech. To improve on current methods for reading error annotation, we propose a novel end-to-end architecture that incorporates the target reading text via prompting and is trained for both improved verbatim transcription and direct miscue detection. Our contributions include: first, demonstrating that incorporating the reading text through prompting benefits verbatim transcription performance over fine-tuning, and second, showing that it is feasible to augment speech recognition tasks for end-to-end miscue detection. We conducted two case studies (children's read-aloud speech and adult atypical speech) and found that our proposed strategies improve verbatim transcription and miscue detection compared to the current state of the art.
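For illustration, the post-hoc comparison described above can be sketched as a word-level alignment between the target text and an ASR transcript. This is a minimal sketch of the baseline idea only, not the proposed end-to-end architecture; the function name `label_miscues`, the label names, and the use of Python's `difflib` for alignment are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def label_miscues(target_text, asr_transcript):
    """Align an ASR transcript to the target text and label each word.

    Hypothetical helper for illustration; the label set is an assumption.
    """
    target = target_text.lower().split()
    spoken = asr_transcript.lower().split()
    # autojunk=False keeps frequent words (e.g. "the") in the alignment.
    matcher = SequenceMatcher(a=target, b=spoken, autojunk=False)

    labels = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            labels += [(word, "correct") for word in target[i1:i2]]
        elif tag == "delete":
            # Target words with no counterpart in the transcript.
            labels += [(word, "omission") for word in target[i1:i2]]
        elif tag == "replace":
            # Only target words are labeled; surplus spoken words are ignored.
            labels += [(word, "substitution") for word in target[i1:i2]]
        elif tag == "insert":
            # Spoken words with no counterpart in the target text.
            labels += [(word, "insertion") for word in spoken[j1:j2]]
    return labels


if __name__ == "__main__":
    for word, label in label_miscues("the cat sat on the mat",
                                     "the cat sit on mat"):
        print(f"{word}\t{label}")
```

The labels are only as reliable as the transcript: if the recognizer outputs "sat" when the reader actually said "sit", the alignment reports no miscue, which is the failure mode of post-hoc methods noted above.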