SpeakStream: Streaming Text-to-Speech with Interleaved Data

With the increasing integration of speech front-ends and large language models (LLMs),
there is a need to explore architectures that integrate these modalities.
While end-to-end models have been explored extensively, cascaded models that stream outputs from LLMs to TTS seem to be oddly under-explored, even though they are potentially much simpler.
Using traditional text-to-speech systems to convert LLM outputs to audio, however, poses a technical problem because they need entire utterances to generate stylistic audio.
In this paper we present a "streaming" TTS that can generate audio from streaming text using a novel decoder-only architecture that interleaves text and speech.
The model is trained using next-step prediction on interleaved data that is generated from forced alignment of text transcripts to speech.
During inference, our system processes text incrementally while producing consistent speech output, making it suitable for real-time applications such as conversational AI agents, where an LLM can stream text to a TTS system.
Results demonstrate that our approach matches the quality of batch TTS systems while enabling streaming capabilities.
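
To make the idea of interleaved training data concrete, here is a minimal sketch of how word-level forced-alignment timestamps could be used to weave text and speech tokens into one sequence for next-step prediction. It is not the paper's implementation: the names `AlignedWord` and `interleave`, the fixed `frame_rate`, the `words_per_chunk` setting, and the use of whole words as text tokens are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AlignedWord:
    """One word from a forced alignment (hypothetical structure)."""
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds


def interleave(
    words: List[AlignedWord],
    speech_tokens: List[int],
    frame_rate: float,
    words_per_chunk: int = 2,
) -> List[Tuple[str, object]]:
    """Build one sequence that alternates text chunks and speech chunks.

    Each text chunk is followed by the speech tokens that cover the same
    time span, so a decoder trained on this sequence sees only a short
    text prefix before it has to emit the matching audio tokens.
    """
    sequence: List[Tuple[str, object]] = []
    cursor = 0  # index of the next speech token not yet emitted
    for i in range(0, len(words), words_per_chunk):
        chunk = words[i:i + words_per_chunk]
        for word in chunk:
            sequence.append(("text", word.text))

        is_last = i + words_per_chunk >= len(words)
        if is_last:
            # Flush everything that remains, including trailing silence.
            end_frame = len(speech_tokens)
        else:
            # Assumed mapping from seconds to speech-token frames.
            end_frame = min(len(speech_tokens), round(chunk[-1].end * frame_rate))
        end_frame = max(end_frame, cursor)  # never move backwards

        for token in speech_tokens[cursor:end_frame]:
            sequence.append(("speech", token))
        cursor = end_frame
    return sequence


if __name__ == "__main__":
    words = [
        AlignedWord("hello", 0.0, 0.4),
        AlignedWord("world", 0.4, 0.8),
        AlignedWord("again", 0.8, 1.2),
    ]
    tokens = list(range(12))  # stand-in for 1.2 s of speech tokens at 10 Hz
    for item in interleave(words, tokens, frame_rate=10.0):
        print(item)
```

With a layout like this, the same ordering can be followed at inference time: the decoder consumes each new piece of text as the LLM produces it and then generates speech tokens for that piece before more text arrives. The chunk size trades latency against how much upcoming text the model can see when choosing prosody.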
