    Machine Learning & Research

    Stream multi-channel audio to Amazon Transcribe using the Web Audio API

    By Oliver Chambers | June 10, 2025 | 9 Mins Read


    Multi-channel transcription streaming is a feature of Amazon Transcribe that can be used in many cases with a web browser. Creating this stream source has its challenges, but with the JavaScript Web Audio API, you can connect and combine different audio sources like videos, audio files, or hardware like microphones to obtain transcripts.

    In this post, we guide you through how to use two microphones as audio sources, merge them into a single dual-channel audio stream, perform the required encoding, and stream it to Amazon Transcribe. A Vue.js application source code is provided that requires two microphones connected to your browser. However, the flexibility of this approach extends far beyond this use case: you can adapt it to accommodate a wide range of devices and audio sources.

    With this approach, you can get transcripts for two sources in a single Amazon Transcribe session, offering cost savings and other benefits compared to using a separate session for each source.

    Challenges when using two microphones

    For our use case, using a single-channel stream for two microphones and enabling Amazon Transcribe speaker label identification to identify the speakers could be sufficient, but there are a few considerations:

    • Speaker labels are randomly assigned at session start, meaning you will have to map the results in your application after the stream has started
    • Speakers with similar voice tones can be mislabeled, which even a human can find hard to distinguish
    • Voice overlap can occur when two speakers talk at the same time over one audio source

    By using two audio sources with microphones, you can address these concerns by making sure each transcription comes from a fixed input source. By assigning a device to a speaker, our application knows in advance which transcript to use. However, you might still encounter voice overlap if two nearby microphones are picking up multiple voices. This can be mitigated by using directional microphones, volume management, and Amazon Transcribe word-level confidence scores.
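
    As an illustration of the last mitigation, the following is a minimal sketch of filtering a transcript by word-level confidence. It assumes the Result shape exported by @aws-sdk/client-transcribe-streaming and an arbitrary 0.5 threshold; this helper is not part of the demo application.

    import type { Result } from '@aws-sdk/client-transcribe-streaming'

    // Keep only words the service is reasonably confident about.
    // Punctuation items carry no Confidence score, so they pass through.
    const confidentText = (result: Result, threshold = 0.5): string =>
      (result.Alternatives?.[0]?.Items ?? [])
        .filter((item) => item.Confidence === undefined || item.Confidence >= threshold)
        .map((item) => item.Content)
        .join(' ')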

    Solution overview

    The following diagram illustrates the solution workflow.

    Application diagram for two microphones

    We use two audio inputs with the Web Audio API. With this API, we can merge the two inputs, Mic A and Mic B, into a single audio data source, with the left channel representing Mic A and the right channel representing Mic B.

    Then, we convert this audio source to PCM (Pulse-Code Modulation) audio. PCM is a common format for audio processing, and it's one of the formats required by Amazon Transcribe for the audio input. Finally, we stream the PCM audio to Amazon Transcribe for transcription.

    Prerequisites

    You should have the following prerequisites in place, including AWS credentials with an IAM policy that allows starting a streaming transcription over WebSocket:

    {
      "Model": "2012-10-17",
      "Assertion": [
        {
          "Sid": "DemoWebAudioAmazonTranscribe",
          "Effect": "Allow",
          "Action": "transcribe:StartStreamTranscriptionWebSocket",
          "Resource": "*"
        }
      ]
    }
    

    Start the application

    Complete the following steps to launch the application:

    1. Go to the root directory where you downloaded the code.
    2. Create a .env file to set up your AWS access keys from the env.sample file (see the example after this list).
    3. Install the packages by running bun install (if you're using Node.js, run npm install).
    4. Start the web server by running bun dev (if you're using Node.js, run npm run dev).
    5. Open your browser at http://localhost:5173/.
      Application running on http://localhost:5173 with two connected microphones
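
    Step 2 refers to the env.sample file in the repository for the exact variable names. A hypothetical .env layout (the names below are illustrative and not confirmed by the repository):

    # Hypothetical keys: copy env.sample and keep its actual variable names
    VITE_AWS_REGION=us-east-1
    VITE_AWS_ACCESS_KEY_ID=<your access key id>
    VITE_AWS_SECRET_ACCESS_KEY=<your secret access key>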

    Code walkthrough

    In this section, we examine the important code pieces for the implementation:

    1. The first step is to list the connected microphones by using the browser API navigator.mediaDevices.enumerateDevices():
    const devices = await navigator.mediaDevices.enumerateDevices()
    // Keep only audio inputs (microphones)
    return devices.filter((d) => d.kind === 'audioinput')
    
    2. Next, you need to obtain the MediaStream object for each of the connected microphones. This can be done using the navigator.mediaDevices.getUserMedia() API, which allows access to the user's media devices (such as cameras and microphones). You can then retrieve a MediaStream object that represents the audio or video data from these devices:
    const streams = []
    // microphones: the audio inputs found in the previous step
    for (const device of microphones) {
      const stream = await navigator.mediaDevices.getUserMedia({
        audio: {
          deviceId: device.deviceId,
          echoCancellation: true,
          noiseSuppression: true,
          autoGainControl: true,
        },
      })
      if (stream) streams.push(stream)
    }
    3. To combine the audio from the multiple microphones, you need to create an AudioContext interface for audio processing. Within this AudioContext, you can use a ChannelMergerNode to merge the audio streams from the different microphones. The connect(destination, src_idx, ch_idx) method arguments are:
      • destination – The destination, in our case mergerNode.
      • src_idx – The source channel index, in our case both 0 (because each microphone is a single-channel audio stream).
      • ch_idx – The channel index for the destination, in our case 0 and 1 respectively, to create a stereo output.
    // instance of AudioContext
    const audioContext = new AudioContext({
      sampleRate: SAMPLE_RATE,
    })
    // this is used to process the microphone stream data
    const audioWorkletNode = new AudioWorkletNode(audioContext, 'recording-processor', {...})
    // microphone A
    const audioSourceA = audioContext.createMediaStreamSource(mediaStreams[0]);
    // microphone B
    const audioSourceB = audioContext.createMediaStreamSource(mediaStreams[1]);
    // audio node for two inputs
    const mergerNode = audioContext.createChannelMerger(2);
    // connect the audio sources to the mergerNode destination
    audioSourceA.connect(mergerNode, 0, 0);
    audioSourceB.connect(mergerNode, 0, 1);
    // connect our mergerNode to the AudioWorkletNode
    mergerNode.connect(audioWorkletNode);
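
    The options object abbreviated as {...} above is where the worklet receives its configuration through the processorOptions attribute, discussed later in this post. A plausible shape (the demo may spell these fields differently):

    const audioWorkletNode = new AudioWorkletNode(audioContext, 'recording-processor', {
      processorOptions: {
        numberOfChannels: 2, // assumed field: stereo output from the mergerNode
        maxFrameCount: (SAMPLE_RATE * 4) / 10, // from the demo: emit roughly every 400 ms of audio
      },
    })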
    
    4. The microphone data is processed in an AudioWorklet that emits data messages every defined number of recording frames. These messages contain the audio data encoded in PCM format to send to Amazon Transcribe. Using the p-event library, you can asynchronously iterate over the events from the Worklet. A more in-depth description of this Worklet is provided in the next section of this post.
    import { pEventIterator } from 'p-event'
    ...

    // Register the worklet
    try {
      await audioContext.audioWorklet.addModule('./worklets/recording-processor.js')
    } catch (e) {
      console.error('Failed to load audio worklet')
    }

    // An async iterator
    const audioDataIterator = pEventIterator<'message', MessageEvent>(
      audioWorkletNode.port,
      'message',
    )
    ...

    // AsyncIterableIterator: Every time the worklet emits an event with the message `SHARE_RECORDING_BUFFER`, this iterator will return the AudioEvent object that we need.
    const getAudioStream = async function* (
      audioDataIterator: AsyncIterableIterator<MessageEvent>,
    ) {
      for await (const chunk of audioDataIterator) {
        if (chunk.data.message === 'SHARE_RECORDING_BUFFER') {
          const { audioData } = chunk.data
          yield {
            AudioEvent: {
              AudioChunk: audioData,
            },
          }
        }
      }
    }
    
    5. To start streaming the data to Amazon Transcribe, you can use the iterator you just created, and set NumberOfChannels: 2 and EnableChannelIdentification: true to enable dual-channel transcription. For more information, refer to the AWS SDK StartStreamTranscriptionCommand documentation.
    import {
      LanguageCode,
      MediaEncoding,
      StartStreamTranscriptionCommand,
    } from '@aws-sdk/client-transcribe-streaming'
    
    const command = new StartStreamTranscriptionCommand({
        LanguageCode: LanguageCode.EN_US,
        MediaEncoding: MediaEncoding.PCM,
        MediaSampleRateHertz: SAMPLE_RATE,
        NumberOfChannels: 2,
        EnableChannelIdentification: true,
        ShowSpeakerLabel: true,
        AudioStream: getAudioStream(audioDataIterator),
      })
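
    The demo builds the Transcribe streaming client elsewhere in the source; the following is a minimal sketch of that setup. The region and the credential wiring are assumptions, not the demo's exact code:

    import { TranscribeStreamingClient } from '@aws-sdk/client-transcribe-streaming'

    const client = new TranscribeStreamingClient({
      region: 'us-east-1', // assumed; the demo reads configuration from .env
      credentials: {
        accessKeyId,     // hypothetical variables loaded from .env
        secretAccessKey,
      },
    })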
    
    6. After you send the request, a WebSocket connection is created to exchange audio stream data and Amazon Transcribe results:
    const data = await client.send(command)
    for await (const event of data.TranscriptResultStream) {
        for (const result of event.TranscriptEvent.Transcript.Results || []) {
            callback({ ...result })
        }
    }
    

    The result object will include a ChannelId property that you can use to identify your microphone source, such as ch_0 and ch_1, respectively.
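
    For example, here is a minimal sketch of a callback that routes results by channel; the speaker names are placeholders:

    import type { Result } from '@aws-sdk/client-transcribe-streaming'

    const speakerByChannel: Record<string, string> = {
      ch_0: 'Speaker on Mic A', // left channel
      ch_1: 'Speaker on Mic B', // right channel
    }

    const callback = (result: Result) => {
      const speaker = speakerByChannel[result.ChannelId ?? ''] ?? 'unknown'
      const text = result.Alternatives?.[0]?.Transcript ?? ''
      if (!result.IsPartial) console.log(`${speaker}: ${text}`)
    }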

    Deep dive: Audio Worklet

    Audio Worklets can execute in a separate thread to provide very low-latency audio processing. The implementation and demo source code can be found in the public/worklets/recording-processor.js file.

    For our case, we use the Worklet to perform two main tasks:

    1. Process the mergerNode audio in an iterable manner. This node combines both of our audio channels and is the input to our Worklet.
    2. Encode the data bytes of the mergerNode into PCM signed 16-bit little-endian audio format. We do this for each iteration or when required to emit a message payload to our application.

    The general code structure to implement this is as follows:

    class RecordingProcessor extends AudioWorkletProcessor {
      constructor(options) {
        super()
      }
      process(inputs, outputs) {...}
    }
    
    registerProcessor('recording-processor', RecordingProcessor)
    

    You can pass custom options to this Worklet instance using the processorOptions attribute. In our demo, we set maxFrameCount: (SAMPLE_RATE * 4) / 10 as a bitrate guide to determine when to emit a new message payload. An example message:

    this.port.postMessage({
      message: 'SHARE_RECORDING_BUFFER',
      buffer: this._recordingBuffer,
      recordingLength: this.recordedFrames,
      audioData: new Uint8Array(pcmEncodeArray(this._recordingBuffer)), // PCM encoded audio format
    })
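
    The following is a sketch of how process() might accumulate frames and decide when to post; the demo's actual bookkeeping in recording-processor.js may differ:

    process(inputs) {
      // inputs[0] is the mergerNode output: [left, right], 128 frames per call
      const [left, right] = inputs[0]
      this._recordingBuffer[0].set(left, this.recordedFrames)
      this._recordingBuffer[1].set(right, this.recordedFrames)
      this.recordedFrames += left.length

      if (this.recordedFrames >= this.maxFrameCount) {
        this.port.postMessage({
          message: 'SHARE_RECORDING_BUFFER',
          audioData: new Uint8Array(pcmEncodeArray(this._recordingBuffer)),
        })
        this.recordedFrames = 0
      }
      return true // keep the processor alive
    }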
    

    PCM encoding for two channels

    One of the most important sections is how to encode to PCM for two channels. Following the AWS documentation in the Amazon Transcribe API Reference, the AudioChunk size is defined by: Duration (s) * Sample Rate (Hz) * Number of Channels * 2. For two channels, 1 second at 16,000 Hz is: 1 * 16000 * 2 * 2 = 64,000 bytes. Our encoding function should then look like this:

    // Notice that input is an array, where each element is a channel with Float32 values between -1.0 and 1.0 from the AudioWorkletProcessor.
    const pcmEncodeArray = (input: Float32Array[]) => {
      const numChannels = input.length
      const numSamples = input[0].length
      const bufferLength = numChannels * numSamples * 2 // 2 bytes per sample per channel
      const buffer = new ArrayBuffer(bufferLength)
      const view = new DataView(buffer)

      let index = 0

      for (let i = 0; i < numSamples; i++) {
        // Encode for each channel
        for (let channel = 0; channel < numChannels; channel++) {
          const s = Math.max(-1, Math.min(1, input[channel][i]))
          // Convert the 32-bit float to 16-bit PCM audio waveform samples.
          // Max value: 32767 (0x7FFF), Min value: -32768 (-0x8000)
          view.setInt16(index, s < 0 ? s * 0x8000 : s * 0x7fff, true)
          index += 2
        }
      }
      return buffer
    }
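
    Because the inner loop visits each channel at every sample index, the resulting bytes are interleaved (left, right, left, right, and so on), which is the layout Amazon Transcribe expects for multi-channel PCM input.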
    

    For more information about how the audio data blocks are handled, see AudioWorkletProcessor: process() method. For more information on PCM format encoding, see Multimedia Programming Interface and Data Specifications 1.0.

    Conclusion

    In this post, we explored the implementation details of a web application that uses the browser's Web Audio API and Amazon Transcribe streaming to enable real-time dual-channel transcription. By using the combination of AudioContext, ChannelMergerNode, and AudioWorklet, we were able to seamlessly process and encode the audio data from two microphones before sending it to Amazon Transcribe for transcription. The use of the AudioWorklet in particular allowed us to achieve low-latency audio processing, providing a smooth and responsive user experience.

    You can build upon this demo to create more advanced real-time transcription applications that cater to a wide range of use cases, from meeting recordings to voice-controlled interfaces.

    Try out the solution for yourself, and leave your feedback in the comments.


    About the Author

    Jorge Lanzarotti is a Sr. Prototyping SA at Amazon Web Services (AWS) based in Tokyo, Japan. He helps customers in the public sector by creating innovative solutions to challenging problems.
