Building natural voice conversations with AI agents requires complex infrastructure and a lot of code from engineering teams. Text-based agent interactions follow a turn-based pattern: a user sends a complete request, waits for the agent to process it, and receives a full response before continuing. Bi-directional streaming removes this constraint by establishing a persistent connection that carries data in both directions simultaneously.
Amazon Bedrock AgentCore Runtime supports bi-directional streaming for real-time, two-way communication between users and AI agents. With this capability, agents can listen to user input while generating responses, creating a more natural conversational flow. This is particularly well-suited for multimodal interactions, such as voice and vision agent conversations. The agent can begin responding while still receiving user input, handle mid-conversation interruptions, and adjust its responses based on real-time feedback.
A bi-directional voice chat agent can conduct spoken conversations with the fluidity of human dialogue, so that users can interrupt, clarify, or change topics naturally. These agents process streaming audio input and output simultaneously while maintaining conversational state. Building this infrastructure requires managing persistent low-latency connections, handling concurrent audio streams, preserving context across exchanges, and scaling across many conversations. Implementing these capabilities from scratch demands months of engineering effort and specialized real-time systems expertise. Amazon Bedrock AgentCore Runtime addresses these challenges by providing a secure, serverless, and purpose-built hosting environment for deploying and running AI agents, without requiring developers to build and maintain complex streaming infrastructure themselves.
In this post, you'll learn about bi-directional streaming on AgentCore Runtime and the prerequisites for creating a WebSocket implementation. You will also learn how to use Strands Agents to implement a bi-directional streaming solution for voice agents.
AgentCore Runtime bi-directional streaming
Bi-directional streaming uses the WebSocket protocol. WebSocket provides full-duplex communication over a single TCP connection, establishing a persistent channel where data flows continuously in both directions. The protocol has broad client support across browsers, mobile applications, and server environments, making it accessible for diverse implementation scenarios.
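Every message, in either direction, travels over that single connection as a WebSocket frame. As an illustration of the framing defined in RFC 6455 (generic protocol code, not anything specific to AgentCore; in practice your WebSocket library does this for you), the following sketch encodes one final frame. Frames sent by a client must be masked with a 4-byte key, while frames sent by a server must not be. The function name `encode_frame` is our own.

```python
from __future__ import annotations

import os
import struct


def encode_frame(payload: bytes, opcode: int = 0x1, mask: bool = False,
                 masking_key: bytes | None = None) -> bytes:
    """Encode a single final (FIN=1) WebSocket frame as described in RFC 6455, section 5.2.

    opcode 0x1 = text, 0x2 = binary. Client-to-server frames must be masked;
    server-to-client frames must not be.
    """
    header = bytearray([0x80 | opcode])  # FIN bit set, no extension bits
    mask_bit = 0x80 if mask else 0x00
    length = len(payload)
    if length < 126:
        header.append(mask_bit | length)  # 7-bit length
    elif length < 2**16:
        header.append(mask_bit | 126)
        header += struct.pack("!H", length)  # 16-bit extended length
    else:
        header.append(mask_bit | 127)
        header += struct.pack("!Q", length)  # 64-bit extended length
    if not mask:
        return bytes(header) + payload
    key = masking_key if masking_key is not None else os.urandom(4)
    if len(key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    masked = bytes(b ^ key[i % 4] for i, b in enumerate(payload))  # XOR with the repeating key
    return bytes(header) + key + masked


print(encode_frame(b"Hello").hex())  # 810548656c6c6f
print(encode_frame(b"Hello", mask=True, masking_key=bytes.fromhex("37fa213d")).hex())
# 818537fa213d7f9f4d5158
```

The two printed frames are the unmasked and masked "Hello" examples from RFC 6455, section 5.7. Binary payloads, such as raw audio, use opcode `0x2`.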
When a connection is established, the agent can receive user input as a stream while simultaneously sending response chunks back to the user. AgentCore Runtime manages the underlying infrastructure that handles connections and message ordering and maintains conversational state throughout the bi-directional exchange. This removes the need for developers to build custom streaming infrastructure or manage the complexities of concurrent data flows.
Voice conversations differ from text-based interactions in their expectation of natural flow. When speaking with a voice agent, users expect the same conversational dynamics they experience with people: the ability to interrupt when they need to correct themselves, to interject a clarification mid-response, or to redirect the conversation without awkward pauses.
With bi-directional streaming, voice agents can process incoming audio while generating responses, detecting interruptions, and adjusting behavior in real time. The agent maintains conversational context throughout these interactions, preserving the thread of dialogue even as the conversation shifts direction. This capability also helps turn voice agents from turn-based systems into responsive conversational partners.
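The following is a minimal, self-contained `asyncio` sketch of the behavior just described: the session keeps reading user turns while a separate task streams a reply, and a new user turn cancels the reply in progress (barge-in). It makes no AgentCore or model calls. Words stand in for audio chunks, and `speak`, `agent_session`, and `demo` are hypothetical names chosen for illustration.

```python
from __future__ import annotations

import asyncio

# Hypothetical canned reply; a real agent would stream model-generated audio instead.
REPLY_TEMPLATE = "You said {} and here is a fairly long spoken answer"


async def speak(reply: str, outbox: list[str]) -> None:
    """Stream a reply one chunk (word) at a time, yielding to the event loop between chunks."""
    for word in reply.split():
        outbox.append(word)
        await asyncio.sleep(0.01)  # stand-in for the time to synthesize and send an audio chunk
    outbox.append("<end>")


async def agent_session(inbox: asyncio.Queue, outbox: list[str]) -> None:
    """Keep listening while speaking; a new user turn cancels the reply in progress."""
    speaking: asyncio.Task | None = None
    while True:
        user_turn = await inbox.get()
        if user_turn is None:  # the client closed its side of the stream
            break
        if speaking is not None and not speaking.done():
            speaking.cancel()  # barge-in: stop the current reply immediately
            outbox.append("<interrupted>")
        speaking = asyncio.create_task(speak(REPLY_TEMPLATE.format(user_turn), outbox))
    if speaking is not None:
        await speaking  # let the last reply finish before closing


async def demo() -> list[str]:
    inbox: asyncio.Queue = asyncio.Queue()
    outbox: list[str] = []
    session = asyncio.create_task(agent_session(inbox, outbox))
    await inbox.put("hello")
    await asyncio.sleep(0.025)  # the user starts talking again mid-reply
    await inbox.put("wait")
    await asyncio.sleep(0.3)
    await inbox.put(None)
    await session
    return outbox


print(asyncio.run(demo()))
```

Running it prints the first reply cut short, an `<interrupted>` marker, and then the complete second reply. Cancelling the speaking task, rather than waiting for it to finish, is what lets the agent stop talking as soon as the user does; a real voice agent would typically also need to discard audio already buffered for playback on the client.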
Beyond voice conversations, bi-directional streaming supports several interaction patterns. Interactive debugging sessions let developers guide agents through problem-solving in real time, providing feedback as the agent explores solutions. Collaborative agents can work alongside users on shared tasks, receiving continuous input as the work progresses rather than waiting for complete instructions. Multimodal agents can process streaming video or sensor data while simultaneously providing analysis and recommendations. Asynchronous long-running agent operations can process tasks over minutes or hours while streaming incremental results to clients.
WebSocket implementation
To create a WebSocket implementation in AgentCore Runtime, you need to follow a few patterns. First, your containers must implement a WebSocket endpoint on port 8080 at the /ws path, which aligns with standard WebSocket server practices. This WebSocket endpoint lets a single agent container serve both the traditional InvokeAgentRuntime API and the new InvokeAgentRuntimeWithWebsocketStream API. In addition, customers must provide a /ping endpoint for health checks.
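To make the container contract concrete, here is a dependency-free sketch of the routing it implies: a health check on /ping and a WebSocket upgrade on /ws, served on port 8080. It handles only the HTTP request head, and the handshake follows RFC 6455. The JSON body returned by /ping (`status`, `time_of_last_update`) is an assumption; check the AgentCore HTTP protocol contract for the exact schema. `handle_http_request` is a hypothetical helper. A production container would use a WebSocket-capable web framework bound to 0.0.0.0:8080 instead.

```python
import base64
import hashlib
import json
import time

AGENT_PORT = 8080  # the container contract requires listening on port 8080
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed handshake constant from RFC 6455


def handle_http_request(raw: bytes) -> bytes:
    """Route one HTTP/1.1 request head: /ping health check or /ws WebSocket upgrade."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    request_line, *header_lines = head.split("\r\n")
    method, target, _version = request_line.split(" ", 2)
    path = target.split("?", 1)[0]  # ignore any query string when routing
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    if method == "GET" and path == "/ping":
        # Assumed payload shape; confirm against the AgentCore HTTP protocol contract.
        body = json.dumps({"status": "Healthy", "time_of_last_update": int(time.time())}).encode()
        return (b"HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n"
                b"Content-Length: " + str(len(body)).encode() + b"\r\n\r\n" + body)

    if method == "GET" and path == "/ws" and headers.get("upgrade", "").lower() == "websocket":
        key = headers.get("sec-websocket-key")
        if not key:
            return b"HTTP/1.1 400 Bad Request\r\nContent-Length: 0\r\n\r\n"
        # Prove to the client that this server understood the WebSocket handshake.
        accept = base64.b64encode(hashlib.sha1((key + WS_GUID).encode()).digest()).decode()
        # After this response, the same TCP connection carries WebSocket frames both ways.
        return ("HTTP/1.1 101 Switching Protocols\r\nUpgrade: websocket\r\n"
                "Connection: Upgrade\r\n"
                f"Sec-WebSocket-Accept: {accept}\r\n\r\n").encode()

    return b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"
```

For the sample key from RFC 6455 (`dGhlIHNhbXBsZSBub25jZQ==`), the /ws branch returns `Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`. Keeping /ping as plain HTTP means health checks never have to open a WebSocket.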
Bi-directional streaming using WebSockets on AgentCore Runtime supports applications that use a WebSocket client library in any language. The client must connect to the service endpoint over a WebSocket protocol connection:
You also need to use one of the supported authentication methods (SigV4 headers, SigV4 pre-signed URL, or OAuth 2.0) and make sure that the agent application implements the WebSocket service contract as specified in the HTTP protocol contract.
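Of the three authentication options, the SigV4 pre-signed URL is useful when a client, such as a browser, cannot set custom headers on the WebSocket handshake. The following is a minimal sketch of the generic SigV4 query-string signing algorithm using only the standard library. Assumptions: the signing service name `bedrock-agentcore`, and that `host` and `path` are the AgentCore endpoint host and WebSocket path from the service documentation (not reproduced here). `presign_wss_url` is a hypothetical helper. Treat it as an illustration of the algorithm; for production use, prefer a maintained implementation such as botocore's `SigV4QueryAuth`.

```python
from __future__ import annotations

import datetime
import hashlib
import hmac
from urllib.parse import quote

EMPTY_PAYLOAD_SHA256 = hashlib.sha256(b"").hexdigest()  # a GET handshake has no body


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def _uri_encode(value: str) -> str:
    # SigV4 leaves only A-Z a-z 0-9 - _ . ~ unencoded.
    return quote(value, safe="-_.~")


def presign_wss_url(host: str, path: str, region: str, access_key: str, secret_key: str,
                    session_token: str | None = None, service: str = "bedrock-agentcore",
                    expires: int = 300, now: datetime.datetime | None = None) -> str:
    """Return a wss:// URL carrying SigV4 query-string authentication (assumed service name)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)  # `now` is interpreted as UTC
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    scope = f"{now:%Y%m%d}/{region}/{service}/aws4_request"

    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    if session_token:
        params["X-Amz-Security-Token"] = session_token
    canonical_query = "&".join(
        f"{_uri_encode(k)}={_uri_encode(v)}" for k, v in sorted(params.items())
    )

    # `path` is expected as it appears on the wire (already percent-encoded);
    # services other than S3 encode it once more in the canonical request.
    canonical_uri = quote(path, safe="/~")
    canonical_request = "\n".join([
        "GET", canonical_uri, canonical_query,
        f"host:{host}\n",  # canonical headers block ends with its own newline
        "host", EMPTY_PAYLOAD_SHA256,
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Derive the signing key: date -> region -> service -> "aws4_request".
    signing_key = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), f"{now:%Y%m%d}")
    for part in (region, service, "aws4_request"):
        signing_key = _hmac_sha256(signing_key, part)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"wss://{host}{path}?{canonical_query}&X-Amz-Signature={signature}"
```

Because the signature travels in the query string, the resulting URL can be handed to any WebSocket client unchanged. Keep `expires` short, since anyone holding the URL can use it until it expires.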
Strands bi-directional agent: Simplified voice agent development
Amazon Nova Sonic unifies speech understanding and generation in a single model, delivering human-like conversational AI with low latency, leading accuracy, and strong price performance. Its integrated architecture provides expressive speech generation and real-time transcription in one model, dynamically adapting responses based on the prosody, pace, and timbre of the input speech.
With bi-directional streaming now also available in AgentCore Runtime, you have multiple ways to host a voice agent. One is a direct implementation, where you manage WebSocket connections, parse protocol events, handle audio chunks, and orchestrate async tasks yourself. Another is the Strands bi-directional agent implementation, which abstracts this complexity and performs these steps for you.
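To show what handling audio chunks involves in the direct approach, the following sketch splits raw PCM audio into fixed-duration chunks and wraps each one, base64-encoded, in a JSON event. Assumptions: 16 kHz, 16-bit, mono LPCM input, and an `audioInput` event with `promptName`, `contentName`, and `content` fields modeled on Nova Sonic's input events; verify the exact format against the Amazon Nova Sonic documentation. The helper names are hypothetical.

```python
import base64
import json

SAMPLE_RATE_HZ = 16_000  # assumed input format: 16 kHz, 16-bit, mono LPCM
BYTES_PER_SAMPLE = 2


def chunk_pcm(pcm: bytes, chunk_ms: int = 100) -> list[bytes]:
    """Split raw PCM into fixed-duration chunks; the last chunk may be shorter."""
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000  # always a whole sample count
    return [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]


def audio_input_event(chunk: bytes, prompt_name: str, content_name: str) -> str:
    """Wrap one chunk as a JSON text message (field names are an assumption modeled on Nova Sonic)."""
    return json.dumps({
        "event": {
            "audioInput": {
                "promptName": prompt_name,
                "contentName": content_name,
                "content": base64.b64encode(chunk).decode("ascii"),
            }
        }
    })


one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)  # one second of silence
chunks = chunk_pcm(one_second, chunk_ms=100)
print(len(chunks), len(chunks[0]))  # 10 3200
messages = [audio_input_event(c, "prompt-1", "audio-1") for c in chunks]
```

At 16 kHz with 2 bytes per sample, each 100 ms chunk is 3,200 bytes, so one second of audio becomes 10 events. Smaller chunks lower latency at the cost of more messages, which is one of the trade-offs the direct implementation leaves to you.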
Example implementation
For this post, refer to the Amazon Bedrock AgentCore bi-directional code, which implements bi-directional communication with Amazon Bedrock AgentCore. The repository has two implementations: one that uses a native Amazon Nova Sonic Python implementation deployed directly to AgentCore Runtime, and a high-level framework implementation that uses the Strands bi-directional agent for simplified real-time audio conversations.
The following diagram shows the native Amazon Nova Sonic Python WebSocket server deployed directly to AgentCore. This approach provides full control over the Nova Sonic protocol, with direct event handling that gives full visibility into session management, audio streaming, and response generation.
The Strands bi-directional agent framework for real-time audio conversations with Amazon Nova Sonic provides a high-level abstraction that simplifies bi-directional streaming, automatic session management, and tool integration. The code snippet below is an example of this simplification.
This implementation demonstrates the simplicity of Strands: instantiate a model, create an agent with tools and a system prompt, and run it with input/output streams. The framework handles protocol complexity internally.
The following is the agent declaration section of the code:
Tools are passed directly to the agent's constructor, and Strands handles function-calling orchestration automatically. In summary, a native WebSocket implementation of the same functionality requires approximately 150 lines of code, whereas the Strands implementation reduces this to approximately 20 lines focused on business logic. Developers can focus on defining agent behavior, integrating tools, and crafting system prompts rather than managing WebSocket connections, parsing events, handling audio chunks, or orchestrating async tasks. This makes bi-directional streaming accessible to developers without specialized real-time systems expertise, while keeping full access to the audio conversation capabilities of Nova Sonic. The Strands bi-directional feature is currently supported only in the Python SDK.
If you're looking for flexibility in how you implement your voice agent, the native Amazon Nova Sonic implementation can help. This can also matter when you have several different communication patterns between the agent and the model. With the Amazon Nova Sonic implementation, you control every step of the process.
The framework approach can provide better control of dependencies, because they are managed by the SDK, and it provides consistency across systems. The same Strands bi-directional agent code structure works with Nova Sonic, OpenAI Realtime API, and Google Gemini Live: developers simply swap the model implementation while keeping the rest of their code unchanged.
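The paragraphs above describe two design points: tools are passed to the constructor and the framework orchestrates calls to them, and the model can be swapped without changing the rest of the code. The following toy illustrates both. It is not the Strands API: `ToyAgent`, `VoiceModel`, `ScriptedWeatherModel`, `EchoModel`, and `get_weather` are hypothetical names, and the models are scripted stand-ins rather than real speech models. Each model is a generator; when it yields a tool-use event, the agent runs the matching tool and sends the result back into the generator so the model can continue.

```python
from __future__ import annotations

from typing import Callable, Generator, Optional, Protocol

Event = dict  # {"type": "text", "text": ...} or {"type": "tool_use", "name": ..., "input": {...}}


class VoiceModel(Protocol):
    """Hypothetical model interface; a tool result sent into the generator resumes the model."""

    def respond(self, system_prompt: str, user_text: str) -> Generator[Event, Optional[str], None]: ...


def _advance(stream, value=None):
    """Resume the model's generator, returning its next event or None when it finishes."""
    try:
        return stream.send(value)
    except StopIteration:
        return None


class ToyAgent:
    """Toy agent: tools are registered at construction and dispatched automatically."""

    def __init__(self, model: VoiceModel, tools: list[Callable[..., str]], system_prompt: str):
        self.model = model
        self.system_prompt = system_prompt
        self.tools = {tool.__name__: tool for tool in tools}  # registry keyed by function name

    def handle_turn(self, user_text: str) -> list[str]:
        stream = self.model.respond(self.system_prompt, user_text)
        spoken: list[str] = []
        event = _advance(stream)
        while event is not None:
            if event["type"] == "tool_use":
                # The agent, not the caller, runs the tool and hands the result back to the model.
                result = self.tools[event["name"]](**event["input"])
                event = _advance(stream, result)
            else:
                spoken.append(event["text"])
                event = _advance(stream)
        return spoken


class ScriptedWeatherModel:
    """Stand-in for a speech model that decides to call a tool."""

    def respond(self, system_prompt, user_text):
        if "weather" in user_text.lower():
            result = yield {"type": "tool_use", "name": "get_weather", "input": {"city": "Seattle"}}
            yield {"type": "text", "text": f"Here is the forecast: {result}"}
        else:
            yield {"type": "text", "text": "How can I help?"}


class EchoModel:
    """A second, interchangeable model implementation."""

    def respond(self, system_prompt, user_text):
        yield {"type": "text", "text": f"You said: {user_text}"}


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would call a weather service."""
    return f"sunny in {city}"


for model in (ScriptedWeatherModel(), EchoModel()):
    agent = ToyAgent(model, tools=[get_weather], system_prompt="You are a concise voice assistant.")
    print(agent.handle_turn("What's the weather?"))
# ['Here is the forecast: sunny in Seattle']
# ["You said: What's the weather?"]
```

Only the model object changes between the two runs; the agent, the tool list, and the calling code stay the same, which is the property the framework approach relies on.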
Conclusion
The bi-directional streaming capability of Amazon Bedrock AgentCore Runtime transforms how developers can build conversational AI agents. By providing WebSocket-based real-time communication infrastructure, AgentCore removes the months of engineering effort required to implement streaming systems from scratch. The runtime lets developers deploy multiple types of voice agents, from native protocol implementations using Amazon Nova Sonic to high-level frameworks like the Strands bi-directional agent, within the same secure, serverless environment.
About the authors
Lana Zhang is a Senior Specialist Solutions Architect for Generative AI at AWS within the Worldwide Specialist Organization. She specializes in AI/ML, with a focus on use cases such as AI voice assistants and multimodal understanding. She works closely with customers across diverse industries, including media and entertainment, gaming, sports, advertising, financial services, and healthcare, to help them transform their business solutions through AI.
Phelipe Fabres is a Senior Specialist Solutions Architect for Generative AI at AWS for Startups. He specializes in AI/ML, with a focus on agentic systems and the full process of training and inference. He has more than 10 years of experience in software development, from monoliths to event-driven architectures, and holds a Ph.D. in Graph Theory.
Evandro Franco is a Sr. Data Scientist working at Amazon Web Services. He is part of the Global GTM team that helps AWS customers overcome business challenges related to AI/ML on AWS, mainly with Amazon Bedrock AgentCore and Strands Agents. He has more than 18 years of experience working with technology, from software development, infrastructure, and serverless to machine learning. In his free time, Evandro enjoys playing with his son, mainly building fun things with Lego bricks.

