Construct a serverless audio summarization answer with Amazon Bedrock and Whisper

Recordings of enterprise conferences, interviews, and buyer interactions have change into important for preserving necessary info. Nevertheless, transcribing and summarizing these recordings manually is commonly time-consuming and labor-intensive. With the progress in generative AI and automated speech recognition (ASR), automated options have emerged to make this course of quicker and extra environment friendly.

Defending personally identifiable info (PII) is a crucial side of knowledge safety, pushed by each moral tasks and authorized necessities. On this put up, we reveal the right way to use the Open AI Whisper basis mannequin (FM) Whisper Massive V3 Turbo, out there in Amazon Bedrock Market, which provides entry to over 140 fashions by means of a devoted providing, to provide close to real-time transcription. These transcriptions are then processed by Amazon Bedrock for summarization and redaction of delicate info.

Amazon Bedrock is a completely managed service that gives a alternative of high-performing FMs from main AI firms like AI21 Labs, Anthropic, Cohere, DeepSeek, Luma, Meta, Mistral AI, poolside (coming quickly), Stability AI, and Amazon Nova by means of a single API, together with a broad set of capabilities to construct generative AI functions with safety, privateness, and accountable AI. Moreover, you should utilize Amazon Bedrock Guardrails to robotically redact delicate info, together with PII, from the transcription summaries to assist compliance and information safety wants.

On this put up, we stroll by means of an end-to-end structure that mixes a React-based frontend with Amazon Bedrock, AWS Lambda, and AWS Step Features to orchestrate the workflow, facilitating seamless integration and processing.

Answer overview

The answer highlights the facility of integrating serverless applied sciences with generative AI to automate and scale content material processing workflows. The person journey begins with importing a recording by means of a React frontend utility, hosted on Amazon CloudFront and backed by Amazon Easy Storage Service (Amazon S3) and Amazon API Gateway. When the file is uploaded, it triggers a Step Features state machine that orchestrates the core processing steps, utilizing AI fashions and Lambda features for seamless information move and transformation. The next diagram illustrates the answer structure.

The workflow consists of the next steps:

The React utility is hosted in an S3 bucket and served to customers by means of CloudFront for quick, world entry. API Gateway handles interactions between the frontend and backend companies.
Customers add audio or video information immediately from the app. These recordings are saved in a chosen S3 bucket for processing.
An Amazon EventBridge rule detects the S3 add occasion and triggers the Step Features state machine, initiating the AI-powered processing pipeline.
The state machine performs audio transcription, summarization, and redaction by orchestrating a number of Amazon Bedrock fashions in sequence. It makes use of Whisper for transcription, Claude for summarization, and Guardrails to redact delicate information.
The redacted abstract is returned to the frontend utility and exhibited to the person.

The next diagram illustrates the state machine workflow.

The Step Features state machine orchestrates a sequence of duties to transcribe, summarize, and redact delicate info from uploaded audio/video recordings:

A Lambda operate is triggered to assemble enter particulars (for instance, Amazon S3 object path, metadata) and put together the payload for transcription.
The payload is shipped to the OpenAI Whisper Massive V3 Turbo mannequin by means of the Amazon Bedrock Market to generate a close to real-time transcription of the recording.
The uncooked transcript is handed to Anthropic’s Claude Sonnet 3.5 by means of Amazon Bedrock, which produces a concise and coherent abstract of the dialog or content material.
A second Lambda operate validates and forwards the abstract to the redaction step.
The abstract is processed by means of Amazon Bedrock Guardrails, which robotically redacts PII and different delicate information.
The redacted abstract is saved or returned to the frontend utility by means of an API, the place it’s exhibited to the person.

Conditions

Earlier than you begin, just remember to have the next conditions in place:

Create a guardrail within the Amazon Bedrock console

For directions for creating guardrails in Amazon Bedrock, check with Create a guardrail. For particulars on detecting and redacting PII, see Take away PII from conversations by utilizing delicate info filters. Configure your guardrail with the next key settings:

Allow PII detection and dealing with
Set PII motion to Redact
Add the related PII varieties, similar to:
- Names and identities
- Telephone numbers
- E-mail addresses
- Bodily addresses
- Monetary info
- Different delicate private info

After you deploy the guardrail, word the Amazon Useful resource Identify (ARN), and you may be utilizing this when deploys the mannequin.

Deploy the Whisper mannequin

Full the next steps to deploy the Whisper Massive V3 Turbo mannequin:

On the Amazon Bedrock console, select Mannequin catalog below Basis fashions within the navigation pane.
Seek for and select Whisper Massive V3 Turbo.
On the choices menu (three dots), select Deploy.

Modify the endpoint title, variety of cases, and occasion sort to fit your particular use case. For this put up, we use the default settings.
Modify the Superior settings part to fit your use case. For this put up, we use the default settings.
Select Deploy.

This creates a brand new AWS Identification and Entry Administration IAM function and deploys the mannequin.

You’ll be able to select Market deployments within the navigation pane, and within the Managed deployments part, you’ll be able to see the endpoint standing as Creating. Watch for the endpoint to complete deployment and the standing to vary to In Service, then copy the Endpoint Identify, and you may be utilizing this when deploying the

Deploy the answer infrastructure

Within the GitHub repo, comply with the directions within the README file to clone the repository, then deploy the frontend and backend infrastructure.

We use the AWS Cloud Improvement Equipment (AWS CDK) to outline and deploy the infrastructure. The AWS CDK code deploys the next assets:

React frontend utility
Backend infrastructure
S3 buckets for storing uploads and processed outcomes
Step Features state machine with Lambda features for audio processing and PII redaction
API Gateway endpoints for dealing with requests
IAM roles and insurance policies for safe entry
CloudFront distribution for internet hosting the frontend

Implementation deep dive

The backend consists of a sequence of Lambda features, every dealing with a selected stage of the audio processing pipeline:

Add handler – Receives audio information and shops them in Amazon S3
Transcription with Whisper – Converts speech to textual content utilizing the Whisper mannequin
Speaker detection – Differentiates and labels particular person audio system throughout the audio
Summarization utilizing Amazon Bedrock – Extracts and summarizes key factors from the transcript
PII redaction – Makes use of Amazon Bedrock Guardrails to take away delicate info for privateness compliance

Let’s study among the key elements:

The transcription Lambda operate makes use of the Whisper mannequin to transform audio information to textual content:

def transcribe_with_whisper(audio_chunk, endpoint_name):
    # Convert audio to hex string format
    hex_audio = audio_chunk.hex()
    
    # Create payload for Whisper mannequin
    payload = {
        "audio_input": hex_audio,
        "language": "english",
        "activity": "transcribe",
        "top_p": 0.9
    }
    
    # Invoke the SageMaker endpoint operating Whisper
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="utility/json",
        Physique=json.dumps(payload)
    )
    
    # Parse the transcription response
    response_body = json.masses(response['Body'].learn().decode('utf-8'))
    transcription_text = response_body['text']
    
    return transcription_text

We use Amazon Bedrock to generate concise summaries from the transcriptions:

def generate_summary(transcription):
    # Format the immediate with the transcription
    immediate = f"{transcription}nnGive me the abstract, audio system, key discussions, and motion objects with homeowners"
    
    # Name Bedrock for summarization
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        physique=json.dumps({
            "immediate": immediate,
            "max_tokens_to_sample": 4096,
            "temperature": 0.7,
            "top_p": 0.9,
        })
    )
    
    # Extract and return the abstract
    consequence = json.masses(response.get('physique').learn())
    return consequence.get('completion')

A vital element of our answer is the automated redaction of PII. We carried out this utilizing Amazon Bedrock Guardrails to assist compliance with privateness laws:

def apply_guardrail(bedrock_runtime, content material, guardrail_id):
# Format content material in keeping with API necessities
formatted_content = [{"text": {"text": content}}]

# Name the guardrail API
response = bedrock_runtime.apply_guardrail(
guardrailIdentifier=guardrail_id,
guardrailVersion="DRAFT",
supply="OUTPUT",  # Utilizing OUTPUT parameter for correct move
content material=formatted_content
)

# Extract redacted textual content from response
if 'motion' in response and response['action'] == 'GUARDRAIL_INTERVENED':
if len(response['outputs']) > 0:
output = response['outputs'][0]
if 'textual content' in output and isinstance(output['text'], str):
return output['text']

# Return unique content material if redaction fails
return content material

When PII is detected, it’s changed with sort indicators (for instance, {PHONE} or {EMAIL}), ensuring that summaries stay informative whereas defending delicate information.

To handle the advanced processing pipeline, we use Step Features to orchestrate the Lambda features:

{
"Remark": "Audio Summarization Workflow",
"StartAt": "TranscribeAudio",
"States": {
"TranscribeAudio": {
"Sort": "Activity",
"Useful resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "WhisperTranscriptionFunction",
"Payload": {
"bucket": "$.bucket",
"key": "$.key"
}
},
"Subsequent": "IdentifySpeakers"
},
"IdentifySpeakers": {
"Sort": "Activity",
"Useful resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "SpeakerIdentificationFunction",
"Payload": {
"Transcription.$": "$.Payload"
}
},
"Subsequent": "GenerateSummary"
},
"GenerateSummary": {
"Sort": "Activity",
"Useful resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "BedrockSummaryFunction",
"Payload": {
"SpeakerIdentification.$": "$.Payload"
}
},
"Finish": true
}
}
}

This workflow makes positive every step completes efficiently earlier than continuing to the subsequent, with automated error dealing with and retry logic inbuilt.

Check the answer

After you may have efficiently accomplished the deployment, you should utilize the CloudFront URL to check the answer performance.

Safety issues

Safety is a vital side of this answer, and we’ve carried out a number of greatest practices to assist information safety and compliance:

Delicate information redaction – Robotically redact PII to guard person privateness.
High-quality-Grained IAM Permissions – Apply the precept of least privilege throughout AWS companies and assets.
Amazon S3 entry controls – Use strict bucket insurance policies to restrict entry to approved customers and roles.
API safety – Safe API endpoints utilizing Amazon Cognito for person authentication (non-obligatory however beneficial).
CloudFront safety – Implement HTTPS and apply trendy TLS protocols to facilitate safe content material supply.
Amazon Bedrock information safety – Amazon Bedrock (together with Amazon Bedrock Market) protects buyer information and doesn’t ship information to suppliers or prepare utilizing buyer information. This makes positive your proprietary info stays safe when utilizing AI capabilities.

Clear up

To stop pointless costs, ensure to delete the assets provisioned for this answer if you’re completed:

Delete the Amazon Bedrock guardrail:
1. On the Amazon Bedrock console, within the navigation menu, select Guardrails.
2. Select your guardrail, then select Delete.
Delete the Whisper Massive V3 Turbo mannequin deployed by means of the Amazon Bedrock Market:
1. On the Amazon Bedrock console, select Market deployments within the navigation pane.
2. Within the Managed deployments part, choose the deployed endpoint and select Delete.
Delete the AWS CDK stack by operating the command cdk destroy, which deletes the AWS infrastructure.

Conclusion

This serverless audio summarization answer demonstrates the advantages of mixing AWS companies to create a classy, safe, and scalable utility. By utilizing Amazon Bedrock for AI capabilities, Lambda for serverless processing, and CloudFront for content material supply, we’ve constructed an answer that may deal with giant volumes of audio content material effectively whereas serving to you align with safety greatest practices.

The automated PII redaction function helps compliance with privateness laws, making this answer well-suited for regulated industries similar to healthcare, finance, and authorized companies the place information safety is paramount. To get began, deploy this structure inside your AWS atmosphere to speed up your audio processing workflows.

In regards to the Authors

Kaiyin Hu is a Senior Options Architect for Strategic Accounts at Amazon Net Companies, with years of expertise throughout enterprises, startups, {and professional} companies. At present, she helps clients construct cloud options and drives GenAI adoption to cloud. Beforehand, Kaiyin labored within the Good House area, helping clients in integrating voice and IoT applied sciences.

Sid Vantair is a Options Architect with AWS masking Strategic accounts. He thrives on resolving advanced technical points to beat buyer hurdles. Exterior of labor, he cherishes spending time along with his household and fostering inquisitiveness in his kids.

Main Menu

What's Hot

A Privateness-First Rival to ChatGPT

Qilin Ransomware Makes use of TPwSav.sys Driver to Bypass EDR Safety Measures

Why I like to recommend this Bluetooth tracker to each iPhone and Android customers over AirTags

Construct a serverless audio summarization answer with Amazon Bedrock and Whisper

Apple Workshop on Human-Centered Machine Studying 2024

Mistral-Small-3.2-24B-Instruct-2506 is now accessible on Amazon Bedrock Market and Amazon SageMaker JumpStart

A Deep Dive into Picture Embeddings and Vector Search with BigQuery on Google Cloud

A Privateness-First Rival to ChatGPT

Evaluating the Finest AI Video Mills for Social Media

Utilizing AI To Repair The Innovation Drawback: The Three Step Resolution

Midjourney V7: Quicker, smarter, extra reasonable

A Privateness-First Rival to ChatGPT

Qilin Ransomware Makes use of TPwSav.sys Driver to Bypass EDR Safety Measures

Why I like to recommend this Bluetooth tracker to each iPhone and Android customers over AirTags

How Octopus Power used tradition to achieve the highest

Main Menu

Subscribe to Updates

What's Hot

Construct a serverless audio summarization answer with Amazon Bedrock and Whisper

Answer overview

Conditions

Create a guardrail within the Amazon Bedrock console

Deploy the Whisper mannequin

Deploy the answer infrastructure

Implementation deep dive

Check the answer

Safety issues

Clear up

Conclusion

In regards to the Authors

Related Posts