    Machine Learning & Research

    Automate the creation of handout notes utilizing Amazon Bedrock Information Automation

    By Oliver Chambers | July 31, 2025 | 14 Mins Read


    Organizations across various sectors face significant challenges when converting meeting recordings or recorded presentations into structured documentation. Creating handouts from presentations requires substantial manual effort, such as reviewing recordings to identify slide transitions, transcribing spoken content, capturing and organizing screenshots, synchronizing visual elements with speaker notes, and formatting content. These challenges affect productivity and scalability, especially when dealing with multiple presentation recordings, conference sessions, training materials, and educational content.

    In this post, we show how you can build an automated, serverless solution to transform webinar recordings into comprehensive handouts using Amazon Bedrock Data Automation for video analysis. We walk you through the implementation of Amazon Bedrock Data Automation to transcribe and detect slide changes, as well as the use of Amazon Bedrock foundation models (FMs) for transcription refinement, combined with custom AWS Lambda functions orchestrated by AWS Step Functions. Through detailed implementation details, architectural patterns, and code, you will learn how to build a workflow that automates the handout creation process.

    Amazon Bedrock Data Automation

    Amazon Bedrock Data Automation uses generative AI to automate the transformation of multimodal data (such as images, videos, and more) into a customizable structured format. Examples of structured formats include summaries of scenes in a video, unsafe or explicit content in text and images, or content organized based on advertisements or brands. The solution presented in this post uses Amazon Bedrock Data Automation to extract audio segments and distinct shots in videos.

    Solution overview

    Our solution uses a serverless architecture orchestrated by Step Functions to process presentation recordings into comprehensive handouts. The workflow consists of the following steps:

    1. The workflow begins when a video is uploaded to Amazon Simple Storage Service (Amazon S3), which triggers an event notification through Amazon EventBridge rules that initiates our video processing workflow in Step Functions.
    2. After the workflow is triggered, Amazon Bedrock Data Automation initiates a video transformation job to identify distinct shots in the video. In our case, a shot corresponds to a change of slides. The workflow moves into a waiting state and checks the transformation job's progress. If the job is in progress, the workflow returns to the waiting state. When the job is complete, the workflow continues, having extracted both visual shots and spoken content.
    3. These visual shots and spoken content feed into a synchronization step. In this Lambda function, we use the output of the Amazon Bedrock Data Automation job to match the spoken content to the corresponding shots based on matching timestamps.
    4. After the function has matched the spoken content to the visual shots, the workflow moves into a parallel state. One branch of this state generates screenshots. We use an FFmpeg-enabled Lambda function to create images for each identified video shot.
    5. The other branch of the parallel state refines our transcriptions. Amazon Bedrock processes and improves each raw transcription section through a Map state. This helps us remove speech disfluencies and improve sentence structure.
    6. Finally, after the screenshots and refined transcript are created, the workflow uses a Lambda function to create handouts. We use the python-pptx library, which generates the final presentation with synchronized content. These final handouts are stored in Amazon S3 for distribution.

    The next diagram illustrates this workflow.

    If you want to try out this solution, we have created an AWS Cloud Development Kit (AWS CDK) stack available in the accompanying GitHub repo that you can deploy in your account. It deploys the Step Functions state machine to orchestrate the creation of handout notes from the presentation video recording. It also provides you with a sample video to try out the results.

    To deploy and test the solution in your own account, follow the instructions in the GitHub repository's README file. The following sections describe the technical implementation details of this solution in more depth.

    Video upload and initial processing

    The workflow begins with Amazon S3, which serves as the entry point for our video processing pipeline. When a video is uploaded to a dedicated S3 bucket, it triggers an event notification that, through EventBridge rules, initiates our Step Functions workflow.
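    For illustration, the rule's event pattern might look like the following sketch (the bucket name is a placeholder; the actual rule is provisioned by the AWS CDK stack, and the bucket must have EventBridge notifications enabled):

```json
{
  "source": ["aws.s3"],
  "detail-type": ["Object Created"],
  "detail": {
    "bucket": {
      "name": ["<video-upload-bucket>"]
    }
  }
}
```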

    Shot detection and transcription using Amazon Bedrock Data Automation

    This step uses Amazon Bedrock Data Automation to detect slide transitions and create video transcriptions. To integrate this as part of the workflow, you must create an Amazon Bedrock Data Automation project. A project is a grouping of output configurations. Each project can contain standard output configurations as well as custom output blueprints for documents, images, video, and audio. The project has already been created as part of the AWS CDK stack. After you set up your project, you can process content using the InvokeDataAutomationAsync API. In our solution, we use the Step Functions service integration to execute this API call and start the asynchronous processing job. A job ID is returned for monitoring the process.
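    In the state machine definition, the service integration might be sketched as follows. The resource ARN, parameter names, and placeholder values here are assumptions based on the AWS SDK service integration pattern, not the exact definition; the CDK stack contains the actual state machine:

```json
{
  "StartDataAutomationJob": {
    "Type": "Task",
    "Resource": "arn:aws:states:::aws-sdk:bedrockdataautomationruntime:invokeDataAutomationAsync",
    "Parameters": {
      "InputConfiguration": { "S3Uri.$": "$.videoS3Uri" },
      "OutputConfiguration": { "S3Uri": "s3://<output-bucket>/bda-output/" },
      "DataAutomationConfiguration": { "DataAutomationProjectArn": "<project-arn>" }
    },
    "ResultPath": "$.bdaJob",
    "Next": "WaitBeforePolling"
  }
}
```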

    The workflow must now check the status of the processing job before continuing with the handout creation process. This is done by polling Amazon Bedrock Data Automation for the job status at regular intervals using the GetDataAutomationStatus API. Using a combination of the Step Functions Wait and Choice states, we can have the workflow poll the API on a set interval. This not only gives you the flexibility to customize the interval depending on your needs, but it also helps you control workflow costs, because every state transition is billed in Standard workflows, which this solution uses.

    When the GetDataAutomationStatus API output shows SUCCESS, the loop exits and the workflow continues to the next step, which matches transcripts to the visual shots.
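    The poll-and-wait pattern implemented by the Wait and Choice states can be sketched in plain Python (the status strings and interval are assumptions; in the actual solution this loop is expressed as Step Functions states, not code):

```python
import time

def wait_for_job(get_status, interval_seconds=30, max_attempts=60):
    """Poll a job-status callable until it reports success or failure.

    get_status: a zero-argument callable returning the job status string,
    e.g. a wrapper around the GetDataAutomationStatus API call.
    """
    for _ in range(max_attempts):
        status = get_status()
        if status == "Success":
            return status
        if status in ("ServiceError", "ClientError"):
            raise RuntimeError(f"Job failed with status {status}")
        # Mirrors the Step Functions Wait state between Choice evaluations
        time.sleep(interval_seconds)
    raise TimeoutError("Job did not complete within the polling budget")
```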

    Matching audio segments with corresponding shots

    To create comprehensive handouts, you must establish a mapping between the visual shots and their corresponding audio segments. This mapping is crucial to ensure the final handouts accurately represent both the visual content and the spoken narrative of the presentation.

    A shot represents a sequence of interrelated consecutive frames captured during the presentation, typically indicating a distinct visual state. In our presentation context, a shot corresponds to either a new slide or a significant slide animation that adds or modifies content.

    An audio segment is a specific portion of an audio recording that contains uninterrupted spoken language, with minimal pauses or breaks. This segment captures a natural flow of speech. The Amazon Bedrock Data Automation output provides an audio_segments array, with each segment containing precise timing information, such as the start and end time of each segment. This allows for accurate synchronization with the visual shots.
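    For illustration, a single entry in that array might look roughly like this (the field names in this sketch are assumptions based on the timing information described above, not the exact output schema):

```json
{
  "start_timestamp_millis": 12000,
  "end_timestamp_millis": 18500,
  "text": "Let's move on to the architecture of the solution."
}
```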

    The synchronization between shots and audio segments is essential for creating accurate handouts that preserve the presentation's narrative flow. To achieve this, we implement a Lambda function that manages the matching process in three steps:

    1. The function retrieves the processing results from Amazon S3, which contain both the visual shots and audio segments.
    2. It creates structured JSON arrays from these elements, preparing them for the matching algorithm.
    3. It executes a matching algorithm that analyzes the timestamps of the audio segments and the shots, and matches them based on these timestamps. This algorithm also accounts for timestamp overlaps between shots and audio segments.

    For each shot, the function examines the audio segments and identifies those whose timestamps overlap with the shot's duration, making sure the relevant spoken content is associated with its corresponding slide in the final handouts. The function returns the matched results directly to the Step Functions workflow, where they serve as input for the next step, in which Amazon Bedrock refines the transcribed content and screenshots are created in parallel.
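    The heart of the matching step is an interval-overlap test. The following is a minimal sketch; the `start_ms`/`end_ms`/`text` field names are illustrative, not the exact Amazon Bedrock Data Automation schema:

```python
def match_segments_to_shots(shots, segments):
    """Attach to each shot the audio segments whose time ranges overlap it.

    Both inputs are lists of dicts with illustrative 'start_ms'/'end_ms' keys;
    segments additionally carry a 'text' key with the transcribed speech.
    """
    matched = []
    for shot in shots:
        overlapping = [
            seg for seg in segments
            # Two intervals overlap when each one starts before the other ends
            if seg["start_ms"] < shot["end_ms"] and seg["end_ms"] > shot["start_ms"]
        ]
        matched.append({
            "shot": shot,
            "transcript": " ".join(seg["text"] for seg in overlapping),
        })
    return matched
```

    A segment that straddles a slide transition is attached to both shots, which matches the overlap handling described above.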

    Screenshot generation

    After you get the timestamps of each shot and its associated audio segment, you can capture the slides of the presentation to create comprehensive handouts. Each detected shot from Amazon Bedrock Data Automation represents a distinct visual state in the presentation, typically a new slide or a significant content change. By generating screenshots at these precise moments, we make sure our handouts accurately represent the visual flow of the original presentation.

    This is done with a Lambda function using the ffmpeg-python library. This library acts as a Python binding for the FFmpeg media framework, so you can run FFmpeg terminal commands using Python methods. In our case, we can extract frames from the video at the specific timestamps identified by Amazon Bedrock Data Automation. The screenshots are stored in an S3 bucket to be used in creating the handouts, as described in the following code. To use ffmpeg-python in Lambda, we created a Lambda ZIP deployment containing the required dependencies to run the code. Instructions on how to create the ZIP file can be found in our GitHub repository.

    The following code shows how a screenshot is taken using ffmpeg-python. You can view the full Lambda code on GitHub.

    ## Taking a screenshot at a specific timestamp
    ffmpeg.input(video_path, ss=timestamp).output(screenshot_path, vframes=1).run()

    Transcript refinement with Amazon Bedrock

    In parallel with the screenshot generation, we refine the transcript using a large language model (LLM). We do this to improve the quality of the transcript and filter out errors and speech disfluencies. This process uses an Amazon Bedrock model to enhance the quality of the matched transcription segments while maintaining content accuracy. We use a Lambda function that integrates with Amazon Bedrock through the Python Boto3 client, using a prompt to guide the model's refinement process. The function can then process each transcript segment, instructing the model to do the following:

    • Fix typos and grammatical errors
    • Remove speech disfluencies (such as "uh" and "um")
    • Maintain the original meaning and technical accuracy
    • Preserve the context of the presentation
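    A minimal sketch of how the function might assemble the InvokeModel request body, assuming an Anthropic Messages-format model on Amazon Bedrock (the body shape and version string are assumptions; adapt them to the model you use):

```python
import json

def build_refinement_request(prompt, segment_text, max_tokens=1024):
    """Build an InvokeModel request body asking the model to refine one segment.

    Assumes the Anthropic Messages API body shape; the returned string would be
    passed as the body argument of bedrock_runtime.invoke_model(modelId=..., body=...).
    """
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            # The refinement prompt and the raw segment form one user turn
            {"role": "user", "content": prompt + segment_text}
        ],
    })
```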

    In our solution, we used the following prompt with three example inputs and outputs:

    prompt = '''This is the result of a transcription.
    I want you to look at this audio segment and fix the typos and errors present.
    Feel free to use the context of the rest of the transcript to refine (but don't miss any information).
    Leave out parts where the speaker misspoke.
    Make sure to also remove words like "uh" or "um".
    Only make changes to the information or sentence structure when there are errors.
    Only give back the refined transcript as output, don't add anything else or any context or title.
    If there are no typos or errors, return the original object input.
    Do not explain why you have or haven't made any changes; I just want the JSON object.
    
    These are examples:
    Input:  
    Output: 
    
    Input:  
    Output: 
    
    Input:  
    Output: 
    
    Here is the object: ''' + text

    The following is an example input and output:

    Input: Yeah. Um, so let's talk a little bit about recovering from a ransomware attack, right?
    
    Output: Yes, let's talk a little bit about recovering from a ransomware attack.

    To optimize processing speed while adhering to the maximum token limits of the Amazon Bedrock InvokeModel API, we use the Step Functions Map state. This enables parallel processing of multiple transcriptions, each corresponding to a separate video segment. Because these transcriptions must be handled individually, the Map state efficiently distributes the workload. Additionally, it reduces operational overhead by managing the integration: taking an array as input, passing each element to the Lambda function, and automatically reconstructing the array upon completion. The Map state returns the refined transcript directly to the Step Functions workflow, maintaining the structure of the matched segments while providing cleaner, more professional text content for the final handout generation.
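    A sketch of such a Map state in the state machine definition (state names, JSONPaths, function name, and the concurrency limit are placeholders; see the CDK stack for the actual definition):

```json
{
  "RefineTranscripts": {
    "Type": "Map",
    "ItemsPath": "$.matchedSegments",
    "MaxConcurrency": 5,
    "Iterator": {
      "StartAt": "RefineSegment",
      "States": {
        "RefineSegment": {
          "Type": "Task",
          "Resource": "arn:aws:states:::lambda:invoke",
          "Parameters": {
            "FunctionName": "<transcript-refinement-function>",
            "Payload.$": "$"
          },
          "End": true
        }
      }
    },
    "ResultPath": "$.refinedSegments",
    "End": true
  }
}
```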

    Handout generation

    The final step in our workflow involves creating the handouts using the python-pptx library. This step combines the refined transcripts with the generated screenshots to create a comprehensive presentation document.

    The Lambda function processes the matched segments sequentially, creating a new slide for each screenshot while adding the corresponding refined transcript as speaker notes. The implementation uses a custom Lambda layer containing the python-pptx package. To enable this functionality in Lambda, we created a custom layer using Docker. By using Docker to create our layer, we make sure the dependencies are compiled in an environment that matches the Lambda runtime. You can find the instructions to create this layer, and the layer itself, in our GitHub repository.

    The Lambda function implementation uses python-pptx to create structured presentations:

    import boto3
    from pptx import Presentation
    from pptx.util import Inches
    import os
    import json
    
    def lambda_handler(event, context):
        # Create a new presentation with specific dimensions
        prs = Presentation()
        prs.slide_width = int(12192000)  # Standard presentation width (EMU)
        prs.slide_height = int(6858000)  # Standard presentation height (EMU)
        
        # Process each segment
        for i in range(num_images):
            # Add a new slide
            slide = prs.slides.add_slide(prs.slide_layouts[5])
            
            # Add the screenshot as a full-slide picture
            slide.shapes.add_picture(image_path, 0, 0, width=slide_width)
            
            # Add the transcript as speaker notes
            notes_slide = slide.notes_slide
            transcription_text = transcription_segments[i].get('transcript', '')
            notes_slide.notes_text_frame.text = transcription_text
        
        # Save the presentation
        pptx_path = os.path.join(tmp_dir, "lecture_notes.pptx")
        prs.save(pptx_path)
    

    The function processes the segments sequentially, creating a presentation that combines the visual shots with their corresponding audio segments, resulting in handouts ready for distribution.

    The following screenshot shows an example of a generated slide with notes. The full deck has been added as a file in the GitHub repository.

    Slide presentation showing an example output

    Conclusion

    In this post, we demonstrated how you can build a serverless solution that automates the creation of handout notes from recorded slide presentations. By combining Amazon Bedrock Data Automation with custom Lambda functions, we've created a scalable pipeline that significantly reduces the manual effort required to create handout materials. Our solution addresses several key challenges in content creation:

    • Automated detection of slide transitions and content changes, and accurate transcription of spoken content, using the video modality capabilities of Amazon Bedrock Data Automation
    • Intelligent refinement of transcribed text using Amazon Bedrock
    • Synchronized visual and text content with a custom matching algorithm
    • Handout generation using the ffmpeg-python and python-pptx libraries in Lambda

    The serverless architecture, orchestrated by Step Functions, provides reliable execution while maintaining cost-efficiency. By using Python packages for FFmpeg and a Lambda layer for python-pptx, we've overcome technical limitations and created a robust solution that can handle various presentation formats and lengths. This solution can be extended and customized for different use cases, from educational institutions to corporate training programs. Certain steps, such as the transcript refinement, can also be improved, for instance by adding translation capabilities to account for diverse audiences.

    To learn more about Amazon Bedrock Data Automation, refer to the following resources:


    About the authors

    Laura Verghote is the GenAI Lead for PSI Europe at Amazon Web Services (AWS), driving generative AI adoption across public sector organizations. She partners with customers throughout Europe to accelerate their GenAI initiatives through technical expertise and strategic planning, bridging complex requirements with innovative AI solutions.

    Elie Elmalem is a solutions architect at Amazon Web Services (AWS) and supports Education customers across the UK and EMEA. He works with customers to effectively use AWS services, providing architectural best practices, advice, and guidance. Outside of work, he enjoys spending time with family and friends and loves watching his favorite football team play.
