    Stream ingest data from Kafka to Amazon Bedrock Knowledge Bases using custom connectors

    By Arjun Patel | April 19, 2025 (Updated: April 29, 2025)


    Retrieval Augmented Generation (RAG) enhances AI responses by combining the generative AI model’s capabilities with information from external data sources, rather than relying solely on the model’s built-in knowledge. In this post, we showcase the custom data connector capability in Amazon Bedrock Knowledge Bases, which makes it straightforward to build RAG workflows with custom input data. Through this capability, Amazon Bedrock Knowledge Bases supports the ingestion of streaming data, which means developers can add, update, or delete data in their knowledge base through direct API calls.

    Consider examples such as clickstream data, credit card swipes, Internet of Things (IoT) sensor data, log analysis, and commodity prices, where both current data and historical trends are important for making an informed decision. Previously, to feed such critical data inputs, you had to first stage them in a supported data source and then either initiate or schedule a data sync job. Depending on the quality and quantity of the data, the time to complete this process varied. With custom data connectors, you can quickly ingest specific documents from custom data sources without requiring a full sync, and ingest streaming data without the need for intermediary storage. By avoiding time-consuming full syncs and staging steps, you gain faster access to data, reduced latency, and improved application performance.

    With streaming ingestion using custom connectors, Amazon Bedrock Knowledge Bases processes streaming data without an intermediary data source, making it available almost immediately. The feature chunks and converts input data into embeddings using your chosen Amazon Bedrock model and stores everything in the backend vector database. This automation applies to both newly created and existing databases, streamlining your workflow so you can focus on building AI applications without worrying about orchestrating data chunking, embedding generation, or vector store provisioning and indexing. Additionally, the feature lets you ingest specific documents from custom data sources while reducing latency and alleviating the operational costs of intermediary storage.

    Amazon Bedrock

    Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. Using Amazon Bedrock, you can experiment with and evaluate top FMs for your use case, privately customize them with your data using techniques such as fine-tuning and RAG, and build agents that execute tasks using your enterprise systems and data sources.

    Amazon Bedrock Knowledge Bases

    Amazon Bedrock Knowledge Bases enables organizations to build fully managed RAG pipelines that augment responses with contextual information from private data sources, delivering more relevant, accurate, and customized answers. With Amazon Bedrock Knowledge Bases, you can build applications enriched by the context retrieved from querying a knowledge base. It enables a faster time to product launch by abstracting away the heavy lifting of building pipelines and providing an out-of-the-box RAG solution, reducing the build time for your application.

    Amazon Bedrock Knowledge Bases custom connector

    Amazon Bedrock Knowledge Bases supports custom connectors and the ingestion of streaming data, which means you can add, update, or delete data in your knowledge base through direct API calls.
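
    As a minimal sketch of what such a direct API call looks like, the snippet below pushes a single text document into a knowledge base with the boto3 bedrock-agent client; the IDs and the document text are illustrative placeholders, and the same IngestKnowledgeBaseDocuments operation is used from a Lambda function later in this walkthrough:

    import boto3
    import uuid

    bedrock_agent = boto3.client('bedrock-agent')

    # Placeholders: substitute the knowledge base and custom data source IDs you created.
    kb_id = 'YOUR_KB_ID'
    ds_id = 'YOUR_DS_ID'

    # Ingest one inline text document directly, with no intermediary staging store.
    bedrock_agent.ingest_knowledge_base_documents(
        knowledgeBaseId=kb_id,
        dataSourceId=ds_id,
        documents=[{
            'content': {
                'dataSourceType': 'CUSTOM',
                'custom': {
                    'customDocumentIdentifier': {'id': str(uuid.uuid4())},
                    'sourceType': 'IN_LINE',
                    'inlineContent': {
                        'type': 'TEXT',
                        'textContent': {'data': 'At 2025-04-01 09:30:00 the price of ZVZZT is 3413.23.'}
                    }
                }
            }
        }]
    )

    Updating a document typically amounts to re-ingesting it under the same custom document ID, and the corresponding DeleteKnowledgeBaseDocuments operation removes a document by that ID.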

    Solution overview: Build a generative AI stock price analyzer with RAG

    For this post, we implement a RAG architecture with Amazon Bedrock Knowledge Bases using a custom connector and topics built with Amazon Managed Streaming for Apache Kafka (Amazon MSK), for a user who wants to understand stock price trends. Amazon MSK is a streaming data service that manages Apache Kafka infrastructure and operations, making it straightforward to run Apache Kafka applications on Amazon Web Services (AWS). The solution enables real-time analysis of customer feedback through vector embeddings and large language models (LLMs).

    The following architecture diagram has two parts:

    Preprocessing streaming data workflow, noted in letters at the top of the diagram:

    A. Mimicking streaming input, upload a .csv file with stock price data into an MSK topic
    B. Automatically trigger the consumer AWS Lambda function
    C. Ingest the consumed data into a knowledge base
    D. The knowledge base internally transforms the data into a vector index using the embeddings model
    E. The knowledge base internally stores the vector index in the vector database

    Runtime execution during user queries, noted in numerals at the bottom of the diagram:

    1. Users query stock prices
    2. The foundation model uses the knowledge base to search for an answer
    3. The knowledge base returns relevant documents
    4. The user receives a relevant answer

    Implementation design

    The implementation follows these high-level steps:

    1. Data source setup – Configure an MSK topic that streams input stock prices
    2. Amazon Bedrock Knowledge Bases setup – Create a knowledge base in Amazon Bedrock using the quick create a new vector store option, which automatically provisions and sets up the vector store
    3. Data consumption and ingestion – As and when data lands in the MSK topic, trigger a Lambda function that extracts stock indices, prices, and timestamp information and feeds it into the custom connector for Amazon Bedrock Knowledge Bases
    4. Test the knowledge base – Evaluate customer feedback analysis using the knowledge base

    Solution walkthrough

    To build a generative AI stock analysis tool with the Amazon Bedrock Knowledge Bases custom connector, use the instructions in the following sections.

    Configure the architecture

    To try this architecture, deploy the AWS CloudFormation template from this GitHub repository in your AWS account. The template deploys the following components:

    1. Functional virtual private clouds (VPCs), subnets, security groups, and AWS Identity and Access Management (IAM) roles
    2. An MSK cluster hosting the Apache Kafka input topic
    3. A Lambda function to consume Apache Kafka topic data
    4. An Amazon SageMaker Studio notebook for granular setup and enablement

    Create an Apache Kafka topic

    In the precreated MSK cluster, the required brokers are deployed and ready for use. The next step is to use a SageMaker Studio terminal instance to connect to the MSK cluster and create the test stream topic. For this step, follow the detailed instructions in Create a topic in the Amazon MSK cluster. The general steps involved are:

    1. Download and install the latest Apache Kafka client
    2. Connect to the MSK cluster broker instance
    3. Create the test stream topic on the broker instance (a rough programmatic equivalent is sketched after this list)
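
    The linked instructions use the Kafka CLI that ships with the client (kafka-topics.sh). Purely as an illustration, a roughly equivalent Python sketch using the kafka-python admin client is shown below; the broker address, topic settings, and the choice of library are assumptions rather than part of the original walkthrough, and any TLS or IAM authentication settings your MSK cluster requires are omitted:

    from kafka.admin import KafkaAdminClient, NewTopic

    # Placeholder: the bootstrap broker string for your MSK cluster.
    bootstrap_brokers = 'b-1.example.kafka.us-east-1.amazonaws.com:9092'

    admin = KafkaAdminClient(bootstrap_servers=bootstrap_brokers)
    admin.create_topics([
        NewTopic(name='streamtopic', num_partitions=1, replication_factor=3)
    ])
    admin.close()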

    Create a knowledge base in Amazon Bedrock

    To create a knowledge base in Amazon Bedrock, follow these steps:

    1. On the Amazon Bedrock console, in the left navigation pane under Builder tools, choose Knowledge Bases.

    [Screenshot: Amazon Bedrock Knowledge Bases console]

    2. To initiate knowledge base creation, on the Create dropdown menu, choose Knowledge Base with vector store, as shown in the following screenshot.

    [Screenshot: Create knowledge base dropdown]

    3. In the Provide Knowledge Base details pane, enter BedrockStreamIngestKnowledgeBase as the Knowledge Base name.
    4. Under IAM permissions, choose the default option, Create and use a new service role, and (optionally) provide a Service role name, as shown in the following screenshot.

    [Screenshot: Knowledge base details]

    5. On the Choose data source pane, select Custom as the data source where your dataset is stored.
    6. Choose Next, as shown in the following screenshot.

    [Screenshot: Data source selection]

    7. On the Configure data source pane, enter BedrockStreamIngestKBCustomDS as the Data source name.
    8. Under Parsing strategy, select Amazon Bedrock default parser, and for Chunking strategy, choose Default chunking. Choose Next, as shown in the following screenshot.

    [Screenshot: Parsing and chunking strategy]

    9. On the Select embeddings model and configure vector store pane, for Embeddings model, choose Titan Text Embeddings v2. For Embeddings type, choose Floating-point vector embeddings. For Vector dimensions, select 1024, as shown in the following screenshot. Make sure you have requested and received access to the chosen FM in Amazon Bedrock. To learn more, refer to Add or remove access to Amazon Bedrock foundation models.

    [Screenshot: Embeddings model configuration]

    10. On the Vector database pane, select Quick create a new vector store and choose the new Amazon OpenSearch Serverless option as the vector store.

    [Screenshot: Vector store selection]

    11. On the next screen, review your selections. To finalize the setup, choose Create.
    12. Within a few minutes, the console will display your newly created knowledge base.

    Configure the AWS Lambda Apache Kafka consumer

    Now, using API calls, you configure the consumer Lambda function so it is triggered as soon as the input Apache Kafka topic receives data.

    1. Configure the manually created Amazon Bedrock knowledge base ID and its custom data source ID as environment variables in the Lambda function. When you use the sample notebook, the referenced function names and IDs are filled in automatically.
    response = lambda_client.update_function_configuration(
            FunctionName=lambda_function_name,   # placeholder: name of the consumer Lambda function
            Environment={
                'Variables': {
                    'KBID': knowledge_base_id,   # placeholder: knowledge base ID created earlier
                    'DSID': data_source_id       # placeholder: custom data source ID
                }
            }
        )

    2. When that is complete, tie the Lambda consumer function to listen for events on the source Apache Kafka topic:
    response = lambda_client.create_event_source_mapping(
            EventSourceArn=msk_cluster_arn,      # placeholder: ARN of the MSK cluster
            FunctionName=lambda_function_name,   # placeholder: consumer Lambda function name
            StartingPosition='LATEST',
            Enabled=True,
            Topics=['streamtopic']
        )

    Review the AWS Lambda Apache Kafka consumer

    The Apache Kafka consumer Lambda function reads data from the Apache Kafka topic, decodes it, extracts stock price information, and ingests it into the Amazon Bedrock knowledge base using the custom connector.

    1. Extract the knowledge base ID and the data source ID:
    kb_id = os.environ['KBID']
    ds_id = os.environ['DSID']

    2. Define a Python function to decode input events:
    def decode_payload(event_data):
        agg_data_bytes = base64.b64decode(event_data)
        decoded_data = agg_data_bytes.decode(encoding="utf-8")
        event_payload = json.loads(decoded_data)
        return event_payload
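
    For example, a record value arrives base64-encoded in the Lambda event and decodes back into the original JSON payload; the ticker, price, and timestamp values below are illustrative only:

    import base64
    import json

    sample_value = base64.b64encode(
        json.dumps({'ticker': 'ZVZZT', 'price': 3413.23, 'timestamp': 1743465600}).encode('utf-8')
    )
    print(decode_payload(sample_value))
    # {'ticker': 'ZVZZT', 'price': 3413.23, 'timestamp': 1743465600}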

    3. Decode and parse the required data from the input event received from the Apache Kafka topic, then build a payload to be ingested into the knowledge base:
    records = event['records']['streamtopic-0']
    for rec in records:
        # Each record carries its own eventID, value, and so on.
        event_payload = decode_payload(rec['value'])
        ticker = event_payload['ticker']
        price = event_payload['price']
        timestamp = event_payload['timestamp']
        myuuid = uuid.uuid4()
        payload_ts = datetime.utcfromtimestamp(timestamp).strftime('%Y-%m-%d %H:%M:%S')
        payload_string = "At " + payload_ts + " the price of " + ticker + " is " + str(price) + "."

    4. Ingest the payload into Amazon Bedrock Knowledge Bases using the custom connector:
    response = bedrock_agent_client.ingest_knowledge_base_documents(
        knowledgeBaseId=kb_id,
        dataSourceId=ds_id,
        documents=[
            {
                'content': {
                    'custom': {
                        'customDocumentIdentifier': {
                            'id': str(myuuid)
                        },
                        'inlineContent': {
                            'textContent': {
                                'data': payload_string
                            },
                            'type': 'TEXT'
                        },
                        'sourceType': 'IN_LINE'
                    },
                    'dataSourceType': 'CUSTOM'
                }
            }
        ]
    )

    Testing

    Now that the required setup is complete, trigger the workflow by ingesting test data into the Apache Kafka topic hosted on the MSK cluster. For best results, repeat this section with different .csv input files that show stock prices increasing or decreasing.

    1. Prepare the test data. In my case, I used the following data as a .csv file with a header.
    ticker    price
    OOOO      $44.50
    ZVZZT     $3,413.23
    ZNTRX     $22.34
    ZNRXX     $208.76
    NTEST     $0.45
    ZBZX      $36.23
    ZEXIT     $942.34
    ZIEXT     $870.23
    ZTEST     $23.75
    ZVV       $2,802.86
    ZXIET     $63.00
    ZAZZT     $18.86
    ZBZZT     $998.26
    ZCZZT     $72.34
    ZVZZC     $90.32
    ZWZZT     $698.24
    ZXZZT     $932.32
    2. Define a Python function to put data to the topic. Use the pykafka client to ingest data:
    def put_to_topic(kafka_host, topic_name, ticker, price, timestamp):
        client = KafkaClient(hosts=kafka_host)
        topic = client.topics[topic_name]
        payload = {
            'ticker': ticker,
            'price': price,
            'timestamp': timestamp
        }
        ret_status = True
        data = json.dumps(payload)
        encoded_message = data.encode("utf-8")
        print(f'Sending ticker data: {ticker}...')
        with topic.get_sync_producer() as producer:
            result = producer.produce(encoded_message)
        return ret_status

    3. Read the .csv file and push the records to the topic:
    df = pd.read_csv('TestData.csv')
    start_test_time = time.time() 
    print(datetime.utcfromtimestamp(start_test_time).strftime('%Y-%m-%d %H:%M:%S'))
    df = df.reset_index()
    for index, row in df.iterrows():
        put_to_topic(BootstrapBrokerString, KafkaTopic, row['ticker'], row['price'], time.time())
    end_test_time = time.time()
    print(datetime.utcfromtimestamp(end_test_time).strftime('%Y-%m-%d %H:%M:%S'))

    Verification

    If the data ingestion and subsequent processing are successful, navigate to the Amazon Bedrock Knowledge Bases data source page to confirm the uploaded information.

    [Screenshot: Data source showing the ingested documents]
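
    You can also check programmatically. The following is a minimal sketch, assuming the bedrock-agent ListKnowledgeBaseDocuments operation that accompanies direct ingestion; the exact response fields may differ, so the sketch simply prints each document entry:

    import boto3

    bedrock_agent = boto3.client('bedrock-agent')

    # List the documents ingested into the custom data source and inspect their status.
    response = bedrock_agent.list_knowledge_base_documents(
        knowledgeBaseId=kb_id,
        dataSourceId=ds_id
    )
    for doc in response.get('documentDetails', []):
        print(doc)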

    Querying the knowledge base

    Within the Amazon Bedrock Knowledge Bases console, you can query the ingested data directly, as shown in the following screenshot.

    [Screenshot: Test knowledge base pane]

    To do this, select an Amazon Bedrock FM that you have access to. In my case, I chose Amazon Nova Lite 1.0, as shown in the following screenshot.

    [Screenshot: Model selection]

    When that is done, the question, “How is ZVZZT trending?”, yields results based on the ingested data. Note how Amazon Bedrock Knowledge Bases shows how it derived the answer, even pointing to the granular data element from its source.

    [Screenshot: Query results with source attribution]
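
    The same question can also be asked outside the console. The sketch below uses the bedrock-agent-runtime RetrieveAndGenerate API; treat the model identifier as an assumption and substitute any Bedrock FM you have access to:

    import boto3

    bedrock_agent_runtime = boto3.client('bedrock-agent-runtime')

    response = bedrock_agent_runtime.retrieve_and_generate(
        input={'text': 'How is ZVZZT trending?'},
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kb_id,
                # Illustrative model identifier; use one you have been granted access to.
                'modelArn': 'amazon.nova-lite-v1:0'
            }
        }
    )
    print(response['output']['text'])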

    Cleanup

    To make sure you’re not paying for unneeded resources, delete and clean up the resources you created.

    1. Delete the Amazon Bedrock knowledge base.
    2. Delete the automatically created Amazon OpenSearch Serverless cluster.
    3. Delete the automatically created Amazon Elastic File System (Amazon EFS) shares backing the SageMaker Studio environment.
    4. Delete the automatically created security groups associated with the Amazon EFS share. You might need to remove the inbound and outbound rules before they can be deleted.
    5. Delete the automatically created elastic network interfaces attached to the Amazon MSK security group for Lambda traffic.
    6. Delete the automatically created Amazon Bedrock Knowledge Bases execution IAM role.
    7. Stop the kernel instances in Amazon SageMaker Studio.
    8. Delete the CloudFormation stack.

    Conclusion

    In this post, we showed you how Amazon Bedrock Knowledge Bases supports custom connectors and the ingestion of streaming data, through which developers can add, update, or delete data in their knowledge base via direct API calls. Amazon Bedrock Knowledge Bases offers fully managed, end-to-end RAG workflows to create highly accurate, low-latency, secure, and custom generative AI applications by incorporating contextual information from your company’s data sources. With this capability, you can quickly ingest specific documents from custom data sources without requiring a full sync, and ingest streaming data without the need for intermediary storage.

    Send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS contacts, and engage with the generative AI builder community at community.aws.


    About the Author

    Prabhakar Chandrasekaran is a Senior Technical Account Manager with AWS Enterprise Support. Prabhakar enjoys helping customers build cutting-edge AI/ML solutions on the cloud. He also works with enterprise customers, providing proactive guidance and operational assistance and helping them improve the value of their solutions on AWS. Prabhakar holds eight AWS certifications and seven other professional certifications. With over 22 years of professional experience, Prabhakar was a data engineer and a program leader in the financial services domain prior to joining AWS.
