    Machine Learning & Research

    Accelerate Enterprise AI Development using Weights & Biases and Amazon Bedrock AgentCore

    By Oliver Chambers | January 3, 2026


    This post is co-written by Thomas Capelle and Ray Strickland from Weights & Biases (W&B).

    Generative artificial intelligence (AI) adoption is accelerating across enterprises, evolving from simple foundation model interactions to sophisticated agentic workflows. As organizations move from proof of concept to production deployment, they need robust tools for developing, evaluating, and monitoring AI applications at scale.

    In this post, we show how to use foundation models (FMs) from Amazon Bedrock and the newly launched Amazon Bedrock AgentCore alongside W&B Weave to build, evaluate, and monitor enterprise AI solutions. We cover the complete development lifecycle, from tracking individual FM calls to monitoring complex agent workflows in production.

    Overview of W&B Weave

    Weights & Biases (W&B) is an AI developer platform that provides comprehensive tools for training, fine-tuning, and leveraging foundation models for enterprises of all sizes across industries.

    W&B Weave offers a unified suite of developer tools to support each stage of your agentic AI workflows. It enables:

    • Tracing and monitoring: Track large language model (LLM) calls and application logic to debug and analyze production systems.
    • Systematic iteration: Refine and iterate on prompts, datasets, and models.
    • Experimentation: Experiment with different models and prompts in the LLM Playground.
    • Evaluation: Use custom or pre-built scorers alongside comparison tools to systematically assess and improve application performance. Gather user and expert feedback for real-world testing and evaluation.
    • Guardrails: Help protect your application with safeguards for content moderation, prompt safety, and more. Use custom or third-party guardrails (including Amazon Bedrock Guardrails) or W&B Weave's native guardrails.

    W&B Weave can be fully managed by Weights & Biases in a multi-tenant or single-tenant environment, or deployed directly in a customer's Amazon Virtual Private Cloud (VPC). In addition, W&B Weave's integration with the W&B Development Platform gives organizations a seamless experience across the model training and fine-tuning workflow and the agentic AI workflow.

    To get started, subscribe to the Weights & Biases AI Development Platform through AWS Marketplace. Individuals and academic teams can subscribe to W&B at no additional cost.

    Tracking Amazon Bedrock FMs with the W&B Weave SDK

    W&B Weave integrates seamlessly with Amazon Bedrock through Python and TypeScript SDKs. After installing the library and patching your Bedrock client, W&B Weave automatically tracks the LLM calls:

    !pip install weave
    import weave
    import boto3
    import json
    from weave.integrations.bedrock.bedrock_sdk import patch_client
    
    weave.init("my_bedrock_app")
    
    # Create and patch the Bedrock client
    client = boto3.client("bedrock-runtime")
    patch_client(client)
    
    # Use the client as usual
    response = client.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 100,
            "messages": [
                {"role": "user", "content": "What is the capital of France?"}
            ]
        }),
        contentType="application/json",
        accept="application/json"
    )
    response_dict = json.loads(response.get("body").read())
    print(response_dict["content"][0]["text"])

    This integration automatically versions experiments and tracks configurations, providing full visibility into your Amazon Bedrock applications without modifying core logic.
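    If you invoke several models or prompts, it can help to factor the request-body construction out of the call site. The sketch below is illustrative only (the helper name and defaults are our own, not part of the Weave integration); it builds the same Anthropic Messages payload passed to invoke_model above:

```python
import json


def build_anthropic_body(prompt: str, max_tokens: int = 100) -> str:
    """Serialize a single-turn user prompt into the Anthropic Messages
    format that invoke_model expects for Claude models on Bedrock."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            {"role": "user", "content": prompt}
        ],
    })


body = build_anthropic_body("What is the capital of France?")
print(json.loads(body)["messages"][0]["content"])
# → What is the capital of France?
```

    The resulting string can be passed directly as the body argument of invoke_model, keeping the traced call site short.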

    Experimenting with Amazon Bedrock FMs in the W&B Weave Playground

    The W&B Weave Playground accelerates prompt engineering with an intuitive interface for testing and comparing Bedrock models. Key features include:

    • Direct prompt editing and message retrying
    • Side-by-side model comparison
    • Access from trace views for quick iteration

    To begin, add your AWS credentials in the Playground settings, select your preferred Amazon Bedrock FMs, and start experimenting. The interface enables rapid iteration on prompts while maintaining full traceability of experiments.

    Evaluating Amazon Bedrock FMs with W&B Weave Evaluations

    W&B Weave Evaluations provides dedicated tools for evaluating generative AI models effectively. By using W&B Weave Evaluations alongside Amazon Bedrock, users can efficiently evaluate these models, analyze outputs, and visualize performance across key metrics. Users can apply built-in scorers from W&B Weave, third-party or custom scorers, and human or expert feedback. This combination allows a deeper understanding of the tradeoffs between models, such as differences in cost, accuracy, speed, and output quality.

    W&B Weave has a first-class approach to tracking evaluations with its Model and Evaluation classes. To set up an evaluation job, customers can:

    • Define a dataset or list of dictionaries with a collection of examples to be evaluated
    • Create a list of scoring functions. Each function should accept the model output and, optionally, other inputs from your examples, and return a dictionary with the scores
    • Define an Amazon Bedrock model by using the Model class
    • Evaluate the model by calling Evaluation

    Here's an example of setting up an evaluation job:

    import weave
    from weave import Evaluation
    import asyncio
    
    # Collect your examples
    examples = [
        {"question": "What is the capital of France?", "expected": "Paris"},
        {"question": "Who wrote 'To Kill a Mockingbird'?", "expected": "Harper Lee"},
        {"question": "What is the square root of 64?", "expected": "8"},
    ]
    
    # Define any custom scoring function
    @weave.op()
    def match_score1(expected: str, output: dict) -> dict:
        # This is where you define the logic to score the model output
        return {"match": expected == output["generated_text"]}
    
    @weave.op()
    def function_to_evaluate(question: str):
        # This is where you would add your LLM call and return the output
        return {"generated_text": "Paris"}
    
    # Score your examples using the scoring functions
    evaluation = Evaluation(
        dataset=examples, scorers=[match_score1]
    )
    
    # Start tracking the evaluation
    weave.init("intro-example")
    # Run the evaluation
    asyncio.run(evaluation.evaluate(function_to_evaluate))

    The evaluation dashboard visualizes performance metrics, enabling informed decisions about model selection and configuration. For detailed guidance, see our earlier post on evaluating LLM summarization with Amazon Bedrock and Weave.
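    Exact-match scorers like the one above can be brittle when outputs differ only in casing or phrasing. A scorer may instead return a graded score; the standalone sketch below (the helper name and the 0.8 threshold are illustrative choices, and the @weave.op decorator is omitted so the snippet runs without Weave) uses normalized string similarity from the standard library:

```python
from difflib import SequenceMatcher


def fuzzy_match_score(expected: str, output: dict) -> dict:
    """Graded alternative to exact match: case-insensitive string
    similarity in [0, 1] plus a pass/fail flag at a 0.8 threshold."""
    generated = output.get("generated_text", "")
    similarity = SequenceMatcher(None, expected.lower(), generated.lower()).ratio()
    return {"similarity": similarity, "match": similarity >= 0.8}


print(fuzzy_match_score("Paris", {"generated_text": "paris"}))
# → {'similarity': 1.0, 'match': True}
```

    Because it follows the same (expected, output) signature as the scorers above, such a function could be decorated with @weave.op and passed to Evaluation's scorers list unchanged.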

    Enhancing Amazon Bedrock AgentCore Observability with W&B Weave

    Amazon Bedrock AgentCore is a complete set of services for deploying and operating highly capable agents more securely at enterprise scale. It provides more secure runtime environments, workflow execution tools, and operational controls that work with popular frameworks like Strands Agents, CrewAI, LangGraph, and LlamaIndex, as well as many LLMs, whether from Amazon Bedrock or external sources.

    AgentCore includes built-in observability through Amazon CloudWatch dashboards that monitor key metrics like token usage, latency, session duration, and error rates. It also traces workflow steps, showing which tools were invoked and how the model responded, providing essential visibility for debugging and quality assurance in production.

    When working with AgentCore and W&B Weave together, teams can rely on AgentCore's built-in operational monitoring and security foundations while also using W&B Weave if it aligns with their existing development workflows. Organizations already invested in the W&B ecosystem may choose to incorporate W&B Weave's visualization tools alongside AgentCore's native capabilities. This approach gives teams the flexibility to use the observability solution that best fits their established processes and preferences when developing complex agents that chain multiple tools and reasoning steps.

    There are two main approaches to adding W&B Weave observability to your AgentCore agents: using the native W&B Weave SDK or integrating through OpenTelemetry.

    Native W&B Weave SDK

    The simplest approach is to use W&B Weave's @weave.op decorator to automatically track function calls. Initialize W&B Weave with your project name and wrap the functions you want to track:

    import weave
    import os
    from typing import Any, Dict
    
    from strands import Agent  # assumes an agent built with Strands Agents
    
    os.environ["WANDB_API_KEY"] = "your_api_key"
    weave.init("your_project_name")
    
    @weave.op()
    def word_count_op(text: str) -> int:
        return len(text.split())
    
    @weave.op()
    def run_agent(agent: Agent, user_message: str) -> Dict[str, Any]:
        result = agent(user_message)
        return {"message": result.message, "model": agent.model.config["model_id"]}

    Because AgentCore runs as a Docker container, add W&B Weave to your dependencies (for example, uv add weave) to include it in your container image.
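    For example, if the agent image is built from a requirements file, the dependency simply needs to appear there before the image is built. This is a minimal sketch, not AgentCore's prescribed layout; the base image, file names, and entry point are assumptions:

```dockerfile
FROM public.ecr.aws/docker/library/python:3.12-slim
WORKDIR /app
# requirements.txt must list weave alongside your agent dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "agent.py"]
```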

    OpenTelemetry Integration

    For teams already using OpenTelemetry or wanting vendor-neutral instrumentation, W&B Weave supports the OpenTelemetry Protocol (OTLP) directly:

    import base64
    import json
    
    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
    
    auth_b64 = base64.b64encode(f"api:{WANDB_API_KEY}".encode()).decode()
    exporter = OTLPSpanExporter(
        endpoint="https://trace.wandb.ai/otel/v1/traces",
        headers={"Authorization": f"Basic {auth_b64}", "project_id": WEAVE_PROJECT}
    )
    
    # Register the exporter and obtain a tracer
    provider = TracerProvider()
    provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)
    tracer = trace.get_tracer(__name__)
    
    # Create spans to track execution
    with tracer.start_as_current_span("invoke_agent") as span:
        span.set_attribute("input.value", json.dumps({"prompt": user_message}))
        result = agent(user_message)
        span.set_attribute("output.value", json.dumps({"message": result.message}))

    This approach maintains compatibility with AgentCore's existing OpenTelemetry infrastructure while routing traces to W&B Weave for visualization.

    When using AgentCore and W&B Weave together, teams have several options for observability. AgentCore's CloudWatch integration monitors system health, resource utilization, and error rates while providing tracing for agent reasoning and tool selection. W&B Weave offers visualization capabilities that present execution data in formats familiar to teams already using the W&B ecosystem. Both solutions provide visibility into how agents process information and make decisions, allowing organizations to choose the observability approach that best aligns with their existing workflows and preferences.

    With this dual-layer approach, users can:

    • Monitor production service level agreements (SLAs) through CloudWatch alerts
    • Debug complex agent behaviors in W&B Weave's trace explorer
    • Optimize token usage and latency with detailed execution breakdowns
    • Compare agent performance across different prompts and configurations

    The integration requires minimal code changes, preserves your existing AgentCore deployment, and scales with your agent complexity. Whether you're building simple tool-calling agents or orchestrating multi-step workflows, this observability stack provides the insights needed to iterate quickly and deploy confidently.

    For implementation details and complete code examples, refer to our earlier post.

    Conclusion

    In this post, we demonstrated how to build and optimize enterprise-grade agentic AI solutions by combining Amazon Bedrock's FMs and AgentCore with W&B Weave's comprehensive observability toolkit. We explored how W&B Weave can enhance each stage of the LLM development lifecycle, from initial experimentation in the Playground to systematic evaluation of model performance, and finally to production monitoring of complex agent workflows.

    The integration between Amazon Bedrock and W&B Weave provides several key capabilities:

    • Automatic tracking of Amazon Bedrock FM calls with minimal code changes using the W&B Weave SDK
    • Rapid experimentation through the W&B Weave Playground's intuitive interface for testing prompts and comparing models
    • Systematic evaluation with custom scoring functions to assess different Amazon Bedrock models
    • Comprehensive observability for AgentCore deployments, with CloudWatch metrics providing robust operational monitoring supplemented by detailed execution traces

    To get started:

    • Request a free trial or subscribe to the Weights & Biases AI Development Platform through AWS Marketplace
    • Install the W&B Weave SDK and follow our code examples to begin tracking your Bedrock FM calls
    • Experiment with different models in the W&B Weave Playground by adding your AWS credentials and testing various Amazon Bedrock FMs
    • Set up evaluations with the W&B Weave Evaluation framework to systematically compare model performance for your use cases
    • Enhance your AgentCore agents by adding W&B Weave observability using either the native SDK or the OpenTelemetry integration

    Start with a simple integration to track your Amazon Bedrock calls, then gradually adopt more advanced features as your AI applications grow in complexity. The combination of Amazon Bedrock and W&B Weave's comprehensive development tools provides the foundation needed to build, evaluate, and maintain production-ready AI solutions at scale.


    About the authors

    James Yi is a Senior AI/ML Partner Solutions Architect at AWS. He spearheads AWS's strategic partnerships in emerging technologies, guiding engineering teams to design and develop cutting-edge joint solutions in generative AI. He enables field and technical teams to seamlessly deploy, operate, secure, and integrate partner solutions on AWS. James collaborates closely with business leaders to define and execute joint go-to-market strategies, driving cloud-based business growth. Outside of work, he enjoys playing soccer, traveling, and spending time with his family.

    Ray Strickland is a Senior Partner Solutions Architect at AWS specializing in AI/ML, agentic AI, and intelligent document processing. He enables partners to deploy scalable generative AI solutions using AWS best practices and drives innovation through strategic partner enablement programs. Ray collaborates across multiple AWS teams to accelerate AI adoption and has extensive experience in partner evaluation and enablement.

    Thomas Capelle is a Machine Learning Engineer at Weights & Biases. He is responsible for keeping the www.github.com/wandb/examples repository live and up to date. He also builds content on MLOps, applications of W&B across industries, and fun deep learning in general. Previously he used deep learning to solve short-term forecasting for solar energy. He has a background in urban planning, combinatorial optimization, transportation economics, and applied math.

    Scott Juang is the Director of Alliances at Weights & Biases. Prior to W&B, he led a number of strategic alliances at AWS and Cloudera. Scott studied materials engineering and has a passion for renewable energy.
