    Run NVIDIA Nemotron 3 Nano as a fully managed serverless model on Amazon Bedrock

    By Oliver Chambers, March 10, 2026


    This post is cowritten with Abdullahi Olaoye, Curtice Lockhart, and Nirmal Kumar Juluru from NVIDIA.

    We’re excited to announce that NVIDIA’s Nemotron 3 Nano is now available as a fully managed and serverless model in Amazon Bedrock. This follows our earlier announcement at AWS re:Invent supporting NVIDIA Nemotron 2 Nano 9B and NVIDIA Nemotron 2 Nano VL 12B models.

    With NVIDIA Nemotron open models on Amazon Bedrock, you can accelerate innovation and deliver tangible business value without having to manage infrastructure complexities. You can power your generative AI applications with Nemotron’s capabilities through the inference capabilities of Amazon Bedrock and benefit from its extensive features and tooling.

    This post explores the technical characteristics of the NVIDIA Nemotron 3 Nano model and discusses potential application use cases. Additionally, it provides technical guidance to help you get started using this model for your generative AI applications within the Amazon Bedrock environment.

    About Nemotron 3 Nano

    NVIDIA Nemotron 3 Nano is a small language model (SLM) with a hybrid Mixture-of-Experts (MoE) architecture that delivers high compute efficiency and accuracy, which developers can use to build specialized agentic AI systems. The model is fully open, with open weights, datasets, and recipes that facilitate transparency and confidence for developers and enterprises. Compared to other similar-sized models, Nemotron 3 Nano excels in coding and reasoning tasks, taking the lead on benchmarks such as SWE Bench Verified, AIME 2025, Arena Hard v2, and IFBench.

    Model overview:

    • Architecture:
      • Mixture-of-Experts (MoE) with Hybrid Transformer-Mamba Architecture
      • Supports Token Budget for providing accuracy while avoiding overthinking
    • Accuracy:
      • Leading accuracy on coding, scientific reasoning, math, tool calling, instruction following, and chat
      • Nemotron 3 Nano leads on benchmarks such as SWE Bench, AIME 2025, Humanity’s Last Exam, IFBench, RULER, and Arena Hard (compared to other open MoE language models with 30 billion or fewer parameters)
    • Model size: 30 B with 3 B active parameters
    • Context length: 256K
    • Model input: Text
    • Model output: Text

    Nemotron 3 Nano combines Mamba, Transformer, and Mixture-of-Experts layers into a single backbone to help balance efficiency, reasoning accuracy, and scale. Mamba enables long-range sequence modeling with low memory overhead, while Transformer layers help add precise attention for structured reasoning tasks like code, math, and planning. MoE routing further boosts scalability by activating only a subset of experts per token, helping to improve latency and throughput. This makes Nemotron 3 Nano especially well-suited for agent clusters running many concurrent, lightweight workflows.
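
To make the routing idea concrete, here is a toy top-k gating sketch in plain Python. It is an illustration only and not Nemotron 3 Nano's actual router: the expert count, the value of `k`, the router scores, and the function names `softmax` and `route_token` are all assumptions chosen for the example.

```python
import math


def softmax(scores):
    """Convert raw router scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_scores, k=2):
    """Pick the k highest-scoring experts for one token (hypothetical helper).

    Returns (expert_index, weight) pairs whose weights are renormalized
    to sum to 1, so only k experts need to run for this token.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in top)
    return [(i, probs[i] / kept) for i in top]


# One token, eight hypothetical experts: only two of them are activated.
scores = [0.1, 2.0, -1.0, 0.3, 1.5, -0.5, 0.0, 0.2]
selected = route_token(scores, k=2)
print(selected)
```

In this toy setup six of the eight experts stay idle for the token, which is the general mechanism that lets an MoE model hold many more total parameters than it activates per token.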

    To learn more about Nemotron 3 Nano’s architecture and how it’s trained, see Inside NVIDIA Nemotron 3: Techniques, Tools, and Data That Make It Efficient and Accurate.

    Model benchmarks

    The following image shows that Nemotron 3 Nano leads in the most attractive quadrant of the Artificial Analysis Openness Index vs. Intelligence Index. Why openness matters: It builds trust through transparency. Developers and enterprises can confidently build on Nemotron with clear visibility into the model, data pipeline, and data characteristics, enabling straightforward auditing and governance.

    Title: Chart showing Nemotron 3 Nano in the most attractive quadrant in Artificial Analysis Openness vs Intelligence Index (Source: Artificial Analysis)

    As shown in the following image, Nemotron 3 Nano provides leading accuracy with the highest efficiency among the open models and scores an impressive 52 points, a significant jump over the previous Nemotron 2 Nano model. Token demand is increasing due to agentic AI, so the ability to ‘think fast’ (arrive at the correct answer quickly while using fewer tokens) is important. Nemotron 3 Nano delivers high throughput with its efficient Hybrid Transformer-Mamba and MoE architecture.

    Title: NVIDIA Nemotron 3 Nano provides highest efficiency with leading accuracy among open models with an impressive 52 points score on Artificial Analysis Intelligence vs. Output Speed Index. (Source: Artificial Analysis)

    NVIDIA Nemotron 3 Nano use cases

    Nemotron 3 Nano helps power various use cases across different industries. Some of the use cases include:

    • Finance – Accelerate loan processing by extracting data, analyzing income patterns, and detecting fraudulent operations, reducing cycle times and risk.
    • Cybersecurity – Automatically triage vulnerabilities, perform in-depth malware analysis, and proactively hunt for security threats.
    • Software development – Assist with tasks like code summarization.
    • Retail – Optimize inventory management and help enhance in-store service with real-time, personalized product recommendations and support.

    Get began with NVIDIA Nemotron 3 Nano in Amazon Bedrock

    To test NVIDIA Nemotron 3 Nano in Amazon Bedrock, complete the following steps:

    1. Navigate to the Amazon Bedrock console and select Chat/Text playground from the left menu (under the Test section).
    2. Choose Select model in the upper-left corner of the playground.
    3. Choose NVIDIA from the category list, then select NVIDIA Nemotron 3 Nano.
    4. Choose Apply to load the model.

    After selection, you can test the model immediately. Let’s use the following prompt to generate a unit test in Python code using the pytest framework:

    Write a pytest unit test suite for a Python function called calculate_mortgage(principal, rate, years). Include test cases for: 1) A standard 30-year fixed mortgage 2) An edge case with 0% interest 3) Error handling for negative input values.

    Complex tasks like this prompt can benefit from a chain-of-thought approach, which helps produce a precise result based on the reasoning capabilities built natively into the model.
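
For reference, a response to this prompt might resemble the following sketch. It is hand-written for illustration and is not actual model output. The prompt names only the function signature, so the body of `calculate_mortgage`, its return value (a monthly payment), the use of the standard fixed-rate amortization formula, the interpretation of the rate as an annual percentage, and the choice to raise `ValueError` are all assumptions.

```python
import math


def calculate_mortgage(principal, rate, years):
    """Return the monthly payment for a fixed-rate loan (assumed behavior).

    `rate` is the annual interest rate as a percentage, e.g. 6 for 6%.
    """
    if principal < 0 or rate < 0 or years <= 0:
        raise ValueError("principal and rate must be non-negative, years positive")
    months = years * 12
    if rate == 0:
        return principal / months  # no interest: spread the principal evenly
    monthly_rate = rate / 100 / 12
    growth = (1 + monthly_rate) ** months
    return principal * monthly_rate * growth / (growth - 1)


def test_standard_30_year_fixed():
    # 300,000 borrowed at 6% over 30 years
    assert math.isclose(calculate_mortgage(300_000, 6, 30), 1798.65, abs_tol=0.01)


def test_zero_interest():
    # 120,000 over 120 months with no interest is exactly 1,000 per month
    assert math.isclose(calculate_mortgage(120_000, 0, 10), 1000.0)


def test_negative_inputs_raise():
    for args in [(-1, 5, 30), (100_000, -5, 30), (100_000, 5, -30)]:
        try:
            calculate_mortgage(*args)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {args}")
```

pytest collects any function whose name starts with `test_`, so the file runs under `pytest` as is. In a real suite, `pytest.raises(ValueError)` would normally replace the try/except; it is spelled out here so the sketch also runs without pytest installed.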

    Using the AWS CLI and SDKs

    You can access the model programmatically using the model ID nvidia.nemotron-nano-3-30b. The model supports both the InvokeModel and Converse APIs through the AWS Command Line Interface (AWS CLI) and the AWS SDKs. It also supports the Amazon Bedrock OpenAI SDK-compatible API.

    Run the following command to invoke the model directly from your terminal using the AWS CLI and the InvokeModel API:

    aws bedrock-runtime invoke-model \
     --model-id nvidia.nemotron-nano-3-30b \
     --region us-west-2 \
     --body '{"messages": [{"role": "user", "content": "Type_Your_Prompt_Here"}], "max_tokens": 512, "temperature": 0.5, "top_p": 0.9}' \
     --cli-binary-format raw-in-base64-out \
     invoke-model-output.txt
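
The same InvokeModel request can be sent from Python. The sketch below mirrors the CLI call. `build_request_body` and `invoke` are hypothetical helper names, and because the response schema is not shown here, the code returns the parsed JSON untouched rather than assuming particular fields.

```python
import json

MODEL_ID = "nvidia.nemotron-nano-3-30b"


def build_request_body(prompt, max_tokens=512, temperature=0.5, top_p=0.9):
    """Serialize the same JSON body that the CLI example passes to --body."""
    return json.dumps(
        {
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
            "temperature": temperature,
            "top_p": top_p,
        }
    )


def invoke(prompt, region="us-west-2"):
    """Call InvokeModel and return the decoded JSON response as a dict."""
    import boto3  # imported here so the body builder works without boto3

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.invoke_model(
        modelId=MODEL_ID,
        body=build_request_body(prompt),
        contentType="application/json",
        accept="application/json",
    )
    # The payload arrives as a streaming body; read it and parse the JSON.
    return json.loads(response["body"].read())


# Inspect the request body locally; call invoke(...) once credentials are set up.
print(build_request_body("Type_Your_Prompt_Here"))
```

Keeping the body construction separate from the network call makes it easy to check the payload before spending tokens on a real request.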

    To invoke the model through the AWS SDK for Python (boto3), use the following script to send a prompt to the model, in this case by using the Converse API:

    import boto3
    from botocore.exceptions import ClientError

    # Create a Bedrock Runtime client in the AWS Region you want to use.
    client = boto3.client("bedrock-runtime", region_name="us-west-2")

    # Set the model ID
    model_id = "nvidia.nemotron-nano-3-30b"

    # Start a conversation with the user message.
    user_message = "Type_Your_Prompt_Here"
    conversation = [
        {
            "role": "user",
            "content": [{"text": user_message}],
        }
    ]

    try:
        # Send the message to the model using a basic inference configuration.
        response = client.converse(
            modelId=model_id,
            messages=conversation,
            inferenceConfig={"maxTokens": 512, "temperature": 0.5, "topP": 0.9},
        )

        # Extract and print the response text.
        response_text = response["output"]["message"]["content"][0]["text"]
        print(response_text)

    except (ClientError, Exception) as e:
        print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
        exit(1)
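
The script above reads only the first content block. A Converse response can carry several blocks in `content`, and a reasoning model may return a non-text block (such as `reasoningContent`) ahead of the answer. Whether Nemotron 3 Nano does so on Amazon Bedrock is an assumption to check against your own responses. A slightly more defensive extraction, using a hypothetical `extract_text` helper, could look like this:

```python
def extract_text(response):
    """Join every text block in a Converse response, skipping other block types."""
    blocks = response.get("output", {}).get("message", {}).get("content", [])
    return "".join(block["text"] for block in blocks if "text" in block)


# Hand-built stand-in for a Converse response; the field values are made up.
sample = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {"reasoningContent": {"reasoningText": {"text": "thinking..."}}},
                {"text": "Hello "},
                {"text": "world"},
            ],
        }
    }
}
print(extract_text(sample))  # prints "Hello world"
```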

    To invoke the model through the Amazon Bedrock OpenAI-compatible ChatCompletions endpoint, use the OpenAI SDK:

    # Import the standard library and the OpenAI SDK
    import os
    from openai import OpenAI

    # Set environment variables (set OPENAI_API_KEY to your API key).
    # Bedrock runtime hosts normally take the form
    # bedrock-runtime.<region>.amazonaws.com; adjust the URL for your Region.
    os.environ["OPENAI_API_KEY"] = ""
    os.environ["OPENAI_BASE_URL"] = "https://bedrock-runtime..amazon.com/openai/v1"

    # Create the client; it reads the API key and base URL from the environment
    client = OpenAI()

    # Set the model ID
    model_id = "nvidia.nemotron-nano-3-30b"

    # Set prompts
    system_prompt = "Type_Your_System_Prompt_Here"
    user_message = "Type_Your_User_Prompt_Here"

    # Use the ChatCompletions API
    response = client.chat.completions.create(
        model=model_id,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}
        ],
        temperature=0,
        max_completion_tokens=1000
    )

    # Extract and print the response text
    print(response.choices[0].message.content)

    Use NVIDIA Nemotron 3 Nano with Amazon Bedrock features

    You can enhance your generative AI applications by combining Nemotron 3 Nano with the Amazon Bedrock managed tools. Use Amazon Bedrock Guardrails to implement safeguards and Amazon Bedrock Knowledge Bases to create robust Retrieval Augmented Generation (RAG) workflows.

    Amazon Bedrock Guardrails

    Guardrails is a managed safety layer that helps enforce responsible AI by filtering harmful content, redacting sensitive information (PII), and blocking specific topics across prompts and responses. It works across multiple models to help detect prompt injection attacks and hallucinations.

    Example use case: If you’re building a mortgage assistant, you can help prevent it from offering general investment advice. By configuring a filter for the word “stocks”, user prompts containing that term can be immediately blocked and receive a custom message.

    To set up a guardrail, complete the following steps:

    1. In the Amazon Bedrock console, navigate to the Build section on the left and select Guardrails.
    2. Create a new guardrail and configure the necessary filters for your use case.

    After the guardrail is configured, test it with various prompts to verify its performance. You can then fine-tune settings, such as denied topics, word filters, and PII redaction, to match your specific safety requirements. For a deep dive, see Create your guardrail.
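
The console steps can also be scripted. The following sketch sets up a guardrail for the mortgage-assistant example with a single blocked word ("stocks" here), using the boto3 `create_guardrail` operation on the `bedrock` control-plane client. The guardrail name, description, Region, and blocked message are placeholder values, and `build_guardrail_request` and `create_mortgage_guardrail` are hypothetical helper names.

```python
def build_guardrail_request(blocked_words, blocked_message):
    """Assemble keyword arguments for the Bedrock CreateGuardrail operation."""
    return {
        "name": "mortgage-assistant-guardrail",  # placeholder name
        "description": "Blocks general investment topics",  # placeholder text
        "wordPolicyConfig": {
            "wordsConfig": [{"text": word} for word in blocked_words]
        },
        "blockedInputMessaging": blocked_message,
        "blockedOutputsMessaging": blocked_message,
    }


def create_mortgage_guardrail(region="us-west-2"):
    """Create the guardrail and return its ID and version."""
    import boto3  # imported lazily so the request builder runs without boto3

    bedrock = boto3.client("bedrock", region_name=region)
    request = build_guardrail_request(
        ["stocks"], "Sorry, I can only help with mortgage questions."
    )
    response = bedrock.create_guardrail(**request)
    return response["guardrailId"], response["version"]


# Inspect the request locally before creating anything in your account.
request = build_guardrail_request(
    ["stocks"], "Sorry, I can only help with mortgage questions."
)
print(request["wordPolicyConfig"])
```

The returned ID and version can then be supplied to the Converse API through its `guardrailConfig` parameter (`guardrailIdentifier` and `guardrailVersion`).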

    Amazon Bedrock Knowledge Bases

    Amazon Bedrock Knowledge Bases automates the complete RAG workflow. It handles ingesting content from your data sources, chunking it into searchable segments, converting them into vector embeddings, and storing them in a vector database. Then, when a user submits a query, the system matches the input against stored vectors to find semantically similar content, which is then used to augment the prompt sent to the foundation model.
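
The retrieval step can be pictured with a toy example. The sketch below is not how the managed service is implemented: it uses hand-made three-dimensional vectors in place of real embeddings and an in-memory list in place of a vector database, and the names `cosine`, `retrieve`, and `augment_prompt` are assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def retrieve(query_vec, store, top_k=2):
    """Return the top_k chunk texts most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


def augment_prompt(question, chunks):
    """Place the retrieved chunks ahead of the user question."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Search results:\n{context}\n\nUser query:\n{question}"


# Made-up chunks paired with made-up embeddings.
store = [
    ("Check your credit report before applying.", [0.9, 0.1, 0.0]),
    ("Compare loan estimates from several lenders.", [0.7, 0.3, 0.1]),
    ("Water the garden twice a week.", [0.0, 0.1, 0.9]),
]
query_vec = [1.0, 0.2, 0.0]  # stand-in embedding for the question
chunks = retrieve(query_vec, store)
print(augment_prompt("How do I prepare for a mortgage?", chunks))
```

The two mortgage-related chunks score highest against the query vector, so only they reach the prompt, while the unrelated gardening chunk is left out.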

    For this example, we uploaded PDFs (for example, Buying a New Home, Home Loan Toolkit, Shopping for a Mortgage) to Amazon Simple Storage Service (Amazon S3) and selected Amazon OpenSearch Serverless as the vector store. The following code demonstrates how to query this knowledge base using the RetrieveAndGenerate API, while applying safety controls through a specific guardrail ID.

    import boto3

    bedrock_agent_runtime_client = boto3.client('bedrock-agent-runtime')

    response = bedrock_agent_runtime_client.retrieve_and_generate(
        input={
            'text': "I'm interested in purchasing a home. What steps should I take to make sure I'm prepared to take on a mortgage?"
        },
        retrieveAndGenerateConfiguration={
            'knowledgeBaseConfiguration': {
                # One generationConfiguration holds both the guardrail and the
                # prompt template; repeating the key would drop the first entry.
                'generationConfiguration': {
                    'guardrailConfiguration': {
                        'guardrailId': '',  # set to your guardrail ID
                        'guardrailVersion': '1'
                    },
                    'promptTemplate': {
                        'textPromptTemplate': (
                            "You are a helpful assistant that answers questions about mortgages using the "
                            "search results.\n\n"
                            "Search results:\n$search_results$\n\n"
                            "User query:\n$query$\n\n"
                            "Answer clearly and concisely."
                        )
                    }
                },
                'knowledgeBaseId': '',  # set to your knowledge base ID
                'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/nvidia.nemotron-nano-3-30b',
                'orchestrationConfiguration': {
                    'promptTemplate': {
                        'textPromptTemplate': (
                            "You are very knowledgeable on mortgages.\n\n"
                            "Conversation so far:\n$conversation_history$\n\n"
                            "User query:\n$query$\n\n"
                            "$output_format_instructions$"
                        )
                    }
                }
            },
            'type': 'KNOWLEDGE_BASE'
        }
    )
    print(response)

    This code directs the NVIDIA Nemotron 3 Nano model to synthesize the retrieved documents into a clear, grounded answer using your custom prompt template. To set up your own pipeline, review the full walkthrough in the Amazon Bedrock User Guide.
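
Printing the raw response is useful for a first look, but applications usually want the answer text and its sources. The RetrieveAndGenerate response carries the generated text under `output.text` and source references under `citations`. The hypothetical `summarize_response` helper below pulls both out. The sample dictionary is hand-built for illustration, and its bucket and file names are invented.

```python
def summarize_response(response):
    """Return (answer_text, source_uris) from a RetrieveAndGenerate response."""
    answer = response.get("output", {}).get("text", "")
    uris = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in uris:  # keep each source once, in order
                uris.append(uri)
    return answer, uris


# Hand-built stand-in for a response; every value here is made up.
sample = {
    "output": {"text": "Review your credit and budget first."},
    "citations": [
        {
            "retrievedReferences": [
                {"location": {"type": "S3", "s3Location": {"uri": "s3://example-bucket/home-loan-toolkit.pdf"}}},
                {"location": {"type": "S3", "s3Location": {"uri": "s3://example-bucket/home-loan-toolkit.pdf"}}},
            ]
        }
    ],
}
answer, sources = summarize_response(sample)
print(answer)
print(sources)
```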

    Conclusion

    In this post, we showed you how to get started with NVIDIA Nemotron 3 Nano on Amazon Bedrock for fully managed serverless inference. We also showed you how to use the model with Amazon Bedrock Knowledge Bases and Amazon Bedrock Guardrails. The model is now available in the US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Tokyo), Asia Pacific (Mumbai), South America (Sao Paulo), Europe (London), and Europe (Milan) AWS Regions. Check the full Region list for future updates. To learn more, check out NVIDIA Nemotron and give NVIDIA Nemotron 3 Nano a try in the Amazon Bedrock console today.


    About the authors

    Antonio Rodriguez

    Antonio Rodriguez is a Principal Generative AI Specialist Solutions Architect at Amazon Web Services. He helps companies of all sizes solve their challenges, embrace innovation, and create new business opportunities with Amazon Bedrock. Apart from work, he loves to spend time with his family and play sports with his friends.

    Aris Tsakpinis

    Aris Tsakpinis is a Senior Specialist Solutions Architect for Generative AI focusing on open weight models on Amazon Bedrock and the broader generative AI open-source ecosystem. Alongside his professional role, he’s pursuing a PhD in Machine Learning Engineering at the University of Regensburg, where his research focuses on applied generative AI in scientific domains.

    Abdullahi Olaoye

    Abdullahi Olaoye is a Senior AI Solutions Architect at NVIDIA, specializing in integrating NVIDIA AI libraries, frameworks, and products with cloud AI services and open-source tools to optimize AI model deployment, inference, and generative AI workflows. He collaborates with cloud providers to help enhance AI workload performance and drive adoption of NVIDIA-powered AI and generative AI solutions.

    Curtice Lockhart

    Curtice Lockhart is an AI Solutions Architect at NVIDIA, where he helps customers deploy language and vision models to build end-to-end AI workflows using NVIDIA’s tooling on AWS. He enjoys making complex AI concepts feel approachable and spends his free time exploring art, music, and the outdoors.

    Nirmal Kumar Juluru

    Nirmal Kumar Juluru is a product marketing manager at NVIDIA driving the adoption of Nemotron and NeMo. He previously worked as a software developer. Nirmal holds an MBA from Carnegie Mellon University and a bachelor’s in computer science from BITS Pilani.
