AI is moving fast, and for many of our customers the real opportunity isn't in experimenting with it, but in operating AI in production, where it drives meaningful business outcomes. That means building systems that run reliably, perform at scale, and meet your organization's security and compliance requirements.
Today at NVIDIA GTC 2026, AWS and NVIDIA announced an expanded collaboration with new technology integrations to support growing AI compute demand and help you build and run production-ready AI solutions. These integrations span accelerated computing, interconnect technologies, and model fine-tuning and inference.
Key announcements at NVIDIA GTC 2026
Scaling AI infrastructure with expanded GPU offerings and optimized interconnect
Accelerating compute capacity in the agentic AI era
Starting in 2026, AWS will add more than 1 million NVIDIA GPUs, including Blackwell and Rubin GPU architectures, across our global cloud Regions. AWS offers the broadest selection of NVIDIA GPU-based instances of any cloud provider to power a diverse set of AI/ML workloads. AWS and NVIDIA are also collaborating on Spectrum networking and other infrastructure areas, building on more than 15 years of joint innovation between our two companies.
AWS's advanced cloud and AI infrastructure gives enterprises, startups, and researchers the foundation they need to build and scale agentic AI systems capable of reasoning, planning, and acting autonomously across complex workflows.
New Amazon EC2 instances with NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs
Today, we announced that Amazon EC2 instances accelerated by NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs are coming soon. AWS is the first major cloud provider to announce support for RTX PRO 4500 Blackwell Server Edition GPUs. These instances are well suited for a wide range of workloads, including data analytics, conversational AI, content generation, recommender systems, video streaming, video rendering, and other graphics workloads.
Amazon EC2 instances accelerated by NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs will be built on the AWS Nitro System, a combination of dedicated hardware and a lightweight hypervisor that delivers practically all of the compute and memory resources of the host hardware to your instances for better overall resource utilization and performance. The Nitro System's specialized hardware, software, and firmware are designed to enforce restrictions so that no one, including anyone at AWS, can access your sensitive AI workloads and data. In addition, the Nitro System supports firmware updates, bug fixes, and optimizations while the system remains operational. These capabilities within the Nitro System enable the improved resource efficiency, security, and stability that AI, analytics, and graphics workloads require in production.
Accelerating interconnect for disaggregated LLM inference with NVIDIA NIXL on AWS EFA and Trainium
As model sizes grow, communication overhead between GPUs or Trainium chips can become a bottleneck. Today, we announced support for the NVIDIA Inference Xfer Library (NIXL) with AWS Elastic Fabric Adapter (EFA) to accelerate disaggregated large language model (LLM) inference on Amazon EC2, across NVIDIA GPUs and AWS Trainium. Accelerating disaggregated inference is essential for scaling modern AI workloads because it enables efficient overlap of communication and computation while minimizing communication latency and maximizing GPU utilization. This integration enables high-throughput, low-latency KV-cache data movement between GPU compute nodes performing token generation and distributed memory resources that store KV-cache state. It also provides the flexibility to build inference clusters using any combination of GPU and Trainium EFA-enabled EC2 instances. NIXL with EFA integrates natively with popular open source frameworks such as NVIDIA Dynamo, vLLM, and SGLang, delivering improved inter-token latency and more efficient KV-cache memory utilization.
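To make the disaggregated pattern concrete, here is a toy Python sketch of the flow: a prefill worker builds the KV cache for a prompt, the cache is handed off to a decode worker, and generation resumes from that state. The dataclass and the `transfer` step are purely illustrative assumptions for this post; in production, NIXL moves the cache over EFA with RDMA semantics, which this sketch does not model.

```python
# Toy illustration of disaggregated LLM inference with a KV-cache handoff.
# All names here are illustrative; this is not the NIXL API.
from dataclasses import dataclass

@dataclass
class KVCache:
    tokens: list[str]   # tokens whose key/value projections are cached
    layers: int         # number of transformer layers covered by the cache

def prefill(prompt_tokens: list[str], layers: int = 32) -> KVCache:
    """Prefill node: one pass over the full prompt populates the KV cache."""
    return KVCache(tokens=list(prompt_tokens), layers=layers)

def transfer(cache: KVCache) -> KVCache:
    """Stand-in for the node-to-node cache transfer (NIXL over EFA in practice)."""
    return KVCache(tokens=list(cache.tokens), layers=cache.layers)

def decode(cache: KVCache, steps: int) -> list[str]:
    """Decode node: generate tokens one at a time, extending the received cache."""
    out = []
    for i in range(steps):
        tok = f"<gen{i}>"       # placeholder for real sampling
        cache.tokens.append(tok)
        out.append(tok)
    return out

cache = prefill(["The", "quick", "brown", "fox"])
generated = decode(transfer(cache), steps=3)
print(generated)  # ['<gen0>', '<gen1>', '<gen2>']
```

The point of the split is that prefill (compute-bound) and decode (memory-bandwidth-bound) can run on different nodes sized for each phase, with the cache transfer overlapped with computation.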
Accelerating data analytics with Amazon EMR and NVIDIA GPUs
Running Apache Spark 3x faster using Amazon EMR on Amazon EKS with G7e instances
Data engineers and data scientists frequently face hours-long data processing pipelines that slow AI/ML model iteration and business intelligence generation. We're seeing significant performance gains for these workloads: AWS and NVIDIA deliver 3x faster performance for Apache Spark workloads with Amazon EMR on EKS on G7e instances. This performance results from a joint AWS-NVIDIA engineering collaboration optimizing GPU-accelerated analytics by combining Amazon EMR on EKS with NVIDIA's RTX PRO 6000 architecture. With Amazon EMR and G7e instances, data engineers and data scientists can accelerate time-to-insight for AI/ML feature engineering, complex ETL transformations, and real-time analytics at scale. Customers running large-scale data processing pipelines can cut the time needed to run analytics while maintaining full compatibility with existing Spark applications.
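GPU acceleration for Spark is typically switched on through the NVIDIA RAPIDS Accelerator plugin, which is what makes the "full compatibility with existing Spark applications" claim possible: the plugin rewrites supported operators to run on the GPU without changes to your Spark code. A hedged sketch of what a submission might look like follows; the property keys are the plugin's standard configuration names, but the resource values and job file are illustrative, and the exact EMR on EKS release labels and settings should come from the EMR documentation.

```shell
# Illustrative only: run a Spark job with the RAPIDS Accelerator enabled.
# Values (GPU amounts, concurrency) are examples, not tuned recommendations.
spark-submit \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.task.resource.gpu.amount=0.25 \
  --conf spark.rapids.sql.concurrentGpuTasks=2 \
  my_etl_job.py
```

Operators the plugin does not support fall back to the CPU automatically, so existing jobs keep producing the same results.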
Expanding NVIDIA Nemotron model support on Amazon Bedrock
Fine-tuning Nemotron models in Amazon Bedrock with Reinforcement Fine-Tuning (Coming soon)
Developers will soon be able to fine-tune NVIDIA Nemotron models directly on Amazon Bedrock using Reinforcement Fine-Tuning (RFT). This matters for teams that need to align model behavior to specific domains, whether that's legal, healthcare, finance, or another specialized field. Reinforcement fine-tuning lets you shape how a model reasons and responds, not just what it knows. And because this runs natively on Amazon Bedrock, there's zero infrastructure overhead. You define the task, provide the feedback signal, and Bedrock handles the rest. Learn about Reinforcement Fine-Tuning in Amazon Bedrock.
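As a rough sketch of the "define the task, provide the feedback signal" workflow, the snippet below assembles a hypothetical RFT job configuration. Bedrock's existing model-customization API (boto3's `create_model_customization_job`) accepts a job configuration in this general shape, but the RFT-specific fields here, the `customizationType` value and the `rewardConfig` block, are assumptions for illustration, since the RFT interface has not been published yet.

```python
import json

# Hypothetical sketch: field names marked "assumed" are not a published API.
def build_rft_job_config(job_name: str, model_id: str, dataset_s3_uri: str) -> dict:
    return {
        "jobName": job_name,
        "baseModelIdentifier": model_id,                      # the Nemotron model to tune
        "customizationType": "REINFORCEMENT_FINE_TUNING",     # assumed enum value
        "trainingDataConfig": {"s3Uri": dataset_s3_uri},      # the task definition
        # The feedback signal: an assumed grader that scores each model response.
        "rewardConfig": {
            "graderType": "RUBRIC",
            "rubricS3Uri": dataset_s3_uri + "/rubric.json",
        },
    }

config = build_rft_job_config(
    "nemotron-legal-rft",
    "nvidia.nemotron-3-super-v1",        # illustrative model ID
    "s3://my-bucket/rft-dataset",
)
print(json.dumps(config, indent=2))
# With the feature available and credentials configured, you would pass fields
# like these to boto3.client("bedrock").create_model_customization_job(...).
```

The division of labor is the point: you supply the dataset and the grading signal, and the managed service owns the training loop and the infrastructure.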
Nemotron 3 Super on Amazon Bedrock (Coming soon)
NVIDIA Nemotron 3 Super, a hybrid MoE model built for multi-agent workloads and extended reasoning, is coming soon to Amazon Bedrock. Designed to help AI agents maintain accuracy across complex, multi-step workflows, it powers use cases across finance, cybersecurity, retail, and software development, delivering fast, cost-efficient inference through a fully managed API.
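Because the model is served through Bedrock's managed API, calling it looks like calling any other Bedrock model. The sketch below builds a request in Bedrock's documented Converse format; the model ID is a placeholder of my own invention, so check the Bedrock model catalog for the real identifier once the model is available.

```python
# Build a Bedrock Converse request for a (not yet released) Nemotron 3 Super model.
MODEL_ID = "nvidia.nemotron-3-super-v1"   # placeholder, not the real identifier

def build_converse_request(prompt: str) -> dict:
    return {
        "modelId": MODEL_ID,
        # Converse message format: a list of turns, each with role + content blocks.
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 512, "temperature": 0.2},
    }

request = build_converse_request("Summarize open incidents by severity.")
print(request["messages"][0]["role"])   # user
# With credentials configured, the actual call would be:
#   import boto3
#   reply = boto3.client("bedrock-runtime").converse(**request)
#   print(reply["output"]["message"]["content"][0]["text"])
```

Because Converse uses one request shape across models, swapping in a new model is mostly a matter of changing `modelId`.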
Improving energy efficiency and sustainability
As AI workloads scale, performance per watt isn't just a sustainability metric; it's a competitive advantage. In this NVIDIA GTC session, Amazon CSO Kara Hurst will join sustainability leaders from Equinix and PepsiCo to discuss how AI is transforming enterprise energy and infrastructure at scale, from data centers as active grid participants to AI as an enterprise efficiency engine, and how AWS can help you achieve optimal energy efficiency, with AWS infrastructure being 4.1x more energy efficient than on-premises data centers.
Built to run, together
What makes these announcements exciting isn't any single capability; it's what they represent together. Fifteen years of partnership between AWS and NVIDIA has produced a full stack of AI infrastructure optimized end to end, from the GPU to the network to the managed services layer. You don't have to stitch it together yourself. It's ready to run.
If you're at GTC this week, come find us at the AWS booth. Check out live demos, catch our in-booth theater sessions, and pick up customized swag from the AWS Swag Factory.
Visit AWS at NVIDIA GTC 2026 to see everything AWS has going on at the conference.
About the authors

