Train and deploy AI models at trillion-parameter scale with Amazon SageMaker HyperPod support for P6e-GB200 UltraServers

By Oliver Chambers, August 12, 2025


Imagine harnessing the power of 72 cutting-edge NVIDIA Blackwell GPUs in a single system for the next wave of AI innovation, unlocking 360 petaflops of dense 8-bit floating point (FP8) compute and 1.4 exaflops of sparse 4-bit floating point (FP4) compute. Today, that's exactly what Amazon SageMaker HyperPod delivers with the launch of support for P6e-GB200 UltraServers. Accelerated by NVIDIA GB200 NVL72, P6e-GB200 UltraServers provide industry-leading GPU performance, network throughput, and memory for developing and deploying trillion-parameter AI models at scale. By seamlessly integrating these UltraServers with the distributed training environment of SageMaker HyperPod, organizations can rapidly scale model development, reduce downtime, and simplify the transition from training to large-scale deployment. With the automated, resilient, and highly scalable machine learning infrastructure of SageMaker HyperPod, organizations can distribute massive AI workloads across thousands of accelerators and manage model development end to end with unprecedented efficiency. Using SageMaker HyperPod with P6e-GB200 UltraServers marks a pivotal shift toward faster, more resilient, and cost-effective training and deployment of state-of-the-art generative AI models.

In this post, we review the technical specifications of P6e-GB200 UltraServers, discuss their performance benefits, and highlight key use cases. We then walk through how to purchase UltraServer capacity through flexible training plans and how to get started using UltraServers with SageMaker HyperPod.

Inside the UltraServer

P6e-GB200 UltraServers are accelerated by NVIDIA GB200 NVL72, connecting 36 NVIDIA Grace CPUs and 72 Blackwell GPUs in the same NVIDIA NVLink domain. Each ml.p6e-gb200.36xlarge compute node within an UltraServer consists of two NVIDIA GB200 Grace Blackwell Superchips, each connecting two high-performance NVIDIA Blackwell GPUs and an Arm-based NVIDIA Grace CPU with the NVIDIA NVLink chip-to-chip (C2C) interconnect. SageMaker HyperPod is launching P6e-GB200 UltraServers in two sizes. The ml.u-p6e-gb200x36 UltraServer includes a rack of 9 compute nodes fully connected with NVSwitch (NVS), providing a total of 36 Blackwell GPUs in the same NVLink domain, and the ml.u-p6e-gb200x72 UltraServer includes a rack pair of 18 compute nodes with a total of 72 Blackwell GPUs in the same NVLink domain.

Performance benefits of UltraServers

In this section, we discuss some of the performance benefits of UltraServers.

GPU and compute power

With P6e-GB200 UltraServers, you can access up to 72 NVIDIA Blackwell GPUs within a single NVLink domain, with a total of 360 petaflops of FP8 compute (without sparsity), 1.4 exaflops of FP4 compute (with sparsity), and 13.4 TB of high-bandwidth memory (HBM3e). Each Grace Blackwell Superchip pairs two Blackwell GPUs with one Grace CPU through the NVLink-C2C interconnect, delivering 10 petaflops of dense FP8 compute, 40 petaflops of sparse FP4 compute, up to 372 GB of HBM3e, and 850 GB of cache-coherent fast memory per module. This co-location boosts bandwidth between GPU and CPU by an order of magnitude compared to previous-generation instances. Each NVIDIA Blackwell GPU features a second-generation Transformer Engine and supports the latest AI precision microscaling (MX) data formats such as MXFP6 and MXFP4, as well as NVIDIA NVFP4. When combined with frameworks like NVIDIA Dynamo, NVIDIA TensorRT-LLM, and NVIDIA NeMo, these Transformer Engines significantly accelerate inference and training for large language models (LLMs) and Mixture-of-Experts (MoE) models, supporting higher efficiency and performance for modern AI workloads.
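
As a quick sanity check, the UltraServer-level numbers follow directly from these per-Superchip figures. A small Python calculation using only the specs quoted above:

```python
# Scale per-Superchip specs (from the paragraph above) to a full
# ml.u-p6e-gb200x72 UltraServer: 18 nodes x 2 Superchips per node.
SUPERCHIPS = 36

dense_fp8_pflops = 10 * SUPERCHIPS    # 360 petaflops dense FP8
sparse_fp4_pflops = 40 * SUPERCHIPS   # 1,440 petaflops ~= 1.4 exaflops sparse FP4
hbm3e_tb = 372 * SUPERCHIPS / 1000    # ~13.4 TB of HBM3e

print(f"{dense_fp8_pflops} PF dense FP8, "
      f"{sparse_fp4_pflops / 1000:.2f} EF sparse FP4, "
      f"{hbm3e_tb:.1f} TB HBM3e")
```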

High-performance networking

P6e-GB200 UltraServers deliver up to 130 TBps of low-latency NVLink bandwidth between GPUs for efficient large-scale AI workload communication. At double the bandwidth of its predecessor, fifth-generation NVIDIA NVLink provides up to 1.8 TBps of bidirectional, direct GPU-to-GPU interconnect, vastly improving intra-server communication. Each compute node within an UltraServer can be configured with up to 17 physical network interface cards (NICs), each supporting up to 400 Gbps of bandwidth. P6e-GB200 UltraServers provide up to 28.8 Tbps of total Elastic Fabric Adapter (EFA) v4 networking, using the Scalable Reliable Datagram (SRD) protocol to intelligently route network traffic across multiple paths, providing smooth operation even during congestion or hardware failures. For more information, refer to EFA configuration for P6e-GB200 instances.
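
In a distributed training job, NCCL collectives ride over EFA through Libfabric's EFA provider (the aws-ofi-nccl plugin). A minimal PyTorch sketch, assuming the plugin is installed on the node image and the job is launched with torchrun:

```python
import os

# These must be set before NCCL initializes; they route collectives through
# the Libfabric EFA provider (aws-ofi-nccl plugin assumed installed).
os.environ.setdefault("FI_PROVIDER", "efa")
os.environ.setdefault("NCCL_DEBUG", "INFO")  # logs should show the EFA transport

import torch
import torch.distributed as dist

# torchrun supplies RANK, WORLD_SIZE, MASTER_ADDR, and LOCAL_RANK
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)
dist.init_process_group(backend="nccl")

# A trivial all-reduce to verify the fabric is healthy end to end
t = torch.ones(1, device="cuda")
dist.all_reduce(t)
print(f"rank {dist.get_rank()}: all_reduce -> {t.item()} (expect world size)")
```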

Storage and data throughput

P6e-GB200 UltraServers support up to 405 TB of local NVMe SSD storage, ideal for large-scale datasets and fast checkpointing during AI model training. For high-performance shared storage, Amazon FSx for Lustre file systems can be accessed over EFA with GPUDirect Storage (GDS), providing direct data transfer between the file system and GPU memory with terabytes per second of throughput and millions of input/output operations per second (IOPS) for demanding AI training and inference.
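
A common pattern that uses both tiers is to checkpoint frequently to local NVMe and less frequently to the shared FSx for Lustre file system. A minimal sketch; both mount paths are assumptions that depend on how your cluster's storage is configured:

```python
import os
import torch

# Assumed mount points: fast local NVMe for frequent checkpoints,
# an FSx for Lustre mount for durable, shared checkpoints.
LOCAL_NVME = "/opt/dlami/nvme/checkpoints"   # assumption: local NVMe mount
SHARED_FSX = "/fsx/checkpoints"              # assumption: FSx for Lustre mount

def save_checkpoint(model, step, every_local=100, every_shared=1000):
    """Checkpoint often to local NVMe, less often to the shared file system."""
    state = {"step": step, "model": model.state_dict()}
    if step % every_local == 0:
        os.makedirs(LOCAL_NVME, exist_ok=True)
        torch.save(state, f"{LOCAL_NVME}/ckpt-{step}.pt")
    if step % every_shared == 0:
        os.makedirs(SHARED_FSX, exist_ok=True)
        torch.save(state, f"{SHARED_FSX}/ckpt-{step}.pt")
```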

Topology-aware scheduling

Amazon Elastic Compute Cloud (Amazon EC2) provides topology information that describes the physical and network relationships between instances in your cluster. For UltraServer compute nodes, Amazon EC2 exposes which instances belong to the same UltraServer, so your training and inference algorithms can understand NVLink connectivity patterns. This topology information helps optimize distributed training by allowing frameworks like the NVIDIA Collective Communications Library (NCCL) to make intelligent decisions about communication patterns and data placement. For more information, see How Amazon EC2 instance topology works.
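
You can retrieve this topology programmatically through the EC2 DescribeInstanceTopology API. A minimal boto3 sketch (the instance IDs are placeholders) that groups instances by their network-node path; instances sharing a path are closest in the network:

```python
import boto3
from collections import defaultdict

# Group cluster instances by their EC2 network-node path; instances that
# share a path (and an UltraServer) are the best candidates for NVLink-heavy work.
ec2 = boto3.client("ec2")
instance_ids = ["i-0123456789abcdef0"]  # placeholder: your cluster's instance IDs

groups = defaultdict(list)
paginator = ec2.get_paginator("describe_instance_topology")
for page in paginator.paginate(InstanceIds=instance_ids):
    for inst in page["Instances"]:
        groups[tuple(inst["NetworkNodes"])].append(inst["InstanceId"])

for path, ids in sorted(groups.items()):
    print(" > ".join(path), "->", ids)
```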

With Amazon Elastic Kubernetes Service (Amazon EKS) orchestration, SageMaker HyperPod automatically labels UltraServer compute nodes with their respective AWS Region, Availability Zone, network node layers (1–4), and UltraServer ID. These topology labels can be used with node affinities and pod topology spread constraints to assign Pods to cluster nodes for optimal performance, as in the sketch below.
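
For example, you can group nodes by their UltraServer ID label to see how capacity is spread across UltraServers. A minimal sketch with the official Kubernetes Python client; the label key is an assumption for illustration, so check the actual keys on your nodes (for example with kubectl get nodes --show-labels):

```python
from collections import defaultdict
from kubernetes import client, config

# NOTE: hypothetical label key, for illustration only; verify the real key
# on your HyperPod EKS nodes before relying on it in affinities.
ULTRASERVER_LABEL = "sagemaker.amazonaws.com/ultraserver-id"

config.load_kube_config()
nodes = client.CoreV1Api().list_node().items

by_ultraserver = defaultdict(list)
for node in nodes:
    us_id = (node.metadata.labels or {}).get(ULTRASERVER_LABEL, "unlabeled")
    by_ultraserver[us_id].append(node.metadata.name)

for us_id, names in sorted(by_ultraserver.items()):
    print(f"{us_id}: {len(names)} nodes -> {names}")
```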

With Slurm orchestration, SageMaker HyperPod automatically enables the topology plugin and creates a topology.conf file with the respective BlockName, Nodes, and BlockSizes to match your UltraServer capacity. This way, you can group and segment your compute nodes to optimize job performance; an illustrative example follows.
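
For illustration only, a topology.conf for a single ml.u-p6e-gb200x72 UltraServer might look like the following (hypothetical node names; SageMaker HyperPod generates the real file for you):

```
# topology.conf (illustrative): one 18-node UltraServer as a block,
# schedulable in NVLink-aligned chunks of 9 or 18 nodes
BlockName=ultraserver-a Nodes=ip-10-1-0-[1-18]
BlockSizes=9,18
```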

Use cases for UltraServers

P6e-GB200 UltraServers can efficiently train models with over a trillion parameters because of their unified NVLink domain, ultrafast memory, and high cross-node bandwidth, making them ideal for state-of-the-art AI development. The substantial interconnect bandwidth means even extremely large models can be partitioned and trained in a highly parallel and efficient manner, without the performance setbacks seen in disjointed multi-node systems. This results in faster iteration cycles and higher-quality AI models, helping organizations push the boundaries of state-of-the-art AI research and innovation.

For real-time trillion-parameter model inference, P6e-GB200 UltraServers enable 30 times faster inference on frontier trillion-parameter LLMs compared to prior platforms, achieving real-time performance for complex models used in generative AI, natural language understanding, and conversational agents. When paired with NVIDIA Dynamo, P6e-GB200 UltraServers deliver significant performance gains, especially for long context lengths. NVIDIA Dynamo disaggregates the compute-heavy prefill phase and the memory-heavy decode phase onto different GPUs, supporting independent optimization and resource allocation within the large 72-GPU NVLink domain. This enables more efficient management of large context windows and high-concurrency applications.

P6e-GB200 UltraServers offer substantial benefits to startup, research, and enterprise customers with multiple teams that need to run diverse distributed training and inference workloads on shared infrastructure. When used in conjunction with SageMaker HyperPod task governance, UltraServers provide exceptional scalability and resource pooling, so different teams can launch simultaneous jobs without bottlenecks. Enterprises can maximize infrastructure utilization, reduce overall costs, and accelerate project timelines, all while supporting the complex needs of teams developing and serving advanced AI models, including massive LLMs for high-concurrency real-time inference, on a single, resilient platform.

Flexible training plans for UltraServer capacity

SageMaker AI currently offers P6e-GB200 UltraServer capacity through flexible training plans in the Dallas AWS Local Zone (us-east-1-dfw-2a). UltraServers can be used for both SageMaker HyperPod and SageMaker training jobs.

To get started, navigate to the SageMaker AI training plans console, which now includes an UltraServer compute type; from there you can select your UltraServer type: ml.u-p6e-gb200x36 (containing 9 ml.p6e-gb200.36xlarge compute nodes) or ml.u-p6e-gb200x72 (containing 18 ml.p6e-gb200.36xlarge compute nodes).
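
If you prefer to script this, the same flow is available through the SageMaker AI API via SearchTrainingPlanOfferings and CreateTrainingPlan. A minimal boto3 sketch; note that passing the UltraServer type as InstanceType is an assumption for illustration, and the duration and plan name are placeholders:

```python
import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")

# Search for offerings that match the desired UltraServer capacity.
# ASSUMPTION: the UltraServer type is accepted via InstanceType; the
# console flow described above is the documented path.
offerings = sm.search_training_plan_offerings(
    InstanceType="ml.u-p6e-gb200x36",   # or "ml.u-p6e-gb200x72"
    InstanceCount=1,
    DurationHours=168,                  # one week, as an example
    TargetResources=["hyperpod-cluster"],
)["TrainingPlanOfferings"]

if offerings:
    # Reserve the first matching offering as a named training plan.
    plan = sm.create_training_plan(
        TrainingPlanName="ultraserver-plan",  # placeholder name
        TrainingPlanOfferingId=offerings[0]["TrainingPlanOfferingId"],
    )
    print(plan["TrainingPlanArn"])
```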

After finding the training plan that matches your needs, we recommend configuring at least one spare ml.p6e-gb200.36xlarge compute node so that faulty instances can be quickly replaced with minimal disruption.

Create an UltraServer cluster with SageMaker HyperPod

After purchasing an UltraServer training plan, you can add the capacity to an instance group of type ml.p6e-gb200.36xlarge within your SageMaker HyperPod cluster and specify the quantity of instances that you want to provision, up to the number available within the training plan. For example, if you purchased a training plan for one ml.u-p6e-gb200x36 UltraServer, you can provision up to 9 compute nodes, whereas if you purchased a training plan for one ml.u-p6e-gb200x72 UltraServer, you can provision up to 18 compute nodes.
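
A minimal boto3 sketch of that step, creating a HyperPod cluster with one UltraServer-backed instance group attached to a purchased training plan; the cluster name, role ARN, training plan ARN, and lifecycle-script location are all placeholders:

```python
import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")

sm.create_cluster(
    ClusterName="ultraserver-cluster",  # placeholder
    InstanceGroups=[{
        "InstanceGroupName": "gb200-workers",
        "InstanceType": "ml.p6e-gb200.36xlarge",
        "InstanceCount": 9,  # up to the capacity in your training plan
        "LifeCycleConfig": {
            "SourceS3Uri": "s3://my-bucket/lifecycle-scripts/",  # placeholder
            "OnCreate": "on_create.sh",
        },
        "ExecutionRole": "arn:aws:iam::111122223333:role/HyperPodExecRole",  # placeholder
        # Attach the reserved UltraServer capacity to this instance group.
        "TrainingPlanArn": "arn:aws:sagemaker:us-east-1:111122223333:training-plan/ultraserver-plan",
    }],
)
```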

By default, SageMaker will optimize the placement of instance group nodes within the same UltraServer so that GPUs across nodes are interconnected within the same NVLink domain, achieving the best data transfer performance for your jobs. For example, if you purchase two ml.u-p6e-gb200x72 UltraServers with 17 compute nodes available each (assuming you configured two spares) and then create an instance group with 24 nodes, the first 17 compute nodes will be placed on UltraServer A, and the other 7 compute nodes will be placed on UltraServer B.

Conclusion

P6e-GB200 UltraServers help organizations train, fine-tune, and serve the world's most ambitious AI models at scale. By combining extraordinary GPU resources, ultrafast networking, and industry-leading memory with the automation and scalability of SageMaker HyperPod, enterprises can accelerate every phase of the AI lifecycle, from experimentation and distributed training through seamless inference and deployment. This powerful solution breaks new ground in performance and flexibility while reducing operational complexity and costs, so that innovators can unlock new possibilities and lead the next era of AI development.


About the author

Nathan Arnold is a Senior AI/ML Specialist Solutions Architect at AWS based out of Austin, Texas. He helps AWS customers, from small startups to large enterprises, train and deploy foundation models efficiently on AWS. When he's not working with customers, he enjoys hiking, trail running, and playing with his dogs.
