Imagine harnessing the power of 72 cutting-edge NVIDIA Blackwell GPUs in a single system for the next wave of AI innovation, unlocking 360 petaflops of dense 8-bit floating point (FP8) compute and 1.4 exaflops of sparse 4-bit floating point (FP4) compute. Today, that’s exactly what Amazon SageMaker HyperPod delivers with the launch of support for P6e-GB200 UltraServers. Accelerated by NVIDIA GB200 NVL72, P6e-GB200 UltraServers provide industry-leading GPU performance, network throughput, and memory for developing and deploying trillion-parameter AI models at scale. By seamlessly integrating these UltraServers with the distributed training environment of SageMaker HyperPod, organizations can rapidly scale model development, reduce downtime, and simplify the transition from training to large-scale deployment. With the automated, resilient, and highly scalable machine learning infrastructure of SageMaker HyperPod, organizations can seamlessly distribute massive AI workloads across thousands of accelerators and manage model development end-to-end with unprecedented efficiency. Using SageMaker HyperPod with P6e-GB200 UltraServers marks a pivotal shift toward faster, more resilient, and cost-effective training and deployment for state-of-the-art generative AI models.
In this post, we review the technical specifications of P6e-GB200 UltraServers, discuss their performance benefits, and highlight key use cases. We then walk through how to purchase UltraServer capacity through flexible training plans and get started using UltraServers with SageMaker HyperPod.
Inside the UltraServer
P6e-GB200 UltraServers are accelerated by NVIDIA GB200 NVL72, connecting 36 NVIDIA Grace™ CPUs and 72 Blackwell GPUs in the same NVIDIA NVLink™ domain. Each ml.p6e-gb200.36xlarge compute node within an UltraServer contains two NVIDIA GB200 Grace Blackwell Superchips, each connecting two high-performance NVIDIA Blackwell GPUs and an Arm-based NVIDIA Grace CPU with the NVIDIA NVLink chip-to-chip (C2C) interconnect. SageMaker HyperPod is launching P6e-GB200 UltraServers in two sizes. The ml.u-p6e-gb200x36 UltraServer includes a rack of 9 compute nodes fully connected with NVSwitch (NVS), providing a total of 36 Blackwell GPUs in the same NVLink domain, and the ml.u-p6e-gb200x72 UltraServer includes a rack-pair of 18 compute nodes with a total of 72 Blackwell GPUs in the same NVLink domain. The following diagram illustrates this configuration.
Performance benefits of UltraServers
In this section, we discuss some of the performance benefits of UltraServers.
GPU and compute power
With P6e-GB200 UltraServers, you can access up to 72 NVIDIA Blackwell GPUs within a single NVLink domain, with a total of 360 petaflops of FP8 compute (without sparsity), 1.4 exaflops of FP4 compute (with sparsity), and 13.4 TB of high-bandwidth memory (HBM3e). Each Grace Blackwell Superchip pairs two Blackwell GPUs with one Grace CPU through the NVLink-C2C interconnect, delivering 10 petaflops of dense FP8 compute, 40 petaflops of sparse FP4 compute, up to 372 GB of HBM3e, and 850 GB of cache-coherent fast memory per module. This co-location boosts bandwidth between GPU and CPU by an order of magnitude compared to previous-generation instances. Each NVIDIA Blackwell GPU features a second-generation Transformer Engine and supports the latest AI precision microscaling (MX) data formats such as MXFP6 and MXFP4, as well as NVIDIA NVFP4. When combined with frameworks like NVIDIA Dynamo, NVIDIA TensorRT-LLM, and NVIDIA NeMo, these Transformer Engines significantly accelerate inference and training for large language models (LLMs) and Mixture-of-Experts (MoE) models, supporting higher efficiency and performance for modern AI workloads.
High-performance networking
P6e-GB200 UltraServers deliver up to 130 TBps of low-latency NVLink bandwidth between GPUs for efficient large-scale AI workload communication. At double the bandwidth of its predecessor, fifth-generation NVIDIA NVLink provides up to 1.8 TBps of bidirectional, direct GPU-to-GPU interconnect, vastly improving intra-server communication. Each compute node within an UltraServer can be configured with up to 17 physical network interface cards (NICs), each supporting up to 400 Gbps of bandwidth. P6e-GB200 UltraServers provide up to 28.8 Tbps of total Elastic Fabric Adapter (EFA) v4 networking, using the Scalable Reliable Datagram (SRD) protocol to intelligently route network traffic across multiple paths, providing smooth operation even during congestion or hardware failures. For more information, refer to EFA configuration for P6e-GB200 instances.
Storage and data throughput
P6e-GB200 UltraServers support up to 405 TB of local NVMe SSD storage, ideal for large-scale datasets and fast checkpointing during AI model training. For high-performance shared storage, Amazon FSx for Lustre file systems can be accessed over EFA with GPUDirect Storage (GDS), providing direct data transfer between the file system and GPU memory with TBps of throughput and millions of input/output operations per second (IOPS) for demanding AI training and inference workloads.
Topology-aware scheduling
Amazon Elastic Compute Cloud (Amazon EC2) provides topology information that describes the physical and network relationships between instances in your cluster. For UltraServer compute nodes, Amazon EC2 exposes which instances belong to the same UltraServer, so your training and inference algorithms can understand NVLink connectivity patterns. This topology information helps optimize distributed training by allowing frameworks like the NVIDIA Collective Communications Library (NCCL) to make intelligent decisions about communication patterns and data placement. For more information, see How Amazon EC2 instance topology works.
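You can query this topology information programmatically. The following is a minimal sketch using boto3; the instance IDs are placeholders for your own cluster nodes:

```python
import boto3
from collections import defaultdict

# Minimal sketch: query EC2 instance topology and group instances by their
# closest shared network node. Instance IDs below are placeholders.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.describe_instance_topology(
    InstanceIds=["i-0123456789abcdef0", "i-0fedcba9876543210"]
)

groups = defaultdict(list)
for instance in response["Instances"]:
    # The last entry in NetworkNodes is the network node closest to the
    # instance; instances that share it are the most tightly connected.
    groups[instance["NetworkNodes"][-1]].append(instance["InstanceId"])

for network_node, instance_ids in sorted(groups.items()):
    print(f"{network_node}: {instance_ids}")
```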
With Amazon Elastic Kubernetes Service (Amazon EKS) orchestration, SageMaker HyperPod automatically labels UltraServer compute nodes with their respective AWS Region, Availability Zone, network node layers (1–4), and UltraServer ID. These topology labels can be used with node affinities and pod topology spread constraints to assign pods to cluster nodes for optimal performance, as shown in the sketch that follows.
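As one illustration, the following sketch uses the Kubernetes Python client to group cluster nodes by their UltraServer ID. The label key is an assumption for illustration purposes; check the actual labels on your nodes (for example, with kubectl get nodes --show-labels):

```python
from collections import defaultdict
from kubernetes import client, config

# Hypothetical label key used for illustration; verify the real key on your
# HyperPod EKS cluster nodes before relying on it.
ULTRASERVER_LABEL = "sagemaker.amazonaws.com/ultraserver-id"

config.load_kube_config()
nodes = client.CoreV1Api().list_node().items

# Group node names by UltraServer so jobs can be pinned to one NVLink domain.
by_ultraserver = defaultdict(list)
for node in nodes:
    ultraserver_id = (node.metadata.labels or {}).get(ULTRASERVER_LABEL)
    if ultraserver_id:
        by_ultraserver[ultraserver_id].append(node.metadata.name)

for ultraserver_id, node_names in by_ultraserver.items():
    print(f"{ultraserver_id}: {len(node_names)} nodes -> {node_names}")
```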
With Slurm orchestration, SageMaker HyperPod automatically enables the topology plugin and creates a topology.conf file with the respective BlockName, Nodes, and BlockSizes entries to match your UltraServer capacity. This way, you can group and segment your compute nodes to optimize job performance.
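For reference, a generated topology.conf for two 18-node UltraServers might look like the following sketch; the block and node names are illustrative, and SageMaker HyperPod writes this file for you:

```
# Illustrative Slurm block topology for two 18-node UltraServers
BlockName=ultraserver1 Nodes=ip-10-1-0-[1-18]
BlockName=ultraserver2 Nodes=ip-10-1-1-[1-18]
BlockSizes=18
```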
Use cases for UltraServers
P6e-GB200 UltraServers can efficiently train models with over a trillion parameters thanks to their unified NVLink domain, ultrafast memory, and high cross-node bandwidth, making them ideal for state-of-the-art AI development. The substantial interconnect bandwidth means even extremely large models can be partitioned and trained in a highly parallel and efficient manner, without the performance setbacks seen in disjointed multi-node systems. This results in faster iteration cycles and higher-quality AI models, helping organizations push the boundaries of state-of-the-art AI research and innovation.
For real-time trillion-parameter model inference, P6e-GB200 UltraServers enable 30 times faster inference on frontier trillion-parameter LLMs compared to prior platforms, achieving real-time performance for complex models used in generative AI, natural language understanding, and conversational agents. When paired with NVIDIA Dynamo, P6e-GB200 UltraServers deliver significant performance gains, especially for long context lengths. NVIDIA Dynamo disaggregates the compute-heavy prefill phase and the memory-heavy decode phase onto different GPUs, supporting independent optimization and resource allocation within the large 72-GPU NVLink domain. This enables more efficient management of large context windows and high-concurrency applications.
P6e-GB200 UltraServers offer substantial benefits to startup, research, and enterprise customers with multiple teams that need to run diverse distributed training and inference workloads on shared infrastructure. When used in conjunction with SageMaker HyperPod task governance, UltraServers provide exceptional scalability and resource pooling, so different teams can launch simultaneous jobs without bottlenecks. Enterprises can maximize infrastructure utilization, reduce overall costs, and accelerate project timelines, all while supporting the complex needs of teams developing and serving advanced AI models, including massive LLMs for high-concurrency real-time inference, on a single, resilient platform.
Flexible training plans for UltraServer capacity
SageMaker AI currently offers P6e-GB200 UltraServer capacity through flexible training plans in the Dallas AWS Local Zone (us-east-1-dfw-2a). UltraServers can be used for both SageMaker HyperPod and SageMaker training jobs.
To get started, navigate to the SageMaker AI training plans console, which includes a new UltraServer compute type, from which you can select your UltraServer type: ml.u-p6e-gb200x36 (containing 9 ml.p6e-gb200.36xlarge compute nodes) or ml.u-p6e-gb200x72 (containing 18 ml.p6e-gb200.36xlarge compute nodes).

After finding a training plan that matches your needs, we recommend configuring at least one spare ml.p6e-gb200.36xlarge compute node so that faulty instances can be quickly replaced with minimal disruption.
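The console workflow can also be scripted. The following is a minimal boto3 sketch of searching for and reserving capacity with training plans; the parameter values are illustrative, and UltraServer-specific options may be expressed differently in the current API, so check the SageMaker API reference:

```python
import boto3
from datetime import datetime, timedelta, timezone

# Minimal sketch: search for training plan offerings and reserve one.
# Values below are illustrative placeholders.
sagemaker = boto3.client("sagemaker", region_name="us-east-1")

offerings = sagemaker.search_training_plan_offerings(
    InstanceType="ml.p6e-gb200.36xlarge",  # UltraServer compute node type
    InstanceCount=18,                      # one ml.u-p6e-gb200x72 rack-pair
    StartTimeAfter=datetime.now(timezone.utc),
    EndTimeBefore=datetime.now(timezone.utc) + timedelta(days=30),
    DurationHours=168,                     # one week of reserved capacity
    TargetResources=["hyperpod-cluster"],
)

# Reserve the first matching offering.
offering_id = offerings["TrainingPlanOfferings"][0]["TrainingPlanOfferingId"]
plan = sagemaker.create_training_plan(
    TrainingPlanName="ultraserver-plan",
    TrainingPlanOfferingId=offering_id,
)
print(plan["TrainingPlanArn"])
```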

Create an UltraServer cluster with SageMaker HyperPod
After purchasing an UltraServer training plan, you can add the capacity to an ml.p6e-gb200.36xlarge type instance group within your SageMaker HyperPod cluster and specify the number of instances that you want to provision, up to the amount available within the training plan. For example, if you purchased a training plan for one ml.u-p6e-gb200x36 UltraServer, you could provision up to 9 compute nodes, whereas if you purchased a training plan for one ml.u-p6e-gb200x72 UltraServer, you could provision up to 18 compute nodes.

By default, SageMaker optimizes the placement of instance group nodes within the same UltraServer so that GPUs across nodes are interconnected within the same NVLink domain to achieve the best data transfer performance for your jobs. For example, if you purchase two ml.u-p6e-gb200x72 UltraServers with 17 compute nodes available on each (assuming you configured two spares, one per UltraServer) and then create an instance group with 24 nodes, the first 17 compute nodes will be placed on UltraServer A, and the other 7 compute nodes will be placed on UltraServer B.
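The following is a minimal boto3 sketch of creating a HyperPod cluster with an UltraServer instance group backed by a training plan; the cluster name, ARNs, and lifecycle script locations are placeholders for your own resources:

```python
import boto3

# Minimal sketch: create a HyperPod cluster whose instance group draws
# capacity from a purchased training plan. All identifiers are placeholders.
sagemaker = boto3.client("sagemaker", region_name="us-east-1")

response = sagemaker.create_cluster(
    ClusterName="ultraserver-cluster",
    InstanceGroups=[
        {
            "InstanceGroupName": "ultraserver-group",
            "InstanceType": "ml.p6e-gb200.36xlarge",
            "InstanceCount": 24,  # up to the capacity available in the plan
            "TrainingPlanArn": (
                "arn:aws:sagemaker:us-east-1:123456789012:"
                "training-plan/ultraserver-plan"
            ),
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://my-bucket/lifecycle-scripts/",
                "OnCreate": "on_create.sh",
            },
            "ExecutionRole": "arn:aws:iam::123456789012:role/HyperPodExecutionRole",
        }
    ],
)
print(response["ClusterArn"])
```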
Conclusion
P6e-GB200 UltraServers help organizations train, fine-tune, and serve the world’s most ambitious AI models at scale. By combining extraordinary GPU resources, ultrafast networking, and industry-leading memory with the automation and scalability of SageMaker HyperPod, enterprises can accelerate every phase of the AI lifecycle, from experimentation and distributed training through seamless inference and deployment. This powerful solution breaks new ground in performance and flexibility while reducing operational complexity and costs, so that innovators can unlock new possibilities and lead the next era of AI advancement.
About the authors
Nathan Arnold is a Senior AI/ML Specialist Solutions Architect at AWS based out of Austin, Texas. He helps AWS customers, from small startups to large enterprises, train and deploy foundation models efficiently on AWS. When he’s not working with customers, he enjoys hiking, trail running, and playing with his dogs.

