Today, we're excited to announce that Amazon SageMaker HyperPod now supports managed node automatic scaling with Karpenter, so you can efficiently scale your SageMaker HyperPod clusters to meet your inference and training demands. Real-time inference workloads require automatic scaling to handle unpredictable traffic patterns and maintain service level agreements (SLAs). As demand spikes, organizations must rapidly adapt their GPU compute without compromising response times or cost-efficiency. Unlike self-managed Karpenter deployments, this service-managed solution removes the operational overhead of installing, configuring, and maintaining Karpenter controllers, while providing tighter integration with the resilience capabilities of SageMaker HyperPod. This managed approach supports scale to zero, reducing the need for dedicated compute resources to run the Karpenter controller itself and improving cost-efficiency.
SageMaker HyperPod offers resilient, high-performance infrastructure, observability, and tooling optimized for large-scale model training and deployment. Companies like Perplexity, HippocraticAI, H.AI, and Articul8 are already using SageMaker HyperPod for training and deploying models. As more customers transition from training foundation models (FMs) to running inference at scale, they need the ability to automatically scale their GPU nodes to handle real production traffic, scaling up during high demand and scaling down during periods of lower utilization. This capability requires a robust cluster autoscaler. Karpenter, an open source Kubernetes node lifecycle manager created by AWS, is a popular choice among Kubernetes users for cluster autoscaling because of its powerful capabilities that optimize scaling times and reduce costs.
This launch provides a managed Karpenter-based solution for automatic scaling that is installed and maintained by SageMaker HyperPod, removing the undifferentiated heavy lifting of setup and management from customers. The feature is available for SageMaker HyperPod EKS clusters, and you can enable auto scaling to transform your SageMaker HyperPod cluster from static capacity to a dynamic, cost-optimized infrastructure that scales with demand. This combines Karpenter's proven node lifecycle management with the purpose-built, resilient infrastructure of SageMaker HyperPod, designed for large-scale machine learning (ML) workloads. In this post, we dive into the benefits of Karpenter and provide details on enabling and configuring Karpenter in your SageMaker HyperPod EKS clusters.
New features and benefits
Karpenter-based auto scaling in your SageMaker HyperPod clusters provides the following capabilities:
- Service-managed lifecycle – SageMaker HyperPod handles Karpenter installation, updates, and maintenance, removing operational overhead
- Just-in-time provisioning – Karpenter observes your pending pods and provisions the required compute for your workloads from an on-demand pool
- Scale to zero – You can scale down to zero nodes without maintaining dedicated controller infrastructure
- Workload-aware node selection – Karpenter chooses optimal instance types based on pod requirements, Availability Zones, and pricing to minimize costs
- Automatic node consolidation – Karpenter continuously evaluates clusters for optimization opportunities, moving workloads to avoid underutilized nodes
- Built-in resilience – Karpenter uses the built-in fault tolerance and node recovery mechanisms of SageMaker HyperPod
These capabilities are built on top of the recently launched continuous provisioning capability, which enables SageMaker HyperPod to automatically provision remaining capacity in the background while workloads start immediately on available instances. When node provisioning encounters failures due to capacity constraints or other issues, SageMaker HyperPod automatically retries in the background until clusters reach their desired scale, so your auto scaling operations remain resilient and non-blocking.
Solution overview
The following diagram illustrates the solution architecture.
Karpenter works as a controller in the cluster and operates in the following steps:
- Watching – Karpenter watches for unschedulable pods in the cluster through the Kubernetes API server. These could be pods that go into a pending state when deployed or when automatically scaled to increase the replica count.
- Evaluating – When Karpenter finds such pods, it computes the shape and size of a NodeClaim to fit the pods' requirements (GPU, CPU, memory) and topology constraints, and checks if it can pair them with an existing NodePool. For each NodePool, it queries the SageMaker HyperPod APIs to get the instance types supported by the NodePool. It uses the instance type metadata (hardware requirements, zone, capacity type) to find a matching NodePool.
- Provisioning – If Karpenter finds a matching NodePool, it creates a NodeClaim and tries to provision a new instance to be used as the new node. Karpenter internally uses the sagemaker:UpdateCluster API to increase the capacity of the chosen instance group.
- Disrupting – Karpenter periodically checks whether a node is still needed. If it's not, Karpenter deletes it, which internally translates to a delete node request to the SageMaker HyperPod cluster.
Prerequisites
Verify that you have the required quotas for the instances you will create in the SageMaker HyperPod cluster. To review your quotas, on the Service Quotas console, choose AWS services in the navigation pane, then choose SageMaker. For example, the following screenshot shows the available quota for g5.12xlarge instances (three).

To update the cluster, you must first create AWS Identity and Access Management (IAM) permissions for Karpenter. For instructions, see Create an IAM role for HyperPod autoscaling with Karpenter.
Create and configure a SageMaker HyperPod cluster
To begin, launch and configure your SageMaker HyperPod EKS cluster and verify that continuous provisioning mode is enabled on cluster creation. Complete the following steps:
- On the SageMaker AI console, choose HyperPod clusters in the navigation pane.
- Choose Create HyperPod cluster and Orchestrated on Amazon EKS.
- For Setup options, select Custom setup.
- For Name, enter a name.
- For Instance recovery, select Automatic.
- For Instance provisioning mode, select Use continuous provisioning.
- Choose Submit.

This setup creates the necessary configuration such as the virtual private cloud (VPC), subnets, security groups, and EKS cluster, and installs operators in the cluster. You can also provide existing resources such as an EKS cluster if you want to use an existing cluster instead of creating a new one. This setup takes around 20 minutes.
Verify that each InstanceGroup is limited to one zone by choosing OverrideVpcConfig and selecting only one subnet per InstanceGroup.

After you create the cluster, you must update it to enable Karpenter. You can do this with Boto3 or the AWS Command Line Interface (AWS CLI) using the UpdateCluster API (after configuring the AWS CLI to connect to your AWS account).
The following code uses Python Boto3:
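Because the original snippet isn't reproduced here, the following is a minimal sketch. The cluster name and Region are placeholders, and the shape of the `AutoScaling` parameter is our reading of the HyperPod autoscaling documentation, so verify it against the current UpdateCluster API reference:

```python
import boto3

# Placeholder Region and cluster name -- replace with your own values
sagemaker_client = boto3.client("sagemaker", region_name="us-west-2")

# Enable the managed Karpenter autoscaler on an existing HyperPod cluster.
# The AutoScaling parameter shape is an assumption based on the HyperPod
# autoscaling docs; confirm it against the UpdateCluster API reference.
response = sagemaker_client.update_cluster(
    ClusterName="my-hyperpod-cluster",
    AutoScaling={
        "Mode": "Enable",
        "AutoScalerType": "Karpenter",
    },
)
print(response["ClusterArn"])
```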
After you run this command and the cluster is updated, you can verify that Karpenter has been enabled by calling the DescribeCluster API.
The following code uses Python:
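The original code is missing here; a hypothetical equivalent follows. The cluster name is a placeholder, and the `AutoScaling` response field is an assumption based on the feature documentation:

```python
import boto3

sagemaker_client = boto3.client("sagemaker", region_name="us-west-2")

# Check whether Karpenter-based autoscaling is active on the cluster.
# "AutoScaling" as a top-level response field is an assumption; verify
# the exact field names in the DescribeCluster API reference.
response = sagemaker_client.describe_cluster(ClusterName="my-hyperpod-cluster")
print(response.get("AutoScaling"))
```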
The following code uses the AWS CLI:
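A hypothetical equivalent of the missing CLI snippet; the cluster name is a placeholder and the `AutoScaling` output field is an assumption to verify against the API reference:

```shell
# Placeholder cluster name; verify the AutoScaling output field name
# against the describe-cluster CLI reference.
aws sagemaker describe-cluster \
  --cluster-name my-hyperpod-cluster \
  --query 'AutoScaling'
```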
The following code shows our output:
Now you have a working cluster. The next step is to set up some custom resources in your cluster for Karpenter.
Create HyperpodNodeClass
HyperpodNodeClass is a custom resource that maps to pre-created instance groups in SageMaker HyperPod, defining constraints around which instance types and Availability Zones are supported for Karpenter's auto scaling decisions. To use HyperpodNodeClass, specify the names of the InstanceGroups of your SageMaker HyperPod cluster that you want to use as the source of the AWS compute resources that scale up the pods in your NodePools.
The HyperpodNodeClass name that you use here is carried over to the NodePool in the next section, where you reference it. This tells the NodePool which HyperpodNodeClass to draw resources from. To create a HyperpodNodeClass, complete the following steps:
- Create a YAML file (for example, nodeclass.yaml) similar to the following code. Add the InstanceGroup names that you used at the time of SageMaker HyperPod cluster creation. You can also add new instance groups to an existing SageMaker HyperPod EKS cluster.
- Reference the HyperpodNodeClass name in your NodePool configuration.
The following is a sample HyperpodNodeClass that uses the ml.g6.xlarge and ml.g6.4xlarge instance types:
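The original manifest isn't reproduced here; the following is a hypothetical sketch. The resource name and instance group names are placeholders, and the `apiVersion` and field names should be checked against the HyperPod autoscaling documentation before applying:

```yaml
# Hypothetical sketch -- verify apiVersion and schema against the
# HyperPod autoscaling documentation.
apiVersion: karpenter.sagemaker.amazonaws.com/v1
kind: HyperpodNodeClass
metadata:
  name: my-nodeclass            # referenced later by the NodePool
spec:
  instanceGroups:               # existing HyperPod instance group names
    - g6-xlarge-group           # group of ml.g6.xlarge instances
    - g6-4xlarge-group          # group of ml.g6.4xlarge instances
```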
- Apply the configuration to your EKS cluster using kubectl:
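Assuming the file name from the earlier step, the command is:

```shell
kubectl apply -f nodeclass.yaml
```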
- Monitor the HyperpodNodeClass status to verify the Ready condition in the status is set to True, ensuring it was successfully created:
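One way to check the condition, assuming the resource name from our earlier sketch (the short name of the custom resource may differ on your cluster):

```shell
# Inspect status.conditions and confirm Ready=True
kubectl get hyperpodnodeclass my-nodeclass -o yaml
```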
The SageMaker HyperPod cluster must have AutoScaling enabled, and the AutoScaling status must change to InService, before the HyperpodNodeClass can be applied.
For more information and key considerations, see Autoscaling on SageMaker HyperPod EKS.
Create NodePool
The NodePool sets constraints on the nodes that can be created by Karpenter and the pods that can run on those nodes. The NodePool can be set to perform various actions, such as:
- Define labels and taints to limit the pods that can run on nodes Karpenter creates
- Limit node creation to certain zones, instance types, computer architectures, and so on
For more information about NodePool, refer to NodePools. SageMaker HyperPod managed Karpenter supports a limited set of well-known Kubernetes and Karpenter requirements, which we explain in this post.
To create a NodePool, complete the following steps:
- Create a YAML file named nodepool.yaml with your desired NodePool configuration.
The following code is a sample configuration that creates a sample NodePool. We specify the NodePool to include our ml.g6.xlarge SageMaker instance type, and we additionally limit it to one zone. Refer to NodePools for more customizations.
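The original manifest is missing; the following is a sketch of a Karpenter v1 NodePool pinned to one instance type and zone. The `nodeClassRef` group and kind for HyperPod, the zone value, and the resource names are assumptions to verify against the HyperPod autoscaling documentation:

```yaml
# Hypothetical sketch -- verify the nodeClassRef group/kind and the
# supported requirement keys against the HyperPod autoscaling docs.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: g6-nodepool
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.sagemaker.amazonaws.com
        kind: HyperpodNodeClass
        name: my-nodeclass      # must match the HyperpodNodeClass created earlier
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["ml.g6.xlarge"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-west-2a"]   # placeholder Availability Zone
```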
- Apply the NodePool to your cluster:
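Assuming the file name from the previous step:

```shell
kubectl apply -f nodepool.yaml
```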
- Monitor the NodePool status to ensure the Ready condition in the status is set to True:
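One way to check, assuming the NodePool name from our earlier sketch:

```shell
# Inspect status.conditions and confirm Ready=True
kubectl get nodepool g6-nodepool -o yaml
```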
This example shows how a NodePool can be used to specify the hardware (instance type) and placement (Availability Zone) for pods.
Launch a simple workload
The following workload runs a Kubernetes deployment where the pods in the deployment request 1 CPU and 256 MB of memory per replica, per pod. The pods haven't been spun up yet.
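The manifest itself is missing; a minimal stand-in matching the description (1 CPU and 256 MB per replica) could look like the following, with the deployment name and container image as placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-workload            # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cpu-workload
  template:
    metadata:
      labels:
        app: cpu-workload
    spec:
      containers:
        - name: app
          image: public.ecr.aws/docker/library/busybox:1.36  # placeholder image
          command: ["sleep", "infinity"]
          resources:
            requests:
              cpu: "1"          # 1 CPU per replica
              memory: 256Mi     # 256 MB per replica
```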
When we apply this, we can see a deployment and a single node launch in our cluster, as shown in the following screenshot.

To scale this component, use the following command:
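Assuming the placeholder deployment name used above; the replica count is arbitrary:

```shell
kubectl scale deployment cpu-workload --replicas=5
```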
Within a few minutes, we can see Karpenter add the requested nodes to the cluster.

Implement advanced auto scaling for inference with KEDA and Karpenter
To implement an end-to-end auto scaling solution on SageMaker HyperPod, you can set up Kubernetes Event-driven Autoscaling (KEDA) together with Karpenter. KEDA enables pod-level auto scaling based on a wide range of metrics, including Amazon CloudWatch metrics, Amazon Simple Queue Service (Amazon SQS) queue lengths, Prometheus queries, and resource utilization patterns. By configuring KEDA ScaledObject resources to target your model deployments, KEDA can dynamically adjust the number of inference pods based on real-time demand signals.
When you integrate KEDA and Karpenter, the combination creates a powerful two-tier auto scaling architecture. As KEDA scales your pods up or down based on workload metrics, Karpenter automatically provisions or deletes nodes in response to changing resource requirements. This integration delivers optimal performance while controlling costs by making sure your cluster has precisely the right amount of compute resources available at all times. For effective implementation, consider the following key factors:
- Set appropriate buffer thresholds in KEDA to accommodate Karpenter's node provisioning time
- Configure cooldown periods carefully to prevent scaling oscillations
- Define clear resource requests and limits to help Karpenter make optimal node selections
- Create specialized NodePools tailored to specific workload characteristics
The following is a sample spec of a KEDA ScaledObject file that scales the number of pods based on the CloudWatch metric for Application Load Balancer (ALB) request count:
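The original spec is missing; the following sketch uses KEDA's `aws-cloudwatch` scaler against an ALB metric. The deployment name, load balancer dimension, Region, and thresholds are placeholders, and the scaler metadata keys should be verified against the KEDA documentation:

```yaml
# Sketch of a KEDA ScaledObject tracking ALB request count via CloudWatch.
# All names, dimensions, and thresholds are placeholders.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: inference-scaler
spec:
  scaleTargetRef:
    name: my-inference-deployment   # the deployment KEDA scales
  minReplicaCount: 1
  maxReplicaCount: 10
  cooldownPeriod: 300               # allow time for Karpenter to provision nodes
  triggers:
    - type: aws-cloudwatch
      metadata:
        namespace: AWS/ApplicationELB
        metricName: RequestCount
        dimensionName: LoadBalancer
        dimensionValue: app/my-alb/1234567890abcdef  # placeholder ALB
        metricStat: Sum
        targetMetricValue: "100"    # requests per pod before scaling out
        minMetricValue: "0"
        awsRegion: us-west-2
```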
Clean up
To clean up your resources and avoid incurring additional charges, delete your SageMaker HyperPod cluster.
Conclusion
With the launch of Karpenter node auto scaling on SageMaker HyperPod, ML workloads can automatically adapt to changing workload requirements, optimize resource utilization, and help control costs by scaling precisely when needed. You can also integrate it with event-driven pod autoscalers such as KEDA to scale based on custom metrics.
To experience these benefits for your ML workloads, enable Karpenter in your SageMaker HyperPod clusters. For detailed implementation guidance and best practices, refer to Autoscaling on SageMaker HyperPod EKS.
About the authors
Vivek Gangasani is a Worldwide Lead GenAI Specialist Solutions Architect for SageMaker Inference. He drives go-to-market (GTM) and outbound product strategy for SageMaker Inference. He also helps enterprises and startups deploy, manage, and scale their GenAI models with SageMaker and GPUs. Currently, he is focused on developing strategies and content for optimizing inference performance and GPU efficiency for hosting large language models. In his free time, Vivek enjoys hiking, watching movies, and trying different cuisines.
Adam Stanley is a Solutions Architect for Software, Internet, and Model Provider customers at Amazon Web Services (AWS). He helps customers adopt all AWS services, but focuses primarily on machine learning training and inference infrastructure. Prior to AWS, Adam attended the University of New South Wales and graduated with degrees in Mathematics and Accounting. You can connect with him on LinkedIn.
Kunal Jha is a Principal Product Manager at AWS, where he focuses on building Amazon SageMaker HyperPod to enable scalable distributed training and fine-tuning of foundation models. In his spare time, Kunal enjoys snowboarding and exploring the Pacific Northwest. You can connect with him on LinkedIn.
Ty Bergstrom is a Software Engineer at Amazon Web Services. He works on the HyperPod Clusters platform for Amazon SageMaker.

