    Train and deploy models on Amazon SageMaker HyperPod using the new HyperPod CLI and SDK

    By Oliver Chambers | September 3, 2025 | 30 Mins Read


    Training and deploying large AI models requires advanced distributed computing capabilities, but managing these distributed systems shouldn't be complex for data scientists and machine learning (ML) practitioners. The newly launched command line interface (CLI) and software development kit (SDK) for Amazon SageMaker HyperPod simplify how you can use the service's distributed training and inference capabilities.

    The SageMaker HyperPod CLI gives data scientists an intuitive command-line experience, abstracting away the underlying complexity of distributed systems. Built on top of the SageMaker HyperPod SDK, the CLI offers straightforward commands for common workflows like launching training or fine-tuning jobs, deploying inference endpoints, and monitoring cluster performance. This makes it ideal for quick experimentation and iteration.

    For more advanced use cases requiring fine-grained control, the SageMaker HyperPod SDK enables programmatic access to customize your ML workflows. Developers can use the SDK's Python interface to precisely configure training and deployment parameters while maintaining the simplicity of working with familiar Python objects.

    In this post, we demonstrate how to use both the CLI and SDK to train and deploy large language models (LLMs) on SageMaker HyperPod. We walk through practical examples of distributed training using Fully Sharded Data Parallel (FSDP) and model deployment for inference, showcasing how these tools streamline the development of production-ready generative AI applications.

    Prerequisites

    To follow the examples in this post, you must have the following prerequisites:

    Because the use cases that we demonstrate are about training and deploying LLMs with the SageMaker HyperPod CLI and SDK, you must also install the following Kubernetes operators in the cluster:

    Install the SageMaker HyperPod CLI

    First, you must install the latest version of the SageMaker HyperPod CLI and SDK (the examples in this post are based on version 3.1.0). From the local environment, run the following command (you can also install in a Python virtual environment):

    # Install the HyperPod CLI and SDK
    pip install sagemaker-hyperpod

    This command sets up the tools needed to interact with SageMaker HyperPod clusters. For an existing installation, make sure you have the latest version of the package installed (sagemaker-hyperpod>=3.1.0) to be able to use the relevant set of features. To verify that the CLI is installed correctly, you can run the hyp command and check the output:

    # Check if the HyperPod CLI is correctly installed
    hyp

    The output will be similar to the following, and includes instructions on how to use the CLI:

    Usage: hyp [OPTIONS] COMMAND [ARGS]...
    
    Options:
      --help  Show this message and exit.
    
    Commands:
      create               Create endpoints or pytorch jobs.
      delete               Delete endpoints or pytorch jobs.
      describe             Describe endpoints or pytorch jobs.
      get-cluster-context  Get context related to the current set cluster.
      get-logs             Get pod logs for endpoints or pytorch jobs.
      get-monitoring       Get monitoring configurations for Hyperpod cluster.
      get-operator-logs    Get operator logs for endpoints.
      invoke               Invoke model endpoints.
      list                 List endpoints or pytorch jobs.
      list-cluster         List SageMaker Hyperpod Clusters with metadata.
      list-pods            List pods for endpoints or pytorch jobs.
      set-cluster-context  Connect to a HyperPod EKS cluster.

    For more information on CLI usage and the available commands and their respective parameters, refer to the CLI reference documentation.

    Set the cluster context

    The SageMaker HyperPod CLI and SDK use the Kubernetes API to interact with the cluster. Therefore, make sure the underlying Kubernetes Python client is configured to execute API calls against your cluster by setting the cluster context.
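    As a quick illustration of what set-cluster-context manages, the following sketch (my own helper, not part of the CLI or SDK) reads the current-context entry back out of a kubeconfig file with a naive line-based parse:

```python
# Illustrative helper (not part of the HyperPod CLI): read the top-level
# 'current-context' entry from a kubeconfig file with a naive line parse.
import os
from typing import Optional

def read_current_context(kubeconfig_path: str) -> Optional[str]:
    """Return the top-level 'current-context' value, or None if absent."""
    with open(kubeconfig_path) as f:
        for line in f:
            if line.startswith("current-context:"):
                return line.split(":", 1)[1].strip()
    return None

# Usage (after running `hyp set-cluster-context`):
#   read_current_context(os.path.expanduser("~/.kube/config"))
```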

    Use the CLI to list the clusters available in your AWS account:

    # List all HyperPod clusters in your AWS account
    hyp list-cluster
    [
        {
            "Cluster": "ml-cluster",
            "Instances": [
                {
                    "InstanceType": "ml.g5.8xlarge",
                    "TotalNodes": 8,
                    "AcceleratorDevicesAvailable": 8,
                    "NodeHealthStatus=Schedulable": 8,
                    "DeepHealthCheckStatus=Passed": "N/A"
                },
                {
                    "InstanceType": "ml.m5.12xlarge",
                    "TotalNodes": 1,
                    "AcceleratorDevicesAvailable": "N/A",
                    "NodeHealthStatus=Schedulable": 1,
                    "DeepHealthCheckStatus=Passed": "N/A"
                }
            ]
        }
    ]

    Set the cluster context, specifying the cluster name as input (in our case, ml-cluster):

    # Set the cluster context for subsequent commands
    hyp set-cluster-context --cluster-name 

    Train models with the SageMaker HyperPod CLI and SDK

    The SageMaker HyperPod CLI provides a straightforward way to submit PyTorch model training and fine-tuning jobs to a SageMaker HyperPod cluster. In the following example, we schedule a Meta Llama 3.1 8B model training job with FSDP.

    The CLI executes training using the HyperPodPyTorchJob Kubernetes custom resource, which is implemented by the HyperPod training operator that must be installed in the cluster, as discussed in the prerequisites section.

    First, clone the awsome-distributed-training repository and create the Docker image that you will use for the training job:

    cd ~
    git clone https://github.com/aws-samples/awsome-distributed-training/
    cd awsome-distributed-training/3.test_cases/pytorch/FSDP

    Then, log in to the Amazon Elastic Container Registry (Amazon ECR) to pull the base image and build the new container:

    export AWS_REGION=$(aws ec2 describe-availability-zones --output text --query 'AvailabilityZones[0].[RegionName]')
    export ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
    export REGISTRY=${ACCOUNT}.dkr.ecr.${AWS_REGION}.amazonaws.com/
    docker build -f Dockerfile -t ${REGISTRY}fsdp:pytorch2.7.1 .

    The Dockerfile in the awsome-distributed-training repository referenced in the preceding code already contains the HyperPod elastic agent, which orchestrates the lifecycles of training workers in each container and communicates with the HyperPod training operator. If you're using a different Dockerfile, install the HyperPod elastic agent following the instructions in HyperPod elastic agent.

    Next, create a new registry for your training image if needed and push the built image to it:

    # Create registry if needed
    REGISTRY_COUNT=$(aws ecr describe-repositories | grep "fsdp" | wc -l)
    if [ "$REGISTRY_COUNT" -eq 0 ]; then
        aws ecr create-repository --repository-name fsdp
    fi
    
    # Log in to registry
    echo "Logging in to $REGISTRY ..."
    aws ecr get-login-password | docker login --username AWS --password-stdin $REGISTRY
    
    # Push image to registry
    docker image push ${REGISTRY}fsdp:pytorch2.7.1

    After you have successfully created the Docker image, you can submit the training job using the SageMaker HyperPod CLI.

    Internally, the SageMaker HyperPod CLI will use the Kubernetes Python client to build a HyperPodPyTorchJob custom resource and then create it on the Kubernetes cluster.

    You can modify the CLI command for other Meta Llama configurations by exchanging the --args for the desired arguments and values; examples can be found in the Kubernetes manifests in the awsome-distributed-training repository.
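    When scripting such variations, a small helper (hypothetical, not part of the CLI or SDK) can rewrite individual --key=value entries in the argument list before submission:

```python
# Hypothetical helper (not part of the HyperPod CLI): rewrite selected
# '--key=value' training arguments before passing them to `hyp create`.
from typing import Dict, List

def override_args(args: List[str], overrides: Dict[str, str]) -> List[str]:
    """Return a copy of `args` with any '--key=value' entries replaced
    according to `overrides` (keyed by '--key')."""
    result = []
    for arg in args:
        key = arg.split("=", 1)[0]
        result.append(f"{key}={overrides[key]}" if key in overrides else arg)
    return result

base_args = ["--max_context_width=8192", "--num_layers=32", "--max_steps=50"]
# Example: run the same configuration for more steps
print(override_args(base_args, {"--max_steps": "100"}))
# → ['--max_context_width=8192', '--num_layers=32', '--max_steps=100']
```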

    In the given configuration, the training job will write checkpoints to /fsx/checkpoints on the FSx for Lustre PVC.

    hyp create hyp-pytorch-job \
        --job-name fsdp-llama3-1-8b \
        --image ${REGISTRY}fsdp:pytorch2.7.1 \
        --command '[
            hyperpodrun,
            --tee=3,
            --log_dir=/tmp/hyperpod,
            --nproc_per_node=1,
            --nnodes=8,
            /fsdp/train.py
        ]' \
        --args '[
            --max_context_width=8192,
            --num_key_value_heads=8,
            --intermediate_size=14336,
            --hidden_width=4096,
            --num_layers=32,
            --num_heads=32,
            --model_type=llama_v3,
            --tokenizer=hf-internal-testing/llama-tokenizer,
            --checkpoint_freq=50,
            --validation_freq=25,
            --max_steps=50,
            --checkpoint_dir=/fsx/checkpoints,
            --dataset=allenai/c4,
            --dataset_config_name=en,
            --resume_from_checkpoint=/fsx/checkpoints,
            --train_batch_size=1,
            --val_batch_size=1,
            --sharding_strategy=full,
            --offload_activations=1
        ]' \
        --environment '{"PYTORCH_CUDA_ALLOC_CONF": "max_split_size_mb:32"}' \
        --pull-policy "IfNotPresent" \
        --instance-type ml.g5.8xlarge \
        --node-count 8 \
        --tasks-per-node 1 \
        --deep-health-check-passed-nodes-only false \
        --max-retry 3 \
        --volume name=shmem,type=hostPath,mount_path=/dev/shm,path=/dev/shm,read_only=false \
        --volume name=fsx,type=pvc,mount_path=/fsx,claim_name=fsx-claim,read_only=false

    The hyp create hyp-pytorch-job command supports additional arguments, which can be discovered by running the following:

    hyp create hyp-pytorch-job --help

    The preceding example code contains the following relevant arguments:

    • --command and --args offer flexibility in setting the command to be executed in the container. The command executed is hyperpodrun, implemented by the HyperPod elastic agent that is installed in the training container. The HyperPod elastic agent extends PyTorch's ElasticAgent and manages the communication of the various workers with the HyperPod training operator. For more information, refer to HyperPod elastic agent.
    • --environment defines environment variables and customizes the training execution.
    • --max-retry indicates the maximum number of restarts at the process level that will be attempted by the HyperPod training operator. For more information, refer to Using the training operator to run jobs.
    • --volume is used to map persistent or ephemeral volumes to the container.
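    For illustration, the comma-separated key=value syntax accepted by --volume can be parsed into a plain dict; the following sketch (my own parsing, assuming exactly this syntax) shows how the flags in the example map onto volume attributes:

```python
# Sketch: parse a --volume value ("key=value,...") into a dict, mirroring
# how the flags in the example above map onto a Kubernetes volume spec.
def parse_volume_flag(value: str) -> dict:
    return dict(part.split("=", 1) for part in value.split(","))

vol = parse_volume_flag(
    "name=fsx,type=pvc,mount_path=/fsx,claim_name=fsx-claim,read_only=false"
)
print(vol["type"], vol["mount_path"])  # → pvc /fsx
```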

    If successful, the command will output the following:

    Using version: 1.0
    2025-08-12 10:03:03,270 - sagemaker.hyperpod.training.hyperpod_pytorch_job - INFO - Successfully submitted HyperPodPytorchJob 'fsdp-llama3-1-8b'!

    You can observe the status of the training job through the CLI. Running hyp list hyp-pytorch-job will show the status first as Created and then as Running after the containers have been started:

    NAME                          NAMESPACE           STATUS         AGE            
    --------------------------------------------------------------------------------
    fsdp-llama3-1-8b              default             Running        6m        

    To list the pods that are created by this training job, run the following command:

    hyp list-pods hyp-pytorch-job --job-name fsdp-llama3-1-8b
    Pods for job: fsdp-llama3-1-8b
    
    POD NAME                                          NAMESPACE           
    ----------------------------------------------------------------------
    fsdp-llama3-1-8b-pod-0                            default             
    fsdp-llama3-1-8b-pod-1                            default             
    fsdp-llama3-1-8b-pod-2                            default         
    fsdp-llama3-1-8b-pod-3                            default         
    fsdp-llama3-1-8b-pod-4                            default         
    fsdp-llama3-1-8b-pod-5                            default         
    fsdp-llama3-1-8b-pod-6                            default        
    fsdp-llama3-1-8b-pod-7                            default          

    You can observe the logs of one of the training pods that get spawned by running the following command:

    hyp get-logs hyp-pytorch-job --pod-name fsdp-llama3-1-8b-pod-0 \
    --job-name fsdp-llama3-1-8b
    ...
    2025-08-12T14:59:25.069208138Z [HyperPodElasticAgent] 2025-08-12 14:59:25,069 [INFO] [rank0-restart0] /usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/agent/server/api.py:685: [default] Starting worker group 
    2025-08-12T14:59:25.069301320Z [HyperPodElasticAgent] 2025-08-12 14:59:25,069 [INFO] [rank0-restart0] /usr/local/lib/python3.10/dist-packages/hyperpod_elastic_agent/hyperpod_elastic_agent.py:221: Starting workers with worker spec worker_group.spec=WorkerSpec(role="default", local_world_size=1, rdzv_handler=, fn=None, entrypoint="/usr/bin/python3", args=('-u', '/fsdp/train.py', '--max_context_width=8192', '--num_key_value_heads=8', '--intermediate_size=14336', '--hidden_width=4096', '--num_layers=32', '--num_heads=32', '--model_type=llama_v3', '--tokenizer=hf-internal-testing/llama-tokenizer', '--checkpoint_freq=50', '--validation_freq=50', '--max_steps=100', '--checkpoint_dir=/fsx/checkpoints', '--dataset=allenai/c4', '--dataset_config_name=en', '--resume_from_checkpoint=/fsx/checkpoints', '--train_batch_size=1', '--val_batch_size=1', '--sharding_strategy=full', '--offload_activations=1'), max_restarts=3, monitor_interval=0.1, master_port=None, master_addr=None, local_addr=None)... 
    2025-08-12T14:59:30.264195963Z [default0]:2025-08-12 14:59:29,968 [INFO] **main**: Creating Model 
    2025-08-12T15:00:51.203541576Z [default0]:2025-08-12 15:00:50,781 [INFO] **main**: Created model with total parameters: 7392727040 (7.39 B) 
    2025-08-12T15:01:18.139531830Z [default0]:2025-08-12 15:01:18 I [checkpoint.py:79] Loading checkpoint from /fsx/checkpoints/llama_v3-24steps ... 
    2025-08-12T15:01:18.833252603Z [default0]:2025-08-12 15:01:18,081 [INFO] **main**: Wrapped model with FSDP 
    2025-08-12T15:01:18.833290793Z [default0]:2025-08-12 15:01:18,093 [INFO] **main**: Created optimizer

    We elaborate on more advanced debugging and observability features at the end of this section.

    Alternatively, if you prefer a programmatic experience and more advanced customization options, you can submit the training job using the SageMaker HyperPod Python SDK. For more information, refer to the SDK reference documentation. The following code will yield the equivalent training job submission to the preceding CLI example:

    import os
    from sagemaker.hyperpod.training import HyperPodPytorchJob
    from sagemaker.hyperpod.training import ReplicaSpec, Template, VolumeMounts, Spec, Containers, Resources, RunPolicy, Volumes, HostPath, PersistentVolumeClaim
    from sagemaker.hyperpod.common.config import Metadata
    
    REGISTRY = os.environ['REGISTRY']
    
    # Define job specifications
    nproc_per_node = "1"  # Number of processes per node
    replica_specs = [
        ReplicaSpec(
            name = "pod",  # Replica name
            replicas = 8,
            template = Template(
                spec = Spec(
                    containers =
                    [
                        Containers(
                            # Container name
                            name="fsdp-training-container",
                            
                            # Training image
                            image=f"{REGISTRY}fsdp:pytorch2.7.1",
                            # Volume mounts
                            volume_mounts=[
                                VolumeMounts(
                                    name="fsx",
                                    mount_path="/fsx"
                                ),
                                VolumeMounts(
                                    name="shmem",
                                    mount_path="/dev/shm"
                                )
                            ],
                            env=[
                                    {"name": "PYTORCH_CUDA_ALLOC_CONF", "value": "max_split_size_mb:32"},
                                ],
                            
                            # Image pull policy
                            image_pull_policy="IfNotPresent",
                            resources=Resources(
                                requests={"nvidia.com/gpu": "1"},
                                limits={"nvidia.com/gpu": "1"},
                            ),
                            # Command to run
                            command=[
                                "hyperpodrun",
                                "--tee=3",
                                "--log_dir=/tmp/hyperpod",
                                "--nproc_per_node=1",
                                "--nnodes=8",
                                "/fsdp/train.py"
                            ],
                            # Script arguments
                            args = [
                                '--max_context_width=8192',
                                '--num_key_value_heads=8',
                                '--intermediate_size=14336',
                                '--hidden_width=4096',
                                '--num_layers=32',
                                '--num_heads=32',
                                '--model_type=llama_v3',
                                '--tokenizer=hf-internal-testing/llama-tokenizer',
                                '--checkpoint_freq=50',
                                '--validation_freq=25',
                                '--max_steps=50',
                                '--checkpoint_dir=/fsx/checkpoints',
                                '--dataset=allenai/c4',
                                '--dataset_config_name=en',
                                '--resume_from_checkpoint=/fsx/checkpoints',
                                '--train_batch_size=1',
                                '--val_batch_size=1',
                                '--sharding_strategy=full',
                                '--offload_activations=1'
                            ]
                        )
                    ],
                    volumes = [
                        Volumes(
                            name="fsx",
                            persistent_volume_claim=PersistentVolumeClaim(
                                claim_name="fsx-claim",
                                read_only=False
                            ),
                        ),
                        Volumes(
                            name="shmem",
                            host_path=HostPath(path="/dev/shm"),
                        )
                    ],
                    node_selector={
                        "node.kubernetes.io/instance-type": "ml.g5.8xlarge",
                    },
                )
            ),
        )
    ]
    run_policy = RunPolicy(clean_pod_policy="None", job_max_retry_count=3)
    # Create and start the PyTorch job
    pytorch_job = HyperPodPytorchJob(
        # Job name
        metadata = Metadata(
            name="fsdp-llama3-1-8b",
            namespace="default",
        ),
        # Processes per node
        nproc_per_node = nproc_per_node,
        # Replica specs
        replica_specs = replica_specs,
    )
    # Launch the job
    pytorch_job.create()

    Debugging training jobs

    In addition to monitoring the training pod logs as described earlier, there are several other helpful ways of debugging training jobs:

    • You can submit training jobs with an additional --debug True flag, which will print the Kubernetes YAML to the console when the job starts so it can be inspected.
    • You can view a list of existing training jobs by running hyp list hyp-pytorch-job.
    • You can view the status and corresponding events of the job by running hyp describe hyp-pytorch-job --job-name fsdp-llama3-1-8b.
    • If the HyperPod observability stack is deployed to the cluster, run hyp get-monitoring --grafana and hyp get-monitoring --prometheus to get the Grafana dashboard and Prometheus workspace URLs, respectively, to view cluster and job metrics.
    • To monitor GPU utilization or view directory contents, it can be helpful to execute commands or open an interactive shell in the pods. You can run commands in a pod by running, for example, kubectl exec -it  -- nvtop to run nvtop for visibility into GPU utilization. You can open an interactive shell by running kubectl exec -it  -- /bin/bash.
    • The logs of the HyperPod training operator controller pod contain helpful information about scheduling. To view them, run kubectl get pods -n aws-hyperpod | grep hp-training-controller-manager to find the controller pod name, then run kubectl logs -n aws-hyperpod  to view the corresponding logs.
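    When scanning long pod logs, it can also help to isolate a single rank's output; the following sketch (my own helper, assuming the [rankN-restartM] tag format shown in the elastic agent logs earlier) filters log lines by rank:

```python
# Hypothetical helper: filter pod log lines down to a single rank's output.
# Assumes the "[rankN-restartM]" tag format seen in the HyperPod elastic
# agent logs shown earlier in this post.
import re
from typing import Iterable, List

def lines_for_rank(log_lines: Iterable[str], rank: int) -> List[str]:
    pattern = re.compile(rf"\[rank{rank}-restart\d+\]")
    return [line for line in log_lines if pattern.search(line)]

logs = [
    "[INFO] [rank0-restart0] Starting worker group",
    "[INFO] [rank1-restart0] Starting worker group",
]
print(lines_for_rank(logs, 0))
# → ['[INFO] [rank0-restart0] Starting worker group']
```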

    Deploy models with the SageMaker HyperPod CLI and SDK

    The SageMaker HyperPod CLI provides commands to quickly deploy models to your SageMaker HyperPod cluster for inference. You can deploy both foundation models (FMs) available on Amazon SageMaker JumpStart and custom models with artifacts stored on Amazon S3 or FSx for Lustre file systems.

    This functionality automatically deploys the chosen model to the SageMaker HyperPod cluster through Kubernetes custom resources, which are implemented by the HyperPod inference operator that must be installed in the cluster, as discussed in the prerequisites section. It is optionally possible to automatically create a SageMaker inference endpoint as well as an Application Load Balancer (ALB), which can be used directly through HTTPS calls with a generated TLS certificate to invoke the model.
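    To sketch what such a direct HTTPS invocation through the ALB might look like, the request can be built with the Python standard library; the host name, the /invocations path, and the certificate file name below are illustrative assumptions, not values from this post:

```python
# Sketch only: build a POST request for a model behind the ALB. The host,
# the /invocations path, and the certificate file name are illustrative
# assumptions, not values taken from an actual deployment.
import json
import urllib.request

def build_invoke_request(alb_url: str, payload: dict) -> urllib.request.Request:
    """Construct the HTTPS POST request; sending it is left to the caller."""
    return urllib.request.Request(
        url=f"{alb_url}/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_invoke_request("https://my-alb.example.com", {"inputs": "Hello"})
# Sending would look like (trusting the generated TLS certificate):
#   ctx = ssl.create_default_context(cafile="alb-cert.pem")
#   with urllib.request.urlopen(req, context=ctx) as resp:
#       print(resp.read())
print(req.full_url, req.get_method())
```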

    Deploy SageMaker JumpStart models

    You can deploy an FM that is available on SageMaker JumpStart with the following command:

    hyp create hyp-jumpstart-endpoint \
      --model-id deepseek-llm-r1-distill-qwen-1-5b \
      --instance-type ml.g5.8xlarge \
      --endpoint-name  \
      --tls-certificate-output-s3-uri s3:/// \
      --namespace default

    The preceding code includes the following parameters:

    • --model-id is the model ID in the SageMaker JumpStart model hub. In this example, we deploy a DeepSeek R1-distilled version of Qwen 1.5B, which is available on SageMaker JumpStart.
    • --instance-type is the target instance type in your SageMaker HyperPod cluster where you want to deploy the model. This instance type must be supported by the chosen model.
    • --endpoint-name is the name that the SageMaker inference endpoint will have. This name must be unique. SageMaker inference endpoint creation is optional.
    • --tls-certificate-output-s3-uri is the S3 bucket location where the TLS certificate for the ALB will be stored. This can be used to directly invoke the model through HTTPS. You must use S3 buckets that are accessible by the HyperPod inference operator IAM role.
    • --namespace is the Kubernetes namespace the model will be deployed to. The default value is set to default.

    The CLI supports more advanced deployment configurations, including auto scaling, through additional parameters, which can be viewed by running the following command:

    hyp create hyp-jumpstart-endpoint --help

    If successful, the command will output the following:

    Creating JumpStart model and sagemaker endpoint. Endpoint name: deepseek-distill-qwen-endpoint-cli.
     The process may take a few minutes...

    After a few minutes, both the ALB and the SageMaker inference endpoint will be available, which can be observed through the CLI. Running hyp list hyp-jumpstart-endpoint will show the status first as DeploymentInProgress and then as DeploymentComplete when the endpoint is ready to be used:

    | name                               | namespace   | labels   | status             |
    |------------------------------------|-------------|----------|--------------------|
    | deepseek-distill-qwen-endpoint-cli | default     |          | DeploymentComplete |

    To get additional visibility into the deployment pod, run the following commands to find the pod name and view the corresponding logs:

    hyp list-pods hyp-jumpstart-endpoint --namespace 
    hyp get-logs hyp-jumpstart-endpoint --namespace  --pod-name 

    The output will look similar to the following:

    2025-08-12T15:53:14.042031963Z WARN  PyProcess W-195-model-stderr: Capturing CUDA graph shapes: 100%|??????????| 35/35 [00:18<00:00,  1.63it/s]
    2025-08-12T15:53:14.042257357Z WARN  PyProcess W-195-model-stderr: Capturing CUDA graph shapes: 100%|??????????| 35/35 [00:18<00:00,  1.94it/s]
    2025-08-12T15:53:14.042297298Z INFO  PyProcess W-195-model-stdout: INFO 08-12 15:53:14 llm_engine.py:436] init engine (profile, create kv cache, warmup model) took 26.18 seconds
    2025-08-12T15:53:15.215357997Z INFO  PyProcess Model [model] initialized.
    2025-08-12T15:53:15.219205375Z INFO  WorkerThread Starting worker thread WT-0001 for model model (M-0001, READY) on device gpu(0)
    2025-08-12T15:53:15.221591827Z INFO  ModelServer Initialize BOTH server with: EpollServerSocketChannel.
    2025-08-12T15:53:15.231404670Z INFO  ModelServer BOTH API bind to: http://0.0.0.0:8080

    You can invoke the SageMaker inference endpoint you created through the CLI by running the following command:

    hyp invoke hyp-jumpstart-endpoint \
        --endpoint-name deepseek-distill-qwen-endpoint-cli \
        --body '{"inputs":"What is the capital of USA?"}'

    You will get an output similar to the following:

    {"generated_text": " What is the capital of France? What is the capital of Japan? What is the capital of China? What is the capital of Germany? What is"}

    Alternatively, if you prefer a programmatic experience and advanced customization options, you can use the SageMaker HyperPod Python SDK. The following code will yield the equivalent deployment to the preceding CLI example:

    from sagemaker.hyperpod.inference.config.hp_jumpstart_endpoint_config import Model, Server, SageMakerEndpoint, TlsConfig
    from sagemaker.hyperpod.inference.hp_jumpstart_endpoint import HPJumpStartEndpoint
    
    model=Model(
        model_id='deepseek-llm-r1-distill-qwen-1-5b',
    )
    
    server=Server(
        instance_type="ml.g5.8xlarge",
    )
    
    endpoint_name=SageMakerEndpoint(name="deepseek-distill-qwen-endpoint-cli")
    
    tls_config=TlsConfig(tls_certificate_output_s3_uri='s3://')
    
    js_endpoint=HPJumpStartEndpoint(
        model=model,
        server=server,
        sage_maker_endpoint=endpoint_name,
        tls_config=tls_config,
        namespace="default"
    )
    
    js_endpoint.create()

    Deploy custom models

    You can also use the CLI to deploy custom models with model artifacts stored on either Amazon S3 or FSx for Lustre. This is useful for models that have been fine-tuned on custom data. You must provide the storage location of the model artifacts as well as a container image for inference that is compatible with the model artifacts and SageMaker inference endpoints. In the following example, we deploy a TinyLlama 1.1B model from Amazon S3 using the DJL Large Model Inference container image.

    In preparation, download the model artifacts locally and push them to an S3 bucket:

    # Install huggingface-hub if not present on your machine
    pip install huggingface-hub
    
    # Download model
    hf download TinyLlama/TinyLlama-1.1B-Chat-v1.0 --local-dir ./tinyllama-1.1b-chat
    
    # Upload to S3
    aws s3 cp ./tinyllama-1.1b-chat s3:///models/tinyllama-1.1b-chat/ --recursive

    Now you can deploy the model with the following command:

    hyp create hyp-custom-endpoint \
        --endpoint-name my-custom-tinyllama-endpoint \
        --model-name tinyllama \
        --model-source-type s3 \
        --model-location models/tinyllama-1.1b-chat/ \
        --s3-bucket-name  \
        --s3-region  \
        --instance-type ml.g5.8xlarge \
        --image-uri 763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.33.0-lmi15.0.0-cu128 \
        --container-port 8080 \
        --model-volume-mount-name modelmount \
        --tls-certificate-output-s3-uri s3:/// \
        --namespace default

    The preceding code contains the following key parameters:

    • --model-name is the name of the model that will be created in SageMaker
    • --model-source-type specifies either fsx or s3 for the location of the model artifacts
    • --model-location specifies the prefix or folder where the model artifacts are located
    • --s3-bucket-name and --s3-region specify the S3 bucket name and AWS Region, respectively
    • --instance-type, --endpoint-name, --namespace, and --tls-certificate behave the same as for the deployment of SageMaker JumpStart models
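    For illustration, the bucket and location flags combine into the S3 URI of the artifacts the inference operator will load; the following hypothetical helper (with an invented bucket name) shows the combination:

```python
# Hypothetical helper: combine the --s3-bucket-name and --model-location
# flag values into the S3 URI of the model artifacts. The bucket name used
# below is invented for the example.
def model_artifact_uri(bucket: str, location: str) -> str:
    return f"s3://{bucket}/{location.strip('/')}/"

print(model_artifact_uri("my-models-bucket", "models/tinyllama-1.1b-chat/"))
# → s3://my-models-bucket/models/tinyllama-1.1b-chat/
```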

    Similar to SageMaker JumpStart model deployment, the CLI supports more advanced deployment configurations, including auto scaling, through additional parameters, which you can view by running the following command:

    hyp create hyp-custom-endpoint --help

    If successful, the command will output the following:

    Creating sagemaker model and endpoint. Endpoint name: my-custom-tinyllama-endpoint.
     The process may take a few minutes...

    After a few minutes, both the ALB and the SageMaker inference endpoint will be available, which you can observe through the CLI. Running hyp list hyp-custom-endpoint will show the status first as DeploymentInProgress and then as DeploymentComplete when the endpoint is ready to be used:

    | name                         | namespace   | labels   | status               |
    |------------------------------|-------------|----------|----------------------|
    | my-custom-tinyllama-endpoint | default     |          | DeploymentComplete   |

    To get further visibility into the deployment pod, run the next instructions to seek out the pod title and look at the corresponding logs:

    hyp list-pods hyp-custom-endpoint --namespace <namespace>
    hyp get-logs hyp-custom-endpoint --namespace <namespace> --pod-name <pod-name>

    The output will look similar to the following:

    │ INFO  PyProcess W-196-model-stdout: INFO 08-12 16:00:36 [monitor.py:33] torch.compile takes 29.18 s in total
    │ INFO  PyProcess W-196-model-stdout: INFO 08-12 16:00:37 [kv_cache_utils.py:634] GPU KV cache size: 809,792 tokens
    │ INFO  PyProcess W-196-model-stdout: INFO 08-12 16:00:37 [kv_cache_utils.py:637] Maximum concurrency for 2,048 tokens per request: 395.41x
    │ INFO  PyProcess W-196-model-stdout: INFO 08-12 16:00:59 [gpu_model_runner.py:1626] Graph capturing finished in 22 secs, took 0.37 GiB
    │ INFO  PyProcess W-196-model-stdout: INFO 08-12 16:00:59 [core.py:163] init engine (profile, create kv cache, warmup model) took 59.39 seconds
    │ INFO  PyProcess W-196-model-stdout: INFO 08-12 16:00:59 [core_client.py:435] Core engine process 0 ready.
    │ INFO  PyProcess Model [model] initialized.
    │ INFO  WorkerThread Starting worker thread WT-0001 for model model (M-0001, READY) on device gpu(0)
    │ INFO  ModelServer Initialize BOTH server with: EpollServerSocketChannel.
    │ INFO  ModelServer BOTH API bind to: http://0.0.0.0:8080

    You can invoke the SageMaker inference endpoint you created through the CLI by running the following command:

    hyp invoke hyp-custom-endpoint \
        --endpoint-name my-custom-tinyllama-endpoint \
        --body '{"inputs":"What is the capital of USA?"}'

    You will get an output similar to the following:

    {"generated_text": " What is the capital of France? What is the capital of Japan? What is the capital of China? What is the capital of Germany? What is"}
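    Because the deployment registers a standard SageMaker inference endpoint, you can also invoke it outside the HyperPod CLI through the SageMaker runtime API with boto3. The following sketch builds the same JSON request as the CLI command; the helper names are illustrative:

    ```python
    import json

    def build_invoke_request(endpoint_name: str, prompt: str) -> dict:
        """Keyword arguments for the sagemaker-runtime invoke_endpoint call."""
        return {
            "EndpointName": endpoint_name,
            "ContentType": "application/json",
            "Body": json.dumps({"inputs": prompt}),
        }

    def invoke_endpoint(endpoint_name: str, prompt: str) -> str:
        """Call the endpoint; requires AWS credentials, so boto3 is imported lazily."""
        import boto3

        client = boto3.client("sagemaker-runtime")
        response = client.invoke_endpoint(**build_invoke_request(endpoint_name, prompt))
        return response["Body"].read().decode("utf-8")
    ```

    This is useful when integrating the endpoint into an application that already uses the AWS SDK rather than shelling out to the CLI.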

    Alternatively, you can deploy using the SageMaker HyperPod Python SDK. The following code yields a deployment equivalent to the preceding CLI example:

    from sagemaker.hyperpod.inference.config.hp_endpoint_config import S3Storage, ModelSourceConfig, TlsConfig, EnvironmentVariables, ModelInvocationPort, ModelVolumeMount, Resources, Worker
    from sagemaker.hyperpod.inference.hp_endpoint import HPEndpoint
    
    model_source_config = ModelSourceConfig(
        model_source_type="s3",
        model_location="models/tinyllama-1.1b-chat/",
        s3_storage=S3Storage(
            bucket_name="<bucket-name>",
            region="<region>",
        ),
    )
    
    worker = Worker(
        image="763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.33.0-lmi15.0.0-cu128",
        model_volume_mount=ModelVolumeMount(
            name="modelmount",
        ),
        model_invocation_port=ModelInvocationPort(container_port=8080),
        resources=Resources(
            requests={"cpu": "30000m", "nvidia.com/gpu": 1, "memory": "100Gi"},
            limits={"nvidia.com/gpu": 1},
        ),
    )
    
    tls_config = TlsConfig(tls_certificate_output_s3_uri="s3://<bucket-name>/<certificate-path>")
    
    custom_endpoint = HPEndpoint(
        endpoint_name="my-custom-tinyllama-endpoint",
        instance_type="ml.g5.8xlarge",
        model_name="tinyllama",
        tls_config=tls_config,
        model_source_config=model_source_config,
        worker=worker,
    )
    
    custom_endpoint.create()

    Debugging inference deployments

    In addition to monitoring the inference pod logs, there are several other useful ways to debug inference deployments:

    • You can access the HyperPod inference operator controller logs through the SageMaker HyperPod CLI. Run hyp get-operator-logs hyp-custom-endpoint --since-hours 0.5 or hyp get-operator-logs hyp-jumpstart-endpoint --since-hours 0.5 to access the operator logs for custom and SageMaker JumpStart deployments, respectively.
    • You can view a list of inference deployments by running hyp list hyp-custom-endpoint or hyp list hyp-jumpstart-endpoint.
    • You can view the status and corresponding events of deployments by running hyp describe hyp-custom-endpoint --name <endpoint-name> or hyp describe hyp-jumpstart-endpoint --name <endpoint-name> for custom and SageMaker JumpStart deployments, respectively.
    • If the HyperPod observability stack is deployed to the cluster, run hyp get-monitoring --grafana and hyp get-monitoring --prometheus to get the Grafana dashboard and Prometheus workspace URLs, respectively, to view inference metrics as well.
    • To monitor GPU utilization or view directory contents, it can be helpful to execute commands in, or open an interactive shell into, the pods. You can run a command in a pod by running, for example, kubectl exec -it <pod-name> -- nvtop to run nvtop for visibility into GPU utilization. You can open an interactive shell by running kubectl exec -it <pod-name> -- /bin/bash.

    For more information on the inference deployment features in SageMaker HyperPod, see Amazon SageMaker HyperPod launches model deployments to accelerate the generative AI model development lifecycle and Deploying models on Amazon SageMaker HyperPod.

    Clean up

    To delete the training job from the corresponding example, use the following CLI command:

    hyp delete hyp-pytorch-job --job-name fsdp-llama3-1-8b

    To delete the model deployments from the inference example, use the following CLI commands for SageMaker JumpStart and custom model deployments, respectively:

    hyp delete hyp-jumpstart-endpoint --name deepseek-distill-qwen-endpoint-cli
    hyp delete hyp-custom-endpoint --name my-custom-tinyllama-endpoint

    To avoid incurring ongoing costs for the instances running in your cluster, you can scale down or delete the instances.

    Conclusion

    The new SageMaker HyperPod CLI and SDK can significantly streamline the process of training and deploying large-scale AI models. Through the examples in this post, we've demonstrated how these tools provide the following benefits:

    • Simplified workflows – The CLI offers straightforward commands for common tasks like distributed training and model deployment, making the powerful capabilities of SageMaker HyperPod accessible to data scientists without requiring deep infrastructure knowledge.
    • Flexible development options – Although the CLI handles common scenarios, the SDK enables fine-grained control and customization for more complex requirements, so developers can programmatically configure every aspect of their distributed ML workloads.
    • Comprehensive observability – Both interfaces provide robust monitoring and debugging capabilities through system logs and integration with the SageMaker HyperPod observability stack, helping quickly identify and resolve issues during development.
    • Production-ready deployment – The tools support end-to-end workflows from experimentation to production, including features like automated TLS certificate generation for secure model endpoints and integration with SageMaker inference endpoints.

    Getting started with these tools is as simple as installing the sagemaker-hyperpod package. The SageMaker HyperPod CLI and SDK provide the right level of abstraction for both data scientists looking to quickly experiment with distributed training and ML engineers building production systems.

    For more information about SageMaker HyperPod and these development tools, refer to the SageMaker HyperPod CLI and SDK documentation or explore the example notebooks.


    About the authors

    Giuseppe Angelo Porcelli is a Principal Machine Learning Specialist Solutions Architect for Amazon Web Services. With several years of software engineering and an ML background, he works with customers of any size to understand their business and technical needs and design AI and ML solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. He has worked on projects in different domains, including MLOps, computer vision, and NLP, involving a broad set of AWS services. In his free time, Giuseppe enjoys playing football.

    Shweta Singh is a Senior Product Manager in the Amazon SageMaker Machine Learning platform team at AWS, leading the SageMaker Python SDK. She has worked in several product roles at Amazon for over 5 years. She has a Bachelor of Science degree in Computer Engineering and a Master of Science in Financial Engineering, both from New York University.

    Nicolas Jourdan is a Specialist Solutions Architect at AWS, where he helps customers unlock the full potential of AI and ML in the cloud. He holds a PhD in Engineering from TU Darmstadt in Germany, where his research focused on the reliability, concept drift detection, and MLOps of industrial ML applications. Nicolas has extensive hands-on experience across industries, including autonomous driving, drones, and manufacturing, having worked in roles ranging from research scientist to engineering manager. He has contributed to award-winning research, holds patents in object detection and anomaly detection, and is passionate about applying cutting-edge AI to solve complex real-world problems.
