UK Tech Insider
Machine Learning & Research

Fine-tune OpenAI GPT-OSS models using Amazon SageMaker HyperPod recipes

By Oliver Chambers | August 22, 2025 | 18 Mins Read


This post is the second part of the GPT-OSS series focusing on model customization with Amazon SageMaker AI. In Part 1, we demonstrated fine-tuning GPT-OSS models using open source Hugging Face libraries with SageMaker training jobs, which support distributed multi-GPU and multi-node configurations, so you can spin up high-performance clusters on demand.

In this post, we show how to fine-tune GPT-OSS models using recipes on SageMaker HyperPod and training jobs. SageMaker HyperPod recipes help you get started with training and fine-tuning popular publicly available foundation models (FMs), such as Meta's Llama, Mistral, and DeepSeek, in just minutes, using either SageMaker HyperPod or training jobs. The recipes provide pre-built, validated configurations that remove the complexity of setting up distributed training environments while maintaining enterprise-grade performance and scalability. We outline the steps to fine-tune the GPT-OSS model on a multilingual reasoning dataset, HuggingFaceH4/Multilingual-Thinking, so GPT-OSS can handle structured chain-of-thought (CoT) reasoning across multiple languages.

Solution overview

This solution uses SageMaker HyperPod recipes to run a fine-tuning job on HyperPod using Amazon Elastic Kubernetes Service (Amazon EKS) orchestration, or on training jobs. Recipes are processed through the SageMaker HyperPod recipe launcher, which serves as the orchestration layer responsible for launching a job on the corresponding architecture, such as SageMaker HyperPod (Slurm or Amazon EKS) or training jobs. To learn more, see Amazon SageMaker HyperPod recipes.

For details on fine-tuning the GPT-OSS model, see Fine-tune OpenAI GPT-OSS models on Amazon SageMaker AI using Hugging Face libraries.

In the following sections, we discuss the prerequisites for both options, and then move on to data preparation. The prepared data is saved to Amazon FSx for Lustre, which is used as the persistent file system for SageMaker HyperPod, or to Amazon Simple Storage Service (Amazon S3) for training jobs. We then use recipes to submit the fine-tuning job, and finally deploy the trained model to a SageMaker endpoint for testing and evaluation. The following diagram illustrates this architecture.

Prerequisites

To follow along, you must have the following prerequisites:

• A local development environment with AWS credentials configured for creating and accessing SageMaker resources, or a remote environment such as Amazon SageMaker Studio.
• For SageMaker HyperPod fine-tuning, complete the following:
• For fine-tuning the model using SageMaker training jobs, you must have one ml.p5.48xlarge instance (with 8 x NVIDIA H100 GPUs) for training job usage. If you don't have sufficient limits, request the following SageMaker quota on the Service Quotas console: P5 instance (ml.p5.48xlarge) for training jobs: 1.

It might take up to 24 hours for these limits to be approved. You can also use SageMaker training plans to reserve these instances for a specific timeframe and use case (cluster or training job usage). For more details, see Reserve training plans for your training jobs or HyperPod clusters.

Next, use your preferred development environment to prepare the dataset for fine-tuning. You can find the full code in the Generative AI using Amazon SageMaker repository on GitHub.

Data tokenization

We use the HuggingFaceH4/Multilingual-Thinking dataset, a multilingual reasoning dataset containing CoT examples translated into languages such as French, Spanish, and German. The recipe supports a sequence length of 4,000 tokens for the GPT-OSS 120B model. The following example code demonstrates how to tokenize the multilingual-thinking dataset. The recipe accepts data in Hugging Face (Arrow) format. After it's tokenized, you can save the processed dataset to disk.

import os

import boto3
import numpy as np
from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset("HuggingFaceH4/Multilingual-Thinking", split="train")

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-120b")
messages = dataset[0]["messages"]
conversation = tokenizer.apply_chat_template(messages, tokenize=False)
print(conversation)

def preprocess_function(example):
    return tokenizer.apply_chat_template(example['messages'],
                                         return_dict=True,
                                         padding="max_length",
                                         max_length=4096,
                                         truncation=True)

def label(x):
    # Copy input_ids to labels, masking padding positions with -100
    # so the loss function ignores them
    x["labels"] = np.array(x["input_ids"])
    x["labels"][x["labels"] == tokenizer.pad_token_id] = -100
    x["labels"] = x["labels"].tolist()
    return x

dataset = dataset.map(preprocess_function,
                      remove_columns=['reasoning_language',
                                      'developer',
                                      'user',
                                      'analysis',
                                      'final',
                                      'messages'])
dataset = dataset.map(label)

# for HyperPod, save to the mounted FSx volume
dataset.save_to_disk("/fsx/multilingual_4096")

# for training jobs, save locally and upload to S3
dataset.save_to_disk("multilingual_4096")

def upload_directory(local_dir, bucket_name, s3_prefix=''):
    s3_client = boto3.client('s3')

    for root, dirs, files in os.walk(local_dir):
        for file in files:
            local_path = os.path.join(root, file)
            # Calculate the relative path for S3
            relative_path = os.path.relpath(local_path, local_dir)
            s3_path = os.path.join(s3_prefix, relative_path).replace("\\", "/")

            print(f"Uploading {local_path} to {s3_path}")
            s3_client.upload_file(local_path, bucket_name, s3_path)

upload_directory('./multilingual_4096/', "<your-bucket-name>", 'multilingual_4096')
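The label-masking step in the code above can be checked in isolation. The following is a minimal sketch, with a hypothetical pad token ID of 0 and made-up token IDs, showing that padding positions are replaced with -100 so the loss function ignores them:

```python
import numpy as np

def mask_labels(input_ids, pad_token_id):
    """Copy input_ids to labels, masking padding positions with -100."""
    labels = np.array(input_ids)
    labels[labels == pad_token_id] = -100
    return labels.tolist()

# Hypothetical token IDs with trailing padding (pad_token_id = 0)
print(mask_labels([101, 2023, 2003, 0, 0], pad_token_id=0))
# → [101, 2023, 2003, -100, -100]
```

Only the padding positions are rewritten; real token IDs pass through unchanged, which is exactly what the `label` function above does per example.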

Now that you have prepared and tokenized the dataset, you can fine-tune the GPT-OSS model on it using either SageMaker HyperPod or training jobs. SageMaker training jobs are ideal for one-off or periodic training workloads that need temporary compute resources, offering a fully managed, on-demand experience. SageMaker HyperPod is optimal for continuous development and experimentation, providing a persistent, preconfigured, and failure-resilient cluster. Depending on your choice, skip to the appropriate section for the next steps.

Fine-tune the model using SageMaker HyperPod

To fine-tune the model using HyperPod, start by setting up the virtual environment and installing the necessary dependencies to execute the training job on the EKS cluster. Make sure the cluster is InService before proceeding, and that you're using Python 3.9 or greater in your development environment.

python3 -m venv ${PWD}/venv
source venv/bin/activate

Next, download and set up the SageMaker HyperPod recipes repository:

    git clone --recursive https://github.com/aws/sagemaker-hyperpod-recipes.git
    cd sagemaker-hyperpod-recipes
pip3 install -r requirements.txt

You can now use the SageMaker HyperPod recipe launch scripts to submit your training job. Using a recipe involves updating the k8s.yaml configuration file and executing the launch script.

In recipes_collection/cluster/k8s.yaml, update the persistent_volume_claims section. It mounts the FSx claim to the /fsx directory of each compute pod:

    - claimName: fsx-claim    
      mountPath: fsx
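The fsx-claim referenced above is a standard Kubernetes PersistentVolumeClaim, typically created during HyperPod EKS cluster setup. A minimal sketch of such a claim follows; the claim name matches the snippet above, but the storage class name and size are illustrative and should match your FSx for Lustre setup:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fsx-claim
spec:
  accessModes:
    - ReadWriteMany        # FSx for Lustre is shared across all compute pods
  storageClassName: fsx-sc # illustrative; use your FSx storage class
  resources:
    requests:
      storage: 1200Gi      # illustrative; match your file system size
```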

SageMaker HyperPod recipes provide a launch script for each recipe within the launcher_scripts directory. To fine-tune the GPT-OSS-120B model, update the launch script located at launcher_scripts/gpt_oss/run_hf_gpt_oss_120b_seq4k_gpu_lora.sh and update the cluster_type parameter.

The updated launch script should look similar to the following code when running SageMaker HyperPod with Amazon EKS. Make sure cluster=k8s and cluster_type=k8s are set in the launch script:

#!/bin/bash

# Original Copyright (c), NVIDIA CORPORATION. Modifications © Amazon.com

# Users should set up their cluster type in /recipes_collection/config.yaml

SAGEMAKER_TRAINING_LAUNCHER_DIR=${SAGEMAKER_TRAINING_LAUNCHER_DIR:-"$(pwd)"}

HF_MODEL_NAME_OR_PATH="openai/gpt-oss-120b" # HuggingFace pretrained model name or path

TRAIN_DIR="/fsx/multilingual_4096" # Location of training dataset
VAL_DIR="/fsx/multilingual_4096" # Location of validation dataset

EXP_DIR="/fsx/experiment" # Location to save experiment info including logging, checkpoints, etc.
HF_ACCESS_TOKEN="hf_xxxxxxxx" # Optional HuggingFace access token

HYDRA_FULL_ERROR=1 python3 "${SAGEMAKER_TRAINING_LAUNCHER_DIR}/main.py" \
    recipes=fine-tuning/gpt_oss/hf_gpt_oss_120b_seq4k_gpu_lora \
    container="658645717510.dkr.ecr.us-west-2.amazonaws.com/smdistributed-modelparallel:sm-pytorch_gpt_oss_patch_pt-2.7_cuda12.8" \
    base_results_dir="${SAGEMAKER_TRAINING_LAUNCHER_DIR}/results" \
    recipes.run.name="hf-gpt-oss-120b-lora" \
    cluster=k8s \
    cluster_type=k8s \
    recipes.exp_manager.exp_dir="$EXP_DIR" \
    recipes.trainer.num_nodes=1 \
    recipes.model.data.train_dir="$TRAIN_DIR" \
    recipes.model.data.val_dir="$VAL_DIR" \
    recipes.model.hf_model_name_or_path="$HF_MODEL_NAME_OR_PATH" \
    recipes.model.hf_access_token="$HF_ACCESS_TOKEN"

When the script is ready, you can launch fine-tuning of the GPT-OSS 120B model using the following code:

    chmod +x launcher_scripts/gpt_oss/run_hf_gpt_oss_120b_seq4k_gpu_lora.sh 
    bash launcher_scripts/gpt_oss/run_hf_gpt_oss_120b_seq4k_gpu_lora.sh

After submitting the fine-tuning job, you can use the following command to verify successful submission. You should see the pods running in your cluster:

kubectl get pods
NAME                                READY  STATUS   RESTARTS   AGE
hf-gpt-oss-120b-lora-h2cwd-worker-0 1/1    Running  0          14m

To check the logs for the job, use the kubectl logs command:

    kubectl logs -f hf-gpt-oss-120b-lora-h2cwd-worker-0

You should see logs like the following when training starts and completes. You can find the checkpoints written to the /fsx/experiment/checkpoints folder.

warnings.warn(

Epoch 0:  40%|████      | 50/125 [08:47<13:10,  0.09it/s, Loss/train=0.254, Norms/grad_norm=0.128, LR/learning_rate=2.2e-6] [NeMo I 2025-08-18 17:49:48 nemo_logging:381] save SageMakerCheckpointType.PEFT_FULL checkpoint: /fsx/experiment/checkpoints/peft_full/steps_50
[NeMo I 2025-08-18 17:49:48 nemo_logging:381] Saving PEFT checkpoint to /fsx/experiment/checkpoints/peft_full/steps_50
[NeMo I 2025-08-18 17:49:49 nemo_logging:381] Loading base model from: openai/gpt-oss-120b
You are attempting to use Flash Attention 2 without specifying a torch dtype. This might lead to unexpected behaviour
Loading checkpoint shards: 100%|██████████| 15/15 [01:49<00:00,  7.33s/it]
[NeMo I 2025-08-18 17:51:39 nemo_logging:381] Merging the adapter, this might take a while......
Unloading and merging model: 100%|██████████| 547/547 [00:07<00:00, 71.27it/s]
[NeMo I 2025-08-18 17:51:47 nemo_logging:381] Checkpointing to /fsx/experiment/checkpoints/peft_full/steps_50/final-model......
[NeMo I 2025-08-18 18:00:14 nemo_logging:381] Successfully saved the merged model checkpoint.
`Trainer.fit` stopped: `max_steps=50` reached.
Epoch 0:  40%|████      | 50/125 [23:09<34:43,  0.04it/s, Loss/train=0.264, Norms/grad_norm=0.137, LR/learning_rate=2e-6]

When training is complete, the final merged model can be found in the experiment directory path you defined in the launcher script, under /fsx/experiment/checkpoints/peft_full/steps_50/final-model.

Fine-tune using SageMaker training jobs

You can also use recipes directly with SageMaker training jobs using the SageMaker Python SDK. Training jobs automatically spin up the compute, load the input data, run the training script, save the model to your output location, and tear down the instances, for a smooth training experience.

The following code snippet shows how to use recipes with the PyTorch estimator. You can use the training_recipe parameter to specify the training or fine-tuning recipe to be used, and recipe_overrides for any parameters that need to be replaced. For training jobs, update the input, output, and results directories to locations in /opt/ml as required by SageMaker training jobs.

import os

import sagemaker
from sagemaker.pytorch import PyTorch

sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = sagemaker_session.default_bucket()
output = os.path.join(f"s3://{bucket}", "output")

recipe_overrides = {
    "run": {
        "results_dir": "/opt/ml/model",
    },
    "exp_manager": {
        "exp_dir": "",
        "explicit_log_dir": "/opt/ml/output/tensorboard",
        "checkpoint_dir": "/opt/ml/checkpoints",
    },
    "model": {
        "data": {
            "train_dir": "/opt/ml/input/data/train",
            "val_dir": "/opt/ml/input/data/val",
        },
    },
    "use_smp_model": "False",
}

# create the estimator object
estimator = PyTorch(
    output_path=output,
    base_job_name="gpt-oss-recipe",
    role=role,
    instance_type="ml.p5.48xlarge",
    training_recipe="fine-tuning/gpt_oss/hf_gpt_oss_120b_seq4k_gpu_lora",
    recipe_overrides=recipe_overrides,
    sagemaker_session=sagemaker_session,
    image_uri="658645717510.dkr.ecr.us-west-2.amazonaws.com/smdistributed-modelparallel:sm-pytorch_gpt_oss_patch_pt-2.7_cuda12.8",
)

# submit the training job
estimator.fit(
    inputs={
        "train": f"s3://{bucket}/datasets/multilingual_4096/",
        "val": f"s3://{bucket}/datasets/multilingual_4096/",
    },
    wait=True,
)

After the job is submitted, you can monitor the status of your training job on the SageMaker console by choosing Training jobs under Training in the navigation pane. Choose the training job that starts with gpt-oss-recipe to view its details and logs. When the training job is complete, the outputs are saved to an S3 location. You can get the location of the output artifacts from the S3 model artifact section on the job details page.

    Run inference

After you fine-tune your GPT-OSS model with SageMaker recipes on either SageMaker training jobs or SageMaker HyperPod, the output is a customized model artifact that merges the base model with the customized PEFT adapters. This final model is stored in Amazon S3 and can be deployed directly from Amazon S3 to SageMaker endpoints for real-time inference.

To serve GPT-OSS models, you must have the latest vLLM containers (v0.10.1 or later). A full list of vllm-openai Docker image versions is available on Docker Hub.

The steps to deploy your fine-tuned GPT-OSS model are outlined in this section.

Build the latest GPT-OSS container for your SageMaker endpoint

If you're deploying the model from SageMaker Studio using JupyterLab or the Code Editor, both environments come with Docker preinstalled. Make sure you're using the SageMaker Distribution image v3.0 or later for compatibility. You can build your deployment container by running the following commands:

%%bash # <- use this if you're running this inside a JupyterLab cell

# navigate to the deploy dir from the current workdir, to build the container
cd ./deploy

# build and push the container
chmod +x build.sh
bash build.sh

cd ..

If you're running these commands from a local terminal or another environment, simply omit the %%bash line and run the commands as normal shell commands.

The build.sh script is responsible for automatically building and pushing a vllm-openai container that's optimized for SageMaker endpoints. After it's built, the custom SageMaker endpoint-compatible vLLM image is pushed to Amazon Elastic Container Registry (Amazon ECR). SageMaker endpoints can then pull this image from Amazon ECR at runtime to spin up the container for inference.

The following is an example of the build.sh script:

export REGION={region}
export ACCOUNT_ID={account_id}
export REPOSITORY_NAME=vllm
export TAG=v0.10.1

full_name="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${REPOSITORY_NAME}:${TAG}"

echo "building $full_name"

DOCKER_BUILDKIT=0 docker build . --network sagemaker --tag $full_name --file Dockerfile

aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --region ${REGION} --repository-names "${REPOSITORY_NAME}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --region ${REGION} --repository-name "${REPOSITORY_NAME}" > /dev/null
fi

docker tag $REPOSITORY_NAME:$TAG ${full_name}
docker push ${full_name}

The Dockerfile defines how to convert an open source vLLM Docker image into a SageMaker hosting-compatible image. This involves extending the base vllm-openai image, adding the serve entrypoint script, and making it executable. See the following example Dockerfile:

    FROM vllm/vllm-openai:v0.10.1
    
    COPY serve /usr/bin/serve
    RUN chmod 777 /usr/bin/serve
    
    ENTRYPOINT [ "/usr/bin/serve" ]

The serve script acts as a translation layer between SageMaker hosting conventions and the vLLM runtime. You can maintain the same deployment workflow you're accustomed to when hosting models on SageMaker endpoints, while automatically converting SageMaker-specific configurations into the format expected by vLLM.

Key points to note about this script:

• It enforces the use of port 8080, which SageMaker requires for inference containers
• It dynamically translates environment variables prefixed with OPTION_ into CLI arguments for vLLM (for example, OPTION_MAX_MODEL_LEN=4096 becomes --max-model-len 4096)
• It prints the final set of arguments for visibility
• It finally launches the vLLM API server with the translated arguments

The following is an example serve script:

#!/bin/bash

# Define the prefix of environment variables to look for
PREFIX="OPTION_"
ARG_PREFIX="--"

# Initialize an array for storing the arguments
# port 8080 is required by SageMaker, https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html#your-algorithms-inference-code-container-response
ARGS=(--port 8080)

# Loop through all environment variables
while IFS='=' read -r key value; do
    # Remove the prefix from the key, convert to lowercase, and replace underscores with dashes
    arg_name=$(echo "${key#"${PREFIX}"}" | tr '[:upper:]' '[:lower:]' | tr '_' '-')

    # Add the argument name and value to the ARGS array
    ARGS+=("${ARG_PREFIX}${arg_name}")
    if [ -n "$value" ]; then
        ARGS+=("$value")
    fi
done < <(env | grep "^${PREFIX}")

echo "-------------------------------------------------------------------"
echo "vLLM engine args: [${ARGS[@]}]"
echo "-------------------------------------------------------------------"

# Pass the collected arguments to the main entrypoint
exec python3 -m vllm.entrypoints.openai.api_server "${ARGS[@]}"
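If you want to sanity-check the OPTION_ translation rule before baking it into a container, the same logic can be mirrored in plain Python. This helper, option_env_to_args, is a hypothetical name introduced here for illustration; it is not part of the serve script or any SDK:

```python
def option_env_to_args(env: dict) -> list:
    """Mirror the serve script: OPTION_* env vars become vLLM CLI flags."""
    args = ["--port", "8080"]  # port 8080 is required by SageMaker
    for key, value in env.items():
        if not key.startswith("OPTION_"):
            continue  # ignore unrelated environment variables
        # Strip the prefix, lowercase, and swap underscores for dashes
        flag = "--" + key[len("OPTION_"):].lower().replace("_", "-")
        args.append(flag)
        if value:
            args.append(value)
    return args

print(option_env_to_args({"OPTION_MAX_MODEL_LEN": "4096", "HOME": "/root"}))
# → ['--port', '8080', '--max-model-len', '4096']
```

Unrelated variables such as HOME are ignored, matching the `env | grep "^${PREFIX}"` filter in the bash version.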

Host the customized GPT-OSS model as a SageMaker real-time endpoint

Now you can deploy your fine-tuned GPT-OSS model using the ECR image URI you built in the previous step. In this example, the model artifacts are stored securely in an S3 bucket, and SageMaker will download them into the container at runtime. Complete the following configurations:

• Set model_data to point to the S3 prefix where your model artifacts are located
• Set the OPTION_MODEL environment variable to /opt/ml/model, which is where SageMaker mounts the model inside the container
• (Optional) If you're serving a model from the Hugging Face Hub instead of Amazon S3, you can set OPTION_MODEL directly to the Hugging Face model ID instead

The endpoint startup might take several minutes as the model artifacts are downloaded and the container is initialized. The following is an example deployment code:

inference_image = f"{account_id}.dkr.ecr.{region}.amazonaws.com/vllm:v0.10.1"

...
...

lmi_model = sagemaker.Model(
    image_uri=inference_image,
    env={
        "OPTION_MODEL": "/opt/ml/model", # set this to let the SageMaker endpoint read a model stored in S3, else set it to a HF model ID
        "OPTION_SERVED_MODEL_NAME": "model",
        "OPTION_TENSOR_PARALLEL_SIZE": json.dumps(num_gpus),
        "OPTION_DTYPE": "bfloat16",
        #"VLLM_ATTENTION_BACKEND": "TRITON_ATTN_VLLM_V1", # not required for vLLM 0.10.1 and above
        "OPTION_ASYNC_SCHEDULING": "true",
        "OPTION_QUANTIZATION": "mxfp4"
    },
    role=role,
    name=model_name,
    model_data={
        'S3DataSource': {
            'S3Uri': "s3://path/to/gpt-oss/model/artifacts",
            'S3DataType': 'S3Prefix',
            'CompressionType': 'None'
        }
    },
)

...

lmi_model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=600,
    endpoint_name=endpoint_name,
    endpoint_type=sagemaker.enums.EndpointType.INFERENCE_COMPONENT_BASED,
    inference_component_name=inference_component_name,
    resources=ResourceRequirements(requests={"num_accelerators": 1, "memory": 1024*3, "copies": 1,}),
)

Sample inference

After your endpoint is deployed and in the InService state, you can invoke your fine-tuned GPT-OSS model using the SageMaker Python SDK.

The following is an example predictor setup:

    pretrained_predictor = sagemaker.Predictor(
        endpoint_name=endpoint_name,
        sagemaker_session=sagemaker.Session(boto3.Session(region_name=boto3.Session().region_name)),
        serializer=serializers.JSONSerializer(),
        deserializer=deserializers.JSONDeserializer(),
        component_name=inference_component_name
    )

The modified vLLM container is fully compatible with the OpenAI-style messages input format, making it straightforward to send chat-style requests:

    payload = {
        "messages": [{"role": "user", "content": "Hello who are you?"}],
        "parameters": {"max_new_tokens": 64, "temperature": 0.2}
    }
    
    output = pretrained_predictor.predict(payload)

You have successfully deployed and invoked your custom fine-tuned GPT-OSS model on SageMaker real-time endpoints, using the vLLM framework for optimized, low-latency inference. You can find more GPT-OSS hosting examples in the OpenAI gpt-oss examples GitHub repo.

Clean up

To avoid incurring additional charges, complete the following steps to clean up the resources used in this post:

    1. Delete the SageMaker endpoint:

    pretrained_predictor.delete_endpoint()

2. If you created a SageMaker HyperPod cluster for the purposes of this post, delete the cluster by following the instructions in Deleting a SageMaker HyperPod cluster.
3. Clean up the FSx for Lustre volume if it's no longer needed by following the instructions in Deleting a file system.
4. If you used training jobs, the training instances are automatically deleted when the jobs are complete.

    Conclusion

In this post, we showed how to fine-tune OpenAI's GPT-OSS models (gpt-oss-120b and gpt-oss-20b) on SageMaker AI using SageMaker HyperPod recipes. We discussed how SageMaker HyperPod recipes provide a powerful yet accessible solution for organizations to scale their AI model training capabilities with large language models (LLMs) including GPT-OSS, using either a persistent cluster through SageMaker HyperPod, or an ephemeral cluster using SageMaker training jobs. The architecture streamlines complex distributed training workflows through its intuitive recipe-based approach, reducing setup time from weeks to minutes. We also showed how these fine-tuned models can be seamlessly deployed to production using SageMaker endpoints with vLLM optimization, providing enterprise-grade inference capabilities with OpenAI-compatible APIs. This end-to-end workflow, from training to deployment, helps organizations build and serve custom LLM solutions while using the scalable infrastructure of AWS and the comprehensive ML platform capabilities of SageMaker.

To begin using SageMaker HyperPod recipes, visit the Amazon SageMaker HyperPod recipes GitHub repo for comprehensive documentation and example implementations. If you're interested in exploring fine-tuning further, the Generative AI using Amazon SageMaker GitHub repo has the necessary code and notebooks. Our team continues to expand the recipe ecosystem based on customer feedback and emerging ML trends, making sure that you have the tools needed for successful AI model training.

Special thanks to everyone who contributed to the launch: Hengzhi Pei, Zach Kimberg, Andrew Tian, Leonard Lausen, Sanjay Dorairaj, Manish Agarwal, Sareeta Panda, Chang Ning Tsai, Maxwell Nuyens, Natasha Sivananjaiah, and Kanwaljit Khurmi.


About the authors

Durga Sury is a Senior Solutions Architect at Amazon SageMaker, where she helps enterprise customers build secure and scalable AI/ML systems. When she's not architecting solutions, you can find her enjoying sunny walks with her dog, immersing herself in murder mystery books, or catching up on her favorite Netflix shows.

Pranav Murthy is a Senior Generative AI Data Scientist at AWS, specializing in helping organizations innovate with Generative AI, Deep Learning, and Machine Learning on Amazon SageMaker AI. Over the past 10+ years, he has developed and scaled advanced computer vision (CV) and natural language processing (NLP) models to tackle high-impact problems, from optimizing global supply chains to enabling real-time video analytics and multilingual search. When he's not building AI solutions, Pranav enjoys playing strategic games like chess, traveling to discover new cultures, and mentoring aspiring AI practitioners. You can find Pranav on LinkedIn.

Sumedha Swamy is a Senior Manager of Product Management at Amazon Web Services (AWS), where he leads several areas of Amazon SageMaker, including SageMaker Studio, the industry-leading integrated development environment for machine learning, developer and administrator experiences, AI infrastructure, and the SageMaker SDK.

Dmitry Soldatkin is a Senior AI/ML Solutions Architect at Amazon Web Services (AWS), helping customers design and build AI/ML solutions. Dmitry's work covers a wide range of ML use cases, with a primary interest in Generative AI, deep learning, and scaling ML across the enterprise. He has helped companies in many industries, including insurance, financial services, utilities, and telecommunications. You can connect with Dmitry on LinkedIn.

Arun Kumar Lokanatha is a Senior ML Solutions Architect with the Amazon SageMaker team. He specializes in large language model training workloads, helping customers build LLM workloads using SageMaker HyperPod, SageMaker training jobs, and SageMaker distributed training. Outside of work, he enjoys running, hiking, and cooking.

Anirudh Viswanathan is a Senior Product Manager, Technical, at AWS with the SageMaker team, where he focuses on Machine Learning. He holds a Master's in Robotics from Carnegie Mellon University and an MBA from the Wharton School of Business. Anirudh is a named inventor on more than 50 AI/ML patents. He enjoys long-distance running, exploring art galleries, and attending Broadway shows.
