This post is the second part of the GPT-OSS series focusing on model customization with Amazon SageMaker AI. In Part 1, we demonstrated fine-tuning GPT-OSS models using open source Hugging Face libraries with SageMaker training jobs, which support distributed multi-GPU and multi-node configurations, so you can spin up high-performance clusters on demand.
In this post, we show how you can fine-tune GPT-OSS models using recipes on SageMaker HyperPod and training jobs. SageMaker HyperPod recipes help you get started with training and fine-tuning popular publicly available foundation models (FMs) such as Meta's Llama, Mistral, and DeepSeek in just minutes, using either SageMaker HyperPod or training jobs. The recipes provide pre-built, validated configurations that alleviate the complexity of setting up distributed training environments while maintaining enterprise-grade performance and scalability. We outline the steps to fine-tune the GPT-OSS model on a multilingual reasoning dataset, HuggingFaceH4/Multilingual-Thinking, so GPT-OSS can handle structured chain-of-thought (CoT) reasoning across multiple languages.
Solution overview
This solution uses SageMaker HyperPod recipes to run a fine-tuning job on HyperPod using Amazon Elastic Kubernetes Service (Amazon EKS) orchestration, or on training jobs. Recipes are processed through the SageMaker HyperPod recipe launcher, which serves as the orchestration layer responsible for launching a job on the corresponding architecture, such as SageMaker HyperPod (Slurm or Amazon EKS) or training jobs. To learn more, see SageMaker HyperPod recipes.
For details on fine-tuning the GPT-OSS model, see Fine-tune OpenAI GPT-OSS models on Amazon SageMaker AI using Hugging Face libraries.
In the following sections, we discuss the prerequisites for both options, and then move on to data preparation. The prepared data is saved to Amazon FSx for Lustre, which is used as the persistent file system for SageMaker HyperPod, or to Amazon Simple Storage Service (Amazon S3) for training jobs. We then use recipes to submit the fine-tuning job, and finally deploy the trained model to a SageMaker endpoint for testing and evaluation. The following diagram illustrates this architecture.
Prerequisites
To follow along, you must have the following prerequisites:
- A local development environment with AWS credentials configured for creating and accessing SageMaker resources, or a remote environment such as Amazon SageMaker Studio.
- For SageMaker HyperPod fine-tuning, complete the following:
- For fine-tuning the model using SageMaker training jobs, you must have one ml.p5.48xlarge instance (with 8 x NVIDIA H100 GPUs). If you don't have sufficient limits, request the following SageMaker quota on the Service Quotas console: P5 instance (ml.p5.48xlarge) for training jobs: 1.
It might take up to 24 hours for these limits to be approved. You can also use SageMaker training plans to reserve these instances for a specific timeframe and use case (cluster or training jobs usage). For more details, see Reserve training plans for your training jobs or HyperPod clusters.
Next, use your preferred development environment to prepare the dataset for fine-tuning. You can find the full code in the Generative AI using Amazon SageMaker repository on GitHub.
Data tokenization
We use the HuggingFaceH4/Multilingual-Thinking dataset, a multilingual reasoning dataset containing CoT examples translated into languages such as French, Spanish, and German. The recipe supports a sequence length of 4,000 tokens for the GPT-OSS 120B model. The following example code demonstrates how to tokenize the multilingual-thinking dataset. The recipe accepts data in Hugging Face (Arrow) format. After it's tokenized, you can save the processed dataset to disk.
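The tokenization step can be sketched as follows. The dataset and model identifiers match the post; the output path and the exact preprocessing columns are assumptions, and downloading the dataset and tokenizer requires network access, so that part is kept in an uncalled `main()`:

```python
from typing import Dict

def to_chat_text(example: Dict, tokenizer) -> Dict:
    """Render the dataset's `messages` field with the model's chat template."""
    text = tokenizer.apply_chat_template(
        example["messages"], tokenize=False, add_generation_prompt=False
    )
    return {"text": text}

def main():
    # Requires network access to download the dataset and tokenizer.
    from datasets import load_dataset
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-120b")
    ds = load_dataset("HuggingFaceH4/Multilingual-Thinking", split="train")
    ds = ds.map(lambda ex: to_chat_text(ex, tokenizer))
    tokenized = ds.map(
        lambda ex: tokenizer(ex["text"], truncation=True, max_length=4000),
        remove_columns=ds.column_names,
    )
    # Saved in Arrow format, which the recipe accepts; path is an assumption
    tokenized.save_to_disk("/fsx/data/multilingual_thinking_tokenized")

# main()  # uncomment in your environment
```

The chat template renders each multilingual CoT conversation into a single string before tokenization, which keeps the reasoning turns intact within the 4,000-token window.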
Now that you have prepared and tokenized the dataset, you can fine-tune the GPT-OSS model on your data using either SageMaker HyperPod or training jobs. SageMaker training jobs are ideal for one-off or periodic training workloads that need temporary compute resources, offering a fully managed, on-demand experience. SageMaker HyperPod is optimal for continuous development and experimentation, providing a persistent, preconfigured, and failure-resilient cluster. Depending on your choice, skip to the appropriate section for the next steps.
Fine-tune the model using SageMaker HyperPod
To fine-tune the model using HyperPod, start by setting up the virtual environment and installing the necessary dependencies to execute the training job on the EKS cluster. Make sure the cluster is in the InService state before proceeding, and that you're using Python 3.9 or greater in your development environment.
Next, download and set up the SageMaker HyperPod recipes repository:
You can now use the SageMaker HyperPod recipe launch scripts to submit your training job. Using a recipe involves updating the k8s.yaml configuration file and executing the launch script.
In recipes_collection/cluster/k8s.yaml, update the persistent_volume_claims section. It mounts the FSx claim to the /fsx directory of each compute pod:
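A minimal sketch of that section, assuming an FSx for Lustre persistent volume claim named fsx-claim (check your actual claim name with kubectl get pvc; the exact keys should be verified against the k8s.yaml shipped in the recipes repository):

```yaml
# recipes_collection/cluster/k8s.yaml (excerpt)
persistent_volume_claims:
  - claimName: fsx-claim   # name of your FSx for Lustre PVC (assumption)
    mountPath: fsx         # mounted as /fsx inside each compute pod
```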
SageMaker HyperPod recipes provide a launch script for each recipe in the launcher_scripts directory. To fine-tune the GPT-OSS-120B model, update the launch script located at launcher_scripts/gpt_oss/run_hf_gpt_oss_120b_seq4k_gpu_lora.sh and update the cluster_type parameter.
The updated launch script should look similar to the following code when running SageMaker HyperPod with Amazon EKS. Make sure cluster=k8s and cluster_type=k8s are set in the launch script:
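A sketch of what the updated script might contain; the recipe name and directory layout follow the post, while the hydra-style override keys are assumptions to verify against the script shipped in the repository. The command is echoed rather than executed here:

```shell
#!/bin/bash
# Sketch of launcher_scripts/gpt_oss/run_hf_gpt_oss_120b_seq4k_gpu_lora.sh for EKS.
SAGEMAKER_TRAINING_LAUNCHER_DIR="${SAGEMAKER_TRAINING_LAUNCHER_DIR:-$(pwd)}"
TRAIN_DIR="/fsx/data/multilingual_thinking_tokenized"  # tokenized dataset on FSx
EXP_DIR="/fsx/experiment"                              # checkpoints are written here

# Assemble the launcher invocation; cluster=k8s and cluster_type=k8s select EKS.
CMD="python3 ${SAGEMAKER_TRAINING_LAUNCHER_DIR}/main.py \
recipes=fine-tuning/gpt_oss/hf_gpt_oss_120b_seq4k_gpu_lora \
cluster=k8s cluster_type=k8s \
recipes.run.name=hf-gpt-oss-120b-lora \
recipes.exp_manager.exp_dir=${EXP_DIR} \
recipes.model.data.train_dir=${TRAIN_DIR}"

echo "$CMD"   # dry run; run the command itself to actually submit the job
```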
When the script is ready, you can launch fine-tuning of the GPT-OSS 120B model using the following code:
After submitting the fine-tuning job, you can use the following command to verify successful submission. You should see the pods running in your cluster:
To check the logs for the job, use the kubectl logs command:
kubectl logs -f hf-gpt-oss-120b-lora-h2cwd-worker-0
You should see logs like the following when training starts and completes, with checkpoints written to the /fsx/experiment/checkpoints folder.
When training is complete, the final merged model can be found in the experiment directory path you defined in the launcher script, under /fsx/experiment/checkpoints/peft_full/steps_50/final-model.
Fine-tune using SageMaker training jobs
You can also use recipes directly with SageMaker training jobs using the SageMaker Python SDK. Training jobs automatically spin up the compute, load the input data, run the training script, save the model to your output location, and tear down the instances, for a smooth training experience.
The following code snippet shows how to use recipes with the PyTorch estimator. You can use the training_recipe parameter to specify the training or fine-tuning recipe to be used, and recipe_overrides for any parameters that need replacement. For training jobs, update the input, output, and results directories to locations in /opt/ml as required by SageMaker training jobs.
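This setup can be sketched as follows. The recipe path and the override keys are assumptions to check against the recipe's YAML, and the role and bucket placeholders must be filled in; the job submission needs AWS credentials, so it lives in an uncalled `main()`:

```python
def build_recipe_overrides() -> dict:
    """Point the recipe's data/results paths at the /opt/ml locations
    that SageMaker training jobs mount (override keys are assumptions)."""
    return {
        "run": {"results_dir": "/opt/ml/model"},
        "exp_manager": {"explicit_log_dir": "/opt/ml/output/tensorboard"},
        "model": {"data": {"train_dir": "/opt/ml/input/data/train"}},
    }

def main():
    # Requires the sagemaker SDK and AWS credentials.
    from sagemaker.pytorch import PyTorch

    estimator = PyTorch(
        base_job_name="gpt-oss-recipe",
        role="<your-sagemaker-execution-role-arn>",
        instance_type="ml.p5.48xlarge",
        instance_count=1,
        training_recipe="fine-tuning/gpt_oss/hf_gpt_oss_120b_seq4k_gpu_lora",
        recipe_overrides=build_recipe_overrides(),
    )
    estimator.fit(inputs={"train": "s3://<your-bucket>/multilingual-thinking/train"})

# main()  # uncomment in your environment
```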
After the job is submitted, you can monitor the status of your training job on the SageMaker console by choosing Training jobs under Training in the navigation pane. Choose the training job that starts with gpt-oss-recipe to view its details and logs. When the training job is complete, the outputs are saved to an S3 location. You can get the location of the output artifacts from the S3 model artifact section on the job details page.
Run inference
After you fine-tune your GPT-OSS model with SageMaker recipes on either SageMaker training jobs or SageMaker HyperPod, the output is a customized model artifact that merges the base model with the trained PEFT adapters. This final model is stored in Amazon S3 and can be deployed directly from Amazon S3 to SageMaker endpoints for real-time inference.
To serve GPT-OSS models, you must have the latest vLLM containers (v0.10.1 or later). A full list of vllm-openai Docker image versions is available on Docker Hub.
The steps to deploy your fine-tuned GPT-OSS model are outlined in this section.
Build the latest GPT-OSS container for your SageMaker endpoint
If you're deploying the model from SageMaker Studio using JupyterLab or the Code Editor, both environments come with Docker preinstalled. Make sure that you're using the SageMaker Distribution image v3.0 or later for compatibility. You can build your deployment container by running the following commands:
If you're running these commands from a local terminal or another environment, simply omit the %%bash line and run the commands as regular shell commands.
The build.sh script automatically builds and pushes a vllm-openai container optimized for SageMaker endpoints. After it's built, the custom SageMaker endpoint-compatible vLLM image is pushed to Amazon Elastic Container Registry (Amazon ECR). SageMaker endpoints can then pull this image from Amazon ECR at runtime to spin up the container for inference.
The following is an example of the build.sh script:
The Dockerfile defines how we convert an open source vLLM Docker image into a SageMaker hosting-compatible image. This involves extending the base vllm-openai image, adding the serve entrypoint script, and making it executable. See the following example Dockerfile:
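A minimal sketch of such a Dockerfile; the base image tag follows the v0.10.1 requirement above, and the serve script's name and location in the image are assumptions:

```dockerfile
# Extend the open source vLLM OpenAI-compatible image for SageMaker hosting
FROM vllm/vllm-openai:v0.10.1

# Add the SageMaker-style entrypoint and make it executable
COPY serve /usr/bin/serve
RUN chmod +x /usr/bin/serve

# SageMaker sends inference traffic to port 8080
EXPOSE 8080
ENTRYPOINT ["/usr/bin/serve"]
```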
The serve script acts as a translation layer between SageMaker hosting conventions and the vLLM runtime. You can keep the same deployment workflow you're accustomed to when hosting models on SageMaker endpoints, while SageMaker-specific configurations are automatically converted into the format expected by vLLM.
Key points to note about this script:
- It enforces the use of port 8080, which SageMaker requires for inference containers
- It dynamically translates environment variables prefixed with OPTION_ into CLI arguments for vLLM (for example, OPTION_MAX_MODEL_LEN=4096 becomes --max-model-len 4096)
- It prints the final set of arguments for visibility
- It launches the vLLM API server with the translated arguments
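The translation step above can be sketched as follows, shown in Python for clarity (the actual serve script in the container is a shell entrypoint; the helper name and the final exec line are illustrative):

```python
import os

def option_env_to_args(environ) -> list:
    """Translate OPTION_* environment variables into vLLM CLI flags,
    e.g. OPTION_MAX_MODEL_LEN=4096 -> --max-model-len 4096."""
    args = ["--port", "8080"]  # SageMaker requires inference containers on 8080
    for key in sorted(environ):
        if key.startswith("OPTION_"):
            flag = "--" + key[len("OPTION_"):].lower().replace("_", "-")
            args += [flag, environ[key]]
    return args

def main():
    args = option_env_to_args(os.environ)
    print("vLLM args:", args)  # print final arguments for visibility
    # The real entrypoint would now exec the vLLM API server with these
    # arguments (hypothetical invocation):
    # os.execvp("vllm", ["vllm", "serve", *args])

# main()  # uncomment inside the container
```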
The following is an example serve script:
Host customized GPT-OSS as a SageMaker real-time endpoint
Now you can deploy your fine-tuned GPT-OSS model using the ECR image URI you built in the previous step. In this example, the model artifacts are stored securely in an S3 bucket, and SageMaker downloads them into the container at runtime. Complete the following configurations:
- Set model_data to point to the S3 prefix where your model artifacts are located
- Set the OPTION_MODEL environment variable to /opt/ml/model, which is where SageMaker mounts the model inside the container
- (Optional) If you're serving a model from the Hugging Face Hub instead of Amazon S3, you can set OPTION_MODEL directly to the Hugging Face model ID instead
The endpoint startup might take several minutes while the model artifacts are downloaded and the container is initialized. The following is example deployment code:
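The deployment can be sketched as follows. The image URI, bucket, and endpoint name are placeholders; how model_data references your artifacts depends on whether they are compressed (the SageMaker SDK also accepts an S3DataSource dictionary for uncompressed prefixes). The AWS calls live in an uncalled `main()`:

```python
def vllm_env(model_path: str = "/opt/ml/model", max_model_len: int = 4000) -> dict:
    """Container environment; OPTION_* keys become vLLM CLI flags."""
    return {
        "OPTION_MODEL": model_path,  # where SageMaker mounts the model artifacts
        "OPTION_MAX_MODEL_LEN": str(max_model_len),
    }

def main():
    # Requires the sagemaker SDK and AWS credentials.
    import sagemaker
    from sagemaker.model import Model

    model = Model(
        image_uri="<account-id>.dkr.ecr.<region>.amazonaws.com/vllm-sagemaker:latest",
        model_data="s3://<your-bucket>/gpt-oss-recipe/output/model.tar.gz",
        role=sagemaker.get_execution_role(),
        env=vllm_env(),
    )
    model.deploy(
        initial_instance_count=1,
        instance_type="ml.p5.48xlarge",
        endpoint_name="gpt-oss-120b-ft",  # name is an arbitrary choice
    )

# main()  # uncomment in your environment
```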
Sample inference
After your endpoint is deployed and in the InService state, you can invoke your fine-tuned GPT-OSS model using the SageMaker Python SDK.
The following is an example predictor setup:
The modified vLLM container is fully compatible with the OpenAI-style messages input format, making it straightforward to send chat-style requests:
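A chat-style request can be sketched as follows; the endpoint name matches the placeholder used at deployment and the system prompt is illustrative, and the invocation itself needs AWS credentials, so it sits in an uncalled `main()`:

```python
def build_chat_payload(prompt: str, max_tokens: int = 512) -> dict:
    """OpenAI-style messages payload accepted by the vLLM container."""
    return {
        "messages": [
            {"role": "system", "content": "You are a helpful multilingual assistant."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": 0.2,
    }

def main():
    # Requires the sagemaker SDK, AWS credentials, and a deployed endpoint.
    from sagemaker.predictor import Predictor
    from sagemaker.serializers import JSONSerializer
    from sagemaker.deserializers import JSONDeserializer

    predictor = Predictor(
        endpoint_name="gpt-oss-120b-ft",  # endpoint name is an assumption
        serializer=JSONSerializer(),
        deserializer=JSONDeserializer(),
    )
    response = predictor.predict(
        build_chat_payload("Explique le raisonnement en chaîne en une phrase.")
    )
    print(response["choices"][0]["message"]["content"])

# main()  # uncomment in your environment
```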
You have successfully deployed and invoked your custom fine-tuned GPT-OSS model on SageMaker real-time endpoints, using the vLLM framework for optimized, low-latency inference. You can find more GPT-OSS hosting examples in the OpenAI gpt-oss examples GitHub repo.
Clean up
To avoid incurring additional charges, complete the following steps to clean up the resources used in this post:
- Delete the SageMaker endpoint:
pretrained_predictor.delete_endpoint()
- If you created a SageMaker HyperPod cluster for the purposes of this post, delete the cluster by following the instructions in Deleting a SageMaker HyperPod cluster.
- Clean up the FSx for Lustre volume if it's no longer needed by following the instructions in Deleting a file system.
- If you used training jobs, the training instances are automatically deleted when the jobs are complete.
Conclusion
In this post, we showed how to fine-tune OpenAI's GPT-OSS models (gpt-oss-120b and gpt-oss-20b) on SageMaker AI using SageMaker HyperPod recipes. We discussed how SageMaker HyperPod recipes provide a powerful yet accessible solution for organizations to scale their AI model training capabilities with large language models (LLMs) including GPT-OSS, using either a persistent cluster through SageMaker HyperPod or an ephemeral cluster using SageMaker training jobs. The architecture streamlines complex distributed training workflows through its intuitive recipe-based approach, reducing setup time from weeks to minutes. We also showed how these fine-tuned models can be seamlessly deployed to production using SageMaker endpoints with vLLM optimization, providing enterprise-grade inference capabilities with OpenAI-compatible APIs. This end-to-end workflow, from training to deployment, helps organizations build and serve custom LLM solutions while using the scalable infrastructure of AWS and the comprehensive ML platform capabilities of SageMaker.
To get started with SageMaker HyperPod recipes, visit the Amazon SageMaker HyperPod recipes GitHub repo for comprehensive documentation and example implementations. If you're interested in exploring fine-tuning further, the Generative AI using Amazon SageMaker GitHub repo has the necessary code and notebooks. Our team continues to expand the recipe ecosystem based on customer feedback and emerging ML trends, making sure that you have the tools needed for successful AI model training.
Special thanks to everyone who contributed to the launch: Hengzhi Pei, Zach Kimberg, Andrew Tian, Leonard Lausen, Sanjay Dorairaj, Manish Agarwal, Sareeta Panda, Chang Ning Tsai, Maxwell Nuyens, Natasha Sivananjaiah, and Kanwaljit Khurmi.
About the authors
Durga Sury is a Senior Solutions Architect at Amazon SageMaker, where she helps enterprise customers build secure and scalable AI/ML systems. When she's not architecting solutions, you can find her enjoying sunny walks with her dog, immersing herself in murder mystery books, or catching up on her favorite Netflix shows.
Pranav Murthy is a Senior Generative AI Data Scientist at AWS, specializing in helping organizations innovate with generative AI, deep learning, and machine learning on Amazon SageMaker AI. Over the past 10+ years, he has developed and scaled advanced computer vision (CV) and natural language processing (NLP) models to tackle high-impact problems, from optimizing global supply chains to enabling real-time video analytics and multilingual search. When he's not building AI solutions, Pranav enjoys playing strategic games like chess, traveling to discover new cultures, and mentoring aspiring AI practitioners. You can find Pranav on LinkedIn.
Sumedha Swamy is a Senior Manager of Product Management at Amazon Web Services (AWS), where he leads several areas of Amazon SageMaker, including SageMaker Studio, the industry-leading integrated development environment for machine learning; developer and administrator experiences; AI infrastructure; and the SageMaker SDK.
Dmitry Soldatkin is a Senior AI/ML Solutions Architect at Amazon Web Services (AWS), helping customers design and build AI/ML solutions. Dmitry's work covers a wide range of ML use cases, with a primary interest in generative AI, deep learning, and scaling ML across the enterprise. He has helped companies in many industries, including insurance, financial services, utilities, and telecommunications. You can connect with Dmitry on LinkedIn.
Arun Kumar Lokanatha is a Senior ML Solutions Architect with the Amazon SageMaker team. He specializes in large language model training workloads, helping customers build LLM workloads using SageMaker HyperPod, SageMaker training jobs, and SageMaker distributed training. Outside of work, he enjoys running, hiking, and cooking.
Anirudh Viswanathan is a Senior Product Manager, Technical, at AWS with the SageMaker team, where he focuses on machine learning. He holds a Master's in Robotics from Carnegie Mellon University and an MBA from the Wharton School of Business. Anirudh is a named inventor on more than 50 AI/ML patents. He enjoys long-distance running, exploring art galleries, and attending Broadway shows.

