On the AWS Summit in New York Metropolis, we launched a complete suite of mannequin customization capabilities for Amazon Nova basis fashions. Obtainable as ready-to-use recipes on Amazon SageMaker AI, you should use them to adapt Nova Micro, Nova Lite, and Nova Professional throughout the mannequin coaching lifecycle, together with pre-training, supervised fine-tuning, and alignment.
On this multi-post sequence, we are going to discover these customization recipes and supply a step-by-step implementation information. We’re beginning with Direct Choice Optimization (DPO, an alignment approach that provides an easy strategy to tune mannequin outputs along with your preferences. DPO makes use of prompts paired with two responses—one most popular over the opposite—to information the mannequin towards outputs that higher mirror your required tone, type, or pointers. You possibly can implement this system utilizing both parameter-efficient or full mannequin DPO, based mostly in your information quantity and price concerns. The personalized fashions will be deployed to Amazon Bedrock for inference utilizing provisioned throughput. The parameter-efficient model helps on-demand inference. Nova customization recipes can be found in SageMaker coaching jobs and SageMaker HyperPod, providing you with flexibility to pick the setting that most closely fits your infrastructure and scale necessities.
On this put up, we current a streamlined strategy to customizing Amazon Nova Micro with SageMaker coaching jobs.
Resolution overview
The workflow for utilizing Amazon Nova recipes with SageMaker coaching jobs, as illustrated within the accompanying diagram, consists of the next steps:
- The person selects a particular Nova customization recipe which supplies complete configurations to regulate Amazon Nova coaching parameters, mannequin settings, and distributed coaching methods. You need to use the default configurations optimized for the SageMaker AI setting or customise them to experiment with totally different settings.
- The person submits an API request to the SageMaker AI management airplane, passing the Amazon Nova recipe configuration.
- SageMaker makes use of the coaching job launcher script to run the Nova recipe on a managed compute cluster.
- Primarily based on the chosen recipe, SageMaker AI provisions the required infrastructure, orchestrates distributed coaching, and, upon completion, routinely decommissions the cluster.
This streamlined structure delivers a totally managed person expertise, so you’ll be able to shortly outline Amazon Nova coaching parameters and choose your most popular infrastructure utilizing simple recipes, whereas SageMaker AI handles the end-to-end infrastructure administration—inside a pay-as-you-go pricing mannequin that’s solely billed for the online coaching time in seconds.
The personalized Amazon Nova mannequin is subsequently deployed on Amazon Bedrock utilizing the createcustommodel
API inside Bedrock – and may combine with native tooling equivalent to Amazon Bedrock Data Bases, Amazon Bedrock Guardrails, and Amazon Bedrock Brokers.
Enterprise Use Case – Implementation Stroll-through
On this put up, we deal with adapting the Amazon Nova Micro mannequin to optimize structured perform calling for application-specific agentic workflows. We display how this strategy can optimize Amazon Nova fashions for domain-specific use instances by a 81% enhance in F1 rating and as much as 42% features in ROUGE metrics. These enhancements make the fashions extra environment friendly in addressing a wide selection of enterprise functions, equivalent to enabling buyer help AI assistants to intelligently escalate queries, powering digital assistants for scheduling and workflow automation, and automating decision-making in sectors like ecommerce and monetary providers.
As proven within the following diagram, our strategy makes use of DPO to align the Amazon Nova mannequin with human preferences by presenting the mannequin with pairs of responses—one most popular by human annotators and one much less most popular—based mostly on a given person question and accessible software actions. The mannequin is skilled with the nvidia/When2Call dataset to extend the probability of the tool_call
response, which aligns with the enterprise objective of automating backend actions when applicable. Over many such examples, the Amazon Nova mannequin learns not simply to generate right function-calling syntax, but additionally to make nuanced selections about when and easy methods to invoke instruments in complicated workflows—enhancing its utility in enterprise functions like buyer help automation, workflow orchestration, and clever digital assistants.
When coaching is full, we consider the fashions utilizing SageMaker coaching jobs with the suitable analysis recipe. An analysis recipe is a YAML configuration file that defines how your Amazon Nova massive language mannequin (LLM) analysis job can be executed. Utilizing this analysis recipe, we measure each the mannequin’s task-specific efficiency and its alignment with the specified agent behaviors, so we are able to quantitatively assess the effectiveness of our customization strategy. The next diagram illustrates how these phases will be carried out as two separate coaching job steps. For every step, we use built-in integration with Amazon CloudWatch to entry logs and monitor system metrics, facilitating sturdy observability. After the mannequin is skilled and evaluated, we deploy the mannequin utilizing the Amazon Bedrock Customized Mannequin Import performance as a part of step 3.
Conditions
You could full the next stipulations earlier than you’ll be able to run the Amazon Nova Micro mannequin fine-tuning pocket book:
- Make the next quota enhance requests for SageMaker AI. For this use case, you will have to request a minimal of two
p5.48xlarge
occasion (with 8 x NVIDIA H100 GPUs) and scale to extrap5.48xlarge
cases (relying on time-to-train and cost-to-train trade-offs in your use case). On the Service Quotas console, request the next SageMaker AI quotas:- P5 cases (
p5.48xlarge
) for coaching job utilization: 2
- P5 cases (
- (Non-compulsory) You possibly can create an Amazon SageMaker Studio area (confer with Use fast setup for Amazon SageMaker AI) to entry Jupyter notebooks with the previous position. (You need to use JupyterLab in your native setup, too.)
- Create an AWS Id and Entry Administration (IAM) position with managed insurance policies
AmazonSageMakerFullAccess
,AmazonS3FullAccess
, andAmazonBedrockFullAccess
to present required entry to SageMaker AI and Amazon Bedrock to run the examples. - Assign the next coverage because the belief relationship to your IAM position:
- Clone the GitHub repository with the property for this deployment. This repository consists of a pocket book that references coaching property:
Subsequent, we run the pocket book nova-micro-dpo-peft.ipynb to fine-tune the Amazon Nova mannequin utilizing DPO, and PEFT on SageMaker coaching jobs.
Put together the dataset
To organize the dataset, you have to load the nvidia/When2Call
dataset. This dataset supplies synthetically generated person queries, software choices, and annotated preferences based mostly on actual situations, to coach and consider AI assistants on making optimum tool-use selections in multi-step situations.
Full the next steps to format the enter in a chat completion format, and configure the info channels for SageMaker coaching jobs on Amazon Easy Storage Service (Amazon S3):
- Load the nvidia/When2Call dataset:
The DPO approach requires a dataset containing the next:
- Consumer prompts (e.g., “Write knowledgeable e mail asking for a increase”)
- Most popular outputs (very best responses)
- Non-preferred outputs (undesirable responses)
The next code is an instance from the unique dataset:
- As a part of information preprocessing, we convert the info into the format required by Amazon Nova Micro, as proven within the following code. For examples and particular constraints of the Amazon Nova format, see Making ready information for fine-tuning Understanding fashions.
For the complete information conversion code, see right here.
- Break up the dataset into practice and take a look at datasets:
- Put together the coaching and take a look at datasets for the SageMaker coaching job by saving them as
.jsonl
recordsdata, which is required by SageMaker HyperPod recipes for Amazon Nova, and establishing the Amazon S3 paths the place these recordsdata can be uploaded:
DPO coaching utilizing SageMaker coaching jobs
To fine-tune the mannequin utilizing DPO and SageMaker coaching jobs with recipes, we use the PyTorch Estimator class. Begin by setting the fine-tuning workload with the next steps:
- Choose the occasion kind and the container picture for the coaching job:
- Create the PyTorch Estimator to encapsulate the coaching setup from a particular Amazon Nova recipe:
You possibly can level to the particular recipe with the training_recipe
parameter and override the recipe by offering a dictionary as recipe_overrides
parameter.
The PyTorch Estimator
class simplifies the expertise by encapsulating code and coaching setup immediately from the chosen recipe.
On this instance, training_recipe
: fine-tuning/nova/dpo-peft-nova-micro-v1
is defining the DPO fine-tuning setup with PEFT approach
- Arrange the enter channels for the PyTorch Estimator by creating an TrainingInput objects from the offered S3 bucket paths for the coaching and take a look at datasets:
- Submit the coaching job utilizing the
match
perform name on the created Estimator:
estimator.match(inputs={"practice": train_input, "validation": test_input}, wait=True)
You possibly can monitor the job immediately out of your pocket book output. It’s also possible to refer the SageMaker AI console, which reveals the standing of the job and the corresponding CloudWatch logs for governance and observability, as proven within the following screenshots.

SageMaker coaching jobs console

SageMaker coaching jobs system metrics
After the job is full, the skilled mannequin weights can be accessible in an escrow S3 bucket. This safe bucket is managed by Amazon and makes use of particular entry controls. You possibly can entry the paths shared in manifest recordsdata which are saved in a buyer S3 bucket as a part of the coaching course of.
Consider the fine-tuned mannequin utilizing the analysis recipe
To evaluate mannequin efficiency in opposition to benchmarks or {custom} datasets, we are able to use the Nova analysis recipes and SageMaker coaching jobs to execute an analysis workflow, by pointing to the mannequin skilled within the earlier step. Amongst a number of supported benchmarks, equivalent to mmlu
, math
, gen_qa
, and llm_judge
, within the following steps we’re going to present two choices for gen_qa
and llm_judge
duties, which permit us to judge response accuracy, precision and mannequin inference high quality with the likelihood to make use of our personal dataset and evaluate outcomes with the bottom mannequin on Amazon Bedrock.
Choice A: Consider gen_qa process
- Use the code within the to arrange the dataset, structured within the following format as required by the analysis recipe:
- Save the dataset as
.jsonl
recordsdata, which is required by Amazon Nova analysis recipes, and add them to the Amazon S3 path:
- Create the analysis recipe pointing to skilled mannequin, validation information, and the analysis metrics relevant to your use case:
- Choose the occasion kind, the container picture for the analysis job, and outline the checkpoint path the place the mannequin can be saved. The really helpful occasion sorts for the Amazon Nova analysis recipes are:
ml.g5.12xlarge
for Amazon Nova Micro and Amazon Nova Lite, andml.g5.48xlarge
for Amazon Nova Professional:
- Create the PyTorch Estimator to encapsulate the analysis setup from the created recipe:
- Arrange the enter channels for PyTorch Estimator by creating an TrainingInput objects from the offered S3 bucket paths for the validation dataset:
- Submit the coaching job:
estimator.match(inputs={"practice": eval_input}, wait=False)
Analysis metrics can be saved by the SageMaker coaching Job in your S3 bucket, underneath the required output_path
.
The next determine and accompanying desk present the analysis outcomes in opposition to the bottom mannequin for the gen_qa
process:
F1 | F1 QUASI | ROUGE 1 | ROUGE 2 | ROUGE L | |
Base | 0.26 | 0.37 | 0.38 | 0.28 | 0.34 |
Advantageous-tuned | 0.46 | 0.52 | 0.52 | 0.4 | 0.46 |
% Distinction | 81% | 40% | 39% | 42% | 38% |
Choice B: Consider llm_judge process
- For the
llm_judge
process, construction the dataset with the beneath format, the placeresponse_A
represents the bottom reality andresponse_B
represents our personalized mannequin output:
- Following the identical strategy described for the
gen_qa
process, create an analysis recipe particularly for thellm_judge
process, by specifyingdecide
as technique:
The entire implementation together with dataset preparation, recipe creation, and job submission steps, confer with the pocket book nova-micro-dpo-peft.ipynb.
The next determine reveals the outcomes for the llm_judge
process:
This graph reveals the choice percentages when utilizing an LLM as a decide to judge mannequin efficiency throughout two totally different comparisons. In Graph 1, the fine-tuned mannequin outperformed the bottom reality with 66% choice versus 34%, whereas in Graph 2, the bottom mannequin achieved 56% choice in comparison with the bottom reality’s 44%.
Summarized analysis outcomes
Our fine-tuned mannequin delivers vital enhancements on the tool-calling process, outperforming the bottom mannequin throughout all key analysis metrics. Notably, the F1
rating elevated by 81%, whereas the F1 Quasi
rating improved by 35%, reflecting a considerable enhance in each precision and recall. By way of lexical overlap, the mannequin demonstrated enhanced accuracy in matching generated solutions to reference texts —instruments to invoke and construction of the invoked perform— attaining features of 39% and 42% for ROUGE-1
and ROUGE-2
scores, respectively. The llm_judge
analysis additional validates these enhancements, with the fine-tuned mannequin outputs being most popular in 66.2% in opposition to the bottom reality outputs. These complete outcomes throughout a number of analysis frameworks verify the effectiveness of our fine-tuning strategy in elevating mannequin efficiency for real-world situations.
Deploy the mannequin on Amazon Bedrock
To deploy the fine-tuned mannequin, we are able to use the Amazon Bedrock CreateCustomModel
API and use Bedrock On-demand inference with the native mannequin invocation instruments. To deploy the mannequin, full the next steps:
- Create a {custom} mannequin, by pointing to the mannequin checkpoints saved within the escrow S3 bucket:
- Monitor the mannequin standing. Wait till the mannequin reaches the standing
ACTIVE
orFAILED
:
When the mannequin import is full, you will notice it accessible by means of the AWS CLI:
- Configure Amazon Bedrock Customized Mannequin on-demand inference:
- Monitor the mannequin deployment standing. Wait till the mannequin reaches the standing
ACTIVE
orFAILED
:
- Run mannequin inference by means of AWS SDK:
- Submit the inference request by utilizing the
converse
API:
We get the next output response:
Clear up
To scrub up your sources and keep away from incurring extra expenses, observe these steps:
- Delete unused SageMaker Studio sources
- (Non-compulsory) Delete the SageMaker Studio area
- On the SageMaker console, select Coaching within the navigation pane and confirm that your coaching job isn’t working anymore.
- Delete {custom} mannequin deployments in Amazon Bedrock. To take action, use the AWS CLI or AWS SDK to delete it.
Conclusion
This put up demonstrates how one can customise Amazon Nova understanding fashions utilizing the DPO recipe on SageMaker coaching jobs. The detailed walkthrough with a particular deal with optimizing software calling capabilities showcased vital efficiency enhancements, with the fine-tuned mannequin attaining as much as 81% higher F1 scores in comparison with the bottom mannequin with coaching dataset of round 8k information.
The totally managed SageMaker coaching jobs and optimized recipes simplify the customization course of, so organizations can adapt Amazon Nova fashions for domain-specific use instances. This integration represents a step ahead in making superior AI customization accessible and sensible for organizations throughout industries.
To start utilizing the Nova-specific recipes, go to the SageMaker HyperPod recipes repository, the SageMaker Distributed Coaching workshop and the Amazon Nova Samples repository for instance implementations. Our staff continues to increase the recipe panorama based mostly on buyer suggestions and rising machine studying developments, so you’ve got the instruments wanted for profitable AI mannequin coaching.
Concerning the authors
Mukund Birje is a Sr. Product Advertising Supervisor on the AIML staff at AWS. In his present position he’s centered on driving adoption of Amazon Nova Basis Fashions. He has over 10 years of expertise in advertising and branding throughout a wide range of industries. Outdoors of labor you could find him mountaineering, studying, and making an attempt out new eating places. You possibly can join with him on LinkedIn.
Karan Bhandarkar is a Principal Product Supervisor with Amazon Nova. He focuses on enabling clients to customise the inspiration fashions with their proprietary information to higher deal with particular enterprise domains and business necessities. He’s keen about advancing Generative AI applied sciences and driving real-world influence with Generative AI throughout industries.
Kanwaljit Khurmi is a Principal Worldwide Generative AI Options Architect at AWS. He collaborates with AWS product groups, engineering departments, and clients to offer steering and technical help, serving to them improve the worth of their hybrid machine studying options on AWS. Kanwaljit focuses on aiding clients with containerized functions and high-performance computing options.
Bruno Pistone is a Senior World Broad Generative AI/ML Specialist Options Architect at AWS based mostly in Milan, Italy. He works with AWS product groups and huge clients to assist them totally perceive their technical wants and design AI and Machine Studying options that take full benefit of the AWS cloud and Amazon Machine Studying stack. His experience contains: mannequin customization, generative AI, and end-to-end Machine Studying. He enjoys spending time with buddies, exploring new locations, and touring to new locations.