Introducing the Nova Forge SDK, a seamless way to customize Nova models for enterprise AI

Machine Learning


Large-scale language models (LLMs) have transformed the way we interact with AI, but one size doesn’t quite fit. Out-of-the-box LLMs are trained with extensive general knowledge and refined for a wide range of use cases, but they often fall short when it comes to domain-specific tasks, unique workflows, or unique business requirements. Corporate customers increasingly require specialized LLMs with a deep understanding of their company’s unique data, business processes, and domain-specific terminology. Without customization, you have to choose between accepting a generic response or settling for a middle ground with excessive context engineering. Nova Customization offers an array of features ranging from Amazon Bedrock customization options such as Supervised Fine-Tuning (SFT) and Reinforcement Fine Tuning (RFT) to Amazon SageMaker AI customization features such as SFT, Direct Preference Optimization (DPO), and RFT, as well as LoRA and full rank-based customization.

As models are fine-tuned based on specialized datasets, fundamental features such as the ability to follow instructions, reasoning skills, and broad knowledge expertise are often lost, a phenomenon also known as catastrophic forgetting. Amazon Nova Forge provides tools to overcome this tradeoff by allowing you to build your own frontier models using Nova. Nova Forge customers can start development with an initial model checkpoint, blend it with Amazon Nova-curated datasets, and securely host their custom models on AWS. In some cases, these customization workflows are complex and require technical, infrastructure setup, and significant time investments, creating a high barrier to entry.

To address this issue, we are releasing the Nova Forge SDK, which provides access to LLM customization. This allows teams to leverage the full potential of language models without the challenges of managing dependencies, selecting images, and configuring recipes, ultimately lowering the barrier to entry. Because we view customization as a continuum within a scaling ladder, the Nova Forge SDK supports all customization options, from Amazon Bedrock to Amazon SageMaker AI using Amazon Nova Forge capabilities.

Nova Forge SDK: Purpose built by developers for developers

The Nova Forge SDK provides an integrated toolkit built specifically for Nova customers and developers. It covers the complete customization lifecycle and provides solutions from data preparation tools, training job management to model deployment in one place. The Nova Forge SDK represents an attempt to take the undifferentiated heavy lifting out of LLM customization, so you can focus on experimentation. It complements existing tools by providing workflows with intelligent defaults and guidance, while giving advanced users access to the full functionality of the underlying service SDK when needed. This gives customers both a streamlined workflow for common tasks and complete flexibility for advanced use cases.

The SDK can be understood in three layers.

  • Input layer: This is the layer you pass as input and can include a RuntimeManager object (including the hardware you use, the platform you use, and the IAM role you use in terms of permissions), your training method, your training data, and any hyperparameters you choose to override, and the model you choose to customize.
  • Customizer layer: This is the middle layer that takes these inputs, builds the appropriate recipe configuration behind the scenes, and launches the job using the input values ​​passed to it.
  • output layer: The output layer outputs output artifacts such as Amazon CloudWatch Logs, ML Flow metrics, tensorboard logs, and finally the trained model artifacts. These artifacts can be used to further tune the model using iterative fine-tuning or to deploy the model to Amazon SageMaker AI or Amazon Bedrock for inference.

The following diagram shows a high-level breakdown of these components.

Users of the Nova Forge SDK provide a configured RuntimeManager, a model to customize, and a training method for one of the initialized NovaModelCustomizer API methods. Initializing the customizer involves specifying where training data can be retrieved. This is typically the Amazon Simple Storage Service (Amazon S3) location. Based on these settings, the customizer model handles configuring and starting Amazon SageMaker AI jobs to perform the specified tasks. Finally, the completed task produces an output artifact and (in the case of the “train” API) a trained model, which you can reference through the SDK or directly using the Amazon SageMaker API.

Prerequisites:

Before starting the customization workflow, ensure that your environment has the following settings: This blog post uses Amazon SageMaker Training Jobs (SMTJ) as the computing platform ( do not have You will need an Amazon SageMaker HyperPod cluster to follow this)

Amazon Nova Forge setup is as follows do not have This post is essential as it reviews the basic features available for customizing Nova with Amazon SageMaker AI.

Note: If you are only interested in Amazon SageMaker training jobs, you can completely skip setting up Amazon SageMaker HyperPod.

AWS account and CLI

You will need an AWS account. If you don’t have one, follow the steps to sign up.

Then, follow the instructions to install the AWS Command Line Interface (AWS CLI) and configure it with your credentials. This is used for the first API call used for setup and the AWS CLI credential chain is shared by the Nova Forge SDK.

Finally, configure access to the SageMaker AI platform according to the published documentation. The Nova Forge SDK uses this to provide access to Amazon Nova models and customization features.

The role of IAM

To use the Nova Forge SDK, you need to create two IAM roles. user role, and Execution role. The Nova Forge SDK validates both roles during execution to ensure they have the minimum required privileges. However, we recommend that you complete the following setup steps.

  • User role — The role you assume on your machine when running the SDK and AWS CLI. This role requires Amazon SageMaker AI (CreateTrainingJob, DescribeTrainingJob), Amazon S3 (read/write to data bucket), Amazon CloudWatch Logs (read), and IAM (PassRole) permissions. See the SDK documentation for the complete policy.
  • Execution role — The role in which Amazon SageMaker AI runs training jobs on your behalf. Its trust policy should allow sagemaker.amazonaws.com to assume it. See the SageMaker execution role documentation for the complete set of recommended permissions. For detailed setup instructions, follow the prerequisites to run the SMTJ job.

service quota

This post uses an ml.p5.48xlarge instance for both training and evaluation. Nova Lite 2.0 requires at least the following: 4 instances For SFT training. If you are running training and evaluation jobs at the same time, you may need at least 5 instances.

Request a quota of ml.p5.48xlarge sufficient for your training job usage through the Amazon SageMaker training job service quotas console.

S3 bucket

Create an Amazon Simple Storage Service (Amazon S3) bucket in the same AWS Region as your training job (this post uses us-east-1), and ensure that your user and execution IAM role have read and write access to the bucket. This is where we will store the training data and output artifacts for this post.

Amazon SageMaker HyperPod (optional)

In addition to Amazon SageMaker Training Jobs (SMTJ), the Nova Forge SDK also supports running jobs on Amazon SageMaker HyperPod (SMHP). Although this post is not focused on customizing SMHP, if you want to run training with SMHP, you will need to set up an Amazon SageMaker HyperPod cluster with a restricted instance group (RIG) so that you can work with Amazon Nova models.

Follow the instructions in the HyperPod RIG Setup Workshop to set up a cluster with a RIG suitable for Amazon Nova customization.

Setting up the Nova Forge SDK

After completing the prerequisites, you can use the following guidance to set up your environment so you can start using the Nova Forge SDK.

Python environment

Nova Forge SDK requires: Python 3.12 or later. We recommend creating a virtual environment to isolate dependencies and avoid conflicts with other packages in your system.

python3.12 -m venv nova-sdk-env
source nova-sdk-env/bin/activate # On Windows: nova-sdk-env\Scripts\activate

Install SDK

You can install the SDK using the following Pip command:

pip install amzn-nova-forge

Verify the installation by importing the main modules in a sample Python file.

from amzn_nova_forge import (
NovaModelCustomizer,
SMTJRuntimeManager,
TrainingMethod,
EvaluationTask,
CSVDatasetLoader,
Model,
)

Below is a brief description of each of these modules.

  • NovaModelCustomizer: Main classes for interacting with the Nova Forge SDK. This contains the core methods of the API and is used to initialize many of your training settings.
  • SMTJ Runtime Manager: Manage the AWS infrastructure required for SMTJ. Customizations, such as instance type selected and number of customization jobs.
  • training method: Enumeration of possible training types that can be used to configure the NovaModelCustomizer.
  • evaluation task: Enumeration of possible evaluation types that can be used to configure the NovaModelCustomizer.
  • CSV dataset loader: Used to load data from CSV files for use with the Nova Forge SDK.
  • model: Enumeration of Amazon Nova models supported by the Nova Forge SDK.

Note: For more information about the various features of the SDK, please refer to the specification document. If you are using the LLM Agent for your coding work, the LLM Agent AGENTS.md Learn about the SDK using files in the repository.

conclusion

The SDK’s unified interface abstracts away data formats and platform-specific configuration complexities, allowing developers to focus on what matters: their data, their domain, and their business goals. Whether you’re starting with fine-tuning your Amazon SageMaker training job or planning to perform your customizations using Amazon SageMaker Hyperpod, the SDK provides a consistent experience throughout your customization journey.

By removing traditional barriers to LLM customization, technical expertise requirements, and time investment, the Nova Forge SDK enables organizations to build models that truly understand their unique contexts without sacrificing the common features that increase the value of the underlying models. The SDK handles configuring compute resources, orchestrating the entire customization pipeline, monitoring training jobs, and deploying endpoints. The result is enterprise AI that is specialized, intelligent, domain expert, and has a wide range of capabilities.

Ready to customize your own Nova models? Get started with the Nova Forge SDK on GitHub, explore the complete documentation, and start building models for your enterprise’s needs.


About the author

Mahima Chaudhary

Mahima Chaudhary is a Machine Learning Engineer on the Amazon Nova Training Experience team, where she works on the Nova Forge SDK and Reinforcement Fine-Tuning (RFT) to help customers customize and fine-tune Nova models on AWS. She brings expertise in MLOps and LLMOps and has a track record of building scalable, production-grade ML systems across aviation, healthcare, insurance, and finance before Amazon. When she’s not shipping models, the California-based actress can be found chasing sunsets on new hiking trails, experimenting in the kitchen, or diving deep down documentary rabbit holes.

Anupam Dewan

Anupam Dewan is a Senior Solutions Architect on the Amazon Nova team with a passion for generative AI and its real-world applications. His focus is on Nova customization and Nova Forge, helping companies realize the true potential of their LLM through the power of customization. I’m also passionate about teaching data science and analytics and helping enterprises build LLMs that work for their business. Outside of work, you can find me hiking, volunteering, or enjoying nature.

Swapneel Singh

Swapneil Singh is a software development engineer on the Amazon Nova Training Experience team, where he builds developer tools to customize Amazon Nova models. He is a core contributor to the Nova Forge SDK and Amazon Nova User Guide, helping customers fine-tune and deploy custom Nova models on AWS. Previously, he worked on telemetry and log processing with AWS Elastic Container Services. Outside of work, you can find him tinkering with AI orchestration and programming languages, or at a library in Boston.



Source link