Implement on-demand deployments using customized Amazon Nova models on Amazon Bedrock

Amazon Bedrock offers model customization capabilities that let customers tailor versions of foundation models (FMs) to their specific needs through features such as fine-tuning and distillation. Today, we're announcing the launch of on-demand deployment for customized models on Amazon Bedrock.

On-demand deployment of customized models provides an additional deployment option that scales with your usage patterns. With this approach, you can invoke customized models only when needed, and requests are processed in real time without requiring pre-provisioned compute resources.

The on-demand deployment option includes a token-based pricing model that charges based on the number of tokens processed during inference. This pay-as-you-go approach complements the existing Provisioned Throughput option, giving you the flexibility to choose the deployment method that best fits your workload requirements and cost goals.
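Because on-demand charges scale with token counts, a rough monthly estimate is just input and output tokens multiplied by their respective rates. The following is a minimal sketch under stated assumptions: `TokenPrices` and `estimate_on_demand_cost` are hypothetical names, and the prices used are placeholders rather than actual Amazon Bedrock rates, so substitute the published pricing for your model and Region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrices:
    """Per-1,000-token prices in USD. Placeholder values, not actual Amazon Bedrock rates."""
    input_per_1k: float
    output_per_1k: float


def estimate_on_demand_cost(input_tokens: int, output_tokens: int, prices: TokenPrices) -> float:
    """Estimate pay-as-you-go inference cost from input and output token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens / 1000) * prices.input_per_1k + (output_tokens / 1000) * prices.output_per_1k


# Example with made-up prices: 2M input tokens and 500K output tokens in a month
prices = TokenPrices(input_per_1k=0.001, output_per_1k=0.004)
print(f"Estimated cost: ${estimate_on_demand_cost(2_000_000, 500_000, prices):.2f}")
```

Input and output tokens are priced separately here because many models bill them at different rates; if your model uses a single rate, set both fields to the same value.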

In this post, we walk through the custom model on-demand deployment workflow for Amazon Bedrock and provide a step-by-step implementation guide using both the AWS Management Console and the API or AWS SDKs. We also discuss best practices and considerations for deploying customized Amazon Nova models on Amazon Bedrock.

Understanding Custom Model On-Demand Deployment Workflows

The model customization lifecycle represents an end-to-end journey from conceptualization to deployment. The process begins with defining a specific use case, preparing and formatting appropriate data, and then performing model customization through features such as Amazon Bedrock fine-tuning or Amazon Bedrock Model Distillation. Each stage builds on the previous one, creating a pathway toward deploying production-ready generative AI capabilities that meet your requirements. The following diagram illustrates this workflow.

After the model is customized, the evaluation and deployment phases determine how the model will be made available for inference. This is where on-demand deployment of custom models becomes valuable, offering a deployment option suited to variable workloads and cost-conscious implementations. With on-demand deployment, you can invoke your customized model through the AWS console or standard API operations using its model identifier. On-demand deployment provides flexibility while maintaining performance expectations, so you can integrate customized models into your applications with the same serverless experience that Amazon Bedrock offers. All compute resources are automatically managed based on actual usage. The workflow also supports iterative improvement, so you can refine your model based on evaluation results and evolving business needs.

Prerequisites

This post assumes you have a customized Amazon Nova model before deploying it using on-demand deployment. On-demand deployment requires Amazon Nova models customized after this launch; previously customized models aren't compatible with this deployment option. For instructions on creating or customizing Amazon Nova models through fine-tuning or distillation, see these resources.

After you've successfully customized your Amazon Nova model, you can deploy it using the on-demand deployment option, as detailed in the next section.

Implementation Guide for On-Demand Deployment

There are two main approaches to implementing on-demand deployment for customized Amazon Nova models on Amazon Bedrock: using the Amazon Bedrock console or using the API or SDK. First, we explore how to deploy a model through the Amazon Bedrock console, which provides a user-friendly interface for configuring and managing deployments.

Step-by-step implementation using the Amazon Bedrock console

To implement on-demand deployment of customized Amazon Nova models on Amazon Bedrock using the console, follow these steps:

In the Amazon Bedrock console, select the customized model (fine-tuned or distilled) that you want to deploy. Choose Set up inference and select Deploy for on-demand, as shown in the following screenshot.

Under Deployment details, enter a Name and a Description. You can optionally add Tags, as shown in the following screenshot. Choose Create to start the on-demand deployment of your customized model.

Under Custom model deployment, the Status of your deployment will be Creating, Active, or Failed, as shown in the following screenshot.

Select a deployment to view its Deployment ARN, Creation time, Last updated, and Status for the selected custom model.

The custom model is now deployed and ready for on-demand use. To try it in the test playground, go to Chat/Text playground, choose Custom models under Categories, select your model, choose On demand under Inference, and select the deployment by name, as shown in the following screenshot.

Step-by-step implementation using the API or SDK

After successfully training the model, you can deploy it to evaluate response quality and latency, or to serve it as a production model for your use case. Use the CreateCustomModelDeployment API to create a model deployment for a trained model. The following steps show how to use the API to create and delete custom model deployments for on-demand inference.

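import boto3
import json

# First, create and configure an Amazon Bedrock client
bedrock_client = boto3.client(
    service_name="bedrock",
    region_name="",  # e.g., "us-east-1"
)

# Create a custom model deployment for on-demand inference
response = bedrock_client.create_custom_model_deployment(
    modelDeploymentName="",  # name for the deployment
    modelArn="",             # ARN of your customized Amazon Nova model
    description="",
    tags=[
        {"key": "", "value": ""},
    ],
)

# The deployment ARN is used to check status and as the modelId for inference
print(response["customModelDeploymentArn"])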

After you successfully create a model deployment, you can check its status using the GetCustomModelDeployment API as follows:

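# Check the status of the custom model deployment
response = bedrock_client.get_custom_model_deployment(
    customModelDeploymentIdentifier=""  # deployment ARN or name
)
print(response["status"])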
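A newly created deployment starts in the Creating state, so you may want to poll until it reaches a terminal state before sending traffic. The following is a minimal sketch, not an official helper: `wait_for_deployment` is a hypothetical function that accepts any zero-argument callable returning a status string, which keeps it testable without AWS access. It assumes the GetCustomModelDeployment response exposes the state under a `status` key, and the poll interval and timeout defaults are arbitrary choices.

```python
import time
from typing import Callable

# States after which polling can stop (Creating is the only non-terminal state)
TERMINAL_STATES = {"Active", "Failed"}


def wait_for_deployment(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status until it returns a terminal state; return that state."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"deployment still {status!r} after {timeout_seconds} s")
        sleep(poll_seconds)


# Hypothetical wiring to the real API (needs AWS credentials and a deployment ARN):
# final = wait_for_deployment(
#     lambda: bedrock_client.get_custom_model_deployment(
#         customModelDeploymentIdentifier="")["status"])

# Offline demonstration with a scripted sequence of states and no real sleeping
scripted = iter(["Creating", "Creating", "Active"])
print(wait_for_deployment(lambda: next(scripted), sleep=lambda _: None))
```

Injecting `sleep` and `get_status` as parameters is a design choice for testability; in production you would pass the real API call and leave `sleep` at its default.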

GetCustomModelDeployment supports three states: Creating, Active, and Failed. When the status in the response is Active, you can use your custom model through on-demand deployment with the InvokeModel or Converse API, as shown in the following example:

# Define the Amazon Bedrock Runtime client
bedrock_runtime = boto3.client(service_name="bedrock-runtime", region_name="")

# Invoke a deployed custom model using the Converse API
response = bedrock_runtime.converse(
    modelId="",  # ARN of the custom model deployment
    messages=[
        {
            "role": "user",
            "content": [{"text": ""}],
        }
    ],
)

# The reply text is in the first content block of the output message
result = response["output"]["message"]["content"][0]["text"]
print(result)

# invoke a deployed custom model using InvokeModel API
request_body = {
    "schemaVersion": "messages-v1",
    "messages": [{"role": "user", 
                  "content": [{"text": ""}]}],
    "system": [{"text": ""}],
    "inferenceConfig": {"maxTokens": 500, 
                        "topP": 0.9, 
                        "temperature": 0.0
                        }
}
body = json.dumps(request_body)
response = bedrock_runtime.invoke_model(
        modelId="",
        body=body
    )

# Extract and print the response text
model_response = json.loads(response["body"].read())
response_text = model_response["output"]["message"]["content"][0]["text"]
print(response_text)
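If you call InvokeModel from several places, it can help to factor request construction and response parsing into small functions. The following is a minimal sketch: `build_nova_request` and `extract_text` are hypothetical helper names, and they mirror only the fields used in the InvokeModel example above (`schemaVersion`, `messages`, `system`, `inferenceConfig`) rather than the full request schema.

```python
import io
import json


def build_nova_request(prompt: str, system: str = "", max_tokens: int = 500,
                       top_p: float = 0.9, temperature: float = 0.0) -> str:
    """Serialize a messages-v1 request body with the same fields as the InvokeModel example."""
    body = {
        "schemaVersion": "messages-v1",
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "topP": top_p, "temperature": temperature},
    }
    if system:  # only include a system prompt when one is given
        body["system"] = [{"text": system}]
    return json.dumps(body)


def extract_text(invoke_response: dict) -> str:
    """Read the InvokeModel response body and return the first text content block."""
    payload = json.loads(invoke_response["body"].read())
    return payload["output"]["message"]["content"][0]["text"]


# Offline check: BytesIO stands in for the streaming body that invoke_model returns
fake_response = {"body": io.BytesIO(json.dumps(
    {"output": {"message": {"content": [{"text": "Hello!"}]}}}).encode("utf-8"))}
print(extract_text(fake_response))

# With the real client (hypothetical wiring, requires credentials):
# text = extract_text(bedrock_runtime.invoke_model(modelId="", body=build_nova_request("...")))
```

Note that the response body is a stream and can be read only once, so parse it a single time and keep the result.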

By following these steps, you can deploy customized models through the Amazon Bedrock API and start using models tailored to your use case through on-demand deployment.

Best Practices and Considerations

The success of an on-demand deployment using a customized model depends on understanding several operational factors. These considerations (such as latency, regional availability, quota limits, choice of deployment option, and cost management strategies) directly affect your ability to deploy effective solutions while optimizing resource utilization. The following guidelines will help you make informed decisions when implementing your inference strategy:

Cold start latency – When using on-demand deployment, you may experience an initial cold start latency, usually lasting a few seconds depending on model size. This occurs when the deployment hasn't received recent traffic and needs to reinitialize compute resources.
Regional availability – At launch, custom model deployment is available in US East (N. Virginia) for Amazon Nova models.
Quota management – Each custom model deployment has specific quotas:
- Tokens per minute (TPM)
- Requests per minute (RPM)
- Number of deployments in Creating status
- Total on-demand deployments in a single account

Each deployment operates independently within its assigned quota. If a deployment exceeds its TPM or RPM allocation, incoming requests will be throttled. You can request a quota increase by submitting a ticket or contacting your AWS account team.
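Because requests beyond the TPM or RPM allocation are throttled, client code can retry them with backoff instead of failing immediately. The following is a minimal sketch, assuming throttled calls raise an exception carrying a botocore-style `response["Error"]["Code"]` of `ThrottlingException`; `call_with_backoff`, `is_throttled`, and `FakeThrottle` are hypothetical names, and the delay defaults are arbitrary. It uses capped exponential backoff with full jitter.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def is_throttled(err: Exception) -> bool:
    """True if err carries a botocore-style error response whose code is ThrottlingException."""
    response = getattr(err, "response", None)
    if not isinstance(response, dict):
        return False
    return response.get("Error", {}).get("Code") == "ThrottlingException"


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 5, base_delay: float = 1.0,
                      max_delay: float = 20.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying throttled calls with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as err:
            # Re-raise non-throttling errors, and throttling errors on the last attempt
            if not is_throttled(err) or attempt == max_attempts - 1:
                raise
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise RuntimeError("unreachable")


# Offline demonstration: a stand-in error shaped like botocore's ClientError
class FakeThrottle(Exception):
    def __init__(self) -> None:
        super().__init__("throttled")
        self.response = {"Error": {"Code": "ThrottlingException"}}


calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeThrottle()
    return "ok"


print(call_with_backoff(flaky, sleep=lambda _: None), calls["n"])
```

With a real client you might wrap a runtime call as `call_with_backoff(lambda: bedrock_runtime.converse(...))`. botocore also provides built-in retry modes, configurable through `botocore.config.Config(retries={"mode": "adaptive"})`, which may be sufficient without a custom wrapper. Retries only smooth over short bursts; sustained traffic above your quota still calls for a quota increase.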

Choosing between custom model deployment and Provisioned Throughput – You can set up inference for your custom model either by creating a custom model deployment (for on-demand use) or by purchasing Provisioned Throughput. The choice depends on the Regions and models supported by each inference option, your throughput requirements, and cost considerations. The two options operate independently and can be used simultaneously for the same custom model.
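On the cost dimension alone, one way to frame the choice is a break-even volume: the monthly token count at which on-demand spend matches a fixed Provisioned Throughput cost. The following is a minimal sketch under simplifying assumptions: it uses a single blended per-1K-token price, takes the provisioned cost as a monthly figure you have already computed, and uses made-up numbers. `breakeven_tokens_per_month` and `cheaper_option` are hypothetical names, and the comparison deliberately ignores throughput, latency, and Region support, which the text notes also matter.

```python
def breakeven_tokens_per_month(provisioned_cost_per_month: float, price_per_1k_tokens: float) -> float:
    """Monthly token volume at which on-demand spend equals a fixed provisioned cost."""
    if price_per_1k_tokens <= 0:
        raise ValueError("price_per_1k_tokens must be positive")
    return provisioned_cost_per_month / price_per_1k_tokens * 1000


def cheaper_option(tokens_per_month: float, provisioned_cost_per_month: float,
                   price_per_1k_tokens: float) -> str:
    """Compare on cost alone; ignores throughput, latency, and Region support."""
    on_demand_cost = tokens_per_month / 1000 * price_per_1k_tokens
    return "on-demand" if on_demand_cost < provisioned_cost_per_month else "Provisioned Throughput"


# Made-up numbers: $10,000/month provisioned vs. a blended $0.002 per 1K tokens on demand
print(f"{breakeven_tokens_per_month(10_000, 0.002):,.0f} tokens/month")
print(cheaper_option(1_000_000_000, 10_000, 0.002))
```

Below the break-even volume, pay-as-you-go is cheaper under these assumptions; well above it, a fixed commitment may be, provided its throughput also fits your workload.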
Cost management – On-demand deployment uses a pay-as-you-go pricing model based on the number of tokens processed during inference. You can apply cost allocation tags to your on-demand deployments to track and manage inference costs, enabling better budget tracking and cost optimization through AWS Cost Explorer.
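The `tags` parameter in the CreateCustomModelDeployment example earlier takes a list of `{"key": ..., "value": ...}` objects. If you keep tags as a plain dictionary in your own configuration, a small converter keeps them consistent across deployments. This is a minimal sketch: `to_bedrock_tags` is a hypothetical helper, the tag names are illustrative, and the length checks use the general AWS tag limits (128-character keys, 256-character values). Also note that user-defined tags must be activated as cost allocation tags in the AWS Billing and Cost Management console before they appear in Cost Explorer.

```python
from typing import Dict, List

# General AWS tag limits: keys up to 128 characters, values up to 256 characters
MAX_KEY_LEN, MAX_VALUE_LEN = 128, 256


def to_bedrock_tags(tags: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert a plain dict into the [{"key": ..., "value": ...}] list shape used for tags."""
    result = []
    for key, value in sorted(tags.items()):  # sorted for deterministic output
        if not key or len(key) > MAX_KEY_LEN:
            raise ValueError(f"invalid tag key: {key!r}")
        if len(value) > MAX_VALUE_LEN:
            raise ValueError(f"tag value too long for key {key!r}")
        result.append({"key": key, "value": value})
    return result


# Hypothetical tag names; use whatever your cost allocation scheme requires
print(to_bedrock_tags({"project": "nova-support-bot", "cost-center": "1234"}))
```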

Clean up

If you're testing the on-demand deployment feature and don't plan to continue using it, clean up your resources to avoid unnecessary costs. To delete a deployment using the Amazon Bedrock console:

Navigate to Custom model deployment
Select the deployment you want to delete
Delete the deployment

To delete a deployment using the API or SDK, use the DeleteCustomModelDeployment API. The following example shows how to delete a custom model deployment:

# Delete the custom model deployment
response = bedrock_client.delete_custom_model_deployment(
    customModelDeploymentIdentifier=""
)

Conclusion

The introduction of on-demand deployment of customized models on Amazon Bedrock represents a major advancement in making AI models more accessible, cost-effective and flexible for businesses of all sizes. On-demand deployments have the following benefits:

Cost optimization – Pay-as-you-go pricing means you pay only for the compute resources you actually use
Operational simplicity – Automatic resource management eliminates the need for manual infrastructure provisioning
Scalability – Seamless handling of variable workloads without prepaid capacity planning
Flexibility – Freedom to choose between on-demand deployment and Provisioned Throughput based on your specific needs

Getting started is straightforward. Begin by completing model customization through fine-tuning or distillation, then choose on-demand deployment using the AWS Management Console or API. Configure your deployment details, validate model performance in a test environment, and integrate the model into your production workflow.

Start exploring on-demand deployment of customized models with Amazon Bedrock today! Visit the Amazon Bedrock documentation to begin your model customization journey and experience the benefits of flexible, cost-effective AI infrastructure. For hands-on implementation examples, see our GitHub repository, which contains detailed code samples for customizing Amazon Nova models and evaluating them using on-demand custom model deployment.

About the authors

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she works on cutting-edge AI/ML technologies as a generative AI specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.

Sovik Kumar Nath is a Senior AI/ML Solutions Architect with AWS. He has experience designing end-to-end machine learning and business analytics solutions in finance, operations, marketing, healthcare, supply chain management, and IoT. He holds master's degrees from the University of South Florida and the University of Fribourg, Switzerland, and a bachelor's degree from the Indian Institute of Technology, Kharagpur. Outside of work, Sovik enjoys traveling, taking ferry rides, and watching movies.

Ishan Singh is a Sr. Generative AI Data Scientist at Amazon Web Services, helping customers build innovative and responsible generative AI solutions and products. With a strong background in AI/ML, Ishan specializes in building generative AI solutions that drive business value. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife and dog, Bo.

Koushik Mani is an Associate Solutions Architect at AWS. He previously worked as a Software Engineer for two years at Telstra, focusing on machine learning and cloud computing use cases. He received his Master's degree in Computer Science from the University of Southern California. He is passionate about machine learning and generative AI use cases and building solutions.

Rishabh Agrawal is a Senior Software Engineer working on AI services at AWS. In his spare time, he enjoys hiking, traveling, and reading.

Sriya Sharma is a Senior Technical Product Manager at AWS, committed to using the power of generative AI to deliver innovative, customer-centric products. Sriya holds a master's degree from Duke University. Outside of work, she loves traveling, dancing, and singing.