AWS Inferentia and AWS Trainium enable the lowest cost to deploy Llama 3 models with Amazon SageMaker JumpStart

Today, we are excited to announce that Meta Llama 3 inference is now available on AWS Trainium and AWS Inferentia-based instances in Amazon SageMaker JumpStart. Meta Llama 3 models are a collection of pre-trained and fine-tuned generative text models. Amazon Elastic Compute Cloud (Amazon EC2) Trn1 and Inf2 instances, powered by AWS Trainium and AWS Inferentia2, provide the most cost-effective way to deploy Llama 3 models on AWS, with up to 50% lower deployment cost than comparable Amazon EC2 instances. They not only reduce the time and cost of training and deploying large language models (LLMs), but also give developers easier access to high-performance accelerators that meet the scalability and efficiency needs of real-time applications such as chatbots and AI assistants.

In this post, we demonstrate how easy it is to deploy Llama 3 on AWS Trainium and AWS Inferentia-based instances in SageMaker JumpStart.

Meta Llama 3 model on SageMaker Studio

SageMaker JumpStart provides access to publicly available and proprietary foundation models (FMs). Foundation models are onboarded and maintained from third-party and proprietary providers. As such, they are released under different licenses, as specified by the model source. Be sure to check the license of any FM you use. Before downloading or using the content, you are responsible for reviewing and complying with the applicable license terms and determining whether they are acceptable for your use case.

Meta Llama 3 FMs can be accessed through SageMaker JumpStart in the Amazon SageMaker Studio console and through the SageMaker Python SDK. This section describes how to discover the models in SageMaker Studio.

SageMaker Studio is an integrated development environment (IDE) that provides a single, web-based visual interface with access to purpose-built tools for all machine learning (ML) development steps, from preparing data to building, training, and deploying your ML models. For more information on how to start and set up SageMaker Studio, see Getting Started with SageMaker Studio.

In the SageMaker Studio console, you can access SageMaker JumpStart by choosing JumpStart in the navigation pane. If you are using SageMaker Studio Classic, see Open and use JumpStart in Studio Classic to navigate to the SageMaker JumpStart models.

From the SageMaker JumpStart landing page, you can search for “Meta” in the search box.

Select the Meta model card to list all the models from Meta on SageMaker JumpStart.

You can also search for “neuron” to find related model variants. If you don't see your Meta Llama 3 model, try updating your SageMaker Studio version by shutting down and restarting SageMaker Studio.

No-code deployment of Llama 3 Neuron models with SageMaker JumpStart

Select a model card to view details about the model, including its license, the data used for training, and how to use it. There are also two buttons, Deploy and Preview notebooks, which help you deploy the model.

When you choose Deploy, a deployment page appears. The top section of the page displays the End User License Agreement (EULA) and terms of use, which you must accept.

After you accept the policies, provide your endpoint settings and choose Deploy to deploy the model endpoint.

Alternatively, you can deploy through the sample notebook by choosing Open Notebook. The sample notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.

Deploying Meta Llama 3 on AWS Trainium and AWS Inferentia using SageMaker JumpStart SDK

SageMaker JumpStart has precompiled the Meta Llama 3 models for a variety of configurations to avoid runtime compilation during deployment and fine-tuning. The Neuron Compiler FAQ provides details about the compilation process.

There are two ways to deploy Meta Llama 3 on AWS Inferentia and Trainium-based instances using the SageMaker JumpStart SDK. You can deploy the model with two lines of code for simplicity, or you can customize the deployment configuration for more control. The following code snippet shows the simpler mode of deployment.

from sagemaker.jumpstart.model import JumpStartModel

model_id = "meta-textgenerationneuron-llama-3-8b"
accept_eula = True
model = JumpStartModel(model_id=model_id)
predictor = model.deploy(accept_eula=accept_eula)  # accept_eula must be True to deploy

To perform inference on these models, you must set the argument accept_eula to True as part of the model.deploy() call. This means that you have read and accepted the EULA of the model. The EULA can be found in the model card description or at https://ai.meta.com/resources/models-and-libraries/llama-downloads/.
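Because accept_eula is an explicit opt-in, it can help to keep the acceptance decision in one visible place instead of hard-coding True next to the call. The following is a minimal sketch of that idea. The names deploy_with_eula and eula_accepted are hypothetical and are not part of the SageMaker SDK; the helper only assumes that the object passed in has a deploy(accept_eula=...) method, as JumpStartModel does in the snippet above.

```python
def deploy_with_eula(model, eula_accepted):
    """Deploy `model` only if the caller confirms the EULA was accepted.

    `model` is any object exposing deploy(accept_eula=...), such as the
    JumpStartModel created above. Illustrative helper, not an SDK API.
    """
    if eula_accepted is not True:
        # Stop here, before any endpoint is requested.
        raise ValueError(
            "Review the Meta Llama 3 EULA and pass eula_accepted=True to deploy."
        )
    return model.deploy(accept_eula=True)
```

With the model object from the earlier snippet, a call could look like predictor = deploy_with_eula(model, eula_accepted=True).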

The default instance type for Meta Llama-3-8B is ml.inf2.24xlarge. Other model IDs supported for deployment are:

meta-textgenerationneuron-llama-3-70b
meta-textgenerationneuron-llama-3-8b-instruct
meta-textgenerationneuron-llama-3-70b-instruct
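The four model IDs in this post differ only in model size and in whether they are the instruct variant, so a small lookup can keep them out of scattered string literals. This is a sketch for illustration: the IDs are the ones listed above, while the function name neuron_model_id and its arguments are hypothetical.

```python
# Model IDs listed in this post, keyed by (size, is_instruct).
NEURON_LLAMA3_MODEL_IDS = {
    ("8b", False): "meta-textgenerationneuron-llama-3-8b",
    ("8b", True): "meta-textgenerationneuron-llama-3-8b-instruct",
    ("70b", False): "meta-textgenerationneuron-llama-3-70b",
    ("70b", True): "meta-textgenerationneuron-llama-3-70b-instruct",
}


def neuron_model_id(size, instruct=False):
    """Return the JumpStart model ID for a size ("8b" or "70b") and variant."""
    key = (size.lower(), bool(instruct))
    if key not in NEURON_LLAMA3_MODEL_IDS:
        raise ValueError(f"No model ID listed in this post for size {size!r}")
    return NEURON_LLAMA3_MODEL_IDS[key]
```

For example, neuron_model_id("70b", instruct=True) returns the ID of the 70B instruct variant, which can then be passed as model_id to JumpStartModel.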

SageMaker JumpStart has preselected configurations to help you get started, listed in the following table. For more information on how to further optimize these configurations, see Advanced Deployment Configurations.

Llama-3 8B and Llama-3 8B Instruct
Instance type	OPTION_N_POSITIONS	OPTION_MAX_ROLLING_BATCH_SIZE	OPTION_TENSOR_PARALLEL_DEGREE	OPTION_DTYPE
ml.inf2.8xlarge	8192	1	2	BF16
ml.inf2.24xlarge (Default)	8192	1	12	BF16
ml.inf2.24xlarge	8192	12	12	BF16
ml.inf2.48xlarge	8192	1	24	BF16
ml.inf2.48xlarge	8192	12	24	BF16
Llama-3 70B and Llama-3 70B Instruct
ml.trn1.32xlarge	8192	1	32	BF16
ml.trn1.32xlarge (Default)	8192	4	32	BF16

The following code shows how to customize deployment configurations such as sequence length, tensor parallelism, and maximum rolling batch size.

from sagemaker.jumpstart.model import JumpStartModel

model_id = "meta-textgenerationneuron-llama-3-70b"
model = JumpStartModel(
    model_id=model_id,
    env={
        "OPTION_DTYPE": "bf16",
        "OPTION_N_POSITIONS": "8192",
        "OPTION_TENSOR_PARALLEL_DEGREE": "32",
        "OPTION_MAX_ROLLING_BATCH_SIZE": "4", 
    },
    instance_type="ml.trn1.32xlarge"  
)
# Set accept_eula=True to deploy (confirms that you have read and accepted the EULA)
pretrained_predictor = model.deploy(accept_eula=False)
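The preselected configurations from the table can also be kept as data, so that the env dictionary is built from a listed combination instead of being typed by hand. The following sketch makes that assumption explicit: the values are copied from the table in this post, while PRESELECTED_CONFIGS and build_env are hypothetical names, and rejecting unlisted combinations is a choice made for this example rather than a documented limit of the endpoint.

```python
# Rows from the table above:
# (model size, instance type, max rolling batch size, tensor parallel degree)
PRESELECTED_CONFIGS = [
    ("8b", "ml.inf2.8xlarge", 1, 2),
    ("8b", "ml.inf2.24xlarge", 1, 12),
    ("8b", "ml.inf2.24xlarge", 12, 12),
    ("8b", "ml.inf2.48xlarge", 1, 24),
    ("8b", "ml.inf2.48xlarge", 12, 24),
    ("70b", "ml.trn1.32xlarge", 1, 32),
    ("70b", "ml.trn1.32xlarge", 4, 32),
]


def build_env(size, instance_type, batch_size):
    """Build the env dict for a combination that appears in the table."""
    for row_size, row_type, row_batch, row_tp in PRESELECTED_CONFIGS:
        if (row_size, row_type, row_batch) == (size, instance_type, batch_size):
            return {
                "OPTION_DTYPE": "bf16",
                "OPTION_N_POSITIONS": "8192",
                "OPTION_TENSOR_PARALLEL_DEGREE": str(row_tp),
                "OPTION_MAX_ROLLING_BATCH_SIZE": str(row_batch),
            }
    raise ValueError(
        f"{size} on {instance_type} with batch size {batch_size} is not in the table"
    )
```

Calling build_env("70b", "ml.trn1.32xlarge", 4) produces the same env dictionary as the customized deployment snippet above.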

Now that you have deployed the Meta Llama 3 Neuron model, you can run inference from it by invoking the endpoint:

payload = {
    "inputs": "I believe the meaning of life is",
    "parameters": {
        "max_new_tokens": 64,
        "top_p": 0.9,
        "temperature": 0.6,
    },
}

response = pretrained_predictor.predict(payload)

Output: 

I believe the meaning of life is
>  to be happy. I believe that happiness is a choice. I believe that happiness 
is a state of mind. I believe that happiness is a state of being. I believe that 
happiness is a state of being. I believe that happiness is a state of being. I 
believe that happiness is a state of being. I believe

For more information about parameters in the payload, see Advanced Parameters.

For more information about passing parameters to control text generation, see Fine-tune and Deploy Llama 2 Models Cost-Effectively with Amazon SageMaker JumpStart with AWS Inferentia and AWS Trainium.
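When the same endpoint is called from several places, a small builder keeps the payload shape consistent with the example above. This is a sketch only: build_payload is a hypothetical helper, its defaults are the values used in this post, and the range checks are sanity bounds chosen for the example, not limits documented for the endpoint.

```python
def build_payload(prompt, max_new_tokens=64, top_p=0.9, temperature=0.6):
    """Return a request payload in the shape used by the example above."""
    if not isinstance(prompt, str) or not prompt:
        raise ValueError("prompt must be a non-empty string")
    if max_new_tokens < 1:
        raise ValueError("max_new_tokens must be at least 1")
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in the interval (0, 1]")
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "top_p": top_p,
            "temperature": temperature,
        },
    }
```

The result can be passed straight to the predictor, for example pretrained_predictor.predict(build_payload("I believe the meaning of life is")).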

Clean up

When you have finished and no longer need the deployed resources, you can delete them using the following code:

# Delete resources (use pretrained_predictor if you deployed the customized configuration)
# Delete the deployed model
predictor.delete_model()

# Delete the model endpoint
predictor.delete_endpoint()
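For short experiments, the cleanup calls can be tied to a with block so that they run even if an inference call raises an exception. The following is a minimal sketch under that assumption; temporary_endpoint is a hypothetical helper, and it relies only on the delete_model() and delete_endpoint() methods used in the snippet above.

```python
from contextlib import contextmanager


@contextmanager
def temporary_endpoint(predictor):
    """Yield `predictor`, then delete its model and endpoint on exit."""
    try:
        yield predictor
    finally:
        # Same order as the cleanup snippet above: model first, then endpoint.
        try:
            predictor.delete_model()
        finally:
            # Still attempt to delete the endpoint if deleting the model fails.
            predictor.delete_endpoint()
```

A typical use would wrap the deployed predictor, for example with temporary_endpoint(predictor) as p: followed by calls to p.predict(payload) inside the block.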

Conclusion

Deploying Meta Llama 3 models on AWS Inferentia and AWS Trainium using SageMaker JumpStart offers the lowest cost for deploying large-scale generative AI models like Llama 3 on AWS. These models, including variants such as Meta-Llama-3-8B, Meta-Llama-3-8B-Instruct, Meta-Llama-3-70B, and Meta-Llama-3-70B-Instruct, use AWS Neuron for inference on AWS Trainium and Inferentia. AWS Trainium and Inferentia offer up to 50% lower deployment costs than comparable EC2 instances.

In this post, we demonstrated how to use SageMaker JumpStart to deploy a Meta Llama 3 model to AWS Trainium and AWS Inferentia. The ability to deploy these models through both the SageMaker JumpStart console and the Python SDK offers flexibility and ease of use. We look forward to seeing how you use these models to build interesting generative AI applications.

To get started using SageMaker JumpStart, see How to Get Started with Amazon SageMaker JumpStart. For more examples of deploying models to AWS Trainium and AWS Inferentia, see our GitHub repository. For more information about how to deploy Meta Llama 3 models on GPU-based instances, see Meta Llama 3 models now available in Amazon SageMaker JumpStart.

About the authors

Shinfan is a Senior Applied Scientist.
Rachna Chadha is a Principal Solutions Architect for AI/ML.
Chin Lan is an Advanced SDE, ML System.
Pinak Panigrahi is a Senior Solutions Architect at Annapurna ML.
Christopher Witten is a Software Development Engineer.
Kamran Khan leads BD/GTM for Annapurna ML.
Ashish Ketan is a Senior Applied Scientist.
Pradeep Cruz is a Senior SDM.
