Meta Llama 3 models are now available on Amazon SageMaker JumpStart

Today, we are excited to announce that Meta Llama 3 foundation models are now available through Amazon SageMaker JumpStart for deployment and inference. Llama 3 models are a collection of pretrained and fine-tuned generative text models.

This post explains how to discover and deploy Llama 3 models via SageMaker JumpStart.

What is Meta Llama 3?

Llama 3 comes in two parameter sizes, 8B and 70B, each with an 8K context length, and supports a broad range of use cases with improvements in reasoning, code generation, and instruction following. Llama 3 uses a decoder-only transformer architecture and a new tokenizer with a 128K-token vocabulary, which improves model performance. In addition, Meta improved its post-training procedures, which substantially reduced false refusal rates, improved alignment, and increased the diversity of model responses. You can now combine the performance of Llama 3 with the MLOps controls of Amazon SageMaker features such as SageMaker Pipelines, SageMaker Debugger, and container logs. The models are also deployed in a secure AWS environment under your VPC controls, which helps provide data security.

What is SageMaker JumpStart?

SageMaker JumpStart lets you choose from a broad selection of publicly available foundation models. ML practitioners can deploy foundation models to dedicated SageMaker instances from a network-isolated environment and customize the models using SageMaker for model training and deployment. You can now discover and deploy Llama 3 models with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, which lets you derive model performance and MLOps controls from SageMaker features such as SageMaker Pipelines, SageMaker Debugger, and container logs. The models are deployed in a secure AWS environment under your VPC controls, which helps provide data security. Llama 3 models are currently available for deployment and inference in Amazon SageMaker Studio in the us-east-1 (Northern Virginia), us-east-2 (Ohio), us-west-2 (Oregon), eu-west-1 (Ireland), and ap-northeast-1 (Tokyo) AWS Regions.
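If you deploy programmatically, you might want to check up front that you are working in one of the Regions listed above. The following is a minimal sketch that assumes the Region list in this post is still current (availability may expand over time); `is_llama3_region` is a hypothetical helper, not part of the SageMaker Python SDK.

```python
# Snapshot of the Regions listed in this post; availability may change over time.
LLAMA3_REGIONS = frozenset({
    "us-east-1",       # Northern Virginia
    "us-east-2",       # Ohio
    "us-west-2",       # Oregon
    "eu-west-1",       # Ireland
    "ap-northeast-1",  # Tokyo
})


def is_llama3_region(region: str) -> bool:
    """Hypothetical helper: True if `region` is one of the Regions listed above."""
    # Normalize whitespace and case so values like " US-EAST-1 " still match.
    return region.strip().lower() in LLAMA3_REGIONS
```

For example, you could pass it the Region of your current boto3 session (`boto3.session.Session().region_name`) before creating any SageMaker resources.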

Discover models

Foundation models are accessible through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. This section describes how to discover the models in SageMaker Studio.

SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying ML models. For more information about how to get started and set up SageMaker Studio, see Amazon SageMaker Studio.

SageMaker Studio provides access to SageMaker JumpStart, which contains pretrained models, notebooks, and prebuilt solutions under Prebuilt and automated solutions.

On the SageMaker JumpStart landing page, you can easily discover models by browsing through hubs named after model providers. You can find Llama 3 models in the Meta hub. If you don't see Llama 3 models, update your SageMaker Studio version by shutting down and restarting. For more information, see Shut down and update Studio Classic apps.

You can also find Llama 3 models by searching for “Meta-llama-3” in the search box in the top left.

You can find all Meta models available in SageMaker JumpStart by choosing Meta hub.

Clicking on a model card opens the corresponding model details page, from which you can easily deploy the model.

Deploy the model

When you choose Deploy and accept the EULA terms, deployment will start.

You can monitor the progress of the deployment on the page that appears after you click the Deploy button.

Alternatively, you can deploy through the example notebook by choosing Open Notebook. The example notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.

To deploy using a notebook, start by selecting an appropriate model, specified by its model_id. You can deploy any of the selected models on SageMaker with the following code.

from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="meta-textgeneration-llama-3-70b-instruct")
# accept_eula defaults to False; set it to True to accept the Llama 3 EULA so deployment succeeds.
predictor = model.deploy(accept_eula=True)

By default, accept_eula is set to False. You need to manually accept the EULA to successfully deploy the endpoint. By doing so, you accept the user license agreement and acceptable use policy. You can also find the license agreement on the Llama website. This deploys the model on SageMaker with default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. For more information, refer to the JumpStartModel documentation in the SageMaker Python SDK.
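As a minimal sketch of overriding one default, the following passes an optional non-default instance type to JumpStartModel. It assumes the SageMaker Python SDK is installed and that your account has quota for the chosen instance; `deploy_llama3` and the `KNOWN_MODEL_IDS` check are hypothetical conveniences, and the SDK import is deferred so the ID check runs even where the SDK isn't installed.

```python
from typing import Optional

# Llama 3 model IDs listed in this post (hypothetical allow-list for early validation).
KNOWN_MODEL_IDS = {
    "meta-textgeneration-llama-3-8b",
    "meta-textgeneration-llama-3-8b-instruct",
    "meta-textgeneration-llama-3-70b",
    "meta-textgeneration-llama-3-70b-instruct",
}


def deploy_llama3(model_id: str, instance_type: Optional[str] = None):
    """Hypothetical helper: deploy a Llama 3 JumpStart model, optionally on a non-default instance."""
    if model_id not in KNOWN_MODEL_IDS:
        raise ValueError(f"unknown Llama 3 model_id: {model_id!r}")
    # Imported lazily so the validation above works without the SDK installed.
    from sagemaker.jumpstart.model import JumpStartModel

    # instance_type=None keeps the JumpStart default instance type for this model.
    model = JumpStartModel(model_id=model_id, instance_type=instance_type)
    # accept_eula=True records your acceptance of the Llama 3 EULA.
    return model.deploy(accept_eula=True)
```

For example, `deploy_llama3("meta-textgeneration-llama-3-8b-instruct", instance_type="ml.g5.48xlarge")`, where the instance type is a hypothetical choice; check that it is supported for the model before using it.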

The following table lists all Llama 3 models available in SageMaker JumpStart, along with their model_ids, default instance types, and the maximum number of total tokens (the number of input tokens plus the number of generated tokens) supported by each model.

Model name	Model ID	Maximum total tokens	Default instance type
Meta-Llama-3-8B	meta-textgeneration-llama-3-8b	8192	ml.g5.12xlarge
Meta-Llama-3-8B-Instruct	meta-textgeneration-llama-3-8b-instruct	8192	ml.g5.12xlarge
Meta-Llama-3-70B	meta-textgeneration-llama-3-70b	8192	ml.p4d.24xlarge
Meta-Llama-3-70B-Instruct	meta-textgeneration-llama-3-70b-instruct	8192	ml.p4d.24xlarge
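Because the limit in the table covers input and generated tokens together, the room left for new tokens shrinks as the prompt grows. A small sketch of that arithmetic, assuming you have already counted the prompt's tokens with the Llama 3 tokenizer (the counting step is not shown); `max_new_tokens_budget` is a hypothetical helper.

```python
MAX_TOTAL_TOKENS = 8192  # from the table: input tokens + generated tokens


def max_new_tokens_budget(input_tokens: int, requested: int, max_total: int = MAX_TOTAL_TOKENS) -> int:
    """Clamp a requested max_new_tokens so that input + output stays within max_total."""
    if input_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = max_total - input_tokens
    if remaining <= 0:
        # The prompt alone already fills the context window.
        raise ValueError(f"prompt uses {input_tokens} tokens; limit is {max_total}")
    return min(requested, remaining)
```

For example, a 100-token prompt can keep a request of 64 new tokens, but an 8,000-token prompt leaves room for only 192 (8,192 − 8,000), so a request for 512 is clamped to 192.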

Run inference

After you deploy the model, you can run inference against the deployed endpoint through the SageMaker predictor. The fine-tuned instruct models (Llama 3: 8B Instruct and 70B Instruct) accept a history of chats between the user and the chat assistant, and generate the subsequent chat. The pretrained models (Llama 3: 8B and 70B) require a string prompt and perform text completion on the provided prompt.

predictor.predict(payload)

Inference parameters control the text generation process at the endpoint. The max_new_tokens parameter controls the size of the output generated by the model. This is not the same as the number of words, because the model's vocabulary is not the same as the English vocabulary, and each token may not be an English word. The temperature parameter controls the randomness of the output: higher temperatures produce more creative, but also more hallucinated, outputs. All inference parameters are optional.
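Since every inference parameter is optional, one way to build payloads is to include only the parameters you actually set. A minimal sketch; `build_payload` is a hypothetical helper, and the parameter names are the ones used in the payload format in this post.

```python
from typing import Any, Dict, Optional


def build_payload(
    prompt: str,
    max_new_tokens: Optional[int] = None,
    top_p: Optional[float] = None,
    temperature: Optional[float] = None,
) -> Dict[str, Any]:
    """Build an endpoint payload, omitting any inference parameter left as None."""
    params = {
        "max_new_tokens": max_new_tokens,
        "top_p": top_p,
        "temperature": temperature,
    }
    # Keep explicit values such as temperature=0.0; drop only unset (None) ones.
    params = {k: v for k, v in params.items() if v is not None}
    payload: Dict[str, Any] = {"inputs": prompt}
    if params:  # "parameters" is optional, so leave it out entirely when empty
        payload["parameters"] = params
    return payload
```

For example, `predictor.predict(build_payload("I believe the meaning of life is", max_new_tokens=64, top_p=0.9, temperature=0.6))` sends the same settings used for the sample outputs in this post.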

Example prompts for the 70B model

The Llama 3 model can be used for text completion of any text. Through text generation, you can perform various tasks such as question answering, language translation, and sentiment analysis. The input payload to the endpoint looks like the following code.

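payload = {
    "inputs": "Simply put, the theory of relativity states that ",
    # "parameters" is optional; each key inside it is optional as well
    "parameters": {"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.6},
}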

The following are some sample prompts and the text generated by the model. All outputs are generated with the inference parameters {"max_new_tokens":64, "top_p":0.9, "temperature":0.6}.

Input: Simply put, the theory of relativity states that
Output: 1) the laws of physics are the same for all observers, and 2) the speed of light is constant for all observers. The first part of the theory is known as the special theory of relativity, and the second part is known as the general theory of relativity.\nThe special theory of relativity states

Input: I believe the meaning of life is
Output:  to live a life of purpose, on purpose. I believe that we are all here to make a difference in the world, and that we are all here to make a difference in the lives of others. I believe that we are all here to make a difference in the world, and that we are all here to mak

Input: A brief message congratulating the team on the launch:\n\nHi everyone,\n\nI just 
Output: wanted to say congratulations on the launch of the new website! It looks great and I'm sure it will be a valuable resource for our customers. I know that a lot of hard work went into making this happen, so thank you all for your efforts.\n\nBest regards,\nJohn Doe\n
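This post doesn't show the raw response object returned by `predictor.predict`. Text-generation containers commonly return either a dict with a `generated_text` key or a one-element list of such dicts, so the following sketch assumes one of those two shapes; verify it against your endpoint's actual output. `extract_generated_text` is a hypothetical helper.

```python
from typing import Any


def extract_generated_text(response: Any) -> str:
    """Return the text from an assumed {"generated_text": ...} or [{"generated_text": ...}] response."""
    if isinstance(response, list):
        if not response:
            raise ValueError("empty response list")
        response = response[0]  # assumed: one result per request
    if isinstance(response, dict) and "generated_text" in response:
        return response["generated_text"]
    raise ValueError(f"unrecognized response shape: {type(response).__name__}")
```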

The following example shows how to use Llama 3 models with few-shot in-context learning, where training samples are provided to the model in the prompt. This process runs inference only on the deployed model and doesn't change the model weights.

Input:
Translate English to French:\nsea otter => loutre de mer\npeppermint => menthe poivrée\nplush girafe => girafe peluche\ncheese =>
Output:  fromage\npanda => panda\npenguin => pingouin\npolar bear => ours polaire\npuffin => macareux\nsloth => paresseux\nwalrus => morse\nwhale => baleine\nzebra => z\u00e8bre\nTranslate English to German:\n
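A few-shot prompt like the one above is just demonstration pairs joined by newlines. The following sketch rebuilds that prompt from a list of pairs and, because the sample output shows the pretrained model continuing the pattern past the answer, trims the completion to its first line. Both helpers are hypothetical; the `=>` separator simply mirrors the example.

```python
from typing import Iterable, Tuple


def build_few_shot_prompt(instruction: str, examples: Iterable[Tuple[str, str]], query: str) -> str:
    """Format `source => target` demonstrations and end with an open query for the model to complete."""
    lines = [f"{instruction}:"]
    lines.extend(f"{src} => {tgt}" for src, tgt in examples)
    lines.append(f"{query} =>")
    return "\n".join(lines)


def first_answer(completion: str) -> str:
    """Keep only the first line of a completion; the pretrained model tends to keep generating pairs."""
    return completion.split("\n", 1)[0].strip()
```

Applied to the sample output above, `first_answer` returns just `fromage`.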

Example prompts for the 70B-Instruct model

Llama 3 instruct models are optimized for dialogue use cases. The input to an instruct model endpoint is the previous history between the chat assistant and the user, so you can ask questions in the context of the conversation so far. You can also provide a system configuration, such as a persona, that defines the chat assistant's behavior. The input payload format is the same as for the base pretrained model, but the input text must be formatted as follows:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

[SYSTEM_TEXT]<|eot_id|><|start_header_id|>user<|end_header_id|>

[USER_TEXT]<|eot_id|><|start_header_id|>assistant<|end_header_id|>

In this instruction template, you can optionally start with a system role and then include as many alternating user and assistant roles as you need in the turn-based history. The final role must always be assistant, followed by two newline characters.
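The template can also be assembled programmatically. The following sketch renders an optional system prompt plus alternating user and assistant turns into the format above, ending with an open assistant header and two newlines. `format_llama3_chat` and the `{"role": ..., "content": ...}` message shape are assumptions for illustration, not a SageMaker API.

```python
from typing import Dict, List, Optional


def format_llama3_chat(messages: List[Dict[str, str]], system: Optional[str] = None) -> str:
    """Render alternating user/assistant turns into the Llama 3 instruct prompt template."""

    def block(role: str, text: str) -> str:
        # One completed turn: header, two newlines, text, end-of-turn token.
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n{text}<|eot_id|>"

    if not messages or messages[-1]["role"] != "user":
        raise ValueError("the history must end with a user turn")
    parts = ["<|begin_of_text|>"]
    if system is not None:
        parts.append(block("system", system))
    for i, msg in enumerate(messages):
        expected = "user" if i % 2 == 0 else "assistant"
        if msg["role"] != expected:
            raise ValueError(f"turn {i} should be {expected!r}, got {msg['role']!r}")
        parts.append(block(msg["role"], msg["content"]))
    # The final role is always assistant, followed by two newlines for the model to continue.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)
```

For a follow-up question, append the assistant's previous reply and the new user turn to `messages` and format the whole history again.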

Now consider some examples of prompts and responses from the model. In the following example, a user asks the assistant a simple question.

Input: <|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nwhat is the recipe of mayonnaise?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
Output: The classic condiment! Mayonnaise is a thick, creamy emulsion sauce made from a combination of oil, egg yolks, acid (such as vinegar or lemon juice), and seasonings. Here's a simple recipe to make mayonnaise at home:\n\n**Ingredients:**\n\n* 2 egg yolks\n* 1 tablespoon lemon juice or vinegar (such as apple cider vinegar or white wine vinegar)\n* 1/2 teaspoon Dijon mustard (optional, but recommended for flavor)\n* 1/2 cup (120 ml) neutral-tasting oil, such as canola, grapeseed, or sunflower oil\n* Salt, to taste\n\n**Instructions:**\n\n1. **Start with room temperature ingredients**: This is crucial for emulsification to occur.\n2. **In a medium-sized bowl**, whisk together the egg yolks, lemon juice or vinegar, and Dijon mustard (if using) until well combined.\n3. **Slowly add the oil**: While continuously whisking the egg yolk mixture, slowly pour in the oil in a thin, steady stream. Start with a very slow drizzle and gradually increase the flow as the mixture thickens.\n4. **Whisk constantly**: Keep whisking until the mixture has doubled in volume and has a thick, creamy consistency. This should take about 5-7 minutes.\n5. **Check the consistency**: Stop whisking when the mayonnaise has reached the desired consistency. It should be smooth, thick, and hold its shape when the whisk is lifted.\n6. **Season with salt**: Add salt to taste, and whisk until well combined.\n7. **Taste and adjust**: If the mayonnaise is too thick, add a little water. 
If it's not thick enough, add a little more oil.\n\n**Tips and Variations:**\n\n* Use a neutral-tasting oil to avoid flavoring the mayonnaise.\n* If you're using a stand mixer, use the whisk attachment and start with a slow speed, increasing as needed.\n* Add flavorings like minced garlic, chopped fresh herbs (e.g., parsley, dill, or tarragon), or grated ginger to create different flavor profiles.\n* For a lighter mayonnaise, use only 1 egg yolk or add a little water to thin it out.\n* If the mayonnaise breaks or separates, don't worry! Simply start again with a new egg yolk and slowly whisk in the broken mayonnaise.\n\n**Homemade Mayonnaise vs. Store-Bought:**\n\nWhile store-bought mayonnaise is convenient, homemade mayonnaise offers several advantages:\n\n* Freshness: Homemade mayonnaise is made to order, ensuring optimal flavor and texture.\n* Customization: You can tailor the flavor to your liking with various seasonings and ingredients.\n* No preservatives: Homemade mayonnaise doesn't contain added preservatives, making it a healthier option.\n\nEnjoy your freshly made mayonnaise!

In the following example, the user has been discussing tourist attractions in Paris with the assistant. The user then asks about the first option that the chat assistant recommended.

Input: <|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nI am going to Paris, what should I see?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nParis, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:\n\n1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city.\n2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa.\n3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.\n\nThese are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nWhat is so great about #1?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
Output: The Eiffel Tower! It's an iconic symbol of Paris and one of the most recognizable landmarks in the world. Here are some reasons why it's so great:\n\n1. **Engineering marvel**: When it was built for the 1889 World's Fair, the Eiffel Tower was a groundbreaking feat of engineering. It was the tallest structure in the world at the time, standing at 324 meters (1,063 feet).\n2. **Panoramic views**: The Eiffel Tower offers breathtaking 360-degree views of the city from its observation decks. You can see many of Paris' famous landmarks, such as the Seine River, the Louvre, and Montmartre.\n3. **Romantic atmosphere**: The Eiffel Tower is often associated with romance and is a popular spot for proposals, weddings, and romantic dinners. The tower's iron latticework and twinkling lights at night add to its romantic ambiance.\n4. **Historical significance**: The Eiffel Tower has played a significant role in French history, serving as a communication tower during World War I and a symbol of French culture and engineering.\n5. **Accessibility**: The Eiffel Tower is easily accessible by public transportation or on foot, and there are various ways to experience it, including taking the stairs, elevator, or a guided tour.\n6. **Nighttime spectacle**: The Eiffel Tower is even more magical at night, when it's illuminated with thousands of twinkling lights. The light show, known as the \"Eiffel Tower Sparkle,\" takes place every hour on the hour from dusk to midnight.\n7. **Iconic status**: The Eiffel Tower has been featured in countless films, literature, and art, making it an instantly recognizable symbol of Paris and France.\n\nOverall, the Eiffel Tower is a must-visit attraction in Paris, offering a unique combination of history, engineering, romance, and stunning views.

The following examples set the system configuration.

Input: <|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nAlways answer with Haiku<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nI am going to Paris, what should I see?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
Output: Eiffel's iron lace\nRiver Seine's gentle flow by\nMontmartre's charm calls
Input: <|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nAlways answer with emojis<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHow to go from Beijing to NY?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
Output: ✈️🛬️🇺🇸

Clean up

Once your notebook has finished running, be sure to delete any resources you created during the process so that billing will stop. Use the following code:

predictor.delete_model()
predictor.delete_endpoint()
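If the first call fails, the second never runs and the endpoint can keep running. A minimal sketch that still attempts endpoint deletion in that case, assuming (as is typical for real-time endpoints) that the running endpoint instances are what incur charges; `clean_up` is a hypothetical wrapper around the two predictor calls shown above.

```python
def clean_up(predictor) -> None:
    """Delete the model, then the endpoint; still try the endpoint if model deletion fails."""
    try:
        predictor.delete_model()
    finally:
        # Runs even if delete_model raised; the original error is re-raised afterward.
        predictor.delete_endpoint()
```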

Conclusion

In this post, you learned how to get started with Llama 3 models in SageMaker Studio. You now have access to four Llama 3 foundation models with billions of parameters. Because foundation models are pretrained, they can help lower training and infrastructure costs while still allowing customization for your use case. Check out SageMaker JumpStart in SageMaker Studio to get started today.

About the authors

Kyle Ulrich is an Applied Scientist II at AWS.
Xin Huang is a Senior Applied Scientist at AWS.
Qing Lan is a Senior Software Development Engineer at AWS.
Haotian An is a Software Development Engineer II at AWS.
Christopher Whitten is a Software Development Engineer II at AWS.
Tyler Osterberg is a Software Development Engineer at AWS.
Manan Shah is a Software Development Manager at AWS.
Jonathan Guinegagne is a Senior Software Development Engineer at AWS.
Adriana Simmons is a Senior Product Marketing Manager at AWS.
June Won is a Senior Product Manager at AWS.
Ashish Khetan is a Senior Applied Scientist at AWS.
Rachna Chadha is a Principal Solutions Architect for AI/ML at AWS.
Deepak Rupakula is a Principal GTM Specialist at AWS.