Mixtral 8x22B now available on Amazon SageMaker JumpStart

Today, the Mixtral-8x22B large language model (LLM), developed by Mistral AI, is available to customers through Amazon SageMaker JumpStart, so you can deploy it with one click and run inference. You can try out this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms and models so you can quickly get started with ML. This post explains how to discover and deploy the Mixtral-8x22B model.

What is Mixtral 8x22B?

Mixtral 8x22B is Mistral AI's latest open-weight model and, according to Mistral AI's measurements on standard industry benchmarks, sets a new standard for performance and efficiency among available foundation models. It is a sparse mixture-of-experts (SMoE) model that uses only 39 billion active parameters out of 141 billion, which makes it cost-efficient for its size. In keeping with Mistral AI's belief in the power of publicly available models and broad distribution to foster innovation and collaboration, Mixtral 8x22B is released under the Apache 2.0 license, so you can explore, test, and deploy the model. Mixtral 8x22B is an attractive option for customers who are choosing among publicly available models and prioritize quality, and for customers who want higher quality than mid-sized models such as Mixtral 8x7B and GPT 3.5 Turbo while maintaining high throughput.
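In a sparse mixture-of-experts layer, a router scores every expert for each token, but only the highest-scoring few are actually evaluated. That is why only about 39B of the 141B parameters (roughly 28%) are active per token. The following toy sketch in plain Python assumes top-2 routing over eight experts, with a softmax taken over only the selected router scores, as described for the Mixtral family; the scalar "experts" are hypothetical stand-ins for the real feed-forward blocks.

```python
import math
from typing import Callable, List, Tuple


def top_k_route(router_logits: List[float], k: int = 2) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize only their scores."""
    # Indices of the k largest logits (ties broken by the lower index)
    top = sorted(range(len(router_logits)), key=lambda i: (-router_logits[i], i))[:k]
    # Numerically stable softmax over the selected logits, so the k weights sum to 1
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_layer(x: float, experts: List[Callable[[float], float]],
              router_logits: List[float], k: int = 2) -> float:
    """Weighted sum of the routed experts' outputs; unselected experts are never called."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# Eight toy experts: expert i multiplies its input by (i + 1)
experts = [lambda x, s=s: s * x for s in range(1, 9)]
logits = [0.1, 2.0, -1.0, 0.5, 2.0, 0.0, -0.3, 1.0]
print(top_k_route(logits))              # [(1, 0.5), (4, 0.5)]
print(moe_layer(1.0, experts, logits))  # 0.5 * 2 + 0.5 * 5 = 3.5
```

Only two of the eight expert functions run per input, which is the source of the compute savings; the total parameter count still includes all eight experts.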

Mixtral 8x22B has the following advantages:

Native multilingual capabilities in English, French, Italian, German, and Spanish
Strong math and coding skills
Native function calling, which enables application development and tech stack modernization at scale
A 64,000-token context window that allows precise information recall from large documents

About Mistral AI

Mistral AI is a Paris-based company founded by seasoned researchers from Meta and Google DeepMind. During his time at DeepMind, Arthur Mensch (Mistral CEO) was a lead contributor to key LLM projects such as Flamingo and Chinchilla, while Guillaume Lample (Mistral Chief Scientist) and Timothée Lacroix (Mistral CTO) led the development of the LLaMa LLMs during their time at Meta. The three are part of a new breed of founders who combine deep technical expertise with operating experience on state-of-the-art ML technology at the largest research labs. Mistral AI has championed small foundation models with superior performance and a commitment to open model development. The company continues to push the frontier of artificial intelligence (AI) and make it accessible to everyone, with models that offer unmatched cost efficiency for their size and an attractive performance-to-cost ratio. Mixtral 8x22B is a natural continuation of Mistral AI's family of publicly available models, which includes Mistral 7B and Mixtral 8x7B, also available on SageMaker JumpStart. More recently, Mistral launched a commercial enterprise-grade model, Mistral Large, which delivers top-tier performance and outperforms other popular models with native proficiency across multiple languages.

What is SageMaker JumpStart?

SageMaker JumpStart lets ML practitioners choose from a growing list of best-performing foundation models. ML practitioners can deploy foundation models to dedicated Amazon SageMaker instances in a network-isolated environment and customize models using SageMaker for model training and deployment. You can now discover and deploy Mixtral-8x22B with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, which gives you control over model performance and MLOps through SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, and container logs. The model is deployed in a secure AWS environment under your VPC controls, helping provide data encryption at rest and in transit.

In addition to meeting various regulatory requirements, SageMaker complies with standard security frameworks such as ISO 27001 and SOC 1/2/3. It also supports compliance frameworks such as the General Data Protection Regulation (GDPR), California Consumer Privacy Act (CCPA), Health Insurance Portability and Accountability Act (HIPAA), and Payment Card Industry Data Security Standard (PCI DSS), helping ensure that data handling, storage, and processing meet stringent security standards.

SageMaker JumpStart availability varies by model. Mixtral-8x22B v0.1 is currently supported in the US East (N. Virginia) and US West (Oregon) AWS regions.

Discover the model

The Mixtral-8x22B foundation model can be accessed through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. This section describes how to discover models in SageMaker Studio.

SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying ML models. For more information about how to get started and set up SageMaker Studio, see Amazon SageMaker Studio.

In SageMaker Studio, you can access SageMaker JumpStart by choosing JumpStart in the navigation pane.

From the SageMaker JumpStart landing page, you can search for “Mixtral” in the search box. You will see search results showing the Mixtral 8x22B Instruct, various Mixtral 8x7B models, and Dolphin 2.5 and 2.7 models.

Choose a model card to view details about the model, such as its license, the data used to train it, and how to use it. You will also find the Deploy button, which you can use to deploy the model and create an endpoint.

SageMaker enables seamless logging, monitoring, and auditing of deployed models through native integrations with services such as AWS CloudTrail, which logs and monitors API calls, and Amazon CloudWatch, which collects metrics, logs, and event data about your model's resource utilization.
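For example, endpoint latency can be pulled from CloudWatch with boto3. This is a minimal sketch that assumes a hypothetical endpoint name and the default production variant name AllTraffic used by the SageMaker Python SDK; the optional client argument exists only so the function can be exercised without AWS credentials.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional


def endpoint_latency(endpoint_name: str, hours: int = 1,
                     client: Optional[Any] = None,
                     variant_name: str = "AllTraffic") -> List[Dict[str, Any]]:
    """Fetch average ModelLatency datapoints for a SageMaker endpoint from CloudWatch."""
    if client is None:
        import boto3  # only needed when talking to AWS for real
        client = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = client.get_metric_statistics(
        Namespace="AWS/SageMaker",
        MetricName="ModelLatency",  # reported by SageMaker in microseconds
        Dimensions=[
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=300,  # 5-minute buckets
        Statistics=["Average"],
    )
    # CloudWatch does not guarantee datapoint order, so sort by timestamp
    return sorted(response["Datapoints"], key=lambda d: d["Timestamp"])
```

With a deployed predictor, a call such as endpoint_latency(predictor.endpoint_name) would return the last hour of datapoints.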

Deploy the model

Choose Deploy to start the deployment. When the deployment is complete, an endpoint is created. You can test the endpoint by passing a sample inference request payload or by selecting the testing option that uses the SDK. If you select the option to use the SDK, you will see sample code that you can use in your preferred notebook editor in SageMaker Studio. This requires an AWS Identity and Access Management (IAM) role with policies attached that restrict access to the model. Additionally, if you choose to deploy the model endpoint within SageMaker Studio, you will be prompted to choose an instance type, an initial instance count, and a maximum instance count. The ml.p4d.24xlarge and ml.p4de.24xlarge instance types are currently the only instance types supported for Mixtral 8x22B Instruct v0.1.

To deploy using the SDK, start by selecting the Mixtral-8x22B model, specified by the model_id with the value huggingface-llm-mistralai-mixtral-8x22B-instruct-v0-1. You can deploy the selected model to SageMaker with the following code. Similarly, you can deploy the other Mixtral models listed in SageMaker JumpStart using their own model IDs.

from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="huggingface-llm-mistralai-mixtral-8x22B-instruct-v0-1")
predictor = model.deploy()

This deploys your model to SageMaker with default settings, such as the default instance type and default VPC settings. You can change these configurations by specifying non-default values in JumpStartModel.
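As an illustration of overriding the defaults, the following sketch pins the instance type and Region to the values this post lists as supported, and optionally passes a VPC configuration. The helper names are hypothetical; the instance_type, region, and vpc_config arguments to JumpStartModel and the initial_instance_count argument to deploy() come from the SageMaker Python SDK, but check them against the SDK version you have installed. The validation runs before the SDK is imported, so misconfigurations fail fast without touching AWS.

```python
from typing import Optional

MODEL_ID = "huggingface-llm-mistralai-mixtral-8x22B-instruct-v0-1"
# Instance types and Regions stated in this post for Mixtral 8x22B Instruct v0.1
SUPPORTED_INSTANCE_TYPES = {"ml.p4d.24xlarge", "ml.p4de.24xlarge"}
SUPPORTED_REGIONS = {"us-east-1", "us-west-2"}  # N. Virginia and Oregon


def check_deployment_target(instance_type: str, region: str) -> None:
    """Raise ValueError if the target is outside what this post lists as supported."""
    if instance_type not in SUPPORTED_INSTANCE_TYPES:
        raise ValueError(f"Unsupported instance type: {instance_type}")
    if region not in SUPPORTED_REGIONS:
        raise ValueError(f"Unsupported Region: {region}")


def deploy_mixtral(instance_type: str = "ml.p4d.24xlarge", region: str = "us-east-1",
                   initial_instance_count: int = 1, vpc_config: Optional[dict] = None):
    """Hypothetical wrapper: validate the target, then deploy with explicit settings."""
    check_deployment_target(instance_type, region)
    from sagemaker.jumpstart.model import JumpStartModel  # requires the SageMaker Python SDK
    model = JumpStartModel(
        model_id=MODEL_ID,
        region=region,
        instance_type=instance_type,
        vpc_config=vpc_config,  # e.g. {"Subnets": [...], "SecurityGroupIds": [...]}
    )
    return model.deploy(initial_instance_count=initial_instance_count,
                        instance_type=instance_type)
```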

After deployment, you can run inference against the deployed endpoint through the SageMaker predictor:

payload = {"inputs": "Hello!"} 
predictor.predict(payload)
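If you call the endpoint from code that doesn't use the SageMaker Python SDK, the same payload shape can be sent through the low-level runtime API. This is a minimal sketch using boto3's sagemaker-runtime invoke_endpoint; the function name is hypothetical, the endpoint name would come from predictor.endpoint_name, and the returned list of {"generated_text": ...} dictionaries is assumed from the response handling used later in this post.

```python
import json
from typing import Any, Dict, List, Optional


def invoke_mixtral(endpoint_name: str, prompt: str, max_new_tokens: int = 256,
                   client: Optional[Any] = None) -> List[Dict[str, str]]:
    """Send an {"inputs", "parameters"} JSON payload and decode the JSON response."""
    if client is None:
        import boto3  # only needed when talking to AWS for real
        client = boto3.client("sagemaker-runtime")
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    # The response body is a stream; read it and parse the JSON list it contains
    return json.loads(response["Body"].read())
```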

Example prompts

You can work with the Mixtral-8x22B model just like any standard text generation model. The model processes the input sequence and outputs the predicted next word in the sequence. This section provides examples of prompts.

Mixtral-8x22B Instruct

The instruction-tuned version of Mixtral-8x22B accepts formatted instructions in which conversation roles must start with a user prompt and then alternate between user instructions and assistant (model) answers. The instruction format must be strictly respected; otherwise, the model will generate suboptimal outputs. The template used to build a prompt for the Instruct model is defined as follows:

<s>[INST] Instruction [/INST] Model answer</s>[INST] Follow-up instruction [/INST]

<s> and </s> are special tokens for the beginning of string (BOS) and end of string (EOS), whereas [INST] and [/INST] are regular strings.

The following code shows how you can format the prompt in the instruction format.

from typing import Dict, List

def format_instructions(instructions: List[Dict[str, str]]) -> str:
    """Format instructions where conversation roles must alternate user/assistant/user/assistant/..."""
    # The BOS token <s> appears once, at the very start of the prompt
    prompt: List[str] = ["<s>"]
    # Each completed exchange becomes "[INST] user [/INST] answer</s>"
    for user, answer in zip(instructions[::2], instructions[1::2]):
        prompt.extend(["[INST] ", user["content"].strip(), " [/INST] ", answer["content"].strip(), "</s>"])
    # The final user instruction is left open (no </s>) so the model generates the answer
    prompt.extend(["[INST] ", instructions[-1]["content"].strip(), " [/INST]"])
    return "".join(prompt)


def print_instructions(prompt: str, response: List[Dict[str, str]]) -> None:
    """Print the prompt and the generated text returned by the endpoint."""
    bold, unbold = '\033[1m', '\033[0m'
    print(f"{bold}> Input{unbold}\n{prompt}\n\n{bold}> Output{unbold}\n{response[0]['generated_text']}\n")
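format_instructions pairs turns by position with zip and never reads the role field, so a conversation with out-of-order roles is formatted silently. A small guard such as the following (a hypothetical helper, not part of the SageMaker SDK) can be called before formatting to catch malformed conversations early.

```python
from typing import Dict, List


def validate_roles(instructions: List[Dict[str, str]]) -> None:
    """Raise ValueError unless roles go user, assistant, user, ... and end on a user turn."""
    if not instructions:
        raise ValueError("At least one user instruction is required")
    for i, turn in enumerate(instructions):
        expected = "user" if i % 2 == 0 else "assistant"
        if turn.get("role") != expected:
            raise ValueError(f"Turn {i} should be {expected!r}, got {turn.get('role')!r}")
    if len(instructions) % 2 == 0:
        # An even count means the last turn is an assistant answer, leaving nothing to respond to
        raise ValueError("The conversation must end with a user instruction")


# A valid three-turn conversation passes without raising
validate_roles([
    {"role": "user", "content": "What is SageMaker JumpStart?"},
    {"role": "assistant", "content": "An ML hub with models and algorithms."},
    {"role": "user", "content": "Which Mixtral models does it offer?"},
])
```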

Summarization prompt

You can use the following code to get the summary response.

instructions = [{"role": "user", "content": """Summarize the following information. Format your response in short paragraph.

Article:

Contextual compression - To address the issue of context overflow discussed earlier, you can use contextual compression to compress and filter the retrieved documents in alignment with the query’s context, so only pertinent information is kept and processed. This is achieved through a combination of a base retriever for initial document fetching and a document compressor for refining these documents by paring down their content or excluding them entirely based on relevance, as illustrated in the following diagram. This streamlined approach, facilitated by the contextual compression retriever, greatly enhances RAG application efficiency by providing a method to extract and utilize only what’s essential from a mass of information. It tackles the issue of information overload and irrelevant data processing head-on, leading to improved response quality, more cost-effective LLM operations, and a smoother overall retrieval process. Essentially, it’s a filter that tailors the information to the query at hand, making it a much-needed tool for developers aiming to optimize their RAG applications for better performance and user satisfaction.
"""}]
prompt = format_instructions(instructions)
payload = {
"inputs": prompt,
"parameters": {"max_new_tokens": 1500}
}
response=predictor.predict(payload)
print_instructions(prompt, response)

Below is an example of the expected output.

> Input
<s>[INST] Summarize the following information. Format your response in short paragraph.

Article:

Contextual compression - To address the issue of context overflow discussed earlier, you can use contextual compression to compress and filter the retrieved documents in alignment with the query’s context, so only pertinent information is kept and processed. This is achieved through a combination of a base retriever for initial document fetching and a document compressor for refining these documents by paring down their content or excluding them entirely based on relevance, as illustrated in the following diagram. This streamlined approach, facilitated by the contextual compression retriever, greatly enhances RAG application efficiency by providing a method to extract and utilize only what’s essential from a mass of information. It tackles the issue of information overload and irrelevant data processing head-on, leading to improved response quality, more cost-effective LLM operations, and a smoother overall retrieval process. Essentially, it’s a filter that tailors the information to the query at hand, making it a much-needed tool for developers aiming to optimize their RAG applications for better performance and user satisfaction. [/INST] </s>
> Output
<s>[INST] Summarize the following information. Format your response in short paragraph.

Article:

Contextual compression - To address the issue of context overflow discussed earlier, you can use contextual compression to compress and filter the retrieved documents in alignment with the query’s context, so only pertinent information is kept and processed. This is achieved through a combination of a base retriever for initial document fetching and a document compressor for refining these documents by paring down their content or excluding them entirely based on relevance, as illustrated in the following diagram. This streamlined approach, facilitated by the contextual compression retriever, greatly enhances RAG application efficiency by providing a method to extract and utilize only what’s essential from a mass of information. It tackles the issue of information overload and irrelevant data processing head-on, leading to improved response quality, more cost-effective LLM operations, and a smoother overall retrieval process. Essentially, it’s a filter that tailors the information to the query at hand, making it a much-needed tool for developers aiming to optimize their RAG applications for better performance and user satisfaction. [/INST] </s>Contextual compression is a technique used to manage the issue of context overflow in information retrieval. This method involves compressing and filtering the retrieved documents to align with the query's context, ensuring that only relevant information is processed. This is achieved through a two-step process: a base retriever fetches the initial set of documents, and a document compressor refines these documents by either reducing their content or excluding them entirely based on relevance.

The contextual compression retriever significantly improves the efficiency of Retrieval-Augmented Generation (RAG) applications by extracting and utilizing only the essential information from a large pool of data. This approach addresses the problem of information overload and irrelevant data processing, leading to improved response quality, cost-effective operations, and a smoother retrieval process. In essence, contextual compression acts as a filter that tailors the information to the specific query, making it an indispensable tool for developers aiming to optimize their RAG applications for better performance and user satisfaction.
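As the sample output shows, the generated_text field begins with the full prompt before the summary. If you want only the answer, a minimal sketch such as the following strips the echoed prefix when it is present. The helper is hypothetical and assumes the same response shape that print_instructions reads.

```python
from typing import Dict, List


def extract_answer(prompt: str, response: List[Dict[str, str]]) -> str:
    """Return only the model's answer, dropping the echoed prompt if the endpoint returned it."""
    text = response[0]["generated_text"]
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()
```

For example, extract_answer(prompt, response) applied to the summarization response returns the text that starts at "Contextual compression is a technique...".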

Multilingual translation prompt

You can use the following code to get the multilingual translation response.

Prompt

instructions = [{"role": "user", "content": """\
<You are a multilingual assistant. Translate the following sentences in the order in which they are presented into French, German, and Spanish. Make sure to label each section as French, German, and Spanish. [/INST]

1.	Customer: "I recently ordered a set of wireless headphones, but I received a different model. What steps should I take to receive the correct product I ordered?"
2.	Customer: "I purchased a customizable laptop last month and opted for specific upgrades. However, the laptop's performance isn't as expected. Can I have a technician look into it, or should I consider returning it?"
3.	Customer: "My order for a designer handbag was supposed to include a matching wallet as part of a promotional deal, but the wallet was not in the package. How can this issue be resolved?"
4.	Customer: "I see that the tracking information for my order of ceramic cookware shows it was delivered, but I haven't received it. Could you assist in determining where my package might be?"
5.	Customer: "I'm trying to buy an antique mirror from your vintage collection, but the website keeps giving me an error when I try to check out. Is there another way to complete my purchase?" 
"""}]
prompt = format_instructions(instructions)
payload = {
"inputs": prompt,
"parameters": {"max_new_tokens": 2000, "do_sample": True}
}
response=predictor.predict(payload)
print_instructions(prompt, response)

Below is an example of the expected output.

> Input
<s>[INST] <You are a multilingual assistant. Translate the following sentences in the order in which they are presented into French, German, and Spanish. Make sure to label each section as French, German, and Spanish. [/INST]


1. Customer: "I recently ordered a set of wireless headphones, but I received a different model. What steps should I take to receive the correct product I ordered?"
2. Customer: "I purchased a customizable laptop last month and opted for specific upgrades. However, the laptop's performance isn't as expected. Can I have a technician look into it, or should I consider returning it?"
3. Customer: "My order for a designer handbag was supposed to include a matching wallet as part of a promotional deal, but the wallet was not in the package. How can this issue be resolved?"
4. Customer: "I see that the tracking information for my order of ceramic cookware shows it was delivered, but I haven't received it. Could you assist in determining where my package might be?"
5. Customer: "I'm trying to buy an antique mirror from your vintage collection, but the website keeps giving me an error when I try to check out. Is there another way to complete my purchase?" [/INST] </s>
> Output
<s>[INST] <You are a multilingual assistant. Translate the following sentences in the order in which they are presented into French, German, and Spanish. Make sure to label each section as French, German, and Spanish. [/INST]


1. Customer: "I recently ordered a set of wireless headphones, but I received a different model. What steps should I take to receive the correct product I ordered?"
2. Customer: "I purchased a customizable laptop last month and opted for specific upgrades. However, the laptop's performance isn't as expected. Can I have a technician look into it, or should I consider returning it?"
3. Customer: "My order for a designer handbag was supposed to include a matching wallet as part of a promotional deal, but the wallet was not in the package. How can this issue be resolved?"
4. Customer: "I see that the tracking information for my order of ceramic cookware shows it was delivered, but I haven't received it. Could you assist in determining where my package might be?"
5. Customer: "I'm trying to buy an antique mirror from your vintage collection, but the website keeps giving me an error when I try to check out. Is there another way to complete my purchase?" [/INST] </s>

French:

1. Client : "J'ai récemment commandé un casque audio sans fil, mais j'ai reçu un modèle différent. Quelles sont les étapes à suivre pour recevoir le produit correct que j'ai commandé ?"
2. Client : "J'ai acheté un ordinateur portable personnalisable le mois dernier et j'ai opté pour des mises à niveau spécifiques. Cependant, les performances de l'ordinateur portable ne sont pas à la hauteur de mes attentes. Puis-je avoir un technicien qui vérifie cela, ou devrais-je envisager de le retourner ?"
3. Client : "Ma commande pour un sac à main de designer devait inclure un portefeuille assorti dans le cadre d'une offre promotionnelle, mais le portefeuille ne se trouvait pas dans le paquet. Comment puis-je résoudre ce problème ?"
4. Client : "Je vois que les informations de suivi de ma commande de batterie de cuisine en céramique indiquent qu'elle a été livrée, mais je ne l'ai pas reçue. Pourriez-vous m'aider à déterminer où se trouve mon colis ?"
5. Client : "J'essaie d'acheter un miroir antique de votre collection vintage, mais le site continue de me donner une erreur lorsque j'essaie de passer à la caisse. Existe-t-il un autre moyen de finaliser mon achat ?"

German:

1. Kunde: "Ich habe kürzlich ein Set kabelloser Kopfhörer bestellt, aber ich habe ein anderes Modell erhalten. Welche Schritte sollte ich unternehmen, um das richtige Produkt zu erhalten, das ich bestellt habe?"
2. Kunde: "Ich habe letzten Monat einen anpassbaren Laptop gekauft und habe mich für spezifische Upgrades entschieden. Allerdings entspricht die Leistung des Laptops nicht meinen Erwartungen. Kann ich einen Techniker hinzuziehen lassen oder sollte ich eine Rückgabe in Erwägung ziehen?"
3. Kunde: "Meine Bestellung für eine Designer-Handtasche sollte inklusive eines passenden Portemonnaies als Teil einer Werbeaktion sein, aber das Portemonnaie war nicht im Paket. Wie kann dieses Problem gelöst werden?"
4. Kunde: "Ich sehe, dass die Sendungsverfolgungsinformationen für meine Bestellung von Keramik-Kochgeschirr anzeigen, dass es geliefert wurde, aber ich habe es nicht erhalten. Könnten Sie mir dabei helfen, festzustellen, wo sich mein Paket befindet?"
5. Kunde: "Ich versuche ein Antikspiegel aus Ihrer Vintage-Kollektion zu kaufen, aber die Website gibt mir jedes Mal einen Fehler, wenn ich versuche, den Bestellvorgang abzuschließen. Gibt es einen anderen Weg, meinen Kauf abzuschließen?"

Spanish:

1. Cliente: "Recientemente ordené un conjunto de audífonos inalámbricos, pero recibí un modelo diferente. ¿Cuáles son los pasos que debo seguir para recibir el producto correcto que ordené?"
2. Cliente: "Compré una computadora personalizable el mes pasado y opté por actualizaciones específicas. Sin embargo, el rendimiento de la computadora no está a la altura de mis expectativas. ¿Puedo tener un técnico que revise esto, o debería considerar devolverla?"
3. Cliente: "Mi pedido de un bolso de diseñador debería haber incluido una billetera a juego como parte de una oferta promocional, pero la billetera no estaba en el paquete. ¿Cómo puedo resolver este problema?"
4. Cliente: "Veo que la información de seguimiento de mi pedido de utensilios de cocina de cerámica indica que ha sido entregado, pero aún no lo he recibido. ¿Podría ayudarme a determinar dónde se encuentra mi paquete?"
5. Cliente: "Estoy tratando de comprar un espejo antiguo de su colección de vintage, pero el sitio sigue dándome un error cada vez que trato de realizar el pago. ¿Hay otra forma de completar mi compra?"
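Because the prompt asks for labeled sections, the response can be post-processed into one list per language. This sketch assumes the model keeps the "French:", "German:", and "Spanish:" label lines and numbered sentences seen in the sample output; because sampling is enabled (do_sample), the format can vary between runs, so treat the parser as a best-effort helper. Numbered lines that appear before the first label, such as the echoed English prompt, are ignored.

```python
import re
from typing import Dict, List, Optional, Tuple


def split_by_language(text: str,
                      labels: Tuple[str, ...] = ("French", "German", "Spanish")
                      ) -> Dict[str, List[str]]:
    """Group numbered lines under the most recent 'Language:' label line."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        stripped = line.strip()
        label = stripped.rstrip(":")
        if stripped.endswith(":") and label in labels:
            current = label
            sections[current] = []
        elif current is not None and re.match(r"^\d+\.\s", stripped):
            # Keep the sentence text without its "1. " numbering
            sections[current].append(re.sub(r"^\d+\.\s+", "", stripped))
    return sections
```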

Code generation

You can get the code generation response using the following code:

instructions = [{"role": "user", "content": "Write a program on Dijstra's algorithm using a priority queue in python:"}]
prompt = format_instructions(instructions)
payload = {
"inputs": prompt,
"parameters": {"max_new_tokens": 1000}
}
response=predictor.predict(payload)
print_instructions(prompt, response)

You will get output like this:

> Input
<s>[INST] Write a program on Dijstra's algorithm using a priority queue in python: [/INST] </s>

> Output
<s>[INST] Write a program on Dijstra's algorithm using a priority queue in python: [/INST] </s>Here is a Python implementation of Dijkstra's algorithm using a priority queue (heapq module):
```python
import heapq

def dijkstra(graph, start):
    # Initialize the priority queue with the starting vertex and its distance (0)
    priority_queue = [(0, start)]
    # Initialize the distances dictionary with infinite distances for all vertices except the starting vertex
    distances = {vertex: float('inf') for vertex in graph}
    distances[start] = 0
    # Initialize the previous dictionary to keep track of the shortest path
    previous = {vertex: None for vertex in graph}

    while priority_queue:
        # Get the vertex with the smallest distance from the priority queue
        current_distance, current_vertex = heapq.heappop(priority_queue)

        # If the current distance is greater than the stored distance, skip this vertex
        if current_distance > distances[current_vertex]:
            continue

        # Update the distances of the adjacent vertices
        for neighbor, weight in graph[current_vertex].items():
            distance = current_distance + weight
            # If the new distance is smaller than the stored distance, update the distance and previous vertex
            if distance < distances[neighbor]:
                distances[neighbor] = distance
                previous[neighbor] = current_vertex
                # Add the neighbor to the priority queue
                heapq.heappush(priority_queue, (distance, neighbor))

    return distances, previous

# Example usage:
graph = {
    'A': {'B': 1, 'C': 4},
    'B': {'A': 1, 'C': 2, 'D': 5},
    'C': {'A': 4, 'B': 2, 'D': 1},
    'D': {'B': 5, 'C': 1}
}

distances, previous = dijkstra(graph, 'A')
print(distances)  # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```
In this implementation, the `graph` parameter is a dictionary where the keys are the vertices and the values are dictionaries that map adjacent vertices to their weights. The `start` parameter is the starting vertex for the algorithm. The function returns two dictionaries: `distances` and `previous`. The `distances` dictionary maps each vertex to its shortest distance from the starting vertex, and the `previous` dictionary maps each vertex to its previous vertex in the shortest path.
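The generated function returns previous, but the example never uses it. A short helper like the following (a hypothetical addition, not part of the model's answer) turns that predecessor map into an explicit route. The map in the usage line is the one dijkstra(graph, 'A') produces for the sample graph: C is reached more cheaply through B, and D more cheaply through C.

```python
from typing import Dict, List, Optional


def shortest_path(previous: Dict[str, Optional[str]], start: str, target: str) -> List[str]:
    """Walk the predecessor map back from target to start; return [] if target is unreachable."""
    path: List[str] = []
    node: Optional[str] = target
    while node is not None:
        path.append(node)
        if node == start:
            return path[::-1]  # reverse so the path runs start -> target
        node = previous.get(node)
    return []  # ran out of predecessors without reaching start


previous = {'A': None, 'B': 'A', 'C': 'B', 'D': 'C'}
print(shortest_path(previous, 'A', 'D'))  # ['A', 'B', 'C', 'D']
```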

Reasoning and math

You can use the following code to get a reasoning and math response.


instructions = [{"role": "user", "content": """Sarah went to a car shop to buy a car from Fred for $20,000 in 2024. She plans to sell the car but it depreciates by 2% each year after she has bought it. She went to a dealer in which that dealer told her that the car has only depreciated by 1.4% each year. After 7 years of using the car, Sarah decides to sell it directly to another person.

How much did Sarah sell the car for and what year is it? Explain the steps before answering. It's ok to make some assumptions as you come to your answer.

"""}]
prompt = format_instructions(instructions)
payload = {
"inputs": prompt,
"parameters": {"max_new_tokens": 2000, "do_sample": True}
}
response=predictor.predict(payload)
print_instructions(prompt, response)

You get the following output:

<s>[INST] Sarah went to a car shop to buy a car from Fred for $20,000 in 2024. She plans to sell the car but it depreciates by 2% each year after she has bought it. She went to a dealer in which that dealer told her that the car has only depreciated by 1.4% each year. After 7 years of using the car, Sarah decides to sell it directly to another person. 

How much did Sarah sell the car for and what year is it? Explain the steps before answering. It's ok to make some assumptions as you come to your answer. [/INST] </s>To solve this problem, we need to calculate the value of the car after 7 years, assuming two different annual depreciation rates: 2% and 1.4%.

First, let's calculate the value of the car after 7 years assuming a 2% annual depreciation rate:

Year 1: $20,000 - ($20,000 * 2%) = $19,600
Year 2: $19,600 - ($19,600 * 2%) = $19,208
Year 3: $19,208 - ($19,208 * 2%) = $18,821.60
Year 4: $18,821.60 - ($18,821.60 * 2%) = $18,442.22
Year 5: $18,442.22 - ($18,442.22 * 2%) = $18,071.62
Year 6: $18,071.62 - ($18,071.62 * 2%) = $17,709.76
Year 7: $17,709.76 - ($17,709.76 * 2%) = $17,356.58

So, assuming a 2% annual depreciation rate, the value of the car after 7 years would be approximately $17,356.58.

Next, let's calculate the value of the car after 7 years assuming a 1.4% annual depreciation rate:

Year 1: $20,000 - ($20,000 * 1.4%) = $19,728
Year 2: $19,728 - ($19,728 * 1.4%) = $19,463.23
Year 3: $19,463.23 - ($19,463.23 * 1.4%) = $19,205.80
Year 4: $19,205.80 - ($19,205.80 * 1.4%) = $18,955.66
Year 5: $18,955.66 - ($18,955.66 * 1.4%) = $18,712.73
Year 6: $18,712.73 - ($18,712.73 * 1.4%) = $18,476.95
Year 7: $18,476.95 - ($18,476.95 * 1.4%) = $18,248.33

So, assuming a 1.4% annual depreciation rate, the value of the car after 7 years would be approximately $18,248.33.

Now, since Sarah sells the car directly to another person, we can assume that she sold it for the higher of the two values. Therefore, she sold the car for $18,248.33.

As for the year in which she sold the car, since we know that Sarah purchased the car in 2024 and used it for 7 years, we can calculate the year as follows:

2024 + 7 = 2031

Therefore, Sarah sold the car in the year 2031.

So, Sarah sold the car for $18,248.33 in the year 2031.
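The answer follows the compound-depreciation formula V = P(1 - r)^n, but several of the model's intermediate figures are not exact. For example, 2% off $19,208 is $18,823.84, not $18,821.60, and 1.4% off $20,000 is $19,720, not $19,728. The following check, assuming annual compounding from the $20,000 purchase price over 7 years, computes the exact values and illustrates why numeric answers from the model are worth recomputing.

```python
def depreciated_value(price: float, annual_rate: float, years: int) -> float:
    """Value after compounding a fixed annual depreciation rate: price * (1 - rate) ** years."""
    return price * (1 - annual_rate) ** years


for rate in (0.02, 0.014):
    print(f"{rate:.1%}: ${depreciated_value(20_000, rate, 7):,.2f}")
# 2.0%: $17,362.51
# 1.4%: $18,120.43
```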

Clean up

Once the notebook has finished running, delete all resources created in the process to stop billing. Use the following code:

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

In this post, you learned how to get started with Mixtral-8x22B in SageMaker Studio and deploy the model for inference. Because foundation models are pre-trained, they can help lower training and infrastructure costs and can be customized for your use case. Visit SageMaker JumpStart in SageMaker Studio to get started today.

Now that you understand Mistral AI and its Mixtral 8x22B model, we recommend that you deploy an endpoint in SageMaker to run inference tests and try out the responses yourself. For more information, see the following resources:

About the Authors

Marco Punio is a Solutions Architect focused on generative AI strategy, applied AI solutions, and conducting research to help customers hyper-scale on AWS. He is a qualified engineer with a passion for machine learning, artificial intelligence, and mergers and acquisitions. Marco is based in Seattle, Washington, and enjoys writing, reading, exercising, and building applications in his free time.

Preston Tuggle is a Senior Specialist Solutions Architect working on generative AI.

June Won is a product manager with Amazon SageMaker JumpStart. He focuses on making foundation models easy to discover and use so customers can build generative AI applications. His experience at Amazon also includes mobile shopping applications and last-mile delivery.

Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He received his PhD from the University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference and has published many papers at NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.

Shane Rai is a Principal GenAI Specialist with the AWS World Wide Specialist Organization (WWSO). He works with customers across industries to solve their most pressing and innovative business needs using the breadth of cloud-based AI/ML services on AWS, including model offerings from top-tier foundation model providers.

Hemant Singh is an Applied Scientist with experience in Amazon SageMaker JumpStart. He received his master's degree from the Courant Institute of Mathematical Sciences and his bachelor's degree from the Delhi Institute of Technology. He has experience working on a diverse range of machine learning problems in natural language processing, computer vision, and time series analysis.