Amazon SageMaker launches Cohere Command R fine-tuning models

AWS announced the availability of Cohere Command R fine-tuning models on Amazon SageMaker, a new addition to the SageMaker suite of machine learning (ML) capabilities that helps enterprises apply large language models (LLMs) to a wide range of applications.

Cohere Command R is a scalable, state-of-the-art LLM designed to handle enterprise-grade workloads. It is optimized for conversational interaction and long-context tasks, and it targets the category of scalable models that balance high efficiency with strong accuracy, enabling enterprises to move from proof of concept to production. The model offers high accuracy on retrieval augmented generation (RAG) and tool use tasks, low latency and high throughput, a long context length of 128,000 tokens, and strong capabilities across 10 major languages.

In this article, we'll explain why you might want to fine-tune your model and the process for achieving this using Cohere Command R.

Fine-tuning: Customizing LLMs for your specific use case

Fine-tuning is an effective technique for adapting an LLM such as Cohere Command R to a specific domain or task, and it can yield significant performance improvements over the base model. Evaluations of fine-tuned Cohere Command R models have demonstrated performance improvements of over 20% across a range of enterprise use cases in industries including financial services, technology, retail, healthcare, and legal. Because of their smaller size, fine-tuned Cohere Command R models can be served more efficiently than much larger models in their class.

We recommend using a dataset that contains at least 100 examples.

Cohere Command R supports a RAG approach that retrieves relevant context from external knowledge bases to improve its output. However, fine-tuning can specialize the model further. Fine-tuning a text generation model like Cohere Command R is essential to achieve the best performance in several scenarios:

Domain-specific adaptation – RAG models may not perform optimally in highly specialized domains such as finance, law, medicine, etc. Fine-tuning allows the model to adapt to the nuances of these domains, improving accuracy.
Data augmentation – Fine-tuning allows you to incorporate additional data sources and techniques to expand the model's knowledge base and make it more robust, especially in the case of sparse data.
Fine-grained control – RAG provides good general functionality, but fine-tuning allows fine-grained control over the model's behavior and allows it to be tuned precisely to the desired task, achieving the highest accuracy.

Combining RAG with a fine-tuned LLM lets you tackle a wide variety of challenges with a single, versatile solution. The introduction of Cohere Command R fine-tuning in SageMaker enables enterprises to customize and optimize model performance for their unique requirements. By fine-tuning with domain-specific data, enterprises can increase the accuracy, relevance, and effectiveness of Cohere Command R for use cases such as natural language processing, text generation, and question answering.

By combining the scalability and robustness of Cohere Command R with the fine-tuning capabilities of SageMaker, AWS helps enterprises navigate the complexities of adopting AI and use it to drive innovation and growth across industries and sectors.

Customer data, including prompts, completions, custom models, and data used for fine-tuning or continued pre-training, remains private in the customer's AWS account and is not shared with third-party model providers.

Solution overview

The following sections walk you through the steps to fine-tune a Cohere Command R model in SageMaker, including preparing data, deploying the model, preparing for fine-tuning, creating an inference endpoint, and running inference.

Prepare fine-tuning data

Before you start a fine-tuning job, you need to upload a dataset that contains training data and (optionally) evaluation data.

First, make sure your data is in JSONL format. Each line should have the following structure:

messages – Contains a list of messages in the conversation. A message consists of the following parts:
role – Specifies the current speaker. You can choose System, User, or Chatbot.
content – Contains the content of the message.

The following example trains a chatbot to answer questions. The example spans multiple lines for readability. Make sure your dataset contains one complete example on each line.

{
  "messages": [
    {
      "role": "System",
      "content": "You are a large language model trained by Cohere."
    },
    {
      "role": "User",
      "content": "Hi! What were Time magazines top 10 cover stories in the last 10 years?"
    },
    {
      "role": "Chatbot",
      "content": "Time magazines top 10 cover stories in the last 10 years were:\\n\\n1. Volodymyr Zelenskyy\\n2. Elon Musk\\n3. Martin Luther King Jr.\\n4. How Earth Survived\\n5. Her Lasting Impact\\n6. Nothing to See Here\\n7. Meltdown\\n8. Deal With It\\n9. The Top of America\\n10. Bitter Pill"
    },
    {
      "role": "User",
      "content": "Who is Volodymyr Zelenskyy?"
    },
    {
      "role": "Chatbot",
      "content": "Volodymyr Zelenskyy is a Ukrainian politician who was elected President of Ukraine in 2019. He previously served as the Minister of Internal Affairs in the government of Prime Minister Volodymyr Groysman."
    },
    {
      "role": "User",
      "content": "Thank you!"
    }
  ]
}
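
As a minimal sketch of this formatting step, the following Python snippet checks that a conversation follows the messages, role, and content structure described earlier and writes one complete example per line. The function names, the output path, and the MIN_EXAMPLES constant are illustrative assumptions and are not part of the Cohere or SageMaker tooling.

```python
import json

# Role names as they appear in the example above.
ALLOWED_ROLES = {"System", "User", "Chatbot"}
# Minimum dataset size recommended earlier in this post.
MIN_EXAMPLES = 100


def validate_example(example):
    """Return a list of problems found in one conversation (empty if valid)."""
    if not isinstance(example, dict):
        return ["example must be a JSON object"]
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    problems = []
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {i}: must be a JSON object")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {i}: 'content' must be a string")
    return problems


def write_jsonl(examples, path):
    """Write each conversation as a single JSON line; return the number written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for example in examples:
            problems = validate_example(example)
            if problems:
                raise ValueError(f"example {count}: " + "; ".join(problems))
            # json.dumps escapes embedded newlines, so each example stays on one line.
            f.write(json.dumps(example, ensure_ascii=False) + "\n")
            count += 1
    return count


def has_enough_examples(count):
    """Check the count against the recommended minimum dataset size."""
    return count >= MIN_EXAMPLES
```

Calling write_jsonl(examples, "train.jsonl") on a list of conversation dictionaries produces a file with one example per line, and has_enough_examples can flag datasets that fall below the recommended size before you upload them.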

Deploy the model

To deploy the model, follow these steps:

Subscribe to the Cohere Command R model on AWS Marketplace.

After you subscribe to a model, you can configure the model and create a training job.

Choose View in Amazon SageMaker.
Follow the instructions in the UI to create a training job.

Alternatively, you can use the following example notebook to create a training job:

Prepare for fine-tuning

To fine-tune your model, you need:

Product ARN – Provided after subscribing to the product.
Training and evaluation datasets – Prepare the dataset for fine-tuning.
Amazon S3 location – Specify the Amazon Simple Storage Service (Amazon S3) location where you want to store the training and evaluation datasets.
Hyperparameters – Fine-tuning typically involves adjusting various hyperparameters such as learning rate, batch size, number of epochs, etc. For the fine-tuning task, you need to specify appropriate hyperparameter ranges or values.
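
One way to keep these inputs together and catch obvious mistakes before starting a job is sketched below. This is only an illustration: the FineTuneInputs class, its field names, and the hyperparameter keys (learning_rate, batch_size, epochs) are hypothetical placeholders, not the names expected by the Cohere algorithm or the SageMaker API.

```python
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import urlparse


def _is_s3_uri(uri):
    """True if the value looks like s3://bucket/key."""
    parsed = urlparse(uri)
    return parsed.scheme == "s3" and bool(parsed.netloc) and bool(parsed.path.strip("/"))


@dataclass
class FineTuneInputs:
    """Hypothetical container for the inputs listed above."""
    product_arn: str
    train_data_s3_uri: str
    eval_data_s3_uri: Optional[str] = None  # evaluation data is optional
    hyperparameters: dict = field(default_factory=dict)

    def problems(self):
        """Return a list of human-readable problems (empty if none found)."""
        found = []
        if not self.product_arn.startswith("arn:"):
            found.append("product_arn does not look like an ARN")
        if not _is_s3_uri(self.train_data_s3_uri):
            found.append("train_data_s3_uri is not an s3://bucket/key URI")
        if self.eval_data_s3_uri is not None and not _is_s3_uri(self.eval_data_s3_uri):
            found.append("eval_data_s3_uri is not an s3://bucket/key URI")
        for name, value in self.hyperparameters.items():
            # Placeholder rule: treat every hyperparameter as a positive number.
            if isinstance(value, bool) or not isinstance(value, (int, float)) or value <= 0:
                found.append(f"hyperparameter {name!r} should be a positive number")
        return found
```

An empty list from problems() means the basic shape of the inputs is plausible. It does not confirm that the ARN, bucket, or hyperparameter values are accepted by the training job.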

Create an endpoint for inference

Once fine-tuning is complete, you can create an endpoint for inference with the fine-tuned model. To create an endpoint, use the create_endpoint method. If the endpoint already exists, connect to it using the connect_to_endpoint method.
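
The choice between the two methods can be wrapped in a small helper, sketched below. The method names come from this post, but the keyword arguments (endpoint_name and the pass-through create_kwargs) and the ensure_endpoint helper itself are assumptions. Check the example notebook for the exact parameters the client expects.

```python
def ensure_endpoint(client, endpoint_name, already_exists, **create_kwargs):
    """Connect to an existing endpoint, or create it if it does not exist yet.

    client         -- any object exposing create_endpoint and connect_to_endpoint
    endpoint_name  -- name of the inference endpoint
    already_exists -- True if the endpoint was created in an earlier session
    create_kwargs  -- extra settings forwarded to create_endpoint (assumed)
    """
    if already_exists:
        # Reuse the running endpoint instead of provisioning a second one.
        client.connect_to_endpoint(endpoint_name=endpoint_name)
        return "connected"
    client.create_endpoint(endpoint_name=endpoint_name, **create_kwargs)
    return "created"
```

Reconnecting to an existing endpoint avoids provisioning, and paying for, a duplicate when you return to the notebook later.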

Run inference

You can now use the endpoint to perform real-time inference. Below is a sample message to use as input.

# co is the Cohere client that was connected to the endpoint in the previous step.
message = "Classify the following text as either very negative, negative, neutral, positive or very positive: mr. deeds is , as comedy goes , very silly -- and in the best way."
result = co.chat(message=message)
print(result)

The following screenshot shows the output of the fine-tuned model.

Optionally, you can also use the evaluation data to test the accuracy of your model (sample_finetune_scienceQA_eval.jsonl).
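
A rough way to score the model against an evaluation file is sketched below. It assumes the evaluation file uses the same messages structure as the training data, treats the final Chatbot message of each conversation as the reference answer, and uses case-insensitive exact match as the metric. All three are assumptions, and the predict callable is a hypothetical wrapper that you would write around your endpoint call.

```python
import json


def load_eval_examples(path):
    """Read one conversation per non-empty line from a JSONL file."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                examples.append(json.loads(line))
    return examples


def split_prompt_and_reference(example):
    """Return (prior messages, reference answer), or None if the
    conversation does not end with a Chatbot message."""
    messages = example.get("messages", [])
    if not messages or messages[-1].get("role") != "Chatbot":
        return None
    return messages[:-1], messages[-1]["content"]


def exact_match_accuracy(examples, predict):
    """Fraction of scored conversations where predict() matches the reference.

    predict -- callable taking the prior messages and returning the model's text
    """
    scored = 0
    correct = 0
    for example in examples:
        pair = split_prompt_and_reference(example)
        if pair is None:
            continue  # nothing to compare against
        prompt_messages, reference = pair
        prediction = predict(prompt_messages)
        scored += 1
        if prediction.strip().lower() == reference.strip().lower():
            correct += 1
    return correct / scored if scored else 0.0
```

Exact match suits short answers such as class labels. For free-form responses, a softer comparison would be more appropriate.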

Clean up

Once you're done running your notebook and experimenting with Cohere Command R fine-tuning, it's important to clean up the resources you provisioned. Not doing so can result in unnecessary charges on your account. To prevent this, use the following code to delete the resources and stop the billing process:

co.delete_endpoint()
co.close()

Summary

Fine-tuning Cohere Command R allows users to customize the model for their business, domain, and industry. In addition to fine-tuned models, users also benefit from Cohere Command R's proficiency in the most commonly used business languages (10 languages) and RAG with citations for accurate, verifiable information. Fine-tuned Cohere Command R delivers high levels of performance while using fewer resources for targeted use cases. Companies can realize reduced operational costs, improved latency, and increased throughput without large computational demands.

Get started building with Cohere fine-tuning models on SageMaker today.

About the Authors

Shashi Raina is a Senior Partner Solutions Architect at Amazon Web Services (AWS) specializing in supporting Generative AI (GenAI) startups. With nearly six years of experience at AWS, Shashi has developed deep expertise across multiple domains including DevOps, Analytics, and Generative AI.

James Yi is a Senior AI/ML Partner Solutions Architect with the Emerging Technologies team at Amazon Web Services. He is passionate about working with enterprise customers and partners to design, deploy, and extend AI/ML applications to derive business value. Outside of work, he enjoys playing soccer, traveling, and spending time with his family.

Pradeep Prabhakaran is a Customer Solutions Architect at Cohere. In his current role at Cohere, he serves as a trusted technical advisor to customers and partners, providing guidance and strategies to unlock the full potential of Cohere's cutting-edge Generative AI platform. Prior to Cohere, he was a Principal Customer Solutions Manager at Amazon Web Services, where he led enterprise cloud transformation programs for large enterprises. Prior to AWS, he held various leadership roles at consulting firms such as Slalom, Deloitte, and Wipro. Pradeep holds a Bachelor's degree in Engineering and is based in Dallas, Texas.