Today we are excited to introduce the Text Ranking and Question and Answer SageMaker AI UI templates for customers. The Text Ranking template allows human annotators to rank multiple responses from a large language model (LLM) against custom criteria such as relevance, clarity, and factual accuracy. This ranked feedback provides critical insights that help refine models through reinforcement learning from human feedback (RLHF), producing responses that are better aligned with human preferences. The Question and Answer template makes it easy to create high-quality Q&A pairs from provided text passages. These pairs serve as demonstration data for supervised fine-tuning (SFT), teaching the model how to respond accurately to similar inputs.
In this blog post, we show you how to set up these templates in SageMaker AI to create high-quality datasets for fine-tuning large language models. Let's explore how to take advantage of these new tools.
Text Ranking
The Text Ranking template allows annotators to rank multiple text responses generated by a large language model based on customizable criteria such as relevance, clarity, and accuracy. Annotators are presented with a prompt and several model-generated responses, which they rank according to guidelines specific to your use case. The ranked data is captured in a structured format that details the ranking index assigned to each response for each criterion, such as clarity and accuracy. This information is invaluable for fine-tuning models with RLHF, aligning the model's outputs more closely with human preferences. The template is also highly effective for evaluating the quality of LLM outputs, because you can see how well the responses match the intended criteria.
Set up in the SageMaker AI console
A new Generative AI category has been added under Task types in the SageMaker AI console, from which you can select these templates. To configure a labeling job using the AWS Management Console, complete the following steps:
- In the SageMaker AI console, under Ground Truth in the navigation pane, choose Labeling jobs.
- Choose Create labeling job.
- Specify the input manifest location and the output path. To configure the input file for text ranking, use Manual data setup under Create labeling job and provide a JSON file with the prompt stored under the source field and the list of model responses stored under the responses field. Text ranking does not support Automated data setup.
An example of the input manifest file is:
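The following is an illustrative sketch rather than an exact schema: the prompt and responses are placeholder values, and each data object occupies one line of the JSON Lines manifest, with the prompt under the source field and the model responses under the responses field as described above.

```json
{"source": "Explain the benefits of regular exercise.", "responses": ["Regular exercise improves cardiovascular health, strengthens muscles, and can reduce stress.", "Exercise is good for you.", "Working out consistently supports weight management and lowers the risk of several chronic diseases."]}
```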

Upload this input manifest file to an S3 location and provide the S3 path to this file under Input dataset location.

- Select Generative AI as the task type, and choose the Text Ranking UI.


- Choose Next.
- Enter your labeling instructions. Enter the dimensions you want to include in the Ranking dimensions section. For example, in the preceding image the dimensions are Helpfulness and Clarity, but you can add, remove, or customize them for your specific needs by choosing the + button to add new dimensions or the trash icon to remove them. There is also an Allow tie rankings check box. Selecting this option lets annotators rank two or more responses equally when they appear to be of the same quality for a particular dimension.

- Choose Preview to review the UI template.

- Choose Create to create the labeling job.
Once annotators submit their evaluations, their responses are stored directly in the specified S3 bucket. The output manifest file contains the original data fields and a worker-response-ref that points to the worker response file in S3. This worker response file contains the ranked responses for each specified dimension and can be used to fine-tune or evaluate your model's outputs. If multiple annotators worked on the same data object, their individual annotations are contained in this file under the answers key, which is an array of responses. Each response includes the annotator's input and metadata such as acceptance time, submission time, and worker ID. Here is an example of an output JSON file that contains the annotations:
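The exact layout of the worker response file depends on your job configuration; the following sketch assumes the standard Ground Truth answers array described above, and the dimension names (Helpfulness, Clarity), timestamps, worker ID, and rank values are placeholders. Here, each dimension maps to the rank the annotator assigned to each response, in the order the responses appear in the input manifest.

```json
{
  "answers": [
    {
      "acceptanceTime": "2025-01-15T10:15:32.000Z",
      "submissionTime": "2025-01-15T10:21:04.000Z",
      "workerId": "private.us-east-1.1111aaaaEXAMPLE",
      "answerContent": {
        "Helpfulness": [1, 3, 2],
        "Clarity": [2, 3, 1]
      }
    }
  ]
}
```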

Question and Answer
The Question and Answer template allows you to create supervised fine-tuning (SFT) datasets by generating question and answer pairs from text passages. Annotators read the provided text and create relevant questions and corresponding answers. This process acts as a source of demonstration data, guiding the model in how to handle similar tasks. The template supports flexible input, allowing annotators to reference the entire passage or specific sections for more targeted Q&A. A color-coded matching feature visually links questions to the relevant sections, helping to streamline the annotation process. By using these Q&A pairs, you improve the model's ability to follow instructions and respond accurately to real inputs.
Set up in the SageMaker AI console
The process of setting up a labeling job with the Question and Answer template follows the same steps as the Text Ranking template. However, there are differences in how you configure the input file and in choosing the appropriate UI template for Q&A tasks.
- In the SageMaker AI console, under Ground Truth in the navigation pane, choose Labeling jobs.
- Choose Create labeling job.
- Specify the input manifest location and the output path. To configure the input file for question and answer tasks, use Manual data setup and upload a JSON file containing the text passage in the source field. Annotators use this text to generate questions and answers. Note that you can instead load text from .txt or .csv files and use the Ground Truth Automated data setup to convert it to the required JSON format.
An example of the input manifest file is:
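Again, this is only an illustrative sketch: the passage text is a placeholder, and each data object sits on its own line of the manifest with the passage stored under the source field.

```json
{"source": "Amazon SageMaker Ground Truth helps you build high-quality training datasets by combining human annotators with built-in labeling workflows and task templates."}
```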

Upload this input manifest file to an S3 location and provide the S3 path to this file under Input dataset location.
- Select Generative AI as the task type, and choose the Question and Answer UI.
- Choose Next.
- Enter your labeling instructions. Additional settings are available to control the task. You can specify the minimum and maximum number of Q&A pairs that workers should generate from the provided text passage. Additionally, you can define minimum and maximum word counts for both the question and answer fields so that the pairs meet your requirements. You can also add optional question tags to categorize the question and answer pairs. For example, you might include tags such as "What", "How", or "Why" to guide the task. If these predefined tags are insufficient, you can allow workers to enter their own custom tags by enabling the Allow workers to specify custom tags feature. This flexibility makes it easier to tailor annotations to the specific needs of your use case.

- Once these settings are configured, you can choose Preview to view the UI and confirm it meets your needs before proceeding.

- Choose Create to create the labeling job.
Once annotators submit their work, their responses are stored directly in the specified S3 bucket. The output manifest file contains the original data fields along with a worker-response-ref that points to the worker response file in S3. This worker response file contains the detailed annotations provided by the workers, including the question and answer pairs generated for each task.
Here is an example of what the output looks like:
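As with text ranking, the worker response file follows the general Ground Truth answers layout. The field names inside answerContent below (such as qaPairs and tag) are illustrative placeholders rather than the exact schema, and the timestamps, worker ID, and Q&A content are dummy values.

```json
{
  "answers": [
    {
      "acceptanceTime": "2025-01-15T11:02:10.000Z",
      "submissionTime": "2025-01-15T11:09:45.000Z",
      "workerId": "private.us-east-1.2222bbbbEXAMPLE",
      "answerContent": {
        "qaPairs": [
          {
            "question": "What does Amazon SageMaker Ground Truth help you build?",
            "answer": "High-quality training datasets, by combining human annotators with built-in labeling workflows and task templates.",
            "tag": "What"
          }
        ]
      }
    }
  ]
}
```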

CreateLabelingJob API
In addition to creating these labeling jobs through the Amazon SageMaker AI console, customers can use the CreateLabelingJob API to set up text ranking and question and answer jobs programmatically. This approach provides greater flexibility for automation and integration into existing workflows. With the API, you can define job configurations, input manifests, and worker task templates, and monitor job progress directly from your application or system.
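As a rough illustration of what a programmatic setup might look like, the following sketch calls create_labeling_job with boto3. The bucket names, IAM role, workteam ARN, UI template URI, and Lambda ARNs are placeholders you would replace with your own resources (custom-template jobs also require pre-annotation and annotation-consolidation Lambda functions, as set up in the notebooks referenced below).

```python
import boto3

# A minimal sketch of creating a labeling job programmatically.
# All S3 paths, ARNs, and the UI template URI below are placeholders.
sagemaker = boto3.client("sagemaker")

response = sagemaker.create_labeling_job(
    LabelingJobName="text-ranking-job-example",
    LabelAttributeName="label",
    InputConfig={
        "DataSource": {
            "S3DataSource": {
                # JSON Lines manifest like the examples shown earlier
                "ManifestS3Uri": "s3://your-bucket/input/input.manifest"
            }
        }
    },
    OutputConfig={"S3OutputPath": "s3://your-bucket/output/"},
    RoleArn="arn:aws:iam::111122223333:role/GroundTruthExecutionRole",
    HumanTaskConfig={
        "WorkteamArn": "arn:aws:sagemaker:us-east-1:111122223333:workteam/private-crowd/my-team",
        "UiConfig": {
            # Worker task template for the text ranking UI
            "UiTemplateS3Uri": "s3://your-bucket/templates/text-ranking.html"
        },
        "PreHumanTaskLambdaArn": "arn:aws:lambda:us-east-1:111122223333:function:PreHumanTask",
        "AnnotationConsolidationConfig": {
            "AnnotationConsolidationLambdaArn": "arn:aws:lambda:us-east-1:111122223333:function:ConsolidateAnnotations"
        },
        "TaskTitle": "Rank LLM responses",
        "TaskDescription": "Rank the model responses for each prompt by the listed dimensions",
        "NumberOfHumanWorkersPerDataObject": 1,
        "TaskTimeLimitInSeconds": 3600,
    },
)
print(response["LabelingJobArn"])

# Monitor job progress from your application or automation.
status = sagemaker.describe_labeling_job(LabelingJobName="text-ranking-job-example")
print(status["LabelingJobStatus"])
```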
For a step-by-step guide, refer to the following notebooks, which walk through the entire process of setting up human-in-the-loop (HITL) workflows for reinforcement learning from human feedback (RLHF) using both the text ranking and question and answer templates. These notebooks guide you through setting up the required Ground Truth prerequisites, downloading sample JSON files with prompts and responses, converting them to Ground Truth input manifests, creating worker task templates, and monitoring the labeling jobs. They also cover post-processing the results to create a consolidated dataset with ranked responses.
Conclusion
With the introduction of the Text Ranking and Question and Answer templates, Amazon SageMaker AI makes it easier to generate high-quality datasets for training large language models. These built-in capabilities simplify the process of fine-tuning models for specific tasks and aligning their outputs with human preferences through supervised fine-tuning or reinforcement learning from human feedback. By using these templates, you can better evaluate and refine your models to meet the needs of your specific application, helping you achieve more accurate, reliable, and user-aligned outputs. Whether you're creating demonstration datasets for fine-tuning or ranking model outputs, SageMaker AI provides the tools you need to build cutting-edge generative AI solutions. Get started creating your fine-tuning datasets with the new templates today.
About the authors
Sundar Raghavan is a Generative AI Specialist Solutions Architect at AWS, helping customers design, build, and deploy AI agents and scalable generative AI applications using Amazon Bedrock and next-generation AWS services. In his free time, Sundar loves to explore new places, sample local eateries, and embrace the great outdoors.
Jesse Manders is a Senior Product Manager on Amazon Bedrock, the AWS generative AI developer service. He works at the intersection of AI and human interaction, aiming to create and improve generative AI products and services to meet our needs. Previously, Jesse was a senior scientist at Silicon Valley startups and held engineering team leadership roles at Apple and Lumileds. He has an MS and a PhD from the University of Florida, and an MBA from the University of California, Berkeley, Haas School of Business.
Niharika Jayanti is a Front-End Engineer at Amazon, where she designs and develops user interfaces that delight customers. She contributed to the successful launch of LLM evaluation tools in Amazon Bedrock and Amazon SageMaker Unified Studio. Outside of work, Niharika enjoys swimming, hitting the gym, and crocheting.
Muyoon Yang is a Senior Software Engineer on the Amazon Web Services (AWS) SageMaker AI team. With over six years at AWS, she specializes in developing machine learning-based labeling platforms. Her work focuses on building and deploying innovative software applications for labeling solutions, giving customers access to cutting-edge labeling capabilities. Muyoon holds an MS in Computer Engineering from Boston University.
Kavya Kotra is a Software Engineer on the Amazon SageMaker Ground Truth team, helping build scalable and reliable software applications. Kavya played an important role in the development and launch of SageMaker's generative AI tools. Previously, Kavya held engineering roles within AWS EC2 Networking and Amazon Audible. In her free time, she enjoys painting and exploring the natural scenery of Seattle.
Alan Ismael is a Software Engineer at AWS based in New York City. He focuses on building and maintaining scalable AI/ML products, such as Amazon SageMaker Ground Truth and Amazon Bedrock. Outside of work, Alan is learning how to play pickleball, with mixed results.
