The proliferation of virtual business meetings in enterprises, accelerated by the COVID-19 pandemic, is here to stay. According to a 2023 study by American Express, 41% of business meetings were expected to be held in a hybrid or virtual format by 2024. Attending multiple meetings a day and keeping track of every ongoing topic becomes increasingly difficult to manage over time. This can have negative effects in many ways, from delays in project schedules to loss of customer trust. Writing meeting notes is the usual way to address this challenge, but it competes with the concentration needed to follow the ongoing conversation.
A more efficient way to manage meeting summaries is to use generative artificial intelligence (AI) and speech-to-text technology to automatically create meeting summaries at the end of a call. This allows participants to focus solely on the conversation as the transcript is automatically available at the end of the call.
In this post, we present a solution that automatically generates meeting summaries from recorded virtual meetings with multiple participants (for example, using Amazon Chime). The recording is transcribed to text using Amazon Transcribe and processed using the Amazon SageMaker Hugging Face container to generate a meeting summary. The Hugging Face container hosts large language models (LLMs) from the Hugging Face Hub.
If you want to use Amazon Bedrock instead of Amazon SageMaker to generate post-call recording summaries, check out this Bedrock sample solution. For a generative AI-powered Live Meeting Assistant (LMA) that not only creates post-call summaries, but also provides live transcription, translation, and contextual assistance based on your own in-house knowledge base, check out our new LMA solution.
Solution overview
The entire solution infrastructure is provisioned using the AWS Cloud Development Kit (AWS CDK). It is an Infrastructure as Code (IaC) framework for programmatically defining and deploying AWS resources. This framework provisions resources in a safe and repeatable way, allowing you to significantly accelerate the development process.
Amazon Transcribe is a fully managed service that seamlessly runs automatic speech recognition (ASR) workloads in the cloud. This service allows you to easily capture audio data, create easy-to-read transcriptions, and improve accuracy with custom vocabulary. Amazon Transcribe's new ASR foundation model supports over 100 language variants. In this post, we will use the speaker diarization feature. This allows Amazon Transcribe to distinguish between up to 10 unique speakers and label the conversation accordingly.
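As a rough sketch of how such a diarization job could be started programmatically (the job name, media URI, and bucket below are placeholders for illustration, not values taken from the solution's code):

```python
def build_transcribe_request(job_name: str, media_uri: str, output_bucket: str) -> dict:
    """Build the request for an Amazon Transcribe job with speaker
    diarization enabled (Transcribe can label up to 10 unique speakers)."""
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},  # e.g. s3://<bucket>/recordings/test.mp4
        "MediaFormat": "mp4",
        "LanguageCode": "en-US",
        "OutputBucketName": output_bucket,
        "OutputKey": "transcriptions/TranscribeOutput/",
        "Settings": {
            "ShowSpeakerLabels": True,  # enable speaker diarization
            "MaxSpeakerLabels": 10,     # label up to 10 unique speakers
        },
    }


# The request would then be submitted with, for example:
# boto3.client("transcribe").start_transcription_job(**build_transcribe_request(...))
```

With `ShowSpeakerLabels` enabled, each word in the resulting transcript carries a `spk_N` speaker label, which is what makes a speaker-attributed meeting summary possible downstream.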
Hugging Face is an open source machine learning (ML) platform that provides tools and resources for developing AI projects. Its flagship product is the Hugging Face Hub, which hosts a huge collection of over 200,000 pre-trained models and 30,000 datasets. The AWS partnership with Hugging Face enables seamless integration through SageMaker with Hugging Face estimators and predictors, a set of Deep Learning Containers (DLCs) for training and inference, and the SageMaker Python SDK.
Generative AI CDK Constructs, an open source extension of the AWS CDK, provides well-architected multiservice patterns to quickly and efficiently create the repeatable infrastructure required for generative AI projects on AWS. This post demonstrates how it simplifies the deployment of a foundation model (FM) from Hugging Face or Amazon SageMaker JumpStart with SageMaker real-time inference, which provides persistent, fully managed endpoints for hosting ML models. These endpoints are designed for real-time, interactive, low-latency workloads and provide auto scaling to manage load fluctuations. For every language supported by Amazon Transcribe, you can find an FM on Hugging Face that supports summarization in the corresponding language.
The following diagram shows the automatic meeting summary workflow.

The workflow consists of the following steps:
- Users upload a meeting recording as an audio or video file to the project's Amazon Simple Storage Service (Amazon S3) bucket, in the /recordings folder.
- Whenever a new recording is uploaded to this folder, an AWS Lambda Transcribe function is invoked to start an Amazon Transcribe job that converts the meeting recording to text. The transcript is stored in the project's S3 bucket under /transcriptions/TranscribeOutput/.
- This triggers an inference Lambda function that preprocesses the transcript file into a format suitable for ML inference, stores it in the project's S3 bucket under the prefix /summaries/InvokeInput/processed-TranscribeOutput/, and invokes a SageMaker endpoint. The endpoint hosts the Hugging Face model that summarizes the processed transcript. The summary is uploaded to the S3 bucket under the prefix /summaries. Note that the prompt template used in this example contains a single instruction; for more advanced requirements, you can easily extend the template and tailor the solution to your own use case.
- This S3 event triggers a notification Lambda function that pushes the summary to an Amazon Simple Notification Service (Amazon SNS) topic.
- All subscribers to the SNS topic (such as meeting attendees) will receive the summary in their email inbox.
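The preprocessing step in the workflow above can be sketched as follows. This is a minimal illustration, not the solution's actual Lambda code; it assumes the standard Transcribe output shape, in which each pronunciation item carries a `speaker_label` field when speaker diarization is enabled:

```python
def transcript_to_dialogue(transcribe_json: dict) -> str:
    """Convert Amazon Transcribe output (with speaker labels) into a
    'spk_0: ...' dialogue string suitable for a summarization prompt.

    Assumes each pronunciation item in results.items carries a
    'speaker_label', as produced when ShowSpeakerLabels is enabled.
    """
    lines, current_speaker, words = [], None, []
    for item in transcribe_json["results"]["items"]:
        token = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            if words:
                words[-1] += token  # attach punctuation to the previous word
            continue
        speaker = item.get("speaker_label", "spk_0")
        if speaker != current_speaker:
            if words:  # flush the previous speaker's turn
                lines.append(f"{current_speaker}: {' '.join(words)}")
            current_speaker, words = speaker, []
        words.append(token)
    if words:
        lines.append(f"{current_speaker}: {' '.join(words)}")
    return "\n".join(lines)
```

The resulting speaker-labeled dialogue is what gets placed inside the summarization prompt, so the model can attribute statements and action items to individual participants.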
In this post, we deploy Mistral 7B Instruct, an LLM available on the Hugging Face Model Hub, to a SageMaker endpoint to perform the summarization task. Mistral 7B Instruct was developed by Mistral AI. It has over 7 billion parameters and can process and generate text based on user instructions. Trained on an extensive corpus of text data, it understands varied contexts and nuances of language. The model is designed to perform tasks such as answering questions, summarizing information, and creating content by following specific prompts from users. Its effectiveness is measured by metrics such as perplexity, accuracy, and F1 score, and it is fine-tuned to respond to instructions with relevant and coherent text output.
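To make the summarization call concrete, the following sketch builds a request payload in the shape the Hugging Face text generation container expects, wrapping the instruction in Mistral's `[INST] ... [/INST]` format. The instruction text and the endpoint name in the comment are illustrative placeholders, not the solution's actual prompt template:

```python
import json


def build_summarization_payload(dialogue: str, max_new_tokens: int = 256) -> str:
    """Build a JSON payload for a Mistral 7B Instruct model hosted on a
    SageMaker endpoint via the Hugging Face inference container."""
    prompt = (
        "<s>[INST] Summarize the following meeting transcript, "
        "highlighting key decisions and action items:\n\n"
        f"{dialogue} [/INST]"
    )
    return json.dumps({
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": 0.1,  # low temperature for factual summaries
        },
    })


# The payload would then be sent with, for example:
# boto3.client("sagemaker-runtime").invoke_endpoint(
#     EndpointName="meeting-summarizer",  # hypothetical endpoint name
#     ContentType="application/json",
#     Body=build_summarization_payload(dialogue),
# )
```

Because the prompt carries a single instruction, extending the template (for example, to also extract action items per speaker) only requires editing the instruction string.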
Prerequisites
To proceed with this post, you must meet the following prerequisites:
Deploy the solution
To deploy the solution in your own AWS account, browse the GitHub repository to access the complete source code for the AWS CDK project for Python.
The first time you deploy AWS CDK assets to a given AWS account and AWS Region, you must first run a bootstrap command. This configures the baseline AWS resources and permissions that the AWS CDK requires to deploy AWS CloudFormation stacks in that environment.
Finally, run the following command to deploy the solution, providing the email address of the summary recipient through the SubscriberEmailAddress parameter:
Test the solution
Several sample meeting recordings are provided in the data folder of the project repository. You can upload the test.mp4 recording to the /recordings folder of your project's S3 bucket. The summary is stored in Amazon S3 and sent to the subscribers. For an input of about 250 tokens, the end-to-end processing takes about 2 minutes.
The following diagram provides an overview of the input conversation and output.

Limitations
This solution has the following limitations:
- The model provides highly accurate completions in English. Other languages such as Spanish, French, and Portuguese are also supported, but the quality of the completions may be lower. You can find other Hugging Face models better suited to other languages.
- The model used in this post has a context length limit of approximately 8,000 tokens (equivalent to approximately 6,000 words). If you need a longer context length, you can replace the model by referencing the new model ID in the respective AWS CDK construct.
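If a transcript exceeds the context window, one common workaround (not part of this solution) is to split it into chunks, summarize each chunk separately, and then combine the partial summaries. A minimal sketch using the rough tokens-to-words ratio mentioned above:

```python
def chunk_transcript(text: str, max_tokens: int = 8000,
                     words_per_token: float = 0.75) -> list[str]:
    """Split a transcript into chunks that fit a model's context window.

    Uses the rough heuristic from this post (~8,000 tokens is roughly
    ~6,000 words), so each chunk is capped at max_tokens * words_per_token
    words. A real implementation would use the model's tokenizer instead.
    """
    max_words = int(max_tokens * words_per_token)
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]
```

Note that splitting naively at word boundaries can cut a speaker turn in half; splitting at speaker changes preserves more context for each partial summary.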
- Like other LLMs, Mistral 7B Instruct may hallucinate and produce content that deviates from the facts or contains fabricated information.
- The recording format must be .mp4, .mp3, or .wav.
Clean up
To remove the deployed resources and stop incurring charges, run the following command:
Alternatively, to use the AWS Management Console, follow these steps:
- On the AWS CloudFormation console, choose Stacks in the navigation pane.
- Select the stack called Text-summarization-Infrastructure-stack and delete it.
Conclusion
In this post, we proposed an architectural pattern that automatically transforms your meeting notes into insightful conversation summaries. The workflow showed how the AWS Cloud and Hugging Face can help accelerate your generative AI application development by orchestrating a combination of managed AI services such as Amazon Transcribe and externally sourced ML models from the Hugging Face Hub, such as those from Mistral AI.
If you are interested in learning more about how conversation summarization can be applied to a contact center environment, you can deploy this technique in our suite of solutions for live-call and post-call analytics.
References
Mistral 7B release post, by Mistral AI
Our team
This post was created by AWS Professional Services, a global team of experts who can help you realize your desired business outcomes when using the AWS Cloud. We work together with your team and your chosen members of the AWS Partner Network (APN) to implement your enterprise cloud computing initiatives. Our team provides assistance through a suite of offerings that help you achieve specific outcomes related to enterprise cloud adoption. We also deliver focused guidance through our global specialty practices, which cover a variety of solutions, technologies, and industries.
About the authors
Gabriel Rodriguez Garcia is a Machine Learning Engineer at AWS Professional Services in Zurich. In his current role, he has helped customers achieve their business goals on a variety of ML use cases, ranging from setting up MLOps inference pipelines to developing fraud detection applications. When he's not working, he enjoys being physically active, listening to podcasts, and reading books.
Jahed Zaidi is an AI and Machine Learning specialist at AWS Professional Services in Paris. He is a builder and trusted advisor to companies across industries, helping them innovate faster and at scale with technologies ranging from generative AI to scalable ML platforms. Outside of work, Jahed enjoys discovering new cities and cultures, as well as outdoor activities.
Mateusz Zaremba is a DevOps Architect at AWS Professional Services. Mateusz supports customers at the intersection of machine learning and DevOps, helping them deliver value efficiently and securely. Beyond technology, he is an aerospace engineer and an avid sailor.
Kemeng Zhang currently works at AWS Professional Services in Zurich, Switzerland, specializing in AI/ML. She has been involved in multiple NLP projects, ranging from behavior modification to fraud detection in digital communication. Apart from that, she is interested in UX design and playing cards.
