Building a serverless audio summary solution using Amazon Bedrock and Whisper

Machine Learning


Recordings of business meetings, interviews, and customer interactions have become essential for preserving important information. However, manually transcribing and summarizing these recordings is often time consuming and labor intensive. Advances in generative AI and automatic speech recognition (ASR) have enabled automated solutions that make this process faster and more efficient.

Protecting personally identifiable information (PII) is a vital aspect of data security, driven by both ethical responsibility and legal requirements. This post shows you how to use the OpenAI Whisper Large V3 Turbo foundation model (FM), available in Amazon Bedrock Marketplace, to transcribe recordings. The transcriptions are then processed by Amazon Bedrock for summarization and redaction of sensitive information.

Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies such as AI21 Labs, Anthropic, Cohere, DeepSeek, Luma, Meta, Mistral AI, poolside (coming soon), Stability AI, and Amazon Nova through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Additionally, Amazon Bedrock Guardrails lets you automatically redact sensitive information, including PII, from the transcription summaries to support your compliance and data protection needs.

In this post, we walk through an end-to-end architecture that combines a React-based frontend with Amazon Bedrock, AWS Lambda, and AWS Step Functions to orchestrate the workflow and facilitate seamless integration and processing.

Solution overview

The solution highlights the power of integrating serverless technologies with generative AI to automate and scale content processing workflows. The user journey begins with uploading a recording through a React frontend application, hosted on Amazon CloudFront and backed by Amazon Simple Storage Service (Amazon S3) and Amazon API Gateway. After the file is uploaded, a Step Functions state machine orchestrates the core processing steps, using AI models and Lambda functions for seamless data flow and transformation. The following diagram illustrates the solution architecture.

AWS serverless architecture for audio processing: CloudFront to S3, EventBridge trigger, Lambda, and Amazon Bedrock for transcription and summarization

The workflow consists of the following steps:

  1. The React application is hosted in an S3 bucket and served to users globally with low latency through CloudFront. API Gateway handles the interaction between the frontend and backend services.
  2. Users upload audio or video files directly from the app. These recordings are stored in the specified S3 bucket for processing.
  3. An Amazon EventBridge rule detects the S3 upload event, triggers the Step Functions state machine, and starts the AI-powered processing pipeline.
  4. The state machine performs audio transcription, summarization, and redaction by sequentially orchestrating multiple Amazon Bedrock models: Whisper for transcription, Claude for summarization, and Amazon Bedrock Guardrails to redact sensitive data.
  5. The redacted summary is returned to the frontend application and displayed to the user.
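Step 3 hinges on EventBridge delivering an S3 "Object Created" event to the state machine. As a minimal sketch (the helper name is hypothetical; only the documented `detail.bucket.name` and `detail.object.key` fields of the event are relied on), the trigger payload can be reduced to the bucket and key the pipeline needs:

```python
def extract_s3_object(event: dict) -> dict:
    """Pull the bucket and key out of an EventBridge S3 'Object Created' event."""
    detail = event["detail"]
    return {
        "bucket": detail["bucket"]["name"],
        "key": detail["object"]["key"],
    }
```

A Lambda at the start of the state machine could call this on its input to produce the `bucket`/`key` pair that later steps consume.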

The following diagram illustrates the state machine workflow.

AWS Step Functions State Machine for Audio Processing: Whisper Transcription, Speaker Identification, and Bedrock Summary Tasks

The Step Functions state machine orchestrates a series of tasks to transcribe, summarize, and redact sensitive information from the uploaded audio or video recording.

  1. A Lambda function is triggered to collect input details (such as the Amazon S3 object path and metadata) and prepare the payload for transcription.
  2. The payload is sent to the OpenAI Whisper Large V3 Turbo model through Amazon Bedrock Marketplace to generate a near real-time transcription of the recording.
  3. The raw transcript is passed to Anthropic's Claude 3.5 Sonnet through Amazon Bedrock to generate a concise and coherent summary of the conversation.
  4. A second Lambda function validates and forwards the summary to the redaction step.
  5. The summary is processed through Amazon Bedrock Guardrails, which automatically redacts PII and other sensitive data.
  6. The redacted summary is saved or returned to the frontend application through the API and displayed to the user.

Prerequisites

Before you begin, make sure you have the following prerequisites:

Create a guardrail in the Amazon Bedrock console

For instructions on creating a guardrail in Amazon Bedrock, see Create a guardrail. For more information about PII detection and redaction, use the sensitive information filter to remove PII from conversations. Configure the guardrail with the following key settings:

  • Enable PII detection and handling
  • Set the PII action to Redact
  • Add the relevant PII types, such as:
    • Names and identities
    • Phone numbers
    • Email addresses
    • Physical addresses
    • Financial information
    • Other sensitive personal information

After you deploy the guardrail, note its Amazon Resource Name (ARN); you will use it when deploying the solution.
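If you prefer to script this step instead of using the console, a guardrail can also be created with the `bedrock` boto3 client. The sketch below only builds the sensitive-information policy; the commented-out `create_guardrail` call shows roughly how it would be used (the guardrail name and messages are placeholder values, and the call requires AWS credentials):

```python
def build_pii_policy(pii_types):
    """Build the sensitiveInformationPolicyConfig for create_guardrail,
    redacting (anonymizing) each PII type rather than blocking the request."""
    return {
        "piiEntitiesConfig": [
            {"type": t, "action": "ANONYMIZE"} for t in pii_types
        ]
    }

# Roughly how the policy would be used (requires AWS credentials):
# import boto3
# bedrock = boto3.client("bedrock")
# response = bedrock.create_guardrail(
#     name="audio-summary-guardrail",          # placeholder name
#     blockedInputMessaging="Input blocked.",
#     blockedOutputsMessaging="Output blocked.",
#     sensitiveInformationPolicyConfig=build_pii_policy(
#         ["NAME", "PHONE", "EMAIL", "ADDRESS"]
#     ),
# )
# guardrail_arn = response["guardrailArn"]
```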

Deploy the Whisper model

Complete the following steps to deploy the Whisper Large V3 Turbo model:

  1. On the Amazon Bedrock console, choose Model catalog under Foundation models in the navigation pane.
  2. Search for and choose Whisper Large V3 Turbo.
  3. On the options menu (three dots), choose Deploy.

Amazon Bedrock console model catalog filtered to the Whisper Large V3 Turbo speech recognition model, showing deployment options

  4. Modify the endpoint name, number of instances, and instance type to suit your specific use case. This post uses the default settings.
  5. Modify the Advanced settings section according to your use case. This post uses the default settings.
  6. Choose Deploy.

This creates a new AWS Identity and Access Management (IAM) role and deploys the model.

You can choose Marketplace deployments in the navigation pane and, in the Managed deployments section, view the status of the endpoint creation. Wait for the endpoint to finish deploying and its status to change to In Service, then copy the endpoint name; you will use it when deploying the solution infrastructure.

Amazon Bedrock console showing the "How it works" overview and the managed deployments table with the Whisper model endpoint

Deploy the solution infrastructure

In the GitHub repo, follow the instructions in the README file to clone the repository and deploy the frontend and backend infrastructure.

We define and deploy the infrastructure using the AWS Cloud Development Kit (AWS CDK). The AWS CDK code deploys the following resources:

  • A React frontend application
  • Backend infrastructure
  • An S3 bucket for storing uploads and processed results
  • A Step Functions state machine with Lambda functions for audio processing and PII redaction
  • API Gateway endpoints for handling requests
  • IAM roles and policies for secure access
  • A CloudFront distribution for hosting the frontend

Implementation deep dive

The backend consists of a sequence of Lambda functions, each handling a specific stage of the audio processing pipeline:

  • Upload handler – Receives audio files and stores them in Amazon S3
  • Whisper transcription – Converts speech to text using the Whisper model
  • Speaker detection – Distinguishes and labels individual speakers in the audio
  • Summarization with Amazon Bedrock – Extracts and summarizes key points from the transcript
  • PII redaction – Uses Amazon Bedrock Guardrails to remove sensitive information for privacy compliance

Let's look at some of the key components.

The transcription Lambda function uses the Whisper model to convert an audio file to text:

import json

import boto3

# Runtime client for the SageMaker endpoint backing the Marketplace model
sagemaker_runtime = boto3.client("sagemaker-runtime")


def transcribe_with_whisper(audio_chunk, endpoint_name):
    # Convert audio to hex string format
    hex_audio = audio_chunk.hex()
    
    # Create payload for Whisper model
    payload = {
        "audio_input": hex_audio,
        "language": "english",
        "task": "transcribe",
        "top_p": 0.9
    }
    
    # Invoke the SageMaker endpoint running Whisper
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload)
    )
    
    # Parse the transcription response
    response_body = json.loads(response['Body'].read().decode('utf-8'))
    transcription_text = response_body['text']
    
    return transcription_text
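The function above operates on an `audio_chunk`, which implies the recording is split before invocation. A minimal sketch of that preparation follows; the helper names are illustrative, and the 5 MB default chunk size is an assumption chosen to stay under typical real-time endpoint payload limits, not a documented requirement. (In practice, splitting compressed audio mid-stream also requires re-encoding each segment so it remains decodable.)

```python
def chunk_audio(audio_bytes: bytes, max_chunk_bytes: int = 5 * 1024 * 1024):
    """Split raw audio bytes into payload-sized chunks for the endpoint.
    The 5 MB default is an assumption; tune it for your deployment."""
    return [
        audio_bytes[i : i + max_chunk_bytes]
        for i in range(0, len(audio_bytes), max_chunk_bytes)
    ]


def build_whisper_payload(audio_chunk: bytes) -> dict:
    """Mirror the payload shape used by transcribe_with_whisper."""
    return {
        "audio_input": audio_chunk.hex(),
        "language": "english",
        "task": "transcribe",
        "top_p": 0.9,
    }
```

Each chunk is then hex-encoded into the payload and sent to the endpoint, and the per-chunk transcripts are concatenated.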

We use Amazon Bedrock to generate a concise summary from the transcription:

import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")


def generate_summary(transcription):
    # Format the prompt with the transcription
    prompt = f"{transcription}\n\nGive me the summary, speakers, key discussions, and action items with owners"
    
    # Call Claude 3.5 Sonnet through Bedrock using the Messages API
    # (Claude 3.x models require the Messages format rather than the
    # legacy prompt/completion Text Completions format)
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 4096,
            "temperature": 0.7,
            "top_p": 0.9,
            "messages": [{"role": "user", "content": prompt}]
        })
    )
    
    # Extract and return the summary text
    result = json.loads(response['body'].read())
    return result['content'][0]['text']

A key component of the solution is the automatic redaction of PII. We implemented this using Amazon Bedrock Guardrails to help you comply with privacy regulations:

def apply_guardrail(bedrock_runtime, content, guardrail_id):
    # Format content according to API requirements
    formatted_content = [{"text": {"text": content}}]
    
    # Call the guardrail API
    response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion="DRAFT",
        source="OUTPUT",  # Apply the guardrail to model output
        content=formatted_content
    )
    
    # Extract redacted text from response
    if 'action' in response and response['action'] == 'GUARDRAIL_INTERVENED':
        if len(response['outputs']) > 0:
            output = response['outputs'][0]
            if 'text' in output and isinstance(output['text'], str):
                return output['text']
    
    # Return original content if redaction fails
    return content

When PII is detected, it is replaced with a type indicator (for example, {phone} or {email}), keeping the summary useful while protecting sensitive data.
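To illustrate the effect, the toy function below mimics that placeholder style with regular expressions. It is purely a demonstration of the output format; the actual redaction in this solution is performed by Amazon Bedrock Guardrails, not by code like this.

```python
import re


def demo_redact(text: str) -> str:
    """Toy stand-in showing the {type} placeholder style of redacted output.
    Not a substitute for Amazon Bedrock Guardrails."""
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "{email}", text)
    text = re.sub(r"\+?\d[\d\s().-]{7,}\d", "{phone}", text)
    return text
```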

To manage the processing pipeline, we use Step Functions to orchestrate the Lambda functions:

{
  "Comment": "Audio Summarization Workflow",
  "StartAt": "TranscribeAudio",
  "States": {
    "TranscribeAudio": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "WhisperTranscriptionFunction",
        "Payload": {
          "bucket.$": "$.bucket",
          "key.$": "$.key"
        }
      },
      "Next": "IdentifySpeakers"
    },
    "IdentifySpeakers": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "SpeakerIdentificationFunction",
        "Payload": {
          "Transcription.$": "$.Payload"
        }
      },
      "Next": "GenerateSummary"
    },
    "GenerateSummary": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "BedrockSummaryFunction",
        "Payload": {
          "SpeakerIdentification.$": "$.Payload"
        }
      },
      "End": true
    }
  }
}

This workflow ensures that each step completes successfully before proceeding to the next step, and includes automatic error handling and retry logic.
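The `$.Payload` chaining means each Lambda's output becomes the next task's input. The data flow can be sanity-checked locally with plain functions standing in for the Lambdas; all three stubs below are hypothetical simplifications, not the real handlers:

```python
def transcribe_stub(event):
    # Stands in for WhisperTranscriptionFunction
    return {"Payload": f"transcript of s3://{event['bucket']}/{event['key']}"}


def identify_speakers_stub(event):
    # Stands in for SpeakerIdentificationFunction
    return {"Payload": {"Transcription": event["Transcription"],
                        "speakers": ["Speaker 1", "Speaker 2"]}}


def summarize_stub(event):
    # Stands in for BedrockSummaryFunction
    si = event["SpeakerIdentification"]
    return {"summary": f"{len(si['speakers'])} speakers discussed: {si['Transcription']}"}


def run_pipeline(bucket, key):
    """Mirror the TranscribeAudio -> IdentifySpeakers -> GenerateSummary chaining."""
    t = transcribe_stub({"bucket": bucket, "key": key})
    s = identify_speakers_stub({"Transcription": t["Payload"]})
    return summarize_stub({"SpeakerIdentification": s["Payload"]})
```

This kind of harness makes it easy to verify the input/output contract between tasks before deploying the state machine.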

Test the solution

Once your deployment is successful, you can test the solution functionality using the CloudFront URL.

Audio/video upload and summary interface showing a completed file upload for a team meeting recording

Security considerations

Security is an important aspect of this solution, and we implemented several best practices to support data protection and compliance:

  • Sensitive data redaction – PII is automatically redacted to protect user privacy.
  • Fine-grained IAM permissions – The principle of least privilege is applied across AWS services and resources.
  • Amazon S3 access controls – Strict bucket policies restrict access to authorized users and roles.
  • API security – API endpoints are secured using Amazon Cognito for user authentication (optional but recommended).
  • CloudFront protection – HTTPS is enforced with the latest TLS protocols to promote secure content delivery.
  • Amazon Bedrock data security – Amazon Bedrock (including Amazon Bedrock Marketplace) protects customer data and does not share it with model providers or use it for training. This helps keep your proprietary information secure when using AI capabilities.

Clean up

To prevent unnecessary charges, remove the resources provisioned for this solution when you are finished:

  1. Delete the Amazon Bedrock guardrail:
    1. On the Amazon Bedrock console, choose Guardrails in the navigation pane.
    2. Select the guardrail and choose Delete.
  2. Delete the Whisper Large V3 Turbo model deployed through Amazon Bedrock Marketplace:
    1. On the Amazon Bedrock console, choose Marketplace deployments in the navigation pane.
    2. In the Managed deployments section, select the deployed endpoint and choose Delete.
  3. Run the cdk destroy command to delete the AWS CDK stack and remove the AWS infrastructure.

Conclusion

This serverless audio summary solution demonstrates the benefits of combining AWS services to create a sophisticated, secure, and scalable application. Using Amazon Bedrock for AI capabilities, Lambda for serverless processing, and CloudFront for content delivery, we built a solution that efficiently handles large volumes of audio content in line with security best practices.

The automatic PII redaction feature supports compliance with privacy regulations, making this solution well suited to regulated industries such as healthcare, finance, and legal services, where data security is paramount. To get started, deploy this architecture in your own AWS environment and accelerate your audio processing workflows.


About the authors

Kaiyin Hu is a Senior Solutions Architect for Strategic Accounts at Amazon Web Services, with years of experience across enterprises, startups, and professional services. Currently, she helps customers build cloud solutions and drives generative AI adoption in the cloud. Previously, Kaiyin worked in the smart home domain, helping customers integrate voice and IoT technologies.

Sid Vantair is a Solutions Architect with AWS covering strategic accounts. He thrives on resolving complex technical issues to help customers overcome hurdles. Outside of work, he cherishes time with his family and nurturing his children's curiosity.


