Speed up custom entity recognition using Claude tool use in Amazon Bedrock


Companies across industries face a common challenge: how to efficiently extract valuable information from vast amounts of unstructured data. Traditional approaches often involve resource-intensive processes and inflexible models. This post introduces an alternative: Claude tool use in Amazon Bedrock, which uses large language models (LLMs) to perform dynamic, adaptable entity recognition without extensive setup or training.

In this post, we discuss:

  • What Claude tool use (function calling) is and how it works
  • How to use Claude tool use to extract structured data with natural language prompts
  • How to set up a serverless pipeline using Amazon Bedrock, AWS Lambda, and Amazon Simple Storage Service (Amazon S3)
  • How to implement dynamic entity extraction for different types of documents
  • How to deploy a production-ready solution following AWS best practices

What is Claude tool use (function calling)?

Claude tool use, also known as function calling, is a feature that extends Claude’s capabilities by letting it call external functions or tools. You provide Claude with a set of predefined tools that it can access and use as needed.

How Claude tool use works with Amazon Bedrock

Amazon Bedrock is a fully managed generative artificial intelligence (AI) service that offers a range of high-performing foundation models (FMs) from industry leaders such as Anthropic. Amazon Bedrock makes it straightforward to implement Claude tool use:

  1. The user defines a set of tools, each with a name, an input schema, and a description.
  2. The user provides a prompt that may require one or more of the tools.
  3. Claude evaluates the prompt and determines whether any of the tools can help address the user’s question or task.
  4. If so, Claude chooses which tool to use and with which input.
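
The first two steps can be sketched as a request body. This is a minimal sketch, assuming the Anthropic Messages API format that Amazon Bedrock accepts for Claude models. The helper name build_tool_request and the extract_contact tool are hypothetical and exist only for illustration.

```python
import json


def build_tool_request(prompt, tools, max_tokens=1024):
    """Build a Messages API request body that offers `tools` to Claude."""
    for tool in tools:
        # Each tool needs at least a name and a JSON Schema for its input.
        missing = {"name", "input_schema"} - tool.keys()
        if missing:
            raise ValueError(f"tool is missing fields: {sorted(missing)}")
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "tools": tools,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": prompt}]}
        ],
    }


# Step 1: define a tool (hypothetical example).
contact_tool = {
    "name": "extract_contact",
    "description": "Extract a person's name and email address from text.",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "email": {"type": "string"},
        },
        "required": ["name", "email"],
    },
}

# Step 2: provide a prompt that may need the tool.
request_body = build_tool_request(
    "Contact Jane Doe at jane@example.com.", [contact_tool]
)
print(json.dumps(request_body, indent=2))
```

Steps 3 and 4 happen inside the model. If Claude decides a tool applies, the response contains a tool_use content block with the chosen tool name and its input.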

Solution overview

This post shows how to extract custom fields from a driver’s license using Claude tool use in Amazon Bedrock. The serverless solution processes documents in real time and extracts information such as names, dates, and addresses without traditional model training.

Architecture

Our custom entity recognition solution uses a serverless architecture to process documents efficiently and uses Claude in Amazon Bedrock to extract the relevant information. This approach minimizes infrastructure management and provides scalable, on-demand processing.

The solution architecture uses multiple AWS services to create a seamless pipeline. Here’s how the process works:

  1. A user uploads a document to Amazon S3 for processing.
  2. An S3 PUT event notification triggers an AWS Lambda function.
  3. Lambda processes the document and sends it to Amazon Bedrock.
  4. Amazon Bedrock invokes Anthropic’s Claude for entity extraction.
  5. Results are logged to Amazon CloudWatch for monitoring.

The following diagram shows how these services work together.

Figure: AWS architecture diagram showing a serverless driver's license information extraction system using Amazon S3, Lambda, Bedrock with Claude Sonnet 4.5, and CloudWatch Logs, along with Lambda configuration screens and sample input/output data.

Architecture components

  • Amazon S3: Stores the input documents
  • AWS Lambda: Runs when a file is uploaded, sends the prompt and data to Claude, and stores the results
  • Amazon Bedrock (Claude): Processes the input and extracts entities
  • Amazon CloudWatch: Monitors and logs workflow performance
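
The hand-off from Amazon S3 to Lambda can be sketched as follows. This is a minimal sketch, assuming the standard S3 event notification structure (a Records list with bucket and object entries). The function name iter_s3_objects and the sample key are hypothetical. Object keys arrive URL-encoded in these events, so the sketch decodes them before use.

```python
from urllib.parse import unquote_plus


def iter_s3_objects(event):
    """Yield (bucket, key) for each record in an S3 event notification."""
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Keys are URL-encoded in event notifications (spaces become "+").
        key = unquote_plus(s3_info["object"]["key"])
        yield bucket, key


# Hypothetical event with an encoded key.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "driver-license-input"},
                "object": {"key": "uploads/jane+doe%281%29.jpeg"},
            }
        }
    ]
}

for bucket, key in iter_s3_objects(sample_event):
    print(bucket, key)
```

Iterating over every record, instead of reading only the first one, also keeps the handler correct if a single invocation delivers more than one object.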

Prerequisites

Step-by-step implementation guide

This implementation guide explains how to build a serverless document processing solution using Amazon Bedrock and related AWS services. By following these steps, you can create a system that automatically extracts information from documents such as driver’s licenses, avoiding manual data entry and reducing processing time. Whether you’re processing a few documents or thousands, this solution automatically scales to meet your needs while maintaining consistent data extraction accuracy.

  1. Setting up the environment (10 minutes)
    1. Create a source S3 bucket for input (for example, driver-license-input).
    2. Configure IAM roles and permissions. Attach a policy such as the following to the Lambda execution role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "bedrock:InvokeModel",
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/*",
        "arn:aws:bedrock:*:111122223333:inference-profile/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/*"
    }
  ]
}
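
If you manage several buckets or accounts, it can help to generate the policy document instead of editing it by hand. The following is a minimal sketch that only builds and prints the JSON; it does not call AWS. The function name build_lambda_policy is hypothetical, and the bucket name and account ID are the placeholder values from the policy above.

```python
import json


def build_lambda_policy(bucket_name, account_id):
    """Return the IAM policy document for the Lambda role as a dict."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "bedrock:InvokeModel",
                # A list is required when a statement names several resources.
                "Resource": [
                    "arn:aws:bedrock:*::foundation-model/*",
                    f"arn:aws:bedrock:*:{account_id}:inference-profile/*",
                ],
            },
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


policy = build_lambda_policy("amzn-s3-demo-bucket", "111122223333")
print(json.dumps(policy, indent=2))
```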
  1. Create a Lambda function (30 minutes)

    This Lambda function is triggered automatically when a new image is uploaded to your S3 bucket. It reads the image, base64-encodes it, and sends it to Claude Sonnet 4.5 through Amazon Bedrock using the tool use API. For demonstration purposes, the function defines a single tool called extract_license_fields. However, you can define the tool name and schema based on your use case, such as extracting data from insurance cards, ID badges, or business forms. Claude dynamically decides whether to invoke the tool based on the relevance of the prompt and the structure of the input.

    We rely on "tool_choice": {"type": "auto"}, the default when tools are provided, to let Claude decide when to call the function. For production use cases, you can instead force a specific tool with "tool_choice": {"type": "tool", "name": "your_tool_name"} for deterministic behavior.
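
The two tool_choice settings can be sketched as follows. This is a minimal sketch, assuming the Messages API form in which tool_choice is an object with a type field. The helper name with_tool_choice is hypothetical.

```python
def with_tool_choice(payload, tool_name=None):
    """Return a copy of `payload` with tool_choice set.

    tool_name=None  -> {"type": "auto"}: Claude decides whether to call a tool.
    tool_name="..." -> {"type": "tool", "name": ...}: Claude must call that tool.
    """
    names = [tool["name"] for tool in payload.get("tools", [])]
    if tool_name is None:
        choice = {"type": "auto"}
    elif tool_name in names:
        choice = {"type": "tool", "name": tool_name}
    else:
        raise ValueError(f"unknown tool {tool_name!r}; defined tools: {names}")
    # Copy instead of mutating so the base payload can be reused.
    return {**payload, "tool_choice": choice}


base_payload = {
    "tools": [
        {"name": "extract_license_fields", "input_schema": {"type": "object"}}
    ]
}

print(with_tool_choice(base_payload)["tool_choice"])
print(with_tool_choice(base_payload, "extract_license_fields")["tool_choice"])
```

Forcing the tool is useful when every uploaded file is expected to be a license, because the response then always carries structured input instead of free text.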

    1. Go to the AWS Lambda console.
      • Choose Create function.
      • Select Author from scratch.
      • Set the runtime to Python 3.12.
      • Choose Create function.



    2. Configure the Lambda timeout.
      • In the Lambda function settings, open the Configuration tab.
      • Under General configuration, choose Edit.
      • For Timeout, increase the value from the default 3 seconds to at least 30 seconds. For large images, we recommend 1-2 minutes.
      • Choose Save.



        Note:
        This adjustment is important because Claude’s image processing can take longer than Lambda’s default timeout, especially for high-resolution images or extractions with many fields. Monitor function execution time in CloudWatch Logs to fine-tune this setting for your specific use case.

    3. Paste the following code into the lambda_function.py file:
      import base64
      import json
      from urllib.parse import unquote_plus
      
      import boto3
      
      def lambda_handler(event, context):
          bedrock = boto3.client("bedrock-runtime")
          s3 = boto3.client("s3")
          
          # Object keys in S3 event notifications are URL-encoded (spaces arrive as "+")
          bucket = event["Records"][0]["s3"]["bucket"]["name"]
          key = unquote_plus(event["Records"][0]["s3"]["object"]["key"])
          file = s3.get_object(Bucket=bucket, Key=key)
          
          # Convert image to base64
          image_data = file["Body"].read()
          base64_image = base64.b64encode(image_data).decode('utf-8')
          # Define tool schema
          tools = [{
              "name": "extract_license_fields",
              "description": "Extract structured fields from an image of a driver's license.",
              "input_schema": {
                  "type": "object",
                  "properties": {
                      "first_name": { "type": "string" },
                      "last_name": { "type": "string" },
                      "issue_date": { "type": "string" },
                      "license_number": { "type": "string" },
                      "address": {
                          "type": "object",
                          "properties": {
                              "street": { "type": "string" },
                              "city": { "type": "string" },
                              "state": { "type": "string" },
                              "zip": { "type": "string" }
                          }
                      }
                  },
                  "required": ["first_name", "last_name", "issue_date", "license_number", "address"]
              }
          }]
          
          payload = {
              "anthropic_version": "bedrock-2023-05-31",
              "max_tokens": 2048,
              "messages": [{
                  "role": "user",
                  "content": [
                      {
                          "type": "image",
                          "source": {
                              "type": "base64",
                              "media_type": "image/jpeg",
                              "data": base64_image
                          }
                      },
                      {
                          "type": "text",
                          "text": "Extract the driver's license fields from this image."
                      }
                  ]
              }],
              "tools": tools
          }
          
          try:
              response = bedrock.invoke_model(
                  modelId="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
                  body=json.dumps(payload)
              )
              
              result = json.loads(response["body"].read())
              
              # Print every step for debugging
              print("1. Raw Response:", json.dumps(result, indent=2))
              
              if "content" in result:
                  print("2. Content found in response")
                  for content in result["content"]:
                      print("3. Content item:", json.dumps(content, indent=2))
                      
                      if isinstance(content, dict):
                          print("4. Content type:", content.get("type"))
                          
                          if content.get("type") == "text":
                              print("5. Text content:", content.get("text"))
                          
                          if content.get("type") == "tool_use":
                              print("6. Tool use block found:", content.get("name"))
                              # Claude returns the tool arguments as an already-parsed object in "input"
                              extracted = content["input"]
                              print("7. Extracted data:", json.dumps(extracted, indent=2))
              
              return {
                  "statusCode": 200,
                  "body": json.dumps({
                      "message": "Process completed",
                      "raw_response": result
                  }, indent=2)
              }
              
          except Exception as e:
              print(f"Error occurred: {str(e)}")
              return {
                  "statusCode": 500,
                  "body": json.dumps({
                      "error": str(e),
                      "type": str(type(e))
                  })
              }

    4. Deploy the Lambda function: After pasting the code, choose the Deploy button on the left side of the code editor and wait until you see the deployment confirmation message.

      Important: Always remember to deploy your code after making changes. This ensures that your latest code is saved and runs when your Lambda function is triggered.
  2. Working with Claude tool use schemas
    1. Amazon Bedrock and Claude Sonnet 4.5 support function calling through tool use. Tool use defines callable tools with a clear JSON schema. A valid tool entry must include:
      • name: The tool identifier (for example, extract_license_fields)
      • input_schema: A JSON Schema that defines the required fields, types, and structure
    2. Example tool use definition:
      [{
        "name": "extract_license_fields",
        "description": "Extract structured fields from an image of a driver's license.",
        "input_schema": {
          "type": "object",
          "properties": {
            "first_name": { "type": "string" },
            "last_name": { "type": "string" },
            "issue_date": { "type": "string" },
            "license_number": { "type": "string" },
            "address": {
              "type": "object",
              "properties": {
                "street": { "type": "string" },
                "city": { "type": "string" },
                "state": { "type": "string" },
                "zip": { "type": "string" }
              }
            }
          },
          "required": ["first_name", "last_name", "issue_date", "license_number", "address"]
        }
      }]

    3. Multiple tools can be defined in the tools array. Claude chooses one (or none) depending on the tool_choice value and how closely the prompt matches a particular schema.
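
When several tools are defined, the Lambda function needs to route each result by tool name. The following is a minimal sketch of that routing. The handler functions, the extract_insurance_card_fields tool, its member_id field, and the sample response are all hypothetical; only the tool_use block shape (type, name, input) follows the sample output shown in this post.

```python
def handle_license(fields):
    """Hypothetical consumer of driver's license fields."""
    return f"license {fields['license_number']}"


def handle_insurance(fields):
    """Hypothetical consumer of insurance card fields."""
    return f"insurance member {fields['member_id']}"


# Map each tool name to the function that consumes its extracted input.
HANDLERS = {
    "extract_license_fields": handle_license,
    "extract_insurance_card_fields": handle_insurance,
}


def dispatch_tool_use(response):
    """Route every tool_use block in a Claude response to its handler."""
    results = []
    for block in response.get("content", []):
        if block.get("type") != "tool_use":
            continue  # skip text blocks
        handler = HANDLERS.get(block["name"])
        if handler is None:
            raise KeyError(f"no handler for tool {block['name']!r}")
        results.append(handler(block["input"]))
    return results


sample_response = {
    "content": [
        {"type": "text", "text": "I'll extract the fields."},
        {
            "type": "tool_use",
            "id": "toolu_example",
            "name": "extract_license_fields",
            "input": {"license_number": "111222333"},
        },
    ]
}

print(dispatch_tool_use(sample_response))
```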
  3. Configuring S3 event notifications (5 minutes)
    1. Open the Amazon S3 console.
      • Select your S3 bucket.
      • Choose the Properties tab.
      • Scroll down to Event notifications.
      • Choose Create event notification.
      • Enter a name for the notification (for example, “LambdaTrigger”).
      • Under Event types, select Put.
      • Under Destination, select Lambda function.
      • Select your Lambda function from the dropdown list.
      • Choose Save changes.
  4. Testing and validation (15 minutes)
    1. Supported formats: Claude Sonnet 4.5 supports image input in JPEG, PNG, WebP, and single-frame GIF formats. Note: This implementation currently supports only JPEG. You can extend support to other formats by setting the image media_type field in your Lambda function to match the MIME type of the uploaded file.
    2. Size and resolution limitations:
      • Maximum image size: 20 MB
      • Recommended resolution: 300 DPI or higher
      • Maximum dimensions: 4096 x 4096 pixels
      • Images larger than this may fail to process or produce inaccurate results.
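
The format and size checks above can be applied before calling the model. The following is a minimal sketch, assuming the file extension reliably reflects the image format. The names MEDIA_TYPES, media_type_for_key, and check_image_size are hypothetical, and the 20 MB limit is taken from the list above (interpreted here as 20 × 1024 × 1024 bytes, which is an assumption).

```python
import os

# Formats listed above, mapped to the media_type values used in the image block.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
    ".gif": "image/gif",
}

# Limit taken from the list above; the exact byte count is an assumption.
MAX_IMAGE_BYTES = 20 * 1024 * 1024


def media_type_for_key(key):
    """Return the media_type for an object key based on its extension."""
    ext = os.path.splitext(key)[1].lower()
    if ext not in MEDIA_TYPES:
        raise ValueError(f"unsupported image extension: {ext or '(none)'}")
    return MEDIA_TYPES[ext]


def check_image_size(image_bytes):
    """Raise if the image exceeds the size limit; otherwise return its size."""
    size = len(image_bytes)
    if size > MAX_IMAGE_BYTES:
        raise ValueError(f"image is {size} bytes; limit is {MAX_IMAGE_BYTES}")
    return size


print(media_type_for_key("uploads/License.PNG"))
print(check_image_size(b"\xff\xd8\xff"))
```

In the Lambda function, the value returned by media_type_for_key would replace the hard-coded "image/jpeg" in the image source block.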
  5. Preprocessing tips to increase accuracy:
    1. Tightly crop your images to remove noise and extraneous parts.
    2. Adjust contrast and brightness to make sure your text is clearly readable.
    3. Straighten the scan so that the text is horizontally aligned.
    4. Avoid low-resolution screenshots or images with large compression artifacts.
    5. For maximum OCR clarity, we recommend a white background and dark text.
  6. Upload a test image:

    1. Open your S3 bucket
    2. Upload an image of your driver’s license (supported formats: .jpeg, .jpg).
    3. Note: For best results, make sure the image is clear and easy to read.
  7. Monitor CloudWatch logs
    1. Go to the Amazon CloudWatch console.
    2. Choose Log groups in the left navigation pane.
    3. Find the log group for your Lambda function (for example, invoke_drivers_license).
    4. Choose the latest log stream (sorted by timestamp).
    5. Review the execution results. The output should resemble the following sample.
{
  "type": "tool_use",
  "id": "toolu_bdrk_01Ar6UG7BcARjqAKsiSPyNdf",
  "name": "extract_license_fields",
  "input": {
    "first_name": "JANE",
    "last_name": "DOE",
    "issue_date": "05/05/2025",
    "license_number": "111222333",
    "address": {
      "street": "123 ANYWHERE STREET",
      "city": "EXAMPLE CITY",
      "state": "VA",
      "zip": "00000"
    }
  }
}
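
To use this output in code, you can pull the input object out of the matching tool_use block. The following is a minimal sketch. The function name find_tool_input is hypothetical, and the sample response is a shortened copy of the output above wrapped in a content list.

```python
def find_tool_input(response, tool_name):
    """Return the input of the first tool_use block named `tool_name`, or None."""
    for block in response.get("content", []):
        if block.get("type") == "tool_use" and block.get("name") == tool_name:
            # "input" is already a parsed object, not a JSON string.
            return block["input"]
    return None


sample_response = {
    "content": [
        {
            "type": "tool_use",
            "id": "toolu_bdrk_01Ar6UG7BcARjqAKsiSPyNdf",
            "name": "extract_license_fields",
            "input": {
                "first_name": "JANE",
                "last_name": "DOE",
                "address": {"city": "EXAMPLE CITY", "state": "VA"},
            },
        }
    ]
}

fields = find_tool_input(sample_response, "extract_license_fields")
if fields is not None:
    print(fields["first_name"], fields["last_name"], fields["address"]["state"])
```

Returning None when no block matches lets the caller decide whether a missing extraction is an error or an expected outcome, for example when an uploaded image is not a license.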

Performance optimization

  • Configure Lambda memory and timeout settings
  • Implement batch processing of multiple documents
  • Use S3 event notifications for automated processing
  • Add CloudWatch metrics for monitoring
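
One way to add CloudWatch metrics from Lambda without extra API calls is the CloudWatch embedded metric format, where a structured JSON log line is turned into a metric. The following is a minimal sketch under that assumption. The function name emit_latency_metric, the DocumentExtraction namespace, and the ExtractionLatency metric name are hypothetical.

```python
import json
import time


def emit_latency_metric(function_name, latency_ms, namespace="DocumentExtraction"):
    """Print one embedded metric format record to stdout and return it."""
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    "Dimensions": [["FunctionName"]],
                    "Metrics": [
                        {"Name": "ExtractionLatency", "Unit": "Milliseconds"}
                    ],
                }
            ],
        },
        # Dimension and metric values live at the top level of the record.
        "FunctionName": function_name,
        "ExtractionLatency": latency_ms,
    }
    print(json.dumps(record))
    return record


start = time.perf_counter()
# ... the call to Amazon Bedrock would go here ...
elapsed_ms = (time.perf_counter() - start) * 1000
metric_record = emit_latency_metric("invoke_drivers_license", elapsed_ms)
```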

Security best practices

  • Implement encryption at rest for S3 buckets
  • Use AWS Key Management Service (KMS) keys for sensitive data
  • Apply least-privilege IAM policies
  • Use virtual private cloud (VPC) endpoints for private network access

Error handling and monitoring

  1. Claude’s output is structured as a list of content blocks and may include text responses, tool_use blocks, or other data types. To debug:
    1. Always log the raw response from Claude.
    2. Check whether a tool_use block is present in the response.
    3. Use try-except blocks around function calls to detect errors such as malformed payloads or model timeouts.
  2. A minimal error handling pattern:
try:
    result = json.loads(response["body"].read())
    # Collect the tool_use blocks; Claude may also return text blocks alongside them
    tool_blocks = [b for b in result.get("content", []) if b.get("type") == "tool_use"]
    if tool_blocks:
        # "input" is already a parsed object, so no second json.loads is needed
        print("Extracted Fields:", json.dumps(tool_blocks[0]["input"], indent=2))
    else:
        print("No tool_use block in response; stop_reason:", result.get("stop_reason"))
except Exception as e:
    print("Error occurred:", str(e))
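
Beyond catching exceptions, it can help to confirm that the extracted fields are complete before passing them downstream. The following is a minimal sketch that checks the required lists of a tool schema, including nested objects. The function name missing_required_fields is hypothetical, and the example schema is a shortened variant of the one in this post with a nested required list added for illustration.

```python
def missing_required_fields(data, schema, prefix=""):
    """Return dotted paths of required fields that are absent or empty."""
    missing = []
    for name in schema.get("required", []):
        if data.get(name) in (None, "", {}):
            missing.append(prefix + name)
    # Recurse into nested objects that are present in the data.
    for name, subschema in schema.get("properties", {}).items():
        value = data.get(name)
        if subschema.get("type") == "object" and isinstance(value, dict):
            missing += missing_required_fields(value, subschema, prefix + name + ".")
    return missing


example_schema = {
    "type": "object",
    "properties": {
        "first_name": {"type": "string"},
        "license_number": {"type": "string"},
        "address": {
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "zip": {"type": "string"},
            },
            "required": ["street", "zip"],
        },
    },
    "required": ["first_name", "license_number", "address"],
}

extracted = {"first_name": "JANE", "address": {"street": "123 ANYWHERE STREET"}}
print(missing_required_fields(extracted, example_schema))
```

An empty list means every required field was returned. A non-empty list can be logged or used to flag the document for manual review.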

Clean up

  1. Delete the S3 bucket and its contents.
  2. Delete the Lambda function.
  3. Delete the IAM role and policy.
  4. Disable Bedrock access if you no longer need it.

Conclusion

Claude tool use in Amazon Bedrock provides a powerful solution for custom entity extraction and minimizes the need for complex machine learning (ML) models. This serverless architecture enables scalable and cost-effective document processing with minimal setup and maintenance. By using large language models through Amazon Bedrock, organizations can achieve new levels of efficiency, insight, and innovation in processing unstructured data.

Next steps

We encourage you to explore this solution further by implementing the sample code in your environment and customizing it for your specific use case. Join the discussion about entity extraction solutions in the AWS re:Post community to share your experiences and learn from other developers.

For deeper technical insights, see our comprehensive documentation on Amazon Bedrock, AWS Lambda, and Amazon S3. Consider enhancing your implementation by integrating with Amazon Textract for additional document processing features and Amazon Comprehend for advanced text analysis. To stay up to date on similar solutions, subscribe to the AWS Machine Learning blog and explore other examples in the AWS Samples GitHub repository. If you’re new to AWS machine learning services, check out AWS Machine Learning University or explore the AWS Solutions Library. For enterprise solutions and support, please contact us through your AWS account team.


About the authors

Kimo El Mehri

Kimo is an AWS Solutions Architect with expertise across infrastructure, storage, security, GenAI, data analytics, and more. He is passionate about working with customers across industries to help them leverage AWS services to drive digital transformation and meet their business needs.

Johana Herrera

Johana Herrera is a Solutions Architect at AWS, helping businesses modernize and grow in the cloud. She specializes in generative AI and analytics and is passionate about helping customers design solutions with security and resiliency in mind. In her free time, she enjoys spending time with her two dogs and watching sports.


