Contextual retrieval with Anthropic's Claude using Amazon Bedrock Knowledge Bases

Machine Learning


For AI models to be effective in specialized domains, they need access to relevant background knowledge. A customer support chat assistant, for example, needs detailed information about the business it serves, and a legal analysis tool needs access to a comprehensive database of past cases.

To equip large language models (LLMs) with this knowledge, developers often use Retrieval Augmented Generation (RAG). This technique retrieves relevant information from a knowledge base and incorporates it into the user's prompt, significantly improving the model's responses. However, a key limitation of traditional RAG systems is that they often lose contextual nuance when encoding data, leading to irrelevant or incomplete retrievals from the knowledge base.

Challenges with traditional RAG

In traditional RAG, documents are typically split into small chunks to optimize retrieval efficiency. This works well in many cases, but it can introduce problems when an individual chunk lacks the context it depends on. For example, if a policy states that remote work requires "6 months of tenure" (chunk 1) and mentions an "HR approval exception" (chunk 3), but the middle chunk that links the exception to manager approval is not retrieved, the response misses that connection. This happens because isolated chunks fail to preserve dependencies between clauses, highlighting a key limitation of basic chunking strategies in RAG systems.

Contextual retrieval enhances traditional RAG by adding chunk-specific explanatory context to each chunk before generating its embedding. This enriches the vector representation with relevant contextual information, enabling more accurate retrieval of semantically related content when responding to user queries. For example, when asked about remote work eligibility, the system can retrieve both the tenure requirement and the HR exception clause and provide an accurate response such as "Generally no, but HR may approve an exception." By intelligently stitching together fragmented information, contextual retrieval mitigates the pitfalls of rigid chunking and delivers more reliable and nuanced answers.
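To make this concrete, the context-generation step can be driven by a prompt in the spirit of Anthropic's published contextual retrieval technique. The following template is an illustrative sketch only; the exact prompt used by the Lambda function in this solution is defined in the GitHub repository.

# Illustrative prompt template for generating chunk-specific context,
# modeled on Anthropic's published contextual retrieval approach.
# The wording is an example; the prompt used in this solution lives in the repository.
CONTEXT_PROMPT_TEMPLATE = """
<document>
{full_document}
</document>
Here is the chunk we want to situate within the whole document:
<chunk>
{chunk_text}
</chunk>
Please give a short, succinct context to situate this chunk within the overall
document for the purposes of improving search retrieval of the chunk.
Answer only with the succinct context and nothing else.
"""

def build_context_prompt(full_document: str, chunk_text: str) -> str:
    """Fill the template for a single chunk before sending it to Claude."""
    return CONTEXT_PROMPT_TEMPLATE.format(
        full_document=full_document,
        chunk_text=chunk_text,
    )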

This post shows how to implement Anthropic's contextual retrieval using Amazon Bedrock Knowledge Bases.

Solution overview

This solution uses Amazon Bedrock Knowledge Bases and incorporates a custom Lambda function to transform data during the knowledge base ingestion process. The Lambda function processes documents from Amazon Simple Storage Service (Amazon S3), chunks them into smaller pieces, enriches each chunk with contextual information using Anthropic's Claude on Amazon Bedrock, and writes the results back to an intermediate S3 bucket. Step by step, the function does the following (a simplified sketch of the handler follows the list):

  1. Reads the input file from the S3 bucket specified in the event.
  2. Splits the input data into smaller chunks.
  3. Generates contextual information for each chunk using Anthropic's Claude 3 Haiku.
  4. Writes the processed chunks, along with their metadata, back to the intermediate S3 bucket.
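The following is a minimal sketch of what such a handler could look like. The event fields (sourceBucket, sourceKey, outputBucket), the simple character-based chunking, and the output format are simplifying assumptions for illustration; the production function and the exact event contract that Amazon Bedrock uses for custom transformations are in the GitHub repository.

import json
import boto3

s3 = boto3.client("s3")
bedrock_runtime = boto3.client("bedrock-runtime")

HAIKU_MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"
CHUNK_SIZE = 2000  # characters; illustrative only

def generate_context(document: str, chunk: str) -> str:
    """Ask Claude 3 Haiku for a short context that situates the chunk in the document."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 200,
        "messages": [{
            "role": "user",
            "content": f"<document>{document}</document>\n"
                       f"<chunk>{chunk}</chunk>\n"
                       "Give a short context situating this chunk within the document.",
        }],
    }
    response = bedrock_runtime.invoke_model(modelId=HAIKU_MODEL_ID, body=json.dumps(body))
    return json.loads(response["body"].read())["content"][0]["text"]

def lambda_handler(event, context):
    # Assumed event shape for illustration: source and output locations provided by the caller
    source_bucket = event["sourceBucket"]
    source_key = event["sourceKey"]
    output_bucket = event["outputBucket"]

    # 1. Read the input file from the source S3 bucket
    document = s3.get_object(Bucket=source_bucket, Key=source_key)["Body"].read().decode("utf-8")

    # 2. Split the document into fixed-size chunks (simplified strategy)
    chunks = [document[i:i + CHUNK_SIZE] for i in range(0, len(document), CHUNK_SIZE)]

    # 3. Enrich each chunk with generated context and keep basic metadata
    enriched = [
        {"chunk_id": i, "source": source_key, "content": f"{generate_context(document, c)}\n\n{c}"}
        for i, c in enumerate(chunks)
    ]

    # 4. Write the processed chunks to the intermediate bucket for ingestion
    output_key = f"processed/{source_key}.json"
    s3.put_object(Bucket=output_bucket, Key=output_key, Body=json.dumps(enriched))
    return {"statusCode": 200, "outputKey": output_key}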

The following diagram shows the solution architecture.

Prerequisites

Before you begin, download the required files and follow the instructions in the corresponding GitHub repository to deploy this solution. The architecture implements contextual retrieval with Amazon Bedrock Knowledge Bases using the proposed custom chunking solution.

Implement contextual retrieval on Amazon Bedrock

This section shows how to implement contextual retrieval with Amazon Bedrock Knowledge Bases using the proposed custom chunking solution. With custom chunking strategies in Amazon Bedrock, developers can optimize how large documents or datasets are split into smaller, more manageable pieces for processing by foundation models (FMs). This allows more efficient and effective handling of long-form content and improves the quality of responses. By tailoring the chunking method to the specific characteristics of the data and the requirements of the task at hand, developers can improve the performance of natural language processing applications built on Amazon Bedrock. Custom chunking can include techniques such as semantic segmentation, sliding windows with overlap, and splitting along logical divisions derived from the document structure.
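As a simple illustration of one of these techniques, the following sketch shows a word-based sliding window with overlap. It approximates tokens with words and mirrors the 300-token, 20 percent overlap configuration used later in this post; it is an example, not the chunker shipped with the solution.

def sliding_window_chunks(text: str, chunk_size: int = 300, overlap_ratio: float = 0.2):
    """Split text into overlapping chunks of roughly chunk_size words.

    Words approximate tokens here; the defaults mirror the 300-token,
    20% overlap configuration referenced later in this post.
    """
    words = text.split()
    step = max(1, int(chunk_size * (1 - overlap_ratio)))
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        if not window:
            break
        chunks.append(" ".join(window))
        if start + chunk_size >= len(words):
            break
    return chunks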

To implement contextual retrieval in Amazon Bedrock, complete the following steps, which are also available in the notebook in the GitHub repository.

To set up your environment, follow these steps:

  1. Install the required dependencies.
    %pip install --upgrade pip --quiet
    %pip install -r requirements.txt --no-deps

  2. Import the required libraries and set up the AWS client.
    import os
    import sys
    import time
    import boto3
    import logging
    import pprint
    import json
    from pathlib import Path
    
    # AWS Clients Setup
    s3_client = boto3.client('s3')
    sts_client = boto3.client('sts')
    session = boto3.session.Session()
    region = session.region_name
    account_id = sts_client.get_caller_identity()["Account"]
    bedrock_agent_client = boto3.client('bedrock-agent')
    bedrock_agent_runtime_client = boto3.client('bedrock-agent-runtime')
    bedrock_client = boto3.client('bedrock-runtime')  # used by the evaluation step
    lambda_client = boto3.client('lambda')            # used to create the custom chunking function
    
    # Configure logging
    logging.basicConfig(
        format="[%(asctime)s] p%(process)s {%(filename)s:%(lineno)d} %(levelname)s - %(message)s",
        level=logging.INFO
    )
    logger = logging.getLogger(__name__)

  3. Define the parameters for the knowledge base.
    # Generate unique suffix for resource names
    timestamp_str = time.strftime("%Y%m%d%H%M%S", time.localtime(time.time()))[-7:]
    suffix = f"{timestamp_str}"
    
    # Resource names
    knowledge_base_name_standard = 'standard-kb'
    knowledge_base_name_custom = 'custom-chunking-kb'
    knowledge_base_description = "Knowledge Base containing complex PDF."
    bucket_name = f'{knowledge_base_name_standard}-{suffix}'
    intermediate_bucket_name = f'{knowledge_base_name_standard}-intermediate-{suffix}'
    lambda_function_name = f'{knowledge_base_name_custom}-lambda-{suffix}'
    foundation_model = "anthropic.claude-3-sonnet-20240229-v1:0"
    
    # Define data sources
    data_source=[{"type": "S3", "bucket_name": bucket_name}]

Create knowledge bases with different chunking strategies

To create knowledge bases with different chunking strategies, use the following code:

  1. Standard fixed chunking:
    # Create knowledge base with fixed chunking
    knowledge_base_standard = BedrockKnowledgeBase(
        kb_name=f'{knowledge_base_name_standard}-{suffix}',
        kb_description=knowledge_base_description,
        data_sources=data_source,
        chunking_strategy="FIXED_SIZE",
        suffix=f'{suffix}-f'
    )
    
    # Upload data to S3
    def upload_directory(path, bucket_name):
        for root, dirs, files in os.walk(path):
            for file in files:
                file_to_upload = os.path.join(root, file)
                if file not in ["LICENSE", "NOTICE", "README.md"]:
                    print(f"uploading file {file_to_upload} to {bucket_name}")
                    s3_client.upload_file(file_to_upload, bucket_name, file)
                else:
                    print(f"Skipping file {file_to_upload}")
    
    upload_directory("../synthetic_dataset", bucket_name)
    
    # Start ingestion job
    time.sleep(30)  # ensure KB is available
    knowledge_base_standard.start_ingestion_job()
    kb_id_standard = knowledge_base_standard.get_knowledge_base_id()

  2. Custom chunking using a Lambda function:
    # Create Lambda function for custom chunking
    import io
    import zipfile

    def create_lambda_function():
        # Package the handler source into an in-memory zip archive,
        # because create_function expects a zip deployment package
        zip_buffer = io.BytesIO()
        with zipfile.ZipFile(zip_buffer, 'w', zipfile.ZIP_DEFLATED) as zf:
            zf.write('lambda_function.py')
        zip_buffer.seek(0)

        response = lambda_client.create_function(
            FunctionName=lambda_function_name,
            Runtime="python3.9",
            Role=lambda_role_arn,  # IAM role ARN created for the function
            Handler="lambda_function.lambda_handler",
            Code={'ZipFile': zip_buffer.read()},
            Timeout=900,
            MemorySize=256
        )
        return response['FunctionArn']
    
    # Create knowledge base with custom chunking
    knowledge_base_custom = BedrockKnowledgeBase(
        kb_name=f'{knowledge_base_name_custom}-{suffix}',
        kb_description=knowledge_base_description,
        data_sources=data_source,
        lambda_function_name=lambda_function_name,
        intermediate_bucket_name=intermediate_bucket_name,
        chunking_strategy="CUSTOM",
        suffix=f'{suffix}-c'
    )
    
    # Start ingestion job
    time.sleep(30)
    knowledge_base_custom.start_ingestion_job()
    kb_id_custom = knowledge_base_custom.get_knowledge_base_id()

Evaluate performance using the Ragas framework

To evaluate performance using the Ragas framework, follow these steps:

  1. Set up Ragas for the evaluation:
    from ragas import SingleTurnSample, EvaluationDataset
    from ragas import evaluate
    from ragas.metrics import (
        context_recall,
        context_precision,
        answer_correctness
    )
    from langchain_aws import ChatBedrock, BedrockEmbeddings

    # Initialize Bedrock models for evaluation
    TEXT_GENERATION_MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"
    EVALUATION_MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"

    llm_for_evaluation = ChatBedrock(model_id=EVALUATION_MODEL_ID, client=bedrock_client)
    bedrock_embeddings = BedrockEmbeddings(
        model_id="amazon.titan-embed-text-v2:0",
        client=bedrock_client
    )

  2. Prepare the evaluation dataset:
    # Define test questions and ground truths
    questions = [
        "What was the primary reason for the increase in net cash provided by operating activities for Octank Financial in 2021?",
        "In which year did Octank Financial have the highest net cash used in investing activities, and what was the primary reason for this?",
        # Add more questions...
    ]

    ground_truths = [
        "The increase in net cash provided by operating activities was primarily due to an increase in net income and favorable changes in operating assets and liabilities.",
        "Octank Financial had the highest net cash used in investing activities in 2021, at $360 million...",
        # Add corresponding ground truths...
    ]

    def prepare_eval_dataset(kb_id, questions, ground_truths):
        samples = []
        for question, ground_truth in zip(questions, ground_truths):
            # Get response and context
            # retrieve_and_generate is a helper defined in the notebook
            # (a sketch is shown after this list)
            response = retrieve_and_generate(question, kb_id)
            answer = response["output"]["text"]

            # Process contexts
            contexts = []
            for citation in response["citations"]:
                context_texts = [
                    ref["content"]["text"]
                    for ref in citation["retrievedReferences"]
                    if "content" in ref and "text" in ref["content"]
                ]
                contexts.extend(context_texts)

            # Create sample
            sample = SingleTurnSample(
                user_input=question,
                retrieved_contexts=contexts,
                response=answer,
                reference=ground_truth
            )
            samples.append(sample)

        return EvaluationDataset(samples=samples)

  3. Run the evaluation and compare the results:
    import pandas as pd

    # Evaluate both approaches
    contextual_chunking_dataset = prepare_eval_dataset(kb_id_custom, questions, ground_truths)
    default_chunking_dataset = prepare_eval_dataset(kb_id_standard, questions, ground_truths)

    # Define metrics
    metrics = [context_recall, context_precision, answer_correctness]

    # Run evaluation
    contextual_chunking_result = evaluate(
        dataset=contextual_chunking_dataset,
        metrics=metrics,
        llm=llm_for_evaluation,
        embeddings=bedrock_embeddings,
    )

    default_chunking_result = evaluate(
        dataset=default_chunking_dataset,
        metrics=metrics,
        llm=llm_for_evaluation,
        embeddings=bedrock_embeddings,
    )

    # Compare results (average each numeric metric column)
    comparison_df = pd.DataFrame({
        'Default Chunking': default_chunking_result.to_pandas().mean(numeric_only=True),
        'Contextual Chunking': contextual_chunking_result.to_pandas().mean(numeric_only=True)
    })

    # Visualize results
    def highlight_max(s):
        is_max = s == s.max()
        return ['background-color: #90EE90' if v else '' for v in is_max]

    comparison_df.style.apply(
        highlight_max,
        axis=1,
        subset=['Default Chunking', 'Contextual Chunking']
    )
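The prepare_eval_dataset function above relies on a retrieve_and_generate helper that is defined in the notebook. A minimal sketch of such a helper, using the Amazon Bedrock Agent Runtime RetrieveAndGenerate API, could look like the following; the choice of generation model and the number of retrieved results are illustrative assumptions.

def retrieve_and_generate(query: str, kb_id: str, num_results: int = 5):
    """Query a knowledge base and generate an answer with citations."""
    return bedrock_agent_runtime_client.retrieve_and_generate(
        input={"text": query},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                # foundation_model and region are defined in the setup step
                "modelArn": f"arn:aws:bedrock:{region}::foundation-model/{foundation_model}",
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {"numberOfResults": num_results}
                },
            },
        },
    )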

Performance benchmark

We assessed the performance of the proposed contextual retrieval approach using the AWS decision guide Choosing a generative AI service as the RAG test document. We set up two Amazon Bedrock knowledge bases for the evaluation:

  • One knowledge base using the default chunking strategy, which uses 300 tokens per chunk with 20% overlap
  • Another knowledge base using the custom contextual retrieval chunking approach, which applies a custom contextual retrieval Lambda transformer on top of a fixed chunking strategy of 300 tokens per chunk with 20% overlap

With the Ragas framework, we evaluated the performance of these two approaches on a small dataset. Specifically, we looked at the following metrics:

  • context_recall – Measures how many of the relevant documents (or pieces of information) were successfully retrieved
  • context_precision – Measures the proportion of the retrieved chunks (retrieved_contexts) that are relevant to the question
  • answer_correctness – Measures the accuracy of the generated response when compared to the ground truth

The following code defines the metrics, test questions, and ground truths used for the benchmark:
from ragas import SingleTurnSample, EvaluationDataset
from ragas import evaluate
from ragas.metrics import (
    context_recall,
    context_precision,
    answer_correctness
)

#specify the metrics here
metrics = [
    context_recall,
    context_precision,
    answer_correctness
]

questions = [
    "What are the main AWS generative AI services covered in this guide?",
    "How does Amazon Bedrock differ from the other generative AI services?",
    "What are some key factors to consider when choosing a foundation model for your use case?",
    "What infrastructure services does AWS offer to support training and inference of large AI models?",
    "Where can I find more resources and information related to the AWS generative AI services?"
]
ground_truths = [
    "The main AWS generative AI services covered in this guide are Amazon Q Business, Amazon Q Developer, Amazon Bedrock, and Amazon SageMaker AI.",
    "Amazon Bedrock is a fully managed service that allows you to build custom generative AI applications with a choice of foundation models, including the ability to fine-tune and customize the models with your own data.",
    "Key factors to consider when choosing a foundation model include the modality (text, image, etc.), model size, inference latency, context window, pricing, fine-tuning capabilities, data quality and quantity, and overall quality of responses.",
    "AWS offers specialized hardware like AWS Trainium and AWS Inferentia to maximize the performance and cost-efficiency of training and inference for large AI models.",
    "You can find more resources like architecture diagrams, whitepapers, and solution guides on the AWS website. The document also provides links to relevant blog posts and documentation for the various AWS generative AI services."
]

The results obtained using the default chunking strategy are shown in the following table.

The results obtained using the contextual retrieval chunking strategy are shown in the following table. It shows improved performance across the key metrics evaluated: context recall, context precision, and answer correctness.

Aggregating the results, we can observe that the contextual chunking approach outperformed the default chunking strategy on the context_recall, context_precision, and answer_correctness metrics. This illustrates the advantages of the more sophisticated contextual retrieval techniques implemented.

Implementation considerations

When implementing contextual retrieval with Amazon Bedrock, several factors require careful consideration. First, custom chunking strategies need to be optimized for both performance and accuracy, which requires thorough testing across a variety of document types and sizes. The memory allocation and timeout settings for the Lambda function should be tuned to the expected document complexity and processing requirements; the initial recommendation of 1,024 MB of memory and a 900-second timeout serves as a baseline configuration. Organizations must also configure IAM roles following the principle of least privilege, while granting the Lambda function sufficient permissions to interact with Amazon S3 and Amazon Bedrock. Furthermore, the vectorization process and knowledge base structure must be fine-tuned to balance retrieval accuracy and computational efficiency, especially when scaling to larger datasets.
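For example, if the Lambda settings created earlier prove insufficient, the configuration can be raised to this baseline with a call such as the following (a sketch; tune the values to your own workload):

# Adjust the custom chunking Lambda to the baseline configuration discussed above
lambda_client.update_function_configuration(
    FunctionName=lambda_function_name,
    MemorySize=1024,  # MB
    Timeout=900       # seconds (the Lambda maximum)
)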

Infrastructure scalability and monitoring considerations are equally important for a successful implementation. Organizations need robust error handling within the Lambda function to gracefully manage varied document formats and potential processing failures. Monitoring should also be established to track key metrics such as chunking performance, retrieval accuracy, and system latency, enabling proactive optimization and maintenance.
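As one possible starting point, the chunking Lambda function could publish custom Amazon CloudWatch metrics for chunk counts and processing latency. The namespace and metric names below are illustrative assumptions, not part of the solution as shipped.

import time
import boto3

cloudwatch = boto3.client("cloudwatch")

def publish_chunking_metrics(document_key: str, num_chunks: int, started_at: float):
    """Publish illustrative custom metrics for one processed document."""
    cloudwatch.put_metric_data(
        Namespace="ContextualRetrieval/Ingestion",  # illustrative namespace
        MetricData=[
            {
                "MetricName": "ChunksPerDocument",
                "Dimensions": [{"Name": "DocumentKey", "Value": document_key}],
                "Value": num_chunks,
                "Unit": "Count",
            },
            {
                "MetricName": "ChunkingLatency",
                "Dimensions": [{"Name": "DocumentKey", "Value": document_key}],
                "Value": time.time() - started_at,
                "Unit": "Seconds",
            },
        ],
    )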

Using Langfuse with Amazon Bedrock is a good option for adding observability to this solution. The S3 bucket structures for both source and intermediate storage should be designed with clear lifecycle policies and access controls, taking regional availability and data residency requirements into account. Finally, a phased deployment approach, starting with a subset of the data before scaling to the full production workload, can help identify and address potential bottlenecks or optimization opportunities early in the implementation.

Clean up

Once your solution experiments are complete, clean up any resources you have created to avoid future charges.
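A minimal cleanup sketch is shown below. It assumes the resource names and helper objects defined earlier in the notebook; the delete_kb method is an assumption about the repository's BedrockKnowledgeBase utility and may differ from the actual implementation.

import boto3

# Illustrative cleanup; assumes the names defined earlier in the notebook
lambda_client = boto3.client("lambda")
s3_resource = boto3.resource("s3")

# Delete the custom chunking Lambda function
lambda_client.delete_function(FunctionName=lambda_function_name)

# Empty and delete the source and intermediate buckets
for name in [bucket_name, intermediate_bucket_name]:
    bucket = s3_resource.Bucket(name)
    bucket.objects.all().delete()
    bucket.delete()

# Delete the knowledge bases (hypothetical helper method on BedrockKnowledgeBase)
knowledge_base_standard.delete_kb()
knowledge_base_custom.delete_kb()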

Conclusion

By combining Anthropic's sophisticated language models with Amazon Bedrock's robust infrastructure, organizations can now implement intelligent retrieval systems that deliver deeply contextualized and nuanced responses. The implementation steps outlined in this post provide a clear path for organizations to adopt contextual retrieval through Amazon Bedrock. From configuring IAM permissions to deploying custom chunking strategies, developers and organizations can unlock the full potential of context-aware AI systems.

By leveraging Anthropic's language models, organizations can stay at the forefront of AI innovation while delivering more accurate and meaningful results to their users. You can get started with contextual retrieval using Anthropic's language models through Amazon Bedrock today, beginning with a small proof of concept to transform how your AI applications use your existing data. For personalized guidance on implementation, contact your AWS account team.


About the authors

Suheel Farooq is a lead engineer in AWS Support Engineering, specializing in generative AI, artificial intelligence, and machine learning. As a subject matter expert for Amazon Bedrock and SageMaker, he helps customers design, build, modernize, and scale their AI/ML and generative AI workloads on AWS. In his free time, Suheel enjoys working out and hiking.

Qingwei Li is a Machine Learning Specialist at Amazon Web Services. He received his Ph.D. in Operations Research after he broke his advisor's research grant account and failed to deliver the Nobel Prize he promised. Currently, he helps customers in the financial services and insurance industry build machine learning solutions on AWS. In his spare time, he likes reading and teaching.

Vinita is a Senior Serverless Specialist Solutions Architect at AWS. She combines AWS expertise with strong business acumen to build innovative solutions that deliver quantifiable value for customers and help them navigate complex challenges. Vinita's technical depth in application modernization, generative AI, and cloud computing, together with her focus on measurable business impact, makes her a strong partner on customers' journeys with AWS.

Sharon Lee is an AI/ML Specialist Solutions Architect at Amazon Web Services (AWS) based in Boston, Massachusetts. Passionate about leveraging cutting-edge technology, Sharon is at the forefront of developing and deploying innovative generative AI solutions on the AWS Cloud.

Venkata Moparti is a Senior Solutions Architect specializing in cloud migrations, generative AI, and secure architecture for financial services and other industries. He combines technical expertise with customer-focused strategies to accelerate digital transformation and drive business outcomes through optimized cloud solutions.


