For AI models to be effective in specialized domains, they need access to relevant background knowledge. For example, a customer support chat assistant needs detailed information about the business it serves, and a legal analysis tool needs access to a comprehensive database of past cases.
To equip large language models (LLMs) with this knowledge, developers often use Retrieval Augmented Generation (RAG). This technique retrieves relevant information from a knowledge base and incorporates it into the user's prompt, significantly improving the model's responses. However, a key limitation of traditional RAG systems is that they often lose contextual nuance when encoding data, leading to irrelevant or incomplete retrievals from the knowledge base.
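The following minimal sketch illustrates the basic RAG pattern described above: retrieve relevant passages, then prepend them to the user's prompt before invoking the model. The retrieve and generate callables are hypothetical placeholders, not part of the solution code.

```python
# A minimal sketch of the RAG pattern: retrieve relevant passages, then
# prepend them to the user prompt before calling the model.
# retrieve() and generate() are hypothetical placeholders for a vector-store
# lookup and an LLM invocation, respectively.

def answer_with_rag(question: str, retrieve, generate, top_k: int = 3) -> str:
    passages = retrieve(question, top_k=top_k)   # e.g., vector similarity search
    context = "\n\n".join(passages)              # stitch retrieved chunks together
    prompt = (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return generate(prompt)                      # call the LLM with the augmented prompt
```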
Challenges with traditional RAG
In traditional RAG, documents are often split into small chunks to optimize retrieval efficiency. This method works well in many cases, but it can introduce challenges when individual chunks lack the necessary context. For example, if a policy states that remote work requires a "6-month tenure" (chunk 1) and an "HR approval exception" (chunk 3), the middle chunk linking the exception to manager approval is omitted. Retrieving these isolated chunks fails to preserve the dependencies between clauses, highlighting a key limitation of basic chunking strategies in RAG systems.
Contextual retrieval enhances traditional RAG by adding chunk-specific explanatory context to each chunk before generating embeddings. This approach enriches the vector representation with relevant contextual information, enabling more accurate retrieval of semantically related content when responding to user queries. For example, when asked about eligibility for remote work, the system retrieves both the tenure requirement and the HR exception clause and provides an accurate response such as "Generally no, but HR may approve an exception." By intelligently stitching together fragmented information, contextual retrieval mitigates the pitfalls of rigid chunking and delivers more reliable and nuanced answers.
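As a rough illustration of this idea, the following sketch prefixes each chunk with a short, document-aware description before embedding it. The summarize_chunk_in_context and embed callables are hypothetical placeholders for an LLM call and an embedding call.

```python
# A minimal sketch of contextual retrieval: before embedding, each chunk is
# prefixed with a short, document-aware description produced by an LLM.
# summarize_chunk_in_context() and embed() are hypothetical placeholders.

def contextualize_and_embed(document: str, chunks: list[str],
                            summarize_chunk_in_context, embed) -> list:
    vectors = []
    for chunk in chunks:
        # Ask the model to describe how this chunk fits into the whole document,
        # e.g. "This clause describes the HR exception to the 6-month tenure rule."
        chunk_context = summarize_chunk_in_context(document, chunk)
        enriched = f"{chunk_context}\n\n{chunk}"   # contextual prefix + original chunk
        vectors.append(embed(enriched))            # embed the enriched text, not the bare chunk
    return vectors
```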
This post demonstrates how to implement contextual retrieval using Anthropic's Claude models and Amazon Bedrock Knowledge Bases.
Solution overview
This solution uses Amazon Bedrock Knowledge Bases with a custom Lambda function that transforms data during the knowledge base ingestion process. The Lambda function processes documents from Amazon Simple Storage Service (Amazon S3), chunks them into smaller pieces, enriches each chunk with contextual information using Anthropic's Claude on Amazon Bedrock, and writes the results back to an intermediate S3 bucket. The function performs the following steps (a simplified sketch of the handler follows the list):
- Read the input file from the S3 bucket specified in the event.
- Split the input data into smaller chunks.
- Generate contextual information for each chunk using Anthropic's Claude 3 Haiku.
- Write the processed chunks, along with their metadata, back to the intermediate S3 bucket.
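The following is a simplified sketch of such a handler. The event fields, prompt, and output format shown here are assumptions for illustration; the Lambda function in the GitHub repository is the authoritative implementation.

```python
import json
import boto3

s3 = boto3.client("s3")
bedrock_runtime = boto3.client("bedrock-runtime")

MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"  # Anthropic's Claude 3 Haiku on Amazon Bedrock


def lambda_handler(event, context):
    # Assumed event fields; the repository code defines the actual event contract.
    source_bucket = event["bucket"]
    source_key = event["key"]
    intermediate_bucket = event["intermediateBucket"]

    # 1. Read the input file from the S3 bucket specified in the event.
    body = s3.get_object(Bucket=source_bucket, Key=source_key)["Body"].read()
    document = body.decode("utf-8")

    # 2. Split the document into smaller chunks (simple fixed-size split for illustration).
    chunk_size = 1200
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]

    # 3. Generate contextual information for each chunk with Claude 3 Haiku.
    enriched_chunks = []
    for chunk in chunks:
        prompt = (
            "Here is a document:\n" + document[:8000] +
            "\n\nBriefly describe how the following chunk fits into the overall document:\n" + chunk
        )
        response = bedrock_runtime.invoke_model(
            modelId=MODEL_ID,
            body=json.dumps({
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 200,
                "messages": [{"role": "user", "content": [{"type": "text", "text": prompt}]}],
            }),
        )
        chunk_context = json.loads(response["body"].read())["content"][0]["text"]
        enriched_chunks.append({"contextual_information": chunk_context, "content": chunk})

    # 4. Write the processed chunks back to the intermediate S3 bucket.
    output_key = f"processed/{source_key}.json"
    s3.put_object(Bucket=intermediate_bucket, Key=output_key, Body=json.dumps(enriched_chunks))
    return {"statusCode": 200, "outputKey": output_key}
```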
The following diagram shows the solution architecture.
Prerequisites
Before you begin, download the required files and follow the instructions in the accompanying GitHub repository to deploy this solution. The architecture centers on implementing contextual retrieval with Amazon Bedrock Knowledge Bases using the proposed custom chunking solution.
Implement contextual retrieval on Amazon Bedrock
This section shows how to implement contextual retrieval with Amazon Bedrock Knowledge Bases using the proposed custom chunking solution. Developers can use custom chunking strategies in Amazon Bedrock to optimize how large documents or datasets are split into smaller, more manageable pieces for processing by foundation models (FMs). This approach enables more efficient and effective handling of long-form content, improving response quality. By tailoring the chunking method to the specific characteristics of the data and the requirements of the task at hand, developers can improve the performance of natural language processing applications built on Amazon Bedrock. Custom chunking can include techniques such as semantic segmentation, sliding windows with overlap, and using document structure to create logical divisions in the text.
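As a simple illustration of one of these techniques, the following sketch implements a sliding window with overlap, approximating tokens with whitespace-separated words; it is not the chunking code used in the repository.

```python
# A minimal sketch of a sliding-window chunker with overlap. Token counts are
# approximated by whitespace-separated words purely for illustration.

def sliding_window_chunks(text: str, window: int = 300, overlap: int = 60) -> list[str]:
    words = text.split()
    step = window - overlap                      # advance by window minus overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk_words = words[start:start + window]
        if chunk_words:
            chunks.append(" ".join(chunk_words))
        if start + window >= len(words):         # last window reached the end of the text
            break
    return chunks
```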
To implement contextual retrieval in Amazon Bedrock, complete the following steps, which can also be found in the notebook in the GitHub repository.
To set up your environment, follow these steps (a setup sketch follows the list):
- Install the required dependencies.
- Import the required libraries and set up the AWS client.
- Define the parameters for the knowledge base.
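The following sketch illustrates what this setup might look like; the Region, bucket names, and model ARN are placeholder values, not the ones used in the repository notebook.

```python
# A sketch of the environment setup; package versions, the AWS Region, and all
# parameter values below are illustrative placeholders.
# pip install boto3 ragas datasets

import boto3

region = "us-east-1"                                   # assumed Region
bedrock_agent = boto3.client("bedrock-agent", region_name=region)
bedrock_runtime = boto3.client("bedrock-runtime", region_name=region)
s3 = boto3.client("s3", region_name=region)

# Knowledge base parameters (illustrative values)
knowledge_base_name = "contextual-retrieval-kb"
data_bucket = "my-source-documents-bucket"
intermediate_bucket = "my-intermediate-chunks-bucket"
embedding_model_arn = (
    "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
)
```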
Create knowledge bases with different chunking strategies
To create knowledge bases with different chunking strategies, use the following code (a sketch of the custom transformation configuration follows the list):
- Standard fixed-size chunking:
- Custom chunking using a Lambda function:
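The following sketch shows how the custom transformation Lambda can be attached when creating the data source with the Amazon Bedrock CreateDataSource API; the knowledge base ID, bucket names, and Lambda ARN are placeholders, and the repository notebook remains the authoritative reference.

```python
# A sketch of creating a data source that combines fixed-size chunking with a
# custom transformation Lambda. All identifiers below are placeholders.
import boto3

bedrock_agent = boto3.client("bedrock-agent")

response = bedrock_agent.create_data_source(
    knowledgeBaseId="KB_ID_PLACEHOLDER",
    name="contextual-retrieval-data-source",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::my-source-documents-bucket"},
    },
    vectorIngestionConfiguration={
        # Fixed-size chunking: 300 tokens per chunk with 20% overlap
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {"maxTokens": 300, "overlapPercentage": 20},
        },
        # Custom transformation: the Lambda enriches each chunk after chunking
        "customTransformationConfiguration": {
            "intermediateStorage": {
                "s3Location": {"uri": "s3://my-intermediate-chunks-bucket/"}
            },
            "transformations": [
                {
                    "stepToApply": "POST_CHUNKING",
                    "transformationFunction": {
                        "transformationLambdaConfiguration": {
                            "lambdaArn": "arn:aws:lambda:us-east-1:111122223333:function:contextual-chunking-transform"
                        }
                    },
                }
            ],
        },
    },
)
print(response["dataSource"]["dataSourceId"])
```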
Evaluate performance using the Ragas framework
To evaluate performance using the Ragas framework, follow these steps (an evaluation sketch follows the list):
- Set up the Ragas evaluator:
- Prepare the evaluation dataset:
- Run the evaluation and compare the results.
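As a rough illustration of these steps, the following sketch builds a small evaluation dataset and scores it with Ragas; retrieve_contexts and generate_answer are hypothetical helpers, and dataset column names may vary slightly across Ragas versions.

```python
# A sketch of the Ragas evaluation loop. retrieve_contexts() and
# generate_answer() are hypothetical callables that query a knowledge base and
# the LLM; column names follow common Ragas conventions.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_correctness, context_precision, context_recall


def run_ragas_eval(questions, ground_truths, retrieve_contexts, generate_answer):
    rows = []
    for question, ground_truth in zip(questions, ground_truths):
        contexts = retrieve_contexts(question)         # chunks returned by the knowledge base
        answer = generate_answer(question, contexts)   # model answer grounded in those chunks
        rows.append(
            {
                "question": question,
                "contexts": contexts,
                "answer": answer,
                "ground_truth": ground_truth,
            }
        )
    dataset = Dataset.from_list(rows)
    # Score the dataset on retrieval quality and answer quality
    return evaluate(dataset, metrics=[context_recall, context_precision, answer_correctness])
```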
Performance Benchmark
To assess the performance of the proposed contextual retrieval approach, we used the AWS decision guide Choosing a generative AI service as the test document for RAG. We set up two Amazon Bedrock knowledge bases for the evaluation:
- One knowledge base with the default chunking strategy, which uses 300 tokens per chunk with a 20% overlap
- Another knowledge base with the custom contextual retrieval chunking approach, which adds a custom contextual retrieval Lambda transformation on top of a fixed chunking strategy of 300 tokens per chunk with a 20% overlap
Using the Ragas framework, we evaluated the performance of these two approaches on a small dataset. Specifically, we examined the following metrics:
- context_recall – Measures how many of the relevant documents (or pieces of information) were successfully retrieved
- context_precision – Measures the proportion of relevant chunks among the retrieved_contexts
- answer_correctness – Measures the accuracy of the generated answer compared to the ground truth
The results obtained using the default chunking strategy are shown in the following table.
The results obtained using the contextual retrieval chunking strategy are shown in the following table. They show improved performance across the key metrics evaluated: context recall, context precision, and answer correctness.
Aggregating the results, we can observe that the contextual chunking approach outperformed the default chunking strategy on the context_recall, context_precision, and answer_correctness metrics. This illustrates the advantages of the more sophisticated contextual retrieval techniques implemented.
Implementation Considerations
When implementing contextual retrieval using Amazon Bedrock, several factors require careful consideration. First, custom chunking strategies need to be optimized for both performance and accuracy, which requires thorough testing across a variety of document types and sizes. The Lambda function's memory allocation and timeout settings should be tuned based on expected document complexity and processing requirements, with the initial recommendation of 1024 MB of memory and a 900-second timeout serving as a baseline configuration. Organizations should also configure IAM roles following the principle of least privilege while still granting Lambda sufficient permissions to interact with Amazon S3 and Amazon Bedrock. Furthermore, the vectorization process and knowledge base structure must be fine-tuned to balance retrieval accuracy and computational efficiency, especially when scaling to larger datasets.
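As a minimal illustration, the baseline Lambda settings mentioned above could be applied with a call like the following; the function name is a placeholder.

```python
# A sketch of applying the baseline Lambda settings (1024 MB memory,
# 900-second timeout); the function name is a hypothetical placeholder.
import boto3

lambda_client = boto3.client("lambda")
lambda_client.update_function_configuration(
    FunctionName="contextual-chunking-transform",  # hypothetical function name
    MemorySize=1024,  # MB; increase for very large or complex documents
    Timeout=900,      # seconds (the Lambda maximum)
)
```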
Infrastructure scalability and monitoring considerations are equally important for a successful implementation. Organizations need to implement robust error handling within the Lambda function to gracefully manage varied document formats and potential processing failures. Monitoring should be established to track key metrics such as chunking performance, retrieval accuracy, and system latency, enabling proactive optimization and maintenance.
Using Langfuse with Amazon Bedrock is a good option for adding observability to this solution. The S3 bucket structures for both source and intermediate storage should be designed with clear lifecycle policies and access controls, taking Regional availability and data residency requirements into account. Furthermore, a phased deployment approach, starting with a subset of data before scaling to the full production workload, can help identify and address potential bottlenecks or optimization opportunities early in the implementation process.
Clean up
Once your solution experiments are complete, clean up any resources you have created to avoid future charges.
Conclusion
By combining Anthropic's sophisticated language models with Amazon Bedrock's robust infrastructure, organizations can now implement intelligent information retrieval systems that deliver deeply contextualized and nuanced responses. The implementation steps outlined in this post provide a clear pathway for organizations to use contextual retrieval through Amazon Bedrock. From configuring IAM permissions to deploying custom chunking strategies, developers and organizations can unlock the full potential of context-aware AI systems.
By using Anthropic's language models, organizations can stay at the forefront of AI innovation while delivering more accurate and meaningful results to their users. You can get started today with contextual retrieval using Anthropic's language models through Amazon Bedrock, beginning with a small proof of concept to transform how your AI applications process and use your existing data. For personalized guidance on implementation, reach out to your AWS account team.
About the authors
Suheel Farooq is a Principal Engineer in AWS Support Engineering, specializing in generative AI, artificial intelligence, and machine learning. As a subject matter expert in Amazon Bedrock and SageMaker, he helps customers design, build, modernize, and scale their AI/ML and generative AI workloads on AWS. In his free time, Suheel enjoys working out and hiking.
Qingwei Li is a Machine Learning Specialist at Amazon Web Services. He received his PhD in Operations Research after he broke his advisor's research grant account and failed to deliver the Nobel Prize he promised. Currently, he helps customers in the financial services and insurance industry build machine learning solutions on AWS. In his spare time, he likes reading and teaching.
Vinita is a Senior Serverless Specialist Solutions Architect at AWS. She combines AWS knowledge with strong business acumen to architect innovative solutions that drive quantifiable value for customers and help them navigate complex challenges. Vinita's technical expertise in application modernization, generative AI, and cloud computing, along with her ability to drive measurable business impact, make her a great asset to customers on their journey with AWS.
Sharon Lee is an AI/ML Specialist Solutions Architect at Amazon Web Services (AWS) based in Boston, Massachusetts. With a passion for leveraging cutting-edge technology, Sharon is at the forefront of developing and deploying innovative generative AI solutions on the AWS Cloud platform.
Venkata Moparti is a Senior Solutions Architect specializing in cloud migrations, generative AI, and secure architectures for financial services and other industries. He combines technical expertise with customer-centric strategies to accelerate digital transformation and drive business outcomes through optimized cloud solutions.