How Baz uses Amazon Bedrock AgentCore to improve AI agent code review accuracy

Code reviews have always been manual and inefficient due to the inherent disconnect between code and product. Developers can review whether the code compiles and works, but they cannot verify whether it meets all functional and design requirements. Previously, QA teams would spend hours manually clicking through preview environments to ensure features worked as expected, and even more time aligning the implementation with design intent. This manual validation slowed down delivery, introduced inconsistencies, and increased the likelihood of regression. As the development team became faster, Baz wanted to automate this missing layer of validation and bring intent, behavior, and implementation together into a single review workflow.

In this post, Baz describes how he built a Spec Review agent using Amazon Bedrock and Amazon Bedrock AgentCore. We discuss architectural decisions, implementation details, and business outcomes achieved by leveraging these AWS services to automate the code review process.

The main problem Buzz is trying to solve

Baz is built to go beyond traditional delta-only reviews to validate whether features meet intended product requirements. Buzz realized early on that his team struggled with reviews that focused on syntax rather than behavior, forcing them to manually answer key questions like “does it work,” “does it match spec,” and “does it work as intended” later in the process? The gap between code and product intent slowed down the team, led to design inconsistencies, and required heavy reliance on undocumented internal QA knowledge. Baz set out to bridge this gap by building an agent that can evaluate the actual experience being delivered, not just the code.

Solution overview

The Baz Spec Review agent orchestrates a sophisticated multi-step validation pipeline. Triggers (webhooks or manual calls) simultaneously query Figma via MCP and Jira via REST API to aggregate comprehensive requirements artifacts across technical, product, and design specifications. The system then spawns isolated subagent workers (one per requirement) tasked with validating the requirements. This subagent combines code checking through a source code repository with dynamic runtime validation using the Amazon Bedrock AgentCore browser tool. The subagent interacts with the ephemeral environment and performs DOM inspection, event simulation, and visual testing to ensure that the deployed implementation matches both Figma’s design specifications and operational requirements, and provides end-to-end validation throughout the specification-to-implementation lifecycle through AWS native orchestration.

The following diagram shows the Spec Reviewer architecture, a joint Baz and AWS solution that enables automated design and product validation within code review workflows. The entire agent flow is powered by large-scale language models provided through Amazon Bedrock, providing scalable and secure AI inference throughout the pipeline. This flow begins when a GitHub webhook is triggered on a new pull request and routes traffic to your Amazon EKS cluster through your Application Load Balancer (ALB) and Network Load Balancer (NLB). The Baz platform acts as a central orchestration layer, coordinating the review process across multiple agents.

Within an Amazon EKS cluster, Baz’s Spec Review Agent divides the validation workflow into specialized subagents. The specification subagent, powered by Amazon Bedrock, takes both visual specifications from Figma and functional specifications from Jira and decomposes them into separate requirements: visual requirements (spacing, colors, component hierarchy, etc.) and functional requirements (such as acceptance criteria and user story intent).

Implementation subagents are the core of this architecture. These Amazon Bedrock-powered agents perform deep code analysis on extracted specifications, but what sets them apart is their integration with the Amazon Bedrock AgentCore Browser Use feature. Rather than relying solely on static code analysis, the implementation subagent can render the actual implementation in a live preview environment and visually verify that the UI matches the intended Figma design and that the functionality behaves as specified in Jira. This combination of code understanding and browser-based validation allows Baz to uncover inconsistencies that would be completely missed by traditional code review tools.

The report generator consolidates findings from all subagents into a consistent review summary. Once the review is complete, the findings will be distributed to the appropriate channels. Comments are posted directly to GitHub PR, notifications are sent to Slack for team visibility, and identified issues are automatically linked to Jira for tracking and resolution.

How Baz implemented Amazon Bedrock AgentCore to address these challenges

Amazon Bedrock AgentCore became the foundation for building an AI code reviewer that can verify real-world product behavior. A secure, isolated, serverless browser session allows the Spec Reviewer agent to open a preview environment, navigate between features, and inspect UI behavior just as a user would. By running an MCP server that integrates with your ticketing system using the Amazon Bedrock AgentCore runtime, and combining Amazon Bedrock AgentCore browser tools with lightweight automation and context modules, Baz Reviewer can compare live behavior and code to tickets and design specifications without the need for browser infrastructure or custom orchestration. Amazon Bedrock AgentCore’s isolation, sandboxing, and observability help Baz scale across multiple MCP servers and enable agents to safely and reliably perform full-stack verification at scale.

Enabling intelligent code reviews with Amazon Bedrock

Amazon Bedrock powers the reasoning and decision-making layer behind the Spec Reviewer agent, allowing it to interpret requirements, understand design intent, and evaluate the relevance of behavior observed in the browser. Using the Amazon Bedrock management foundation model, agents can synthesize specification context, analyze UI state, and draw accurate, actionable conclusions about whether functionality meets expectations. Amazon Bedrock provides the reliability, security, and scale needed for production-grade agent workflows, allowing Baz to isolate browser execution within AgentCore while offloading complex interpretation and validation logic to high-performance LLM. This combination allows reviewers to bridge the gap between what was intended and what was actually built.

conclusion

The Baz Spec Review Agent demonstrates how organizations can use Amazon Bedrock and Amazon Bedrock AgentCore to automate product validation workflows that previously required extensive manual effort. By leveraging the Amazon Bedrock foundation model for requirements interpretation and decision-making, combined with AgentCore’s secure browser automation capabilities, Baz created a solution that validates implementations against specifications throughout the development lifecycle, reducing reported bugs by up to 50% and reducing time to merge by 30-70%.

Customers adopting Spec Reviewer now move functional verification earlier in the development cycle and automatically occur during pull requests, significantly reducing manual product verification efforts. Teams report faster reviews, fewer regressions, and greater confidence that changes meet requirements before merging.