HubSpot engineers introduced Sidekick, an internal AI-powered code review agent designed to analyze pull request changes and provide automated feedback to developers. This system uses an extensive language model to review your code and post comments directly to your repository on GitHub. According to the engineering team, this tool has reduced the time to first feedback on pull requests by approximately 90%, allowing developers to identify issues earlier in the review process.
Code reviews are essential in software development, but they can be delayed if reviewers are not available. At HubSpot, engineers realized that while AI coding assistants were speeding up code creation, manual reviews were slowing down. Sidekick provides instant feedback on pull requests, freeing up human reviewers to focus on architecture and higher-level design, increasing efficiency and reducing review bottlenecks.
As Emily Adams explained in the company’s blog post,
You might be surprised by what we discovered. Our AI code reviewers see real problems, understand HubSpot’s unique context, maintain a high signal-to-noise ratio, and often leave no comments at all.
The first version of the system ran on an internal platform called Crucible. The large-scale language model agent ran in a Kubernetes environment and interacted with GitHub repositories via the command line. The agent captured the changes in the pull request and generated review comments with prompts to identify potential issues and improvements. Although this approach demonstrated that LLM could provide useful feedback, it added operational complexity. Each review required a separate containerized workload, increasing latency and infrastructure overhead, and limiting control over agent interactions with developer tools and internal services.
To address these limitations, the engineering team migrated the system to a Java-based agent framework called Aviator. It’s integrated with HubSpot’s development platform, allowing you to run your review agent within your existing service instead of in an isolated workload. Aviator supports multiple model providers, including Anthropic, OpenAI, and Google, and enables experimentation and fallback options. Through RPC-based tool abstractions, agents capture repository context such as configuration settings and coding conventions, improving the relevance and accuracy of automated review comments.
The main challenge identified during implementation was the quality of feedback. Early versions generated redundant or overly positive comments that were considered noise. To address this, the team introduced a “judge agent” that evaluates comments before posting them to pull request discussions. According to HubSpot engineers, this rating pattern reduces low-value comments and improves signal-to-noise ratio. Developers can also react to automated comments to provide feedback that helps with quick adjustments and model selection. The system consistently receives a high rating of 80% from engineers, demonstrating strong adoption and trust.

See the agent evaluation loop from agent to judge (Source: HubSpot blog post)
Brian L., VP of Engineering at HubSpot, said on LinkedIn:
The most impactful change is adding a second agent to rate reviews before posting. The result is fewer, better, and more actionable comments. We knew we were right when our engineers started asking us to review feedback on Sidekick even before opening a PR.
HubSpot engineers say future work will include improving understanding of related code changes, such as adding persistent memory for review agents and expanding context retrieval across repositories.
