Before the deep dive, we looked at the latest developments in the fast-moving world of AI agents.
- Gemini 2.5 Computer Use: Google’s new model acts as a virtual user, able to interact with a computer screen by clicking buttons, filling out forms, and scrolling. This marks a shift from agents that merely know things to agents that can perform tasks directly in the browser.
- Vibe coding in AI Studio: a new approach to building apps where you describe the “feel” of your application and let the AI handle the boilerplate. It includes an annotation mode for adjusting specific UI elements with simple instructions such as “change this to green.”
- DeepSeek-OCR and context compression: DeepSeek introduced a way to treat documents as images in order to understand their layout, compressing 10-20 text tokens into a single visual token. This significantly improves speed and reduces the cost of long-context tasks (see the back-of-envelope sketch after this list).
- Google Veo 3.1 and Flow: new updates to the AI video model add rich audio generation and powerful editing features. Creators can now use Insert to add characters and Delete to erase objects in existing footage, giving them iterative control.
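To put the compression claim in concrete terms, here is a back-of-envelope sketch in Python. The 10-20x ratio comes from the DeepSeek-OCR announcement above; the document size is a made-up illustration.

```python
# Back-of-envelope math for visual context compression.
# The 10-20x ratio is from the DeepSeek-OCR discussion above;
# the 100k-token document is a hypothetical illustration.

def vision_token_budget(text_tokens: int, compression: float) -> int:
    """Tokens needed if the text is rendered and encoded visually."""
    return int(text_tokens / compression)

doc_tokens = 100_000  # a long report, measured in plain text tokens

for ratio in (10, 20):
    visual = vision_token_budget(doc_tokens, ratio)
    print(f"{ratio}x: {doc_tokens:,} text tokens -> {visual:,} vision tokens")
# 10x: 100,000 text tokens -> 10,000 vision tokens
# 20x: 100,000 text tokens -> 5,000 vision tokens
```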
Ravin Kumar talks about building open models
We spoke with Ravin to walk through the end-to-end process of creating an open model with agent capabilities. While the process mirrors a traditional ML lifecycle, each component turns out to be significantly more complex.
Defining agent data
Timestamp: 14:55
Ravin explained that agent training data is very different from a standard text dataset. It starts with identifying what your users actually need. The data itself is a collection of trajectories: multi-step records of how a model makes decisions and uses tools. Ravin said they use a combination of human-curated data and synthetic data generated by their own internal “teacher” models and APIs to create a playground for open models to learn in (a hypothetical record is sketched below).
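As a concrete illustration, here is a minimal sketch of what a single trajectory record might look like. The schema (field names, step structure) is hypothetical and will vary between pipelines; the point is that each example captures decisions and tool calls, not just an input/output pair.

```python
# Hypothetical trajectory schema — invented for illustration,
# not a real training pipeline's format.
from dataclasses import dataclass, field

@dataclass
class Step:
    thought: str              # the model's reasoning at this step
    tool: str | None = None   # tool invoked, if any (e.g. "search")
    tool_input: str = ""      # arguments passed to the tool
    observation: str = ""     # what came back from the tool

@dataclass
class Trajectory:
    task: str                             # the user's original request
    steps: list[Step] = field(default_factory=list)
    final_answer: str = ""
    success: bool = False                 # label used for filtering or RL

example = Trajectory(
    task="Find the release date of Gemma 2.",
    steps=[
        Step(
            thought="I should search for the announcement.",
            tool="search",
            tool_input="Gemma 2 release date",
            observation="Gemma 2 was released in June 2024.",
        ),
    ],
    final_answer="Gemma 2 was released in June 2024.",
    success=True,
)
```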
Training methods: SFT and reinforcement learning
Timestamp: 17:14
Once the data is ready, training follows a two-stage approach. First comes supervised fine-tuning (SFT), which updates the model’s weights to steer it toward the new behavior shown in the examples. Reinforcement learning (RL) then handles generalization: new situations that are not in the original training data. Ravin highlighted the difficulty of designing rewards in RL, warning that models are prone to “reward hacking,” where an agent collects intermediate rewards without ever completing the final task; the toy example below shows this failure mode.
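To illustrate, here is a toy Python example of the pitfall. The reward functions are invented for this sketch and do not reflect any real training setup; they just show how naive reward shaping can pay an agent for stalling.

```python
# Toy illustration of reward hacking — all numbers are invented.

def shaped_reward(steps_taken: int, task_completed: bool) -> float:
    # Naive shaping: +0.1 per step "toward" the goal, +1.0 on completion.
    # An agent can farm the per-step bonus forever and never finish.
    return 0.1 * steps_taken + (1.0 if task_completed else 0.0)

def outcome_reward(steps_taken: int, task_completed: bool) -> float:
    # Outcome-only reward with a small step penalty: finishing is the
    # only way to score, and dawdling is discouraged.
    return (1.0 if task_completed else 0.0) - 0.01 * steps_taken

# A "hacking" policy that loops for 50 steps without finishing
# beats an honest 5-step solution under the shaped reward:
print(shaped_reward(50, False))   # 5.0   (hacked)
print(shaped_reward(5, True))     # 1.5   (honest)
print(outcome_reward(50, False))  # -0.5  (hacked)
print(outcome_reward(5, True))    # 0.95  (honest)
```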
Betting on evaluation
Timestamp: 20:10
Ravin emphasized that evaluation is the most important and highest-stakes part of the process. You can’t just trust the training metrics; a rigorous final exam is required. Combine broad public benchmarks, which measure general capability, with specific custom evaluations that confirm your model is safe and effective for your intended users’ use cases (a minimal harness is sketched below).
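Here is a minimal sketch of what such a final-exam harness could look like, assuming you have some agent callable to test. Both `run_agent` and the string-match grader are hypothetical stand-ins.

```python
# Minimal "final exam" harness — run_agent and the grading rule
# are placeholders; real evals often use rubrics or model judges.

def run_agent(task: str) -> str:
    # Placeholder: call your model or agent here.
    return "stub answer"

def grade(answer: str, expected: str) -> bool:
    # Simplest possible grader: substring match.
    return expected.lower() in answer.lower()

exam = [
    {"task": "What year was Python 3 released?", "expected": "2008"},
    # ... grow this to ~50 examples that mirror real user requests
]

passed = sum(grade(run_agent(ex["task"]), ex["expected"]) for ex in exam)
print(f"pass rate: {passed}/{len(exam)} = {passed / len(exam):.0%}")
```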
Conclusion
This conversation with Ravin Kumar made it clear that building an open agent model is a highly structured and rigorous process. It requires high-quality trajectory data, a careful combination of supervised fine-tuning and reinforcement learning, and, crucially, intensive evaluation.
It’s your turn to build
As Ravin advised, the best place to start is at the end. Before you write a single line of training code, define what success looks like by creating a small final exam of 50 examples for your agent. If you can’t measure it, you can’t improve it. It’s also worth combining different approaches: for example, use a powerful API model like Gemini as a router alongside an open model specialized for a specific task, as in the sketch below.
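Here is a rough sketch of that router pattern. `call_gemini` and `call_open_model` are hypothetical placeholders for whatever API client and local inference stack you actually use.

```python
# Router pattern sketch: a strong general model classifies the request,
# and a specialized open model handles its niche. Both callables below
# are hypothetical stubs — swap in your real clients.

def call_gemini(prompt: str) -> str:
    # Placeholder: replace with a real Gemini API call.
    return "general"

def call_open_model(prompt: str) -> str:
    # Placeholder: replace with your fine-tuned open model, served locally.
    return "specialist answer"

def route(user_request: str) -> str:
    # Ask the general model to classify the request first.
    label = call_gemini(
        "Answer with exactly 'code' or 'general'.\n"
        f"Is this a coding request? {user_request}"
    )
    if "code" in label.strip().lower():
        return call_open_model(user_request)  # the specialist's niche
    return call_gemini(user_request)          # everything else
```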
Check out the full episode for more details and tune in next time.
