Ben Kunkle of Zed Industries detailed the process of building Zeta2, an AI model that predicts a user's next edit as they type. In his presentation, Kunkle walked through the training pipeline and the data considerations behind such models, and described the challenges encountered in production along with the solutions the team adopted.
Ben Kunkle talks about building Zed’s Zeta2 predictive model — From an AI engineer
Visual TL;DR
Predict code edits: AI models predict the next code edit as you type.
Zeta2 model: Specialized small AI model for fast keystroke prediction.
Ultra-low latency: Must operate in less than 300 milliseconds per keystroke for real-time use.
Training pipeline: Ingests production and synthetic data for model training.
Data considerations: Focus on “settled data” vs. production data and synthetic sources.
Teacher Frontier model: Generates training data for the Zeta2 predictive model.
Offline evaluation: Evaluates model performance before production deployment.
Production monitoring: Continuously tracks model performance in a live environment.
Faster coding: Enables users to write code faster and more efficiently.
Understanding edit prediction
Kunkle began by defining edit prediction: the model is given the context around the user’s cursor, the user’s recent edits, and supporting information such as type and variable definitions, diagnostics, and errors, and it predicts the edit the user will make next. This has to be very fast, with a latency budget of less than 300ms for every keystroke, which in turn calls for a small, specialized model.
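To make the inputs and the latency budget concrete, here is a minimal sketch in Python. The class name, field names, and helper functions are hypothetical and chosen only for illustration; the talk summary does not describe Zed's actual request format. The only value taken from the talk is the 300ms budget.

```python
import time
from dataclasses import dataclass, field

LATENCY_BUDGET_MS = 300.0  # per-keystroke budget quoted in the talk


@dataclass
class EditPredictionRequest:
    """Hypothetical container for the inputs listed in the talk."""
    cursor_excerpt: str  # code surrounding the cursor
    cursor_offset: int   # cursor position inside the excerpt
    recent_edits: list[str] = field(default_factory=list)  # e.g. small diffs
    definitions: list[str] = field(default_factory=list)   # type/variable definitions
    diagnostics: list[str] = field(default_factory=list)   # errors and warnings


def within_budget(elapsed_ms: float, budget_ms: float = LATENCY_BUDGET_MS) -> bool:
    """True when a prediction finished strictly under the budget."""
    return elapsed_ms < budget_ms


def timed_predict(predict, request: EditPredictionRequest):
    """Call any predict(request) function and report how long it took."""
    start = time.perf_counter()
    result = predict(request)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms, within_budget(elapsed_ms)
```

Any callable can stand in for the model here, which makes it possible to measure a candidate model against the budget before wiring it into an editor.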
Training pipeline
At the core of the training process is a pipeline that ingests both “production data” (snapshots of user activity) and “synthetic data” (git commits). This data is fed to the “Teacher Frontier” model, which generates predictions. The predictions are then evaluated, and those that fail are sent to a “repair” stage, where the teacher model attempts to correct them. The resulting data is fed into the distillation process that trains the student model. Kunkle emphasized that every stage of the pipeline follows the same contract: it reads JSONL, enriches each record into a “sample,” and writes JSONL. That uniformity matters for managing large datasets efficiently across experiments.
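The JSONL-in, JSONL-out contract can be sketched as follows. This is an illustrative outline, not Zed's implementation: the stage functions, the field names (`input`, `expected`, `prediction`, `passed`, `repaired`), and the placeholder logic inside each stage are all assumptions. In particular, the teacher and repair stages would call a model in practice; here they are trivial stand-ins so the example runs on its own.

```python
import io
import json
from typing import Callable, Iterable, Iterator, TextIO

Sample = dict  # one enriched record per JSONL line


def read_jsonl(stream: TextIO) -> Iterator[Sample]:
    """Yield one dict per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def write_jsonl(samples: Iterable[Sample], stream: TextIO) -> None:
    for sample in samples:
        stream.write(json.dumps(sample) + "\n")


def run_stage(enrich: Callable[[Sample], Sample], src: TextIO, dst: TextIO) -> None:
    """Every stage has the same shape: read JSONL, enrich, write JSONL."""
    write_jsonl((enrich(s) for s in read_jsonl(src)), dst)


def teacher_predict(sample: Sample) -> Sample:
    # Stand-in for a teacher model call (assumption: upper-casing the input).
    return {**sample, "prediction": sample["input"].upper()}


def evaluate(sample: Sample) -> Sample:
    return {**sample, "passed": sample["prediction"] == sample["expected"]}


def repair(sample: Sample) -> Sample:
    # Stand-in for the repair stage: a real one would re-prompt the teacher.
    if sample["passed"]:
        return sample
    return {**sample, "prediction": sample["expected"], "repaired": True, "passed": True}


def run_pipeline(raw_jsonl: str) -> list[Sample]:
    """Chain the stages, passing JSONL text from one to the next."""
    stream: TextIO = io.StringIO(raw_jsonl)
    for stage in (teacher_predict, evaluate, repair):
        out = io.StringIO()
        run_stage(stage, stream, out)
        out.seek(0)
        stream = out
    return list(read_jsonl(stream))
```

Because each stage only adds fields and keeps the same file format, the output of any stage can be saved, inspected, or reused as the input of a different experiment.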
Data considerations and “settled data”
A major challenge in training edit prediction models is the inherent noise in the data. Kunkle explained that the team addresses this with a concept called “settled data”: they wait for the region where a prediction was made to stabilize and then treat the final state of the code as the “answer.” Comparing the model’s predictions against this settled state makes it possible to filter out noisy examples and identify high-quality training data. The model can then be trained on examples where the match between the prediction and the final code is clear and unambiguous.
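One way to picture the settled-data filter is the sketch below. The `Snapshot` type, the `quiet_seconds` threshold, and the whitespace-insensitive exact match are assumptions made for illustration; the talk summary does not say how long Zed waits or how it compares a prediction with the final code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    timestamp: float  # seconds since the prediction was shown
    region_text: str  # contents of the prediction region at that time


def settled_text(snapshots: list[Snapshot], now: float,
                 quiet_seconds: float = 10.0) -> Optional[str]:
    """Return the final region text if it has been unchanged for
    at least quiet_seconds, otherwise None (still in flux)."""
    if not snapshots:
        return None
    ordered = sorted(snapshots, key=lambda s: s.timestamp)
    final = ordered[-1].region_text
    # Walk backwards to find when the region first took its final value.
    settled_since = ordered[-1].timestamp
    for snap in reversed(ordered):
        if snap.region_text != final:
            break
        settled_since = snap.timestamp
    return final if now - settled_since >= quiet_seconds else None


def is_clean_example(prediction: str, snapshots: list[Snapshot], now: float,
                     quiet_seconds: float = 10.0) -> bool:
    """Keep an example only when the region has settled and the
    prediction matches the settled text (ignoring outer whitespace)."""
    target = settled_text(snapshots, now, quiet_seconds)
    return target is not None and prediction.strip() == target.strip()
```

Examples for which `settled_text` returns `None` are not necessarily bad predictions; they are simply too ambiguous to label, so a filter like this would set them aside.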
Offline evaluation and production monitoring
For offline evaluation, Kunkle mentioned metrics such as “deltaChrF” (based on a character-level F-score), exact line match, reversal rate, and keep rate. These metrics are used to evaluate the model’s performance on a held-out test set. He also stressed the importance of tracking model performance in production after deployment. This includes structured logging of latency, retention, and token counts, as well as dashboards that track acceptance rates and A/B test results across model versions. The goal is to continuously monitor and improve the model’s effectiveness in real-world use.
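As a rough illustration of these measurements, the sketch below computes an exact line match, a simplified character n-gram F-score, and an acceptance rate per model version. Several things here are assumptions: the summary does not define “deltaChrF,” so the code shows only a plain, simplified chrF-style score rather than Zed's metric; exact line match is read as "all predicted lines equal the reference lines"; and the log field names `model_version` and `accepted` are hypothetical.

```python
from collections import Counter, defaultdict


def exact_line_match(prediction: str, reference: str) -> bool:
    """Assumed reading: every line of the prediction equals the reference."""
    return prediction.splitlines() == reference.splitlines()


def char_ngrams(text: str, n: int) -> Counter:
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified character n-gram F-score (not a reference implementation)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is too short to have n-grams of this size
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)


def acceptance_rate_by_version(events: list[dict]) -> dict[str, float]:
    """Aggregate structured log events into an acceptance rate per version."""
    shown: dict[str, int] = defaultdict(int)
    accepted: dict[str, int] = defaultdict(int)
    for event in events:
        shown[event["model_version"]] += 1
        accepted[event["model_version"]] += int(event["accepted"])
    return {version: accepted[version] / shown[version] for version in shown}
```

Keeping the log events as flat dictionaries means the same records can feed both a dashboard and an A/B comparison between model versions.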