Existing long-CoT reasoning models have achieved state-of-the-art performance on mathematical reasoning by generating reasoning trajectories with iterative self-verification and refinement. However, open-source long-CoT models rely solely on natural language reasoning traces, which makes them computationally expensive and error-prone in the absence of a verification mechanism. Tool-assisted reasoning improves efficiency and reliability for large-scale numerical computation through frameworks such as OpenHands, which integrate code interpreters, but these agentic approaches struggle with abstract or conceptually complex reasoning problems.
DualDistill Framework and Agentic-R1 Model
Researchers from Carnegie Mellon University propose DualDistill, a distillation framework that combines trajectories from two complementary teachers to create a unified student model. The framework uses one reasoning-oriented teacher and one tool-augmented teacher to develop Agentic-R1, a model that learns to dynamically select the most appropriate strategy for each problem type. Agentic-R1 executes code for arithmetic and algorithmic tasks and employs natural language reasoning for abstract problems. DualDistill uses trajectory composition to distill knowledge from both complementary teachers, followed by self-distillation. The researchers used OpenHands as the agentic reasoning teacher and DeepSeek-R1 as the text-based reasoning teacher.
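To make the idea of combining two teachers' trajectories more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `Trajectory` type, the `compose` function, the `[switching strategy]` marker, and the selection rule (prefer the shorter trace when both teachers succeed, otherwise show the failed attempt followed by the successful one) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trajectory:
    teacher: str   # "agentic" or "reasoning" (illustrative labels)
    text: str      # full solution trace produced by that teacher
    correct: bool  # whether the final answer matched the reference


def compose(agentic: Trajectory, reasoning: Trajectory) -> Optional[str]:
    """Build one training example from two teacher attempts (hypothetical rule)."""
    if agentic.correct and reasoning.correct:
        # Both succeeded: keep the shorter trace as a cheap proxy for efficiency.
        return min((agentic, reasoning), key=lambda t: len(t.text)).text
    if agentic.correct or reasoning.correct:
        if agentic.correct:
            winner, loser = agentic, reasoning
        else:
            winner, loser = reasoning, agentic
        # One failed: show the failed attempt, then a switch to the other strategy.
        return f"{loser.text}\n[switching strategy]\n{winner.text}"
    # Neither teacher solved the problem: skip it.
    return None
```

A student fine-tuned on examples of this shape would see both strategies, along with cases where switching strategy leads to a correct answer, which is the intuition behind learning to choose a strategy per problem.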


Evaluation and benchmarks
The proposed method is evaluated on multiple benchmarks, such as DeepMath-L and Combinatorics300, which test various aspects of mathematical reasoning. It is compared against the baselines DeepSeek-R1-Distill and Qwen-2.5-Instruct. The student model, Agentic-R1, shows substantial performance improvements, benefiting from both agentic and reasoning strategies. It outperforms two similarly sized models, each specializing in either tool-assisted (Qwen2.5-7B-Instruct) or pure reasoning (DeepSeek-R1-Distill-7B) strategies. Agentic-R1 outperforms tool-based models by using reasoning strategies when they are needed, while remaining more efficient than pure reasoning models on standard mathematical tasks.
Qualitative analysis and tool usage patterns
Qualitative examples show that Agentic-R1 uses tools selectively. It activates code execution on 79.2% of the computationally demanding Combinatorics300 problems, while activation drops to 52.0% on the simpler AMC dataset problems. Agentic-R1 learns to invoke tools appropriately through supervised fine-tuning alone, without explicit instruction, effectively balancing computational efficiency and reasoning accuracy.
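A tool-activation rate like the ones quoted above can be measured by counting how many model responses contain at least one tool call. The sketch below assumes a hypothetical `<code>` tag marks a tool call in the output; the actual trace format used by the model may differ, and the function name is illustrative.

```python
from typing import Iterable

TOOL_MARKER = "<code>"  # hypothetical tag; the real trace format may differ


def tool_activation_rate(responses: Iterable[str], marker: str = TOOL_MARKER) -> float:
    """Fraction of responses that contain at least one tool call."""
    responses = list(responses)
    if not responses:
        return 0.0
    used = sum(1 for r in responses if marker in r)
    return used / len(responses)
```

Computing this separately per dataset is what allows a comparison such as a higher rate on a computation-heavy benchmark versus a lower rate on a simpler one.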
Robustness to imperfect teachers
The framework remains effective even when guided by imperfect teachers. For example, the agentic teacher achieves only 48.4% accuracy on Combinatorics300, yet the student model improved from 44.7% to 50.9%, a gain of 6.2 points, ultimately surpassing the teacher.
Conclusion
In summary, the DualDistill framework effectively combines the strengths of natural language reasoning and tool-assisted problem solving by distilling complementary knowledge from two specialized teacher models into a single, versatile student model, Agentic-R1. Through trajectory composition and self-distillation, Agentic-R1 learns to dynamically select the most appropriate strategy for each problem, balancing accuracy and computational efficiency. Evaluations on diverse mathematical reasoning benchmarks show that Agentic-R1 outperforms both pure reasoning and tool-based models, even when learning from imperfect teachers. This work highlights a promising approach to building adaptive AI agents that can integrate heterogeneous problem-solving strategies for more robust and efficient reasoning.
Check out the Paper and the GitHub page. All credit for this research goes to the researchers of this project.

Sajjad Ansari is a final-year student at IIT Kharagpur. As a technology enthusiast, he explores practical applications of AI, with a focus on understanding the impact of AI technologies and their real-world implications. He aims to explain complex AI concepts in a clear and accessible way.
