Microsoft has introduced Fara-7B, a new 7-billion-parameter model designed to act as a Computer Use Agent (CUA) that can perform complex tasks directly on users’ devices. Fara-7B sets new state-of-the-art results for its size, offering a way to build AI agents that do not depend on large cloud-hosted models and can run on compact systems with low latency and enhanced privacy.
Although this model is an experimental release, its architecture addresses data security, a key barrier to enterprise adoption. Fara-7B is small enough to run locally, allowing users to automate sensitive workflows such as managing internal accounts and processing sensitive company data without any information ever leaving the device.
How Fara-7B views the web
Fara-7B is designed to interact with the user interface using the same tools as humans (mouse and keyboard). The model works by visually recognizing web pages through screenshots and predicting specific coordinates for actions such as clicking, typing, and scrolling.
Importantly, Fara-7B does not rely on the “accessibility tree,” the basic code structure that browsers use to describe web pages to screen readers. Instead, it relies solely on pixel-level visual data. This approach allows agents to interact with websites even if the underlying code is obfuscated or complex.
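To make the screenshot-in, action-out idea concrete, here is a minimal Python sketch of how a harness might represent and validate predicted actions. The text format ("click 412 230"), the Action class, and the helper names are illustrative assumptions; the article does not describe Fara-7B's actual output format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    """Hypothetical container for one predicted UI action."""
    kind: str                      # "click", "type", or "scroll"
    x: Optional[int] = None        # pixel column, for clicks
    y: Optional[int] = None        # pixel row, for clicks
    text: Optional[str] = None     # text to enter, for typing
    delta_y: Optional[int] = None  # scroll amount in pixels


def parse_action(raw: str) -> Action:
    """Parse an assumed text form such as 'click 412 230' or 'scroll -300'."""
    parts = raw.strip().split(maxsplit=1)
    if not parts:
        raise ValueError("empty action")
    kind = parts[0].lower()
    arg = parts[1] if len(parts) > 1 else ""
    if kind == "click":
        xs, ys = arg.split()
        return Action("click", x=int(xs), y=int(ys))
    if kind == "type":
        return Action("type", text=arg)
    if kind == "scroll":
        return Action("scroll", delta_y=int(arg))
    raise ValueError(f"unknown action: {kind}")


def in_bounds(action: Action, width: int, height: int) -> bool:
    """Check that a click lands inside the screenshot it was predicted from."""
    if action.kind != "click":
        return True
    return 0 <= action.x < width and 0 <= action.y < height
```

Because the only input is pixels, a bounds check against the screenshot size is one of the few sanity checks a harness can apply without looking at the page's code.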
According to Yash Lara, senior PM lead at Microsoft Research, processing all visual input on the device creates true “pixel sovereignty” because the screenshots and inferences needed for automation remain on the user’s device. “This approach helps organizations meet the stringent requirements of regulatory areas including HIPAA and GLBA,” he told VentureBeat in written comments.
This visual-first approach produced strong results in benchmark tests. On WebVoyager, a standard benchmark for web agents, Fara-7B achieved a task success rate of 73.5%. That is better than larger, more resource-intensive systems, including GPT-4o when prompted to act as a computer use agent (65.1%) and the native UI-TARS-1.5-7B model (66.4%).

Efficiency is another key differentiator. In comparison tests, Fara-7B completed tasks in an average of about 16 steps, compared with about 41 steps for the UI-TARS-1.5-7B model.
Responding to risks
However, the transition to autonomous agents is not without risks. Microsoft notes that Fara-7B has limitations common to other AI models, including the possibility of hallucinations, mistakes when following complex instructions, and reduced accuracy on complex tasks.
To mitigate these risks, the model was trained to recognize “critical points.” A critical point is defined as a situation where a user’s personal data or consent is required before an irreversible action can occur, such as sending an email or completing a financial transaction. When it reaches a critical point, Fara-7B is designed to pause and explicitly request user approval before continuing.
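The pause-and-approve behavior can be sketched as a simple gate in an agent loop. This is a minimal illustration, not Microsoft's implementation: in Fara-7B the model itself is trained to recognize critical points, whereas the sketch below stands in a hard-coded set of hypothetical action labels and a caller-supplied approval callback.

```python
from typing import Callable, Iterable, List

# Hypothetical labels for irreversible actions; the real model learns to
# recognize critical points rather than consulting a fixed list.
IRREVERSIBLE = {"send_email", "submit_payment"}


def run_with_approval(
    steps: Iterable[str],
    approve: Callable[[str], bool],
) -> List[str]:
    """Execute steps in order, pausing for approval at each critical point.

    Returns the steps that were actually carried out. If the user declines,
    the run stops before the irreversible step.
    """
    executed: List[str] = []
    for step in steps:
        if step in IRREVERSIBLE and not approve(step):
            break  # user declined: stop before the irreversible action
        executed.append(step)
    return executed
```

With a callback that always declines, a plan of "open_inbox", "draft_reply", "send_email" stops after the draft is written, which is the behavior the critical-point design is meant to guarantee.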
Managing this interaction without frustrating users is a key design challenge. “It’s important to balance robust safeguards such as critical points with seamless user journeys,” Lara said. “UIs like Microsoft Research’s Magentic-UI are essential to avoid approval fatigue while giving users the opportunity to intervene when needed.” Magentic-UI is a research prototype specifically designed to facilitate human-agent interaction. Fara-7B is designed to run on Magentic-UI.
Distilling complexity into a single model
The development of Fara-7B relied on knowledge distillation, in which the capabilities of a complex system are compressed into a smaller, more efficient model.
Creating a CUA typically requires a large amount of training data that shows how to navigate the web. Collecting this data via human annotation is prohibitively expensive. To solve this, Microsoft used a synthetic data pipeline built on top of Magentic-One, a multi-agent framework. In this configuration, an “Orchestrator” agent created plans and directed a “WebSurfer” agent to browse the web, generating 145,000 successful task trajectories.
The researchers then “distilled” this complex interaction data into Fara-7B. Fara-7B is built on Qwen2.5-VL-7B, a base model chosen for its long context window (up to 128,000 tokens) and its strong ability to connect text instructions to visual elements on the screen. Although data generation required a large multi-agent system, Fara-7B itself is a single model, showing that advanced behaviors can be learned effectively by small models without the need for complex scaffolding at runtime.
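The data-collection step can be pictured as a filter over agent runs: keep only the trajectories that succeeded, then flatten them into training examples. The sketch below is a hedged illustration in Python; the Trajectory class, its fields, and the string placeholders for screenshots are assumptions made for the example, not the format used by Microsoft's pipeline.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Trajectory:
    """Hypothetical record of one multi-agent run on a web task."""
    task: str
    # Each step pairs an observation (a screenshot reference) with the action taken.
    steps: List[Tuple[str, str]] = field(default_factory=list)
    success: bool = False


def keep_successful(trajectories: List[Trajectory]) -> List[Trajectory]:
    """Drop failed or empty runs so only successful demonstrations remain."""
    return [t for t in trajectories if t.success and t.steps]


def to_training_pairs(trajectories: List[Trajectory]) -> List[Tuple[str, str]]:
    """Flatten trajectories into (task + observation, action) examples."""
    pairs: List[Tuple[str, str]] = []
    for t in trajectories:
        for observation, action in t.steps:
            pairs.append((f"{t.task}\n{observation}", action))
    return pairs
```

Filtering on success is what lets a smaller model imitate only the behavior that actually completed the task.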
The training process relies on supervised fine-tuning, where the model learns by imitating successful examples produced by the synthetic data pipeline.
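Learning by imitation usually means minimizing the negative log-likelihood of the demonstrated action at each step. The toy Python function below illustrates that objective over whole actions; it is a simplified assumption for exposition, since real fine-tuning of a vision-language model operates on token-level probabilities, and the article does not give Microsoft's exact loss.

```python
import math
from typing import Dict, List


def imitation_loss(
    predicted: List[Dict[str, float]],
    expert: List[str],
) -> float:
    """Average negative log-likelihood of the expert's action at each step.

    predicted[i] maps candidate actions to the model's probability at step i;
    expert[i] is the action taken in the successful demonstration.
    """
    if len(predicted) != len(expert) or not expert:
        raise ValueError("need one prediction per expert action")
    total = 0.0
    for probs, action in zip(predicted, expert):
        p = probs.get(action, 0.0)
        total += -math.log(max(p, 1e-12))  # floor avoids log(0)
    return total / len(expert)
```

The loss falls toward zero as the model puts more probability on the demonstrated action, which is the sense in which the model "imitates" the successful trajectories.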
Looking ahead
The current version was trained on a static dataset, but future iterations will focus on making the model smarter, not necessarily bigger. “We will continue to strive to maintain the small size of our models,” Lara said. “Our ongoing research is focused on making agent models smarter and safer, not just bigger.” This includes exploring techniques such as reinforcement learning (RL) in live, sandboxed environments, which would allow models to learn through trial and error in real time.
Microsoft has made the model available under the MIT License on Hugging Face and Microsoft Foundry. However, Lara cautions that while the license allows for commercial use, the model is not yet ready for production. “Under the MIT license, you are free to experiment and prototype with Fara-7B, but it is best suited for pilots and proof-of-concepts rather than mission-critical deployments,” he said.
