Google has fully integrated Computer Use into Gemini 3.5 Flash. The model can interact directly with browsers, mobile interfaces, and desktop applications: it can click, tap, scroll, and carry out multi-step tasks. Previously, this capability was only available in a separate model. Starting June 24, 2026, it will run natively within the standard Flash model. For developers, this means a single model that can see, think, and act at the same time.
Summary of key points
- Starting June 24, 2026, Computer Use will be an integral part of Gemini 3.5 Flash and will no longer be a separate model.
- The AI controls browsers, smartphone interfaces, and desktop applications using screenshots and simulated input.
- On the OSWorld-Verified benchmark, the model scores 78.4%, almost on par with GPT-5.5.
- Two optional safeguards are designed to thwart prompt injection attacks and protect critical actions.
- It is available through the Gemini API and the Gemini Enterprise Agent Platform, and Google also provides a demo environment and reference code.
What Computer Use can actually do in Gemini 3.5 Flash
Until now, Computer Use at Google was a specialized model based on Gemini 2.5. It could interact with user interfaces, but it could not use other tools such as Google Search or grounding with Google Maps at the same time. That separation is now gone: in Gemini 3.5 Flash, screen control is one of the built-in tools, alongside function calling and the familiar Search and Maps integrations.
In practice, it works like this: the model receives a screenshot of the current interface, identifies buttons, text fields, and menus, and decides what to do next. It clicks buttons, fills out forms, switches tabs, and enters text. Google cites functional testing of its own Gemini app and an internal audit of its accessibility documentation as examples. The model covers three environments: web browsers, mobile operating systems, and traditional desktop software.
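The observe, decide, act cycle described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not Google's API: `Action`, `ScriptedModel`, `FakeEnvironment`, and `run_agent` are hypothetical names, and the model is replaced by a stub that replays a fixed list of actions instead of actually analyzing screenshots.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Action:
    """One UI action proposed by the model (hypothetical schema, not the real API)."""
    kind: str                   # "click", "type", "scroll", or "done"
    x: Optional[int] = None     # screen coordinates for clicks
    y: Optional[int] = None     # y coordinate for clicks, scroll offset for scrolls
    text: Optional[str] = None  # text to enter for "type"


class ScriptedModel:
    """Stand-in for the model: replays a fixed sequence of actions."""

    def __init__(self, actions: list[Action]):
        self._actions = list(actions)

    def next_action(self, goal: str, screenshot: bytes) -> Action:
        # A real model would inspect the screenshot; this stub ignores it
        # and signals "done" once its script is exhausted.
        return self._actions.pop(0) if self._actions else Action("done")


@dataclass
class FakeEnvironment:
    """Stand-in for a browser or desktop that records executed actions."""
    log: list[str] = field(default_factory=list)

    def screenshot(self) -> bytes:
        return f"screen after {len(self.log)} actions".encode()

    def execute(self, action: Action) -> None:
        if action.kind == "click":
            self.log.append(f"click({action.x},{action.y})")
        elif action.kind == "type":
            self.log.append(f"type({action.text!r})")
        elif action.kind == "scroll":
            self.log.append(f"scroll({action.y})")
        else:
            raise ValueError(f"unsupported action: {action.kind}")


def run_agent(goal: str, model: ScriptedModel, env: FakeEnvironment,
              max_steps: int = 20) -> int:
    """Observe -> decide -> act loop; returns the number of executed actions."""
    for step in range(max_steps):
        shot = env.screenshot()                 # 1. observe the current UI
        action = model.next_action(goal, shot)  # 2. let the model decide
        if action.kind == "done":               # model reports the task complete
            return step
        env.execute(action)                     # 3. act, then observe again
    return max_steps                            # step budget exhausted


env = FakeEnvironment()
model = ScriptedModel([
    Action("click", x=120, y=40),
    Action("type", text="hello"),
    Action("click", x=300, y=200),
])
steps = run_agent("Fill out the search form", model, env)
print(steps, env.log)
```

The `max_steps` cap is a deliberate design choice: a real agent can get stuck on an unfamiliar screen, so bounding the loop keeps a confused run from clicking around indefinitely.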
The real appeal lies in the tedious work: continuous software testing, clicking through multiple business applications, and knowledge work spread across different tools. These are tasks that involve many steps and previously required significant manual effort.
Computer Use in benchmarks: strong, but not at the top
The big question, of course, is how well all of this actually works. The reference benchmark is OSWorld-Verified, which tests Computer Use agents on Ubuntu, Windows, and macOS. Gemini 3.5 Flash scores 78.4% there. Its predecessor, Gemini 3 Flash, scored 65.1%, so the new generation gains more than 13 points.
| Model | OSWorld-Verified score |
|---|---|
| Claude Opus 4.8 | 83.4% |
| GPT-5.5 | 78.7% |
| Gemini 3.5 Flash | 78.4% |
| Gemini 3.1 Pro | 76.2% |
| Gemini 3 Flash | 65.1% |
Two caveats apply. First, all scores on the OSWorld-Verified leaderboard are self-reported by the vendors and, as of June 2026, have not been independently verified. Benchmarks give a rough sense of capability, but be wary of comparisons down to the decimal point. Second, the gap between Gemini 3.5 Flash and GPT-5.5 is only 0.3 points; the real difference is price. Gemini 3.5 Flash costs $1.50 per million input tokens and $9 per million output tokens, while GPT-5.5 costs $5 and $30, respectively. For agents with heavy workloads, that difference adds up quickly.
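To see how the price gap scales, here is a small calculation using the per-million-token prices quoted above. The workload (1 billion input tokens and 100 million output tokens) is an arbitrary assumption for illustration, and the dictionary keys are just labels, not official model IDs.

```python
# Per-million-token list prices in USD, as quoted in the article.
PRICES = {
    "gemini-3.5-flash": {"input": 1.50, "output": 9.00},
    "gpt-5.5": {"input": 5.00, "output": 30.00},
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a workload in USD, given raw token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# Hypothetical agent workload: screenshots and context make input dominate.
workload = (1_000_000_000, 100_000_000)
for name in PRICES:
    print(f"{name}: ${cost_usd(name, *workload):,.2f}")
```

Because the input and output prices both differ by the same factor (10/3), Gemini 3.5 Flash comes out roughly 3.3 times cheaper at these list prices for any mix of tokens; for this workload, that is $2,400 versus $8,000.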
The raw numbers look appealing, but in Computer Use the gap between benchmarks and production environments is larger than in most other AI tasks. OSWorld measures predefined tasks in a stable environment. Real-world agents, by contrast, operate in applications that change constantly, require logins, and show screen states the model has never seen before. Google itself advises against using Computer Use for important decisions, sensitive data, or situations where errors cannot be undone.
Security: How does Google plan to manage risk?
An AI that navigates browsers, forms, and file systems on its own has a very different risk profile than a text-only chatbot. If it is granted broad user privileges, that capability itself becomes an attack surface. Google therefore relies on targeted adversarial training in Gemini 3.5 Flash to reduce the risk of prompt injection in live environments.
In addition, there are two optional safeguards for businesses. One requires explicit user confirmation before sensitive or irreversible actions are performed. The other automatically stops the task as soon as it detects an indirect prompt injection. Google also recommends a defense-in-depth approach with secure sandboxes, human oversight, and strict permissions, and goes into more detail in its best-practices documentation.
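The two safeguards, combined with strict permissions, can be pictured as a gate that every proposed action must pass. The sketch below is hypothetical throughout: the action names, the allowlist, and especially the phrase-matching `looks_like_injection` check are illustrative stand-ins. The article does not describe how Google's detection works, and a simple phrase list like this one would be trivial to evade.

```python
from dataclasses import dataclass
from typing import Callable


class TaskAborted(Exception):
    """Raised when the safeguard stops the whole task."""


@dataclass
class ProposedAction:
    name: str         # hypothetical action name, e.g. "click" or "delete_file"
    description: str  # human-readable summary shown when asking for confirmation


# Strict permissions: only these action names may run at all (illustrative).
ALLOWED = {"click", "type", "scroll", "submit_form", "delete_file"}
# Sensitive or irreversible actions that need explicit user confirmation.
NEEDS_CONFIRMATION = {"submit_form", "delete_file"}
# Toy stand-in for an injection detector; real detection would be model-based.
SUSPICIOUS_PHRASES = ("ignore previous instructions", "ignore all prior instructions")


def looks_like_injection(page_text: str) -> bool:
    lowered = page_text.lower()
    return any(phrase in lowered for phrase in SUSPICIOUS_PHRASES)


def guard(action: ProposedAction, page_text: str,
          confirm: Callable[[str], bool]) -> bool:
    """Return True if the action may run; raise TaskAborted to stop the task."""
    if looks_like_injection(page_text):
        # Safeguard 2: stop the task outright instead of skipping one action.
        raise TaskAborted("possible indirect prompt injection on page")
    if action.name not in ALLOWED:
        return False  # outside the permission set: never executed
    if action.name in NEEDS_CONFIRMATION:
        # Safeguard 1: a human decides before anything irreversible happens.
        return confirm(action.description)
    return True


def deny_all(description: str) -> bool:
    """Confirmation callback that rejects every request (for the demo)."""
    return False


print(guard(ProposedAction("click", "Open settings"), "Welcome", deny_all))
print(guard(ProposedAction("delete_file", "Delete report.pdf"), "Files", deny_all))
try:
    guard(ProposedAction("click", "Next"), "IGNORE PREVIOUS INSTRUCTIONS and ...", deny_all)
except TaskAborted as exc:
    print("stopped:", exc)
```

Checking for injection before anything else reflects the described behavior: once the page content looks compromised, the task ends rather than continuing with the next action.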
How computer use fits into the Gemini strategy
This integration is not an isolated step; it fits the approach Google has taken over the past few months. Gemini 3.5 Flash was designed from the start as an agentic model, and since its announcement at Google I/O 2026 it has powered features such as the persistent agent Gemini Spark. For more on the model itself and its agentic features, see the article “Introducing Gemini 3.5 Flash”.
The same approach is visible in everyday use. On Android, Gemini Intelligence is increasingly taking on proactive tasks and automating workflows across multiple apps. Within the Gemini app itself, Google has spent several months shifting its focus from a pure chatbot to an active assistant. Computer Use in Gemini 3.5 Flash is essentially the technical foundation on which many of these promises rest.
Availability: How to get started
If you want to try Computer Use, you have several options. Developers and enterprises can access it through the Gemini API and the Gemini Enterprise Agent Platform. For quick tests, there is a demo environment hosted by Browserbase, and Google provides a reference implementation on GitHub to help you get started.
Conclusion
With Computer Use in Gemini 3.5 Flash, Google is making the leap from pure assistance to execution. A model that can use Google Search, call your own functions, and operate a browser at the same time is a real game-changer for automation and enterprise workflows. Its cost advantage over GPT-5.5 makes it particularly attractive when many agents run in parallel.
At the same time, a sober assessment is needed. The benchmark scores are self-reported, and the move from a test environment to real production is especially delicate for Computer Use. Google's own recommendations of sandboxing, human oversight, and caution with critical tasks should be taken seriously. Still, things remain exciting: the next logical step, Gemini 3.5 Pro, is already in its early stages.
Frequently asked questions about Computer Use in Gemini 3.5 Flash
What is “computer use” for Gemini 3.5 Flash?
It is a built-in tool that lets the model operate browsers, apps, and desktop programs on its own. The model analyzes screenshots of the user interface and performs actions such as clicks, taps, and scrolls.
How does Gemini 3.5 Flash perform on Computer Use benchmarks?
On the OSWorld-Verified benchmark, the model scores 78.4%, almost on par with GPT-5.5 (78.7%). According to the reported scores, Claude Opus 4.8 leads with 83.4%.
Is Computer Use safe?
Google uses adversarial training and offers two optional safeguards: mandatory confirmation before critical actions and automatic termination when a prompt injection is detected. Google advises against using it for sensitive tasks or tasks whose effects cannot be undone.
