Google just fired back at OpenAI with Gemini 2.5 Computer Use, a new AI model that navigates websites like a human user. Released one day after OpenAI's Dev Day announcements, the model focuses specifically on browser automation while competitors offer full computer control. The timing isn't coincidental - it's Google's direct challenge to OpenAI's ChatGPT Agent in the rapidly expanding AI automation market.
Google isn't letting OpenAI dominate the AI agent space without a fight. The company just unveiled Gemini 2.5 Computer Use, a specialized AI model that can navigate and interact with web browsers much as a human would. The launch landed a day after OpenAI showcased new apps built inside ChatGPT at Dev Day 2025.
The new model uses what Google calls "visual understanding and reasoning capabilities" to analyze user requests and execute tasks through browser interfaces. Think filling out forms, clicking buttons, or navigating complex web applications that don't have APIs. It's designed for scenarios where traditional automation breaks down - when you need AI to work with interfaces built for human eyes and fingers, not code.
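In practice, this style of browser agent runs as a screenshot-driven loop: the client sends the user's goal and a screenshot of the current page, the model proposes a single UI action, the client performs it in a real browser, captures the new state, and repeats until the task is done. Here is a minimal sketch of that loop using Playwright; the `request_next_action` helper is a hypothetical stand-in for the actual Gemini API call, and the action names are illustrative assumptions rather than the model's exact schema.

```python
# Minimal sketch of a screenshot -> model -> action loop.
# request_next_action() is a hypothetical placeholder for the Gemini API call.
# Requires: pip install playwright && playwright install chromium
from playwright.sync_api import sync_playwright


def request_next_action(goal: str, screenshot: bytes) -> dict:
    """Hypothetical stand-in for the Gemini 2.5 Computer Use call.

    A real agent would send the goal plus screenshot to the model and
    return its proposed UI action; a canned response is used here so the
    sketch runs on its own.
    """
    return {"name": "done"}  # e.g. {"name": "click_at", "x": 320, "y": 180}


def run_agent(goal: str, start_url: str, max_steps: int = 20) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(start_url)

        for _ in range(max_steps):
            screenshot = page.screenshot()           # observe current page state
            action = request_next_action(goal, screenshot)

            if action["name"] == "done":             # model reports task finished
                break
            elif action["name"] == "click_at":       # illustrative action name
                page.mouse.click(action["x"], action["y"])

        browser.close()


if __name__ == "__main__":
    run_agent("Fill out the contact form", "https://example.com")
```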
Google's been quietly testing this technology through Project Mariner, a research prototype that can add grocery items to your Safeway cart based on recipe ingredients. But Gemini 2.5 Computer Use represents the first commercial release of this browser automation capability for developers.
The competitive landscape is heating up fast. Anthropic actually moved first in this space, releasing Claude with computer use capabilities last October. OpenAI followed with its ChatGPT Agent feature and just doubled down with new developer apps announced yesterday. Now Google's entering the fray with its own approach.
But Google's taking a different tactical approach than its rivals. While ChatGPT Agent and Anthropic's computer use tools can control entire desktop environments, Gemini 2.5 Computer Use deliberately limits itself to browser-only interactions. Google acknowledges it's "not yet optimized for desktop OS-level control" and currently supports just 13 specific actions like opening browsers, typing text, and dragging elements.
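That constraint shows up directly in the glue code a developer writes: every step the model proposes has to map onto one of the supported browser actions, so executing its output can be a small dispatch table rather than a general OS automation layer. A rough sketch, again using Playwright, with the action names assumed for illustration rather than taken from Google's published list:

```python
# Sketch of dispatching a browser-only action vocabulary to Playwright calls.
# The action names below are illustrative assumptions, not Google's exact set.
from playwright.sync_api import Page


def execute_action(page: Page, action: dict) -> None:
    name = action["name"]
    if name == "navigate":                      # load a URL
        page.goto(action["url"])
    elif name == "click_at":                    # click at screen coordinates
        page.mouse.click(action["x"], action["y"])
    elif name == "type_text_at":                # click a field, then type into it
        page.mouse.click(action["x"], action["y"])
        page.keyboard.type(action["text"])
    elif name == "drag_and_drop":               # drag from one point to another
        page.mouse.move(action["x"], action["y"])
        page.mouse.down()
        page.mouse.move(action["dest_x"], action["dest_y"])
        page.mouse.up()
    else:
        raise ValueError(f"Unsupported action: {name}")  # outside the allowed set
```

Because the model stays inside the browser, nothing beyond a browser automation library is needed on the client side to act on its output.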