OpenAI unveils Operator, its AI agent which takes control of the web

Operator is capable of automating complex tasks previously carried out by the user in their web browser.

The rumor was true. Two days after the announcement of the Stargate project, OpenAI unveils this Thursday, January 23, 2025 its first artificial intelligence agent designed for the web. According to the OpenAI definition, an agent is an artificial intelligence capable of working autonomously: it is given a task, it executes it. “We think this is a major trend that will impact the way people work, their productivity, their creativity, what they can accomplish,” explains Sam Altman in the introduction. Operator is the first incarnation: an assistant with its own web browser, capable of seeing and interacting with pages as a human would, whether to fill out a form, order groceries or create a meme.

How does Operator work?

Under the hood, Operator is powered by a new model called “Computer-Using Agent” or “CUA”. This AI combines the vision capabilities of GPT-4o with an advanced reasoning system, developed by reinforcement learning. Concretely, the model can see what is displayed on the Operator browser screen via screenshots and interact with all the elements of a graphical interface – buttons, menus, text fields – using a keyboard and a virtual mice.

The Operator interface in ChatGPT. © Screenshot / JDN

If the system encounters an obstacle or makes an error, it can self-correct using its reasoning capabilities. According to OpenAI, CUA is already setting new records on the WebArena and WebVoyager automated web browsing benchmarks. In the event of a blockage, the agent does not insist: he simply hands over to the user.

What are the first use cases?

For its launch, Operator is primarily focusing on time-consuming or repetitive use cases. He can fill out forms, order groceries online and even create memes. To get started, simply describe in natural language what you want to accomplish. The agent then takes control of its own browser and executes the task, requesting user approval for important actions.

-

OpenAI has partnered with several web giants: DoorDash (meal delivery), Instacart (grocery delivery), OpenTable (restaurant reservations), Priceline (travel reservations), StubHub (event ticketing) and Uber to optimize experience on different platforms. The objective is twofold: to improve the efficiency of the agent while respecting the standards established by these services. OpenAI is also exploring Operator’s potential in public services. A pilot partnership with the city of Stockton, California, aims to facilitate citizens’ access to municipal services.

Un agent ultra-premium

OpenAI has deployed three levels of protection to regulate its agent. At the first level, Operator is programmed to cede control to the user at critical moments: entering sensitive information such as login credentials or payment data, CAPTCHA resolution, or final validation of an order. At the second level, data protection: users can clear their browsing history with one click and deactivate the use of their data for model training. Finally, OpenAI has implemented security measures against malicious websites that attempt to manipulate the agent via injections of hidden prompts or malicious code. A “monitor model” continuously monitors Operator’s behavior and can pause a task at the slightest suspicion of suspicious activity.

Operator is only accessible to Pro subscribers (the highest level of ChatGPT at $200) connected from the United States, via the dedicated operator.chatgpt.com platform. OpenAI plans to gradually expand access to Plus, Team, and Enterprise subscribers. The company also says it plans to make the CUA model available through its API in the coming weeks, allowing developers to create their own agents capable of interacting with graphical interfaces.

-

--

PREV Neymar Jr’s return approaching? Discussions surrounding his heroic return to Santos FC
NEXT Betway daily bet: DAL-MTL