OpenAI may be about to release an AI tool that can take control of your PC and perform actions on your behalf.
Tibor Blaho, a software engineer with a reputation for accurately leaking upcoming AI products, claims to have uncovered evidence of OpenAI’s long-rumored Operator tool. Publications including Bloomberg have previously reported on Operator, which is said to be an “agentic” system capable of autonomously handling tasks such as writing code and booking travel.
According to The Information, OpenAI is targeting January as Operator’s release month. The code Blaho uncovered this weekend lends credence to that report.
OpenAI’s ChatGPT client for macOS has gained options, hidden for now, to define shortcuts to “Toggle Operator” and “Force Quit Operator,” according to Blaho. OpenAI has also added references to Operator on its website, Blaho said, though those references are not yet publicly visible.
Confirmed – ChatGPT macOS desktop app has hidden options to set desktop launcher shortcuts to “Toggle Operator” and “Force Quit Operator” https://t.co/rSFobi4iPN pic.twitter.com/j19YSlexAS
– Tibor Blaho (@btibor91) January 19, 2025
According to Blaho, OpenAI’s site also contains not-yet-public tables comparing Operator’s performance to that of other computer-using AI systems. The tables could very well be placeholders. But if the numbers are accurate, they suggest that Operator is far from 100% reliable, with results varying by task.
The OpenAI website already contains references to Operator/OpenAI CUA (Computer Use Agent) – “Operator System Card Table”, “Operator Research Eval Table” and “Operator Refusal Rate Table”
Including comparison with Claude 3.5 Sonnet Computer Use, Google Mariner, etc.
(overview of tables… pic.twitter.com/OOBgC3ddkU
– Tibor Blaho (@btibor91) January 20, 2025
On OSWorld, a benchmark that attempts to mimic a real-world computing environment, “OpenAI Computer Use Agent (CUA),” possibly the AI model that powers Operator, scores 38.1%, ahead of Anthropic’s computer-control model but well short of the 72.4% that humans score. OpenAI CUA surpasses human performance on WebVoyager, which evaluates an AI’s ability to navigate and interact with websites. But the model falls short of human-level scores on another web-based benchmark, WebArena, according to the leaked benchmarks.
Operator also struggles with tasks that a human could easily perform, if the leak is to be believed. In a test that asked Operator to sign up with a cloud provider and launch a virtual machine, Operator succeeded only 60% of the time. Tasked with creating a Bitcoin wallet, Operator succeeded only 10% of the time.
We have reached out to OpenAI for comment and will update this article if we receive a response.
OpenAI’s imminent entry into the AI agent space comes as rivals including Anthropic and Google make plays for the nascent segment. AI agents may be risky and speculative, but tech giants are already touting them as the next big thing in AI. The AI agent market could be worth $47.1 billion by 2030, according to analytics firm Markets and Markets.
Today’s agents are rather primitive. But some experts have expressed concerns about their safety, should the technology improve rapidly.
One of the leaked charts shows Operator performing well on some safety evaluations, including tests that attempt to trick the system into performing “illicit activities” and searching for “sensitive personal data.” Safety testing is reportedly among the reasons for Operator’s long development cycle. In a recent post on X, OpenAI co-founder Wojciech Zaremba criticized Anthropic for releasing an agent that he said lacked safety measures.
“I can only imagine the backlash if OpenAI made a similar release,” Zaremba wrote.
It is worth noting that OpenAI has itself been criticized by AI researchers, including former staff members, for allegedly de-emphasizing safety work in favor of quickly turning its technology into products.