Silicon Valley is brimming with optimism about AI agents.
At its most basic, the technology can solve problems, carry out tasks, and grow more intelligent as it learns from its environment. Agents are like the virtual assistants most workers dream of having. They are already being used to book flights, gather data, summarize reports, and even make decisions.
But agents are far from perfect. Not only are errors and hallucinations still commonplace, they get worse the more agents are used.
Companies now use agents to automate complex, multi-step tasks, and new tools have emerged to make this possible. Regie AI uses "auto-pilot sales agents" to automatically find prospects, write personalized emails, and follow up with buyers. Cognition AI built an agent called Devin that performs complex engineering tasks. Big Four professional services firm PwC has unveiled "agent OS," a platform that makes it easier for agents to communicate with one another to carry out tasks.
But the more steps an agent takes to complete a task, the more likely its error rate (the percentage of incorrect outputs) is to affect the result. Some agent processes can involve 100 or more steps, according to Patronus AI, a startup that helps companies evaluate and optimize AI technology.
Patronus AI measured the risk and revenue loss caused by AI agents' errors. Its findings confirm a familiar truth: with great power comes great responsibility.
“An error at any step can derail the entire task. The more steps involved, the higher the chance something goes wrong by the end,” the company wrote on its blog. It built a statistical model which found that an agent with a 1% error rate per step can compound to a 63% chance of error by the 100th step.
Quintin Au, growth lead at Scale AI, said that error rates are much higher in the wild.
“Currently, every time an AI performs an action, there is roughly a 20% chance of error (this is just how LLMs work, we cannot expect 100% accuracy),” he wrote in a post on LinkedIn last year. “If an agent has to perform 5 actions to complete a task, there is only a 32% chance that it gets every step right.”
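The arithmetic behind both figures can be checked with a short sketch. It assumes every step fails independently and at the same rate, which is a simplification; the article does not describe the details of Patronus AI's model, and the function names here are purely illustrative.

```python
def chance_all_correct(step_error: float, steps: int) -> float:
    """Probability that every one of `steps` steps succeeds.

    Assumes steps fail independently with the same error rate.
    """
    return (1.0 - step_error) ** steps


def chance_of_any_error(step_error: float, steps: int) -> float:
    """Probability that at least one step goes wrong."""
    return 1.0 - chance_all_correct(step_error, steps)


# 1% error per step over 100 steps (the Patronus AI figure).
print(f"{chance_of_any_error(0.01, 100):.0%}")  # 63%

# 20% error per action over 5 actions (the LinkedIn figure).
print(f"{chance_all_correct(0.20, 5):.1%}")  # 32.8%
```

Under these assumptions the second result comes out at 32.8%, which the quoted post rounds down to 32%.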
DeepMind CEO Demis Hassabis said at a recent event that an error rate should be thought of like "compound interest," according to Computer Weekly. By the time an agent works through the 5,000 steps it needs to complete a real-world task, the chance that the result is correct could be no better than random.
“In the real world, you don't have perfect information,” Hassabis said at the event, according to Computer Weekly. “There is hidden information that we do not know, so we need AI models capable of understanding the world around us.”
The higher probability of failure among AI agents means that companies risk losing their end customers.
The good news is that guardrails (filters, rules, and tools that can be used to identify and remove inaccurate content) can help reduce error rates. Small improvements “can produce disproportionate reductions in the probability of error,” Patronus AI said in its post.
Patronus AI CEO Anand Kannappan told Business Insider that guardrails can be as simple as additional checks to ensure that agents do not fail while they work. They can “prevent the agent from continuing or ask the agent to try again,” he said.
“This is why it is so important to measure performance carefully and holistically,” Douwe Kiela, an advisor to Patronus AI and cofounder of Contextual AI, told Business Insider in a LinkedIn message.
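To illustrate why a simple check-and-retry guardrail can cut the overall failure rate so sharply, here is a minimal sketch. It is not Patronus AI's method: the 90% catch rate, the single retry, and the function names are all hypothetical, and the model assumes independent steps and a check that never rejects a correct output.

```python
def effective_step_error(step_error: float, catch_rate: float, retries: int) -> float:
    """Per-step error rate that remains after a check-and-retry guardrail.

    Hypothetical model: a wrong output is caught with probability
    `catch_rate` and the step is retried; uncaught errors slip through.
    """
    if retries == 0:
        return step_error
    retried = effective_step_error(step_error, catch_rate, retries - 1)
    return step_error * ((1.0 - catch_rate) + catch_rate * retried)


def task_error(step_error: float, steps: int) -> float:
    """Chance that at least one of `steps` independent steps is wrong."""
    return 1.0 - (1.0 - step_error) ** steps


baseline = task_error(0.01, 100)
guarded = task_error(effective_step_error(0.01, 0.9, 1), 100)

print(f"no guardrail:   {baseline:.0%}")  # 63%
print(f"with guardrail: {guarded:.0%}")   # 10%
```

In this toy setup, a check that catches nine in ten mistakes and allows one retry lowers the chance of a failed 100-step task from about 63% to about 10%.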
businessinsider