The intense battle to stop AI robots from taking over the internet



A number of companies have taken significant steps to prevent scrapers from attempting to harvest their text.

It’s the latest front in an ongoing and seemingly escalating battle between the websites that publish text and the AI companies that want to use it to build their new tools.

The rise of artificial intelligence has many companies looking to train new, smarter AI technologies. But the large language model systems that underpin many of them, like ChatGPT, require vast amounts of text to train.

This has led some companies to scrape text from the web to feed into these systems. That, in turn, has frustrated the owners of text-based websites, who claim not only that the companies lack permission to use their data, but also that the heavy scraping traffic slows their sites down.

Elon Musk, for example, has repeatedly suggested that X, formerly Twitter, receives a huge amount of traffic from these scraping systems. X is one of several sites that have introduced strict “rate-limiting” rules, which attempt to prevent bots from reloading its site too often – though some have suggested that this has also been used to mask problems with X’s apparently struggling website.
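Rate limiting of the kind the article describes can be sketched in a few lines. Below is a minimal, illustrative sliding-window limiter keyed by client identifier; it is an assumption about the general technique, not X's or Reddit's actual implementation, which would run on distributed infrastructure.

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Allow at most `limit` requests per `window` seconds per client."""

    def __init__(self, limit=100, window=60.0):
        self.limit = limit
        self.window = window
        # client_id -> timestamps of that client's recent requests
        self.hits = defaultdict(deque)

    def allow(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[client_id]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the limit: the server would respond HTTP 429
        q.append(now)
        return True
```

A site would call `allow()` on every incoming request and refuse to serve the page when it returns `False`, which is what makes aggressive scrapers see their reloads rejected.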

Last week, Reddit introduced a series of changes aimed at preventing bots from scraping its website. The company also announced that it would use rate limiting, block unknown bots, and ask such systems to stay away from its website.
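"Asking such systems to stay away" is conventionally done with a robots.txt file, which well-behaved crawlers check before fetching pages. A sketch of what such a file could look like follows; GPTBot is OpenAI's published crawler token and CCBot is Common Crawl's, but this is an illustrative fragment, not Reddit's actual file.

```
# robots.txt — ask AI training crawlers not to fetch any pages
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# All other crawlers may proceed normally
User-agent: *
Allow: /
```

Compliance is voluntary, which is why sites pair robots.txt with harder measures such as rate limiting and outright blocking.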

The company acknowledged that these rules could also limit other automated systems important for transparency, such as the Internet Archive, which saves web pages for later access. But it insisted that tools important to researchers would still have access to Reddit.

“Anyone accessing Reddit content must comply with our policies, including those in place to protect Redditors. We are selective about who we work with and who we entrust with broad access to Reddit content,” the company said when introducing the new rules.

Some companies have made deals to allow AI companies to access their data or that of their users. OpenAI and Google, for example, have signed deals with Reddit to use their users’ posts to train their artificial intelligence systems.

Others have taken legal action. The New York Times has sued OpenAI and Microsoft over their artificial intelligence systems, arguing that the companies violated the newspaper’s copyright by using its articles to train them.

Cloudflare, the internet infrastructure company, has now launched a range of similar tools and told its customers it was a way of declaring their “independence.” All Cloudflare customers will have an “easy button” to “block all AI bots,” it said.

Last year, Cloudflare introduced an option to block AI bots that “behave well.” Even though that system targets bots that play by the rules, Cloudflare customers “overwhelmingly” chose to block them, the company said.

The company has now introduced a feature that blocks all known AI bots outright. It will scan for scrapers’ fingerprints and prevent them from visiting websites, it said.
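Cloudflare has not published its fingerprints, which reportedly draw on network-level signals as well as headers. The toy filter below shows the simplest form of the idea — matching requests against a deny-list of known crawler signatures; the signature strings are illustrative assumptions, not Cloudflare's actual list.

```python
# Toy request filter: reject requests whose User-Agent header matches a
# known AI-crawler signature. Real fingerprinting also inspects TLS and
# other network-level traits, not just the easily spoofed User-Agent.
KNOWN_AI_BOT_SIGNATURES = ("gptbot", "ccbot", "claudebot", "bytespider")

def is_blocked(headers: dict) -> bool:
    """Return True if the request looks like a known AI scraper."""
    ua = headers.get("User-Agent", "").lower()
    return any(sig in ua for sig in KNOWN_AI_BOT_SIGNATURES)
```

A server using this check would drop or challenge matching requests before serving any content, which is the "easy button" behaviour the article describes.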

News Source : www.independent.co.uk