
YouTubers sue OpenAI for transcribing videos

OpenAI has been sued by a YouTuber whose videos were transcribed and used to train its artificial intelligence system, opening a new front in the legal battle against companies at the forefront of developing the technology.

With this lawsuit, YouTube creators join a larger dispute over the unauthorized use of copyrighted material to power ChatGPT. Creators who have filed lawsuits against AI companies include artists, authors, news publishers, and record labels.

The complaint filed Friday by David Millette in federal court in San Francisco is based on a report by The New York Times published last April about OpenAI’s creation of a speech recognition system called Whisper. Faced with a supply problem in late 2021 after exhausting nearly all the internet’s text reservoirs, the company led by Sam Altman reportedly built the tool to transcribe audio from YouTube videos, with the goal of training the next version of GPT.

According to the complaint, OpenAI used Whisper to transcribe more than a million hours of YouTube video in violation of YouTube’s terms of service, which prohibit users from using its content for “independent” applications and from accessing services through “automated means (such as robots, botnets, or scrapers).” Greg Brockman, OpenAI’s president and one of its 11 co-founders, who has taken a sabbatical, is listed as one of Whisper’s creators in a research paper.

“OpenAI’s language model datasets include transcripts of videos taken directly from YouTube, as these video transcripts constitute one of the largest corpora of natural language data available for training and fine-tuning OpenAI’s language models,” the complaint states.

Some Google employees knew that OpenAI was harvesting YouTube videos for training data but failed to take action since the Alphabet-owned company was doing the same to develop its own AI system, the Times report said. If Google were to accuse OpenAI of violating the copyright of YouTube creators, it could face similar backlash, the report said, citing people familiar with the situation.

Millette has not filed a copyright infringement claim. He alleges only unjust enrichment and unfair competition over the use of video transcripts without consent or compensation. He is seeking at least $5 million and a court order barring OpenAI from further use of his content.

On July 30, a federal judge overseeing a lawsuit brought by prominent authors against OpenAI dismissed a claim accusing the company of violating California’s unfair competition law, the same claim that Millette has brought. U.S. District Judge Araceli Martínez-Olguín ruled that federal law bars the claim because it involves material “within the subject matter of the copyright,” though she based part of her reasoning on the fact that it overlaps with a claim for direct copyright infringement, which is not alleged in the class action lawsuit brought on behalf of the YouTubers.
