OpenAI signs deal to train AI on Reddit data

OpenAI has struck a deal with Reddit to use data from the social news site to train AI models.

In a blog post on OpenAI’s PR site, the company said the Reddit partnership would give it access to “real-time, structured, and unique content” from Reddit, such as posts and replies, allowing its tools and models to “better understand and present” this content. Reddit content will be integrated into ChatGPT, OpenAI’s popular conversational AI, and the two companies will work together to bring new, unspecified “AI-based features” to Reddit users and moderators.

OpenAI will also become an advertising partner of Reddit.

“Reddit will leverage OpenAI’s AI model platform to bring its powerful vision to life,” OpenAI wrote in the post. “Using LLM, ML, and AI allows Reddit to improve the user experience for everyone.”

OpenAI has several similar licensing agreements with content providers, ranging from media libraries to news publishers. But what makes this deal unusual is that Sam Altman, OpenAI’s CEO, owns an 8.7% stake in Reddit, making him its third-largest shareholder, and once sat on the company’s board of directors.

In an apparent attempt to head off scrutiny, OpenAI says in its press release that while Altman remains a Reddit shareholder, the partnership “was led by OpenAI’s COO (Brad Lightcap)” and “approved by (OpenAI’s) independent board of directors.” (I’ll note here that Altman is a member of OpenAI’s board of directors; however, he recused himself from this decision, an OpenAI spokesperson told TechCrunch.)

Reddit has made data licensing deals an increasingly central part of its growth strategy as it adjusts to life as a public company.

In its IPO prospectus, Reddit revealed that it has contractual agreements to license its data to clients, including Google, worth a total of more than $200 million. And, in its first earnings report as a public company, Reddit reported a 450% year-over-year increase in non-ad revenue, driven primarily by these deals.

Reddit stock rose 11% in extended trading following the OpenAI deal announcement.

“The paradox I see is that as more and more content on the Internet is written by machines, there is a growing preference for content from real people,” Reddit CEO Steve Huffman said during the company’s March earnings conference call. “And we have almost two decades of authentic conversations.”

Reddit’s platform, which has more than a billion posts and more than 16 billion comments (numbers that grow every day thanks to its hundreds of millions of active users), is a gold mine for companies building generative AI, whose models learn from sample content, such as text and images, to generate new, similar content.

But the company could face pushback from users concerned about how it monetizes their data.

It’s instructive to look at Stack Overflow, the Q&A forum for software developers, which recently signed a deal with OpenAI to provide data for training the latter’s models. In protest, some users deleted their top-rated answers to community questions. But Stack Overflow restored the deleted posts and banned those users, saying they violated its terms of service.

Reddit has previously expressed displeasure with an attempt to offer Reddit users greater control over their own data.

Vana, a blockchain-based startup, is trying to launch a data “DAO” (decentralized autonomous organization) that would let Reddit users pool their data and decide together how this combined data is used (or sold). Reddit banned Vana’s subreddit dedicated to discussing the DAO and, in a statement to TechCrunch, accused the company of “exploiting” its data export controls.

TechCrunch