Google Veo, a major advance in AI-generated video, debuts at Google I/O 2024

Google is taking aim at OpenAI’s Sora with Veo, an AI model that can create 1080p video clips about a minute long from a text prompt.

Unveiled Tuesday at Google’s I/O 2024 developer conference, Veo can capture different visual and cinematic styles, including landscape shots and time-lapses, and make edits and adjustments to already-generated footage.

“We’re exploring features like storyboarding and generating longer scenes to see what Veo can do,” Demis Hassabis, head of Google DeepMind, Google’s AI research and development lab, told reporters during a virtual roundtable. “We’ve made incredible progress in video.”

Veo
Image credits: Google

Veo builds on Google’s early commercial work in video generation, previewed in April, which leveraged the company’s Imagen 2 family of image generation models to create looping video clips.

But unlike the Imagen 2-based tool, which could only create low-resolution videos a few seconds long, Veo appears to be competitive with today’s leading video generation models: not only Sora, but also models from startups like Pika, Runway and Irreverent Labs.

During a briefing, Douglas Eck, who leads research efforts at DeepMind on generative media, showed me some hand-picked examples of what Veo can do. One in particular – an aerial view of a busy beach – demonstrated Veo’s strengths over competing video models, he said.

“The detail of all the swimmers on the beach has proven hard for image and video generation models, with that many moving figures,” he said. “If you look closely, the surf looks pretty good. And the sense of the prompt word ‘lively,’ I would argue, is captured by all the people: the bustling waterfront filled with sunbathers.”

Veo was trained on lots of footage. That’s generally how it works with generative AI models: fed example after example of some form of data, the models pick up on patterns in the data that enable them to generate new data (videos, in Veo’s case).

Where does the footage used to train Veo come from? Eck wouldn’t say specifically, but he admitted that some of it might come from Google’s own YouTube.

“Google models may be trained on certain YouTube content, but always in accordance with our agreement with YouTube creators,” he said.

The “agreement” part may technically be true. But it’s also true that, given YouTube’s network effects, creators have no choice but to play by Google’s rules if they hope to reach the widest possible audience.

A New York Times article from April revealed that Google expanded its terms of service last year, in part to allow the company to leverage more data to train its AI models. Under the old ToS, it was unclear whether Google could use YouTube data to build products beyond the video platform. Not so under the new terms, which loosen the reins considerably.

Google is far from the only tech giant leveraging large amounts of user data to train models internally. (See: Meta.) But what’s sure to disappoint some creators is Eck’s insistence that Google is setting the “gold standard” here, in terms of ethics.

“The solution to this [training data] challenge will be found by bringing all stakeholders together to figure out what the next steps are,” he said. “Until we take these steps with stakeholders (we’re talking about the film industry, the music industry, the artists themselves), we won’t rush.”

However, Google has already made Veo available to some creators, including Donald Glover (AKA Childish Gambino) and his creative agency Gilga. (Like OpenAI with Sora, Google positions Veo as a tool for creatives.)

Eck noted that Google provides tools for webmasters to prevent the company’s bots from scraping training data from their websites. But the settings don’t apply to YouTube. And Google, unlike some of its competitors, doesn’t offer a mechanism for creators to remove their work from its training datasets after scraping.

I also asked Eck about regurgitation, which in the generative AI context refers to when a model generates a mirror copy of a training example. Tools like Midjourney have been found to spit out exact stills from movies like “Dune,” “Avengers” and “Star Wars” when given a timestamp, creating a potential legal minefield for users. OpenAI reportedly went so far as to block trademarks and creators’ names in prompts for Sora to try to sidestep copyright challenges.

So what steps has Google taken to mitigate the risk of regurgitation with Veo? Eck didn’t have an answer, beyond saying that the research team implemented filters for violent and explicit content (so no porn) and is using DeepMind’s SynthID technology to mark videos from Veo as AI-generated.

“We are going to strive, for something as important as the Veo model, to gradually release it to a small set of stakeholders with whom we can work very closely to understand the implications of the model, and only then expand to a wider group,” he said.

Eck had more to share about the technical details of the model.

Eck described Veo as “quite controllable” in the sense that the model understands camera movements and visual effects from prompts reasonably well (think descriptors like “pan,” “zoom” and “explosion”). And, like Sora, Veo has some grasp of physics, such as fluid dynamics and gravity, which contributes to the realism of the videos it generates.

Veo also supports masked editing for changes to specific areas of a video and can generate videos from a still image, similar to generative models like Stability AI’s Stable Video. Perhaps most intriguingly, given a sequence of prompts that together tell a story, Veo can generate longer videos: videos beyond a minute in length.

This is not to say that Veo is perfect. Reflecting the limitations of current generative AI, objects in Veo’s videos disappear and reappear without much explanation or consistency. And Veo often gets its physics wrong – for example, cars will reverse in inexplicable and impossible ways at the drop of a hat.

That’s why Veo will remain behind a waitlist on Google Labs, the company’s portal for experimental tech, for the foreseeable future, inside a new front end for generative AI video creation and editing called VideoFX. As it improves, Google aims to bring some of the model’s capabilities to YouTube Shorts and other products.

“This is a work in progress, very experimental… there’s still a lot more to do than what’s been done here,” Eck said. “But I think it’s kind of the raw material for doing something really great in film.”

TechCrunch