Google advances AI visual generation with Veo 2, Imagen 3, and Whisk

Google DeepMind has unveiled the next generation of its artificial intelligence (AI) video generation model, Veo 2, positioning it as a direct competitor to OpenAI's Sora. The company has simultaneously upgraded its image generation model, Imagen 3, and introduced Whisk, a new AI-powered image blending tool.

According to reports from TechCrunch and VentureBeat, Veo 2 exceeds Sora on both resolution and maximum clip length: it can produce video at up to 4K resolution and more than two minutes long, compared with Sora's 1080p resolution and 20-second limit.

The new model offers finer control over camera positioning and movement, so users can capture objects and people from a range of angles. It also models motion, fluid dynamics, and the behavior of light more accurately, producing more physically realistic footage.

Eli Collins, vice president of product management at DeepMind, explained that Veo 2 is trained on high-quality videos paired with textual descriptions. The model also applies Google's SynthID watermarking technology, which marks output as AI-generated to help mitigate the risk of deepfakes.

Veo 2 is currently available to a limited group of testers through VideoFX in Google Labs, with access granted via a waitlist. Initial outputs are restricted to 720p resolution and 8-second clips. The company plans to bring Veo 2 to its Vertex AI platform at a later date.

DeepMind acknowledged that although Veo 2 hallucinates less often than earlier models, errors such as extra fingers or unexpected objects can still appear. The team says coherence and consistency still need improvement and is actively gathering feedback from artists and producers.

Early adoption has begun among YouTube creators, who are using VideoFX to create backgrounds for YouTube Shorts, streamlining their content creation process.

Imagen 3 advances

Alongside Veo 2, DeepMind introduced upgrades to Imagen 3, now available on Google Labs' ImageFX. The enhanced version supports multiple artistic styles, including impressionism, anime, and photorealism.

The company reports that Imagen 3 demonstrates improved prompt interpretation, producing images that more closely align with user intent while featuring richer details and textures.

Whisk debuts

Whisk, Google's newest experimental tool, lets users specify a subject, a scene, and a style using text inputs or uploaded images. The system blends all three elements into a single new image, which users can then refine with additional text descriptions.

The interface also includes a dice icon that fills the subject, scene, and style slots with randomly generated images.

Whisk relies on both Gemini and Imagen 3: Gemini writes textual descriptions of the uploaded images, and Imagen 3 uses those descriptions to recombine the subjects, scenes, and styles into a new image.