On Wednesday, Google announced updates to its first-party media-generating AI models on the Vertex AI cloud platform. The updates include the preview launch of Lyria, Google’s text-to-music model, and enhancements to the Veo 2 video-creation model. Google has also introduced a voice-cloning feature, powered by its Chirp 3 audio understanding model, that is limited to “allow-listed” users. In addition, Google says the Imagen 3 image generator now delivers “significantly” improved performance.
Google is positioning Lyria as an alternative to royalty-free music libraries. The model lets customers generate songs in a variety of styles and genres, from jazzy piano solos to atmospheric lo-fi tracks. Lyria is available in preview for select customers, and Google aims to tap into growing demand for customizable music in industries such as film and advertising.
Chirp 3 is another focus of the updates. The model can synthesize speech in approximately 35 languages. Its Instant Custom Voice feature, first previewed earlier this year, can clone a voice from just 10 seconds of audio input. Chirp 3, now generally available, also powers a new tool called Transcription with Diarization, which separates and identifies speakers in recordings with multiple participants. To guard against misuse, Google puts users of Instant Custom Voice through a “diligence” process to verify that they have the proper voice usage permissions.
Veo 2 has gained several editing capabilities, including the ability to remove background images, logos, and objects from existing videos. It can also extend video frames, for example to convert landscape footage into portrait format. New controls for camera angles and pacing let Veo 2 create time lapses, drone-style clips, and more. These features are currently available in preview.
Google has also updated the Imagen 3 image generator. The latest improvements increase the model's ability to remove unwanted objects and to reconstruct missing or damaged parts of images.
Google says that all media generated by Imagen, Veo, and Lyria (but not Chirp) is watermarked using its SynthID technology. The company presents this as part of a broader commitment to ensuring that all of its generative AI models have “built-in safeguards” to prevent the creation of harmful content. Despite ongoing discussions regarding the training data used for these models, Google maintains its policy of not disclosing specific datasets, a practice rooted in the complexities of intellectual property rights.
The updates are a move to strengthen Google’s position against competitors such as Amazon, which offers a similar cloud AI platform called Bedrock. With Lyria, Chirp 3, Veo 2, and Imagen 3, Google is targeting enterprise clients that need advanced media generation capabilities.