Google's latest update to its Gemini 1.5 Pro AI model adds enhanced audio processing, allowing the model to understand speech in uploaded audio files without needing a transcript. The tech giant also announced that it is making Gemini 1.5 Pro publicly available for the first time through its Vertex AI platform.
According to Google's announcement blog post, Gemini 1.5 Pro can now listen to audio sources such as earnings calls, or the audio track extracted from a video, and understand the spoken content. This new audio functionality removes the previous requirement for a written transcript.
Google revealed the updates during its recent Google Cloud Next event, where it also announced that Gemini 1.5 Pro will be accessible through Vertex AI, Google's platform for building AI applications. Previously available only internally, the model is now in public preview, letting developers and users build with its advanced capabilities.
Interestingly, Gemini 1.5 Pro is said to surpass even Google's most powerful model, Gemini Ultra, in performance, following complex instructions without any need for fine-tuning. For now, however, the general public still interacts with the technology primarily through the Gemini chatbot rather than through direct access to the models.
In addition to the Gemini upgrades, Google's image generation model Imagen received new inpainting and outpainting features, which let users modify images by adding or removing elements. The company also integrated its SynthID digital watermarking tool across Imagen to help identify AI-generated images.
Looking ahead, Google aims to ground Gemini's responses in up-to-date Google Search data. It will, however, continue to avoid certain topics, such as the 2024 US election, to prevent the spread of misinformation. The company also recently faced criticism for generating historically inaccurate images.