Google's Gemini Can Now Hear You

Google has rebranded its chatbot, Bard, as Gemini. Its most recent model, Gemini 1.5 Pro, was made available to select developers in February of this year. Gemini 1.5 Pro can process text, code, video and, now, uploaded audio, including the audio track of videos. It can listen to, analyze, and extract information from these media without needing a written transcript.

In practical terms, this means customers can use Gemini 1.5 Pro to pull information from earnings calls, recorded interviews, or audio-visual content; in principle, from almost any audio file. Thanks to its one-million-token context window, the model can take in an hour of video, eleven hours of audio, 30,000 lines of code, or more than 700,000 words in a single prompt.

Gemini 1.5 Pro is now available in public preview through the Gemini API in more than 180 countries. The release adds the model's first-ever native audio (speech) understanding capability, along with a new File API that makes uploading and handling files easier.

The update also introduces system instructions and JSON mode, which give developers more control over the model's output: system instructions steer how the model behaves across a conversation, and JSON mode makes it return its responses as JSON.

Google is also offering Gemini 1.5 Pro as a public preview to developers with access to Vertex AI, its Google Cloud platform for building with its models; however, no public beta is currently planned. For now, most consumers interact with Google's AI through the Gemini chatbot.
