Google has announced its latest AI model, Gemini, which was designed from the ground up to be multimodal, so it can interpret information in a variety of formats – text, code, audio, image and video.
According to the company, the typical approach to creating a multimodal model involves training separate components for different formats of information and then combining them. What sets Gemini apart is that it is trained on different formats from the start and then refined with additional multimodal data.
“This helps Gemini seamlessly understand and reason about all kinds of inputs from the ground up, far better than existing multimodal models — and its capabilities are state of the art in nearly every domain,” Sundar Pichai, CEO of Google and Alphabet, and Demis Hassabis, CEO and co-founder of Google DeepMind, wrote in a blog post.
Google also explained that the new model has quite sophisticated reasoning capabilities that allow it to understand complex written and visual information, making it “adept at discovering knowledge that can be difficult to discern among vast amounts of data.”
For example, it can read hundreds of thousands of documents and extract insights from them, which could lead to new discoveries in certain fields.
Its multimodal nature makes it particularly suited to understanding and answering questions in complex fields such as mathematics and physics.
Gemini 1.0 comes in three sizes: Ultra, Pro, and Nano, listed from largest to smallest.
According to Google’s initial benchmarking, Gemini Ultra exceeds current state-of-the-art results on 30 of the 32 widely used academic benchmarks in large language model research and development. Notably, Google reports that Gemini Ultra surpassed human expert performance on massive multitask language understanding (MMLU), a benchmark covering 57 subjects such as math, physics, history, law, medicine, and ethics.
Gemini Pro has been integrated into Bard, which Google describes as the most substantial update to Bard since its launch. The Pixel 8 Pro now uses Gemini Nano to power features such as Summarize in the Recorder app and Smart Reply in Gboard, Google’s keyboard.
Over the coming months, Gemini is set to extend its presence to additional Google products, including Search, Ads, Chrome, and Duet AI.
Starting December 13, developers can access Gemini Pro through the Gemini API, available in Google AI Studio or Google Cloud Vertex AI.