Summary
- Gemini 1.5 Pro increases the potential context window from 32K tokens to 1 million, allowing it to process larger amounts of data and answer questions more effectively.
- The introduction of Mixture of Experts (MoE) in Gemini 1.5 Pro improves computational efficiency during training and offers potential for faster inference times.
- Gemini 1.5 Pro demonstrates significant improvements in multimodal prompt understanding and logical reasoning, making it more capable in problem-solving tasks.
Google has been hard at work on Gemini since its rebrand, releasing Gemini Advanced powered by Gemini 1.0 Ultra. The underlying Gemini Pro model that powers the base version of Gemini had already been available in Bard for quite a while, and now Google is upgrading it. Gemini 1.5 is here, and it's coming to the Pro model in the free version soon. There are some pretty big architectural changes that Google says will put it on par with 1.0 Ultra in terms of performance.
Gemini 1.5 is a massive step forward
It beats GPT-4 Turbo in some key areas
First and foremost, one of the biggest steps that Gemini 1.5 Pro has taken is increasing the context window from 32K tokens to 1 million tokens, which is a huge increase. The context window is essentially how much the LLM can "see" at any given time. For comparison, GPT-4 Turbo only has a 128K context window, and Google says that Gemini 1.5 Pro can go up to 10 million tokens in research contexts.
This expanded context window has implications for how much data it can take in at any given time. For example, Google demonstrated how Gemini 1.5 Pro can take in the Apollo 11 Air-to-Ground voice transcript, which is 402 pages, and answer questions about it with ease.
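To make the comparison concrete, here is a minimal Python sketch that checks which of the context windows quoted above could hold a prompt of a given size. The names `CONTEXT_WINDOWS`, `estimate_tokens`, and `models_that_fit` are hypothetical, treating "K" as 1,000 is an assumption, and the four-characters-per-token heuristic is only a rough rule of thumb, not how any of these models actually counts tokens.

```python
# Context window sizes quoted in this article, in tokens.
# Assumption: "K" is treated as 1,000; the real limits may differ slightly.
CONTEXT_WINDOWS = {
    "Previous Gemini Pro": 32_000,
    "GPT-4 Turbo": 128_000,
    "Gemini 1.5 Pro": 1_000_000,
    "Gemini 1.5 Pro (research)": 10_000_000,
}


def estimate_tokens(text, chars_per_token=4.0):
    """Very rough token estimate. chars_per_token is a hypothetical heuristic."""
    return int(len(text) / chars_per_token) + 1


def models_that_fit(token_count, windows=CONTEXT_WINDOWS):
    """Return the names of the windows large enough to hold token_count tokens."""
    return [name for name, limit in windows.items() if token_count <= limit]


if __name__ == "__main__":
    # A 200,000-token prompt is too large for the two smaller windows.
    for name in models_that_fit(200_000):
        print(name)
```

A real application would count tokens with the model's own tokenizer rather than a character heuristic.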
Gemini 1.5 Pro has another major improvement in its Mixture of Experts (MoE) architecture, which we've already seen with Mixtral 8x7B. Mixtral employs an MoE architecture to process incoming tokens, directing them to specialized neural networks within the system based on their relevance. The Mixtral 8x7B model features eight such experts. Notably, it's possible to structure these experts in a hierarchical manner, where an expert itself may be another MoE. Upon receiving a prompt, Mixtral 8x7B utilizes a routing network to determine the most suitable experts for each token. In this setup, each token is evaluated by two experts, and the final response is a blend of their outputs.
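The routing step described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not Mixtral's or Gemini's actual implementation: each "expert" is a single random linear layer, the router is a simple dot product per expert, and the two selected scores are normalized with a softmax to produce the blending weights. All function names here are hypothetical.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(dim, rng):
    """Toy expert: one random linear layer (an assumption for illustration)."""
    weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    def expert(token):
        return [sum(w * x for w, x in zip(row, token)) for row in weights]

    return expert


def moe_forward(token, router_weights, experts, top_k=2):
    """Route one token to its top_k experts and blend their outputs."""
    # The router produces one score per expert for this token.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    # Keep only the top_k highest-scoring experts.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Normalize the chosen scores so the blend weights sum to 1.
    gates = softmax([logits[i] for i in chosen])
    output = [0.0] * len(token)
    for gate, index in zip(gates, chosen):
        expert_out = experts[index](token)  # only the chosen experts run
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output, chosen, gates


if __name__ == "__main__":
    rng = random.Random(0)
    dim, n_experts = 4, 8
    experts = [make_expert(dim, rng) for _ in range(n_experts)]
    router = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_experts)]
    token = [rng.uniform(-1, 1) for _ in range(dim)]
    _, chosen, gates = moe_forward(token, router, experts)
    print("experts used:", chosen, "blend weights:", gates)
```

Only two of the eight experts do any work for a given token here, which is where the efficiency of the approach comes from.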
The MoE approach offers several benefits, particularly in terms of computational efficiency during the initial training phase, although it can be prone to overfitting during the fine-tuning stage. Overfitting occurs when a model fits its training data too closely, so it generalizes poorly to new inputs and tends to reproduce that data in its responses.
Another advantage of MoEs is their potential for faster inference times, as they activate only a subset of experts for each query. However, accommodating a model like Mixtral, with its 47 billion parameters, requires substantial RAM. The model's overall parameter count is 47 billion rather than 56 billion because it shares many parameters across all experts and does not simply multiply the 7 billion parameters of each expert by eight.
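The arithmetic behind that 47 billion figure can be worked through with a back-of-the-envelope sketch. It rests on a simplifying assumption that is not from Mixtral's documentation: each nominal 7 billion parameter "expert" is treated as a shared part (used by every expert) plus an expert-only part. The function names are hypothetical, and the numbers are the rounded ones quoted above.

```python
def split_parameters(total_b, dense_b, n_experts):
    """Solve for the shared and expert-only parameter counts (in billions).

    Assumes: shared + per_expert = dense_b
             shared + n_experts * per_expert = total_b
    """
    per_expert = (total_b - dense_b) / (n_experts - 1)
    shared = dense_b - per_expert
    return shared, per_expert


def active_parameters(shared, per_expert, experts_per_token):
    """Parameters actually used for one token, in billions."""
    return shared + experts_per_token * per_expert


if __name__ == "__main__":
    shared, per_expert = split_parameters(total_b=47, dense_b=7, n_experts=8)
    naive_total = 8 * 7  # what you'd get if nothing were shared
    active = active_parameters(shared, per_expert, experts_per_token=2)
    print(f"naive total: {naive_total}B")
    print(f"shared: {shared:.2f}B, expert-only: {per_expert:.2f}B")
    print(f"active per token with 2 experts: {active:.2f}B")
```

Under this simplification, all 47 billion parameters still need to sit in memory, but only around 12.7 billion are used for any single token, which is why inference can be faster even though the RAM requirement stays high.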
While the above explains how MoE works for Mixtral, the same architectural benefits of an MoE should be present in Gemini too, undoubtedly alongside some other changes brought in by Google. Google didn't reveal how many parameters are powering Gemini 1.5 Pro, but we expect that an MoE will still make it significantly more efficient to run.
Reasoning and problem solving improvements
Gemini should be better at logic and understanding
Google demonstrated in its reveal of Gemini 1.5 Pro that it's capable of significantly improved multimodal prompt understanding and logical reasoning. When given a 44-minute silent film in Google's demonstration, it can identify plot points and events.
Those improvements aren't just for multimodal prompts, though. Google says that when given a prompt of over 100,000 lines of code, Gemini can still provide modifications, solutions, and other changes based on the user's requests.
Gemini 1.5 Pro will be available soon
It's rolling out to developers first
If you're a developer who's been using Google's AI Studio or Vertex AI, a limited preview is available for you to try Gemini 1.5 Pro out now. Google says that it can outperform Gemini 1.0 Pro in 87% of its benchmarks and "performs at a broadly similar level" when compared to Gemini 1.0 Ultra. Interestingly, Google also says that the model can learn in-conversation, without needing any fine-tuning. A context window this large is a first of its kind, and it enables the model to do significantly more than was previously possible with an LLM.
When it reaches its wider release, Google says that Gemini 1.5 Pro will be available with a 128K context window. It also says that it plans to introduce "pricing tiers" starting at the standard 128K context window and going up to 1 million tokens. Early testers are advised to expect higher latency with the higher context window limit, though there is no cost for it during the testing stage.
For now, normal users who want to give it a try will have to wait. Google isn't saying when it will roll out, aside from "soon," which doesn't really say a whole lot. Given that developers are already able to try it out now, it seems likely that it will be available in the coming weeks or months for most people.
