Gemini 1.5 has just been released, and with it comes a preview for Gemini 1.5 Pro. Gemini Pro is the model that powers the free version of Gemini, meaning that regular consumers will get to use it for free very soon. If you're excited about what that means, we've got a list of all the things that Google says Gemini 1.5 Pro can do that Gemini 1.0 couldn't.


1 Gemini 1.5 has a significantly larger context window

A context window is how much an LLM can "see"

Source: Google

Gemini 1.5 has a context window that goes up to 10 million tokens in research, and will have up to 1 million tokens for regular consumers. That larger context window will cost money, but the free version of Gemini 1.5 Pro will still come with a 128K context window. For reference, GPT-4 Turbo has a 128K context window too, while the current Gemini Pro and regular GPT-4 both have a context window of 32K. A 1 million token context window is a first of its kind in the industry.

Context windows in artificial intelligence serve as the collective memory that influences the AI's processing. They encapsulate all the inputs necessary for the AI to comprehend a query and formulate an answer. This includes the initial prompt from the user, along with any supplementary context or preceding interactions. The breadth of the context window plays a crucial role in determining the amount of information the model can retain from previous parts of the conversation or text, directly impacting its capability to deliver coherent and pertinent responses.
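To make the idea concrete, here is a minimal sketch of how a chat application might keep a conversation inside a fixed context window. It is not how Gemini itself manages context; the function names, the "about four characters per token" estimate, and the drop-the-oldest-messages policy are all assumptions made for illustration.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, text)


def estimate_tokens(text: str) -> int:
    """Very rough estimate, assuming about 4 characters per token."""
    return max(1, len(text) // 4)


def fit_to_context(messages: List[Message], max_tokens: int) -> List[Message]:
    """Keep the most recent messages whose combined estimate fits in max_tokens."""
    kept: List[Message] = []
    used = 0
    # Walk backwards so the newest messages are kept first.
    for role, text in reversed(messages):
        cost = estimate_tokens(text)
        if used + cost > max_tokens:
            break  # Everything older than this no longer fits.
        kept.append((role, text))
        used += cost
    kept.reverse()  # Restore chronological order.
    return kept


history = [
    ("user", "a" * 400),       # roughly 100 tokens
    ("assistant", "b" * 400),  # roughly 100 tokens
    ("user", "c" * 40),        # roughly 10 tokens
]

small = fit_to_context(history, max_tokens=120)    # the oldest message is dropped
large = fit_to_context(history, max_tokens=1_000)  # everything fits
print(len(small), len(large))
```

With the smaller budget the oldest message falls out of view, which is the "forgetting" that a larger context window postpones.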

How much this benefits a user is entirely dependent on how they use an LLM. If you just want to ask basic questions and not do much else, you won't really benefit. If you use an LLM for coding or other things that may have longer responses, a larger context window can really benefit you. Google says to expect higher latencies at higher context windows, but that's to be expected currently.

2 Gemini 1.5 Pro is better at coding than Gemini 1.0 Ultra

It obviously smokes Gemini 1.0 Pro too

Source: Google

If you use LLMs for coding, then you'll be glad to know that Gemini 1.5 Pro does an even better job at coding than Gemini 1.0 Ultra does, let alone Gemini 1.0 Pro. That's according to Google's technical paper anyway, which says the following.

Gemini 1.5 Pro is our best performing model in code to date, surpassing Gemini 1.0 Ultra on Natural2Code, our internal held-out code generation test set made to prevent web-leakage.

For anyone who uses LLMs for coding, this is big. Gemini Advanced is good at programming, but it can always stand to be better, and I still prefer ChatGPT Plus for anything programming related that I do. If it's better at programming than the Ultra model, then that bodes really, really well for anyone who uses it for programming normally.

3 It can analyze significant amounts of data

That includes 100,000 lines of code, according to Google

Source: Google

As a byproduct of that larger context window, Gemini 1.5 Pro can understand more of what you give it. While you might assume that follows from the larger context window, there's no guarantee that an LLM will respond to larger inputs with the same level of quality as it does to smaller ones. Google has assured people that it's just as capable at responding to larger inputs as it is shorter ones, demonstrating it by asking Gemini for help with a program spanning more than 100,000 lines of code.

With this, Google says that it can reasonably give modifications, suggestions, and help with large amounts of input at once, with the above codebase using more than 800K tokens. That's an absurd number of tokens, and is more than any other LLM can handle currently. As a further demonstration, Google also gave it a 44-minute silent film and asked the AI about specific details in the movie. It was able to respond with answers that were correct.

Gemini 1.5 Pro significantly extends this context length frontier to multiple millions of tokens with almost no degradation in performance, making it possible to process significantly larger inputs. Compared to Claude 2.1 with a 200k token context window, Gemini 1.5 Pro achieves a 100% recall at 200k tokens, surpassing Claude 2.1’s 98%. This 100% recall is maintained up to 530k tokens, and recall is 99.7% at 1M tokens. When increasing from 1M tokens to 10M tokens, the model retains 99.2% recall.

Google also supplied it with the entire 402-page transcript of Apollo 11's communications with ground control, and it handled that just as well. Being able to parse through a lot of data is a major plus in Google's favor, and will help people who are managing large codebases or trawling through considerable amounts of data.

Google explained how it used a Needle In A Haystack evaluation, where a small piece of text containing a particular fact or statement is placed within a long block of text. Gemini 1.5 Pro found it 99% of the time, even in blocks of data that filled the 1 million token context window.
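A Needle In A Haystack test can be sketched in a few lines. The version below is a toy, not Google's evaluation harness: `toy_model` is a hypothetical stand-in for a real LLM call that can only "see" the first 2,000 characters of its input, and the filler text, needle, and depths are made up for illustration.

```python
from typing import Callable, List


def build_haystack(filler: List[str], needle: str, depth: float) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end)."""
    position = round(depth * len(filler))
    return " ".join(filler[:position] + [needle] + filler[position:])


def recall(ask_model: Callable[[str, str], str], filler: List[str],
           needle: str, question: str, answer: str,
           depths: List[float]) -> float:
    """Fraction of needle placements for which the model returns the answer."""
    hits = 0
    for depth in depths:
        context = build_haystack(filler, needle, depth)
        if answer in ask_model(context, question):
            hits += 1
    return hits / len(depths)


def toy_model(context: str, question: str) -> str:
    """Hypothetical stand-in for an LLM with a tiny 2,000-character window."""
    visible = context[:2000]
    marker = "The magic number is "
    start = visible.find(marker)
    if start == -1:
        return "I don't know."
    return visible[start:start + len(marker) + 4]


filler = ["The sky was clear that day."] * 200
score = recall(
    toy_model,
    filler,
    needle="The magic number is 7421.",
    question="What is the magic number?",
    answer="7421",
    depths=[0.0, 0.25, 0.5, 0.75, 1.0],
)
print(f"recall: {score:.0%}")
```

The toy model only finds needles placed early in the text, because anything deeper falls outside its window. Google's claim is that Gemini 1.5 Pro keeps finding the needle wherever it is placed, even when the haystack fills the full window.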

4 It can learn in a conversation

Researchers taught it Kalamang, a language spoken by fewer than 200 people

Source: Google

LLMs aren't capable of everything, especially if they weren't trained on data that resembles an answer to a given prompt. That's why, for small languages, LLMs aren't all that powerful. You can try to teach an LLM a language, but chances are, you'll either fill its context window or it won't be able to adapt. Researchers at Google taught Gemini 1.5 Pro Kalamang, a language with fewer than 200 speakers worldwide, by giving it a grammar manual. They said that the model was able "to translate English to Kalamang at a similar level to a person learning from the same content."

While it's not perfect, this means that Google's LLM will be able to take in information it didn't previously know and apply it to the remainder of the conversation.

5 It should respond faster

That's thanks to the Mixture-of-Experts architecture

Source: Google

LLMs incorporating a Mixture-of-Experts (MoE) architecture process input by routing tokens to specialized neural networks within the system, chosen for their relevance to the task at hand. This architecture allows for a dynamic and efficient approach to answering queries, with the potential for hierarchical structuring where an expert network might itself be an MoE. The selection process involves a routing network that identifies the most suitable expert(s) for each token, resulting in a response generated from the combined expertise of multiple neural networks. Gemini now uses an MoE architecture, which should result in faster answers.

The MoE architecture enhances computational efficiency, particularly beneficial during the model's initial training phase, though it may lead to overfitting during fine-tuning, where the model overly memorizes and replicates training data. Additionally, MoEs can offer faster inference times by activating only a relevant subset of experts for each query, optimizing resource use. However, supporting such sophisticated models necessitates considerable memory resources, as their large number of parameters, often in the billions, requires substantial RAM for effective operation.
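The routing step described above can be sketched in plain Python. Google has not published how Gemini 1.5's experts or router are built, so this is a generic top-k MoE layer with made-up router weights and toy experts, not Gemini's design. The point is that only the selected experts run for a given token.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(scores: Sequence[float]) -> List[float]:
    peak = max(scores)  # Subtract the max for numerical stability.
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(token: Vector, router_weights: List[Vector],
              experts: List[Expert], top_k: int = 2) -> Vector:
    # Router: one score per expert (dot product with that expert's routing vector).
    scores = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    # Keep only the top_k highest-scoring experts for this token.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Turn the chosen experts' scores into mixing weights that sum to 1.
    gates = softmax([scores[i] for i in chosen])
    output = [0.0] * len(token)
    for gate, index in zip(gates, chosen):
        expert_out = experts[index](token)  # Unchosen experts are never called.
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output


calls: List[int] = []  # Records which experts actually ran.


def make_expert(index: int, scale: float) -> Expert:
    """Toy expert that just scales its input (illustrative only)."""
    def expert(token: Vector) -> Vector:
        calls.append(index)
        return [scale * x for x in token]
    return expert


experts = [make_expert(i, float(i + 1)) for i in range(4)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]

output = moe_layer([1.0, 2.0], router_weights, experts, top_k=2)
print("experts used:", sorted(calls))
print("output:", output)
```

In this example four experts exist but only two are evaluated, which is where the inference savings come from. All four still have to be held in memory, which is the RAM cost mentioned above.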

While it's not clear how exactly this will benefit Gemini, it should be able to do inference much faster than its predecessor. Mixtral 8x7B is one such LLM that has incorporated MoE to great success, making a model with the power of a 47B model while only requiring roughly the compute per token of a 12.9B model, though all 47B parameters still need to be held in memory. There are big gains to be had here both in performance and cost savings for Google, which is why I suspect they're using it.
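The Mixtral figures can be unpacked with some back-of-the-envelope arithmetic. This sketch takes the 47B and 12.9B numbers quoted above, uses Mixtral 8x7B's published setup of 8 experts with 2 active per token, and assumes a simplified model in which parameters are either shared by every token or belong to exactly one expert. The 2-bytes-per-parameter memory figure assumes 16-bit weights. None of this describes Gemini, whose configuration Google has not disclosed.

```python
from typing import Tuple


def split_parameters(total_b: float, active_b: float,
                     num_experts: int, active_experts: int) -> Tuple[float, float]:
    """Solve total = shared + num_experts * e and active = shared + active_experts * e.

    All values are in billions of parameters. Returns (shared, per_expert).
    """
    per_expert = (total_b - active_b) / (num_experts - active_experts)
    shared = active_b - active_experts * per_expert
    return shared, per_expert


TOTAL_B = 47.0   # total parameters, from the article (billions)
ACTIVE_B = 12.9  # parameters used per token, from the article (billions)

shared_b, per_expert_b = split_parameters(TOTAL_B, ACTIVE_B, num_experts=8, active_experts=2)

# Share of the model that does work for any single token.
active_fraction = ACTIVE_B / TOTAL_B

# Memory to hold every weight at 2 bytes per parameter (16-bit), in gigabytes.
weights_gb = TOTAL_B * 2

print(f"shared: {shared_b:.2f}B, per expert: {per_expert_b:.2f}B")
print(f"active per token: {active_fraction:.0%} of all parameters")
print(f"weights in memory at 16-bit: {weights_gb:.0f} GB")
```

Under these assumptions, each token touches only a bit over a quarter of the model's parameters, yet the full set of weights still has to sit in memory. That is the trade-off between compute savings and RAM requirements.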

Gemini 1.5 is a massive step for Google

The biggest improvement here is undoubtedly the larger context window, as it enables so many of these improvements. Google even says that Gemini 1.5 Pro outperforms Gemini 1.0 Ultra in a lot of different scenarios, "despite Gemini 1.5 Pro using significantly less training compute and being more efficient to serve." There's a lot to be excited about here, especially if you're a proponent of LLMs or someone who's excited by AI development in general.

When these changes will come to consumers isn't completely clear yet, though developers can already start using Gemini 1.5 Pro today. Gemini 1.5 Ultra is also expected to be in the pipeline, though Google hasn't mentioned anything about it yet. Even so, it's clear that Google wants to be the best AI player in the space.
