Summary

  • Project Astra: Google's response to OpenAI's GPT-4o model at Google I/O.
  • Gemini's multi-modal capabilities let it analyze a room and its context using low-latency agents.
  • Google demonstrates Gemini at I/O, showcasing its ability to understand and relate multiple inputs.

Google I/O is well underway, and AI is undoubtedly the star of the show. OpenAI made a statement by announcing its new GPT-4o model a day before Google I/O, but Google was ready for it. At this year's event, Google showed off its own competitor to OpenAI's multi-modal model, and it's calling it Project Astra.

It's unclear at present whether Project Astra will come to other devices, but it is at least coming to Google Pixel smartphones later this year through the Gemini app.

Project Astra is Google's answer to GPT-4o

It's all thanks to low-latency multiple agents

In a demonstration shared on X (formerly Twitter), Google seemingly responded to OpenAI by showing how Google's Gemini could analyze the room and make a guess as to what was going on, just like what OpenAI demonstrated with ChatGPT. Now the company has shown it off at Google I/O, and it appears every bit as good as its OpenAI counterpart.

Gemini has always been built to be multi-modal, and Google says this is an extension of that initial development principle. It understands multiple inputs, combines them, and relates them to one another, while the longer context window lets Google give the model a broader understanding of a situation, a topic, and more.

Google demonstrated how Project Astra could identify a neighborhood, help a user understand code, and even come up with a band name for a tiger and a dog in one frame. Google also demonstrated it picking out someone's glasses as they walked around. Google said this will be coming to smartphones later this year through the Gemini app, which would include Google Pixel smartphones.

Getting an AI to respond conversationally and quickly to a video feed requires a lot of engineering prowess on Google's end. The company described at its opening keynote how it continuously encodes video frames and speech input into a timeline of events and caches that timeline for efficient recall. It was shown working through an app with a viewfinder on a user's smartphone, understanding the context of its usage and the world around it.
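To make the "timeline of events" idea a little more concrete, here is a minimal Python sketch of a bounded, time-ordered cache. This is purely illustrative and not Google's implementation: the `Event` and `TimelineCache` names, the text `summary` field standing in for an encoded frame or utterance, and the keyword-based `recall` are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since the session started
    modality: str     # "video" or "speech"
    summary: str      # hypothetical stand-in for an encoded frame or utterance


class TimelineCache:
    """Bounded, time-ordered cache of events (hypothetical design)."""

    def __init__(self, max_events: int = 1000) -> None:
        # A deque with maxlen drops the oldest event once the cache is full.
        self._events: Deque[Event] = deque(maxlen=max_events)

    def add(self, event: Event) -> None:
        self._events.append(event)

    def recall(self, keyword: str) -> List[Event]:
        """Return events whose summary mentions the keyword, newest first."""
        needle = keyword.lower()
        return [e for e in reversed(self._events) if needle in e.summary.lower()]

    def __len__(self) -> int:
        return len(self._events)


cache = TimelineCache(max_events=3)
cache.add(Event(0.0, "video", "desk with a red apple"))
cache.add(Event(1.5, "video", "glasses next to a red apple"))
cache.add(Event(2.0, "speech", "what does this code do?"))
cache.add(Event(3.2, "video", "whiteboard with two drawings"))  # evicts the t=0.0 event

hits = cache.recall("glasses")
```

In this toy version, asking about "glasses" returns the cached event from 1.5 seconds into the session, which is roughly the kind of lookup the glasses demo implies. A real system would presumably store learned embeddings rather than text summaries and search by similarity rather than by keyword.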

It's not clear exactly when you'll get to play around with Project Astra, but the company evidently wants to get it into consumers' hands soon. We're excited to see where it goes as the AI race heats up.
