VOOZH about

URL: https://www.analyticsvidhya.com/blog/2024/05/gpt-4o-vs-gemini/

⇱ GPT-4o vs Gemini: Comparing Two Powerful Multimodal AI Models


India's Most Futuristic AI Conference Is Back – Bigger, Sharper, Bolder

  • d
  • :
  • h
  • :
  • m
  • :
  • s

GPT-4o vs Gemini: Comparing Two Powerful Multimodal AI Models

Himanshi Singh Last Updated : 15 May, 2024
7 min read

Introduction

With the release of GPT-4o, this model is getting huge attention for its multimodal capabilities. GPT-4o is known for its advanced language processing skills and has been enhanced to interpret and generate visual content.  However, we shouldn’t overlook Gemini, a model that has been highly praised for its multimodal abilities long before GPT-4o arrived. Gemini excels at combining image recognition with strong language understanding, making it a worthy competitor to GPT-4o.

In this article, we’ll compare GPT-4o vs Gemini by examining how well they perform in different tasks. By looking at their performance, we aim to determine which model is superior. This comparison is crucial because the ability to work with both text and images is valuable for many applications, such as automatic content creation and data analysis.

GPT -4o vs Gemini

Let’s compare GPT-4o and Gemini to see which one performs better.

Calculate Sum of Numbers

A very basic thing for a multimodal LLM would be to correctly identify the text/numbers written in a given image. So here we’ve provided an image with certain text and have asked our 2 competitors, GPT-4o and Gemini, to calculate the sum of the given numbers in the image. Let’s see who’s going to win this round.

GPT -4o

Here you can see, GPT-4o provided the correct output. Easy peasy.

Gemini

Not sure what Gemini understood with the given prompt. Although it was pretty simple, Gemini couldn’t understand the context.

Result

GPT-4o won!

Code Game Provided in the Image Attached in Python

In this round, we’ve provided an image of a tic-tac-toe game but we’ve not mentioned specifically in the prompt. The task for our models is to first identify the game and then write a python code to implement it. 

GPT-4o

def print_board(board):
for row in board:
     print(" | ".join(row))
     print("-" * 9)
def check_winner(board):
# Check rows
for row in board:
     if row[0] == row[1] == row[2] != " ":
         return row[0]
# Check columns
for col in range(3):
     if board[0][col] == board[1][col] == board[2][col] != " ":
         return board[0][col]
# Check diagonals
if board[0][0] == board[1][1] == board[2][2] != " ":
     return board[0][0]
if board[0][2] == board[1][1] == board[2][0] != " ":
     return board[0][2]
return None
def is_board_full(board):
for row in board:
     if " " in row:
         return False
return True
def main():
board = [
     [" ", " ", "O"],
     ["O", "X", " "],
     [" ", " ", " "]
]
]
current_player = "X"  # X's turn to play
print("Initial Board:")
print_board(board)
while True:
     row = int(input(f"Player {current_player}, enter the row (0, 1, 2): "))
     col = int(input(f"Player {current_player}, enter the column (0, 1, 2): "))
     if board[row][col] == " ":
         board[row][col] = current_player
     else:
         print("Cell already taken! Try again.")
         continue
     print_board(board)
     winner = check_winner(board)
     if winner:
         print(f"Player {winner} wins!")
         break
     if is_board_full(board):
         print("It's a draw!")
         break
     current_player = "X" if current_player == "O" else "O"
if __name__ == "__main__":
main()

Output:

The code provided by GPT-4o gave this above output. Great job in identifying the game correctly and creating a grid as well. Although one of the β€œo” is misplaced in the given output but otherwise we’ve received a fully functional tic-tac-toe game. 

Gemini

def display_board(board):
  """Displays the current board state"""
  for row in board:
for cell in row:
   print(cell, end=" ")
print()
def is_valid_move(board, row, col)
  """Checks if a move is valid (i.e. the cell is empty)"""
  return board[row][col] == " "
def make_move(board, player, row, col):
  """Places a player's mark on the board"""
  board[row][col] = player
def is_winner(board, player):
  """Checks if a player has won"""
  # Check rows
  for row in board:
if all(cell == player for cell in row):
   return True
  # Check columns
  for col in range(len(board[0])):
if all(board[row][col] == player for row in range(len(board))):
   return True
  # Check diagonals
  if all(board[row][row] == player for row in range(len(board))):
return True
  if all(board[row][len(board) - row - 1] == player for row in range(len(board))):
return True
  return False
def is_board_full(board):
  """Checks if the board is full"""
  for row in board:
for cell in row:
   if cell == " ":
     return False
  return True
def main():
  """Main game loop"""
  board = [[" " for _ in range(3)] for _ in range(3)]
  current_player = "X"
  while True:
display_board(board)
# Get player move
while True:
   row = int(input("Player {} enter row (1-3): ".format(current_player))) - 1
   col = int(input("Player {} enter column (1-3): ".format(current_player))) - 1
   if is_valid_move(board, row, col):
     break
   else:
     print("Invalid move. Please try again.")
# Make move and check for winner
make_move(board, current_player, row, col)
if is_winner(board, current_player):
   display_board(board)
   print("Player {} wins!".format(current_player))
   break
# Check for tie
if is_board_full(board):
   display_board(board)
   print("It's a tie!")
   break
# Switch player
current_player = "O" if current_player == "X" else "X"
if __name__ == "__main__":
  main()

Output:

Now, when we run the code provided by Gemini, then it identified the game clearly but as you can see no we did not get any grid. This made it difficult to play the game. Below you can check how the game looks without the grid.

Result

GPT-4o won!

Generate Python Code to Recreate Bar Chart using Matplotlib

In this round we’ve given an image of a bar chart to both our models. They need to analyze the chart and generate the Python code required to recreate it using Matplotlib. Additionally, they should ensure that the code produces the same bar chart when run. Let’s see which model wins this round. 

GPT-4o

import matplotlib.pyplot as plt
import numpy as np
# Data
quarters = ['Q1', 'Q2', 'Q3', 'Q4']
sales_A = [10, 15, 17, 12]
sales_B = [10, 15, 20, 15]
# Plotting
fig, ax = plt.subplots()
bar_width = 0.35
index = np.arange(len(quarters))
bar1 = ax.bar(index, sales_A, bar_width, label='A')
bar2 = ax.bar(index, sales_B, bar_width, bottom=sales_A, label='B')
ax.set_xlabel('Quarter')
ax.set_ylabel('Sales')
ax.set_title('Sales by Product & Quarter')
ax.set_xticks(index)
ax.set_xticklabels(quarters)
ax.legend()
plt.show()

Output:

Gemini

import matplotlib.pyplot as plt
# Data
products = ["A", "B"]
sales_q1 = [30, 25]
sales_q2 = [40, 35]
sales_q3 = [20, 30]
sales_q4 = [10, 15]
# Create a bar chart
fig, ax = plt.subplots()
width = 0.35  # the width of the bars
index = range(len(products))  # the x-axis positions for the bars
bar1 = plt.bar(index, sales_q1, width, label='Q1')
bar2 = plt.bar([p + width for p in index], sales_q2, width, label='Q2')
bar3 = plt.bar([p + width * 2 for p in index], sales_q3, width, label='Q3')
bar4 = plt.bar([p + width * 3 for p in index], sales_q4, width, label='Q4')
# Add labels and title
plt.xlabel('Products')
plt.ylabel('Sales')
plt.title('Sales by Product & Quarter')
# X-axis labels
plt.xticks([p + 1.5 * width for p in index], products)
# Legend
plt.legend()
plt.show()

Output:

Result

GPT-4o won!

Explain Code and Provide the Output

Here we’ve given an image input to both our models and they have to understand the code written in the provided screenshot and additionally provide the output for the same. Let’s see how they perform on this test.

GPT-4o

Provided a very long summary but here’s the summary and output:

Gemini

Got the below explanation but no output for the code.

Point goes to GPT-4o for properly understanding the prompt and providing correct output as well.

Result

GPT-4o won!

Identify Buttons and Input Fields in the Given Design

In this prompt the models were asked for a detailed analysis of a user interface (UI) design to locate and describe interactive elements such as buttons and input fields. The goal is to specify what each element is, its purpose, and any associated labels or features.

GPT-4o

Impressive how accurately GPT-4o can identify items in a design with a clear understanding of each button, checkbox and textbox. 

Gemini

Gemini got the input fields correct but there was some uncertainty in the submit button which was square in shape. 

Result

GPT-4o won!

GPT -4o vs Gemini: Final Verdict

TasksWinner
Calculating the sum of numbers from an image.GPT-4o
Writing Python code for a tic-tac-toe game based on an image.GPT-4o
Creating Python code to recreate a bar chart from an image.GPT-4o
Explaining code from a screenshot and providing the output.GPT-4o
Identifying buttons and input fields in a user interface design.GPT-4o

In this head-to-head comparison, GPT-4o clearly outperformed Gemini. GPT-4o consistently provided accurate and detailed results across all tasks, from calculating sums and coding games to generating bar charts and analyzing UI designs. It showed a strong ability to understand and process both text and images effectively.

Gemini, on the other hand, struggled in several areas. While it performed adequately in some tasks, it often failed to provide detailed explanations or accurate coding. Its performance was inconsistent, highlighting its limitations compared to GPT-4o.

Overall, GPT-4o proved to be the more reliable and versatile model. Its superior performance across multiple tasks makes it the clear winner in this comparison. If you need a model that can handle both text and images with high accuracy, GPT-4o is the better choice. In this article we explored GPT-4o vs Gemini.

I’m a data lover who enjoys finding hidden patterns and turning them into useful insights. As the Manager - Content and Growth at Analytics Vidhya, I help data enthusiasts learn, share, and grow together. 

Thanks for stopping by my profile - hope you found something you liked :)

Login to continue reading and enjoy expert-curated content.

Free Courses

AWS Data Querying with S3 & Athena

Master AWS data storage & querying with S3, Athena, Glue, RDS, and Redshift.

Foundations of LangGraph

Build reliable AI workflows using LangGraph state, memory, & agent

Claude 4.5: Smarter, Faster & More Human AI

Build real-world AI workflow with Claude 4.5 Opus using smart, human-like AI

NotebookLM Essentials to Pro: The Complete Practical Guide

Your complete NotebookLM guide to faster learning, smarter research, and pow

Gemini 3: The AI That Thinks, Sees and Creates

Learn Gemini 3 through hands on demos, real apps, and multimodal AI projects

Responses From Readers

Caio Firme

Thanks for this comparative analysis. I have just one question: did you use Gemini Ultra 1.0 or Gemini 1.0 Pro in your analysis?

123 1
Himanshi Singh

Hi, this was Gemini Pro since it is free to use and GPT-4o is also available with limited use for free users.

123 456
partial_andy405

Thanks for the article. It was very interesting. I do think it would be useful to compare GPT-4.0 with Gemini Ultra for a more comprehensive evaluation. Also, it’s worth you clarifying at the start of the article which Gemini model was used and its limitations compared to the Ultra version.

Flagship Programs

GenAI Pinnacle Program| GenAI Pinnacle Plus Program| AI/ML BlackBelt Program| Agentic AI Pioneer Program

Free Courses

Generative AI| DeepSeek| OpenAI Agent SDK| LLM Applications using Prompt Engineering| DeepSeek from Scratch| Stability.AI| SSM & MAMBA| RAG Systems using LlamaIndex| Building LLMs for Code| Python| Microsoft Excel| Machine Learning| Deep Learning| Mastering Multimodal RAG| Introduction to Transformer Model| Bagging & Boosting| Loan Prediction| Time Series Forecasting| Tableau| Business Analytics| Vibe Coding in Windsurf| Model Deployment using FastAPI| Building Data Analyst AI Agent| Getting started with OpenAI o3-mini| Introduction to Transformers and Attention Mechanisms

Popular Categories

AI Agents| Generative AI| Prompt Engineering| Generative AI Application| News| Technical Guides| AI Tools| Interview Preparation| Research Papers| Success Stories| Quiz| Use Cases| Listicles

Generative AI Tools and Techniques

GANs| VAEs| Transformers| StyleGAN| Pix2Pix| Autoencoders| GPT| BERT| Word2Vec| LSTM| Attention Mechanisms| Diffusion Models| LLMs| SLMs| Encoder Decoder Models| Prompt Engineering| LangChain| LlamaIndex| RAG| Fine-tuning| LangChain AI Agent| Multimodal Models| RNNs| DCGAN| ProGAN| Text-to-Image Models| DDPM| Document Question Answering| Imagen| T5 (Text-to-Text Transfer Transformer)| Seq2seq Models| WaveNet| Attention Is All You Need (Transformer Architecture) | WindSurf| Cursor

Popular GenAI Models

Llama 4| Llama 3.1| GPT 4.5| GPT 4.1| GPT 4o| o3-mini| Sora| DeepSeek R1| DeepSeek V3| Janus Pro| Veo 2| Gemini 2.5 Pro| Gemini 2.0| Gemma 3| Claude Sonnet 3.7| Claude 3.5 Sonnet| Phi 4| Phi 3.5| Mistral Small 3.1| Mistral NeMo| Mistral-7b| Bedrock| Vertex AI| Qwen QwQ 32B| Qwen 2| Qwen 2.5 VL| Qwen Chat| Grok 3

AI Development Frameworks

n8n| LangChain| Agent SDK| A2A by Google| SmolAgents| LangGraph| CrewAI| Agno| LangFlow| AutoGen| LlamaIndex| Swarm| AutoGPT

Data Science Tools and Techniques

Python| R| SQL| Jupyter Notebooks| TensorFlow| Scikit-learn| PyTorch| Tableau| Apache Spark| Matplotlib| Seaborn| Pandas| Hadoop| Docker| Git| Keras| Apache Kafka| AWS| NLP| Random Forest| Computer Vision| Data Visualization| Data Exploration| Big Data| Common Machine Learning Algorithms| Machine Learning| Google Data Science Agent
πŸ‘ Av Logo White

Continue your learning for FREE

Forgot your password?
πŸ‘ Av Logo White

Enter OTP sent to

Edit

Wrong OTP.

Enter the OTP

Resend OTP

Resend OTP in 45s

πŸ‘ Popup Banner
πŸ‘ AI Popup Banner