India's Most Futuristic AI Conference Is Back – Bigger, Sharper, Bolder

Reading list

Overview of generative AI applications and their impact

Introduction to LangChain, ChatGPT and Gemini Pro

What are Large Language Models?GPT models Mistral Llama Gemini How to build diffferent LLM AppIications?

Introduction to Prompt Engineering Best Practices and Guidelines for Prompt Engineering N shot prompting Chain of Thought Tree of Thoughts Skeleton of Thoughts Chain of Emotion

Introduction to Finetuning LLMs Parameter-Efficient Finetuning (PEFT)LORA QLORA using Unsloth using Huggingface

What do you mean by Training LLMs from Scratch?

Intro to the LangChain Ecosystem Core Components of LangChain Applications of LCEL Chains RAG using LangChain LangGraph LangSmith

Introduction to RAG systems Evaluation of RAG systems

Getting Started with LlamaIndex Components of LlamaIndex Advanced approaches for powerful RAG system

Introduction to Stable Diffusion Generating image using Stable diffusion Diffusion models Prompt Engineering Concepts for Stable Diffusion MidJourney Understanding Dalle 3

GPT-4o vs Gemini: Comparing Two Powerful Multimodal AI Models

👁 Himanshi Singh

Himanshi Singh Last Updated : 15 May, 2024

7 min read

Introduction

With the release of GPT-4o, this model is getting huge attention for its multimodal capabilities. GPT-4o is known for its advanced language processing skills and has been enhanced to interpret and generate visual content. However, we shouldn’t overlook Gemini, a model that has been highly praised for its multimodal abilities long before GPT-4o arrived. Gemini excels at combining image recognition with strong language understanding, making it a worthy competitor to GPT-4o.

In this article, we’ll compare GPT-4o vs Gemini by examining how well they perform in different tasks. By looking at their performance, we aim to determine which model is superior. This comparison is crucial because the ability to work with both text and images is valuable for many applications, such as automatic content creation and data analysis.

GPT -4o vs Gemini

Let’s compare GPT-4o and Gemini to see which one performs better.

Calculate Sum of Numbers

A very basic thing for a multimodal LLM would be to correctly identify the text/numbers written in a given image. So here we’ve provided an image with certain text and have asked our 2 competitors, GPT-4o and Gemini, to calculate the sum of the given numbers in the image. Let’s see who’s going to win this round.

👁 Calculate Sum of Numbers

GPT -4o

👁 GPT-4o

Here you can see, GPT-4o provided the correct output. Easy peasy.

Gemini

👁 Gemini

Not sure what Gemini understood with the given prompt. Although it was pretty simple, Gemini couldn’t understand the context.

Result

GPT-4o won!

Code Game Provided in the Image Attached in Python

In this round, we’ve provided an image of a tic-tac-toe game but we’ve not mentioned specifically in the prompt. The task for our models is to first identify the game and then write a python code to implement it.

👁 GPT-4o vs Gemini

GPT-4o

def print_board(board):
for row in board:
     print(" | ".join(row))
     print("-" * 9)
def check_winner(board):
# Check rows
for row in board:
     if row[0] == row[1] == row[2] != " ":
         return row[0]
# Check columns
for col in range(3):
     if board[0][col] == board[1][col] == board[2][col] != " ":
         return board[0][col]
# Check diagonals
if board[0][0] == board[1][1] == board[2][2] != " ":
     return board[0][0]
if board[0][2] == board[1][1] == board[2][0] != " ":
     return board[0][2]
return None
def is_board_full(board):
for row in board:
     if " " in row:
         return False
return True
def main():
board = [
     [" ", " ", "O"],
     ["O", "X", " "],
     [" ", " ", " "]
]
]
current_player = "X"  # X's turn to play
print("Initial Board:")
print_board(board)
while True:
     row = int(input(f"Player {current_player}, enter the row (0, 1, 2): "))
     col = int(input(f"Player {current_player}, enter the column (0, 1, 2): "))
     if board[row][col] == " ":
         board[row][col] = current_player
     else:
         print("Cell already taken! Try again.")
         continue
     print_board(board)
     winner = check_winner(board)
     if winner:
         print(f"Player {winner} wins!")
         break
     if is_board_full(board):
         print("It's a draw!")
         break
     current_player = "X" if current_player == "O" else "O"
if __name__ == "__main__":
main()

Output:

👁 Output

The code provided by GPT-4o gave this above output. Great job in identifying the game correctly and creating a grid as well. Although one of the “o” is misplaced in the given output but otherwise we’ve received a fully functional tic-tac-toe game.

Gemini

def display_board(board):
  """Displays the current board state"""
  for row in board:
for cell in row:
   print(cell, end=" ")
print()
def is_valid_move(board, row, col)
  """Checks if a move is valid (i.e. the cell is empty)"""
  return board[row][col] == " "
def make_move(board, player, row, col):
  """Places a player's mark on the board"""
  board[row][col] = player
def is_winner(board, player):
  """Checks if a player has won"""
  # Check rows
  for row in board:
if all(cell == player for cell in row):
   return True
  # Check columns
  for col in range(len(board[0])):
if all(board[row][col] == player for row in range(len(board))):
   return True
  # Check diagonals
  if all(board[row][row] == player for row in range(len(board))):
return True
  if all(board[row][len(board) - row - 1] == player for row in range(len(board))):
return True
  return False
def is_board_full(board):
  """Checks if the board is full"""
  for row in board:
for cell in row:
   if cell == " ":
     return False
  return True
def main():
  """Main game loop"""
  board = [[" " for _ in range(3)] for _ in range(3)]
  current_player = "X"
  while True:
display_board(board)
# Get player move
while True:
   row = int(input("Player {} enter row (1-3): ".format(current_player))) - 1
   col = int(input("Player {} enter column (1-3): ".format(current_player))) - 1
   if is_valid_move(board, row, col):
     break
   else:
     print("Invalid move. Please try again.")
# Make move and check for winner
make_move(board, current_player, row, col)
if is_winner(board, current_player):
   display_board(board)
   print("Player {} wins!".format(current_player))
   break
# Check for tie
if is_board_full(board):
   display_board(board)
   print("It's a tie!")
   break
# Switch player
current_player = "O" if current_player == "X" else "X"
if __name__ == "__main__":
  main()

Output:

👁 Ouput

Now, when we run the code provided by Gemini, then it identified the game clearly but as you can see no we did not get any grid. This made it difficult to play the game. Below you can check how the game looks without the grid.

👁 Output

Result

GPT-4o won!

Generate Python Code to Recreate Bar Chart using Matplotlib

In this round we’ve given an image of a bar chart to both our models. They need to analyze the chart and generate the Python code required to recreate it using Matplotlib. Additionally, they should ensure that the code produces the same bar chart when run. Let’s see which model wins this round.

👁 Generate Python Code to Recreate Bar Chart using Matplotlib

GPT-4o

import matplotlib.pyplot as plt
import numpy as np
# Data
quarters = ['Q1', 'Q2', 'Q3', 'Q4']
sales_A = [10, 15, 17, 12]
sales_B = [10, 15, 20, 15]
# Plotting
fig, ax = plt.subplots()
bar_width = 0.35
index = np.arange(len(quarters))
bar1 = ax.bar(index, sales_A, bar_width, label='A')
bar2 = ax.bar(index, sales_B, bar_width, bottom=sales_A, label='B')
ax.set_xlabel('Quarter')
ax.set_ylabel('Sales')
ax.set_title('Sales by Product & Quarter')
ax.set_xticks(index)
ax.set_xticklabels(quarters)
ax.legend()
plt.show()

Output:

👁 Output

Gemini

import matplotlib.pyplot as plt
# Data
products = ["A", "B"]
sales_q1 = [30, 25]
sales_q2 = [40, 35]
sales_q3 = [20, 30]
sales_q4 = [10, 15]
# Create a bar chart
fig, ax = plt.subplots()
width = 0.35  # the width of the bars
index = range(len(products))  # the x-axis positions for the bars
bar1 = plt.bar(index, sales_q1, width, label='Q1')
bar2 = plt.bar([p + width for p in index], sales_q2, width, label='Q2')
bar3 = plt.bar([p + width * 2 for p in index], sales_q3, width, label='Q3')
bar4 = plt.bar([p + width * 3 for p in index], sales_q4, width, label='Q4')
# Add labels and title
plt.xlabel('Products')
plt.ylabel('Sales')
plt.title('Sales by Product & Quarter')
# X-axis labels
plt.xticks([p + 1.5 * width for p in index], products)
# Legend
plt.legend()
plt.show()

Output:

👁 GPT-4o vs Gemini

Result

GPT-4o won!

Explain Code and Provide the Output

Here we’ve given an image input to both our models and they have to understand the code written in the provided screenshot and additionally provide the output for the same. Let’s see how they perform on this test.

👁 Explain Code and Provide the Output

GPT-4o

Provided a very long summary but here’s the summary and output:

👁 GPT-4o summary

👁 GPT-4o vs Gemini

Gemini

Got the below explanation but no output for the code.

👁 GPT-4o vs Gemini

Point goes to GPT-4o for properly understanding the prompt and providing correct output as well.

Result

GPT-4o won!

Identify Buttons and Input Fields in the Given Design

In this prompt the models were asked for a detailed analysis of a user interface (UI) design to locate and describe interactive elements such as buttons and input fields. The goal is to specify what each element is, its purpose, and any associated labels or features.

👁 Image

GPT-4o

👁 GPT-4o vs Gemini

Impressive how accurately GPT-4o can identify items in a design with a clear understanding of each button, checkbox and textbox.

Gemini

👁 GPT-4o vs Gemini

Gemini got the input fields correct but there was some uncertainty in the submit button which was square in shape.

Result

GPT-4o won!

GPT -4o vs Gemini: Final Verdict

Tasks	Winner
Calculating the sum of numbers from an image.	GPT-4o
Writing Python code for a tic-tac-toe game based on an image.	GPT-4o
Creating Python code to recreate a bar chart from an image.	GPT-4o
Explaining code from a screenshot and providing the output.	GPT-4o
Identifying buttons and input fields in a user interface design.	GPT-4o

In this head-to-head comparison, GPT-4o clearly outperformed Gemini. GPT-4o consistently provided accurate and detailed results across all tasks, from calculating sums and coding games to generating bar charts and analyzing UI designs. It showed a strong ability to understand and process both text and images effectively.

Gemini, on the other hand, struggled in several areas. While it performed adequately in some tasks, it often failed to provide detailed explanations or accurate coding. Its performance was inconsistent, highlighting its limitations compared to GPT-4o.

Overall, GPT-4o proved to be the more reliable and versatile model. Its superior performance across multiple tasks makes it the clear winner in this comparison. If you need a model that can handle both text and images with high accuracy, GPT-4o is the better choice. In this article we explored GPT-4o vs Gemini.

👁 Himanshi Singh

Himanshi Singh

I’m a data lover who enjoys finding hidden patterns and turning them into useful insights. As the Manager - Content and Growth at Analytics Vidhya, I help data enthusiasts learn, share, and grow together.

Thanks for stopping by my profile - hope you found something you liked :)

Artificial Intelligence Beginner Generative AI LLMs

Login to continue reading and enjoy expert-curated content.

Free Courses

👁 Generative AI
4.8

AWS Data Querying with S3 & Athena

Master AWS data storage & querying with S3, Athena, Glue, RDS, and Redshift.

👁 Generative AI
4.6

Foundations of LangGraph

Build reliable AI workflows using LangGraph state, memory, & agent

👁 Generative AI
4.6

Claude 4.5: Smarter, Faster & More Human AI

Build real-world AI workflow with Claude 4.5 Opus using smart, human-like AI

👁 Generative AI
4.7

NotebookLM Essentials to Pro: The Complete Practical Guide

Your complete NotebookLM guide to faster learning, smarter research, and pow

👁 Generative AI
4.7

Gemini 3: The AI That Thinks, Sees and Creates

Learn Gemini 3 through hands on demos, real apps, and multimodal AI projects

Responses From Readers

Cancel reply

👁 Caio Firme

Caio Firme

Thanks for this comparative analysis. I have just one question: did you use Gemini Ultra 1.0 or Gemini 1.0 Pro in your analysis?

123 1

Cancel reply

Show 1 reply

👁 Himanshi Singh

Himanshi Singh

Hi, this was Gemini Pro since it is free to use and GPT-4o is also available with limited use for free users.

123 456

Cancel reply

👁 partial_andy405

partial_andy405

Thanks for the article. It was very interesting. I do think it would be useful to compare GPT-4.0 with Gemini Ultra for a more comprehensive evaluation. Also, it’s worth you clarifying at the start of the article which Gemini model was used and its limitations compared to the Ultra version.

123

Cancel reply

Become an Author

Share insights, grow your voice, and inspire the data community.

Reach a Global Audience
Share Your Expertise with the World
Build Your Brand & Audience

Join a Thriving AI Community
Level Up Your AI Game
Expand Your Influence in Genrative AI

👁 imag

Flagship Programs

GenAI Pinnacle Program| GenAI Pinnacle Plus Program| AI/ML BlackBelt Program| Agentic AI Pioneer Program

Free Courses

Popular Categories

Generative AI Tools and Techniques

Popular GenAI Models

AI Development Frameworks

Data Science Tools and Techniques

👁 Av Logo White

Continue your learning for FREE

👁 Av Logo White

Enter OTP sent to

Edit

Wrong OTP.

Enter the OTP

Resend OTP

Resend OTP in 45s

👁 Popup Banner

👁 AI Popup Banner

URL: https://www.analyticsvidhya.com/blog/2024/05/gpt-4o-vs-gemini/