Just a few months ago, DeepSeek shook the AI world with its V3, R1, and Janus models. Now Elon Musk’s Grok 3 has joined the party! Billed as the smartest AI on Earth, it carries high expectations. Going by the official benchmarks the xAI team shared at the launch event, it looks like a monster model that outperforms all its competitors on several benchmarks. But is it actually good enough for the regular tasks you and I do every day? Is it worth spending $40 on an X Premium+ account to access the model? You should have the answers by the end of this article. Let’s begin the competition and test Grok 3 vs DeepSeek R1 on five diverse prompts!

Task 1: Python Code Generation

Prompt: “Write a Python function to generate a random maze using depth-first search and visualize it using Matplotlib”

Grok 3:

import random
import numpy as np
import matplotlib.pyplot as plt

def generate_maze(width, height):
    """
    Generate a random maze using Depth-First Search and visualize it with Matplotlib.

    Parameters:
    width (int): Width of the maze (number of columns)
    height (int): Height of the maze (number of rows)
    """
    # Initialize the maze grid with walls (1) and paths (0)
    maze = np.ones((height * 2 + 1, width * 2 + 1), dtype=int)
    maze[1::2, 1::2] = 0  # Set paths (odd rows/columns) to 0

    def carve_path(x, y):
        """
        Recursively carve paths in the maze using DFS.
        """
        directions = [(0, 2), (2, 0), (-2, 0), (0, -2)]  # Down, Right, Up, Left
        random.shuffle(directions)

        for dx, dy in directions:
            new_x, new_y = x + dx, y + dy
            if (0 <= new_x < height * 2 + 1 and
                0 <= new_y < width * 2 + 1 and
                maze[new_x, new_y] == 1):
                maze[new_x - dx // 2, new_y - dy // 2] = 0  # Remove wall
                maze[new_x, new_y] = 0  # Carve path
                carve_path(new_x, new_y)

    # Start carving from the top-left path (1,1)
    carve_path(1, 1)

    # Ensure entrance and exit are open (top-left and bottom-right paths)
    maze[0, 1] = 0  # Entrance
    maze[-1, -2] = 0  # Exit

    # Visualize the maze
    plt.figure(figsize=(width // 2, height // 2))
    plt.imshow(maze, cmap='binary', interpolation='nearest')
    plt.title("Random Maze Generated with DFS")
    plt.axis('off')
    plt.show()

    return maze

# Example usage
if __name__ == "__main__":
    # Generate a 10x10 maze (actual grid will be 21x21 due to walls)
    maze = generate_maze(10, 10)

DeepSeek R1:

import random
import matplotlib.pyplot as plt

def generate_maze(width, height):
    # Initialize the grid with walls (1)
    grid = [[1 for _ in range(width)] for _ in range(height)]

    # Starting point
    start_x, start_y = 1, 1
    grid[start_y][start_x] = 0

    # Stack for DFS
    stack = [(start_x, start_y)]

    # Directions: up, right, down, left
    directions = [(0, -2), (2, 0), (0, 2), (-2, 0)]

    while stack:
        x, y = stack[-1]
        random.shuffle(directions)

        for dx, dy in directions:
            nx, ny = x + dx, y + dy
            if 0 < nx < width - 1 and 0 < ny < height - 1 and grid[ny][nx] == 1:
                grid[ny][nx] = 0
                grid[y + dy // 2][x + dx // 2] = 0
                stack.append((nx, ny))
                break
        else:
            stack.pop()

    return grid

def plot_maze(grid):
    height = len(grid)
    width = len(grid[0])

    fig, ax = plt.subplots(figsize=(width, height))
    ax.set_xticks([])
    ax.set_yticks([])

    for y in range(height):
        for x in range(width):
            if grid[y][x] == 1:
                ax.add_patch(plt.Rectangle((x, height - y - 1), 1, 1, color='black'))

    ax.set_xlim(0, width)
    ax.set_ylim(0, height)
    ax.set_aspect('equal')
    plt.gca().invert_yaxis()
    plt.show()

# Parameters
maze_width = 21  # Should be odd
maze_height = 21  # Should be odd

# Generate and plot the maze
maze = generate_maze(maze_width, maze_height)
plot_maze(maze)

Output:

[Image: Grok 3 vs DeepSeek R1, code generation output]

Observation:

The DeepSeek R1 output is a clear, well-structured maze. Its walls are sharp and consistent and enclose the labyrinth properly, so the result is both functional and easy to read.

In contrast, the Grok 3 output appears highly pixelated, with paths that are hard to distinguish in the thick black-and-white grid. The listing above points to a likely cause: although it sets up a Depth-First Search (DFS), the line maze[1::2, 1::2] = 0 opens every cell before carving starts, so the maze[new_x, new_y] == 1 test in carve_path never passes for an in-bounds cell and no walls between cells are removed. The result is a regular grid of isolated cells rather than a maze, which makes it a far less effective representation.
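A rough way to make this comparison less dependent on the eye is to measure how much of each maze can actually be reached from the start cell. The sketch below is my own addition, not part of either model's answer: reachable_fraction is a hypothetical helper, and it assumes the grid uses 1 for walls and 0 for open cells, as both listings do. The two small grids are hand-made examples, not real model output.

```python
from collections import deque

def reachable_fraction(grid, start=(1, 1)):
    """Return the share of open cells (value 0) reachable from `start`.

    Assumes `grid` is a 2D structure of 0/1 values and that `start` is open.
    """
    rows, cols = len(grid), len(grid[0])
    total_open = sum(cell == 0 for row in grid for cell in row)
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        # Visit the four orthogonal neighbours of the current cell.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            in_bounds = 0 <= nr < rows and 0 <= nc < cols
            if in_bounds and grid[nr][nc] == 0 and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen) / total_open

# Hand-made example: four open cells with every wall between them intact.
isolated = [
    [1, 1, 1, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1],
]

# Hand-made example: the same size grid with a single winding corridor.
connected = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]

print(reachable_fraction(isolated))   # 0.25
print(reachable_fraction(connected))  # 1.0
```

A value of 1.0 means every open cell is part of one maze; a value well below 1.0 means the grid contains cells that no path leads to. The same function should also accept the grids returned by either generate_maze above, though I have only traced it by hand on the two small examples.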

Verdict:

Grok 3 ❌ | DeepSeek R1 ✅

Task 2: Web Search

Since DeepSeek R1 on its own doesn’t support web search, I used DeepSeek’s Search option for this task!

Prompt: “What are the latest advancements in nuclear fusion technology as of 2025? Format of output – List of advancements + source”

Observation:

Between DeepSeek R1 and Grok 3, DeepSeek R1 is the better model for answering research-heavy queries like advancements in nuclear fusion technology. The biggest advantage of DeepSeek R1 is that it provides direct, clickable source links for every claim, ensuring transparency and credibility. It also covers a wider range of advancements, including fusion fuel technology, private-sector investments, regulatory changes, and government initiatives, making its response more comprehensive. Additionally, its sources come from authoritative platforms such as ITER, UKAEA, and the U.S. DOE, which significantly improves trustworthiness.

On the other hand, while Grok 3 delivers better explanatory depth, it falls short in credibility because it does not include direct source links. Instead, it vaguely references news outlets like BBC News or MIT News without proper citations. A major weakness is its reliance on posts from X (Twitter), which often contain unverified claims. While Grok 3 does acknowledge these limitations, the lack of concrete sources makes it less reliable for factual research.

Verdict:

Grok 3 ❌ | DeepSeek R1 ✅

Task 3: Basic Animation Using HTML

Prompt: “Create HTML + CSS code for a red ball rotating inside a square continuously.”

Grok 3:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Rotating Red Ball in Square</title>
    <style>
        body {
            margin: 0;
            padding: 0;
            height: 100vh;
            display: flex;
            justify-content: center;
            align-items: center;
            background-color: #f0f0f0; /* Light gray background for contrast */
            overflow: hidden;
        }

        .container {
            position: relative;
            width: 300px;
            height: 300px;
            background-color: #000; /* Black square */
            border: 2px solid #333; /* Dark border for visibility */
        }

        .ball {
            position: absolute;
            width: 30px;
            height: 30px;
            background-color: #ff0000; /* Red ball */
            border-radius: 50%; /* Makes it circular */
            animation: rotateBall 3s linear infinite; /* Continuous rotation */
            transform-origin: 150px 150px; /* Center of the square (150px from left/top) */
        }

        @keyframes rotateBall {
            0% {
                transform: rotate(0deg) translate(120px) rotate(0deg);
            }
            100% {
                transform: rotate(360deg) translate(120px) rotate(-360deg);
            }
        }
    </style>
</head>
<body>
    <div class="container">
        <div class="ball"></div>
    </div>
</body>
</html>

DeepSeek R1:

<!DOCTYPE html>
<html>
<head>
    <style>
        .container {
            width: 200px;
            height: 200px;
            border: 2px solid #000;
            margin: 50px auto;
            position: relative;
            overflow: hidden;
        }

        .track {
            width: 100%;
            height: 100%;
            animation: rotate 4s linear infinite;
            transform-origin: center;
        }

        .ball {
            width: 20px;
            height: 20px;
            background-color: red;
            border-radius: 50%;
            position: absolute;
            top: 0;
            left: 50%;
            transform: translateX(-50%);
        }

        @keyframes rotate {
            from {
                transform: rotate(0deg);
            }
            to {
                transform: rotate(360deg);
            }
        }
    </style>
</head>
<body>
    <div class="container">
        <div class="track">
            <div class="ball"></div>
        </div>
    </div>
</body>
</html>

Output:

Observation:

Clearly, Grok 3 is struggling to interpret the prompt and generate the correct response. DeepSeek R1 took about 8-10 minutes to reply, but its output is more precise and accurate.
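One way to sanity-check the two listings without a browser is to trace where the ball's center ends up as the animation angle changes. The sketch below is my own approximation, not something either model produced. It assumes standard CSS transform semantics (the functions in a transform list compose so that the rightmost one applies to the element first, all about transform-origin) and that Grok 3's ball, which sets no top or left, sits at the top-left corner of its container. All function names are hypothetical helpers, and coordinates are in pixels measured from the container's top-left corner.

```python
import math

def rotate(point, deg):
    """Rotate a 2D point about (0, 0) by `deg` degrees (y axis pointing down)."""
    a = math.radians(deg)
    x, y = point
    return (x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a))

def grok_center(deg, size=30, origin=(150, 150), radius=120):
    """Ball center for rotate(deg) translate(120px) rotate(-deg) in Grok 3's CSS."""
    # Untransformed center is (15, 15); express it relative to transform-origin.
    x, y = size / 2 - origin[0], size / 2 - origin[1]
    x, y = rotate((x, y), -deg)   # rightmost function applies first
    x += radius                   # translate(120px) moves along x only
    x, y = rotate((x, y), deg)    # leftmost function applies last
    return (x + origin[0], y + origin[1])

def deepseek_center(deg, box=200, size=20):
    """Ball center when DeepSeek R1's .track rotates about the box center."""
    # The ball starts at the top middle of the track: center (100, 10).
    x, y = rotate((0, size / 2 - box / 2), deg)
    return (x + box / 2, y + box / 2)

def inside(center, box, size, tol=1e-9):
    """True if a ball of diameter `size` at `center` lies fully in the box."""
    half = size / 2
    return all(half - tol <= v <= box - half + tol for v in center)

for deg in (0, 90, 180, 270):
    g = grok_center(deg)
    d = deepseek_center(deg)
    print(deg, [round(v) for v in g], inside(g, 300, 30),
          [round(v) for v in d], inside(d, 200, 20))
```

Under these assumptions, Grok 3's ball circles the point (15, 15) near the top-left corner and stays fully inside its 300px square for only about a quarter of each turn, while DeepSeek R1's ball stays on a 90px circle around the center of its 200px square. That is consistent with the verdict here, but it is a back-of-the-envelope trace and not a substitute for opening both files in a browser.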

Verdict:

Grok 3 ❌ | DeepSeek R1 ✅

Task 4: Image Analysis

Prompt: “Analyze this chessboard position. Suggest the best move for the current player (white) to checkmate black and explain the reasoning”

[Image: chessboard position]

Grok 3:

[Image: Grok 3's response]

DeepSeek R1:

[Image: DeepSeek R1's response]

Observation:

Both Grok 3 and DeepSeek R1 got the answer wrong. Grok 3 suggested e4-e5, which does not deliver checkmate or threaten the king. DeepSeek R1 suggested Qe1#, but this move is impossible in the given position, showing it misinterpreted the board. The correct move was Qf7#, where the queen delivers checkmate by trapping the Black king. Grok 3 failed to recognize an immediate checkmate, while DeepSeek R1 assumed the wrong board setup instead of analyzing the actual position.

Verdict:

Grok 3 ❌ | DeepSeek R1 ❌

Task 5: Logical Reasoning

Prompt: “Solve this zebra puzzle. Give me output in a table”

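[Image: zebra puzzle]

Grok 3:

[Image: Grok 3's response]

Putting the generated response in the puzzle:

[Image: Grok 3's answer filled into the puzzle]

DeepSeek R1:

[Image: DeepSeek R1's response]

Putting the generated response in the puzzle:

[Image: DeepSeek R1's answer filled into the puzzle]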

Observation:

DeepSeek R1 again took longer to respond but gave the correct answer. Grok 3 failed to understand the image and gave an incorrect output.
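The "putting the generated response in the puzzle" step can also be done programmatically: write each clue as a small test and run a model's table through all of them. The sketch below only illustrates the idea. The puzzle used in this task is available only as an image, so the three-house puzzle, its clues, and the helper names here are all invented for illustration and say nothing about the actual answers.

```python
# Hypothetical candidate table, standing in for a model's answer.
candidate = [
    {"house": 1, "color": "red", "pet": "dog"},
    {"house": 2, "color": "green", "pet": "cat"},
    {"house": 3, "color": "blue", "pet": "fish"},
]

def house_of(rows, key, value):
    """Return the house number of the row whose `key` equals `value`."""
    return next(row["house"] for row in rows if row[key] == value)

# Each clue is a (description, predicate) pair; all clues are made up.
clues = [
    ("The dog lives in the red house",
     lambda rows: house_of(rows, "pet", "dog") == house_of(rows, "color", "red")),
    ("The green house is directly to the right of the red house",
     lambda rows: house_of(rows, "color", "green") == house_of(rows, "color", "red") + 1),
    ("The fish lives in the last house",
     lambda rows: house_of(rows, "pet", "fish") == 3),
]

def failed_clues(rows, clues):
    """List the description of every clue the candidate table violates."""
    return [text for text, holds in clues if not holds(rows)]

print(failed_clues(candidate, clues))  # [] means every clue is satisfied

# A wrong answer: the dog and the cat swapped between houses 1 and 2.
wrong = [
    {"house": 1, "color": "red", "pet": "cat"},
    {"house": 2, "color": "green", "pet": "dog"},
    {"house": 3, "color": "blue", "pet": "fish"},
]
print(failed_clues(wrong, clues))
```

An empty list means the table is consistent with every clue; anything else names the clues that were broken, which is a quicker check than filling in the grid by hand.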

Verdict:

Grok 3 ❌ | DeepSeek R1 ✅

Grok 3 vs DeepSeek R1: Result

Task	Winner
Python Code Generation	✅ DeepSeek R1
Web Search	✅ DeepSeek R1
Basic Animation (HTML + CSS)	✅ DeepSeek R1
Image Analysis (Chessboard Checkmate)	❌ Both Failed
Logical Reasoning (Zebra Puzzle)	✅ DeepSeek R1

End Note

Elon Musk’s Grok 3 was hyped as a game-changer in AI, claiming to be the smartest model on Earth. However, in real-world testing, it failed to live up to the expectations. Across multiple tasks, Grok 3 struggled with accuracy, logical reasoning, and complex problem-solving, often producing incorrect or poorly structured responses. Meanwhile, DeepSeek R1 consistently outperformed it, delivering more accurate, structured, and verifiable answers in key areas like code generation, web search, and logical reasoning.

Despite the bold marketing claims, Grok 3 still has a long way to go before it can compete with top AI models. The fact that it failed basic reasoning tasks suggests that xAI needs major improvements in its training approach. However, given Musk’s track record of rapid iteration and improvements, it will be interesting to see if future updates can bridge this gap. Will Grok 3 evolve into the AI powerhouse it claims to be, or will it remain an overhyped experiment? Time will tell.

Stay tuned to Analytics Vidhya Blog to follow Grok 3 updates regularly!

Nitika Sharma

Hello, I am Nitika, a tech-savvy Content Creator and Marketer. Creativity and learning new things come naturally to me. I have expertise in creating result-driven content strategies. I am well versed in SEO Management, Keyword Operations, Web Content Writing, Communication, Content Strategy, Editing, and Writing.
