
URL: https://thenewstack.io/my-ai-python-coding-test-surprising-results/

My AI Python Coding Test: Surprising Results - The New Stack


2025-02-06 11:15:55
My AI Python Coding Test: Surprising Results

This hands-on experiment with Ollama and various LLMs reveals the current limitations, pitfalls, and best practices for using AI to generate Python code.
Feb 6th, 2025 11:15am by Jack Wallen
You know it’s coming. You’ve heard the grumblings. You’ve read the memos and listened to the talks. AI is writing code. Trust me, I get the concern. I’m also a novelist, and I’ve read accounts of other writers using AI to bang out books at a rate no human being can keep up with. The silver lining there is that creative efforts undertaken with AI tend to be pretty bad.

But what about the coding side of things? I decided to put Ollama to the test and have it write some Python programs to see how it fared. I was not impressed. First, let me tell you how I did this.

What I Used

To begin with, I used a locally installed instance of Ollama with the Msty frontend. I added the frontend because I wanted the workflow to be as efficient as possible. Although using Ollama from the terminal is fairly simple, Msty makes some features more accessible (such as adding new models and Knowledge Stacks and using a prompts library). For the first round of testing, I used the llama-3.2 model and fed Ollama the following prompt:
Write a Python application that asks the user how many dice to roll, how many sides are on each dice, and then roll the dice the user has entered
Here’s the code llama-3.2 spit out:

Guess what? It didn’t work. It looked as though it was going to function perfectly, but then it wound up stuck in a loop asking “How many dice would you like to roll?” There were a few obvious errors in the code. Take a look at line 49, which is this:
first_half = ', '.join([str(result)[:half_points] for result in results.split(',')[0:-1]])
That should be:
first_half = ', '.join([str(result)[:half_points] for result in result.split(',')[0:-1]])
Ollama’s output had results.split, when it should be result.split. That’s a pretty goofy error, but it’s easily fixed. There’s another similar error in the line below that, which is:
second_half= [result[half_points:]for result in results.split(',')]
That should be:
second_half= [results[half_points:]for result in results.split(',')]
After making those changes, the program finally runs. Even then, if you enter a larger number when asked how many dice to roll, the error pops back up, only this time telling you that results.split should be result.split. Guess what… that won’t run either!

I then tried the same prompt with the gemma2:2b model. As you probably expected, the generated code didn’t work. Again, it wound up caught in a loop, asking how many dice to roll. When I pared the request down to simply creating an app that rolls random dice numbers, gemma2:2b got it right.

I went back to each model and ran different queries to have it create various Python apps (of varying degrees of difficulty), and the results were hit-and-miss. For instance, I wrote this query for gemma2:2b:
write a python program that accepts input for a users clothing choices and then reports what they should wear
The output of that query worked fine. I then ran the same query with the Llama 3.2 model, and the code it produced was vastly different, but it ran as well.

Here’s where things get annoying. I added the DeepSeek R1 model to Msty, and every time I queried it, the response read more like a long, drawn-out discussion on how to write code. What Llama and Gemma spit out in roughly 30 seconds took DeepSeek 10 minutes, and it gave me nothing I could use, only a long-winded back-and-forth that felt as random as it was guided.
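If you would rather measure that kind of difference than eyeball it in a GUI, one option is to time the same prompt against each model through Ollama’s local HTTP API. The following is a minimal sketch, not the setup used for this article (that was Msty): it assumes Ollama is serving on its default port (11434), that the model tags listed have already been pulled on your machine, and the helper names build_request, time_model and compare_models are made up for illustration.

```python
import json
import time
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for one model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def time_model(model: str, prompt: str, timeout: float = 900.0) -> tuple[float, str]:
    """Send the prompt to one model; return (seconds elapsed, generated text)."""
    start = time.perf_counter()
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return time.perf_counter() - start, body.get("response", "")


def compare_models(prompt: str, models: tuple[str, ...]) -> None:
    """Print how long each model took and how much text it produced."""
    for model in models:
        elapsed, text = time_model(model, prompt)
        print(f"{model}: {elapsed:.1f}s, {len(text)} characters")


# Example call (needs a running Ollama server; model tags are assumptions):
# compare_models("Write a program to roll a die.",
#                ("llama3.2", "gemma2:2b", "deepseek-r1"))
```

The generous timeout is deliberate: a reasoning model can spend minutes on a prompt that a smaller model answers in seconds.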

What I Discovered

In the end, here’s what I discovered about using AI to write code:
  1. Start with a simple query, such as Write a program to roll a die.
  2. Test the output.
  3. Then ask the AI to update the original with a follow-up query, such as asking it to take that same program and let users choose how many dice to roll.
  4. Test the output.
  5. Further refine the application with another query.
  6. Test the output.
  7. Keep refining until you’re done.
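To make the steps above concrete, here is a hand-written sketch of what each stage of the dice example might look like if every refinement adds one small, testable piece. It is not output from any of the models tested, and the function names (roll_die, roll_dice, ask_positive_int, main) are hypothetical.

```python
import random


# Step 1: the simplest version, "Write a program to roll a die."
def roll_die(sides: int = 6) -> int:
    """Return one random roll between 1 and sides, inclusive."""
    return random.randint(1, sides)


# Step 3: first refinement, roll several dice with a chosen number of sides.
def roll_dice(count: int, sides: int = 6) -> list[int]:
    """Return a list with one roll per die."""
    if count < 1 or sides < 2:
        raise ValueError("need at least one die with at least two sides")
    return [roll_die(sides) for _ in range(count)]


# Step 5: second refinement, ask the user and keep asking until the
# answer is a positive whole number. The read parameter exists only so
# the function can be tested without a keyboard.
def ask_positive_int(question: str, read=input) -> int:
    while True:
        try:
            value = int(read(question))
        except ValueError:
            value = 0
        if value > 0:
            return value
        print("Please enter a whole number greater than zero.")


def main() -> None:
    count = ask_positive_int("How many dice would you like to roll? ")
    sides = max(2, ask_positive_int("How many sides does each die have? "))
    print("You rolled:", ", ".join(str(r) for r in roll_dice(count, sides)))


# Run main() yourself after each refinement; that is the "test the output" step.
```

Because each stage builds on a function that already works, a failure after a refinement points at the newest piece rather than at the whole program.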
Whenever I used Ollama and Msty to write Python programs with the above tactic, the results were much better than when I dove straight into something more complex.

The other takeaway is that some models are better suited for this purpose than others. For example, skip right past DeepSeek and use one of the Qwen models (such as Qwen2.5 Coder). When I attempted the same experiment using the Qwen2.5 Coder LLM, things were a bit more predictable. Almost every time I used this model, the results worked. Even better, the code it produced was far less complicated, so it was easier to read and debug (when needed).

Another thing: don’t expect perfect results. You will have to tweak things and even try out different models. I even ran across issues with Msty tanking on me, which helped me draw this simple conclusion: The companies creating AI want you to believe their tools are as capable as you are at writing code, and that is not exactly true. When you use AI to write code, it’s imperative that you comb through every line of the output and test it because, more than likely, you’re going to have to spend a good amount of time debugging.

I was actually excited about writing this piece because I’d tested Ollama and Msty with some fairly basic applications, and they performed admirably. When things got more complex, however, AI let me down. In the end, remember these key things:
  • Choose the right model.
  • Start off simple.
  • Vet the code.
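For the “vet the code” step, one possible approach is to run a generated program against scripted answers, so that a prompt loop like the ones described earlier fails fast instead of hanging your terminal. This is a rough sketch under stated assumptions: dice_app and looping_app are hypothetical stand-ins written for illustration (they are not the models’ actual output), and scripted_input and run_with_inputs are made-up helper names.

```python
import random
from unittest.mock import patch


def dice_app() -> list[int]:
    """Hypothetical stand-in for a generated program that works."""
    count = int(input("How many dice would you like to roll? "))
    sides = int(input("How many sides are on each die? "))
    rolls = [random.randint(1, sides) for _ in range(count)]
    print("You rolled:", ", ".join(str(r) for r in rolls))
    return rolls


def looping_app() -> list[int]:
    """Hypothetical stand-in for a program stuck asking the same question."""
    count = 0
    while count < 1:
        input("How many dice would you like to roll? ")  # bug: answer is discarded
    return []


def scripted_input(answers):
    """Return a replacement for input() that hands out answers in order."""
    remaining = iter(answers)

    def fake_input(prompt: str = "") -> str:
        try:
            return next(remaining)
        except StopIteration:
            # The program wants more answers than the script provides.
            raise AssertionError("asked for more input than expected") from None

    return fake_input


def run_with_inputs(app, answers):
    """Run app with input() temporarily replaced by the scripted answers."""
    with patch("builtins.input", scripted_input(answers)):
        return app()


# run_with_inputs(dice_app, ["3", "6"]) returns three rolls between 1 and 6.
# run_with_inputs(looping_app, ["3", "6"]) raises AssertionError instead of hanging.
```

A check like this only catches one failure mode; it does not replace reading every line of the output.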
Jack Wallen is what happens when a Gen Xer mind-melds with present-day snark. Jack is a seeker of truth and a writer of words with a quantum mechanical pencil and a disjointed beat of sound and soul. Although he resides...