URL: https://thenewstack.io/beyond-chatgpt-tailored-ai-in-test-automation/

Beyond ChatGPT: Tailored AI in Test Automation - The New Stack



While solutions like ChatGPT can do cool things in the realm of software testing, they’re not the right tools for serious test authoring and maintenance.
Dec 7th, 2023 10:00am by Frank Moyer
Image by Esther Merbt from Pixabay.

It’s safe to say that, by this point, virtually every developer and Quality Assurance (QA) engineer who has an internet connection has experimented with generative AI technology like ChatGPT.

It’s probably also safe to say that most of these folks have been impressed by what generative AI software can do in the realm of code generation and software testing.

ChatGPT and similar services are impressively adept at producing test cases, for example, using virtually any language or automation framework that you ask them to work with.

But does that mean it’s time to surrender to the robots by handing responsibility for software testing to generative AI tools?

I’m here to tell you it’s not.

While solutions like ChatGPT can do cool things in the realm of software testing, they’re not the right tools for serious test authoring and maintenance.

Allow me to explain by discussing where generative AI excels in the context of software testing, then walking through reasons why you typically shouldn’t use it to write tests.

The Appeal of GenAI for Software Testing

The main capability generative AI tools bring to software testing is that they can automatically produce scripts to execute automated tests.

This is a big deal because the ability to automate tests — as opposed to running them manually, an approach that takes much more time and yields less consistent testing results — is critical for testing at scale, especially for businesses that want to be able to build, test and deliver software updates on a frequent basis.

Traditionally, though, writing the tests that power automated testing has been a lot of work.

In fact, test authoring was often the biggest pain point in QA. A recent survey conducted by my company, Kobiton, shows that nearly half of QA teams spend at least nine hours writing a single test case.

Eight percent of organizations spend 40 or more hours on that task. Given that a single application might require dozens or even hundreds of tests, generating the tests to power automated testing can be a monumental task.
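
To put those survey figures in perspective, here is a rough back-of-the-envelope calculation in Python. The hours-per-test values come from the survey figures above; the suite sizes (24 and 200 tests) are hypothetical stand-ins for "dozens" and "hundreds" and are not survey data.

```python
# Rough authoring-effort estimate. The hours-per-test figures come from the
# survey cited above; the suite sizes are hypothetical illustrations.

def authoring_hours(test_count: int, hours_per_test: float) -> float:
    """Total hours needed to hand-write `test_count` tests."""
    return test_count * hours_per_test

SUITE_SIZES = {"dozens (assumed 24)": 24, "hundreds (assumed 200)": 200}
HOURS_PER_TEST = {"nine-hour teams": 9, "40-hour teams": 40}

for suite_label, count in SUITE_SIZES.items():
    for team_label, hours in HOURS_PER_TEST.items():
        total = authoring_hours(count, hours)
        print(f"{suite_label}, {team_label}: {total:,.0f} hours")
```

Under these assumptions, even a team at the nine-hour mark would spend 1,800 hours hand-writing a 200-test suite.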

Recent advancements like the image-input (vision) capability of GPT-4 pave the way for generating test scripts directly from app screenshots, a task that might soon take seconds rather than days. This capability could change how QA engineers approach test automation using tools like Selenium and TestNG.

While GPT impressively automated basic test cases for Kobiton, its aptitude for handling more intricate, domain-specific user interactions was lacking.

This highlights the crucial role of domain expertise in ensuring comprehensive coverage and underscores the limitations of current AI in grasping the nuances of complex testing scenarios. Still, as much as some folks would like to say that genAI doesn’t reliably produce good code, that criticism rarely applies to generating basic automated software tests.

Why ChatGPT Might Not Be the Best Tool for Software Test Generation

But just because ChatGPT and similar tools can save so much time by automatically generating tests doesn’t mean they’re the right solution for every test automation need. On the contrary, if you rely on public generative AI services to produce tests, you face two major risks.

Lack of Domain Expertise

The first risk is that, although the test scripts produced by genAI typically execute well, there is no guarantee that they test the right things.

ChatGPT’s limitation is clear: it lacks the domain expertise to discern which specific app features to prioritize for testing, potentially overlooking critical test cases.

The fact that tools like ChatGPT can automatically look at screenshots and identify visual elements is very cool. But because they have no contextual knowledge about what the app does or which features are most important to users, they cannot determine what is most critical to test.

As a result, without human oversight to correct for potential AI bias, you might end up with tests that run well and take you seconds to generate, but that offer little value because they don’t test the right things. In turn, you have to run some tests manually because your automated tests don’t offer adequate coverage.
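
One way to keep a human in the loop is to check generated tests against a list of features the team considers critical. The sketch below is a minimal illustration in Python. The feature names, test names and the `uncovered_critical_features` helper are all hypothetical; a real project would pull this mapping from its own test metadata.

```python
# Hypothetical sketch: flag critical features that no generated test exercises.
# Feature and test names are invented for illustration.

def uncovered_critical_features(critical, tests):
    """Return critical features not covered by any test, in sorted order."""
    covered = set()
    for features in tests.values():
        covered.update(features)
    return sorted(set(critical) - covered)

# Features a domain expert marks as most important to users (assumed).
critical_features = ["checkout", "login", "password_reset"]

# Features each AI-generated test touches (assumed).
generated_tests = {
    "test_login_form_renders": ["login"],
    "test_footer_links": ["footer"],
    "test_theme_toggle": ["settings"],
}

gaps = uncovered_critical_features(critical_features, generated_tests)
print("Critical features with no automated test:", gaps)
```

In this made-up example, all three generated tests would pass, yet checkout and password reset remain untested. Deciding which features belong on the critical list is exactly the domain knowledge a general-purpose tool does not have.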

Lack of Maintainability

It’s easy to ask ChatGPT to generate tests. It’s much harder — and, in many cases, impossible — to ask it to update an existing test due to changes in the app you need to test. A ChatGPT prompt such as the following isn’t likely to get you very far: “Here’s a test you wrote eight months ago. I added a new UI feature to my app and now would like you to update the test.”

You can, of course, simply generate new tests from scratch every time your app changes and you need to update your tests. But the problem there is that you lose test consistency, as well as visibility into the historic state of your tests.

ChatGPT tends to style tests differently each time it produces one; indeed, being able to generate original content in response to similar requests is part of what makes generative AI so powerful in general.

But getting different results for similar queries is a bad thing in the context of software testing, where it’s better to have a standing set of tests that evolve over time, rather than tests that you regenerate from scratch repeatedly.
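
The consistency problem is easy to see by diffing test versions. The following sketch uses Python's standard `difflib` module to compare a standing test against a targeted update and against a from-scratch regeneration. All three test snippets are made up for illustration and are not real ChatGPT output; the `page` and `browser` objects inside them are hypothetical and never executed.

```python
import difflib

# Hypothetical test sources, invented for illustration.
ORIGINAL = """\
def test_login():
    page.open("/login")
    page.fill("user", "alice")
    page.click("submit")
    assert page.title() == "Home"
"""

# Targeted update: one step added for a new UI feature.
UPDATED = """\
def test_login():
    page.open("/login")
    page.fill("user", "alice")
    page.check("remember_me")
    page.click("submit")
    assert page.title() == "Home"
"""

# Regenerated from scratch: same intent, different names and style.
REGENERATED = """\
def test_user_can_sign_in():
    browser.goto("/login")
    browser.type_into("#username", "alice")
    browser.tick("#remember")
    browser.press("#login-btn")
    assert "Home" in browser.page_title
"""

def changed_lines(old: str, new: str) -> int:
    """Count lines added or removed between two versions of a test."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))

print("Targeted update:", changed_lines(ORIGINAL, UPDATED), "changed lines")
print("Regeneration:   ", changed_lines(ORIGINAL, REGENERATED), "changed lines")
```

A small diff is easy to review and keeps the test's history meaningful. A wholesale rewrite marks every line as changed, which obscures what actually changed in the app.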

A Better Approach to Automated Test Generation

The limitations of public generative AI tools for test generation don’t mean that QA teams need to settle for producing tests manually. Instead, they should take advantage of AI tools that were designed specifically for generating tests — as opposed to generic genAI services like ChatGPT.

Tools created specifically for the test automation domain can generate consistent tests. They can also update tests over time, rather than regenerating them for each new application release. In this way, these solutions provide the benefits of fast, low-effort test generation, without the drawbacks of a generic generative AI service like ChatGPT.

Conclusion: A Healthy Approach to GenAI for Testing

ChatGPT and similar tools can write tests very quickly, and the quality of the tests is usually surprisingly good. But if you think beyond the challenge of generating tests themselves, you realize that public genAI services fall short. They lack the domain expertise to know what you actually need to test, and they have little ability to update or maintain tests over time in a consistent way.

While today’s generative AI tools like ChatGPT might not dominate software testing, they are stepping stones toward more sophisticated AI applications. I expect that most QA teams will be turning to domain-specific tools that leverage AI to generate and maintain tests in ways that a general-purpose tool like ChatGPT will just never excel at.

Frank Moyer is a 25-year technology industry veteran with a track record of building value in startups and exiting successfully. As CTO of Kobiton, Moyer sets the product and technology direction for the company.