
URL: https://thenewstack.io/improving-chatgpts-ability-to-understand-ambiguous-prompts/

Improving ChatGPT's Ability to Understand Ambiguous Prompts - The New Stack


2023-11-27 09:00:29

Improving ChatGPT’s Ability to Understand Ambiguous Prompts

Prompt engineering technique helps large language models (LLMs) handle pronouns and other complex coreferences in retrieval augmented generation (RAG) systems.
Nov 27th, 2023 9:00am by Cheney Zhang
Featured image by Michael Dziedzic on Unsplash.
Zilliz sponsored this post.
Large language models (LLMs) like ChatGPT are driving AI research and applications at an unprecedented pace. One significant development is retrieval augmented generation (RAG), a technique that pairs an LLM with a vector database acting as long-term memory to improve the accuracy of generated responses. One example of the RAG approach is the open source project Akcio, a question-answering system.

Akcio’s architecture

In Akcio’s architecture, a data loader imports domain-specific knowledge into a vector store, such as Milvus or Zilliz (fully managed Milvus). The vector store retrieves the Top-K results most relevant to the user’s query and passes them to the LLM as context for the user’s question. The LLM then bases its response on this external knowledge. For instance, suppose an article titled “Insights Report on the Progress of Large Language Models in 2023” has been imported into Akcio, and a user asks, “What are the use cases of large language models in 2023?” The system retrieves the three most relevant passages from the report:
1. In 2023, the LLM use cases can be divided into two categories: generation AI and decision-making. Decision-making scenarios are expected to have higher business value.
2. The generation AI scenario mainly includes dialogue interaction, code development, intelligent agents, etc.
3. NLP applications include text classification, machine translation, sentiment analysis, automatic summarization, etc.
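To make the retrieval step concrete, here is a minimal sketch of Top-K search by cosine similarity. It assumes the embeddings are already computed; the in-memory `store` list and the made-up three-dimensional vectors are illustrative stand-ins for an embedding model and a Milvus collection, not Akcio's code.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the k passages whose embeddings are most similar to query_vec."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [passage for passage, _ in ranked[:k]]


# Toy 3-dimensional "embeddings", invented for illustration only.
store = [
    ("LLM use cases fall into generation AI and decision-making", [0.9, 0.1, 0.0]),
    ("Generation AI includes dialogue, code development and agents", [0.7, 0.3, 0.1]),
    ("NLP applications include classification and translation", [0.4, 0.4, 0.2]),
    ("An unrelated passage about database indexing", [0.0, 0.1, 0.9]),
]
query_vec = [1.0, 0.2, 0.0]  # pretend embedding of the user's question
print(top_k(query_vec, store, k=3))
```

A real vector database replaces the linear scan in `top_k` with an approximate nearest-neighbor index, but the contract is the same: a query vector in, the K closest passages out.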
Akcio combines these passages with the original query and forwards them to the LLM, which generates a concise, accurate response:
The application scenarios of the large model industry can be divided into generation and decision-making scenarios.
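The combination step can be sketched as plain string assembly. The template wording and the `build_rag_prompt` name are hypothetical; Akcio's actual prompt may differ.

```python
def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Number the retrieved passages and place them before the user's question."""
    context = "\n".join(f"{i}. {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


print(build_rag_prompt(
    "What are the use cases of large language models in 2023?",
    [
        "In 2023, the LLM use cases can be divided into two categories: generation AI and decision-making.",
        "The generation AI scenario mainly includes dialogue interaction, code development, intelligent agents, etc.",
    ],
))
```

Instructing the model to rely only on the supplied context is a common way to keep the answer grounded in the retrieved passages rather than in the model's general knowledge.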

The Challenge of Coreference Resolution in RAG

Despite this progress, implementing RAG systems still poses challenges, particularly in multi-turn conversations that require coreference resolution, that is, working out what a pronoun or other reference points to. Consider this sequence of questions:
Q1: What are the use cases of the large language model in 2023?
A1: The use cases of large language models can be divided into generation AI and decision-making.
Q2: What are their differences, and can you provide examples?
The pronoun “their” in Q2 refers to “generation AI and decision-making.” Yet, because Q2 does not name them, the system may return results that are irrelevant to the question and undermine the conversation’s coherence:
1. BERT and GPT are important models in the field of NLP, but they have significant differences in their design and application scenarios.
2. The difference between large models and small models lies in their scale and complexity. Large models usually have more parameters and more complex structures, requiring more computational resources and time for training and inference. Small models, on the other hand, are relatively simple with fewer parameters and faster training and inference speeds.
3. However, there is no additional information to distinguish between these two products because they appear very similar.
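Why does an unresolved pronoun hurt? The search runs on the question text, and “their” carries no information about what it refers to. The toy retriever below ranks passages by shared content words, a deliberately simplified stand-in for vector similarity (an assumption for illustration), over passages loosely modeled on the results above.

```python
import re

# A tiny stopword list; a real system would compare embeddings, not words.
STOPWORDS = {"what", "are", "the", "their", "and", "in", "between", "a", "of", "is"}


def content_words(text: str) -> set[str]:
    """Lowercase the text, split on non-alphanumerics, and drop stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def best_match(query: str, docs: list[str]) -> str:
    """Return the first document sharing the most content words with the query."""
    q = content_words(query)
    return max(docs, key=lambda d: len(q & content_words(d)))


DOCS = [
    "BERT and GPT have significant differences in their design",
    "The differences between large models and small models lie in scale",
    "Generation AI creates content, while decision-making AI supports business decisions",
]

# Unresolved: only "differences" survives, so the first two passages tie and max keeps the first.
print(best_match("What are their differences?", DOCS))
# Resolved: the named entities now match the relevant passage.
print(best_match("What are the differences between generation AI and decision-making?", DOCS))
```

The first call returns the BERT/GPT passage, mirroring the irrelevant result above; the second returns the generation AI passage once the pronoun has been replaced.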

Using ChatGPT for Coreference Resolution

Traditional methods, such as tokenization, lemmatization and keyword replacement using recurrent neural networks, are often inadequate for resolving complex references. Consequently, researchers have turned to LLMs like ChatGPT for coreference resolution. The approach is to instruct ChatGPT to either substitute the pronouns in a question or keep the question as is, based on the conversation context. While promising, this method occasionally produces a direct answer instead of following the prompt instructions, which calls for a more refined strategy.
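A minimal sketch of where this rewriting step sits in a RAG turn follows. It assumes `llm` is any function that takes a prompt string and returns the model's text, and `retrieve` returns the Top-K passages for a query; both are hypothetical stand-ins for a real chat client and vector store. The rewrite prompt wording is a placeholder, not the prompt tested in this article.

```python
from typing import Callable, List

LLM = Callable[[str], str]              # prompt in, model text out (hypothetical client)
Retriever = Callable[[str], List[str]]  # query in, Top-K passages out (hypothetical store)


def rewrite_question(llm: LLM, history: str, question: str) -> str:
    """Ask the LLM for a standalone version of `question`, using `history` for context."""
    prompt = (
        "Rewrite the question so it can be understood without the conversation.\n"
        "Replace pronouns with what they refer to in the history; if nothing needs\n"
        "replacing, return the question unchanged. Output only the question.\n\n"
        f"History:\n{history}\n\nQuestion: {question}"
    )
    return llm(prompt).strip()


def answer_turn(llm: LLM, retrieve: Retriever, history: str, question: str) -> str:
    """One RAG turn: resolve references first, then retrieve and answer with the rewrite."""
    standalone = rewrite_question(llm, history, question)
    context = "\n".join(retrieve(standalone))
    return llm(f"Context:\n{context}\n\nQuestion: {standalone}\nAnswer:")
```

The key design point is that retrieval runs on the rewritten, standalone question rather than on the raw one. With a stub `llm` that returns "What problem does NLP solve?" for the rewrite prompt, `answer_turn` passes that question, not "What problem does it solve?", to `retrieve`.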

Examples Tested

We first tried a straightforward instruction asking ChatGPT to replace pronouns, using the following prompt format:
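# Template string (no f-prefix), filled per query via PROMPT_TEMP.format(history=history, question=question)
PROMPT_TEMP = '''Please generate a new question adhering to the following guidelines: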
1. Complete any missing pronouns or conditions in the question based on the context.
2. Retain the original question if it's already complete.


{history}
Question: {question}'''
Example 1 Query:
history = ''' What is NLP?
NLP stands for Natural Language Processing. It is a field of computer science and artificial intelligence that focuses on the interaction between computers and humans using natural language. '''
question = 'What problem does it solve?'
ChatGPT’s response:
What problem does Natural Language Processing (NLP) solve?
In this case, ChatGPT correctly replaced “it” with “Natural Language Processing (NLP).”
Example 2 Query:
history = '''When was GPT4 released?
GPT4 was released in 2023'''
question = 'What progress has been made in computer vision this year?'
ChatGPT’s response:
What progress has been made in computer vision this year?
In this case, ChatGPT left “this year” unchanged instead of replacing it with “2023,” so the reference remained unresolved.
Example 3 Query:
history = '''What is GPT3?
GPT-3 (Generative Pre-trained Transformer 3) is a state-of-the-art language processing model developed by OpenAI.
When was GPT4 released?
GPT4 was released in 2023'''
question = 'What is the difference between them? What are the advantages of the latter?'
ChatGPT’s response:
Sorry, the information provided about GPT-4's release date is not accurate as it has not been officially announced by OpenAI. Therefore, it is not possible to answer the question about the difference between GPT-3 and GPT-4 or the advantages of the latter.
Here ChatGPT ignored the instructions: instead of rewriting the question, it tried to answer it and disputed the information in the history. Even when we reinforced the prompt, ChatGPT occasionally answered directly instead of rewriting, which complicates the coreference resolution task.

Few-Shot Prompt with Chain of Thought: A Refined Approach

Prompt engineering plays a pivotal role in using LLMs effectively, so we tested a combination of few-shot prompts and the Chain of Thought (CoT) method. Few-shot prompts give an LLM several reference examples to emulate in its responses. CoT improves LLMs’ performance on complex reasoning tasks by encouraging step-by-step reasoning in their answers. Combining these techniques, we developed a prompt format to guide ChatGPT through coreference resolution. The revised prompt includes examples covering an empty conversation history, a basic pronoun replacement, a pronoun that cannot be resolved from the history and a pronoun that refers to multiple entities, giving ChatGPT more explicit instructions and reference examples. The NEED COREFERENCE RESOLUTION: Yes label is crucial: it signals that the question contains pronouns or ambiguous references, and the THOUGHT step then states what, if anything, should replace them. Here is the refined prompt format:
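# Template string (no f-prefix), filled per query via REWRITE_TEMP.format(history=history, question=question)
REWRITE_TEMP = '''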
HISTORY:
[]
NOW QUESTION: Hello, how are you?
NEED COREFERENCE RESOLUTION: No => THOUGHT: Consequently, the output question mirrors the current query. => OUTPUT QUESTION: Hello, how are you?
-------------------
HISTORY:
[Q: Is Milvus a vector database?
A: Yes, Milvus is a vector database.]
NOW QUESTION: How to use it?
NEED COREFERENCE RESOLUTION: Yes => THOUGHT: I must substitute 'it' with 'Milvus' in the current question. => OUTPUT QUESTION: How to use Milvus?
-------------------
HISTORY:
[]
NOW QUESTION: What are its features?
NEED COREFERENCE RESOLUTION: Yes => THOUGHT: Although 'it' requires substitution, there's no suitable reference in the history. Thus, the output question remains unchanged. => OUTPUT QUESTION: What are its features?
-------------------
HISTORY:
[Q: What is PyTorch?
A: PyTorch is an open-source machine learning library for Python. It provides a flexible and efficient framework for building and training deep neural networks.
Q: What is Tensorflow?
A: TensorFlow is an open-source machine learning framework. It provides a comprehensive set of tools, libraries, and resources for building and deploying machine learning models.]
NOW QUESTION: What is the difference between them?
NEED COREFERENCE RESOLUTION: Yes => THOUGHT: 'Them' should be replaced with 'PyTorch and Tensorflow' in the current question. => OUTPUT QUESTION: What is the difference between PyTorch and Tensorflow?
-------------------
HISTORY:
[{history}]
NOW QUESTION: {question}
NEED COREFERENCE RESOLUTION: '''
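Hard-coding the examples works, but keeping them as data makes it easier to add or reorder shots. The sketch below, which assumes a hypothetical `Shot` record and helper names of our own, renders examples in the same HISTORY / NOW QUESTION / NEED COREFERENCE RESOLUTION layout as REWRITE_TEMP and formats conversation turns with the Q:/A: labels that the examples use.

```python
from dataclasses import dataclass

SEPARATOR = "-------------------"


@dataclass
class Shot:
    """One worked example for the few-shot prompt (hypothetical structure)."""
    history: str            # rendered Q:/A: turns, or "" for an empty history
    question: str
    needs_resolution: bool
    thought: str            # the chain-of-thought step shown to the model
    output: str             # the rewritten question


def format_history(turns: list[tuple[str, str]]) -> str:
    """Render (question, answer) pairs as the Q:/A: lines used in HISTORY blocks."""
    return "\n".join(f"Q: {q}\nA: {a}" for q, a in turns)


def build_few_shot_prompt(shots: list[Shot], history: str, question: str) -> str:
    """Assemble the worked examples plus an open final block for the new question."""
    blocks = []
    for s in shots:
        label = "Yes" if s.needs_resolution else "No"
        blocks.append(
            f"HISTORY:\n[{s.history}]\nNOW QUESTION: {s.question}\n"
            f"NEED COREFERENCE RESOLUTION: {label} => THOUGHT: {s.thought} "
            f"=> OUTPUT QUESTION: {s.output}"
        )
    # The last block stops after the label so the model continues the pattern.
    blocks.append(
        f"HISTORY:\n[{history}]\nNOW QUESTION: {question}\n"
        "NEED COREFERENCE RESOLUTION: "
    )
    return f"\n{SEPARATOR}\n".join(blocks)


milvus_shot = Shot(
    history=format_history([("Is Milvus a vector database?", "Yes, Milvus is a vector database.")]),
    question="How to use it?",
    needs_resolution=True,
    thought="I must substitute 'it' with 'Milvus' in the current question.",
    output="How to use Milvus?",
)
prompt = build_few_shot_prompt(
    [milvus_shot],
    format_history([("When was GPT-4 released?", "GPT-4 was released in 2023")]),
    "What progress has been made in computer vision this year?",
)
print(prompt)
```

Printing `prompt` shows the Milvus example followed by the open GPT-4 block, separated by the same dashed line as REWRITE_TEMP.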

Examples Tested and Refined Responses

Here are some results from our experiments with the refined prompt:
Example 1 Query:
history = '''When was GPT-4 released?
GPT-4 was released in 2023'''
question = 'What progress has been made in computer vision this year?'
ChatGPT’s refined response:
Yes => THOUGHT: I need to replace "this year" with "2023" in the now question. => OUTPUT QUESTION: What progress has been made in computer vision in 2023?
Example 2 Query:
history = '''What is GPT-3?
GPT-3 (Generative Pre-trained Transformer 3) is a state-of-the-art language processing model developed by OpenAI.
When was GPT-4 released?
GPT-4 was released in 2023'''
question = 'What is the difference between the two? What are the advantages of the latter?'
ChatGPT’s refined response:
Yes => THOUGHT: I need to replace 'the two' with 'GPT-3 and GPT-4' and 'the latter' with 'GPT-4' in the now question. => OUTPUT QUESTION: What is the difference between GPT-3 and GPT-4? What are the advantages of GPT-4?
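Because the prompt ends at `NEED COREFERENCE RESOLUTION: `, the reply begins with Yes or No and ends with the rewritten question, as in the two responses above. A small parser, sketched below under that assumption, extracts the OUTPUT QUESTION and falls back to the original question when the reply does not match the format, for example when the model answers directly as in the earlier failure case. The regular expression and the fallback policy are our own choices, not part of the original method.

```python
import re

# Expected shape: "<Yes|No> => THOUGHT: ... => OUTPUT QUESTION: ..."
RESPONSE_RE = re.compile(
    r"^\s*(Yes|No)\s*=>\s*THOUGHT:\s*(.*?)\s*=>\s*OUTPUT QUESTION:\s*(.+?)\s*$",
    re.DOTALL,
)


def parse_rewrite(response: str, original_question: str) -> str:
    """Return the OUTPUT QUESTION, or the original question if the reply is off-format."""
    match = RESPONSE_RE.match(response)
    return match.group(3) if match else original_question


reply = ('Yes => THOUGHT: I need to replace "this year" with "2023" in the now question. '
         '=> OUTPUT QUESTION: What progress has been made in computer vision in 2023?')
print(parse_rewrite(reply, "What progress has been made in computer vision this year?"))
# prints: What progress has been made in computer vision in 2023?
```

Falling back to the unmodified question keeps the pipeline running when the model goes off-script; the retrieval quality for that turn is then no worse than without the rewriting step.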
The refined prompt format markedly improves ChatGPT’s handling of these coreference resolution tasks. In the examples above, it replaces “this year” with “2023,” which the simple prompt missed, and it resolves a question that refers to multiple entities (“the two” and “the latter”) instead of trying to answer it. The result is a self-contained question that the RAG system can use for retrieval.

Conclusion

Prompt engineering plays a pivotal role in resolving coreference problems in LLM-based RAG systems. By combining few-shot prompts with the CoT method, we significantly improved the handling of complex references, enabling LLMs like ChatGPT to substitute pronouns and ambiguous references accurately and keep multi-turn conversations coherent.
Zilliz is a leading vector database company, offering high-performing and scalable solutions. We’re powered by Milvus, the popular open source vector database that helps companies of any scale build AI-powered search solutions.
Cheney Zhang is an accomplished Algorithm Engineer at Zilliz. With a profound passion for and expertise in cutting-edge AI technologies such as LLMs and retrieval augmented generation (RAG), Cheney has actively contributed to many innovative AI projects, including Towhee, Akcio,...
TNS owner Insight Partners is an investor in: OpenAI.