URL: https://www.analyticsvidhya.com/blog/2023/07/datahour-reducing-chatgpt-hallucinations-by-80/

DataHour: Reducing ChatGPT Hallucinations by 80%



Shivansh Kaushal Last Updated : 08 Jul, 2023
4 min read

Introduction

Natural Language Processing (NLP) models have become increasingly popular in recent years, with applications ranging from chatbots to language translation. However, one of the biggest challenges with models such as ChatGPT is reducing hallucinations, that is, incorrect responses that the model generates. In this article, we will discuss the techniques and challenges involved in reducing hallucinations in NLP models.

Observability, Tuning, and Testing

The first step in reducing hallucinations is to improve the observability of the model. This involves building feedback loops that capture user feedback and model performance in production. Tuning then improves poor responses by adding more data, correcting retrieval issues, or changing prompts. Testing ensures that changes improve results and do not cause regressions. A typical observability problem is that bad responses surface only when frustrated customers send screenshots of them. To address this, logs can be ingested and monitored daily with query code.
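A feedback loop of this kind can be sketched in a few lines. The example below is only an illustration, not the speaker's implementation: the table layout, the function names, and the choice of SQLite are all assumptions made for this sketch. It stores each interaction with an optional thumbs-up (1) or thumbs-down (-1) and pulls one day's negatively rated responses for review.

```python
import sqlite3


def open_log(path=":memory:"):
    """Open (or create) a hypothetical interaction log."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS interactions ("
        "id INTEGER PRIMARY KEY, day TEXT, question TEXT, "
        "answer TEXT, feedback INTEGER)"
    )
    return conn


def log_interaction(conn, day, question, answer, feedback=None):
    """Record one question/answer pair; feedback is 1, -1, or None."""
    conn.execute(
        "INSERT INTO interactions (day, question, answer, feedback) "
        "VALUES (?, ?, ?, ?)",
        (day, question, answer, feedback),
    )
    conn.commit()


def daily_bad_responses(conn, day):
    """Return (question, answer) pairs that users rated negatively on a day."""
    rows = conn.execute(
        "SELECT question, answer FROM interactions "
        "WHERE day = ? AND feedback = -1 ORDER BY id",
        (day,),
    )
    return rows.fetchall()
```

Running `daily_bad_responses` once a day gives the team a review queue that does not depend on customers reporting problems themselves.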

Debugging and Tuning a Language Model

Debugging and tuning a language model starts with understanding the model's input and its response. Debugging requires logging the raw prompt and being able to filter it down to the specific chunks or references it contained. The logs need to be actionable and easy for anyone to understand. Tuning involves determining how many documents should be fed into the model. Default values are not always appropriate, and a similarity search may not return the document that contains the correct answer. The goal is to figure out why something went wrong and how to fix it.
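One way to make such logs actionable is to store the raw prompt together with a numbered list of the chunks that went into it. The sketch below assumes each retrieved chunk is a dict with hypothetical `source` and `text` keys; the prompt wording and the record fields are likewise illustrative, not taken from the talk.

```python
import json


def build_prompt(question, chunks):
    """Assemble the raw prompt, numbering each chunk so it can be traced."""
    context = "\n".join(
        f"[{i}] {chunk['text']}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def make_log_record(question, chunks, response):
    """Build a JSON-serializable record linking the prompt to its references."""
    return {
        "question": question,
        "raw_prompt": build_prompt(question, chunks),
        "references": [
            {"ref": i, "source": chunk["source"]}
            for i, chunk in enumerate(chunks, start=1)
        ],
        "num_chunks": len(chunks),
        "response": response,
    }


if __name__ == "__main__":
    record = make_log_record(
        "How do I reset my password?",
        [{"source": "docs/account.md", "text": "Use the reset link in Settings."}],
        "Use the reset link in Settings.",
    )
    print(json.dumps(record, indent=2))
```

With a record like this, anyone reading the log can see which references were available to the model and check whether the answer was actually supported by them.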

Optimizing OpenAI Embeddings

Developers of a vector database query application faced challenges in optimizing the performance of the OpenAI embeddings used in the application. The first challenge was determining the optimal number of documents to pass to the model, which was addressed by controlling the chunking strategy and introducing a controllable hyperparameter for the number of documents.
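The two controls mentioned here, the chunking strategy and the number of documents, can be illustrated with a small self-contained sketch. It assumes character-based chunks with overlap and plain cosine similarity over vectors that are already computed; in a real application the vectors would come from an embedding model, and `chunk_size`, `overlap`, and `k` are hypothetical parameter names chosen for this example.

```python
import math


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=3):
    """Return indices of the k document vectors most similar to the query."""
    scored = sorted(
        enumerate(doc_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [index for index, _ in scored[:k]]
```

Treating `k` as a tunable parameter rather than a fixed default makes it possible to test whether passing more or fewer documents changes the answer quality for a given set of questions.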

The second challenge was prompt variation, which was addressed using an open-source library called Better Prompt that evaluates the performance of different prompt versions based on perplexity. The third challenge was improving the results from the OpenAI embeddings, which were found to perform better than sentence transformers in multilingual scenarios.

Techniques in AI Development

The article discusses three techniques used in AI development. The first is perplexity, a measure of how well a language model predicts a piece of text, which is used here to evaluate the performance of a prompt on a given task. The second is building a package that allows users to test different prompt strategies easily. The third is keeping a running index, which involves updating the index with additional data when something is missing or not ideal. This allows for more dynamic handling of questions.
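The idea of updating an index when something is missing can be shown with a toy example. The class below is a hypothetical stand-in that matches on shared words; it is not the vector index used in the talk, and its name and methods are assumptions made for illustration.

```python
class RunningIndex:
    """Toy keyword index that can be updated while it is in use."""

    def __init__(self):
        self.docs = {}

    def upsert(self, doc_id, text):
        """Add a new document or replace an existing one."""
        self.docs[doc_id] = text

    def search(self, query, k=3):
        """Return up to k document ids ranked by number of shared words."""
        query_words = set(query.lower().split())
        scored = []
        for doc_id, text in self.docs.items():
            overlap = len(query_words & set(text.lower().split()))
            if overlap:
                scored.append((overlap, doc_id))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [doc_id for _, doc_id in scored[:k]]


if __name__ == "__main__":
    index = RunningIndex()
    index.upsert("a", "reset your password from settings")
    print(index.search("how to export data"))  # nothing relevant yet
    index.upsert("b", "export data as csv from the dashboard")
    print(index.search("how to export data"))  # the new document is found
```

When a logged question turns out to have no supporting document, adding the missing content with `upsert` fixes the gap without rebuilding the whole index.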

Using GPT-3 API to Calculate Perplexity

The speaker discusses their experience with using the GPT-3 API to calculate perplexity based on a query. They explain the process of running a prompt through the API and returning the log probabilities for the best next token. They also mention the possibility of fine-tuning a large language model to imitate a particular writing style, rather than embedding new information.
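Once token log probabilities have been returned, perplexity is the exponential of the negative mean log probability. The sketch below assumes the log probabilities are natural logarithms and have already been fetched, so it makes no API call. The `rank_prompts` helper and the convention that lower perplexity ranks first are assumptions of this example, not details confirmed by the talk.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(-(1/N) * sum of natural-log token probabilities)."""
    if not token_logprobs:
        raise ValueError("at least one token log probability is required")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def rank_prompts(logprobs_by_prompt):
    """Order prompt names from lowest to highest perplexity."""
    return sorted(
        logprobs_by_prompt,
        key=lambda name: perplexity(logprobs_by_prompt[name]),
    )


if __name__ == "__main__":
    # Hypothetical log probabilities for two prompt versions.
    candidates = {
        "prompt_v1": [-2.0, -1.5, -2.5],
        "prompt_v2": [-0.4, -0.6, -0.5],
    }
    for name in rank_prompts(candidates):
        print(name, round(perplexity(candidates[name]), 3))
```

A model that assigns probability 0.5 to every token has a perplexity of 2, which gives a quick sanity check for the function.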

Evaluating Responses to Multiple Questions

The text discusses the challenges of evaluating responses to 50+ questions at a time. Manually grading every response takes a lot of time, so the company considered using an auto-evaluator. However, a simple yes/no decision framework was insufficient because there are multiple reasons why an answer may not be correct. The company broke down the evaluation into different components, but found that a single run of the auto-evaluator was erratic and inconsistent. To solve this, they ran multiple tests per question and classified the responses as perfect, almost perfect, incorrect but containing some correct information, or completely incorrect.
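Running the evaluator several times per question and combining the results can be sketched as follows. The grade labels paraphrase the four categories above. The `grader` callable, the default of five runs, and the rule that ties go to the worse grade are assumptions chosen for this example rather than the company's actual procedure.

```python
from collections import Counter

# Ordered from best to worst.
GRADES = ["perfect", "almost perfect", "partially correct", "incorrect"]


def aggregate_grades(run_grades):
    """Return the most frequent grade; ties resolve to the worse grade."""
    if not run_grades:
        raise ValueError("at least one run is required")
    for grade in run_grades:
        if grade not in GRADES:
            raise ValueError(f"unknown grade: {grade}")
    counts = Counter(run_grades)
    return max(GRADES, key=lambda g: (counts[g], GRADES.index(g)))


def evaluate(question, answer, grader, runs=5):
    """Call a (hypothetical) grader several times and aggregate its verdicts."""
    return aggregate_grades([grader(question, answer) for _ in range(runs)])
```

Breaking ties toward the worse grade is a conservative choice: an answer is only reported as good when most runs agree that it is.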

Reducing Hallucinations in NLP Models

The speaker discusses their process for reducing hallucinations in natural language processing models. They broke the grading decision down into four categories and used the auto-evaluator for batches of 50 or more questions. They also rolled the evaluation process into the core product, so that evaluations can be run and exported to CSV. The speaker mentions a GitHub repo for more information on the project. They then recap the steps they took to reduce hallucinations: observability, tuning, and testing. With these, they reduced the hallucination rate from 40% to below 5%.
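The bookkeeping behind such a figure can be sketched briefly. The example assumes each evaluation result is a dict with only `question` and `grade` keys, and it counts the two "incorrect" categories as hallucinations. Both the record shape and that definition are assumptions of this sketch, not the speaker's stated method.

```python
import csv
import io

# Assumption: these two grades count as hallucinations.
HALLUCINATED = {"partially correct", "incorrect"}


def hallucination_rate(results):
    """Fraction of results whose grade counts as a hallucination."""
    if not results:
        return 0.0
    bad = sum(1 for result in results if result["grade"] in HALLUCINATED)
    return bad / len(results)


def relative_reduction(before, after):
    """Relative drop between two rates, e.g. 0.40 -> 0.05."""
    return (before - after) / before


def export_csv(results):
    """Serialize results to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["question", "grade"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(results)
    return buffer.getvalue()
```

For the figures quoted above, a drop from 40% to 5% corresponds to a relative reduction of 87.5%, so a rate below 5% is a reduction of more than that.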

Conclusion

Reducing ChatGPT hallucinations is a complex process that involves observability, tuning, and testing. Developers must also consider prompt variation, optimizing embeddings, and evaluating responses to multiple questions. Techniques such as measuring perplexity, building a package for testing prompt strategies, and keeping a running index can also be useful in AI development. The future of AI development lies in small, private, or task-specific models.

Key Takeaways

  • Reducing ChatGPT hallucinations in NLP models involves observability, tuning, and testing.
  • Developers must consider prompt variation, optimizing embeddings, and evaluating responses to multiple questions.
  • Techniques such as perplexity, building a package for testing prompt strategies, and running an index can also be useful in AI development.
  • The future of AI development lies in small, private, or task-specific models.

Frequently Asked Questions

Q1. What is the biggest challenge in reducing hallucinations in NLP models?

A. The biggest challenge is improving the observability of the model and capturing user feedback and model performance in production.

Q2. What is perplexity?

A. Perplexity measures how well a language model predicts a piece of text. Here it is used to evaluate the performance of a prompt on a given task.

Q3. How can developers optimize OpenAI embeddings?

A. Developers can optimize OpenAI embeddings by controlling the chunking strategy, introducing a controllable hyperparameter, and using an open-source library to evaluate prompt variations.
