
URL: https://thenewstack.io/building-smarter-chatbots-with-advanced-language-models/


Building Smarter Chatbots With Advanced Language Models

Use LangChain Community, Mixtral 8x7B and ChromaDB to develop a powerful, intuitive chatbot using vector database retrieval and semantic search.
May 15th, 2024 10:59am by Mikhail Khlystun
Featured image by Gcore.
Gcore sponsored this post.
Chatbot development is evolving rapidly, with new tools and frameworks making it easier and more efficient to build sophisticated systems. But current large language models (LLMs) have limitations: They lack current knowledge and can’t access domain-specific information, such as the contents of a company’s knowledge base. Retrieval-augmented generation (RAG) addresses this problem by finding knowledge beyond the LLM’s training data and passing that information to the LLM. In this technical article, I’ll explain how to use LangChain Community, Mixtral 8x7B and ChromaDB to create an advanced chatbot that processes diverse file types, stores their contents in a vector database, retrieves relevant information via semantic search and interacts with users through an intuitive interface.

Evolving Chatbot Technologies

The tools and processes for chatbot development are evolving very quickly. They are expanding chatbots’ capabilities and changing how they interact with users and process information. I’ve identified five developments that I believe are particularly important, and I’ll be using them in this tutorial.
  • Transitioning to LangChain Community and Mixtral 8x7B: The shift from LangChain and Mistral to their more advanced counterparts, LangChain Community and Mixtral 8x7B, marks a significant evolution in chatbot development. These tools extend the application range of chatbots, enabling document processing and enhancing natural language understanding across various domains.
  • Transitioning from graph databases to ChromaDB: ChromaDB is a vector database that supports storing and querying large-scale, high-dimensional data such as text embeddings. This makes it a better fit than a graph database for the semantic search this chatbot relies on.
  • Using conversational retrieval chain: While RAG enhances chatbot responses by enabling access to external data beyond the LLM’s training dataset, the conversational retrieval chain builds on this by dynamically retrieving information from vector databases during the conversation. This shift retains the benefits of RAG while also improving chatbot interactivity and relevance by integrating real-time, context-specific data retrieval via advanced language models.
  • Advanced file handling and processing: The new scenario expands the types of files handled, including PDF, M4A, CSV, Excel and EML, and introduces advanced processing techniques. This involves using ChromaDB for storing and querying extracted information and integrating voice recognition for audio files, expanding the chatbot’s ability to handle various data sources.
  • Deployment with the Gradio interface: Gradio provides an interactive and user-friendly interface for testing and deploying AI models, including chatbots. This makes it easier for users to interact with the system in real time.
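To make the file-handling item above more concrete, here is a minimal sketch of how uploaded files might be routed to different processing steps by extension. The handler labels (`pdf_text`, `audio_transcription`, `tabular`, `email`) and the function names are hypothetical and chosen only for illustration; the actual project may organize this differently.

```python
from pathlib import Path

# Hypothetical handler labels, keyed by file extension.
# The real project may use different loaders and names.
HANDLERS = {
    ".pdf": "pdf_text",
    ".m4a": "audio_transcription",  # audio goes through voice recognition first
    ".csv": "tabular",
    ".xlsx": "tabular",
    ".xls": "tabular",
    ".eml": "email",
}


def pick_handler(filename: str) -> str:
    """Return the handler label for a file, based on its extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in HANDLERS:
        raise ValueError(f"Unsupported file type: {suffix or filename}")
    return HANDLERS[suffix]


def group_by_handler(filenames: list[str]) -> dict[str, list[str]]:
    """Group uploaded files by the handler that should process them."""
    groups: dict[str, list[str]] = {}
    for name in filenames:
        groups.setdefault(pick_handler(name), []).append(name)
    return groups
```

Each group would then be converted to text, split into chunks and stored in the vector database; rejecting unknown extensions early keeps unsupported files out of that pipeline.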
I’ll put these tools into action in this tutorial. But first, a note on RAG for the uninitiated.

Understanding RAG

RAG plays a pivotal role in enhancing the functionality of LLMs. It gives an LLM access to external data so that it can generate responses with added context, which makes the LLM more helpful and effective for end users. RAG operates through a sequence of four key steps:
  1. Loading encoded documents: The process begins by loading a vector database with documents that have been encoded as vectors (embeddings).
  2. Query encoding: The user’s query is transformed into a vector using a sentence transformer. This vectorized format of the query makes it compatible with the encoded documents in the database.
  3. Context retrieval: The encoded query is used to retrieve relevant context from the vector database. This context contains the information needed to generate a response that appropriately addresses the user’s query.
  4. Prompting the LLM: The retrieved context and the query are used to prompt the LLM. The LLM generates a contextually appropriate and information-rich response.
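The four steps above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the tutorial's implementation: a bag-of-words counter stands in for the sentence transformer, an in-memory list stands in for ChromaDB, the two documents are made up, and the final LLM call is left out so that only the prompt is built.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a stand-in for a sentence transformer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database such as ChromaDB."""

    def __init__(self) -> None:
        self.items: list[tuple[Counter, str]] = []

    def add(self, documents: list[str]) -> None:
        # Step 1: load encoded documents.
        for doc in documents:
            self.items.append((embed(doc), doc))

    def search(self, query: str, k: int = 1) -> list[str]:
        # Step 2: encode the query. Step 3: retrieve the closest documents.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [doc for _, doc in ranked[:k]]


def build_prompt(query: str, context: list[str]) -> str:
    # Step 4: combine retrieved context and query into the LLM prompt.
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


store = ToyVectorStore()
store.add([
    "The basic plan costs 10 euros per month.",    # made-up example document
    "Support is available by email on weekdays.",  # made-up example document
])
question = "How much does the basic plan cost?"
context = store.search(question, k=1)
prompt = build_prompt(question, context)
```

In a real system, `prompt` would be sent to the LLM, and the word-count vectors would be replaced by dense embeddings that capture meaning rather than exact word overlap.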

Demonstrating the Impact of RAG

To illustrate the effectiveness of RAG in enhancing the chatbot’s capabilities, I prepared screenshots comparing the answers provided by the model with and without the use of RAG:

Without RAG

The model cannot access up-to-date pricing information because it was not part of the training dataset. As a result, its responses do not reflect current company data.

[Figure: Model without RAG produces inaccurate response]

With RAG

After I saved the pricing page as a PDF file and supplied it as extra content for RAG, the model parsed the file and accurately answered questions about up-to-date pricing. This demonstrates RAG’s capability to enhance the chatbot’s performance by integrating dynamic, external information.

[Figure: Model with RAG produces accurate response]

System Requirements and Performance

To check the performance of the chatbot system, I tested the setup on a virtual machine equipped with 4x GeForce GTX 1080 Ti GPUs. With all four GPUs available, resource utilization is high.

[Figure: Resource utilization is high with 4xGPU]

By running the command export CUDA_VISIBLE_DEVICES=0, I restricted the system to a single GPU. This adjustment significantly changed GPU resource utilization, with the model taking about 6GB of GPU memory to process the requests efficiently.

[Figure: Resource utilization decreases when the system is restricted to 1 GPU]
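The same restriction can also be applied from inside a Python script instead of the shell. The snippet below is a minimal sketch; the helper name `visible_gpu_ids` is hypothetical. The variable has to be set before any CUDA-backed library initializes the GPU runtime, because it is read at initialization.

```python
import os

# Equivalent of `export CUDA_VISIBLE_DEVICES=0`.
# Set this before importing or initializing any CUDA-backed library.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"


def visible_gpu_ids() -> list[str]:
    """Return the GPU indices currently exposed through CUDA_VISIBLE_DEVICES."""
    value = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return [part for part in value.split(",") if part]
```

Setting the variable in the shell affects every process started from that shell, while setting it in the script limits the change to that one process.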

How to Run the Code

This setup process gives you all the tools and dependencies you need, correctly configured, to run and interact with the chatbot. The code is available on GitHub, so I’ve avoided writing it in full here. I ran the model using Ubuntu 22.04, but it should work on any up-to-date Linux operating system.

Create a Virtual Environment

Initialize a new Python virtual environment to manage dependencies:

Activate the Virtual Environment

Activate the created environment to use it for the following steps:

Clone the Repository

Download the project code from our GitHub repository:

Install Dependencies

Install all required libraries from the provided requirements file:

Run the Inference Script

Launch the chatbot application using Python:

Access the Chatbot

Local Machine

If you are running the chatbot on your local machine, open a web browser and navigate to the local server URL:
http://127.0.0.1:5050
You’ll see the login screen appear.

[Figure: Login screen]

Remote Machine

If you are running the chatbot on a remote machine, such as in the cloud, you will need to use port forwarding to reach it. Alternatively, to make the bot accessible on all network interfaces, modify the server configuration in your code by changing 127.0.0.1 to 0.0.0.0.

Note: Exposing the bot on a public interface can pose security risks, so ensure you have proper security measures in place.
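As an illustration of that binding change, here is a minimal sketch that assumes the interface is started with Gradio's `launch()` method and its `server_name` and `server_port` parameters. The helper `launch_options`, the `main` function and the placeholder `reply` function are hypothetical and are not taken from the project code; port 5050 matches the local URL shown earlier.

```python
def launch_options(expose: bool, port: int = 5050) -> dict:
    """Build launch settings: all interfaces if exposed, loopback otherwise."""
    host = "0.0.0.0" if expose else "127.0.0.1"
    return {"server_name": host, "server_port": port}


def main() -> None:
    # Imported here so the helper above can be used without Gradio installed.
    import gradio as gr

    def reply(message, history):
        # Placeholder for the real chatbot call.
        return f"You said: {message}"

    demo = gr.ChatInterface(fn=reply)
    # expose=True binds to 0.0.0.0, so the app is reachable from other hosts.
    demo.launch(**launch_options(expose=True))
```

Calling `main()` starts the server. Keeping the default of `127.0.0.1` and reaching the app through port forwarding is the more conservative choice, since the service is then never directly reachable from the network.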

Conclusion

The development process that I’ve shared here opens the door to more knowledgeable, responsive and helpful chatbots that overcome traditional limitations by accessing updated information and grounding their answers in uploaded documents. It also underscores the need to keep development strategies up to date, adopting new technologies as they appear in order to build more intelligent, efficient and user-friendly chatbot applications. As technology continues to advance, the potential for chatbots as tools for information retrieval, customer engagement and personalized assistance is bounded only by the creativity and innovation of developers.

At Gcore, we pave the way for the future of AI, supporting the AI development lifecycle: training, inference and applications. We use cutting-edge NVIDIA GPUs for outstanding performance across our 180+ point-of-presence global network. Our mission is to connect the world to AI, anywhere, anytime.
Gcore is the global edge AI, cloud, network, and security solutions provider. Headquartered in Luxembourg, with a staff of 600+ operating from ten offices worldwide, Gcore provides its solutions to global leaders in numerous industries.
Mikhail Khlystun is a seasoned cloud engineer with over a decade of experience building infrastructure for public and private providers. Recently, Mikhail has focused on artificial intelligence, crafting powerful infrastructures for training and inference models using GPU from NVIDIA and...