VOOZH about

URL: https://thenewstack.io/how-perplexitys-online-llm-was-inspired-by-freshllms-paper/

⇱ How Perplexity's Online LLM Was Inspired by FreshLLMs Paper - The New Stack


TNS
SUBSCRIBE
Join our community of software engineering leaders and aspirational developers. Always stay in-the-know by getting the most important news and exclusive content delivered fresh to your inbox to learn more about at-scale software development.
REQUIRED
It seems that you've previously unsubscribed from our newsletter in the past. Click the button below to open the re-subscribe form in a new tab. When you're done, simply close that tab and continue with this form to complete your subscription.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.
Welcome and thank you for joining The New Stack community!
Please answer a few simple questions to help us deliver the news and resources you are interested in.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Great to meet you!
Tell us a bit about your job so we can cover the topics you find most relevant.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Welcome!

We’re so glad you’re here. You can expect all the best TNS content to arrive Monday through Friday to keep you on top of the news and at the top of your game.

What’s next?

Check your inbox for a confirmation email where you can adjust your preferences and even join additional groups.

Follow TNS on your favorite social media networks.

Become a TNS follower on LinkedIn.

Check out the latest featured and trending stories while you wait for your first TNS newsletter.

PREV
1 of 2
NEXT
VOXPOP
As a JavaScript developer, what non-React tools do you use most often?
Angular
0%
Astro
0%
Svelte
0%
Vue.js
0%
Other
0%
I only use React
0%
I don't use JavaScript
0%
Thanks for your opinion! Subscribe below to get the final results, published exclusively in our TNS Update newsletter:
NEW! Try Stackie AI
From clobbered drafts to real-time sync
Apr 14th 2026 10:00am, by David Moore
TypeScript 6.0 RC arrives as a bridge to a faster future
Mar 14th 2026 9:00am, by Darryl K. Taft
Mastra empowers web devs to build AI agents in TypeScript
Jan 28th 2026 11:00am, by Loraine Lawson
2024-01-24 04:00:30
How Perplexity's Online LLM Was Inspired by FreshLLMs Paper
AI / Large Language Models

How Perplexity’s Online LLM Was Inspired by FreshLLMs Paper

We dig into the technology behind Perplexity’s Copilot, which was inspired by the FreshLLMs paper that proposed search engine-augmented LLMs.
Jan 24th, 2024 4:00am by Janakiram MSV
👁 Featued image for: How Perplexity’s Online LLM Was Inspired by FreshLLMs Paper
Photo by Marten Newhall on Unsplash.

Perplexity has been making waves since its appearance at the AWS re:Invent keynote in December 2023. Intrigued by the approach, I signed up for Copilot when it was launched. Out of many AI assistants that I have access to, I found Perplexity’s Copilot to be the most useful and functional. That’s because it offers the best of both worlds: generative AI and conventional search experiences. I soon replaced my default search engine with its search companion.

👁 Image

Perplexity user interface

Now let’s understand the technology behind Perplexity AI’s Copilot.

Currently, large language models (LLMs) have two major challenges: obsolete data and hallucinations. Since foundation models have a cut-off date based on their pre-training dataset, they cannot respond with the most recent data. Even the most capable models tend to make up answers, leading to hallucinations.

The first problem, which is a lack of access to the latest data, can be addressed by performing a web search and feeding the LLM with the output to help it make informed decisions. This can be accomplished by integrating APIs such as SerpAPI, which provides programmatic access to Google Search. Each time a prompt is sent, the LLM decides if it needs access to the web and then invokes the search API if required. The scrapped content from multiple sources is then summarized and added as context to the prompt, which enables the LLM to respond with a useful and meaningful response.

👁 Image

The second problem related to hallucination can be addressed through a proven technique called retrieval augmented generation, or RAG. Unlike the previous approach that made a dynamic call to the search API, RAG expects data to be retrieved from a well-known data store like a vector database or a full-text search index maintained externally.

👁 Image

It’s important to note that the first approach works best for context built from the data available in the public domain. If you are building a Q&A application or a summarization app for data that’s internal and private to your organization, RAG is the ideal solution.

Perplexity AI relies more on a search engine-based approach for its Copilot. For use cases that need access to private data, it offers an OpenAI-compatible API that can be used with RAG.

FreshLLMs: Bringing Current Data to LLMs

Perplexity AI is inspired by the mechanism explained in the paper, FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation, which proposed search engine-augmented LLMs. Similar to how RAG injects context into the prompt, FreshLLMs advocate the idea of injecting the summary of top hits sorted by the publication date from the search. Apart from adding context, it also proposes the use of few-shot prompting that teaches the LLM how to respond based on a few examples.

FreshLLM classifies the questions into four categories:

  1. Never-changing, where the answer almost never changes.
  2. Slow-changing answers that may change over the course of several years.
  3. Fast-changing answers, such as flight status and weather, which may change multiple times.
  4. False-premise, where the questions are factually incorrect and need to be rebutted.

The authors of the paper created a dataset with 600 questions divided into the above categories. Called the FRESHQA benchmark, it involved testing a model’s ability to answer questions accurately with a human evaluation of over 50,000 judgments to assess factual correctness. The evaluation uses two modes: RELAXED, focusing on the main answer’s correctness, and STRICT, ensuring that all claims are factual and current. The study highlights the limitations of LLMs, especially with rapidly changing information and false-premise questions, and suggests that simply increasing model size doesn’t guarantee better performance. It concludes that FRESHQA presents a significant challenge for LLMs, indicating a need for further advancement.

👁 Image

The study found that pre-trained LLMs, such as T5, PaLM, GPT-3.5, and GPT-4, struggled on the FreshQA dataset. The response accuracy ranged from 0.8% to 32.0% under STRICT and 0.8% to 46.4% under RELAXED. The STRICT evaluation, which requires all information to be factual and current, causes a significant drop in accuracy for models like GPT 3.5 and GPT-4, primarily due to their inability to access real-time information, resulting in outdated or refused answers. PALM also sees a notable accuracy decrease under STRICT, often due to response artifacts and hallucinations. Conversely, FLAN-PALM and CODEX perform better, showing minimal hallucination thanks to their more concise and direct responses.

👁 Image

The authors have experimented with a technique called FRESHPROMPT, which introduces contextually relevant and up-to-date information from a search engine to a pre-trained LLM. Given a question, the method uses the question to query a search engine, retrieving all search results, including the answer box, organic results, and other useful information — such as the knowledge graph, questions and answers from crowdsourced QA platforms, and related questions that search users also ask. This information is then used to teach the LLM to reason over the retrieved evidence, improving the model’s ability to provide accurate and current responses based on few-shot prompting.

How Perplexity AI Implemented the Idea of FreshLLMs

Perplexity AI has built two online LLMs, pplx-7b-online and pplx-70b-online, which can access real-time information from the internet, enabling them to provide up-to-date and accurate responses. These models leverage open sourced models, in-house search technology, and fine-tuning to effectively use information from the web. They are designed to overcome the limitations of offline LLMs by providing responses to time-sensitive queries and offering the most relevant and valuable information. The models are publicly accessible via an API, allowing developers to integrate the technology into their applications and websites.

The model pplx-7b-online is based on mistral-7b, while pplx-70b-online is built on top of llama2-70b base model. They have been fine-tuned to effectively use snippets from the web to enhance their responses. According to Perplexity, it curates high-quality, diverse and large training sets through in-house data contractors to ensure high performance in terms of helpfulness, factuality and freshness. Additionally, the models undergo regular fine-tuning to continually improve their performance. These efforts enable the models to provide accurate, up-to-date and contextually relevant responses by leveraging real-time information from the internet.

Apart from focusing on the freshness and the current nature of the responses, Perplexity AI ensures that the models deliver helpful and factually accurate answers.

Recently, Perplexity AI announced the availability of an API to access its online models as well as other models such as mixtral-8x7b-instruct, llama-2-70b-chat and codellama-34b-instruct. The pro subscribers of Perplexity Copilot get $5 credit to use the API.

In my next article, I will walk you through a tutorial on how to build applications based on Perplexity AI’s API. Stay tuned.

TRENDING STORIES
Janakiram MSV (Jani) is a practicing architect, research analyst, and advisor to Silicon Valley startups. He focuses on the convergence of modern infrastructure powered by cloud-native technology and machine intelligence driven by generative AI. Before becoming an entrepreneur, he spent...
Read more from Janakiram MSV
SHARE THIS STORY
TRENDING STORIES
TNS owner Insight Partners is an investor in: OpenAI.
SHARE THIS STORY
TRENDING STORIES
TNS DAILY NEWSLETTER Receive a free roundup of the most recent TNS articles in your inbox each day.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.