Building AI agents is hard. You’ll struggle with hallucinations, with keeping the agents on track and with steering them toward the right tools.
One way to overcome these problems is to give agents code-execution capabilities.
Here are some reasons why your AI agent should have a code interpreter.
Agents with code interpreters gain powers like performing a statistical analysis of CSV files or plotting charts.
When you ask different agents for the same thing, it becomes evident how much those with an underlying code interpreter differ. Tasks such as analyzing a data file or generating a chart are almost impossible to finish without running code.
Consider how Perplexity (an agent without a code interpreter) deals with a data analysis task. Even when provided with a data file, the agent cannot finish the task; the best it can do is advise on what code I should run.
ChatGPT with an underlying code interpreter, by contrast, deals with the same task end to end, including installing new packages and generating a chart.
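To make the difference concrete, here is a minimal sketch of the kind of code an agent with a code interpreter might write and run for a data analysis request. The sample data, the column names and the `summarize_column` helper are all hypothetical and stand in for a user's uploaded file; a real agent would likely reach for a data library instead of the standard library used here.

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for an uploaded CSV file.
SAMPLE_CSV = """month,revenue
Jan,120
Feb,135
Mar,150
Apr,110
"""


def summarize_column(csv_text: str, column: str) -> dict:
    """Compute basic descriptive statistics for one numeric column."""
    rows = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in rows]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


summary = summarize_column(SAMPLE_CSV, "revenue")
print(summary)
```

The numbers in the answer come from executing the code against the file, not from the model guessing what the statistics might be.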
Note that the end users don’t need to be aware that the app carries out coding tasks behind the scenes since the primary objective (like “book me a flight”) often doesn’t revolve around coding.
Large language models (LLMs) are great at generating text but struggle with reasoning and complex thinking.
Google’s team drew an interesting parallel with the famous book “Thinking, Fast and Slow” by Daniel Kahneman. The ability to execute code equips agents with slow thinking (effortful, logical and calculating), whereas fast thinking (intuitive and automatic) is how agents act without a code interpreter.
In their analogy, agents relying purely on LLMs can be thought of as operating without slow thinking, quickly producing text without deeper thought. Even simple tasks might require a systematic approach and cannot be answered by intuition alone.
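As an illustration of tasks that look simple but benefit from a systematic approach, consider counting letters in a word or multiplying two large numbers. The sketch below is not taken from the Google analogy; the two helper functions and the inputs are hypothetical examples chosen to show that a few lines of executed code give an exact answer where intuition-style text generation might not.

```python
def count_letter(text: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter."""
    return text.lower().count(letter.lower())


def exact_product(a: int, b: int) -> int:
    """Multiply two integers; Python integers have arbitrary precision."""
    return a * b


# Hypothetical questions a user might ask an agent.
letter_count = count_letter("strawberry", "r")
product = exact_product(123456789, 987654321)

print(letter_count)
print(product)
```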
A recent paper confirmed that LLMs hallucinate on multistep tasks even when given reasoning prompts. Following up on the paper's findings, a software engineer demonstrated that a code-interpreter-style LLM engine reduces hallucinations by an order of magnitude. He found that code interpreters can reduce the GPT-4 hallucination rate from <10% to <1%.
Code interpreters can handle uploads and downloads, write code to look up data from source files and arrive at conclusions from that data instead of reasoning freestyle as simpler agents usually do.
Other ways to battle LLM hallucinations include RAG, fine-tuning and increasing the size of LLM context windows.
Another big challenge is LLM code generation. When an agent can not only generate but also run code, it can test whether its own output works and iterate on it.
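The generate, run and iterate cycle can be sketched as a small loop. Everything here is an assumption made for illustration: `fake_generate` is a hypothetical stand-in for an LLM call (it simply returns canned candidates, the first one deliberately buggy), and `run_and_check` uses a single hard-coded test. A real agent would pass the error feedback back into the model's prompt.

```python
from typing import Callable, Optional, Tuple

# Hypothetical canned "LLM outputs"; the first candidate contains a bug.
CANDIDATES = [
    "def add(a, b):\n    return a - b\n",
    "def add(a, b):\n    return a + b\n",
]


def fake_generate(attempt: int, feedback: Optional[str]) -> str:
    """Stand-in for an LLM call; ignores feedback and returns the next candidate."""
    return CANDIDATES[min(attempt, len(CANDIDATES) - 1)]


def run_and_check(source: str) -> Optional[str]:
    """Execute candidate code; return an error message, or None on success."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # unsafe outside a sandbox; illustration only
        result = namespace["add"](2, 3)
        if result != 5:
            return f"add(2, 3) returned {result}, expected 5"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def iterate(generate: Callable[[int, Optional[str]], str],
            max_attempts: int = 3) -> Tuple[str, int]:
    """Generate code, run it, and retry with feedback until the check passes."""
    feedback: Optional[str] = None
    for attempt in range(max_attempts):
        source = generate(attempt, feedback)
        feedback = run_and_check(source)
        if feedback is None:
            return source, attempt + 1
    raise RuntimeError(f"no working code after {max_attempts} attempts: {feedback}")


source, attempts = iterate(fake_generate)
print(f"succeeded after {attempts} attempt(s)")
```

The key design point is that the check runs the code instead of asking the model whether the code looks right.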
I think we will see code interpreters powering even more AI agents and apps as a part of the new ecosystem being built around LLMs, where a code interpreter represents a crucial part of an agent’s brain. For inspiration to build, see popular open source products like Open Interpreter or AutoGen.
There are still challenges to overcome, such as finding a secure and optimal way to run LLM-generated code. One way to address this is to execute the code in an isolated cloud environment.
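As a rough local illustration of separating generated code from the agent's own process, the sketch below runs a snippet in a child Python process with a timeout and a throwaway working directory. This is an assumption-laden stand-in, not real isolation: a subprocess still shares the host's filesystem, network and privileges, which is why the article points to isolated cloud environments. The `run_untrusted` helper and its parameters are hypothetical names.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_untrusted(source: str, timeout_s: float = 5.0) -> subprocess.CompletedProcess:
    """Run a code snippet in a separate Python process with a time limit.

    NOTE: this is NOT a security boundary; it only illustrates the shape of
    the interface (code in, captured output out).
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "snippet.py"
        script.write_text(source, encoding="utf-8")
        return subprocess.run(
            [sys.executable, "-I", str(script)],  # -I: Python's isolated mode
            cwd=workdir,            # throwaway working directory
            capture_output=True,    # collect stdout and stderr
            text=True,
            timeout=timeout_s,      # raises subprocess.TimeoutExpired if exceeded
        )


result = run_untrusted("print(sum(range(10)))")
print(result.returncode, result.stdout.strip())
```

An agent would read `result.stdout` and `result.stderr` to decide what to do next, and would treat a timeout as a signal to revise the code.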