![]() |
VOOZH | about |
We’re so glad you’re here. You can expect all the best TNS content to arrive Monday through Friday to keep you on top of the news and at the top of your game.
Check your inbox for a confirmation email where you can adjust your preferences and even join additional groups.
Follow TNS on your favorite social media networks.
Become a TNS follower on LinkedIn.
Check out the latest featured and trending stories while you wait for your first TNS newsletter.
A new paradigm in AI tooling has emerged as AI’s focus shifts from ground-up model development to enabling software engineers and developers to build AI applications quickly and at scale. This is embodied in the “AI stack,” a comprehensive collection of integrated tools, solutions and components designed to streamline the development and management of AI applications.
This has happened very quickly, along with AI’s recent evolution. As the AI stack is largely built upon existing technologies, it’s worth looking at how it emerged before breaking down its components and outlook.
At the risk of sounding reductive, traditional AI development involves building regression and classification models using techniques like random forests, decision trees and neural networks. The typical stack includes data manipulation tools like Pandas, machine learning libraries like Scikit-Learn, and deep learning frameworks like TensorFlow (Keras), PyTorch and Caffe. There are also experimentation tracking tools such as TensorBoard, MLflow and Neptune.ai. Finally, deployment tools promote the trained models into production, and consumer access to these models is granted via an inference API endpoint.
Arguably, there hasn’t been an established go-to stack. Sure, research and development teams had preferred tools to complete specific tasks, but there hasn’t been a default stack. Researchers and practitioners commonly fell into one of two buckets, either TensorFlow or PyTorch.
Until recently, there was a wide gap between research outcomes and industry applications.
This began to change with AlexNet’s breakthrough in the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which led to increased optimism in the field of computer vision. While significant progress was made in image classification, researchers recognized that complex tasks like object detection still faced considerable challenges. Moreover, the path from research breakthroughs to widespread real-world applications proved more difficult than initially anticipated.
In other words, introducing a research breakthrough didn’t translate to rapid adoption due to the complexities of translating cutting-edge research into practical, scalable and efficient applications suitable for consumer devices. Factors like computational requirements, lack of standardized tools and frameworks and the need for significant adaptation and optimization for different hardware platforms all contributed to this delay.
The transformer architecture, introduced in 2017 in the paper “Attention Is All You Need,” was pivotal. It enabled development of large-scale, general-purpose language models that could process vast amounts of data across diverse domains. It overcame the limitations of previous recurrent neural network and long short-term memory architectures.
The first generation of GPT (generative pretrained transformer) was released in 2018, GPT-2 in 2019 and GPT-3 in 2020. In between the GPT releases, other foundation models — along with their code — were introduced through research papers. These models found their way into creative web apps in a span of months or even weeks.
As the gap between research and industry closed, new tools enabled researchers to build and experiment with novel neural networks. And in turn, innovations in the application and system infrastructure allowed engineers and developers to harness these powerful new models in real-life applications.
The shift in the AI domain is now focused on finding ways to empower developers to build AI applications quickly and at scale. This is an important consideration because today’s AI stack is not focused on the significant but relatively solved task of developing and deploying AI models, but on implementing, optimizing, evaluating and monitoring AI applications and systems.
Nowadays, new research in foundation model development is published frequently, and tooling for large language model (LLM) applications emerges monthly. More importantly, modern AI applications can take advantage of research findings almost immediately. What’s more, data is abundant and computing devices are increasingly powerful, enabling the full application of scaling laws in the generative AI (GenAI) domain.
The landscape of LLM provisioning has evolved towards a bifurcated ecosystem. Open source initiatives increasingly focus on releasing model weights and hyperparameters, facilitating fine-tuning rather than architectural modifications. Proprietary models abstract implementation details behind REST APIs. This paradigm shift optimizes for domain-specific adaptability in open models and seamless integration in closed systems, reflecting a nuanced approach to balancing model accessibility with deployment efficiency.
As noted, the modern AI stack is an integrated collection of tools, solutions and components that enables engineers and developers to build AI apps with generative capabilities like audio, image and text generation at scale. It comprises programming languages, model providers, LLM frameworks, vector databases, operational databases, monitoring and evaluation tools, and deployment solutions.
The AI stack infrastructure uses parametric knowledge from the foundation models and non-parametric knowledge from information sources like PDFs, databases and search engines to conduct GenAI functionalities.
Key components of the AI stack include:
The modern AI stack represents an evolution from the fragmented tooling landscape of traditional machine learning to a more cohesive and specialized ecosystem optimized for the era of LLMs and GenAI. This stack is engineered to tackle modern AI applications’ unique and intriguing challenges like handling massive language models, managing vector embeddings and orchestrating complex AI workflows in retrieval-augmented generation (RAG) pipelines or agentic systems, which leverage LLMs and their function-calling capabilities to perform tasks autonomously.
👁 The POLM (Python, OpenAI, LlamaIndex/LangChain, and MongoDB) AI stack
The POLM (Python, OpenAI, LlamaIndex/LangChain and MongoDB) AI stack is a collection of tools, solutions and frameworks implemented with the Python programming language. It is composed to enable the efficient development of modern AI applications, handle the unstructured nature of AI-related data and meet the real-time demands of modern applications. Its components include:
The POLM AI stack provides a framework for developing GenAI applications, facilitating the transition from proof-of-concept to production-ready systems. Its architecture, designed for RAG applications and agent-based systems, uses unified data models and language consistency to enhance development efficiency.
A key component of the POLM stack is its document-based data model, which aligns with the often unstructured nature of AI-generated data. This model allows efficient storage and retrieval of complex, nested data structures common in AI applications, such as conversation histories, embedding vectors and agent tool definitions.
The POLM stack offers several features that can enhance the development process of GenAI applications:
This tutorial provides step-by-step instructions using LangChain as the LLM orchestrator and framework. It demonstrates how to implement a RAG system with memory and semantic cache, which are mature components of RAG systems. Meanwhile, this second tutorial presents the POLM stack using LlamaIndex as the LLM framework.
The AI landscape is evolving from RAG-enabled chatbots to agentic systems with human-in-the-loop interfaces. This progression is driven by the growing capabilities of foundation models, including tool use and improved reasoning.
LLM abstraction frameworks like LangChain and LlamaIndex are expanding to support agent-based architectures. Meanwhile, specialized libraries like CreAI and AutoGen are emerging to facilitate multiagent system development. This stack also incorporates new components, like Tavily for real-time search, which enhance AI agents’ ability to gather data and make decisions.
While specialized libraries for agentic system development have emerged, the core components of the AI stack remain largely unchanged for these systems. However, the database infrastructure for agentic systems requires more versatility, handling conversational history, operational data and vector embeddings, as well as features like semantic caching and data streaming.
As AI applications continue to mature, agentic systems will necessitate further advancements in AI tools, particularly in databases that can perform a wide array of functions to support these complex, dynamic systems. Here is where we can expect the next wave of innovation.