URL: https://www.analyticsvidhya.com/blog/2023/05/deploying-large-language-models-in-production-llmops-with-mlflow/

Deploying Large Language Models in Production: LLMOps with MLflow

Gayathri Nadella Last Updated : 03 Jun, 2025

Introduction

Large Language Models (LLMs) are now widely used in applications such as machine translation, chatbots, text summarization, and sentiment analysis, driving advances in natural language processing (NLP). However, deploying and managing these models in production is difficult, which is where LLMOps comes in. LLMOps refers to the set of practices, tools, and processes used to develop, deploy, and manage LLMs in production environments.

MLflow is an open-source platform that provides a set of tools for tracking experiments, packaging code, and deploying models in production. Its centralized model registry simplifies the management of model versions and makes it easy to share models and collaborate with team members, which makes MLflow a popular choice for data scientists and machine learning engineers who want to streamline their workflow and improve productivity.

πŸ‘ Large Language Models | LLMs | MLflow

Learning Objectives

  • Understand the challenges involved in deploying and managing LLMs in production environments.
  • Learn how MLflow can be used to address the challenges of deploying LLMs in production environments, thereby implementing LLMOps.
  • Explore MLflow's support for popular LLM libraries: Hugging Face Transformers, OpenAI, and LangChain.
  • Learn how to use MLflow for LLMOps with practical examples.

This article was published as a part of the Data Science Blogathon.

Challenges in Deploying and Managing LLMs in Production Environments

The following factors make managing and deploying LLMs in a production setting difficult:

  1. Resource Management: LLMs need a lot of resources, including GPU, RAM, and CPU, to function properly. These resources can be expensive and difficult to manage.
  2. Model Performance: LLMs can be sensitive to changes in the input data, and their performance can vary depending on the data distribution. Ensuring good model performance in a production environment can be challenging.
  3. Model Versioning: Updating an LLM can be challenging, especially if you need to manage multiple versions of the model simultaneously. Keeping track of model versions and ensuring that they are deployed correctly can be time-consuming.
  4. Infrastructure: Configuring the infrastructure for deploying LLMs can be challenging, especially if you need to manage multiple models simultaneously.
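
To make the resource management point concrete, here is a rough back-of-the-envelope sketch of how much memory the weights of a model occupy. The function name `weight_memory_gib` and the 7-billion-parameter size are illustrative assumptions, and the estimate ignores activations, caches, and framework overhead, so real usage will be higher.

```python
# Rough estimate of the memory needed just to hold model weights.
# Assumption: weights dominate; activations, caches, and framework
# overhead are ignored, so real usage will be higher.

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gib(num_params: int, dtype: str = "float16") -> float:
    """Return the approximate size of the weights in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    # 7 billion parameters is an illustrative size, not a figure from the article.
    size = weight_memory_gib(7_000_000_000, dtype)
    print(f"7B parameters in {dtype}: {size:.1f} GiB")
```

Even this simplified estimate shows why the choice of numeric precision matters when sizing GPUs for a deployment.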
πŸ‘ MLOps | Large Language Models | LLMs | MLflow

How to Use MLflow for LLMOps?

MLflow is an open-source platform for managing the machine learning lifecycle. It provides a set of tools and APIs for managing experiments, packaging code, and deploying models. MLflow can be used to deploy and manage LLMs in production environments by following these steps:

  1. Create an MLflow project: An MLflow project is a packaged version of a machine learning application. You can create an MLflow project by defining the dependencies, code, and config required to run your LLM.
  2. Train and log your LLM: You can use a framework such as TensorFlow, PyTorch, or Keras to train your LLM. Once you have trained your model, you can log the model artifacts to MLflow using the MLflow APIs. If you are using a pre-trained model, you can skip the training step.
  3. Package your LLM: Once you have logged the model artifacts, you can package them using the MLflow commands. MLflow stores the model in a standard format that includes the model artifacts, dependencies, and config required to run your LLM.
  4. Deploy your LLM: You can deploy your LLM on a target such as Kubernetes, Docker, or AWS Lambda, and then use the MLflow APIs to load the model and run predictions.
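
As a sketch of what calling a deployed model can look like, the snippet below builds a JSON scoring request for a model served over MLflow's REST interface. It assumes the logged model has been started locally, for example with `mlflow models serve -m <model_uri> -p 5000`; the host, port, and the helper name `build_scoring_request` are illustrative assumptions, and the request is only built, not sent.

```python
import json
import urllib.request
from typing import List

# Assumption: a model server is listening locally on port 5000.
# MLflow model servers accept scoring requests on the /invocations path.
SCORING_URL = "http://127.0.0.1:5000/invocations"


def build_scoring_request(prompts: List[str], url: str = SCORING_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON scoring request for an MLflow model server."""
    # "inputs" is one of the payload keys accepted by MLflow 2.x model servers.
    body = json.dumps({"inputs": prompts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_scoring_request(["What is MLflow?"])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To call a running server, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping request construction separate from sending makes the payload easy to inspect before pointing it at a real endpoint.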

Hugging Face Transformers Support in MLflow

Hugging Face Transformers is a popular open-source library for building natural language processing models. MLflow's built-in support for Transformers models makes them straightforward to deploy and manage in a production setting. To use Hugging Face Transformers with MLflow, follow these steps:

  • Install MLflow and transformers: Both packages can be installed using pip.
!pip install transformers
!pip install mlflow
  • Define your LLM: The transformers library can be used to define your LLM, as shown in the following Python code:
import transformers
import mlflow

chat_pipeline = transformers.pipeline(model="microsoft/DialoGPT-medium")
  • Log your LLM: To log your LLM to MLflow, use the Python code snippet below:
with mlflow.start_run():
    model_info = mlflow.transformers.log_model(
        transformers_model=chat_pipeline,
        artifact_path="chatbot",
        input_example="Hi there!"
    )
  • Load your LLM and make predictions from it:
# Load as interactive pyfunc
chatbot = mlflow.pyfunc.load_model(model_info.model_uri)
# Make predictions
chatbot.predict("What is the best way to get to Antarctica?")
>>> 'I think you can get there by boat'
chatbot.predict("What kind of boat should I use?")
>>> 'A boat that can go to Antarctica.'
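
Because response time is part of model performance in production, it can help to time predictions from a loaded model. The following is a minimal sketch under stated assumptions: `EchoModel` is a hypothetical stand-in for the loaded chatbot (only its `predict` method matters), and `timed_predictions` is an illustrative helper, not an MLflow API.

```python
import time
from typing import Any, List, Tuple


class EchoModel:
    """Stand-in for a loaded model; only the predict method matters here."""

    def predict(self, data: Any) -> str:
        return f"echo: {data}"


def timed_predictions(model: Any, prompts: List[str]) -> List[Tuple[str, Any, float]]:
    """Return (prompt, response, elapsed_seconds) for each prompt."""
    results = []
    for prompt in prompts:
        start = time.perf_counter()
        response = model.predict(prompt)
        results.append((prompt, response, time.perf_counter() - start))
    return results


for prompt, response, seconds in timed_predictions(EchoModel(), ["Hi there!"]):
    print(f"{prompt!r} -> {response!r} in {seconds:.6f}s")
```

In practice, the stand-in would be replaced by the object returned from `mlflow.pyfunc.load_model`, and the timings could be recorded alongside the run.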

OpenAI Support in MLflow

OpenAI is another popular provider of LLMs, which are accessed through its API. MLflow provides support for OpenAI models, making it easier to deploy and manage them in a production environment. The steps to use OpenAI models with MLflow are as follows:

  • Install MLflow and OpenAI: Use pip to install the openai and mlflow packages.
!pip install openai
!pip install mlflow
  • Define your LLM: As shown in the following code snippet, you can define your LLM using the OpenAI API:
from typing import List
import openai
import mlflow

# Define a functional model with type annotations.
# Note: openai.ChatCompletion is the interface of openai versions before 1.0.

def chat_completion(inputs: List[str]) -> List[str]:
    # Model signature is automatically constructed from
    # type annotations. The signature for this model
    # would look like this:
    # ----------
    # signature:
    #   inputs: [{"type": "string"}]
    #   outputs: [{"type": "string"}]
    # ----------

    outputs = []

    for prompt in inputs:
        # Send each input string as the user message
        completion = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}]
        )

        outputs.append(completion.choices[0].message.content)

    return outputs
  • Log your LLM: You can log your LLM to MLflow using the following code snippet:
# Log the model
mlflow.pyfunc.log_model(
    artifact_path="model",
    python_model=chat_completion,
    pip_requirements=["openai"],
)
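
The list-in, list-out contract of a functional model like this can be checked offline before any API key is involved. The sketch below swaps the OpenAI call for a hypothetical stand-in, `fake_complete`; the factory `make_chat_model` is also an illustrative name and is not part of MLflow or the OpenAI library.

```python
from typing import Callable, List


def make_chat_model(complete: Callable[[str], str]) -> Callable[[List[str]], List[str]]:
    """Wrap a single-prompt completion function as a batch model."""

    def chat_completion(inputs: List[str]) -> List[str]:
        # One completion per prompt, returned in the same order.
        return [complete(prompt) for prompt in inputs]

    return chat_completion


def fake_complete(prompt: str) -> str:
    # Stand-in for the OpenAI call; it only echoes the prompt.
    return f"echo: {prompt}"


model = make_chat_model(fake_complete)
print(model(["Hi there!", "What is MLflow?"]))
```

Separating the completion backend from the batching logic keeps the model function easy to test without network access.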

LangChain Support in MLflow

LangChain is a framework for building LLM applications using a modular approach. MLflow provides support for LangChain models, making it easier to deploy and manage them in a production environment. To use LangChain models with MLflow, follow these steps:

  • Install MLflow and LangChain: You can install MLflow and LangChain using pip.
!pip install langchain
!pip install mlflow
  • Define your LLM: The following code snippet demonstrates how to define your LLM using the LangChain API:
from langchain import PromptTemplate, HuggingFaceHub, LLMChain

template = """Translate everything you see after this into French:

{input}"""

prompt = PromptTemplate(template=template, input_variables=["input"])

# HuggingFaceHub reads the API token from the HUGGINGFACEHUB_API_TOKEN environment variable
llm_chain = LLMChain(
    prompt=prompt,
    llm=HuggingFaceHub(
        repo_id="google/flan-t5-small",
        model_kwargs={"temperature": 0, "max_length": 64}
    ),
)
  • Log your LLM: You can use the following code snippet to log your LLM to MLflow:
import mlflow

mlflow.langchain.log_model(
    lc_model=llm_chain,
    artifact_path="model",
    registered_model_name="english-to-french-chain-gpt-3.5-turbo-1"
)
  • Load the model: You can load your LLM using the code below.
# Load the registered LangChain model as a Spark UDF.
# Assumes an active SparkSession named `spark`.

import mlflow.pyfunc

english_to_french_udf = mlflow.pyfunc.spark_udf(
    spark=spark,
    model_uri="models:/english-to-french-chain-gpt-3.5-turbo-1/1",
    result_type="string"
)
english_df = spark.createDataFrame([("What is MLflow?",)], ["english_text"])

french_translated_df = english_df.withColumn(
    "french_text",
    english_to_french_udf("english_text")
)
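
To see what text the chain hands to the underlying model, the prompt template can be rendered by hand. For a simple template like this one, filling the `{input}` placeholder is roughly equivalent to Python's `str.format`; the helper name `render_prompt` is an illustrative assumption, not a LangChain function.

```python
# Same template text as in the LangChain example.
template = """Translate everything you see after this into French:

{input}"""


def render_prompt(text: str) -> str:
    """Fill the {input} placeholder with the text to translate."""
    return template.format(input=text)


print(render_prompt("What is MLflow?"))
```

Inspecting the rendered prompt this way is a quick check that the instruction and the user text end up in the expected order.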

Conclusion

Deploying and managing LLMs in a production environment can be challenging due to resource management, model performance, model versioning, and infrastructure issues. MLflow's tools and APIs for managing the model lifecycle make LLMs easier to deploy and administer in production. In this blog, we discussed how to use MLflow to deploy and manage LLMs in a production environment, along with its support for Hugging Face Transformers, OpenAI, and LangChain models. Using MLflow can also improve collaboration between data scientists, engineers, and other stakeholders in the machine learning lifecycle.

πŸ‘ MLflow | Hugging Face | OpenAI | LangChain

Some of the key takeaways are as follows:

  1. MLflow can be used to deploy and manage LLMs in a production environment.
  2. MLflow supports Hugging Face Transformers, OpenAI, and LangChain models.
  3. Resource management, model performance, model versioning, and infrastructure issues can be challenging when deploying and managing LLMs in a production environment, but MLflow provides a set of tools and APIs to help overcome these challenges.
  4. MLflow provides a centralized location for tracking experiments, versioning models, and packaging and deploying models.
  5. MLflow integrates easily with existing workflows.

Read more: Build NLP Applications with Hugging Face

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Gayathri is an aspiring AI leader and a highly skilled data scientist with over 11 years of experience in leveraging data to drive business outcomes. She has deep expertise in NLP, Computer vision, Machine learning and AI and a proven track record of delivering insights and recommendations that have helped organizations make informed decisions and deliver real business value. With a strong background in both technical and business domains, she is adept at communicating complex data-driven findings in a clear and concise manner.
As a data science manager, innovator, and researcher, she has led cross-functional teams of data scientists and engineers to deliver high-quality data-driven insights and solutions to clients. She is an excellent communicator, team player, and mentor, with the ability to translate complex technical concepts into plain language for business stakeholders.
As a technical architect, she has designed, implemented, deployed, and maintained AI solutions to enable organizations to leverage their data effectively.

Her experience has taught her that the most important aspect of data science is not just technical expertise, but the ability to work closely with business stakeholders to understand their needs and deliver solutions that meet their business objectives. She always strives to stay at the forefront of the latest data science and technology advancements, and is always eager to learn and grow as a professional.

In her free time, she enjoys reading about the latest advancements in data science and technology.
