URL: https://thenewstack.io/instacart-speeds-ml-deployments-with-hybrid-mlops-platform/

Instacart Speeds ML Deployments with Hybrid MLOps Platform - The New Stack


2022-07-08 04:00:24

Grocery delivery service Instacart recently built a new machine learning (ML) platform, called Griffin, that tripled the number of ML applications the service launched in a year.
Jul 8th, 2022 4:00am by Jessica Wachtel

Instacart began developing its machine learning infrastructure in 2016 with Lore, an open source framework. After years of rapid growth increased the number, diversity, and complexity of its ML applications, Lore’s monolithic architecture became a bottleneck.

This bottleneck led to the development of Griffin, a hybrid, extensible platform that supports diverse data management systems and integrates with multiple ML tools and workflows. Sahil Khanna’s recent blog post goes into great detail about Griffin, including its benefits, components, and workflows.

Instacart relies heavily on machine learning for product and operational innovations. Such innovations don’t come easily, as multiple machine learning models often must work together to provide a service. Griffin, built by the machine learning infrastructure team, now plays a foundational role in supporting the following machine learning applications and empowering innovations.

In short, Griffin offers the following benefits to the service:

  • Helps customers locate the correct item in a catalog of over 1 billion products.
  • Supports 600,000+ shoppers with the delivery of products to millions of customers in the US and Canada.
  • Incorporates AI into Instacart’s support of its 800+ retailers across 70,000+ stores in 5,000+ cities in the US and Canada.
  • Enables 5,000+ brand partners to connect their products to potential customers.

Griffin: Instacart’s MLOps Platform

To let Instacart stay current with the state of the art in ML operations (MLOps) while also deploying specialized and diverse solutions, Griffin was designed as a hybrid model. Machine learning engineers (MLEs) use third-party solutions such as Snowflake, Amazon Web Services, Databricks, and Ray to support diverse use cases, while in-house abstraction layers provide unified access to those solutions.

Griffin was created with the main goals of helping MLEs quickly iterate on machine learning models, effortlessly manage product releases, and closely track production applications. With that in mind, the system was built with these major considerations:

  • Scalability: It needs to support thousands of machine learning applications.
  • Extensibility: It needs to be flexible enough to extend and integrate with a number of data management and machine learning tools.
  • Generality: It needs to provide a unified workflow and consistent user experience despite broad integration with third-party solutions.

The diagram below illustrates Griffin’s system architecture.

[Diagram: Griffin system architecture]

The diagram reflects each of these considerations. Griffin integrates multiple SaaS solutions, including Redis, Scylla, and S3, which demonstrates its extensibility. That extensibility in turn supports growth at Instacart, which shows its scalability. The integrated interface for MLEs shows Griffin’s generality.

Griffin is built from four foundational components, introduced below. Together they let Instacart develop specialized solutions for distinct use cases, such as real-time recommendations.

  • MLCLI: The in-house machine learning command-line interface for developing machine learning applications and managing the model lifecycle.
  • Workflow Manager and ML Launcher: The orchestrator that schedules and manages machine learning pipelines and containerizes task execution.
  • Feature Marketplace: The feature platform that uses third-party platforms for real-time and batch feature engineering.
  • Training and Inference Platform: The framework-agnostic training and inference platform for adopting open source frameworks.

MLCLI

MLCLI allows MLEs to customize and execute tasks such as training, evaluation, and inference in their applications within containers (Docker, for example). Containerization eliminates bugs caused by variations in execution environments and provides a unified interface.
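The idea of running each task inside a container can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TaskSpec` class, the `docker_argv` helper, and the image name are hypothetical and are not part of MLCLI, whose real interface the article does not describe. The sketch builds a `docker run` command line without executing it.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    """Hypothetical description of one ML task (not Griffin's real schema)."""
    name: str                      # e.g. "train", "evaluate", "inference"
    image: str                     # container image that pins the environment
    command: list[str]             # entry point executed inside the container
    env: dict[str, str] = field(default_factory=dict)


def docker_argv(task: TaskSpec) -> list[str]:
    """Build a `docker run` invocation for the task without executing it."""
    argv = ["docker", "run", "--rm", "--name", task.name]
    for key, value in sorted(task.env.items()):
        argv += ["-e", f"{key}={value}"]
    return argv + [task.image] + task.command


train = TaskSpec(
    name="train",
    image="example.com/ml/app:1.0",   # hypothetical image reference
    command=["python", "train.py"],
    env={"EPOCHS": "5"},
)
print(" ".join(docker_argv(train)))
```

Because the image fixes the runtime environment, the same task description should behave the same way on a laptop and on a production cluster, which is the benefit the paragraph above describes.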

The diagram below illustrates the MLCLI features that MLEs use during ML application development.

[Diagram: MLCLI features used during ML application development]

Workflow Manager and ML Launcher

Workflow Manager schedules and manages the machine learning pipelines. It leverages Airflow to schedule containers and utilizes ML Launcher, an in-house abstraction, to containerize task execution.

ML Launcher integrates third-party compute backends such as SageMaker, Databricks, and Snowflake to perform container runs and meet the unique hardware requirements of ML. Instacart chose this design because it allows scaling up to hundreds of directed acyclic graphs (DAGs) with thousands of tasks in a short period without worrying about Airflow’s runtime.
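One way to picture this split between scheduling and execution is a small dispatcher that hands tasks to pluggable backends. The sketch below is a hedged illustration, not Instacart's code: the `Launcher` class, the `fake_backend` helper, and the returned run identifiers are all hypothetical, and the backends here only record what they would have run instead of calling any real SageMaker, Databricks, or Snowflake API.

```python
from typing import Callable, Dict

# A backend takes an image and a command and returns a run identifier.
Backend = Callable[[str, list], str]


class Launcher:
    """Hypothetical stand-in for an abstraction over compute backends."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, name: str, backend: Backend) -> None:
        self._backends[name] = backend

    def run(self, backend: str, image: str, command: list) -> str:
        if backend not in self._backends:
            raise ValueError(f"unknown backend: {backend}")
        return self._backends[backend](image, command)


def fake_backend(label: str) -> Backend:
    """Return a backend that only records what it would have run."""
    def submit(image: str, command: list) -> str:
        return f"{label}:{image}:{' '.join(command)}"
    return submit


launcher = Launcher()
for name in ("sagemaker", "databricks", "snowflake"):
    launcher.register(name, fake_backend(name))

# The scheduler only hands off the task; the heavy work runs elsewhere.
run_id = launcher.run("sagemaker", "ml/app:1.0", ["python", "train.py"])
print(run_id)
```

In a design like this, the scheduler's workers stay lightweight because they submit work and wait, which is consistent with the article's point about not worrying about Airflow's runtime.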

The diagram below illustrates the architecture of Workflow Manager and ML Launcher.

[Diagram: Workflow Manager and ML Launcher architecture]

Feature Marketplace (FM)

With data at the center of any MLOps platform, Instacart developed its FM product to support both real-time and batch feature engineering. FM manages feature computation, provides feature storage, supports feature discoverability, eliminates offline/online feature drift, and allows feature sharing. The product uses third-party platforms such as Snowflake, Spark, and Flink and integrates multiple storage backends: Scylla, Redis, and S3.
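A common way to avoid offline/online feature drift is to define each feature once and reuse that definition in both the batch and real-time paths. The sketch below illustrates that general technique only; the article does not say how FM implements it. The `Feature` and `OnlineStore` classes, the `avg_item_price` feature, and the in-memory dictionary standing in for a store such as Redis or Scylla are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Feature:
    """Hypothetical feature definition shared by batch and real-time paths."""
    name: str
    compute: Callable[[Dict[str, float]], float]


# One definition: the average item price in an order.
avg_item_price = Feature(
    name="avg_item_price",
    compute=lambda row: row["order_total"] / max(row["item_count"], 1),
)


def batch_features(feature: Feature, rows: List[Dict[str, float]]) -> List[float]:
    """Offline path: compute the feature for every historical row."""
    return [feature.compute(row) for row in rows]


class OnlineStore:
    """In-memory stand-in for a low-latency store such as Redis or Scylla."""

    def __init__(self) -> None:
        self._values: Dict[tuple, float] = {}

    def write(self, feature: Feature, key: str, row: Dict[str, float]) -> None:
        self._values[(feature.name, key)] = feature.compute(row)

    def read(self, feature: Feature, key: str) -> float:
        return self._values[(feature.name, key)]


row = {"order_total": 42.0, "item_count": 6}
store = OnlineStore()
store.write(avg_item_price, "order-1", row)

# Both paths call the same compute function, so the values agree.
offline_value = batch_features(avg_item_price, [row])[0]
online_value = store.read(avg_item_price, "order-1")
print(offline_value == online_value)
```

Keeping a single definition also helps with discoverability and sharing, since other teams can look a feature up by name instead of reimplementing it.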

The diagram below illustrates the architecture of Feature Marketplace.

[Diagram: Feature Marketplace architecture]

Inference and Training Platform

The Inference and Training Platform allows MLEs to define the model architecture and inference routine to customize their applications, which allowed Instacart to triple the number of ML applications in one year. Instacart standardized package, metadata, and code management to support diversity in frameworks and ensure reliable model deployment. Frameworks already adopted include TensorFlow, XGBoost, and Faiss.
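A framework-agnostic platform usually depends on a small shared contract plus standardized metadata, so that the serving code never needs to know which library produced a model. The following sketch shows that pattern under stated assumptions: the `Model` protocol, `ModelPackage`, `Registry`, and the toy `MeanModel` are hypothetical names invented for this example and do not come from Griffin.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


class Model(Protocol):
    """Minimal contract a model must satisfy, whatever framework backs it."""
    def predict(self, rows: List[List[float]]) -> List[float]: ...


@dataclass
class ModelPackage:
    """Hypothetical standardized package: a model plus its metadata."""
    name: str
    version: str
    framework: str
    model: Model


class MeanModel:
    """Toy model standing in for a TensorFlow or XGBoost artifact."""
    def predict(self, rows: List[List[float]]) -> List[float]:
        return [sum(row) / len(row) for row in rows]


class Registry:
    """Looks models up by name and version and runs inference on them."""

    def __init__(self) -> None:
        self._packages: Dict[tuple, ModelPackage] = {}

    def publish(self, package: ModelPackage) -> None:
        self._packages[(package.name, package.version)] = package

    def infer(self, name: str, version: str, rows: List[List[float]]) -> List[float]:
        return self._packages[(name, version)].model.predict(rows)


registry = Registry()
registry.publish(ModelPackage("ranker", "1", "toy", MeanModel()))
print(registry.infer("ranker", "1", [[1.0, 3.0], [2.0, 2.0]]))
```

Any object with a matching `predict` method satisfies the protocol, so wrapping a model from a new framework would not require changes to the registry in this sketch.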

The diagram below illustrates the architecture of the Inference and Training Platform.

[Diagram: Inference and Training Platform architecture]

A Few Key Learnings

Some valuable lessons were learned during the development of Griffin.

  • Buy vs. Build: Using third-party solutions is important for supporting a quickly growing feature set and for avoiding reinventing the wheel. Careful platform integration is key to switching seamlessly between solutions while keeping migration overhead low.
  • Make Incremental Progress: Regular hands-on codelabs and onboarding sessions encouraged early feedback and collaboration and kept the design simple. This kept engineers from going down the rabbit hole of trying to design the “perfect” platform.
Jessica Wachtel is a developer marketing writer at InfluxData where she creates content that helps make the world of time series data more understandable and accessible. Jessica has a background in software development and technical journalism.
TNS owner Insight Partners is an investor in: Docker, Databricks.