
URL: https://thenewstack.io/linkedin-open-sources-openhouse-data-lakehouse-control-plane/

LinkedIn Open Sources OpenHouse Data Lakehouse Control Plane - The New Stack



LinkedIn Open Sources OpenHouse Data Lakehouse Control Plane

At the heart of OpenHouse lies its Catalog, a RESTful table service that offers secure and scalable table provisioning alongside declarative metadata management.
Mar 6th, 2024 6:22am by Steven J. Vaughan-Nichols
Feature image by Jonas from Pixabay.

Managing data lakehouses isn’t easy, so LinkedIn created OpenHouse, a control plane and interface for supervising a wide variety of data lakehouses, and has now released it as open source.

It all starts with data lakes. These are cheap, open storage systems for any data type: CSV, JSON, tabular data, text, images, audio, video, etc. A data lakehouse, as defined by Databricks, is an architecture that enables efficient and secure artificial intelligence (AI) and business intelligence (BI) analysis on a data lake’s data. LinkedIn’s OpenHouse provides an open source control plane to manage tables within open data lakehouse deployments.

This control plane is made up of a declarative catalog and a suite of data services. Users can seamlessly define tables, their schemas, and associated metadata declaratively within the catalog. OpenHouse reconciles the observed state of tables with the desired state by orchestrating various data services.
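To make the idea of reconciling observed state with desired state concrete, here is a minimal Python sketch. It is not OpenHouse code: `TableSpec`, its fields, and `plan_actions` are hypothetical names invented for illustration, and a real control plane would hand the resulting actions to its data services rather than return a list.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TableSpec:
    """Desired (or observed) state of one table. Fields are hypothetical."""
    name: str
    schema: Tuple[Tuple[str, str], ...]   # (column name, type) pairs
    retention_days: Optional[int] = None  # None means "keep forever"


def plan_actions(
    desired: Dict[str, TableSpec], observed: Dict[str, TableSpec]
) -> List[Tuple[str, str]]:
    """Return (action, table name) pairs that move observed toward desired."""
    actions = []
    for name, spec in sorted(desired.items()):
        current = observed.get(name)
        if current is None:
            actions.append(("create", name))   # declared but not present
        elif current != spec:
            actions.append(("update", name))   # present but drifted
    for name in sorted(observed.keys() - desired.keys()):
        actions.append(("drop", name))         # present but no longer declared
    return actions
```

The point of the declarative style is that users only edit the desired state; the comparison above is what decides which work needs to be done.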

LinkedIn built this because there were no other tools available that could address its issues. Its open source data lakehouse deployments are built on the foundations of compute engines such as Apache Spark, Trino, and Apache Flink; distributed storage; and metadata catalogs and table formats such as Apache Iceberg, Delta, Hudi, and Apache Hive Metastore. That’s a lot of data in a wide variety of formats and architectures.

As LinkedIn admitted, “While functional, our current setup for managing tables is fragmented. The individual building blocks of compute engines, distributed storage, and metadata catalogs operate independently as part of an overall data plane.”

How LinkedIn Uses OpenHouse

OpenHouse was the answer. Since its inception last year, OpenHouse has been a cornerstone of LinkedIn’s data infrastructure, managing over 3,500 tables and serving more than 550 daily active users. Its impact has been profound, notably slashing the time-to-market for LinkedIn’s data build tool (dbt) implementation on managed tables by over six months and halving the end-user toil associated with data sharing. The effort also includes integrating over 1,000 datasets, including those from AI and Large Language Models (LLMs), into OpenHouse.

The inspiration behind OpenHouse stemmed from the perennial struggle between control and flexibility in big data management. Traditional cloud data warehouse solutions, while ensuring governance and performance, often lack the scalability and adaptability offered by open source data lakehouse systems. OpenHouse emerges as a solution to this dilemma, providing a managed experience that liberates end-users from the intricacies of infrastructure management while empowering data infrastructure teams with enhanced control and governance capabilities.

At the heart of OpenHouse lies its Catalog, a RESTful table service that offers secure and scalable table provisioning alongside declarative metadata management. This is complemented by Data Services, which facilitates seamless table maintenance.

OpenHouse’s key features include fundamental catalog operations, retention management, governance through column tagging, and comprehensive observability tools. These features are integrated with Apache Spark, so operations can be run through standard engine syntax, SQL queries, and the DataFrame API.
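As a rough illustration of what retention management and column tagging involve, the following Python sketch shows the underlying logic in isolation. It does not use OpenHouse or Spark APIs; the function names, the date-partitioned layout, and the `"PII"` tag are assumptions made for the example.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List, Set


def expired_partitions(
    partition_dates: Iterable[date], retention_days: int, today: date
) -> List[date]:
    """Return partitions strictly older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    return [d for d in partition_dates if d < cutoff]


def columns_with_tag(column_tags: Dict[str, Set[str]], tag: str) -> List[str]:
    """Return, in sorted order, the columns that carry a given governance tag."""
    return sorted(col for col, tags in column_tags.items() if tag in tags)
```

In a managed setup, a maintenance job would periodically evaluate a rule like `expired_partitions` for each table, while tags such as the hypothetical `"PII"` would drive access or masking decisions.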

Moreover, OpenHouse introduces advanced replication capabilities by extending the Apache Gobblin framework, ensuring high availability and consistency across geographies. Its support for Apache Iceberg as a table format further underscores its commitment to compliance and optimal performance through regular maintenance tasks.

Recognizing the importance of adaptability, OpenHouse was designed with pluggability in mind, offering interfaces for storage, authentication, authorization, database management, and job submission. This design philosophy ensures that OpenHouse can be customized to fit diverse environments, from cloud infrastructures to specific table formats.
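The pluggable design described above can be pictured as a control plane that depends only on small interfaces, with concrete implementations swapped in per environment. The Python sketch below is a generic illustration of that pattern, not OpenHouse’s actual interfaces; every class and method name here is hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Set


class Storage(ABC):
    """Hypothetical storage plugin: decides where a table's data lives."""

    @abstractmethod
    def table_location(self, database: str, table: str) -> str: ...


class Authorizer(ABC):
    """Hypothetical authorization plugin."""

    @abstractmethod
    def can_create(self, user: str, database: str) -> bool: ...


class PathStorage(Storage):
    """Builds locations under a root path (no filesystem access is performed)."""

    def __init__(self, root: str) -> None:
        self.root = root.rstrip("/")

    def table_location(self, database: str, table: str) -> str:
        return f"{self.root}/{database}/{table}"


class AllowListAuthorizer(Authorizer):
    """Permits table creation only for users on a fixed allow list."""

    def __init__(self, allowed_users: Set[str]) -> None:
        self.allowed_users = allowed_users

    def can_create(self, user: str, database: str) -> bool:
        return user in self.allowed_users


class ControlPlane:
    """Depends on the interfaces only, so implementations can be swapped."""

    def __init__(self, storage: Storage, authorizer: Authorizer) -> None:
        self.storage = storage
        self.authorizer = authorizer

    def create_table(self, user: str, database: str, table: str) -> str:
        if not self.authorizer.can_create(user, database):
            raise PermissionError(f"{user} may not create tables in {database}")
        return self.storage.table_location(database, table)
```

Swapping `PathStorage` for a cloud blob store implementation, or `AllowListAuthorizer` for a policy engine, would not require changing `ControlPlane` itself.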

As OpenHouse embarks on this new chapter as an open source project under the BSD 2-Clause license, LinkedIn invites the global community to explore its capabilities, contribute to its development, and provide feedback. The company is particularly focused on understanding how OpenHouse performs in various settings and is committed to addressing technical challenges as it transitions from Apache Hive to OpenHouse.

Steven J. Vaughan-Nichols, aka sjvn, has been writing about technology and the business of technology since CP/M-80 was the cutting-edge PC operating system, 300bps was a fast internet connection, WordStar was the state-of-the-art word processor, and we liked it.
TNS owner Insight Partners is an investor in: Databricks.