URL: https://thenewstack.io/fivetran-brings-data-lake-interoperability-to-google-cloud/

Fivetran Brings Data Lake Interoperability to Google Cloud

Fivetran brings open table formats and native integration to the BigQuery Metastore catalog to deliver compliant, AI-ready data into the Google Cloud ecosystem.
Apr 10th, 2025 6:45am by Charles Humble
Featured image by Getty Images for Unsplash+.
Fivetran sponsored this post.

Alongside data lake support for Microsoft Fabric, data integration vendor Fivetran has expanded its Managed Data Lake Service to support Google Cloud Storage (GCS), following previous launches on AWS and Azure. The Fivetran Managed Data Lake Service, which the vendor launched last year, automatically converts data into open table formats, specifically Apache Iceberg and Delta Lake, and facilitates interoperability with popular query engines and metadata catalogs.

Announcing the new service at Google Cloud Next in Las Vegas, Fivetran said it has around 4,000 joint customers with Google and is already onboarding Google’s Cloud Storage customers.

Anjan Kundavaram, chief product officer at Fivetran, said in an interview with The New Stack that Fivetran has native integration with Google’s BigQuery metastore. This ensures that data in GCS is automatically cataloged in BigQuery’s metastore, improving governance and interoperability across Google’s data ecosystem. “Customers who are used to Google BigQuery really can’t tell the difference between a BigQuery interaction and an Iceberg query running on Google’s Cloud Storage with the Fivetran Managed Data Lake Service,” he said.

What Is a Data Lake?

Unlike a data warehouse, which stores data in an ACID-compliant system (i.e., one that has atomicity, consistency, isolation and durability), a traditional data lake is a system or repository of data stored in a raw format, usually as object blobs or files. The goal is to have a single store of data, including raw copies of source system data, sensor data and social data.

The term “data lake” was coined in 2010 by the team of James Dixon, then chief technology officer at Pentaho. Dixon wrote that he wanted a term distinct from “data mart,” which is a smaller repository of interesting attributes derived from raw data.

To add to the terminology confusion, the term “data lakehouse” is often used somewhat interchangeably with “data lake.” Strictly speaking, a data lakehouse is a hybrid approach; like a data lake, it can ingest a wide variety of raw data formats, but it also supports ACID transactions like a data warehouse does. However, a modern data lake leverages open table formats, which store data in an ACID-compliant manner to bring data warehouse-like functionality to data lakes.

Data lakes can be tricky to manage, especially when not actively maintained, and consequently are sometimes derogatively called “data swamps.” In a 2014 report from PwC, Sean Martin, CTO of Cambridge Semantics, said, “We see customers creating big data graveyards, dumping everything into the Hadoop Distributed File System and hoping to do something with it down the road. But then they just lose track of what’s there. The main challenge is not creating a data lake but taking advantage of the opportunities it presents.”

How GenAI Is Boosting Data Lakes

This perhaps explains why data lakes seemed to fall briefly out of favor. However, Kundavaram suggested that generative AI (GenAI) has been a catalyst for a new wave of data lake-based initiatives. This, he said, is because “for agents or RAG [retrieval-augmented generation], you really want all your data, structured and unstructured, in one place.”

Fivetran has a partnership with OpenAI, the company that has — for better or worse — become the poster child for the tidal wave of hype around GenAI. “OpenAI has the same data pipeline problem that everyone has, though probably at a larger scale,” Kundavaram said. “We’ve been close partners with them, supporting their use case and innovating alongside [them].”

Along with its ability to handle both structured and unstructured data from multiple sources, Kundavaram offered two additional reasons a data lake is the best approach for GenAI projects: future-proofing and cost. “It’s built on open standards, and if you want to use any number of querying tools like Google, Snowflake or Databricks, you can,” he said. “It is also very cost effective since you don’t need to make copies of data and customers experience significant savings on ingestion costs.”

More generally, Fivetran said that companies including Disney, Sonos, Workday and PwC are turning to managed data lakes as they look to centralize high volumes of structured and unstructured data for AI workloads.

Given the renewed interest in data lakes, I was curious why Fivetran hasn’t launched a data lake product before now. Building a new product inevitably takes time and considerable engineering investment, of course, but Kundavaram said that the open table formats — particularly Apache Iceberg — also needed time to become sufficiently well-developed. “It’s matured quite a bit in the last couple of years,” he said.

Landscape, Pricing and Outlook

Data integration is a highly competitive space. Among dozens of vendors, major players include Microsoft, with Azure Data Factory, SQL Server Integration Services and Power Query for data integration, and Microsoft Fabric as its main data platform; Informatica, with its Intelligent Data Management Cloud; and Oracle, with Oracle Cloud Infrastructure, Oracle GoldenGate and Oracle Data Integrator.

To win customers, Fivetran needs an edge. A core strength is its ecosystem of more than 700 connectors. It continues to invest heavily here, adding about 60 to 70 new connectors per quarter, Kundavaram said. The vendor’s Powered by Fivetran program enables its customers to embed Fivetran connectors into their own applications, and a Connector SDK enables partners to create custom connectors as needed. By leveraging these connectors, enterprises can centralize large volumes of data in Google Cloud Storage, creating a foundation for training custom large language models (LLMs).

Fivetran includes a number of data governance capabilities, such as role-based access control (RBAC), data encryption, and column blocking and hashing. In addition, its Hybrid Deployment model can be used to keep the data plane and all pipelines within the customer’s own secure network.
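
To make column blocking and hashing concrete, here is a minimal Python sketch of the general technique. It is not Fivetran's implementation: the function name `apply_column_policy`, its parameters, the salt handling and the choice of SHA-256 are all assumptions made for illustration. Blocked columns are dropped before a row is written to the destination, while hashed columns are replaced by a one-way digest so they can still be joined on without exposing the original value.

```python
import hashlib


def apply_column_policy(row, blocked, hashed, salt=""):
    """Return a copy of `row` with blocked columns dropped and hashed columns digested.

    Hypothetical helper for illustration only; not part of any Fivetran API.
    """
    out = {}
    for column, value in row.items():
        if column in blocked:
            # Blocked columns are never written to the destination.
            continue
        if column in hashed and value is not None:
            # Replace the value with a salted SHA-256 hex digest.
            digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
            out[column] = digest
        else:
            out[column] = value
    return out


if __name__ == "__main__":
    source_row = {"id": 1, "email": "a@example.com", "ssn": "123"}
    print(apply_column_policy(source_row, blocked={"ssn"}, hashed={"email"}, salt="s"))
```

Because the same input and salt always produce the same digest, a hashed column can still serve as a join key across tables even though the raw value is no longer readable.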

“We have a lot of customers with sensitive data who run our product using Hybrid Deployment,” Kundavaram said. “This ensures that only functional metadata gets shared back to our control plane, while no data leaves their environment.”

Compared to its larger competitors, Fivetran takes a different approach to data transformation. The vendor offers a simpler set of around 55 dbt Core-compatible Quickstart data models for its most popular connectors, including Marketo, Mixpanel, Salesforce and SAP. Around 40% of its customers use these when setting up the source integration, Kundavaram said, and land “transformed, analytics-ready tables in the destination.” Alternatively, customers can build their own dbt models, which Fivetran can schedule and manage.

Fivetran is venture-funded. Its most recent funding round, a Series D in 2021, raised $565 million and valued the company at $5.6 billion. In September 2024, Fivetran announced it had surpassed $300 million in annual recurring revenue, up from $200 million in 2023, although these figures have not been audited according to the rules of public companies.

Historically, small and midsize businesses (SMBs) have been Fivetran’s focus but, aided by its acquisition of HVR in 2021 alongside its Series D funding round, the vendor has expanded its reach beyond the midmarket segment. Pfizer, for example, uses Fivetran “to support scalable analytics platforms and enable real-time analytics, which is particularly crucial in areas such as clinical trials and supply chain operations,” according to a Fivetran case study.

Fivetran’s pricing is consumption-based, using a tiered model keyed to the number of monthly active rows processed. This approach allows SMB customers to start their projects without committing significant upfront capital, and it helps larger enterprises manage costs as volumes scale.
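
As a rough illustration of how tiered, consumption-based pricing on monthly active rows can work, here is a minimal Python sketch. The tier boundaries and per-million-row prices are invented for the example and are not Fivetran's rates. The sketch also assumes marginal pricing, in which each tier's rate applies only to the rows that fall inside that tier; the article does not say whether Fivetran's tiers work this way.

```python
# Hypothetical tiers: (upper bound of monthly active rows, USD per million rows).
# These numbers are invented for illustration and are not Fivetran's rates.
HYPOTHETICAL_TIERS = [
    (5_000_000, 500.0),
    (50_000_000, 300.0),
    (float("inf"), 100.0),
]


def monthly_cost(active_rows, tiers=HYPOTHETICAL_TIERS):
    """Price a month's usage marginally: each tier's rate covers only rows in that tier."""
    cost = 0.0
    lower = 0  # lower bound of the current tier
    for upper, price_per_million in tiers:
        if active_rows <= lower:
            break  # usage never reached this tier
        rows_in_tier = min(active_rows, upper) - lower
        cost += rows_in_tier / 1_000_000 * price_per_million
        lower = upper
    return cost


if __name__ == "__main__":
    for rows in (1_000_000, 10_000_000, 60_000_000):
        print(rows, monthly_cost(rows))
```

Under these made-up tiers, the effective price per row falls as volume grows, which is the property that makes such a model approachable for small projects while keeping costs manageable at enterprise scale.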

Learn more about Fivetran’s Managed Data Lake Service for Google’s Cloud Storage.

Fivetran, the global leader in data movement, empowers companies like OpenAI, Pfizer and Morgan Stanley to power analytics and AI and achieve transformative business outcomes.
Charles Humble is a former software engineer, architect and CTO who has worked as a senior leader and executive of both technology and content groups. He was InfoQ’s editor-in-chief from 2014-2020, and was chief editor for Container Solutions from 2020-2023....
AWS, Oracle and Snowflake are also sponsors of The New Stack.
TNS owner Insight Partners is an investor in: OpenAI, Databricks.