Databricks CustomerLake is a warehouse-native agentic customer data platform that Databricks announced on June 16, 2026 at its Data + AI Summit in San Francisco, marking the company’s formal entry into the marketing software market. Rather than store another copy of your customer data, CustomerLake runs the CDP directly on the Databricks lakehouse, governed by Unity Catalog alongside the AI models and agents that act on that data.
That architectural choice is the whole story. For a decade the CDP category sold a separate system that ingested, unified, and re-stored customer data so marketers could act on it. CustomerLake argues the warehouse already holds the data, the governance, and now the agents — so the CDP should be a front door on the lakehouse, not a second database next to it. The platform is in Private Preview, not generally available, with a handful of named early adopters and no disclosed pricing rates.
This guide covers what actually launched, why warehouse-native architecture matters, the agentic loop Databricks calls Infinity Campaigns, the market context that makes the move land, and the honest open questions a marketing leader should weigh before treating this as an infrastructure decision. Everything below is sourced to Databricks’ own launch materials and independent martech coverage, with vendor claims labelled as such.
- 01. Databricks entered martech with a warehouse-native CDP. Announced June 16, 2026 at Data + AI Summit, CustomerLake runs the CDP natively on the Databricks lakehouse, with no separate data store, governed by Unity Catalog. It is the company's second software application after Lakewatch (March 2026).
- 02. It is Private Preview, not generally available. Named early adopters include HP, Circle K, AB InBev (Zé Delivery), and Getnet by Santander. These are vendor-curated early adopters, not GA case studies, and Databricks has disclosed no pricing rates beyond a consumption-based model.
- 03. Agents replace the campaign waterfall. Profile Agents unify Customer 360 data via Agentic Identity Resolution; Campaign Agents build audiences, recommend next-best actions, and activate across channels in what Databricks calls Infinity Campaigns: continuous loops rather than plan-build-ship-measure batches.
- 04. The composable CDP movement built the on-ramp. Vendors like Hightouch and GrowthLoop spent years arguing the warehouse should be the CDP's source of truth. They made that case so well that the warehouse vendor built its own front end, an irony at the centre of this launch.
- 05. Gartner sees the architecture, not the product, winning. Gartner predicts that by 2030, 80% of net-new enterprise CDP deployments will be embedded in or composable with data platforms, and advises CMOs to treat CustomerLake as an infrastructure decision. The figure sits behind a paywall but is widely cited.
01. What Launched: A CDP that lives inside the lakehouse.
At Data + AI Summit 2026 — a conference Databricks says drew more than 30,000 in-person attendees to the Moscone Center, with tens of thousands more joining virtually from 150-plus countries — the company unveiled CustomerLake, its first move into the marketing software category. It follows Lakewatch, the security lakehouse Databricks shipped in March 2026, and marks the second time the data-platform vendor has packaged a vertical application on top of its core lakehouse.
The defining claim is that CustomerLake is built natively on the Databricks lakehouse and governed by Unity Catalog. There is no separate CDP data store: customer data, AI models, and the agents that act on them co-reside in one governed platform. Two agent families do the work — Profile Agents for data unification and Campaign Agents for activation — and a Genie natural-language interface lets marketers query governed customer data without writing SQL or filing a BI request.
Profile Agents
Build a unified Customer 360 via Agentic Identity Resolution (AIR) — combining deterministic matching, probabilistic matching, and LLM-assisted edge-case resolution, with a continuous human-review feedback loop. (Vendor-stated; no independent identity benchmark exists.)
Campaign Agents
Build audiences, recommend next-best actions, activate across channels, and continuously optimize — replacing the traditional plan → build → ship → measure sequence with a continuous loop Databricks calls an Infinity Campaign.
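The tiered matching that the Profile Agents description attributes to Agentic Identity Resolution can be pictured with a minimal Python sketch. This is not Databricks' implementation: the `Record` fields, the `resolve_pair` function, the attribute weights, and the 0.9 / 0.6 thresholds are all hypothetical, chosen only to show how a deterministic tier, a probabilistic tier, and an "uncertain" band might fit together. The uncertain band is where the source says LLM-assisted resolution and human review come in; the sketch only labels those pairs and calls no model.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Record:
    """A hypothetical customer record from one source system."""
    record_id: str
    email: str
    name: str
    postcode: str


def deterministic_match(a: Record, b: Record) -> bool:
    """Tier 1: exact match on a normalized strong identifier (email here)."""
    ea, eb = a.email.strip().lower(), b.email.strip().lower()
    return bool(ea) and ea == eb


def probabilistic_score(a: Record, b: Record) -> float:
    """Tier 2: weighted fuzzy similarity on weaker attributes, in [0, 1]."""
    name_sim = SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()
    postcode_sim = 1.0 if a.postcode == b.postcode else 0.0
    # Illustrative weights only; a real system would tune or learn these.
    return 0.7 * name_sim + 0.3 * postcode_sim


def resolve_pair(a: Record, b: Record,
                 accept: float = 0.9, reject: float = 0.6) -> str:
    """Return 'match', 'no_match', or 'needs_review' for a pair of records."""
    if deterministic_match(a, b):
        return "match"
    score = probabilistic_score(a, b)
    if score >= accept:
        return "match"
    if score < reject:
        return "no_match"
    # Tier 3 placeholder: ambiguous pairs would go to an LLM-assisted
    # check and a human-review queue (assumed, not implemented here).
    return "needs_review"
```

In this sketch, two records sharing an email resolve immediately, clearly different records are rejected, and near-misses such as an abbreviated name at the same postcode land in the review band. The design point is that the expensive tier only sees the pairs the cheap tiers cannot settle.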
CustomerLake also ships with Lakehouse Federation, which Databricks says enables cross-platform queries across Databricks, Snowflake, BigQuery, and operational databases without duplicating data, a nod to the reality that few enterprises run a single warehouse. The scale claims around what those agents can do are firmly vendor-stated and worth reading with care: Databricks says the platform is designed to deliver 1:1 personalized experiences at very large scale, a claim that has no independent verification and should be treated as aspirational rather than benchmarked.
02. Architecture: Why warehouse-native changes the math.
A traditional CDP is a second system of record. It ingests events and attributes from your sources, stitches identities, stores a unified profile, and then ships segments out to channels. Every step in that chain is a copy, a sync lag, and a governance boundary the data crosses. The composable CDP movement attacked the first problem — the extra copy — by activating directly from the warehouse. Databricks attacks the rest of the chain by putting the activation layer, the models, and the governance in the same place the data already lives.
The governance point is the one marketers underrate. Because Unity Catalog governs the data, the AI models, and the agents as one surface, lineage and access control do not stop at the CDP boundary — they extend to the agent that built the audience and the model that scored it. For regulated industries, that single governed plane is a materially different posture than a CDP that holds its own copy of customer data under its own access controls.
No separate CDP database
CustomerLake holds no separate copy of customer data. The CDP reads and writes against the lakehouse directly, so there is no second system of record to keep in sync. (Vendor-stated architecture.)
One Unity Catalog plane
Data, AI models, and agents are governed together under Unity Catalog — lineage and access control extend across the unification, scoring, and activation steps rather than stopping at a CDP boundary.
Query surfaces
Lakehouse Federation reaches across Databricks, Snowflake, BigQuery, and operational databases without duplicating data — acknowledging that most enterprises run more than one warehouse.
03. Infinity Campaigns: From the campaign waterfall to a continuous loop.
The agentic framing is more than branding. Databricks positions Infinity Campaigns as a replacement for the batch campaign waterfall — the plan, build, ship, and measure sequence that has defined lifecycle marketing for years. Instead of a marketer queuing a segment, building creative, shipping a send, and reading the report a week later, Campaign Agents are meant to analyze, decide, and activate against every customer in a continuous loop, with the human setting goals and guardrails rather than operating the machinery.
“Marketing stops being a series of campaigns and becomes a continuous loop: agents that constantly analyze, decide, and act on every customer in real time.” (Ali Ghodsi, Co-founder & CEO, Databricks)
The honest read is that this is a vision statement attached to a Private Preview, not a measured outcome. The continuous-loop model is genuinely different from batch marketing, but its value depends entirely on whether the agents make good decisions on real customer data at scale — and no independent evaluation of that exists yet. The two-sided framing Databricks puts around it is the more durable idea: marketers will increasingly both use agents internally and need to market to their customers’ AI agents, the ones researching products on a buyer’s behalf. The legacy CDP category was built for a world where a human always sat at the other end of the message. That assumption is the one quietly breaking.
Batch waterfall
A marketer queues a segment, builds creative, schedules a send, and reads the report afterward. Each cycle is discrete, sequential, and slow to react to what the data is telling you mid-flight.
Infinity Campaign
Campaign Agents build audiences, recommend next-best actions, and activate across channels in a loop. The human sets goals and guardrails; the agents run the cycle. (Vendor-described; results unproven at this stage.)
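The contrast between the two models above can be made concrete with a toy loop. The sketch below is an illustration under stated assumptions, not a CustomerLake API: `Guardrails`, `Customer`, `decide`, and `run_cycle` are hypothetical names, and the one-line channel rule stands in for whatever model-driven decision a real Campaign Agent would make. What it shows is the division of labour the text describes: a human sets limits once, and each pass of the loop analyzes, decides, and acts within them.

```python
from dataclasses import dataclass


@dataclass
class Guardrails:
    """Human-set limits the loop must respect (hypothetical fields)."""
    max_messages_per_customer: int
    allowed_channels: frozenset


@dataclass
class Customer:
    customer_id: str
    days_since_last_purchase: int
    messages_sent: int = 0


def decide(customer: Customer, guardrails: Guardrails):
    """Pick a next-best channel, or None when no action fits the guardrails."""
    if customer.messages_sent >= guardrails.max_messages_per_customer:
        return None  # frequency cap reached
    # Toy rule standing in for a model-driven next-best-action decision.
    channel = "email" if customer.days_since_last_purchase > 30 else "push"
    return channel if channel in guardrails.allowed_channels else None


def run_cycle(customers, guardrails):
    """One pass of analyze -> decide -> act over every customer."""
    actions = []
    for customer in customers:
        channel = decide(customer, guardrails)
        if channel is not None:
            customer.messages_sent += 1  # "act": record the send
            actions.append((customer.customer_id, channel))
    return actions
```

Calling `run_cycle` repeatedly is the loop: state from one pass (here, the message count) feeds the next decision, so a customer who hits the cap simply drops out of later passes without anyone rebuilding a segment. In a batch waterfall, that adjustment would wait for the next planned campaign.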
04. The Runway: The composable CDP movement built Databricks’ on-ramp.
The most interesting dynamic in this launch is one most coverage skipped: the composable CDP category inadvertently paved the runway for CustomerLake. For years, vendors like Hightouch and GrowthLoop built their entire pitch on a single argument — the warehouse should be the CDP’s source of truth, and activation should happen from there rather than from a separate copy. They were persuasive enough that more than a quarter of CDPs now support a warehouse-centric architecture. The unintended consequence is that they taught the market to want exactly the thing only a warehouse vendor can build best: a CDP that is the warehouse.
Databricks reinforced the build-over-buy signal in how it staffed the effort. Rather than acquire an existing CDP, it recruited founding teams from ActionIQ and Census to build CustomerLake in house — a deliberate choice to own the architecture rather than bolt a packaged product onto the lakehouse. When the data layer decides to ship its own activation front end, the standalone vendors that spent a decade arguing the warehouse should be central find themselves competing against the warehouse itself.
05. Comparison: Three CDP archetypes, side by side.
Most CDP comparisons stop at two options — packaged versus composable — and assume you need one of them. The table below adds the third tier CustomerLake represents and reads eight operational dimensions across all three archetypes. The legacy column reflects the Segment / mParticle packaged model, the composable column the Hightouch / GrowthLoop warehouse-activation model, and the agentic column CustomerLake as Databricks describes it. Cells in the agentic column are vendor-stated and Private Preview; read them as architecture intent, not proven capability.
| Dimension | Legacy packaged | Composable | Warehouse-native agentic |
|---|---|---|---|
| Data residency | Separate CDP data store — a copy outside the warehouse | Lives in your warehouse; CDP reads from it | Native to the lakehouse — no separate store at all |
| Identity resolution | Deterministic + probabilistic rules, batch-run | Warehouse SQL models you build and maintain | Agentic Identity Resolution (AIR) — deterministic, probabilistic, plus LLM-assisted edge cases with human review |
| Governance layer | The CDP's own access controls | Warehouse permissions you wire up yourself | Unity Catalog governs data, models, and agents together |
| Personalization model | Batch segments shipped to channels on a schedule | Reverse-ETL syncs audiences from warehouse to tools | Continuous agentic loops Databricks calls Infinity Campaigns |
| Pricing model | Per-platform software license, often per-profile | License plus your own warehouse compute spend | Consumption-based — no rates disclosed at launch |
| Developer / marketer split | Marketer-led UI; limited engineering touch | Engineer-heavy — data team builds the models | Genie natural-language interface lets marketers query without SQL |
| Third-party enrichment | Built-in connectors to data brokers | Whatever you pipe into the warehouse | Launch partners (e.g. IAS) connect via clean rooms — no third-party cookies |
| Availability today | Generally available, mature | Generally available across multiple vendors | Private Preview only — not yet GA |
Reading down the agentic column, the pattern is consistent: every row collapses a boundary the previous two archetypes preserved. The separate store disappears, governance becomes one plane, and identity resolution gains an LLM-assisted layer on top of the deterministic and probabilistic matching the category already used. The honest asterisk on the whole column is availability — it reads as the most consolidated architecture precisely because it is the least proven, still in Private Preview with no GA date or disclosed rates. For a grounding in the packaged-versus-composable trade-offs before you weigh the third tier, our CDP build-buy-or-skip decision matrix walks the maturity signals that point to each path, and the customer data platform fundamentals guide covers what a CDP actually does before the architecture debate.
06. Market Context: A growing market with a utilization problem.
CDP market sizing varies widely by analyst firm, so the honest move is to cite the range rather than a single headline. For 2026, Mordor Intelligence puts the market at about $4.58B, Fortune Business Insights at $4.07B, and MarketsandMarkets at $9.72B — the last projecting growth to $37.11B by 2030 at a 30.7% CAGR. The CDP Institute counts roughly 208 active vendors as of July 2025, with a concentrated core where a small group of large vendors accounts for about 67% of CDP employment and 73% of total funding.
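Because the sizing figures above mix base years and forecast horizons, it is worth checking the compounding arithmetic before quoting any of them. The short Python sketch below takes the MarketsandMarkets numbers exactly as cited ($9.72B, $37.11B, 30.7% CAGR) and applies the standard compound-growth formula; the function names are illustrative, and the choice of four versus five periods is an assumption being tested, not something the source states.

```python
def compound(start: float, rate: float, years: int) -> float:
    """Value after compounding `start` at `rate` for `years` periods."""
    return start * (1 + rate) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Annual growth rate that takes `start` to `end` in `years` periods."""
    return (end / start) ** (1 / years) - 1


# Figures as quoted in the text: market size in $B and the stated CAGR.
start, end, rate = 9.72, 37.11, 0.307

four_year = compound(start, rate, 4)  # 2026 -> 2030, four periods
five_year = compound(start, rate, 5)  # same rate, one extra period
needed = implied_cagr(start, end, 4)  # rate required to fit four periods

print(f"{four_year:.2f} {five_year:.2f} {needed:.1%}")
```

Run as written, four periods at 30.7% land at roughly $28B, while five periods land close to the quoted $37.11B; reaching that figure in four years would take a rate near 40%. One plausible reading is that the forecast compounds from a 2025 base rather than 2026, but the cited coverage does not say, so treat the endpoint and the rate as a pair that belongs to the analyst's own horizon.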
The more telling number is who is growing. Composable, warehouse-native vendors grew organic employment by about 7.8%, nearly six times the 1.3% industry average, and more than a quarter of CDPs now support warehouse-centric architecture. That is the trend line CustomerLake steps onto, not against.
Chart: Where CDP employment is growing · warehouse-native vs industry average
Source: CDP.com industry statistics, retrieved June 17, 2026
But adoption is not the same as use, and this is the paradox at the heart of the category. The scorecard below pulls the gap together: 41% of companies have implemented a CDP, yet only 22% of marketers report high utilization, and organizations estimate they use roughly 47% of available capabilities. The question CustomerLake does not yet answer is whether moving the CDP inside the warehouse fixes that adoption problem or simply relocates the under-utilization into a new layer.
| Metric | Current figure | Source | Implication |
|---|---|---|---|
| CDP implementation rate | 41% of companies | CDP.com industry stats | Adoption is mainstream — the install base is large |
| High-utilization rate | 22% of marketers | CDP.com industry stats | Most teams barely use what they bought |
| Average capabilities used | ~47% of features | CDP.com industry stats | Roughly half of every CDP investment sits idle |
| Warehouse-native adoption | >25% of CDPs | CDP.com industry stats | Warehouse-centric architecture is already a quarter of the field |
| Fortune 500 on Databricks | >60% penetration | Databricks (Dec 2025) | A vast warehouse install base CustomerLake can land on |
07. The Threat Model: What it means for incumbents and buyers.
CustomerLake does not threaten every CDP equally. The asymmetric economics hit hardest where a buyer already runs Databricks and the CDP’s main value was unifying and activating data the warehouse already holds. Where the incumbent’s value is real channel orchestration, deep activation integrations, or a 700-plus connector library and 25,000-company network like Twilio Segment’s, the calculus is more nuanced. The matrix below maps who should watch closely and who can keep building.
Enterprises with a lakehouse install base
With Fortune 500 penetration above 60%, a vast base already runs the warehouse CustomerLake lands on. If your CDP mainly unifies and activates Databricks data, this is the strongest reason yet to revisit the contract — but Private Preview means evaluate, do not migrate.
Packaged-CDP incumbents
The asymmetric price dynamic is the real threat: a consumption model that is additive to a data-platform business is hard to match on cost alone. Differentiation now has to live in orchestration, integrations, and proven outcomes — not in storing another copy of the data.
Teams not on Databricks
Lakehouse Federation reaches Snowflake and BigQuery, but the native governance benefit is strongest on Databricks itself. If your data lives elsewhere, composable activation from your own warehouse may still be the cleaner path than adopting a second platform.
Anyone signing a long contract
Private Preview, no disclosed pricing, no GA date, and entirely vendor-stated capability benchmarks. The prudent move is to treat CustomerLake as a roadmap input to your CDP renewal — not a product you can deploy today.
08. Your Stack: What to actually do about it.
For most marketing teams, the right response to a Private Preview announcement is not to act but to recalibrate. If you are mid-cycle on a CDP renewal and you run Databricks, the smart move is to fold CustomerLake into the renewal conversation as a credible alternative — even if you cannot deploy it yet — because the option alone changes your negotiating position. If you are not on Databricks, the launch is a strong signal that warehouse-native architecture is the direction of travel, which makes the composable path worth a serious look before you sign another packaged-CDP contract.
The cost angle deserves its own line in the model. A consumption model that rides on compute you already pay for is a structurally different line item than a standalone CDP license, and it changes how CDP spend competes with the rest of your channel budget. If you are re-running that math, our marketing budget allocation guide frames where data-platform spend sits against paid, owned, and earned channels. And when the question becomes how to wire agents, identity, and activation into a working customer-data workflow rather than a slide, our CRM & marketing automation engagements start with exactly this kind of architecture decision — mapping your data, governance, and channel reality before any platform commitment.
09. Conclusion: An infrastructure decision wearing a martech badge.
The warehouse just became the CDP — but Private Preview means watch, not migrate.
CustomerLake is the clearest sign yet that the CDP category is being absorbed into the data platform rather than competing alongside it. By running the CDP inside the lakehouse and governing data, models, and agents under one plane, Databricks turns a decade of composable-CDP advocacy into its own on-ramp — and brings a structural cost advantage that standalone vendors cannot easily answer.
The honest caveats matter as much as the thesis. CustomerLake is in Private Preview, not generally available. Its scale and identity claims are vendor-stated with no independent verification, no pricing rates are public, and the named customers are early adopters rather than proven case studies. The 80%-by-2030 forecast is an analyst projection behind a paywall. None of that makes the move less significant — it makes it a roadmap input, not a deployable product.
The sharpest unanswered question is the one the market should sit with: 80% of net-new enterprise CDP deployments may be embedded in or composable with data platforms by 2030, yet only 22% of marketers report high utilization of the CDPs they already own. Moving the platform inside the warehouse fixes the architecture. It does not automatically fix adoption, and whether agents close that gap or simply relocate it is the test CustomerLake still has to pass. For now, the right posture is to treat it as the infrastructure decision Gartner calls it: evaluate it against your warehouse and your renewals, and commit only when the preview becomes a product.
