URL: https://thenewstack.io/snowflake-platform-gets-generative-ai-ml-data-lakehouse-features/

Snowflake Platform Gets Generative AI, ML, Data Lakehouse Features - The New Stack


Snowflake Platform Gets Generative AI, ML, Data Lakehouse Features
Data

Snowflake brings generative AI & ML features to its SQL language, data lakehouse workloads to its platform, and dev features to tie it all up.
Nov 6th, 2023 10:30am by Andrew Brust
Feature image from franganillo on Pixabay.

At its Snowday event last Wednesday, leading data cloud contender Snowflake rolled out a number of new capabilities on its platform. And while I could have used that very same sentence to cover Snowday in years past, this is really the year that the pieces — including organically developed and acquired technologies — seem to be coming together.

This year’s announcements started off with the long-awaited public preview of Snowflake’s support for Iceberg tables, and segued into a collection of developments in the data governance, developer and AI realms, with the last of these including a model registry, a feature store, vector search and LLM inferencing.

As in past years, Snowflake’s new capabilities and technologies run the gamut of shipping readiness: a couple of them are still in development, some are or will “soon” be in private preview, others are or will be in public preview, and a few are now generally available.

Breaking the Ice

Let’s start with the Apache Iceberg table capability. Iceberg is an open table format that, most commonly, takes data stored in Apache Parquet format and enhances it, allowing Snowflake to use Iceberg tables as if they were standard, native ones.

Once that capability is in GA, it will allow Snowflake to take on data lakehouse workloads in addition to warehouse workloads, making it a key pillar of Snowflake’s claim to be a true data platform rather than a cloud data warehouse.

Snowflake first debuted its support for Iceberg two full years ago. It’s still not GA, but Snowflake has assured me that the public preview will kick off this month. That’s good news.

For now, though, there is a distinction to be made between Iceberg tables managed by Snowflake and those managed by another engine/platform. For example, tables in Cloudera Data Platform or Dremio can be implemented in Iceberg format, which puts the tables’ metadata in one of those platforms’ catalogs, rather than Snowflake’s. In such a case, Snowflake will, for now, treat these tables as read-only.

Users on the other platform’s side could still carry out their updates, and Snowflake would see these and process the data with performance comparable to working with data in its original, native format, but it would not be able to update the tables itself. Snowflake’s catalog integration, one step toward addressing this, will also be in public preview soon. Work on an Iceberg catalog REST API, which will fully clear the hurdle of non-Snowflake-managed Iceberg tables being used in a native, read/write manner, is in development, with no announced timeline for private or public preview.
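The managed-versus-external rule described above can be modeled in a few lines. This is a toy Python sketch, not Snowflake or Iceberg code: the `IcebergTable` class, the engine labels and the `write_from` and `read_from` helpers are all hypothetical names invented for illustration. It assumes only what the text states, namely that the engine whose catalog holds a table's metadata can write to it while other engines can only read.

```python
from dataclasses import dataclass, field


@dataclass
class IcebergTable:
    """Toy stand-in for an Iceberg table (hypothetical, for illustration)."""
    name: str
    catalog: str                      # engine whose catalog holds the metadata
    rows: list = field(default_factory=list)


def write_from(engine: str, table: IcebergTable, row: dict) -> None:
    """Append a row, but only if `engine` manages the table's catalog."""
    if engine != table.catalog:
        raise PermissionError(
            f"{table.name} is managed by {table.catalog}; "
            f"{engine} sees it as read-only"
        )
    table.rows.append(row)


def read_from(engine: str, table: IcebergTable) -> list:
    """Any engine may read the current rows, whoever manages the catalog."""
    return list(table.rows)


if __name__ == "__main__":
    orders = IcebergTable(name="orders", catalog="dremio")
    write_from("dremio", orders, {"id": 1})      # the managing engine can write
    print(read_from("snowflake", orders))        # Snowflake sees the update
    try:
        write_from("snowflake", orders, {"id": 2})
    except PermissionError as err:
        print(err)
```

The catalog REST API mentioned above would, in terms of this model, amount to relaxing the check in `write_from`; how Snowflake will actually implement that is not described in the announcements.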

On the Horizon

Iceberg support now falls under Snowflake “Horizon,” a new umbrella brand for all of the company’s features pertaining to compliance, security, privacy, interoperability (including Iceberg) and access. Beyond Iceberg support, there were a number of Horizon features announced at Snowday, including a new Trust Center interface for managing security; a data quality metrics monitoring and alerting facility; a data lineage UI; custom data classification; and a universal search facility that I’ll detail later.

Each of these features is entering private preview. In addition, new automatic data classification and differential privacy features were announced as being in development.

One other management feature, though not technically part of Horizon, is a new cost management interface, providing “visibility, control, and optimization of Snowflake spend” according to the company.

The interface provides account-level spend and usage metrics, including lists of the most expensive queries executed, rarely used objects carrying costs that could possibly be eliminated, and top warehouses by cost. The cost management interface is entering private preview.

SELECT Generative AI

LLM technologies in Snowflake. Credit: Snowflake

As a result of Snowflake’s acquisition of search company Neeva in May of this year, Snowflake announced a broad selection of AI and ML features. What’s most impressive about these is how easy they are to use. A new component of Snowflake, called Cortex, provides backend integration with, and elastic compute around, a number of large language model services and traditional machine learning modeling libraries.

The capabilities around Cortex are available through simple function calls, usable from both SQL and Python, making them highly accessible to technologists without a data science background. This includes full vector embedding generation, storage, indexing, search and even support of a native embedding data type.

Cortex LLM-based functions for SQL and Python are going into private preview. They can handle language translation, text summarization, sentiment detection, vector embedding generation, vector search, and, of course, functions for sending prompts and contextual data to an LLM to solicit verbose or short answers. ML-based functions for forecasting and anomaly detection will be in GA soon. A “top insights” function is in public preview and a classification function will be in private preview soon.
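To give a sense of how small such calls are, here is a minimal Python sketch that only assembles SQL text; it does not connect to Snowflake. The function names (`SNOWFLAKE.CORTEX.TRANSLATE`, `SUMMARIZE`, `SENTIMENT` and `COMPLETE`) follow Snowflake's later public documentation rather than anything quoted at Snowday, and the table `support_chats`, the column `transcript` and the model name `llama2-70b-chat` are assumptions made purely for illustration.

```python
# Sketch only: builds SQL strings for Cortex-style function calls.
# TABLE, COLUMN and the model name are hypothetical placeholders.
TABLE = "support_chats"
COLUMN = "transcript"


def sql_literal(value: str) -> str:
    """Quote a Python string as a SQL string literal."""
    return "'" + value.replace("'", "''") + "'"


def translate_sql(source_lang: str, target_lang: str) -> str:
    """SQL that translates each transcript between two languages."""
    return (
        f"SELECT SNOWFLAKE.CORTEX.TRANSLATE({COLUMN}, "
        f"{sql_literal(source_lang)}, {sql_literal(target_lang)}) FROM {TABLE}"
    )


def summarize_sql() -> str:
    """SQL that produces a generic summary of each transcript."""
    return f"SELECT SNOWFLAKE.CORTEX.SUMMARIZE({COLUMN}) FROM {TABLE}"


def sentiment_sql() -> str:
    """SQL that scores the sentiment of each transcript."""
    return f"SELECT SNOWFLAKE.CORTEX.SENTIMENT({COLUMN}) FROM {TABLE}"


def complete_sql(model: str, instruction: str) -> str:
    """SQL that sends an instruction plus the transcript to an LLM."""
    return (
        f"SELECT SNOWFLAKE.CORTEX.COMPLETE({sql_literal(model)}, "
        f"CONCAT({sql_literal(instruction)}, {COLUMN})) FROM {TABLE}"
    )


if __name__ == "__main__":
    print(translate_sql("de", "en"))
    print(summarize_sql())
    print(sentiment_sql())
    print(complete_sql("llama2-70b-chat", "List the customer's issues: "))
```

Each generated statement is a single `SELECT` over a text column, which is the point: the LLM work is expressed as an ordinary function call in a query.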

‘Traditional’ ML and LLM-Based Experiences

A full Snowpark ML modeling API is being provided for data preprocessing and model training, and that meshes well with some new impressive MLOps capabilities, namely an ML model registry and a feature store. The model registry can accommodate models built in Snowflake, as well as models built externally, and can deploy them to service inferencing requests. Similarly, the feature store can be used to generate training datasets as well as to serve features in production for inference. The Snowpark ML modeling API is GA now; the model registry is in public preview and the feature store is in private preview.

New LLM-powered tools — or “experiences,” as Snowflake likes to call them — that were built on Cortex, are also on offer. They include Document AI (for making unstructured documents like invoices fully queryable with natural language), Copilot (for natural language-to-SQL query translation) and Universal Search (allowing searches for tables, views, databases, schemas and Snowflake Marketplace listings). All three are in private preview.

Developer Goodies

Developer and DevOps features are important too, and they have in no way been left out of the Snowday announcements. To start with, given all the AI-related work you can do with Snowflake, and the compatibility of those features with both SQL and Python programming, Snowflake has seen fit to add a new notebook coding interface right inside the Snowsight UI, in private preview.

These are not full-fledged Jupyter notebooks. They’re Snowflake-specific ones whose cells can contain only SQL, Python or Markdown; there are no kernels for Scala, R or other languages. That said, developers familiar with Jupyter notebooks should be right at home and will be able to use Cortex functions and the Snowpark ML modeling API right out of the gate. Streamlit chart elements are built in as well, to handle data visualization needs.

There’s also integration with Git-based CI/CD and version control in private preview, a new command line interface (CLI) in public preview, database change management capabilities in private preview soon and a new Event Tables feature that is GA. Beyond that, the Native App Framework, which allows bundling of data, code and containers, will be GA on Amazon Web Services soon and will be entering public preview soon on Microsoft Azure.

Putting It All Together: Generative AI Demo

After taking journalists and analysts through a full inventory of these new capabilities, Snowflake’s Director of Product Management Jeff Hollan made it real by taking us through an end-to-end coding demo in Snowflake’s new notebook environment. He demonstrated a reasonable scenario of mining customer service chat transcripts, in both English and German, along with wiki articles, to get a summary of issues facing a fictitious ski equipment retailer. In the demo, Hollan was able to:

  1. Translate several German-language transcripts into English
  2. Summarize an English-language transcript generically
  3. Summarize it again with a custom prompt, asking for specific pieces of information
  4. Generate embeddings for a series of wiki articles in a Snowflake table and store those embeddings in a new table
  5. Do a vector search in that new table for articles relevant to a specific prompt
  6. Create an embedding for the prompt, and send it, along with the embeddings for the relevant wiki articles, to the Llama 2 LLM for a retrieval augmented generation (RAG) response to the prompt
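
The retrieval half of that flow (steps 4 through 6) can be sketched in plain Python. This is not Hollan's code and uses no Snowflake API: the word-count `embed` function is a crude stand-in for a real embedding model, the articles are made up, and the final LLM call is left out. In this sketch the retrieved article text, rather than its embedding, goes into the prompt; that is an assumption about the usual RAG arrangement, not a detail confirmed by the demo.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical wiki articles for a ski equipment retailer.
ARTICLES = {
    "bindings": "how to adjust ski bindings before the first run",
    "wax": "waxing skis improves glide on cold snow",
    "returns": "return policy for boots and helmets bought online",
}

# Step 4: embed every article once and keep the vectors in a "table".
STORE = {title: embed(body) for title, body in ARTICLES.items()}


def search(question: str, k: int = 1) -> list:
    """Step 5: rank stored articles by similarity to the question."""
    q = embed(question)
    ranked = sorted(STORE, key=lambda t: cosine(q, STORE[t]), reverse=True)
    return ranked[:k]


def build_prompt(question: str, k: int = 1) -> str:
    """Step 6: assemble the context-plus-question prompt for an LLM."""
    context = "\n".join(ARTICLES[t] for t in search(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("how do I adjust my bindings"))
```

A production version would swap `embed` for a model-generated vector and `STORE` for a table with a vector column, but the shape of the pipeline (embed once, search by similarity, prompt with the hits) stays the same.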

By my count, the entire demo involved six SQL queries (a couple of which were elaborate, I’ll admit) and six lines of Python code, and was easy to follow. By the time Hollan demoed a full Streamlit app with similar functionality, I was tempted to say “You had me at embedded SQL functions,” but I kept that to myself. While most controlled demos have some amount of “smoke and mirrors” behind them, and this one relied on a number of technologies that are still in private preview, it seemed very reasonably based on the Cortex features that the Snowflake folks had detailed.

With that in mind, I’m willing to suspend most of my usual demo skepticism.

The Power of Integration

The ability to do what Hollan showed us has been out there in the market for a while, but it has required background knowledge, connecting the dots, and threading the needle, to mix a few metaphors. It’s been up to the developer to extract data, get embeddings done somewhere, do the vector search in a database, then extract the needed context to send to an LLM on some other platform.

While it’s not impenetrably hard to do that, it’s not terribly convenient either. It’s also a bit “Rube Goldberg” in nature and requires extracting data from its native platform and sending it somewhere else.

Snowflake has made a lot of this, I dare say, rather turnkey. Everything’s set up. It’s accessible through simple SQL or Python interfaces, which can be used, individually or in combination, from the new Snowflake notebook interface. The data started in Snowflake, and stayed there, and, because of all that, the power of generative AI seemed much more concrete than I’ve seen in other contexts. And for customers who want to use machine learning models to do more straight-up predictive analytics, the experience would be similarly straightforward.

Snowflake has more work to do to get all this stuff to GA. And in the interim, competitors may catch up to a certain degree. But right now, the company is providing an impressive value proposition to its enterprise customers and very solidly showing that the key to AI success is a solid analytics platform.

TNS owner Insight Partners is an investor in: Dremio.