
URL: https://thenewstack.io/will-data-privacy-die-in-the-age-of-genai/

Will Data Privacy Die in the Age of GenAI? - The New Stack


AI Operations / Open Source / Security

Will Data Privacy Die in the Age of GenAI?

Enterprises can adopt generative AI without losing data privacy, security or sovereignty, but it’s not straightforward.
Oct 30th, 2024 6:48am by James Kinley
Featured image by Jp Valery on Unsplash.
Redpanda sponsored this post.

The race for AI supremacy is not just a competition between the big tech companies but also includes sovereign nations. The World Economic Forum defines sovereign AI as “the ability for a nation to build AI with homegrown talent … based on its local policy or national AI strategy.” From a macroeconomic, political and security standpoint, it’s clear why countries need to build advanced AI capabilities within the confines of their borders.

At the micro level, the definition of sovereign AI is somewhat analogous to a tech company’s purview. Take the definition above and swap the word “nation” for “organization,” and the meaning confines sovereign AI to the borders of an individual company. Sovereign AI then becomes a company’s competitive advantage, as shown by its ability to fine-tune models and instill safety and trust in AI applications.

But how do you ensure data privacy when the easiest way to consume generative AI (GenAI) is using a closed model over a cloud API?

GenAI’s Data Privacy Problem

Consider how a software developer might build a simple Q&A chatbot today. They would probably start by downloading a large language model (LLM) framework like LangChain and following one of its tutorials. Before long, your company’s sensitive data, and maybe even your customers’ personally identifiable information (PII), is being sent to OpenAI to retrieve text embeddings and generate chat responses, and then stored in a vector database that is also hosted in the cloud.

Agents and agentic AI systems arguably make data privacy even harder. They have autonomy to make decisions and take actions without being prompted by a human. So, without proper governance, it would be difficult to track what sensitive data is being shared outside of your organization.
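
As a rough illustration of what tracking outbound data could look like, here is a minimal Python sketch of an audit record for anything an agent tries to send outside the organization. The `EgressAudit` class, the destination name and the email pattern are all hypothetical, and a single regular expression is nowhere near a complete PII detector.

```python
import re
from dataclasses import dataclass, field

# Hypothetical pattern: a deliberately simple email matcher, not a full PII detector.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass
class EgressAudit:
    """Records every payload an agent tries to send to an external destination."""

    records: list = field(default_factory=list)

    def record(self, destination: str, payload: str) -> bool:
        """Log the call and return True if the payload appears to contain an email address."""
        flagged = bool(EMAIL_RE.search(payload))
        # Store metadata only, so the audit log does not become a second copy of the data.
        self.records.append(
            {"destination": destination, "chars": len(payload), "flagged": flagged}
        )
        return flagged


audit = EgressAudit()
clean = audit.record("api.example.com", "Summarize our Q3 roadmap.")
leaky = audit.record("api.example.com", "Email jane.doe@example.org about her refund.")
print(audit.records)
```

A real governance layer would sit in front of every tool and model call the agent can make, and would likely block or redact flagged payloads rather than just log them.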

To be clear, this isn’t necessarily a problem. It depends on how permissive your company’s data privacy and data protection policies are. Even then, companies like OpenAI take data privacy very seriously and comply with widely recognized security standards and regulations, such as SOC 2 and the General Data Protection Regulation (GDPR), to keep data safe within their systems. This might be enough to satisfy your InfoSec team, but what are your options if your sensitive data must strictly stay within the confines of your network?

Flipping the Paradigm: Bring the Model to Your Data

Mark Zuckerberg says, “Open source AI is the path forward.” In his article, he advocates using Meta’s Llama models, which Meta describes as open source, as the best option for harnessing the power of GenAI because open source software, by its transparent nature, is more secure and trustworthy than closed-source alternatives.

The salient point here is that models like Llama can be run anywhere. That’s good for enterprises that don’t want to give their proprietary data away because they can bring the model to their data, not the other way around. This holds only as long as no breakout model emerges that is 10 times better than the rest and unlocks use cases that were not possible before.

Meta’s stated commitment to open source singles out Llama as a top choice today, but other so-called open source models deserve an honorable mention, including Google’s Gemma and Mistral AI’s models, among others. However, open models aren’t the only way to solve the data privacy problem with GenAI.

Companies like Cohere support private deployments and “bring your own cloud” (BYOC) deployments that allow you to deploy their LLMs into your cloud account and virtual private cloud (VPC). If your sensitive data is already secured in the cloud, then BYOC is another way to bring the model and other data platform tooling to where your data already resides.

Generative AI application with and without sovereign AI.

Gratis Vs. Libre

Llama and other similar models might be free to download and use, but running a frontier LLM on your own infrastructure is certainly not free. I won’t work out the costs of processing tokens in this article, but suffice it to say, Nvidia H100 GPUs aren’t cheap, and you’ll need more than one H100 to run Llama’s 405B parameter model at a useful scale.
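
To see why a single GPU is not enough, here is a back-of-the-envelope Python sketch of the memory needed just to hold a model’s weights. It assumes 80 GB of memory per H100 and counts weights only, ignoring the key-value cache, activations and any throughput requirements, so the GPU counts are lower bounds rather than sizing advice. The function names and the precision table are illustrative assumptions.

```python
import math

# Assumptions (not from the article): 80 GB of memory per H100, weights only.
H100_MEMORY_GB = 80

# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[precision]


def min_gpus_for_weights(params_billions: float, precision: str) -> int:
    """Lower bound on GPUs needed just to fit the weights in memory."""
    return math.ceil(weight_memory_gb(params_billions, precision) / H100_MEMORY_GB)


for size in (405, 1):
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(size, precision)
        gpus = min_gpus_for_weights(size, precision)
        print(f"{size}B @ {precision}: {gb:.1f} GB of weights, at least {gpus} GPU(s)")
```

Under these assumptions, the 405B model needs about 810 GB for its weights at 16-bit precision, which is at least 11 GPUs, while a 1B model needs about 2 GB.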

So, is this the price you must pay for data privacy in the GenAI era?

There’s always a balance, and the counterargument is that not every model needs to run inference on GPUs to perform well. Llama is actually a collection of models — a herd of llamas, if you like — that are pretrained and fine-tuned in various sizes. The smallest is a 1B parameter model lightweight enough to run on a mobile device for things like text summarization. People save a lot of sensitive data on their mobile phones, so being able to perform inference directly on a device is important from a privacy standpoint.

The model and size you choose depend on the use case, but just know that there are options for building GenAI applications that don’t always require super-expensive hardware for inference.

The Verdict: Privacy Is Not Dead

Of course, data privacy is not dead in the age of AI. The solution to the data privacy problem has always been the same. If you have sensitive data and your data privacy policy mandates that data cannot be shared outside of your organization, then the software that comes into contact with that data has to run within your network, GenAI or otherwise.
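
The rule above can be expressed as a simple configuration check. The following Python sketch, offered as an illustration rather than a security control, treats an inference endpoint as in-network only if its host is a private or loopback IP address or matches an internal DNS suffix. The suffixes and the `stays_in_network` function are hypothetical, and the check performs no DNS resolution.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical internal DNS suffixes; replace with your organization's own.
INTERNAL_SUFFIXES = (".internal", ".corp.example.com")


def stays_in_network(endpoint_url: str) -> bool:
    """Return True if the endpoint's host looks like it is inside the network."""
    host = urlparse(endpoint_url).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: fall back to the suffix allowlist (no DNS lookup).
        return host == "localhost" or host.endswith(INTERNAL_SUFFIXES)
    return addr.is_private or addr.is_loopback


for url in (
    "http://10.0.0.12:8000/v1/chat",
    "http://llm.internal:8080/generate",
    "https://api.example.com/v1/embeddings",
):
    print(url, "->", "allowed" if stays_in_network(url) else "blocked")
```

A check like this could run when an application loads its model configuration, so that a cloud API endpoint is rejected before any sensitive data is sent.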

What changes is the economics. If you cannot share data with a closed model over a cloud API or simply do not trust the closed model provider, then you have to figure out a way to provide access to an LLM within your network. There are closed-model providers like Cohere that support private deployments in the cloud or on premises, and Meta puts forward a compelling argument for so-called open source models like Llama.

Data security, data privacy and data sovereignty should not prohibit the adoption of GenAI technology in the enterprise. Open source models are constantly improving, and as smaller models become more powerful and the ecosystem of tools around them grows, applications built on GenAI technology can be your organization’s most powerful assets.

Redpanda is the streaming data platform for developers. Built with a native Kafka API, Redpanda eliminates complexity, maximizes performance and reduces costs. Its lean architecture gives you 10x lower latencies and up to a 6x lower cloud spend — without sacrificing reliability or durability.
James Kinley is a Principal Solutions Architect at Redpanda Data with over 15 years of experience in technical pre-sales and post-sales roles. He has a proven track record for driving complex enterprise sales by translating technical requirements into impactful business...
TNS owner Insight Partners is an investor in: OpenAI.