URL: https://thenewstack.io/data-lake-security-dive-into-the-best-practices/

Data Lake Security: Dive into the Best Practices - The New Stack


Enjoy the benefits without compromising on security by implementing practices to reduce the risk of noncompliance, data mismanagement or other incidents.
Jan 26th, 2023 7:34am by Dan Neault
Imperva sponsored this post.

Data is often called the world's most valuable resource, and a business’s success scales with its ability to extract value from it. That’s why many organizations are turning to data lakes to improve analytics, enable more effective collaboration and support data-driven decision-making at scale.

Unlike traditional relational databases, data lakes can ingest data in its raw form from multiple sources.

While data lakes promise superior business outcomes, their rapid adoption means some teams lack the resources and domain expertise to put compliance and security controls in place. Complicating matters, a broad set of internal and sometimes external roles can use the lake, amplifying potential risks to the business.

To realize the benefits of a data lake without compromising on security, organizations need to follow several best practices to reduce the risk of noncompliance, data mismanagement, data leakage or another security incident.


From Database to Data Lake

Database technology was introduced in the 1960s as computers became more accessible and organizations sought a way to store and manage data efficiently. For decades, online transaction processing (OLTP) workloads and relational databases served as the workhorse, delivering rapid, accurate data processing.

By the 1980s, data warehouses had begun shifting data processing from transactional or operational systems toward decision-support systems. This shift enabled companies to aggregate data from multiple environments to gather business intelligence (BI) and support strategic decision-making.

Today almost every organization uses databases, data warehouses and BI to inform innovation and guide strategic decisions. However, with the rise of cloud computing and modern programming languages, the ways databases are used are evolving for several reasons:

  • Organizations realize they can get more value out of their data if they don’t apply a predefined schema or limit how it can be used across transactional or analytical systems.
  • Data is used to develop and train machine learning (ML) models for analytics or to modernize existing workloads running on any type of database.
  • Cloud computing allows for the rapid provisioning and modernizing of workloads at a pace and scale that was impossible just a few years ago.

While some businesses remain focused on relational databases or data warehouses, and primarily on structured data, data-savvy customers are increasingly skeptical of focusing there exclusively.

Data warehouses excel at processing and analyzing structured data, but they can’t capture raw and unstructured data, a severe limitation for digital businesses. As a result, nonrelational databases such as data lakes are growing in popularity, and some data architects now default to data lakes both for new workloads and for modernizing existing ones.


Why You Should Consider a Data Lake

Increasingly, organizations are starting their data life cycle in a data lake because they gain immediate value and can use it to build ML models, perform ad-hoc analytics queries, feed countless analytics systems and more.

Traditionally, data warehouses have been used to regularly analyze large amounts of structured data or to produce periodic reports. However, they require businesses to apply a predefined schema to data before processing and storing it, limiting how the data can be used across transactional or analytical systems.

Data lakes, by contrast, don’t require the same upfront work. Data from multiple sources, including unstructured log data, internet of things (IoT) sensors, and social media or multimedia content, can be integrated and stored unconverted or with minimal processing as it’s ingested.
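To make the schema-on-write versus schema-on-read distinction concrete, here is a minimal Python sketch. The sample records, the `Reading` type and both function names are hypothetical; real warehouses and lake query engines enforce schemas in their own ways, but the trade-off is the same: the warehouse-style path rejects records that don’t fit up front, while the lake-style path keeps everything raw and interprets it at query time.

```python
import json
from dataclasses import dataclass
from typing import Iterator, List, Tuple

# Illustrative raw records as they might land in a lake, stored as-is.
RAW_LINES = [
    '{"device": "sensor-1", "temp_c": 21.5, "ts": 1700000000}',
    '{"device": "sensor-2", "temp_c": "22.1", "ts": 1700000060}',  # temp as a string
    '{"user": "alice", "action": "login", "ts": 1700000120}',      # a different shape
]

@dataclass
class Reading:
    device: str
    temp_c: float
    ts: int

def write_with_schema(lines: List[str]) -> Tuple[List[Reading], List[str]]:
    """Schema-on-write: validate before storing; non-conforming records are rejected."""
    stored, rejected = [], []
    for line in lines:
        rec = json.loads(line)
        if isinstance(rec.get("device"), str) and isinstance(rec.get("temp_c"), (int, float)):
            stored.append(Reading(rec["device"], float(rec["temp_c"]), int(rec["ts"])))
        else:
            rejected.append(line)
    return stored, rejected

def read_with_schema(lines: List[str]) -> Iterator[Reading]:
    """Schema-on-read: everything stays raw; this query projects its own schema."""
    for line in lines:
        rec = json.loads(line)
        if "device" not in rec or "temp_c" not in rec:
            continue  # not relevant to this query, but still kept in storage
        try:
            yield Reading(str(rec["device"]), float(rec["temp_c"]), int(rec["ts"]))
        except (KeyError, TypeError, ValueError):
            continue  # this query can't interpret the record; skip it
```

In the lake-style path, the second sensor reading survives because the query coerces its string temperature, and the login event is never discarded: a different query can later project its own schema onto it.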

Ingesting raw data without an upfront schema provides three benefits. Users can:

  • Process data as it flows into the data lake in near-real time using stream-processing tools like Apache Kafka.
  • Derive specific insights directly from the data lake using a high-performance query engine like Google BigQuery or Amazon Athena.
  • Run on-demand analytics on large volumes of structured and unstructured data with tools like Elasticsearch to search, filter and visualize data from logs and operational data.
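Stream processors such as those built around Apache Kafka typically compute results over time windows as data arrives. The plain-Python sketch below illustrates that idea with a tumbling (fixed, non-overlapping) window; it does not use Kafka’s API, and the `(timestamp, event_type)` event shape and the `tumbling_counts` name are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp in seconds, event type).
Event = Tuple[int, str]

def tumbling_counts(events: Iterable[Event], window_s: int) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Yield (window_start, counts_by_type) for consecutive fixed windows.

    A window's counts are emitted as soon as an event from a later window
    arrives, plus a final flush at the end of the input. This assumes events
    arrive in timestamp order; real stream processors also handle late,
    out-of-order data.
    """
    current_window = None
    counts: Counter = Counter()
    for ts, kind in events:
        window = ts - ts % window_s  # start of the window this event falls in
        if current_window is not None and window != current_window:
            yield current_window, dict(counts)
            counts = Counter()
        current_window = window
        counts[kind] += 1
    if current_window is not None:
        yield current_window, dict(counts)  # flush the last open window
```

With a 60-second window, events at 0, 10 and 59 seconds fall into the window starting at 0, and that window’s counts are emitted as soon as the event at 60 seconds arrives, rather than after a batch job runs.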

Are Data Lakes Secure?

Data in a data lake needs at least the same level of protection as data stored in a relational database, if not more, because the lake often serves as the sole repository for a company’s data.

The three key security risks facing data lakes are:

  • Access control: With no database tables and more fluid permissions, access control is more challenging in a data lake, and permissions are difficult to set up because they must be based on specific objects or metadata definitions. Employees across the company also commonly have access to the lake, which may contain personal data or data that falls under compliance regulations. With 58% of security incidents caused by insider threats, according to a commissioned Forrester Consulting study, employee access to sensitive data is a security nightmare if left unchecked.
  • Data protection: Data lakes often serve as a singular repository for an organization’s information, making them a valuable target to attack. Without proper access controls in place, bad actors can gain access and obtain sensitive data from across the company.
  • Governance, privacy, and compliance: Because employees from across the company can feed data into the data lake without inspection, some of that data may be subject to privacy and regulatory requirements that other data isn’t. What’s more, locating and monitoring personal data across the data lake’s storage architecture can be challenging.
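Because a lake has no tables to attach permissions to, access decisions are often made from object metadata instead. The sketch below expresses that idea as a deny-by-default check on classification tags. The paths, tag names, roles and policy table are all hypothetical, and managed lake and object-storage services provide their own policy mechanisms, which this does not model.

```python
from dataclasses import dataclass, field

# Hypothetical object metadata: path -> tags assigned at ingestion or by a classifier.
OBJECT_TAGS = {
    "raw/clickstream/2023-01-01.json": {"classification": "internal"},
    "raw/crm/customers.csv": {"classification": "pii", "regulation": "gdpr"},
}

# Hypothetical policy: which classifications each role may read.
ROLE_ALLOWED = {
    "analyst": {"public", "internal"},
    "privacy_officer": {"public", "internal", "pii"},
}

@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)

def can_read(user: User, path: str) -> bool:
    """Deny by default: untagged objects and unknown roles get no access."""
    tags = OBJECT_TAGS.get(path)
    if tags is None:
        return False  # unclassified data stays off-limits until it is tagged
    classification = tags.get("classification")
    return any(classification in ROLE_ALLOWED.get(role, set()) for role in user.roles)
```

Treating untagged objects as unreadable is a deliberate design choice: newly ingested data stays locked down until it has been classified, instead of defaulting to open for everyone with lake access.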

Leaving these gaps unaddressed can force organizations to choose between limiting the data they store in a data lake and risking noncompliance. In the worst case, it can lead to a data leak or security incident.

How to Secure a Data Lake 

Data is the lifeblood of the modern business, and an effective security strategy needs to start with securing it.

To gain visibility and control over a data lake, there are four steps a business should take:

  • Outline a standardized data access process: The process, used by both human users and integrated systems, should make it possible to track who accesses the data and how it is used.
  • Create a data classification scheme and catalog: Data in the lake should be classified by content, usage scenarios, types and possible user groups with a catalog that enables the search and retrieval of data. There should also be a convenient method to separate the data you want to keep from data you want to delete.
  • Enable data protection: Security controls, data encryption and automatic monitoring must be in place, and alerts should be raised when unauthorized parties access the data or when authorized users perform suspicious activities.
  • Enforce data governance, privacy and compliance: There should be clear policies, communicated to all relevant employees, about how to navigate and use the data lake, and how to promote data quality and the ethical use of sensitive data. A data lake commonly stores historical data, and that data should be stored in compliance with data privacy standards.
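The first and third steps above (a single, tracked access path and alerts on unauthorized or suspicious activity) can be sketched together as a gateway that every reader goes through. This is a hypothetical, in-memory illustration: the `AccessGateway` class, the path-based permission map and the simple per-user counter used to flag "suspicious" reads are assumptions, not a description of any product. A real deployment would persist the audit log, use time-windowed behavioral baselines and route alerts to a monitoring system.

```python
import time
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class AccessRecord:
    user: str
    path: str
    allowed: bool
    ts: float

class AccessGateway:
    """Hypothetical single entry point for humans and services reading from the lake."""

    def __init__(self, permissions, sensitive_paths, max_sensitive_reads=3):
        self.permissions = permissions          # user -> set of readable paths
        self.sensitive_paths = sensitive_paths  # paths classified as sensitive
        self.max_sensitive_reads = max_sensitive_reads
        self.audit_log = []                     # every request, allowed or not
        self.alerts = []
        self._sensitive_reads = defaultdict(int)

    def read(self, user, path):
        """Record the request, raise alerts as needed; return whether access is granted."""
        allowed = path in self.permissions.get(user, set())
        self.audit_log.append(AccessRecord(user, path, allowed, time.time()))
        if not allowed:
            self.alerts.append(f"DENIED: {user} -> {path}")
            return False
        if path in self.sensitive_paths:
            # Naive lifetime counter; a real system would compare against a baseline.
            self._sensitive_reads[user] += 1
            if self._sensitive_reads[user] > self.max_sensitive_reads:
                self.alerts.append(
                    f"SUSPICIOUS: {user} read {self._sensitive_reads[user]} sensitive objects"
                )
        return True
```

Note that the authorized user still gets access when flagged; the alert exists so that someone can review behavior that is permitted but unusual, which is the "authorized users perform suspicious activities" case.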

Maximize Data Value While Preventing Security and Privacy Risks

Historically, relational databases were the default storage systems for businesses, but new advancements in data storage, capture and analytics have provided capabilities for extracting value from raw data that was inconceivable only a few years ago.

More organizations are adopting nonrelational databases, like data lakes, thanks to their ability to provide real-time analytics and capture additional data types. However, data lakes present a complex challenge: managing security while maintaining compliance with privacy regulations.

To address the security and compliance risks associated with data lakes, organizations should start by creating an effective and efficient way to classify and discover data across their environment. Next, organizations must be able to identify who is accessing data, detect when a compromised user accesses sensitive data, and prevent malicious insiders from stealing data.
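Classification and discovery can start with something as simple as pattern matching over object contents, with the results stored in a searchable catalog. The sketch below assumes two illustrative detectors (email addresses and the U.S. Social Security number format) and hypothetical `classify`, `build_catalog` and `find` helpers; production classifiers rely on validated patterns, context and sampling, and still produce false positives and negatives.

```python
import re
from typing import Dict, List, Set

# Illustrative detectors only: hypothetical labels and deliberately simple patterns.
DETECTORS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def classify(text: str) -> Set[str]:
    """Return the labels of every detector that matches somewhere in the text."""
    return {label for label, pattern in DETECTORS.items() if pattern.search(text)}

def build_catalog(objects: Dict[str, str]) -> Dict[str, dict]:
    """Map each object path to its detected data classes and a sensitivity label."""
    catalog = {}
    for path, content in objects.items():
        found = classify(content)
        catalog[path] = {
            "classes": sorted(found),
            "sensitivity": "restricted" if found else "general",
        }
    return catalog

def find(catalog: Dict[str, dict], data_class: str) -> List[str]:
    """Search the catalog for objects containing a given class of personal data."""
    return sorted(path for path, entry in catalog.items() if data_class in entry["classes"])
```

A catalog like this answers the compliance question "where does personal data live in the lake?" directly, and its sensitivity labels can feed the access rules and monitoring described above.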

While these security best practices serve as a foundational step toward creating a more secure data lake environment, organizations should invest in a holistic data-centric security solution that is designed to protect data no matter where it lives and whatever form it’s in.

Dan Neault is senior vice president and general manager of Data Security at Imperva. He is an accomplished technology executive with over 20 years of experience. He helped build and grow some of the most recognizable technology companies in the...