URL: https://thenewstack.io/apache-drill-eliminates-etl-data-transformation-mapr-database/

Apache Drill Eliminates ETL, Data Transformation for MapR Database - The New Stack



Apache Drill Eliminates ETL, Data Transformation for MapR Database

Apr 11th, 2016 9:45am by Susan Hall

Hadoop distribution provider MapR is using the recently released Apache Drill query engine version 1.6 as its “unified SQL layer” for its converged data platform, to provide a tighter integration with the MapR-DB document database.

With the MapR-DB document database format plugin in Drill 1.6, a user can query JSON tables in MapR-DB directly, potentially eliminating the need for additional ETL (extract, transform and load) operations.
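As a rough sketch of what such a direct query could look like from client code, the Python below packages a SQL statement for Drill's REST API, which accepts a JSON body containing `queryType` and `query`. The host, the table path `/tables/customers`, and the field names are hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request

# Assumption: a Drillbit's REST endpoint is reachable at this address
# (8047 is Drill's default web port); the host itself is hypothetical.
DRILL_URL = "http://localhost:8047/query.json"


def build_query_request(sql: str, url: str = DRILL_URL) -> urllib.request.Request:
    """Package a SQL statement as a POST request for Drill's REST API."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical table path and field names, used only for illustration.
# The alias "t" lets the query reach into a nested document field.
SQL = "SELECT t.name, t.address.city FROM dfs.`/tables/customers` t LIMIT 10"

request = build_query_request(SQL)

# Against a live cluster, the request could be sent like this:
# with urllib.request.urlopen(request) as resp:
#     rows = json.load(resp)["rows"]
```

No export or load step appears anywhere in the sketch: the SQL text names the table path and the nested field directly.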

“You can have files, database tables, streams that are contained and managed through that converged platform and Apache Drill can be used to query across all the data regardless of where it’s located,” said Jack Norris, senior vice president of data and applications at MapR.


“Users can access those files directly with Drill, they can query them in the database tables; they can look at it through Hive. Regardless of how the data arrived or where the data is located, Drill is the SQL interface that allows access and queries directly on that data,” Norris said.

Six months ago, MapR released a developer preview of its JSON-based document database for use inside Hadoop. It announced MapR-DB document database capabilities as part of the MapR 5.1 release in March.

The Apache Software Foundation elevated Drill to a top-level project in December 2014. It released version 1.0 in May 2015.

Drill was designed as a schema-free SQL query engine for multiple data sources, including JSON, Parquet, and HBase. It not only allows rapid application development on Apache Hadoop, but empowers enterprise BI analysts to explore the data themselves — freeing IT staff from structuring the data for them.

Drill lets you analyze Hadoop data without ETL or creating schemas first; it generates schemas on the fly and keeps files in their original formats rather than converting them into tables or pre-specified formats before they’re loaded into the database system.
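The idea of generating a schema on the fly can be illustrated with a toy example. The Python below is not Drill's implementation; it is a minimal sketch of schema-on-read, in which field names and types are discovered while scanning the records rather than declared beforehand. The sample records are invented for illustration.

```python
import json


def discover_schema(lines):
    """Return {field name: set of type names} observed across JSON records."""
    schema = {}
    for line in lines:
        record = json.loads(line)
        for field, value in record.items():
            # Record every type seen for this field; nothing is declared up front.
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


# Hypothetical records: the second one adds a field and changes a type.
records = [
    '{"sku": "A1", "price": 19.99}',
    '{"sku": "B2", "price": 5, "tags": ["bell"]}',
]

schema = discover_schema(records)
for field in sorted(schema):
    print(field, sorted(schema[field]))
```

The second record introduces a `tags` field and an integer `price` without any table definition having to change first, which is the kind of variation a schema-free engine has to tolerate.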

“The unique position that Apache Drill occupies is really in data exploration — to be able to support directly some of the most common formats out there that are also fairly difficult to query directly, things like JSON documents,” Norris said.

A web provider of bicycle equipment could, for instance, offer a single search service that covers in-depth information, such as documentation for the bikes, and also returns results from simple product searches, such as from a catalog of accessories.

The information can be stored in a relational database, a NoSQL system such as HBase, a document database such as MapR-DB, or even in flat files.


The Drill 1.6 release includes several enhancements:

  • Faster query planning through early application of partition pruning.
  • Greater stability and scale from an improved memory allocator.
  • Faster query planning for queries on Hive tables.
  • Optimized reading of the Parquet metadata cache.
  • Security through “client impersonation,” which Norris described as role-based views of the data without multiple copies of it.

“Apache Drill is a game changer for us,” said Edmon Begoli, chief technology officer of PYA Analytics, a Tennessee-based advanced analytics company serving healthcare, defense and other industries.

“We’ve been able to query, in under 60 seconds, two years’ worth of flat PSV files of claims, billing, and clinical data from commercial and government entities, such as the Centers for Medicaid and Medicare Services,” Begoli said. “Drill has allowed us to bypass the traditional approach of ETL and data warehousing, convert flat files into efficient formats such as Parquet for improved performance, and use plain SQL against very large volumes of files.”
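The flat-file-to-Parquet step Begoli mentions can be sketched with Drill's CREATE TABLE AS (CTAS) statement. The Python below only assembles the SQL text; it is not PYA Analytics' actual pipeline. The file path, workspace, and column names are hypothetical, and it assumes the storage plugin is configured to read the source file as pipe-delimited text, in which case Drill exposes each row as a `columns` array.

```python
def parquet_ctas(target: str, source: str, columns: dict) -> list:
    """Build Drill statements that copy a delimited file into a Parquet table.

    `columns` maps an output column name to (position in the row, SQL type).
    """
    select_list = ", ".join(
        f"CAST(columns[{pos}] AS {sql_type}) AS {name}"
        for name, (pos, sql_type) in columns.items()
    )
    return [
        # Tell Drill to write CTAS output as Parquet.
        "ALTER SESSION SET `store.format` = 'parquet'",
        f"CREATE TABLE {target} AS SELECT {select_list} FROM {source}",
    ]


# Hypothetical workspace, file path, and column layout.
statements = parquet_ctas(
    target="dfs.tmp.`claims_parquet`",
    source="dfs.`/data/claims/2015.psv`",
    columns={"claim_id": (0, "VARCHAR(20)"), "amount": (1, "DOUBLE")},
)

for statement in statements:
    print(statement)
```

Once such a table exists, later queries can read the typed, columnar Parquet copy instead of re-parsing the delimited text.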

Feature Image: “Day 232 – Photo365 – Construction” by Makia Minich, licensed under CC BY-SA 2.0.

Susan Hall is the Sponsor Editor for The New Stack. Her job is to help sponsors attain the widest readership possible for their contributed content. She has written for The New Stack since its early days, as well as sites...