Data Analysis Using Hadoop Tools

This course is part of the Big Data Processing Using Hadoop Specialization

Instructor: Karthik Shyamsunder

5 modules

Gain insight into a topic and learn the fundamentals.

Intermediate level

2 weeks to complete at 10 hours a week

Flexible schedule: learn at your own pace

What you'll learn

Learn to set up and configure Hive, Pig, HBase, and Spark for efficient big data analysis and processing within the Hadoop ecosystem.
Master Hive’s SQL-like queries for data retrieval, management, and optimization using partitions and joins to enhance query performance.
Understand Pig Latin for scripting data transformations, including the use of operators like join and debug to process large datasets effectively.
Gain expertise in NoSQL databases with HBase for real-time read/write operations, and use Spark’s core programming model for fast data processing.

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

15 assignments

Taught in English

Build your subject-matter expertise

This course is part of the Big Data Processing Using Hadoop Specialization

When you enroll in this course, you'll also be enrolled in this Specialization.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate

There are 5 modules in this course

The course "Data Analysis Using Hadoop Tools" provides a thorough and hands-on introduction to key tools within the Hadoop ecosystem, such as Hive, Pig, HBase, and Apache Spark, for data processing, management, and analysis. Learners will gain practical experience with Hive's SQL-like interface for complex data querying, Pig Latin scripting for data transformation, and HBase's NoSQL capabilities for efficient big data management. The course also covers Apache Spark's powerful in-memory computation capabilities for high-performance data processing tasks. By the end, participants will be equipped with the skills to leverage these technologies within the Hadoop platform to address real-world big data challenges.

What makes this course unique is its comprehensive approach to integrating various Hadoop tools into a cohesive workflow. You'll not only learn how to use each tool individually but also understand how to effectively combine them to optimize data processing and analysis. Through hands-on exercises and examples, you'll gain the confidence and skills to tackle complex data challenges and extract valuable insights from big data. Whether you're looking to enhance your data analysis capabilities for work or want to deepen your knowledge of Hadoop and big data tools, this course offers valuable skills that will help you succeed.

This course provides a comprehensive overview of key tools within the Hadoop ecosystem, including Hive, Pig, HBase, and Apache Spark. You will learn how to set up and configure these technologies for data processing, management, and analysis. The course covers Hive's query execution, Pig's scripting language, and HBase's NoSQL capabilities. You'll also gain hands-on experience with Spark's core programming model for efficient big data processing. By the end, you'll be equipped to leverage these tools for optimized data analysis and management.

What's included

2 readings•Total 15 minutes

Course Overview•5 minutes
Instructor Biography: Prof. Karthik Shyamsunder•10 minutes

In this module, we will cover MapReduce programming using Hive, a higher-level tool that translates its SQL-like queries into MapReduce jobs.
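To give a rough feel for what translating a SQL-like query into MapReduce involves, here is a minimal pure-Python sketch of a `GROUP BY` count split into map, shuffle, and reduce steps. It is not Hive's actual planner or generated code; the `page_views` table, its rows, and every function name are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical rows of a table page_views(user, page); not taken from the course.
ROWS = [
    ("alice", "/home"),
    ("bob", "/home"),
    ("alice", "/cart"),
    ("carol", "/home"),
]

# Conceptually equivalent query:
#   SELECT page, COUNT(*) FROM page_views GROUP BY page;


def map_phase(rows):
    """Emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for _user, page in rows:
        yield page, 1


def shuffle_phase(pairs):
    """Collect emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Aggregate each key's values; summing the 1s yields COUNT(*)."""
    return {key: sum(values) for key, values in grouped.items()}


def run_query(rows):
    """Chain the three phases for the toy query above."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(run_query(ROWS))
```

In a real cluster the map and reduce steps run in parallel across many machines and the shuffle moves data over the network; the sketch only shows how the grouping column becomes the key.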

What's included

9 videos • 7 readings • 4 assignments

9 videos•Total 107 minutes

Introduction - Hive•2 minutes
Hive Overview and Architecture•23 minutes
Setting up Hive•26 minutes
Simple Hive Example•20 minutes
Loading Data•9 minutes
Hive Statements•11 minutes
Partitions•6 minutes
Joins•8 minutes
Summary - Hive•2 minutes

7 readings•Total 105 minutes

Hive Overview and Architecture•10 minutes
Setting up Hive•10 minutes
Simple Hive Example•10 minutes
Loading Data•10 minutes
Hive Statement•10 minutes
Partitions and Joins in Hive•15 minutes
Self-Reflective Reading: Balancing Hive, Java MapReduce, and Pig in Hadoop Architectures•40 minutes

4 assignments•Total 105 minutes

Data Analysis using Hive•60 minutes
Introduction to Hive: Overview, Architecture, and Setup•15 minutes
Working with Hive: Basic Examples, Data Loading, and Hive Statements•15 minutes
Advanced Hive: Partitions, Joins, and Summary•15 minutes

In this module, we will cover MapReduce programming using Pig, a higher-level platform whose Pig Latin scripts are translated into MapReduce jobs.

What's included

9 videos • 7 readings • 4 assignments

9 videos•Total 132 minutes

Introduction - Pig•2 minutes
Pig: Overview and Architecture•22 minutes
Setting up Pig•8 minutes
Grunt Interactive Shell•18 minutes
Pig Latin Language Basics•10 minutes
Pig Data Types and Schema•15 minutes
Core Relational Operators•14 minutes
Join Operators•26 minutes
Debug Operators•17 minutes

7 readings•Total 102 minutes

Pig: Overview and Architecture•15 minutes
Grunt Interactive Shell•7 minutes
Exploring Pig Latin Basics: Data Structures, Syntax, and Commands•10 minutes
Understanding Schemas, Data Types, and Functions in Apache Pig•10 minutes
Core Relational Operators in Pig Latin: An Overview•10 minutes
Exploring Relational Join Operators in Apache Pig•10 minutes
Self-Reflective Reading: Hive vs. Pig for Your Big Data Strategy•40 minutes

4 assignments•Total 105 minutes

Data Analysis using Pig•60 minutes
Introduction to Pig: Overview, Architecture, and Setup•15 minutes
Pig Fundamentals: Grunt Shell, Pig Latin Basics, and Data Types•15 minutes
Pig Advanced: Core Operators, Joins, Debugging, and Summary•15 minutes

In this module, we will start with a primer on NoSQL databases and then dive into HBase, a NoSQL database built on top of Hadoop that allows random, real-time read/write access to your big data.
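As a loose illustration of the kind of access this module describes, the sketch below models HBase's logical layout in plain Python: rows sorted by row key, cells addressed by a `family:qualifier` column, and multiple timestamped versions per cell. It is an in-memory toy and not the HBase client API; the `ToyTable` class, its methods, and the sample keys are all hypothetical.

```python
import time
from collections import defaultdict


class ToyTable:
    """Toy model: row key -> "family:qualifier" -> {timestamp: value}."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, column, value, timestamp=None):
        """Write one cell version; a new timestamp adds a version."""
        ts = timestamp if timestamp is not None else time.time_ns()
        self._rows[row_key][column][ts] = value

    def get(self, row_key, column):
        """Random read by row key: return the newest version, or None."""
        versions = self._rows.get(row_key, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        """Yield row keys in sorted order, start inclusive, stop exclusive."""
        for row_key in sorted(self._rows):
            if start_row <= row_key < stop_row:
                yield row_key


if __name__ == "__main__":
    table = ToyTable()
    table.put("user#1", "info:name", "Ada", timestamp=1)
    table.put("user#1", "info:name", "Ada L.", timestamp=2)
    table.put("user#2", "info:name", "Bob", timestamp=1)
    print(table.get("user#1", "info:name"))
    print(list(table.scan("user#1", "user#2")))
```

The row key is the only index in this model, which is why choosing it well matters for both point reads and range scans.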

What's included

8 videos • 3 readings • 3 assignments

8 videos•Total 175 minutes

Introduction - HBase•2 minutes
NoSQL Primer•35 minutes
HBase Overview and Architecture•31 minutes
Setting up HBase•25 minutes
HBase Data Model•16 minutes
HBase Shell•33 minutes
CRUD operations using Java API•31 minutes
Summary - HBase•3 minutes

3 readings•Total 60 minutes

HBase Overview and Architecture•10 minutes
HBase Data Model•10 minutes
Self-Reflective Reading: Coexisting Databases: Balancing NoSQL and RDBMS in Modern Applications•40 minutes

3 assignments•Total 90 minutes

Hadoop NOSQL Database HBase•60 minutes
Introduction to HBase: NoSQL Basics, Architecture, and Setup•15 minutes
HBase Fundamentals: Data Model, Shell, CRUD Operations, and Summary•15 minutes

In this module, we will cover the Spark engine and framework and show how it integrates with the Hadoop platform.

What's included

8 videos • 5 readings • 4 assignments

8 videos•Total 233 minutes

Introduction - Spark•3 minutes
Spark Overview•38 minutes
Spark Architecture•29 minutes
Setting up and Running Spark•55 minutes
Spark Core Programming Model•41 minutes
Hands-on Spark•39 minutes
Miscellaneous Spark Components•20 minutes
Summary - Spark•7 minutes

5 readings•Total 85 minutes

Spark Architecture•15 minutes
Reading References•10 minutes
Setting up and Running Spark•10 minutes
Reading References•10 minutes
Self-Reflective Reading: Hadoop vs Spark: Competing or Complementary?•40 minutes

4 assignments•Total 105 minutes

Spark•60 minutes
Introduction to Spark: Overview and Architecture•15 minutes
Spark Setup and Core Programming Model•15 minutes
Hands-On Spark: Core Components and Summary•15 minutes

Instructor

Karthik Shyamsunder

Johns Hopkins University

4 Courses•1,480 learners

Offered by

Johns Hopkins University

Frequently asked questions

To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your chosen learning program, you’ll find a link to apply on the description page.

URL: https://www.coursera.org/learn/data-analysis-using-hadoop-tools