Data Analysis Using Hadoop Tools
This course is part of Big Data Processing Using Hadoop Specialization
Instructor: Karthik Shyamsunder
What you'll learn
Learn to set up and configure Hive, Pig, HBase, and Spark for efficient big data analysis and processing within the Hadoop ecosystem.
Master Hive's SQL-like queries for data retrieval, management, and optimization using partitions and joins to enhance query performance.
Understand Pig Latin for scripting data transformations, including the use of operators like join and debug to process large datasets effectively.
Gain expertise in NoSQL databases with HBase for real-time read/write operations, and use Spark's core programming model for fast data processing.
Details to know
15 assignments
Build your subject-matter expertise
- Learn new concepts from industry experts
- Gain a foundational understanding of a subject or tool
- Develop job-relevant skills with hands-on projects
- Earn a shareable career certificate
There are 5 modules in this course
The course "Data Analysis Using Hadoop Tools" provides a thorough and hands-on introduction to key tools within the Hadoop ecosystem, such as Hive, Pig, HBase, and Apache Spark, for data processing, management, and analysis. Learners will gain practical experience with Hive's SQL-like interface for complex data querying, Pig Latin scripting for data transformation, and HBase's NoSQL capabilities for efficient big data management. The course also covers Apache Spark's powerful in-memory computation capabilities for high-performance data processing tasks. By the end, participants will be equipped with the skills to leverage these technologies within the Hadoop platform to address real-world big data challenges.
What makes this course unique is its comprehensive approach to integrating various Hadoop tools into a cohesive workflow. You'll not only learn how to use each tool individually but also understand how to effectively combine them to optimize data processing and analysis. Through hands-on exercises and examples, you'll gain the confidence and skills to tackle complex data challenges and extract valuable insights from big data. Whether you're looking to enhance your data analysis capabilities for work or want to deepen your knowledge of Hadoop and big data tools, this course offers valuable skills that will help you succeed.
This course provides a comprehensive overview of key tools within the Hadoop ecosystem, including Hive, Pig, HBase, and Apache Spark. You will learn how to set up and configure these technologies for data processing, management, and analysis. The course covers Hive's query execution, Pig's scripting language, and HBase's NoSQL capabilities. You'll also gain hands-on experience with Spark's core programming model for efficient big data processing. By the end, you'll be equipped to leverage these tools for optimized data analysis and management.
What's included
2 readings
2 readings • Total 15 minutes
- Course Overview • 5 minutes
- Instructor Biography: Prof. Karthik Shyamsunder • 10 minutes
In this module, we will cover MapReduce programming using Hive, a higher-level tool that translates SQL-like Hive queries into MapReduce jobs.
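To make the idea of translating a SQL-like query into MapReduce concrete, here is a minimal conceptual sketch in plain Python. It is not Hive's actual query planner and does not use any Hive or Hadoop API. It only mimics the map, shuffle, and reduce phases for a query shaped like `SELECT page, COUNT(*) FROM page_views GROUP BY page`. The table name, column names, sample rows, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of a "page_views" table: (user, page).
ROWS = [("ann", "/home"), ("bob", "/home"), ("ann", "/cart"), ("cat", "/home")]


def map_phase(rows):
    """Emit a (page, 1) pair per row, as a mapper might for COUNT(*) ... GROUP BY page."""
    for _user, page in rows:
        yield page, 1


def shuffle_phase(pairs):
    """Group emitted values by key, imitating the shuffle/sort step between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values for each key, producing one count per page."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_page(rows):
    """Run the three phases in order on an in-memory list of rows."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_page(ROWS))
```

In a real cluster the three phases run on different machines over data stored in HDFS. The sketch keeps everything in memory so that only the shape of the computation is visible.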
What's included
9 videos, 7 readings, 4 assignments
9 videos • Total 107 minutes
- Introduction - Hive • 2 minutes
- Hive Overview and Architecture • 23 minutes
- Setting up Hive • 26 minutes
- Simple Hive Example • 20 minutes
- Loading Data • 9 minutes
- Hive Statements • 11 minutes
- Partitions • 6 minutes
- Joins • 8 minutes
- Summary - Hive • 2 minutes
7 readings • Total 105 minutes
- Hive Overview and Architecture • 10 minutes
- Setting up Hive • 10 minutes
- Simple Hive Example • 10 minutes
- Loading Data • 10 minutes
- Hive Statement • 10 minutes
- Partitions and Joins in Hive • 15 minutes
- Self-Reflective Reading: Balancing Hive, Java MapReduce, and Pig in Hadoop Architectures • 40 minutes
4 assignments • Total 105 minutes
- Data Analysis using Hive • 60 minutes
- Introduction to Hive: Overview, Architecture, and Setup • 15 minutes
- Working with Hive: Basic Examples, Data Loading, and Hive Statements • 15 minutes
- Advanced Hive: Partitions, Joins, and Summary • 15 minutes
In this module, we will cover MapReduce programming using Pig, a higher-level platform that translates Pig Latin scripts into MapReduce jobs.
What's included
9 videos, 7 readings, 4 assignments
9 videos • Total 132 minutes
- Introduction - Pig • 2 minutes
- Pig: Overview and Architecture • 22 minutes
- Setting up Pig • 8 minutes
- Grunt Interactive Shell • 18 minutes
- Pig Latin Language Basics • 10 minutes
- Pig Data Types and Schema • 15 minutes
- Core Relational Operators • 14 minutes
- Join Operators • 26 minutes
- Debug Operators • 17 minutes
7 readings • Total 102 minutes
- Pig: Overview and Architecture • 15 minutes
- Grunt Interactive Shell • 7 minutes
- Exploring Pig Latin Basics: Data Structures, Syntax, and Commands • 10 minutes
- Understanding Schemas, Data Types, and Functions in Apache Pig • 10 minutes
- Core Relational Operators in Pig Latin: An Overview • 10 minutes
- Exploring Relational Join Operators in Apache Pig • 10 minutes
- Self-Reflective Reading: Hive vs. Pig for Your Big Data Strategy • 40 minutes
4 assignments • Total 105 minutes
- Data Analysis using Pig • 60 minutes
- Introduction to Pig: Overview, Architecture, and Setup • 15 minutes
- Pig Fundamentals: Grunt Shell, Pig Latin Basics, and Data Types • 15 minutes
- Pig Advanced: Core Operators, Joins, Debugging, and Summary • 15 minutes
In this module, we will start with a primer of NoSQL databases and then dive into HBase, a NoSQL database built on top of Hadoop that allows for random, real-time read/write access to your Big Data.
What's included
8 videos, 3 readings, 3 assignments
8 videos • Total 175 minutes
- Introduction - HBase • 2 minutes
- NoSQL Primer • 35 minutes
- HBase Overview and Architecture • 31 minutes
- Setting up HBase • 25 minutes
- HBase Data Model • 16 minutes
- HBase Shell • 33 minutes
- CRUD operations using Java API • 31 minutes
- Summary - HBase • 3 minutes
3 readings • Total 60 minutes
- HBase Overview and Architecture • 10 minutes
- HBase Data Model • 10 minutes
- Self-Reflective Reading: Coexisting Databases: Balancing NoSQL and RDBMS in Modern Applications • 40 minutes
3 assignments • Total 90 minutes
- Hadoop NOSQL Database HBase • 60 minutes
- Introduction to HBase: NoSQL Basics, Architecture, and Setup • 15 minutes
- HBase Fundamentals: Data Model, Shell, CRUD Operations, and Summary • 15 minutes
In this module, we will cover the Spark engine and framework and show how it integrates with the Hadoop platform.
What's included
8 videos, 5 readings, 4 assignments
8 videos • Total 233 minutes
- Introduction - Spark • 3 minutes
- Spark Overview • 38 minutes
- Spark Architecture • 29 minutes
- Setting up and Running Spark • 55 minutes
- Spark Core Programming Model • 41 minutes
- Hands-on Spark • 39 minutes
- Miscellaneous Spark Components • 20 minutes
- Summary - Spark • 7 minutes
5 readings • Total 85 minutes
- Spark Architecture • 15 minutes
- Reading References • 10 minutes
- Setting up and Running Spark • 10 minutes
- Reading References • 10 minutes
- Self-Reflective Reading: Hadoop vs Spark: Competing or Complementary? • 40 minutes
4 assignments • Total 105 minutes
- Spark • 60 minutes
- Introduction to Spark: Overview and Architecture • 15 minutes
- Spark Setup and Core Programming Model • 15 minutes
- Hands-On Spark: Core Components and Summary • 15 minutes
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
Frequently asked questions
To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can't afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you'll find a link to apply on the description page.
