In this article, we demonstrate how a MapReduce program can process large-scale weather datasets to identify temperature extremes. Using Hadoop's parallel processing capabilities, the program identifies hot and cold days, an essential step for climate trend analysis, anomaly detection and building reliable forecasting systems.
The goal is to analyze semi-structured weather data collected by sensors around the world. We focus on the maximum and minimum temperature values and use MapReduce to identify hot days (temperature > 30°C) and cold days (temperature < 15°C).
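The two thresholds can be written as a pair of simple predicates. The sketch below is a minimal plain-Java illustration, not the article's MapReduce code: the class and method names are hypothetical, and it assumes the hot-day test applies to the daily maximum and the cold-day test to the daily minimum.

```java
// Hypothetical helper illustrating the hot/cold thresholds; not part of Hadoop.
class TemperatureThresholds {
    // Thresholds from the problem statement, in degrees Celsius.
    static final double HOT_THRESHOLD_C = 30.0;
    static final double COLD_THRESHOLD_C = 15.0;

    // Assumption: a day is "hot" when its daily maximum exceeds 30 degrees C.
    static boolean isHotDay(double maxTempC) {
        return maxTempC > HOT_THRESHOLD_C;
    }

    // Assumption: a day is "cold" when its daily minimum falls below 15 degrees C.
    static boolean isColdDay(double minTempC) {
        return minTempC < COLD_THRESHOLD_C;
    }

    public static void main(String[] args) {
        // Made-up {max, min} pairs, for illustration only.
        double[][] samples = { {32.5, 18.0}, {12.0, -4.5}, {25.0, 16.0} };
        for (double[] s : samples) {
            System.out.println("max=" + s[0] + " min=" + s[1]
                    + " hot=" + isHotDay(s[0]) + " cold=" + isColdDay(s[1]));
        }
    }
}
```

Both comparisons are strict, so a maximum of exactly 30°C or a minimum of exactly 15°C is not flagged. Under these assumptions, a single day could count as both hot and cold.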
We used weather data from the NCEI, available in line-based ASCII text format. Each file contains fields like Date, Latitude, Longitude, Max Temp and Min Temp.
FileName: CRND0103-2020-AK_Fairbanks_11_NE.txt. Download the file from here.
This section walks you through the implementation of the MapReduce program to extract hot and cold days from large-scale weather data using Hadoop.
In our dataset, columns 6 and 7 hold the maximum and minimum temperature, respectively.
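Extracting these two columns from a single record might look like the following sketch. It is a plain-Java illustration rather than the article's MapReduce code, and it rests on several assumptions: fields are separated by whitespace, the date sits in column 2, and -9999.0 marks a missing reading. The class name, method names and the sample record are all hypothetical.

```java
// Hypothetical parser for one line of the weather file; not part of Hadoop.
class WeatherRecordParser {
    // Zero-based indices: column 2 = date (assumption), columns 6 and 7 = max/min.
    static final int DATE_COL = 1;
    static final int MAX_COL = 5;
    static final int MIN_COL = 6;
    // Assumption: this sentinel marks a missing temperature reading.
    static final double MISSING_SENTINEL = -9999.0;

    // Splits a record on runs of whitespace (assumed delimiter).
    static String[] fields(String line) {
        return line.trim().split("\\s+");
    }

    // Returns the date field, or null when the line is too short.
    static String parseDate(String line) {
        String[] f = fields(line);
        return f.length > DATE_COL ? f[DATE_COL] : null;
    }

    // Returns {max, min}, or null when the line is too short,
    // a value is not numeric, or a reading is marked as missing.
    static double[] parseMaxMin(String line) {
        String[] f = fields(line);
        if (f.length <= MIN_COL) {
            return null;
        }
        try {
            double max = Double.parseDouble(f[MAX_COL]);
            double min = Double.parseDouble(f[MIN_COL]);
            if (max == MISSING_SENTINEL || min == MISSING_SENTINEL) {
                return null;
            }
            return new double[] { max, min };
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Made-up record, for illustration only.
        String sample = "26494 20200715 2.424 -147.51 64.97 31.2 14.1";
        double[] temps = parseMaxMin(sample);
        System.out.println(parseDate(sample) + " max=" + temps[0] + " min=" + temps[1]);
    }
}
```

Returning null for unusable records lets the caller skip them instead of treating a sentinel value as a real temperature.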
Create a project in Eclipse with the following steps:
Open Eclipse, then select File -> New -> Java Project. Name it MyProject, select "Use an execution environment", choose JavaSE-1.8, then click Next -> Finish.
In this project, create a Java class named MyMaxMin, then click Finish.
Copy the source code below into this MyMaxMin Java class.
To ensure imported packages work correctly, you need to add external JAR files to your project. Download the Hadoop Common and Hadoop MapReduce Core JAR files that match your installed Hadoop version.
Check your Hadoop version with the following command:
hadoop version
Now, add the external JARs to MyProject:
Right-click MyProject -> Build Path -> Configure Build Path, select Add External JARs, add the JARs from their download location, then click Apply and Close.
Now export the project as a JAR file.
Right-click MyProject and choose Export -> Java -> JAR file, click Next, choose your export destination, then click Next.
Choose MyMaxMin as the Main Class by clicking Browse, then click Finish -> OK.
Start the HDFS and YARN daemons:
start-dfs.sh
start-yarn.sh
Copy the dataset to HDFS with the following command:
hdfs dfs -put /path/to/CRND0103-2020-AK_Fairbanks_11_NE.txt /
To verify:
hdfs dfs -ls /
Now run your JAR file with the command below, writing the output to the MyOutput directory.
Syntax:
hadoop jar /path/to/Project.jar /input_file_in_HDFS /output_directory
Example:
hadoop jar /home/user/Documents/Project.jar /CRND0103-2020-AK_Fairbanks_11_NE.txt /MyOutput
After the MapReduce job completes, you can check the final results through the Hadoop web interface.
Visit (port 50070 is the NameNode web UI default in Hadoop 2.x; Hadoop 3.x uses 9870 by default):
http://localhost:50070/
Then navigate to: Utilities -> Browse the file system -> /MyOutput -> part-r-00000.
Download the result file.
Each line in the output shows: