Apache Hadoop is a software framework for the distributed processing of large data sets using simple programming models such as MapReduce. To use Hadoop in P2C, we provide a set of utilities for deploying clusters, with an interface similar to that of vcluster. This tutorial describes how to run the wordcount example from the Apache Hadoop distribution on a cluster deployed on P2C.

1. First, connect to the P2C cloud controller using ssh, with p2cuser as both the username and the password. (Note: You need to obtain access permission from the P2C administrator.)

$ ssh p2cuser@10.0.3.82

2. Deploy the Hadoop cluster with vhadoop.

$ vhadoop wordcount 2

3. Connect to the cluster as hduser.

$ ssh hduser@10.0.3.227

4. Run the rebuild.sh script.

$ ./rebuild.sh

5. Check the status of HDFS.

$ hdfs dfsadmin -report

6. Create an input directory in HDFS.

$ hdfs dfs -mkdir /wc-in

7. List the HDFS root directory to confirm that /wc-in was created.

$ hdfs dfs -ls /

8. Copy the sample text file into the input directory.

$ hdfs dfs -copyFromLocal examples/tagalog.txt /wc-in

9. Confirm that the file is now in the input directory.

$ hdfs dfs -ls /wc-in

10. Run the wordcount example on /wc-in, writing the results to /wc-out.

$ hadoop jar examples/hadoop-mapreduce-examples-2.4.0.jar wordcount /wc-in /wc-out

11. List the HDFS root directory again; /wc-out should now be present.

$ hdfs dfs -ls /

12. Congratulations! You have successfully run a MapReduce application on a three-node Apache Hadoop cluster! View the actual result of the count using the command below.

$ hdfs dfs -cat /wc-out/part-r-00000 | less

You can also view the status of the MapReduce jobs through a web interface at http://<ip of master node>:8080. HDFS status can be viewed at http://<ip of master node>:50070.

For more information, email jchermocilla@up.edu.ph.
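
As a quick sanity check on the job output, the same count can be approximated locally with standard Unix tools. The sketch below is not part of Hadoop or the P2C utilities; local_wordcount is a hypothetical helper name, and the sketch assumes that words are whitespace-separated tokens and that the job output lists one word and its count per line, separated by a tab and sorted in byte order. The three pipeline stages loosely mirror the map, shuffle, and reduce phases.

```shell
#!/bin/sh
# Local approximation of the wordcount example (hypothetical helper,
# not part of Hadoop or P2C). Prints one "word<TAB>count" line per
# distinct whitespace-separated token in the file given as $1.
local_wordcount() {
    (
        # Byte-order comparison for every stage of the pipeline.
        export LC_ALL=C
        # Map: put each whitespace-separated token on its own line.
        tr -s '[:space:]' '\n' < "$1" \
            | grep -v '^$' \
            | sort \
            | uniq -c \
            | awk '{ printf "%s\t%s\n", $2, $1 }'
        # sort plays the role of the shuffle (identical tokens become
        # adjacent); uniq -c and awk play the role of the reduce.
    )
}

# Small demonstration on a throwaway file.
sample=$(mktemp)
printf 'the quick fox\nthe lazy dog the end\n' > "$sample"
local_wordcount "$sample"
rm -f "$sample"
```

Running local_wordcount on examples/tagalog.txt and comparing its output with that of the hdfs dfs -cat command above is one way to spot-check the result. Small differences are possible if the tokenization assumed here does not match that of the Hadoop example exactly.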