Though quite late to the party, I started on Hadoop around a week back. It took me a couple of days (with the help of my team members) to set up a local Hadoop installation on my system using Cygwin.
I wrote an example MapReduce job in which the Mapper processes a given file to calculate GPS displacement for a person from latitude and longitude information, and the Reducer then figures out the maximum displacement from the combined displacement list.
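My actual Mapper code isn't shown here, but the displacement between two consecutive GPS fixes is commonly computed with the haversine formula; the class and method names below are illustrative, not from the original job.

```java
public class GpsDisplacement {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Haversine distance in kilometres between two (lat, lon) points given in degrees.
    public static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        double c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
        return EARTH_RADIUS_KM * c;
    }

    public static void main(String[] args) {
        // London to Paris, roughly 340 km.
        System.out.printf("%.1f km%n", haversineKm(51.5074, -0.1278, 48.8566, 2.3522));
    }
}
```

A Mapper would emit one such displacement per pair of consecutive fixes for a person, and the Reducer would simply keep the maximum per key.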
Everything went well until I got stuck at a point where I was unable to understand how KeyIn and ValueIn are mapped from an HDFS file read. How can I customize what goes in the key and what goes in the value? The Hadoop wiki states:
Hence, it depends on the specific implementation of RecordReader. In the case of TextInputFormat, LineRecordReader is used, which produces LongWritable keys (Writable is Hadoop's serialization interface; LongWritable is its implementation for the long datatype) holding the byte offset of each line, which are largely meaningless as input to the Mapper. KeyValueLineRecordReader in KeyValueTextInputFormat (not in hadoop-core-0.20.2, but I can see it in the MapReduce trunk) reads the text file and separates key and value by a \t (tab) separator in the input file.
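The split that KeyValueLineRecordReader performs is simple enough to sketch in plain Java (this is a standalone illustration of the behavior, not Hadoop's actual source): the first tab on a line separates key from value, and a line without a tab becomes the key with an empty value.

```java
public class KeyValueSplit {
    // Mimics KeyValueLineRecordReader's behavior: split a line at the first tab.
    // Returns {key, value}; a line with no tab yields the whole line as the key
    // and an empty value.
    public static String[] split(String line) {
        int pos = line.indexOf('\t');
        if (pos < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        String[] kv = split("personId42\t12.97,77.59");
        System.out.println("key=" + kv[0] + " value=" + kv[1]);
        // prints: key=personId42 value=12.97,77.59
    }
}
```

In releases where the new-API KeyValueTextInputFormat is available, it is typically wired into a job via job.setInputFormatClass(KeyValueTextInputFormat.class).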
2 comments:
If you're trying a local Hadoop and HBase installation on Cygwin, please try my Windows installation script @ http://hadoop4win.sf.net.
Some samples to test Hadoop and HBase on Cygwin can be found at http://trac.nchc.org.tw/cloud/wiki/Hadoop4Win
Sorry that I have not yet translated the document to English.
Just for your reference.