Though quite late, I started on Hadoop about a week back; it took me a couple of days (with the help of my team members) to set up a local Hadoop installation on my system using Cygwin.
I wrote an example MapReduce job in which the Mapper processes a given file to calculate a GPS displacement for a person based on latitude and longitude information, and the Reducer then figures out the maximum displacement from the combined displacement list.
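For context, here is a minimal plain-Java sketch of the kind of per-record displacement computation the Mapper could perform, using the haversine great-circle formula; the class and method names are hypothetical, not the actual job code:

```java
// Hypothetical helper: great-circle displacement between two GPS fixes,
// the kind of per-record computation a Mapper could perform.
public class Displacement {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Haversine distance in kilometres between (lat1, lon1) and (lat2, lon2),
    // with all coordinates given in degrees.
    public static double haversineKm(double lat1, double lon1,
                                     double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // One degree of latitude is roughly 111 km.
        System.out.println(haversineKm(0.0, 0.0, 1.0, 0.0));
    }
}
```

In the actual job, the Mapper would emit one such distance per consecutive pair of fixes, keyed by person, and the Reducer would keep the maximum per key.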
Everything went well until I got stuck at a point where I was unable to understand how KeyIn and ValueIn are populated from an HDFS file read. How can I customize what goes into the key and what goes into the value? The Hadoop wiki states,
Hence, it depends on the specific implementation of RecordReader. In the case of TextInputFormat, the LineRecordReader is used, which produces essentially meaningless LongWritable keys (Writable is Hadoop's serialization interface; LongWritable is its implementation for the Long datatype) as input to the Mapper. KeyValueLineRecordReader in KeyValueTextInputFormat (not in hadoop-core-0.20.2, though I can see it in the mapreduce trunk) reads the text file and separates key and value at the \t (tab) separator in each input line.
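To make the tab-splitting rule concrete, here is a plain-Java sketch (not the real Hadoop source) of how KeyValueLineRecordReader derives key and value from a line: everything before the first tab becomes the key, everything after it the value, and a line with no tab becomes the key with an empty value:

```java
// Sketch of KeyValueLineRecordReader's split rule (not the actual Hadoop class):
// key = text before the first tab, value = text after it;
// if no tab is present, the whole line is the key and the value is empty.
public class KeyValueSplit {
    public static String[] split(String line) {
        int pos = line.indexOf('\t');
        if (pos == -1) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        String[] kv = split("user42\t12.97,77.59");
        System.out.println(kv[0] + " -> " + kv[1]);
    }
}
```

So with KeyValueTextInputFormat you control the key/value split by laying out your input file as tab-separated pairs, instead of receiving LineRecordReader's byte-offset LongWritable keys.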