Premature optimization is the root of all evil

Today, during a code review, I learned an important lesson.

If you have written a Hadoop reducer before, you know that one reducer task is assigned many keys, according to the partitioner. Its run() method iterates over those keys and passes each key, together with its values, to the reduce() method, so each call to reduce() handles exactly one key and its values.
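To make that per-key contract concrete, here is a minimal plain-Java sketch with no Hadoop dependency. It imitates the loop in Hadoop's Reducer.run(), which is essentially while (context.nextKey()) { reduce(context.getCurrentKey(), context.getValues(), context); } between setup() and cleanup(). The class name ReducerLoopSketch, the summing reduce(), and the sample pairs are hypothetical, and the input is assumed to be already sorted by key, as the framework's shuffle provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java imitation of Reducer.run(): the input is a list of (key, value)
// pairs already sorted by key, and reduce() is called once per distinct key.
class ReducerLoopSketch {

    // Records each reduce() call so the once-per-key behaviour can be checked.
    static final List<String> calls = new ArrayList<>();

    // Hypothetical reduce(): sums all values that belong to one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        calls.add(key + "=" + sum);
        return sum;
    }

    // Hypothetical run(): groups consecutive equal keys and hands each group to reduce().
    static Map<String, Integer> run(List<Map.Entry<String, Integer>> sortedPairs) {
        Map<String, Integer> output = new TreeMap<>();
        int i = 0;
        while (i < sortedPairs.size()) {              // plays the role of context.nextKey()
            String key = sortedPairs.get(i).getKey();
            List<Integer> values = new ArrayList<>();
            while (i < sortedPairs.size() && sortedPairs.get(i).getKey().equals(key)) {
                values.add(sortedPairs.get(i).getValue());
                i++;
            }
            output.put(key, reduce(key, values));
        }
        return output;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = List.of(
                Map.entry("apple", 1), Map.entry("apple", 2),
                Map.entry("banana", 5),
                Map.entry("cherry", 3), Map.entry("cherry", 4));
        System.out.println(run(pairs));
        System.out.println(calls);
    }
}
```

Running main prints {apple=3, banana=5, cherry=7} and then [apple=3, banana=5, cherry=7]: five pairs, three keys, three reduce() calls.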


Fix an empty STATIC_URL in Django

When I used static files in my Django application, STATIC_URL was empty in my templates even though I had set it. After some research, here is how I fixed it.

My static image files are located in movie/static/img (my application is named movie).

1. in

from django.template import RequestContext

If you use render_to_response, make sure you pass context_instance=RequestContext(request) to it:

render_to_response('movie/index.html', {'movie_list': movie_list}, context_instance=RequestContext(request))


Set up Django 1.4 on a GoDaddy Linux Economy host

If you want to use the Django framework on GoDaddy's Linux Economy hosting, here are the steps:

1. GoDaddy has virtualenv installed, so first create a virtual environment named venv (I use $HOME/lib/ for everything installed below):

cd ~/
mkdir lib
cd lib
virtualenv --no-site-packages venv

The Python package folder is then $HOME/lib/venv/lib/python2.7/site-packages.

2. Install the latest Django through pip:
pip install Django


Set up Git on a GoDaddy Linux Economy host

The GoDaddy Linux host has no version control installed, which makes code deployment very painful. Here is how to set up Git on it.

In this tutorial, we set up the repository on the GoDaddy host.

1. Download the prebuilt Git binaries. GoDaddy uses CentOS, and these binaries were built on CentOS. (Thanks to here)
In your $HOME folder on the GoDaddy host:

% mkdir lib
% cd lib
% mkdir git
% cd git
% wget
% tar -xvzf centos5.2-git.tar.gz


Hadoop MultipleInputs sample usage

MultipleInputs is a Hadoop feature that lets a single MapReduce job read input files that have different formats.

For example, we have two files with different formats:

(1) First file format:


(2) Second file format:


To read these custom formats, we need to write a record class, a RecordReader, and an InputFormat for each one.

MultipleInputs needs an InputFormat for each input. The InputFormat uses a RecordReader to read the file and return values, and each value is an instance of the record class.
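Since the two file formats are not shown here, the following is only a plain-Java sketch of how the three pieces relate, using two hypothetical line layouts (comma-separated "id,name" and tab-separated "id", "score"). SimpleInputFormat and LineRecordParser are stand-ins, not Hadoop classes: in Hadoop, RecordReader is an abstract class with methods such as nextKeyValue(), getCurrentKey() and getCurrentValue(), and each input path is bound to its own InputFormat (and Mapper) with MultipleInputs.addInputPath(job, path, inputFormatClass, mapperClass).

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java imitation of the record class -> RecordReader -> InputFormat chain.
// The two line layouts are hypothetical stand-ins for the two file formats.
class MultipleInputsSketch {

    // Record class for the first (hypothetical) format: "id,name".
    static class FirstRecord {
        final String id;
        final String name;
        FirstRecord(String id, String name) { this.id = id; this.name = name; }
    }

    // Record class for the second (hypothetical) format: "id<TAB>score".
    static class SecondRecord {
        final String id;
        final int score;
        SecondRecord(String id, int score) { this.id = id; this.score = score; }
    }

    // Stand-in for a RecordReader: turns one line into one record.
    interface LineRecordParser<R> {
        R parse(String line);
    }

    // Stand-in for an InputFormat: owns a parser and reads every line of a split.
    static class SimpleInputFormat<R> {
        private final LineRecordParser<R> parser;

        SimpleInputFormat(LineRecordParser<R> parser) { this.parser = parser; }

        List<R> read(String split) {
            List<R> records = new ArrayList<>();
            for (String line : split.split("\n")) {
                if (!line.isEmpty()) {
                    records.add(parser.parse(line));
                }
            }
            return records;
        }
    }

    // One input format per file format, mirroring one InputFormat per path in MultipleInputs.
    static final SimpleInputFormat<FirstRecord> FIRST_FORMAT =
            new SimpleInputFormat<>(line -> {
                String[] f = line.split(",", 2);
                return new FirstRecord(f[0], f[1]);
            });

    static final SimpleInputFormat<SecondRecord> SECOND_FORMAT =
            new SimpleInputFormat<>(line -> {
                String[] f = line.split("\t", 2);
                return new SecondRecord(f[0], Integer.parseInt(f[1].trim()));
            });

    public static void main(String[] args) {
        List<FirstRecord> first = FIRST_FORMAT.read("1,alice\n2,bob\n");
        List<SecondRecord> second = SECOND_FORMAT.read("1\t90\n2\t75\n");
        System.out.println(first.get(1).name + " " + second.get(0).score);
    }
}
```

Running main prints "bob 90". Keeping the parsing in a per-format reader is what lets each path carry its own format while the rest of the job stays unchanged.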

Here is the implementation:

Hadoop GenericWritable sample usage

GenericWritable is another Hadoop feature that lets you pass values of different types to the reducer. It is a wrapper for Writable instances.

Suppose we have different input formats (see MultipleInputs), one producing FirstClass values and another producing SecondClass values (note: you can have more than two). If you want both of them to reach your reducer under the same key, here is what you can do:

We reuse the code from the MultipleInputs post.
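The mechanism behind GenericWritable can be sketched without Hadoop: a subclass lists the allowed types (getTypes() in Hadoop), and serialization writes a one-byte index into that list before the wrapped value, so the receiving side knows which class to instantiate. The sketch below is a stdlib-only imitation of that idea; SimpleWritable, MyGenericValue, roundTrip, and the fields of FirstClass and SecondClass are hypothetical, and a real wrapper would extend org.apache.hadoop.io.GenericWritable instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class GenericWrapperSketch {

    // Stand-in for Hadoop's Writable: a value that can serialize itself.
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical value type produced by the first input format.
    static class FirstClass implements SimpleWritable {
        String name = "";
        FirstClass() {}
        FirstClass(String name) { this.name = name; }
        public void write(DataOutput out) throws IOException { out.writeUTF(name); }
        public void readFields(DataInput in) throws IOException { name = in.readUTF(); }
    }

    // Hypothetical value type produced by the second input format.
    static class SecondClass implements SimpleWritable {
        int score;
        SecondClass() {}
        SecondClass(int score) { this.score = score; }
        public void write(DataOutput out) throws IOException { out.writeInt(score); }
        public void readFields(DataInput in) throws IOException { score = in.readInt(); }
    }

    // Imitates GenericWritable: a fixed list of allowed types, and a one-byte
    // type index written in front of the wrapped value's own bytes.
    static class MyGenericValue {
        private static final Class<?>[] TYPES = { FirstClass.class, SecondClass.class };
        private SimpleWritable instance;

        void set(SimpleWritable value) { instance = value; }
        SimpleWritable get() { return instance; }

        void write(DataOutput out) throws IOException {
            for (int i = 0; i < TYPES.length; i++) {
                if (TYPES[i] == instance.getClass()) {
                    out.writeByte(i);          // which type follows
                    instance.write(out);       // the value itself
                    return;
                }
            }
            throw new IOException("type not registered: " + instance.getClass());
        }

        void readFields(DataInput in) throws IOException {
            int index = in.readByte();
            try {
                instance = (SimpleWritable) TYPES[index].getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IOException(e);
            }
            instance.readFields(in);
        }
    }

    // Serializes a value through the wrapper and reads it back, as the shuffle would.
    static SimpleWritable roundTrip(SimpleWritable value) {
        try {
            MyGenericValue wrapper = new MyGenericValue();
            wrapper.set(value);
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            wrapper.write(new DataOutputStream(bytes));
            MyGenericValue copy = new MyGenericValue();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SimpleWritable[] received = { roundTrip(new FirstClass("alice")), roundTrip(new SecondClass(90)) };
        // On the reducer side, the concrete type is recovered with instanceof.
        for (SimpleWritable v : received) {
            if (v instanceof FirstClass) {
                System.out.println("first: " + ((FirstClass) v).name);
            } else if (v instanceof SecondClass) {
                System.out.println("second: " + ((SecondClass) v).score);
            }
        }
    }
}
```

Running main prints "first: alice" and then "second: 90". In a real reducer, the same instanceof check would be applied to the value returned by the wrapper's get().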


Build Local Single Node Hadoop Cluster on Linux

This post shows how to build a local single node Hadoop cluster on Linux.


(1) Install the JDK: Download Link

(2) Install Ant: Download Link

Use the binary distribution, and add the following lines to your

export PATH=${PATH}:${ANT_HOME}/bin


Install Hadoop:

(1) Download Hadoop: Download Link