Hadoop GenericWritable sample usage

GenericWritable is another Hadoop feature that lets you pass values of different types to the reducer: it is a wrapper around Writable instances.

Suppose we have two different input formats (see MultipleInput), one producing FirstClass values and the other SecondClass values (note: you can have more than two). You want the reducer to receive both of them under the same key. Here is what you can do:

We reuse the code from the MultipleInput post.
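For readers who have not seen that post: a custom value type just needs to implement Hadoop's Writable interface. A minimal sketch of what FirstClass might look like (the field and names here are illustrative assumptions, not the original MultipleInput code):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class FirstClass implements Writable {
    // illustrative single field; the real FirstClass may differ
    private Text value = new Text();

    public FirstClass() {}  // no-arg constructor required for Hadoop reflection

    public void write(DataOutput out) throws IOException {
        value.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        value.readFields(in);
    }

    @Override
    public String toString() {
        return value.toString();
    }
}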

1. Write a GenericWritable class MyGenericWritable

import java.util.Arrays;

import org.apache.hadoop.io.GenericWritable;
import org.apache.hadoop.io.Writable;

public class MyGenericWritable extends GenericWritable {

    private static Class<? extends Writable>[] CLASSES = null;

    static {
        CLASSES = (Class<? extends Writable>[]) new Class[] {
            FirstClass.class,
            SecondClass.class
            // add as many classes as you want
        };
    }
    // this no-arg constructor is required: Hadoop instantiates the class via reflection
    public MyGenericWritable() {
    }

    public MyGenericWritable(Writable instance) {
        set(instance);
    }

    @Override
    protected Class<? extends Writable>[] getTypes() {
        return CLASSES;
    }

    @Override
    public String toString() {
        return "MyGenericWritable [getTypes()=" + Arrays.toString(getTypes()) + "]";
    }
}
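Why does the order of the CLASSES array matter? On the wire, GenericWritable serializes a one-byte index into that array, followed by the wrapped instance's own serialization, so the array must be identical on the map and reduce sides. A stdlib-only sketch of that pattern (no Hadoop dependency; names and payload are illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class TypeIndexDemo {
    public static void main(String[] args) throws IOException {
        // "serialize": write the class index, then the instance's own data
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeByte(1);         // e.g. index of SecondClass in CLASSES
        out.writeUTF("payload");  // the wrapped Writable writes itself here

        // "deserialize": read the index first, then dispatch to the right class
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        int typeIndex = in.readByte();
        String payload = in.readUTF();
        System.out.println(typeIndex + " " + payload);
    }
}
```

Because the index is a single byte, this scheme also caps the number of distinct wrapped types.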

2. In your Mappers,

public static class FirstMap extends Mapper<Text, FirstClass, Text, MyGenericWritable> {
     public void map(Text key, FirstClass value, Context context) throws IOException, InterruptedException {
         System.out.println("FirstMap:" + key.toString() + " " + value.toString());
         context.write(key, new MyGenericWritable(value));
     }
}
public static class SecondMap extends Mapper<Text, SecondClass, Text, MyGenericWritable> {
     public void map(Text key, SecondClass value, Context context) throws IOException, InterruptedException {
         System.out.println("SecondMap:" + key.toString() + " " + value.toString());
         context.write(key, new MyGenericWritable(value));
     }
}

3. In your Reducer, use it like the following:

public class Reduce extends Reducer<Text, MyGenericWritable, Text, Text> {
    public void reduce(Text key, Iterable<MyGenericWritable> values, Context context) throws IOException, InterruptedException {
        for (MyGenericWritable value : values) {
            Writable rawValue = value.get();
            if (rawValue instanceof FirstClass) {
                FirstClass firstClass = (FirstClass) rawValue;
                // do something with firstClass
            } else if (rawValue instanceof SecondClass) {
                SecondClass secondClass = (SecondClass) rawValue;
                // do something with secondClass
            }
        }
    }
}

4. In your job configuration, set the map output value class to MyGenericWritable:

job.setMapOutputValueClass(MyGenericWritable.class);
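For context, here is a sketch of how the whole job might be wired together with MultipleInputs; the input format classes, Driver class, and argument layout are assumptions for illustration, not code from the original post:

Job job = Job.getInstance(new Configuration(), "generic writable demo");
job.setJarByClass(Driver.class);

// one input path + mapper per input format
MultipleInputs.addInputPath(job, new Path(args[0]), FirstInputFormat.class, FirstMap.class);
MultipleInputs.addInputPath(job, new Path(args[1]), SecondInputFormat.class, SecondMap.class);

job.setReducerClass(Reduce.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(MyGenericWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileOutputFormat.setOutputPath(job, new Path(args[2]));
System.exit(job.waitForCompletion(true) ? 0 : 1);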

Pretty simple, right?
