{"id":69,"date":"2012-05-24T10:27:55","date_gmt":"2012-05-24T14:27:55","guid":{"rendered":"http:\/\/lichun.cc\/blog\/?p=69"},"modified":"2012-05-24T10:27:55","modified_gmt":"2012-05-24T14:27:55","slug":"hadoop-genericwritable-sample-usage","status":"publish","type":"post","link":"https:\/\/www.lichun.cc\/blog\/2012\/05\/hadoop-genericwritable-sample-usage\/","title":{"rendered":"Hadoop GenericWritable sample usage"},"content":{"rendered":"<p><strong>GenericWritable<\/strong> is another Hadoop feature that lets you pass values of different types to the reducer; it&#8217;s a wrapper for Writable instances.<\/p>\n<p>Suppose we have different input formats (see <a title=\"Hadoop MultipleInputs usage\" href=\"http:\/\/lichun.cc\/blog\/2012\/05\/hadoop-multipleinputs-usage\/\" target=\"_blank\">MultipleInput<\/a>): one is <strong>FirstClass<\/strong> and the other is <strong>SecondClass<\/strong> (note: you can have more than two). If you want to handle both of them in your reducer for the same key <strong>value<\/strong>, here is what you can do:<\/p>\n<p><span style=\"color: #ff0000;\">We reuse the code from<\/span> <a title=\"Hadoop MultipleInputs usage\" href=\"http:\/\/lichun.cc\/blog\/2012\/05\/hadoop-multipleinputs-usage\/\" target=\"_blank\">MultipleInput<\/a>.<\/p>\n<p><!--more--><\/p>\n<p>1. Write a GenericWritable subclass, <strong>MyGenericWritable<\/strong><\/p>\n<pre>public class MyGenericWritable extends GenericWritable {\n\n\u00a0\u00a0\u00a0\u00a0private static Class&lt;? extends Writable&gt;[] CLASSES = null;\n\n\u00a0\u00a0 \u00a0static {\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0CLASSES = (Class&lt;? 
extends Writable&gt;[]) new Class[] {\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0FirstClass.class,\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0SecondClass.class\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \/\/add as many different classes as you want\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0};\n\u00a0\u00a0 \u00a0}\n\u00a0\u00a0\u00a0\u00a0\/\/this empty constructor is required by Hadoop\n\u00a0\u00a0 \u00a0public MyGenericWritable() {\n\u00a0\u00a0 \u00a0}\n\n\u00a0\u00a0 \u00a0public MyGenericWritable(Writable instance) {\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0set(instance);\n\u00a0\u00a0 \u00a0}\n\n\u00a0\u00a0 \u00a0@Override\n\u00a0\u00a0 \u00a0protected Class&lt;? extends Writable&gt;[] getTypes() {\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0return CLASSES;\n\u00a0\u00a0 \u00a0}\n\n\u00a0\u00a0\u00a0\u00a0@Override\n\u00a0\u00a0\u00a0 public String toString() {\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 return \"MyGenericWritable [getTypes()=\" + Arrays.toString(getTypes()) + \"]\";\n\u00a0\u00a0\u00a0\u00a0}\n}<\/pre>\n<p>2. In your Mappers, wrap each value in <strong>MyGenericWritable<\/strong> before writing it:<\/p>\n<pre>public static class FirstMap extends Mapper&lt;Text, FirstClass, Text, MyGenericWritable&gt; {\n     public void map(Text key, FirstClass value, Context context) throws IOException, InterruptedException {\n         System.out.println(\"FirstMap:\" + key.toString() + \" \" + value.toString());\n         context.write(key, new MyGenericWritable(value));\n     }\n}<\/pre>\n<pre>public static class SecondMap extends Mapper&lt;Text, SecondClass, Text, MyGenericWritable&gt; {\n     public void map(Text key, SecondClass value, Context context) throws IOException, InterruptedException {\n         System.out.println(\"SecondMap:\" + key.toString() + \" \" + value.toString());\n         context.write(key, new MyGenericWritable(value));\n     }\n}<\/pre>\n<p>3. 
In your Reducer, unwrap each value and check its type:<\/p>\n<pre>public class Reduce extends Reducer&lt;Text, MyGenericWritable, Text, Text&gt; {\n\u00a0\u00a0 \u00a0public void reduce(Text key, Iterable&lt;MyGenericWritable&gt; values, Context context) throws IOException, InterruptedException {\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0for (MyGenericWritable value : values) {\n            Writable rawValue = value.get();\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0if(rawValue instanceof FirstClass){\n                FirstClass firstClass = (FirstClass)rawValue;\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\/\/do something\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0}\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0if(rawValue instanceof SecondClass){\n                SecondClass secondClass = (SecondClass)rawValue;\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\/\/do something\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0}\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0}\n\u00a0\u00a0 \u00a0}\n}<\/pre>\n<p>4. In your job configuration, change the map output value class to <strong>MyGenericWritable<\/strong><\/p>\n<pre>job.setMapOutputValueClass(MyGenericWritable.class);<\/pre>\n<p>Pretty simple, right?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>GenericWritable is another Hadoop feature that lets you pass values of different types to the reducer; it&#8217;s a wrapper for Writable instances. Suppose we have different input formats (see MultipleInput): one is FirstClass and the other is SecondClass (note: you can have more than two). 
And you want to include both of them in your [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[19],"tags":[21,16,83],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p2s9sh-17","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/www.lichun.cc\/blog\/wp-json\/wp\/v2\/posts\/69"}],"collection":[{"href":"https:\/\/www.lichun.cc\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.lichun.cc\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.lichun.cc\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.lichun.cc\/blog\/wp-json\/wp\/v2\/comments?post=69"}],"version-history":[{"count":0,"href":"https:\/\/www.lichun.cc\/blog\/wp-json\/wp\/v2\/posts\/69\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.lichun.cc\/blog\/wp-json\/wp\/v2\/media?parent=69"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.lichun.cc\/blog\/wp-json\/wp\/v2\/categories?post=69"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.lichun.cc\/blog\/wp-json\/wp\/v2\/tags?post=69"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}