Just like MultipleInputs, Hadoop also supports MultipleOutputs. With it, a single MapReduce job can write different data, in different formats, to separate outputs.

This feature is very easy to use. As before, I will mainly use Java code to demonstrate the usage, and I hope the code explains itself 🙂

Note: I wrote and ran the following code using Hadoop 1.0.3, but it should work in 0.20.205 as well.

1. MultipleOutputs class

First of all, import MultipleOutputs:
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

2. introduce the `MultipleOutputs.addNamedOutput`

There are 5 parameters for this method:

Job job
         pass in the Hadoop Job you created
String namedOutput
         give a unique name for this output; the reducer output
         files for it will be named namedOutput-r-XXXXX
Class<? extends OutputFormat> outputFormatClass
         if you have a custom output format, pass it in here;
         if you just output text, use Hadoop's
         TextOutputFormat.class
Class<?> keyClass
         the class type of the key; if you don't output a key,
         use NullWritable.class
Class<?> valueClass
         the class type of the value; if you have a custom
         value class, use it here; if the value is text,
         use Text.class
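
One thing to watch with the namedOutput parameter: as far as I know, Hadoop only accepts letters and digits in the name and reserves the name "part" for the default output. The sketch below is a plain-Java approximation of that rule, not Hadoop's own code. The class and method names (NamedOutputNameCheck, isValid) are made up for illustration, so please verify the behaviour against your Hadoop version.

```java
// Stand-alone sketch (no Hadoop on the classpath needed).
// Assumption: this approximates the name check applied by addNamedOutput.
class NamedOutputNameCheck {

    /** Returns true if the name has only letters and digits and is not "part". */
    static boolean isValid(String namedOutput) {
        if (namedOutput == null || namedOutput.isEmpty()) {
            return false;
        }
        for (char ch : namedOutput.toCharArray()) {
            boolean letter = (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
            boolean digit = ch >= '0' && ch <= '9';
            if (!letter && !digit) {
                return false; // e.g. '_' or '-' is rejected
            }
        }
        // "part" is the prefix of the default output files (part-r-00000).
        return !namedOutput.equals("part");
    }

    public static void main(String[] args) {
        System.out.println(isValid("fruit"));        // true
        System.out.println(isValid("fruit_output")); // false
    }
}
```

So names such as "fruit" and "color" are fine, while something like "fruit_output" would likely be rejected when the job is set up.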

3. Code

What I want to do here is split the columns of a given input so that different columns go to different outputs.

Sample Data:

1 APPLE RED
2 ORANGE BLACK
3 BANANA GREEN

Here I want to separate the fruit column and the color column.
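
Before bringing Hadoop into it, the split itself can be sketched in plain Java. This is only an illustration of the idea, under the assumption that the columns are tab-separated (which is what the reducer further down splits on). The class and method names (ColumnSplitSketch, column) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-alone sketch: pick one column out of tab-separated lines.
class ColumnSplitSketch {

    /** Collects the field at the given index from every line that has it. */
    static List<String> column(List<String> lines, int index) {
        List<String> out = new ArrayList<String>();
        for (String line : lines) {
            String[] items = line.split("\t");
            if (items.length > index) { // skip short or malformed lines
                out.add(items[index]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1\tAPPLE\tRED", "2\tORANGE\tBLACK", "3\tBANANA\tGREEN");
        System.out.println(column(lines, 1)); // [APPLE, ORANGE, BANANA]
        System.out.println(column(lines, 2)); // [RED, BLACK, GREEN]
    }
}
```

In the MapReduce job, each of these two lists ends up in its own named output instead of an in-memory list.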

3.1 Setup the driver for this MapReduce job:

// Names of the two named outputs; they become the output file name prefixes.
static final String fruitOutputName = "fruit";
static final String colorOutputName = "color";

public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
   Path inputDir = new Path(args[0]);
   Path outputDir = new Path(args[1]);

   Configuration conf = new Configuration();

   Job job = new Job(conf);
   job.setJarByClass(MultipleOutputsTest.class);
   job.setJobName("MultipleOutputs Test");

   job.setMapOutputKeyClass(Text.class);
   job.setMapOutputValueClass(Text.class);

   job.setMapperClass(myMapper.class);
   job.setReducerClass(myReducer.class);

   FileInputFormat.setInputPaths(job, inputDir);
   FileOutputFormat.setOutputPath(job, outputDir);

   // Register one named output per column; both write value-only text lines.
   MultipleOutputs.addNamedOutput(job, fruitOutputName, TextOutputFormat.class, NullWritable.class, Text.class);
   MultipleOutputs.addNamedOutput(job, colorOutputName, TextOutputFormat.class, NullWritable.class, Text.class);

   // Exit with a non-zero status if the job fails.
   System.exit(job.waitForCompletion(true) ? 0 : 1);
}

fruitOutputName and colorOutputName are strings I defined as "fruit" and "color" respectively, so the fruit output files will be named fruit-r-000XX.
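
The naming pattern can be sketched as follows: the output name, then "r" for a reduce task (or "m" for a map task), then the task's partition number padded to five digits. This is a plain-Java illustration of the pattern, not Hadoop's own code, and the names OutputFileNameSketch and fileName are made up for this example.

```java
import java.util.Locale;

// Stand-alone sketch of the <name>-<r|m>-<partition> file naming pattern.
class OutputFileNameSketch {

    /** Builds a name such as fruit-r-00000 from its three parts. */
    static String fileName(String baseName, char taskType, int partition) {
        // Locale.ROOT keeps the digits plain ASCII on every machine.
        return String.format(Locale.ROOT, "%s-%c-%05d", baseName, taskType, partition);
    }

    public static void main(String[] args) {
        System.out.println(fileName("fruit", 'r', 0)); // fruit-r-00000
        System.out.println(fileName("color", 'r', 2)); // color-r-00002
    }
}
```

Each reduce task writes its own file per named output, so a job with several reducers produces several fruit-r-* and color-r-* files.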

3.2 Reducer

The next important part is the reducer. For a single output we use context.write(KEY, VALUE), but here we write through the MultipleOutputs object instead.

public static class myReducer extends Reducer<Text, Text, NullWritable, Text> {
    // Writes each record to one of the named outputs registered in the driver.
    private MultipleOutputs<NullWritable, Text> mos;

    @Override
    public void setup(Context context) {
        mos = new MultipleOutputs<NullWritable, Text>(context);
    }

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        for (Text value : values) {
            // Each value is expected to be a tab-separated line: id, fruit, color.
            String str = value.toString();
            String[] items = str.split("\t");

            mos.write(fruitOutputName, NullWritable.get(), new Text(items[1]));
            mos.write(colorOutputName, NullWritable.get(), new Text(items[2]));
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Closing flushes and closes the writers of all named outputs.
        mos.close();
    }
}

Please pay attention to the setup and cleanup functions: the job fails if the MultipleOutputs object is never initialized, and the output may be missing or incomplete if it is never closed.
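
As a rough analogy for why the close call matters (this is plain Java, not Hadoop, and the class name CloseMattersSketch is made up): buffered writers generally hold data in memory until they are flushed or closed. The sketch below shows a small write that only reaches its destination after close().

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

// Stand-alone analogy: buffered data is only visible after close().
class CloseMattersSketch {

    /** Returns the sink contents before and after closing the writer. */
    static String[] demo() {
        try {
            StringWriter sink = new StringWriter();
            BufferedWriter writer = new BufferedWriter(sink);
            writer.write("APPLE");
            String beforeClose = sink.toString(); // still buffered, so empty
            writer.close();                       // flushes the buffer
            String afterClose = sink.toString();
            return new String[] { beforeClose, afterClose };
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] result = demo();
        System.out.println("before close: [" + result[0] + "]"); // before close: []
        System.out.println("after close: [" + result[1] + "]");  // after close: [APPLE]
    }
}
```

Calling mos.close() in cleanup plays the same role for the named outputs.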

3.3 Output

As expected, the output for the sample input will be:

fruit-r-00000:
APPLE
ORANGE
BANANA
color-r-00000:
RED
BLACK
GREEN

Here is the example code I used. Download

15 thoughts on “How to use Hadoop MultipleOutputs”

Amit March 11, 2014 at 2:40 am

Nice post. But how do I write these files to separate directories like fruit/fruit-r-00000 and color/color-r-00000?

purplechun (Post author) March 11, 2014 at 10:28 pm

In order to do that, you need to override a MultipleOutputs method. I can update the blog later with that information.

Ajay April 5, 2014 at 12:42 am

To get color/color-r-00000, try:

mos.write(colorOutputName, NullWritable.get(), new Text(items[2]), colorOutputName + "/" + colorOutputName);

Sunil April 14, 2014 at 3:15 am

Can someone give me an example of how to use the outputs of 2 mapper functions in one reducer function? My task is to read data from two input files, each handled by a separate mapper function, and then use the results of both to come up with a solution. Any help would be appreciated.
Thanks.

purplechun (Post author) April 14, 2014 at 12:20 pm

I have a post about MultipleInputs: http://www.lichun.cc/blog/2012/05/hadoop-multipleinputs-usage/, which should solve your problem.

Andrew December 14, 2014 at 2:08 pm

Hi,
thank you for this nice example, but I would like to ask about one thing. You wrote that I will get a file fruit-r-00000 which consists of 3 words (apple, orange, banana) and a second file color-r-00000 which also consists of 3 words (red, black, green). Unfortunately, in my case I get 3 files for fruit (fruit-r-00080, fruit-r-00081 and fruit-r-00082), and each of these files contains only one "fruit word". Likewise, I get 3 files for color (color-r-00080, color-r-00081, color-r-00082), and again each file contains only one "color word". Sure, at the end I can merge these files (using "hdfs dfs -getmerge /path/to/files/color* /path/to/destiny/path", and likewise for fruit), but I would like to know where the problem might be. I use "Configuration conf = getConf();" instead of "Configuration conf = new Configuration();" in the driver, but the rest of the code is the same as you present, and I don't think this difference causes it. The job finishes without problems, errors, or warnings. I would really appreciate your help or any advice. Thanks. Best, Andrew

purplechun (Post author) December 17, 2014 at 1:20 pm

There are three files because 3 reducers are working on the job and each one has its own output. If you only want 1 output, put this line in your driver: job.setNumReduceTasks(1);

Naveen Kumar B V April 13, 2015 at 2:43 pm

Hi,

I have a map-only job and cannot control the number of mappers as it depends on the number of input splits. Can you please let me know if there is any way to customize the name of the output file. I’m trying to

1. Generate just one output file from a mapper and
2. Customize the name of the output file to remove -m-0000 completely.

Thanks.
Naveen Kumar B.V

Prashant June 17, 2015 at 4:31 am

Thank you Chun,
Very nice explanation; moreover, the code demonstration is self-explanatory 🙂

purplechun (Post author) June 20, 2015 at 8:31 am

Good to know! Thanks 🙂

Abdullah Khan June 23, 2015 at 12:57 am

Hi, I want to calculate the frequency of each word in a text file and, at the same time, the total number of words. The frequencies are stored in the output path defined by FileOutputFormat.setOutputPath. Now I want to store the total number of words in another text file. How can I do that?
Thanks.

Rahul May 7, 2016 at 4:57 am

Can you please tell me how to generate two output files from the mapper, one in the format I am passing to the reducer and the other in the format written to the final output?

Rahul May 7, 2016 at 11:16 am

Thanks, great, it's solved.

Ankit May 11, 2016 at 5:17 am

Great doc.
As a further enhancement, I want to add a header to each type of file.

shah September 27, 2016 at 4:44 am

Hi,
Great document for beginners.
I would like to see an MRUnit test for this code.

