Monthly Archives: June 2012

To the Moon: Being Moved by a Game Again After a Long Time

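To the Moon is the kind of game you might sneer at on first sight: its simple, somewhat rough graphics make you feel you have gone back at least ten years. Yet this 70 MB game with 16-bit graphics was named the best story-driven indie game of 2011 by GameSpot. I figured that for such a plain-looking game to win that honor, it must have something exceptional. Reviews also said the soundtrack was especially good, so on impulse I went to the official site and paid 12 dollars for a legitimate copy, to support original indie games.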

Continue reading

Use Hadoop DistributedCache to cache files in MapReduce

DistributedCache is a very useful Hadoop feature that enables you to pass resource files to each mapper or reducer.

For example, suppose you have a file stopWordList.txt that contains all the stop words you want to exclude from a word count. In your reducer, you want to check each value passed by the mapper: if the value appears in the stop word list, you skip it and move on to the next value.
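The check described above can be sketched in plain Java. This is only an illustration under stated assumptions: the class and method names (`StopWordFilter`, `loadStopWords`, `filter`) are hypothetical, the stop words are passed in as a list of lines instead of being read from the cached stopWordList.txt, and none of the Hadoop wiring is shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopWordFilter {
    // Build a lookup set from the lines of a stop-word file (one word per line).
    // Assumption: words are compared case-insensitively.
    static Set<String> loadStopWords(List<String> lines) {
        Set<String> stopWords = new HashSet<>();
        for (String line : lines) {
            String word = line.trim().toLowerCase();
            if (!word.isEmpty()) {
                stopWords.add(word);
            }
        }
        return stopWords;
    }

    // Keep only the values that do not appear in the stop-word set.
    static List<String> filter(List<String> values, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String value : values) {
            if (stopWords.contains(value.toLowerCase())) {
                continue; // stop word: skip it and go to the next value
            }
            kept.add(value);
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stopWords = loadStopWords(Arrays.asList("the", "a", "of"));
        List<String> values = Arrays.asList("the", "moon", "of", "hadoop");
        System.out.println(filter(values, stopWords)); // prints [moon, hadoop]
    }
}
```

Loading the words into a `HashSet` once keeps each lookup constant-time, which matters when the same list is checked against every value a reducer sees.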

To use DistributedCache, you first need to register the file in the job configuration in your driver:

Continue reading

WordCount MapReduce example using Hive, locally and on EMR

Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems.

In short, Hive lets you run Hadoop MapReduce jobs using SQL-like statements.

Here is a WordCount example I did using Hive. It first shows how to run it on your local machine, then how to run it on Amazon EMR.
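To make the goal concrete, here is a minimal sketch of the computation a WordCount job performs: split each line into words, group by word, and count. It is plain Java, not Hive, and the names `WordCountSketch` and `countWords` are hypothetical. It only illustrates the result a Hive query would be expected to produce.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // Split each line on whitespace and count how often each word occurs.
    // A TreeMap keeps the words sorted so the output order is deterministic.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // blank line
                }
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello hive", "hello hadoop");
        System.out.println(countWords(lines)); // prints {hadoop=1, hello=2, hive=1}
    }
}
```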

Local

1. Install Hive.

First, you need to install Hadoop on your local machine; here is a post on how to do it. After you have installed Hadoop, you can follow this official tutorial.

Continue reading

Quiet, Understated Films

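I have always felt that watching a film is a search for a feeling. Here are some films I recommend: plain, gently paced, and free of grand lessons, they stir the viewer only from between the frames, with a touch like the wind. I hope they can bring everyone a little calm in this scorching summer and this tense, restless social climate.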

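(P.S. The films were chosen purely on personal feeling. If you think any of them is filed in the wrong place, please forgive me.)

(P.P.S. I have not written reviews, partly because my writing is mediocre, and partly because these are recommendations, so it is better for everyone to savor them on their own. My impressions are not important, haha.)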

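1. Tokyo Biyori

Director: Naoto Takenaka

Screenplay: Nobuyoshi Araki / Ryo Iwamatsu

Cast: Naoto Takenaka / Miho Nakayama / Takako Matsu / Tadanobu Asano / Yoshimitsu Morita / Shinya Tsukamoto / Tomokazu Miura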

Continue reading

The Gaokao We Went Through Together in Those Years

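It has been almost two weeks since this year's gaokao; for me, it has been seven years. Seven years should be enough to erase many memories, but not those of the gaokao. And because I, to my embarrassment, sat the exam twice, those memories are especially vivid.

The first gaokao seemed to arrive very suddenly. I was reviewing, watching the countdown on the blackboard, and posing for graduation photos, yet when the exam drew near I was still at a loss, so much so that I do not remember much of the first exam itself.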

Continue reading

Premature optimization is the root of all evil

Today, during a code review, I learned an important lesson.

If you have written a Hadoop reducer before, you know that a single reducer host is assigned many keys, as determined by the partitioner. Its run() method iterates over those keys and their values and passes them to the reduce() method, so each call to reduce() handles exactly one key and its values.
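The relationship between run() and reduce() can be modeled with plain collections. This is a simplified sketch, not Hadoop's actual Reducer class: the names `ReducerLoopSketch`, `run`, and `reduce` here are stand-ins, the map stands in for the keys assigned to one reducer, and summing the values is just an example of per-key work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReducerLoopSketch {
    // Stand-in for reduce(): called once per key with all of that key's values.
    static String reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return key + "=" + sum;
    }

    // Stand-in for run(): walks every key assigned to this reducer
    // and makes exactly one reduce() call per key.
    static List<String> run(Map<String, List<Integer>> assignedKeys) {
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : assignedKeys.entrySet()) {
            output.add(reduce(entry.getKey(), entry.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> assignedKeys = new LinkedHashMap<>();
        assignedKeys.put("hadoop", Arrays.asList(1, 1, 1));
        assignedKeys.put("hive", Arrays.asList(1, 1));
        System.out.println(run(assignedKeys)); // prints [hadoop=3, hive=2]
    }
}
```

Because the loop lives in run(), any state that must survive across keys has to be held outside reduce(), which is worth keeping in mind when deciding where per-key versus per-reducer work belongs.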

Continue reading