Skip to content
Chris Lu edited this page Nov 23, 2015 · 5 revisions

Map() and Filter() are the very basic parts of MapReduce.

In Glow, the input types are always user specified. Now we assume input types is Value, or (Key, Value)

Filter()

Filter() accepts:

 func(Value) bool
 func(Key, Value) bool
 func(Key, leftValue, rightValue) bool    // invoke on outputs from Join()
 func(Key, leftValues, rightValues) bool  // invoke on outputs from CoJoin()

Map()

Map() accepts 2 groups of functions. One is using function return values as output, the other is outputting to channels.

Normal function return:

 func(Value) NewValueType
 func(Key, Value) NewValueType
 func(Key, Value) (NewKeyType, NewValueType)
 func(Key, leftValue, rightValue) (NewKeyType, NewValueType)    // invoke on outputs from Join()
 func(Key, leftValues, rightValues) (NewKeyType, NewValueType)  // invoke on outputs from CoJoin()

On other frameworks, you can flatten outputs via flatMap(). In Glow, we can just use channels as a way to emit zero or multiple values from one value.

Output through channel,

 func(Value, chan NewValueType)
 func(Key, Value, chan NewValueType)
 func(Key, Value, chan flow.KeyValue)  // flow.KeyValue is defined by Glow. Not user specified.

Example

	flow.New().TextFile(
		"/etc/passwd", 2,
	).Filter(func(line string) bool {
		// println("filter:", line)
		return !strings.HasPrefix(line, "#")
	}).Map(func(line string, ch chan string) {
		for _, token := range strings.Split(line, ":") {
			ch <- token
		}
	}).Map(func(key string) int {
		// println("map:", key)
		return 1
	}).Reduce(func(x int, y int) int {
		// println("reduce:", x+y)
		return x + y
	}).Map(func(x int) {
		println("count:", x)
	}).Run()

Glow BasicMapReduce

Clone this wiki locally