Chris Lu edited this page Nov 23, 2015 · 5 revisions
Map() and Filter() are the most basic building blocks of MapReduce.
In Glow, input types are always specified by the user. The signatures below assume the input is either a Value or a (Key, Value) pair.
Filter() accepts:
func(Value) bool
func(Key, Value) bool
func(Key, leftValue, rightValue) bool // invoked on outputs from Join()
func(Key, leftValues, rightValues) bool // invoked on outputs from CoJoin()
Map() accepts two groups of functions: those that return their output as ordinary return values, and those that write their output to a channel.
Functions that return values:
func(Value) NewValueType
func(Key, Value) NewValueType
func(Key, Value) (NewKeyType, NewValueType)
func(Key, leftValue, rightValue) (NewKeyType, NewValueType) // invoked on outputs from Join()
func(Key, leftValues, rightValues) (NewKeyType, NewValueType) // invoked on outputs from CoJoin()
Other frameworks flatten outputs with flatMap(). In Glow, a channel parameter serves the same purpose: the function can emit zero, one, or many values for each input value.
Functions that output through a channel:
func(Value, chan NewValueType)
func(Key, Value, chan NewValueType)
func(Key, Value, chan flow.KeyValue) // flow.KeyValue is defined by Glow, not by the user
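package main

import (
	"strings"

	"github.com/chrislusf/glow/flow"
)

func main() {
	// Count the colon-separated fields in /etc/passwd, read as 2 shards.
	flow.New().TextFile(
		"/etc/passwd", 2,
	).Filter(func(line string) bool {
		// println("filter:", line)
		return !strings.HasPrefix(line, "#") // skip comment lines
	}).Map(func(line string, ch chan string) {
		// Emit every field of the line as a separate value.
		for _, token := range strings.Split(line, ":") {
			ch <- token
		}
	}).Map(func(key string) int {
		// println("map:", key)
		return 1 // each token counts once
	}).Reduce(func(x int, y int) int {
		// println("reduce:", x+y)
		return x + y
	}).Map(func(x int) {
		println("count:", x)
	}).Run()
}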