A Datablox system is a directed graph with "modules of execution" as vertices and data flow represented by the directed edges. We need names for the following:
"module of execution": "block" seems to be the choice. Other contenders: "element" (used by Click), "node" (used by Storm), "module". "Block" might be confused with blocking, i.e. processes that wait until some event happens.
Edges: "connection" seems to be the choice.
End-points of an edge: "ports", specifically "input" and "output" ports to show the direction of the edge.
Different kinds of end-points: "Push" indicates that data flows in only one direction. "Query" indicates that data flows in both directions, in a request-response pattern.
Data flowing between "modules of execution": "log", specifically "typed logs", to show that logs have metadata/semantics associated with them.
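To make the vocabulary above concrete, here is a minimal Python sketch of how the terms could fit together: blocks as vertices, connections as directed edges from an output port to an input port, push vs. query ports, and typed logs. All class and field names are hypothetical. The graph class is called `Graph` only because a name for it is still open, and the rule that both ends of a connection must be the same kind of port is an assumption, not something settled in this discussion.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class Direction(Enum):
    INPUT = "input"    # receiving end of a connection
    OUTPUT = "output"  # sending end of a connection


class PortKind(Enum):
    PUSH = "push"    # data flows one way, output -> input
    QUERY = "query"  # a request goes out and a response comes back


@dataclass(frozen=True)
class Port:
    block: str  # name of the block that owns this port (hypothetical field)
    name: str
    direction: Direction
    kind: PortKind


@dataclass
class Log:
    """A typed log: a payload tagged with a type naming its semantics."""
    log_type: str
    payload: Dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class Connection:
    """A directed edge from an output port to an input port."""
    src: Port
    dst: Port

    def __post_init__(self) -> None:
        if self.src.direction is not Direction.OUTPUT:
            raise ValueError("a connection must start at an output port")
        if self.dst.direction is not Direction.INPUT:
            raise ValueError("a connection must end at an input port")
        # Assumption: a push output can only feed a push input, and likewise for query.
        if self.src.kind is not self.dst.kind:
            raise ValueError("both ends must be the same kind of port")


@dataclass
class Graph:
    """The whole system: blocks as vertices, connections as edges."""
    blocks: List[str] = field(default_factory=list)
    connections: List[Connection] = field(default_factory=list)

    def connect(self, src: Port, dst: Port) -> Connection:
        # Both ports must belong to blocks that are part of this graph.
        for port in (src, dst):
            if port.block not in self.blocks:
                raise ValueError(f"unknown block: {port.block}")
        conn = Connection(src, dst)
        self.connections.append(conn)
        return conn

    def successors(self, block: str) -> List[str]:
        """Blocks that directly receive data from `block`."""
        return [c.dst.block for c in self.connections if c.src.block == block]


if __name__ == "__main__":
    g = Graph(blocks=["crawler", "dedup"])  # hypothetical block names
    out = Port("crawler", "files", Direction.OUTPUT, PortKind.PUSH)
    inp = Port("dedup", "files", Direction.INPUT, PortKind.PUSH)
    g.connect(out, inp)
    print(g.successors("crawler"))  # ['dedup']
```

Keeping direction and kind as separate attributes of a port mirrors the two naming questions above: "input"/"output" says which way the edge points, while "push"/"query" says whether a response travels back.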
Unfortunately, Datablox clashes with "data blocks", a term used by the backup community, who divide a file into a number of "blocks" and detect duplicates among those blocks. They also use "chunks", but that, I think, is used for variable-size blocks.
I think the clash with backup data blocks is ok, as the framework isn't exclusively aimed at backup developers. We can make the distinction clearer by spelling our execution units as "blox". :-)
The name "log" for the message queues is a little confusing, given that Python has a "logging" framework for publishing events (e.g. for debugging). Is there a clearer name we can use?
We could use "message" or "event", but those aren't that good either.
I have renamed everything else in the code and pushed the changes, so I no longer hold the lock on it. If we find a better name for "log", I'll rename it.
We also need names for the following:
Directed graph describing the system: "topology" (Storm), "configuration" (Click), "pipeline", "cluster", "graph"
Machines (physical or virtual) on which "modules of execution" run: "node"
Physical network connecting machines: "network"
Modules that manage other modules: "shard"
The Datablox runtime which manages all modules: "master"
Datablox runtime unit which manages a single physical machine: "care-taker"
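The runtime roles listed above (master, care-taker, node) can be pictured with a small Python sketch. Everything in it is an assumption for illustration: the `Master` and `CareTaker` classes, the round-robin placement policy, and the node and block names are hypothetical, and shards are left out. The only point is the hierarchy: one master for the whole system, one care-taker per machine, and each block assigned to exactly one machine.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CareTaker:
    """Runtime unit responsible for the blocks on a single node (machine)."""
    node: str
    blocks: List[str] = field(default_factory=list)

    def start(self, block: str) -> None:
        # A real care-taker would launch and monitor the block here;
        # this sketch only records that the block lives on this node.
        self.blocks.append(block)


@dataclass
class Master:
    """Runtime that manages all blocks through one care-taker per node."""
    nodes: List[str]
    caretakers: Dict[str, CareTaker] = field(init=False)

    def __post_init__(self) -> None:
        if not self.nodes:
            raise ValueError("need at least one node")
        self.caretakers = {n: CareTaker(n) for n in self.nodes}

    def deploy(self, blocks: List[str]) -> Dict[str, List[str]]:
        """Place blocks on nodes round-robin; return a node -> blocks map."""
        for i, block in enumerate(blocks):
            node = self.nodes[i % len(self.nodes)]
            self.caretakers[node].start(block)
        return {n: list(ct.blocks) for n, ct in self.caretakers.items()}


if __name__ == "__main__":
    master = Master(nodes=["node-a", "node-b"])  # hypothetical node names
    print(master.deploy(["crawler", "dedup", "indexer"]))
    # {'node-a': ['crawler', 'indexer'], 'node-b': ['dedup']}
```

Round-robin is just the simplest placement to show; a real master would more likely weigh load or data locality, but that choice does not affect the naming of the roles.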