Formalize vocabulary #9

Open
t-saideep opened this issue Nov 29, 2011 · 3 comments
Comments

@t-saideep
Contributor

A Datablox system is a directed graph with "modules of execution" as vertices and data flow represented by the directed edges. We need names for the following:

"Module of execution": "block" seems to be the choice. Other contenders: "element" (used by Click), "node" (used by Storm), "module". "Block" might be confused with blocking, i.e. processes that wait until some event happens.

Edges: "connection" seems to be the choice.

End-points of an edge: "ports", specifically "input" and "output" ports to show the direction of the edge.

Different kinds of end-points: "Push" indicates that data flows in one direction only. "Query" indicates that data flows in both directions, following a request-response pattern.

Data flowing between "modules of execution": "log", specifically "typed logs" to show that logs have metadata/semantics associated with them.

Directed Graph describing the system: "topology" (Storm), "configuration" (Click), "pipeline", "cluster", "graph"

Machines (physical or virtual) on which "modules of execution" run: "node"

Physical network connecting machines: "network"

Some modules which manage other modules: "shard"

The Datablox runtime which manages all modules: "master"

Datablox runtime unit which manages a single physical machine: "care-taker"
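
To make the proposed terms concrete, here is a minimal sketch of how they could relate to each other in code. It is not the Datablox implementation: the class names (`PortKind`, `Port`, `Connection`, `Topology`), their fields, and the rule that both ends of a connection share the same kind are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class PortKind(Enum):
    PUSH = "push"    # data flows in one direction only
    QUERY = "query"  # request-response, so data flows both ways


@dataclass(frozen=True)
class Port:
    block: str       # name of the block that owns this port
    name: str
    direction: str   # "input" or "output"
    kind: PortKind = PortKind.PUSH


@dataclass(frozen=True)
class Connection:
    source: Port     # expected to be an output port
    target: Port     # expected to be an input port

    def __post_init__(self):
        if self.source.direction != "output" or self.target.direction != "input":
            raise ValueError("a connection runs from an output port to an input port")
        # Assumption of this sketch: push ports only connect to push ports, etc.
        if self.source.kind != self.target.kind:
            raise ValueError("both ends of a connection must be the same kind")


@dataclass
class Topology:
    blocks: dict = field(default_factory=dict)       # block name -> node (machine) name
    connections: list = field(default_factory=list)

    def add_block(self, name, node):
        self.blocks[name] = node

    def connect(self, source, target):
        for port in (source, target):
            if port.block not in self.blocks:
                raise KeyError(f"unknown block: {port.block}")
        self.connections.append(Connection(source, target))

    def downstream(self, block):
        """Names of blocks one connection away from `block`, sorted."""
        return sorted({c.target.block for c in self.connections
                       if c.source.block == block})


# Hypothetical two-block system spread over two nodes.
topo = Topology()
topo.add_block("crawler", node="machine-a")
topo.add_block("indexer", node="machine-b")
topo.connect(Port("crawler", "files", "output"),
             Port("indexer", "files", "input"))
print(topo.downstream("crawler"))
```

Keeping "node" for machines and "block" for execution units shows up here as the `blocks` mapping from block name to node name, which is one way to avoid the Storm-style double meaning of "node".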

@t-saideep
Contributor Author

Unfortunately, Datablox clashes with "data blocks", a term used by the backup people: they divide a file into a certain number of "blocks" and detect duplicates among those blocks. They also use "chunks", but that, I think, is used for variable-size blocks.

@jfischer
Contributor

I think the clash with backup data blocks is ok, as the framework isn't exclusively aimed at backup developers. We can make the distinction clearer by spelling our execution units as "blox". :-)

The name "log" for the message queues is a little confusing, given that Python has a "logging" framework for publishing events (e.g. for debugging). Is there a clearer name we can use?
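
A small sketch of how the two meanings can collide in one function. The handler `handle` and the dict-shaped data item are hypothetical and not taken from the Datablox code; only the standard `logging` calls are real.

```python
import logging

# Python's logging framework: diagnostic events for developers.
logger = logging.getLogger("example_block")


def handle(log):
    # Here `log` is a data item flowing between blocks (a plain dict in this
    # sketch), not a logger, although the name suggests otherwise.
    logger.info("received log with %d fields", len(log))
    return {**log, "seen": True}
```

A reader skimming `handle` has to work out from context which "log" each line refers to.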

@ghost ghost assigned t-saideep Jan 17, 2012
@t-saideep
Contributor Author

We could use "message" or "event", but those aren't that good either.

I have now renamed the rest of the terms in the code and pushed the changes, so I no longer hold the lock on it. If we find a better name for "log", I'll rename it.
