Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a broadcast via spark & switched to a single long-lived spark context #51

Merged
merged 2 commits into from
Apr 21, 2015
Merged

Added a broadcast via spark & switched to a single long-lived spark context #51

merged 2 commits into from
Apr 21, 2015

Conversation

tomerk
Copy link
Contributor

@tomerk tomerk commented Apr 17, 2015

Relates to (but doesn't fully cover) issues #48 #49 and #46

@tomerk
Copy link
Contributor Author

tomerk commented Apr 17, 2015

@dcrankshaw

@@ -54,9 +56,17 @@ class VeloxApplication extends Application[VeloxConfiguration] with Logging {

// this assumes that etcd is running on each velox server
val etcdClient = new EtcdClient(conf.hostname, 4001, conf.hostname, new DispatchUtil)
logWarning("Starting spark context")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/logWarning/logInfo

@dcrankshaw
Copy link
Contributor

Overall this looks good. I like having the long-lived SparkContext. And using Spark for broadcasts seems to significantly simplify things in the long-run, at the cost of a well-understood performance penalty (an extra copy). However, I'm a little concerned about having a single shared SparkContext and BroadcastProvider. It should be the case that a single model is only ever interacting with Spark from a single thread, because bulk retrain is guarded by a global per-model lock. However, there is nothing to stop multiple models from doing bulk retrain at once. Should there be?

private val cachedValues: mutable.Map[Version, T] = mutable.Map()

override def put(value: T, version: Version): Unit = this.synchronized {
sc.parallelize(Seq(value)).saveAsObjectFile(s"$path/$version")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if these methods get called concurrently and they aren't synchronized?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, dumb question. They could both be writing to the same place at the same time.

@dcrankshaw
Copy link
Contributor

LGTM

dcrankshaw added a commit that referenced this pull request Apr 21, 2015
Added a broadcast via spark & switched to a single long-lived spark context
@dcrankshaw dcrankshaw merged commit 3eff393 into amplab:develop Apr 21, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants