[performance] feed_to_graph_path is slow on larger feeds #12
Comments
Addressed (but still slow) via #14 |
Parallelization with performant pickling enabled via #12 |
I'm noticing that the unaccounted-for stop ID management step is taking quite a while:
^ Example from the LA Metro GTFS zip file. |
On smaller feeds (and even mid-sized feeds, like AC Transit), multiprocessing (MP) is slower. I need to figure out how to intelligently steer away from MP in these situations. Sigh, this whole performance issue is not good. Example:
The above was run once with MP set to False and once with it set to True. No MP:
Yes MP:
|
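One possible way to steer away from MP on smaller feeds is a simple size-based guard. The following is only a minimal sketch, not this project's API: the function name, the `n_routes` work-size measure, and the cutoff of 500 are all hypothetical and would need to be tuned against real benchmarks.

```python
import os
from typing import Optional

# Hypothetical cutoff: below this many routes, assume the MP start-up
# cost outweighs any parallel speedup. Not a measured value.
MIN_ROUTES_FOR_MP = 500


def should_use_multiprocessing(
    n_routes: int,
    cpu_count: Optional[int] = None,
    min_routes: int = MIN_ROUTES_FOR_MP,
) -> bool:
    """Return True only when the feed is large and more than one CPU exists."""
    if cpu_count is None:
        # os.cpu_count() can return None, so fall back to a single CPU.
        cpu_count = os.cpu_count() or 1
    if cpu_count < 2:
        # No parallelism is possible; MP would only add overhead.
        return False
    return n_routes >= min_routes
```

A caller could then default the MP flag to the result of this check and still let users override it explicitly.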
Updated performance, with the last few updates incorporated (see all commits from Wed to today): Without MP: 87.5s (63.3% faster) cc @yiyange |
I am curious in which cases multiprocessing is faster; when I played with it, it was much slower than running without it. |
Multiprocessing has a higher initialization cost. The gains show up primarily on larger datasets, such as LA Metro. I should benchmark that. |
Whoops sorry didn't mean to close. |
LA Metro (without digging around for the exact numbers) used to take 12-15 minutes. It now takes: So, no observable improvement. Of course, it's running in a Docker environment that only has access to 2 CPUs on my 2016 MacBook Pro. A better test would be to use a virtual machine on AWS, GCloud, or similar and see what gains are achieved there. That said, we can observe that MP offers essentially no observable gains for the typical user and use case (a local machine, in a notebook-like environment). This is something that should be addressed long term. |
test_feed_to_graph_path itself is the slowest test by far. Create benchmarks and identify which steps are slowest. Find ways to speed up operations and get the graph creation process to be as fast as possible.
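As a starting point for those benchmarks, per-step wall-clock timing can be collected with a small helper like the one below. This is a rough sketch using only the standard library; `StepTimer` and the step names in the usage lines are hypothetical and do not correspond to actual functions in this project.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StepTimer:
    """Accumulate wall-clock time per named step."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def step(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the step raises.
            self.totals[name] += time.perf_counter() - start

    def slowest_first(self):
        """Return (name, seconds) pairs sorted from slowest to fastest."""
        return sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)


# Usage with placeholder work standing in for the real pipeline steps.
timer = StepTimer()
with timer.step("load_feed"):  # hypothetical step name
    sum(range(1000))
with timer.step("stop_id_management"):  # hypothetical step name
    sorted(range(1000), reverse=True)
for name, seconds in timer.slowest_first():
    print(f"{name}: {seconds:.6f}s")
```

Wrapping each stage of the graph creation in `timer.step(...)` would show where the time goes before deciding what to optimize or parallelize.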