listTree performance #45
Comments
|
That sounds pretty cool regarding Q streams. In a particular case that I am working on, I need stat results on all files to check their mtimes, so it is no problem if I get the entire array all at once. (The only problem is that after I call listTree(), I then have to run stat() on all the paths. It sounds like Q streams or the guard emitter might help with that, but that's a separate issue from this one.) Edit: I should say that streaming isn't a problem for me here yet, and probably won't be in this application. My directory trees and associated data aren't big enough to eat up all the RAM; they can happily stay resident in memory for now. I can see the benefit of streaming otherwise, though. |
The guard function can already do what you need, I think.
FS.listTree(path, function (name, stat) {
    // return true to include
    // return false to exclude
    // return null to exclude and skip subtree
})
.then(function () {
}) |
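To make the three guard return values concrete, here is a minimal sketch of a walker that honors the same contract. It is an illustration only: `listTreeSync` is a hypothetical helper written with Node's synchronous `fs` calls, not Q-IO's actual implementation, and it assumes that any truthy return value means "include".

```javascript
var fs = require('fs');
var path = require('path');

// Hypothetical helper (NOT part of Q-IO): walks `root` synchronously and
// applies a guard with the three-way contract described in the comment above.
function listTreeSync(root, guard) {
    var results = [];
    (function visit(current) {
        var stat = fs.statSync(current);
        var verdict = guard(current, stat);
        // Truthy verdict: include this path in the result.
        if (verdict) {
            results.push(current);
        }
        // null: exclude and skip the subtree. false: exclude but still descend.
        if (verdict !== null && stat.isDirectory()) {
            fs.readdirSync(current).sort().forEach(function (child) {
                visit(path.join(current, child));
            });
        }
    })(root);
    return results;
}
```

With this sketch, a guard such as `function (name, stat) { return path.basename(name) === 'node_modules' ? null : stat.isFile(); }` would list only regular files and never descend into `node_modules`.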
That's a cool trick with guard. Looks like this would do the trick of snarfing the stats:
var stats = [];
FS.listTree(path, function(name, stat) {
    stats.push([name, stat]);
    return true; // edit: don't forget, this is a guard function after all
})
.then(function() {
return stats;
})
.then(function(stats) {
// awesome, got the stats I need!
});
But this doesn't address the speed gap of using Q. The issue here is how long it takes listTree() to finish; the fact that it collects a list isn't really a problem. Using Q on stat and readdir inside listTree is the primary cause of the slowdown. I can rewrite listTree() to not use promises internally, and then it works like a little speed demon. Obviously, it then loses the clarity that promises provide, which is undesirable. So I'm wondering whether it is possible to identify a specific bottleneck that causes Q code to be slower than Node.js callback-style code (and by a large factor: 10x). |
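For reference, a promise-free walker of the kind described above could look roughly like this. It is a sketch under assumptions, not the rewrite mentioned in the comment: `listTreeWithStats` is a hypothetical name, it uses only Node's `fs` callbacks, and it tracks outstanding work with a simple counter.

```javascript
var fs = require('fs');
var path = require('path');

// Hypothetical callback-style walker (NOT the Q-IO implementation).
// Calls done(error, entries) exactly once; entries is an array of [name, stat].
function listTreeWithStats(root, done) {
    var entries = [];
    var pending = 0;      // number of paths whose processing has not finished
    var finished = false; // guards against calling done() more than once

    function finish(error) {
        if (finished) { return; }
        finished = true;
        done(error, error ? undefined : entries);
    }

    function visit(current) {
        pending++;
        fs.stat(current, function (error, stat) {
            if (error) { return finish(error); }
            entries.push([current, stat]);
            if (!stat.isDirectory()) {
                if (--pending === 0) { finish(null); }
                return;
            }
            fs.readdir(current, function (error, children) {
                if (error) { return finish(error); }
                // Children are registered before this directory is released,
                // so pending cannot reach zero while work is still queued.
                children.forEach(function (child) {
                    visit(path.join(current, child));
                });
                if (--pending === 0) { finish(null); }
            });
        });
    }

    visit(root);
}
```

The stat results arrive together with the names, so no second pass over the paths is needed; the cost is the manual bookkeeping that promises would otherwise hide.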
@dhbaird I presume that the problem is recursive array aggregation. It might be faster to pass a single array forward and push results onto it instead of concatenating results from |
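The difference between the two aggregation strategies mentioned above can be sketched on an in-memory stand-in for a directory tree. Everything here is illustrative: `tree`, `collectByConcat` and `collectByPush` are hypothetical names, and no file system or promise library is involved.

```javascript
// In-memory stand-in for a directory tree: objects are directories,
// strings are file contents.
var tree = { 'a.txt': 'a', sub: { 'b.txt': 'b', deep: { 'c.txt': 'c' } } };

// Each level builds its own array and concatenates the children's arrays,
// so entries are copied once per ancestor level.
function collectByConcat(node, prefix) {
    var paths = [prefix];
    if (typeof node === 'object') {
        Object.keys(node).forEach(function (name) {
            paths = paths.concat(collectByConcat(node[name], prefix + '/' + name));
        });
    }
    return paths;
}

// A single array is passed forward and every level pushes onto it,
// so each entry is written exactly once.
function collectByPush(node, prefix, paths) {
    paths = paths || [];
    paths.push(prefix);
    if (typeof node === 'object') {
        Object.keys(node).forEach(function (name) {
            collectByPush(node[name], prefix + '/' + name, paths);
        });
    }
    return paths;
}
```

Both functions produce the same list in the same order; only the amount of intermediate copying differs.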
I thought recursive array aggregation might have been a problem too, but take a look at When you get a moment, could you take a look at the following: Edit: Here's the diff (to make it easier to compare the two):
--- a.js 2013-08-12 17:58:04.111106000 -0600
+++ b.js 2013-08-12 17:58:07.262917000 -0600
@@ -1,7 +1,9 @@
+var Q = require('q');
var path = require('path');
var FS = require('fs');
-function asyncFindTree(basePath, cb) {
- FS.stat(basePath, function(error, stat) {
+function asyncFindTreeQstat(basePath, cb) {
+ return Q.nfcall(FS.stat, basePath)
+ .then(function(stat) {
if (stat.isDirectory()) {
FS.readdir(basePath, function(error, children) {
children = children.map(function(child) { return path.join(basePath, child); }); |
(Note - I edited previous comment to include diff) |
@dhbaird That is very curious. |
@kriskowal something like this?
var stat2 = Q.nfbind(FS.stat); // aka denodeify
function asyncFindTreeQstat(basePath, cb) {
return stat2(basePath)
.then(function(stat) {
...
Good idea, but performance remains about the same. |
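One way to isolate the per-call overhead being discussed is to time the same number of sequential stat calls with plain callbacks and with a promise wrapper. The sketch below is an assumption-laden stand-in: it uses Node's `util.promisify` in place of `Q.nfbind`, so it measures native promises rather than Q, and `timeCallbacks` and `timePromises` are hypothetical helper names. Absolute numbers will not match the ones reported in this thread.

```javascript
var fs = require('fs');
var util = require('util');

// Stand-in for Q.nfbind(FS.stat). This is NOT Q: util.promisify returns a
// function that yields a native promise.
var statPromise = util.promisify(fs.stat);

// Stat `file` `count` times in sequence using plain callbacks.
// Calls done(error, elapsedMilliseconds).
function timeCallbacks(file, count, done) {
    var start = process.hrtime();
    var remaining = count;
    (function next() {
        if (remaining-- === 0) {
            var d = process.hrtime(start);
            return done(null, d[0] * 1e3 + d[1] / 1e6);
        }
        fs.stat(file, function (error) {
            if (error) { return done(error); }
            next();
        });
    })();
}

// Stat `file` `count` times in sequence using a promise chain.
// Resolves with the elapsed time in milliseconds.
function timePromises(file, count) {
    var start = process.hrtime();
    var chain = Promise.resolve();
    for (var i = 0; i < count; i++) {
        chain = chain.then(function () { return statPromise(file); });
    }
    return chain.then(function () {
        var d = process.hrtime(start);
        return d[0] * 1e3 + d[1] / 1e6;
    });
}
```

Swapping `statPromise` for a Q-wrapped function would be the natural next step when comparing against the figures discussed here.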
Weirder and weirder. I am going to have to let this issue idle for a while, but if you send a PR that helps the issue without impacting the API, I can get a patch out quickly. If it requires an API change, I can entertain a proposal as well. |
Sounds good to me. I just wanted to get the word out, and I'm happy if you want to close this and re-open it later. I'll keep looking into it. |
I will probably introduce listStream() and listTreeStream() in the v2 branch. |
I noticed that listTree() was going slower than I expected for large numbers of files, so I ran some benchmarks to narrow down the slowdown. It looks like there is an opportunity to close a performance gap in either Q or Q-IO. In the benchmark code, performance is very noticeably worse with Q promises than with Node-style callbacks, so that may be a good place to investigate optimization. HTH! More info and/or pull requests to follow, I hope.
Time information is in seconds; lower Real time is better.
Edit: I corrected a wrong measurement for require('q-io/fs').listTree('stuff'). The correct measurement is ~40 seconds (not 35 seconds as I had written).
The vanilla sync and async times are very good. The shell find | stat result is also provided as a baseline reference outside of Node.js. I've also included the glob() tool, which looks like it could benefit from optimization as well. Our goal is to get Q and/or Q-IO almost as good as vanilla async.
Benchmark codes are: