-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tweak usage of fastpriorityqueue #124
Conversation
@@ -1012,8 +1012,7 @@ module.exports = function (log, indexesPath) { | |||
function sortedByTimestamp(bitset, descending) { | |||
updateCacheWithLog() | |||
const order = descending ? 'descending' : 'ascending' | |||
if (sortedCache[order].has(bitset)) | |||
return sortedCache[order].get(bitset).clone() | |||
if (sortedCache[order].has(bitset)) return sortedCache[order].get(bitset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that this clone()
is needed, but I'm moving it outside.
@@ -1023,7 +1022,8 @@ module.exports = function (log, indexesPath) { | |||
timestamp: indexes['timestamp'].tarr[seq], | |||
}) | |||
}) | |||
sortedCache[order].set(bitset, fpq.clone()) | |||
fpq.trim() | |||
sortedCache[order].set(bitset, fpq) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, I think this clone was not needed.
And I added the trim() because FPQ docs said
trim(): clean up unused memory in the heap; useful after high-churn operations like many add()s then remove()s.
sorted = sorted.clone() | ||
sorted.removeMany(() => true, seq) | ||
} | ||
sliced = sorted.kSmallest(limit || Infinity) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens in this block is that there is an optimization for the case of seq=0
, limit=1
, where we don't need to mutate the FPQ, and therefore we can eliminate the clone() and we don't need to do any add or remove (which are O(log n)
) and instead we can use a O(1)
operation, peek(). It doesn't happen that often, but might make startup a bit more pleasant.
The second block also avoids a clone()
if seq === 0
, because there's nothing to remove. So we postpone the clone() until it's absolutely necessary. I'm not sure how removeMany
works, but here I'm assuming that it runs through the elements in order. I'm curious whether this is true and whether it works in the wild.
kSmallest
, though, is a good find! It's actually not documented on the readme, but it's in the implementation. Comments in the code say:
// return the k 'smallest' elements of the queue
// runs in O(k log k) time
// this is the equivalent of repeatedly calling poll, but
// it has a better computational complexity, which can be
// important for large data sets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a really nice find
This PR didn't run the tests because CI is configured to only run PRs that target |
This is great, I was thinking along the same lines after you talked about optimizing this, glad you found the extra method. I have tested this and it seems to be a bit faster. I'm testing 2 x top 25 public and top 25 private. |
@arj03 My review of #123 is in the form of a PR (that once merged would go into branch
faster-sort
). Comments inline. Tests pass but I'm not 100% sure this is correct, so if you could confirm on your benchmarks of public/private, whether it got faster and whether it gaves the exact same results, that would be good.