Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tweak usage of fastpriorityqueue #124

Merged
merged 1 commit into from
Feb 20, 2021
Merged

tweak usage of fastpriorityqueue #124

merged 1 commit into from
Feb 20, 2021

Conversation

staltz
Copy link
Member

@staltz staltz commented Feb 20, 2021

@arj03 My review of #123 is in the form of a PR (that once merged would go into branch faster-sort). Comments inline. Tests pass but I'm not 100% sure this is correct, so if you could confirm on your benchmarks of public/private, whether it got faster and whether it gaves the exact same results, that would be good.

@@ -1012,8 +1012,7 @@ module.exports = function (log, indexesPath) {
function sortedByTimestamp(bitset, descending) {
updateCacheWithLog()
const order = descending ? 'descending' : 'ascending'
if (sortedCache[order].has(bitset))
return sortedCache[order].get(bitset).clone()
if (sortedCache[order].has(bitset)) return sortedCache[order].get(bitset)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that this clone() is needed, but I'm moving it outside.

@@ -1023,7 +1022,8 @@ module.exports = function (log, indexesPath) {
timestamp: indexes['timestamp'].tarr[seq],
})
})
sortedCache[order].set(bitset, fpq.clone())
fpq.trim()
sortedCache[order].set(bitset, fpq)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, I think this clone was not needed.

And I added the trim() because FPQ docs said

trim(): clean up unused memory in the heap; useful after high-churn operations like many add()s then remove()s.

sorted = sorted.clone()
sorted.removeMany(() => true, seq)
}
sliced = sorted.kSmallest(limit || Infinity)
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens in this block is that there is an optimization for the case of seq=0, limit=1, where we don't need to mutate the FPQ, and therefore we can eliminate the clone() and we don't need to do any add or remove (which are O(log n)) and instead we can use a O(1) operation, peek(). It doesn't happen that often, but might make startup a bit more pleasant.

The second block also avoids a clone() if seq === 0, because there's nothing to remove. So we postpone the clone() until it's absolutely necessary. I'm not sure how removeMany works, but here I'm assuming that it runs through the elements in order. I'm curious whether this is true and whether it works in the wild.

kSmallest, though, is a good find! It's actually not documented on the readme, but it's in the implementation. Comments in the code say:

// return the k 'smallest' elements of the queue
// runs in O(k log k) time
// this is the equivalent of repeatedly calling poll, but
// it has a better computational complexity, which can be
// important for large data sets.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a really nice find

@staltz staltz requested a review from arj03 February 20, 2021 12:32
@staltz
Copy link
Member Author

staltz commented Feb 20, 2021

This PR didn't run the tests because CI is configured to only run PRs that target master branch

@arj03
Copy link
Member

arj03 commented Feb 20, 2021

This is great, I was thinking along the same lines after you talked about optimizing this, glad you found the extra method. I have tested this and it seems to be a bit faster. I'm testing 2 x top 25 public and top 25 private.

@arj03 arj03 merged commit c40b8c0 into faster-sort Feb 20, 2021
@arj03 arj03 deleted the faster-sort2 branch February 20, 2021 13:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants