Optimize DFS while marking connected components #14022

Merged
merged 3 commits into from
Jan 6, 2025

Conversation

viswanathk
Contributor

@viswanathk viswanathk commented Nov 27, 2024

Part of #14002

The DFS stack was growing deeper than it should, causing excessive allocations. This change should reduce them and may speed up the process.

Benchmark while indexing 100k docs:

cat max_stack_depth_optimized.txt | grep maxStackDepth | sort -t= -k2 -n | tail
maxStackDepth=8592
maxStackDepth=8605
maxStackDepth=8666
maxStackDepth=8738
maxStackDepth=8779
maxStackDepth=8825
maxStackDepth=9084
maxStackDepth=39925
maxStackDepth=67764
maxStackDepth=68239
cat max_stack_depth_optimized.txt | tail


Results:
recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  index s  index docs/s  force merge s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.532         0.224  100000   100      50       16        250     4 bits     3.56      28129.40           6.71             1            47.82         42.915         4.768
 0.661         0.206  100000   100      50       16        250     7 bits     3.42      29231.22           6.35             1            50.85         47.684         9.537
 0.830         0.263  100000   100      50       16        250         no     3.16      31665.61           5.40             1            42.36         38.147        38.147

BUILD SUCCESSFUL in 34s
2 actionable tasks: 1 executed, 1 up-to-date
cat max_stack_depth_non_optimized.txt | grep maxStackDepth | sort -t= -k2 -n | tail
maxStackDepth=138439
maxStackDepth=139713
maxStackDepth=140014
maxStackDepth=140365
maxStackDepth=140955
maxStackDepth=147255
maxStackDepth=152292
maxStackDepth=618303
maxStackDepth=1128533
maxStackDepth=1505067
cat max_stack_depth_non_optimized.txt | tail


Results:
recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  index s  index docs/s  force merge s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.532         0.244  100000   100      50       16        250     4 bits     3.52      28376.84           8.04             1            47.85         42.915         4.768
 0.662         0.309  100000   100      50       16        250     7 bits     3.54      28288.54           7.64             1            50.86         47.684         9.537
 0.810         0.271  100000   100      50       16        250         no     3.20      31230.48           5.65             1            41.99         38.147        38.147

BUILD SUCCESSFUL in 37s
2 actionable tasks: 1 executed, 1 up-to-date

cc: @msokolov @vigyasharma

@viswanathk
Contributor Author

The force merge time shows some improvement.

@viswanathk
Contributor Author

Please let me know if we need to run the full benchmark suite on this.

Contributor

@vigyasharma vigyasharma left a comment

With this change, we require that connectedNodes should not be set for any nodes. This is slightly different from before, where you could pass a partially set connectedNodes bitset and it'll get updated with all the values.

I don't think we need to support the partial bitset case (do we, @msokolov?), and the optimization seems worth it. But let's document this change (that the bitset should be empty) in the function docstring.

@msokolov
Contributor

> With this change, we require that connectedNodes should not be set for any nodes. This is slightly different from before, where you could pass a partially set connectedNodes bitset and it'll get updated with all the values.

I'm not sure -- doesn't it still expect that connectedNodes is preserved between calls? The overall flow is like: find the next not-connected node, and traverse all of its connections -- it might run into an already-connected node (marked as connected in the bitset) because the relation is asymmetric. We used to continue traversing anyway although it's kind of pointless. Maybe it would tell you the size of the "rooted" component of the graph, but we don't really use this size information, so I think it's OK to early-terminate once you find something that is already rooted in an earlier component. And we still expect to remember the visited set across calls.
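
To make the early-termination idea above concrete, here is a minimal sketch in Java. It is not the Lucene code: the class `MarkRootedSketch`, its `Result` holder, and the `int[][]` adjacency representation are all assumptions made for illustration. It marks a node in the shared bitset when the node is pushed, so each node enters the stack at most once, and it returns immediately when the entry point is already connected.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;

/** Illustrative sketch only; names and graph representation are hypothetical. */
class MarkRootedSketch {

  /** Number of nodes newly marked by one traversal, plus the peak stack size. */
  static final class Result {
    final int marked;
    final int maxStackDepth;

    Result(int marked, int maxStackDepth) {
      this.marked = marked;
      this.maxStackDepth = maxStackDepth;
    }
  }

  /**
   * Iterative DFS over a directed adjacency list. The bitset is shared across
   * calls, so nodes reached by an earlier traversal are never pushed again.
   */
  static Result markRooted(int[][] neighbors, int entryPoint, BitSet connected) {
    if (connected.get(entryPoint)) {
      return new Result(0, 0); // already rooted by an earlier traversal
    }
    Deque<Integer> stack = new ArrayDeque<>();
    connected.set(entryPoint); // mark on push, not on pop
    stack.push(entryPoint);
    int marked = 0;
    int maxDepth = 1;
    while (!stack.isEmpty()) {
      int node = stack.pop();
      marked++;
      for (int friend : neighbors[node]) {
        if (!connected.get(friend)) { // skip anything already reached
          connected.set(friend);
          stack.push(friend);
        }
      }
      maxDepth = Math.max(maxDepth, stack.size());
    }
    return new Result(marked, maxDepth);
  }

  public static void main(String[] args) {
    // Node 3 points at node 0, but nothing points back at node 3 (asymmetric links).
    int[][] graph = {{1, 2}, {0, 2}, {0, 1}, {0}};
    BitSet connected = new BitSet(graph.length);
    Result first = markRooted(graph, 0, connected);
    Result second = markRooted(graph, 3, connected);
    System.out.println("first=" + first.marked + " second=" + second.marked);
  }
}
```

Because a node is marked before it is pushed, the stack can never hold more entries than there are nodes, whereas pushing every neighbor unconditionally can grow the stack with the number of edges. The second call stops at node 0 instead of re-walking the first component, which is the behavior discussed above.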

@msokolov
Contributor

msokolov commented Dec 16, 2024

> Benchmark while indexing 100k docs:

Could you say what data set you used here? Is this random vectors? If so, it would be great to use some non-random vectors so we can have realistic expectations for the impact.

Contributor

@msokolov msokolov left a comment

Looks good overall - are you able to address the comments? It's probably OK as is, but it would be great if we could remove the empties and address the testing question

@@ -163,6 +164,10 @@ private static Component markRooted(
      throws IOException {
    // Start at entry point and search all nodes on this level
    // System.out.println("markRooted level=" + level + " entryPoint=" + entryPoint);
    if (connectedNodes.get(entryPoint)) {
      return new Component(entryPoint, 0);
Contributor

This should never happen, right? We enter with the next non-connected node. Can we add an assert false here before the return statement so we catch it during testing?

Contributor

Oh wait, this can happen because we iterate over all the entryPoints. Q: do we need this zero-size component for anything? Can we recall what happens with these components when we're done? The only purpose is to use them for reconnecting the graph. Yeah, it looks like we will try to connect them again, which we could skip. Let's not add these empty components to the list.

Contributor Author

> Oh wait, this can happen because we iterate over all the entryPoints. Q: do we need this zero-size component for anything? Can we recall what happens with these components when we're done? The only purpose is to use them for reconnecting the graph. Yeah, it looks like we will try to connect them again, which we could skip. Let's not add these empty components to the list.

I don't think we are adding the empty components to the list though. We are adding to the list with the total of the entryPoints for that level (which seems unlikely).

In the other places we add, we start the markRooted process with the nextClearBit, so it won't return 0.
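
The calling pattern described here can be sketched roughly as follows. This is a simplified illustration, not `HnswUtil.components()`: the class `ComponentsSketch`, the helper `mark`, and the `{start, size}` pairs are hypothetical, and unlike the real code (which, per the comment above, records a single total for the entry points) this sketch records each entry point separately. It shows one bitset shared across all traversals, zero-size results from entry points being skipped, and the remaining starts coming from `BitSet.nextClearBit`, which by construction never yields an already-connected node.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Deque;
import java.util.List;

/** Illustrative sketch only; names and graph representation are hypothetical. */
class ComponentsSketch {

  /** Marks everything reachable from start; returns how many nodes were newly marked. */
  static int mark(int[][] neighbors, int start, BitSet connected) {
    if (connected.get(start)) {
      return 0; // start was already reached from an earlier traversal
    }
    Deque<Integer> stack = new ArrayDeque<>();
    connected.set(start);
    stack.push(start);
    int count = 0;
    while (!stack.isEmpty()) {
      int node = stack.pop();
      count++;
      for (int friend : neighbors[node]) {
        if (!connected.get(friend)) {
          connected.set(friend);
          stack.push(friend);
        }
      }
    }
    return count;
  }

  /** Returns {start, size} pairs; the same bitset is reused for every traversal. */
  static List<int[]> components(int[][] neighbors, int[] entryPoints) {
    BitSet connected = new BitSet(neighbors.length);
    List<int[]> result = new ArrayList<>();
    for (int entryPoint : entryPoints) {
      int size = mark(neighbors, entryPoint, connected);
      if (size > 0) { // do not record empty components
        result.add(new int[] {entryPoint, size});
      }
    }
    // Every later start is a clear bit, so its component has size >= 1.
    int next = connected.nextClearBit(0);
    while (next < neighbors.length) {
      result.add(new int[] {next, mark(neighbors, next, connected)});
      next = connected.nextClearBit(next + 1);
    }
    return result;
  }

  public static void main(String[] args) {
    int[][] graph = {{1}, {0}, {}, {0}};
    for (int[] c : components(graph, new int[] {0, 1})) {
      System.out.println("start=" + c[0] + " size=" + c[1]);
    }
  }
}
```

Since every node is counted exactly once, the recorded sizes sum to the number of nodes, which is the kind of total a caller could assert on.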

Contributor Author

> I don't think we are adding the empty components to the list though. We are adding to the list with the total of the entryPoints for that level (which seems unlikely).

But seems like a good check. Updated the PR.

@viswanathk
Contributor Author

> > With this change, we require that connectedNodes should not be set for any nodes. This is slightly different from before, where you could pass a partially set connectedNodes bitset and it'll get updated with all the values.
>
> I'm not sure -- doesn't it still expect that connectedNodes is preserved between calls? The overall flow is like: find the next not-connected node, and traverse all of its connections -- it might run into an already-connected node (marked as connected in the bitset) because the relation is asymmetric. We used to continue traversing anyway although it's kind of pointless. Maybe it would tell you the size of the "rooted" component of the graph, but we don't really use this size information, so I think it's OK to early-terminate once you find something that is already rooted in an earlier component. And we still expect to remember the visited set across calls.

Yes, this was my understanding too.

@viswanathk
Contributor Author

> > Benchmark while indexing 100k docs:
>
> could you say what data set you used here -- is this random vectors? If so, it would be great to use some non-random vectors so we can have realistic expectations for impact

I used the knnPerfTest to run the benchmark. It uses enwiki-20120502-lines-1k for doc vectors, and glove-6B-100 for query vectors.

@vigyasharma
Contributor

> > With this change, we require that connectedNodes should not be set for any nodes. This is slightly different from before, where you could pass a partially set connectedNodes bitset and it'll get updated with all the values.
>
> I'm not sure -- doesn't it still expect that connectedNodes is preserved between calls? The overall flow is like: find the next not-connected node, and traverse all of its connections -- it might run into an already-connected node (marked as connected in the bitset) because the relation is asymmetric. We used to continue traversing anyway although it's kind of pointless. Maybe it would tell you the size of the "rooted" component of the graph, but we don't really use this size information, so I think it's OK to early-terminate once you find something that is already rooted in an earlier component. And we still expect to remember the visited set across calls.

Okay, I hadn't looked at the calling function HnswUtil.components() and was thrown off by the early return if entry point is already visited. We do need to pass the same bitset for each entry point.

Since we skip visited nodes now, can this function be impacted if new nodes got added to the graph in between the markRooted() calls? I'm not sure if we allow adding nodes once finish() has been invoked (but not completed). Does it even matter if we don't traverse some new nodes (looks like we assert on the total here?).

github-actions bot commented Jan 2, 2025

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jan 2, 2025
@msokolov
Contributor

msokolov commented Jan 6, 2025

sorry for the delay - holidays intervened!

@msokolov msokolov merged commit 5460da8 into apache:main Jan 6, 2025
5 checks passed
@msokolov
Contributor

msokolov commented Jan 6, 2025

@viswanathk I just merged and then belatedly realized we should also have a CHANGES.txt entry for this. I guess it belongs under the Optimizations heading -- do you want to add it? We should also backport to branch_10x so this gets released in the next point release. This is usually done by cherry-picking from the main branch. Do you want to take that on?

@viswanathk
Contributor Author

Yeah, let me make them real quick.

@viswanathk
Contributor Author

@msokolov I raised the two PRs. Please take a look.

viswanathk added a commit to viswanathk/lucene that referenced this pull request Jan 7, 2025
msokolov pushed a commit that referenced this pull request Jan 7, 2025
* Optimize DFS while marking connected components (#14022)

* Add CHANGES.txt entry for HNSW DFS Optimization #14022