Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-17582 Stream CLUSTERSTATUS API response #2916

Merged
merged 12 commits into from
Jan 4, 2025

Conversation

mlbiscoc
Copy link
Contributor

https://issues.apache.org/jira/browse/SOLR-17582

Description

CLUSTERSTATUS would aggregate and build the whole response including all collection properties into an NamedList before finally passing it to the response writers. With many or thousands of collections, this can use up memory.

Solution

Instead of returning a NamedList, return a MapWriter. The Mapwriter will pull an Iterator from the documentStream and create SolrCollectionProperiesIterator implementing iterator and override next() to create and write the response in time, instead of writing the whole thing in memory.

Tests

Fix appropriate tests that were using NamedList to Map because of the MapWriter.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for contributing!

Comment on lines 386 to 389
if (shard != null) {
String[] paramShards = shard.split(",");
requestedShards.addAll(Arrays.asList(paramShards));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally this is done up front; not per iteration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved it out so its not per iteration.

Comment on lines 391 to 393
byte[] bytes = Utils.toJSON(clusterStateCollection);
@SuppressWarnings("unchecked")
Map<String, Object> docCollection = (Map<String, Object>) Utils.fromJSON(bytes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why round-trip this?

Copy link
Contributor Author

@mlbiscoc mlbiscoc Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So looked at this and I understand now why it was done this way. It wants to just write this out to the TextWriter as a normal POJO but that doesn't seem to be what the DocCollection class is. So instead what was done was to just write it to JSON byte[] then back to a Map which was the "easiest" way.

Was looking to avoid this back and forth, but there were a few options. I tried using an ObjectMapper to Map.class but there is an error for unable to cast Instant due to a missing dependency for Jackson we need to introduce.
Java 8 date/time type java.time.Instant not supported by default Issue

Other way is to introduce a some kind of toMap method so that the TextWriter can write this as a generic Map.

Another option which actually looks like the way we should go is I found that the DocCollection class extends ZkNodeProps which implements MapWriter. DocCollection already overrides writeMap so we could just return this to the TextWriter! Unfortunately the ClusterStatus class does a bunch of JSON post processing such as Health added to the Map that the output is missing some things because of this postProcessCollectionJSON() method.

I am thinking we should refactor DocCollection to so we can just return this but the changes were much more drastic but may be worth it. Maybe in a different JIRA? This scope continues to creep with me adding improvement to NamedList. Would be happy to refactor this and pick it up if you agree. Would clean up much more code and avoid this JSON processing for every collection iteration.

@@ -368,4 +341,77 @@ public static Map<String, Object> postProcessCollectionJSON(Map<String, Object>
collection.put("health", Health.combine(healthStates).toString());
return collection;
}

private class SolrCollectionProperiesIterator implements Iterator<NamedList<Object>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping we wouldn't need a custom Iterator. We do need a method that takes in DocCollection (and some other context that is the same across collections) and returns a NamedList. With such a method, we can call it via streamOfDocCollection.map(collState -> theMethod(collState, routeKey, liveNodes, etc.).iterator()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah don't know why I didn't do this first. Changed it appropriately.

Comment on lines 219 to 239
Iterator<Map<String, Object>> it =
collectionStream
.map(
(collectionState) ->
collectionPropsResponse(
collectionState,
collectionVsAliases,
routeKey,
liveNodes,
requestedShards))
.iterator();
while (it.hasNext()) {
Map<String, Object> props = it.next();
props.forEach(
(key, value) -> {
try {
ew.put(key, value);
} catch (IOException e) {
throw new RuntimeException(e);
}
});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking, if we can change up DocCollection in another PR, we can delete all those post processing functions and condense the response to this.

MapWriter collectionPropsWriter =
        ew -> {
          Iterator<DocCollection> it = collectionStream.iterator();
          while (it.hasNext()) {
            DocCollection collState = it.next();
            ew.put(collState.getName(), collState);
          }
        };

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should mention Solr has two similar APIs, ClusterStatus & ColStatus. See org.apache.solr.handler.admin.ColStatus#getColStatus. It'd be awesome if there was a single method that takes a DocCollection (and some other params as needed) to produce a NamedList. At least ColStatus's code doesn't have the sad JSON round-trip. Maybe it's own PR or not as you wish.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. I'll take a look at this in a separate PR/Jira

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave the round-trip serialization -- I didn't realize it was that way before.
Leave findRecursive be... not as easy as you thought and probably deserves its own issue.

Comment on lines 219 to 239
Iterator<Map<String, Object>> it =
collectionStream
.map(
(collectionState) ->
collectionPropsResponse(
collectionState,
collectionVsAliases,
routeKey,
liveNodes,
requestedShards))
.iterator();
while (it.hasNext()) {
Map<String, Object> props = it.next();
props.forEach(
(key, value) -> {
try {
ew.put(key, value);
} catch (IOException e) {
throw new RuntimeException(e);
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should mention Solr has two similar APIs, ClusterStatus & ColStatus. See org.apache.solr.handler.admin.ColStatus#getColStatus. It'd be awesome if there was a single method that takes a DocCollection (and some other params as needed) to produce a NamedList. At least ColStatus's code doesn't have the sad JSON round-trip. Maybe it's own PR or not as you wish.

Comment on lines 356 to 364
Set<String> shards = new HashSet<>(requestedShards);
String name = clusterStateCollection.getName();

if (routeKey != null) {
DocRouter router = clusterStateCollection.getRouter();
Collection<Slice> slices = router.getSearchSlices(routeKey, null, clusterStateCollection);
for (Slice slice : slices) {
requestedShards.add(slice.getName());
shards.add(slice.getName());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something seems wrong to me here. If requestedShards is specified, we should use that. If routeKey is specified, we should use that (to compute the shards). Or neither (99% of users won't do either). But we shouldn't do both if for no other reason that it's confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could throw a SolrException(BAD_REQUEST) higher up the stack if passed both shard and _route_. I can update the ref-docs to say you can only do one or the other. But this could break people who are doing both.

I also changed that logic into a single line and forEach

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend that but understand if you leave out of scope.
(I really don't think this is going to break anyone's current usage, BTW)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add it in when I come back for the refactor.

Comment on lines 291 to 296
Map<String, Object> shards = (Map<String, Object>) collectionProps.get("shards");
for (Object nextShard : shards.values()) {
Map<String, Object> shardMap = (Map<String, Object>) nextShard;
Map<String, Object> replicas = (Map<String, Object>) shardMap.get("replicas");
for (Object nextReplica : replicas.values()) {
Map<String, Object> replicaMap = (Map<String, Object>) nextReplica;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost want to cry just glancing at this.
This is the poster-child for why Java introduced "var". And there may be other approaches to improve it but that's the simplest.
(Yeah you didn't write it; I know)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change those Maps to var but its another reason I think I'll come back to this. I could try to improve on this and I think its possible to remove a bunch of code here like buildResponseForCollection and postProcessCollectionJSON

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totally understood. In some old Solr code like this, there's always "and one more thing" we could/should do but ultimately snowballs the scope out of control. I leave it to you to do as you wish. Thank you for your contribution here; I didn't mean to get more out of you than you bargained for :-)

liveNodes,
requestedShards));
} catch (IOException e) {
throw new RuntimeException(e);
Copy link
Contributor

@dsmiley dsmiley Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does buildResponseForCollection throw IOException? That's suspicious. If you must, catch in there and throw a suitable exception like SolrException (which extends RuntimeException and is generally preferred within Solr over RE).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildResponseForCollection doesn't throw the IOException, the EntryWriter does but looks like there is actually a putNoEx method that catches for us and that throws a generic RuntimeException. I could throw a SolrException there with better logging if you think its better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

putNoEx then

Comment on lines 219 to 220
collectionStream.forEach(
(collectionState) -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

friggin beautiful now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

putNoEx removes the try/catch we can get it cleaner!

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh oh; a back-compat problem is dawning on me. Any SolrJ consumer of this API (especially using the default "javabin" format) is going to break here. The change to BaseHttpClusterStateProvider should have made this glaringly obvious but I overlooked its implication earlier (face-palm).

I wouldn't mind if we do this only for V2 (which doesn't yet have back-compatibility concerns) but V2 only has one collection status list returning API (/cluster) and I've debated against the very existence of /cluster for V2. There's not yet a proposal for an alternative. Furthermore V2 raises the bigger question of coalescing on a single DocCollection serialization method.

If we forge ahead anyway (not waiting for V2), we'll have to work around this compatibility. I had a couple ideas but whittled down to one:

  • detect the response format in this handler. If it's Javabin, we write to a NamedList (or better, SimpleOrderedMap), defeating the point of this PR for that format. If JSON/XML is used, this PR will accomplish its goal. Need to write a little test for JSON that at least partially sanity checks the result. Such a test should pass without this PR.

@mlbiscoc
Copy link
Contributor Author

Uh oh; a back-compat problem is dawning on me. Any SolrJ consumer of this API (especially using the default "javabin" format) is going to break here. The change to BaseHttpClusterStateProvider should have made this glaringly obvious but I overlooked its implication earlier (face-palm).

I wouldn't mind if we do this only for V2 (which doesn't yet have back-compatibility concerns) but V2 only has one collection status list returning API (/cluster) and I've debated against the very existence of /cluster for V2. There's not yet a proposal for an alternative. Furthermore V2 raises the bigger question of coalescing on a single DocCollection serialization method.

If we forge ahead anyway (not waiting for V2), we'll have to work around this compatibility. I had a couple ideas but whittled down to one:

  • detect the response format in this handler. If it's Javabin, we write to a NamedList (or better, SimpleOrderedMap), defeating the point of this PR for that format. If JSON/XML is used, this PR will accomplish its goal. Need to write a little test for JSON that at least partially sanity checks the result. Such a test should pass without this PR.

Well thats unfortunate... I'm not too familiar with SolrJ (I'll maybe play with it more to catch stuff like this) but I am assuming now the API is going to return a Map and no longer a NamedList to the client thus breaking them now?

I'll continue with this and add in your ideas to keep this backwards compatible.

@dsmiley
Copy link
Contributor

dsmiley commented Dec 23, 2024

Yes; as the code you modified in a ClusterStateProvider shows, the NamedList->Map happened and broke that code.

You could add a check for {{params.get(CommonParams.WT)}} and check if "javabin" is the value.

@github-actions github-actions bot removed the cat:cli label Dec 24, 2024
@mlbiscoc
Copy link
Contributor Author

Yes; as the code you modified in a ClusterStateProvider shows, the NamedList->Map happened and broke that code.

You could add a check for {{params.get(CommonParams.WT)}} and check if "javabin" is the value.

Added in that check for javabin to keep backwards compatibility with SolrJ and reverted all those SolrJ changes. Also created clusterStatusWithCollectionAndShardJSON to sanity check the json response from MapWriter. Happy holidays!

* SOLR-17582: The CLUSTERSTATUS API will now stream each collection's status to the response,
  fetching and computing it on the fly.  To avoid a backwards compatibilty concern, this won't work for wt=javabin.  (Matthew Biscocho, David Smiley)
@dsmiley dsmiley merged commit 1b1c925 into apache:main Jan 4, 2025
1 check passed
dsmiley pushed a commit that referenced this pull request Jan 7, 2025
The CLUSTERSTATUS API will now stream each collection's status to the response, fetching and computing it on the fly.  To avoid a backwards compatibility concern, this won't work for wt=javabin.

(cherry picked from commit 1b1c925)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants