HDDS-12094. Implement ozone debug replicas verify checksums command #7748

Draft: ptlrs wants to merge 1 commit into master


@ptlrs ptlrs commented Jan 24, 2025

What changes were proposed in this pull request?

This PR:

  • adds a new ozone debug replicas verify command
  • renames the read-replicas command to checksums
  • updates the checksums command so it can walk the file tree and calculate checksums of all files
  • makes a URI a required parameter for all subcommands of replicas verify

Note: This is a WIP change. Further enhancements and refactoring will be done and acceptance tests will be updated.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-12057

How was this patch tested?

Manual testing in docker-compose environment
CI: https://github.com/ptlrs/ozone/actions/runs/12954171235

OzoneConfiguration configuration = new OzoneConfiguration(getConf());
configuration.setBoolean("ozone.client.verify.checksum",
    !isChecksumVerifyEnabled);
RpcClient newClient = new RpcClient(configuration, null);

The processKey() method is going to be called once for every key in a bucket, so the RpcClient object can be reused across keys. I'd also suggest moving the instantiation of OzoneConfiguration so that it happens only once overall.
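
To illustrate the suggestion, here is a minimal sketch of building the client once and reusing it for every key. It does not use the Ozone API: `FakeClient` is a hypothetical stand-in for an expensive client such as RpcClient, and `processKeys` and `describe` are made-up names for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: create the expensive client once, then reuse it for all keys. */
class ReusedClientSketch {

  /** Hypothetical stand-in for an expensive client such as RpcClient. */
  static final class FakeClient implements AutoCloseable {
    /** Counts constructions so the reuse is observable. */
    static final AtomicInteger CREATED = new AtomicInteger();
    private final boolean verifyChecksum;

    FakeClient(boolean verifyChecksum) {
      this.verifyChecksum = verifyChecksum;
      CREATED.incrementAndGet();
    }

    /** Placeholder for the real per-key work done through the client. */
    String describe(String key) {
      return key + " verifyChecksum=" + verifyChecksum;
    }

    @Override
    public void close() {
      // A real client would release connections here.
    }
  }

  /** Processes every key with one shared client; returns the key count. */
  static int processKeys(List<String> keys, boolean verifyChecksum) {
    int processed = 0;
    // The client (and the configuration behind it) is built once, outside the loop.
    try (FakeClient client = new FakeClient(verifyChecksum)) {
      for (String key : keys) {
        client.describe(key);
        processed++;
      }
    }
    return processed;
  }

  public static void main(String[] args) {
    int processed = processKeys(Arrays.asList("key1", "key2", "key3"), false);
    System.out.println(processed + " keys processed, "
        + FakeClient.CREATED.get() + " client(s) created");
  }
}
```

The same shape would apply to the configuration object: construct it before the loop and pass it in, rather than rebuilding it inside the per-key method.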


Also worth noting: the command processes one key at a time with no parallelism, so it could take a long time for a big bucket.
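
One possible way to add parallelism, sketched with a plain fixed-size thread pool. This is an assumption about how it could be done, not what the PR implements: `processKey` here is a hypothetical placeholder that computes a CRC32 of the key name, and `processAll` is a made-up helper.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.CRC32;

/** Sketch: process keys concurrently instead of one at a time. */
class ParallelKeysSketch {

  /** Hypothetical per-key work; here just a CRC32 of the key name. */
  static long processKey(String key) {
    CRC32 crc = new CRC32();
    crc.update(key.getBytes(StandardCharsets.UTF_8));
    return crc.getValue();
  }

  /** Runs processKey for each key on a pool; results keep the input order. */
  static List<Long> processAll(List<String> keys, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Long>> futures = new ArrayList<>();
      for (String key : keys) {
        Callable<Long> task = () -> processKey(key);
        futures.add(pool.submit(task));
      }
      List<Long> results = new ArrayList<>();
      for (Future<Long> future : futures) {
        results.add(future.get()); // blocks until that key is done
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while processing keys", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("processing a key failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    List<String> keys = Arrays.asList("key1", "key2", "key3", "key4");
    List<Long> checksums = processAll(keys, 2);
    for (int i = 0; i < keys.size(); i++) {
      System.out.println(keys.get(i) + " -> " + Long.toHexString(checksums.get(i)));
    }
  }
}
```

A bounded pool keeps the number of in-flight reads predictable; whether the datanodes and the shared client tolerate that concurrency would need to be checked separately.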

  @MetaInfServices(DebugSubcommand.class)
- public class ReadReplicas extends KeyHandler implements DebugSubcommand {
+ public class ReadReplicas extends Handler implements DebugSubcommand {

I suggest renaming this class and its file as well to match the command name.

File dir = createDirectory(volumeName, bucketName, keyName);
OzoneKeyDetails keyInfoDetails = checksumClient.getKeyDetails(volumeName, bucketName, keyName);
Map<OmKeyLocationInfo, Map<DatanodeDetails, OzoneInputStream>> replicas =
    checksumClient.getKeysEveryReplicas(volumeName, bucketName, keyName);

Not totally related, but RpcClient.getKeysEveryReplicas() doesn't seem to create an input stream that refreshes the container cache upon failure. Could that lead to problems in such a corner case?

https://github.com/apache/ozone/blob/master/hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java#L1609
