Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flushes hash cache data file's mmap after done writing #2567

Merged
merged 1 commit into from
Aug 13, 2024

Conversation

brooksprumo
Copy link

@brooksprumo brooksprumo commented Aug 12, 2024

Problem

We've been seen intermittent issues with the accounts hash calculation where the cache files have large chunks of zeroes. This is very elusive to observe in the wild.

Right now, when saving a new cache data file we:

  1. create new file
  2. write a zero to the last byte
  3. flush the file
  4. mmap this file
  5. write the header and all entries through the mmap

We never flush after writing all the entries though!

When the reader used to also mmap, I think we handed the same mmap from writer to reader, so we had the full memory resident in the same process and there was no issue. Now, however, the reader does not use the same mmap (or no mmap at all). So if the file is opened and the OS hasn't flushed the mmap back to the file, it is possible the reader does not see the data.

Summary of Changes

Flush the mmap after writing. Also add a stat for how long the flushing takes.

@brooksprumo brooksprumo self-assigned this Aug 12, 2024
@brooksprumo brooksprumo added the v2.0 Backport to v2.0 branch label Aug 12, 2024
Copy link

mergify bot commented Aug 12, 2024

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.

@brooksprumo brooksprumo marked this pull request as ready for review August 13, 2024 12:50
Copy link

@jeffwashington jeffwashington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.
a symptom this may cause is the cache hash data file may be the right size, but at some point will become zeros and stop containing correct data. The data is sorted, just effectively 'truncated', except it is replaced by zeros (zero pubkey, zero lamports, zero hash).

@brooksprumo
Copy link
Author

Yah. I think the files start off as all zeroes and of the right size. If they do not get flushed fully, they'll still have zeroes. And since the hash of the cache file itself is based on the account storage files, we'll never try to regenerate an already-bad cache file.

@brooksprumo brooksprumo merged commit 65f6466 into anza-xyz:master Aug 13, 2024
42 checks passed
@brooksprumo brooksprumo deleted the hash-cache/flush-mmap branch August 13, 2024 15:23
mergify bot pushed a commit that referenced this pull request Aug 13, 2024
Copy link

@HaoranYi HaoranYi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

brooksprumo added a commit that referenced this pull request Aug 14, 2024
…t of #2567) (#2575)

Flushes hash cache data file's mmap after done writing (#2567)

(cherry picked from commit 65f6466)

Co-authored-by: Brooks <[email protected]>
ray-kast pushed a commit to abklabs/agave that referenced this pull request Nov 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
v2.0 Backport to v2.0 branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants