-
-
Notifications
You must be signed in to change notification settings - Fork 405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refs: optimise writing unchanged refs #1120
base: master
Are you sure you want to change the base?
Conversation
Thanks! I'll take a look later today As an aside, I think one of the things currently missing from Dulwich is the explicit (user-accessible) concept of a (read) transaction during we could lock files and cache results. |
Well, looking at it from the hg-git perspective, the “transactions” in Mercurial and Dulwich are quite different. A Mercurial transaction is global and atomic; you either perform all changes or none at all. A Dulwich transaction, on the other hand, is file-specific. I guess some of this difference is driven by how Git itself handles transaction, but it'd be nice to have a similar, global transaction, if possible. EDIT: One option, and a case where this would be useful, is for packed refs. There isn't really any support for locking that file during a transaction, bunching up updates, and writing them all at once. Instead, the client has to do that manually, which might easily lead to accidental O(n²) behaviour… |
I suppose it matters more for pack files, where we currently have to repeatedly check which pack files exist if you're doing lots of random access to them - since new ones may have appeared. If we could just remember which ones existed at the start of a (read) transaction, that would probably help performance a lot. That said, for writing we could also provide transactions that cross individual ref files - we can just keep multiple locks open at the same time. |
# first, check if we need to write anything at all without | ||
# acquiring the lock, which triggers a fs write, close and | ||
# thus fsync() | ||
if self.read_loose_ref(realname) != new_ref: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, what about the case where somebody else has the lock open? Previously you'd get a FileLocked error, but now the code would run fine.
Perhaps it's better to just do what we previously did but f.abort() if orig_ref == new_ref ? That way you'd still avoid the fsync() - and you'd avoid an extra read if things have changed.
f1ae053
to
cd30df4
Compare
This is inspired by an issue filed in hg-git; essentially, writing a lot of unchanged references is quite slow, due to the
fsync()
implied in re-writing the file. Although we could fix this within hg-git, it seems to me that the general optimisation might apply to other users of Dulwich, so this PR makes it so that it avoids acquiring the lock to a ref if that ref already matches the expected update.