Introduce commit subset check (part of stack) #5881

Caleb-T-Owens · 2025-01-02T13:09:04Z

Introduces a function for determining if the changes introduced by a commit, are a subset of the changes introduced by another.

This is helpful for an upcoming PR which should greatly improve our integration check.

get_exclusive_tree is able to take a stack of commits IE Base -> A -> B. And return a tree which only contains the introductions of the B commit (when diffed against the Base).

We do this by merging commits B and Base against the parent of B. Any conflicts are resolved in favour of the B commit.

is_subset makes use of get_exclusive_tree to be able to compare the subset and superset candidates without worry about changes that they may have been cherry-picked on top of

This is part 1 of 2 in a stack made with GitButler:

2 Improved integration #5893
1 Introduce commit subset check (part of stack) #5881 👈

vercel · 2025-01-02T13:09:05Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
gitbutler-components	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Jan 9, 2025 11:23am
gitbutler-web	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Jan 9, 2025 11:23am

Byron

Thanks for the invite, I took a close look, but only from a Rust + gix perspective, not from a "does this really work and is it sound" point of view.

For simplicity, I just created two commits with changes I'd do, please see them as suggestions. The second one has more of a "careful, better do it this way" narrative though, but either way they can be changed, refactored or removed.

Generally, when doing checks like these, one should assure that the Repository is configured to not write objects to disk. This is fast enough to do it on the fly, but as it would require cloning the repository that also strips object caches which would hurt performance if many commits are checked in a row. Thus I added documentation to the method that I think might be public one day, but ideally the top-level caller will take care of that.

I hope this helps.

As a final something to think about: I personally find it helpful to know about lifetimes despite the added noise, and enable #![deny(rust_2018_idioms)] in lib.rs in all crates. This came up when seeing get_first_parent(commit: gix::Commit) and I had to wonder why it's passed like this, until I realized that it does have a reference to the Repository which was just hidden. With get_first_parent(commit: gix::Commit<'_>) it would be clear that contains references, just like the git2 types of the same kind. When passing it as & which should be done with a Clone type, it also became obvious why it was done like that - having all lifetimes visible may have helped here, too.

Byron · 2025-01-02T13:26:05Z

crates/gitbutler-oxidize/src/lib.rs

+}
+
+impl OidExt for git2::Oid {
+    fn to_gix(self) -> gix::ObjectId {


That's wonderful!

Byron · 2025-01-02T13:29:17Z

crates/gitbutler-oxidize/src/lib.rs

@@ -101,3 +121,16 @@ pub fn gix_to_git2_index(index: &gix::index::State) -> anyhow::Result<git2::Inde
    }
    Ok(out)
 }
+
+pub fn print_tree(tree: gix::Tree<'_>) {


This would probably be a better fit for the testsupport crate.

I was wanting to use it for debugging inside an application function rather than inside a test case, which afaik, means it can't be inside testsupport

Within #[test] code dev-dependencies are available, no matter where it is.
Thus, moving it to the testsupport crate will allow it to be used in any test, anywhere, as long as testsupport is a dev-dependency.

Byron · 2025-01-02T13:33:44Z

crates/gitbutler-oxidize/src/lib.rs

+
+pub fn print_tree(tree: gix::Tree<'_>) {
+    let mut recorder = gix::traverse::tree::Recorder::default();
+    tree.traverse().breadthfirst(&mut recorder).unwrap();


When implementing the depth-first traversal I noticed that this isn't really breadth-first as it effectively lists files first, then directories, which makes for a funky ordering that isn't too helpful when debugging.

The old version of gix tree entries -r shows the effects of this nicely.

The reason I never noticed is that I never used it, but went with termtree instead.

pub fn visualize_tree( id: &gix_hash::oid, odb: &impl gix_object::Find, name_and_mode: Option<(&BStr, EntryMode)>, ) -> termtree::Tree<String> { fn short_id(id: &gix_hash::oid) -> String { id.to_string()[..7].to_string() } let entry_name = |id: &gix_hash::oid, name: Option<(&BStr, EntryMode)>| -> String { let mut buf = Vec::new(); match name { None => short_id(id), Some((name, mode)) => { format!( "{name}:{mode}{} {}", short_id(id), match odb.find_blob(id, &mut buf) { Ok(blob) => format!("{:?}", blob.data.as_bstr()), Err(_) => "".into(), }, mode = if mode.is_tree() { "".into() } else { format!("{:o}:", mode.0) } ) } } }; let mut tree = termtree::Tree::new(entry_name(id, name_and_mode)); let mut buf = Vec::new(); for entry in odb.find_tree(id, &mut buf).unwrap().entries { if entry.mode.is_tree() { tree.push(visualize_tree(entry.oid, odb, Some((entry.filename, entry.mode)))); } else { tree.push(entry_name(entry.oid, Some((entry.filename, entry.mode)))); } } tree }

The above (copied from the tests of the gix-merge crate) lists a tree as is and prints all information about it that one would want in a simple recursive function. It can certainly be rewritten to power print_tree(), and I think it will be for the better.

I wasn't too concerned with the ordering/bredthfirstness of the function. In some ways, it's possibly more helpful if it prints in same order as the tree entries.

I'll have a think about what a helpful debug output would look like and revisit

Byron · 2025-01-02T14:07:33Z

crates/gitbutler-branch-actions/src/commit_ops.rs

+        use super::*;
+
+        #[test]
+        fn a_commit_is_a_subset_of_itself() {


All these tests are read-only, which is when the read_only::fixture* methods are faster. They create a fixture only once and make it easy to inspect on disk. They come at the disadvantage of separating the test-setup from the test itself though, but to me it's easy enough to check the script and I can understand more easily how the structure looks like by looking at it, and if in doubt the final structure is easy to inspect on disk.

Of course, these are not the pinnacle of what's possible and I always thought that it would be great to define tests more like cargo does, but to apply the same techniques to easily define complex Git repositories.

Caleb-T-Owens · 2025-01-02T15:09:13Z

Thanks for the invite, I took a close look, but only from a Rust + gix perspective, not from a "does this really work and is it sound" point of view.

Thank you! This is greatly appreciated

For simplicity, I just created two commits with changes I'd do, please see them as suggestions. The second one has more of a "careful, better do it this way" narrative though, but either way they can be changed, refactored or removed.

Thanks for these! Both look good.

Generally, when doing checks like these, one should assure that the Repository is configured to not write objects to disk. This is fast enough to do it on the fly, but as it would require cloning the repository that also strips object caches which would hurt performance if many commits are checked in a row. Thus I added documentation to the method that I think might be public one day, but ideally the top-level caller will take care of that.

The new integration check will run over a full stack rather than getting called on an individual commit. As such, it will almost always be called many times in a row. As such, it makes more sense to open the in-memory repository outside of the is_subset function and instead inside the new integration function.

As a final something to think about: I personally find it helpful to know about lifetimes despite the added noise, and enable #![deny(rust_2018_idioms)] in lib.rs in all crates. This came up when seeing get_first_parent(commit: gix::Commit) and I had to wonder why it's passed like this, until I realized that it does have a reference to the Repository which was just hidden. With get_first_parent(commit: gix::Commit<'_>) it would be clear that contains references, just like the git2 types of the same kind. When passing it as & which should be done with a Clone type, it also became obvious why it was done like that - having all lifetimes visible may have helped here, too.

I'm wary of becoming a perl codebase, but would it make sense to add #![deny(rust_2018_idioms)] to the top of new files so we can incrementally move over to better idioms?

Byron · 2025-01-02T18:16:24Z

I'm wary of becoming a perl codebase, but would it make sense to add #![deny(rust_2018_idioms)] to the top of new files so we can incrementally move over to better idioms?

It absolutely can be done that way, which certainly helps to get a feel for it. Additionally, I still don't understand how crates specified as edition = "2021" won't even warn about the rust-2018 idioms.
The reason it's not the default might be exactly due to the added noise, even though I find it most useful to know if references are involved.

This is more of a suggestion than anything else.

github-actions bot added the rust Pull requests that update Rust code label Jan 2, 2025

vercel bot deployed to Preview – gitbutler-components January 2, 2025 13:09 View deployment

vercel bot deployed to Preview – gitbutler-web January 2, 2025 13:11 View deployment

Caleb-T-Owens requested review from Byron and krlvi January 2, 2025 13:16

Byron reviewed Jan 2, 2025

View reviewed changes

vercel bot deployed to Preview – gitbutler-components January 2, 2025 14:11 View deployment

vercel bot deployed to Preview – gitbutler-web January 2, 2025 14:12 View deployment

vercel bot deployed to Preview – gitbutler-web January 5, 2025 15:18 View deployment

vercel bot deployed to Preview – gitbutler-components January 5, 2025 15:18 View deployment

Caleb-T-Owens mentioned this pull request Jan 6, 2025

Improved integration #5893

Open

Caleb-T-Owens force-pushed the subset-check branch from 9de4abc to e84e7b7 Compare January 8, 2025 17:21

vercel bot deployed to Preview – gitbutler-components January 8, 2025 17:22 View deployment

vercel bot deployed to Preview – gitbutler-web January 8, 2025 17:22 View deployment

Caleb-T-Owens and others added 4 commits January 9, 2025 12:21

Introduce commit subset check

d419edb

minimal refactor

653e8d5

This is more of a suggestion than anything else.

When checking for conflicts, always be explicit about what's a conflict.

f4383b2

Use an enum result to better describe subset

930c144

Caleb-T-Owens force-pushed the subset-check branch from e84e7b7 to 930c144 Compare January 9, 2025 11:21

vercel bot deployed to Preview – gitbutler-components January 9, 2025 11:22 View deployment

vercel bot deployed to Preview – gitbutler-web January 9, 2025 11:23 View deployment

krlvi approved these changes Jan 9, 2025

View reviewed changes

krlvi merged commit b22e3fd into master Jan 9, 2025
21 checks passed

krlvi deleted the subset-check branch January 9, 2025 16:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce commit subset check (part of stack) #5881

Introduce commit subset check (part of stack) #5881

Caleb-T-Owens commented Jan 2, 2025 •

edited

Loading

vercel bot commented Jan 2, 2025 •

edited

Loading

Byron left a comment

Byron Jan 2, 2025

Byron Jan 2, 2025

Caleb-T-Owens Jan 2, 2025

Byron Jan 2, 2025

Byron Jan 2, 2025

Caleb-T-Owens Jan 2, 2025

Byron Jan 2, 2025

Caleb-T-Owens commented Jan 2, 2025

Byron commented Jan 2, 2025

Introduce commit subset check (part of stack) #5881

Introduce commit subset check (part of stack) #5881

Conversation

Caleb-T-Owens commented Jan 2, 2025 • edited Loading

vercel bot commented Jan 2, 2025 • edited Loading

Byron left a comment

Choose a reason for hiding this comment

Byron Jan 2, 2025

Choose a reason for hiding this comment

Byron Jan 2, 2025

Choose a reason for hiding this comment

Caleb-T-Owens Jan 2, 2025

Choose a reason for hiding this comment

Byron Jan 2, 2025

Choose a reason for hiding this comment

Byron Jan 2, 2025

Choose a reason for hiding this comment

Caleb-T-Owens Jan 2, 2025

Choose a reason for hiding this comment

Byron Jan 2, 2025

Choose a reason for hiding this comment

Caleb-T-Owens commented Jan 2, 2025

Byron commented Jan 2, 2025

Caleb-T-Owens commented Jan 2, 2025 •

edited

Loading

vercel bot commented Jan 2, 2025 •

edited

Loading