Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce commit subset check (part of stack) #5881

Merged
merged 4 commits into from
Jan 9, 2025
Merged

Introduce commit subset check (part of stack) #5881

merged 4 commits into from
Jan 9, 2025

Conversation

Caleb-T-Owens
Copy link
Contributor

@Caleb-T-Owens Caleb-T-Owens commented Jan 2, 2025

Introduces a function for determining if the changes introduced by a commit, are a subset of the changes introduced by another.

This is helpful for an upcoming PR which should greatly improve our integration check.


get_exclusive_tree is able to take a stack of commits IE Base -> A -> B. And return a tree which only contains the introductions of the B commit (when diffed against the Base).

We do this by merging commits B and Base against the parent of B. Any conflicts are resolved in favour of the B commit.

is_subset makes use of get_exclusive_tree to be able to compare the subset and superset candidates without worry about changes that they may have been cherry-picked on top of


This is part 1 of 2 in a stack made with GitButler:

Copy link

vercel bot commented Jan 2, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
gitbutler-components ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 9, 2025 11:23am
gitbutler-web ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 9, 2025 11:23am

Copy link
Collaborator

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the invite, I took a close look, but only from a Rust + gix perspective, not from a "does this really work and is it sound" point of view.

For simplicity, I just created two commits with changes I'd do, please see them as suggestions. The second one has more of a "careful, better do it this way" narrative though, but either way they can be changed, refactored or removed.

Generally, when doing checks like these, one should assure that the Repository is configured to not write objects to disk. This is fast enough to do it on the fly, but as it would require cloning the repository that also strips object caches which would hurt performance if many commits are checked in a row. Thus I added documentation to the method that I think might be public one day, but ideally the top-level caller will take care of that.

I hope this helps.


As a final something to think about: I personally find it helpful to know about lifetimes despite the added noise, and enable #![deny(rust_2018_idioms)] in lib.rs in all crates. This came up when seeing get_first_parent(commit: gix::Commit) and I had to wonder why it's passed like this, until I realized that it does have a reference to the Repository which was just hidden. With get_first_parent(commit: gix::Commit<'_>) it would be clear that contains references, just like the git2 types of the same kind. When passing it as & which should be done with a Clone type, it also became obvious why it was done like that - having all lifetimes visible may have helped here, too.

}

impl OidExt for git2::Oid {
fn to_gix(self) -> gix::ObjectId {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's wonderful!

@@ -101,3 +121,16 @@ pub fn gix_to_git2_index(index: &gix::index::State) -> anyhow::Result<git2::Inde
}
Ok(out)
}

pub fn print_tree(tree: gix::Tree<'_>) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would probably be a better fit for the testsupport crate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wanting to use it for debugging inside an application function rather than inside a test case, which afaik, means it can't be inside testsupport

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Within #[test] code dev-dependencies are available, no matter where it is.
Thus, moving it to the testsupport crate will allow it to be used in any test, anywhere, as long as testsupport is a dev-dependency.


pub fn print_tree(tree: gix::Tree<'_>) {
let mut recorder = gix::traverse::tree::Recorder::default();
tree.traverse().breadthfirst(&mut recorder).unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When implementing the depth-first traversal I noticed that this isn't really breadth-first as it effectively lists files first, then directories, which makes for a funky ordering that isn't too helpful when debugging.

The old version of gix tree entries -r shows the effects of this nicely.

The reason I never noticed is that I never used it, but went with termtree instead.

pub fn visualize_tree(
    id: &gix_hash::oid,
    odb: &impl gix_object::Find,
    name_and_mode: Option<(&BStr, EntryMode)>,
) -> termtree::Tree<String> {
    fn short_id(id: &gix_hash::oid) -> String {
        id.to_string()[..7].to_string()
    }
    let entry_name = |id: &gix_hash::oid, name: Option<(&BStr, EntryMode)>| -> String {
        let mut buf = Vec::new();
        match name {
            None => short_id(id),
            Some((name, mode)) => {
                format!(
                    "{name}:{mode}{} {}",
                    short_id(id),
                    match odb.find_blob(id, &mut buf) {
                        Ok(blob) => format!("{:?}", blob.data.as_bstr()),
                        Err(_) => "".into(),
                    },
                    mode = if mode.is_tree() {
                        "".into()
                    } else {
                        format!("{:o}:", mode.0)
                    }
                )
            }
        }
    };

    let mut tree = termtree::Tree::new(entry_name(id, name_and_mode));
    let mut buf = Vec::new();
    for entry in odb.find_tree(id, &mut buf).unwrap().entries {
        if entry.mode.is_tree() {
            tree.push(visualize_tree(entry.oid, odb, Some((entry.filename, entry.mode))));
        } else {
            tree.push(entry_name(entry.oid, Some((entry.filename, entry.mode))));
        }
    }
    tree
}

The above (copied from the tests of the gix-merge crate) lists a tree as is and prints all information about it that one would want in a simple recursive function. It can certainly be rewritten to power print_tree(), and I think it will be for the better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't too concerned with the ordering/bredthfirstness of the function. In some ways, it's possibly more helpful if it prints in same order as the tree entries.

I'll have a think about what a helpful debug output would look like and revisit

use super::*;

#[test]
fn a_commit_is_a_subset_of_itself() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these tests are read-only, which is when the read_only::fixture* methods are faster. They create a fixture only once and make it easy to inspect on disk. They come at the disadvantage of separating the test-setup from the test itself though, but to me it's easy enough to check the script and I can understand more easily how the structure looks like by looking at it, and if in doubt the final structure is easy to inspect on disk.

Of course, these are not the pinnacle of what's possible and I always thought that it would be great to define tests more like cargo does, but to apply the same techniques to easily define complex Git repositories.

@Caleb-T-Owens
Copy link
Contributor Author

Thanks for the invite, I took a close look, but only from a Rust + gix perspective, not from a "does this really work and is it sound" point of view.

Thank you! This is greatly appreciated

For simplicity, I just created two commits with changes I'd do, please see them as suggestions. The second one has more of a "careful, better do it this way" narrative though, but either way they can be changed, refactored or removed.

Thanks for these! Both look good.

Generally, when doing checks like these, one should assure that the Repository is configured to not write objects to disk. This is fast enough to do it on the fly, but as it would require cloning the repository that also strips object caches which would hurt performance if many commits are checked in a row. Thus I added documentation to the method that I think might be public one day, but ideally the top-level caller will take care of that.

The new integration check will run over a full stack rather than getting called on an individual commit. As such, it will almost always be called many times in a row. As such, it makes more sense to open the in-memory repository outside of the is_subset function and instead inside the new integration function.

As a final something to think about: I personally find it helpful to know about lifetimes despite the added noise, and enable #![deny(rust_2018_idioms)] in lib.rs in all crates. This came up when seeing get_first_parent(commit: gix::Commit) and I had to wonder why it's passed like this, until I realized that it does have a reference to the Repository which was just hidden. With get_first_parent(commit: gix::Commit<'_>) it would be clear that contains references, just like the git2 types of the same kind. When passing it as & which should be done with a Clone type, it also became obvious why it was done like that - having all lifetimes visible may have helped here, too.

I'm wary of becoming a perl codebase, but would it make sense to add #![deny(rust_2018_idioms)] to the top of new files so we can incrementally move over to better idioms?

@Byron
Copy link
Collaborator

Byron commented Jan 2, 2025

I'm wary of becoming a perl codebase, but would it make sense to add #![deny(rust_2018_idioms)] to the top of new files so we can incrementally move over to better idioms?

It absolutely can be done that way, which certainly helps to get a feel for it. Additionally, I still don't understand how crates specified as edition = "2021" won't even warn about the rust-2018 idioms.
The reason it's not the default might be exactly due to the added noise, even though I find it most useful to know if references are involved.

@krlvi krlvi merged commit b22e3fd into master Jan 9, 2025
21 checks passed
@krlvi krlvi deleted the subset-check branch January 9, 2025 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rust Pull requests that update Rust code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants