-
Notifications
You must be signed in to change notification settings - Fork 552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce commit subset check (part of stack) #5881
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the invite, I took a close look, but only from a Rust + gix
perspective, not from a "does this really work and is it sound" point of view.
For simplicity, I just created two commits with changes I'd do, please see them as suggestions. The second one has more of a "careful, better do it this way" narrative though, but either way they can be changed, refactored or removed.
Generally, when doing checks like these, one should assure that the Repository
is configured to not write objects to disk. This is fast enough to do it on the fly, but as it would require cloning the repository that also strips object caches which would hurt performance if many commits are checked in a row. Thus I added documentation to the method that I think might be public one day, but ideally the top-level caller will take care of that.
I hope this helps.
As a final something to think about: I personally find it helpful to know about lifetimes despite the added noise, and enable #![deny(rust_2018_idioms)]
in lib.rs
in all crates. This came up when seeing get_first_parent(commit: gix::Commit)
and I had to wonder why it's passed like this, until I realized that it does have a reference to the Repository
which was just hidden. With get_first_parent(commit: gix::Commit<'_>)
it would be clear that contains references, just like the git2
types of the same kind. When passing it as &
which should be done with a Clone
type, it also became obvious why it was done like that - having all lifetimes visible may have helped here, too.
} | ||
|
||
impl OidExt for git2::Oid { | ||
fn to_gix(self) -> gix::ObjectId { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's wonderful!
@@ -101,3 +121,16 @@ pub fn gix_to_git2_index(index: &gix::index::State) -> anyhow::Result<git2::Inde | |||
} | |||
Ok(out) | |||
} | |||
|
|||
pub fn print_tree(tree: gix::Tree<'_>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would probably be a better fit for the testsupport
crate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wanting to use it for debugging inside an application function rather than inside a test case, which afaik, means it can't be inside testsupport
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Within #[test]
code dev-dependencies
are available, no matter where it is.
Thus, moving it to the testsupport
crate will allow it to be used in any test, anywhere, as long as testsupport
is a dev-dependency
.
|
||
pub fn print_tree(tree: gix::Tree<'_>) { | ||
let mut recorder = gix::traverse::tree::Recorder::default(); | ||
tree.traverse().breadthfirst(&mut recorder).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When implementing the depth-first traversal I noticed that this isn't really breadth-first as it effectively lists files first, then directories, which makes for a funky ordering that isn't too helpful when debugging.
The old version of gix tree entries -r
shows the effects of this nicely.
The reason I never noticed is that I never used it, but went with termtree
instead.
pub fn visualize_tree(
id: &gix_hash::oid,
odb: &impl gix_object::Find,
name_and_mode: Option<(&BStr, EntryMode)>,
) -> termtree::Tree<String> {
fn short_id(id: &gix_hash::oid) -> String {
id.to_string()[..7].to_string()
}
let entry_name = |id: &gix_hash::oid, name: Option<(&BStr, EntryMode)>| -> String {
let mut buf = Vec::new();
match name {
None => short_id(id),
Some((name, mode)) => {
format!(
"{name}:{mode}{} {}",
short_id(id),
match odb.find_blob(id, &mut buf) {
Ok(blob) => format!("{:?}", blob.data.as_bstr()),
Err(_) => "".into(),
},
mode = if mode.is_tree() {
"".into()
} else {
format!("{:o}:", mode.0)
}
)
}
}
};
let mut tree = termtree::Tree::new(entry_name(id, name_and_mode));
let mut buf = Vec::new();
for entry in odb.find_tree(id, &mut buf).unwrap().entries {
if entry.mode.is_tree() {
tree.push(visualize_tree(entry.oid, odb, Some((entry.filename, entry.mode))));
} else {
tree.push(entry_name(entry.oid, Some((entry.filename, entry.mode))));
}
}
tree
}
The above (copied from the tests of the gix-merge
crate) lists a tree as is and prints all information about it that one would want in a simple recursive function. It can certainly be rewritten to power print_tree()
, and I think it will be for the better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't too concerned with the ordering/bredthfirstness of the function. In some ways, it's possibly more helpful if it prints in same order as the tree entries.
I'll have a think about what a helpful debug output would look like and revisit
use super::*; | ||
|
||
#[test] | ||
fn a_commit_is_a_subset_of_itself() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these tests are read-only, which is when the read_only::fixture*
methods are faster. They create a fixture only once and make it easy to inspect on disk. They come at the disadvantage of separating the test-setup from the test itself though, but to me it's easy enough to check the script and I can understand more easily how the structure looks like by looking at it, and if in doubt the final structure is easy to inspect on disk.
Of course, these are not the pinnacle of what's possible and I always thought that it would be great to define tests more like cargo
does, but to apply the same techniques to easily define complex Git repositories.
Thank you! This is greatly appreciated
Thanks for these! Both look good.
The new integration check will run over a full stack rather than getting called on an individual commit. As such, it will almost always be called many times in a row. As such, it makes more sense to open the in-memory repository outside of the
I'm wary of becoming a perl codebase, but would it make sense to add |
It absolutely can be done that way, which certainly helps to get a feel for it. Additionally, I still don't understand how crates specified as |
9de4abc
to
e84e7b7
Compare
This is more of a suggestion than anything else.
e84e7b7
to
930c144
Compare
Introduces a function for determining if the changes introduced by a commit, are a subset of the changes introduced by another.
This is helpful for an upcoming PR which should greatly improve our integration check.
get_exclusive_tree
is able to take a stack of commits IEBase -> A -> B
. And return a tree which only contains the introductions of theB
commit (when diffed against the Base).We do this by merging commits
B
andBase
against the parent ofB
. Any conflicts are resolved in favour of theB
commit.is_subset
makes use ofget_exclusive_tree
to be able to compare the subset and superset candidates without worry about changes that they may have been cherry-picked on top ofThis is part 1 of 2 in a stack made with GitButler: