From f93d642c8d93e3f394b04d7cb655863268a60ee8 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Wed, 3 Jul 2024 14:16:57 -0400 Subject: [PATCH 1/4] commit-reach: add get_branch_base_for_tip Add a new reachability algorithm that intends to discover (from a heuristic) which branch was used as the starting point for a given commit. Add focused tests using the 'test-tool reach' command. In repositories that use pull requests (or merge requests) to advance one or more "protected" branches, the history of that reference can be recovered by following the first-parent history in most cases. Most are completed using no-fast-forward merges, though squash merges are quite common. Less common is rebase-and-merge, which still validates this assumption. Finally, the case that breaks this assumption is the fast-forward update (with potential rebasing). Even in this case, the previous commit commonly appears in the first-parent history of the branch. Similar assumptions can be made for a topic branch created by a single user with the intention to merge back into another branch. Using 'git commit', 'git merge', and 'git cherry-pick' from HEAD will default to having the first-parent commit be the previous commit at HEAD. This history changes only with commands such as 'git reset' or 'git rebase', where the command names also imply that the branch is starting from a new location. With this movement of branches in mind, the following heuristic is proposed as a way to determine the base branch for a given source branch: Among a list of candidate base branches, select the candidate that minimizes the number of commits in the first-parent history of the source that are not in the first-parent history of the candidate. Prior third-party solutions to this problem have used this optimization criteria, but have relied upon extracting the first-parent history and comparing those lists as tables instead of using commit-graph walks. Given current command-line interface options, this optimization criteria is not easy to detect directly. Even using the command git rev-list --count --first-parent .. does not measure this count, as it uses full reachability from to determine which commits to remove from the range '..'. This may lead to one asking if we should instead be using the full reachability of the candidate and only the first-parent history of the source. This, unfortunately, does not work for repositories that use long-lived branches and automation to merge across those branches. In extremely large repositories, merging into a single trunk may not be feasible. This is usually due to the desired frequency of updates (thousands of engineers doing daily work) combined with the time required to perform a validation build. These factors combine to create significant risk of semantic merge conflicts, leading to build breaks on the trunk. In response, repository maintainers can create a single Level Zero (L0) trunk and multiple Level One (L1) branches. By partitioning the engineers by organization, these engineers may see lower risk of semantic merge conflicts as well as be protected against build breaks in other L1 branches. The key to making this system work is a semi-automated process of merging L1 branches into the L0 trunk and vice-versa. In a large enough organization, these L1 branches may further split into L2 or L3 branches, but the same principles apply for merging across deeper levels. If these automated merges use a typical merge with the second parent bringing in the "new" content, then each L0 and L1 branch can track its previous positions by following first-parent history, which appear as parallel paths (until reaching the first place where the branches diverged). If we also walk to second parents, then the histories overlap significantly and cannot be distinguished except for very-recent changes. For this reason, the first-parent condition should be symmetrical across the base and source branches. Another common case for desiring the result of this optimization method is the use of release branches. When releasing a version of a repository, a branch can be used to track that release. Any updates that are worth fixing in that release can be merged to the release branch and shipped with only the necessary fixes without any new features introduced in the trunk branch. The 'maint-2.' branches represent this pattern in the Git project. The microsoft/git fork uses 'vfs-2..' branches to track the changes that are custom to that fork on top of each upstream Git release 2... This application doesn't need the symmetrical first-parent condition, but the use of first-parent histories does not change the results for these branches. To determine the base branch from a list of candidates, create a new method in commit-reach.c that performs a single* commit-graph walk. The core concept is to walk first-parents starting at the candidate bases and the source, tracking the "best" base to reach a given commit. Use generation numbers to ensure that a commit is walked at most once and all children have been explored before visiting it. When reaching a commit that is reachable from both a base and the source, we will then have a guarantee that this is the closest intersection of first-parent histories. Track the best base to reach that commit and return it as a result. In rare cases involving multiple root commits, the first-parent history of the source may never intersect any of the candidates and thus a null result is returned. * There are up to two walks, since we require all commits to have a computed generation number in order to avoid incorrect results. This is similar to the need for computed generation numbers in ahead_behind() as implemented in fd67d149bde (commit-reach: implement ahead_behind() logic, 2023-03-20). In order to track the "best" base, use a new commit slab that stores an integer. This value defaults to zero upon initialization, so use -1 to track that the source commit can reach this commit and use 'i + 1' to track that the ith base can reach this commit. When multiple bases can reach a commit, minimize the index to break ties. This allows the caller to specify an order to the bases that determines some amount of preference when the heuristic does not result in a unique result. The trickiest part of the integer slab is what happens when reaching a collision among the histories of the bases and the history of the source. This is noticed when viewing the first parent and seeing that it has a slab value that differs in sign (negative or positive). In this case, the collision commit is stored in the method variable 'branch_point' and its slab value is set to -1. The index of the best base (so far) is stored in the method variable 'best_index'. It is possible that there are multiple commits that have the branch_point as its first parent, leading to multiple updates of best_index. The result is determined when 'branch_point' is visited in the commit walk, giving the guarantee that all commits that could reach 'branch_point' were visited. Several interesting cases of collisions and different results are tested in the t6600-test-reach.sh script. Recall that this script also tests the algorithm in three possible states involving the commit-graph file and how many commits are written in the file. This provides some coverage of the need (and lack of need) for the ensure_generations_valid() method. Signed-off-by: Derrick Stolee --- commit-reach.c | 126 ++++++++++++++++++++++++++++++++++++++++++ commit-reach.h | 17 ++++++ t/helper/test-reach.c | 2 + t/t6600-test-reach.sh | 61 ++++++++++++++++++++ 4 files changed, 206 insertions(+) diff --git a/commit-reach.c b/commit-reach.c index 8f9b008f876787..4753197ec888dd 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -1222,3 +1222,129 @@ void tips_reachable_from_bases(struct repository *r, free(commits); repo_clear_commit_marks(r, SEEN); } + +/* + * This slab initializes integers to zero, so use "-1" for "tip is best" and + * "i + 1" for "bases[i] is best". + */ +define_commit_slab(best_branch_base, int); +static struct best_branch_base best_branch_base; +#define get_best(c) (*best_branch_base_at(&best_branch_base, (c))) +#define set_best(c,v) (*best_branch_base_at(&best_branch_base, (c)) = (v)) + +int get_branch_base_for_tip(struct repository *r, + struct commit *tip, + struct commit **bases, + size_t bases_nr) +{ + int best_index = -1; + struct commit *branch_point = NULL; + struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; + int found_missing_gen = 0; + + if (!bases_nr) + return -1; + + repo_parse_commit(r, tip); + if (commit_graph_generation(tip) == GENERATION_NUMBER_INFINITY) + found_missing_gen = 1; + + /* Check for missing generation numbers. */ + for (size_t i = 0; i < bases_nr; i++) { + struct commit *c = bases[i]; + repo_parse_commit(r, c); + if (commit_graph_generation(c) == GENERATION_NUMBER_INFINITY) + found_missing_gen = 1; + } + + if (found_missing_gen) { + struct commit **commits; + size_t commits_nr = bases_nr + 1; + + CALLOC_ARRAY(commits, commits_nr); + COPY_ARRAY(commits, bases, bases_nr); + commits[bases_nr] = tip; + ensure_generations_valid(r, commits, commits_nr); + free(commits); + } + + /* Initialize queue and slab now that generations are guaranteed. */ + init_best_branch_base(&best_branch_base); + set_best(tip, -1); + prio_queue_put(&queue, tip); + + for (size_t i = 0; i < bases_nr; i++) { + struct commit *c = bases[i]; + int best = get_best(c); + + /* Has this already been marked as best by another commit? */ + if (best) { + if (best == -1) { + /* We agree at this position. Stop now. */ + best_index = i + 1; + goto cleanup; + } + continue; + } + + set_best(c, i + 1); + prio_queue_put(&queue, c); + } + + while (queue.nr) { + struct commit *c = prio_queue_get(&queue); + int best_for_c = get_best(c); + int best_for_p, positive; + struct commit *parent; + + /* Have we reached a known branch point? It's optimal. */ + if (c == branch_point) + break; + + repo_parse_commit(r, c); + if (!c->parents) + continue; + + parent = c->parents->item; + repo_parse_commit(r, parent); + best_for_p = get_best(parent); + + if (!best_for_p) { + /* 'parent' is new, so pass along best_for_c. */ + set_best(parent, best_for_c); + prio_queue_put(&queue, parent); + continue; + } + + if (best_for_p > 0 && best_for_c > 0) { + /* Collision among bases. Minimize. */ + if (best_for_c < best_for_p) + set_best(parent, best_for_c); + continue; + } + + /* + * At this point, we have reached a commit that is reachable + * from the tip, either from 'c' or from an earlier commit to + * have 'parent' as its first parent. + * + * Update 'best_index' to match the minimum of all base indices + * to reach 'parent'. + */ + + /* Exactly one is positive due to initial conditions. */ + positive = (best_for_c < 0) ? best_for_p : best_for_c; + + if (best_index < 0 || positive < best_index) + best_index = positive; + + /* No matter what, track that the parent is reachable from tip. */ + set_best(parent, -1); + branch_point = parent; + } + +cleanup: + clear_best_branch_base(&best_branch_base); + clear_prio_queue(&queue); + return best_index > 0 ? best_index - 1 : -1; +} diff --git a/commit-reach.h b/commit-reach.h index bf63cc468fd311..9a745b7e176685 100644 --- a/commit-reach.h +++ b/commit-reach.h @@ -139,4 +139,21 @@ void tips_reachable_from_bases(struct repository *r, struct commit **tips, size_t tips_nr, int mark); +/* + * Given a 'tip' commit and a list potential 'bases', return the index 'i' that + * minimizes the number of commits in the first-parent history of 'tip' and not + * in the first-parent history of 'bases[i]'. + * + * Among a list of long-lived branches that are updated only by merges (with the + * first parent being the previous position of the branch), this would inform + * which branch was used to create the tip reference. + * + * Returns -1 if no common point is found in first-parent histories, which is + * rare, but possible with multiple root commits. + */ +int get_branch_base_for_tip(struct repository *r, + struct commit *tip, + struct commit **bases, + size_t bases_nr); + #endif diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c index 1e3b431e3e7211..8579b607aa586e 100644 --- a/t/helper/test-reach.c +++ b/t/helper/test-reach.c @@ -114,6 +114,8 @@ int cmd__reach(int ac, const char **av) repo_in_merge_bases_many(the_repository, A, X_nr, X_array, 0)); else if (!strcmp(av[1], "is_descendant_of")) printf("%s(A,X):%d\n", av[1], repo_is_descendant_of(r, A, X)); + else if (!strcmp(av[1], "get_branch_base_for_tip")) + printf("%s(A,X):%d\n", av[1], get_branch_base_for_tip(r, A, X_array, X_nr)); else if (!strcmp(av[1], "get_merge_bases_many")) { struct commit_list *list = NULL; if (repo_get_merge_bases_many(the_repository, diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index b330945f497840..e789a4720c15ca 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -612,4 +612,65 @@ test_expect_success 'for-each-ref merged:none' ' --format="%(refname)" --stdin ' +# For get_branch_base_for_tip, we only care about +# first-parent history. Here is the test graph with +# second parents removed: +# +# (10,10) +# / +# (10,9) (9,10) +# / / +# (10,8) (9,9) (8,10) +# / / / +# ( continued...) +# \ / / / +# (3,1) (2,2) (1,3) +# \ / / +# (2,1) (1,2) +# \ / +# (1,1) +# +# In short, for a commit (i,j), the first-parent history +# walks all commits (i, k) with k from j to 1, then the +# commits (l, 1) with l from i to 1. + +test_expect_success 'get_branch_base_for_tip: none reach' ' + # (2,3) branched from the first tip (i,4) in X with i > 2 + cat >input <<-\EOF && + A:commit-2-3 + X:commit-1-2 + X:commit-1-4 + X:commit-4-4 + X:commit-8-4 + X:commit-10-4 + EOF + echo "get_branch_base_for_tip(A,X):2" >expect && + test_all_modes get_branch_base_for_tip +' + +test_expect_success 'get_branch_base_for_tip: equal to tip' ' + # (2,3) branched from the first tip (i,4) in X with i > 2 + cat >input <<-\EOF && + A:commit-8-4 + X:commit-1-2 + X:commit-1-4 + X:commit-4-4 + X:commit-8-4 + X:commit-10-4 + EOF + echo "get_branch_base_for_tip(A,X):3" >expect && + test_all_modes get_branch_base_for_tip +' + +test_expect_success 'get_branch_base_for_tip: all reach tip' ' + # (2,3) branched from the first tip (i,4) in X with i > 2 + cat >input <<-\EOF && + A:commit-4-1 + X:commit-4-2 + X:commit-5-1 + EOF + echo "get_branch_base_for_tip(A,X):0" >expect && + test_all_modes get_branch_base_for_tip +' + test_done From 5240c2a7b328e3d356574a1ab00e2faa8a71d92a Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Tue, 13 Aug 2024 09:19:16 -0400 Subject: [PATCH 2/4] commit: add gentle reference lookup method The lookup_commit_reference_by_name() method uses lookup_commit_reference() without an option to use lookup_commit_reference_gently(). Create a gentle version of the method so it can be used in locations where non-commits may be found but error messages should be silenced. Signed-off-by: Derrick Stolee --- commit.c | 8 +++++++- commit.h | 2 ++ 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/commit.c b/commit.c index 1a479a997c4e47..ed49be8dce5a1d 100644 --- a/commit.c +++ b/commit.c @@ -82,13 +82,19 @@ struct commit *lookup_commit(struct repository *r, const struct object_id *oid) } struct commit *lookup_commit_reference_by_name(const char *name) +{ + return lookup_commit_reference_by_name_gently(name, 0); +} + +struct commit *lookup_commit_reference_by_name_gently(const char *name, + int quiet) { struct object_id oid; struct commit *commit; if (repo_get_oid_committish(the_repository, name, &oid)) return NULL; - commit = lookup_commit_reference(the_repository, &oid); + commit = lookup_commit_reference_gently(the_repository, &oid, quiet); if (repo_parse_commit(the_repository, commit)) return NULL; return commit; diff --git a/commit.h b/commit.h index 62fe0d77a70738..ef17668cc6977d 100644 --- a/commit.h +++ b/commit.h @@ -81,6 +81,8 @@ struct commit *lookup_commit_reference_gently(struct repository *r, const struct object_id *oid, int quiet); struct commit *lookup_commit_reference_by_name(const char *name); +struct commit *lookup_commit_reference_by_name_gently(const char *name, + int quiet); /* * Look up object named by "oid", dereference tag as necessary, From df05cee6003d7ad5ea750835cad763848365c043 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Wed, 3 Jul 2024 14:58:38 -0400 Subject: [PATCH 3/4] for-each-ref: add 'is-base' token The previous change introduced the get_branch_base_for_tip() method in commit-reach.c. The motivation of that change was about using a heuristic to deteremine the base branch for a source commit from a list of candidate commit tips. This change makes that algorithm visible to users via a new atom in the 'git for-each-ref' format. This change is very similar to the chang in 49abcd21da6 (for-each-ref: add ahead-behind format atom, 2023-03-20). Introduce the 'is-base:' atom, which will indicate that the algorithm should be computed and the result of the algorithm is reported using an indicator of the form '()'. For example, using '%(is-base:HEAD)' would result in one line having the token '(HEAD)'. Use the sorted order of refs included in the ref filter to break ties in the algorithm's heuristic. In the previous change, the motivating examples include using an L0 trunk, long-lived L1 branches, and temporary release branches. A caller could communicate the ordered preference among these categories using the input refpecs and avoiding a different sort mechanism. This sorting behavior is tested in the test scripts. It is important to include this atom as a special case to can_do_iterative_format() to match the expectations created in bd98f9774e1 (ref-filter.c: filter & format refs in the same callback, 2023-11-14). The ahead-behind atom was one of the special cases, and this similarly requires using an algorithm across all input refs before starting the format of any single ref. In the test script, the format tokens use colons or lack whitespace to avoid Git complaining about trailing whitespace errors. Signed-off-by: Derrick Stolee --- Documentation/git-for-each-ref.txt | 42 ++++++++++++++++ ref-filter.c | 77 +++++++++++++++++++++++++++++- ref-filter.h | 15 ++++++ t/t6300-for-each-ref.sh | 9 ++++ t/t6600-test-reach.sh | 60 +++++++++++++++++++++++ 5 files changed, 202 insertions(+), 1 deletion(-) diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt index c1dd12b93cfd5c..d3764401a2334b 100644 --- a/Documentation/git-for-each-ref.txt +++ b/Documentation/git-for-each-ref.txt @@ -264,6 +264,48 @@ ahead-behind::: commits ahead and behind, respectively, when comparing the output ref to the `` specified in the format. +is-base::: + In at most one row, `()` will appear to indicate the ref + that is most likely the ref used as a starting point for the branch + that produced ``. This choice is made using a heuristic: + choose the ref that minimizes the number of commits in the + first-parent history of `` and not in the first-parent + history of the ref. ++ +For example, consider the following figure of first-parent histories of +several refs: ++ +---- +*--*--*--*--*--* refs/heads/A +\ + \ + *--*--*--* refs/heads/B + \ \ + \ \ + * * refs/heads/C + \ + \ + *--* refs/heads/D +---- ++ +Here, if `A`, `B`, and `C` are the filtered references, and the format +string is `%(refname):%(is-base:D)`, then the output would be ++ +---- +refs/heads/A: +refs/heads/B:(D) +refs/heads/C: +---- ++ +This is because the first-parent history of `D` has its earliest +intersection with the first-parent histories of the filtered refs at a +common first-parent ancestor of `B` and `C` and ties are broken by the +earliest ref in the sorted order. ++ +Note that this token will not appear if the first-parent history of +`` does not intersect the first-parent histories of the +filtered refs. + describe[:options]:: A human-readable name, like linkgit:git-describe[1]; empty string for undescribable commits. The `describe` string may diff --git a/ref-filter.c b/ref-filter.c index 59ad6f54ddb098..3d598f6b6e6de3 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -167,6 +167,7 @@ enum atom_type { ATOM_ELSE, ATOM_REST, ATOM_AHEADBEHIND, + ATOM_ISBASE, }; /* @@ -889,6 +890,23 @@ static int ahead_behind_atom_parser(struct ref_format *format, return 0; } +static int is_base_atom_parser(struct ref_format *format, + struct used_atom *atom UNUSED, + const char *arg, struct strbuf *err) +{ + struct string_list_item *item; + + if (!arg) + return strbuf_addf_ret(err, -1, _("expected format: %%(is-base:)")); + + item = string_list_append(&format->is_base_tips, arg); + item->util = lookup_commit_reference_by_name(arg); + if (!item->util) + die("failed to find '%s'", arg); + + return 0; +} + static int head_atom_parser(struct ref_format *format UNUSED, struct used_atom *atom, const char *arg, struct strbuf *err) @@ -952,6 +970,7 @@ static struct { [ATOM_ELSE] = { "else", SOURCE_NONE }, [ATOM_REST] = { "rest", SOURCE_NONE, FIELD_STR, rest_atom_parser }, [ATOM_AHEADBEHIND] = { "ahead-behind", SOURCE_OTHER, FIELD_STR, ahead_behind_atom_parser }, + [ATOM_ISBASE] = { "is-base", SOURCE_OTHER, FIELD_STR, is_base_atom_parser }, /* * Please update $__git_ref_fieldlist in git-completion.bash * when you add new atoms @@ -2334,6 +2353,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err) int i; struct object_info empty = OBJECT_INFO_INIT; int ahead_behind_atoms = 0; + int is_base_atoms = 0; CALLOC_ARRAY(ref->value, used_atom_cnt); @@ -2475,6 +2495,15 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err) v->s = xstrdup(""); } continue; + } else if (atom_type == ATOM_ISBASE) { + if (ref->is_base && ref->is_base[is_base_atoms]) { + v->s = xstrfmt("(%s)", ref->is_base[is_base_atoms]); + free(ref->is_base[is_base_atoms]); + } else { + v->s = xstrdup(""); + } + is_base_atoms++; + continue; } else continue; @@ -2876,6 +2905,7 @@ static void free_array_item(struct ref_array_item *item) free(item->value); } free(item->counts); + free(item->is_base); free(item); } @@ -3040,6 +3070,49 @@ void filter_ahead_behind(struct repository *r, free(commits); } +void filter_is_base(struct repository *r, + struct ref_format *format, + struct ref_array *array) +{ + struct commit **bases; + size_t bases_nr = 0; + struct ref_array_item **back_index; + + if (!format->is_base_tips.nr || !array->nr) + return; + + CALLOC_ARRAY(back_index, array->nr); + CALLOC_ARRAY(bases, array->nr); + + for (size_t i = 0; i < array->nr; i++) { + const char *name = array->items[i]->refname; + struct commit *c = lookup_commit_reference_by_name_gently(name, 1); + + CALLOC_ARRAY(array->items[i]->is_base, format->is_base_tips.nr); + + if (!c) + continue; + + back_index[bases_nr] = array->items[i]; + bases[bases_nr] = c; + bases_nr++; + } + + for (size_t i = 0; i < format->is_base_tips.nr; i++) { + struct commit *tip = format->is_base_tips.items[i].util; + int base_index = get_branch_base_for_tip(r, tip, bases, bases_nr); + + if (base_index < 0) + continue; + + /* Store the string for use in output later. */ + back_index[base_index]->is_base[i] = xstrdup(format->is_base_tips.items[i].string); + } + + free(back_index); + free(bases); +} + static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref_fn fn, void *cb_data) { int ret = 0; @@ -3126,7 +3199,8 @@ static inline int can_do_iterative_format(struct ref_filter *filter, return !(filter->reachable_from || filter->unreachable_from || sorting || - format->bases.nr); + format->bases.nr || + format->is_base_tips.nr); } void filter_and_format_refs(struct ref_filter *filter, unsigned int type, @@ -3150,6 +3224,7 @@ void filter_and_format_refs(struct ref_filter *filter, unsigned int type, struct ref_array array = { 0 }; filter_refs(&array, filter, type); filter_ahead_behind(the_repository, format, &array); + filter_is_base(the_repository, format, &array); ref_array_sort(sorting, &array); print_formatted_ref_array(&array, format); ref_array_clear(&array); diff --git a/ref-filter.h b/ref-filter.h index 0ca28d2bba6f2a..20419a56218796 100644 --- a/ref-filter.h +++ b/ref-filter.h @@ -48,6 +48,7 @@ struct ref_array_item { struct commit *commit; struct atom_value *value; struct ahead_behind_count **counts; + char **is_base; char refname[FLEX_ARRAY]; }; @@ -101,6 +102,9 @@ struct ref_format { /* List of bases for ahead-behind counts. */ struct string_list bases; + /* List of bases for is-base indicators. */ + struct string_list is_base_tips; + struct { int max_count; int omit_empty; @@ -114,6 +118,7 @@ struct ref_format { #define REF_FORMAT_INIT { \ .use_color = -1, \ .bases = STRING_LIST_INIT_DUP, \ + .is_base_tips = STRING_LIST_INIT_DUP, \ } /* Macros for checking --merged and --no-merged options */ @@ -203,6 +208,16 @@ void filter_ahead_behind(struct repository *r, struct ref_format *format, struct ref_array *array); +/* + * If the provided format includes is-base atoms, then compute the base checks + * for those tips against all refs. + * + * If this is not called, then any is-base atoms will be blank. + */ +void filter_is_base(struct repository *r, + struct ref_format *format, + struct ref_array *array); + void ref_filter_init(struct ref_filter *filter); void ref_filter_clear(struct ref_filter *filter); diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh index eb6c8204e8bd0e..8d15713cc6709c 100755 --- a/t/t6300-for-each-ref.sh +++ b/t/t6300-for-each-ref.sh @@ -1907,6 +1907,15 @@ test_expect_success 'git for-each-ref with nested tags' ' test_cmp expect actual ' +test_expect_success 'is-base atom with non-commits' ' + git for-each-ref --format="%(is-base:HEAD) %(refname)" >out 2>err && + grep "(HEAD) refs/heads/main" out && + + test_line_count = 2 err && + grep "error: object .* is a commit, not a blob" err && + grep "error: bad tag pointer to" err +' + GRADE_FORMAT="%(signature:grade)%0a%(signature:key)%0a%(signature:signer)%0a%(signature:fingerprint)%0a%(signature:primarykeyfingerprint)" TRUSTLEVEL_FORMAT="%(signature:trustlevel)%0a%(signature:key)%0a%(signature:signer)%0a%(signature:fingerprint)%0a%(signature:primarykeyfingerprint)" diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index e789a4720c15ca..2591f8b8b39bf4 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -673,4 +673,64 @@ test_expect_success 'get_branch_base_for_tip: all reach tip' ' test_all_modes get_branch_base_for_tip ' +test_expect_success 'for-each-ref is-base: none reach' ' + cat >input <<-\EOF && + refs/heads/commit-1-1 + refs/heads/commit-4-2 + refs/heads/commit-4-4 + refs/heads/commit-8-4 + EOF + cat >expect <<-\EOF && + refs/heads/commit-1-1: + refs/heads/commit-4-2:(commit-2-3) + refs/heads/commit-4-4: + refs/heads/commit-8-4: + EOF + run_all_modes git for-each-ref \ + --format="%(refname):%(is-base:commit-2-3)" --stdin +' + +test_expect_success 'for-each-ref is-base: all reach' ' + cat >input <<-\EOF && + refs/heads/commit-4-2 + refs/heads/commit-5-1 + EOF + cat >expect <<-\EOF && + refs/heads/commit-4-2:(commit-4-1) + refs/heads/commit-5-1: + EOF + run_all_modes git for-each-ref \ + --format="%(refname):%(is-base:commit-4-1)" --stdin +' + +test_expect_success 'for-each-ref is-base: equal to tip' ' + cat >input <<-\EOF && + refs/heads/commit-4-2 + refs/heads/commit-5-1 + EOF + cat >expect <<-\EOF && + refs/heads/commit-4-2:(commit-4-2) + refs/heads/commit-5-1: + EOF + run_all_modes git for-each-ref \ + --format="%(refname):%(is-base:commit-4-2)" --stdin +' + +test_expect_success 'for-each-ref is-base:multiple' ' + cat >input <<-\EOF && + refs/heads/commit-1-1 + refs/heads/commit-4-2 + refs/heads/commit-4-4 + refs/heads/commit-8-4 + EOF + cat >expect <<-\EOF && + refs/heads/commit-1-1[-] + refs/heads/commit-4-2[(commit-2-3)-] + refs/heads/commit-4-4[-] + refs/heads/commit-8-4[-(commit-6-5)] + EOF + run_all_modes git for-each-ref \ + --format="%(refname)[%(is-base:commit-2-3)-%(is-base:commit-6-5)]" --stdin +' + test_done From cce9921bbd8825c3a12d32b2c8b62a2c4b06333a Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Mon, 8 Jul 2024 11:20:19 -0400 Subject: [PATCH 4/4] p1500: add is-base performance tests The previous two changes introduced a commit walking heuristic for finding the most likely base branch for a given source. This algorithm walks first-parent histories until reaching a collision. This walk _should_ be very fast. Exceptions include cases where a commit-graph file does not exist, leading to a full walk of all reachable commits to compute generation numbers, or a case where no collision in the first-parent history exists, leading to a walk of all first-parent history to the root commits. The p1500 test script guarantees a complete commit-graph file during its setup, so we will not test that scenario. Do create a new root commit in an effort to test the scenario of parallel first-parent histories. Even with the extra root commit, these tests take no longer than 0.02 seconds on my machine for the Git repository. However, the results are slightly more interesting in a copy of the Linux kernel repository: Test --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.12 1500.3: ahead-behind counts: git branch 0.12 1500.4: ahead-behind counts: git tag 0.12 1500.5: contains: git for-each-ref --merged 0.04 1500.6: contains: git branch --merged 0.04 1500.7: contains: git tag --merged 0.04 1500.8: is-base check: test-tool reach (refs) 0.03 1500.9: is-base check: test-tool reach (tags) 0.03 1500.10: is-base check: git for-each-ref 0.03 1500.11: is-base check: git for-each-ref (disjoint-base) 0.07 Signed-off-by: Derrick Stolee --- t/perf/p1500-graph-walks.sh | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh index e14e7620cce383..5b23ce5db93ad3 100755 --- a/t/perf/p1500-graph-walks.sh +++ b/t/perf/p1500-graph-walks.sh @@ -20,6 +20,21 @@ test_expect_success 'setup' ' echo tag-$ref || return 1 done >tags && + + echo "A:HEAD" >test-tool-refs && + for line in $(cat refs) + do + echo "X:$line" >>test-tool-refs || return 1 + done && + echo "A:HEAD" >test-tool-tags && + for line in $(cat tags) + do + echo "X:$line" >>test-tool-tags || return 1 + done && + + commit=$(git commit-tree $(git rev-parse HEAD^{tree})) && + git update-ref refs/heads/disjoint-base $commit && + git commit-graph write --reachable ' @@ -47,4 +62,20 @@ test_perf 'contains: git tag --merged' ' xargs git tag --merged=HEAD