Skip to content

Commit

Permalink
Add path walk API and its use in 'git pack-objects' (#5171)
Browse files Browse the repository at this point in the history
This is a follow up to #5157 as well as motivated by the RFC in
gitgitgadget#1786.

We have ways of walking all objects, but it is focused on visiting a
single commit and then expanding the new trees and blobs reachable from
that commit that have not been visited yet. This means that objects
arrive without any locality based on their path.

Add a new "path walk API" that focuses on walking objects in batches
according to their type and path. This will walk all annotated tags, all
commits, all root trees, and then start a depth-first search among all
paths in the repo to collect trees and blobs in batches.

The most important application for this is being fast-tracked to Git for
Windows: `git pack-objects --path-walk`. This application of the path
walk API discovers the objects to pack via this batched walk, and
automatically groups objects that appear at a common path so they can be
checked for delta comparisons.

This use completely avoids any name-hash collisions (even the collisions
that sometimes occur with the new `--full-name-hash` option) and can be
much faster to compute since the first pass of delta calculations does
not waste time on objects that are unlikely to be diffable.

Some statistics are available in the commit messages.
  • Loading branch information
derrickstolee authored and dscho committed Feb 5, 2025
2 parents 2b5c888 + b07ced2 commit 47153c7
Show file tree
Hide file tree
Showing 23 changed files with 537 additions and 42 deletions.
4 changes: 4 additions & 0 deletions Documentation/config/feature.txt
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,10 @@ walking fewer objects.
+
* `pack.allowPackReuse=multi` may improve the time it takes to create a pack by
reusing objects from multiple packs instead of just one.
+
* `pack.usePathWalk` may speed up packfile creation and make the packfiles be
significantly smaller in the presence of certain filename collisions with Git's
default name-hash.

feature.manyFiles::
Enable config options that optimize for repos with many files in the
Expand Down
8 changes: 8 additions & 0 deletions Documentation/config/pack.txt
Original file line number Diff line number Diff line change
Expand Up @@ -155,6 +155,14 @@ pack.useSparse::
commits contain certain types of direct renames. Default is
`true`.

pack.usePathWalk::
When true, git will default to using the '--path-walk' option in
'git pack-objects' when the '--revs' option is present. This
algorithm groups objects by path to maximize the ability to
compute delta chains across historical versions of the same
object. This may disable other options, such as using bitmaps to
enumerate objects.

pack.preferBitmapTips::
When selecting which commits will receive bitmaps, prefer a
commit at the tip of any reference that is a suffix of any value
Expand Down
12 changes: 11 additions & 1 deletion Documentation/git-pack-objects.txt
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ SYNOPSIS
[--cruft] [--cruft-expiration=<time>]
[--stdout [--filter=<filter-spec>] | <base-name>]
[--shallow] [--keep-true-parents] [--[no-]sparse]
[--full-name-hash] < <object-list>
[--full-name-hash] [--path-walk] < <object-list>


DESCRIPTION
Expand Down Expand Up @@ -346,6 +346,16 @@ raise an error.
Restrict delta matches based on "islands". See DELTA ISLANDS
below.

--path-walk::
By default, `git pack-objects` walks objects in an order that
presents trees and blobs in an order unrelated to the path they
appear relative to a commit's root tree. The `--path-walk` option
enables a different walking algorithm that organizes trees and
blobs by path. This has the potential to improve delta compression
especially in the presence of filenames that cause collisions in
Git's default name-hash algorithm. Due to changing how the objects
are walked, this option is not compatible with `--delta-islands`,
`--shallow`, or `--filter`.

DELTA ISLANDS
-------------
Expand Down
15 changes: 14 additions & 1 deletion Documentation/git-repack.txt
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ SYNOPSIS
[verse]
'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]
[--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>]
[--write-midx] [--full-name-hash]
[--write-midx] [--full-name-hash] [--path-walk]

DESCRIPTION
-----------
Expand Down Expand Up @@ -251,6 +251,19 @@ linkgit:git-multi-pack-index[1]).
Write a multi-pack index (see linkgit:git-multi-pack-index[1])
containing the non-redundant packs.

--path-walk::
This option passes the `--path-walk` option to the underlying
`git pack-options` process (see linkgit:git-pack-objects[1]).
By default, `git pack-objects` walks objects in an order that
presents trees and blobs in an order unrelated to the path they
appear relative to a commit's root tree. The `--path-walk` option
enables a different walking algorithm that organizes trees and
blobs by path. This has the potential to improve delta compression
especially in the presence of filenames that cause collisions in
Git's default name-hash algorithm. Due to changing how the objects
are walked, this option is not compatible with `--delta-islands`
or `--filter`.

CONFIGURATION
-------------

Expand Down
3 changes: 2 additions & 1 deletion Documentation/technical/api-path-walk.txt
Original file line number Diff line number Diff line change
Expand Up @@ -60,4 +60,5 @@ Examples
--------

See example usages in:
`t/helper/test-path-walk.c`
`t/helper/test-path-walk.c`,
`builtin/pack-objects.c`
Loading

0 comments on commit 47153c7

Please sign in to comment.