forked from git/git
[WORK-IN-PROGRESS] Introduce the path walk API into Git for Windows #5146
Closed
45 commits
Commit | Title | Author |
---|---|---|
6354d7a | path-walk: introduce an object walk by path | derrickstolee |
c8e08c3 | backfill: add builtin boilerplate | derrickstolee |
b05a276 | backfill: basic functionality and tests | derrickstolee |
e02f7b3 | backfill: add --batch-size=<n> option | derrickstolee |
4236e4f | backfill: add --sparse option | derrickstolee |
31c9b45 | path-walk: allow consumer to specify object types | derrickstolee |
356abc9 | backfill: assume --sparse when sparse-checkout is enabled | derrickstolee |
3a421ff | path-walk: allow visiting tags | derrickstolee |
b9471b6 | survey: stub in new experimental `git-survey` command | jeffhostetler |
c4b3490 | survey: add command line opts to select references | jeffhostetler |
ca37a49 | survey: collect the set of requested refs | jeffhostetler |
0a20a17 | survey: start pretty printing data in table form | derrickstolee |
91c4d57 | survey: add object count summary | derrickstolee |
53632be | revision: create mark_trees_uninteresting_dense() | derrickstolee |
c63928e | survey: summarize total sizes by object type | derrickstolee |
3e9b671 | path-walk: add prune_all_uninteresting option | derrickstolee |
af7d53f | survey: show progress during object walk | derrickstolee |
d192ae7 | pack-objects: add --path-walk option | derrickstolee |
5f7e131 | survey: add ability to track prioritized lists | derrickstolee |
ab0bc08 | pack-objects: extract should_attempt_deltas() | derrickstolee |
bd8b5b5 | survey: add report of "largest" paths | derrickstolee |
c6d4832 | pack-objects: introduce GIT_TEST_PACK_PATH_WALK | derrickstolee |
c2092f0 | p5313: add size comparison test | derrickstolee |
bbc57f7 | repack: add --path-walk option | derrickstolee |
32fca07 | pack-objects: enable --path-walk via config | derrickstolee |
c145b9e | pack-objects: add --full-name-hash option | derrickstolee |
72191a0 | test-name-hash: add helper to compute name-hash functions | derrickstolee |
5039f03 | p5314: add a size test for name-hash collisions | derrickstolee |
e43582c | scalar: enable path-walk during push via config | derrickstolee |
88fee5b | pack-objects: output debug info about deltas | derrickstolee |
d17e503 | Merge branch 'backfill' | dscho |
d7e7283 | Merge branch 'survey' | dscho |
98a5786 | Merge branch 'pack-path-walk' | dscho |
9d0690a | Merge branch 'path-walk' | dscho |
556335a | fixup! survey: collect the set of requested refs | dscho |
69aa8d8 | fixup! pack-objects: output debug info about deltas | dscho |
5001883 | fixup! survey: summarize total sizes by object type | dscho |
3ab1bda | fixup! survey: add report of "largest" paths | dscho |
84c8a06 | fixup! survey: summarize total sizes by object type | dscho |
16cd9a3 | fixup! pack-objects: output debug info about deltas | dscho |
c8f1239 | fixup! survey: start pretty printing data in table form | dscho |
b5c2265 | fixup! survey: add object count summary | dscho |
fee8f88 | fixup! survey: summarize total sizes by object type | dscho |
489ce0c | test-tool: add the `path-walk` subcommand | dscho |
9b78d40 | fixup! test-tool: add the `path-walk` subcommand | dscho |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,60 @@ | ||
git-backfill(1) | ||
=============== | ||
|
||
NAME | ||
---- | ||
git-backfill - Download missing objects in a partial clone | ||
|
||
|
||
SYNOPSIS | ||
-------- | ||
[verse] | ||
'git backfill' [--batch-size=<n>] [--[no-]sparse] | ||
|
||
DESCRIPTION | ||
----------- | ||
|
||
Blobless partial clones are created using `git clone --filter=blob:none`, | ||
which configures the local repository such that the Git client avoids | ||
downloading blob objects unless they are required for a local operation. | ||
Initially, this means that the clone and later fetches download reachable | ||
commits and trees but no blobs. Later operations that change the `HEAD` | ||
pointer, such as `git checkout` or `git merge`, may need to download | ||
missing blobs in order to complete their operation. | ||
|
||
In the worst cases, commands that compute blob diffs, such as `git blame`, | ||
become very slow because they download each missing blob in its own | ||
request, at the moment the Git command needs it. This leads to many | ||
download requests and gives the Git server no opportunity to provide | ||
delta compression across those objects. | ||
|
||
The `git backfill` command lets the user request that Git download the | ||
missing blobs (with optional filters), so that blobs representing | ||
historical versions of files arrive in batches. The `backfill` command | ||
attempts to optimize the request by grouping blobs that appear at the | ||
same path, which should lead to good delta compression in the packfile | ||
sent by the server. | ||
|
||
By default, `git backfill` downloads all blobs reachable from the `HEAD` | ||
commit. This set can be restricted or expanded using various options. | ||
|
||
OPTIONS | ||
------- | ||
|
||
--batch-size=<n>:: | ||
Specify the minimum number of missing objects to request from the | ||
server in one batch. A batch may exceed this size because all blobs | ||
seen at a given path are added before the size is checked. The | ||
default batch size is 16,000. | ||
|
||
--[no-]sparse:: | ||
Only download objects if they appear at a path that matches the | ||
current sparse-checkout. If the sparse-checkout feature is enabled, | ||
then `--sparse` is assumed and can be disabled with `--no-sparse`. | ||
|
||
SEE ALSO | ||
-------- | ||
linkgit:git-clone[1]. | ||
|
||
GIT | ||
--- | ||
Part of the linkgit:git[1] suite |
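
The batching rule described under `--batch-size` can be sketched in isolation. The block below is a minimal simulation, not the code from this pull request: `batch_sketch`, `add_path`, and `flush_batch` are hypothetical names, and the download is replaced by counters. It assumes, as the option text suggests, that all missing blobs for one path are queued before the threshold is checked, which is why a request can be larger than the configured minimum. Unlike the real command, the sketch skips a flush when nothing is queued.

```c
#include <stddef.h>

/* Hypothetical stand-in for the command's state; not Git's real struct. */
struct batch_sketch {
	size_t pending;   /* blobs queued but not yet requested */
	size_t min_batch; /* plays the role of --batch-size=<n> */
	size_t requests;  /* number of simulated download requests */
	size_t largest;   /* size of the largest request so far */
};

/* Simulate one request for everything queued; a no-op when empty. */
static void flush_batch(struct batch_sketch *b)
{
	if (!b->pending)
		return;
	b->requests++;
	if (b->pending > b->largest)
		b->largest = b->pending;
	b->pending = 0;
}

/*
 * Queue every missing blob found at one path, then check the threshold
 * once. Checking only after the whole path is queued keeps blobs from
 * the same path in the same request, at the cost of overshooting.
 */
static void add_path(struct batch_sketch *b, size_t missing_at_path)
{
	b->pending += missing_at_path;
	if (b->pending >= b->min_batch)
		flush_batch(b);
}
```

A caller would invoke `add_path()` once per path and `flush_batch()` once at the end to request whatever did not fill a batch.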
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
git-survey(1) | ||
============= | ||
|
||
NAME | ||
---- | ||
git-survey - EXPERIMENTAL: Measure various repository dimensions of scale | ||
|
||
SYNOPSIS | ||
-------- | ||
[verse] | ||
(EXPERIMENTAL!) `git survey` <options> | ||
|
||
DESCRIPTION | ||
----------- | ||
|
||
Survey the repository and measure various dimensions of scale. | ||
|
||
As repositories grow to "monorepo" size, certain data shapes can cause | ||
performance problems. `git-survey` attempts to measure and report on | ||
known problem areas. | ||
|
||
Ref Selection and Reachable Objects | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
In this first analysis phase, `git survey` will iterate over the set of | ||
requested branches, tags, and other refs and treewalk over all of the | ||
reachable commits, trees, and blobs and generate various statistics. | ||
|
||
OPTIONS | ||
------- | ||
|
||
--progress:: | ||
Show progress. This is automatically enabled when interactive. | ||
|
||
Ref Selection | ||
~~~~~~~~~~~~~ | ||
|
||
The following options control the set of refs that `git survey` will examine. | ||
By default, `git survey` will look at tags, local branches, and remote refs. | ||
If any of the following options are given, the default set is cleared and | ||
only refs for the given options are added. | ||
|
||
--all-refs:: | ||
Use all refs. This includes local branches, tags, remote refs, | ||
notes, and stashes. This option overrides all of the following. | ||
|
||
--branches:: | ||
Add local branches (`refs/heads/`) to the set. | ||
|
||
--tags:: | ||
Add tags (`refs/tags/`) to the set. | ||
|
||
--remotes:: | ||
Add remote branches (`refs/remotes/`) to the set. | ||
|
||
--detached:: | ||
Add HEAD to the set. | ||
|
||
--other:: | ||
Add notes (`refs/notes/`) and stashes (`refs/stash`) to the set. | ||
|
||
OUTPUT | ||
------ | ||
|
||
By default, `git survey` will print information about the repository in a | ||
human-readable format that includes overviews and tables. | ||
|
||
GIT | ||
--- | ||
Part of the linkgit:git[1] suite |
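
The ref selection rules in the OPTIONS section amount to a small decision procedure, sketched below. This is an illustration only: the `ref_kind` flags and `resolve_ref_selection()` are hypothetical names, not the identifiers used by `git survey`. The text does not say how `--all-refs` treats `HEAD`, so the sketch assumes that it leaves the `--detached` choice alone.

```c
/* Hypothetical flags, one per ref selection option in the man page. */
enum ref_kind {
	REF_BRANCHES = 1 << 0, /* --branches */
	REF_TAGS     = 1 << 1, /* --tags */
	REF_REMOTES  = 1 << 2, /* --remotes */
	REF_DETACHED = 1 << 3, /* --detached */
	REF_OTHER    = 1 << 4, /* --other (notes and stashes) */
};

/*
 * Turn the options given on the command line into the set of ref
 * kinds to examine.
 *
 * requested: bitwise OR of the kinds named explicitly.
 * all_refs:  nonzero when --all-refs was given.
 */
static unsigned resolve_ref_selection(unsigned requested, int all_refs)
{
	/* --all-refs overrides the individual kind options. */
	if (all_refs)
		return REF_BRANCHES | REF_TAGS | REF_REMOTES | REF_OTHER |
		       (requested & REF_DETACHED); /* assumption, see above */

	/* No option given: tags, local branches, and remote refs. */
	if (!requested)
		return REF_TAGS | REF_BRANCHES | REF_REMOTES;

	/* Any explicit option clears the default set. */
	return requested;
}
```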
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,141 @@ | ||
#include "builtin.h" | ||
#include "git-compat-util.h" | ||
#include "config.h" | ||
#include "parse-options.h" | ||
#include "repository.h" | ||
#include "commit.h" | ||
#include "dir.h" | ||
#include "environment.h" | ||
#include "hex.h" | ||
#include "tree.h" | ||
#include "tree-walk.h" | ||
#include "object.h" | ||
#include "object-store-ll.h" | ||
#include "oid-array.h" | ||
#include "oidset.h" | ||
#include "promisor-remote.h" | ||
#include "strmap.h" | ||
#include "string-list.h" | ||
#include "revision.h" | ||
#include "trace2.h" | ||
#include "progress.h" | ||
#include "packfile.h" | ||
#include "path-walk.h" | ||
|
||
static const char * const builtin_backfill_usage[] = { | ||
N_("git backfill [--batch-size=<n>] [--[no-]sparse]"), | ||
NULL | ||
}; | ||
|
||
struct backfill_context { | ||
struct repository *repo; | ||
struct oid_array current_batch; | ||
size_t batch_size; | ||
int sparse; | ||
}; | ||
|
||
static void clear_backfill_context(struct backfill_context *ctx) | ||
{ | ||
oid_array_clear(&ctx->current_batch); | ||
} | ||
|
||
static void download_batch(struct backfill_context *ctx) | ||
{ | ||
promisor_remote_get_direct(ctx->repo, | ||
ctx->current_batch.oid, | ||
ctx->current_batch.nr); | ||
oid_array_clear(&ctx->current_batch); | ||
|
||
/* | ||
* We likely have a new packfile. Add it to the packed list to | ||
* avoid possible duplicate downloads of the same objects. | ||
*/ | ||
reprepare_packed_git(ctx->repo); | ||
} | ||
|
||
static int fill_missing_blobs(const char *path, | ||
struct oid_array *list, | ||
enum object_type type, | ||
void *data) | ||
{ | ||
struct backfill_context *ctx = data; | ||
|
||
if (type != OBJ_BLOB) | ||
BUG("fill_missing_blobs only takes blob objects"); | ||
|
||
for (size_t i = 0; i < list->nr; i++) { | ||
off_t size = 0; | ||
struct object_info info = OBJECT_INFO_INIT; | ||
info.disk_sizep = &size; | ||
if (oid_object_info_extended(ctx->repo, | ||
&list->oid[i], | ||
&info, | ||
OBJECT_INFO_FOR_PREFETCH) || | ||
!size) | ||
oid_array_append(&ctx->current_batch, &list->oid[i]); | ||
} | ||
|
||
if (ctx->current_batch.nr >= ctx->batch_size) | ||
download_batch(ctx); | ||
|
||
return 0; | ||
} | ||
|
||
static int do_backfill(struct backfill_context *ctx) | ||
{ | ||
struct rev_info revs; | ||
struct path_walk_info info = PATH_WALK_INFO_INIT; | ||
int ret; | ||
|
||
if (ctx->sparse) { | ||
CALLOC_ARRAY(info.pl, 1); | ||
if (get_sparse_checkout_patterns(info.pl)) | ||
return error(_("problem loading sparse-checkout")); | ||
} | ||
|
||
repo_init_revisions(ctx->repo, &revs, ""); | ||
handle_revision_arg("HEAD", &revs, 0, 0); | ||
|
||
info.revs = &revs; | ||
info.path_fn = fill_missing_blobs; | ||
info.path_fn_data = ctx; | ||
|
||
ret = walk_objects_by_path(&info); | ||
|
||
/* Download the objects that did not fill a batch. */ | ||
if (!ret) | ||
download_batch(ctx); | ||
|
||
clear_backfill_context(ctx); | ||
return ret; | ||
} | ||
|
||
int cmd_backfill(int argc, const char **argv, const char *prefix) | ||
{ | ||
struct backfill_context ctx = { | ||
.repo = the_repository, | ||
.current_batch = OID_ARRAY_INIT, | ||
.batch_size = 16000, | ||
.sparse = -1, /* unset, so the "ctx.sparse < 0" fallback below can apply */ | ||
}; | ||
struct option options[] = { | ||
OPT_INTEGER(0, "batch-size", &ctx.batch_size, | ||
N_("Minimum number of objects to request at a time")), | ||
OPT_BOOL(0, "sparse", &ctx.sparse, | ||
N_("Restrict the missing objects to the current sparse-checkout")), | ||
OPT_END(), | ||
}; | ||
|
||
if (argc == 2 && !strcmp(argv[1], "-h")) | ||
usage_with_options(builtin_backfill_usage, options); | ||
|
||
argc = parse_options(argc, argv, prefix, options, builtin_backfill_usage, | ||
0); | ||
|
||
git_config(git_default_config, NULL); | ||
|
||
if (ctx.sparse < 0) | ||
ctx.sparse = core_apply_sparse_checkout; | ||
|
||
return do_backfill(&ctx); | ||
} |
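
The `ctx.sparse < 0` check in `cmd_backfill()` relies on a tri-state convention: a negative value means that neither `--sparse` nor `--no-sparse` was given, so the sparse-checkout setting decides. A minimal sketch of that resolution follows, assuming the field starts out negative before option parsing; `resolve_sparse()` is a hypothetical helper, not a function in this pull request.

```c
/*
 * Resolve a tri-state command-line flag against a configured default.
 *
 * cli_value: -1 when the option was not given, 0 for --no-sparse,
 *            1 for --sparse.
 * sparse_checkout_enabled: nonzero when the sparse-checkout feature
 *            is on (the role core_apply_sparse_checkout plays above).
 *
 * Returns 1 when the walk should be restricted to the sparse-checkout
 * paths, 0 otherwise.
 */
static int resolve_sparse(int cli_value, int sparse_checkout_enabled)
{
	if (cli_value < 0)
		return sparse_checkout_enabled ? 1 : 0;
	return cli_value ? 1 : 0;
}
```

If the field started at 0 instead, an absent option would be indistinguishable from `--no-sparse` and the fallback would never run.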
Do we have to release the allocated memory somewhere?