////////////////////////////////////////////////////////////////////////////////////
//
// University of Illinois/NCSA Open Source License
// http://otm.illinois.edu/disclose-protect/illinois-open-source-license
//
// Parfu is copyright (c) 2017-2022, The Trustees of the University of Illinois.
// All rights reserved.
//
// Parfu was developed by:
// The University of Illinois
// The National Center For Supercomputing Applications (NCSA)
// Blue Waters Science and Engineering Applications Support Team (SEAS)
// Craig P Steffen <[email protected]>
// Roland Haas <[email protected]>
//
// https://github.com/ncsa/parfu_archive_tool
// http://www.ncsa.illinois.edu/People/csteffen/parfu/
//
// For full license text see the LICENSE file provided with the source
// distribution.
//
///////////////////////////////////////////////////////////////////////////////////
// This file is the re-write of the header format for the 0.6 and later
// C++ port created during the summer and fall of 2022.
// The original format documentation is at the top of the file
// parfu_buffer_utils.com. This supersedes that format, and is used
// by parfu going forward.
// A quick 2022 note: pad files are no longer a thing, so all the references
// to padding and pad files are gone. This makes the archive files more
// like native tar files, makes the code simpler, and saves a lot of
// complexity in the archive file format.
// The beginning of the archive file will be the parfu
// catalog, and at the beginning of the catalog is this
// header block, which is approximately human readable.
///////////////////////////////////////////////////////////////////////
//
// catalog file header:
// each of the initial header fields is 10 characters wide and is
// terminated by a '\n'
// all numbers in the catalog header or catalog body lines are written
// out in ASCII. This makes them human readable.
// \n is a newline (single byte)
// \t is a tab character (single byte)
// which characters are actually used as the value delimiters can
// be set in the parfu header file with #define statements
// catalog header format:
// SSSSSSSSSS\n total size of catalog, in bytes, including whole header
// parfu_v06 \n version string
// 000 of 001\n index within multiple archive files
// FFFFFFFFFF\n total number of file entries in catalog
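//
// A minimal sketch of reading the header block above, not the parfu
// implementation. It assumes each of the four fields is exactly 10
// characters followed by '\n' and that the two numeric fields are
// zero-padded decimal digits. CatalogHeader, parse_catalog_header and
// is_all_digits are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical container for the four header fields.
struct CatalogHeader {
  unsigned long long catalog_bytes;  // SSSSSSSSSS
  std::string version;               // e.g. "parfu_v06 " (trailing space kept)
  std::string archive_index;         // e.g. "000 of 001"
  unsigned long long n_entries;      // FFFFFFFFFF
};

// Assumed layout: 4 lines, each a 10-character field plus one '\n'.
const std::size_t kFieldWidth = 10;
const std::size_t kHeaderLines = 4;

// True if every character of s is an ASCII decimal digit.
bool is_all_digits(const std::string& s) {
  for (std::size_t i = 0; i < s.size(); ++i) {
    if (s[i] < '0' || s[i] > '9') return false;
  }
  return true;
}

// Fills `h` from the start of `buf`. Returns false if the buffer is too
// short, a newline is not where the assumed layout expects it, or a
// numeric field contains a non-digit.
bool parse_catalog_header(const std::string& buf, CatalogHeader& h) {
  const std::size_t stride = kFieldWidth + 1;  // field plus newline
  if (buf.size() < kHeaderLines * stride) return false;

  std::string field[kHeaderLines];
  for (std::size_t i = 0; i < kHeaderLines; ++i) {
    if (buf[i * stride + kFieldWidth] != '\n') return false;
    field[i] = buf.substr(i * stride, kFieldWidth);
  }
  if (!is_all_digits(field[0]) || !is_all_digits(field[3])) return false;

  h.catalog_bytes = std::stoull(field[0]);
  h.version = field[1];
  h.archive_index = field[2];
  h.n_entries = std::stoull(field[3]);
  return true;
}
```

// Because the header is fixed-width under these assumptions, a reader
// can fetch the first 44 bytes of the archive, learn the catalog size
// from the first field, and then read the rest of the catalog in one go.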
//////////////////////////////////////////////////////////////////////
//
// Catalog file entry lines
//
// These are the file entry lines that will be written to the front
// of the archive file ON DISK.
// Each directory or file or symlink in the archive is represented in
// the catalog by a single line entry. The format of that line
// is as follows:
// AAA \t T \t TGT \t SZ \t THSZ \t LOC_AR \n
// with the entries defined thusly:
// AAA     relative filename within the archive
// T       type of entry: dir, symlink, or regular file
// TGT     if symlink, the target; otherwise empty
// SZ      size of file in bytes
// THSZ    size of the tar header in bytes
// LOC_AR  beginning of file or fragment in archive file
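//
// A hedged sketch of splitting one catalog entry line into its six
// fields. It assumes the delimiter is a bare '\t' (the spaces in the
// format line above are taken to be for readability only) and that
// SZ, THSZ and LOC_AR are ASCII decimal. The T field is kept as text
// because its encoding is not specified here. CatalogEntry,
// split_fields and parse_catalog_entry are hypothetical names.

```cpp
#include <cstddef>
#include <exception>
#include <string>
#include <vector>

// Hypothetical in-memory form of one catalog line.
struct CatalogEntry {
  std::string name;                    // AAA
  std::string type;                    // T (kept as raw text)
  std::string target;                  // TGT (empty unless symlink)
  unsigned long long size;             // SZ
  unsigned long long tar_header_size;  // THSZ
  unsigned long long archive_offset;   // LOC_AR
};

// Splits on `delim`, keeping empty fields (TGT is usually empty).
std::vector<std::string> split_fields(const std::string& line, char delim) {
  std::vector<std::string> out;
  std::size_t start = 0;
  while (true) {
    std::size_t pos = line.find(delim, start);
    if (pos == std::string::npos) {
      out.push_back(line.substr(start));
      break;
    }
    out.push_back(line.substr(start, pos - start));
    start = pos + 1;
  }
  return out;
}

// Returns false if the field count is not six or a number fails to parse.
bool parse_catalog_entry(std::string line, CatalogEntry& e) {
  if (!line.empty() && line.back() == '\n') line.pop_back();
  std::vector<std::string> f = split_fields(line, '\t');
  if (f.size() != 6) return false;
  try {
    e.name = f[0];
    e.type = f[1];
    e.target = f[2];
    e.size = std::stoull(f[3]);
    e.tar_header_size = std::stoull(f[4]);
    e.archive_offset = std::stoull(f[5]);
  } catch (const std::exception&) {
    return false;  // std::stoull throws on empty or non-numeric input
  }
  return true;
}
```

// Keeping empty fields matters: a splitter that collapses adjacent
// tabs would shift SZ into the TGT position for every non-symlink.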
/////////////////////////////////////////////////////////////////////
//
// MPI "working orders" entry lines
//
// These lines are the format that rank 0 ("boss" rank) uses to
// signal to a worker, via an MPI message, that the worker is to
// move X bytes from target file Y to archive file Z, and exactly
// where to move them.
//
// these lines are a modified version of the catalog lines
// above but contain a couple of additional pieces of
// information
// NNN \t AAA \t T \t TGT \t SZ \t THSZ \t LOC_AR \t LOC_OR \n
// with the entries defined thusly:
// NNN     index identifying which of the multiple open archive files
//         the data is to be written into. The archive file information
//         will be stored in some kind of shared structure.
// AAA     relative filename within the archive
// T       type of entry: dir, symlink, or regular file
// TGT     if symlink, the target; otherwise empty
// SZ      size of file in bytes
// THSZ    size of the tar header in bytes
// LOC_AR  beginning of file or fragment in archive file
// LOC_OR  location of fragment in orig file (zero for single file)
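//
// A small sketch of how a working-orders line could be assembled from
// its eight fields before being sent to a worker. It assumes bare '\t'
// delimiters and plain (unpadded) decimal numbers; the width of NNN is
// not specified here. WorkOrder and format_work_order are hypothetical
// names, and the MPI send itself is deliberately left out.

```cpp
#include <string>

// Hypothetical in-memory form of one working-orders line.
struct WorkOrder {
  unsigned archive_index;              // NNN
  std::string name;                    // AAA
  std::string type;                    // T (kept as raw text)
  std::string target;                  // TGT (empty unless symlink)
  unsigned long long size;             // SZ
  unsigned long long tar_header_size;  // THSZ
  unsigned long long archive_offset;   // LOC_AR
  unsigned long long orig_offset;      // LOC_OR
};

// Builds "NNN\tAAA\tT\tTGT\tSZ\tTHSZ\tLOC_AR\tLOC_OR\n".
std::string format_work_order(const WorkOrder& w) {
  const char delim = '\t';
  std::string s;
  s += std::to_string(w.archive_index);
  s += delim;
  s += w.name;
  s += delim;
  s += w.type;
  s += delim;
  s += w.target;
  s += delim;
  s += std::to_string(w.size);
  s += delim;
  s += std::to_string(w.tar_header_size);
  s += delim;
  s += std::to_string(w.archive_offset);
  s += delim;
  s += std::to_string(w.orig_offset);
  s += '\n';
  return s;
}
```

// Since the line is the catalog entry with one field added at each
// end, a worker could strip NNN and LOC_OR and reuse the same field
// handling as for catalog lines.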
////////////////////////
//
// below are fields and header lines discarded from the file entries
// and the catalog header.
// RRR path+filename, relative to CWD of running process
// catalog header (full):
// SSSSSSSSSS\n total size of catalog, in bytes, including whole header
// parfu_v04 \n version string
// full \n indicates full catalog
// BBBBBBBBBB\n bucket size (a whole number of buckets reserved for catalog)
// FFFFFFFFFF\n total number of file entries in catalog
// RNK_BKT rank bucket index of the file fragment
// N_FRG file is divided into this many fragments
// FP_IND file pointer index; internal use only