Tidy code and documentation for all the storage layouts (#132)
zimeon authored Dec 12, 2024
1 parent f5f255f commit 27ecfd7
Showing 12 changed files with 378 additions and 70 deletions.
14 changes: 7 additions & 7 deletions docs/demo_using_bagit_bags.md
@@ -63,7 +63,7 @@ INFO:bagit:Verifying checksum for file /home/runner/work/ocfl-py/ocfl-py/tests/t
INFO:bagit:Verifying checksum for file /home/runner/work/ocfl-py/ocfl-py/tests/testdata/bags/uaa_v2/bag-info.txt
INFO:bagit:Verifying checksum for file /home/runner/work/ocfl-py/ocfl-py/tests/testdata/bags/uaa_v2/manifest-sha512.txt
INFO:root:Updated OCFL object info:bb123cd4567 by adding v2
### <ocfl.version_metadata.VersionMetadata object at 0x7f1cf4eaa5f0>
### <ocfl.version_metadata.VersionMetadata object at 0x7f6067126b30>
Updated object info:bb123cd4567 to v2
```

@@ -104,7 +104,7 @@ INFO:bagit:Verifying checksum for file /home/runner/work/ocfl-py/ocfl-py/tests/t
INFO:bagit:Verifying checksum for file /home/runner/work/ocfl-py/ocfl-py/tests/testdata/bags/uaa_v3/bag-info.txt
INFO:bagit:Verifying checksum for file /home/runner/work/ocfl-py/ocfl-py/tests/testdata/bags/uaa_v3/manifest-sha512.txt
INFO:root:Updated OCFL object info:bb123cd4567 by adding v3
### <ocfl.version_metadata.VersionMetadata object at 0x7fb59dd525f0>
### <ocfl.version_metadata.VersionMetadata object at 0x7f586cc5ab60>
Updated object info:bb123cd4567 to v3
```

@@ -150,7 +150,7 @@ INFO:bagit:Verifying checksum for file /home/runner/work/ocfl-py/ocfl-py/tests/t
INFO:bagit:Verifying checksum for file /home/runner/work/ocfl-py/ocfl-py/tests/testdata/bags/uaa_v4/bag-info.txt
INFO:bagit:Verifying checksum for file /home/runner/work/ocfl-py/ocfl-py/tests/testdata/bags/uaa_v4/manifest-sha512.txt
INFO:root:Updated OCFL object info:bb123cd4567 by adding v4
### <ocfl.version_metadata.VersionMetadata object at 0x7f924be4a5f0>
### <ocfl.version_metadata.VersionMetadata object at 0x7fbd12f42b60>
Updated object info:bb123cd4567 to v4
```

@@ -164,8 +164,8 @@ Taking the newly created OCFL object `/tmp/obj` we can `--extract` the `v4` cont
INFO:root:Extracted v4 into tmp/extracted_v4
INFO:bagit:Creating bag for directory tmp/extracted_v4
INFO:bagit:Creating data directory
INFO:bagit:Moving my_content to tmp/extracted_v4/tmp6gveuavx/my_content
INFO:bagit:Moving tmp/extracted_v4/tmp6gveuavx to data
INFO:bagit:Moving my_content to tmp/extracted_v4/tmpw3baynt4/my_content
INFO:bagit:Moving tmp/extracted_v4/tmpw3baynt4 to data
INFO:bagit:Using 1 processes to generate manifests: sha512
INFO:bagit:Generating manifest lines for file data/my_content/dracula.txt
INFO:bagit:Generating manifest lines for file data/my_content/dunwich.txt
@@ -187,12 +187,12 @@ We note that the OCFL object had only one `content` file in `v4` but the extract
diff -r tmp/extracted_v4/bag-info.txt tests/testdata/bags/uaa_v4/bag-info.txt
1,2c1
< Bag-Software-Agent: bagit.py v1.8.1 <https://github.com/LibraryOfCongress/bagit-python>
< Bagging-Date: 2024-12-06
< Bagging-Date: 2024-12-12
---
> Bagging-Date: 2020-01-04
diff -r tmp/extracted_v4/tagmanifest-sha512.txt tests/testdata/bags/uaa_v4/tagmanifest-sha512.txt
2c2
< 7e23b308ac51b064e7471d7b8e5ba1f758891631ad8c8fb57799a39018d7d77e893a8236a608a8087117000c55efde9529cb76cdb63bacc5642b38ab459b30d5 bag-info.txt
< f4b54148ef84efafaaa8f062695a1bc07a1e876a15f6164f6e89d979f09c91baa5b1d76002b2bffb6193445b0642a0d6237731623d0ab4050bde5558fcbcca4e bag-info.txt
---
> 10624e6d45462def7af66d1a0d977606c7b073b01809c1d42258cfab5c34a275480943cbe78044416aee1f23822cc3762f92247b8f39b5c6ddc5ae32a8f94ce5 bag-info.txt
```
4 changes: 2 additions & 2 deletions docs/validation_status.md
@@ -68,6 +68,7 @@ The following tables show the implementation status of all errors and warnings i
| | E038a | OCFL Object %s inventory `type` attribute has wrong value (expected %s, got %s) \[[ocfl/inventory_validator.py#L134](https://github.com/zimeon/ocfl-py/blob/main/ocfl/inventory_validator.py#L134)\] |
| | E038b | OCFL Object %s inventory `type` attribute does not look like a valid specification URI (got %s), will proceed as if using version %s \[[ocfl/inventory_validator.py#L139](https://github.com/zimeon/ocfl-py/blob/main/ocfl/inventory_validator.py#L139)\] |
| | E038c | OCFL Object %s inventory `type` attribute has an unsupported specification version number (%s), will proceed as if using version %s \[[ocfl/inventory_validator.py#L144](https://github.com/zimeon/ocfl-py/blob/main/ocfl/inventory_validator.py#L144)\] |
| | E038d | OCFL Object %s inventory `type` attribute does not have a string value \[[ocfl/inventory_validator.py#L131](https://github.com/zimeon/ocfl-py/blob/main/ocfl/inventory_validator.py#L131)\] |
| [E039](https://ocfl.io/1.1/spec#E039) | '[digestAlgorithm] must be the algorithm used in the manifest and state blocks.' | _Not implemented_ |
| [E040](https://ocfl.io/1.1/spec#E040) | '[head] must be the version directory name with the highest version number.' | OCFL Object %s inventory head attribute doesn't match versions (got %s, expected %s) \[[ocfl/inventory_validator.py#L183](https://github.com/zimeon/ocfl-py/blob/main/ocfl/inventory_validator.py#L183)\] |
| [E041](https://ocfl.io/1.1/spec#E041) | 'In addition to these keys, there must be two other blocks present, manifest and versions, which are discussed in the next two sections.' | _See multiple cases identified with suffixes below_ |
@@ -181,7 +182,6 @@ The following tables show the implementation status of all errors and warnings i
| [E110](https://ocfl.io/1.1/spec#E110) | 'A unique identifier for the OCFL Object MUST NOT change between versions of the same object.' | _Not implemented_ |
| [E111](https://ocfl.io/1.1/spec#E111) | 'If present, [the value of the fixity key] MUST be a JSON object, which may be empty.' | OCFL Object %s inventory includes a fixity key with value that isn't a JSON object \[_Not implemented_\] |
| [E112](https://ocfl.io/1.1/spec#E112) | 'The extensions directory must not contain any files or sub-directories other than extension sub-directories.' | _Not implemented_ |
| E999 | **Not in specification** | **Missing description** \[[ocfl/inventory_validator.py#L131](https://github.com/zimeon/ocfl-py/blob/main/ocfl/inventory_validator.py#L131)\] |

## Warnings

@@ -206,4 +206,4 @@ The following tables show the implementation status of all errors and warnings i
| [W016](https://ocfl.io/1.1/spec#W016) | 'In the Storage Root, extension sub-directories SHOULD be named according to a registered extension name.' | _Not implemented_ |
| W901 | **Not in specification** | OCFL Storage Root includes unregistered extension directory '%s' \[[ocfl/storage_root.py#L275](https://github.com/zimeon/ocfl-py/blob/main/ocfl/storage_root.py#L275)\] |

_Generated by `extract_codes.py` at 2024-12-06 00:46:27.116525_
_Generated by `extract_codes.py` at 2024-12-12 18:50:33.443332_
102 changes: 85 additions & 17 deletions ocfl/layout.py
@@ -65,47 +65,102 @@ def config_file(self):
def config(self):
"""Dictionary with config.json configuration for the layout extension.
Returns a dict with values based on the current attributes (to be
serialized with json.dump()), else None indicates that there is no
config.json for this layout.
Dict values are based on the current attributes (as would be serialized
with json.dump()), else None indicates that there is no config.json
for this layout.
"""
return None

def check_full_config(self):
"""Check full configuration in instance variables.
Trivial implementation that does nothing. It is intended that
sub-classes will override to do real checks if necessary. No
return value, raise a LayoutException on error.
"""
return

def strip_root(self, path, root):
"""Remove root from path, throw exception on failure."""
"""Remove root from path, throw exception on failure.
Arguments:
path (str): file path from which root will be stripped
root (str): root path that will be stripped from path, also
any leading path separator is removed.
Raises:
LayoutException: if the path is not within the given root and thus
root cannot be stripped from it
"""
root = root.rstrip(os.sep) # ditch any trailing path separator
if os.path.commonprefix((path, root)) == root:
return os.path.relpath(path, start=root)
raise LayoutException("Path %s is not in root %s" % (path, root))
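
To make the behaviour of `strip_root` concrete, here is a minimal standalone sketch that mirrors the logic in the hunk above. `LayoutException` is redefined locally as a stand-in so the snippet runs without ocfl-py installed. Note that `os.path.commonprefix` compares character by character rather than by path component, so a sibling directory whose name merely starts with the root name also passes the check.

```python
import os


class LayoutException(Exception):
    """Local stand-in for ocfl.layout.LayoutException."""


def strip_root(path, root):
    """Mirror of the strip_root logic shown in the diff."""
    root = root.rstrip(os.sep)  # ditch any trailing path separator
    if os.path.commonprefix((path, root)) == root:
        return os.path.relpath(path, start=root)
    raise LayoutException("Path %s is not in root %s" % (path, root))


# A path inside the root is returned relative to that root
inside = strip_root(os.path.join("root", "a", "b"), "root" + os.sep)

# A sibling whose name shares the root's leading characters is NOT rejected,
# because commonprefix works on characters, not on path components
sibling = strip_root(os.path.join("rootx", "a"), "root")

# A path with no shared prefix raises LayoutException
try:
    strip_root(os.path.join("other", "a"), "root")
    rejected = False
except LayoutException:
    rejected = True
```

If component-wise containment matters, `os.path.commonpath` is one possible alternative; whether it suits the way ocfl-py calls this method is not established here.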

def is_valid(self, identifier): # pylint: disable=unused-argument
"""Return True if identifier is valid, always True in this base implementation."""
"""Check validity of identifier for this layout.
Arguments:
identifier (str): identifier to check
Returns:
bool: True if valid, False otherwise. Always True in this base
implementation.
"""
return True

def encode(self, identifier):
"""Encode identifier to get rid of unsafe chars."""
"""Encode identifier to get rid of unsafe chars.
Arguments:
identifier (str): identifier to encode
Returns:
str: encoded identifier
"""
return quote_plus(identifier)

def decode(self, identifier):
"""Decode identifier to put back unsafe chars."""
"""Decode identifier to put back unsafe chars.
Arguments:
identifier (str): identifier to decode
Returns:
str: decoded identifier
"""
return unquote_plus(identifier)

def identifier_to_path(self, identifier):
"""Convert identifier to path relative to some root."""
"""Convert identifier to path relative to some root.
Arguments:
identifier (str): identifier to encode
Returns:
str: object path for this identifier
Raises:
LayoutException: if the identifier cannot be used to create an object
path. In this base implementation, an exception is always raised.
The method should be overridden with the same signature.
"""
raise LayoutException("Not yet implemented")

def read_layout_params(self, root_fs=None, params_required=False):
"""Look for and read any layout configuration parameters.
Arguments:
root_fs: the storage root fs object
params_required: if True then throw exception for params file not present
root_fs (str): the storage root fs object
params_required (bool): if True then throw exception for params file
not present
Returns None, sets instance data in accord with the configuration using
the methods in self.PARAMS to parse for each key.
Raises:
LayoutException: if the config can't be read or if required by
params_required but not present
Raises LayoutException if the config can't be read or if required by
params_required but not present.
Sets instance data in accord with the configuration using the methods
in self.PARAMS to parse for each key.
"""
config = None
logging.debug("Reading extension config file %s", self.config_file)
@@ -130,8 +185,15 @@ def check_and_set_layout_params(self, config, require_extension_name=True):
require_extension_name: boolean, True by default. If set False then
the extensionName parameter is not required
Raises:
LayoutException: if the extensionName is missing from the config, if
support for the named extension isn't implemented, or if there
is an error in the parameters or full configuration.
For each parameter that is recognized, the appropriate check and set
method in self.PARAMS is called. The methods set instance attributes.
Finally, the check_full_config method is called to check anything that
might require all of the configuration to be known.
"""
# Check the extensionName if required and/or specified
if "extensionName" not in config:
@@ -142,14 +204,20 @@ def check_and_set_layout_params(self, config, require_extension_name=True):
# Read and check the parameters (ignore any extra params)
for key, method in self.PARAMS.items():
method(config.get(key))
# Finally, check full config
self.check_full_config()

def write_layout_params(self, root_fs=None):
"""Write the config.json file with layout parameters if needed for this layout.
Does nothing if there is no config.json content defined for this layout.
Arguments:
root_fs (str): the storage root fs object
Raises:
LayoutException: if there is an error trying to write the config.json
file, including if one already exists.
Raises a LayoutException if there is an error trying to write the config.json
file, including if one already exists.
Does nothing if there is no config.json content defined for this layout.
"""
config = self.config
if config is None:
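
The flow that this file's changes add to `check_and_set_layout_params` (per-key check methods looked up in `self.PARAMS`, followed by a final `check_full_config` call) can be sketched in isolation as below. This is an illustration under stated assumptions, not ocfl-py code: `RangeLayout` and its `minLength` and `maxLength` parameters are hypothetical, and `LayoutException` is a local stand-in.

```python
class LayoutException(Exception):
    """Local stand-in for ocfl.layout.LayoutException."""


class RangeLayout:
    """Hypothetical layout used only to illustrate the check flow."""

    def __init__(self):
        self.min_length = None
        self.max_length = None
        # Map each config.json key to the method that checks and sets it
        self.PARAMS = {"minLength": self.check_min_length,
                       "maxLength": self.check_max_length}

    def check_min_length(self, value):
        """Check one parameter on its own, then set the attribute."""
        if not isinstance(value, int) or value < 0:
            raise LayoutException("minLength must be a non-negative integer")
        self.min_length = value

    def check_max_length(self, value):
        """Check one parameter on its own, then set the attribute."""
        if not isinstance(value, int) or value < 0:
            raise LayoutException("maxLength must be a non-negative integer")
        self.max_length = value

    def check_full_config(self):
        """Check constraints that need every parameter to be known."""
        if self.min_length > self.max_length:
            raise LayoutException("minLength must not exceed maxLength")

    def check_and_set_layout_params(self, config):
        """Run the per-key checks first, then the combined check."""
        for key, method in self.PARAMS.items():
            method(config.get(key))
        self.check_full_config()


layout = RangeLayout()
layout.check_and_set_layout_params({"minLength": 2, "maxLength": 8})

# Each value is valid on its own, so only the combined check can catch this
try:
    RangeLayout().check_and_set_layout_params({"minLength": 9, "maxLength": 8})
    combined_error = None
except LayoutException as e:
    combined_error = str(e)
```

Keeping the cross-parameter rule in a separate hook means the per-key methods stay independent of the order in which keys are processed.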
15 changes: 14 additions & 1 deletion ocfl/layout_0002_flat_direct.py
@@ -18,7 +18,20 @@ def __init__(self):
self.PARAMS = None # No parameters

def identifier_to_path(self, identifier):
"""Convert identifier to path relative to root."""
"""Convert identifier to path relative to root.
Argument:
identifier (str): object identifier
Returns:
str: object path for this layout
Raises:
LayoutException: if the identifier cannot be converted to a valid
object path. The direct layout does not allow identifiers that
are blank, '.', '..' or that include a filesystem path
separator.
"""
if identifier in ("", ".", "..") or os.sep in identifier:
raise LayoutException("Identifier '%s' unsafe for %s layout" % (identifier, self.NAME))
return identifier
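
As a quick illustration of which identifiers the new check rejects, the following standalone sketch repeats the condition from the diff. `flat_direct_path` and its `layout_name` argument are hypothetical names standing in for the method and `self.NAME`, and `LayoutException` is a local stand-in.

```python
import os


class LayoutException(Exception):
    """Local stand-in for ocfl.layout.LayoutException."""


def flat_direct_path(identifier, layout_name="flat-direct"):
    """Mirror of the identifier check shown in the diff.

    layout_name is a placeholder for self.NAME in the real class.
    """
    if identifier in ("", ".", "..") or os.sep in identifier:
        raise LayoutException("Identifier '%s' unsafe for %s layout" % (identifier, layout_name))
    return identifier


# Record the path for each candidate, or None where the check rejects it
results = {}
for candidate in ("object-01", "", "..", "a" + os.sep + "b"):
    try:
        results[candidate] = flat_direct_path(candidate)
    except LayoutException:
        results[candidate] = None
```

Because the check uses `os.sep`, which separator character is rejected depends on the platform the code runs on.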
47 changes: 46 additions & 1 deletion ocfl/layout_0003_hash_and_id_n_tuple.py
@@ -72,11 +72,19 @@ def check_digest_algorithm(self, value):
Type: string
Constraints: Must not be empty
Default: sha256
Argument:
value (str): digest algorithm name
Raises:
LayoutException: if the digest algorithm is not supported
Sets the digest_algorithm property of this object as a side effect.
"""
if value is None:
raise LayoutException("digestAlgorithm parameter must be specified")
try:
string_digest("aa", digest_type=value)
string_digest("dummy_data", digest_type=value)
except ValueError as e:
raise LayoutException("digestAlgorithm parameter specifies unknown or unsupported digests %s (%s)" % (value, str(e)))
self.digest_algorithm = value
@@ -91,6 +99,14 @@ def check_tuple_size(self, value):
Type: number
Constraints: An integer between 0 and 32 inclusive
Default: 3
Argument:
value (int): integer value for tuple size in characters
Raises:
LayoutException: if the tuple size is not allowed
Sets the tuple_size property of this object as a side effect.
"""
if value is None:
raise LayoutException("tupleSize parameter must be specified")
@@ -107,13 +123,42 @@ def check_number_of_tuples(self, value):
Type: number
Constraints: An integer between 0 and 32 inclusive
Default: 3
Argument:
value (int): integer value for number of tuples
Raises:
LayoutException: if the number of tuples is not allowed
Sets the number_of_tuples property of this object as a side effect.
"""
if value is None:
raise LayoutException("numberOfTuples parameter must be specified")
if not isinstance(value, int) or value < 0 or value > 32:
raise LayoutException("numberOfTuples parameter must be an integer between 0 and 32 inclusive")
self.number_of_tuples = value

def check_full_config(self):
"""Check combined configuration parameters.
From extension:
If tupleSize is set to 0, then no tuples are created and numberOfTuples
MUST also equal 0.
The product of tupleSize and numberOfTuples MUST be less than or equal
to the number of characters in the hex encoded digest.
Raises:
LayoutException: in the case that there is an error.
"""
# Both zero if one zero
if ((self.tuple_size == 0 and self.number_of_tuples != 0)
or (self.tuple_size != 0 and self.number_of_tuples == 0)):
raise LayoutException("Bad layout configuration: If tupleSize is set to 0, then numberOfTuples MUST also equal 0.")
# Enough chars in digest
n = len(string_digest("dummy_data", digest_type=self.digest_algorithm))
if self.tuple_size * self.number_of_tuples > n:
raise LayoutException("Bad layout configuration: The product of tupleSize and numberOfTuples MUST be less than or equal to the number of characters in the hex encoded digest.")

@property
def config(self):
"""Dictionary with config.json configuration for the layout extension."""
Expand Down
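
The two combined constraints checked by the new `check_full_config` can be tried out with the standalone sketch below. It is written as a plain function rather than a method, and it uses `hashlib` directly in place of ocfl-py's `string_digest` helper, so both the function signature and the digest call are assumptions made for illustration. The error messages are shortened as well.

```python
import hashlib


class LayoutException(Exception):
    """Local stand-in for ocfl.layout.LayoutException."""


def check_full_config(tuple_size, number_of_tuples, digest_algorithm="sha256"):
    """Combined parameter check in the style of the diff."""
    # Both zero if one zero (as in the diff, either value being zero alone is rejected)
    if (tuple_size == 0) != (number_of_tuples == 0):
        raise LayoutException("tupleSize and numberOfTuples must both be 0 if either is 0")
    # Number of characters in the hex encoded digest, e.g. 64 for sha256
    n = len(hashlib.new(digest_algorithm, b"dummy_data").hexdigest())
    if tuple_size * number_of_tuples > n:
        raise LayoutException("tupleSize * numberOfTuples exceeds digest length %d" % n)


def is_ok(*args):
    """Return True if the configuration passes, False if it raises."""
    try:
        check_full_config(*args)
        return True
    except LayoutException:
        return False


defaults_ok = is_ok(3, 3)              # extension defaults
both_zero_ok = is_ok(0, 0)             # no tuples at all
one_zero_ok = is_ok(0, 3)              # tupleSize 0 with tuples requested
too_long_ok = is_ok(32, 3)             # 96 characters needed from sha256
sha512_ok = is_ok(32, 3, "sha512")     # 96 characters needed from sha512
```

A configuration that is too long for one digest can be valid for a longer one, which is why this check needs the digest algorithm as well as the two tuple parameters.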
21 changes: 19 additions & 2 deletions ocfl/layout_nnnn_flat_quoted.py
@@ -1,5 +1,5 @@
"""Layout_NNNN_Flat_Quoted mapping of identifier to directory structure."""
from .layout import Layout
from .layout import Layout, LayoutException


class Layout_NNNN_Flat_Quoted(Layout):
@@ -13,5 +13,22 @@ def __init__(self):
self.PARAMS = None # No parameters

def identifier_to_path(self, identifier):
"""Convert identifier to path relative to root."""
"""Convert identifier to path relative to root.
Argument:
identifier (str): object identifier
Returns:
str: object path for this layout
Raises:
LayoutException: if the identifier cannot be converted to a valid
object path. Currently just a check for a blank identifier.
Uses Layout.encode() to generate a safe directory name from any
identifier. Length is not checked but could cause operating system
errors.
"""
if identifier == "":
raise LayoutException("Identifier '%s' unsafe for %s layout" % (identifier, self.NAME))
return self.encode(identifier)
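
For a sense of what the quoted mapping produces, here is a small sketch. It assumes that `quote_plus` and `unquote_plus` in `Layout.encode()` and `Layout.decode()` come from Python's `urllib.parse`, since the import is not shown in this diff. `flat_quoted_path` is a hypothetical function name, and it raises `ValueError` where the real method raises `LayoutException`.

```python
from urllib.parse import quote_plus, unquote_plus


def flat_quoted_path(identifier):
    """Sketch of the flat quoted mapping: reject blank, then percent-encode."""
    if identifier == "":
        # The method in the diff raises LayoutException here
        raise ValueError("Blank identifier is not allowed")
    return quote_plus(identifier)


# The identifier used in the bag demo earlier in this commit
path = flat_quoted_path("info:bb123cd4567")

# Path separators and spaces are encoded too, so the result is one directory name
tricky = flat_quoted_path("a/b c")

# Decoding restores the original identifier
round_trip = unquote_plus(tricky)
```

Encoding can lengthen an identifier considerably, which is the operating system limit that the docstring warns about.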