Verification #133

Closed
wants to merge 81 commits into from
f44b925
Adds verification cli
domna Jun 14, 2023
4739463
Simple working verification
domna Jun 22, 2023
de27590
Don't replace non-variadic group names
domna Jun 22, 2023
712dbd4
Happyfy linting
domna Jun 22, 2023
0ae9671
Adds support for bytes NX_class attributes
domna Jun 22, 2023
09d3b4b
Autoformatting
domna Jun 22, 2023
43c65a9
Cleanup
domna Jul 4, 2023
fcd1c43
Adds nexus unit registry
domna Jul 5, 2023
4200c71
Fixes linting
domna Jul 5, 2023
16b070f
Sets defs to latest fairmat
domna Jul 5, 2023
1671a13
Adds basic unit check
domna Jul 5, 2023
20efda4
Check general validity of units
domna Jul 5, 2023
cca87bc
Resolve also parents for units
domna Jul 5, 2023
9dfbd03
Merge commit '0c69581b014d0ef7a65e54e9cc8a2e25916c26c8' into verifica…
domna Feb 5, 2024
5c2dd4e
autoformat
domna Feb 5, 2024
2f520cc
Merge commit '8bd900e8c520dacc67ef7b644d29dba1d5fe221e' into verifica…
domna Feb 5, 2024
529331f
Adds missing import
domna Feb 5, 2024
7c95311
Merge branch 'master' into verification
domna Feb 5, 2024
33cb7fd
Merge branch 'master' into verification
domna Feb 5, 2024
13e2670
Update to latest definitions
domna Feb 5, 2024
7413bae
Allow more general uppercase notation in nx_namefit
domna Feb 7, 2024
9ccb7b4
Add proper unit retrieval in validation
domna Feb 7, 2024
0c6fb6c
Lower debug level
domna Feb 7, 2024
e2a167a
Add counts to units
domna Feb 7, 2024
199024a
Fix namefitting
domna Feb 7, 2024
8f8df03
Adds support for NX_TRANSFORMATION
domna Feb 7, 2024
c44f5b8
Fix units in example data and tests
domna Feb 8, 2024
42904b9
Fix NOT IN SCHEMA for mpes example
domna Feb 8, 2024
cac78c6
Fix uppercase attribute namefit
domna Feb 8, 2024
06190b7
Keep uppercase parts in hdf names
domna Feb 9, 2024
49a7e1f
Fix upper/lower notation for example
domna Feb 9, 2024
11892d4
Re-enable empty-required-field test
domna Feb 9, 2024
4105ba5
don't use removeprefix
domna Feb 9, 2024
2d352cf
Fix empty-required-field test
domna Feb 9, 2024
7b1ec45
Properly check error logs
domna Feb 9, 2024
7c449cb
Catch errors for validate data dict
domna Feb 9, 2024
9312b3e
Fix required lone group in template
domna Feb 9, 2024
53c1849
Removes unnecessary function
domna Feb 9, 2024
16e2d37
Adds proper uppercase matching to path in data dict check
domna Feb 9, 2024
45ee476
Cleans unit attributes
domna Feb 9, 2024
a139a40
Fix typing
domna Feb 12, 2024
f98994a
Fix local linting
domna Feb 12, 2024
e9ecd30
Update definitions
domna Feb 12, 2024
9a98967
Update nexus version file
domna Feb 12, 2024
c4ef94f
Updates generated eln file
domna Feb 12, 2024
1cf6a50
Updates reference files
domna Feb 12, 2024
00b7637
Do file checks in verification cli
domna Feb 12, 2024
52fca4e
Don't fail if definition is not present
domna Feb 12, 2024
08a3b81
Updates definitions
domna Feb 12, 2024
6d11276
Merge branch 'master' into verification
domna Feb 23, 2024
d4655fe
Add required under optional in group
domna Apr 18, 2024
1755347
rename to field
domna Apr 18, 2024
debcf51
Fix other tests
domna Apr 18, 2024
d632535
Check required field provided
domna Apr 18, 2024
b4686fc
Fix all_required_children_are_set
domna Apr 19, 2024
d761f37
Merge branch 'master' into fix-required-under-optional
domna Apr 19, 2024
10b1c44
Fix tests
domna Apr 19, 2024
d4dc235
Use if checks instead of try..except
domna Apr 23, 2024
1c68848
Add routine to check required fields for repeating groups
domna Apr 24, 2024
feb973e
Delete temporary file
domna Apr 24, 2024
17eb061
Fix path in data dict test
domna Apr 24, 2024
d83a6b8
Fix tests
domna Apr 24, 2024
b3a0f1b
Cleanup
domna Apr 24, 2024
0c9a0f4
Remove debugging line
domna Apr 24, 2024
dcb4d9b
Add collector class
domna Apr 24, 2024
56bf3a6
Remove commented lines
domna Apr 24, 2024
a0ae259
Check validation return type and logging
domna Apr 24, 2024
46f122b
Add tests for repeating groups
domna Apr 24, 2024
2b0144f
Fix report of variadic groups set to all None
domna Apr 25, 2024
9e29d5d
Merge branch 'master' into verification
domna Apr 25, 2024
59106f2
Merge branch 'fix-required-under-optional' into verification
domna Apr 25, 2024
617d86f
Add validity report at the end
domna Apr 25, 2024
bd98fad
Add validation logging for units
domna Apr 25, 2024
2529789
Fixes undocumented units and reporting of all none required groups
domna Apr 26, 2024
f7a64db
Use dict paths everywhere
domna Apr 26, 2024
59b1798
Add pint to dependencies
domna Apr 26, 2024
e6dad7c
Catch and log undefined units
domna Apr 26, 2024
7734256
Add unit checks for nx transformations
domna Apr 26, 2024
b55ebcc
Log wrong transformation_type
domna Apr 26, 2024
1261e6a
Merge branch 'master' into verification
domna Apr 26, 2024
485ffd9
Renaming
domna Apr 26, 2024
2 changes: 2 additions & 0 deletions .gitignore
@@ -201,6 +201,8 @@ cython_debug/
*.txt
!requirements.txt
!dev-requirements.txt
!pynxtools/dataconverter/units/default_en.txt
!pynxtools/dataconverter/units/constants_en.txt
!mkdocs-requirements.txt
!pynxtools/nexus-version.txt
build/
3 changes: 2 additions & 1 deletion MANIFEST.in
@@ -7,5 +7,6 @@ recursive-include pynxtools/definitions/base_classes/ *.xml
recursive-include pynxtools/definitions/applications/ *.xml
recursive-include pynxtools/definitions/contributed_definitions/ *.xml
include pynxtools/definitions/*.xsd
include pynxtools/dataconverter/units/*.txt
include pynxtools/nexus-version.txt
include pynxtools/definitions/NXDL_VERSION
include pynxtools/definitions/NXDL_VERSION
18 changes: 16 additions & 2 deletions dev-requirements.txt
@@ -1,5 +1,5 @@
#
# This file is autogenerated by pip-compile with Python 3.11
# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
# pip-compile --extra=dev --extra=docs --output-file=dev-requirements.txt pyproject.toml
@@ -34,6 +34,8 @@ cycler==0.12.1
# via matplotlib
distlib==0.3.8
# via virtualenv
exceptiongroup==1.2.1
# via pytest
filelock==3.13.3
# via virtualenv
fonttools==4.50.0
@@ -120,6 +122,8 @@ pathspec==0.12.1
# via mkdocs
pillow==10.2.0
# via matplotlib
pint==0.23
# via pynxtools (pyproject.toml)
pip-tools==7.4.1
# via pynxtools (pyproject.toml)
platformdirs==4.2.0
@@ -181,14 +185,24 @@ structlog==24.1.0
# via pynxtools (pyproject.toml)
termcolor==2.4.0
# via mkdocs-macros-plugin
tomli==2.0.1
# via
# build
# coverage
# mypy
# pip-tools
# pyproject-hooks
# pytest
types-pytz==2024.1.0.20240203
# via pynxtools (pyproject.toml)
types-pyyaml==6.0.12.20240311
# via pynxtools (pyproject.toml)
types-requests==2.31.0.20240311
# via pynxtools (pyproject.toml)
typing-extensions==4.10.0
# via mypy
# via
# mypy
# pint
tzdata==2024.1
# via pandas
urllib3==2.2.1
4 changes: 2 additions & 2 deletions pynxtools/dataconverter/convert.py
@@ -37,7 +37,7 @@
from pynxtools.dataconverter.readers.base.reader import BaseReader
from pynxtools.dataconverter.template import Template
from pynxtools.dataconverter.writer import Writer
from pynxtools.nexus import nexus
from pynxtools.definitions.dev_tools.utils.nxdl_utils import get_nexus_definitions_path

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
@@ -120,7 +120,7 @@ def get_nxdl_root_and_path(nxdl: str):
Error if no file with the given nxdl name is found.
"""
# Reading in the NXDL and generating a template
definitions_path = nexus.get_nexus_definitions_path()
definitions_path = get_nexus_definitions_path()
if nxdl == "NXtest":
nxdl_f_path = os.path.join(
f"{os.path.abspath(os.path.dirname(__file__))}/../../",
151 changes: 122 additions & 29 deletions pynxtools/dataconverter/helpers.py
@@ -29,12 +29,21 @@
import lxml.etree as ET
import numpy as np
from ase.data import chemical_symbols
from pint import UndefinedUnitError

import pynxtools.definitions.dev_tools.utils.nxdl_utils as nexus
from pynxtools import get_nexus_version, get_nexus_version_hash
from pynxtools.dataconverter.template import Template
from pynxtools.definitions.dev_tools.utils.nxdl_utils import get_inherited_nodes
from pynxtools.nexus import nexus
from pynxtools.nexus.nexus import NxdlAttributeNotFoundError
from pynxtools.dataconverter.units import ureg
from pynxtools.definitions.dev_tools.utils.nxdl_utils import (
NxdlAttributeNotFoundError,
get_enums,
get_inherited_nodes,
get_node_at_nxdl_path,
)
from pynxtools.definitions.dev_tools.utils.nxdl_utils import (
get_required_string as nexus_get_required_string,
)

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
@@ -50,6 +59,8 @@ class ValidationProblem(Enum):
InvalidType = 7
InvalidDatetime = 8
IsNotPosInt = 9
InvalidUnit = 10
InvalidTransformationType = 11


class Collector:
@@ -109,6 +120,16 @@ def insert_and_log(
logger.warning(
f"The value at {path} should be a positive int, but is {value}."
)
elif log_type == ValidationProblem.InvalidUnit:
logger.warning(
f"Invalid unit in {path}. {value} "
f"is not in unit category {args[0] if args else '<unknown>'}"
)
elif log_type == ValidationProblem.InvalidTransformationType:
logger.warning(
f"Invalid transformation type in {path}: {value}. "
"Should be either not present or have the value 'translation' or 'rotation'."
)
self.data.add(path)
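The collector pattern used here (log a warning per problem type, remember the offending path) can be sketched on its own. This is a minimal illustration, not the class from the PR: `Problem`, `MiniCollector`, the logger name, the example path, and the use of a set with a length check in `has_validation_problems` are all assumptions made for the example.

```python
import logging
from enum import Enum

logger = logging.getLogger("validation_sketch")  # hypothetical logger name


class Problem(Enum):
    """Hypothetical subset of the validation problem types."""

    InvalidUnit = 10
    InvalidTransformationType = 11


class MiniCollector:
    """Logs each problem once it is reported and remembers the affected path."""

    def __init__(self) -> None:
        self.data: set = set()

    def insert_and_log(self, path: str, log_type: Problem, value, *args) -> None:
        if log_type == Problem.InvalidUnit:
            # The expected category is passed as an optional extra argument.
            category = args[0] if args else "<unknown>"
            logger.warning(
                "Invalid unit in %s. %s is not in unit category %s",
                path, value, category,
            )
        elif log_type == Problem.InvalidTransformationType:
            logger.warning("Invalid transformation type in %s: %s.", path, value)
        self.data.add(path)

    def has_validation_problems(self) -> bool:
        # Assumption: any recorded path counts as a validation problem.
        return len(self.data) > 0


collector = MiniCollector()
collector.insert_and_log(
    "/ENTRY[entry]/distance/@units", Problem.InvalidUnit, "s", "NX_LENGTH"
)
print(collector.has_validation_problems())
```

Collecting paths instead of raising lets a single validation run report every problem rather than stopping at the first one.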

def has_validation_problems(self):
@@ -195,7 +216,7 @@ def get_all_defined_required_children(nxdl_path, nxdl_name):
if nxdl_name == "NXtest":
return []

elist = nexus.get_inherited_nodes(nxdl_path, nx_name=nxdl_name)[2]
elist = get_inherited_nodes(nxdl_path, nx_name=nxdl_name)[2]
list_of_children_to_add = set()
for elem in elist:
list_of_children_to_add.update(get_all_defined_required_children_for_elem(elem))
@@ -298,7 +319,7 @@ def generate_template_from_nxdl(

def get_required_string(elem):
"""Helper function to return nicely formatted names for optionality."""
return nexus.get_required_string(elem)[2:-2].lower()
return nexus_get_required_string(elem)[2:-2].lower()
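The slice `[2:-2]` relies on the upstream helper returning marker strings such as `<<OPTIONAL>>`. A minimal sketch of the resulting formatting, assuming such a raw string; `format_required_string` is a hypothetical stand-in that takes the string directly instead of an XML element:

```python
def format_required_string(raw: str) -> str:
    """Strip the surrounding '<<' and '>>' markers and lowercase the rest."""
    return raw[2:-2].lower()


formatted = format_required_string("<<OPTIONAL>>")
print(formatted)  # optional

# The formatted name no longer matches the raw marker strings.
print(formatted in ("<<OPTIONAL>>", "<<RECOMMENDED>>"))  # False
```

Code that compares against the raw markers therefore needs the unformatted upstream string, not the output of this module-level helper.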


def convert_nexus_to_caps(nexus_name):
@@ -369,7 +390,7 @@ def convert_data_dict_path_to_hdf5_path(path) -> str:
def is_value_valid_element_of_enum(value, elist) -> Tuple[bool, list]:
"""Checks whether a value has to be specific from the NXDL enumeration and returns options."""
for elem in elist:
enums = nexus.get_enums(elem)
enums = get_enums(elem)
if enums is not None:
return value in enums, enums
return True, []
@@ -448,6 +469,25 @@ def convert_str_to_bool_safe(value):
return None


def clean_str_attr(
    attr: Optional[Union[str, bytes]], encoding="utf-8"
) -> Optional[str]:
    """
    Cleans the string attribute, i.e., decodes bytes to str if necessary.
    If `attr` is not str, bytes or None, a TypeError is raised.
    """
    if attr is None:
        return attr
    if isinstance(attr, bytes):
        return attr.decode(encoding)
    if isinstance(attr, str):
        return attr

    raise TypeError(
        f"Invalid type {type(attr)} for attribute. Should be either None, bytes or str."
    )
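A short usage sketch of this helper, presumably motivated by HDF5 attributes that can arrive as either `bytes` or `str`. The function body below is a condensed copy so that the example runs on its own; the sample values are made up for illustration.

```python
from typing import Optional, Union


def clean_str_attr(
    attr: Optional[Union[str, bytes]], encoding: str = "utf-8"
) -> Optional[str]:
    """Condensed copy of the helper so the example is self-contained."""
    if attr is None or isinstance(attr, str):
        return attr
    if isinstance(attr, bytes):
        return attr.decode(encoding)
    raise TypeError(f"Invalid type {type(attr)} for attribute.")


print(clean_str_attr(b"NXentry"))  # NXentry
print(clean_str_attr("NXentry"))   # NXentry
print(clean_str_attr(None))        # None

try:
    clean_str_attr(42)  # neither None, bytes nor str
except TypeError as err:
    print(f"rejected: {err}")
```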


def is_valid_data_field(value, nxdl_type, path):
"""Checks whether a given value is valid according to what is defined in the NXDL.

@@ -487,6 +527,46 @@ def is_valid_data_field(value, nxdl_type, path):
return value


def is_valid_unit(
    unit: str, nx_category: str, transformation_type: Optional[str]
) -> bool:
    """
    Checks whether the provided unit belongs to the provided nexus unit category.

    Args:
        unit (str): The unit to check. Should be parsable by pint.
        nx_category (str): A nexus unit category, e.g. `NX_LENGTH`,
            or derived unit category, e.g., `NX_LENGTH ** 2`.
        transformation_type (Optional[str]):
            The transformation type of an NX_TRANSFORMATION.
            This parameter is ignored if the `nx_category` is not `NX_TRANSFORMATION`.
            If `transformation_type` is not present this should be set to None.

    Returns:
        bool: True if the unit belongs to the provided category, False otherwise.
    """
    unit = clean_str_attr(unit)
    try:
        # An empty category (no `units` attribute in the NXDL) is handled like NX_ANY.
        if nx_category in ("", "NX_ANY"):
            ureg(unit)  # Check if unit is generally valid
            return True
        nx_category = re.sub(r"(NX_[A-Z]+)", r"[\1]", nx_category)
        if nx_category == "[NX_TRANSFORMATION]":
            # NX_TRANSFORMATION is a pseudo unit
            # and can be either an angle, a length or unitless
            # depending on the transformation type.
            if transformation_type is None:
                return ureg(unit).check("[NX_UNITLESS]")
            if transformation_type == "translation":
                return ureg(unit).check("[NX_LENGTH]")
            if transformation_type == "rotation":
                return ureg(unit).check("[NX_ANGLE]")
            return False
        return ureg(unit).check(nx_category)
    except UndefinedUnitError:
        return False
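The regular expression in `is_valid_unit` rewrites NeXus category names into the bracketed dimension syntax that the unit registry's `check` call receives. A minimal sketch of just that rewriting step, using only the standard library; `to_pint_dimension` is a hypothetical name introduced for the example:

```python
import re


def to_pint_dimension(nx_category: str) -> str:
    """Wrap each NX_* token in brackets, e.g. NX_LENGTH -> [NX_LENGTH]."""
    return re.sub(r"(NX_[A-Z]+)", r"[\1]", nx_category)


print(to_pint_dimension("NX_LENGTH"))       # [NX_LENGTH]
print(to_pint_dimension("NX_LENGTH ** 2"))  # [NX_LENGTH] ** 2

# [A-Z]+ stops at the next underscore, so names with more than one
# underscore are only partially wrapped by this pattern.
print(to_pint_dimension("NX_TIME_OF_FLIGHT"))  # [NX_TIME]_OF_FLIGHT
```

Whether the partially wrapped form matters depends on how such categories are declared in the registry definition file, which is not shown in this diff.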


@lru_cache(maxsize=None)
def path_in_data_dict(nxdl_path: str, data_keys: Tuple[str, ...]) -> List[str]:
"""Checks if there is an accepted variation of path in the dictionary & returns the path."""
@@ -505,9 +585,9 @@ def check_for_optional_parent(path: str, nxdl_root: ET.Element) -> str:
return "<<NOT_FOUND>>"

parent_nxdl_path = convert_data_converter_dict_to_nxdl_path(parent_path)
elem = nexus.get_node_at_nxdl_path(nxdl_path=parent_nxdl_path, elem=nxdl_root)
elem = get_node_at_nxdl_path(nxdl_path=parent_nxdl_path, elem=nxdl_root)

if nexus.get_required_string(elem) in ("<<OPTIONAL>>", "<<RECOMMENDED>>"):
if nexus_get_required_string(elem) in ("<<OPTIONAL>>", "<<RECOMMENDED>>"):
return parent_path

return check_for_optional_parent(parent_path, nxdl_root)
@@ -522,8 +602,8 @@ def is_node_required(nxdl_key, nxdl_root):
nxdl_key[0 : nxdl_key.rindex("/") + 1]
+ nxdl_key[nxdl_key.rindex("/") + 2 :]
)
node = nexus.get_node_at_nxdl_path(nxdl_key, elem=nxdl_root, exc=False)
return nexus.get_required_string(node) == "<<REQUIRED>>"
node = get_node_at_nxdl_path(nxdl_key, elem=nxdl_root, exc=False)
return nexus_get_required_string(node) == "<<REQUIRED>>"


def all_required_children_are_set(optional_parent_path, data, nxdl_root):
@@ -753,7 +833,7 @@ def try_undocumented(data, nxdl_root: ET.Element):
field_path = path.rsplit("/", 1)[0]
if field_path in data.get_documented() and path in data.undocumented:
field_requiredness = get_required_string(
nexus.get_node_at_nxdl_path(
get_node_at_nxdl_path(
nxdl_path=convert_data_converter_dict_to_nxdl_path(field_path),
elem=nxdl_root,
)
@@ -767,7 +847,7 @@ def try_undocumented(data, nxdl_root: ET.Element):
nxdl_path = nxdl_path[0:index_of_at] + nxdl_path[index_of_at + 1 :]

try:
elem = nexus.get_node_at_nxdl_path(nxdl_path=nxdl_path, elem=nxdl_root)
elem = get_node_at_nxdl_path(nxdl_path=nxdl_path, elem=nxdl_root)
optionality = get_required_string(elem)
data[optionality][path] = data.undocumented[path]
del data.undocumented[path]
@@ -786,7 +866,7 @@ def validate_data_dict(template, data, nxdl_root: ET.Element):

@lru_cache(maxsize=None)
def get_xml_node(nxdl_path: str) -> ET.Element:
return nexus.get_node_at_nxdl_path(nxdl_path=nxdl_path, elem=nxdl_root)
return get_node_at_nxdl_path(nxdl_path=nxdl_path, elem=nxdl_root)

# Make sure all required fields exist.
ensure_all_required_fields_exist(template, data, nxdl_root)
@@ -814,21 +894,33 @@ def get_xml_node(nxdl_path: str) -> ET.Element:
)
continue

# TODO: If we want we could also enable unit validation here
# field = nexus.get_node_at_nxdl_path(
# nxdl_path=convert_data_converter_dict_to_nxdl_path(
# # The part below is the backwards compatible version of
# # nxdl_path.removesuffix("/units")
# nxdl_path[:-6] if nxdl_path.endswith("/units") else nxdl_path
# ),
# elem=nxdl_root,
# )
# nxdl_unit = field.attrib.get("units", "")
# if not is_valid_unit(data[path], nxdl_unit):
# raise ValueError(
# f"Invalid unit in {path}. {data[path]} "
# f"is not in unit category {nxdl_unit}"
# )
field = get_node_at_nxdl_path(
nxdl_path=convert_data_converter_dict_to_nxdl_path(
# The part below is the backwards compatible version of
# nxdl_path.removesuffix("/units")
nxdl_path[:-6] if nxdl_path.endswith("/units") else nxdl_path
),
elem=nxdl_root,
)
nxdl_unit = field.attrib.get("units", "")
transformation_type = (
field.attrib.get("transformation_type")
if nxdl_unit == "NX_TRANSFORMATION"
else None
)
if not is_valid_unit(data[path], nxdl_unit, transformation_type):
if transformation_type is not None and transformation_type not in (
"rotation",
"translation",
):
collector.insert_and_log(
path,
ValidationProblem.InvalidTransformationType,
transformation_type,
)
collector.insert_and_log(
path, ValidationProblem.InvalidUnit, data[path], nxdl_unit
)
continue
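The conditional slice used above to drop a trailing `/units` can be looked at in isolation. A minimal sketch, assuming paths of the form shown; `strip_units_suffix` and the example paths are hypothetical. `str.removesuffix` only exists from Python 3.9 onwards, which is presumably why the slice is used instead.

```python
def strip_units_suffix(nxdl_path: str) -> str:
    """Drop a trailing '/units' (6 characters) if present, else return the path unchanged."""
    return nxdl_path[:-6] if nxdl_path.endswith("/units") else nxdl_path


print(strip_units_suffix("/ENTRY/DATA/data/units"))  # /ENTRY/DATA/data
print(strip_units_suffix("/ENTRY/DATA/data"))        # /ENTRY/DATA/data
```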

elem = get_xml_node(nxdl_path)
@@ -851,7 +943,7 @@ def get_xml_node(nxdl_path: str) -> ET.Element:
else "NXDL_TYPE_UNAVAILABLE"
)
data[path] = is_valid_data_field(data[path], nxdl_type, path)
elist = nexus.get_inherited_nodes(
elist = get_inherited_nodes(
nxdl_path, path.rsplit("/", 1)[-1], nxdl_root
)[2]
is_valid_enum, enums = is_value_valid_element_of_enum(data[path], elist)
@@ -934,6 +1026,7 @@ def update_and_warn(key: str, value: str):
f"blob/{get_nexus_version_hash()}",
)
update_and_warn("/@NeXus_version", get_nexus_version())
# pylint: disable=c-extension-no-member
update_and_warn("/@HDF5_version", ".".join(map(str, h5py.h5.get_libversion())))
update_and_warn("/@h5py_version", h5py.__version__)

1 change: 1 addition & 0 deletions pynxtools/dataconverter/readers/ellips/reader.py
@@ -450,6 +450,7 @@ def read(
# MK:: Carola, Ron, Flo, Tamas, Sandor refactor the following line
template[f"/ENTRY[entry]/plot/DATA[{key}_errors]/@units"] = "degree"

template["/ENTRY[entry]/data_collection/measured_data/@units"] = ""
# Define default plot showing Psi and Delta at all angles:
template["/@default"] = "entry"
template["/ENTRY[entry]/@default"] = "plot"
2 changes: 1 addition & 1 deletion pynxtools/dataconverter/readers/json_map/README.md
@@ -39,7 +39,7 @@ Example:

```json
"/ENTRY[entry]/DATA[data]/current_295C": "/entry/data/current_295C",
"/ENTRY[entry]/NXODD_name/posint_value": "/a_level_down/another_level_down/posint_value",
"/ENTRY[entry]/NXODD_name[odd_name]/posint_value": "/a_level_down/another_level_down/posint_value",
```

* Write the values directly in the mapping file for missing data from your data file.
23 changes: 23 additions & 0 deletions pynxtools/dataconverter/units/__init__.py
@@ -0,0 +1,23 @@
#
# Copyright The NOMAD Authors.
#
# This file is part of NOMAD. See https://nomad-lab.eu for further info.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
"""A unit registry for nexus units"""

import os
from pint import UnitRegistry

ureg = UnitRegistry(os.path.join(os.path.dirname(__file__), "default_en.txt"))