review comments
lukaspie committed Aug 27, 2024
1 parent cf9691b commit b1f8fdc
Showing 9 changed files with 64 additions and 67 deletions.
27 changes: 13 additions & 14 deletions README.md
@@ -12,16 +12,14 @@ It allows to develop ontologies and to create ontological instances based on the

# Scope

`pynxtools` (previously called `nexusutils`) is intended as a parser for combining various instrument output formats and electronic lab notebook (ELN) formats to an hdf5 file according to NeXus application definitions.
`pynxtools` is a parser for combining various instrument output formats and electronic lab notebook (ELN) formats into an [HDF5](https://support.hdfgroup.org/HDF5/) file according to NeXus application definitions.

Additionally, the software can be used as a plugin in the research data management system NOMAD for
making experimental data searchable and publishable.
NOMAD is developed by the FAIRMAT consortium, as a part of the German National Research Data Infrastructure
(NFDI).
making experimental data searchable and publishable. NOMAD is developed by FAIRmat, a consortium of the German National Research Data Infrastructure (NFDI).

# Installation

It is recommended to use python 3.10 with a dedicated virtual environment for this package.
It is recommended to use python 3.11 with a dedicated virtual environment for this package.
Learn how to manage [python versions](https://github.com/pyenv/pyenv) and
[virtual environments](https://realpython.com/python-virtual-environments-a-primer/).

@@ -42,19 +40,20 @@ Documentation can be found [here](https://fairmat-nfdi.github.io/pynxtools/).

# Repository structure

The software tools are located inside [`src/pynxtools`](https://github.com/FAIRmat-NFDI/pynxtools/tree/master/src/pynxtools) and they are shipped with unit tests located in [`tests`](https://github.com/FAIRmat-NFDI/pynxtools/tree/master/tests).
Some examples with real datasets are provided in [`examples`](https://github.com/FAIRmat-NFDI/pynxtools/tree/master/examples). They guide you through the process of converting instrument raw data into the NeXus standard and visualising the files' content.
The software tools are located inside [`src/pynxtools`](https://github.com/FAIRmat-NFDI/pynxtools/tree/master/src/pynxtools). They are shipped with unit tests located in [`tests`](https://github.com/FAIRmat-NFDI/pynxtools/tree/master/tests).
Some examples from the scientific community are provided in [`examples`](https://github.com/FAIRmat-NFDI/pynxtools/tree/master/examples). They guide you through the process of converting instrument data into the NeXus standard and visualising the files' content.

# NOMAD integration

To use pynxtools with NOMAD, simply install it in the same environment as the `nomad-lab` package.
NOMAD will recognize pynxtools as a plugin automatically and offer automatic parsing of `.nxs` files
and a schema for NeXus application definitions.
pynxtools is already included in the NOMAD main deployment and NOMAD NeXus distribution images.
## Does this software require NOMAD or NOMAD OASIS?

No. The data files produced here can be uploaded to NOMAD. This tool acts as a framework for designing schemas and instances of data within the NeXus universe. It can, however, be used as a NOMAD plugin to parse NeXus files; please see the section below for details.

### Does this software require NOMAD or NOMAD OASIS ?
## How to use pynxtools with NOMAD

No. The data files produced here can be uploaded to NOMAD. Therefore, this acts like the framework to design schemas and instances of data within the NeXus universe. It can, however, be used as a NOMAD plugin to parse nexus files, please see the section above for details.
To use pynxtools with NOMAD, simply install it in the same environment as the `nomad-lab` package.
NOMAD will recognize pynxtools as a plugin automatically and offer automatic parsing of `.nxs` files. In addition, NOMAD will install a schema for NeXus application definitions.
By default, `pynxtools` is already included in the NOMAD [production](https://nomad-lab.eu/prod/v1/gui/) and [staging](https://nomad-lab.eu/prod/v1/staging/gui/) deployments.

# Contributing

@@ -94,7 +93,7 @@ python -m pytest -sv tests
## Run examples

A number of examples exist which document how the tools can be used. For a standalone
usage convenient jupyter notebooks are available for each tool. To use them, jupyter
usage convenient jupyter notebooks are available for each tool. To use these notebooks, jupyter
and related tools have to be installed in the development environment as follows:

```shell
17 changes: 8 additions & 9 deletions docs/index.md
@@ -5,16 +5,16 @@ hide: toc
# FAIRmat NeXus documentation

<!-- A single sentence that says what the product is, succinctly and memorably -->
Within [FAIRMat](https://www.fairmat-nfdi.eu/fairmat/), we are extending the [NeXus data format standard](https://www.nexusformat.org/) to support the FAIR data principles for experimental data in materials science and and phyics. This is the documentation for both our contribution to the NeXus standard as well as for our tools for data conversion and verification.
Within [FAIRmat](https://www.fairmat-nfdi.eu/fairmat/), we are extending the [NeXus data format standard](https://www.nexusformat.org/) to support the FAIR data principles for experimental data in materials science (covering solid-state physics and the chemical physics of solids, as well as materials engineering). This is the documentation for both our contribution to the NeXus standard and for our tools for data conversion and verification.

<!-- A paragraph of one to three short sentences, that describe what the product does. -->
`pynxtools`, which is the main tool under development, provides a dataconverter that maps from experimental data to the NeXus format as well as tools to verify NeXus files. It is intended as a parser for combining various instrument output formats and electronic lab notebook (ELN) formats to an HDF5 file according to NeXus application definitions.
`pynxtools`, the main tool under development, provides a data converter that maps experimental data and metadata to the NeXus format, performing parsing, normalization, visualization, and ontology matching. It combines various instrument output formats and electronic lab notebook (ELN) formats into an HDF5 file according to NeXus application definitions. In addition, `pynxtools` can be used to validate and verify NeXus files.

<!-- A third paragraph of similar length, this time explaining what need the product meets -->
`pynxtools` offers scientists a convenient way to use the NeXus format and solves the challenge of unstructured and non-standardized data in experimental materials science.
`pynxtools` offers scientists a convenient way to use the NeXus format and solves the challenge of unstructured and non-standardized data in experimental materials science. We consider this package useful for meeting the following FAIR principles as defined in [FAIR Principles: Interpretations and Implementation Considerations](https://direct.mit.edu/dint/article/2/1-2/10/10017/FAIR-Principles-Interpretations-and-Implementation): F2-4, I2-I3, and R1.

<!-- Finally, a paragraph that describes whom the product is useful for. -->
The new contribution to the standard, together with the tools provided through `pynxtools`, enable scientists and research groups working with data, as well as helping communities implement standardized FAIR research data.
FAIRmat's contribution to the existing NeXus standard, together with the tools provided through `pynxtools`, supports scientists and research groups working with data and helps communities implement standardized FAIR research data.

Additionally, the software is used as a plugin in the research data management system [NOMAD](https://nomad-lab.eu/nomad-lab/) for making experimental data searchable and publishable. NOMAD is developed by the FAIRmat consortium, as a part of the German National Research Data Infrastructure (NFDI).

@@ -58,7 +58,7 @@ How-to guides provide step-by-step instructions for a wide range of tasks.
#### pynxtools

- [Data conversion in `pynxtools`](learn/dataconverter-and-readers.md)
- [NeXus verification in `pynxtools`](learn/nexus-verification.md)
- [Validation of NeXus files](learn/nexus-validation.md)
- [The MultiFormatReader as a reader superclass](learn/multi-format-reader.md)

</div>
@@ -74,13 +74,12 @@ Or go directly to the [official NIAC](https://manual.nexusformat.org/classes/ind

#### pynxtools

`pynxtools` has a number of command line tools that can be used to convert data and verify NeXus files. You can more information about the
API [here](reference/cli-api.md).
`pynxtools` has a number of command line tools that can be used to convert data and verify NeXus files. You can find more information about the API [here](reference/cli-api.md).

Within FAIRmat, we maintain a number of pynxtools readers as well as reader plugins for different experimental techniques. Here you can find more information:
Within FAIRmat, we maintain a number of generic built-in pynxtools readers, together with reader plugins for different experimental techniques. Here you can find more information:

- [Built-in pynxtools readers](reference/built-in-readers.md)
- [FAIRMat-suppored pynxtools plugins](reference/plugins.md)
- [FAIRmat-supported pynxtools plugins](reference/plugins.md)


</div>
35 changes: 17 additions & 18 deletions docs/learn/dataconverter-and-readers.md
@@ -1,15 +1,15 @@
# Data conversion in pynxtools
One of the main motivations for pynxtools is to develop a tool for combining various instrument output formats and electronic lab notebook (ELN) into an [HDF5](https://support.hdfgroup.org/HDF5/) file according to [NeXus application definitions](https://fairmat-nfdi.github.io/nexus_definitions/classes/index.html).
One of the main motivations for pynxtools is to develop a tool for combining various instrument output formats and electronic lab notebook (ELN) formats into a file according to [NeXus application definitions](https://fairmat-nfdi.github.io/nexus_definitions/classes/index.html).

The `dataconverter` API in pynxtools provides exactly that: it converts experimental data to NeXus/HDF5 files based on any provided [NXDL schemas](https://manual.nexusformat.org/nxdl.html#index-1).
The `dataconverter` API in pynxtools provides exactly that: it converts experimental as well as simulation data, together with the results from analysis of such data, to NeXus files based on any provided [NXDL schemas](https://manual.nexusformat.org/nxdl.html#index-1). Here, we are using [HDF5](https://support.hdfgroup.org/HDF5/) as the serialization format.

The dataconverter has essentially three functionalities:
The dataconverter currently has essentially three functionalities:

1. read in experimental data using ```readers```
2. validate the data and metadata against any of the NeXus application definitions
3. write a valid NeXus/HDF5 file
1. Read in experimental data using ```readers```
2. Validate the data and metadata against a NeXus application definition of choice (i.e., check that the output data matches all existence, shape, and format constraints of the application definition)
3. Write a valid NeXus/HDF5 file

For step 1, a set of readers has been which the converter calls to accomplish this task for a specific set of application definition (NXDL XML file) plus a set of experiment/method-specific file(s). These files can be files in a proprietary format, or of a certain format used in the respective scientific community, or text files. Only in combination, these files hold all the required pieces of information which the application definition demands and which are thus required to make a NeXus/HDF5 file compliant. Users can store additional pieces of information in an NeXus/HDF5 file. In this case readers will issue a warning that these data are not properly documented from the perspective of NeXus.
A set of readers has been developed which the converter calls to read in a set of experiment/method-specific file(s) for a specific application definition (NXDL XML file). These data files can be in a proprietary format, in a format commonly used in the respective scientific community, or plain text files. Only in combination do these files hold all the required pieces of information which the application definition demands and which are thus required to make a NeXus/HDF5 file compliant. Users can store additional pieces of information in a NeXus/HDF5 file. In this case, readers will issue a warning that these data are not properly documented from the perspective of NeXus.
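
The three steps and the role of the readers can be illustrated with a small, self-contained sketch. It is not the pynxtools implementation: the reader functions, the `REQUIRED` and `DOCUMENTED` sets, the file suffixes, and the `toy_convert` helper are all hypothetical, and JSON stands in for HDF5 so that only the Python standard library is needed.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger("toy_dataconverter")

# Hypothetical stand-in for an application definition: the required paths
# and every path the definition documents. Real definitions are NXDL XML files.
REQUIRED = {"/entry/title", "/entry/data/counts"}
DOCUMENTED = REQUIRED | {"/entry/operator"}


def read_eln(path: Path) -> dict:
    """Toy ELN reader: a JSON file mapping NeXus-like paths to values."""
    return json.loads(path.read_text())


def read_raw(path: Path) -> dict:
    """Toy instrument reader: whitespace-separated integer counts."""
    counts = [int(token) for token in path.read_text().split()]
    return {"/entry/data/counts": counts}


# Dispatch on file suffix; the suffixes are illustrative only.
READERS = {".json": read_eln, ".raw": read_raw}


def toy_convert(input_files: list, output: Path) -> dict:
    # Step 1: read every input file and merge the partial results.
    template = {}
    for path in input_files:
        template.update(READERS[path.suffix](path))

    # Step 2: validate the merged data against the toy definition.
    missing = sorted(REQUIRED - set(template))
    if missing:
        raise ValueError(f"missing required entries: {missing}")
    for key in sorted(set(template) - DOCUMENTED):
        logger.warning("%s is not documented by the definition", key)

    # Step 3: write the output (JSON here; the real tool writes HDF5).
    output.write_text(json.dumps(template, indent=2))
    return template
```

In this sketch neither input file alone satisfies the toy definition; only the merged result of the ELN file and the raw file does, which mirrors the "only in combination" point above.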

There exist two different subsets of readers:

@@ -18,12 +18,11 @@ There exists two different subsets of readers:

## Matching to NeXus application definitions

The purpose of the dataconverter is to create NeXus/HDF5 files with content that matches a specific NeXus application definition. Such application definitions are useful for collecting a set of pieces of information about a specific experiment in a given scientific field. The pieces of information are metadata and numerical data. The application definition is used to provide these data in a format that serves a data delivery contract: The HDF5 file, or so-called NeXus file, delivers all those pieces of information which the application definition specifies. Required and optional pieces of information are distinguished. NeXus classes can recommend the inclusion of certain pieces of information. Recommended data are essentially optional. The idea is that flagging these data as recommended motivates users to collect them but does not require to write dummy
or nonsense data if the user is unable to collect recommended data.
The purpose of the dataconverter is to create NeXus/HDF5 files with content that matches a specific NeXus application definition. Such application definitions are useful for collecting a set of pieces of information about a specific experiment in a given scientific field. The pieces of information are numerical and categorical (meta)data. The application definition is used to provide these data in a format that serves as a data delivery contract: the HDF5 file, or so-called NeXus file, delivers all those pieces of information which the application definition specifies. Required and optional pieces of information are distinguished. NeXus classes can recommend the inclusion of certain pieces of information. Recommended data are essentially optional. The idea is that flagging these data as recommended motivates users to collect them, but does not require writing dummy or nonsense data if the recommended data are not available.
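
The difference between required, recommended, and optional entries can be sketched as follows. This is a simplified illustration under stated assumptions, not how pynxtools validates files: the `DEFINITION` dictionary, its paths, and the `check_optionality` function are hypothetical.

```python
import warnings

# Hypothetical, simplified definition: each path is tagged with an
# optionality level. Real application definitions are NXDL XML files.
DEFINITION = {
    "/entry/definition": "required",
    "/entry/start_time": "recommended",
    "/entry/notes": "optional",
}


def check_optionality(data: dict) -> list:
    """Return missing required paths; warn about missing recommended ones."""
    missing_required = []
    for path, level in DEFINITION.items():
        if path in data:
            continue
        if level == "required":
            missing_required.append(path)
        elif level == "recommended":
            warnings.warn(f"recommended entry {path} is missing")
        # Missing optional entries are silently accepted.
    return missing_required
```

A file that lacks a recommended entry still passes in this sketch; the warning only nudges the user to supply the entry, so there is no incentive to fill in dummy values.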

## Getting started

Each of the built-in reader comes with the main `pynxtools` package, therefore they are avaible after pip installation:
Each of the built-in readers comes with the main `pynxtools` package. Hence, they can be used after pip installation:
```console
user@box:~$ pip install pynxtools
```
@@ -39,13 +38,15 @@ In addition, it is also possible to install all of the pynxtools reader plugins
pip install pynxtools[convert]
```

Note that in this case, the latest version of the plugin from PyPI is installed.

## Usage
See [here](../reference/cli-api.md#data-conversion) for the documentation of the `dataconverter` API.

### Use with multiple input files

```console
user@box:~$ dataconverter --nxdl nxdl metadata data.raw otherfile
user@box:~$ dataconverter metadata data.raw otherfile --nxdl nxdl --reader <reader-name>
```
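
When the converter is driven from a script, the command line can be assembled programmatically. The sketch below only builds the argument list from flags that appear in this section's console examples (`--nxdl`, `--reader`, `--mapping`); the helper name `build_dataconverter_command` is hypothetical, and the file names and `<reader-name>` are the placeholders from the example above.

```python
import shlex


def build_dataconverter_command(input_files, nxdl, reader=None, mapping=None):
    """Assemble a dataconverter argument list (input files first, then flags)."""
    command = ["dataconverter", *input_files, "--nxdl", nxdl]
    if reader is not None:
        command += ["--reader", reader]
    if mapping is not None:
        command += ["--mapping", mapping]
    return command


command = build_dataconverter_command(
    ["metadata", "data.raw", "otherfile"], nxdl="nxdl", reader="<reader-name>"
)
# shlex.join quotes arguments so the string is safe to paste into a shell.
print(shlex.join(command))
```

The list form can be passed directly to `subprocess.run(command, check=True)` once `pynxtools` is installed and the placeholders are replaced with real values.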

### Merge partial NeXus files into one
@@ -54,7 +55,7 @@ user@box:~$ dataconverter --nxdl nxdl metadata data.raw otherfile
user@box:~$ dataconverter --nxdl nxdl partial1.nxs partial2.nxs
```

### Map an HDF5/JSON/(Python Dict pickled in a pickle file)
### Map an HDF5 file/JSON file

```console
user@box:~$ dataconverter --nxdl nxdl any_data.hdf5 --mapping my_custom_map.mapping.json
@@ -65,11 +66,9 @@ You can find actual examples with data files at [`examples/json_map`](https://gi

## Example data for testing and development purposes

Before using your own data we strongly encourage you to download a set of open-source test data for testing the plug-ins. For this purpose pynxtools comes with a tests directory with a data/dataconverter sub-directory including reader-specific jupyter-notebook examples. These examples can be used for downloading test data and use specific readers as a standalone converter to translate given data into a NeXus/HDF5 file.

Once you have practised with these tools how to convert these examples, feel free to use the tools for converting your own data. You should feel invited to contact the respective corresponding author(s) of each reader if you run into issues with the reader or feel there is a necessity to include additional data into the NeXus file for the respective application.
Before using your own data we strongly encourage you to download a set of open-source test data for testing the pynxtools readers and reader plugins. For this purpose, pynxtools and its plugins come with `examples` and `test` directories including reader-specific examples. These examples can be used for downloading test data and for using specific readers as a standalone converter to translate given data into a NeXus/HDF5 file.

We are looking forward for learning from your experience and see the interesting use cases.
You can find the contact persons in the respective README.md of each reader.
Once you have practiced converting these examples with these tools, feel free to use the tools for converting your own data. You should feel invited to contact the respective corresponding author(s) of each reader if you run into issues with the reader or feel there is a necessity to include additional data into the NeXus file for your respective application.

You can read specific README's of the readers and find usage examples [here](https://github.com/FAIRmat-NFDI/pynxtools/tree/master/examples/).
We are looking forward to learning from your experience and your use cases. You can find the contact persons in the respective README.md of each reader (plugin).
2 changes: 1 addition & 1 deletion docs/learn/nexus-rules.md
@@ -1,6 +1,6 @@
# Rules for storing data in NeXus

There are several rules which apply for storing single data items in NeXus. There exists a [summary](https://manual.nexusformat.org/datarules.html) in the NeXus documentation outlining most of these rules. However, this explanation is not exhaustive and thus, we have compiled here additional information.
There are several rules which apply for storing single data items in NeXus. There exists a [summary](https://manual.nexusformat.org/datarules.html) in the NeXus documentation outlining most of these rules. However, to guide data providers even further, we have compiled here additional information and explanations.


## Namefitting