Release 1.0.0 #150

Merged
merged 30 commits into from
Sep 24, 2024
Commits
ef96749
Update version
stuartmcalpine Sep 2, 2024
041b22d
Start developer notes section in docs
stuartmcalpine Sep 2, 2024
707cc10
Update default NERSC site
stuartmcalpine Sep 4, 2024
f8cbec8
Fix test
stuartmcalpine Sep 4, 2024
c6a03d4
Update schema default names
stuartmcalpine Sep 5, 2024
deecb57
Update default NERSC site dir
stuartmcalpine Sep 5, 2024
f46a327
Fix what production schema is used for dependencies, tidy schema crea…
stuartmcalpine Sep 5, 2024
9538163
Update CI schema names
stuartmcalpine Sep 5, 2024
52eb474
Update tests to use new default production schema name
stuartmcalpine Sep 5, 2024
d08572b
Add extra error output
stuartmcalpine Sep 10, 2024
a6f61d6
Update nersc root site dir
stuartmcalpine Sep 10, 2024
1bf36fc
Update the production schema to the new default in the CLI
stuartmcalpine Sep 10, 2024
6037fe8
No need for permission check on root dir, user should never have it a…
stuartmcalpine Sep 10, 2024
ca42227
Add script to create schema dirs
stuartmcalpine Sep 10, 2024
19946bd
Fix test
stuartmcalpine Sep 10, 2024
915fa75
Update error output
stuartmcalpine Sep 15, 2024
872cff1
Make sure dataset checks work with any production schema name
stuartmcalpine Sep 15, 2024
e932654
Fix production check for sqlite
stuartmcalpine Sep 15, 2024
c6a04d2
Update tutorial notebooks
stuartmcalpine Sep 15, 2024
031dd48
Update some docs
stuartmcalpine Sep 16, 2024
13e4c41
Address reviewer comments
stuartmcalpine Sep 17, 2024
51988a4
Correct create script for sqlite
stuartmcalpine Sep 17, 2024
1a4ad28
Update creation script to assign correct permissions to reg_reader an…
stuartmcalpine Sep 19, 2024
131a55b
Add option creating schemas to not add permissions (for tutorial sche…
stuartmcalpine Sep 19, 2024
df9625d
Add select priv to sequences
stuartmcalpine Sep 19, 2024
2a1e3b5
Address reviewer comments on docs
stuartmcalpine Sep 22, 2024
14ea1bb
Remove DELETE privileges
stuartmcalpine Sep 22, 2024
d90add5
Update changelog
stuartmcalpine Sep 22, 2024
2308c35
Remove version suffix support
stuartmcalpine Sep 22, 2024
691c5b2
Remove version suffix from tests
stuartmcalpine Sep 22, 2024
16 changes: 4 additions & 12 deletions .github/workflows/ci.yml
@@ -70,13 +70,9 @@ jobs:
echo "sqlalchemy.url : postgresql://postgres:postgres@localhost:5432/desc_data_registry" > $HOME/.config_reg_access

# Create schemas
-      - name: Create data registry production schema
+      - name: Create data registry schemas
        run: |
-          python scripts/create_registry_schema.py --config $HOME/.config_reg_access --schema production
-
-      - name: Create data registry default schema
-        run: |
-          python scripts/create_registry_schema.py --config $HOME/.config_reg_access
+          python scripts/create_registry_schema.py --config $HOME/.config_reg_access --create_both

# Run CI tests
- name: Run CI tests
@@ -152,13 +148,9 @@ jobs:
echo "sqlalchemy.url : postgresql://postgres:postgres@localhost:5432/desc_data_registry" > $DATAREG_CONFIG

# Create schemas
-      - name: Create data registry production schema
+      - name: Create data registry schemas
        run: |
-          python scripts/create_registry_schema.py --config $DATAREG_CONFIG --schema production
-
-      - name: Create data registry default schema
-        run: |
-          python scripts/create_registry_schema.py --config $DATAREG_CONFIG
+          python scripts/create_registry_schema.py --config $DATAREG_CONFIG --create_both

# Run CI tests
- name: Run CI tests
11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,14 @@
## Version 1.0.0 (Release)

- Update default NERSC site to
`/global/cfs/cdirs/lsst/utilities/desc-data-registry`
- Update default schema names (now stored in
  `src/dataregistry/schema/default_schema_names.yaml`)
- There is now a `reg_admin` account, which is the only account that can create
  the initial schemas. The schema creation script has been updated to grant the
  correct `reg_writer` and `reg_reader` privileges.
- Remove `version_suffix`

## Version 0.6.4

- Update `dregs ls` to be a bit cleaner. Also has `dregs ls --extended` option
56 changes: 56 additions & 0 deletions docs/source/dev_notes_database.rst
@@ -0,0 +1,56 @@
Database structure
==================

The database schemas
--------------------

There are two primary database schemas that the majority of users will work with:

- The "default" (working) schema, whose name is stored in the hard-coded
  variable ``DEFAULT_SCHEMA_WORKING`` in ``src/dataregistry/db_basic.py``. It
  can be imported with ``from dataregistry.db_basic import DEFAULT_SCHEMA_WORKING``.
- The production schema. This is where production datasets go, and general
  users have read-only access to it. By default this schema is named
  "production"; a different name can be specified during schema creation (see
  below), though this should only be changed for testing purposes.

Users can specify a schema when initializing the ``DataRegistry`` object (by
default it connects to ``DEFAULT_SCHEMA_WORKING``). To connect to the
production schema, its name must be entered manually (see the production
schema tutorial). The same applies to any custom schema, which must already
have been created for the connection to work.
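A minimal sketch of the three connection patterns described above (the custom
schema name ``my_custom_schema`` is an illustrative assumption; ``production``
is the default production schema name):

.. code-block:: python

   from dataregistry import DataRegistry

   # Connect to the default working schema (DEFAULT_SCHEMA_WORKING)
   datareg = DataRegistry()

   # Connect to the production schema by entering its name manually
   datareg_prod = DataRegistry(schema="production")

   # Connect to a custom schema (the schema must already have been created)
   datareg_custom = DataRegistry(schema="my_custom_schema")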

When using *SQLite* as the backend (useful for testing), the concept of
schemas does not exist.

First time creation of database schemas
---------------------------------------

In the top level ``scripts`` directory there is a ``create_registry_schema.py``
script that performs the initial schema creation. This script must be run
before the data registry can be used, for both *Postgres* and *SQLite*
backends.

First, make sure your ``~/.config_reg_access`` and ``~/.pgpass`` are correctly
set up (see "Getting set up" for more information on these configuration files).
When creating schemas at NERSC, make sure the SPIN instance of the *Postgres*
database is running.

The script must be run twice, first for the production schema, then for the
general schema (or run in a single call when using the ``--create_both``
argument). There are four arguments that can be specified (all optional):

- ``--config`` : Location of the data registry configuration file
(``~/.config_reg_access`` by default)
- ``--schema`` : The name of the schema (default is ``DEFAULT_SCHEMA_WORKING``)
- ``--production-schema``: The name of the production schema (default
"production")
- ``--create_both`` : Create both the production schema and working schema in
one call (the production schema will be made first, then the working schema)

The typical initialization would be:

.. code-block:: bash

python3 create_registry_schema.py --create_both
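
If non-default schema names are needed (e.g., for testing), a sketch of the
equivalent two-call form, assuming the arguments listed above (the working
schema name here is illustrative):

.. code-block:: bash

   # Create the production schema first...
   python3 create_registry_schema.py --schema production

   # ...then the working schema, pointing it at the production schema it
   # should reference for dependencies
   python3 create_registry_schema.py --schema my_working_schema --production-schema production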
4 changes: 4 additions & 0 deletions docs/source/dev_notes_spin.rst
@@ -0,0 +1,4 @@
SPIN
====

Details on setting up the SPIN instance...
8 changes: 8 additions & 0 deletions docs/source/index.rst
@@ -52,6 +52,14 @@ them.
reference_cli
reference_schema

.. toctree::
:maxdepth: 2
:caption: Developer notes:
:hidden:

dev_notes_spin
dev_notes_database

.. toctree::
:maxdepth: 2
:caption: Contact:
12 changes: 12 additions & 0 deletions docs/source/reference_python.rst
@@ -23,6 +23,18 @@ It connects the user to the database, and serves as a wrapper to both the

.. automethod:: dataregistry.registrar.dataset.DatasetTable.register

.. automethod:: dataregistry.registrar.dataset.DatasetTable.replace

.. automethod:: dataregistry.registrar.dataset.DatasetTable.modify

.. automethod:: dataregistry.registrar.dataset.DatasetTable.delete

.. automethod:: dataregistry.registrar.dataset.DatasetTable.add_keywords

.. automethod:: dataregistry.registrar.dataset.DatasetTable.get_modifiable_columns

.. automethod:: dataregistry.registrar.dataset.DatasetTable.get_keywords

.. automethod:: dataregistry.registrar.execution.ExecutionTable.register

.. automethod:: dataregistry.registrar.dataset_alias.DatasetAliasTable.register
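
A minimal usage sketch for some of the dataset methods documented above, based
on the tutorial notebooks (the dataset name is an illustrative assumption, and
``location_type="dummy"`` registers an entry without requiring data on disk):

.. code-block:: python

   from dataregistry import DataRegistry

   datareg = DataRegistry()

   # Register a dataset; ``name`` and ``version`` are the two positional arguments
   datareg.Registrar.dataset.register(
       "my_example_dataset",
       "1.0.0",
       location_type="dummy",  # for testing; no actual data needs to exist
   )

   # List the keywords that can be attached to datasets
   print(datareg.Registrar.dataset.get_keywords())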
32 changes: 11 additions & 21 deletions docs/source/tutorial_cli.rst
@@ -38,8 +38,8 @@ Typing

will list all the metadata properties that can be associated with a dataset
during registration. As when registering datasets using the ``dataregistry``
-package, the ``relative_path`` and ``version`` string properties are mandatory,
-which will always be the first two parameters passed to the ``dregs register
+package, the dataset ``name`` and ``version`` properties are mandatory, which
+will always be the first two parameters passed to the ``dregs register
dataset`` command respectively.

For example, say I have produced some data from my latest DESC publication that
@@ -59,11 +59,9 @@ would run the CLI as follows:
--description "Data from my_paper_dataset"

This will recursively copy the ``/some/place/at/nersc/my_paper_dataset/``
-directory into the data registry shared space under the relative path
-``my_paper_dataset``. As we did not specify a ``--name`` for the dataset, the
-``name`` column in the database will automatically be assigned as
-``my_paper_dataset`` (and all other properties we did not specify will keep
-their default values).
+directory into the data registry shared space with the
+``name='my_paper_dataset'`` (other non-specified properties will keep their
+default values).

Updating a dataset
------------------
@@ -76,26 +74,18 @@ initial registration, we need to create a new version of the dataset.
.. code-block:: bash

dregs register dataset \
-      my_paper_dataset_updated \
+      my_paper_dataset \
       patch \
       --old-location /some/place/at/nersc/my_paper_dataset_updated/ \
       --owner_type project \
       --owner "DESC Generic Working Group" \
-      --description "Data from my_paper_dataset describing bugfix" \
-      --name my_paper_dataset
+      --description "Data from my_paper_dataset describing bugfix"

-Here we associate it with the previous dataset through ``--name
-my_paper_dataset``, and tell the data registry to automatically bump the patch
-version to ``1.0.1`` by specifying "patch" as the version string (you could
-however have entered "1.0.1" here if you prefer).
-
-.. note::
-
-   Remember, if the dataset is non-overwritable, the relative paths in the data
-   registry need to be unique, which is why we could not have the relative path
-   of the second entry match the first. But for datasets only the ``name``
-   plus ``version`` has to be unique, which is how we could associate them with
-   the same ``name`` column.
+Here we associate it with the previous dataset through ``name=
+my_paper_dataset`` (and making sure we keep the same `owner` and `owner_type`),
+and tell the data registry to automatically bump the patch version to ``1.0.1``
+by specifying "patch" as the version string (you could however have entered
+"1.0.1" here if you prefer).

Querying the data registry
--------------------------
60 changes: 42 additions & 18 deletions docs/source/tutorial_notebooks/datasets_deeper_look.ipynb
@@ -39,8 +39,20 @@
},
"outputs": [],
"source": [
"# Come up with a random owner name to avoid clashes\n",
"from random import randint\n",
"OWNER = \"tutorial_\" + str(randint(0,int(1e6)))\n",
"\n",
"import dataregistry\n",
"print(\"Working with dataregistry version:\", dataregistry.__version__)"
"print(f\"Working with dataregistry version: {dataregistry.__version__} as random owner {OWNER}\")"
]
},
{
"cell_type": "markdown",
"id": "4c2f92bf-9048-421e-b896-292eb00542c8",
"metadata": {},
"source": [
"**Note** that running some of the cells below may fail, especially if run multiple times. This will likely be from clashes with the unique constraints within the database (hopefully the error output is informative). In these events either; (1) run the cell above to establish a new database connection with a new random user, or (2) manually change the conflicting database column(s) that are clashing during registration."
]
},
{
@@ -55,13 +67,15 @@
"cell_type": "code",
"execution_count": null,
"id": "72eabcd0-b05e-4e87-9ed1-6450ac196b05",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from dataregistry import DataRegistry\n",
"\n",
"# Establish connection to database (using defaults)\n",
"datareg = DataRegistry()"
"# Establish connection to the tutorial schema\n",
"datareg = DataRegistry(schema=\"tutorial_working\", owner=OWNER)"
]
},
{
@@ -78,7 +92,9 @@
"cell_type": "code",
"execution_count": null,
"id": "560b857c-7d94-44ad-9637-0b107cd42259",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"print(datareg.Registrar.dataset.get_keywords())"
@@ -98,7 +114,9 @@
"cell_type": "code",
"execution_count": null,
"id": "44581049-1d15-44f0-b1ed-34cff6cdb45a",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Add new dataset entry with keywords.\n",
@@ -132,7 +150,9 @@
"cell_type": "code",
"execution_count": null,
"id": "09478b87-7d5a-4814-85c7-49f90e0db45d",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# List of keywords to add to dataset\n",
@@ -160,22 +180,24 @@
"\n",
"The files and directories of registered datasets are stored under a path relative to the root directory (`root_dir`), which, by default, is a shared space at NERSC.\n",
"\n",
"By default, the relative_path is constructed from the `name`, `version` and `version_suffix` (if there is one), in the format `relative_path=<name>/<version>_<version_suffix>`. However, one can also manually select the relative_path during registration, for example"
"By default, the `relative_path` is constructed from the `name`, `version` and `version_suffix` (if there is one), in the format `relative_path=<name>/<version>_<version_suffix>`. However, one can also manually select the relative_path during registration, for example"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5bc0d5b6-f50a-4646-bc1b-7d9e829e91bc",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Add new entry with a manual relative path.\n",
"datareg.Registrar.dataset.register(\n",
" \"nersc_tutorial:my_desc_dataset_with_relative_path\",\n",
" \"1.0.0\",\n",
" relative_path=\"nersc_tutorial/my_desc_dataset\",\n",
" location_type=\"dummy\", # for testing, means we need no data\n",
" relative_path=f\"NERSC_tutorial/{OWNER}/my_desc_dataset\",\n",
" location_type=\"dummy\", # for testing, means we need no actual data to exist\n",
")"
]
},
@@ -216,19 +238,21 @@
"cell_type": "code",
"execution_count": null,
"id": "718d1cd8-4517-4597-9e36-e403e219cef2",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from dataregistry.dataset_util import get_dataset_status\n",
"from dataregistry.registrar.dataset_util import get_dataset_status\n",
"\n",
"# The `get_dataset_status` function takes in a dataset `status` and a bit index, and returns if that bit is True or False\n",
"dataset_status = 1\n",
"\n",
"# Is dataset valid?\n",
"print(f\"Dataset is valid: {get_dataset_status(dataset_status, \"valid\"}\")\n",
"print(f\"Dataset is valid: {get_dataset_status(dataset_status, 'valid')}\")\n",
"\n",
"# Is dataset replaced?\n",
"print(f\"Dataset is replaced: {get_dataset_status(dataset_status, \"replaced\"}\")"
"print(f\"Dataset is replaced: {get_dataset_status(dataset_status, 'replaced')}\")"
]
},
{
@@ -257,9 +281,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "DREGS-env",
"language": "python",
"name": "python3"
"name": "venv"
},
"language_info": {
"codemirror_mode": {
@@ -271,7 +295,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
"version": "3.9.18"
}
},
"nbformat": 4,