Commit

Update tutorial notebooks
stuartmcalpine committed Sep 15, 2024
1 parent e932654 commit c6a04d2
Showing 4 changed files with 184 additions and 99 deletions.
60 changes: 42 additions & 18 deletions docs/source/tutorial_notebooks/datasets_deeper_look.ipynb
@@ -39,8 +39,20 @@
},
"outputs": [],
"source": [
"# Come up with a random owner name to avoid clashes\n",
"from random import randint\n",
"OWNER = \"tutorial_\" + str(randint(0,int(1e6)))\n",
"\n",
"import dataregistry\n",
"print(\"Working with dataregistry version:\", dataregistry.__version__)"
"print(f\"Working with dataregistry version: {dataregistry.__version__} as random owner {OWNER}\")"
]
},
{
"cell_type": "markdown",
"id": "4c2f92bf-9048-421e-b896-292eb00542c8",
"metadata": {},
"source": [
"**Note** that running some of the cells below may fail, especially if they are run multiple times. This is most likely caused by clashes with the unique constraints within the database (hopefully the error output is informative). In that event, either (1) rerun the cell above to pick a new random owner and re-establish the database connection with it, or (2) manually change the database column(s) that clash during registration."
]
},
{
@@ -55,13 +67,15 @@
"cell_type": "code",
"execution_count": null,
"id": "72eabcd0-b05e-4e87-9ed1-6450ac196b05",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from dataregistry import DataRegistry\n",
"\n",
"# Establish connection to database (using defaults)\n",
"datareg = DataRegistry()"
"# Establish connection to the tutorial schema\n",
"datareg = DataRegistry(schema=\"tutorial_working\", owner=OWNER)"
]
},
{
@@ -78,7 +92,9 @@
"cell_type": "code",
"execution_count": null,
"id": "560b857c-7d94-44ad-9637-0b107cd42259",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"print(datareg.Registrar.dataset.get_keywords())"
@@ -98,7 +114,9 @@
"cell_type": "code",
"execution_count": null,
"id": "44581049-1d15-44f0-b1ed-34cff6cdb45a",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Add new dataset entry with keywords.\n",
@@ -132,7 +150,9 @@
"cell_type": "code",
"execution_count": null,
"id": "09478b87-7d5a-4814-85c7-49f90e0db45d",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# List of keywords to add to dataset\n",
@@ -160,22 +180,24 @@
"\n",
"The files and directories of registered datasets are stored under a path relative to the root directory (`root_dir`), which, by default, is a shared space at NERSC.\n",
"\n",
"By default, the relative_path is constructed from the `name`, `version` and `version_suffix` (if there is one), in the format `relative_path=<name>/<version>_<version_suffix>`. However, one can also manually select the relative_path during registration, for example"
"By default, the `relative_path` is constructed from the `name`, `version` and `version_suffix` (if there is one), in the format `relative_path=<name>/<version>_<version_suffix>`. However, one can also set `relative_path` manually during registration, for example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5bc0d5b6-f50a-4646-bc1b-7d9e829e91bc",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Add new entry with a manual relative path.\n",
"datareg.Registrar.dataset.register(\n",
" \"nersc_tutorial:my_desc_dataset_with_relative_path\",\n",
" \"1.0.0\",\n",
" relative_path=\"nersc_tutorial/my_desc_dataset\",\n",
" location_type=\"dummy\", # for testing, means we need no data\n",
" relative_path=f\"NERSC_tutorial/{OWNER}/my_desc_dataset\",\n",
" location_type=\"dummy\", # for testing, means we need no actual data to exist\n",
")"
]
},
@@ -216,19 +238,21 @@
"cell_type": "code",
"execution_count": null,
"id": "718d1cd8-4517-4597-9e36-e403e219cef2",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from dataregistry.dataset_util import get_dataset_status\n",
"from dataregistry.registrar.dataset_util import get_dataset_status\n",
"\n",
"# The `get_dataset_status` function takes a dataset `status` and the name of a status bit, and returns whether that bit is set (True) or not (False)\n",
"dataset_status = 1\n",
"\n",
"# Is dataset valid?\n",
"print(f\"Dataset is valid: {get_dataset_status(dataset_status, \"valid\"}\")\n",
"print(f\"Dataset is valid: {get_dataset_status(dataset_status, 'valid')}\")\n",
"\n",
"# Is dataset replaced?\n",
"print(f\"Dataset is replaced: {get_dataset_status(dataset_status, \"replaced\"}\")"
"print(f\"Dataset is replaced: {get_dataset_status(dataset_status, 'replaced')}\")"
]
},
{
@@ -257,9 +281,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "DREGS-env",
"language": "python",
"name": "python3"
"name": "venv"
},
"language_info": {
"codemirror_mode": {
@@ -271,7 +295,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
"version": "3.9.18"
}
},
"nbformat": 4,
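The first notebook states that the default `relative_path` follows `<name>/<version>_<version_suffix>` and that `get_dataset_status` reports whether a given status bit is set. The sketch below illustrates both ideas in plain Python. `default_relative_path`, `status_bit_is_set` and the `STATUS_BITS` positions are hypothetical stand-ins, not the dataregistry API; in particular, the bit layout and the behaviour when there is no suffix are assumptions. Real code should use `get_dataset_status` from `dataregistry.registrar.dataset_util`.

```python
from typing import Optional


# Hypothetical helper: mirrors the documented default layout
# <name>/<version>_<version_suffix>. ASSUMPTION: the "_<version_suffix>"
# part is simply dropped when there is no suffix.
def default_relative_path(name: str, version: str,
                          version_suffix: Optional[str] = None) -> str:
    if version_suffix:
        return f"{name}/{version}_{version_suffix}"
    return f"{name}/{version}"


# ASSUMPTION: illustrative bit positions only; the real mapping is defined
# inside dataregistry and may differ.
STATUS_BITS = {"valid": 0, "deleted": 1, "archived": 2, "replaced": 3}


def status_bit_is_set(status: int, bit_name: str) -> bool:
    """Return True if the named bit is set in an integer status bitmask."""
    return bool((status >> STATUS_BITS[bit_name]) & 1)


if __name__ == "__main__":
    print(default_relative_path("my_desc_dataset", "1.0.0"))         # my_desc_dataset/1.0.0
    print(default_relative_path("my_desc_dataset", "1.0.0", "rc1"))  # my_desc_dataset/1.0.0_rc1
    # status = 1 has only bit 0 set, so "valid" is True and "replaced" is False
    print(status_bit_is_set(1, "valid"), status_bit_is_set(1, "replaced"))  # True False
```

Packing flags into a single integer lets one database column record several independent states at once, which is why the status is queried bit by bit rather than compared for equality.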
60 changes: 23 additions & 37 deletions docs/source/tutorial_notebooks/production_schema.ipynb
@@ -25,7 +25,6 @@
"\n",
"- Connect to the production schema and register a new dataset (admin only)\n",
"- Query the production schema\n",
"- Transfer a dataset from another schema to the production schema (admin only)\n",
"\n",
"### Before we begin\n",
"\n",
@@ -43,8 +42,20 @@
},
"outputs": [],
"source": [
"# Come up with a random owner name to avoid clashes\n",
"from random import randint\n",
"OWNER = \"tutorial_\" + str(randint(0,int(1e6)))\n",
"\n",
"import dataregistry\n",
"print(\"Working with dataregistry version:\", dataregistry.__version__)"
"print(f\"Working with dataregistry version: {dataregistry.__version__} as random owner {OWNER}\")"
]
},
{
"cell_type": "markdown",
"id": "782179b4-4349-4199-b3a3-38d4845188a9",
"metadata": {},
"source": [
"**Note** that running some of the cells below may fail, especially if they are run multiple times. This is most likely caused by clashes with the unique constraints within the database (hopefully the error output is informative). In that event, either (1) rerun the cell above to pick a new random owner and re-establish the database connection with it, or (2) manually change the database column(s) that clash during registration."
]
},
{
@@ -71,17 +82,15 @@
"from dataregistry import DataRegistry\n",
"\n",
"# Establish connection to the production schema\n",
"datareg = DataRegistry(schema=\"production\", owner=\"DESC CO group\", owner_type=\"production\")"
"datareg = DataRegistry(schema=\"tutorial_production\", owner=\"production\", owner_type=\"production\")"
]
},
{
"cell_type": "markdown",
"id": "6f7423fb-32d0-4a33-8e87-cd75e952512f",
"metadata": {},
"source": [
"Here we have connected to the data registry production schema (`schema=\"production\"`). Notice we have assigned a universal owner (`owner=\"DESC CO group\"`) and owner type (`owner_type=\"production\"`) to save some time when registering the datasets during this instance.\n",
"\n",
"Note for the production schema no value other than `production` will be allowed for `owner_type` (the inverse is also true for any schema other than production)."
"Here we have connected to the data registry tutorial production schema (`schema=\"tutorial_production\"`). We have set both the universal `owner` and `owner_type` to \"production\", which are the only values allowed for the production schema."
]
},
{
@@ -93,17 +102,12 @@
},
"outputs": [],
"source": [
"# Production datasets can't be overwritten, so for the purposes of this tutorial, let's generate a random unique name\n",
"import numpy as np\n",
"tag = np.rrandom.andint(0, 100000)\n",
"\n",
"# Add new entry.\n",
"dataset_id, execution_id = datareg.Registrar.dataset.register(\n",
" f\"nersc_production_tutorial/my_desc_production_dataset_{tag}\",\n",
" f\"nersc_production_tutorial:my_desc_production_dataset_{OWNER}\",\n",
" \"1.0.0\",\n",
" description=\"A production output from some DESC code\",\n",
" old_location=\"dummy_production_dataset.txt\",\n",
" is_dummy=True\n",
" location_type=\"dummy\"\n",
")\n",
"\n",
"print(f\"Created dataset {dataset_id}, associated with execution {execution_id}\")"
@@ -120,7 +124,7 @@
"\n",
"To recap about production datasets:\n",
"- Only administrators have write access to the production schema and shared space\n",
"- All datasets in the production schema have `owner_type=\"production\"`\n",
"- All datasets in the production schema have `owner=\"production\"` and `owner_type=\"production\"`\n",
"- Production datasets can never be overwritten, even if `is_overwritable=True`"
]
},
@@ -147,8 +151,8 @@
},
"outputs": [],
"source": [
"# Create a filter that queries on the dataset name\n",
"f = datareg.Query.gen_filter('dataset.name', '==', 'my_desc_production_dataset')\n",
"# Create a filter that queries on the owner\n",
"f = datareg.Query.gen_filter('dataset.owner', '==', 'production')\n",
"\n",
"# Query the database\n",
"results = datareg.Query.find_datasets(['dataset.dataset_id', 'dataset.name', 'dataset.owner',\n",
@@ -166,31 +170,13 @@
"source": [
"Note that when using the command line interface to query datasets, e.g., `dregs ls`, both the default schema you are connected to and the production schema are searched."
]
},
{
"cell_type": "markdown",
"id": "db6a2ac8-80ad-4038-a722-de9de8fbe433",
"metadata": {},
"source": [
"## Transferring datasets to the production schema\n",
"\n",
"TBD"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cb87beb4-937c-498c-b1f2-de32cab29b17",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "DREGS-env",
"language": "python",
"name": "python3"
"name": "venv"
},
"language_info": {
"codemirror_mode": {
@@ -202,7 +188,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
"version": "3.9.18"
}
},
"nbformat": 4,
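The production notebook's ownership rules can be expressed as a small check: the production schema accepts only `owner="production"` and `owner_type="production"`, and, per the note removed in this commit, no other schema accepts `owner_type="production"`. The sketch below is a hypothetical client-side illustration. `validate_owner` is not part of dataregistry, and the value `"user"` is only an example owner type. It illustrates the logic and does not replace whatever validation the registry itself performs.

```python
# Hypothetical helper (not part of dataregistry): encodes the ownership rules
# described in the production-schema tutorial.
def validate_owner(is_production_schema: bool, owner: str, owner_type: str) -> None:
    """Raise ValueError if owner/owner_type break the production-schema rules."""
    if is_production_schema:
        # Production schema: both fields must be exactly "production".
        if owner != "production" or owner_type != "production":
            raise ValueError(
                "production schema requires owner='production' and owner_type='production'"
            )
    elif owner_type == "production":
        # Any other schema: the "production" owner type is reserved.
        raise ValueError("owner_type='production' is only allowed in the production schema")


if __name__ == "__main__":
    validate_owner(True, "production", "production")  # accepted
    validate_owner(False, "tutorial_123456", "user")   # accepted ("user" is an example value)
    try:
        # The owner used by the previous version of the tutorial is now rejected.
        validate_owner(True, "DESC CO group", "production")
    except ValueError as err:
        print("rejected:", err)
```

Passing `is_production_schema` explicitly, rather than inferring it from the schema name, avoids guessing how schemas such as `tutorial_production` are classified.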