Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release 1.0.0 #150

Merged
merged 30 commits into from
Sep 24, 2024
Merged

Release 1.0.0 #150

merged 30 commits into from
Sep 24, 2024

Conversation

stuartmcalpine
Copy link
Collaborator

@stuartmcalpine stuartmcalpine commented Sep 2, 2024

Placeholder branch for release

Todo

  • Update Postgres version. It appears we need at least 15.2 for NULLS NOT DISTINCT. See https://blog.rustprooflabs.com/2022/07/postgres-15-unique-improvement-with-null
    • Update posgres version Updated to 16.4.alpine
    • Move spin instance to production
    • Update to a persistent volume
    • Add developer notes on basic SPIN setup process
  • Decide on new default NERSC site, and update src/dataregistry/site_config/site_rootdir.yaml. Updated to /global/cfs/cdirs/lsst/utilities/data-registry
  • Decide on new schema default (SCHEMA_VERSION in src/dataregistry/db_basic.py). Default schema names are now at src/dataregistry/schema/default_schema_names.yaml, one for WORKING and one for PRODUCTION
  • Implement tutorial schema for tutorial notebooks and update notebooks to point to it Created with the new schema creation script, tutorial_working and tutorial_production, will update notebooks to match
  • Review tutorial notebooks
    • register_datasets.ipynb
    • datasets_deeper_look.ipynb
    • production_schema.ipynb
    • query_datasets.ipynb
    • pipelines.ipynb
  • Review documentation
    • installation
    • tutorial
    • reference
    • dev_notes
    • contact
  • Create startup script to create schema and owner_type directories. These should be static, and that level of the directory structure should have no write access for regular users. New script in scripts folder called create_schema_dirs.sh that creates the initial schema directories and their owner_type groups but needs some revision.
  • Get registry working with communal (read only?) pgpass and config files at NERSC
    • Can have a communal config file at least. Update reading of the config file to start with looking for the NERSC env variable that points to the config file (and update docs to reflect this)
  • Get gcrcatalogs working with release
  • Exercise sqlite I've tested sqlite locally using the create script and running the CI tests, works good for me
  • Build github release and publish to PyPi (have CI publish releases automatically to PyPi?)
    • Update installation to talk about installing from PyPi
  • Incorporate registry into python_bleed.
    • Update installation notes to reflect this

@stuartmcalpine
Copy link
Collaborator Author

I've started a release notes section in the docs to outline the process of setting up spin and the initial creation of the schemas.

This section can also be later used for more code developments notes

@stuartmcalpine
Copy link
Collaborator Author

stuartmcalpine commented Sep 4, 2024

@JoanneBogart, Opinion on what should we change SCHEMA_VERSION to?

  • lsst_desc_working?
  • lsst_desc_default?

The default production schema could also be in the code, at the moment its just an argparse default in the create script. Maybe there could also be PROD_SCHEMA_VERSION for the default? Could also be more explicit

  • lsst_desc_production ?

Could also change the variable name to not include VERSION?, SCHEMA_DEFAULT_NAME .. ?

@JoanneBogart
Copy link
Collaborator

Yes, we should change the variable name to not include VERSION. I've always found that confusing. And there should be one default schema name for production and another for non-production. lsst_desc_production or lsstdesc_production would do for the first. For the second, working contrasts nicely with production.

@stuartmcalpine
Copy link
Collaborator Author

I think there is enough now to start a review, mainly the docs.

Not finalized from my end, but close.

@JoanneBogart
Copy link
Collaborator

I'll start looking seriously today.
Yao approved my additional update to GCRCatalogs (ability to select config source via environment variable) and it's merged.

Copy link
Collaborator

@JoanneBogart JoanneBogart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at everything yet but thought you might want to address the things - all pretty minor - that I've found so far, which all have to do with code under src/dataregistry. I've also looked at src/dataregistry_cli and tests, which seem fine.

scripts/create_registry_schema.py Outdated Show resolved Hide resolved
scripts/create_registry_schema.py Outdated Show resolved Hide resolved
scripts/create_registry_schema.py Outdated Show resolved Hide resolved
scripts/create_registry_schema.py Outdated Show resolved Hide resolved
src/dataregistry/registrar/base_table_class.py Outdated Show resolved Hide resolved
@stuartmcalpine
Copy link
Collaborator Author

stuartmcalpine commented Sep 17, 2024

I've updated the create script slightly to give a bit better output when running sqlite (and also addressed the comments you made).

With regards to the privileges, they are not really tested. I think on the spin instance I added reg_writer to be able to write, and in the CI it uses the master postgres account. But in the create script it is only adding read access to the production tables, maybe this was when originally there was going to be a production writer account also?

Should it be like

table reg_writer reg_reader reg_admin
lsst_desc_working rw r rw
lsst_desc_production r r rw
tutorial_working rw rw rw
tutorial_production rw rw rw

@JoanneBogart
Copy link
Collaborator

JoanneBogart commented Sep 17, 2024

About your table above - for now, as long as we run the script at NERSC using reg_writer, this looks right. If we decide to make a more privileged account to own the production schema it would be more complicated.

@JoanneBogart
Copy link
Collaborator

I've made some minor comments about the tutorials (which you probably can't yet see since I haven't submitted a review) just from looking at the diffs. I intend to also look at the full text of the tutorials at some point but it's not critical that it happen in time for the release. I started looking at the documentation but thought you might still be working on it so I'll hold off until you tell me you're ready.
I found one other small thing in the creation script which I couldn't comment on easily since it's not in code you changed. In _get_column_definitions, line 79, you use the variable prod_schema which has been set by the main part of the script at line 314. Yes, it's available to the routine, but it's not a good practice to use it this way. If we decided to put the helper routines in a separate file it would break. Better to pass it as an argument.

@stuartmcalpine
Copy link
Collaborator Author

I started looking at the documentation but thought you might still be working on it so I'll hold off until you tell me you're ready.

They are ready to look at, though most of the docs is the notebooks

Copy link
Collaborator

@JoanneBogart JoanneBogart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to comments on specific lines of code I noticed a few other things:

  • installation.rst needs an overhaul
  • do we want to mention reg_reader and reg_writer in there and expected use of them?
  • in tutorial_CLI could we change the example owner from "DESC Generic Working Group" to something without blanks? E.g. text could be "For example to see all the datasets from the DESC Generic Working Group (GWG)..." and then value for owner could be GWG. It should work as it stands, but I'd rather not encourage people to use values with embedded spaces.
  • in tutorial_python.rst there is no link to the production_schema tutorial

I haven't yet looked at the tutorials. Instructions for installation and use of the tutorials will change as soon as we have made the release and uploaded to PyPI.

I'm thinking that we don't need to get all the documentation-type changes, including changes to tutorials, in before we make a release, as long as we believe the code and database structure are ok and usable (that includes permanent backing store for the database). Once we've made a release we can finish up any remaining documentation issue in a separate PR. For now, I'd just say fix the things that seem most essential (I'm not even sure there are any that are truly essential; I'd have to look) or are quick to do.

docs/source/dev_notes_database.rst Outdated Show resolved Hide resolved
docs/source/tutorial_notebooks/datasets_deeper_look.ipynb Outdated Show resolved Hide resolved
docs/source/tutorial_notebooks/production_schema.ipynb Outdated Show resolved Hide resolved
docs/source/tutorial_notebooks/query_datasets.ipynb Outdated Show resolved Hide resolved
docs/source/tutorial_notebooks/register_datasets.ipynb Outdated Show resolved Hide resolved
docs/source/dev_notes_database.rst Outdated Show resolved Hide resolved
docs/source/dev_notes_database.rst Outdated Show resolved Hide resolved
docs/source/dev_notes_database.rst Outdated Show resolved Hide resolved
docs/source/dev_notes_database.rst Show resolved Hide resolved
docs/source/dev_notes_database.rst Outdated Show resolved Hide resolved
@JoanneBogart
Copy link
Collaborator

I used the create_registry_schema script to make a production-like alt_production and then an alt_working. The tables are created properly and the privileges look like they match the intent. I'm wondering whether reg_writer should have DELETE. As far as I know there is nothing in the API which will delete a row.

@stuartmcalpine
Copy link
Collaborator Author

I've removed version_suffix, answered the basic comments you had on the docs.

I've shifted some of the comments here into a issue so we don't forget some final tweaks we can do later.

Once this code is approved, we can make the release.

Copy link
Collaborator

@JoanneBogart JoanneBogart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine. There are things we need to fix but nothing that is important for the release. I successfully made some gcr catalog entries in the alt_production schema (necessarily using the reg_admin account) and also made entries in alt_working with reg_writer.
As soon as this PR is merged we should get rid of existing lsst_desc_* schemas and make new ones.

@stuartmcalpine stuartmcalpine merged commit bba71d6 into main Sep 24, 2024
20 checks passed
@stuartmcalpine stuartmcalpine deleted the release_1.0.0 branch September 24, 2024 07:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants