whitelist for projects #25

Open
sashakames opened this issue Nov 2, 2020 · 10 comments · Fixed by #30
@sashakames
Contributor

To integrate wget into ESGF where it can't yet support restricted projects (as the auth infrastructure isn't deployment-ready), we should whitelist particular projects that are unrestricted. The project can be detected either from a search parameter (project=X) or, more commonly, from the first attribute in the dataset tuple. If any datasets are detected outside of the whitelist, we display an error message to the user stating that the requested dataset cannot be processed through this service and that they should (1) redo the request for unrestricted data only, and (2) create a request for restricted data at a non-LLNL site.

[CMIP6, cmip5, cmip3, input4MIPs, obs4MIPs, CREATE-IP, E3SM] are most of the projects to consider.
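
A minimal sketch of the check proposed above, not the esgf-wget implementation. It assumes dataset IDs are dot-separated strings whose first field is the project, and it compares names case-insensitively (also an assumption, since the list above mixes cases). The names `ALLOWED_PROJECTS`, `project_of`, and `find_disallowed` are hypothetical.

```python
# Hypothetical whitelist; the project names are the ones listed in this issue.
ALLOWED_PROJECTS = {
    "CMIP6", "cmip5", "cmip3", "input4MIPs", "obs4MIPs", "CREATE-IP", "E3SM",
}


def project_of(dataset_id):
    """Return the first dot-separated attribute of a dataset ID.

    Assumption: the project is the first field of the ID.
    """
    return dataset_id.split(".", 1)[0]


def find_disallowed(dataset_ids=(), project_param=None, allowed=ALLOWED_PROJECTS):
    """Return a sorted list of requested projects that are not whitelisted.

    Projects come from the dataset IDs and, if given, from the value of a
    `project=X` search parameter. Matching ignores case (an assumption).
    """
    allowed_lower = {name.lower() for name in allowed}
    requested = {project_of(dataset_id) for dataset_id in dataset_ids}
    if project_param:
        requested.add(project_param)
    return sorted(p for p in requested if p.lower() not in allowed_lower)
```

A caller could return the error message described above whenever `find_disallowed(...)` is non-empty, naming the offending projects in the message.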

@mauzey1
Contributor

mauzey1 commented Nov 4, 2020

@sashakames Should the whitelist be held in an external XML file like the Solr shard list?

@sashakames
Contributor Author

local_settings.py is good for the whitelist. The XML format is a legacy from the Java days.

@mauzey1
Contributor

mauzey1 commented Nov 5, 2020

@sashakames We have recently replaced the local_settings.py file with an INI configuration file. We could either have a variable in the config file that is a comma-separated list of projects, or we could have the whitelist stored in a JSON file.
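
For comparison, a rough sketch of how each option might be read in Python using only the standard library. The section name `wget` and the option name `allowed_projects` are hypothetical, not taken from the esgf-wget config file, and the JSON layout (a plain array of names) is likewise an assumption.

```python
import configparser
import json


def whitelist_from_ini(text, section="wget", option="allowed_projects"):
    """Parse a comma-separated project list from INI text.

    The section and option names are placeholders for illustration.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    raw = parser.get(section, option, fallback="")
    # Split on commas and drop surrounding whitespace and empty entries.
    return [item.strip() for item in raw.split(",") if item.strip()]


def whitelist_from_json(text):
    """Parse a project list from JSON text holding an array of strings."""
    data = json.loads(text)
    if not isinstance(data, list) or not all(isinstance(p, str) for p in data):
        raise ValueError("expected a JSON array of project names")
    return data
```

The INI variant needs its own splitting and whitespace handling, while the JSON variant gets a list directly from the parser and only has to validate its shape.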

@sashakames
Contributor Author

I'm fine with a .json file. INI doesn't work well with lists (it's picky about formatting, in my opinion), though ESGConfigParser might help if you can get past the learning curve.

@mauzey1
Contributor

mauzey1 commented Nov 5, 2020

esgf-wget currently uses environment variables for the config file path and Django secret key, INI for the initial settings, and XML for the Solr shards.

Maybe we could use JSON for all of these settings at some point. Not necessarily everything in one file but using JSON format for all config files.

@sashakames
Contributor Author

The Solr shard list from esg-search is a bit of a legacy. It's fine to migrate to a different format, but we would need to ensure the new list is kept up to date in the event another shard drops out. I consider .json to be the easiest machine-readable format for Python programmers, but I'm open to other opinions on the matter.

@philipkershaw

Just offering some thoughts on this issue: I'm wary of having a piece of metadata, separate from the auth layer itself, to indicate whether something is secured or not. I would not have this. Instead I would suggest the esgf-wget code verify by going to the source: do a sample check on a download and see if it returns 401 Unauthorized. If so, you know to a good level of confidence that it is a secured dataset. If we make a separate list, we run the risk of having to maintain information about access control in two different places. These can get out of sync with one another, e.g. the policy for a dataset changes to be open but esgf-wget tells me I can't download it because it is not in the whitelist.
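
One way such a sample check might look, as a sketch only. It assumes the data node answers an unauthenticated HEAD request with 401 for secured files; a node that redirects to a login page instead would need different handling. The function name `looks_restricted` and the injectable `opener` parameter are hypothetical.

```python
import urllib.error
import urllib.request


def looks_restricted(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if an unauthenticated HEAD request gets 401 Unauthorized.

    `opener` defaults to urllib.request.urlopen and can be swapped out so the
    function can be exercised without network access.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout):
            # Any successful response means the file is reachable anonymously.
            return False
    except urllib.error.HTTPError as err:
        if err.code == 401:
            return True
        # Other HTTP errors (404, 500, ...) say nothing about access control.
        raise
```

Checking one file per dataset keeps the overhead small, at the cost of assuming every file in a dataset shares the same access policy.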

@sashakames
Contributor Author

@philipkershaw good suggestion. My suggestion was devised as a stop-gap for deployment at LLNL to reduce CMIP download errors that continue to plague the support list. We'll need to think more about how to handle the different download scenarios across the combinations of restricted/unrestricted data and anonymous/logged-in users. I'd advocate that any user can get a wget script without a token and it should work well for unrestricted data.

@philipkershaw

> I'd advocate that any user can get a wget script without a token and it should work well for unrestricted data.

Agreed! :)

@mauzey1
Contributor

mauzey1 commented Jan 6, 2021

@sashakames I think we can close this issue since the project whitelist has already been implemented in esgf-wget.


3 participants