-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
U/jrbogart/provenance v2 #111
Conversation
…lude non-standard cosmology parameters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some questions and change requests.
@@ -220,13 +220,20 @@ def __init__(self, sed_path, fmt='db'): | |||
assumptions: that the table name is "SED" and that the columns are | |||
"wavelength" (units angstroms) and "flux" (units flambda) | |||
''' | |||
if not sed_path: | |||
# print a warning message? | |||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is returning None
really the right thing here? From the line above in skyCatalogs.py
, if this object is returned as None
, the sso
object type simply isn't registered, but it seems like it should raise an exception if the needed SED info isn't available instead of silently omitting SSOs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now the typical config file includes all object types supported by the code, whether or not they're actually available at the installation. This will change with the stuff I'm currently working on (which, among other things, should make it easier to include only those object types which are available), but for now the fact that the SSO SED isn't there is often the first sign the code sees that there are no SSOs. I could add a warning message but would rather not raise an exception. That would force people to meddle with the configs by hand (to remove the sso section) more than I'd like.
object_class=SsoObject, | ||
collection_class=SsoCollection) | ||
if self._sso_sed_factory: | ||
self.cat_cxt.register_source_type('sso', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment below in sed_tools.py
. Silently omitting SSOs here seems wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll come up with a suitable place to log a warning message.
inputs=inputs, | ||
run_options=self._run_options) | ||
|
||
arrow_schema = make_star_schema(metadata_input=file_metadata) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the comment below,
# Need a way to indicate which object types to include; deal with that # later. For now, default is stars + sn
still relevant? I would think not, since sncosmo
support has been removed. As it stands, it's hard to see how it relates to the code around it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, no longer relevant. Will delete.
skycatalogs/catalog_creator.py
Outdated
@@ -1165,5 +1193,6 @@ def write_provenance_file(self, datafile_path): | |||
''' | |||
outpath = datafile_path.rsplit('.', 1)[0] + '_provenance.yaml' | |||
|
|||
prov = assemble_provenance(self._pkg_root, inputs=None) | |||
prov = assemble_provenance(self._pkg_root, inputs=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems inconsistent that we have inputs=None
for this provenance and an inputs
value that's not None
above. Shouldn't the same provenance info be used in both places?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily. In the first case, a global config file is being written. The value for inputs
is all the possible inputs catalog_creator knows about. In the second, it's only associated with a particular binary file. Probably there should have been an inputs
argument to write_provenance_file so that inputs appropriate to the object type of that file could be included. (That's what I've done in the code I'm working on where there is no global provenance, but only provenance associated with an object type) But, now that provenance is being included in the binary file (which includes only relevant inputs), I don't think we need this routine altogether. My inclination is to get rid of it and the option in the creation script which causes it to be called.
skycatalogs/utils/common_utils.py
Outdated
def callinfo_to_dict(args): | ||
""" | ||
Make a dict out of program arguments. Each option value is | ||
either a simple atomic type or a list | ||
""" | ||
a_dict = dict() | ||
for e in dir(args): | ||
if not e.startswith('_'): | ||
v = eval('args.' + e) | ||
a_dict[e] = v | ||
return a_dict | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that
dict(args._get_kwargs())
does the same thing as this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it does. I could also use _get_kwargs
to simplify log_callinfo
but the results aren't alphabetized, which is useful in that case. Another similar routine,print_callinfo
, isn't used so I will remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To alphabetize by keyword,
dict(sorted(args._get_kwargs()))
would work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I'll make that change.
@@ -237,7 +238,8 @@ def __init__(self, parts, area_partition=None, skycatalog_root=None, | |||
galaxy_stride=1000000, provenance=None, | |||
dc2=False, sn_object_type='sncosmo', galaxy_type='cosmodc2', | |||
include_roman_flux=False, star_input_fmt='sqlite', | |||
sso_truth=None, sso_sed=None, sso_partition='healpixel'): | |||
sso_truth=None, sso_sed=None, sso_partition='healpixel', | |||
run_options=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a docstring entry for run_options
please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, certainly. Just an oversight.
skycatalogs/catalog_creator.py
Outdated
self._run_options = run_options | ||
|
||
# Do we need this? | ||
self._sed_bins = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this attribute were named something like self._cosmodc2_sed_bins
or self._tophat_sed_bins
, it would clearer in which contexts it's used. Based on the code below, setting to None
here isn't strictly needed, but setting a private attribute in the __init__
seems like a good practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I'll change the name.
CI check is failing because sdf is down, but nothing I did in this last commit (all in "create" part of skycatalogs) would affect it. This PR is ready for re-review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine, but I would recommend making the warning more explicit about the outcome of the missing SSO files.
skycatalogs/skyCatalogs.py
Outdated
@@ -334,6 +334,9 @@ def __init__(self, config, mp=False, skycatalog_root=None, verbose=False, | |||
'sso_sed.db') | |||
|
|||
self._sso_sed_factory = SsoSedFactory(self._sso_sed_path) | |||
if not self._sso_sed_factory: | |||
self._logger.warning('SSO appear in the list of available object types but supporting files do not exist') | |||
self._logger.warning('SSOs will not be present in object lists') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that a warning that explicitly says that "SSOs will not be simulated" would be more helpful to users.
Add provenance information to any created parquet files as metadata.
Include run options as part of provenance (both in parquet files and in output text config).
Also includes
Partially addresses issue #64