Merge pull request #34 from LSSTDESC/user/aimalz/renaming
naming consistency/clarity within src/rail/estimation
aimalz authored Jul 14, 2023
2 parents cc6b9fe + b04f72c commit 13719c7
Showing 4 changed files with 32 additions and 32 deletions.
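The notebook changes in this commit apply the same class renames throughout. As a minimal sketch of how downstream notebook or script text might be updated to match, the following applies those renames with a regular expression. The helper `rename_classes` and the `RENAMES` table are hypothetical names used for illustration only; they are not part of RAIL, and the mapping covers only the renames visible in this diff.

```python
import re

# Old-to-new class names, taken from the renames visible in this commit.
# The lowercase-"i" spelling appears once in the old notebook prose.
RENAMES = {
    "Inform_BPZ_lite": "BPZliteInformer",
    "inform_BPZ_lite": "BPZliteInformer",
    "BPZ_lite": "BPZliteEstimator",
}

# Word boundaries keep "BPZ_lite" from matching inside "Inform_BPZ_lite",
# inside file names such as "BPZ_lite_demo.ipynb", or inside the lowercase
# module name "bpz_lite" (the match is case-sensitive).
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def rename_classes(text: str) -> str:
    """Replace the old class names in `text` with their new names."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], text)


if __name__ == "__main__":
    old_import = "from rail.estimation.algos.bpz_lite import Inform_BPZ_lite, BPZ_lite"
    print(rename_classes(old_import))
```

Because the pattern requires a word boundary on both sides, the module path `rail.estimation.algos.bpz_lite` and the notebook file names are left untouched, which matches what the diff shows.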
30 changes: 15 additions & 15 deletions examples/BPZ_lite_demo.ipynb
@@ -30,7 +30,7 @@
"from rail.core.data import TableHandle\n",
"from rail.core.stage import RailStage\n",
"from rail.core.utils import RAILDIR\n",
"from rail.estimation.algos.bpz_lite import Inform_BPZ_lite, BPZ_lite"
"from rail.estimation.algos.bpz_lite import BPZliteInformer, BPZliteEstimator"
]
},
{
@@ -88,12 +88,12 @@
"id": "979d5363-af4e-4478-9820-29214292bf16",
"metadata": {},
"source": [
"## running BPZ_lite with a pre-existing model\n",
"## running BPZliteEstimator with a pre-existing model\n",
"\n",
"BPZ is a template-fitting code that works by calculating the chi^2 value for observed photometry and errors compared with a grid of theoretical photometric fluxes generated from a set of template SEDs at each of a grid of redshift values. These chi^2 values are converted to likelihoods. If desired, a Bayesian prior can be applied that parameterizes the expected distribution of galaxies in terms of both probability of a \"broad\" SED type as a function of apparent magnitude, and the probability of a galaxy being at a certain redshift given broad SED type and apparent magnitude. The product of this prior and the likelihoods is then summed over the SED types to return a marginalized posterior PDF, or p(z) for each galaxy. If the config option `no_prior` is set to `True`, then no prior is applied, and BPZ_lite will return a likelihood for each galaxy rather than a posterior.\n",
"BPZ is a template-fitting code that works by calculating the chi^2 value for observed photometry and errors compared with a grid of theoretical photometric fluxes generated from a set of template SEDs at each of a grid of redshift values. These chi^2 values are converted to likelihoods. If desired, a Bayesian prior can be applied that parameterizes the expected distribution of galaxies in terms of both probability of a \"broad\" SED type as a function of apparent magnitude, and the probability of a galaxy being at a certain redshift given broad SED type and apparent magnitude. The product of this prior and the likelihoods is then summed over the SED types to return a marginalized posterior PDF, or p(z) for each galaxy. If the config option `no_prior` is set to `True`, then no prior is applied, and BPZliteEstimator will return a likelihood for each galaxy rather than a posterior.\n",
"\n",
"\n",
"`bpz-1.99.3`, the code written by Dan Coe and Narciso Benitez and available at https://www.stsci.edu/~dcoe/BPZ/, uses a default set of eight SED templates: four templates from Coleman, Wu, & Weedman (CWW; one Elliptical, two Spirals Sbc and Scd, and one Irregular), two starburst (WB) templates, and two very blue star-forming templates generated using Bruzual & Charlot models with very young ages of 25 Myr and 5 Myr. The original BPZ paper, Benitez (2000), computed a \"default\" prior fit to data from the Hubble Deep Field North (HDFN). A pickle file with these parameters and the default SEDs are included with RAIL, named `CWW_HDFN_prior.pkl`. You can run BPZ_lite with these default templates and priors without doing any training, the equivalent of \"running BPZ with the defaults\" had you downloaded bpz-1.99.3 and run it. **Note, however**, that the cosmoDC2_v1.1.4 dataset has a population of galaxy SEDs that are fairly different from the \"default\" CWWSB templates, and the prior distributions do not exactly match. So, you will get results that do not look particularly good. We will demonstrate that use case here, though, as it is the simplest way to run the code out of the box (and illustrates the dangers of grabbing code and running it out of the box):\n",
"`bpz-1.99.3`, the code written by Dan Coe and Narciso Benitez and available at https://www.stsci.edu/~dcoe/BPZ/, uses a default set of eight SED templates: four templates from Coleman, Wu, & Weedman (CWW; one Elliptical, two Spirals Sbc and Scd, and one Irregular), two starburst (WB) templates, and two very blue star-forming templates generated using Bruzual & Charlot models with very young ages of 25 Myr and 5 Myr. The original BPZ paper, Benitez (2000), computed a \"default\" prior fit to data from the Hubble Deep Field North (HDFN). A pickle file with these parameters and the default SEDs are included with RAIL, named `CWW_HDFN_prior.pkl`. You can run BPZliteEstimator with these default templates and priors without doing any training, the equivalent of \"running BPZ with the defaults\" had you downloaded bpz-1.99.3 and run it. **Note, however**, that the cosmoDC2_v1.1.4 dataset has a population of galaxy SEDs that are fairly different from the \"default\" CWWSB templates, and the prior distributions do not exactly match. So, you will get results that do not look particularly good. We will demonstrate that use case here, though, as it is the simplest way to run the code out of the box (and illustrates the dangers of grabbing code and running it out of the box):\n",
"\n",
"We need to set up a RAIL stage for the default run of BPZ, including specifying the location of the model pickle file:"
]
@@ -108,15 +108,15 @@
"hdfnfile = os.path.join(RAILDIR, \"rail/examples_data/estimation_data/data/CWW_HDFN_prior.pkl\")\n",
"default_dict = dict(hdf5_groupname=\"photometry\", output=\"bpz_results_defaultprior.hdf5\",\n",
" prior_band=\"mag_i_lsst\", no_prior=False)\n",
"run_default = BPZ_lite.make_stage(name=\"bpz_def_prior\", model=hdfnfile, **default_dict)"
"run_default = BPZliteEstimator.make_stage(name=\"bpz_def_prior\", model=hdfnfile, **default_dict)"
]
},
{
"cell_type": "markdown",
"id": "e8bba83c-d3ed-4385-852c-639e59170c48",
"metadata": {},
"source": [
"Let's run the estimate stage. If this is the first run of ``BPZ_lite`` or ``Inform_BPZ_lite``, you may see a bunch of output lines as ``DESC_BPZ`` creates the synthetic photometry \"AB\" files for the SEDs and filters."
"Let's run the estimate stage. If this is the first run of ``BPZliteEstimator`` or ``BPZliteInformer``, you may see a bunch of output lines as ``DESC_BPZ`` creates the synthetic photometry \"AB\" files for the SEDs and filters."
]
},
{
@@ -179,7 +179,7 @@
"source": [
"Results do not look bad: there are some catastrophic outliers, and there appears to be some bias in the redshift estimates, but as the SED templates have systematically slightly different colors than our test data, that is just what we expect to see.\n",
"\n",
"BPZ_lite also produces a `tb`, a \"best-fit type\"; that is, the SED template with the highest posterior probability contribution at the value of the `zmode`. We can plot up a color-color diagram of our test data and we should see a pattern in color space reflecting the different populations in different areas of color space. `tb` is stored as a 1-indexed integer corresponding to the number of the SED in our template set."
"BPZliteEstimator also produces a `tb`, a \"best-fit type\"; that is, the SED template with the highest posterior probability contribution at the value of the `zmode`. We can plot up a color-color diagram of our test data and we should see a pattern in color space reflecting the different populations in different areas of color space. `tb` is stored as a 1-indexed integer corresponding to the number of the SED in our template set."
]
},
{
@@ -231,7 +231,7 @@
"id": "6ee27ab8-2282-45ae-99bc-a8fcc5691ee1",
"metadata": {},
"source": [
"BPZ_lite also computes a quantity called `todds`, which is the fraction of posterior probability in the best-fit SED relative to the overall probability of all templates. If the value is high, then a single SED is providing more of the probability. If the value is low, then multiple SEDs are contributing, which means that `tb`, the best-fit-SED-type, is less meaningful. The values of `todds` should be lower where SEDs have degenerate broad-band colors; let's highlight the low values of `todds` and see where they lie in color space."
"BPZliteEstimator also computes a quantity called `todds`, which is the fraction of posterior probability in the best-fit SED relative to the overall probability of all templates. If the value is high, then a single SED is providing more of the probability. If the value is low, then multiple SEDs are contributing, which means that `tb`, the best-fit-SED-type, is less meaningful. The values of `todds` should be lower where SEDs have degenerate broad-band colors; let's highlight the low values of `todds` and see where they lie in color space."
]
},
{
@@ -264,15 +264,15 @@
"id": "e80d59ce-ca73-4bf3-9f3d-682c4dbdcb3e",
"metadata": {},
"source": [
"# Inform_BPZ_lite: training a custom prior\n",
"# BPZliteInformer: training a custom prior\n",
"\n",
"If you want to go beyond the default prior, there is an `Inform_BPZ_lite` stage that allows you to use a training dataset to fit a custom parameterized prior that better matches the magnitude and type distributions of the training set.\n",
"If you want to go beyond the default prior, there is a `BPZliteInformer` stage that allows you to use a training dataset to fit a custom parameterized prior that better matches the magnitude and type distributions of the training set.\n",
"\n",
"`bpz-1.99.3` and our local fork, `DESC_BPZ`, both parameterize the Bayesian prior using the form described in Benitez (2000), where the individual SED types are grouped into \"broad types\", e.g. the one Elliptical makes up one type, the two spirals (Sbc and Scd) make up a second, and the five remaining \"blue\" templates (Im, SB3, SB2, ssp25Myr, and ssp5Myr) make up a third type. This grouping is somewhat ad hoc, but does have physical motivation, in that we have observed that Ellipticals, spirals, and irregular/starburst galaxies do show distinctly evolving observed fractions as a function of apparent/absolute magnitude and redshift. Things get more complicated with more complex SED sets that contain variations in dust content, star formation histories, emission lines, etc. Due to such complications, the **current** implementation of `inform_BPZ_lite` leaves the assignment of a \"broad-SED-type\" to the user, and these broad types are a necessary input to `Inform_BPZ_lite` via the `type_file` config option. In the future, determination of broad SED type will be added as a pre-processing step to the rail_bpz package.\n",
"`bpz-1.99.3` and our local fork, `DESC_BPZ`, both parameterize the Bayesian prior using the form described in Benitez (2000), where the individual SED types are grouped into \"broad types\", e.g. the one Elliptical makes up one type, the two spirals (Sbc and Scd) make up a second, and the five remaining \"blue\" templates (Im, SB3, SB2, ssp25Myr, and ssp5Myr) make up a third type. This grouping is somewhat ad hoc, but does have physical motivation, in that we have observed that Ellipticals, spirals, and irregular/starburst galaxies do show distinctly evolving observed fractions as a function of apparent/absolute magnitude and redshift. Things get more complicated with more complex SED sets that contain variations in dust content, star formation histories, emission lines, etc. Due to such complications, the **current** implementation of `BPZliteInformer` leaves the assignment of a \"broad-SED-type\" to the user, and these broad types are a necessary input to `BPZliteInformer` via the `type_file` config option. In the future, determination of broad SED type will be added as a pre-processing step to the rail_bpz package.\n",
"\n",
"The easiest way to obtain these broad SED types is to run `DESC_BPZ` with the parameter `ONLY_TYPE` set to `yes`. When the `ONLY_TYPE` option is turned on in `DESC_BPZ`, the code returns a best-fit SED type evaluated only at the spectroscopic redshift for the object (determined as the best chi^2 amongst the N templates). The user then needs to map these N integers down to a set of \"broad-type\" integers corresponding to however they wish to define the mapping from N SED types to M broad types. As an example, I have done this using the CWWSB templates and the 1 Ell, 2 sp, 5 Im/SB broad type mapping for our `test_dc2_training_9816.hdf5` dataset and included that mapping file in this directory in a file named `test_dc2_training_9816_broadtypes.hdf5` for use in our demo, which consists of an array of integers named `types` with values 0 (Elliptical), 1 (Spiral), and 2 (Irregular/Starburst) corresponding to the best-fit broad SED for each of the 10,225 galaxies in our training sample.\n",
"\n",
"Now, let's set up our inform stage to calculate a new prior. We will name the new prior `test_9816_demo_prior.pkl`; setting this as the `model` config parameter will tell `Inform_BPZ_lite` to save our trained model by that name in the current directory.\n",
"Now, let's set up our inform stage to calculate a new prior. We will name the new prior `test_9816_demo_prior.pkl`; setting this as the `model` config parameter will tell `BPZliteInformer` to save our trained model by that name in the current directory.\n",
"\n",
"When we run `inform` it will display values for the parameters as the minimizer runs, including final values for the parameters. You do not need to pay attention to these values, though if you are curious you can plot them up and compare to the distributions of the HDFN prior."
]
@@ -287,7 +287,7 @@
"train_dict = dict(hdf5_groupname=\"photometry\", model=\"test_9816_demo_prior.pkl\",\n",
" type_file=\"test_dc2_training_9816_broadtypes.hdf5\",\n",
" nt_array=[1,2,5])\n",
"run_bpz_train = Inform_BPZ_lite.make_stage(name=\"bpz_new_prior\", **train_dict)"
"run_bpz_train = BPZliteInformer.make_stage(name=\"bpz_new_prior\", **train_dict)"
]
},
{
@@ -420,7 +420,7 @@
"id": "82d0919d-5842-4297-9a03-3838c6be6a2c",
"metadata": {},
"source": [
"Now, let's re-run BPZ_lite using this new prior and see if our results are any different:"
"Now, let's re-run BPZliteEstimator using this new prior and see if our results are any different:"
]
},
{
@@ -432,7 +432,7 @@
"source": [
"rerun_dict = dict(hdf5_groupname=\"photometry\", output=\"bpz_results_rerun.hdf5\", prior_band='mag_i_lsst',\n",
" no_prior=False)\n",
"rerun = BPZ_lite.make_stage(name=\"rerun_bpz\", **rerun_dict, \n",
"rerun = BPZliteEstimator.make_stage(name=\"rerun_bpz\", **rerun_dict, \n",
" model=run_bpz_train.get_handle('model'))"
]
},
16 changes: 8 additions & 8 deletions examples/BPZ_lite_with_custom_SEDs.ipynb
@@ -5,7 +5,7 @@
"id": "85906442-6d29-407a-8d80-4108d45af015",
"metadata": {},
"source": [
"# Running BPZ_lite with a custom set of SEDs\n",
"# Running BPZliteEstimator with a custom set of SEDs\n",
"authors: Sam Schmidt<br>\n",
"Last successfully run: Apr 14, 2023<br>\n",
"\n",
@@ -77,7 +77,7 @@
"import desc_bpz\n",
"from rail.core.data import TableHandle\n",
"from rail.core.stage import RailStage\n",
"from rail.estimation.algos.bpz_lite import Inform_BPZ_lite, BPZ_lite"
"from rail.estimation.algos.bpz_lite import BPZliteInformer, BPZliteEstimator"
]
},
{
@@ -125,9 +125,9 @@
"id": "e80d59ce-ca73-4bf3-9f3d-682c4dbdcb3e",
"metadata": {},
"source": [
"# Inform_BPZ_lite: training a custom prior with our new SEDs\n",
"# BPZliteInformer: training a custom prior with our new SEDs\n",
"\n",
"We will run the inform stage just as we did in the main demo notebook; however, we will have to define a few extra configuration parameters in order to tell Inform_BPZ_lite to use our new SEDs. We specify the SED set using the `spectra_file` configuration parameter, which points to an ASCII file that contains the names of the SEDs, which must be sorted in the same order as the \"broad type array\" (usually done in ascending rest-frame \"blueness\", that is, Elliptical red galaxies first, then increasingly blue galaxies). In this case, the tar file that we copied to the SED directory contained this file, named `baddc2templates.list`. As before, we need a \"best fit type\" for each of the galaxies in our training set. And, as before, this has been computed separately (computing best type within rail_bpz will be added in the future). The best fit broad types are available in a dictionary stored in the file `test_dc2_train_customtemp_broadttypes.hdf5`, which we will point to with the `type_file` config parameter. This file should already exist in this directory. As stated above, we have two Elliptical, three Spiral, and four Irregular/Starburst SEDs, so we'll set the `nt_array` configuration parameter to a list `[2, 3, 4]` to specify those numbers of the three broad types."
"We will run the inform stage just as we did in the main demo notebook; however, we will have to define a few extra configuration parameters in order to tell BPZliteInformer to use our new SEDs. We specify the SED set using the `spectra_file` configuration parameter, which points to an ASCII file that contains the names of the SEDs, which must be sorted in the same order as the \"broad type array\" (usually done in ascending rest-frame \"blueness\", that is, Elliptical red galaxies first, then increasingly blue galaxies). In this case, the tar file that we copied to the SED directory contained this file, named `baddc2templates.list`. As before, we need a \"best fit type\" for each of the galaxies in our training set. And, as before, this has been computed separately (computing best type within rail_bpz will be added in the future). The best fit broad types are available in a dictionary stored in the file `test_dc2_train_customtemp_broadttypes.hdf5`, which we will point to with the `type_file` config parameter. This file should already exist in this directory. As stated above, we have two Elliptical, three Spiral, and four Irregular/Starburst SEDs, so we'll set the `nt_array` configuration parameter to a list `[2, 3, 4]` to specify those numbers of the three broad types."
]
},
{
@@ -142,7 +142,7 @@
" type_file=\"test_dc2_train_customtemp_broadttypes.hdf5\",\n",
" prior_band=\"mag_i_lsst\",\n",
" nt_array=[2,3,4])\n",
"run_bpz_train = Inform_BPZ_lite.make_stage(name=\"bpz_custom_sed_prior\", **train_dict)"
"run_bpz_train = BPZliteInformer.make_stage(name=\"bpz_custom_sed_prior\", **train_dict)"
]
},
{
@@ -273,7 +273,7 @@
"id": "82d0919d-5842-4297-9a03-3838c6be6a2c",
"metadata": {},
"source": [
"Now, let's re-run BPZ_lite using this new prior and see if our results are any different:"
"Now, let's re-run BPZliteEstimator using this new prior and see if our results are any different:"
]
},
{
@@ -289,7 +289,7 @@
" prior_band='mag_i_lsst',\n",
" data_path=custom_data_path,\n",
" no_prior=False)\n",
"custom_run = BPZ_lite.make_stage(name=\"rerun_bpz\", **custom_dict, \n",
"custom_run = BPZliteEstimator.make_stage(name=\"rerun_bpz\", **custom_dict, \n",
" model=run_bpz_train.get_handle('model'))"
]
},
@@ -551,7 +551,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.10"
"version": "3.10.12"
}
},
"nbformat": 4,