Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QC for HolsteinFriesian_1B16 (BosTau) #1598

Open
gregorgorjanc opened this issue Dec 12, 2024 · 11 comments
Open

QC for HolsteinFriesian_1B16 (BosTau) #1598

gregorgorjanc opened this issue Dec 12, 2024 · 11 comments
Labels
Model QC Quality control process for model addition

Comments

@gregorgorjanc
Copy link
Contributor

gregorgorjanc commented Dec 12, 2024

PR for new model: this is for four separate BosTau models: HolsteinFriesian_1B16, Fleckvieh_1B16, Jersey_1B16, and Angus_1B16, all from the same source (see paper below), as implemented in #1593

Original paper: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1005877

Population indices: each model has only one population, that changes with time

Potential issues that might lead to differences between the model implementations:
I contacted the lead author (Simon Boitard), who shared Ne values from his published plots (so I didn't need to eyeball them from the plot). Note that the plots reported fewer values, so this implementation goes further into the past than shown in plots. The past for the 4 population seems to "merge" onto the similar values, as we would expect - these populations all share common ancestry back in time, but we keep the models separate as they have been inferred separately.

I am pasting the shared files below in discussion for the purpose of QC - will paste one population per post to avoid confusion.

QC'er requests: {tag potential developers to QC the model} TODO: any volunteers?

@gregorgorjanc gregorgorjanc added the Model QC Quality control process for model addition label Dec 12, 2024
@petrelharp
Copy link
Contributor

This looks good - it makes sense to deal with all these at once. Can you put up the personal communication somehow? At least, enough that the QC'er can use the same values? For instance: paste it into the issue above?

@gregorgorjanc
Copy link
Contributor Author

Here are Ne&time values for HolsteinFriesian_1B16
Ne_HOL.txt

@gregorgorjanc
Copy link
Contributor Author

Here are Ne&time values for Fleckvieh_1B16
Ne_FLE.txt

@gregorgorjanc
Copy link
Contributor Author

Here are Ne&time values for Jersey_1B16
Ne_JER.txt

@gregorgorjanc
Copy link
Contributor Author

Here are Ne&time values for Angus_1B16
Ne_ANG.txt

@gregorgorjanc
Copy link
Contributor Author

For the QC volunteer, above 4 posts have a file with Ne & time values, while other model parameters (mut rate, rec rate, generation interval) are pulled from various pages in the original publication - I have noted these pages in the description of each model - for example, see

a generation interval of 5 years (page 10),
.

@hannesbecher
Copy link

I have checked the Ne values and times against the TXT files posted above. Have also checked the other parameters against the paper. All looking correct to me!

@hannesbecher
Copy link

When running the unit tests on my fork, I got various warnings. This is before writing any code myself. Some of the warnings are related to the models added here. Particularly the warnings about different model-specific recombination rates and contig-specific ones. It is not obvious to me what causes these. Any suggestions, @petrelharp or somebody else?
Thx

[gw0] darwin -- Python 3.11.5 /Users/hbecher/git_repos/stdpopsim_env/bin/python

    def setup_module():
        destination = pathlib.Path("_test_cache/tarballs")
        for genetic_map in stdpopsim.all_genetic_maps():
            key = genetic_map.id
            local_file = destination / (key + ".tar.gz")
            if not local_file.exists():
                cache_dir = local_file.parent
                cache_dir.mkdir(exist_ok=True, parents=True)
                print("Downloading", genetic_map.url)
                utils.download(genetic_map.url, local_file)
            # This assertion could fail if we update a file on AWS,
            # or a developer creates a new genetic map with the wrong checksum
            # (in the latter case, this should at least be caught during CI tests).
>           assert utils.sha256(local_file) == genetic_map.sha256, (
                f"SHA256 for {local_file} doesn't match the SHA256 for "
                f"{genetic_map.id}. If you didn't add this SHA256 yourself, "
                f"try deleting {local_file} and restarting the tests."
            )
E           AssertionError: SHA256 for _test_cache/tarballs/HapMapII_GRCh37.tar.gz doesn't match the SHA256 for HapMapII_GRCh37. If you didn't add this SHA256 yourself, try deleting _test_cache/tarballs/HapMapII_GRCh37.tar.gz and restarting the tests.
E           assert '29693143d60d...f0c2e552b7919' == '80f22d9e6cb0...0286520f6b68f'
E
E             - 80f22d9e6cb0e497074ed1bc277e765fa9d8e22f21b2f66c3b10286520f6b68f
E             + 29693143d60d5c76f50a857dfd42a02d9775c38009f930174a1f0c2e552b7919

tests/test_genetic_maps.py:42: AssertionError
========================================================================= warnings summary =========================================================================
tests/test_HomSap.py::TestGenome::test_recombination_rates[Y]
  /Users/hbecher/git_repos/stdpopsim/stdpopsim/genetic_maps.py:113: UserWarning: Genetic map not found for chromosome: 'Y' on map: 'HapMapII_GRCh38', substituting a flat map with chromosome recombination rate 0.0
    warnings.warn(

tests/test_HomSap.py::TestGenome::test_recombination_rates[MT]
  /Users/hbecher/git_repos/stdpopsim/stdpopsim/genetic_maps.py:113: UserWarning: Genetic map not found for chromosome: 'MT' on map: 'HapMapII_GRCh38', substituting a flat map with chromosome recombination rate 0.0
    warnings.warn(

tests/test_cli.py::TestWriteBibtex::test_number_of_calls
tests/test_cli.py::TestWriteCitations::test_genetic_map_citations
  /Users/hbecher/git_repos/stdpopsim/stdpopsim/cache.py:170: UserWarning: Error occured renaming map directory. Are multiple processes downloading this map at the same time?
    warnings.warn(

tests/test_models.py::TestBosTauHolsteinFriesian_1M13::test_simulation_runs
  /Users/hbecher/git_repos/stdpopsim/stdpopsim/engines.py:125: UserWarning: The demographic model has recombination rate 1e-08, but this simulation used the contig's recombination rate 0.0. Patterns of linkage may be different than expected for this species. For details see documentation at https://popsim-consortium.github.io/stdpopsim-docs/stable/tutorial.html
    warnings.warn(

tests/test_models.py::TestBosTauHolsteinFriesian_1B16::test_simulation_runs
  /Users/hbecher/git_repos/stdpopsim/stdpopsim/engines.py:125: UserWarning: The demographic model has recombination rate 3.66e-09, but this simulation used the contig's recombination rate 0.0. Patterns of linkage may be different than expected for this species. For details see documentation at https://popsim-consortium.github.io/stdpopsim-docs/stable/tutorial.html
    warnings.warn(

tests/test_models.py::TestBosTauFleckvieh_1B16::test_simulation_runs
  /Users/hbecher/git_repos/stdpopsim/stdpopsim/engines.py:125: UserWarning: The demographic model has recombination rate 3.89e-09, but this simulation used the contig's recombination rate 0.0. Patterns of linkage may be different than expected for this species. For details see documentation at https://popsim-consortium.github.io/stdpopsim-docs/stable/tutorial.html
    warnings.warn(

tests/test_models.py::TestBosTauJersey_1B16::test_simulation_runs
  /Users/hbecher/git_repos/stdpopsim/stdpopsim/engines.py:125: UserWarning: The demographic model has recombination rate 4.58e-09, but this simulation used the contig's recombination rate 0.0. Patterns of linkage may be different than expected for this species. For details see documentation at https://popsim-consortium.github.io/stdpopsim-docs/stable/tutorial.html
    warnings.warn(

tests/test_models.py::TestBosTauAngus_1B16::test_simulation_runs
  /Users/hbecher/git_repos/stdpopsim/stdpopsim/engines.py:125: UserWarning: The demographic model has recombination rate 5e-09, but this simulation used the contig's recombination rate 0.0. Patterns of linkage may be different than expected for this species. For details see documentation at https://popsim-consortium.github.io/stdpopsim-docs/stable/tutorial.html
    warnings.warn(

tests/test_models.py::TestRecombinationRates::test_recombination_rate_match
  /Users/hbecher/git_repos/stdpopsim/stdpopsim/engines.py:111: UserWarning: The demographic model has mutation rate 9.4e-09, but this simulation used the contig's mutation rate 1.2000000000000005e-08. Diversity levels may be different than expected for this species. For details see documentation at https://popsim-consortium.github.io/stdpopsim-docs/stable/tutorial.html
    warnings.warn(

tests/test_utils.py::TestGammaPdf::test_gamma_pdf[x3-a3-loc3-scale3]
  /Users/hbecher/git_repos/stdpopsim/stdpopsim/utils.py:348: RuntimeWarning: invalid value encountered in divide
    y = (x - loc) / scale

tests/test_utils.py::TestGammaPdf::test_gamma_pdf[x3-a3-loc3-scale3]
  /Users/hbecher/git_repos/stdpopsim_env/lib/python3.11/site-packages/scipy/stats/_distn_infrastructure.py:2027: RuntimeWarning: invalid value encountered in divide
    x = np.asarray((x - loc)/scale, dtype=dtyp)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

@nspope
Copy link
Collaborator

nspope commented Dec 18, 2024

Hmm ... @hannesbecher the first few seem to be an issue with the cached genetic maps. Did you try:

If you didn't add this SHA256 yourself, try deleting _test_cache/tarballs/HapMapII_GRCh37.tar.gz and restarting the tests

After deleting the possibly corrupted file, try running the tests with a single core (python3 -m pytest -n 1 ... ) to prevent multiple downloads from being kicked off at once.

The other warnings seem to be related to @gregorgorjanc's new addition of a model-specific recombination rate-- I'm guessing what's going on here is that these tests are for the mitochondrion, which has recombination rate 0, and this is getting substituted instead of the model-specific recombination rate with a warning. If you run python3 -m pytest -W error -vv ... that should tell us the exact settings that are throwing warnings (and make tests with warnings fail).

@gregorgorjanc
Copy link
Contributor Author

Happy to improve tests - I took tests for model specific mutation rate and adapted them to model specific recombination rate and assumed that those warnings are ok!? @hannesbecher let us know what python3 -m pytest -W error -vv ... returns and we take it from there …

@nspope
Copy link
Collaborator

nspope commented Dec 18, 2024

I think the warnings are fine (if I'm right about what's going on) -- we'd probably just want to catch them with pytest for the unit tests, to reduce noise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Model QC Quality control process for model addition
Projects
None yet
Development

No branches or pull requests

4 participants