Auspice testing data (get-data) #1558

Open
jameshadfield opened this issue Oct 4, 2022 · 2 comments

@jameshadfield

jameshadfield commented Oct 4, 2022

Background

This repo doesn't contain any datasets beyond some very minimal examples to explain the dataset format. Instead we rely on the get-data script, which downloads a slew of Nextstrain core datasets. In times gone by, this was an accurate listing of all of our core datasets (other sources, such as groups and community datasets, didn't exist yet). The script is often run manually (e.g. npm run get-data) so you have some data to play with, and Heroku runs it during setup (npm run heroku-postbuild), which results in usable datasets in review apps.

Shortcomings

A lot of Auspice's functionality cannot be tested with the data here, and two PRs in the last couple of weeks have highlighted this: #1557 and #1552. This means the Heroku review apps are not useful for such changes, and people have to manually check out the Auspice branch and obtain an appropriate dataset for testing.

Proposal

We should make the get-data script obtain a useful set of testing datasets, preferably using timestamped datasets so we can ensure reproducibility. For PRs which need additional datasets to test, these should be added to the get-data script as part of the PR.
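
As a rough illustration of the proposal, the sketch below pins each test dataset to a timestamp and derives a reproducible download plan from that list. It is written in Python for brevity and is not the existing get-data script. The `PinnedDataset` type, the `build_download_plan` helper, the dataset names, the base URL and the URL layout are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PinnedDataset:
    """A test dataset pinned to a snapshot date (hypothetical structure)."""
    name: str       # dataset path, e.g. "example/segmented" (placeholder name)
    timestamp: str  # ISO date of the snapshot, e.g. "2022-10-04" (placeholder)


def build_download_plan(
    datasets: List[PinnedDataset], base_url: str
) -> List[Tuple[str, str]]:
    """Return (url, local_filename) pairs for each pinned dataset.

    Assumption: snapshots live at <base_url>/<timestamp>/<filename>.
    This layout is invented for the example, not a real server's scheme.
    """
    plan = []
    for ds in datasets:
        # Flatten the dataset path into a single local filename.
        filename = ds.name.replace("/", "_") + ".json"
        url = f"{base_url.rstrip('/')}/{ds.timestamp}/{filename}"
        plan.append((url, filename))
    return plan


# A PR that needs an extra dataset would add one entry to this list.
TEST_DATASETS = [
    PinnedDataset("example-virus", "2022-10-04"),
    PinnedDataset("example/segmented", "2022-10-04"),
]

if __name__ == "__main__":
    for url, filename in build_download_plan(
        TEST_DATASETS, "https://example.org/snapshots"
    ):
        print(f"{url} -> {filename}")
```

Keeping the list of pinned datasets as plain data means a PR that needs a new testing dataset only has to add one entry, and the same snapshot is fetched on every run.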

@victorlin

Note that the PR review apps allow us to test on live nextstrain.org data

@jameshadfield

> Note that the PR review apps allow us to test on live nextstrain.org data

That's true (and a big positive) but it doesn't help with local development and, some time in the future, actual tests within auspice.

I'll take a crack at this issue today - it seems simple on the surface!

jameshadfield added a commit that referenced this issue Aug 9, 2023
The test dataset covers a large range of genome complexity which is much
better than relying on datasets (I don't have any to hand with negative
CDSs wrapping around the origin, or -ve segmented CDSs, for example).
The addition of this starts to address #1558

3 automated tests are passing, with the rest failing as expected.
Subsequent commits will add support for segmented CDSs and -ve strand
CDSs. Full test results:

  ✓ Chromosome coordinates
  ✓ +ve strand CDS with a single segment
  ✕ -ve strand CDS with a single segment
  ✓ +ve strand CDS which wraps the origin
  ✕ -ve strand CDS which wraps the origin
  ✕ +ve strand CDS with multiple (non-wrapping) segments
  ✕ -ve strand CDS with multiple (non-wrapping) segments
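
To make the cases in the list above concrete, here is a minimal Python sketch of how a CDS that wraps the origin of a circular genome can be split into linear pieces. It is not Auspice's implementation or data format. The function names are hypothetical, and so are the conventions used: 1-based inclusive coordinates, and `start > end` meaning the CDS wraps the origin.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start, end), 1-based inclusive, genome coordinates


def cds_segments(start: int, end: int, genome_length: int) -> List[Segment]:
    """Split a CDS into linear pieces in genome order.

    Assumption (for this sketch only): start > end signals that the CDS
    runs past the end of the genome and continues from position 1.
    """
    if not (1 <= start <= genome_length and 1 <= end <= genome_length):
        raise ValueError("coordinates must lie within 1..genome_length")
    if start <= end:
        return [(start, end)]
    return [(start, genome_length), (1, end)]


def cds_length(segments: List[Segment]) -> int:
    """Total number of nucleotides covered by the segments."""
    return sum(e - s + 1 for s, e in segments)


def reading_order(segments: List[Segment], strand: str) -> List[Segment]:
    """Order in which the pieces are read during translation.

    A -ve strand CDS is read from its highest coordinate downwards, so
    the pieces are visited in reverse (each piece is itself read
    high-to-low, which this helper does not model).
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    return list(segments) if strand == "+" else list(reversed(segments))


if __name__ == "__main__":
    pieces = cds_segments(95, 10, genome_length=100)
    print(pieces, cds_length(pieces), reading_order(pieces, "-"))
```

A test dataset that exercises both branches of `cds_segments` on both strands corresponds to the single-segment and origin-wrapping cases listed above. The multi-segment cases would supply a list of pieces directly.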
jameshadfield added 4 more commits with the same message that referenced this issue: Aug 10, 2023 (two commits), Aug 11, 2023, and Aug 17, 2023