Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

updates to hash codes and file names #1553

Merged
merged 11 commits into from
Apr 5, 2024

Conversation

silastittes
Copy link
Contributor

@silastittes silastittes commented Mar 11, 2024

Added _v1 to annotation files on aws names as discussed in #1551.
Updated hash codes and files names in the respective annotation.py files. Tests passed locally — remote tests too, woo!

@silastittes
Copy link
Contributor Author

Note, these are the new annotation files with the new hash codes, not the original ones. I have the originals but wasn't sure if we also wanted to change those file names.

Copy link

codecov bot commented Mar 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.86%. Comparing base (fbb94c2) to head (36e84ea).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1553   +/-   ##
=======================================
  Coverage   99.86%   99.86%           
=======================================
  Files         131      131           
  Lines        4333     4333           
  Branches      594      594           
=======================================
  Hits         4327     4327           
  Misses          3        3           
  Partials        3        3           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@petrelharp
Copy link
Contributor

Wow, remote tests, even - nice. Shouldn't there also be some changes to the maintenance script, though?

@petrelharp
Copy link
Contributor

(Or if not - hm, and even if so - some documentation of how this works?)

@silastittes
Copy link
Contributor Author

Yes to documentation. Should I write it in a way that applies to all files with a hash? Maybe a "how to update preexisting resources" or something?

@petrelharp
Copy link
Contributor

Well, this needs updating at least:
https://popsim-consortium.github.io/stdpopsim-docs/latest/development.html#adding-a-genetic-map
Gee, suddenly I'm surprised we haven't encountered this problem with genetic maps?

be sure to give it a new version name (i.e. ``name_{id}_more_name_v1.txt``).
This is to ensure that the previous file versions and checksums that are hosted
remotely remain valid when new releases are made.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this sufficient and what you had in mind @petrelharp?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite. Can you add, let's see:

  1. in what files do you need to put the new name
  2. something more prescriptive about how to create the new name (your example is not following how you created the new names for this round - and, they are .tar.gz, not .txt, right?)
  3. a note that in the cache the files are renamed to a standard thing
  4. something saying why you have to do this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates! Better?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I see what you mean about naming -- I didn't mean to add the _v1 before the exons and CDS. Would it be better to change the files names to match what I describe here? That seems a bit cleaner than having the version number in the middle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went ahead with updating names to match what I described in docs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good - that made more sense to me, but I assumed you had a good reason for doing it the other way.

@@ -1337,7 +1337,7 @@ see `Getting set up to add a new species`_):
long_description="FILL_ME",
url=("https://stdpopsim.s3-us-west-2.amazonaws.com/genetic_maps/dir/filename"),
sha256="FILL_ME",
file_pattern="name_{id}_more_name.txt",
file_pattern="name_{id}_more_name.tar.gz",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused - up above it says that {id} should be the ID of a given chromosome, but we don't .tar.gz up individual chromosomes - all the chromosomes are tar'ed up together? This seems inconsistent?

when an existing resource file (such as a genetic map or annotation)
is updated and replaces the previous version,
be sure to update the name of the file pattern with a version number
(i.e. ``name_{id}_more_name_v1.tar.gz``).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems confused - either {id} should not go here (because as above it refers to an individual chromosome, and we don't rename the individual chromosome files?) or else this is the wrong thing that needs to be changed (ie we don't change the file_pattern)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this is wrong. Note below in your PR; you are not changing file_pattern; you're changing like intervals_url. Here you should be documenting/explaining the very process you're doing in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sorry, thanks for looking carefully!

@@ -4,4 +4,4 @@
# Make sure we don't update the release without realising it.
def test_version():
release = stdpopsim.catalog.ensembl_info.release
assert release == 103
assert release == 111
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, this thing. See #1521

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't change this here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so best to just let this test fail until the solution is worked out? I tried for follow the threads and it seems like the PR #1536 is still open?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you should definitely not change this until this is actually fixed. (However, that'd be a great thing to fix next...)

docs/development.rst Outdated Show resolved Hide resolved
docs/development.rst Outdated Show resolved Hide resolved
@@ -1,2 +1,2 @@
# File autogenerated from Ensembl REST API. Do not edit.
release = 103
release = 111
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't update this

@petrelharp
Copy link
Contributor

I've edited the docs - please read and see if it makes sense, @silastittes .

And, I verified the only failing tests have to do with the ensembl build.

So, I think we can merge this. But: don't we need to do something to the maintenance script? I don't think that the annotation script has to automatically do the version numbers, but it should at least remind people to do this? Or something? I have not digested that side of things - I'm hoping you have the big picture here, @silastittes.

@silastittes
Copy link
Contributor Author

silastittes commented Apr 2, 2024

These updates look good to me! My current understanding of the problem did not require any changes to the maintenance script. I used @andrewkern's local aws_push_annotations.py to get the old and new versions of the annotation files onto AWS with the correct names. I think as long as developers update the file names when revisions are submitted, that fixes the issue? Automatic updates in the maintenance script or elsewhere seems hard to implement and error prone, but happy to admit I may not have the whole picture either. @andrewkern, thoughts?

@petrelharp
Copy link
Contributor

I made some adjustments to the maintenance script. Please see what you think.

Comment on lines 134 to 138
logger.info(
"ALERT: need to rename the files (replace 'vX' with "
"appropriate version number to not clobber existing files "
"before upload."
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logger.info(
"ALERT: need to rename the files (replace 'vX' with "
"appropriate version number to not clobber existing files "
"before upload."
)
logger.warning(
"ALERT: need to rename the files (replace 'vX' with "
"appropriate version number to not clobber existing files "
"before upload."
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ran locally, found it easy to ignore. Worth making a warning or some other more notable flag? Not sure what best practices dictates here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the filename itself has _vX in it, so they should notice that? Hm, maybe instead it should be _vCHANGEME?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@petrelharp
Copy link
Contributor

Could you run the maint script again locally to check these changes work?

@silastittes
Copy link
Contributor Author

Seems to be working to me!

ls annotations/*/*vCHANGEME.tar.gz
annotations/AraTha/araport_11_CDS_vCHANGEME.tar.gz
annotations/AraTha/araport_11_exons_vCHANGEME.tar.gz
annotations/DroMel/FlyBase_BDGP6.32.51_CDS_vCHANGEME.tar.gz
annotations/DroMel/FlyBase_BDGP6.32.51_exons_vCHANGEME.tar.gz
annotations/HomSap/ensembl_havana_104_CDS_vCHANGEME.tar.gz
annotations/HomSap/ensembl_havana_104_exons_vCHANGEME.tar.gz
annotations/PhoSin/Phocoena_sinus.mPhoSin1.pri.110_CDS_vCHANGEME.tar.gz
annotations/PhoSin/Phocoena_sinus.mPhoSin1.pri.110_exons_vCHANGEME.tar.gz

@petrelharp petrelharp merged commit 6926072 into popsim-consortium:main Apr 5, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants