updates to hash codes and file names #1553
Note: these are the new annotation files with the new hash codes, not the original ones. I have the originals, but wasn't sure whether we also wanted to change those file names.
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@           Coverage Diff           @@
##             main    #1553   +/-   ##
=======================================
  Coverage   99.86%   99.86%
=======================================
  Files         131      131
  Lines        4333     4333
  Branches      594      594
=======================================
  Hits         4327     4327
  Misses          3        3
  Partials        3        3
Wow, remote tests, even - nice. Shouldn't there also be some changes to the maintenance script, though?
(Or if not - hm, and even if so - some documentation of how this works?)
Yes to documentation. Should I write it in a way that applies to all files with a hash? Maybe a "how to update preexisting resources" section or something?
Well, this needs updating at least:
    be sure to give it a new version name (i.e. ``name_{id}_more_name_v1.txt``).
    This is to ensure that the previous file versions and checksums that are hosted
    remotely remain valid when new releases are made.
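The naming rule in that docs fragment can be sketched as a small helper. `add_version_suffix` is a hypothetical name (not part of stdpopsim), and the sketch assumes the version tag goes at the very end of the file stem, just before a `.txt` or `.tar.gz` extension, with `.tar.gz` treated as a single extension.

```python
# Hypothetical helper, not part of stdpopsim: insert "_v{N}" just before the
# file extension, treating ".tar.gz" as one compound extension.
KNOWN_EXTENSIONS = (".tar.gz", ".txt")


def add_version_suffix(filename: str, version: int) -> str:
    """Return filename with a "_v{version}" tag placed before its extension."""
    for ext in KNOWN_EXTENSIONS:
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            return f"{stem}_v{version}{ext}"
    raise ValueError(f"unrecognised extension in {filename!r}")


print(add_version_suffix("name_{id}_more_name.txt", 1))  # name_{id}_more_name_v1.txt
print(add_version_suffix("exons.tar.gz", 2))  # exons_v2.tar.gz
```

Putting the tag at the end of the stem keeps the version number out of the middle of the name, which is also where it is easiest to spot and bump.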
Is this sufficient and what you had in mind @petrelharp?
Not quite. Can you add, let's see:
- which files you need to put the new name in
- something more prescriptive about how to create the new name (your example doesn't follow how you created the new names for this round - and they are .tar.gz, not .txt, right?)
- a note that in the cache the files are renamed to a standard name
- something saying why you have to do this?
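On the cache point: a minimal sketch of why versioned remote names need not leak into the rest of the code is to derive the local cache path from a stable resource ID rather than from the URL's file name. The `cache_path` function, the cache layout, the resource ID and the example URLs are all hypothetical; this is an illustration of the idea, not how stdpopsim's cache is actually implemented.

```python
import pathlib
from urllib.parse import urlparse


def cache_path(cache_dir: pathlib.Path, resource_id: str, url: str) -> pathlib.Path:
    """Hypothetical layout: name the local copy after the resource ID.

    Only the extension is taken from the remote file name, so "_v1", "_v2",
    ... in the uploaded name never change where the data lives locally.
    """
    remote_name = pathlib.PurePosixPath(urlparse(url).path).name
    if remote_name.endswith(".tar.gz"):
        ext = ".tar.gz"
    else:
        ext = pathlib.PurePosixPath(remote_name).suffix
    return cache_dir / f"{resource_id}{ext}"


cache = pathlib.Path("/tmp/cache_demo")  # hypothetical cache directory
old = cache_path(cache, "exons_annotation", "https://example.org/annotations/exons_v1.tar.gz")
new = cache_path(cache, "exons_annotation", "https://example.org/annotations/exons_v2.tar.gz")
print(old == new)  # True
```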
Updates! Better?
Oh, I see what you mean about naming: I didn't mean to add the _v1 before the exons and CDS. Would it be better to change the file names to match what I describe here? That seems a bit cleaner than having the version number in the middle.
Went ahead and updated the names to match what I described in the docs.
Sounds good - that made more sense to me, but I assumed you had a good reason for doing it the other way.
docs/development.rst
@@ -1337,7 +1337,7 @@ see `Getting set up to add a new species`_):
     long_description="FILL_ME",
     url=("https://stdpopsim.s3-us-west-2.amazonaws.com/genetic_maps/dir/filename"),
     sha256="FILL_ME",
-    file_pattern="name_{id}_more_name.txt",
+    file_pattern="name_{id}_more_name.tar.gz",
I'm confused - up above it says that {id} should be the ID of a given chromosome, but we don't .tar.gz up individual chromosomes - all the chromosomes are tar'ed up together? This seems inconsistent?
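To make this point concrete: `{id}` in `file_pattern` is filled in once per chromosome, so it names the members inside the archive, while the single archive bundling them all is what the URL points to. The sketch below builds a small in-memory tarball with hypothetical chromosome IDs and looks members up via `str.format`; it uses only the standard library's `tarfile` and `io` and is not stdpopsim code.

```python
import io
import tarfile

file_pattern = "name_{id}_more_name.txt"  # pattern from the docs example
chrom_ids = ["1", "2", "X"]  # hypothetical chromosome IDs

# Build ONE gzipped archive holding a per-chromosome file for each ID.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    for cid in chrom_ids:
        data = f"chromosome {cid}\n".encode()
        info = tarfile.TarInfo(name=file_pattern.format(id=cid))
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

# Read it back: each chromosome's member is found by formatting the pattern.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    members = tar.getnames()
    x_member = tar.extractfile(file_pattern.format(id="X")).read().decode()

print(members)  # ['name_1_more_name.txt', 'name_2_more_name.txt', 'name_X_more_name.txt']
```

Under this reading, renaming the archive (for example adding _v1 to the uploaded .tar.gz) leaves `file_pattern` untouched; only the URL and its checksum change.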
docs/development.rst
    when an existing resource file (such as a genetic map or annotation)
    is updated and replaces the previous version,
    be sure to update the name of the file pattern with a version number
    (i.e. ``name_{id}_more_name_v1.tar.gz``).
This seems confused - either {id} should not go here (because, as above, it refers to an individual chromosome, and we don't rename the individual chromosome files?) or else this is the wrong thing that needs to be changed (i.e. we don't change the file_pattern)?
Yeah, I think this is wrong. Note that below in your PR you are not changing file_pattern; you're changing things like intervals_url. Here you should be documenting/explaining the very process you're doing in this PR.
Yes sorry, thanks for looking carefully!
tests/test_ensembl.py
@@ -4,4 +4,4 @@
 # Make sure we don't update the release without realising it.
 def test_version():
     release = stdpopsim.catalog.ensembl_info.release
-    assert release == 103
+    assert release == 111
Ack, this thing. See #1521
Please don't change this here.
OK, so is it best to just let this test fail until the solution is worked out? I tried to follow the threads, and it seems like PR #1536 is still open?
Yes, you should definitely not change this until this is actually fixed. (However, that'd be a great thing to fix next...)
stdpopsim/catalog/ensembl_info.py
@@ -1,2 +1,2 @@
 # File autogenerated from Ensembl REST API. Do not edit.
-release = 103
+release = 111
please don't update this
I've edited the docs - please read and see if it makes sense, @silastittes. I also verified that the only failing tests have to do with the Ensembl build, so I think we can merge this. But don't we need to do something to the maintenance script? I don't think the annotation script has to handle the version numbers automatically, but shouldn't it at least remind people to do this? Or something? I have not digested that side of things; I'm hoping you have the big picture here, @silastittes.
These updates look good to me! My current understanding of the problem did not require any changes to the maintenance script. I used @andrewkern's local
I made some adjustments to the maintenance script. Please see what you think.
maintenance/annotation_maint.py
    logger.info(
        "ALERT: need to rename the files (replace 'vX' with "
        "appropriate version number to not clobber existing files "
        "before upload."
    )
-    logger.info(
-        "ALERT: need to rename the files (replace 'vX' with "
-        "appropriate version number to not clobber existing files "
-        "before upload."
-    )
+    logger.warning(
+        "ALERT: need to rename the files (replace 'vX' with "
+        "appropriate version number to not clobber existing files "
+        "before upload."
+    )
I ran it locally and found the message easy to ignore. Is it worth making it a warning or some other more noticeable flag? Not sure what best practice dictates here.
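On the best-practice question: in Python's standard `logging` module, `WARNING` ranks above `INFO`, so the level name makes a warning stand out in level-aware output, and a logger whose threshold is `WARNING` (the root logger's default) drops `info` messages entirely while still emitting warnings. A minimal sketch with a hypothetical logger name; the maintenance script's actual logging setup may differ.

```python
import logging

logger = logging.getLogger("annotation_maint_demo")  # hypothetical name
logger.setLevel(logging.WARNING)
logger.propagate = False  # keep the demo records away from the root logger

# Collect emitted records in a list so we can see which messages survive.
captured = []


class ListHandler(logging.Handler):
    def emit(self, record):
        captured.append((record.levelname, record.getMessage()))


logger.addHandler(ListHandler())

logger.info("ALERT: rename the files before upload.")  # below threshold: dropped
logger.warning("ALERT: rename the files before upload.")  # emitted

print(captured)  # [('WARNING', 'ALERT: rename the files before upload.')]
```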
Well, the filename itself has _vX in it, so they should notice that? Hm, maybe instead it should be _vCHANGEME?
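One way to make such a placeholder impossible to miss is to refuse to proceed while any output file name still contains it. `check_no_placeholder` and the example file names are hypothetical; this is a sketch of the idea, not what the maintenance script does.

```python
PLACEHOLDER = "_vCHANGEME"


def check_no_placeholder(filenames):
    """Raise ValueError if any file name still carries the version placeholder."""
    offending = [name for name in filenames if PLACEHOLDER in name]
    if offending:
        raise ValueError(
            "replace the version placeholder before upload: " + ", ".join(offending)
        )


check_no_placeholder(["exons_v2.tar.gz", "CDS_v2.tar.gz"])  # passes silently
try:
    check_no_placeholder(["exons_vCHANGEME.tar.gz"])
except ValueError as err:
    print(err)  # replace the version placeholder before upload: exons_vCHANGEME.tar.gz
```

A hard failure like this trades a little convenience for certainty: nothing named with the placeholder can be uploaded by accident.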
done
Could you run the maint script again locally to check these changes work?
Seems to be working for me!
Added _v1 to the names of the annotation files on AWS, as discussed in #1551. Updated the hash codes and file names in the respective annotation.py files. Tests passed locally, and remote tests too, woo!
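Whenever a resource file's contents change, its recorded checksum has to be recomputed (the `sha256="FILL_ME"` style fields in the docs example). A minimal sketch of computing a SHA-256 digest in chunks with the standard library's `hashlib`; the file name in the demo is hypothetical, and the demo writes a throwaway file so the snippet runs on its own.

```python
import hashlib
import pathlib
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    demo = pathlib.Path(tmp) / "exons_v1.tar.gz"  # hypothetical file name
    demo.write_bytes(b"abc")
    print(sha256_of(demo))
    # ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

Reading in chunks keeps memory use flat even for large genome-wide archives. Note that renaming a file does not change its digest; only new contents do.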