ingest: Use of data vs results directories #51
Comments
Option 3 Have the first step in |
Yup, this is possible in the workflow when the repo has the top level Maybe @jameshadfield and/or @j23414 can chime in here on what they expect the directories to be? |
Thanks for the clarification, I was mistaken in thinking that I mildly lean toward option 1 but even having the difference documented here in an issue is sufficient for me to avoid the mistake in the future.
YES, this. I've run into this when trying to explain where the final files are for the ingest workflow. It was easier having one final folder of final outputs. |
From afar, I've found it mildly confusing that |
Ah, okay so it seems like the confusion comes from mismatch of the use of However, seems like clear documentation is enough and we can just use the existing directory structure. Closing this issue as not planned. If anyone feels strongly enough to change the directory structure, please feel free to reopen for discussion. |
Context
I've seen comments that the use of the data and results directories in the ingest workflow does not feel quite right, so it seems like this should be explicitly discussed here.
Current set up
Everything in the ingest workflow gets filed under the data directory except the final output metadata.tsv and sequences.fasta (with optional Nextclade results).
I had originally made this decision to make it easy to say that all ingest outputs will be available under results (analogous to all phylogenetic outputs being in auspice). This also made it straightforward to move data from ingest to phylogenetic manually with mv ingest/results/* phylogenetic/data/.
However, I can see this being confusing, since the intermediate files are technically "results" of the ingest workflow, so it feels weird for them to be under data.
Possible solutions
Option 1
Adding an explicit intermediates directory so that the use of data makes more sense:
data for everything directly fetched from outside sources
intermediates for all intermediate files produced by ingest
results remains for final output files that include metadata.tsv, sequences.fasta, and optional Nextclade output files.
Option 2
Adding an outputs directory so that the uses of data and results both get shifted:
data for everything directly fetched from outside sources
results for all intermediate files produced by ingest
outputs for final output files that include metadata.tsv, sequences.fasta, and optional Nextclade output files.
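To compare the two options side by side, here is a minimal shell sketch that builds each proposed layout in a throwaway directory and then performs the manual hand-off to phylogenetic/data. The function name sketch_layout and the file names fetched.ndjson and curated.ndjson are hypothetical placeholders. Only metadata.tsv, sequences.fasta, and the directory names come from the issue. It is an illustration under those assumptions, not the workflow's actual implementation.

```shell
set -eu

# Build a throwaway ingest tree and move only the final outputs.
# $1 = directory for intermediate files, $2 = directory for final outputs.
sketch_layout() {
    intermediates_dir="$1"
    final_dir="$2"
    root="$(mktemp -d)"
    mkdir -p "$root/ingest/data" \
             "$root/ingest/$intermediates_dir" \
             "$root/ingest/$final_dir" \
             "$root/phylogenetic/data"
    # Hypothetical file names, except metadata.tsv and sequences.fasta.
    touch "$root/ingest/data/fetched.ndjson"                # fetched from an outside source
    touch "$root/ingest/$intermediates_dir/curated.ndjson"  # intermediate produced by ingest
    touch "$root/ingest/$final_dir/metadata.tsv" \
          "$root/ingest/$final_dir/sequences.fasta"         # final outputs
    # Manual hand-off: the mv from the issue, with the final directory swapped in.
    mv "$root/ingest/$final_dir"/* "$root/phylogenetic/data/"
    echo "$root"
}

option1_root="$(sketch_layout intermediates results)"  # Option 1: data / intermediates / results
option2_root="$(sketch_layout results outputs)"        # Option 2: data / results / outputs

ls "$option1_root/phylogenetic/data" "$option2_root/phylogenetic/data"
```

Under Option 1 the hand-off command stays mv ingest/results/* phylogenetic/data/, whereas under Option 2 it would presumably become mv ingest/outputs/* phylogenetic/data/, so any existing instructions that mention ingest/results would need updating.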