Input reads have incorrect file format #206
I also have an unrelated NOVOPlasty question: what is the function of the Optional config.txt parameters?
Hi, why are you filtering the reads? You can just use the complete dataset. Greets, Nicolas
The insert range doesn't need to be changed. You can make it larger when you use a library with widely fluctuating insert sizes, but that is almost never the case.
Hi Nicolas,
I also have a question about the Store Hash option. I've written a script to create a batch file for each individual that provides a new project name for each seed + sample combination, along with the other standard information, so that this structure is iterated through the file (until all seed combinations have been included):
From the above, it appears that the Store Hash option stores a separate hash for each new project, even if the read data being provided is the same. Is there a way I can store the hash table to use across many projects? I want to compare contig lengths across seeds, so it's important to keep the project naming conventions, since that's how the output FASTA files are named. Thanks a lot for your help!!
Hi, I also have the same question. If I changed the name of the projects, the seeds would not work.
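For reference, a minimal sketch of the kind of script described above, which builds one uniquely named project per seed + sample combination. The sample names, seed names, and file name patterns are hypothetical, and the dictionary keys are only assumed to match the labels in your config.txt template, so check them against your own copy:

```python
from itertools import product


def project_name(sample: str, seed: str) -> str:
    """Build a unique project name from a sample label and a seed label."""
    return f"{sample}_{seed}"


def config_entries(samples, seeds):
    """Yield one dict of per-run settings for every sample/seed pair."""
    for sample, seed in product(samples, seeds):
        yield {
            # Keys are assumed config.txt labels; values are hypothetical paths.
            "Project name": project_name(sample, seed),
            "Seed Input": f"{seed}.fasta",
            "Forward reads": f"{sample}_R1.fastq.gz",
            "Reverse reads": f"{sample}_R2.fastq.gz",
        }


samples = ["sampleA", "sampleB"]        # hypothetical sample labels
seeds = ["seed1", "seed2", "seed3"]     # hypothetical seed labels
runs = list(config_entries(samples, seeds))
for run in runs:
    print(run["Project name"], run["Seed Input"])
```

Because the project name is derived from both labels, every output FASTA file can be traced back to its seed and sample.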
Insert size auto means that the insert size is calculated automatically; the range determines how much it can differ from the insert size. There is no need to change anything there, it won't change much. About the store hash, have you read the wiki? You need to run store hash only once, and then you use the stored hashes instead of the reads. It will speed up the first phase by a lot, especially for larger datasets. Why are you using 1000s of seeds? If you have a WGS dataset, one seed should be enough. The seed is only needed to initiate the assembly and should be quite flexible.
Not sure what you mean by the seeds won't work...
There is also a batch function; you can check the wiki: https://github.com/ndierckx/NOVOPlasty/wiki/Batch-function It is easy to use, and with it you can run many samples with the changes you want per run.
You can use the hash files directly for subsequent runs, because you will know what the hash files are named.
Hi Nicolas,
I'm using NOVOPlasty to de novo assemble mitochondrial genomes for a dataset of 29 sets of paired-end short-read data.
For over a third of my samples, I've gotten the following error:
THE INPUT READS HAVE AN INCORRECT FILE FORMAT! PLEASE SEND ME THE ID STRUCTURE!
I've attached an example of some of the reads from one sample below; please let me know what other information, if any, you require. I filtered the raw data for the entire dataset with fastp using default parameters and am giving NOVOPlasty the filtered forward and reverse read files generated by fastp. Thank you for your time; your help and advice are greatly appreciated!
Best,
cebos
Example:
zless Microrhombophryne_Ca39_ZCMV-12404_L001_R1.out.fastq.gz
@J00138:141:HN23TBBXX:5:1101:23520:1068 2:N:0:ATAGCGAC+ATTACTCG
CCCTGAATGTCTACGTGGCTCTTTGTTACTATAAACTTGATTACTATGATGTGTCACAGGAAGTTCTTGCAGTATATTTGCAACAGGTTCCTGACAGTACGATTGCTCTTAATTTGAAGGCCTGCAATCATTTTCGTCTTTACAATGGGAA
+
AAFFFJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJFJJ--AFJFJJ7JF-F-FAJFFAFFJF7AJFJJFJ7-AFFJJJJJJJJAJJFJJJJJJJJJJJJJFFJJJJFJJJJJJJJJJFJJJJJJJJJJAJFAAAJJFJJJ7AJJJJJ<7
@J00138:141:HN23TBBXX:5:1101:30076:1068 2:N:0:ATAGCGAG+ATTACTCG
GCCAACAAAAGGTATCGCCTTATTTCTTCACTTTTCTATTGAATTCAATGGCCAACACGGTACAACACATCACTGCTACATATCGAATAGATAGCTTGGCCGTAGGCCTGTGTGTTTGGGGAAGGGCTGATCAGAACCCATCGGGATAGCT
+
A<AFFJJJJJJJJJJJJFJJJJ-F7FFJJJJJF7J7FJF-<FJ<F7JJFA--7--AFF7-AFJJFFJ<7JFJJF7FFJJJFJA<FJJFFFJJJJJJJFJFJ7JF<FFFAFJ--AFJAFFFJJJJJJFJAF-F7FFJJJA7-))7-FFF<F<
@J00138:141:HN23TBBXX:5:1101:18873:1103 2:N:0:ATAGCGAC+ATTACTCG
TGGATACTGGAGAAGATTCGAGTGGTAGATTCTATTCAGAACCTTGGAGATGATCTCACTGCAGTCATGTCAATTCAGAGAAAACTCTGTGGCATTGAGAAAGATCTTGGTGCCATTGAGTCTAAACTTGTAAGTCTACAAGAAGAGGCAA
+
AAAFFJJJJJJFFJJJJJJJJJJJJJFJFFJJJJJJJJJJJFJJJJJJFJFJJJJAAJFFJJAJ7FJJJJJJJJJJJJJJJJJJJJJJJFJJJJFJJ-77AJJFJJFFJJJJJJJJJJJJFJJJFFJJFF<JJJJFF<JJJJJJJJJJJJJ
@J00138:141:HN23TBBXX:5:1101:27965:1103 2:N:0:ATAGCGAG+ATTACTCG
AGGTTGGCAATGTGGAATCAGGCAGAGTGTGCAATGGCAAGCAAGGTT
+
AAFFFJJJJJJJAFJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJFAF
@J00138:141:HN23TBBXX:5:1101:25276:1121 2:N:0:ATAGCGAC+ATTACTCG
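Since the error message asks for the ID structure, here is a minimal sketch for inspecting it, assuming the headers follow the Illumina Casava 1.8+ layout (`@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`). The `parse_header` helper and `ReadId` type are hypothetical names for illustration, not part of NOVOPlasty or fastp:

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: @instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
HEADER_RE = re.compile(
    r"^@(?P<instrument>[^:\s]+):(?P<run>\d+):(?P<flowcell>[^:\s]+):"
    r"(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r"\s+(?P<read>[12]):(?P<filtered>[YN]):(?P<control>\d+):(?P<index>\S+)$"
)


class ReadId(NamedTuple):
    flowcell: str
    lane: int
    read: int   # pair member recorded in the header (1 or 2)
    index: str  # barcode sequence(s)


def parse_header(line: str) -> Optional[ReadId]:
    """Return the parsed fields, or None if the line does not fit the layout."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        return None
    return ReadId(m["flowcell"], int(m["lane"]), int(m["read"]), m["index"])


# First header from the excerpt above.
header = "@J00138:141:HN23TBBXX:5:1101:23520:1068 2:N:0:ATAGCGAC+ATTACTCG"
parsed = parse_header(header)
print(parsed)
```

In the excerpt, the field after the space starts with `2` although the file name contains `R1`. Whether that is related to the error is not established in this thread, but it may be worth comparing against the matching R2 file.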