Confusion about argument -l and -ml #62

Open
neptuneyt opened this issue Jan 24, 2021 · 3 comments


@neptuneyt

Dear quickmerge team,
I have installed the latest quickmerge, which supports MUMmer 4, but I am confused by the arguments -l and -ml. According to the manual:

-l LENGTH_CUTOFF, --length_cutoff LENGTH_CUTOFF: minimum seed contig length to be merged (default=0)
-ml MERGING_LENGTH_CUTOFF, --merging_length_cutoff MERGING_LENGTH_CUTOFF: sets the merging length cutoff necessary for use in quickmerge (default 5000)
Do they mean what is described in the picture below?
[image]
Thanks a lot!
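To make the question concrete, here is a minimal sketch of how two such cutoffs could interact. It is not quickmerge's code: it assumes, for illustration only, that -l discards seed contigs shorter than LENGTH_CUTOFF and that -ml discards overlaps whose aligned length is below MERGING_LENGTH_CUTOFF. The function `eligible_overlaps` and the (seed, partner, overlap_len) tuple format are hypothetical.

```python
# Hypothetical sketch, NOT quickmerge's implementation.
# Assumption: -l filters seed contigs by length; -ml filters overlaps by aligned length.


def eligible_overlaps(overlaps, seed_lengths, length_cutoff=0, merging_length_cutoff=5000):
    """Return the (seed, partner, overlap_len) tuples that pass both assumed cutoffs.

    overlaps: iterable of (seed_id, partner_id, overlap_len) tuples (hypothetical format)
    seed_lengths: dict mapping seed_id -> contig length in bp
    The defaults mirror the manual: -l 0 and -ml 5000.
    """
    kept = []
    for seed, partner, overlap_len in overlaps:
        seed_ok = seed_lengths[seed] >= length_cutoff            # assumed meaning of -l
        overlap_ok = overlap_len >= merging_length_cutoff        # assumed meaning of -ml
        if seed_ok and overlap_ok:
            kept.append((seed, partner, overlap_len))
    return kept


if __name__ == "__main__":
    seeds = {"seedA": 4000, "seedB": 12000}
    overlaps = [("seedA", "p1", 6000), ("seedB", "p2", 3000), ("seedB", "p3", 8000)]
    # Defaults (-l 0, -ml 5000): the 3000 bp overlap is dropped.
    print(eligible_overlaps(overlaps, seeds))
    # Raising the seed cutoff to 5000 also drops everything seeded on the 4000 bp contig.
    print(eligible_overlaps(overlaps, seeds, length_cutoff=5000))
```

Under these assumptions, the two parameters filter different things: -l acts on the length of the seed contig, while -ml acts on the length of the overlap, regardless of how long the seed is.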

@mahulchak
Owner

mahulchak commented Jan 24, 2021 via email

@neptuneyt
Author

Thank you for your prompt reply, but I still do not understand -ml. I have tested -l and -ml; the results are below:
[image]
In the first table, I tested -l from 10 to 5000, but the merged sequences did not improve compared with the two raw contig sets.
In the second table, I tested -ml from 1 to 5000, and the quality of the merged sequences was affected by the -ml value. How should I interpret these results?
Looking forward to your reply. Thanks a lot!

@neptuneyt
Author

Sorry to disturb you.
I ran another, simpler test:
I extracted 10k contigs from each of the two assemblies and then ran:

nohup merge_wrapper.py -l 10 -ml 10 -t 50 -v C1_10k.fa C2_10k.fa &> log &

From param_summary_out.txt, I counted 205 pairs of overlapping contigs, an overlap rate of only 0.0205 (205/10000).

REF QUERY REF_START REF_END Q_START Q_END ORIENTATION INNIE(1/0) OVERLAP_LEN OVERLAP_PROP NO_OVERLAP_AT_ENDS OVERHANG
1 Cluster2_k141_213238 Cluster1_k141_1058634 2107 2631 53324809 R 0 523 0.263343 1986 3787
... Cluster2_k141_15754305 Cluster1_k141_1064426 57 3426 4010642 L 0 3368 4.82521 698 154
205 Cluster2_k141_11926444 Cluster1_k141_109005 2257 3942 21993893 L 1 1694 0.6776 2500 0
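Counting the pairs and the rate can be scripted rather than done by hand. A minimal sketch, assuming param_summary_out.txt is whitespace-delimited with one overlap per line and a header line containing the column names REF and QUERY, as in the excerpt above; `overlap_stats` is a hypothetical helper. Repeated (REF, QUERY) rows are counted once.

```python
def overlap_stats(lines, n_contigs):
    """Return (number of unique REF/QUERY pairs, pairs / n_contigs).

    Assumes the first non-empty line is a header containing the column names
    REF and QUERY, and that every following non-empty line is one overlap record.
    """
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return 0, 0.0
    header, data = rows[0], rows[1:]
    ref_i, query_i = header.index("REF"), header.index("QUERY")
    # A set removes duplicate rows for the same contig pair.
    pairs = {(row[ref_i], row[query_i]) for row in data}
    return len(pairs), len(pairs) / n_contigs
```

For example, `overlap_stats(open("param_summary_out.txt"), 10000)` would return the pair count and the rate relative to one 10k-contig set.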

The two raw 10k-contig sets totaled 113 Mb (113248645 bp), but merged_out.fasta totaled only 59 Mb (59258534 bp). This does not make sense given the low overlap rate (2%).
So I checked one of the overlapping pairs; the overlap relationship is below:
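Totals like these can be reproduced with a few lines of Python. A minimal sketch: `fasta_lengths` and `total_size` are hypothetical helpers; the first maps each record ID (taken as the first word after `>`) to its sequence length, and summing the values gives the file's total size. With the totals reported here, 59258534 / 113248645 ≈ 0.523, so merged_out.fasta is about 52% of the combined input.

```python
def fasta_lengths(lines):
    """Map each FASTA record ID (first word after '>') to its sequence length in bp.

    Assumes every header line carries an ID; multi-line sequences are summed.
    """
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def total_size(path):
    """Total number of sequence characters in a FASTA file."""
    with open(path) as fh:
        return sum(fasta_lengths(fh).values())
```

For example, `total_size("merged_out.fasta") / (total_size("C1_10k.fa") + total_size("C2_10k.fa"))` gives the ratio of output to input size.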

REF QUERY REF_START REF_END Q_START Q_END ORIENTATION INNIE(1/0) OVERLAP_LEN OVERLAP_PROP NO_OVERLAP_AT_ENDS OVERHANG
Cluster2_k141_6817205 Cluster1_k141_1166759 1020 4705 1 3684 R 03683 3683 1 257

[image]

I found a sequence named Cluster2_k141_6817205 in merged_out.fasta. It seems the merged sequence takes the name of the longer of the two overlapping contigs, and that merge was correct. So strange!

Then I checked the sequence IDs in merged_out.fasta:

Source Count
from Cluster1 9941
from Cluster2 58
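A tally like this can be sketched with `collections.Counter`, assuming every ID in merged_out.fasta starts with its source label followed by an underscore (Cluster1_..., Cluster2_...), as the IDs in this thread do; `count_by_source` is a hypothetical helper. Note that the two counts above sum to 9999, close to the 10k contigs in a single input set rather than the 20k across both.

```python
from collections import Counter


def count_by_source(lines):
    """Count FASTA records by the ID prefix before the first '_' (e.g. 'Cluster1').

    Assumes each header has an ID as its first word and that the source label
    is the part of that ID before the first underscore.
    """
    return Counter(
        line[1:].split()[0].split("_", 1)[0]
        for line in lines
        if line.startswith(">")
    )
```

For example, `count_by_source(open("merged_out.fasta"))` would return a Counter keyed by "Cluster1" and "Cluster2".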

The 9941 sequences in merged_out.fasta that come from Cluster1 all seem to have the same length as the raw contigs:

Source contig_length
Raw.tsv:Cluster1_k141_1025 3698
Merge.tsv:Cluster1_k141_1025 3698
Raw.tsv:Cluster1_k141_1026 3852
Merge.tsv:Cluster1_k141_1026 3852
Raw.tsv:Cluster1_k141_1040 8359
Merge.tsv:Cluster1_k141_1040 8359
Raw.tsv:Cluster1_k141_1057577 8707
Merge.tsv:Cluster1_k141_1057577 8707
Raw.tsv:Cluster1_k141_1057886 3968
Merge.tsv:Cluster1_k141_1057886 3968
Raw.tsv:Cluster1_k141_1057955 3078
Merge.tsv:Cluster1_k141_1057955 3078
Raw.tsv:Cluster1_k141_1058039 3038
Merge.tsv:Cluster1_k141_1058039 3038
Raw.tsv:Cluster1_k141_1058096 4079
Merge.tsv:Cluster1_k141_1058096 4079
Raw.tsv:Cluster1_k141_1058151 3719
Merge.tsv:Cluster1_k141_1058151 3719
Raw.tsv:Cluster1_k141_1058269 3248
Merge.tsv:Cluster1_k141_1058269 3248
Raw.tsv:Cluster1_k141_1058399 7611
Merge.tsv:Cluster1_k141_1058399 7611

For the 58 sequences in merged_out.fasta that come from Cluster2, the merged contig lengths are equal to or larger than the raw lengths:

Raw.tsv:Cluster2_k141_10429771 4993
Merge.tsv:Cluster2_k141_10429771 5069
Raw.tsv:Cluster2_k141_10436849 10727
Merge.tsv:Cluster2_k141_10436849 12696
Raw.tsv:Cluster2_k141_10643446 5615
Merge.tsv:Cluster2_k141_10643446 7713
Raw.tsv:Cluster2_k141_1067037 6430
Merge.tsv:Cluster2_k141_1067037 6430
Raw.tsv:Cluster2_k141_1067215 11431
Merge.tsv:Cluster2_k141_1067215 20071
Raw.tsv:Cluster2_k141_1067595 11140
Merge.tsv:Cluster2_k141_1067595 11140
Raw.tsv:Cluster2_k141_10859382 4492
Merge.tsv:Cluster2_k141_10859382 4492
Raw.tsv:Cluster2_k141_11711522 6219
Merge.tsv:Cluster2_k141_11711522 7268
Raw.tsv:Cluster2_k141_11713665 3653
Merge.tsv:Cluster2_k141_11713665 5739
Raw.tsv:Cluster2_k141_12137628 6638
Merge.tsv:Cluster2_k141_12137628 7152
Raw.tsv:Cluster2_k141_1279455 28667
Merge.tsv:Cluster2_k141_1279455 29290
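The same comparison can be run over every contig instead of by inspection. A minimal sketch, assuming Raw.tsv and Merge.tsv are whitespace-separated files with a contig ID and a length on each line (inferred from the grep output above); `read_lengths` and `compare_lengths` are hypothetical helpers. The "only_raw" group lists raw contigs that no longer appear under their own name in the merged output.

```python
def read_lengths(path):
    """Read 'ID length' lines into a dict; lines whose second field is not an integer are skipped."""
    lengths = {}
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= 2 and fields[1].isdigit():
                lengths[fields[0]] = int(fields[1])
    return lengths


def compare_lengths(raw, merged):
    """Group contig IDs by how their length changed between two {ID: length} maps."""
    groups = {"same": [], "longer": [], "shorter": [], "only_raw": [], "only_merged": []}
    for name, raw_len in raw.items():
        if name not in merged:
            groups["only_raw"].append(name)
        elif merged[name] == raw_len:
            groups["same"].append(name)
        elif merged[name] > raw_len:
            groups["longer"].append(name)
        else:
            groups["shorter"].append(name)
    groups["only_merged"] = [name for name in merged if name not in raw]
    return groups


if __name__ == "__main__":
    # Two rows copied from the Cluster2 table in this thread.
    raw = {"Cluster2_k141_10429771": 4993, "Cluster2_k141_1067037": 6430}
    merged = {"Cluster2_k141_10429771": 5069, "Cluster2_k141_1067037": 6430}
    groups = compare_lengths(raw, merged)
    print(groups["longer"])  # ['Cluster2_k141_10429771']
    print(groups["same"])    # ['Cluster2_k141_1067037']
```

On the full files, `compare_lengths(read_lengths("Raw.tsv"), read_lengths("Merge.tsv"))` would give the counts for each group directly.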

So how can I explain the results above? Does quickmerge's final merged genome consist of the extended overlapping contig pairs plus the non-overlapping contigs from each set?
Looking forward to your reply. Thanks a lot!
