Confusion about argument `-l` and `-ml` #62

neptuneyt · 2021-01-24T08:11:42Z

Dear quickmerge team,
I have installed the latest quickmerge, which supports MUMmer 4, but I am confused by the arguments -l and -ml.
According to the manual:

-l LENGTH_CUTOFF, --length_cutoff LENGTH_CUTOFF: minimum seed contig length to be merged (default 0)
-ml MERGING_LENGTH_CUTOFF, --merging_length_cutoff MERGING_LENGTH_CUTOFF: sets the merging length cutoff necessary for use in quickmerge (default 5000)
Do these mean what is shown in the picture below?
[image: https://user-images.githubusercontent.com/39893798/105624681-a00bd980-5e5e-11eb-951b-cf8ba71b3926.png]

Thanks a lot!


mahulchak · 2021-01-24T20:31:03Z

Hi, -l represents the minimum length of the seed contig. In the figure, the large blue circle is the seed contig and -l would determine its minimum length. In your highlighted example, the orange one is the seed contig and -l would determine its minimum length. The description of -ml is a little off. -ml determines the minimum alignment length that will be included in the merging process. Any alignment shorter than -ml will not be merged. I hope this helps. Let me know if you have any other questions.
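The two filters described in this reply can be sketched as follows. This is a minimal illustration, not quickmerge's internal code: the `Alignment` record and the `passes_filters` helper are hypothetical, the defaults mirror the manual (0 for -l, 5000 for -ml), and treating an alignment exactly equal to -ml as kept is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical, simplified alignment record (not quickmerge's own type)."""
    seed_id: str
    seed_len: int  # length of the seed contig, compared against -l
    aln_len: int   # length of the aligned block, compared against -ml


def passes_filters(aln: Alignment, length_cutoff: int = 0,
                   merging_length_cutoff: int = 5000) -> bool:
    """Return True if this alignment would be considered for merging.

    length_cutoff mirrors -l: the seed contig must be at least this long.
    merging_length_cutoff mirrors -ml: shorter alignments are not merged
    (keeping an alignment exactly equal to the cutoff is an assumption).
    """
    if aln.seed_len < length_cutoff:
        return False
    return aln.aln_len >= merging_length_cutoff


alignments = [
    Alignment("seedA", seed_len=20000, aln_len=6000),  # passes both filters
    Alignment("seedB", seed_len=20000, aln_len=523),   # alignment shorter than -ml 5000
    Alignment("seedC", seed_len=800, aln_len=700),     # seed shorter than -l 1000
]
kept = [a.seed_id for a in alignments if passes_filters(a, length_cutoff=1000)]
print(kept)  # ['seedA']
```

In this reading, raising -l only removes short seed contigs, while -ml decides which individual alignments may drive a merge.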


-- Mahul Chakraborty Department of Ecology and Evolutionary Biology University of California-Irvine Phone: 949 824 9559 Fax: 949 824 9559 Website: https://mahulchakraborty.wordpress.com/ Github: https://github.com/mahulchak

neptuneyt · 2021-01-25T00:56:26Z

Thanks for your kind and quick reply, but I still don't understand -ml. I have tested -l and -ml; the results are below:

In the first table, I tested -l from 10 to 5000, but the merged sequences did not improve compared with the two raw contig sets.
In the second table, I tested -ml from 1 to 5000, and the quality of the merged sequences changed with the -ml value. How should I interpret these results?
Looking forward to your reply, thanks a lot.

neptuneyt · 2021-01-25T09:15:33Z

Sorry to bother you again.
I ran another, simpler test:
I extracted 10k contigs from each of the two assemblies and then ran:

nohup merge_wrapper.py -l 10 -ml 10 -t 50 -v C1_10k.fa C2_10k.fa &> log &

In param_summary_out.txt I count 205 overlapping contig pairs, which is an overlap rate of only 2.05% (205/10000).

| # | REF | QUERY | REF_START | REF_END | Q_START | Q_END | ORIENTATION | INNIE(1/0) | OVERLAP_LEN | OVERLAP_PROP | NO_OVERLAP_AT_ENDS | OVERHANG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Cluster2_k141_213238 | Cluster1_k141_1058634 | 2107 | 2631 | 5332 | 4809 | R | 0 | 523 | 0.263343 | 1986 | 3787 |
| ... | Cluster2_k141_15754305 | Cluster1_k141_1064426 | 57 | 3426 | 4010 | 642 | L | 0 | 3368 | 4.82521 | 698 | 154 |
| 205 | Cluster2_k141_11926444 | Cluster1_k141_109005 | 2257 | 3942 | 2199 | 3893 | L | 1 | 1694 | 0.6776 | 2500 | 0 |
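The overlap rate quoted above is just the number of reported pairs divided by the number of input contigs. A minimal sketch of that arithmetic, assuming param_summary_out.txt is whitespace-separated with one header line starting with `REF` and one pair per remaining non-empty line (the file layout is an assumption); `count_overlap_pairs` and `overlap_rate` are hypothetical helpers.

```python
from typing import Iterable


def count_overlap_pairs(lines: Iterable[str]) -> int:
    """Count data rows, skipping blank lines and the assumed 'REF QUERY ...' header."""
    n = 0
    for line in lines:
        fields = line.split()
        if fields and fields[0] != "REF":
            n += 1
    return n


def overlap_rate(n_pairs: int, n_contigs: int) -> float:
    """Reported pairs per input contig, as computed in this thread (205 / 10000)."""
    return n_pairs / n_contigs


# Usage on the real file (name as used in this thread):
# with open("param_summary_out.txt") as fh:
#     n_pairs = count_overlap_pairs(fh)
print(f"{overlap_rate(205, 10_000):.2%}")  # 2.05%
```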

The two raw 10k-contig sets total 113 Mb (113248645 bp), but merged_out.fasta totals only 59 Mb (59258534 bp). That does not make sense to me given the low overlap rate (2%).
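Totals like these can be reproduced by summing sequence lengths per FASTA file. A minimal plain-Python sketch (no Biopython assumed); `fasta_lengths` is a hypothetical helper that keys each record by the first word of its header, and the file names in the usage comment are the ones from this thread.

```python
from typing import Dict, Iterable, Optional


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each sequence ID (first word of the '>' header) to its length in bp."""
    lengths: Dict[str, int] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)  # sequence may be wrapped over several lines
    return lengths


# Usage:
# with open("C1_10k.fa") as a, open("C2_10k.fa") as b, open("merged_out.fasta") as m:
#     raw_total = sum(fasta_lengths(a).values()) + sum(fasta_lengths(b).values())
#     merged_total = sum(fasta_lengths(m).values())
demo = [">ctg1 extra", "ACGT", "AC", ">ctg2", "GGG"]
print(fasta_lengths(demo))  # {'ctg1': 6, 'ctg2': 3}
```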
So I checked one of the overlapping pairs; its overlap record is:

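| REF | QUERY | REF_START | REF_END | Q_START | Q_END | ORIENTATION | INNIE(1/0) | OVERLAP_LEN | OVERLAP_PROP | NO_OVERLAP_AT_ENDS | OVERHANG |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Cluster2_k141_6817205 | Cluster1_k141_1166759 | 1020 | 4705 | 1 | 3684 | R | 0 | 3683 | 3683 | 1 | 257 |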
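In every summary row shown in this thread, OVERLAP_PROP equals OVERLAP_LEN divided by NO_OVERLAP_AT_ENDS (for example 523 / 1986 ≈ 0.263343, and 3683 / 1 = 3683 for the pair above), so it behaves as a ratio rather than a proportion bounded by 1. The sketch below checks that relation on those four rows; reading it as quickmerge's actual definition is an assumption drawn only from this data, and `prop_matches` is a hypothetical helper.

```python
import math

# (OVERLAP_LEN, OVERLAP_PROP, NO_OVERLAP_AT_ENDS) copied from the summary rows in this thread
ROWS = [
    (523, 0.263343, 1986),
    (3368, 4.82521, 698),
    (1694, 0.6776, 2500),
    (3683, 3683.0, 1),
]


def prop_matches(overlap_len: int, overlap_prop: float,
                 no_overlap_at_ends: int, rel_tol: float = 1e-5) -> bool:
    """Check the relation observed in these rows (assumed, not documented):
    OVERLAP_PROP == OVERLAP_LEN / NO_OVERLAP_AT_ENDS, up to the printed rounding."""
    return math.isclose(overlap_prop, overlap_len / no_overlap_at_ends, rel_tol=rel_tol)


print(all(prop_matches(*row) for row in ROWS))  # True
```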

I also found a sequence named Cluster2_k141_6817205 in merged_out.fasta. It seems the merged sequence is named after the longer of the two overlapping contigs, and this pair was merged correctly. This is strange!

Then I counted the sequence IDs in merged_out.fasta by source:

| Source | Number of sequences |
|---|---|
| Cluster1 | 9941 |
| Cluster2 | 58 |
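A count like this can be made directly from the FASTA headers. A minimal sketch, assuming the source assembly is the part of each ID before the first underscore (true for the `Cluster1_...` and `Cluster2_...` IDs in this thread); `count_sources` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable


def count_sources(lines: Iterable[str]) -> Counter:
    """Count FASTA records by ID prefix, e.g. 'Cluster1' from '>Cluster1_k141_1025'."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith(">"):
            seq_id = line[1:].split()[0]
            counts[seq_id.split("_", 1)[0]] += 1
    return counts


# Usage:
# with open("merged_out.fasta") as fh:
#     print(count_sources(fh))
demo = [">Cluster1_k141_1025", "ACGT", ">Cluster2_k141_1067215", ">Cluster1_k141_1026"]
print(count_sources(demo))  # Counter({'Cluster1': 2, 'Cluster2': 1})
```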

In merged_out.fasta, 9941 sequences come from Cluster1, and their lengths all appear identical to the raw contig lengths, for example:

| Contig | Length in Raw.tsv (bp) | Length in Merge.tsv (bp) |
|---|---|---|
| Cluster1_k141_1025 | 3698 | 3698 |
| Cluster1_k141_1026 | 3852 | 3852 |
| Cluster1_k141_1040 | 8359 | 8359 |
| Cluster1_k141_1057577 | 8707 | 8707 |
| Cluster1_k141_1057886 | 3968 | 3968 |
| Cluster1_k141_1057955 | 3078 | 3078 |
| Cluster1_k141_1058039 | 3038 | 3038 |
| Cluster1_k141_1058096 | 4079 | 4079 |
| Cluster1_k141_1058151 | 3719 | 3719 |
| Cluster1_k141_1058269 | 3248 | 3248 |
| Cluster1_k141_1058399 | 7611 | 7611 |

In merged_out.fasta, 58 sequences come from Cluster2, and their merged lengths are equal to or longer than the raw lengths, for example:

| Contig | Length in Raw.tsv (bp) | Length in Merge.tsv (bp) |
|---|---|---|
| Cluster2_k141_10429771 | 4993 | 5069 |
| Cluster2_k141_10436849 | 10727 | 12696 |
| Cluster2_k141_10643446 | 5615 | 7713 |
| Cluster2_k141_1067037 | 6430 | 6430 |
| Cluster2_k141_1067215 | 11431 | 20071 |
| Cluster2_k141_1067595 | 11140 | 11140 |
| Cluster2_k141_10859382 | 4492 | 4492 |
| Cluster2_k141_11711522 | 6219 | 7268 |
| Cluster2_k141_11713665 | 3653 | 5739 |
| Cluster2_k141_12137628 | 6638 | 7152 |
| Cluster2_k141_1279455 | 28667 | 29290 |
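A raw-versus-merged comparison like the two tables above can be scripted. A minimal sketch, assuming Raw.tsv and Merge.tsv are two-column, whitespace-separated files of contig ID and length with no header (only the file names come from this thread; the layout is an assumption). `read_length_tsv` and `changed_lengths` are hypothetical helpers.

```python
from typing import Dict, List, Tuple


def read_length_tsv(path: str) -> Dict[str, int]:
    """Read an assumed two-column 'contig_id<TAB>length' file with no header."""
    lengths: Dict[str, int] = {}
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= 2:
                lengths[fields[0]] = int(fields[1])
    return lengths


def changed_lengths(raw: Dict[str, int],
                    merged: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """Return (contig_id, raw_len, merged_len) for IDs in both sets whose length differs."""
    return [(cid, raw[cid], mlen) for cid, mlen in merged.items()
            if cid in raw and raw[cid] != mlen]


# Usage: changed_lengths(read_length_tsv("Raw.tsv"), read_length_tsv("Merge.tsv"))
raw = {"Cluster1_k141_1025": 3698, "Cluster2_k141_1067037": 6430,
       "Cluster2_k141_1067215": 11431}
merged = {"Cluster1_k141_1025": 3698, "Cluster2_k141_1067037": 6430,
          "Cluster2_k141_1067215": 20071}
print(changed_lengths(raw, merged))  # [('Cluster2_k141_1067215', 11431, 20071)]
```

IDs that appear only in the merged set are ignored here, so contigs that were renamed during merging would need a separate check.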

How can I explain these results? Does quickmerge's final merged assembly consist of the extended (merged) overlapping contig pairs plus the non-overlapping contigs from each set?
Looking forward to your reply. Thanks a lot!
