How to identify merged contigs in output #74
Comments
The merged contigs will have names from the 'reference' assembly. Unchanged
ones will have names of the 'query' assembly contigs.
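The naming rule above can be checked mechanically. The following is a minimal sketch, not part of quickmerge: it assumes the output FASTA reuses input record names verbatim (the first whitespace-delimited token of each header), and the function names `fasta_names` and `classify_output` are hypothetical helpers introduced here for illustration.

```python
def fasta_names(path):
    """Return the set of record names (first token after '>') in a FASTA file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def classify_output(output_fa, query_fa, reference_fa):
    """Label each output record by which input assembly its name comes from.

    Assumption: names are carried over unmodified from the inputs.
    """
    query = fasta_names(query_fa)
    reference = fasta_names(reference_fa)
    labels = {}
    for name in fasta_names(output_fa):
        in_query = name in query
        in_reference = name in reference
        if in_reference and not in_query:
            # Per the reply above, merged contigs carry 'reference' names.
            labels[name] = "reference-named"
        elif in_query and not in_reference:
            # Per the reply above, unchanged contigs carry 'query' names.
            labels[name] = "query-named"
        elif in_query and in_reference:
            labels[name] = "ambiguous"
        else:
            labels[name] = "unknown"
    return labels
```

Calling `classify_output` with the merged output, the first input, and the second input (in that order) returns a dictionary keyed by output record name. A name-based check only shows where a record's name came from; it does not by itself prove that the sequence was altered.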
On Thu, Jul 18, 2024, 2:08 PM Adam Diehl wrote:
I am attempting to use quickmerge to merge contigs from a long-read assembly into chromosome-length scaffolds using the hg38 reference genome. Please see the example command below:
merge_wrapper.py GM18519-ONT-hg38-R9-LSK110-guppy-sup-5mC.hapdup_dual_1.fasta hg38.fa -hco 5 -c 1.5 -l 500000 -v -t 24
The long-read assembly is from [here](https://s3.amazonaws.com/1000g-ont/ALIGNMENT_AND_ASSEMBLY_DATA/FIRST_100/NAPU_PIPELINE/HG38/GM18519-ONT-hg38-R9-LSK110-guppy-sup-5mC/GM18519-ONT-hg38-R9-LSK110-guppy-sup-5mC.hapdup_dual_1.fasta), and hg38.fa is straight from UCSC.
Looking at the results, I am unable to tell which contigs in the output result from merging operations, so I cannot compare them to the input contigs/chromosomes to ensure the output is correct. How do I identify merged contigs in the output fasta, and how do I tell what editing operations were done to generate the merged contigs? My goal here is to end up with the same number of contigs as hg38, excepting unmapped contigs, which appear to be written to the output unchanged. (Can you verify this is the case?)
Thank you in advance for your help!
Thank you for the fast response! I took another look at the output and am finding no evidence of merging. See the following experiment:
(where hg38.chr1.fa is simply the chr1 record from the full hg38 assembly, extracted into its own file)
So it seems chromosome 1 (plus all the contigs from the query assembly) is present in the output, but there is no evidence that chr1 was used to merge anything. It seems like quickmerge is simply passing all the contigs from both files through unchanged. I have tried setting the -hco, -c, and -l parameters unrealistically low to see if this affects merging, but the results did not change. Why would this happen?
To make sure I understand correctly: the query assembly is the first argument to merge_wrapper.py and the second argument is the reference. Is that right?
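One way to test the "passed through unchanged" observation directly is to compare sequence content rather than names. The sketch below is an illustration under stated assumptions, not a quickmerge feature: `fasta_digests` and `unchanged_records` are hypothetical helpers, and sequences are uppercased before hashing so that soft-masking differences do not count as edits.

```python
import hashlib


def fasta_digests(path):
    """Map record name -> (sequence length, SHA-256 of the uppercased sequence).

    Sequences are hashed line by line, so large records are never held in memory.
    """
    digests = {}
    name, hasher, length = None, None, 0
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if line.startswith(">"):
                if name is not None:
                    digests[name] = (length, hasher.hexdigest())
                fields = line[1:].split()
                name = fields[0] if fields else ""
                hasher, length = hashlib.sha256(), 0
            elif name is not None and line:
                chunk = line.upper().encode("ascii")
                hasher.update(chunk)
                length += len(chunk)
    if name is not None:
        digests[name] = (length, hasher.hexdigest())
    return digests


def unchanged_records(output_fa, *input_fas):
    """Return sorted output record names whose sequence matches some input record."""
    seen = set()
    for path in input_fas:
        seen.update(fasta_digests(path).values())
    output = fasta_digests(output_fa)
    return sorted(name for name, digest in output.items() if digest in seen)
```

If `unchanged_records(output, query, reference)` lists every record in the output, then no output sequence differs from an input sequence, which would be consistent with no merging having taken place. Any output record missing from the list has a sequence not found in either input and is a candidate merged contig.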