why multiple .bam files(>=7) cannot work #411
Comments
Hi,
For the first code, could you show me what list.files(pattern="GV") outputs? Is it possible you have another file in your working directory with GV in the name?
The second code is an expected error because I assume there is no file called .bam.
The third looks fine to me, but unfortunately the error is missing in your post. Would you be able to add that in so I can see what could be happening?
Kind Regards,
Andre Sim
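The first error in the original post is consistent with list.files(pattern="GV") returning something other than paths to .bam files. A minimal sketch of narrowing the match is shown below. The helper name find_bam_files and the demo file names are hypothetical, and it assumes the alignments sit in one directory and follow the GV*.bam naming seen in this thread.

```r
# Hypothetical helper: list only files that start with a prefix and end in ".bam".
# Index files such as "x.bam.bai" or notes with "GV" in the name are excluded.
find_bam_files <- function(dir = ".", prefix = "GV") {
  pattern <- paste0("^", prefix, ".*\\.bam$")
  list.files(path = dir, pattern = pattern, full.names = TRUE)
}

# Demonstration in a temporary directory with made-up, empty files.
demo_dir <- tempfile("bam_demo_")
dir.create(demo_dir)
file.create(file.path(demo_dir, c("GV1O_1.align.bam",
                                  "GV1O_1.align.bam.bai",
                                  "GV_notes.txt",
                                  "other.bam")))

bam_files <- find_bam_files(demo_dir)
print(basename(bam_files))
```

The resulting character vector could then be passed as the reads argument in place of the unfiltered list.files() call.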
Thank you very much for your reply. I have sent you the error report for the third code. Please let me know whether you have received it.
Hi,
Unfortunately I cannot see it. Was it in your last comment, or did you send it another way? Is it possible some syntax is formatting it in a way that makes it invisible on GitHub? Perhaps a screenshot of the error might work.
I entered 18 bam files in this code.
The following is my code and error report.
> se1 <- bambu(reads = c("GV10O_10.align.bam","GV11O_11.align.bam","GV12O_12.align.bam","GV13O_13.align.bam","GV14O_14.align.bam","GV15O_15.ali[...]"GV20O_20.align.bam","GV22O_22.align.bam","GV23O_23.align.bam"), annotations = bambuAnnotations[...]
There are more than 10 samples, read class files will be temporarily saved to /tmp/RtmpZfGwaw for more efficient processing
--- Start generating read class files ---
|====== | 9%
Error: BiocParallel errors
1 remote errors, element index: 1
10 unevaluated and other errors
first remote error:
Error in (function (cond) : error in evaluating the argument 'x' in selecting a method for function 'bfcnew': Failed to collect lazy table.
Caused by error in `db_collect()`:
! Arguments in `...` must be used.
✖ Problematic argument:
• ..1 = Inf
ℹ Did you misspell an argument name?
In addition: Warning messages:
1: In .merge_two_Seqinfo_objects(x, y) :
The 2 combined objects have no sequence levels in common. (Use
suppressWarnings() to suppress this warning.)
2: In .merge_two_Seqinfo_objects(x, y) :
The 2 combined objects have no sequence levels in common. (Use
suppressWarnings() to suppress this warning.)
3: There was 1 warning in `mutate()`.
ℹ In argument: `annotatedJunction = (!is.na(match(uniqueJunctions, uniqueAnnotatedIntrons)))`.
Caused by warning in `.merge_two_Seqinfo_objects()`:
! The 2 combined objects have no sequence levels in common. (Use
suppressWarnings() to suppress this warning.)
4: In .merge_two_Seqinfo_objects(x, y) :
The 2 combined objects have no sequence levels in common. (Use
suppressWarnings() to suppress this warning.)
5: In .merge_two_Seqinfo_objects(x, y) :
The 2 combined objects have no sequence levels in common. (Use
suppressWarnings() to suppress this warning.)
6: In .merge_two_Seqinfo_objects(x, y) :
The 2 combined objects have no sequence levels in common. (Use
suppressWarnings() to suppress this warning.)
7: In .merge_two_Seqinfo_objects(x, y) :
The 2 combined objects have no sequence levels in common. (Use
suppressWarnings() to suppress this warning.)
8: In .merge_two_Seqinfo_objects(x, y) :
The 2 combined objects have no sequence levels in common. (Use
suppressWarnings() to suppress this warning.)
With the following ten input files, the run succeeded. Adding one more file produces the same error as the one above.
> se1 <- bambu(reads = c("GV10O_10.align.bam","GV11O_11.align.bam","GV12O_12.align.bam","GV13O_13.align.bam","GV14O_14.align.bam","GV15O_15.ali[...]n.bam","GV20O_20.align.bam","GV22O_22.align.bam"), annotations = bambuAnnotations, genome = fa.file)
--- Start generating read class files ---
|======================================================================| 100%
Detected 40 warnings across the samples during read class construction. Access warnings with metadata(bambuOutput)$warnings
--- Start extending annotations ---
WARNING - Less than 50 TRUE or FALSE read classes for NDR precision stabilization.
NDR will be approximated as: (1 - Transcript Model Prediction Score)
A high NDR threshold is being recommended by Bambu indicating high levels of novel transcripts, limiting the performance of the trained model
We recommend training a new model on similiar but well annotated dataset if available (https://github.com/GoekeLab/bambu/tree/master#Training-a-model-on-another-speciesdataset-and-applying-it), or alternatively running Bambu with opt.discovery=list(fitReadClassModel=FALSE)
Using a novel discovery rate (NDR) of: 1
--- Start isoform quantification ---
--- Finished running Bambu ---
There were 50 or more warnings (use warnings() to see the first 50)
Looking forward to your reply. I really appreciate it.
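The repeated warning "The 2 combined objects have no sequence levels in common" in the output above usually means that two genomic objects being combined share no chromosome names. A minimal base R sketch for comparing the sequence names in a GTF file with those in a FASTA file follows. The helper names and the tiny example files are made up for illustration, and it assumes an uncompressed GTF with sequence names in column 1 and FASTA headers of the form ">name description".

```r
# Hypothetical helpers, not part of bambu.
# Note: readLines() loads the whole file, which is fine for a sketch but slow
# for a full genome FASTA.

gtf_seqnames <- function(gtf_path) {
  lines <- readLines(gtf_path)
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]  # drop comments
  unique(sub("\t.*$", "", lines))                          # keep column 1
}

fasta_seqnames <- function(fasta_path) {
  lines <- readLines(fasta_path)
  headers <- lines[startsWith(lines, ">")]
  # Keep the text between ">" and the first whitespace character.
  unique(sub("^>([^[:space:]]+).*$", "\\1", headers))
}

compare_seqnames <- function(gtf_path, fasta_path) {
  g <- gtf_seqnames(gtf_path)
  f <- fasta_seqnames(fasta_path)
  list(shared = intersect(g, f),
       only_in_gtf = setdiff(g, f),
       only_in_fasta = setdiff(f, g))
}

# Made-up example reproducing a "Chr1" versus "1" mismatch.
gtf_file <- tempfile(fileext = ".gtf")
fa_demo <- tempfile(fileext = ".fa")
writeLines(c('Chr1\tdemo\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
             'Chr2\tdemo\texon\t1\t100\t.\t+\t.\tgene_id "g2"; transcript_id "t2";'),
           gtf_file)
writeLines(c(">1 demo chromosome", "ACGT", ">2", "ACGT"), fa_demo)

res <- compare_seqnames(gtf_file, fa_demo)
print(res)
```

An empty shared element, as in this made-up example, would point to a naming mismatch between the annotation and the genome.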
Hi,
I have not seen this issue before, so I am not exactly sure how to solve it. However, here are some things to try that might circumvent the problem, or give me a clue about if and where a bug might be in the code.
1. Add rcOutDir = "/path/somewhere/" to bambu(). The only difference when running with more than 10 files is that the read class files are written to a tmp directory. Setting a manual output directory might skip that issue.
2. Try a different combination of 11 files and see if that works. One cause could be that one of the 18 bam files is not compatible with bambu for some reason. Use the 10 that worked plus 1 that was missing.
3. Run each of the bam files individually for read class construction (set quant = FALSE, discovery = FALSE), then combine them for the subsequent steps, i.e. (fill in the ... with the rest of the files):
se1 <- bambu(reads = "GV10O_10.align.bam", annotations = bambuAnnotations, genome = fa.file, quant = FALSE, discovery = FALSE)
se2 <- bambu(reads = "GV11O_11.align.bam", annotations = bambuAnnotations, genome = fa.file, quant = FALSE, discovery = FALSE)
...
se_final <- bambu(reads = c(se1, se2, se3, ...), annotations = bambuAnnotations, genome = fa.file)
I also noticed some strange warnings in the run that did work, which might be a hint. Are you expecting a very low number (< 50) of known transcripts in your bam files? Are you able to check that the chromosome names in your gtf file match those of your fasta file, and that the genome fasta file you give to bambu is the same one you used to align your reads? A common issue is that the gtf file has "Chr1" as a name and the genome has "1".
For the run with <10 bam files that did work, could you share the output from metadata(se1)$warnings?
I hope to solve this for you and future users, so please let me know what from the above works and what doesn't.
I really appreciate it. Your suggestions gave me a new idea. After I try them, I will let you know whether it was successful. Thank you very much for your attention to this issue.
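Two of the suggestions above (adding one untested file at a time, and building read classes per file) both come down to processing the bam files one by one so that a problematic file can be identified. A minimal sketch of such a loop is below. The helper run_each_file and the stub used in the demonstration are hypothetical; in a real run the per_file function would wrap the bambu() call with quant = FALSE and discovery = FALSE quoted above, which is not executed here.

```r
# Hypothetical helper: apply a per-file function to every file, catching errors
# so that one failing file does not stop the whole loop.
run_each_file <- function(files, per_file) {
  results <- lapply(files, function(f) {
    tryCatch(per_file(f), error = function(e) e)
  })
  names(results) <- files
  is_error <- vapply(results, inherits, logical(1), what = "error")
  list(ok = results[!is_error],      # successful results, named by file
       failed = files[is_error])     # files whose processing raised an error
}

# Stub standing in for the real per-file call, for demonstration only.
# In practice this might be (assumption, taken from the call quoted above):
#   function(f) bambu(reads = f, annotations = bambuAnnotations,
#                     genome = fa.file, quant = FALSE, discovery = FALSE)
fake_per_file <- function(f) {
  if (grepl("bad", f)) stop("cannot process ", f)
  paste("read classes for", f)
}

out <- run_each_file(c("a.bam", "bad.bam", "c.bam"), fake_per_file)
print(names(out$ok))
print(out$failed)
```

The failed element lists the files worth inspecting, and the ok element holds the per-file results that could be combined in a later step.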
Thank you for your sincere help.
The last problem was successfully solved using the third point you raised.
However, I have a new problem: all the genes in my "counts_gene.txt" file have 'bambu' names, which indirectly caused an error when I looked for highly variable genes.
my code:scRNAtest <- FindVariableFeatures(object = scRNAtest, selection.method = "vst", nfeatures = 1500)
error:
Error in `.SelectFeatures()`:
! None of the features provided are present in the feature set
I wonder why all the gene names in the output are bambu names.
I look forward to hearing from you.
Yours sincerely
Hi,
All the gene counts being named bambu means it has predicted a lot of novel genes. I believe this is related to the warning I flagged above, as in standard runs of Bambu this warning should not appear:
"I also noticed some strange warnings in the run that did work, which might be a hint. Are you expecting a very low number (< 50) of known transcripts in your bam files? Are you able to check that the chromosome names in your gtf file match those of your fasta file, and that the genome fasta file you give to bambu is the same one you used to align your reads? A common issue is that the gtf file has "Chr1" as a name and the genome has "1"."
Double check that the gtf file you are using matches the fasta file, as this is the most common cause, and that the fasta file you are using is the exact same file used for the genome (not transcriptome) alignments.
If neither of the above is the case, I will need to look at some of the outputs to try and figure out what is happening.
So that I can be of the most help, please include:
The script you used to run bambu, including the prepareAnnotations step
The first 10 lines of the gtf file you are using
The warnings and output from these lines:
se1 <- bambu(reads = "GV10O_10.align.bam", annotations = bambuAnnotations, genome = fa.file, quant = FALSE, discovery = FALSE, verbose = TRUE)
print(metadata(se1)$warnings)
A bonus would be if you could attach the output of this line:
saveRDS(se1, "outputPath/se1.rds")
Kind Regards,
Andre Sim
Dear author, happy New Year!
Thank you very much for your help. I have basically solved the problem now.
But I still have the following question.
> cluster3.markers <- FindMarkers(pbmc, ident.1 = 3, ident.2 = c(0,1,2,4), min.pct = 0.25)
> head(cluster3.markers, n = 10)
                     p_val avg_log2FC pct.1 pct.2    p_val_adj
BambuGene1185 4.761961e-39   8.411806     1     0 8.613436e-35
BambuGene1198 4.761961e-39   9.063921     1     0 8.613436e-35
BambuGene1509 4.761961e-39   6.752086     1     0 8.613436e-35
BambuGene1513 4.761961e-39   7.691474     1     0 8.613436e-35
BambuGene1563 4.761961e-39   8.021621     1     0 8.613436e-35
BambuGene1603 4.761961e-39  10.988708     1     0 8.613436e-35
BambuGene1687 4.761961e-39   6.895461     1     0 8.613436e-35
BambuGene1689 4.761961e-39   7.208463     1     0 8.613436e-35
BambuGene1772 4.761961e-39   9.265445     1     0 8.613436e-35
BambuGene1778 4.761961e-39   8.820515     1     0 8.613436e-35
The top 10 genes are all BambuGenes.
I extracted the most highly expressed genes for each cluster, setting the number to 100 (top100). I drew a heatmap for them, and there were only three known genes.
> DoHeatmap(pbmc, features = top100$gene) + NoLegend()
Warning message:
In DoHeatmap(pbmc, features = top100$gene) :
The following features were omitted as they were not found in the scale.data slot for the RNA assay: Trpm3, Samd12, Gm35379, Gm32358, Gm32772, Myh14, Rp1, Gm30093, BambuGene131, unstranded.Gene7, Gm14055, BambuGene2284, BambuGene2213, BambuGene2476, BambuGene1128, BambuGene1123, BambuGene2262, BambuGene2837, BambuGene2156, BambuGene1533, BambuGene1316, BambuGene965, BambuGene2931, Gm41929, BambuGene2977, BambuGene2937, BambuGene2882, BambuGene2797, BambuGene2510, BambuGene2480, BambuGene2380, BambuGene2376, BambuGene1983, BambuGene2519, BambuGene2696, BambuGene1041, BambuGene746, BambuGene724, BambuGene2965, BambuGene2951, BambuGene2900, BambuGene2553, BambuGene2274, BambuGene2251, BambuGene1464, BambuGene817, BambuGene744, BambuGene2750, BambuGene2644, BambuGene2630, BambuGene2556, BambuGene2339, BambuGene2253, BambuGene2183, BambuGene2142, BambuGene2138, BambuGene2110, BambuGene1067, Gm31925, BambuGene904, BambuGene810, BambuGene589, BambuGene581, BambuGene44, BambuGene408, BambuGene [... truncated]
Since even the top 100 contain few known genes, can I increase the value to 300? Would a situation like this affect the results?
I feel very confused. Is this related to the issues mentioned in your email?
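To see how much of a marker or count table consists of novel predictions, one option is to count identifiers carrying the BambuGene prefix shown in the output above. The sketch below uses a handful of gene names copied from that output; the helper name novel_fraction is hypothetical, and it assumes that every novel gene of interest carries exactly this prefix.

```r
# Hypothetical helper: count identifiers with and without a given prefix.
novel_fraction <- function(gene_ids, prefix = "BambuGene") {
  is_novel <- startsWith(gene_ids, prefix)
  c(novel = sum(is_novel),
    known = sum(!is_novel),
    fraction_novel = mean(is_novel))
}

# A few names taken from the DoHeatmap warning quoted in this thread.
genes <- c("Trpm3", "Samd12", "BambuGene131", "BambuGene2284",
           "Myh14", "BambuGene2213", "Rp1", "BambuGene2476")

res <- novel_fraction(genes)
print(res)
```

Applied to the full gene count table, a fraction close to 1 would fit the maintainer's description of a run that predicted mostly novel genes.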
Hi,
The formatting of your last comment is a bit hard for me to parse (attached picture). From what I gather, you are surprised that all the marker genes you are detecting are novel genes. As I mentioned in some of my earlier comments, I think something may be going wrong with your run, as there were some concerning warning messages, which is the likely reason only bambu genes are being identified. For me to be able to help you, please attach the following.
Kind Regards,
Hello,
I would like to ask why newly discovered transcripts can still correspond to annotated genes. What principle leads to this result?
I look forward to your reply.
Thank you very much.
For example:
transcripts gene
BambuTx1 Tfap2d
Hi,
I am not sure I understand your question. Are you able to attach an image or an example of what you mean to better describe it?
Hello dear author,
When I use the gtf file as the annotation to convert the transcript IDs found as markers into genes, only 1/6 of the markers can be converted. Looking at the results, all of those that can be converted to genes are BambuGenexxx.
I think this is very strange, so I would like to ask for your help.
Why do some of the markers have no corresponding transcript ID in the gtf file?
> table(results)
results
  BambuGene1015  BambuGene10238 BambuGene102533 BambuGene102716  BambuGene10300
             19               3               4               5               3
 BambuGene10325  BambuGene10441  BambuGene10508 BambuGene105185 BambuGene105249
             46              48               7               3               3
 BambuGene10525 BambuGene105360 BambuGene105439  BambuGene10597  BambuGene10636
              9               6               4              43               5
 BambuGene10642 BambuGene106557  BambuGene10666  BambuGene10683 BambuGene106897
              7               5               4              48               7
BambuGene107288  BambuGene10738  BambuGene10755 BambuGene108099  BambuGene10843
              4               5              26               7               3
BambuGene108834  BambuGene10947    BambuGene110  BambuGene11008  BambuGene11075
              4               3              35               7              12
(this is part of the conversion results)
code:result <- merge(gene, markers, by.x = "transcript_id", by.y = "transcript_id")
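One way to see which marker IDs fail to convert is to keep unmatched rows in the merge and to list the IDs that are absent from the annotation table. The sketch below uses small made-up data frames. The gene_id and avg_log2FC columns and all values are assumptions for illustration; only the transcript_id key comes from the merge call quoted in this thread.

```r
# Made-up annotation table (transcript to gene) and marker table.
gene <- data.frame(transcript_id = c("BambuTx1", "BambuTx2", "tx_known_1"),
                   gene_id = c("BambuGene1", "BambuGene2", "GeneA"),
                   stringsAsFactors = FALSE)
markers <- data.frame(transcript_id = c("BambuTx1", "tx_known_1", "tx-known-2"),
                      avg_log2FC = c(8.4, 6.7, 7.2),
                      stringsAsFactors = FALSE)

# all.y = TRUE keeps markers that have no match in the annotation table;
# their gene_id is NA instead of the row being dropped silently.
result <- merge(gene, markers, by = "transcript_id", all.y = TRUE)

# IDs present in the markers but missing from the annotation table.
unmatched <- setdiff(markers$transcript_id, gene$transcript_id)

print(result)
print(unmatched)
```

Looking at the unmatched IDs next to the annotation IDs can show whether the two tables differ only in naming (for example a changed separator) or whether the marker IDs are a different kind of identifier altogether.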
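Hi @Yumy526, the formatting of your last question is a bit off, and all we can see on our side is garbled text. If writing in English is not convenient, you can also just post in Chinese. Thanks! Ying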
I'm passing multiple .bam files as input for reads, but I'm getting an error. When I use fewer than about 10 files as reads, it runs successfully, and I don't know why. I am a little anxious and very much looking forward to your reply!
my code:se1 <- bambu(reads = list.files(pattern="GV"), annotations = bambuAnnotations, genome = fa.file)
error:Error in checkInputs(annotations, reads, readClass.outputDir = rcOutDir, :
Reads should either be: a vector of paths to .bam files, a vector of paths to Bambu RCfile .rds files, or a list of loaded Bambu RCfiles
my code:se1 <- bambu(reads = "/public/home/ymwang/PacBio/6_bambu/bam/.bam", annotations = bambuAnnotations, genome = fa.file)
error:Error: BiocParallel errors
1 remote errors, element index: 1
0 unevaluated and other errors
first remote error:
Error in value[3L]: failed to open BamFile: file(s) do not exist:
‘/public/home/ymwang/PacBio/6_bambu/bam/.bam’
my code:se1 <- bambu(reads = c("GV10O_10.align.bam","GV11O_11.align.bam","GV12O_12.align.bam","GV13O_13.align.bam","GV14O_14.align.bam","GV15O_15.align.bam","GV16O_16.align.bam","GV1O_1.align.bam","GV20O_20.align.bam","GV22O_22.align.bam","GV23O_23.align.bam","GV2O_2.align.bam","GV3O_3.align.bam","GV4O_4.align.bam","GV5O_5.align.bam","GV7O_7.align.bam","GV8O_8.align.bam","GV9O_9.align.bam"), annotations = bambuAnnotations, genome = fa.file)
error: