getting mostly NaN values for affinity calculations #3
Comments
Hi Chris,
Do you have zeros in your normalized PWM? We recommend adding small pseudo counts to your frequency matrix.
Best,
Matthias
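To illustrate why zeros matter, here is a minimal base-R sketch. It is not tRap's code; it only assumes that the affinity computation takes logarithms of matrix entries at some point. Under that assumption, zero entries produce `-Inf` (and `NaN` once multiplied by zero), and negative entries produce `NaN` directly.

```r
# Illustration only (not tRap internals): how zeros or negative entries
# break log-based arithmetic in R.
zero_log       <- log(0)                          # -Inf
zero_times_log <- 0 * log(0)                      # NaN, since 0 * -Inf is undefined
neg_log        <- suppressWarnings(log(-0.56))    # NaN: log of a negative number

# A pseudo count keeps every entry strictly positive, so the log stays finite.
counts   <- c(A = 0, C = 19, G = 0, T = 0)        # position 5 of the PFM in this thread
safe_log <- log(counts + 0.25)
all(is.finite(safe_log))
```

The value 0.25 is the pseudo count suggested later in this thread; any small positive number would keep the logarithm finite.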
On 10.01.2019 at 11:36, Chris Clarkson ***@***.***> wrote:
Hi,
I am interested in using your package to calculate the affinity of predicted binding sites and, subsequently, the significance of the affinity values.
My pipeline (using your instructions from https://rdrr.io/github/matthuska/tRap/man/) is as follows:
pfm
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
A 2 3 5 1 0 13 1 3 18 0 6 1 0 1 1 6 1 2 6 2
C 8 3 2 17 19 0 12 10 1 0 0 1 0 1 18 0 12 7 6 5
G 2 10 12 1 0 5 7 1 1 20 14 14 20 18 0 14 5 3 7 7
T 8 4 2 1 0 2 0 6 0 0 0 5 0 1 1 0 1 8 1 6
pwm <- toPWM(pfm)
pwm=PWMatrix(ID="Unknown", name=tf, matrixClass="Unknown", strand="+", bg=c(A=0.25, C=0.25, G=0.25, T=0.25), tags=list(), profileMatrix=pwm, pseudocounts=numeric())
peaks = searchSeq(pwm, seq, min.score = "80%",mc.cores=10L)
peaks_bed = as(peaks, "GRanges")
head(as.data.frame(peaks_bed$siteSeqs)$x)
[1] "AGCCCACTAGGGTGCAGTCC" "ATACCAGAAGAAGGCATCAG" "ACACCAGAAGAGGGCGTCAG"
[4] "ATGCCACGAGGTGGAGATAA" "GACTCACTAGAGGGCACAGG" "TCTACAGCAGGTGGCAACAC"
***@***.***), as.data.frame(peaks_bed$siteSeqs)$x)
However, this results in many NaN values:
mean(is.nan(af))  # fraction of affinities that are NaN
[1] 0.4785195
I was advised that tRap is meant for long sequences rather than short ones, so I extended the sequences:
start(peaks_bed) <- start(peaks_bed) - 30
end(peaks_bed) <- end(peaks_bed) + 30
extended_seqs <- getSeq(Mmusculus, peaks_bed)
***@***.***),as.data.frame(extended_seqs)$x)
However, this results in 100% NaNs.
Am I doing something wrong?
Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Aufsichtsratsvorsitzende: MinDirig'in Petra Steiner-Hoffmann
Stellv.Aufsichtsratsvorsitzender: MinDirig. Dr. Manfred Wolter
Geschaeftsfuehrer: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Heinrich Bassler
Registergericht: Amtsgericht Muenchen HRB 6466
USt-IdNr: DE 129521671
Hi Matthias,
when I try the following, I still get NaNs:
The PWM you pass to the affinity function should be a probability matrix, with every value in the range [0, 1].
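A quick way to test this requirement before computing affinities is sketched below. `is_probability_matrix` is a hypothetical helper written for this illustration, not a tRap function, and the column-sum check is an added assumption about what "probability matrix" means here.

```r
# Hypothetical helper (not part of tRap): TRUE if every entry lies in [0, 1]
# and every column sums to 1, i.e. the matrix looks like per-position frequencies.
is_probability_matrix <- function(m, tol = 1e-8) {
  all(is.finite(m)) &&
    all(m >= 0 & m <= 1) &&
    all(abs(colSums(m) - 1) < tol)
}

# First two columns of the log-odds matrix printed in the quoted message
log_odds <- matrix(c(1.0626965, -0.5626965, 1.0626965, -0.5626965,
                     0.9503723,  0.9503723, -1.3188119,  0.4180672),
                   nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))
is_log_odds_ok <- is_probability_matrix(log_odds)   # FALSE: negative values

# Position 1 of the original count matrix, divided by its column total
freq <- matrix(c(2, 8, 2, 8) / 20, nrow = 4,
               dimnames = list(c("A", "C", "G", "T"), NULL))
is_freq_ok <- is_probability_matrix(freq)           # TRUE
```

A log-odds matrix such as the output of `TFBSTools::toPWM` fails this check because it contains negative values.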
On 10.01.2019 at 12:50, Chris Clarkson ***@***.***> wrote:
Hi Matthias,
Thank you for your quick reply:
There are no zeros in the normalised matrix:
***@***.***)
1 2 3 4 5 6
A 1.0626965 0.9503723 0.04448988 0.4578354 0.3879632 -0.2923233
C -0.5626965 0.9503723 0.85949647 -0.3735062 -0.1638895 1.0223918
G 1.0626965 -1.3188119 -0.76348283 0.4578354 0.3879632 0.0000000
T -0.5626965 0.4180672 0.85949647 0.4578354 0.3879632 0.2699315
7 8 9 10 11 12
A 0.41349136 0.4404797 -0.2536981 0.3870730 -0.03296475 0.71519479
C -0.24047408 -0.6112445 0.2969491 0.3870730 0.61061995 0.71519479
G -0.09176563 1.3303426 0.2969491 -0.1612191 -0.18827516 -0.45258182
T 0.91874835 -0.1595778 0.6597998 0.3870730 0.61061995 0.02219224
13 14 15 16 17 18
A 0.3870730 0.4538871 0.2969491 -0.03296475 0.75263253 1.5229892
C 0.3870730 0.4538871 -0.2536981 0.61061995 -0.47909620 -0.5761614
G -0.1612191 -0.3616612 0.6597998 -0.18827516 -0.02616885 0.8595932
T 0.3870730 0.4538871 0.2969491 0.61061995 0.75263253 -0.8064209
19 20
A -0.2228909 2.3968502
C -0.2228909 0.0000000
G -0.4123795 -0.9067515
T 1.8581614 -0.4900988
Hi Chris,
I would recommend using normalize.pwm from the tRap package directly on your PFM, after adding a pseudo count of 0.25:
pwm.for.trap <- normalize.pwm(pfm + 0.25)
affinity(pwm.for.trap)
Best,
Matthias
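For readers who want to see what such a normalization amounts to, the following base-R sketch adds a pseudo count and divides each column by its total. `normalize_counts` is a hypothetical stand-in for illustration; it is an assumption that tRap's `normalize.pwm` behaves similarly, and its exact behavior is not reproduced here.

```r
# Hypothetical stand-in (not tRap's normalize.pwm): add a pseudo count to every
# cell, then rescale each column so that it sums to 1.
normalize_counts <- function(pfm, pseudo = 0.25) {
  m <- pfm + pseudo
  sweep(m, 2, colSums(m), "/")
}

# First six positions of the count matrix from the original post
pfm6 <- rbind(A = c(2,  3,  5,  1,  0, 13),
              C = c(8,  3,  2, 17, 19,  0),
              G = c(2, 10, 12,  1,  0,  5),
              T = c(8,  4,  2,  1,  0,  2))

freq6 <- normalize_counts(pfm6)
range(freq6)     # all values strictly between 0 and 1
colSums(freq6)   # each column sums to 1 (up to floating-point error)
```

Because every cell receives the pseudo count before rescaling, no entry of the result is exactly zero, even at positions such as 5 where three of the four counts are 0.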
On 10.01.2019 at 23:26, Chris Clarkson ***@***.***> wrote:
Hi sorry for delay,
Hmm, very strange. I took the PFM from the JASPAR 2018 database and ran TFBSTools::toPWM on it, which produces a matrix like the one shown above. Can you recommend a package or command that performs the PFM-to-PWM conversion in the way your package requires?
Hi Chris,
Re 1: I looked at the code again (I should have done this first!). You can in fact pass the unnormalized PWM (counts >= 0) to the affinity function. The pseudo count is an argument of the affinity function, so there is no need to call normalize.pwm and add pseudo counts yourself.
Re 2: the local.paffinity function takes the following arguments:
- the actual affinity computed from the actual sequences
- a long background sequence from which to compute the background affinities
- window size should be set to the size of the sequence used to compute the actual affinity
Best,
Matthias
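The role of the background sequence and window size can be sketched with a generic empirical p-value. This is not tRap's `local.paffinity`; `empirical_p` and `n_windows` are hypothetical helpers, and the add-one formula is an assumption chosen for illustration. Under that formula the smallest attainable p-value is 1 / (1 + number of background values), so a short background yields a coarse, constant floor. The value 0.01612903 reported in this thread equals 1/62, which would be consistent with 61 background values, none of them exceeding the observed affinity.

```r
# Hypothetical helper, NOT tRap's local.paffinity: empirical p-value of an
# observed score against scores computed on windows of a background sequence.
empirical_p <- function(observed, background_scores) {
  # add-one correction so the p-value is never exactly zero
  (1 + sum(background_scores >= observed)) / (1 + length(background_scores))
}

# Number of windows of width w that fit in a sequence of length n
n_windows <- function(n, w) max(n - w + 1, 0)

n_windows(79, 79)     # a 79 bp background with a 79 bp window gives 1 window
n_windows(1e6, 79)    # a long background gives many windows

# With 61 background values that are all smaller than the observed score,
# the p-value sits at the floor 1/62.
p_floor <- empirical_p(10.586463, rep(1, 61))

# Stand-in background affinities (random numbers, not real data)
set.seed(1)
bg <- rnorm(1000, mean = 8, sd = 2)
p_long <- empirical_p(10.586463, bg)
```

Using each 79 bp sequence as its own background leaves very few windows to compare against, which is one plausible reason for identical p-values; a long genomic background, as recommended above, gives a finer-grained estimate.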
On 11.01.2019 at 11:55, Chris Clarkson ***@***.***> wrote:
Hi again Matthias,
Thank you for this fantastic help.
It works now.
Just 2 last questions:
1. I would also like to apply this analysis to more than one transcription factor. Suppose I download a list of PFMs from JASPAR:
ARNT
[,1] [,2] [,3] [,4] [,5] [,6]
A 4 19 0 0 0 0
C 16 0 20 0 0 0
G 0 1 0 20 0 20
T 0 0 0 0 20 0
AHR
[,1] [,2] [,3] [,4] [,5] [,6]
A 3 0 0 0 0 0
C 8 0 23 0 0 0
G 2 23 0 23 0 24
T 11 1 1 1 24 0
Ddit3::Cebpa
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
A 14 11 18 0 0 4 38 36 0 14 4 0
C 7 7 3 1 0 33 1 2 6 17 23 26
G 12 14 15 0 38 0 0 1 0 5 9 6
T 6 7 3 38 1 2 0 0 33 3 3 7
...
Can I take it as a suitable strategy to add 0.25 to all of these PFMs and then implement your analysis on them?
As for the command local.paffinity, I tried it on the calculated affinity values:
seqs=as.data.frame(extended_seqs)$x
head(extended_seqs)
A DNAStringSet instance of length 6
width seq
[1] 79 TACGTAAGTACACTGTAGCTGTCTTCAGACACAC...TCAGATCTCATTATGGGTAGTTGTGAGCTACCA
[2] 79 TTTTACTTTCTCTCTCCCTCTTATTGCTAGATGC...ATAAACAGCTTGCTTCTGCCATGTTCTGCAGAA
[3] 79 GACATCTGAGTACCTTCCCTGTAAGAGAGCTTGC...CTGAGCACTGAAACTCAGAGGAGAGAATCTGTC
head(af_ext)
[1] 10.586463 12.458601 10.153033 7.571788 9.838501 10.966423
***@***.***),seqs)
# Compute a p-value for each affinity, using its own sequence as background
for (i in seq_along(af_ext)) {
  print(local.paffinity(af_ext[i], pwm.for.trap, seqs[i]))
}
[1] 0.01612903
[1] 0.01612903
[1] 0.01612903
[1] 0.01612903
[1] 0.01612903
........
The p-value is 0.01612903 in every case.
Can I take these values as correct or am I not applying this function correctly?
Do you mean that I should use the unnormalised PFMs rather than the PWMs, since the PWMs have values < 0 (as shown above)?