-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
write.tree error #53
Comments
Dear @lpipes, |
I can't upload the file because it is too big (>30MB). Is there somewhere I can send it? The commands that I'm using are
|
I also get the same error with
|
Hi Lenore, R> x <- rtree(1e6)
R> write.tree(x, "x.tre")
R> file.size("x.tre")/1e6
[1] 35.88815 So it seems really possible to write a 35 MB Newick file (the same in NEXUS where the output file is 63 MB big). |
Hi @lpipes, |
Hi @emmanuelparadis if I read in the tree and then write the tree, I can do that without error. But it's only when I try to read in the tree and then do
|
R> n <- 6799103
R> x <- rtree(n)
R> s <- x$edge.length < .001
R> x$edge.length[s] <- 0
R> x <- di2multi(x)
R> degree(x)
Degree N
1 1 6799103
2 2 1
3 3 6785459
4 4 6803
5 5 12
R> x <- multi2di(x, random = FALSE)
R> degree(x)
Degree N
1 1 6799103
2 2 1
3 3 6799101
R> write.tree(x, "x2.tre") # takes ~ 5 min
R> file.size("x2.tre") / 1e6
[1] 250.266
R> saveRDS(x, "x2.rds") # takes ~ 30 sec
R> file.size("x2.rds") / 1e6
[1] 152.8073 |
@emmanuelparadis
But I am getting a very odd error after reading the tree
|
I saved the tree as an rds and then reloaded it into another session and was able to run
|
It seems we identified the cause: see #54. Your problem can be replicated with: R> tr <- stree(2000, "l")
R> write.tree(tr)
Erreur : C stack usage 9526916 is too close to the limit I've just finished a rewrite of the internal function that creates the Newick string: I paste the code below so you can try it. As a check; it should give you (of course if R> .write.tree3(tree) == write.tree(tree)
[1] TRUE And to write in a file: @KlausVigo: I'm not sure we need/want to keep the recursive version .write.tree3 <- function(phy, digits = 10, tree.prefix = "", check_tips = TRUE)
{
brl <- !is.null(phy$edge.length)
nodelab <- !is.null(phy$node.label)
if (check_tips) phy$tip.label <- checkLabel(phy$tip.label)
if (nodelab) phy$node.label <- checkLabel(phy$node.label)
f.d <- paste0(":%.", digits, "g")
n <- length(phy$tip.label)
## terminal branches:
terms <- phy$edge[, 2] <= n
TERMS <- phy$tip.label[phy$edge[terms, 2]]
if (brl) TERMS <- paste0(TERMS, sprintf(f.d, phy$edge.length[terms]))
## internal branches:
phy$edge[!terms, 2]
INTS <- ")"
if (nodelab) INTS <- paste0(INTS, phy$node.label[-1])
if (brl) INTS <- paste0(INTS, sprintf(f.d, phy$edge.length[!terms]))
## borrowed from phangorn:
parent <- phy$edge[, 1]
children <- phy$edge[, 2]
kids <- vector("list", n + phy$Nnode)
for (i in 1:length(parent))
kids[[parent[i]]] <- c(kids[[parent[i]]], children[i])
Nkids <- lengths(kids)
root <- parent[! parent %in% children][1]
LEFT <- -.Machine$integer.max
buildCladeIntegers <- function(node) {
N <- Nkids[node]
n <- 2 * N + 1L
res <- integer(n)
if (N > 1) res[] <- NA_integer_ # initialise with ','
res[1] <- LEFT
res[n] <- -node
res[seq(2L, n - 1L, 2L)] <- kids[[node]]
res
}
iSTR <- buildCladeIntegers(root)
repeat {
whr <- which(iSTR > n)
if (!length(whr)) break
for (i in rev(whr)) {
z <- iSTR[i]
iSTR <- c(iSTR[1:(i - 1L)],
buildCladeIntegers(z),
iSTR[(i + 1L):length(iSTR)])
}
}
STRING <- character(length(iSTR))
STRING[iSTR == LEFT] <- "("
STRING[match(-phy$edge[!terms, 2], iSTR)] <- INTS
STRING[is.na(iSTR)] <- ","
STRING[match(1:n, iSTR)] <- TERMS
## treat the root:
ROOT <- if (nodelab) paste0(")", phy$node.label[1]) else ")"
if (!is.null(phy$root.edge))
ROOT <- paste0(ROOT, sprintf(f.d, phy$root.edge))
ROOT <- paste0(ROOT, ";")
STRING[length(STRING)] <- ROOT
paste(STRING, collapse = "")
} |
I don't think we should have two versions of Cheers, |
Hi @KlausVigo, |
Hi @emmanuelparadis,
where most of the time is spent and also most memory allocations happen. |
It should be possible without resizing |
@emmanuelparadis @KlausVigo Thanks for working on this! I'm trying out the fix now. It has been several hours since I started the |
@lpipes It seems you are victim of the O(n^2) curse! I paste below another version which seems to scale with O(n) and is even faster than the current internal function in ape (and it accepts totally unbalanced trees!). The basic idea is to build the Newick string progressively thanks to the postorder tree-traversal. So there's no need to do recursions, just an iteration along the edges after getting their postorder ordering. I made tests with some random trees generated with @KlausVigo This version assumes that the root is the 'n + 1' node (hence the code borrowed from phnagorn is not needed anymore). I think you wanted to have this relaxed... Maybe this can still be relaxed if .write.tree3 <- function(phy, digits = 10, tree.prefix = "", check_tips = TRUE)
{
brl <- !is.null(phy$edge.length)
nodelab <- !is.null(phy$node.label)
if (check_tips) phy$tip.label <- checkLabel(phy$tip.label)
if (nodelab) phy$node.label <- checkLabel(phy$node.label)
f.d <- paste0(":%.", digits, "g")
n <- length(phy$tip.label)
## terminal branches:
terms <- phy$edge[, 2] <= n
TERMS <- phy$tip.label[phy$edge[terms, 2]]
if (brl) TERMS <- paste0(TERMS, sprintf(f.d, phy$edge.length[terms]))
## internal branches, including root edge:
INTS <- rep(")", phy$Nnode)
if (nodelab) INTS <- paste0(INTS, phy$node.label)
if (brl) {
tmp <- phy$edge.length[!terms][order(phy$edge[!terms, 2])]
tmp <- c("", sprintf(f.d, tmp))
if (!is.null(phy$root.edge)) tmp[1L] <- sprintf(f.d, phy$root.edge)
INTS <- paste0(INTS, tmp)
}
INTS[1] <- paste0(INTS[1], ";")
### ## borrowed from phangorn:
### parent <- phy$edge[, 1]
### children <- phy$edge[, 2]
### kids <- vector("list", n + phy$Nnode)
### for (i in 1:length(parent))
### kids[[parent[i]]] <- c(kids[[parent[i]]], children[i])
### Nkids <- lengths(kids, FALSE)
### root <- parent[! parent %in% children][1]
###
o <- postorder(phy)
ANC <- phy$edge[o, 1L]
DESC <- phy$edge[o, 2L]
NEWICK <- character(n + phy$Nnode)
NEWICK[1:n] <- TERMS
root <- n + 1L
from <- to <- 1L
repeat {
thenode <- ANC[from]
if (thenode == root) {
to <- length(ANC)
} else {
while (ANC[to + 1L] == thenode) to <- to + 1L
}
tmp <- paste(NEWICK[DESC[from:to]], collapse = ",")
tmp <- paste0("(", tmp, INTS[thenode - n])
NEWICK[thenode] <- tmp
if (thenode == root) break
from <- to + 1L
}
NEWICK[root]
} |
Hi @emmanuelparadis,
If the tree is in postorder than the root is always Cheers, |
@emmanuelparadis @KlausVigo Thanks for your help! I was able to print out the tree! I'll close this issue now. |
Hi guys, Do you think the below issue we experience with Thanks, |
Hi @teng-gao, |
Hi @KlausVigo, Sorry for the confusion. I'm not the author of the Thanks, |
Hi, my tree has become bigger now (~10,000,000 leaf nodes) and I have been trying to use |
|
Hi, R> N <- 10^(2:7)
R> RT <- NULL
R> for (n in N) {tr <- stree(n); tr$edge.length <- runif(n); RT <- rbind(RT, system.time(write.tree(tr, file = "tmp.tre")))}
R> rownames(RT) <- N
R> RT
user.self sys.self elapsed user.child sys.child
100 0.001 0.001 0.002 0 0
1000 0.007 0.000 0.006 0 0
10000 0.033 0.000 0.032 0 0
1e+05 0.369 0.020 0.389 0 0
1e+06 6.048 0.124 6.230 0 0
1e+07 65.056 0.951 66.610 0 0 We can check whether the presence of branch lengths is a significant factor: R> RT2 <- NULL
R> for (n in N) {tr <- stree(n); RT2 <- rbind(RT2, system.time(write.tree(tr, file = "tmp.tre")))}
R> rownames(RT2) <- N
R> RT2
user.self sys.self elapsed user.child sys.child
100 0.003 0.024 0.027 0 0
1000 0.002 0.000 0.002 0 0
10000 0.016 0.000 0.016 0 0
1e+05 0.191 0.000 0.192 0 0
1e+06 1.772 0.008 1.796 0 0
1e+07 19.596 0.053 19.879 0 0 And the same test with fully binary trees (with branch lengths) but only until one million tips: R> RT3 <- NULL
R> for (n in N[1:5]) {tr <- rtree(n); RT3 <- rbind(RT3, system.time(write.tree(tr, file = "tmp.tre")))}
R> rownames(RT3) <- N[1:5]
R> RT3
user.self sys.self elapsed user.child sys.child
100 0.003 0.000 0.002 0 0
1000 0.014 0.000 0.014 0 0
10000 0.219 0.000 0.221 0 0
1e+05 2.209 0.004 2.220 0 0
1e+06 38.798 0.036 38.975 0 0 So even in this "worst" case, this should not take one hour... Maybe you need to install the current version of ape from GitHub (ver. 5.6-3). From your message, it seems you loaded the above code |
I updated with the current version of ape from Github and it printed within 1 hour! Thanks a lot. |
Hi, I'm getting this error when I try to write out my tree. Any suggestions on how to fix this? Thanks!
The text was updated successfully, but these errors were encountered: