-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[r] Implement as
generics for SOMASparseNDArrayReader
#1458
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportPatch coverage has no change and project coverage change:
❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more. Additional details and impacted files@@ Coverage Diff @@
## main #1458 +/- ##
===========================================
- Coverage 63.97% 52.08% -11.89%
===========================================
Files 101 72 -29
Lines 8219 5725 -2494
===========================================
- Hits 5258 2982 -2276
+ Misses 2961 2743 -218
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
apis/r/DESCRIPTION
Outdated
'utils-uris.R' | ||
'utils.R' | ||
'write_seurat.R' | ||
'write_soma.R' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. Why would we need this now when we did not before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the Collate
secion is not necessary. My guess is it was generated when @include
s were part of the code, then added as part of this PR
apis/r/R/utils.R
Outdated
@@ -71,7 +71,7 @@ arrow_to_dt <- function(arrlst) { | |||
} | |||
|
|||
##' @noRd | |||
as_arrow_table <- function(arrlst) { | |||
to_arrow_table <- function(arrlst) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could possibly make that simple list of two external pointers a simple S3 class (which I considered for the simple features like pretty printing). That may open the door for a dispatch of as_arrow_table.CLASSHERE as in your coercion utilities. Is that better?
In the short term the renaming is fine but I do like these "verbs" to start with 'as' ...
(We could also decide to make it .as_arrow_table()
with a leading dot. It is already a non-documented, non-exported internal helper.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could possibly make that simple list of two external pointers a simple S3 class
Love this idea. Almost did it myself #1461
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But @pablo-gar, note that I removed the internal as_arrow_table()
in #1461 since it was redundant with soma_array_to_arrow_table()
and conflicted with arrow::as_arrow_table()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For correctness, let's add "proposing to remove as_arrow_table()
in #1461" and maybe we should move a little slower here / not quite do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love this idea.
Come to think about it, in tiledb-r
I changed this and am now returning at arrow Table
each time. That is simpler. Maybe we should do that here too. So then the as_arrow_table()
would become an R-level internal function, not exported, not visible that wraps around sr_next()
and other data gathers (i.e. soma_array_reader()
and instead of being handed a list of two (external pointers to Arrow structs) it returns an arrow Table
made from them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left questions inline. This is quite promising, and I trust the unit tests...
Thanks @eddelbuettel you got here a little earlier than expected, I wanted to take a final look before opening it up for reviews since opened the PR very late yesterday. I did a final review and everything looks good. I'll address your comments shortly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious as to the reason for implementing this at the SparseArrayRead
level instead of the SparseArray
level; I would think that as(array, "RsparseMatrix")
would be more user-friendly than as(array$read(), "RsparseMatrix")
That's not to say the SparseArrayRead
coercions can't exist, the array-level could simply be
methods::setOldClass("SOMASparseNDArray")
#' @importClassesFrom Matrix TsparseMatrix
#'
methods::setAs(from = "SOMASparseNDArray", to = "TsparseMatrix", def = \(from) methods::as(from$read(), "TsparseMatrix"))
but perhaps there's some issue that I'm not thinking of for why we're not providing coercions at the array-level
Additional comments provided within the code
@@ -0,0 +1,40 @@ | |||
#' Coercion methods for SOMA classes | |||
|
|||
#' @importFrom arrow as_arrow_table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we reexport as_arrow_table
?
#' @importFrom arrow as_arrow_table
#' @export
#'
arrow::as_arrow_table
If not, these methods will be inaccessible to the end-user without library(arrow)
first
# Coerce \link[tiledbsoma]{SOMASparseNDArrayRead} to Matrix::\link[Matrix]{dgTMatrix} | ||
setAs(from = "SOMASparseNDArrayRead", | ||
to = "TsparseMatrix", | ||
def = function(from) from$sparse_matrix()$concat() | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The R6 classes may need to be declared as an old class with methods::setOldClass()
methods::setOldClass("SOMASparseNDArrayRead")
We may also need to import the target classes from Matrix
#' @importClassesFrom Matrix TsparseMatrix CsparseMatrix RsparseMatrix
#'
NULL
) | ||
|
||
# Coerce \link[tiledbsoma]{SOMASparseNDArrayRead} to Matrix::\link[Matrix]{dgCMatrix} | ||
setAs(from = "SOMASparseNDArrayRead", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also provide a delayed method for SeuratObject::as.sparse()
; this is called in SeuratObject::CreateSeuratObject()
and SeuratObject::CreateAssayObject()
and would allow passing a sparse array/sparse array read directly to those functions
#' @exportS3Method SeuratObject::as.sparse
#'
as.sparse.SOMASparseNDArrayRead <- function(x, ...) {
as(x, "CsparseMatrix")
}
apis/r/DESCRIPTION
Outdated
'utils-uris.R' | ||
'utils.R' | ||
'write_seurat.R' | ||
'write_soma.R' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the Collate
secion is not necessary. My guess is it was generated when @include
s were part of the code, then added as part of this PR
Thanks @mojaveazure for all the recommendations! I was scratching my head with some behaviors that are going to be solved by your suggestions.
Yes I thought about this and there's a compromise, if we add them at SparseArray level we don't have access to the arguments of What do you think? |
That's true, and is the reason why we built One option that I'm toying with is using The odd thing would be that our |
I am mostly confused at this point. Just as I was getting used to method chaining (which we tamed nicely now) we revert to the (to me still more familiar converters ie (And yes: not a fan that > arr <- tiledb_array("/tmp/tiledb/penguins", return_as="arrow")
> class(as.data.frame(at))
[1] "tbl_df" "tbl" "data.frame"
> But that is Edit: Sorry about the close/open -- that was a fat-fingered early hit on the return key. |
My preference would be to update the vignettes to leverage the new iterated reader in a separate PR so we can discuss those changes in isolation. Re
That's true but I'm not sure |
apis/r/R/utils-coercions.R
Outdated
x$tables()$concat() | ||
} | ||
|
||
#' Coerce \link[tiledbsoma]{SOMASparseNDArrayRead} to \link{data.frame} or \link[tibble]{tibble} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could mention the existence of these methods in the docs for SOMASparseNDArray but I don't think we should add a description here, which results in a separate documentation entry.
I incorporated some of @mojaveazure suggestions. I still want to include a few more things suggested in the comments and do some testing. In addition, after a conversation with @aaronwolen I will hold on merging this PR. He has asked me to hold until a broader strategy is created for R generics across all TileDB-SOMA. |
as
generics for SOMASparseNDArrayReader
as
generics for SOMASparseNDArrayReader
Issue and/or context:
#1453
Changes:
SOMASparseNDArrayReader
andTableIter
Notes for Reviewer:
as
generics is correctUsage
A more R-like behavior is enabled for
$read()