Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add altrep version of sparse real #2

Merged
merged 37 commits into from
Apr 29, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
37 commits
Select commit Hold shift + click to select a range
90653f9
setup for c development
EmilHvitfeldt Apr 17, 2024
44fca49
usethis::use_c()
EmilHvitfeldt Apr 17, 2024
257a030
minimal working altrep
EmilHvitfeldt Apr 18, 2024
f36e717
format
EmilHvitfeldt Apr 18, 2024
b4ef953
put back in define R_NO_REMAP
EmilHvitfeldt Apr 18, 2024
eeeca4f
add init.c
EmilHvitfeldt Apr 18, 2024
d00b5bc
set minimum R version
EmilHvitfeldt Apr 18, 2024
4675069
organize altrep-sparse-vector.c
EmilHvitfeldt Apr 18, 2024
b153764
make better interface
EmilHvitfeldt Apr 18, 2024
02a1b57
rename all sparse real functions
EmilHvitfeldt Apr 18, 2024
e62b306
altrep-sparse-vector.c -> altrep-sparse-real.c
EmilHvitfeldt Apr 18, 2024
ddcd9ec
add length to sparse_real
EmilHvitfeldt Apr 18, 2024
14ca8b5
pass location and position data to altrep
EmilHvitfeldt Apr 18, 2024
bdb93eb
make subsetting work
EmilHvitfeldt Apr 18, 2024
11a8a6a
add special logic for subsetting after end
EmilHvitfeldt Apr 18, 2024
8aff54f
add alrep_sparse_real_materialize
EmilHvitfeldt Apr 19, 2024
48a9f7f
keep length in data1
EmilHvitfeldt Apr 19, 2024
75cf981
couple more subsetting tests
EmilHvitfeldt Apr 19, 2024
8941eff
store materialized data in data2
EmilHvitfeldt Apr 19, 2024
372cefb
use Rf_asReal
EmilHvitfeldt Apr 19, 2024
aeb676f
fix bug in alrep_sparse_real_Materialize
EmilHvitfeldt Apr 19, 2024
22d85c0
remove method for now
EmilHvitfeldt Apr 19, 2024
3f8bf75
use STDVEC_DATAPTR() in altrep_sparse_real_Dataptr()
EmilHvitfeldt Apr 19, 2024
2494239
make sure alrep_sparse_real_Materialize() works correctly
EmilHvitfeldt Apr 19, 2024
5863362
use Rf_asInteger()
EmilHvitfeldt Apr 19, 2024
8c77d2b
less PROTECT
EmilHvitfeldt Apr 19, 2024
81456e8
Add altrep_sparse_real_Extract_subset()
EmilHvitfeldt Apr 19, 2024
126293f
fewer PROTECT
EmilHvitfeldt Apr 19, 2024
ad52c96
use setters and getters
EmilHvitfeldt Apr 19, 2024
251956e
delete vignette
EmilHvitfeldt Apr 19, 2024
ce28683
Merge commit '95b7e9b0ab4fc19cbc1cfee0731a9835859692af'
EmilHvitfeldt Apr 19, 2024
e24a9c3
move function around for organization
EmilHvitfeldt Apr 19, 2024
ea20ee5
Apply suggestions from code review
EmilHvitfeldt Apr 29, 2024
68946ae
bump UNPROTECT
EmilHvitfeldt Apr 29, 2024
ab68ba1
make sure that length is passed as an integer
EmilHvitfeldt Apr 29, 2024
33dbebf
add reminders
EmilHvitfeldt Apr 29, 2024
2f72845
pull out len neatly
EmilHvitfeldt Apr 29, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .Rbuildignore
Original file line number Diff line number Diff line change
Expand Up @@ -7,4 +7,7 @@
^pkgdown$
^\.github$
^vignettes/articles$
^compile_commands\.json$
.clang-format
^.cache$
^codecov\.yml$
12 changes: 12 additions & 0 deletions .clang-format
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
BasedOnStyle: Google
IndentWidth: 2
DerivePointerAlignment: false
PointerAlignment: Left
ColumnLimit: 80
AlignAfterOpenBracket: BlockIndent
SpaceAfterCStyleCast: true
IncludeBlocks: Regroup
AllowShortFunctionsOnASingleLine: Empty
BinPackArguments: false
BinPackParameters: false
AllowAllParametersOfDeclarationOnNextLine: false
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -4,3 +4,5 @@
.httr-oauth
.DS_Store
docs
compile_commands.json
.cache
6 changes: 5 additions & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,8 @@ License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.1
Depends:
R (>= 4.0.0)
Imports:
Matrix,
tibble,
Expand All @@ -17,5 +19,7 @@ URL: https://github.com/EmilHvitfeldt/sparsevctrs, https://emilhvitfeldt.github.
BugReports: https://github.com/EmilHvitfeldt/sparsevctrs/issues
Suggests:
rlang,
rsparse
rsparse,
testthat (>= 3.0.0)
Config/Needs/website: rmarkdown
Config/testthat/edition: 3
2 changes: 2 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,10 @@ S3method(format,sparse_vector)
S3method(vec_arith,sparse_vector)
S3method(vec_math,sparse_vector)
S3method(vec_ptype_abbr,sparse_vector)
export(new_sparse_real)
export(new_sparse_vector)
export(sparse_to_tibble)
export(sparse_vector)
export(tibble_to_sparse)
import(vctrs)
useDynLib(sparsevctrs, .registration = TRUE)
12 changes: 12 additions & 0 deletions R/altrep.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
#' Create sparse numeric vector
#'
#' @param value Numeric vector, values of non-zero entries.
#' @param position integer vector, indices of non-zero entries.
#' @param length Integer, Length of vector.
#'
#' @export
new_sparse_real <- function(value, position, length) {
x <- list(val = value, pos = as.integer(position), length = as.integer(length))

.Call(ffi_altrep_new_sparse_real, x)
}
1 change: 1 addition & 0 deletions R/sparsevctrs-package.R
Original file line number Diff line number Diff line change
Expand Up @@ -9,5 +9,6 @@
NULL

## usethis namespace: start
#' @useDynLib sparsevctrs, .registration = TRUE
## usethis namespace: end
NULL
18 changes: 18 additions & 0 deletions man/new_sparse_real.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 3 additions & 0 deletions src/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
*.o
*.so
*.dll
180 changes: 180 additions & 0 deletions src/altrep-sparse-real.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,180 @@
#define R_NO_REMAP
#include "R.h"
#include <Rinternals.h>
#include <R_ext/Altrep.h>

// Initialised at load time
R_altrep_class_t altrep_sparse_real_class;

SEXP ffi_altrep_new_sparse_real(SEXP x) {
return R_new_altrep(altrep_sparse_real_class, x, R_NilValue);
}

SEXP alrep_sparse_real_Materialize(SEXP vec) {
SEXP out = R_altrep_data2(vec);

if (out != R_NilValue) {
return out;
}

SEXP data1 = R_altrep_data1(vec);
SEXP val = VECTOR_ELT(data1, 0);
SEXP pos = VECTOR_ELT(data1, 1);
SEXP len = VECTOR_ELT(data1, 2);

R_xlen_t c_len = (R_xlen_t) INTEGER_ELT(len, 0);

out = PROTECT(Rf_allocVector(REALSXP, c_len));

// Reminder about performance
for (R_xlen_t i = 0; i < c_len; ++i) {
SET_REAL_ELT(out, i, 0);
EmilHvitfeldt marked this conversation as resolved.
Show resolved Hide resolved
}

R_xlen_t n_positions = Rf_xlength(pos);

// Reminder about performance
for (R_xlen_t i = 0; i < n_positions; ++i) {
SET_REAL_ELT(out, INTEGER_ELT(pos, i) - 1, REAL_ELT(val, i));
EmilHvitfeldt marked this conversation as resolved.
Show resolved Hide resolved
}

R_set_altrep_data2(vec, out);

UNPROTECT(1);
return out;
}

// -----------------------------------------------------------------------------
// ALTVEC

void* altrep_sparse_real_Dataptr(SEXP x, Rboolean writeable) {
return STDVEC_DATAPTR(alrep_sparse_real_Materialize(x));
}

const void* altrep_sparse_real_Dataptr_or_null(SEXP vec) {
SEXP out = R_altrep_data2(vec);

if (out == R_NilValue) {
return NULL;
} else {
return out;
}
}

static SEXP altrep_sparse_real_Extract_subset(SEXP x, SEXP indx, SEXP call) {

SEXP data1 = R_altrep_data1(x);
SEXP val_old = VECTOR_ELT(data1, 0);
SEXP pos_old = VECTOR_ELT(data1, 1);
SEXP matches = PROTECT(Rf_match(pos_old, indx, R_NaInt));

int n = 0;

for (int i = 0; i < Rf_length(matches); ++i) {
if (INTEGER_ELT(matches, i) != R_NaInt) {
n++;
}
}

SEXP val_new = PROTECT(Rf_allocVector(REALSXP, n));
SEXP pos_new = PROTECT(Rf_allocVector(INTSXP, n));

int step = 0;
int what_pos = 1;

for (int i = 0; i < Rf_length(matches); ++i) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be simplified into a single loop - we can talk about it over zoom


int match = INTEGER_ELT(matches, i);
if (match != R_NaInt) {
SET_REAL_ELT(val_new, step, REAL_ELT(val_old, match - 1));

for (int j = 0; j < Rf_length(matches); ++j) {
if (INTEGER_ELT(indx, j) == INTEGER_ELT(pos_old, match - 1)) {
break;
} else {
what_pos++;
}
}
SET_INTEGER_ELT(pos_new, step, what_pos);
what_pos = 1;
step++;
}
}

const char *names[] = {"val", "pos", "length", ""};
SEXP res = PROTECT(Rf_mkNamed(VECSXP, names));
SET_VECTOR_ELT(res, 0, val_new);
SET_VECTOR_ELT(res, 1, pos_new);
SET_VECTOR_ELT(res, 2, Rf_ScalarInteger(Rf_length(matches)));

UNPROTECT(4);

return ffi_altrep_new_sparse_real(res);
}

// -----------------------------------------------------------------------------
// ALTREP

R_xlen_t altrep_sparse_real_Length(SEXP x) {
SEXP data = R_altrep_data1(x);
SEXP len = VECTOR_ELT(data, 2);
R_xlen_t out = (R_xlen_t) INTEGER_ELT(len, 0);
return out;
}

// What gets printed when .Internal(inspect()) is used
Rboolean altrep_sparse_real_Inspect(SEXP x,
int pre,
int deep,
int pvec,
void (*inspect_subtree)(SEXP, int, int, int)) {
Rprintf("sparsevctrs_altrep_sparse_real (materialized=%s)\n",
R_altrep_data2(x) != R_NilValue ? "T" : "F");
return TRUE;
}

// -----------------------------------------------------------------------------
// ALTREAL

static double altrep_sparse_real_Elt(SEXP x, R_xlen_t i) {
SEXP data1 = R_altrep_data1(x);
SEXP val = VECTOR_ELT(data1, 0);
SEXP pos = VECTOR_ELT(data1, 1);
SEXP len = VECTOR_ELT(data1, 2);
R_xlen_t c_len = (R_xlen_t) INTEGER_ELT(len, 0);

if (i > c_len) {
return NA_REAL;
}

const R_xlen_t n = Rf_xlength(val);

double out = 0;

for (int j = 0; j < n; ++j) {
if (INTEGER_ELT(pos, j) == i + 1) {
out = REAL_ELT(val, j);
break;
}
}
EmilHvitfeldt marked this conversation as resolved.
Show resolved Hide resolved

return out;
}

// -----------------------------------------------------------------------------

void sparsevctrs_init_altrep_sparse_real(DllInfo* dll) {
altrep_sparse_real_class = R_make_altreal_class("altrep_sparse_real", "sparsevctrs", dll);

// ALTVEC
R_set_altvec_Dataptr_method(altrep_sparse_real_class, altrep_sparse_real_Dataptr);
R_set_altvec_Dataptr_or_null_method(altrep_sparse_real_class, altrep_sparse_real_Dataptr_or_null);
R_set_altvec_Extract_subset_method(altrep_sparse_real_class, altrep_sparse_real_Extract_subset);

// ALTREP
R_set_altrep_Length_method(altrep_sparse_real_class, altrep_sparse_real_Length);
R_set_altrep_Inspect_method(altrep_sparse_real_class, altrep_sparse_real_Inspect);

// ALTREAL
R_set_altreal_Elt_method(altrep_sparse_real_class, altrep_sparse_real_Elt);
}
18 changes: 18 additions & 0 deletions src/init.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
#include <Rinternals.h>

// Defined in altrep-sparse-real.c
extern SEXP ffi_altrep_new_sparse_real(SEXP);
extern void sparsevctrs_init_altrep_sparse_real(DllInfo*);

static const R_CallMethodDef CallEntries[] = {
{"ffi_altrep_new_sparse_real", (DL_FUNC) &ffi_altrep_new_sparse_real, 1},
{NULL, NULL, 0}
};

void R_init_sparsevctrs(DllInfo* dll) {
R_registerRoutines(dll, NULL, CallEntries, NULL, NULL);
R_useDynamicSymbols(dll, FALSE);

// altrep classes
sparsevctrs_init_altrep_sparse_real(dll);
}
12 changes: 12 additions & 0 deletions tests/testthat.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
# This file is part of the standard setup for testthat.
# It is recommended that you do not modify it.
#
# Where should you do additional test configuration?
# Learn more about the roles of various files in:
# * https://r-pkgs.org/testing-design.html#sec-tests-files-overview
# * https://testthat.r-lib.org/articles/special-files.html

library(testthat)
library(sparsevctrs)

test_check("sparsevctrs")
43 changes: 43 additions & 0 deletions tests/testthat/test-altrep.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
test_that("length() works with new_sparse_real()", {
expect_identical(
length(new_sparse_real(numeric(), integer(), 0)),
0L
)

expect_identical(
length(new_sparse_real(1, 1, 10)),
10L
)

expect_identical(
length(new_sparse_real(1, 1, 100)),
100L
)
})

test_that("subsetting works with new_sparse_real()", {
x_sparse <- new_sparse_real(value = c(10, 13, 20), position = c(1, 5, 8), 10)
x_dense <- c(10, 0, 0, 0, 13, 0, 0, 20, 0, 0)

for (i in seq_len(10)) {
expect_identical(x_sparse[i], x_dense[i])
}

expect_identical(x_sparse[1:2], x_dense[1:2])

expect_identical(x_sparse[3:7], x_dense[3:7])

expect_identical(x_sparse[-5], x_dense[-5])

expect_identical(x_sparse[-c(5:7)], x_dense[-c(5:7)])

# testing outside range returns NA
# expect_identical(x_sparse[c(1, 11)], x_dense[c(1, 11)])
})

test_that("materialization works with new_sparse_real()", {
x_sparse <- new_sparse_real(value = c(10, 13, 20), position = c(1, 5, 8), 10)
x_dense <- c(10, 0, 0, 0, 13, 0, 0, 20, 0, 0)

expect_identical(x_sparse[], x_dense)
})
Loading