test.data.table print threads info at start (#3728)
jangorecki authored and mattdowle committed Jul 29, 2019
1 parent 29c270b commit 4c1207a
Showing 4 changed files with 26 additions and 21 deletions.
3 changes: 3 additions & 0 deletions CRAN_Release.cmd
@@ -69,6 +69,9 @@ grep -n "[^A-Za-z0-9]F[^A-Za-z0-9]" ./inst/tests/tests.Rraw
# No system.time in main tests.Rraw. Timings should be in benchmark.Rraw
grep -n "system[.]time" ./inst/tests/tests.Rraw

# All % in *.Rd should be escaped otherwise text gets silently chopped
grep -n "[^\]%" ./man/*.Rd

# seal leak potential where two unprotected API calls are passed to the same
# function call, usually involving install() or mkChar()
# Greppable thanks to single lines and wide screens
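In Rd files `%` begins a comment, so an unescaped `%` silently drops the rest of the line. The new `grep -n "[^\]%"` check catches this, and it is the same problem fixed where `50%` becomes `50\%` in `man/openmp-utils.Rd`. The following is a minimal C sketch of the same per-line test, assuming ASCII input. `find_unescaped_percent` is a hypothetical name. Unlike the grep pattern, which requires a character before the `%`, the sketch also flags a `%` in the first column.

```c
#include <stddef.h>

/* Hypothetical helper: return the 0-based column of the first '%' in
   `line` that is not immediately preceded by a backslash, or -1 if there
   is none. Like the grep pattern, it does not special-case "\\%" (an
   escaped backslash followed by a real comment character). */
long find_unescaped_percent(const char *line)
{
    for (size_t i = 0; line[i] != '\0'; i++) {
        if (line[i] == '%' && (i == 0 || line[i - 1] != '\\'))
            return (long)i;
    }
    return -1;
}
```

For example, `find_unescaped_percent("uses 50%.")` returns 7, while the escaped form `uses 50\%.` returns -1.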
8 changes: 5 additions & 3 deletions R/test.data.table.R
@@ -57,7 +57,9 @@ test.data.table = function(verbose=FALSE, pkg="pkg", silent=FALSE, with.other.pa
# oldlocale = Sys.getlocale("LC_CTYPE")
# Sys.setlocale("LC_CTYPE", "") # just for CRAN's Mac to get it off C locale (post to r-devel on 16 Jul 2012)

cat("Running", fn, "\n")
cat("getDTthreads(verbose=TRUE):\n") # for tracing on CRAN; output to log before anything is attempted
print(getDTthreads(verbose=TRUE)) # print output of getDTthreads() verbatim as simply as possible; e.g. without depending on data.table for formatting
cat("test.data.table() running:", fn, "\n") # print fn to log before attempting anything on it (in case it is missing); on same line for slightly easier grep
env = new.env(parent=.GlobalEnv)
assign("testDir", function(x) file.path(fulldir, x), envir=env)

@@ -82,7 +84,7 @@ test.data.table = function(verbose=FALSE, pkg="pkg", silent=FALSE, with.other.pa
assign("filename", fn, envir=env)
assign("inittime", as.integer(Sys.time()), envir=env) # keep measures from various test.data.table runs
# It doesn't matter that 3000L is far larger than needed for other and benchmark.
if(isTRUE(silent)){
if (isTRUE(silent)){
try(sys.source(fn, envir=env), silent=silent) # nocov
} else {
sys.source(fn, envir=env)
@@ -115,7 +117,7 @@ test.data.table = function(verbose=FALSE, pkg="pkg", silent=FALSE, with.other.pa
", TZ=", suppressWarnings(Sys.timezone()),
", locale='", Sys.getlocale(), "'",
", l10n_info()='", paste0(names(l10n_info()), "=", l10n_info(), collapse="; "), "'",
", getDTthreads()='", paste0(capture.output(invisible(getDTthreads(verbose=TRUE))), collapse="; "), "'")
", getDTthreads()='", paste0(gsub("[ ][ ]+","==",gsub("^[ ]+","",capture.output(invisible(getDTthreads(verbose=TRUE))))), collapse="; "), "'")
DT = head(timings[-1L][order(-time)],10) # exclude id 1 as in dev that includes JIT
if ((x<-sum(timings[["nTest"]])) != ntest) {
warning("Timings count mismatch:",x,"vs",ntest) # nocov
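The nested `gsub()` calls are needed because the verbose output of `getDTthreads()` changes in `src/openmp-utils.c` from `name==value` lines to space-aligned columns. For the single-line summary at the end of `test.data.table()`, leading spaces are stripped and each run of two or more spaces is turned back into `==`. A line such as `  omp_get_num_procs()            8` is therefore logged as `omp_get_num_procs()==8`, while single spaces inside a message are kept. The following is a C sketch of that transformation. It assumes, as the R patterns do, that only the space character counts. `normalize_line` is a hypothetical name.

```c
#include <string.h>

/* Hypothetical helper mirroring the two nested gsub() calls:
   1. drop leading spaces                    (gsub("^[ ]+", "", x))
   2. turn each run of 2+ spaces into "=="   (gsub("[ ][ ]+", "==", x))
   A run of k >= 2 spaces shrinks to exactly 2 characters, so the result
   is never longer than the input: `out` needs strlen(in) + 1 bytes. */
void normalize_line(const char *in, char *out)
{
    size_t i = 0, j = 0;
    while (in[i] == ' ') i++;              /* step 1: skip leading spaces */
    while (in[i] != '\0') {
        if (in[i] == ' ' && in[i + 1] == ' ') {
            while (in[i] == ' ') i++;      /* step 2: consume the whole run */
            out[j++] = '=';
            out[j++] = '=';
        } else {
            out[j++] = in[i++];            /* copy everything else as-is */
        }
    }
    out[j] = '\0';
}
```

The R code then joins the normalized lines with `paste0(..., collapse="; ")`, so each line contributes one `name==value` entry to the summary.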
6 changes: 3 additions & 3 deletions man/openmp-utils.Rd
@@ -3,7 +3,7 @@
\alias{getDTthreads}
\title{ Set or get number of threads that data.table should use }
\description{
Set and get number of threads to be used in \code{data.table} functions that are parallelized with OpenMP.
Set and get number of threads to be used in \code{data.table} functions that are parallelized with OpenMP. The number of threads is initialized when \code{data.table} is first loaded in the R session using optional environment variables. Thereafter, the number of threads may be changed by calling \code{setDTthreads}. If you change an environment variable using \code{Sys.setenv} you will need to call \code{setDTthreads} again to reread the environment variables.
}
\usage{
setDTthreads(threads = NULL, restore_after_fork = NULL, percent = NULL)
@@ -12,7 +12,7 @@
\arguments{
\item{threads}{ NULL (default) rereads environment variables. 0 means to use all logical CPUs available. Otherwise a number >= 1 }
\item{restore_after_fork}{ Should data.table be multi-threaded after a fork has completed? NULL leaves the current setting unchanged which by default is TRUE. See details below. }
\item{percent}{ If provided it should be a number between 2 and 100; the percentage of logical CPUs to use. } By default on startup this is data.table uses 50%. }
\item{percent}{ If provided it should be a number between 2 and 100; the percentage of logical CPUs to use. By default on startup, 50\%. }
\item{verbose}{ Display the value of relevant OpenMP settings plus the \code{restore_after_fork} internal option. }
}
\value{
@@ -25,7 +25,7 @@
Some hardware allows CPUs to be removed and/or replaced while the server is running. If this happens, our understanding is that \code{omp_get_num_procs()} will reflect the new number of processors available. But if this happens after data.table started, \code{setDTthreads(...)} will need to be called again by you before data.table will reflect the change. If you have such hardware, please let us know your experience via GitHub issues / feature requests.
Use \code{getDTthreads(verbose=TRUE)} to see the relevant environment variables, their values and the current number of threads data.table is using. For example, the environment variable \code{R_DATATABLE_NUM_PROCS_PERCENT} can be used to change the default number of logical CPUs from 50% to another value between 2 and 100. If you change these environment variables using `Sys.setenv()` after data.table and/or OpenMP has initialized then you will need to call \code{setDTthreads(threads=NULL)} to reread their current values. \code{getDTthreads()} merely retrieves the internal value that was set by the last call to \code{setDTthreads()}. \code{setDTthreads(threads=NULL)} is called when data.table is first loaded and is not called again unless you call it.
Use \code{getDTthreads(verbose=TRUE)} to see the relevant environment variables, their values and the current number of threads data.table is using. For example, the environment variable \code{R_DATATABLE_NUM_PROCS_PERCENT} can be used to change the default number of logical CPUs from 50\% to another value between 2 and 100. If you change these environment variables using `Sys.setenv()` after data.table and/or OpenMP has initialized then you will need to call \code{setDTthreads(threads=NULL)} to reread their current values. \code{getDTthreads()} merely retrieves the internal value that was set by the last call to \code{setDTthreads()}. \code{setDTthreads(threads=NULL)} is called when data.table is first loaded and is not called again unless you call it.
\code{setDTthreads()} affects \code{data.table} only and does not change R itself or other packages using OpenMP. We have followed the advice of section 1.2.1.1 in the R-exts manual: "\ldots or, better, for the regions in your code as part of their specification\ldots num_threads(nthreads)\ldots That way you only control your own code and not that of other OpenMP users." Every parallel region in data.table contains a \code{num_threads(getDTthreads())} directive. This is mandated by a \code{grep} in data.table's quality control script.

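The `percent` argument and the `R_DATATABLE_NUM_PROCS_PERCENT` variable both express the thread count as a share of the logical CPUs, with 50% as the startup default. The following is a minimal sketch of that arithmetic under stated assumptions. Rounding down and a floor of one thread are assumptions: the page does not say how data.table rounds or whether further limits such as `OMP_THREAD_LIMIT` are applied afterwards. `threads_from_percent` is a hypothetical name.

```c
/* Hypothetical sketch: translate a `percent` value (documented range
   2..100) into a thread count for `num_procs` logical CPUs.
   ASSUMPTIONS, not taken from data.table's source: the product is
   rounded down, and the result never drops below 1 thread.
   Returns -1 for a percent outside the documented range. */
int threads_from_percent(int num_procs, int percent)
{
    if (percent < 2 || percent > 100) return -1;
    int n = num_procs * percent / 100;   /* integer division rounds down */
    return n < 1 ? 1 : n;
}
```

Under these assumptions, the 50% default on a machine with 8 logical CPUs gives 4 threads.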
30 changes: 15 additions & 15 deletions src/openmp-utils.c
@@ -58,10 +58,9 @@ int getDTthreads() {
return DTthreads;
}

static const char *mygetenv(const char *name) {
static const char *mygetenv(const char *name, const char *unset) {
const char *ans = getenv(name);
if (ans==NULL) ans="";
return ans;
return (ans==NULL || ans[0]=='\0') ? unset : ans;
}

SEXP getDTthreads_R(SEXP verbose) {
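The reworked `mygetenv()` in the hunk above treats an empty variable the same as an unset one and lets each caller choose the placeholder text. For this reason, the old `p[0]=='\0' ? "(default 50)" : ""` check in `getDTthreads_R()` is no longer needed. The block below is a self-contained copy of the helper. The `_POSIX_C_SOURCE` define is only there so that `setenv()`/`unsetenv()`, which can be used to exercise it, are declared.

```c
#define _POSIX_C_SOURCE 200112L  /* declare POSIX setenv()/unsetenv() */
#include <stdlib.h>

/* Same logic as the new mygetenv(): return `unset` when the variable is
   either absent or set to the empty string, otherwise its value. */
static const char *mygetenv(const char *name, const char *unset) {
  const char *ans = getenv(name);
  return (ans==NULL || ans[0]=='\0') ? unset : ans;
}
```

For example, with `R_DATATABLE_NUM_THREADS` exported as an empty string, `mygetenv("R_DATATABLE_NUM_THREADS", "unset")` returns `"unset"`, whereas the old one-argument version returned `""`.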
@@ -71,16 +70,17 @@ SEXP getDTthreads_R(SEXP verbose) {
Rprintf("This installation of data.table has not been compiled with OpenMP support.\n");
#endif
// this output is captured, paste0(collapse="; ")'d, and placed at the end of test.data.table() for display in the last 13 lines of CRAN check logs
Rprintf("omp_get_num_procs()==%d\n", omp_get_num_procs());
const char *p = mygetenv("R_DATATABLE_NUM_PROCS_PERCENT");
Rprintf("R_DATATABLE_NUM_PROCS_PERCENT==\"%s\" %s\n", p, p[0]=='\0' ? "(default 50)" : "");
Rprintf("R_DATATABLE_NUM_THREADS==\"%s\"\n", mygetenv("R_DATATABLE_NUM_THREADS"));
Rprintf("omp_get_thread_limit()==%d\n", omp_get_thread_limit());
Rprintf("omp_get_max_threads()==%d\n", omp_get_max_threads());
Rprintf("OMP_THREAD_LIMIT==\"%s\"\n", mygetenv("OMP_THREAD_LIMIT")); // CRAN sets to 2
Rprintf("OMP_NUM_THREADS==\"%s\"\n", mygetenv("OMP_NUM_THREADS"));
Rprintf("data.table is using %d threads. This is set on startup, and by setDTthreads(). See ?setDTthreads.\n", getDTthreads());
Rprintf("RestoreAfterFork==%s\n", RestoreAfterFork ? "true" : "false");
// it is also printed at the start of test.data.table() so that we can trace any Killed events on CRAN before the end is reached
// this is printed verbatim (e.g. without using data.table to format the output) in case there is a problem even with simple data.table creation/printing
Rprintf(" omp_get_num_procs() %d\n", omp_get_num_procs());
Rprintf(" R_DATATABLE_NUM_PROCS_PERCENT %s\n", mygetenv("R_DATATABLE_NUM_PROCS_PERCENT", "unset (default 50)"));
Rprintf(" R_DATATABLE_NUM_THREADS %s\n", mygetenv("R_DATATABLE_NUM_THREADS", "unset"));
Rprintf(" omp_get_thread_limit() %d\n", omp_get_thread_limit());
Rprintf(" omp_get_max_threads() %d\n", omp_get_max_threads());
Rprintf(" OMP_THREAD_LIMIT %s\n", mygetenv("OMP_THREAD_LIMIT", "unset")); // CRAN sets to 2
Rprintf(" OMP_NUM_THREADS %s\n", mygetenv("OMP_NUM_THREADS", "unset"));
Rprintf(" RestoreAfterFork %s\n", RestoreAfterFork ? "true" : "false");
Rprintf(" data.table is using %d threads. See ?setDTthreads.\n", getDTthreads());
}
return ScalarInteger(getDTthreads());
}
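`getDTthreads_R()` reports the value that every data.table parallel region is meant to use. Per `man/openmp-utils.Rd`, each region carries a `num_threads(getDTthreads())` clause, so the setting affects data.table only and not R or other OpenMP users. The following is a sketch of that pattern. `my_threads`, `get_my_threads()` and `parallel_sum()` are hypothetical stand-ins, not data.table code. When the code is compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same result.

```c
/* Hypothetical stand-in for data.table's internal thread setting. */
static int my_threads = 2;
int get_my_threads(void) { return my_threads; }

/* Sum an array, limiting this parallel region (and only this region)
   to get_my_threads() threads via the num_threads clause. The global
   OpenMP setting (omp_set_num_threads) is left untouched. */
double parallel_sum(const double *x, int n)
{
    double total = 0.0;
    #pragma omp parallel for num_threads(get_my_threads()) reduction(+:total)
    for (int i = 0; i < n; i++)
        total += x[i];
    return total;
}
```

Putting the clause on each region, rather than changing the process-wide default, is what keeps the setting local, in line with the R-exts advice quoted in the Rd file.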
@@ -99,7 +99,7 @@ SEXP setDTthreads(SEXP threads, SEXP restore_after_fork, SEXP percent) {
// Allows robust testing of environment variables using Sys.setenv() to experiment.
// Default is now (as from 1.12.2) threads=NULL which re-reads environment variables.
// If a CPU has been unplugged (high end servers allow live hardware replacement) then omp_get_num_procs() will
// reflect that and a call to setDTthreads(threads=NULL) will update DTthreads.
// reflect that and a call to setDTthreads(threads=NULL) will update DTthreads.
} else {
int n=0, protecti=0;
if (length(threads)!=1) error("threads= must be either NULL (default) or a single number. It has length %d", length(threads));
@@ -140,7 +140,7 @@ SEXP setDTthreads(SEXP threads, SEXP restore_after_fork, SEXP percent) {
From v1.12.0 we're trying again to RestoreAferFork (#2285) with optional-off due to success
reported by Ken Run and Mark Klik in fst#110 and fst#112. We had tried that before but had
experienced problems likely on Intel's OpenMP only (Mac).
DO NOT call omp_set_num_threads(1) inside when_fork()!! That causes a different crash/hang on MacOS
upon mclapply's fork even if data.table is merely loaded and neither used yet in the session nor by
what mclapply is calling. Even when passing on CRAN's MacOS all-OK. As discovered by several MacOS
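The comment above concerns code that runs around `fork()` (`when_fork()`) and the `RestoreAfterFork` option. A common way to run code at that point is the POSIX `pthread_atfork()` call, which registers a *prepare* handler (run in the parent before the fork), a *parent* handler (run in the parent afterwards) and a *child* handler. This diff does not show whether or how data.table registers `when_fork()`. The sketch below is a minimal illustration of the pattern, assuming the thread setting is a plain `int`, the child stays single-threaded, and only the parent restores the previous count. Apart from `when_fork`, all names are hypothetical. The handler only changes an integer and deliberately does not call `omp_set_num_threads(1)`, in line with the warning in the comment.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for an internal thread setting and option. */
static int threads = 4;
static int pre_fork_threads = 0;
static bool restore_after_fork = true;

/* prepare handler: runs in the parent just before fork(); drop to one
   thread so the child starts with a single-threaded setting. */
static void when_fork(void) {
    pre_fork_threads = threads;
    threads = 1;
}

/* parent handler: runs in the parent once fork() has returned; restore
   the previous count only if requested. No child handler is installed,
   so the child keeps threads == 1. */
static void after_fork_parent(void) {
    if (restore_after_fork) threads = pre_fork_threads;
}

int install_fork_handlers(void) {
    return pthread_atfork(when_fork, after_fork_parent, NULL);
}
```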