Releases: PredictiveEcology/reproducible
v1.2.10
Known issues: https://github.com/PredictiveEcology/reproducible/issues
Version 1.2.10
Dependency changes
- Drop support for R 3.6 (#230)
- remove `gdalUtilities`, `gdalUtils`, and `rgeos` from Suggests
- Added minimum versions of `raster` and `terra`, because previous versions were causing collisions.
Enhancements
- all direct calls to GDAL are removed: only `terra` and `sf` are used throughout
- `prepInputs` can now take `fun` as a quoted expression on `x`, the object loaded by `dlFun` in `preProcess`
- `preProcess` arg `dlFun` can now be a quoted expression
- changes to the internals and outputs of `objSize`; it is now primarily a wrapper around `lobstr::obj_size`, but has an option to get more detail for lists and environments.
- `.robustDigest` now deals explicitly with numerics, which digest differently on different OSs. Namely, they get rounded prior to digesting. Through trial and error, it was found that setting `options("reproducible.digestDigits" = 7)` was sufficient for all known cases; rounding to more than 7 decimal places was insufficient. There are also new methods for `language`, `integer`, and `data.frame` (which does each column one at a time to address the numeric issue).
- New version of `postProcess` called `postProcessTerra`. This will eventually replace `postProcess`, as it is much faster in all cases and has a simpler code base, thanks to the fantastic work of Robert Hijmans (`terra`) and all the upstream work that `terra` relies on.
- Minor message updates, especially for "adding to memoised copy...". The three dots made it seem like it was taking a long time, when in reality it is instantaneous and is the last thing that happens in the `Cache` call. If there is a delay after this message, then it is the code following the `Cache` call that is (silently) slow.
- `retry` can now return a named list for the `exprBetween`, which allows for more than one object to be modified between retries.
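Why rounding numerics before digesting helps can be shown in base R. This is a minimal sketch, not the package's implementation. It assumes rounding to `getOption("reproducible.digestDigits", 7)` decimal places with `round()`. Because the `digest` package is not assumed to be installed, it compares `serialize()` output as a stand-in for a hash. The helper `digest_ready()` is hypothetical.

```r
# Hypothetical helper (not part of reproducible): round doubles before they are
# serialized, so tiny floating-point differences no longer change the bytes.
digest_ready <- function(x, digits = getOption("reproducible.digestDigits", 7)) {
  if (is.data.frame(x)) {
    # One column at a time; non-double columns are left untouched
    x[] <- lapply(x, function(col) if (is.double(col)) round(col, digits) else col)
    return(x)
  }
  if (is.double(x)) round(x, digits) else x
}

a <- 0.1 + 0.2   # not exactly 0.3 in IEEE-754 double precision
b <- 0.3

raw_equal <- identical(serialize(a, NULL), serialize(b, NULL))
rounded_equal <- identical(serialize(digest_ready(a), NULL),
                           serialize(digest_ready(b), NULL))
print(c(raw_equal = raw_equal, rounded_equal = rounded_equal))
```

The per-column `data.frame` branch follows the note above that the new `data.frame` method handles each column separately.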
Bug fixes
- `.robustDigest` was removing Cache attributes from objects under many conditions, when it should have left them there. It is unclear what the issues were, as this would likely not have impacted `Cache`. Now these attributes are left on.
- `data.table` objects appear not to be recovered correctly from disk (e.g., from a Cache repository). We have added `data.table::copy` when recovering from a Cache repository.
- `clearCache` and `cc` did not correctly remove file-backed raster files (when not clearing the whole CacheRepo); this may have resulted in a proliferation of files, each a filename with an underscore and a new higher number. This fix should eliminate this problem.
- deal with development versions of GDAL in `getGDALVersion()` (#239)
- fix issue with `maskInputs()` when not passing `rasterToMatch`.
- fix issue with `isna.SpatialFix` when using `postProcess.quosure`
v1.2.8
Known issues: https://github.com/PredictiveEcology/reproducible/issues
Version 1.2.8
Dependency changes
- `lwgeom` is now a suggested package
Enhancements
- `terra` class objects can now be correctly saved and recovered by `Cache`
- `fixErrors` can now distinguish three cases. `testValidity = NA` means don't fix errors. `testValidity = FALSE` means run buffering, which fixes many errors, without first testing whether there are any invalid polygons (may be slow). `testValidity = TRUE` means test for validity, then, if some are invalid, run buffer.
- Change default option to `reproducible.useNewDigestAlgorithm = 2`, which will have user-visible changes. To keep the old behaviour, set `options(reproducible.useNewDigestAlgorithm = 1)`
- minor changes to messaging when `options(reproducible.showSimilar)` is set. It is now more compact, e.g., 3 lines instead of 5.
- added `sf` methods to `studyAreaName`
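The three `testValidity` modes of `fixErrors` amount to a small decision procedure. The sketch below is hypothetical and is not the package's code. `fix_errors_sketch()`, `is_valid`, and `buffer_fix` are illustrative names. For real polygons, the two helpers might be something like `sf::st_is_valid` and `function(g) sf::st_buffer(g, 0)`, but that pairing is an assumption. The sketch also assumes the buffer is applied to the whole object.

```r
# Hypothetical sketch of the three testValidity modes described above.
# `is_valid` returns a logical per feature; `buffer_fix` repairs the object.
fix_errors_sketch <- function(x, testValidity, is_valid, buffer_fix) {
  if (is.na(testValidity)) {
    return(x)                 # NA: do not try to fix anything
  }
  if (isFALSE(testValidity)) {
    return(buffer_fix(x))     # FALSE: buffer without testing first
  }
  if (all(is_valid(x))) {
    return(x)                 # TRUE: tested, nothing invalid, skip buffering
  }
  buffer_fix(x)               # TRUE: some features invalid, so buffer
}

# Toy stand-ins: negative numbers play the role of invalid polygons
valid_toy <- function(x) x >= 0
fix_errors_sketch(c(1, -1), TRUE, valid_toy, abs)
```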
Bug fixes
- A small but very impactful bug created false-positive `Cache` returns: a second pass through a `Cache` call would return a cached copy even when some of the arguments were different. It occurred when the differences were in unnamed arguments only.
v1.2.7
Known issues: https://github.com/PredictiveEcology/reproducible/issues
Version 1.2.7
`reproducible` will be slowly changing the defaults for vector GIS datasets from the `sp` package to the `sf` package.
There is a large user-visible change that will come (in the next release), which will cause `prepInputs` to read `.shp` files with `sf::st_read` instead of `raster::shapefile`, as it is much faster. To change now, set `options("reproducible.shapefileRead" = "sf::st_read")`
Enhancements
- default `fun` in `prepInputs` for shapefiles (`.shp`) is now `sf::st_read` if the system has `sf` installed. This can be overridden with `options("reproducible.shapefileRead" = "raster::shapefile")`. Because this will cause different behaviour, a message is shown at the moment it occurs.
- `quick` argument in `Cache` can now be a character vector. This allows some character arguments to be digested as character vectors and others to be digested as the files located at the paths they specify.
- `objSize` previously included objects in `namespaces`, `baseenv` and `emptyenv`, so it was generally too large. It now uses the same criteria as `pryr::object_size`
- improvements with messaging when `unzip` is missing (thanks to C. Barros #202)
- while unzipping, if it can't find `unzip`, will also search for `7z.exe` on Windows if the object is larger than 2GB.
- `fun` argument in `prepInputs` and family can now be a quoted expression.
- `archive` argument in `prepInputs` can now be `NA`, which means to treat the downloaded file not as an archive, even if it has a `.zip` file extension
- many minor improvements to the functioning of, especially, `prepInputs`
- speed improvements during `postProcess`, especially for very large objects (>5GB tested). Previously, it was running many `fixErrors` calls; now it only calls `fixErrors` on failure of the proximate call (e.g., `st_crop` or whatever)
- `retry` now has a new argument `exprBetween` to allow for doing something after the failure (for example, if an operation fails, e.g., `st_crop`, then run `fixErrors`, then return to `st_crop` for the retry)
- `Cache` now has MUCH better detection of nested levels, with messaging. Control of how deep the caching goes appears to work well via, e.g., `useCache = 2`, which will only Cache 2 levels in.
- `archive` argument in `prepInputs` family can now be `NA`, meaning do not try to unzip even if it is a `.zip` file or other standard archive extension
- `gdb.zip` files (e.g., a file with a .zip extension, but that should not be opened with an unzip-type program) can now be opened with `prepInputs(url = "whateverUrl", archive = NA, fun = "sf::st_read")`
- `fun` argument in `prepInputs` can now be a quoted function call.
- `preProcess` now does a better job with large archives that can't be correctly handled with the default `zip` and `unzip` in R, by trying `system2` calls to a possible `7z.exe` or other options on Linux-alikes.
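The `exprBetween` idea (fail, repair, retry) can be sketched in base R. `retry_sketch()` is a hypothetical stand-in, not `reproducible::retry`, and its argument names other than `exprBetween` are assumptions. It also adds exponential backoff between attempts, echoing the v0.2.10 note about `retry`. The exact delays here are illustrative. The 1.2.10 option of returning a named list from `exprBetween` is not modelled.

```r
# Hypothetical retry helper (not reproducible::retry). Evaluates `expr` up to
# `retries` times. After each failure it evaluates `exprBetween` (e.g. to
# repair the input), then waits with exponential backoff before trying again.
retry_sketch <- function(expr, exprBetween = NULL, retries = 3,
                         base_delay = 0.1, envir = parent.frame()) {
  expr <- substitute(expr)
  exprBetween <- substitute(exprBetween)
  for (attempt in seq_len(retries)) {
    result <- tryCatch(eval(expr, envir), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt == retries) stop(result)       # out of attempts: re-signal
    if (!is.null(exprBetween)) eval(exprBetween, envir)
    Sys.sleep(base_delay * 2^(attempt - 1))    # 0.1 s, 0.2 s, 0.4 s, ...
  }
}

# Analogous to "st_crop fails -> run fixErrors -> retry st_crop"
x <- -4
res <- retry_sketch(
  { if (x < 0) stop("negative input"); sqrt(x) },
  exprBetween = { x <- abs(x) }
)
```

Evaluating both expressions in the caller's environment is what lets `exprBetween` modify the object that the retried expression uses.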
Bug fixes
- `Copy` generic no longer has a `fileBackedDir` argument. It is now passed through with the `...`. This was creating a bug in some cases where `fileBackedDir` was not being correctly executed.
- `fixErrors()` now better handles `sf` polygons with mixed geometries that include points.
- inadvertent deleting of file-backed rasters in multi-filed stacks during `Cache`
- `writeOutputs.Raster` attempted to change the `datatype` of `Raster` class objects using the replacement function `dataType<-`, without subsequently writing to disk via `writeRaster`. This created bad values in the `Raster*` object. It now performs a `writeRaster` if there is a `datatype` passed to `writeOutputs`, e.g., through `prepInputs` or `postProcess`.
- `updateSlotFilename` has many more tests.
- `prepInputs(..., fun = NA)` is now the correct specification for "do not load object into R". This essentially replicates `preProcess` with the same arguments.
- several minor bugfixes
- `Copy` did not correctly copy `RasterStack`s when some of the `RasterLayer` objects were in memory and some on disk; `raster::fromDisk` returned `FALSE` in those cases, so `Copy` didn't occur on the file-backed layer files. It now uses `Filenames` instead to determine if there are any files that need copying.
v1.2.6
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 1.2.6
Enhancements
- Optional (and may become the default soon): an update to the internal digesting for file-backed Rasters that should be substantially faster and have a smaller disk footprint. Set using `options("reproducible.useNewDigestAlgorithm" = 2)`
- changed default to `options("reproducible.polygonShortcut" = FALSE)`, as there were still too many edge cases that were not covered.
Bug fix
- `RasterStack` objects with a single file (thus acting like a `RasterBrick`) are now handled correctly by the `Cache` and `prepInputs` families, especially with the new `options("reproducible.useNewDigestAlgorithm" = 2)`, though in tests it also worked with the default
- Fix issue #185: RSQLite now uses an RNG during `dbAppend`; this affected 2 tests.
v1.2.1
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 1.2.1
New features
- harmonized message colours that are user-adjustable via options: `reproducible.messageColourPrepInputs` for all `prepInputs` functions; `reproducible.messageColourCache` for all `Cache` functions; and `reproducible.messageColourQuestion` for questions that require user input. Defaults are `cyan`, `blue` and `green`, respectively. These are user-visible colour changes.
- improved messaging for `Cache` cases where a `file.link` is used instead of saving.
- with improved messaging, `options(reproducible.verbose = 0)` now turns off almost all messaging.
- `postProcess` and family now have `filename2 = NULL` as the default, so outputs are not saved to disk. This is a change.
- `verbose` is now an argument throughout, whose default is `getOption("reproducible.verbose")`, which is set by default to `1`. Thus, individual function calls can be made more or less verbose, or the whole session via the option.
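A `verbose` argument that defaults to a session-wide option is a common R pattern. The sketch uses a hypothetical `load_something()` function, not one from reproducible. It assumes that any level of `1` or higher shows messages and `0` silences them.

```r
# Hypothetical function illustrating the pattern: the per-call `verbose`
# argument falls back to the session-wide option when not supplied.
options(reproducible.verbose = 1)   # the documented default level

load_something <- function(x, verbose = getOption("reproducible.verbose", 1)) {
  if (verbose >= 1) message("loading object")
  x
}

load_something(1)                   # emits a message (option is 1)
load_something(1, verbose = 0)      # silent for this call only
options(reproducible.verbose = 0)   # silence the whole session
load_something(1)                   # silent
```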
Bug fixes
- `RasterStack` objects were not correctly saved to disk under some conditions in `postProcess`; fixed
- several minor bug fixes
v1.1.1
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 1.1.1
New features
- none
Dependency changes
- none
bug fixes
- fix CRAN test failure when `file.link` does not succeed.
v1.1.0
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 1.1.0
New features
- begin to accommodate changes in GDAL/PROJ and associated updates to other spatial packages. More updates are expected as other spatial packages (namely `raster`) are updated.
- can now change `options('reproducible.cacheSaveFormat')` on the fly; cache will look for the file by `cacheId` and write it using `options('reproducible.cacheSaveFormat')`. If it is in another format, Cache will load it and resave it in the new format. Still experimental.
- new `Copy` methods for `refClass` objects and `SQLite`. The `environment` method moved into `ANY`, because it would be dispatched for unknown classes that inherit from `environment` (of which there are many), and these should be intercepted
- `Require` can now handle minimum version numbers, e.g., `Require("bit (>=1.1-15.2)")`; this can be worked into downstream tools. Still experimental.
- Cache will do a `file.link` or `file.symlink` if an existing Cache entry with identical output exists and it is large (currently `1e6` bytes); this will save disk space.
- Cache database now has tags for the elapsed time of "digest", "original call", and "subsequent recovery from file": `elapsedTimeDigest`, `elapsedTimeFirstRun`, and `elapsedTimeLoad`, respectively.
- Better management of temporary files in package and tests, e.g., during downloading (`preProcess`). Includes 2 new functions, `tempdir2` and `tempfile2`, for use with the `reproducible` package
- New option: `reproducible.tempPath`, which is used for the new control of temporary files. Defaults to `file.path(tempdir(), "reproducible")`. This feature was requested to help manage large amounts of temporary objects that were not being easily and automatically cleaned up
- Copying or moving of Cache directories now works automatically if using the default `drv` and `conn`; the user may need to manually call `movedCache` if the cache is not responding correctly. File-backed Rasters are automatically updated with new paths.
- Cache now treats file-backed Rasters as though they had a relative path instead of their absolute path. This means that Cache directories can be copied from one location to another, and the file-backed `Raster*` objects will have their filenames updated on the fly during a Cache recovery. The user doesn't need to do anything.
- `postProcess` will now perform simple tests and, where it can, skip `cropInputs` and `projectInputs` with a message, rather than using `Cache` to "skip". This should speed up `postProcess` in many cases.
- messaging with `Cache` has changed. Now, `cacheId` is shown in all cases, making it easier to identify specific items in the cache.
- Automatically clean up temporary (intermediate) raster files (with #110).
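Reusing a large, identical cache output through `file.link` can be sketched in base R. `save_or_link()` is a hypothetical helper, not the package's code. It compares serialized bytes directly instead of whatever identity check `Cache` actually uses, and it takes the `1e6`-byte threshold from the text. It falls back to `file.copy()` because hard links are not always possible, for example across file systems.

```r
# Hypothetical sketch: if the new cache output is byte-identical to an
# existing file and larger than `threshold`, hard-link instead of rewriting.
save_or_link <- function(obj, existing_file, new_file, threshold = 1e6) {
  bytes <- serialize(obj, NULL)
  same_as_existing <- file.exists(existing_file) &&
    file.size(existing_file) == length(bytes) &&
    identical(readBin(existing_file, "raw", n = length(bytes)), bytes)
  if (same_as_existing && length(bytes) > threshold) {
    # file.link() returns FALSE (with a warning) when linking fails
    if (suppressWarnings(file.link(existing_file, new_file))) return("linked")
    if (file.copy(existing_file, new_file)) return("copied")
  }
  writeBin(bytes, new_file)  # small or new output: write it normally
  "written"
}

big <- runif(2e5)                        # about 1.6 MB once serialized
first <- tempfile(fileext = ".rds")
writeBin(serialize(big, NULL), first)    # the "existing cache entry"
save_or_link(big, first, tempfile(fileext = ".rds"))
```

A hard link shares the underlying data on disk, so the second entry costs almost no extra space.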
Dependency changes
- none
bug fixes
- `Copy` only creates a temporary directory for file-backed rasters; previously any `Copy` command was creating a temporary directory, regardless of whether it was needed
- `cropInputs.spatialObjects` had a bug when the object was a large non-Raster class.
- `cropInputs` may have failed due to a "self intersection" error when x was a `SpatialPolygons*` object; it now catches the error, runs `fixErrors` and retries `crop`. Great reprex by @tati-micheletti. Fixed in commit `89e652ef111af7de91a17a613c66312c1b848847`.
- `Filenames` bugfix related to `RasterBrick`
- `prepInputs` does a better job of keeping all temporary files in a temporary folder, and cleans up after itself better.
- `prepInputs` will no longer show the message that it is loading the object into R if `fun = NULL` (#135).
v1.0.0
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 1.0.0
New features
- This version is not backwards-compatible out of the box. To maintain backwards compatibility, set: `options("reproducible.useDBI" = FALSE)`
- A new backend was introduced that uses the `DBI` package directly, without `archivist`. This has much improved speed.
- New option: `options("reproducible.cacheSaveFormat")`. This can be either `rds` (default) or `qs`. All cached objects will be saved in this format. Previously it was `rda`.
- Cache objects can now be saved with `qs::qsave`. In many cases, this has much improved speed and file sizes compared to `rds`; however, testing across a wide range of conditions will occur before it becomes the default.
- Changed default behaviour for memoising: because `Cache` is now much faster, the default is to turn memoising off, via `options("reproducible.useMemoise" = FALSE)`. For large objects, memoising should still be faster, so the user can still activate it by setting the option to `TRUE`.
- Much better SQLite database handling for concurrent write attempts. Tested with dozens of write attempts per second by 3 cores with abundant locked database occurrences.
- `postProcess` arg `useGDAL` can now take `"force"`, as the default behaviour is to not use GDAL if the problem can fit into RAM and `sf` or `raster` tools will be faster than `GDAL` tools
- `useCloud` argument in `Cache` and family has slightly modified functionality (see `?Cache`, new section `useCloud`) and now has more tests, including edge cases such as `useCloud = TRUE, useCache = 'overwrite'`. The cloud version will now also follow the `"overwrite"` command.
Dependency changes
- deprecating `archivist`; moved to Suggests.
- removed imports for `bitops`, `dplyr`, `fasterize`, `flock`, `git2r`, `lubridate`, `RcppArmadillo`, `RCurl` and `tidyselect`. Some of these went to Suggests.
bug fixes
- `postProcess` calls that use GDAL made more robust (including #93).
- Several minor edge cases were detected and fixed.
v0.2.10
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 0.2.10
Dependency changes
- made compatible with `googledrive` v 1.0.0 (#119)
New features
- `pkgDep2`, a new convenience function to get the dependencies of the "first order" dependencies.
- `useCache`, used in many functions (incl. `Cache`, `postProcess`), can now be numeric, a qualitative indicator of "how deep" nested `Cache` calls should set `useCache = TRUE`; currently implemented as 1 or 2 in `postProcess`. See `?Cache`
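A numeric `useCache` acts as a depth limit on nested cached calls. The sketch below is hypothetical and does not show how `Cache` tracks nesting internally. It assumes the outermost call's `useCache` sets the limit for everything nested inside it. `cached_call_sketch()` and `cache_state` are illustrative names. The sketch only records whether each level would be cached and does no actual caching.

```r
# Hypothetical depth-limited caching: the outermost call's `useCache` sets the
# limit; TRUE caches every level, FALSE none, a number n only the outer n.
cache_state <- new.env()
cache_state$level <- 0L
cache_state$limit <- TRUE
cache_state$log <- character()

cached_call_sketch <- function(f, useCache = TRUE) {
  if (cache_state$level == 0L) cache_state$limit <- useCache
  cache_state$level <- cache_state$level + 1L
  on.exit(cache_state$level <- cache_state$level - 1L)
  limit <- cache_state$limit
  active <- isTRUE(limit) || (is.numeric(limit) && cache_state$level <= limit)
  cache_state$log <- c(cache_state$log, sprintf("level %d: %s", cache_state$level,
                                                if (active) "cached" else "not cached"))
  f()  # a real implementation would look up or store the result when `active`
}

# Three nested calls with useCache = 2: levels 1 and 2 cached, level 3 not
res <- cached_call_sketch(function()
  cached_call_sketch(function()
    cached_call_sketch(function() 42)), useCache = 2)
print(cache_state$log)
```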
bug fixes
- `pkgDep` was becoming unreliable for unknown reasons. It has been reimplemented, much faster, without memoising. The speed gains should be immediately noticeable (6 seconds to 0.1 seconds for `pkgDep("reproducible")`)
- improved `retry` to use exponential backoff when attempting to access online resources (#121)
v0.2.9
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 0.2.9
New features
- Cache has 2 new arguments, `useCloud` and `cloudFolderID`. This is a new approach to cloud caching. It has been tested with file-backed RasterLayer, RasterStack and RasterBrick objects and all normal R objects. It will not work for any other class of disk-backed files, e.g., `ff` or `bigmatrix`, nor is it likely to work for R6 class objects.
- Slowly deprecating `cloudCache` and family of functions in favour of a new approach using arguments to `Cache`, i.e., `useCloud` and `cloudFolderID`
- `downloadData` from GoogleDrive now protects against HTTP2 errors by capturing the error and retrying. This is a curl issue for interrupted connections.
Bug fixes
- fixes for `rcnst` errors on R-devel, tested using `devtools::check(env_vars = list("R_COMPILE_PKGS"=1, "R_JIT_STRATEGY"=4, "R_CHECK_CONSTANTS"=5))`
- other minor improvements, including fixes for #115