I'm Simon (he/him), an R developer and data scientist. I build tools for data scientists at Posit PBC (formerly RStudio).🐛
Most of the time, I'm a generalist maintainer on R packages for statistical modeling:
- tidymodels/broom: convert statistical analysis objects to tidy tibbles
- tidymodels/infer: a grammar for tidy statistical inference
- tidymodels/stacks: tidymodels-friendly model stacking and ensembling
- tidymodels/bonsai: model wrappers for tree-based models
- tidymodels/workflows: combine preprocessing, modeling, and postprocessing objects
- tidymodels/tailor: postprocessing with tidymodels
- tidymodels/workflowsets: creating collections of modeling workflows
- rstudio/bundle: a consistent interface for model serialization
Related to the above packages, I'm also working on a book called Efficient Machine Learning with R.
I'm also interested in LLM code-assist for data scientists and have built a number of tools to that end:
- simonpcouch/gander: high-performance, low-friction chat for data science
- simonpcouch/pal: a library of LLM assistants
- simonpcouch/evalthat: testthat-style LLM evaluation
- simonpcouch/ensure: automated unit testing for R developers
Another part of my gig is maintaining database interfaces for R:
- r-dbi/odbc: connect to any ODBC-compliant database with DBI
- tidyverse/dbplyr: database backend for dplyr
I also maintain some personal R packages that range in functionality from performance profiling to data querying to biological methods:
- simonpcouch/mdl: performant reimagining of R model matrices, written in Rust
- simonpcouch/syrup: profile memory and CPU usage of parallel R code
- simonpcouch/stopwatch: high-precision timings using mocking
- simonpcouch/anyflights: query nycflights13-like data for any recent year and US airport
- simonpcouch/gbfs: query data on public bikes from hundreds of bikeshare programs
- simonpcouch/carpentR: predicting lake algal blooms using plankton dynamics
- rudeboybert/forestecology: methods for model fitting and assessment in forest ecology
- simonpcouch/detectors: prediction data from GPT detectors
- simonpcouch/readmission: hospital readmission data for patients with type 1 diabetes
- simonpcouch/forested: forest attributes in Washington State
Keep up to date with what I'm up to on my website's blog as well as the tidyverse blog.