I'm James, an engineer / data scientist from Chicago. My time on GitHub is mostly spent writing Python, R, and shell scripts on projects for data scientists and data engineers. My time off GitHub is spent with family, at hip hop shows, and watching reality TV.
- LightGBM: a lightweight gradient boosting machine
- lightgbm-dask-testing: containerized setup for testing LightGBM's Dask interface locally and on Amazon ECS
- pkgnet: R package for analyzing an R package's dependencies
- pydistcheck: linter that finds portability issues in Python package distributions (wheels, sdists, and conda packages)
- uptasticsearch: an R data frame client for Elasticsearch
- hamilton: a "micro-framework" for feature engineering in Python
- prefect: a workflow management thing in Python that plays nicely with Dask
- xgboost: another gradient boosting machine
The pull requests and non-code contributions below were chosen to showcase the types of software work I've done. This list is not exhaustive.
- adapting `lightgbm` and `xgboost` to `scikit-learn` 1.6
- setting up `conda` packages for `legate-boost`, `legate-dataframe`, and `legate-raft`: rapidsai/legate-boost#115
- replacing LightGBM's `setup.py` with `scikit-build-core` for PEP 517/518 compatibility: microsoft/LightGBM#5759
- upstreaming `dask-lightgbm` into LightGBM and guiding community discussion with Dask, XGBoost maintainers
- adding `Webhook` storage to `prefect`: PrefectHQ/prefect#3000
- adding `autoconf`-based builds of LightGBM's R package: microsoft/LightGBM#3188
- making `snowflake-connector-python` compatible with `pyjwt` 1.x and 2.x: snowflakedb/snowflake-connector-python#604
- allow tight control over ports in LightGBM distributed training with Dask: microsoft/LightGBM#3994
- cut compiled size of `{lightgbm}` by ignoring CLI-only objects: microsoft/LightGBM#3566
- allow use of multiple image pull secrets in `prefect` Kubernetes agent: PrefectHQ/prefect#3596
- replace single-shot HTTP requests with `httr::RETRY()` in various R packages
  - project I led at Chi R Collab 2020: chircollab/chircollab20#1
  - `{sergeant}` (one example): hrbrmstr/sergeant#42
- tutorial on distributed LightGBM training with Dask: microsoft/LightGBM#4030
- early stopping example in XGBoost Dask docs: dmlc/xgboost#6501
- detailed information on how LightGBM parameters affect training speed: microsoft/LightGBM#3628
- guide on how to find valid memory and CPU combinations for ECS / Fargate clusters in `dask-cloudprovider`: dask/dask-cloudprovider#156
- fixing OpenMP conflicts in `lightgbm`
- detecting debug symbols in `pandas` 2.0 wheels: pandas-dev/pandas#51900
- prevent `conda` from "downgrading" Python from CPython to PyPy, while also reducing the risk of a subtle networking error made worse by unpredictability in when Dask garbage collects objects: microsoft/LightGBM#5510
- create a reproducible example for `lightgbm` loading failing with `GLIBCXX` compatibility errors: microsoft/LightGBM#5106 (comment)
- fix `jupyter_server` conda-forge feedstock recipe to prevent broken environments: conda-forge/jupyter_server-feedstock#84
- make multioutput behavior of `dask-ml` regression metrics consistent with `scikit-learn`: dask/dask-ml#820
- fix saving Dask Random Forest models in `cuml`: rapidsai/cuml#3388
- fix checks for availability of `mm_malloc` in `{lightgbm}` autoconf-based builds: microsoft/LightGBM#3510
- fix broken plots in `{lightgbm}`'s docs site: microsoft/LightGBM#3508
- factor out dependency on `gendef.exe` for compiling XGBoost and LightGBM R packages with Visual Studio compilers and R 4.0:
  - `{xgboost}`: dmlc/xgboost#5764
  - `{lightgbm}`: microsoft/LightGBM#3065
- helping with various migrations for all of the RAPIDS libraries:
  - updating to newer `fmt` / `spdlog`: rapidsai/build-planning#56
  - dropping Python 3.9: rapidsai/build-planning#88
  - CUDA 12.5: rapidsai/build-planning#73
  - adding Python 3.12: rapidsai/build-planning#40
  - adding Python 3.11: rapidsai/build-planning#3
- switching LightGBM's Python package jobs to `manylinux_2_28`: microsoft/LightGBM#5580
- automatically publish `prefect-saturn` to PyPI when a new release is created: saturncloud/prefect-saturn#7
- moving LightGBM CI jobs from Travis to GitHub Actions
- move `{uptasticsearch}` CI to GitHub Actions: uptake/uptasticsearch#217
- add CI job testing `{lightgbm}` with ASAN and UBSAN sanitizers: microsoft/LightGBM#3439
- reduce data loading work in LightGBM tests by caching data loading calls: microsoft/LightGBM#3486
- add Dockerfile to build an image for testing the Apache Arrow R package: apache/arrow#2770
- Sr. Software Engineer at NVIDIA, working on RAPIDS (https://github.com/rapidsai)
- adjunct instructor at Marquette University, where I teach "Intro to R Programming"
I've given talks on Dask, LightGBM, R, Python packaging, and other random stuff. For a full list and links to videos, see https://github.com/jameslamb/talks#gallery.
My DMs are open if you want to talk about open source, data science careers, Bravo shows, or anything else.
- Twitter: https://twitter.com/_jameslamb
- LinkedIn: https://www.linkedin.com/in/jameslamb1/
- Bluesky: https://bsky.app/profile/jameslamb.bsky.social