Releases: tracel-ai/burn
v0.10.0
Burn v0.10.0 sees the addition of the burn-compute crate to simplify the process of creating new custom backends, a new training dashboard and the possibility of using the GPU in the browser along with a web demo. Additionally, numerous new features, bug fixes, and CI improvements have been made.
Warning: there are breaking changes, see below.
Changes
Burn Compute
- Introduction of burn-compute, a new Burn crate making it easier to create async backends with custom kernels. @nathanielsimard, @louisfd
- Add new memory management strategies @louisfd, @nathanielsimard
- Add autotune capabilities @louisfd
Burn Import
- Add more ONNX record types @antimora
- Support no-std for ONNX imported models @antimora
- Add custom file location for loading record with ONNX models @antimora
- Support importing the erf operation from ONNX @AuruTus
Burn Tensor
- Add covariance and diagonal operations @ArvidHammarlund
- [Breaking] Reading operations are now async when compiling to wasm, except when wasm-sync feature is enabled. @nathanielsimard @AlexErrant
- [Breaking] Improved Clamp API @nathanielsimard
- Add unfold tensor operation @agelas, @nathanielsimard
- Improve tensor display implementation with ellipsis for large tensors @macroexpansion
Burn Dataset
- Improved speed of SqLite Dataset @antimora
- Use gix-tempfile only when sqlite is enabled @AlexErrant
Burn Common
Burn Autodiff
- Use AtomicU64 for node ids improving performance @dae
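As a rough illustration of the idea behind this change, an atomic counter can hand out unique ids without taking a lock. The sketch below uses only the standard library; `NodeId` and its methods are hypothetical names chosen for this example and are not taken from burn-autodiff.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical node identifier (not Burn's actual type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct NodeId(u64);

impl NodeId {
    /// Returns a fresh id by incrementing a process-wide counter.
    pub fn new() -> Self {
        // One static counter shared by every thread in the process.
        static COUNTER: AtomicU64 = AtomicU64::new(0);
        // Relaxed ordering is enough here: each fetch_add only needs to be
        // atomic, it does not need to order other memory accesses.
        NodeId(COUNTER.fetch_add(1, Ordering::Relaxed))
    }

    /// Exposes the raw counter value, mostly for debugging.
    pub fn value(&self) -> u64 {
        self.0
    }
}
```

Two calls to `NodeId::new()` always return different ids, even when made from different threads, because `fetch_add` is a single atomic read-modify-write.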
Burn WGPU
- Enable non-blocking reads when compiling to wasm to fully support WebGPU @nathanielsimard
- Add another faster matmul kernel @louisfd
- [Breaking] Massive refactor to use burn-compute @nathanielsimard
Burn Candle
Burn Train
- New training cli dashboard using ratatui @nathanielsimard
- [Breaking] Heavy refactor of burn-train making it more extensible and easier to work with @nathanielsimard
- Checkpoints can be customized with criteria based on collected metrics @nathanielsimard
- Add the possibility to do early stopping based on collected metrics @nathanielsimard
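For readers unfamiliar with metric-based early stopping, here is a minimal standard-library sketch of one common variant, a patience counter on a "lower is better" metric such as validation loss. The `EarlyStopping` type and its methods are hypothetical and do not reflect the burn-train API.

```rust
/// Hypothetical patience-based early stopping on a metric where lower is better.
pub struct EarlyStopping {
    patience: usize,
    best: f64,
    epochs_without_improvement: usize,
}

impl EarlyStopping {
    /// `patience` is the number of consecutive non-improving epochs tolerated.
    pub fn new(patience: usize) -> Self {
        Self {
            patience,
            best: f64::INFINITY,
            epochs_without_improvement: 0,
        }
    }

    /// Records the metric for one epoch and returns true when training should stop.
    pub fn should_stop(&mut self, metric: f64) -> bool {
        if metric < self.best {
            // New best value: remember it and reset the counter.
            self.best = metric;
            self.epochs_without_improvement = 0;
        } else {
            self.epochs_without_improvement += 1;
        }
        self.epochs_without_improvement >= self.patience
    }
}
```

The same comparison against the best value seen so far could also drive a checkpointing criterion, for example saving a checkpoint only when the metric improves.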
Examples
- Add image classifier web demo using different backends, including WebGPU @antimora
Bugfixes
- Epoch and iteration were swapped. (#838) @daniel-vainsencher
- RNNs (GRU & LSTM) were not generic over the batch size @agelas, @EddieMataEwy
- Other device adaptors in WGPU were ignored when best available device was used @chistophebiocca
Documentation
- Update book @nathanielsimard
- Doc improvements with std feature flag @ArvidHammarlund
Chores
- Update all dependencies @antimora
- Lots and lots of CI Improvements with coverage information @Luni-4, @DrChat, @antimora, @dae, @nathanielsimard
Thanks
Thanks to all aforementioned contributors and to our sponsors @smallstepman, @0x0177b11f and @premAI-io.
v0.9.0
Burn v0.9.0 sees the addition of the Burn Book, a new model repository, and many new operations and optimizations.
Burn Book
The Burn Book is available at https://burn-rs.github.io/book/
- Burn Book setup and plan @nathanielsimard @wdoppenberg @antimora
- Motivation & Getting started @louisfd @nathanielsimard
- Basic Workflow: from training to inference @nathanielsimard @louisfd
- Building blocks @nathanielsimard
- ONNX models @antimora
- Advanced sections @nathanielsimard
Model repository
The Model repository is available at https://github.com/burn-rs/models
- Setup @nathanielsimard
- Add SqueezeNet @antimora
- Multiple models made with Burn @Gadersd
  - Llama 2
  - Whisper
  - Stable Diffusion v1.4
Changes to Burn
Neural networks
- Three new optimizers
  - AdamW @wdoppenberg
  - AdaGrad @CohenAriel
  - RMSProp @AuruTus
- Custom initializer for transformer-related modules @wbrickner
- Cross Entropy with label smoothing and weights @ArvidHammarlund
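To make the label smoothing item concrete, here is a small standard-library sketch of one common formulation for a single sample: the one-hot target is mixed with a uniform distribution over the K classes, q_i = (1 - alpha) * onehot_i + alpha / K, and the loss is the cross-entropy between q and the softmax of the logits. The function name is hypothetical, class weights are left out, and the exact formulation used in Burn may differ.

```rust
/// Cross-entropy for one sample with label smoothing `alpha` in [0, 1].
/// Hypothetical helper; class weights are not handled in this sketch.
pub fn smoothed_cross_entropy(logits: &[f64], target: usize, alpha: f64) -> f64 {
    let k = logits.len() as f64;
    // Numerically stable log-sum-exp: subtract the max before exponentiating.
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let log_sum_exp = logits.iter().map(|x| (x - max).exp()).sum::<f64>().ln() + max;

    logits
        .iter()
        .enumerate()
        .map(|(i, x)| {
            // Log-probability of class i under softmax.
            let log_p = x - log_sum_exp;
            // Smoothed target: mostly the true class, a little mass everywhere.
            let one_hot = if i == target { 1.0 } else { 0.0 };
            let q = (1.0 - alpha) * one_hot + alpha / k;
            -q * log_p
        })
        .sum()
}
```

With `alpha = 0.0` this reduces to the usual cross-entropy, which is a convenient sanity check.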
Tensors
- Many new operators
  - cast @trfdeer @nathanielsimard
  - clamp, clamp_min, clamp_max @antimora
  - abs @mmalczak
  - max_pool1d, max_pool with dilation @caiopiccirillo
  - adaptive_avg_pool 1d and 2d @nathanielsimard
  - conv_transpose 1d and 2d, with backward @nathanielsimard
  - Not operator @louisfd
  - Dim iterator @ArvidHammarlund
- More tests for basic tensor ops @louisfd
Training
- New training metrics @Elazrod56
  - CPU temperature and use
  - GPU temperature
  - Memory use
- Custom training and validation metric loggers @nathanielsimard
- Migration from log4rs to tracing, better integration in a GUI app @dae
- Training interruption @dae
- New custom optimize method @nathanielsimard
Backends
- WGPU backend
  - Autotune @louisfd @nathanielsimard
  - Cache optimization @agelas
  - Pseudo-random number generator @louisfd
  - Fix configs @nathanielsimard
  - Matmul optimization @louisfd
- ndarray backend
- Candle backend @louisfd
  - Support for all basic operations
  - Work in progress
Dataset
- Option for with or without replacement in dataset sampler @nathanielsimard
Import & ONNX
- Refactor, performance, tests and fixes @antimora @Luni-4 @nathanielsimard, @Gadersd
- New operators @Luni-4 @antimora @AuruTus
  - Reshape
  - Transpose
  - Binary operators
  - Concat
  - Dropout
  - Avg pool
  - Softmax
  - Conv1d, Conv2d
  - Scalar and constants
  - tanh
  - clip
Fix
- Hugging Face downloader Windows support @Macil
- Fix grad replace and autodiff backward broadcast @nathanielsimard
- Fix processed count at learning completion @dae
- Adjust some flaky tests @dae
- Ability to disable experiment logging @dae
Configuration
- Rewrite publish and checks scripts in Rust, with cargo-xtask @Luni-4 @DrChat
- Add Typos verification to checks @caiopiccirillo @antimora
- Checks for Python and venv environment @mashirooooo
- Feature flags for crates in different scenarios @dae
Documentation
- Configuration doc for vscode environment setup @caiopiccirillo
- Jupyter notebook examples @antimora
- Readme updated @louisfd
Thanks
Thanks to all aforementioned contributors and to our sponsors @smallstepman and @premAI-io.
v0.8.0
In this release, our main focus was on creating a new backend using wgpu.
We greatly appreciate the meaningful contributions made by the community across the project.
As usual, we have expanded the number of supported operations.
Changes
Tensor
- Added Max/Minimum operation @nathanielsimard
- Added average pooling 1D operation @nathanielsimard
- Added Gather/Scatter operations @nathanielsimard
- Added Mask Where operation @nathanielsimard
- Refactor index related operations @nathanielsimard
  - index, index_assign => slice, slice_assign
  - index_select, index_select_assign => select, select_assign
- New syntax sugar for transpose @wbrickner
- Added SiLU activation function @Poxxy
Dataset
- Added a dataset using Sqlite for storage. Now used to store huggingface datasets. @antimora
- New speech command audio dataset. @antimora
- Create python virtual environment for huggingface dependencies. @dengelt
Burn-Import
- Big refactor to make it easier to support new operations. @nathanielsimard
- Support bool element type. @maekawatoshiki
- Added Add operator. @Luni-4
- Added MaxPool2d operator. @Luni-4
- Parse convolution 2D config. @Luni-4
- Added sigmoid operation. @Luni-4
Backend
- New burn-wgpu backend 🔥! @nathanielsimard @louisfd
  - Tile 2D matrix multiplication
  - All operations are supported
- Improve performance of repeat with the tch backend. @nathanielsimard
Neural Networks
- Added LSTM module. @agelas
- Added GRU module. @agelas
- Better weights initialization with added support for Xavier Glorot. @louisfd
- Added MSE loss. @bioinformatist
- Cleanup padding for convolution and pooling modules. @Luni-4
- Added sinusoidal positional embedding module. @antimora
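As background for the sinusoidal positional embedding item, the sketch below computes the classic encoding from the Transformer paper for a single position: even indices hold sin(pos / 10000^(i / d_model)) and the following odd index holds the cosine of the same angle. It uses only the standard library; the function name is hypothetical and this is not the Burn module itself.

```rust
/// Returns the sinusoidal encoding vector for one position.
/// Hypothetical helper; `d_model` is the embedding size.
pub fn sinusoidal_encoding(position: usize, d_model: usize) -> Vec<f64> {
    let mut out = vec![0.0; d_model];
    for i in (0..d_model).step_by(2) {
        // Each sin/cos pair shares one frequency, 1 / 10000^(i / d_model),
        // where i is the even index of the pair.
        let angle = position as f64 / 10000_f64.powf(i as f64 / d_model as f64);
        out[i] = angle.sin();
        if i + 1 < d_model {
            out[i + 1] = angle.cos();
        }
    }
    out
}
```

Position 0 always encodes to alternating zeros and ones, since sin(0) = 0 and cos(0) = 1.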
Fix
- Deserialization of constant arrays. @nathanielsimard
- Concat backward with only one dim. @nathanielsimard
- Conv1d stride hardcoded to 1. @antimora
- Fix arange with the tch backend. @nathanielsimard
Documentation
- Improve documentation across the whole project ♥! @antimora
Thanks
Thanks to all contributors and to the sponsor @smallstepman.
v0.7.0
Serialization
Serialization has been completely revamped since the last release. Modules, Optimizers, and Learning Rate Schedulers now have an associated type, allowing them to determine the type used for serializing and deserializing their state. The solution is documented in the new architecture doc.
State can be saved with any precision, regardless of the backend in use. Precision conversion is performed during serialization and deserialization, ensuring high memory efficiency since the model is not stored twice in memory with different precisions.
All saved states can be loaded from any backend. The precision of the serialized state must be set correctly, but the element types of the backend can be anything.
Multiple (de)serialization recorders are provided:
- Default (compressed gzip with named message pack format)
- Bincode
- Compressed gzip bincode
- Pretty JSON
Users can extend the current recorder using any serde implementation.
Multiple precision settings are available:
- Half (f16, i16)
- Full (f32, i32)
- Double (f64, i64)
Users can extend the current settings using any supported number type.
Optimizer
The optimizer API has undergone a complete overhaul. It now supports the new serialization paradigm with a simplified trait definition. The learning rate is now passed as a parameter to the step method, making it easier to integrate the new learning rate scheduler. The learning rate configuration is now a part of the learner API. For more information, please refer to the documentation.
Gradient Clipping
You can now clip gradients by norm or by value. An integration is done with optimizers, and gradient clipping can be configured from optimizer configs (Adam & SGD).
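The two clipping modes can be sketched on a plain slice of gradient values. This is a standard-library illustration only; the function names are hypothetical, and in Burn the clipping is applied to tensors through the optimizer configuration rather than through free functions like these.

```rust
/// Clamps every gradient component into [-threshold, threshold].
/// `threshold` must be non-negative, otherwise `clamp` panics.
pub fn clip_by_value(grad: &mut [f32], threshold: f32) {
    for g in grad.iter_mut() {
        *g = g.clamp(-threshold, threshold);
    }
}

/// Rescales the whole gradient so that its L2 norm is at most `max_norm`.
/// The direction of the gradient is preserved.
pub fn clip_by_norm(grad: &mut [f32], max_norm: f32) {
    let norm = grad.iter().map(|g| g * g).sum::<f32>().sqrt();
    if norm > max_norm {
        let scale = max_norm / norm;
        for g in grad.iter_mut() {
            *g *= scale;
        }
    }
}
```

Clipping by value treats each component independently and can change the gradient direction, while clipping by norm only shrinks its length.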
Learning Rate Scheduler
A new trait has been introduced for creating learning rate schedulers. This trait follows a similar pattern to the Module and Optimizer APIs, utilizing an associated type that implements the Record trait for state (de)serialization.
The following learning rate schedulers are now available:
- Noam learning scheduler
- Constant learning scheduler
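For reference, the Noam schedule from the Transformer paper increases the learning rate linearly during warmup and then decays it with the inverse square root of the step: lr = factor * d_model^(-0.5) * min(step^(-0.5), step * warmup^(-1.5)). The sketch below is a standard-library version of that formula; the function name and parameter names are assumptions, and the parameters exposed by Burn's scheduler may differ.

```rust
/// Learning rate of the Noam schedule at a given step.
/// Hypothetical helper; `factor` is an overall scaling constant.
pub fn noam_lr(step: u64, d_model: f64, warmup_steps: f64, factor: f64) -> f64 {
    // Treat step 0 as step 1 to avoid dividing by zero.
    let step = step.max(1) as f64;
    // Linear warmup term and inverse square root decay term;
    // the smaller of the two is used, so they cross at `warmup_steps`.
    let warmup = step * warmup_steps.powf(-1.5);
    let decay = step.powf(-0.5);
    factor * d_model.powf(-0.5) * f64::min(decay, warmup)
}
```

The learning rate peaks at `step == warmup_steps`, where both terms are equal.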
Module
The module API has undergone changes. There is no longer a need to wrap modules with the Param struct; only the Tensor struct requires a parameter ID.
All modules can now be created with their configuration and state, eliminating the unnecessary tensor initializations during model deployment for inference.
Convolution
Significant improvements have been made to support all convolution configurations. The stride, dilation, and groups can now be set, with full support for both inference and training.
Transposed convolutions are available in the backend API but do not currently support the backward pass. Once they are fully supported for both training and inference, they will be exposed as modules.
Pooling
The implementation of the average pooling module is now available.
Transformer
The transformer decoder has been implemented, offering support for efficient inference and autoregressive decoding by leveraging layer norms, position-wise feed forward, self-attention, and cross-attention caching.
Tensor
The developer experience of the Tensor API has been improved, providing more consistent error messages across different backends for common operations. The Tensor struct now implements Display, allowing values, shape, backend information, and other useful details to be displayed in an easily readable format.
New operations
- The flatten operation
- The mask scatter operation
Torch Backend
The Torch backend now supports bf16.
ONNX
The burn-import project now has the capability to generate the required Burn code and model state from an ONNX file, enabling users to easily import pre-trained models into Burn. The code generation utilizes the end user API, allowing the generated model to be fine-tuned and trained using the learner struct.
Please note that not all operations are currently supported, and assistance from the community is highly appreciated. For more details, please refer to the burn-import repository https://github.com/burn-rs/burn/tree/main/burn-import.
Bug Fixes
- Backward pass issue when there is implicit broadcasting in add #181
Thanks 🙏
Thanks to all contributors @nathanielsimard, @antimora, @agelas, @bioinformatist, @sunny-g
Thanks to current sponsors: @smallstepman
v0.6.0
Backend API
- Almost all tensor operations now receive owned tensors instead of references, which enables backend implementations to reuse tensor-allocated memory.
- Backends now have a different type for their int tensor, with its own set of operations.
- Removed the IntegerBackend type.
- Simpler Element trait with fewer functions.
- New index-related operations (index_select, index_select_assign, index_select_dim and index_select_dim_assign).
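The benefit of receiving owned tensors can be illustrated with plain vectors. In the sketch below, which uses only the standard library and hypothetical function names, the by-reference version has to allocate a new buffer for its result, while the by-value version is free to write the result into the buffer it was given.

```rust
/// Takes the buffer by reference: the input must stay intact,
/// so a new allocation is needed for the result.
pub fn add_scalar_ref(data: &[f32], rhs: f32) -> Vec<f32> {
    data.iter().map(|x| x + rhs).collect()
}

/// Takes the buffer by value: the caller gave up ownership,
/// so the result can be written into the same allocation.
pub fn add_scalar_owned(mut data: Vec<f32>, rhs: f32) -> Vec<f32> {
    for x in data.iter_mut() {
        *x += rhs;
    }
    data
}
```

A caller that still needs the input after the operation clones it explicitly before the call, which makes the cost visible at the call site.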
Tensor API
- The Tensor struct now has a third generic parameter Kind with a default value of Float.
- There are three kinds of tensors: Float, Bool, and Int.
  - Float Tensor ⇒ Tensor<B, D> or Tensor<B, D, Float>
  - Bool Tensor ⇒ Tensor<B, D, Bool>
  - Int Tensor ⇒ Tensor<B, D, Int>
- You still don’t have to import any trait to have functions enabled, but they have an extra constraint based on the kind of tensor, so you can’t call matmul on a bool tensor. All of it with zero match or if statement, just pure zero-cost abstraction.
- The BoolTensor struct has been removed.
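The kind parameter can be pictured as a marker type that only exists at compile time. The following sketch is a simplified stand-in written with the standard library only: the backend parameter B is omitted, the tensor stores just a shape, and the `Numeric` trait is a hypothetical name for "kinds that support arithmetic". It is meant to show the mechanism, not Burn's actual definitions.

```rust
use std::marker::PhantomData;

// Marker types for the three tensor kinds. They carry no data.
pub struct Float;
pub struct Int;
pub struct Bool;

/// Hypothetical trait implemented only by kinds that support arithmetic.
pub trait Numeric {}
impl Numeric for Float {}
impl Numeric for Int {}

/// Simplified tensor: D is the number of dimensions, K the kind (Float by default).
pub struct Tensor<const D: usize, K = Float> {
    shape: [usize; D],
    kind: PhantomData<K>,
}

impl<const D: usize, K> Tensor<D, K> {
    /// Available for every kind.
    pub fn new(shape: [usize; D]) -> Self {
        Self { shape, kind: PhantomData }
    }

    pub fn shape(&self) -> [usize; D] {
        self.shape
    }
}

impl<K: Numeric> Tensor<2, K> {
    /// Only defined for numeric kinds; this sketch computes just the output shape.
    pub fn matmul(self, rhs: Tensor<2, K>) -> Tensor<2, K> {
        assert_eq!(self.shape[1], rhs.shape[0], "inner dimensions must match");
        Tensor::new([self.shape[0], rhs.shape[1]])
    }
}
```

Calling `matmul` on a `Tensor<2, Bool>` is rejected by the compiler because `Bool` does not implement `Numeric`, and since `PhantomData` is zero-sized the kind adds no runtime cost.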
Autodiff
- Not all tensors are tracked by default. You now have to call require_grad.
- The state is not always captured. Operations manually have to clone the state they need for their backward step. This results in a massive performance enhancement.
No Std
- Some Burn crates don't require std anymore, which enables them to run on any platform:
  - burn-core
  - burn-ndarray
  - burn-common
  - burn-tensor
- We have a WebAssembly demo with MNIST inference. The code is also available here with a lot of details explaining the process of compiling a model to WebAssembly.
Performance
- The Tch backend now leverages in-place operations.
- The NdArray backend now leverages in-place operations.
- The convolution and maxpooling layers in the NdArray backend have been rewritten with much better performance.
- The cross-entropy loss module leverages the new index_select operation, resulting in a big performance boost when the number of classes is high.
And of course, a lot of fixes and enhancements everywhere.
Thanks to all the contributors for their work @antimora @twitchax @h4rr9
v0.5.0
New Modules for Vision Tasks
- Conv1D, Conv2D currently without support for stride, dilation, or group convolution
- MaxPool2D
- BatchNorm2D
New General Tensor Operations
- log1p thanks to @bioinformatist
- sin, cos, tanh thanks to @makroiss
Breaking Changes
- Devices are now passed by reference, thanks to feedback from @djdisodo.
- The shape function now returns an owned struct, and backends no longer need to cache each shape.
v0.4.0
Bump versions (#141)
v0.3.0
- Separated backend crates