What's Changed
- doc warning fix by @crutcher in #4130
- Fix tch bf16 into_data by @laggui in #4142
- Update raspberry-pi-pico example to use the Pico 2, and burnpack by @BjornTheProgrammer in #4132
- Unify all_reduce `LocalCollectiveClient` operation handling. by @crutcher in #4125
- Add direct tensor snapshot retrieval API to ModuleStore by @antimora in #4131
- Fix outer-scope variable references in ONNX subgraphs (If/Loop/Scan) by @antimora in #4119
- Add removed docs for tensor equal_elem by @laggui in #4145
- Add ceil_mode support to pooling operations (MaxPool, AvgPool) by @antimora in #4112 (see the output-size sketch after this list)
- chore: Update cubecl by @wingertge in #4134
- Implement Slice iterator and utility methods. by @crutcher in #4042
- Bump peter-evans/create-pull-request from 7 to 8 by @dependabot[bot] in #4148
- Add slice_dyn, slice_assign_dyn, and slice_fill_dyn variants. by @crutcher in #4127
- Add Reshape scalar optimization and Gather scalar input support by @antimora in #4146
- Shape FromStr/ToString by @crutcher in #4143 (see the parsing sketch after this list)
- Add contiguous reindexing for non-contiguous layer indices by @antimora in #4150
- Add warmup epochs to `MetricEarlyStoppingStrategy`. (#3970) by @crutcher in #4041
- fix(onnx): Use activation function for GELU codegen instead of non-existent tensor method by @antimora in #4161
- Refactor more basic ops by @laggui in #4156
- Refactor `LocalCollectiveServer` for improved clarity and error handling by @crutcher in #4126
- Fix typo in comment for logger_task function by @crutcher in #4159
- Refactor configurable backend tests (no more testgen macros) by @laggui in #4129
- Zero-copy loading for embedded burnpack weights by @antimora in #4154
- Fix candle cuda imports by @laggui in #4171
- Backends no longer depend on `burn-tensor`, but strictly `burn-backend` by @laggui in #4169
- Chore/update cubek cubecl by @nathanielsimard in #4172
- Add ONNX CumSum operator support by @antimora in #4162
- Add backend supports_dtype by @laggui in #4155
- Fix attention shapes and out rank by @laggui in #4192
- Fix matmul & reduce execute fuse no autotune by @laggui in #4193
- Fix output dtype for argmin / argmax by @laggui in #4195
- Add `flatten_dims` method to `Shape` and refactor tensor flattening API by @crutcher in #4189
- Return slice for each dimension in shape by @laggui in #4152
- Make xtask validate run no-std checks first. by @crutcher in #4198
- Fix: CubeCL Reduce by @nathanielsimard in #4197
- Reorganize and tracing::instrument collective operations. by @crutcher in #4157
- Log running values by @Charles23R in #4199
- Remove global ONNX opset version restriction, recommend opset 16 by @antimora in #4168
- Fix dtype preservation when loading tensors in burn-store by @antimora in #4194
- Fix TchTensor::from_data bf16 by @laggui in #4203
- Perf/reduce cpu + Fix OOB by @nathanielsimard in #4204
- feat: Implicit GEMM weight gradients for convolution by @wingertge in #4182
- Fix checkpoint and summary log level by @J-F-Liu in #4201
- fix: handle 1D slope when importing prelu from onnx by @mertalev in #4205
- Zero-copy tensor loading for NdArray backend by @antimora in #4178
- Fix quantized tensor storage data length calculation by @antimora in #4180
- Fix handling scalar scan outputs in ONNX loop nodes by @antimora in #4210
- Perf/improve reduce autotuning + plane non uniform control flow check by @nathanielsimard in #4208
- Add ONNX external data support for models >2GB by @antimora in #4158
- Update/cubek by @louisfd in #4214
- Refactor: Replace `canonicalize_dim` with `expect_dim` by @crutcher in #4196
- fix: handle negative indices in onnx gather op by @mertalev in #4207
- Refactor/cube dim by @nathanielsimard in #4217
- Refactor: Consolidate shape and slice error handling into `ExpressionError` by @crutcher in #4218
- Update: CubeK by @louisfd in #4222
- feat: Accelerated convolution data gradient by @wingertge in #4220
- Fix repeat 0 times by @laggui in #4216
- Burn train api refactor by @Charles23R in #4223
- Chore/pre release 6 by @nathanielsimard in #4224
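
For context on the ceil_mode pooling change (#4112): ceil_mode only changes how the output length is rounded, so a ceil-mode pool can emit one extra, partially covered window. A minimal sketch of the standard output-size formula (kernel without dilation); this is illustrative arithmetic, not Burn's actual implementation, and real frameworks may additionally clamp the last window to start inside the input:

```rust
/// Standard pooling output-size formula. With ceil_mode, the division
/// rounds up, which can add one extra (partially padded) output position.
fn pool_output_size(input: usize, kernel: usize, stride: usize, padding: usize, ceil_mode: bool) -> usize {
    let numerator = input + 2 * padding - kernel;
    if ceil_mode {
        numerator.div_ceil(stride) + 1 // round up
    } else {
        numerator / stride + 1 // round down (the default)
    }
}

fn main() {
    // 7-wide input, kernel 2, stride 2, no padding:
    assert_eq!(pool_output_size(7, 2, 2, 0, false), 3); // floor((7-2)/2) + 1 = 3
    assert_eq!(pool_output_size(7, 2, 2, 0, true), 4);  // ceil((7-2)/2) + 1 = 4
}
```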
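Regarding Shape FromStr/ToString (#4143): the string format is not spelled out here, so the following is a hypothetical, self-contained sketch of the `FromStr`/`Display` trait pattern on a stand-in `Shape` type, assuming a `"[2, 3, 4]"` format. It is not Burn's implementation.

```rust
use std::fmt;
use std::str::FromStr;

/// Stand-in shape type for illustration; not Burn's `Shape`.
#[derive(Debug, PartialEq)]
struct Shape {
    dims: Vec<usize>,
}

impl fmt::Display for Shape {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Render as "[2, 3, 4]" (assumed format); `ToString` comes for free.
        let dims: Vec<String> = self.dims.iter().map(|d| d.to_string()).collect();
        write!(f, "[{}]", dims.join(", "))
    }
}

impl FromStr for Shape {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Accept the same assumed "[2, 3, 4]" format that Display produces.
        let inner = s.trim().trim_start_matches('[').trim_end_matches(']');
        let dims = inner
            .split(',')
            .map(|d| d.trim().parse::<usize>().map_err(|e| e.to_string()))
            .collect::<Result<Vec<_>, _>>()?;
        Ok(Shape { dims })
    }
}

fn main() {
    let shape: Shape = "[2, 3, 4]".parse().expect("valid shape string");
    assert_eq!(shape.dims, vec![2, 3, 4]);
    assert_eq!(shape.to_string(), "[2, 3, 4]");
}
```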