What's Changed
- Fix select_assign OOB (#4760) @nathanielsimard
- Fix `unsqueeze_dims` panic (#4755) @softmaximalist
- Fix quantization tests and flaky tolerance (#4743) @laggui
- Fix fusion scalar broadcasting in `write_output_aligned` (#4741) @laggui
- Feat/implement fusion for irfft (#4736) @Sublime12
- Fix cubecl cuda all-reduce + remove useless check in distributed server (#4720) @Charles23R
- Feat/implement fusion for rfft (#4735) @Sublime12
- update cubek & fix gemv autotune (#4726) @louisfd
- Add more checks for quantized tensor reshape (#4704) @laggui
- feat: support cross-kind tensor casting via .cast() (#4713) @antimora
- chore: Fix some clippy errors, fix quant tests (#4708) @wingertge
- Update cubek (#4714) @louisfd
- Feat/add irfft (#4719) @Sublime12
- Feat/add rfft (#4707) @Sublime12
- Make Param Sync for parallel model inference (#4701) @antimora
- Perf/burn fusion overhead (#4645) @nathanielsimard
- Split `TrainingStrategy` to decouple the `DistributedBackend` requirement (#4710) @laggui
- fix: use integer arithmetic for nearest-neighbor coordinate scaling (#4687) @wkrettek
- All reduce in backward (#4650) @Charles23R
- fix output in attention tuner (#4702) @louisfd
- Fix attention_fallback NaN for fully-masked rows (#4697) @antimora
- Add HammingWindow operator to burn-tensor (#4698) @RunjiaChen
- update cubek and cubecl (#4699) @louisfd
- Fix fusion consistency checks and binding estimation in burn-cubecl-fusion (#4695) @nathanielsimard
- Update cubek and fix vecmat autotune (#4682) @louisfd
- Ignore local tests with pre-trained weights (#4676) @laggui
- Fix dispatch when only wgpu is enabled (maps to webgpu) (#4678) @laggui
- update cubek (#4677) @louisfd
- Fix fusion kernel vector_size mismatch on f16 output writes (#4675) @AdrianEddy
- Include new vec2mat routine in matmul autotune (#4673) @louisfd
- Update cubecl & cubek revs (#4672) @laggui
- feat: add categorical sampling for tensors (#4655) @majiayu000
- chore: Update to upstream changes in cubecl (#4670) @wingertge
- Refactor backend tests to set device settings at initialization + use `Dispatch` (#4666) @laggui
- Add HannWindow operator to burn-tensor (#4631) @walkinggo
- fixup:(burn-ndarray) fix comment and tidy imports (#4668) @TsaoLun
- Fix tch int_zeros dtype in sync (#4664) @laggui
- [Breaking] Use device settings to provide output dtype (#4653) @laggui
- feat: add FID vision metric (#4644) @cong-or
- Add Adan optimizer implementation with tests (#4651) @sepcnt
- [Breaking] Add bool store dtype + remove bool elem from fusion (#4649) @laggui
- Selector/attention (#4648) @louisfd
- fix(burn-ndarray): use owned storage for native heap allocations in from_data (#4647) @TsaoLun
- add utilities fn to FusionServer (#4640) @Charles23R
- Remove int powf and make powi numeric op (#4646) @laggui
- refactor: View launch (#4639) @wingertge
- chore: Update to cubecl changes (#4630) @wingertge
- Dispatch autodiff checkpointing strategy support (#4629) @laggui
- Implement RNNT loss (#4623) @cong-or
- Remove named tensor (#4628) @laggui
- Perf: Improve fusion score (#4511) @nathanielsimard
- refactor: Vector size generic (#4624) @wingertge
- Fix function arg name inconsistencies (#4626) @softmaximalist
- Update building-blocks chapter (#4625) @softmaximalist
- Refactor/device handle (#4593) @nathanielsimard
- feat: Introduce Lanczos3 interpolation method (#4601) @ovr
- Add Gram Matrix Loss for vision tasks (#4595) @softmaximalist
- Fix fusion cumulative op inputs (#4621) @laggui
- fix: replace ValidStep with InferenceStep in training.md (#4620) @TsaoLun
- Update documentation link for burn-store (#4619) @softmaximalist
- Improve module derive + add `#[module(skip)]` attribute (#4618) @laggui
- Add HalfPrecisionAdapter for F32/F16 mixed-precision storage (#4594) @antimora
- Fix cosine scheduler record in composed scheduler (#4617) @laggui
- Update ONNX import docs for LoadStrategy and from_bytes (#4607) @antimora
- Use shape in `TensorData` (#4603) @laggui
- Update SSIM float types to f32 (#4602) @softmaximalist
- Fix `conv2d_weight_backward` w/ strided channels and unit spatial dims (via `conv_im2col_1x1`) (#4591) @laggui
- Add multi-scale SSIM for image quality assessment (#4555) @softmaximalist
- Remove `Clone` bound from `WindowsDataset` item (#4597) @laggui
- Add contributing guidelines with AI-assisted contributions policy (#4569) @antimora
- feat: Implements DISTS metric (#4574) @koreaygj
- Fix dispatch autodiff feature propagation (#4592) @laggui
Full Changelog: v0.21.0-pre.2...v0.21.0-pre.3