What's Changed
- Update cubek: tile matmul refactor (#4888) @louisfd
- Add ctc_loss backend trait hook + tch and cubecl impls (#4819) @antimora
- Centralize internal burn-* deps in [workspace.dependencies] (#4876) @antimora
- Update cubecl + cubek: fix matmul, reduce WASM and vector size check on strided tensors (#4874) @laggui
- Split Associated Types from Backend into BackendTypes (#4868) @skewballfox
- All reduce backward (#4873) @Charles23R
- Update/cubecl to client (#4866) @Charles23R
- Fix select_assign OOB units (#4870) @laggui
- Add linear op to ModuleOps for fused matmul+bias (#4747) @antimora
- Add burn-std::config runtime configuration with fusion logging and search optimization (#4864) @nathanielsimard
- Fix Typo in One Hot encoding class size error (#4869) @Baseng0815
- Fix fusion reduce broadcasted when multi block local might be a view (#4867) @laggui
- Add STFT/ISTFT and thread n through FFT backend trait (#4835) @antimora
- Fix burn-flex argmax NaN ordering; tighten expand; precise erf (#4859) @antimora (argmax sketch after this list)
- Fix burn-flex sum_dim reading contiguous storage on transposed input (#4861) @antimora
- Fix rustls-webpki audit (#4863) @laggui
- Add det (determinant) tensor operation (#4813) @softmaximalist (determinant sketch after this list)
- Add Blackman window function to signal module (#4842) @softmaximalist (window sketch after this list)
- Display FlexDevice as Cpu (#4857) @antimora
- Update cubecl: refactor toml config, fix autotune priority and fix persistent memory pool reset (#4858) @nathanielsimard
- Migrate default test backend from NdArray to Flex (#4854) @antimora
- Use burn-flex in docs and examples (#4841) @antimora
- Fix burn-flex to_contiguous fast path for prefix views (#4856) @antimora
- Migrate benchmarks from burn-flex to burn-backend-tests (#4853) @antimora
- Fix autotune context, remove unsafe code (#4781) @ArthurBrussee
- Override `float_mean` in cubecl backends (#4840) @laggui
- Device service usage (#4839) @nathanielsimard
- Fusion all reduce + refactor collective (#4803) @Charles23R
- Add missing dispatch overrides and native tch ops for softmax, layer_norm (#4834) @antimora
- Fix `CrossEntropyLoss` with probabilities (#4829) @laggui
- Move tensor tests from burn-flex to burn-backend-tests (#4812) @antimora
- Remove unused M param from SimpleOptimizerMapper. (#4823) @crutcher
- Forward gemm perf features and fix burn-flex SIMD flag cascade (#4826) @antimora
- Add Record<(R0,)> 1-Tuple (#4825) @crutcher
- Cleanup OptimizerAdaptor / GradAdaptor API. (#4822) @crutcher
- Prep for Group Multi Optimizers (#4818) @crutcher
- Fix clippy lints (#4820) @laggui
- Matmul selection (#4773) @nathanielsimard
- Fix conv x-backward padding_out bug (#4806) @antimora
- burn-flex: implement softmax and layer_norm backend op (#4805) @antimora
- Add `FloatInfo` for dtype-aware precision info (#4721) @antimora
- Add softmax and layer_norm backend trait hooks (#4797) @antimora
- Update bitstream-io & rustls-webpki (yanked + audit) (#4801) @laggui
- feat(burn-nn): add native LocalResponseNorm module (#4765) @jcwal1516
- Fix: make module cloning efficient for CPU devices (#4703) @antimora
- burn-flex: enable f16 tests and fix mean overflow, grid_sample and quantization (#4769) @antimora
- Seed CubeCL normal distribution test (#4791) @leohenon
- Drop burn-flex I64 debug_asserts (#4780) @antimora
- fix(vision): propagate backend features to burn-vision (#4753) @jcwal1516
- Optimize and update LU decomposition function (#4738) @softmaximalist
- Fix burn-flex attention rejecting broadcasted mask/bias (#4777) @antimora
- Fix burn-flex bool binary ops to broadcast operands (#4775) @antimora
- Add burn-flex CPU backend (#4761) @antimora
- Fix flaky initializer_normal_init test (#4766) @leohenon
- Fix unsqueeze_dims panic on duplicate sorted axes (#4764) @antimora
- fix(ndarray): grouped conv SIMD clamp + regressions (#4727) @dnvt
- Fix xtask CI renamed feature (#4763) @laggui
- Fix/fusion autotune context (#4759) @nathanielsimard
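A few of the changes above lend themselves to small illustrations. The argmax fix in #4859 concerns how NaN is ordered during the reduction; below is a minimal standalone sketch of one deterministic semantics, where NaN never outranks a real value. This policy is assumed for illustration and is not necessarily the exact rule burn-flex adopted.

```rust
/// Deterministic argmax over f32: NaN never beats a non-NaN value,
/// and ties keep the first index. These semantics are assumed here,
/// not taken from burn-flex.
fn argmax(xs: &[f32]) -> usize {
    assert!(!xs.is_empty(), "argmax of an empty slice is undefined");
    let mut best = 0;
    for i in 1..xs.len() {
        let (cur, cand) = (xs[best], xs[i]);
        // Take the candidate if the current best is NaN, or if the
        // candidate is a real value strictly greater than it.
        if cur.is_nan() || (!cand.is_nan() && cand > cur) {
            best = i;
        }
    }
    best
}
```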
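The det op in #4813 (and the LU work in #4738) rest on the standard identity det(A) = (−1)^swaps · ∏ diag(U) after LU factorization. A self-contained sketch with partial pivoting, using a plain `Vec<Vec<f64>>` in place of a tensor (the name and types are illustrative, not Burn's API):

```rust
/// det(A) via Gaussian elimination with partial pivoting:
/// det(A) = (-1)^swaps * product of U's diagonal.
fn det(mut a: Vec<Vec<f64>>) -> f64 {
    let n = a.len();
    let mut sign = 1.0;
    for k in 0..n {
        // Partial pivoting: bring the largest |entry| in column k to row k.
        let pivot = (k..n)
            .max_by(|&i, &j| a[i][k].abs().total_cmp(&a[j][k].abs()))
            .unwrap();
        if a[pivot][k] == 0.0 {
            return 0.0; // No usable pivot: the matrix is singular.
        }
        if pivot != k {
            a.swap(k, pivot);
            sign = -sign; // Every row swap flips the determinant's sign.
        }
        // Eliminate column k below the pivot.
        for i in (k + 1)..n {
            let factor = a[i][k] / a[k][k];
            for j in k..n {
                a[i][j] -= factor * a[k][j];
            }
        }
    }
    sign * (0..n).map(|i| a[i][i]).product::<f64>()
}
```

For example, `det` of `[[0, 1], [1, 0]]` takes one row swap and evaluates to `-1.0`.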
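The Blackman window added in #4842 follows the classic three-term cosine formula w[n] = 0.42 − 0.5·cos(2πn/(N−1)) + 0.08·cos(4πn/(N−1)). A minimal sketch of just the coefficients; the actual signal module presumably returns a tensor rather than a `Vec`, so this function is illustrative only:

```rust
use std::f64::consts::PI;

/// Symmetric Blackman window of the given length. Endpoints evaluate
/// to 0.0 and the midpoint to 1.0 (0.42 + 0.5 + 0.08).
fn blackman_window(size: usize) -> Vec<f64> {
    if size < 2 {
        return vec![1.0; size]; // Empty or single-sample window.
    }
    let denom = (size - 1) as f64;
    (0..size)
        .map(|n| {
            let x = 2.0 * PI * n as f64 / denom;
            0.42 - 0.5 * x.cos() + 0.08 * (2.0 * x).cos()
        })
        .collect()
}
```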
Full Changelog: v0.21.0-pre.2...v0.21.0-pre.3