Performance improvements
- Fixed an issue with SDPA attention that caused slowdowns and high memory usage on devices without flash attention support.
- Truncate excessively long input text so that maximum VRAM usage is predictable.
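
As a rough illustration of why capping input length makes memory predictable: when attention is computed without a fused kernel, the score matrix grows with the square of the sequence length. The sketch below is not the library's code; `truncate_tokens`, `attention_scores_bytes`, and every number in it are hypothetical and chosen only for illustration.

```python
def truncate_tokens(token_ids: list[int], max_tokens: int) -> list[int]:
    """Keep at most max_tokens ids so the sequence length has a hard cap.

    Hypothetical helper: the real limit and tokenization are not stated
    in these notes.
    """
    if max_tokens < 0:
        raise ValueError("max_tokens must be non-negative")
    return token_ids[:max_tokens]


def attention_scores_bytes(seq_len: int, num_heads: int, bytes_per_elem: int = 2) -> int:
    """Size of one layer's materialized attention score matrix for one sequence.

    Assumes naive attention that stores a seq_len x seq_len matrix per head;
    fused kernels such as flash attention avoid materializing it.
    """
    return num_heads * seq_len * seq_len * bytes_per_elem


# Illustrative values only.
tokens = list(range(5000))
capped = truncate_tokens(tokens, max_tokens=1024)
worst_case = attention_scores_bytes(len(capped), num_heads=16)
```

With a cap in place, the worst case depends only on the cap, not on whatever text happens to arrive.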
Accuracy improvements
- Unwrap non-math content (individual digits, etc.) that was wrapped in math tags
- Optionally truncate repetitive text
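
The two accuracy changes above could look something like the following post-processing sketch. It is only an assumption about the general idea, not the actual implementation: the `<math>...</math>` tag format, the "looks like math" heuristic, the function names, and the `max_repeats` default are all hypothetical.

```python
import re

# Assumed tag format; the notes only say "math tags".
MATH_TAG = re.compile(r"<math>(.*?)</math>", re.DOTALL)
# Assumed heuristic: content counts as math only if it contains an
# operator, a LaTeX command, or a sub/superscript marker.
MATH_SIGNAL = re.compile(r"[\\^_=+*/<>{}]")


def unwrap_non_math(text: str) -> str:
    """Remove math tags around content that shows no sign of being math."""
    def replace(match: re.Match) -> str:
        inner = match.group(1)
        return match.group(0) if MATH_SIGNAL.search(inner) else inner
    return MATH_TAG.sub(replace, text)


def truncate_repeats(text: str, max_repeats: int = 3, enabled: bool = True) -> str:
    """Cut the text off once a word repeats more than max_repeats times in a row.

    Splits on whitespace, so the original spacing is not preserved.
    """
    if not enabled:
        return text
    kept: list[str] = []
    run = 0
    for word in text.split():
        run = run + 1 if kept and word == kept[-1] else 1
        if run > max_repeats:
            break  # treat everything from here on as degenerate output
        kept.append(word)
    return " ".join(kept)
```

The `enabled` flag mirrors the "optionally" in the note: callers that want the raw output can switch the truncation off.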
What's Changed
- Optimize Inference by @tarun-menta in #377
- Unwrap math from individual digits, etc by @VikParuchuri in #378
- Dev by @VikParuchuri in #379
Full Changelog: v0.14.2...v0.14.3