What's Changed
- [Fix] Postpone CUDA import to the call site by @Ubospica in #231
- [Feature] Support model vocab size being smaller than the tokenizer's by @Ubospica in #237
- [Style] Remove unused headers by @DarkSharpness in #219
- Fall back to Triton if CUDA compilation fails by @zbowling in #223
- [Feature] Build and run C++ Python tests by @DarkSharpness in #218
- [Fix] Fix missing dependency in CI by @DarkSharpness in #239
Full Changelog: v0.1.15...v0.1.16