Highlights
- Significant speedup for grammars with repetition, e.g., a{1, 100}. Preprocessing such an expression is now O(1) instead of O(n)
- Release the new serialization library
- Fix bugs related to max_rollback_tokens
- Fix bugs in CUDA kernels
- Refactor: migrate the grammar backend to FSMs
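To illustrate the repetition highlight, here is a minimal sketch of why a dedicated repetition expression can make preprocessing constant-size. It is not the library's implementation: `Repeat`, `expand_naive`, `compact`, and `matches` are hypothetical names chosen for this example, and the matcher only handles a repeated literal string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repeat:
    """Hypothetical node: `item` repeated between `lo` and `hi` times."""
    item: str
    lo: int
    hi: int


def expand_naive(item: str, lo: int, hi: int) -> list:
    """Expand item{lo,hi} into `lo` required and `hi - lo` optional copies.

    The result has `hi` elements, so building it costs O(n) in the upper bound.
    """
    return [(item, True)] * lo + [(item, False)] * (hi - lo)


def compact(item: str, lo: int, hi: int) -> Repeat:
    """Store only the item and the two bounds: O(1) regardless of `hi`."""
    return Repeat(item, lo, hi)


def matches(node: Repeat, text: str) -> bool:
    """Check that `text` is between `lo` and `hi` copies of a non-empty literal item."""
    count, rest = divmod(len(text), len(node.item))
    return rest == 0 and text == node.item * count and node.lo <= count <= node.hi


naive = expand_naive("a", 1, 100)   # 100 elements
node = compact("a", 1, 100)         # one node with two integer bounds
```

With the compact form, raising the upper bound from 100 to 100000 changes one integer rather than the size of the preprocessed structure; the counting work moves to match time.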
What's Changed
- [Refactor] JSON serializer and MemorySize by @DarkSharpness in #380
- [Feature] Add a new expression to represent repetition to speed up. by @Seven-Streams in #368
- [Minor] remove unnecessary debug file by @DarkSharpness in #381
- Fix document error and add docs for serialization by @Ubospica in #384
- Bind fill_next_token_bitmask against nb::ndarray by @Ahajha in #338
- [Feature] support internal check option in cmake by @DarkSharpness in #370
- [Refactor] Remove logics about max_rollback_tokens by @Ubospica in #385
- Fix and improve apply_token_bitmask benchmark script by @Jialin in #391
- Fix apply_bitmask logit for both CPU and triton versions when shape and stride doesn't match by @Jialin in #390
- Fix apply_bit_mask cuda implementation by @Jialin in #394
- [Fix]Fix the grammar_compiler. by @Seven-Streams in #395
- [Feature] Migrate the parsing backend to fsms. by @Seven-Streams in #376
- [Fix] Fix Multi-byte unicode characters in StructuralTagItem. by @Seven-Streams in #396
- Bump to v0.1.23 by @Ubospica in #399
New Contributors
Full Changelog: v0.1.22...v0.1.23