model : Granite Embedding support (#15641)
ModernBERT, but without head.norm, so converting and running any other ModernBERT model will currently fail. PRs with head.norm support welcome!
-
constants and tensor mappings for modern bert support, model not supported yet but working on getting conversion to work for encoder only
-
conversion now working, hf -> gguf
-
working on support, now working on building graph
-
some cleanup
-
cleanup
-
continuing
-
correct tensor shape for qkv
-
fixed tensor mappings and working on building graph
-
tensor debugging now works (via llama-eval-callback); GEGLU is now used instead of the simulated gate split with views, since it does exactly this
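As a rough illustration of what the gate split amounts to, here is a minimal pure-Python sketch of GEGLU on a single fused row. It is not the ggml kernel: the function names are hypothetical, and it assumes the first half of the fused projection is activated with exact (erf-based) GELU while the second half acts as the multiplicative gate.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def geglu(row: list[float]) -> list[float]:
    """Split a fused projection row into two halves and gate them.

    Assumption: the first half is activated and the second half is the
    multiplicative gate. Swap the halves if the model uses the other order.
    """
    if len(row) % 2 != 0:
        raise ValueError("fused row must have an even length")
    half = len(row) // 2
    up, gate = row[:half], row[half:]
    # Element-wise: gelu(up[i]) * gate[i]
    return [gelu(u) * g for u, g in zip(up, gate)]


if __name__ == "__main__":
    # A fused row of width 4 yields an output of width 2.
    print(geglu([0.0, 1.0, 2.0, 3.0]))
```

Doing the split and the activation in one op avoids creating two strided views over the same tensor, which is presumably what "simulated gate split with views" refers to.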
-
cleanup
-
cleanup
-
cleanup
-
more cleanup
-
ubatch issues: the assert that checks for equal seqs in llama-graph.cpp keeps failing when building attention; running llama-embedding with --ubatch-size 1 makes it work, but this needs to be looked into more
-
added cls token per previous modern bert attempt, still working on checking out the rest
-
fixed pre tokenizer and still working through previous pr
-
working through previous attempt, implemented more accurate conversion per previous attempt, added local sliding window attention that alternates every third layer
-
fixed pre tokenizer
-
working on swa with local and global alternating attention
-
some cleanup and now fails on build attn
-
starting to work, and some cleanup, currently failing on last layer construction in graph build
-
alternating rope implemented and modern bert graph build succeeds
-
fixed assert for equal ubatch seq
-
cleanup
-
added mask check in vocab
-
fixed alternating rope: hparams.rope_freq_base_train and hparams.rope_freq_base_train_swa were the same, so I set them to the correct values
-
reuse variable
-
removed repeat
-
the standard swa method can be used instead of adding a new enum value, LLAMA_SWA_TYPE_LOCAL
-
correct swa layer indexing: should be 0, 3, 6 ... instead of 1, 4, 7 ...
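The off-by-one in this entry can be illustrated with a small sketch. It assumes a period of 3 (implied by the 0, 3, 6 sequence); `period_layers` and its `offset` parameter are hypothetical names, not identifiers from llama.cpp. The entry does not say what the selected layers do differently from their neighbours, so the sketch only computes their indices.

```python
def period_layers(n_layer: int, period: int, offset: int = 0) -> list[int]:
    """Return the layer indices il with il % period == offset.

    offset=0 gives 0, 3, 6, ... for period=3 (the corrected indexing);
    offset=1 gives 1, 4, 7, ... (the off-by-one variant).
    """
    if period <= 0:
        raise ValueError("period must be positive")
    if not 0 <= offset < period:
        raise ValueError("offset must satisfy 0 <= offset < period")
    return [il for il in range(n_layer) if il % period == offset]


if __name__ == "__main__":
    print(period_layers(9, 3))            # corrected indexing
    print(period_layers(9, 3, offset=1))  # off-by-one indexing
```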
-
more modular hparam setting
-
replaced attn out norm with ffn_norm, and cosine similarity between hf embds and llama.cpp embds went up from 0.05 to 0.24; replaced the cacheless kv with swa todo per the previous conversion
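For reference, the comparison metric mentioned here can be computed as in the following sketch. It assumes both embeddings have already been exported as plain lists of floats of equal length; `cosine_similarity` is a hypothetical helper, not part of llama.cpp or transformers.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical stand-ins for a reference embedding and a converted one.
    reference = [0.1, -0.3, 0.5, 0.2]
    converted = [0.1, -0.2, 0.4, 0.3]
    print(cosine_similarity(reference, converted))
```

A value near 1.0 means the two implementations agree up to scale, so scores such as 0.05 or 0.24 indicate the graph still differed substantially from the reference at that point.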
-
Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update convert_hf_to_gguf_update.py
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update src/llama-vocab.cpp
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update src/llama-graph.cpp
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update src/llama-arch.cpp
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
-
removed redundant hparam set
-
enums for model sizes
-
conversion for modern-bert model supported rather than just granite-small
-
Update src/llama-model.cpp
Co-authored-by: Gabe Goodhart ghart@us.ibm.com
- Update src/llama-model.cpp
Co-authored-by: Gabe Goodhart ghart@us.ibm.com
-
fixed ordering of enum for freq_base_swa
-
fixed where I added the residual; now gives much better embeddings
-
re-added cacheless logic
-
removing whitespace
-
conversion now working for swa pattern - dense every n layers
-
modern bert put into separate src file
-
removing whitespace
-
fixed whitespace and newline errors in editorconfig job
-
Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
-
better naming convention, n_swa_pattern -> swa_period
-
reusing sliding_window_pattern key rather than making new dense_every_n_layers key, and adding writing and reading support
-
fixing pyright type-check fail
-
Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update gguf-py/gguf/gguf_writer.py
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update src/llama-hparams.h
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update src/llama-model-saver.cpp
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update src/models/modern-bert.cpp
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update src/models/modern-bert.cpp
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update src/models/modern-bert.cpp
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update gguf-py/gguf/gguf_writer.py
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update src/models/modern-bert.cpp
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update src/models/modern-bert.cpp
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update src/llama-model-loader.cpp
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update src/llama-model-loader.cpp
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update src/llama-model-loader.cpp
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
-
added descriptions in llama-model
-
fixed tensor mappings for conversion
-
Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
- Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
-
mapping name for size
-
nits
-
unused
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
Co-authored-by: Gabe Goodhart ghart@us.ibm.com
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: