Highlights
- Metal GPU backend expansion — new MetalIndexIVFFlat with IVF scan/merge kernels and expanded top-k support (#5202); Metal now enabled by default on Apple Silicon machines (#5280)
- TurboQuant in ScalarQuantizer — full Algorithm 2 (QJL stage) with SIMD and optimizations (#5170)
- Sapphire Rapids (SPR) optimizations — ScalarQuantizer L2/IP (#5173), VPOPCNTDQ-based HammingComputer (#5183), and VPOPCNTDQ-based RaBitQ kernel (#5149)
- faiss-gpu pip wheels — new GPU wheel packaging (#5131); musllinux wheels re-enabled for faiss-cpu (#5299)
- SVS static vamana support (#5224)
- HNSW is_similarity mode for IP/similarity metrics (#5246, #5226)
- HNSW search performance — reserve VisitedTableSet capacity to avoid rehashes (#5290), search_from_candidate_unbounded templatized for VisitedTable devirtualization (#5270), and runtime checks avoided in VisitedTable (#5234)
- NEON FINE_SIZE=2 specializations for Index2LevelDecoderImpl and IndexPQDecoder (#5255)
- cuVS upgraded to 26.06 (#5240); CI updated to ROCm 7 (#5196)
- Deserialization hardening — bool-field validation (#5279) and null inner-index rejection in IDMap / BinaryFromFloat (#5239)
- Build robustness — fix non-AVX2 import SIGILL via .rodata partitioning tables (#5298), Windows ARM64 (MSVC) NEON build fix (#5274), and AVX512_SPR dispatch fix on AMD (#5281)
Full Changelog
Added
- 261d8f2 Add IVFSQTurboQSearchParameters to init.pyi stub (#5304)
- 684a32d Add manual Faiss nightly workflow dispatch (#5300)
- 10b6b2a Add MetalIndexIVFFlat with IVF scan/merge kernels and expanded top-k support (#5202)
- 74d3619 TurboQuant in ScalarQuantizer: full Algorithm 2 (QJL stage) with SIMD and optimizations (#5170)
- 5066009 Add Python HNSW tutorial (#5260)
- 3cff0f4 Add Sapphire Rapids optimizations for ScalarQuantizer (L2, IP) (#5173)
- 46ef80d Support IndexIDMap/IndexIDMap2 in reverse_index_factory (#5266)
- 420158b Add VPOPCNTDQ-based HammingComputer for Sapphire Rapids+ (#5183)
- d8f1c27 Implement NEON-based FINE_SIZE=2 specializations for Index2LevelDecoderImpl and IndexPQDecoder (#5262)
- 0951b53 Support user provided blas library (#5189)
- 215740e SVS static vamana support (#5224)
- d24ad6e Add is_similarity mode to HNSW (#5246)
- aa332cd Implement NEON-based FINE_SIZE=2 specializations for Index2LevelDecoderImpl and IndexPQDecoder (#5255)
- 5d9aae6 Add faiss-gpu pip wheel packaging (#5131)
- 6f1cf64 Add VPOPCNTDQ-based RaBitQ kernel for Sapphire Rapids+ (#5149)
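
For readers unfamiliar with what a Hamming computer evaluates, the sketch below shows the underlying operation in plain Python: XOR the two binary codes, then count the set bits (population count). It does not use the Faiss API or reflect the new kernels; the function name `hamming_distance` is hypothetical and chosen only for illustration.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Hamming distance between two equal-length binary codes."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    # XOR leaves a 1 bit wherever the two codes differ; counting those
    # bits (a population count) gives the Hamming distance.
    x = int.from_bytes(a, "little") ^ int.from_bytes(b, "little")
    return bin(x).count("1")


code_a = bytes([0b10110000, 0b00001111])
code_b = bytes([0b10010000, 0b11111111])
print(hamming_distance(code_a, code_b))  # 1 differing bit + 4 differing bits = 5
```

A hardware population-count instruction performs the bit-counting step over many words at once, which is presumably where the speedup of a VPOPCNTDQ-based kernel comes from.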
Changed
- 262fc3c Re-enable musllinux wheels for faiss-cpu (#5299)
- 1cdc370 Run CI on push to main to refresh ccache cache (#5291)
- 379ee75 Reserve VisitedTableSet capacity to avoid rehashes during HNSW search (#5290)
- 6513a24 Enable Metal by default on Apple machines (#5280)
- fe46c3c Validate bool fields during deserialization (#5279)
- 480f917 Type imbalance_factor and wire the .pyi stub into the buck build (#5269)
- e60baeb Templatize search_from_candidate_unbounded for VisitedTable devirtualization (#5270)
- c000190 Accelerate ScalarQuantizer::QT_bf16 with AVX512-BF16. (#4889)
- 7504fc8 Upgrade CUVS Version to 26.06 (#5240)
- d12683c facebook-unused-include-check in IndexBinaryIVF.cpp (#5263)
- 99d9013 facebook-unused-include-check in distances_simd.cpp (#5264)
- 2f0368b facebook-unused-include-check in hamming_avx2.cpp (#5265)
- 0c72755 Remove unused include of platform_macros.h in partitioning.cpp
- e6b8f6d IndexHNSW: use HNSW::is_similarity for IP/similarity metrics + tests (#5226)
- c0575f2 avoid runtime checks in VisitedTable (#5234)
- 1cb7601 Eliminate per-code denormalization in uniform SQ distance computation (#5166)
- f29d862 Revert D106693266: Implement NEON-based FINE_SIZE=2 specializations for Index2LevelDecoderImpl and IndexPQDecoder
- ef96e3d Updating CI to ROCm 7 (#5196)
- f5217d7 facebook-unused-include-check in IndexBinaryHNSW.cpp (#5251)
- 3e6ed99 facebook-unused-include-check in IndexBinaryIVF.cpp (#5252)
- 108868b Reject null inner index in IDMap and BinaryFromFloat deserialization (#5239)
- a64b549 Add IndexLattice r2 limit to cap decode-cache build cost (#5238)
- 0993715 faiss: Replace remaining get_single_code calls with ScopedCodes (#5248)
- 910e435 facebook-unused-include-check in hamming_avx2.cpp (#5242)
- d581f2f Use per-SIMD TU scan for standalone PQ (AVX2 gather inlining) (#5233)
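
As a rough illustration of the idea behind validating bool fields during deserialization (#5279), the sketch below rejects any serialized bool whose byte is neither 0 nor 1. It is not Faiss's C++ implementation or on-disk format: the one-byte encoding and the helper name `read_bool` are assumptions made for this example.

```python
import io
import struct


def read_bool(stream) -> bool:
    """Read one byte and accept it as a bool only if it is exactly 0 or 1.

    Hypothetical helper; the one-byte encoding is an assumption.
    """
    raw = stream.read(1)
    if len(raw) != 1:
        raise ValueError("unexpected end of stream while reading a bool")
    (value,) = struct.unpack("<B", raw)
    if value not in (0, 1):
        # A corrupted or hostile file could carry any byte value here.
        raise ValueError(f"invalid bool byte: {value}")
    return value == 1


good = io.BytesIO(bytes([1, 0]))
print(read_bool(good), read_bool(good))  # True False
```

Failing early on an out-of-range byte keeps a malformed file from producing an object whose flags are in an undefined state.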
Fixed
- 20afed0 make intentional cudaGetLastError() error-clears explicit ((void)) for clang21 -- fixes S674096 (#5302)
- e420e94 Move partitioning shifts tables to .rodata to fix non-AVX2 import SIGILL (#5298)
- 15bd823 Fix cuVS nightly (#5273)
- 17cc967 Fix AVX512_SPR dispatch on AMD: require AVX512_FP16 CPUID (#5281)
- e69bfee Stabilize RaBitQ tests on AVX512_SPR by switching to cross-level equivalence (#5277)
- c4c6514 Stub fixes: knn torch overload, ResidualCoarseQuantizer ctor, remove duplicate I/O defs (#5283)
- eb4c1ea Fix Windows ARM64 (MSVC) build broken by NEON SIMD templatization (#5274)
- ab63238 Install missing InvertedListScannerStats.h header (#5268)
- 1405415 Fix broken fbcode//faiss/tests:test_index_binary - test_replicas (test_index_binary.TestReplicasAndS (#5258)
- 34eb989 fix(python): int64 coercions for MapLong2Long + InvertedLists DOWNCAST chain (#5257)
- d492af4 faiss: initialize ids_tab to -1 in Top1BlockResultHandler::begin_multiple (#5249)
- a4e417f Fix: static SIMD dispatch falls to scalar for avx512_spr/avx512/arm_sve builds (#5057)
- 3edc6e1 Guard Panorama autovec pragmas against nvcc frontend (#5241)
- c7689c4 Fix ODR violation in ScannerMixIn by adding SIMDLevel template parameter (#5148)
- ca87f41 Open SIFT demo data in binary mode (#5213)
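
To show why binary mode matters for data like this (#5213), here is a minimal Python sketch that writes and reads vectors in the .fvecs layout used by the SIFT1M dataset: each vector is a little-endian int32 dimension followed by that many float32 values. It assumes the demo reads standard .fvecs files and does not reproduce the demo's own code; `write_fvecs` and `read_fvecs` are hypothetical helper names.

```python
import os
import struct
import tempfile


def write_fvecs(path, vectors):
    """Write vectors as: int32 dimension, then that many float32 values."""
    with open(path, "wb") as f:  # binary mode: bytes are written untouched
        for vec in vectors:
            f.write(struct.pack("<i", len(vec)))
            f.write(struct.pack(f"<{len(vec)}f", *vec))


def read_fvecs(path):
    """Read every vector from an .fvecs file into a list of lists."""
    vectors = []
    with open(path, "rb") as f:  # binary mode: no newline translation
        while True:
            header = f.read(4)
            if not header:
                break  # clean end of file
            if len(header) != 4:
                raise ValueError("truncated dimension header")
            (dim,) = struct.unpack("<i", header)
            payload = f.read(4 * dim)
            if len(payload) != 4 * dim:
                raise ValueError("truncated vector payload")
            vectors.append(list(struct.unpack(f"<{dim}f", payload)))
    return vectors


with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.fvecs")
    # A 10-dimensional vector puts the byte 0x0A (newline) in the header,
    # exactly the kind of byte a text-mode stream may rewrite.
    sample = [[float(i) for i in range(10)], [1.0, 2.5, -3.0]]
    write_fvecs(sample_path, sample)
    loaded = read_fvecs(sample_path)
    print(loaded == sample)
```

The values used are exactly representable as float32, so the round trip compares equal.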
Full diff: v1.14.2...v1.14.3