v2.4.0-rc.1
Release date: March 20, 2024
Milvus version | Python SDK version |
---|---|
2.4.0-rc.1 | 2.4.0 |
This release introduces several scenario-based features:
-
New GPU Index - CAGRA: Thanks to NVIDIA's contribution, this new GPU index offers a 10x performance boost, especially for batch searches. For details, refer to GPU Index.
-
Multi-vector and Hybrid Search: This feature enables storing vector embeddings from multiple models and conducting multi-vector searches. For details, refer to Multi-vector Search.
-
Sparse Vectors: Ideal for keyword interpretation and analysis, sparse vectors are now supported for processing in your collection. For details, refer to Sparse Vectors.
-
Grouping Search: Categorical aggregation enhances document-level recall for Retrieval-Augmented Generation (RAG) applications. For details, refer to Grouping Search.
-
Inverted Index and Fuzzy Matching: These capabilities improve keyword retrieval for scalar fields. For details, refer to Index Scalar Fields and Filtered Search.
New Features
GPU Index - CAGRA
We would like to express our sincere gratitude to the NVIDIA team for their invaluable contribution to CAGRA, a state-of-the-art (SoTA) GPU-based graph index that can be used online.
Unlike previous GPU indices, CAGRA demonstrates overwhelming superiority even in small batch queries, an area where CPU indices traditionally excel. In addition, CAGRA's performance in large batch queries and index construction speed, domains where GPU indices already shine, is truly unparalleled.
Example code can be found in example_gpu_cagra.py.
Sparse Vector (Beta)
In this release, we are introducing a new type of vector field called sparse vector. Sparse vectors are different from their dense counterparts as they tend to have several magnitude higher number of dimensions with only a handful being non-zero. This feature offers better interpretability due to its term-based nature and can be more effective in certain domains. Learned sparse models such as SPLADEv2/BGE-M3 have proven to be very useful for common first-stage ranking tasks. The main use case for this new feature in Milvus is to allow efficient approximate semantic nearest neighbor search over sparse vectors generated by neural models such as SPLADEv2/BGE-M3 and statistics models such as the BM25 algorithm. Milvus now supports effective and high-performance storage, indexing, and searching (MIPS, Maximum Inner Product Search) of sparse vectors.
Example code can be found in hello_sparse.py.
Multi Embedding & Hybrid Search
Multi-vector support is the cornerstone for applications that require multi-model data processing or a mix of dense and sparse vectors. With multi-vector support, now you can:
- Store vector embeddings generated for unstructured text, image, or audio samples from multiple models.
- Conduct ANN searches that include multiple vectors of each entity.
- Customize search strategies by assigning weights to different embedding models.
- Experiment with various embedding models to find the optimal model combination.
Multi-vector support allows storing, indexing, and applying reranking strategies to multiple vector fields of different types, such as FLOAT_VECTOR and SPARSE_FLOAT_VECTOR, in a collection. Currently, two reranking strategies are available: Reciprocal Rank Fusion (RRF) and Average Weighted Scoring. Both strategies combine the search results from different vector fields into a unified result set. The first strategy prioritizes the entities that consistently appear in the search results of different vector fields, while the other strategy assigns weights to the search results of each vector field to determine their importance in the final result set.
Example code can be found in hybrid_search.py.
Inverted Index and Fuzzy Match
In previous releases of Milvus, memory-based binary search indexes and Marisa Trie indexes were used for scalar field indexing. However, these methods were memory-intensive. The latest release of Milvus now employs the Tantivy-based inverted index, which can be applied to all numeric and string data types. This new index dramatically improves scalar query performance, reducing the query of keywords in strings by ten times. In addition, The inverted index consumes less memory, thanks to additional optimizations in data compression and Memory-mapped storage (MMap) mechanism of the internal indexing structure.
This release also supports fuzzy matches in scalar filtering using prefixes, infixes, and suffixes.
Example code can be found in inverted_index_example.py and fuzzy_match.py.
Grouping Search
You can now aggregate the search results by the values in a specific scalar field. This helps RAG applications to implement document-level recall. Consider a collection of documents, each document splits into various passages. Each passage is represented by one vector embedding and belongs to one document. To find the most relevant documents instead of scattering passages, you can include the group_by_field argument in the search() operation to group results by the document ID.
Example code can be found in example_group_by.py.
Float16 and BFloat16 Vector DataType
Machine learning and neural networks often use half-precision data types, such as Float16 and BFloat16. While these data types can improve query efficiency and reduce memory usage, they come with a tradeoff of reduced accuracy. With this release, Milvus now supports these data types for vector fields.
Example code can be found in float16_example.py and bfloat16_example.py.
Upgraded Architecture
L0 Segment
This release includes a new segment called L0 Segment, designed to record deleted data. This segment periodically compacts stored deleted records and splits them into sealed segments, reducing the number of data flushes required for small deletions and leaving a small storage footprint. With this mechanism, Milvus completely separates data compactions from data flushes, enhancing the performance of delete and upsert operations.
Refactored BulkInsert
This release also introduces improved bulk-insert logic. This allows you to import multiple files in a single bulk-insert request. With the refactored version, both the performance and stability of bulk insert have seen significant improvements. The user experience has also been enhanced, such as fine-tuned rate limiting and more user-friendly error messages. In addition, you can easily access the bulk-insert endpoints through Milvus' RESTful API.
Memory-mapped Storage
Milvus uses memory-mapped storage (MMap) to optimize its memory usage. Instead of loading file content directly into memory, this mechanism maps the file content into memory. This approach comes with a tradeoff of performance degradation. By enabling MMap for an HNSW-indexed collection on a host with 2 CPUs and 8 GB RAM, you can load 4x more data with less than 10% performance degradation.
In addition, this release also allows dynamic and fine-grained control over MMap without the need to restart Milvus.
For details, refer to MMap Storage.
Others
Milvus-CDC
Milvus-CDC is an easy-to-use companion tool to capture and synchronize incremental data between Milvus instances, allowing for easy incremental backup and disaster recovery. In this release, Milvus-CDC has improved stability, and its Change Data Capture (CDC) functionality now becomes generally available.
To learn more about Milvus-CDC, refer to GitHub repository and Milvus-CDC Overview.
Refined MilvusClient Interfaces
MilvusClient is an easy-to-use alternative to the ORM module. It adopts a purely functional approach to simplify interactions with the server. Instead of maintaining a connection pool, each MilvusClient establishes a gRPC connection to the server.
The MilvusClient module has implemented most of the functionalities of the ORM module.
To learn more about the MilvusClient module, visit pymilvus and the reference documents.