6.10.0 (2024-11-18)
The MongoDB Node.js team is pleased to announce version 6.10.0 of the bson
package!
Release Notes
BSON Binary Vector Support!
The Binary
class has new helpers to assist with using the newly minted Vector sub_type
of Binary sub_type == 9
🎉! For more on how these types can be used with MongoDB take a look at How to Ingest Quantized Vectors!
Here's a summary of the API:
class Binary {
toInt8Array(): Int8Array;
toFloat32Array(): Float32Array;
toPackedBits(): Uint8Array;
static fromInt8Array(array: Int8Array): Binary;
static fromFloat32Array(array: Float32Array): Binary;
static fromPackedBits(array: Uint8Array, padding: number = 0): Binary;
}
Relatively self-explanatory: each one supports converting to and constructing from a native Javascript data type that corresponds to one of the three vector types: Int8
, Float32
, PackedBit
.
Vector Bytes Format
When a Binary is sub_type
9 the first two bytes are set to important metadata about the vector.
binary.buffer[0]
- Thedatatype
that indicates what the following bytes are.binary.buffer[1]
- Thepadding
amount, a value 0-7 that indicates how many bits to ignore in aPackedBit
vector.
Packed Bits 📦
static fromPackedBits(array: Uint8Array, padding: number = 0)
When handling packed bits, the last byte may not be entirely used. For example, a PackedBit vector = [0xFF, 0xF0]
with padding = 4
ignores those last four 0s making the bit vector logically equal to 12 ones.
F F F 0
[1111 1111 1111] // ignored: the four 0s are padding
Important
When using the fromPackedBits
method to set your padding amount to avoid inadvertently extending your bit vector.
Unpacking Bits 🧳
Packed bits get special treatment with two styles of conversion methods to suit your vector-y needs. toBits
will return individually addressable bits shifted apart into an array. fromBits
takes the same format in reverse and packs the bits into bytes.
Notice there is no argument to set the padding
. That is because it can be determined by the array's length. Recall those 12 ones from the previous example, well, the padding has to be 4 to reach a multiple of 8.
class Binary {
toBits(): Int8Array;
static fromBits(bits: ArrayLike<number>): Binary;
}
Caution
We highly encourage using ONLY these methods to interact with vector data and avoid operating directly on the byte format. Other Binary class methods (put()
, write()
read()
, and value()
) and direct access of data in a Binary's buffer
beyond the 1st index should only be used in exceptional circumstances and with extreme caution after closely consulting the BSON Vector specification.
Details to keep in mind
- A javascript engine's endianness is platform dependent whereas BSON is always in little-endian format so if viewing bytes as Float32s take care to re-order bytes as needed.
- Int8 vectors are signed bytes but
read()
always returns unsigned bytes. - The vector data begins at offset
2
.
Binary's read()
returns a view of Binary.buffer
Binary's read()
return type claimed it would return number[]
or Uint8Array
which was true in previous BSON versions that didn't always store a Uint8Array on the buffer property like Binary
does today.
read()
's length parameter did not respect the position
value allowing reading bytes beyond the data that is actually stored in the Binary. This has been corrected.
Additionally, this method returned a view in Node.js environments and a copy in Web environments. it has been fixed to always return a view.
Features
Bug Fixes
Documentation
We invite you to try the bson
library immediately, and report any issues to the NODE project.