Breaking changes
PostgreSQL indexes API was changed
The previous Drizzle+PostgreSQL indexes API was incorrect and was not aligned with the PostgreSQL documentation. The good thing is that it was not used in queries, and drizzle-kit didn't support all properties for indexes. This means we can now change the API to the correct one and provide full support for it in drizzle-kit
Previous API
- No way to define SQL expressions inside
.on
. .using
and.on
in our case are the same thing, so the API is incorrect here..asc()
,.desc()
,.nullsFirst()
, and.nullsLast()
should be specified for each column or expression on indexes, but not on an index itself.
// Index declaration reference
index('name')
.on(table.column1, table.column2, ...) or .onOnly(table.column1, table.column2, ...)
.concurrently()
.using(sql``) // sql expression
.asc() or .desc()
.nullsFirst() or .nullsLast()
.where(sql``) // sql expression
Current API
// First example, with `.on()`
index('name')
.on(table.column1.asc(), table.column2.nullsFirst(), ...) or .onOnly(table.column1.desc().nullsLast(), table.column2, ...)
.concurrently()
.where(sql``)
.with({ fillfactor: '70' })
// Second Example, with `.using()`
index('name')
.using('btree', table.column1.asc(), sql`lower(${table.column2})`, table.column1.op('text_ops'))
.where(sql``) // sql expression
.with({ fillfactor: '70' })
New Features
🎉 "pg_vector" extension support
There is no specific code to create an extension inside the Drizzle schema. We assume that if you are using vector types, indexes, and queries, you have a PostgreSQL database with the
pg_vector
extension installed.
You can now specify indexes for pg_vector
and utilize pg_vector
functions for querying, ordering, etc.
Let's take a few examples of pg_vector
indexes from the pg_vector
docs and translate them to Drizzle
L2 distance, Inner product and Cosine distance
// CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
// CREATE INDEX ON items USING hnsw (embedding vector_ip_ops);
// CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);
const table = pgTable('items', {
embedding: vector('embedding', { dimensions: 3 })
}, (table) => ({
l2: index('l2_index').using('hnsw', table.embedding.op('vector_l2_ops'))
ip: index('ip_index').using('hnsw', table.embedding.op('vector_ip_ops'))
cosine: index('cosine_index').using('hnsw', table.embedding.op('vector_cosine_ops'))
}))
L1 distance, Hamming distance and Jaccard distance - added in pg_vector 0.7.0 version
// CREATE INDEX ON items USING hnsw (embedding vector_l1_ops);
// CREATE INDEX ON items USING hnsw (embedding bit_hamming_ops);
// CREATE INDEX ON items USING hnsw (embedding bit_jaccard_ops);
const table = pgTable('table', {
embedding: vector('embedding', { dimensions: 3 })
}, (table) => ({
l1: index('l1_index').using('hnsw', table.embedding.op('vector_l1_ops'))
hamming: index('hamming_index').using('hnsw', table.embedding.op('bit_hamming_ops'))
bit: index('bit_jaccard_index').using('hnsw', table.embedding.op('bit_jaccard_ops'))
}))
For queries, you can use predefined functions for vectors or create custom ones using the SQL template operator.
You can also use the following helpers:
import { l2Distance, l1Distance, innerProduct,
cosineDistance, hammingDistance, jaccardDistance } from 'drizzle-orm'
l2Distance(table.column, [3, 1, 2]) // table.column <-> '[3, 1, 2]'
l1Distance(table.column, [3, 1, 2]) // table.column <+> '[3, 1, 2]'
innerProduct(table.column, [3, 1, 2]) // table.column <#> '[3, 1, 2]'
cosineDistance(table.column, [3, 1, 2]) // table.column <=> '[3, 1, 2]'
hammingDistance(table.column, '101') // table.column <~> '101'
jaccardDistance(table.column, '101') // table.column <%> '101'
If pg_vector
has some other functions to use, you can replicate implimentation from existing one we have. Here is how it can be done
export function l2Distance(
column: SQLWrapper | AnyColumn,
value: number[] | string[] | TypedQueryBuilder<any> | string,
): SQL {
if (is(value, TypedQueryBuilder<any>) || typeof value === 'string') {
return sql`${column} <-> ${value}`;
}
return sql`${column} <-> ${JSON.stringify(value)}`;
}
Name it as you wish and change the operator. This example allows for a numbers array, strings array, string, or even a select query. Feel free to create any other type you want or even contribute and submit a PR
Examples
Let's take a few examples of pg_vector
queries from the pg_vector
docs and translate them to Drizzle
import { l2Distance } from 'drizzle-orm';
// SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
db.select().from(items).orderBy(l2Distance(items.embedding, [3,1,2]))
// SELECT embedding <-> '[3,1,2]' AS distance FROM items;
db.select({ distance: l2Distance(items.embedding, [3,1,2]) })
// SELECT * FROM items ORDER BY embedding <-> (SELECT embedding FROM items WHERE id = 1) LIMIT 5;
const subquery = db.select({ embedding: items.embedding }).from(items).where(eq(items.id, 1));
db.select().from(items).orderBy(l2Distance(items.embedding, subquery)).limit(5)
// SELECT (embedding <#> '[3,1,2]') * -1 AS inner_product FROM items;
db.select({ innerProduct: sql`(${maxInnerProduct(items.embedding, [3,1,2])}) * -1` }).from(items)
// and more!