drizzle-team/drizzle-orm v0.31.0-beta on GitHub

Breaking changes

PostgreSQL indexes API was changed

The previous Drizzle+PostgreSQL indexes API was incorrect and was not aligned with the PostgreSQL documentation. The good thing is that it was not used in queries, and drizzle-kit didn't support all properties for indexes. This means we can now change the API to the correct one and provide full support for it in drizzle-kit

Previous API

No way to define SQL expressions inside .on.
.using and .on in our case are the same thing, so the API is incorrect here.
.asc(), .desc(), .nullsFirst(), and .nullsLast() should be specified for each column or expression on indexes, but not on an index itself.

// Index declaration reference
index('name')
  .on(table.column1, table.column2, ...) or .onOnly(table.column1, table.column2, ...)
  .concurrently()
  .using(sql``) // sql expression
  .asc() or .desc()
  .nullsFirst() or .nullsLast()
  .where(sql``) // sql expression

Current API

// First example, with `.on()`
index('name')
  .on(table.column1.asc(), table.column2.nullsFirst(), ...) or .onOnly(table.column1.desc().nullsLast(), table.column2, ...)
  .concurrently()
  .where(sql``)
  .with({ fillfactor: '70' })

// Second Example, with `.using()`
index('name')
  .using('btree', table.column1.asc(), sql`lower(${table.column2})`, table.column1.op('text_ops'))
  .where(sql``) // sql expression
  .with({ fillfactor: '70' })

New Features

🎉 "pg_vector" extension support

There is no specific code to create an extension inside the Drizzle schema. We assume that if you are using vector types, indexes, and queries, you have a PostgreSQL database with the pg_vector extension installed.

You can now specify indexes for pg_vector and utilize pg_vector functions for querying, ordering, etc.

Let's take a few examples of pg_vector indexes from the pg_vector docs and translate them to Drizzle

L2 distance, Inner product and Cosine distance

// CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
// CREATE INDEX ON items USING hnsw (embedding vector_ip_ops);
// CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);

const table = pgTable('items', {
    embedding: vector('embedding', { dimensions: 3 })
}, (table) => ({
    l2: index('l2_index').using('hnsw', table.embedding.op('vector_l2_ops'))
    ip: index('ip_index').using('hnsw', table.embedding.op('vector_ip_ops'))
    cosine: index('cosine_index').using('hnsw', table.embedding.op('vector_cosine_ops'))
}))

L1 distance, Hamming distance and Jaccard distance - added in pg_vector 0.7.0 version

// CREATE INDEX ON items USING hnsw (embedding vector_l1_ops);
// CREATE INDEX ON items USING hnsw (embedding bit_hamming_ops);
// CREATE INDEX ON items USING hnsw (embedding bit_jaccard_ops);

const table = pgTable('table', {
    embedding: vector('embedding', { dimensions: 3 })
}, (table) => ({
    l1: index('l1_index').using('hnsw', table.embedding.op('vector_l1_ops'))
    hamming: index('hamming_index').using('hnsw', table.embedding.op('bit_hamming_ops'))
    bit: index('bit_jaccard_index').using('hnsw', table.embedding.op('bit_jaccard_ops'))
}))

For queries, you can use predefined functions for vectors or create custom ones using the SQL template operator.

You can also use the following helpers:

import { l2Distance, l1Distance, innerProduct, 
          cosineDistance, hammingDistance, jaccardDistance } from 'drizzle-orm'

l2Distance(table.column, [3, 1, 2]) // table.column <-> '[3, 1, 2]'
l1Distance(table.column, [3, 1, 2]) // table.column <+> '[3, 1, 2]'

innerProduct(table.column, [3, 1, 2]) // table.column <#> '[3, 1, 2]'
cosineDistance(table.column, [3, 1, 2]) // table.column <=> '[3, 1, 2]'

hammingDistance(table.column, '101') // table.column <~> '101'
jaccardDistance(table.column, '101') // table.column <%> '101'

If pg_vector has some other functions to use, you can replicate implimentation from existing one we have. Here is how it can be done

export function l2Distance(
  column: SQLWrapper | AnyColumn,
  value: number[] | string[] | TypedQueryBuilder<any> | string,
): SQL {
  if (is(value, TypedQueryBuilder<any>) || typeof value === 'string') {
    return sql`${column} <-> ${value}`;
  }
  return sql`${column} <-> ${JSON.stringify(value)}`;
}

Name it as you wish and change the operator. This example allows for a numbers array, strings array, string, or even a select query. Feel free to create any other type you want or even contribute and submit a PR

Examples

Let's take a few examples of pg_vector queries from the pg_vector docs and translate them to Drizzle

import { l2Distance } from 'drizzle-orm';

// SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
db.select().from(items).orderBy(l2Distance(items.embedding, [3,1,2]))

// SELECT embedding <-> '[3,1,2]' AS distance FROM items;
db.select({ distance: l2Distance(items.embedding, [3,1,2]) })

// SELECT * FROM items ORDER BY embedding <-> (SELECT embedding FROM items WHERE id = 1) LIMIT 5;
const subquery = db.select({ embedding: items.embedding }).from(items).where(eq(items.id, 1));
db.select().from(items).orderBy(l2Distance(items.embedding, subquery)).limit(5)

// SELECT (embedding <#> '[3,1,2]') * -1 AS inner_product FROM items;
db.select({ innerProduct: sql`(${maxInnerProduct(items.embedding, [3,1,2])}) * -1` }).from(items)

// and more!