github drizzle-team/drizzle-orm v0.31.0-beta

pre-release4 months ago

Breaking changes

PostgreSQL indexes API was changed

The previous Drizzle+PostgreSQL indexes API was incorrect and was not aligned with the PostgreSQL documentation. The good thing is that it was not used in queries, and drizzle-kit didn't support all properties for indexes. This means we can now change the API to the correct one and provide full support for it in drizzle-kit

Previous API

  • No way to define SQL expressions inside .on.
  • .using and .on in our case are the same thing, so the API is incorrect here.
  • .asc(), .desc(), .nullsFirst(), and .nullsLast() should be specified for each column or expression on indexes, but not on an index itself.
// Index declaration reference
index('name')
  .on(table.column1, table.column2, ...) or .onOnly(table.column1, table.column2, ...)
  .concurrently()
  .using(sql``) // sql expression
  .asc() or .desc()
  .nullsFirst() or .nullsLast()
  .where(sql``) // sql expression

Current API

// First example, with `.on()`
index('name')
  .on(table.column1.asc(), table.column2.nullsFirst(), ...) or .onOnly(table.column1.desc().nullsLast(), table.column2, ...)
  .concurrently()
  .where(sql``)
  .with({ fillfactor: '70' })

// Second Example, with `.using()`
index('name')
  .using('btree', table.column1.asc(), sql`lower(${table.column2})`, table.column1.op('text_ops'))
  .where(sql``) // sql expression
  .with({ fillfactor: '70' })

New Features

🎉 "pg_vector" extension support

There is no specific code to create an extension inside the Drizzle schema. We assume that if you are using vector types, indexes, and queries, you have a PostgreSQL database with the pg_vector extension installed.

You can now specify indexes for pg_vector and utilize pg_vector functions for querying, ordering, etc.

Let's take a few examples of pg_vector indexes from the pg_vector docs and translate them to Drizzle

L2 distance, Inner product and Cosine distance

// CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
// CREATE INDEX ON items USING hnsw (embedding vector_ip_ops);
// CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);

const table = pgTable('items', {
    embedding: vector('embedding', { dimensions: 3 })
}, (table) => ({
    l2: index('l2_index').using('hnsw', table.embedding.op('vector_l2_ops'))
    ip: index('ip_index').using('hnsw', table.embedding.op('vector_ip_ops'))
    cosine: index('cosine_index').using('hnsw', table.embedding.op('vector_cosine_ops'))
}))

L1 distance, Hamming distance and Jaccard distance - added in pg_vector 0.7.0 version

// CREATE INDEX ON items USING hnsw (embedding vector_l1_ops);
// CREATE INDEX ON items USING hnsw (embedding bit_hamming_ops);
// CREATE INDEX ON items USING hnsw (embedding bit_jaccard_ops);

const table = pgTable('table', {
    embedding: vector('embedding', { dimensions: 3 })
}, (table) => ({
    l1: index('l1_index').using('hnsw', table.embedding.op('vector_l1_ops'))
    hamming: index('hamming_index').using('hnsw', table.embedding.op('bit_hamming_ops'))
    bit: index('bit_jaccard_index').using('hnsw', table.embedding.op('bit_jaccard_ops'))
}))

For queries, you can use predefined functions for vectors or create custom ones using the SQL template operator.

You can also use the following helpers:

import { l2Distance, l1Distance, innerProduct, 
          cosineDistance, hammingDistance, jaccardDistance } from 'drizzle-orm'

l2Distance(table.column, [3, 1, 2]) // table.column <-> '[3, 1, 2]'
l1Distance(table.column, [3, 1, 2]) // table.column <+> '[3, 1, 2]'

innerProduct(table.column, [3, 1, 2]) // table.column <#> '[3, 1, 2]'
cosineDistance(table.column, [3, 1, 2]) // table.column <=> '[3, 1, 2]'

hammingDistance(table.column, '101') // table.column <~> '101'
jaccardDistance(table.column, '101') // table.column <%> '101'

If pg_vector has some other functions to use, you can replicate implimentation from existing one we have. Here is how it can be done

export function l2Distance(
  column: SQLWrapper | AnyColumn,
  value: number[] | string[] | TypedQueryBuilder<any> | string,
): SQL {
  if (is(value, TypedQueryBuilder<any>) || typeof value === 'string') {
    return sql`${column} <-> ${value}`;
  }
  return sql`${column} <-> ${JSON.stringify(value)}`;
}

Name it as you wish and change the operator. This example allows for a numbers array, strings array, string, or even a select query. Feel free to create any other type you want or even contribute and submit a PR

Examples

Let's take a few examples of pg_vector queries from the pg_vector docs and translate them to Drizzle

import { l2Distance } from 'drizzle-orm';

// SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
db.select().from(items).orderBy(l2Distance(items.embedding, [3,1,2]))

// SELECT embedding <-> '[3,1,2]' AS distance FROM items;
db.select({ distance: l2Distance(items.embedding, [3,1,2]) })

// SELECT * FROM items ORDER BY embedding <-> (SELECT embedding FROM items WHERE id = 1) LIMIT 5;
const subquery = db.select({ embedding: items.embedding }).from(items).where(eq(items.id, 1));
db.select().from(items).orderBy(l2Distance(items.embedding, subquery)).limit(5)

// SELECT (embedding <#> '[3,1,2]') * -1 AS inner_product FROM items;
db.select({ innerProduct: sql`(${maxInnerProduct(items.embedding, [3,1,2])}) * -1` }).from(items)

// and more!

Don't miss a new drizzle-orm release

NewReleases is sending notifications on new releases.