## 0.10.0 (2025-04-28)

This is a major release of the pgai Python library.
- The vectorizer now works with any Postgres database. The vectorizer no longer requires the `ai` extension. This pgai Python library release contains everything you need to run the pgai vectorizer, so it works with any cloud provider. Users who previously used the vectorizer with the extension should refer to our migration guide.
- Document support for files in S3. We now support processing binary documents and files stored in S3 buckets. (learn more)
- Support for storing embeddings in the same table as the source data. For some use cases, chunking is not required. In that case, it is often beneficial to store embeddings in a column of the source row instead of in a separate table. This is now supported with `destination_column`.
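The same-table storage described above can be sketched as a single `ai.create_vectorizer` call. This is illustrative only: the `blog` table, the `contents` and `embedding` columns, and the OpenAI embedding settings are assumptions, not part of this release note.

```sql
-- Hypothetical setup: rows are short, so chunking is skipped and each
-- row's embedding is written back into a column of the source table.
SELECT ai.create_vectorizer(
    'public.blog'::regclass,
    loading     => ai.loading_column('contents'),
    chunking    => ai.chunking_none(),
    embedding   => ai.embedding_openai('text-embedding-3-small', 768),
    destination => ai.destination_column('embedding')
);
```

Because no separate embeddings table is created, queries can read the vector straight from the `embedding` column of the source row.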
## ⚠ BREAKING CHANGES

The vectorizer now works without the `ai` extension. Users upgrading from a version of the vectorizer that used the `ai` extension should see the migration guide.
### `ai.create_vectorizer` call

The following APIs changed for the `ai.create_vectorizer` call. These changes are automatically applied to existing vectorizers, but developers should be aware of them when creating new vectorizers. At a high level, the changes include:
- The `ai.create_vectorizer` call now requires a top-level `loading` argument. This gives us more flexibility in how data is loaded into the vectorizer. For example, we can now load data from files using the `loading => loading_uri()` function.
- The destination where embeddings are stored is now configured via the `destination` top-level argument. This was done to allow us to support more types of schema design for storing embeddings. For example, we can now store embeddings in a column of a table via the `destination => ai.destination_column()` function, in addition to the previous behavior of using a separate table via the `destination => ai.destination_table()` function.
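The two new top-level arguments compose as in the following sketch, which loads external documents through a URI column and stores embeddings in a separate table. The `documents` table, `uri` column, and embedding settings are illustrative assumptions:

```sql
-- Hypothetical setup: each row of public.documents holds a URI (e.g. an
-- s3:// path); the vectorizer fetches, chunks, and embeds the file it
-- points at, writing the embeddings to a separate table.
SELECT ai.create_vectorizer(
    'public.documents'::regclass,
    loading     => ai.loading_uri(column_name => 'uri'),
    embedding   => ai.embedding_openai('text-embedding-3-small', 768),
    destination => ai.destination_table('document_embeddings')
);
```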
In particular:

- `ai.create_vectorizer` now requires a `loading =>` argument. The previous behavior is provided via the `loading => loading_column()` function.
- `ai.create_vectorizer` no longer takes `destination`, `target_table`, `target_schema`, `view_schema`, or `view_name` as arguments. Configure these options via the new `destination => ai.destination_table()` function instead.
- `ai.chunking_character_text_splitter` and `ai.chunking_recursive_character_text_splitter` no longer take a `chunk_column` argument; that column name is now provided via the `loading => loading_column()` function instead.
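For a vectorizer defined against an older release, the change looks roughly like this. Both calls are illustrative reconstructions (the old-style signature and the embedding settings are assumptions), so check the migration guide for the exact mapping:

```sql
-- Before (sketch): the chunker named the source column and the
-- destination table was passed directly.
-- SELECT ai.create_vectorizer(
--     'public.blog'::regclass,
--     destination => 'blog_embeddings',
--     chunking    => ai.chunking_character_text_splitter('contents'),
--     embedding   => ai.embedding_openai('text-embedding-3-small', 768)
-- );

-- After: the source column moves into loading_column() and the table
-- name into destination_table().
SELECT ai.create_vectorizer(
    'public.blog'::regclass,
    loading     => ai.loading_column('contents'),
    chunking    => ai.chunking_character_text_splitter(),
    embedding   => ai.embedding_openai('text-embedding-3-small', 768),
    destination => ai.destination_table('blog_embeddings')
);
```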
### Other breaking changes

- truncate inputs to OpenAI (#567)
## Features

- Now contains everything to run the pgai vectorizer without the pgai extension. Use `pgai.install(DB_URL)` to install it (#580) (3fe83c6)
- add `ai.chunking_none()` to skip chunking (#575) (d84965a)
- add support for generating embeddings for external documents (#442) (c356ae8)
- truncate inputs to OpenAI (#567) (ab29dd4)
- add destination config, allows saving embeddings to source table (#582) (83631fa)
- add named vectorizers (#622) (278cdf4)
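A minimal Python sketch of the extension-free install flow listed above. The `bootstrap` wrapper and the connection string are hypothetical; `pgai.install(DB_URL)` is the only call taken from this release note.

```python
import os


def bootstrap(db_url: str) -> None:
    """Install the pgai vectorizer catalog into a plain Postgres database."""
    # Deferred import so the sketch is readable without pgai installed.
    import pgai

    # Per the release notes, this installs everything the vectorizer
    # needs; no `ai` Postgres extension is required.
    pgai.install(db_url)


# Placeholder wiring: only runs when a real DB_URL is provided.
if __name__ == "__main__" and "DB_URL" in os.environ:
    bootstrap(os.environ["DB_URL"])
```

Running this against a fresh database prepares it for `ai.create_vectorizer` calls such as the ones shown earlier.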