## 0.10.0 (2025-04-28)

This is a major release of the pgai Python library.
- The vectorizer now works with any Postgres database. The vectorizer no longer requires the `ai` extension. This pgai Python library release contains everything you need to run the pgai vectorizer, so it works with any cloud provider. Users who previously used the vectorizer with the extension should refer to our migration guide.
- Document support for files in S3. We now support processing binary documents and files stored in S3 buckets. (learn more)
- Support for storing embeddings in the same table as the source data. For some use cases, chunking is not required. In that case, it is often beneficial to store embeddings in a column of the source row instead of in a separate table. This is now supported with `destination_column`.
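The same-table storage described above can be sketched as a single `ai.create_vectorizer` call. This is illustrative only: the `blog` table, the `contents` and `embedding` columns, and the OpenAI embedding settings are assumptions, not part of this release note.

```sql
-- Hypothetical setup: rows are short, so chunking is skipped and each
-- row's embedding is written back into a column of the source table.
SELECT ai.create_vectorizer(
    'public.blog'::regclass,
    loading     => ai.loading_column('contents'),
    chunking    => ai.chunking_none(),
    embedding   => ai.embedding_openai('text-embedding-3-small', 768),
    destination => ai.destination_column('embedding')
);
```

Because no separate embeddings table is created, queries can read the vector straight from the `embedding` column of the source row.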
## ⚠ BREAKING CHANGES

The vectorizer now works without the `ai` extension. Users upgrading from a version of the vectorizer that used the `ai` extension should see the migration guide.
### `ai.create_vectorizer` call

The following APIs changed for the `ai.create_vectorizer` call. These changes are automatically applied to existing vectorizers, but developers should be aware of them when creating new vectorizers. At a high level, the changes include:
- The `ai.create_vectorizer` call now requires a top-level `loading` argument. This gives us more flexibility in how data is loaded into the vectorizer. For example, we can now load data from files using the `loading => loading_uri()` function.
- The destination where embeddings are stored is now configured via the `destination` top-level argument. This was done to allow us to support more types of schema design for storing embeddings. For example, we can now store embeddings in a column of a table via the `destination => ai.destination_column()` function, in addition to the previous behavior of using a separate table via the `destination => ai.destination_table()` function.
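The two new top-level arguments compose as in the following sketch, which loads external documents through a URI column and stores embeddings in a separate table. The `documents` table, `uri` column, and embedding settings are illustrative assumptions:

```sql
-- Hypothetical setup: each row of public.documents holds a URI (e.g. an
-- s3:// path); the vectorizer fetches, chunks, and embeds the file it
-- points at, writing the embeddings to a separate table.
SELECT ai.create_vectorizer(
    'public.documents'::regclass,
    loading     => ai.loading_uri(column_name => 'uri'),
    embedding   => ai.embedding_openai('text-embedding-3-small', 768),
    destination => ai.destination_table('document_embeddings')
);
```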
In particular:

- `ai.create_vectorizer` now requires a `loading =>` argument. The previous behavior is provided via the `loading => loading_column()` function.
- `ai.create_vectorizer` no longer takes `destination`, `target_table`, `target_schema`, `view_schema`, or `view_name` as arguments. Configure these options via the new `destination => ai.destination_table()` function instead.
- `ai.chunking_character_text_splitter` and `ai.chunking_recursive_character_text_splitter` no longer take a `chunk_column` argument; that column name is now provided via the `loading => loading_column()` function instead.
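For a vectorizer defined against an older release, the change looks roughly like this. Both calls are illustrative reconstructions (the old-style signature and the embedding settings are assumptions), so check the migration guide for the exact mapping:

```sql
-- Before (sketch): the chunker named the source column and the
-- destination table was passed directly.
-- SELECT ai.create_vectorizer(
--     'public.blog'::regclass,
--     destination => 'blog_embeddings',
--     chunking    => ai.chunking_character_text_splitter('contents'),
--     embedding   => ai.embedding_openai('text-embedding-3-small', 768)
-- );

-- After: the source column moves into loading_column() and the table
-- name into destination_table().
SELECT ai.create_vectorizer(
    'public.blog'::regclass,
    loading     => ai.loading_column('contents'),
    chunking    => ai.chunking_character_text_splitter(),
    embedding   => ai.embedding_openai('text-embedding-3-small', 768),
    destination => ai.destination_table('blog_embeddings')
);
```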
### Other breaking changes

- truncate inputs to OpenAI (#567)
## Features

- Now contains everything to run the pgai vectorizer without the pgai extension. Use `pgai.install(DB_URL)` to install it (#580) (3fe83c6)
- add `ai.chunking_none()` to skip chunking (#575) (d84965a)
- add support for generating embeddings for external documents (#442) (c356ae8)
- truncate inputs to OpenAI (#567) (ab29dd4)
- add destination config, allows saving embeddings to source table (#582) (83631fa)
- add named vectorizers (#622) (278cdf4)
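A minimal Python sketch of the extension-free install flow listed above. The `bootstrap` wrapper and the connection string are hypothetical; `pgai.install(DB_URL)` is the only call taken from this release note.

```python
import os


def bootstrap(db_url: str) -> None:
    """Install the pgai vectorizer catalog into a plain Postgres database."""
    # Deferred import so the sketch is readable without pgai installed.
    import pgai

    # Per the release notes, this installs everything the vectorizer
    # needs; no `ai` Postgres extension is required.
    pgai.install(db_url)


# Placeholder wiring: only runs when a real DB_URL is provided.
if __name__ == "__main__" and "DB_URL" in os.environ:
    bootstrap(os.environ["DB_URL"])
```

Running this against a fresh database prepares it for `ai.create_vectorizer` calls such as the ones shown earlier.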