github IliasHad/edit-mind v0.13.0
Breaking Change: v0.13.0


Overview

Over the last couple of weeks, I've been working to give the Edit Mind proof of concept the features users have asked for, along with the ones I envisioned when I started this project.

In this release, I've improved the PostgreSQL database schema that stores key user-facing data such as Videos, Chats, Collections, and Projects.

Split main video indexer into stages:

Instead of one background job that handles everything, the indexer now runs as multiple stages: transcription, frame analysis, scene creation, text embedding, visual embedding, and audio embedding, with a final job that imports the video into PostgreSQL and cleans up temporary files.
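As a rough sketch, the staged pipeline could look like a chain of small jobs, each enriching a shared context for one video. The stage names follow this release; the bodies and the `index_video` helper are illustrative, not the actual implementation:

```python
from typing import Any, Callable, Dict, List

# Illustrative sketch only: each stage is a small job that enriches a shared
# context for one video. Stage names follow the release notes; the bodies
# are placeholders, not the project's real implementation.
def make_stage(name: str) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    def stage(ctx: Dict[str, Any]) -> Dict[str, Any]:
        ctx.setdefault("completed", []).append(name)  # record stage completion
        return ctx
    return stage

PIPELINE: List[Callable[[Dict[str, Any]], Dict[str, Any]]] = [
    make_stage(name)
    for name in (
        "transcription",
        "frame_analysis",
        "scene_creation",
        "text_embedding",
        "visual_embedding",
        "audio_embedding",
        "import_and_cleanup",  # final job: import into PostgreSQL, remove temp files
    )
]

def index_video(path: str) -> Dict[str, Any]:
    # In the real system each stage would run as its own background job;
    # here they simply run in sequence.
    ctx: Dict[str, Any] = {"video": path}
    for stage in PIPELINE:
        ctx = stage(ctx)
    return ctx
```

Splitting the work this way means a failure in one stage (say, visual embedding) can be retried without redoing transcription.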

New DB Models:

I have added three new database models:

  • Projects: responsible for organizing video files into a project that contains instructions used when chatting with the Edit Mind assistant. For example, I created a project for my YouTube videos and added context about what I present in them. When chatting with the assistant, I can limit the search to only the videos in that project, and the assistant has more context about the project itself.

  • Collections: auto-generated to help you organize your video library, similar to albums in the iOS Photos app. We have pre-defined collection definitions that we run over the three vector database collections. For example, you can have collections like "Happy moments", "Conversations and Talks", "Moments captured at Location X", or "Moments captured with Person X".

  • Exports: when chatting with the assistant, you can stitch the selected scenes returned by the assistant into a single video or export them as a ZIP file, and that is what the Exports model covers.
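As a loose sketch, the three models could be represented like the plain dataclasses below. The field names are assumptions for illustration; the real schema lives in PostgreSQL and will differ:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical shapes for the three new models; field names are assumptions.
@dataclass
class Project:
    name: str
    instructions: str = ""  # extra context passed to the assistant during chat
    video_ids: List[str] = field(default_factory=list)

@dataclass
class Collection:
    title: str       # e.g. "Happy moments"
    definition: str  # pre-defined rule run over the vector collections
    scene_ids: List[str] = field(default_factory=list)

@dataclass
class Export:
    chat_id: str
    scene_ids: List[str]
    kind: str = "stitch"  # "stitch" into one video, or "zip" the scene files
    output_path: Optional[str] = None
```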

Audio and Visual Embedding:

Text embedding was the base collection, but on its own it was too limited for this project, so I've added support for visual (image) and audio embedding.

For visual embedding, we extract 5 keyframes per video scene (a shorter segment of the full video, for faster processing) using FFmpeg, generate embeddings with clip-vit-base-patch32, and save them in the vector DB with metadata such as source, start and end time, and detected faces, under the same document ID as the text collection.
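A minimal sketch of the keyframe step, assuming evenly spaced sampling within each scene. The helper names and the sampling strategy are illustrative, and the CLIP embedding call itself is omitted since it requires the model weights:

```python
from typing import List

def keyframe_timestamps(start: float, end: float, count: int = 5) -> List[float]:
    # Sample at the midpoints of `count` equal sub-intervals so frames avoid
    # the exact scene boundaries. The project's real strategy may differ.
    span = end - start
    return [start + span * (i + 0.5) / count for i in range(count)]

def ffmpeg_frame_cmd(video: str, ts: float, out_png: str) -> List[str]:
    # Standard FFmpeg pattern: fast-seek to the timestamp, grab a single frame.
    return [
        "ffmpeg", "-ss", f"{ts:.3f}", "-i", video,
        "-frames:v", "1", "-q:v", "2", out_png,
    ]
```

Each extracted frame would then be fed through clip-vit-base-patch32 and the resulting vector stored alongside the scene's metadata.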

For audio embedding, we extract the audio track from the video scene using FFmpeg, generate an embedding with clap-htsat-unfused, and save it in the vector DB with the same metadata (source, start and end time, faces) and the same document ID as the text collection.
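The audio cut could be done with a standard FFmpeg invocation like the one below; the mono 48 kHz resampling is an assumption about how input is prepared for clap-htsat-unfused:

```python
from typing import List

def ffmpeg_audio_cmd(video: str, start: float, end: float, out_wav: str) -> List[str]:
    # Cut the scene's audio track to a mono WAV. The 48 kHz sample rate is an
    # assumption about what clap-htsat-unfused expects as input.
    duration = end - start
    return [
        "ffmpeg", "-ss", f"{start:.3f}", "-i", video,
        "-t", f"{duration:.3f}", "-vn", "-ac", "1", "-ar", "48000", out_wav,
    ]
```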

Improved Unknown Face Clustering:

When we pass a video frame into the face recognition plugin (built on the DeepFace package), it recognizes faces based on your personal faces library. When a face can't be recognized, we save it in the unknown-face registry and folders. I've added an improvement that compares the face in the current frame with the face in the next frame to check whether they're similar; if so, we update that face's appearances in the JSON file.

This way, the user can later label the face in the UI with less repetition.
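The frame-to-frame comparison can be sketched as a cosine similarity check over face embeddings; the threshold value and registry layout here are assumptions, not the project's actual code:

```python
import math
from typing import Dict, List

def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def merge_if_same_face(registry: Dict[str, dict], face_id: str,
                       prev_emb: List[float], cur_emb: List[float],
                       frame_ts: float, threshold: float = 0.8) -> bool:
    # If the face in the current frame matches the previous frame's face,
    # record another appearance instead of creating a new unknown face.
    # The threshold and registry layout are illustrative assumptions.
    if cosine_similarity(prev_emb, cur_emb) >= threshold:
        entry = registry.setdefault(face_id, {"appearances": []})
        entry["appearances"].append(frame_ts)
        return True
    return False
```

Merging consecutive matches means one unknown person yields one registry entry with many appearances, rather than many near-duplicate entries to label.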

Improved Text Search:

I've worked on improving text search after adding the text, visual, and audio collections. As you type in the search input, you may get suggestions such as a face name, an object, a transcription snippet, or detected text. Otherwise, we convert the search query into an embedding for each vector collection (text, visual, and audio), run the search across all three, and return the combined results.
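The combined search could be sketched as: query each collection with its own embedding of the text, then merge by keeping each document's best score. The `Searcher` signature below is an assumption, not the project's API:

```python
from typing import Callable, Dict, List, Tuple

# A searcher embeds the query for its collection and returns scored hits.
Searcher = Callable[[str], List[Tuple[str, float]]]  # query -> [(doc_id, score)]

def combined_search(query: str, searchers: Dict[str, Searcher],
                    limit: int = 10) -> List[Tuple[str, float]]:
    best: Dict[str, float] = {}
    for name, search in searchers.items():  # e.g. "text", "visual", "audio"
        for doc_id, score in search(query):
            # Keep the highest score seen for each document across collections.
            best[doc_id] = max(best.get(doc_id, 0.0), score)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:limit]
```

Taking the max per document is one simple merge strategy; a weighted sum across collections would be another reasonable choice.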

New Image Search:

Because we added the visual collection, you can now search your videos using an image. We convert the uploaded image into an image embedding, search the visual collection, and get back the exact video scenes where that image, or a similar one, appears.
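Conceptually, image search is a nearest-neighbor lookup over the stored visual embeddings. In the real system the query embedding comes from clip-vit-base-patch32 and the search runs inside the vector DB; this brute-force loop only illustrates the idea:

```python
import math
from typing import Dict, List, Tuple

def image_search(query_emb: List[float],
                 scenes: Dict[str, List[float]],
                 top_k: int = 3) -> List[Tuple[str, float]]:
    # Rank stored scene embeddings by cosine similarity to the query image.
    def cos(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a)) or 1.0
        nb = math.sqrt(sum(y * y for y in b)) or 1.0
        return dot / (na * nb)

    ranked = sorted(((sid, cos(query_emb, emb)) for sid, emb in scenes.items()),
                    key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]
```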

Improved Python Script:

I have split the Python script out of the background-jobs Docker service, so we now have three Docker services. The Python script communicates with the background-jobs service over a WebSocket.
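The messages exchanged over that WebSocket might look like small JSON payloads; the field names below are assumptions, shown only to illustrate the shape of such a protocol:

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical message shape for the Python <-> background-jobs WebSocket;
# field names are assumptions, not the project's actual protocol.
@dataclass
class JobMessage:
    job_id: str
    stage: str    # e.g. "transcription", "visual_embedding"
    status: str   # "started" | "progress" | "done" | "failed"
    progress: float = 0.0

def encode(msg: JobMessage) -> str:
    return json.dumps(asdict(msg))

def decode(raw: str) -> JobMessage:
    return JobMessage(**json.loads(raw))
```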

Also, I've refactored the Python script into smaller modular files to make it easier for new contributors and myself to develop and debug.

We're still working on getting the system to work properly. Previously, the focus was on getting the proof of concept running; now the goal is to make it work reliably, improve code quality, and implement best practices.
