Who knew that (efficiently) extracting text from PDFs was so hard?
PDF Management
- I couldn't make PDF.js run in the background, so I replaced it with a WebAssembly (Rust-based) solution
- Changed bundler from esbuild to Rollup, because wasm is much easier to integrate with Rollup
- Omnisearch imports PDFs in the background. This process uses a lot of CPU, so it spawns a maximum of `cpu.count() / 2` worker threads at once.
- Extracted text is cached, so each PDF is only processed once.
- The PDF cache is unrelated to the notes cache, so you don't need to activate the latter.
- TODO: remove deleted PDFs from the cache
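
For illustration only, here is a minimal TypeScript sketch of the two ideas above: capping concurrent extraction jobs at half the CPU count, and caching extracted text so a PDF is not processed twice. This is not Omnisearch's actual code. The names `maxWorkers`, `PdfTextCache` and `runWithLimit` are hypothetical, the use of a modification time to invalidate entries is an assumption, and `prune` is just one possible way to handle the TODO about deleted PDFs.

```typescript
// Hypothetical helper: cap the number of concurrent extraction jobs at
// half the reported CPU count, but never below one.
function maxWorkers(cpuCount: number): number {
  return Math.max(1, Math.floor(cpuCount / 2));
}

// Hypothetical cache keyed by file path. Assumption: an entry is reused
// only while the file's modification time is unchanged.
class PdfTextCache {
  private entries = new Map<string, { mtime: number; text: string }>();

  get(path: string, mtime: number): string | undefined {
    const entry = this.entries.get(path);
    return entry !== undefined && entry.mtime === mtime ? entry.text : undefined;
  }

  set(path: string, mtime: number, text: string): void {
    this.entries.set(path, { mtime, text });
  }

  // One possible approach to the TODO: drop entries whose PDF no longer exists.
  prune(existingPaths: Set<string>): void {
    Array.from(this.entries.keys()).forEach((path) => {
      if (!existingPaths.has(path)) this.entries.delete(path);
    });
  }

  get size(): number {
    return this.entries.size;
  }
}

// Run `task` over `items` with at most `limit` tasks in flight at once.
// Results are returned in the same order as `items`.
async function runWithLimit<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until the list is exhausted.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index]);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

In a setup like this, the extraction job passed to `runWithLimit` would first call `cache.get(path, mtime)` and only hand the file to a worker on a miss, storing the result with `cache.set` afterwards.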
Other changes
- Refactored the cache management system, and compressed cache data
Full Changelog: 1.6.4...1.6.5-beta.3