✨ New Features
Producer-Consumer Queue: Implemented a real-time data pipeline. Scrapers now toss data onto a "conveyor belt" as they find it, allowing the system to process results instantly without waiting for an entire batch to finish.
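The "conveyor belt" is the classic producer-consumer pattern. A minimal sketch using `asyncio.Queue` — all names here (`scraper`, `processor`, the doubling stand-in for "save to DB") are illustrative, not the project's actual API:

```python
import asyncio

async def scraper(queue: asyncio.Queue, items: list) -> None:
    # Producer: push each result onto the queue as soon as it is found.
    for item in items:
        await queue.put(item)
    await queue.put(None)  # sentinel: no more results coming

async def processor(queue: asyncio.Queue, sink: list) -> None:
    # Consumer: handle each result immediately, without waiting for the batch.
    while (item := await queue.get()) is not None:
        sink.append(item * 2)  # stand-in for "process and save"

async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    sink: list = []
    # Producer and consumer run concurrently over the same queue.
    await asyncio.gather(scraper(queue, [1, 2, 3]), processor(queue, sink))
    return sink

results = asyncio.run(main())
```

The key property is that the consumer starts working after the first `put`, not after the last one.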
"Self-Aware" TUI Widgets: Completely refactored Table UI filters. Components now automatically read and apply filter settings from your config.json or CLI flags on launch—no more re-entering filters every time you open the TUI.
Configurable DB Timeouts: Added a DATABASE_TIMEOUT environment variable. You can now increase the wait time to keep the database stable during massive concurrent operations.
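For a SQLite-backed setup, the idea looks roughly like this — a hedged sketch, assuming the variable holds a timeout in seconds with a default of 30 (the project's actual default and wiring may differ):

```python
import os
import sqlite3

# Read the busy timeout (seconds) from the environment; fall back to 30.
timeout = float(os.environ.get("DATABASE_TIMEOUT", "30"))

# sqlite3 waits up to `timeout` seconds for a lock to clear before
# raising "database is locked", which buys headroom under heavy
# concurrent writes.
conn = sqlite3.connect(":memory:", timeout=timeout)
conn.close()
```

Raising the value trades a slower failure for far fewer spurious lock errors during large concurrent runs.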
Forward-Time Pagination: Corrected the logic for Timeline, Archived, and Stream endpoints to crawl forward in time. The scraper now correctly continues until it passes the newest required ID rather than stopping early.
⚡ Improvements & Optimizations
Async Stream Processing: Scraper functions were converted to async generators using yield. This allows the script to start processing and saving data immediately, drastically reducing the RAM footprint.
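The shape of that change, as a small illustrative sketch (function and field names are made up): instead of building a full list and returning it, the scraper yields each post as it arrives, so only one page is ever held in memory.

```python
import asyncio
from typing import AsyncIterator

async def scrape_pages(pages: list) -> AsyncIterator[int]:
    # Async generator: yield each post as soon as its page is fetched,
    # instead of accumulating every page into one giant list.
    for page in pages:
        await asyncio.sleep(0)  # stand-in for the network call
        for post in page:
            yield post

async def main() -> list:
    seen = []
    async for post in scrape_pages([[1, 2], [3]]):
        seen.append(post)  # process/save immediately, page by page
    return seen

posts = asyncio.run(main())
```

Peak memory now scales with one page of results rather than the whole crawl.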
Killed the "Convoy Effect": Fast workers no longer wait for slow workers to finish a "round." Each bucket now paginates independently, resulting in much higher overall throughput.
Centralized Semaphore Management: Moved semaphore handling out of nested loops and into the top-level manager. Everything is now wrapped in strict try...finally blocks to ensure locks are always released, preventing the script from hanging on network errors.
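A minimal sketch of the acquire/release discipline (the URLs and the always-failing request are stand-ins): even when every task raises, the `finally` block guarantees each permit goes back, so the pool can never deadlock on a leaked slot.

```python
import asyncio

async def fetch(sem: asyncio.Semaphore, url: str) -> None:
    await sem.acquire()
    try:
        # Stand-in for the network request, which here always fails.
        raise ConnectionError(url)
    finally:
        sem.release()  # always runs, so the permit is never leaked

async def main() -> bool:
    sem = asyncio.Semaphore(2)
    await asyncio.gather(
        *(fetch(sem, f"https://example.com/{i}") for i in range(3)),
        return_exceptions=True,  # collect errors instead of aborting
    )
    # Every permit is back despite every task erroring out.
    return not sem.locked()

released = asyncio.run(main())
```

In practice `async with sem:` gives the same guarantee more tersely; the explicit `try...finally` makes the invariant visible.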
Smooth Progress Bars: Because data is streamed, progress bars tick up smoothly in real-time instead of jumping in large, irregular chunks.
🛠️ Bug Fixes
Database Lock Serialization: Implemented a single-worker ThreadPoolExecutor for all database writes. This eliminates the "Database is locked" errors caused by multiple threads trying to write at the exact same time.
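The pattern in miniature — a self-contained sketch with an in-memory database (table and helper names are illustrative): many threads may submit writes concurrently, but a one-worker executor funnels them through a single thread, so SQLite never sees two writers racing.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# One worker = writes are serialized: no two submissions ever execute
# at the same time, eliminating "database is locked" contention.
db_executor = ThreadPoolExecutor(max_workers=1)

# check_same_thread=False lets the executor's worker thread use a
# connection created on the main thread.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE posts (id INTEGER)")

def write_post(post_id: int) -> None:
    with conn:  # one transaction per write
        conn.execute("INSERT INTO posts VALUES (?)", (post_id,))

# Callers submit from anywhere; the executor runs the writes one by one.
futures = [db_executor.submit(write_post, i) for i in range(100)]
for f in futures:
    f.result()  # surface any write errors

count = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
db_executor.shutdown()
conn.close()
```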
"Ghost Post" Safety Valve: Hardened stop conditions in scraping loops. The scraper now recognizes when it has moved past a post's timestamp, preventing infinite loops if a post was deleted from the API but exists in your local cache.
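The safety valve boils down to a second stop condition. An illustrative sketch with made-up field names and a newest-first feed: stop when the target post is found, or when the crawl has moved past its timestamp — meaning the post no longer exists upstream and waiting for its ID would loop forever.

```python
def scrape_until(posts, target_id, target_ts):
    # Walk the feed newest-to-oldest, collecting posts until we either
    # find the target or pass the moment it was created.
    collected = []
    for post in posts:
        if post["id"] == target_id:
            break  # found the post we were looking for
        if post["ts"] < target_ts:
            break  # safety valve: past where the post would have been
        collected.append(post)
    return collected

feed = [
    {"id": 5, "ts": 50},
    {"id": 4, "ts": 40},
    {"id": 2, "ts": 20},  # id 3 (ts 30) was deleted from the API
]
got = scrape_until(feed, target_id=3, target_ts=30)
```

Without the timestamp check, a deleted post that survives only in the local cache would make the ID check unreachable.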
TUI Crash Prevention: Hardened the Table filter runner to ignore data-only columns, preventing the "NoMatches" exceptions that previously crashed the interface.
Independent Field Logic: Fixed a bug where media_id was accidentally overwriting post_id in the UI by making numerical fields fully independent.
Docker Permission Hardening: Completely rewrote the Docker entrypoint scripts for robust UID/GID remapping. This ensures downloaded files are always owned by the correct user on the host system.