Highlights
- Chrome Side Panel: Chat mode with metrics bar, message queue, and improved context (full transcript + summary metadata, jump-to-latest).
- Slides: YouTube slide screenshots + OCR + transcript-aligned cards, timestamped seek, and an OCR/Transcript toggle.
- Media-aware summarization in the Side Panel: Page vs Video/Audio dropdown, automatic media preference on video sites, plus visible word count/duration.
- CLI: robust URL + media extraction with transcript-first workflows and cache-aware streaming.
Features
- Slides: extract slide screenshots + OCR for YouTube/direct video URLs in the CLI + extension (#41, thanks @philippb).
- Slides: top-of-summary slide strip with expand/collapse full-width cards, timestamps, and click-to-seek.
- Slides: slide descriptions without model calls (transcript windowing, OCR fallback) + OCR/Transcript toggle.
- Slides: stream slide extraction status/progress and show a single header progress bar (no duplicate spinners).
- Chrome Side Panel chat: stream agent replies over SSE and restore chat history from daemon cache (#33, thanks @dougvk).
- Chrome Side Panel chat: timestamped transcript context plus clickable
[mm:ss]links that seek the current media. - Summaries: when transcript timestamps are available, prompts require timestamped bullet summaries; side panel auto-links
[mm:ss]in summaries for media. - Transcripts:
--timestampsadds segment-level timings (transcriptSegments+transcriptTimedText) for YouTube, podcasts, and embedded captions. - Media-aware summarization in the Side Panel: Page vs Video/Audio dropdown, automatic media preference on video sites, plus visible word count/duration.
- CLI: transcribe local audio/video files with mtime-aware transcript cache invalidation (thanks @mvance!).
- Browser extension: add Firefox sidebar build + multi-browser config (#31, thanks @vlnd0).
- Chrome automation: add artifacts tool + REPL helpers for persistent session files (notes/JSON/CSV) and downloads.
- Chrome automation: expand navigate tool with list/switch tab support and return matching skills after navigation.
Fixes
- Prompts: ignore sponsor/ads segments in video and podcast summaries.
- Prompts: enforce no-ads/no-skipped language and italicized standout excerpts (no quotation marks).
- Media: route direct media URLs to the transcription pipeline and raise the local media limit to 2GB (#47, thanks @n0an).
- Slides: render Slide X/Y labels and parse slide markers more robustly in streaming output.
- Slides: ensure slide summary segments start with a title line when missing.
- Slides: progress updates during yt-dlp downloads and OSC progress mirrors slide extraction.
- Slides: reuse the media cache for downloaded videos (even with
--no-cache). - Slides: clear slide progress line before the finish summary to avoid stray
Slides x/youtput. - Slides: parse
Slide N/Totallabels and stabilize title/body extraction. - CLI:
--no-cachenow bypasses summary caching only; transcript/media caches still apply. - Chrome Side Panel chat: keep auto-scroll pinned while streaming when you’re already at the bottom.
- Chrome Side Panel: scope streams/state per window so other windows don’t wipe active summaries.
- Chrome Side Panel chat: support JSON agent replies with explicit SSE/JSON negotiation to avoid “stream ended” errors.
- Chrome Side Panel chat: clear streaming placeholders on errors/aborts.
- Chrome Side Panel: add inline error toast above chat composer; errors stay visible when scrolled.
- Chrome Side Panel: clear/hide the inline error toast when no message is present to avoid empty red boxes.
- Cache: include transcript timestamp requests in extract cache keys so timed summaries don’t reuse plain transcript content.
- Extract-only: remove implicit 8k cap; new
--max-extract-characters/daemonmaxExtractCharactersallow opt-in limits; resolves transcript truncation. - Automation: require userScripts (no isolated-world fallback), with improved guidance and in-panel permission notice.
- Daemon: avoid URL flow crashes when url-preference helpers are missing (ReferenceError guard).
- CLI: clear OSC progress on SIGINT/SIGTERM to avoid stuck indicators.
- Slides: detect headline-style first lines and render them as slide titles (no required
Title:markers). - YouTube: prefer English caption variants (
en-*) when selecting caption tracks.
Improvements
- Daemon: emit slides start/progress/done metadata in extended logging for easier debugging.
- Media: refactor routing helpers and size policy (#48, thanks @steipete).
- CLI: show determinate transcription progress percent when duration is known.
- CLI: theme transcription progress lines and mirror part-based progress to OSC when duration is unknown.
- CLI: show determinate OSC progress for transcription/download when totals are known.
- CLI: keep OSC progress determinate when recent percent updates are available.
- CLI: theme tweet/extraction progress lines for consistent loading indicators.
- CLI: theme file/slide spinner labels so all progress lines share the same styling.
- CLI: simplify media download labels (avoid “media, video” duplication).
- Transcription: add auto transcriber selection (default) with ONNX-first when configured +
summarize transcriber setup. - Slides: cap auto slide targets at 6 by default for long videos.
- CLI: add themed output (24-bit ANSI),
--theme, and config/env defaults for a consistent color scheme. - Cache: add media download caching with TTL/size caps + optional verification, plus
--no-media-cache. - Slides: render headline-style first lines as slide titles above the slide marker.
- Prompts: allow straight quotes and encourage 1-2 short exact quotes when relevant.
Docs
- README: 0.10.0 preview layout with clearer install flow, daemon rationale, and prominent Chrome Web Store link.
- README: document ONNX transcriber setup + auto selection.
- README/docs: add UI theme config + ONNX install hints.