github steipete/summarize v0.2.0

one month ago

Changes

  • Remove map-reduce summarization; reject inputs that exceed the model’s context window.
  • Preflight-check prompt length with a GPT tokenizer against the model’s input limit (taken from the LiteLLM catalog).
  • Reject text files over 10 MB before tokenization.
  • Reject numeric --length / --max-output-tokens values that are too small.
  • Cap requested summary length to extracted content length.
  • Skip summarization for tweets when the extracted content is already shorter than the requested length.
  • Use bird CLI for tweet extraction when available; fall back to Nitter when bird fails.
  • Improve the fetch spinner; show Firecrawl fallback status + reason.
  • Enforce a hard deadline for stalled streaming; fall back to non-streaming on streaming timeouts.
  • Preserve parentheses in URL paths.
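The input checks listed above (file-size cap, minimum flag values, token preflight with no map-reduce fallback, and length capping) can be sketched as a handful of small guards. This is a minimal illustration, not summarize's code: the constants `MAX_FILE_BYTES`, `MIN_LENGTH_CHARS`, and `MIN_OUTPUT_TOKENS`, the `InputRejected` exception, and every function name are hypothetical, and the token counter is passed in as a callable rather than tied to a specific GPT tokenizer library.

```python
from typing import Callable

# Hypothetical thresholds: the release notes say "10 MB" and "too-small"
# values without giving exact numbers, so these are placeholders.
MAX_FILE_BYTES = 10 * 1024 * 1024
MIN_LENGTH_CHARS = 50
MIN_OUTPUT_TOKENS = 16


class InputRejected(ValueError):
    """Raised when an input fails a preflight check."""


def check_file_size(size_bytes: int) -> None:
    # Cheapest check first: refuse huge files before tokenizing anything.
    if size_bytes > MAX_FILE_BYTES:
        raise InputRejected(f"file is {size_bytes} bytes; limit is {MAX_FILE_BYTES}")


def check_minimum(flag: str, value: int, minimum: int) -> None:
    # Reject numeric flag values too small to produce a useful summary.
    if value < minimum:
        raise InputRejected(f"{flag} {value} is below the minimum of {minimum}")


def check_prompt_fits(prompt: str, max_input_tokens: int,
                      count_tokens: Callable[[str], int]) -> int:
    # With map-reduce removed, an oversized prompt is rejected, not chunked.
    n = count_tokens(prompt)
    if n > max_input_tokens:
        raise InputRejected(f"prompt is {n} tokens; model accepts {max_input_tokens}")
    return n


def cap_summary_length(requested_chars: int, content_chars: int) -> int:
    # Never ask for a summary longer than the extracted content itself.
    return min(requested_chars, content_chars)
```

For example, with a whitespace splitter standing in for a real GPT tokenizer, `check_prompt_fits("summarize this page", 128_000, lambda s: len(s.split()))` returns 3, and `cap_summary_length(5_000, 1_200)` returns 1200. Ordering the byte-size check before tokenization mirrors the note that oversized text files are rejected before any token counting happens.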
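The tweet-related changes combine an ordered extractor chain (bird first, Nitter if bird fails) with a short-circuit that skips the LLM when the tweet is already short enough. The sketch below assumes each extractor is a plain callable that returns text or raises; the bird and Nitter stand-ins, `extract_with_fallback`, `summarize_tweet`, and `ExtractionError` are hypothetical, and deciding whether bird is "available" is left to whoever builds the extractor list.

```python
from typing import Callable, Sequence, Tuple

Extractor = Callable[[str], str]


class ExtractionError(Exception):
    """Raised when every extractor in the chain has failed."""


def extract_with_fallback(url: str,
                          extractors: Sequence[Tuple[str, Extractor]]) -> Tuple[str, str]:
    # Try each extractor in order (e.g. bird first, then Nitter) and collect
    # every error so the final message can explain what went wrong.
    errors = []
    for name, extract in extractors:
        try:
            return name, extract(url)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    raise ExtractionError("all extractors failed: " + "; ".join(errors))


def summarize_tweet(text: str, requested_chars: int,
                    summarize: Callable[[str, int], str]) -> str:
    # A tweet already shorter than the requested summary is returned as-is,
    # so no LLM call is made.
    if len(text) <= requested_chars:
        return text
    return summarize(text, requested_chars)
```

Returning the extractor's name alongside the text lets a caller report which path succeeded, and keeping all collected errors is one way to support the clearer Bird/Nitter error messages mentioned under Fixes.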
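A hard deadline for a stalled stream needs a read that cannot block forever. One way to get that, sketched here under assumptions, is to pump the stream from a background thread into a queue and wait on the queue with a timeout; on timeout the partial output is discarded and a non-streaming call is made instead. The stream and completion callables, both timeout defaults, and all names (`collect_stream`, `complete_with_fallback`, `StreamTimeout`) are hypothetical, and the release notes do not say whether summarize's deadline is per chunk, overall, or both; this sketch enforces both.

```python
import queue
import threading
import time
from typing import Callable, Iterable

_DONE = object()  # sentinel marking the normal end of the stream


class StreamTimeout(Exception):
    """Raised when a stream stalls or overruns its overall deadline."""


def collect_stream(make_stream: Callable[[], Iterable[str]],
                   idle_timeout: float, hard_deadline: float) -> str:
    q: queue.Queue = queue.Queue()

    def pump() -> None:
        # Runs in a background thread so a blocked read cannot hang the caller.
        try:
            for chunk in make_stream():
                q.put(chunk)
            q.put(_DONE)
        except Exception as exc:  # forward provider errors to the caller
            q.put(exc)

    threading.Thread(target=pump, daemon=True).start()
    deadline = time.monotonic() + hard_deadline
    parts = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise StreamTimeout("hard deadline exceeded")
        try:
            item = q.get(timeout=min(idle_timeout, remaining))
        except queue.Empty:
            raise StreamTimeout("no chunk received before timeout") from None
        if item is _DONE:
            return "".join(parts)
        if isinstance(item, Exception):
            raise item
        parts.append(item)


def complete_with_fallback(make_stream: Callable[[], Iterable[str]],
                           complete: Callable[[], str],
                           idle_timeout: float = 10.0,
                           hard_deadline: float = 120.0) -> str:
    # On a streaming timeout, drop partial output and retry without streaming.
    # The abandoned daemon thread is left to finish or exit with the process.
    try:
        return collect_stream(make_stream, idle_timeout, hard_deadline)
    except StreamTimeout:
        return complete()
```

Only `StreamTimeout` triggers the fallback; other provider errors propagate unchanged, matching the note's scope of falling back "on streaming timeouts".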
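The notes do not say where parentheses were being lost. A common culprit is trailing-punctuation trimming when URLs are pulled out of surrounding text, which turns a Wikipedia link such as `.../Python_(programming_language)` into `.../Python_(programming_language`. A minimal sketch of a fix for that case, assuming this was the failure mode (the function name is hypothetical), trims a closing parenthesis only when it is unbalanced.

```python
def trim_trailing_punctuation(url: str) -> str:
    # Strip sentence punctuation glued to the end of a URL, but keep a
    # closing ")" when it balances an "(" inside the URL itself.
    while url:
        last = url[-1]
        if last in ".,;:!?'\"":
            url = url[:-1]
        elif last == ")" and url.count(")") > url.count("("):
            url = url[:-1]
        else:
            break
    return url
```

So `"https://example.com/a)."` trims to `"https://example.com/a"`, while `"https://en.wikipedia.org/wiki/Python_(programming_language)"` is left intact because its parentheses balance.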

Fixes

  • Avoid Firecrawl fallback when block keywords only appear in scripts/styles.
  • Improve Bird/Nitter error messaging and install hints.
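The Firecrawl fix amounts to scanning only visible text for block-page keywords, so a word like "captcha" inside an analytics script no longer triggers a fallback. A sketch using Python's standard `html.parser` follows; the keyword list and the names `looks_blocked` and `_VisibleText` are hypothetical, since the notes do not list the keywords summarize actually checks.

```python
from html.parser import HTMLParser

# Hypothetical keyword list; the release notes do not name the real ones.
BLOCK_KEYWORDS = ("captcha", "access denied", "enable javascript")


class _VisibleText(HTMLParser):
    """Collects text data, skipping anything inside <script> or <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def looks_blocked(html: str) -> bool:
    # Only keywords in rendered text count as evidence of a block page.
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts).lower()
    return any(keyword in text for keyword in BLOCK_KEYWORDS)
```

`HTMLParser` delivers `<script>` and `<style>` bodies through `handle_data` like any other text, which is why the sketch tracks a skip depth rather than relying on the parser to drop them.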

Tests

  • Add coverage for prompt length capping, cumulative stream merge handling, and streaming timeout fallback.
  • Add live coverage for Wikipedia URLs with parentheses.
  • Add coverage for tweet summaries bypassing the LLM when short.

Docs

  • Update release checklist + document input limits and minimum length/token values.

Dev

  • Add a tokenization benchmark script.
