github kepano/defuddle 0.17.0

2 days ago
  • YouTube: Improved transcript with better break points, CJK support
  • Wikipedia: New minimal extractor, keep phonetic pronunciations, detect math in tables
  • NYT: Improved extractor and additional removals
  • HN: Home page support
  • Math: Extract LaTeX from data-math attributes and from images
  • Footnotes: Fix duplicate backrefs, WordPress fixes (#237)
  • ChatGPT: Update extractor for changed DOM structure (#236)
  • General: Configurable fetch option, remove unnecessary <br> between paragraphs, remove tables with no text or media, replace custom elements with divs during standardization (#247), fix dismiss buttons surviving hidden-content retry (#234), retain ULs against overly aggressive removal
  • Removals: ieee.org, ToC content patterns, breadcrumbs, sidebar/menu checkboxes
