github kepano/defuddle 0.13.0

one month ago

Breaking changes

  • defuddle/node now accepts any DOM Document (linkedom, happy-dom, JSDOM, etc.), not just JSDOM.
  • JSDOM is no longer a peer dependency. linkedom is now the recommended DOM parser.
  • Passing a raw HTML string or a JSDOM instance to defuddle/node is deprecated and will be removed in the next major version.
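One way an entry point can accept any DOM Document and still support the deprecated inputs is structural (duck) typing: strings are HTML, objects with a `window` are JSDOM-like, and everything else is treated as a Document. The sketch below is hypothetical. `classifyInput`, `resolveDocument`, `DocumentLike`, `JSDOMLike` and the `parse` callback are illustrative names, not defuddle's API or its actual implementation.

```typescript
// Hypothetical structural stand-ins; these are not defuddle's type definitions.
interface DocumentLike { documentElement: unknown; }
interface JSDOMLike { window: { document: DocumentLike }; }
type DefuddleInput = DocumentLike | JSDOMLike | string;
type InputKind = 'document' | 'jsdom' | 'html-string';

// Duck typing: a non-string input without a `window` property is treated as a Document.
export function classifyInput(input: DefuddleInput): InputKind {
  if (typeof input === 'string') return 'html-string';
  if ('window' in input) return 'jsdom';
  return 'document';
}

// Returns a Document, parsing or unwrapping the deprecated input forms with a warning.
// `parse` is a caller-supplied HTML parser (for example, one built on linkedom).
export function resolveDocument(
  input: DefuddleInput,
  parse: (html: string) => DocumentLike,
): DocumentLike {
  if (typeof input === 'string') {
    console.warn('Deprecated: parse the HTML yourself and pass the Document.');
    return parse(input);
  }
  if ('window' in input) {
    console.warn('Deprecated: pass dom.window.document instead of the JSDOM instance.');
    return input.window.document;
  }
  return input;
}
```

Keeping the parser outside the core function is what lets any DOM library supply the Document; the library only needs the Document interface, not a particular implementation.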

Recommended usage

```typescript
import { parseHTML } from 'linkedom';
import { Defuddle } from 'defuddle/node';

// `html` holds the page source as a string (for example, from fetch)
const { document } = parseHTML(html);
const result = await Defuddle(document, 'https://example.com/article');
```

Passing a JSDOM instance still works but is deprecated:

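```typescript
import { JSDOM } from 'jsdom';
import { Defuddle } from 'defuddle/node';

const dom = new JSDOM(html, { url });
// Deprecated: passing the JSDOM instance itself
// const result = await Defuddle(dom, url);
// Preferred: pass dom.window.document directly
const result = await Defuddle(dom.window.document, url);
```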

Improvements

  • Generic document support for non-HTML content (#166)
  • YouTube: Use the page's existing transcript before fetching one via the API
  • YouTube: Improved transcript grouping, sentence merging, and cross-environment support
  • YouTube: Fix stripping of diarization (speaker) markers from auto-captions
  • Add .post-body entry point for Ghost CMS sites
  • Smarter retry for hidden content (#163)
  • CJK word count support (#158)
  • Precompile partial selector regex for faster parsing (#157)
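CJK text has no spaces between words, so a whitespace split counts a whole Chinese or Japanese sentence as one word. A common workaround, sketched below, counts each Han, Hiragana and Katakana character as one word and splits the remaining text on whitespace. This is an assumption about the general technique, not the code from #158; `countWords` is a hypothetical name, and Hangul is deliberately left to the whitespace split because Korean separates words with spaces.

```typescript
// Han, Hiragana and Katakana characters each count as one word.
const CJK_CHAR = /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}]/gu;
// A token must contain at least one letter or digit to count (skips lone punctuation).
const HAS_LETTER_OR_DIGIT = /[\p{L}\p{N}]/u;

export function countWords(text: string): number {
  const cjkMatches = text.match(CJK_CHAR);
  const cjkCount = cjkMatches ? cjkMatches.length : 0;
  // Replace CJK characters with spaces so neighbouring Latin words stay separate.
  const rest = text.replace(CJK_CHAR, ' ');
  const otherCount = rest
    .split(/\s+/)
    .filter((token) => token.length > 0 && HAS_LETTER_OR_DIGIT.test(token)).length;
  return cjkCount + otherCount;
}
```

For example, `countWords('Hello 世界')` counts one Latin word plus two Han characters.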
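Precompiling means building the combined pattern once when the module loads instead of constructing a `RegExp` for every element checked. The sketch below illustrates the idea under stated assumptions: the fragment list, `PARTIAL_SELECTOR_PATTERN` and `matchesPartialSelector` are hypothetical and do not reproduce defuddle's actual selector list or the change in #157.

```typescript
// Hypothetical clutter fragments; defuddle's real list is not reproduced here.
const PARTIAL_SELECTORS: readonly string[] = ['advert', 'newsletter', 'related-posts', 'share'];

// Escape regex metacharacters so each fragment matches literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Compiled once at module load, not once per element.
// No "g" flag: a global regex would carry lastIndex between test() calls.
const PARTIAL_SELECTOR_PATTERN = new RegExp(
  PARTIAL_SELECTORS.map(escapeRegExp).join('|'),
  'i',
);

// `attributes` is, for example, an element's class and id joined with a space.
export function matchesPartialSelector(attributes: string): boolean {
  return PARTIAL_SELECTOR_PATTERN.test(attributes);
}
```

Joining the fragments into one alternation also means each attribute string is scanned once rather than once per fragment.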
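Caption tracks arrive as short timed fragments that often break mid-sentence. A minimal sketch of sentence merging, assuming each fragment has a start time and text: fragments are buffered until one ends in sentence punctuation, and the merged sentence keeps the first fragment's timestamp. The `CaptionSegment` and `Sentence` shapes and `mergeIntoSentences` are hypothetical; defuddle's own grouping may use other signals (such as pauses), which matter for auto-captions that lack punctuation.

```typescript
// Hypothetical shapes; real caption data carries more fields.
interface CaptionSegment { start: number; text: string; }
interface Sentence { start: number; text: string; }

// Sentence-ending punctuation, optionally followed by a closing quote or bracket.
const SENTENCE_END = /[.!?]["')\]]?$/;

export function mergeIntoSentences(segments: CaptionSegment[]): Sentence[] {
  const sentences: Sentence[] = [];
  let parts: string[] = [];
  let start = 0;
  for (const segment of segments) {
    const text = segment.text.trim();
    if (text === '') continue;
    // The first fragment of a sentence supplies its timestamp.
    if (parts.length === 0) start = segment.start;
    parts.push(text);
    if (SENTENCE_END.test(text)) {
      sentences.push({ start, text: parts.join(' ') });
      parts = [];
    }
  }
  // Flush a trailing fragment that never reached end punctuation.
  if (parts.length > 0) sentences.push({ start, text: parts.join(' ') });
  return sentences;
}
```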

Fixes

  • Fix syntax highlighting for Lean (#159, #160)
  • Fix filenames on Windows (#155)
  • Fix spacing between an exclamation mark and an image in Markdown output
  • Fix newlines in Verso
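Windows rejects the characters `< > : " / \ | ? *` and control characters in file names, reserves device names such as CON, NUL, COM1 and LPT1 (even with an extension), and drops trailing dots and spaces. The sketch below shows what a Windows-safe filename helper typically handles; `toWindowsSafeFilename` and its `replacement` parameter are hypothetical, and this is not the change made in #155.

```typescript
// Characters Windows forbids in file names, plus ASCII control characters.
const FORBIDDEN_CHARS = /[<>:"\/\\|?*\u0000-\u001f]/g;
// Device names reserved by Windows, with or without an extension.
const RESERVED_NAMES = /^(con|prn|aux|nul|com[1-9]|lpt[1-9])(\..*)?$/i;

export function toWindowsSafeFilename(name: string, replacement = '_'): string {
  let safe = name.replace(FORBIDDEN_CHARS, replacement);
  // Windows silently drops trailing dots and spaces, so strip them up front.
  safe = safe.replace(/[. ]+$/, '');
  // Prefix reserved device names so they become ordinary names.
  if (RESERVED_NAMES.test(safe)) safe = replacement + safe;
  // Never return an empty name.
  return safe === '' ? replacement : safe;
}
```

For example, a title like `Q&A: What now?` becomes `Q&A_ What now_`, which is valid on Windows as well as macOS and Linux.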
