github kepano/defuddle 0.9.0

one month ago

Improvements

  • Async extraction support, e.g. for X URLs
  • Generic footnote detection fallback and backref cleanup (#138, #120)
  • Substack app support
  • Better Wikidot support
  • Better heading/code/pre preservation
  • Shiki language detection for code blocks
  • Improved scoring around code blocks and bios
  • Fixed nested list indentation

Fixes

  • Fix HTML element with id="menu" breaking content extraction (#106)
  • Fix page content not being able to start with a divider (#114)
  • Fix invalid CSS selector span.leading-tight,, img (#128)
  • Fix [href*="/category"] exact selector removing legitimate page content (#131)
  • Fix .hero exact selector removing primary content on documentation landing pages (#132)
  • Fix content of <time> element being removed (#136)
  • Fix "DOMParser is not defined" error when running via defuddle/node (#137)
  • Fix content sanitization bypass via schema.org text fallback (#139)
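
To illustrate the invalid-selector fix (#128): a selector list with an empty entry, such as span.leading-tight,, img, is rejected by the DOM selector APIs. The sketch below shows one way to guard against that when a list is assembled from parts. How the stray comma arose in defuddle is not stated in these notes, so the assembly step is an assumption, and joinSelectors is a hypothetical helper, not part of defuddle.

```typescript
// Hypothetical helper (not defuddle's code): build a selector list from parts,
// dropping empty or whitespace-only entries so no ",," can appear.
function joinSelectors(selectors: string[]): string {
  return selectors
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .join(", ");
}

// An empty entry in the input no longer leaks into the combined selector.
const combined = joinSelectors(["span.leading-tight", "", "img"]);
console.log(combined); // prints "span.leading-tight, img"
```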

Security

  • Fix XSS via attribute injection in image handling
  • Sanitize HTML to prevent unsafe elements in schema text fallback (#139)

Other

  • New website (#133), playground updates, README updates