github microsoft/markitdown v0.1.0

latest release: v0.1.1
8 days ago

Overview

Version 0.1.0 (previously 0.1.0a6) is a large release, bringing many improvements over the previous 0.0.2 version.

High-level changes include:

  • Organized dependencies into feature groups — install only the converters you need, or get everything with pip install markitdown[all]
  • A new plugin-based architecture, allowing third-party developers to add functionality to MarkItDown (see the sample plugin)
  • All conversions are performed in-memory — no more temporary files
  • Support for new formats including EPUB
  • Option to keep data URIs in converted Markdown
  • Option to override MIME type, extension, and charset in the command-line interface (useful when reading input from a pipe or stdin)
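
As an illustration of the stdin overrides and the data-URI option, here is a minimal Python sketch that assembles a command line for the CLI and pipes bytes into it. The helper names (build_markitdown_argv, convert_bytes) are hypothetical, and the flag spellings (--extension, --mime-type, --charset, --keep-data-uris) are assumptions that should be checked against the output of markitdown --help before relying on them.

```python
import subprocess
from typing import List, Optional


def build_markitdown_argv(
    extension: Optional[str] = None,
    mime_type: Optional[str] = None,
    charset: Optional[str] = None,
    keep_data_uris: bool = False,
) -> List[str]:
    """Assemble an argument list for the markitdown CLI reading from stdin.

    The flag names below are assumptions; confirm them with `markitdown --help`.
    """
    argv = ["markitdown"]
    if extension:
        argv += ["--extension", extension]
    if mime_type:
        argv += ["--mime-type", mime_type]
    if charset:
        argv += ["--charset", charset]
    if keep_data_uris:
        argv.append("--keep-data-uris")
    return argv


def convert_bytes(data: bytes, **hints) -> str:
    """Pipe raw bytes to the CLI on stdin and return whatever it prints.

    Requires the markitdown executable on PATH; raises CalledProcessError
    if the command exits with a non-zero status.
    """
    completed = subprocess.run(
        build_markitdown_argv(**hints),
        input=data,
        capture_output=True,
        check=True,
    )
    return completed.stdout.decode("utf-8")
```

With the CLI installed, a call such as convert_bytes(html_bytes, extension=".html", charset="utf-8") would supply the hints that a pipe cannot provide on its own, since stdin carries no file name or content type.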

Breaking changes

  • As noted above, dependencies are now organized into optional feature groups. Use pip install markitdown[all] for backward-compatible behavior.
  • convert_stream() now requires a binary file-like object (e.g., a file opened in binary mode, or an io.BytesIO object). Previous versions also accepted text file-like objects such as io.StringIO.
  • The DocumentConverter class interface has changed to read from file-like streams rather than file paths, and temporary files are no longer created. If you maintain a plugin or a custom DocumentConverter, you will likely need to update your code. If you only use the MarkItDown class or the CLI (as in these examples), you should not need to change anything.
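
For callers that still hold text streams, one possible migration path for the convert_stream() change is to re-encode the text into an io.BytesIO before handing it over. The sketch below uses only the standard library; the helper name as_binary_stream and the UTF-8 default are assumptions made for illustration and are not part of MarkItDown.

```python
import io
from typing import BinaryIO, TextIO, Union


def as_binary_stream(
    stream: Union[BinaryIO, TextIO], encoding: str = "utf-8"
) -> BinaryIO:
    """Return a binary file-like object for the given stream.

    Text streams (io.StringIO, files opened in text mode) are read from
    their current position to the end and re-encoded into an io.BytesIO.
    Binary streams are returned unchanged.
    """
    if isinstance(stream, io.TextIOBase):
        return io.BytesIO(stream.read().encode(encoding))
    return stream


# Code that used to pass an io.StringIO can wrap it first:
legacy_stream = io.StringIO("<html><body><h1>Hello</h1></body></html>")
binary_stream = as_binary_stream(legacy_stream)
```

With markitdown installed, binary_stream can then be passed to MarkItDown().convert_stream(...). The whole text is copied into memory here, which matches the in-memory approach of this release but may be worth reconsidering for very large inputs.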
     

Detailed list of contributions

New Contributors

Full Changelog: v0.0.2...v0.1.0
