github jjjake/internetarchive v5.10.0
Version 5.10.0


Features and Improvements

  • Added ia download --range for partial (byte-range) downloads. The option
    requires --stdout and can be repeated; each value has the form
    [FILE:]START-END. A bare range binds to the file named on the command
    line (with bare ranges, vary either the range or the file, not both at
    once), while FILE:START-END binds each range to its own file. A range may
    be written as START-END, open-ended START-, suffix -N (the last N
    bytes), or bytes=..., and a single value may carry several
    comma-separated ranges (0-9,50-99), which are fetched in order. Segments
    are streamed back-to-back with no separator, so, for example, WARC
    records selected via a CDX index's compressed offset/length can be piped
    straight to zcat. This also enables partial fetches of private items,
    because configured credentials are used. Item.download(),
    File.download(), and the top-level internetarchive.download() gained a
    headers argument, and Item.download() gained a range_jobs argument.
    Passing a Range header is treated as an intentional partial fetch and
    disables resume and full-file checksum validation. An unsatisfiable
    range (HTTP 416) fails fast with a clear message instead of being
    retried; a range covering the whole file returns the full contents
    (HTTP 200). If any segment fails, ia download exits non-zero, so a
    downstream pipe consumer can tell the output is incomplete.
  • Added ia tasks --follow-task-log <task_id> to follow a task log live
    as the task runs (tail -f style), stopping automatically when the task
    finishes. Combine with -p lines=-N to seed the last N lines first
    (Tasks API lines semantics, as with --get-task-log); any other
    -p params are forwarded to the Tasks API. A new
    ArchiveSession.follow_task_log() method exposes the same behavior to the
    library.
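
The accepted --range value forms can be pictured as a small parser. This is a hypothetical sketch: parse_range_value() and to_range_headers() are illustrative helpers, not part of the internetarchive API. It assumes the FILE: prefix ends at the last colon and that each segment is requested with its own Range header (one reading of "fetched in order"). It is not the library's actual implementation.

```python
from __future__ import annotations

import re

# One byte-range spec: START-END, open-ended START-, or suffix -N.
_SPEC = re.compile(r"^(\d+)-(\d*)$|^-(\d+)$")


def parse_range_value(value: str) -> tuple[str | None, list[str]]:
    """Split a --range style value into (file or None, [range specs]).

    Hypothetical helper: the FILE: prefix is assumed to end at the last
    colon, and "bytes=" is accepted as an optional prefix.
    """
    file_name, sep, ranges = value.rpartition(":")
    name = file_name if sep else None  # None: a bare range for the named file
    if ranges.startswith("bytes="):
        ranges = ranges[len("bytes="):]
    specs = []
    for spec in ranges.split(","):
        spec = spec.strip()
        m = _SPEC.match(spec)
        if m is None:
            raise ValueError(f"invalid byte range: {spec!r}")
        start, end = m.group(1), m.group(2)
        # START-END must not end before it starts; START- and -N skip this.
        if start is not None and end and int(end) < int(start):
            raise ValueError(f"range ends before it starts: {spec!r}")
        specs.append(spec)
    return name, specs


def to_range_headers(specs: list[str]) -> list[dict[str, str]]:
    """Build one Range header per segment, to be requested in order.

    Assumption: putting all specs in a single header could make a server
    answer with multipart/byteranges, whose parts carry separators.
    """
    return [{"Range": f"bytes={spec}"} for spec in specs]
```

For instance, parse_range_value("rec.warc.gz:1024-2047") returns ("rec.warc.gz", ["1024-2047"]), and a bare "0-9,50-99" yields two specs with no file attached.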

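Following a log tail -f style (print new lines as the task runs, stop when it finishes) amounts to a polling loop. Below is a minimal sketch, assuming a hypothetical fetch() callable that returns the log text so far plus a "finished" flag. The polling interval is also an assumption, and seeding with -p lines=-N is not modeled. It is not ArchiveSession.follow_task_log()'s actual code.

```python
from __future__ import annotations

import time
from typing import Callable, Iterator, Tuple

# Hypothetical fetcher: returns (log text so far, whether the task finished).
Fetch = Callable[[], Tuple[str, bool]]


def follow_log(fetch: Fetch, interval: float = 5.0,
               sleep: Callable[[float], None] = time.sleep) -> Iterator[str]:
    """Yield each log line once, tail -f style, until the task finishes."""
    seen = 0  # how many complete lines have been yielded so far
    while True:
        text, finished = fetch()
        lines = text.splitlines()
        if not finished and text and not text.endswith("\n"):
            lines = lines[:-1]  # hold back a line that is still being written
        for line in lines[seen:]:
            yield line
        seen = max(seen, len(lines))
        if finished:
            return  # the final snapshot has been fully emitted
        sleep(interval)
```

Passing sleep in as a parameter keeps the loop testable without real delays, and holding back an unterminated last line avoids printing half a line that would never be re-emitted once complete.
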
Bugfixes

  • Fixed File.download(stdout=True) consulting the local filesystem: a
    same-named local file could cause the stream to be skipped (length/date or
    checksum match) or trigger the auto-resume code path, which seeks the output
    and fails on a pipe. A stdout download now ignores any on-disk file.
  • Fixed a retried stdout download falling back to writing a local disk file
    instead of the pipe (leaving the pipe empty). A stdout download now always
    writes to stdout, even across retries.
  • Fixed auto-resume corrupting a file when a resumed transfer was itself
    retried: the internal Range header was not recomputed for the retry, so
    it no longer matched the (grown) local file or the seek offset, and
    already-written bytes were fetched again. The resume Range is now
    recomputed from the current file size on every attempt.
  • Fixed ia tasks --parameter crashing when combined with --get-task-log.
    Parameters such as lines are now merged into the task log request's
    query string, so ia tasks -G <task_id> -p lines=100 fetches a truncated
    log. get_task_log() gained a params argument; params and request_kwargs
    are now keyword-only and kept distinct, so request kwargs (e.g. timeout,
    headers) are no longer serialized into the URL as query parameters
    (#764, https://github.com/jjjake/internetarchive/pull/764).
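
The resume fix comes down to deriving the Range header from the file's current size inside the retry loop rather than once before it. Below is a minimal sketch, assuming a hypothetical fetch(headers) callable that yields body chunks and raises ConnectionError when a transfer drops part-way. It is not the library's code. If the headers were computed once before the loop, every retry would restart from the original offset and re-append bytes already on disk.

```python
from __future__ import annotations

import os
from typing import Callable, Dict, Iterable

# Hypothetical transport: given request headers, yield response body chunks;
# raises ConnectionError if the transfer drops part-way.
Fetch = Callable[[Dict[str, str]], Iterable[bytes]]


def resume_headers(path: str) -> Dict[str, str]:
    """Build the Range header from the file's size at this moment."""
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    return {"Range": f"bytes={offset}-"} if offset else {}


def download_resumable(path: str, fetch: Fetch, retries: int = 3) -> None:
    """Append to path, resuming from its current size on every attempt."""
    for attempt in range(retries):
        headers = resume_headers(path)  # recomputed per attempt, never reused
        try:
            with open(path, "ab") as f:  # append mode: writes land at EOF
                for chunk in fetch(headers):
                    f.write(chunk)
            return
        except ConnectionError:
            if attempt == retries - 1:
                raise  # out of retries: surface the failure
```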

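Keeping query parameters and request options apart can be shown with the standard library alone. Below is a minimal sketch of that pattern, assuming a hypothetical build_request() helper and a placeholder example.invalid URL. It does not reproduce get_task_log()'s real code or endpoint; the bare * simply mirrors the keyword-only change described above.

```python
from __future__ import annotations

from typing import Any, Dict, Optional, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_request(url: str, *, params: Optional[Dict[str, Any]] = None,
                  request_kwargs: Optional[Dict[str, Any]] = None
                  ) -> Tuple[str, Dict[str, Any]]:
    """Return (final_url, kwargs for the HTTP call), keeping the two apart.

    params are merged into the URL's existing query string; request_kwargs
    (e.g. timeout, headers) are returned untouched and never reach the URL.
    """
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query.update({key: str(value) for key, value in (params or {}).items()})
    final_url = urlunsplit(parts._replace(query=urlencode(query)))
    return final_url, dict(request_kwargs or {})
```

For example, build_request("https://example.invalid/log/123", params={"lines": 100}, request_kwargs={"timeout": 10}) returns ("https://example.invalid/log/123?lines=100", {"timeout": 10}).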