jjjake/internetarchive v5.10.0 on GitHub

Features and Improvements

Added ia download --range for partial (byte-range) downloads. It requires
--stdout and is repeatable, taking [FILE:]START-END values: a bare
range binds to the named file (vary the range or the file, not both at once),
or FILE:START-END binds each range to its own file. Ranges may be given as
START-END, open-ended START-, suffix -N (the last N bytes), or
bytes=..., and a single value may carry several comma-separated ranges
(0-9,50-99), fetched in order. Segments are streamed back-to-back with no
separator, so, for example, WARC records selected via a CDX index's
compressed offset/length can be piped straight to zcat. Useful for partial
fetches of private items (configured
credentials are used). Item.download(), File.download(), and the
top-level internetarchive.download() gained a headers argument, and
Item.download() a range_jobs argument; passing a Range header is
treated as an intentional partial fetch and disables resume and full-file
checksum validation. An unsatisfiable range (HTTP 416) fails fast with a
clear message instead of being retried; a range covering the whole file
returns the full contents (HTTP 200). If any segment fails,
ia download exits non-zero, so a downstream pipe consumer can tell the
output is incomplete.
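
As a sketch of the WARC use case above: a CDX index gives each record's compressed offset and length, while an HTTP byte range is inclusive on both ends, so END is offset + length - 1. The helper names below (cdx_to_range, fetch_record) are hypothetical and not part of the library, and passing the Range as an entry in a headers dict is an assumption based on the headers argument described in this entry.

```python
def cdx_to_range(offset: int, length: int) -> str:
    """Turn a CDX compressed offset/length pair into a START-END range.

    HTTP byte ranges are inclusive on both ends, so a record of `length`
    bytes starting at `offset` ends at offset + length - 1.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return f"{offset}-{offset + length - 1}"


def fetch_record(identifier: str, filename: str, offset: int, length: int) -> None:
    """Stream one record to stdout (hypothetical helper; needs network access).

    Assumption: File.download() takes the Range as an entry in a `headers`
    dict. Per the notes above, a Range header disables resume and full-file
    checksum validation.
    """
    from internetarchive import get_item  # imported lazily; only needed here

    range_header = f"bytes={cdx_to_range(offset, length)}"
    item_file = get_item(identifier).get_file(filename)
    item_file.download(stdout=True, headers={"Range": range_header})
```

For a 2048-byte record starting at byte 1000, cdx_to_range(1000, 2048) gives 1000-3047, the START-END form that --range accepts.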
Added ia tasks --follow-task-log <task_id> to follow a task log live
as the task runs (tail -f style), stopping automatically when the task
finishes. Combine with -p lines=-N to seed the last N lines first
(Tasks API lines semantics, as with --get-task-log); any other
-p params are forwarded to the Tasks API. A new
ArchiveSession.follow_task_log() method exposes the same behavior to the
library.
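
The tail -f style behavior can be pictured as a polling loop. The following is a minimal sketch of that idea only, not the library's implementation: follow_log, fetch_lines, and is_finished are hypothetical stand-ins supplied by the caller, and the seed_last handling is one possible reading of the lines=-N seeding.

```python
import time
from typing import Callable, Iterator, List, Optional


def follow_log(
    fetch_lines: Callable[[], List[str]],
    is_finished: Callable[[], bool],
    poll_interval: float = 2.0,
    seed_last: Optional[int] = None,
) -> Iterator[str]:
    """Yield log lines as they appear, stopping once the task has finished.

    fetch_lines() returns the whole log so far and is_finished() reports
    whether the task is done; both are hypothetical callables.
    seed_last=N starts from the last N existing lines; None starts from
    the beginning of the log.
    """
    seen = 0  # number of lines already emitted or deliberately skipped
    if seed_last is not None:
        seen = max(len(fetch_lines()) - seed_last, 0)

    while True:
        # Check the task state before reading, so lines written just
        # before the task finished are still picked up by the last read.
        finished = is_finished()
        lines = fetch_lines()
        for line in lines[seen:]:
            yield line
        seen = len(lines)
        if finished:
            return
        time.sleep(poll_interval)
```

Checking the task state before the final read is a deliberate choice here: it avoids dropping lines that arrive between the last poll and the task finishing.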

Bugfixes

Fixed File.download(stdout=True) consulting the local filesystem: a
same-named local file could cause the stream to be skipped (length/date or
checksum match) or trigger the auto-resume code path, which seeks the output
and fails on a pipe. A stdout download now ignores any on-disk file.
Fixed a retried stdout download falling back to writing a local disk file
instead of the pipe (leaving the pipe empty). A stdout download now always
writes to stdout, even across retries.
Fixed auto-resume corrupting a file when a resumed transfer was itself retried:
the internal Range header was not recomputed for the retry, so it no longer
matched the (grown) local file and the seek offset, re-fetching already-written
bytes. The resume Range is now recomputed from the current file size on
every attempt.
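
To illustrate why the offset has to be recomputed, here is a minimal sketch of a resume loop, not the library's code: resume_download and fetch_chunks are hypothetical names, with fetch_chunks(start) standing in for an HTTP GET that sends Range: bytes=<start>- and may fail part-way through.

```python
import os
from typing import Callable, Iterable


def resume_download(
    fetch_chunks: Callable[[int], Iterable[bytes]],
    path: str,
    retries: int = 3,
) -> None:
    """Append the rest of a remote file to `path`, retrying on failure."""
    for attempt in range(retries + 1):
        # Recompute the offset on every attempt: an earlier attempt may have
        # appended bytes before failing, so a stale offset would fetch (and
        # write) those bytes a second time.
        start = os.path.getsize(path) if os.path.exists(path) else 0
        try:
            with open(path, "ab") as fh:
                for chunk in fetch_chunks(start):
                    fh.write(chunk)
            return
        except ConnectionError:
            if attempt == retries:
                raise
```

If start were computed once outside the loop, the second attempt would request bytes the file already holds, which is the corruption described in this entry.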
Fixed ia tasks --parameter crashing when combined with
--get-task-log. Parameters such as lines are now merged into the
task log request's query string, allowing
ia tasks -G <task_id> -p lines=100 to fetch a truncated log.
get_task_log() gained a params argument; params and request_kwargs are now
keyword-only and kept distinct, so request kwargs (e.g. timeout, headers)
are no longer serialized into the URL as query parameters
(#764, https://github.com/jjjake/internetarchive/pull/764).
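
The separation between query parameters and request kwargs can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: build_task_log_request is a hypothetical helper, the endpoint is a placeholder, and the task_log query key is assumed.

```python
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlencode

# Placeholder endpoint for illustration; not the real Tasks API URL.
TASK_LOG_URL = "https://example.org/tasks"


def build_task_log_request(
    task_id: int,
    *,
    params: Optional[Dict[str, Any]] = None,
    request_kwargs: Optional[Dict[str, Any]] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Return (url, kwargs) for a task log request.

    `params` end up in the query string; `request_kwargs` (timeout,
    headers, ...) are passed through for the HTTP client and are never
    serialized into the URL.
    """
    query: Dict[str, Any] = {"task_log": task_id}  # key name is an assumption
    query.update(params or {})
    url = f"{TASK_LOG_URL}?{urlencode(query)}"
    return url, dict(request_kwargs or {})
```

Making both arguments keyword-only means a dict passed positionally raises a TypeError instead of silently landing in the wrong slot.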
