pypi huggingface-hub 0.11.0
Extended HfApi, pagination, simplified login and more

New features and improvements for HfApi

HfApi is the central point to interact with the Hub API (manage repos, create commits, ...). The goal is to offer more and more git-related features through HTTP endpoints, so that users can interact with the Hub without cloning a repo locally.

Create/delete tags and branches

from huggingface_hub import create_branch, create_tag, delete_branch, delete_tag

create_tag(repo_id, tag="v0.11", tag_message="Release v0.11")
delete_tag(repo_id, tag="something") # If you created a tag by mistake

create_branch(repo_id, branch="experiment-154")
delete_branch(repo_id, branch="experiment-1") # Clean some old branches

Upload lots of files in a single commit

Making a very large commit was previously tedious. Files are now processed in chunks, which makes it possible to upload 25k files in a single commit (with a 1GB payload limit when uploading only non-LFS files). This should make it easier to upload large datasets.

  • Create commit by streaming an ndjson payload (allows lots of files in a single commit) by @Wauplin in #1117
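The idea behind streaming the payload can be sketched with the standard library alone: instead of building one huge JSON document, the commit is serialized as newline-delimited JSON (NDJSON), one record per line, produced lazily by a generator. The record layout below (`key`/`value` fields, a header line followed by one line per file) is an illustrative assumption rather than a specification of the Hub's commit endpoint, and `iter_ndjson_payload` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson_payload(commit_message: str, paths: Iterable[str]) -> Iterator[bytes]:
    """Lazily yield an NDJSON commit payload, one JSON record per line.

    Hypothetical helper: the record layout is an assumption made for
    illustration, not the exact wire format used by the Hub.
    """
    # First record: commit metadata
    header = {"key": "header", "value": {"summary": commit_message}}
    yield json.dumps(header).encode("utf-8") + b"\n"
    # Then one record per file; since this is a generator, the full payload
    # never has to be held in memory at once
    for path in paths:
        record = {"key": "file", "value": {"path": path}}
        yield json.dumps(record).encode("utf-8") + b"\n"


# Usage: 25k file entries are produced one line at a time
paths = (f"data/shard-{i:05d}.parquet" for i in range(25_000))
n_lines = sum(1 for _ in iter_ndjson_payload("Upload dataset", paths))
print(n_lines)  # 25001: one header line + one line per file
```

HTTP clients such as `requests` accept a generator as a request body (sent with chunked transfer encoding), so a payload built this way can be uploaded without ever materializing it in full.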

Delete an entire folder

from huggingface_hub import CommitOperationDelete, create_commit, delete_folder

# Delete a single folder
delete_folder(repo_id=repo_id, path_in_repo="logs/")

# Alternatively, use the low-level `create_commit`
create_commit(
    repo_id,
    operations=[
        CommitOperationDelete(path_in_repo="old_config.json"),  # Delete a file
        CommitOperationDelete(path_in_repo="logs/"),  # Delete a folder
    ],
    commit_message=...,
)
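Conceptually, deleting a folder removes every file whose path starts with that folder's prefix. The sketch below assumes, as in the `delete_folder` and `create_commit` snippet, that a trailing `/` marks a folder path; `files_removed_by` is a hypothetical helper working on a plain list of repo paths, not part of huggingface_hub.

```python
from typing import List


def files_removed_by(repo_files: List[str], path_in_repo: str) -> List[str]:
    """Return the repo files a delete operation on `path_in_repo` would remove.

    Hypothetical helper: assumes a trailing "/" marks a folder, while any
    other path is treated as a single file.
    """
    if path_in_repo.endswith("/"):
        # Folder: every file under that prefix goes away
        return [f for f in repo_files if f.startswith(path_in_repo)]
    # Single file: only an exact match is removed
    return [f for f in repo_files if f == path_in_repo]


repo_files = ["README.md", "old_config.json", "logs/run1.txt", "logs/run2/events.out", "logsheet.csv"]
print(files_removed_by(repo_files, "logs/"))
# ['logs/run1.txt', 'logs/run2/events.out']
print(files_removed_by(repo_files, "old_config.json"))
# ['old_config.json']
```

The trailing slash matters: a naive prefix match on `logs` without it would also catch `logsheet.csv`.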

Support pagination when listing repos

In the future, listing models, datasets and spaces will be paginated on the Hub by default. To avoid breaking changes, huggingface_hub already follows the pagination. The output type is currently a list (deprecated) and will become a generator in v0.14.

  • Add support for pagination in list_models, list_datasets and list_spaces by @Wauplin in #1176
  • Deprecate output in list_models by @Wauplin in #1143
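Paginated listing is naturally exposed as a generator that follows "next" links until none remain, which is why the return type moves from a list to a generator. The sketch below assumes pages are linked through an HTTP `Link` header carrying `rel="next"` (RFC 8288 style); `paginate`, `next_url`, and the fake pages are hypothetical stand-ins so that it runs without network access.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# A fetch function returns (items on this page, raw Link header or None)
Page = Tuple[List[dict], Optional[str]]


def next_url(link_header: Optional[str]) -> Optional[str]:
    """Extract the URL tagged rel="next" from a simple Link header, if any."""
    if not link_header:
        return None
    for part in link_header.split(","):
        url_part, _, params = part.partition(";")
        if 'rel="next"' in params:
            return url_part.strip().strip("<>")
    return None


def paginate(fetch_page: Callable[[str], Page], first_url: str) -> Iterator[dict]:
    """Yield items lazily, requesting the next page only when it is needed."""
    url: Optional[str] = first_url
    while url is not None:
        items, link_header = fetch_page(url)
        yield from items
        url = next_url(link_header)


# Fake two-page API so the sketch runs offline (hypothetical URLs)
FAKE_PAGES: Dict[str, Page] = {
    "https://example.com/api/models": (
        [{"id": "model-a"}, {"id": "model-b"}],
        '<https://example.com/api/models?cursor=2>; rel="next"',
    ),
    "https://example.com/api/models?cursor=2": ([{"id": "model-c"}], None),
}

models = paginate(FAKE_PAGES.__getitem__, "https://example.com/api/models")
print([m["id"] for m in models])  # ['model-a', 'model-b', 'model-c']
```

If a list is really needed (as with the current, deprecated return type), wrap the generator with `list(...)`; when consumed lazily, pages that are never reached are never fetched.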

Misc

  • Allow create PR against non-main branch by @Wauplin in #1168
  • 1162 Reorder operations correctly in commit endpoint by @Wauplin in #1175

Login, tokens and authentication

Authentication has been revisited to make it as easy as possible for users.

Unified login and logout methods

from huggingface_hub import interpreter_login, login, logout, notebook_login

# `login` automatically detects whether you are running in a notebook or a script
# and launches a widget or a TUI accordingly
login()

# It is now possible to log in with a hardcoded token (non-blocking)
login(token="hf_***")

# If you want to bypass the auto-detection of `login`
notebook_login()  # still available
interpreter_login()  # to log in from a script

# Log out programmatically
logout()

# Logging in from the CLI is still possible:
# huggingface-cli login
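Auto-detection of the environment can be approximated by checking whether code runs inside an IPython kernel. The sketch below is a hypothetical approximation, not huggingface_hub's actual logic: `running_in_notebook` assumes a Jupyter-style notebook shows up as an IPython shell whose class is named `ZMQInteractiveShell`, and `choose_login_method` is an illustrative name.

```python
def running_in_notebook() -> bool:
    """Best-effort guess whether we run inside a Jupyter-style notebook."""
    try:
        from IPython import get_ipython  # optional dependency
    except ImportError:
        return False  # IPython not installed: not a notebook
    shell = get_ipython()  # None when not running under IPython
    # Assumption: notebook kernels use ZMQInteractiveShell, while the
    # terminal IPython REPL uses TerminalInteractiveShell
    return shell is not None and type(shell).__name__ == "ZMQInteractiveShell"


def choose_login_method() -> str:
    """Pick a login UI the way an auto-detecting login helper might."""
    return "notebook widget" if running_in_notebook() else "terminal prompt"


print(choose_login_method())  # "terminal prompt" when run as a plain script
```

Some hosted notebook environments subclass the shell under a different name, so a robust check needs more cases than this single class-name test.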

Set token only for a HfApi session

from huggingface_hub import HfApi

# Token will be sent in every request but not stored on machine
api = HfApi(token="hf_***")
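Scoping a token to a client object boils down to keeping it in memory and attaching it to every outgoing request. The sketch assumes the usual `Authorization: Bearer <token>` scheme; `SessionClient` and `build_headers` are hypothetical names, not HfApi internals.

```python
from typing import Dict, Optional


class SessionClient:
    """Hypothetical client holding a token in memory only (never written to disk)."""

    def __init__(self, token: Optional[str] = None) -> None:
        self._token = token

    def build_headers(self, token: Optional[str] = None) -> Dict[str, str]:
        """Headers for one request; a per-call token overrides the session token."""
        headers = {"user-agent": "example-client/0.1"}
        effective = token if token is not None else self._token
        if effective is not None:
            headers["authorization"] = f"Bearer {effective}"
        return headers


api = SessionClient(token="hf_example")
print(api.build_headers()["authorization"])  # Bearer hf_example
print("authorization" in SessionClient().build_headers())  # False
```

Letting a per-call `token` take precedence over the session value is a design assumption of this sketch; it keeps a method-level `token` argument meaningful even when the client already holds one.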

Stop using use_auth_token in favor of token, everywhere

The token parameter can now be passed to every method in huggingface_hub. use_auth_token is still accepted where it previously existed, but the mid-term goal (~6 months) is to deprecate and remove it.

  • Replace use_auth_token arg by token everywhere by @Wauplin in #1122
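Renaming a keyword argument without breaking callers is usually done with a small shim that accepts the old name, warns, and forwards the value to the new one. The decorator below is a generic sketch of that pattern, not huggingface_hub's implementation; `rename_kwarg` and `download` are hypothetical names.

```python
import functools
import warnings
from typing import Any, Callable, Optional


def rename_kwarg(old: str, new: str) -> Callable:
    """Decorator mapping a deprecated keyword argument `old` onto `new`."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            if old in kwargs:
                if new in kwargs:
                    raise TypeError(f"Pass either `{old}` or `{new}`, not both.")
                warnings.warn(
                    f"`{old}` is deprecated, use `{new}` instead.",
                    FutureWarning,
                    stacklevel=2,  # point the warning at the caller
                )
                kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)

        return wrapper

    return decorator


@rename_kwarg(old="use_auth_token", new="token")
def download(repo_id: str, token: Optional[str] = None) -> str:
    return f"{repo_id} (token={'set' if token else 'none'})"


print(download("user/repo", token="hf_x"))  # user/repo (token=set)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(download("user/repo", use_auth_token="hf_x"))  # user/repo (token=set)
print(caught[0].category.__name__)  # FutureWarning
```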

Respect git credential helper from the user

Previously, the token was stored in the git credential store. It can now be stored in any helper configured by the user (keychain, cache, ...).

  • Refactor git credential handling in login workflow by @Wauplin in #1138

Better error handling

Helper to dump machine information

# Dump all relevant information. To be used when reporting an issue.
➜ huggingface-cli env

Copy-and-paste the text below in your GitHub issue.

- huggingface_hub version: 0.11.0.dev0
- Platform: Linux-5.15.0-52-generic-x86_64-with-glibc2.35
- Python version: 3.10.6
...
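A report like this can be assembled from the standard library alone. The sketch below gathers only a few fields, and `collect_env_info` / `format_report` are hypothetical helpers; `huggingface-cli env` itself reports more (and possibly differently named) information.

```python
import platform
import sys
from typing import Dict


def collect_env_info(package_version: str) -> Dict[str, str]:
    """Gather a few machine details that help when triaging a bug report."""
    return {
        "huggingface_hub version": package_version,
        "Platform": platform.platform(),
        "Python version": platform.python_version(),
        "Python executable": sys.executable,
    }


def format_report(info: Dict[str, str]) -> str:
    """Render the report as a bullet list ready to paste into an issue."""
    return "\n".join(f"- {key}: {value}" for key, value in info.items())


print(format_report(collect_env_info("0.11.0")))
```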

Misc

Modelcards

A few improvements and fixes in the modelcard module.

Cache assets

A new feature provides a path in the cache where any downstream library can store assets (processed data, files from the web, extracted data, rendered images, ...).

  • [RFC] Proposal for a way to cache files in downstream libraries by @Wauplin in #1088
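The idea is to give each downstream library its own namespaced folder inside a shared cache. The helper below is a hypothetical sketch of such a layout (`<base>/<library>/<namespace>/<subfolder>`); the actual helper added in #1088 and its default location may differ, so treat the names and paths as assumptions.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union


def assets_path(
    library_name: str,
    namespace: str = "default",
    subfolder: str = "default",
    base_dir: Optional[Union[str, Path]] = None,
) -> Path:
    """Return (and create) a per-library folder for cached assets.

    Hypothetical layout: <base_dir>/<library_name>/<namespace>/<subfolder>.
    The default base directory below is an assumption for illustration.
    """
    base = Path(base_dir) if base_dir is not None else Path.home() / ".cache" / "assets-example"
    path = base / library_name / namespace / subfolder
    path.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
    return path


# Usage: a library stores extracted data under its own namespace
with tempfile.TemporaryDirectory() as tmp:
    folder = assets_path("datasets", namespace="squad", subfolder="extracted", base_dir=tmp)
    (folder / "train.json").write_text("{}")
    print(folder.relative_to(tmp).as_posix())  # datasets/squad/extracted
```

Namespacing by library avoids collisions between packages sharing the same cache, and creating the folder on demand means callers never need to check whether it exists.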

Documentation updates

Breaking changes

  • It is no longer possible to pass an organization to create_repo (use a namespaced repo_id such as "my-org/my-model" instead)
  • identical_ok removed in upload_file
  • Breaking changes in arguments for validate_preupload_info, prepare_commit_payload, _upload_lfs_object (internal helpers for the commit API)
  • huggingface_hub.snapshot_download is not exposed as a public module anymore

Deprecations

  • Remove deprecated code from v0.9, v0.10 and v0.11 by @Wauplin in #1092
  • Rename languages to language + remove duplicate code in tests by @Wauplin in #1169
  • Deprecate output in list_models by @Wauplin in #1143
  • Set back feature to create a repo when using clone_from by @Wauplin in #1187

Internal

Bugfixes & small improvements
