New features and improvements for HfApi
HfApi is the central point to interact with the Hub API (manage repos, create commits, ...). The goal is to offer more and more git-related features through HTTP endpoints so that users can interact with the Hub without cloning a repo locally.
Create/delete tags and branches
from huggingface_hub import create_branch, create_tag, delete_branch, delete_tag
create_tag(repo_id, tag="v0.11", tag_message="Release v0.11")
delete_tag(repo_id, tag="something") # If you created a tag by mistake
create_branch(repo_id, branch="experiment-154")
delete_branch(repo_id, branch="experiment-1") # Clean some old branches
- Add a create_tag method to create tags from the HTTP endpoint by @Wauplin in #1089
- Add delete_tag method to HfApi by @Wauplin in #1128
- Create tag twice doesn't work by @Wauplin in #1149
- Add "create_branch" and "delete_branch" endpoints by @Wauplin #1181
Upload lots of files in a single commit
Making a very large commit was previously tedious. Files are now processed in chunks, which makes it possible to upload up to 25k files in a single commit (with a 1GB payload limit when uploading only non-LFS files). This should make it easier to upload large datasets.
- Create commit by streaming a ndjson payload (allow lots of file in single commit) by @Wauplin in #1117
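To illustrate the idea of streaming a commit as newline-delimited JSON, here is a minimal sketch. It is not the actual payload format used by the Hub: the helper name `iter_ndjson_payload`, the `header`/`file` keys and the chunk size are assumptions made for the example.

```python
import json
from typing import Dict, Iterable, Iterator, List


def iter_ndjson_payload(
    commit_message: str,
    files: Iterable[Dict[str, str]],
    chunk_size: int = 2,
) -> Iterator[bytes]:
    """Yield an ndjson payload chunk by chunk instead of building it in memory.

    Hypothetical format: one "header" line, then one "file" line per file.
    """
    # The first line describes the commit itself.
    header = {"key": "header", "value": {"summary": commit_message}}
    yield (json.dumps(header) + "\n").encode("utf-8")

    # Group the following lines so that each yielded chunk holds
    # at most `chunk_size` files.
    buffer: List[str] = []
    for file in files:
        buffer.append(json.dumps({"key": "file", "value": file}))
        if len(buffer) == chunk_size:
            yield ("\n".join(buffer) + "\n").encode("utf-8")
            buffer = []
    if buffer:  # flush the last, possibly smaller, chunk
        yield ("\n".join(buffer) + "\n").encode("utf-8")


files = [{"path": f"data/file_{i}.txt", "content": f"row {i}"} for i in range(5)]
chunks = list(iter_ndjson_payload("Upload dataset", files))
payload = b"".join(chunks)
```

Because the payload is produced by a generator, the number of files in a commit is no longer bounded by what fits in a single in-memory request body.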
Delete an entire folder
from huggingface_hub import CommitOperationDelete, create_commit, delete_folder
# Delete a single folder
delete_folder(repo_id=repo_id, path_in_repo="logs/")
# Alternatively, use the low-level `create_commit`
create_commit(
    repo_id,
    operations=[
        CommitOperationDelete(path_in_repo="old_config.json"),  # Delete a file
        CommitOperationDelete(path_in_repo="logs/"),  # Delete a folder
    ],
    commit_message=...,
)
Support pagination when listing repos
In the future, listing models, datasets and spaces will be paginated on the Hub by default. To avoid breaking changes, huggingface_hub already follows pagination. The output type is currently a list (deprecated) and will become a generator in v0.14.
- Add support for pagination in list_models list_datasets and list_spaces by @Wauplin #1176
- Deprecate output in list_models by @Wauplin in #1143
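Since the return type is moving from a list to a generator, code that only iterates over the result should keep working in both cases. The sketch below uses a stand-in `fake_list_models` generator (hypothetical, it does not call the Hub) to show patterns that work with either type.

```python
from itertools import islice
from typing import Iterator, List


def fake_list_models() -> Iterator[str]:
    """Stand-in for a paginated listing: yields items page after page."""
    pages: List[List[str]] = [
        ["model-a", "model-b"],
        ["model-c", "model-d"],
        ["model-e"],
    ]
    for page in pages:  # a real client would fetch each page lazily
        yield from page


# Plain iteration works with a list or a generator.
names = [name for name in fake_list_models()]

# Take only the first N items with `islice` instead of `result[:N]`,
# which would fail on a generator.
first_three = list(islice(fake_list_models(), 3))

# If a list is really needed (e.g. for `len()`), materialize it explicitly.
all_models = list(fake_list_models())
```

Indexing, slicing and `len()` are the operations most likely to break once the output becomes a generator.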
Misc
- Allow create PR against non-main branch by @Wauplin in #1168
- 1162 Reorder operations correctly in commit endpoint by @Wauplin in #1175
Login, tokens and authentication
Authentication has been revisited to make it as easy as possible for the users.
Unified login and logout methods
from huggingface_hub import interpreter_login, login, logout, notebook_login
# `login` detects automatically if you are running in a notebook or a script
# Launch widgets or TUI accordingly
login()
# Now possible to login with a hardcoded token (non-blocking)
login(token="hf_***")
# If you want to bypass the auto-detection of `login`
notebook_login() # still available
interpreter_login() # to login from a script
# Logout programmatically
logout()
# Still possible to login from CLI
huggingface-cli login
Set token only for a HfApi session
from huggingface_hub import HfApi
# Token will be sent in every request but not stored on machine
api = HfApi(token="hf_***")
Stop using use_auth_token in favor of token, everywhere
The token parameter can now be passed to every method in huggingface_hub. use_auth_token is still accepted where it previously existed, but the mid-term goal (~6 months) is to deprecate and remove it.
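For downstream libraries that expose the same two parameters, one possible transition path is sketched below. The helper name `resolve_token` and the precedence rule (`token` wins over `use_auth_token`) are assumptions for the example, not the documented behavior of huggingface_hub.

```python
import warnings
from typing import Optional


def resolve_token(
    token: Optional[str] = None,
    use_auth_token: Optional[str] = None,
) -> Optional[str]:
    """Return the token to use, preferring `token` over the legacy argument."""
    if use_auth_token is not None:
        warnings.warn(
            "`use_auth_token` is deprecated, please use `token` instead.",
            FutureWarning,
        )
        if token is None:
            token = use_auth_token
    return token


# The legacy argument still works but emits a warning (silenced here).
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    legacy = resolve_token(use_auth_token="hf_legacy")

# The new argument is used as-is, without any warning.
preferred = resolve_token(token="hf_new")
```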
Respect git credential helper from the user
Previously, the token was stored in the git credential store. It can now be stored in any credential helper configured by the user (keychain, cache, ...).
Better error handling
Helper to dump machine information
# Dump all relevant information. To be used when reporting an issue.
➜ huggingface-cli env
Copy-and-paste the text below in your GitHub issue.
- huggingface_hub version: 0.11.0.dev0
- Platform: Linux-5.15.0-52-generic-x86_64-with-glibc2.35
- Python version: 3.10.6
...
Misc
- Cache not found is not an error by @singingwolfboy in #1101
- Propagate error messages when multiple on BadRequest by @Wauplin in #1115
- Add error message from x-error-message header if exists by @Wauplin in #1121
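As an illustration of how a client might surface the x-error-message header, here is a small sketch. The function name `build_error_message` and the message layout are hypothetical; only the header name comes from the change listed above.

```python
from typing import Mapping, Optional


def build_error_message(status_code: int, headers: Mapping[str, str]) -> str:
    """Build a readable error, appending the server-provided message if any."""
    message = f"HTTP error {status_code}"
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    server_message: Optional[str] = normalized.get("x-error-message")
    if server_message:
        message += f": {server_message}"
    return message


with_header = build_error_message(400, {"X-Error-Message": "Invalid repo name"})
without_header = build_error_message(404, {})
```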
Modelcards
A few improvements and fixes in the modelcard module:
- 🎨 make repocard content a property by @nateraw in #1147
- ✅ fix content string in repocard tests by @nateraw in #1155
- Add Hub verification token to evaluation metadata by @lewtun in #1142
- Use default model_name in metadata_update by @lvwerra in #1157
- Refer to modelcard creator app from doc by @Wauplin in #1184
- Parent Model --> Finetuned from model by @meg-huggingface #1191
- FIX overwriting metadata when both verified and unverified reported values by @Wauplin in #1186
Cache assets
New feature to provide a path in the cache where any downstream library can store assets (processed data, files from the web, extracted data, rendered images,...)
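A minimal sketch of the idea, assuming a layout of `<assets_dir>/<library_name>/<namespace>/<subfolder>`. The helper `assets_path` below is a hypothetical stand-in written with the standard library; it is not the function shipped in huggingface_hub, and the layout is an assumption.

```python
import tempfile
from pathlib import Path


def assets_path(assets_dir: Path, library_name: str, namespace: str, subfolder: str) -> Path:
    """Return (and create) a folder where a library can store its own assets."""
    # Replace path separators so that each part maps to a single folder.
    parts = [part.replace("/", "--") for part in (library_name, namespace, subfolder)]
    path = assets_dir.joinpath(*parts)
    path.mkdir(parents=True, exist_ok=True)
    return path


# A temporary directory is used here to avoid touching a real cache.
root = Path(tempfile.mkdtemp())
path = assets_path(root, "my_library", "user/dataset", "processed")
(path / "stats.json").write_text("{}")
```

Keeping one folder per library and namespace lets several downstream libraries share the same cache root without overwriting each other's files.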
Documentation updates
- Fixing a typo in the doc. by @Narsil in #1113
- Fix docstring of list_datasets by @albertvillanova in #1125
- Add repo_type=dataset possibility to guide by @Wauplin in #1134
- Fix PyTorch & Keras mixin doc by @lewtun in #1139
- Update how-to-manage.mdx by @severo in #1150
- Typo fix by @meg-huggingface in #1166
- Adds link to model card metadata spec by @meg-huggingface in #1171
- Removing "Related Models" & just asking for "Parent Model" by @meg-huggingface in #1178
Breaking changes
- Cannot provide an organization to create_repo
- identical_ok removed in upload_file
- Breaking changes in arguments for validate_preupload_info, prepare_commit_payload, _upload_lfs_object (internal helpers for the commit API)
- huggingface_hub.snapshot_download is not exposed as a public module anymore
Deprecations
- Remove deprecated code from v0.9, v0.10 and v0.11 by @Wauplin in #1092
- Rename languages to langage + remove duplicate code in tests by @Wauplin in #1169
- Deprecate output in list_models by @Wauplin in #1143
- Set back feature to create a repo when using clone_from by @Wauplin in #1187
Internal
- Configure pytest to run on staging by default + flags in config by @Wauplin in #1093
- fix search models test by @Wauplin in #1106
- Add mypy in the CI (and fix existing type issues) by @Wauplin in #1097
- Fix deprecation warnings for assertEquals in tests by @Wauplin in #1135
- Skip failing test in ci by @Wauplin in #1148
- 💚 fix mypy ci by @nateraw in #1167
- Update pr docs actions by @mishig25 in #1170
- Revert "Update pr docs actions" by @mishig25 #1192
Bugfixes & small improvements
- Expose list_spaces by @osanseviero in #1132
- respect NO_COLOR env var by @singingwolfboy in #1103
- Fix list_models bool parameters by @Wauplin in #1152
- FIX url encoding in hf_hub_url by @Wauplin in #1164
- Fix cannot create pr on foreign repo by @Wauplin #1183
- Fix HfApi.move_repo(...) and complete tests by @Wauplin in #1136
- Commit empty files as regular and warn user by @Wauplin in #1180
- Parse file size in get_hf_file_metadata by @Wauplin #1179
- Fix get file size on lfs by @Wauplin #1188
- More robust create relative symlink in cache by @Wauplin in #1109
- Test running CI on Python 3.11 #1189