v0.0.14: LFS Auto tracking, `dataset_info` and `list_datasets`, documentation
Datasets
Dataset repositories get better support, starting with full usage of the `Repository` class for dataset repositories:

```python
from huggingface_hub import Repository

repo = Repository("local_directory", clone_from="<user>/<dataset_id>", repo_type="dataset")
```
Datasets can now be retrieved from the Python runtime using the `list_datasets` method from the `HfApi` class:

```python
from huggingface_hub import HfApi

api = HfApi()
datasets = api.list_datasets()

len(datasets)
# 1048 publicly available dataset repositories at the time of writing
```
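A minimal sketch of iterating over the results; that each entry is an object exposing an `id` attribute is an assumption based on the `dataset_info` output shown below, not something this release note confirms:

```python
# Sketch: print the identifiers of the first few dataset repositories.
# (`dataset.id` as an attribute is an assumption.)
for dataset in datasets[:5]:
    print(dataset.id)
```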
Information can be retrieved on specific datasets using the `dataset_info` method from the `HfApi` class:

```python
from huggingface_hub import HfApi

api = HfApi()
api.dataset_info("squad")
# DatasetInfo: {
#     id: squad
#     lastModified: 2021-07-07T13:18:53.595Z
#     tags: ['pretty_name:SQuAD', 'annotations_creators:crowdsourced', 'language_creators:crowdsourced', 'language_creators:found',
# [...]
```
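The fields shown in the printout should be reachable on the returned object; a hedged sketch (field names are taken from the output above, attribute-style access is an assumption):

```python
# Sketch: read individual fields from the DatasetInfo result.
info = api.dataset_info("squad")
print(info.id)            # "squad"
print(info.lastModified)  # "2021-07-07T13:18:53.595Z"
```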
- Add dataset_info and list_datasets #164 (@lhoestq)
- Enable dataset repositories #151 (@LysandreJik)
Inference API wrapper client
Version v0.0.14 introduces a wrapper client for the Inference API. No need to craft custom `requests` calls anymore. See below for an example.
```python
from huggingface_hub import InferenceApi

api = InferenceApi("bert-base-uncased")
api(inputs="The [MASK] is great")
# [
#     {'sequence': 'the music is great', 'score': 0.03599703311920166, 'token': 2189, 'token_str': 'music'},
#     {'sequence': 'the price is great', 'score': 0.02146693877875805, 'token': 3976, 'token_str': 'price'},
#     {'sequence': 'the money is great', 'score': 0.01866752654314041, 'token': 2769, 'token_str': 'money'},
#     {'sequence': 'the fun is great', 'score': 0.01654735580086708, 'token': 4569, 'token_str': 'fun'},
#     {'sequence': 'the effect is great', 'score': 0.015102624893188477, 'token': 3466, 'token_str': 'effect'}
# ]
```
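The client should also cover private models and repositories that map to several pipelines; a sketch under the assumption that the constructor accepts `task` and `token` keyword arguments (parameter names not confirmed by this release note):

```python
# Sketch: pass an API token for private repositories and pin an explicit
# task when the model maps to more than one pipeline (names assumed).
api = InferenceApi("bert-base-uncased", task="fill-mask", token="<api_token>")
api(inputs="Paris is the [MASK] of France.")
```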
- Inference API wrapper client #65 (@osanseviero)
Auto-track with LFS
Version v0.0.14 introduces an auto-tracking mechanism with git-lfs for large files. Files that are larger than 10MB can be automatically tracked by using the `auto_track_large_files` method:

```python
from huggingface_hub import Repository

repo = Repository("local_directory", clone_from="<user>/<model_id>")

# save large files in `local_directory`
repo.git_add()
repo.auto_track_large_files()
repo.git_commit("Add large files")
repo.git_push()
# No push rejected error anymore!
```
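For explicit control over which patterns go through LFS, tracking can also be done by hand; a sketch under the assumption that `Repository` exposes an `lfs_track` method accepting a glob pattern:

```python
# Sketch: manually track a file pattern with git-lfs before committing.
# (`lfs_track("*.bin")` accepting a glob is an assumption.)
repo.lfs_track("*.bin")
repo.git_add()
```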
It is automatically used when leveraging the `commit` context manager:

```python
from huggingface_hub import Repository

repo = Repository("local_directory", clone_from="<user>/<model_id>")

with repo.commit("Add large files"):
    # add large files
    ...

# No push rejected error anymore!
```
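As a concrete sketch of the context manager: any file written inside the block is committed on exit, and files above the 10MB threshold should be picked up by the auto-tracking mechanism:

```python
# Sketch: write a 20MB binary inside the commit context; it exceeds the
# 10MB threshold, so it is tracked with git-lfs before being pushed.
with repo.commit("Add a large binary file"):
    with open("large_file.bin", "wb") as f:
        f.write(b"\0" * (20 * 1024 * 1024))
```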
- Auto track with LFS #177 (@LysandreJik)
Documentation
- Update docs structure #145 (@Pierrci)
- Update links to docs #147 (@LysandreJik)
- Add new repo guide #153 (@osanseviero)
- Add documentation for endpoints #155 (@osanseviero)
- Document hf.co webhook publicly #156 (@julien-c)
- docs: ✏️ mention the Training metrics tab #193 (@severo)
- doc for Spaces #189 (@julien-c)
Breaking changes
Reminder: the `huggingface_hub` library follows semantic versioning and is under active development. Until the first major version (v1.0.0) is out, you should expect breaking changes, and we strongly recommend pinning the library to a specific version.
Two breaking changes are introduced with version v0.0.14.
The `whoami` return value changes from a tuple to a dictionary
- Allow obtaining Inference API tokens with whoami #157 (@osanseviero)
The `whoami` method changes its returned value from a tuple of `(<user>, [<organisations>])` to a dictionary containing a lot more information.

In versions v0.0.13 and below, here was the behavior of the `whoami` method from the `HfApi` class:

```python
from huggingface_hub import HfFolder, HfApi

api = HfApi()
api.whoami(HfFolder.get_token())
# ('<user>', ['<org_0>', '<org_1>'])
```
In version v0.0.14, this is updated to the following:

```python
from huggingface_hub import HfFolder, HfApi

api = HfApi()
api.whoami(HfFolder.get_token())
# {
#     'type': str,
#     'name': str,
#     'fullname': str,
#     'email': str,
#     'emailVerified': bool,
#     'apiToken': str,
#     'plan': str,
#     'avatarUrl': str,
#     'orgs': List[str]
# }
```
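A minimal migration sketch for code that unpacked the old tuple, using the key names shown in the structure above (mapping the old `<user>` value to the `name` key is an inference):

```python
# Sketch: recover the old (user, organisations) pair from the new dictionary.
info = api.whoami(HfFolder.get_token())
user, orgs = info["name"], info["orgs"]
```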
The `Repository` class's `use_auth_token` initialization parameter now defaults to `True`

The `use_auth_token` initialization parameter of the `Repository` class now defaults to `True`. The behavior is unchanged for users who are not logged in, in which case `Repository` remains agnostic to the token stored by `huggingface_hub`.
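To restore the previous behavior and ignore any stored token, the flag can be passed explicitly; a minimal sketch:

```python
from huggingface_hub import Repository

# Sketch: opt out of the new default by disabling the stored token.
repo = Repository("local_directory", clone_from="<user>/<model_id>", use_auth_token=False)
```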
- Set use_auth_token to True by default #204 (@LysandreJik)
Improvements and bugfixes
- Add sklearn code snippet #133 (@osanseviero)
- Allow passing only model ID to clone when authenticated #150 (@LysandreJik)
- More robust endpoint with toggled staging endpoint #148 (@LysandreJik)
- Add config to list_models #152 (@osanseviero)
- Fix audio-to-audio widget and add icon #142 (@osanseviero)
- Upgrade spaCy to api 0.0.12 and remove allowlist #161 (@osanseviero)
- docs: fix webhook response format #162 (@severo)
- Update link in README.md #163 (@nateraw)
- Revert "docs: fix webhook response format (#162)" #165 (@severo)
- Add Keras docker image #117 (@osanseviero)
- Allow multiple models when testing a pipeline #124 (@osanseviero)
- scikit rebased #170 (@Narsil)
- Upgrading community frameworks to `audio-to-audio`. #94 (@Narsil)
- Add sagemaker docs #173 (@philschmid)
- Add Structured Data Classification as task #172 (@osanseviero)
- Fixing keras outputs (widgets was ignoring because of type mismatch, now testing for it) #176 (@Narsil)
- Updating spacy. #179 (@Narsil)
- Create initial superb docker image structure #181 (@osanseviero)
- Upgrading asteroid image. #175 (@Narsil)
- Removing tests on huggingface_hub for unrelated changes in api-inference-community #180 (@Narsil)
- Fixing audio-to-audio validation. #184 (@Narsil)
- rmdir `api-inference-community/src/sentence-transformers` #188 (@Pierrci)
- Allow generic inference for ASR for superb #185 (@osanseviero)
- Add timestamp to snapshot download tests #201 (@LysandreJik)
- No need for token to understand HF urls #203 (@LysandreJik)
- Remove `--no_renames` argument to list deleted files. #205 (@LysandreJik)