v0.0.14: LFS Auto tracking, `dataset_info` and `list_datasets`, documentation
Datasets
Dataset repositories get better support, starting with full usage of the `Repository` class for dataset repositories:

```python
from huggingface_hub import Repository

repo = Repository("local_directory", clone_from="<user>/<dataset_id>", repo_type="dataset")
```
Datasets can now be retrieved from the Python runtime using the `list_datasets` method from the `HfApi` class:

```python
from huggingface_hub import HfApi

api = HfApi()
datasets = api.list_datasets()

len(datasets)
# 1048 publicly available dataset repositories at the time of writing
```
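A minimal sketch of iterating over the results; that each entry is an object exposing an `id` attribute is an assumption based on the `dataset_info` output shown below, not something this release note confirms:

```python
# Sketch: print the identifiers of the first few dataset repositories.
# (`dataset.id` as an attribute is an assumption.)
for dataset in datasets[:5]:
    print(dataset.id)
```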
Information can be retrieved on specific datasets using the `dataset_info` method from the `HfApi` class:

```python
from huggingface_hub import HfApi

api = HfApi()
api.dataset_info("squad")
# DatasetInfo: {
#     id: squad
#     lastModified: 2021-07-07T13:18:53.595Z
#     tags: ['pretty_name:SQuAD', 'annotations_creators:crowdsourced', 'language_creators:crowdsourced', 'language_creators:found',
# [...]
```
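The fields shown in the printout should be reachable on the returned object; a hedged sketch (field names are taken from the output above, attribute-style access is an assumption):

```python
# Sketch: read individual fields from the DatasetInfo result.
info = api.dataset_info("squad")
print(info.id)            # "squad"
print(info.lastModified)  # "2021-07-07T13:18:53.595Z"
```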
- Add dataset_info and list_datasets #164 (@lhoestq)
- Enable dataset repositories #151 (@LysandreJik)
Inference API wrapper client
Version v0.0.14 introduces a wrapper client for the Inference API. No need to craft custom `requests` calls anymore. See below for an example.
```python
from huggingface_hub import InferenceApi

api = InferenceApi("bert-base-uncased")
api(inputs="The [MASK] is great")
# [
#     {'sequence': 'the music is great', 'score': 0.03599703311920166, 'token': 2189, 'token_str': 'music'},
#     {'sequence': 'the price is great', 'score': 0.02146693877875805, 'token': 3976, 'token_str': 'price'},
#     {'sequence': 'the money is great', 'score': 0.01866752654314041, 'token': 2769, 'token_str': 'money'},
#     {'sequence': 'the fun is great', 'score': 0.01654735580086708, 'token': 4569, 'token_str': 'fun'},
#     {'sequence': 'the effect is great', 'score': 0.015102624893188477, 'token': 3466, 'token_str': 'effect'}
# ]
```
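The client should also cover private models and repositories that map to several pipelines; a sketch under the assumption that the constructor accepts `task` and `token` keyword arguments (parameter names not confirmed by this release note):

```python
# Sketch: pass an API token for private repositories and pin an explicit
# task when the model maps to more than one pipeline (names assumed).
api = InferenceApi("bert-base-uncased", task="fill-mask", token="<api_token>")
api(inputs="Paris is the [MASK] of France.")
```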
- Inference API wrapper client #65 (@osanseviero)
Auto-track with LFS
Version v0.0.14 introduces an auto-tracking mechanism with git-lfs for large files. Files that are larger than 10MB can be automatically tracked by using the `auto_track_large_files` method:

```python
from huggingface_hub import Repository

repo = Repository("local_directory", clone_from="<user>/<model_id>")

# save large files in `local_directory`
repo.git_add()
repo.auto_track_large_files()
repo.git_commit("Add large files")
repo.git_push()
# No push rejected error anymore!
```
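For explicit control over which patterns go through LFS, tracking can also be done by hand; a sketch under the assumption that `Repository` exposes an `lfs_track` method accepting a glob pattern:

```python
# Sketch: manually track a file pattern with git-lfs before committing.
# (`lfs_track("*.bin")` accepting a glob is an assumption.)
repo.lfs_track("*.bin")
repo.git_add()
```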
It is automatically used when leveraging the `commit` context manager:

```python
from huggingface_hub import Repository

repo = Repository("local_directory", clone_from="<user>/<model_id>")

with repo.commit("Add large files"):
    # add large files
    ...

# No push rejected error anymore!
```
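As a concrete sketch of the context manager: any file written inside the block is committed on exit, and files above the 10MB threshold should be picked up by the auto-tracking mechanism:

```python
# Sketch: write a 20MB binary inside the commit context; it exceeds the
# 10MB threshold, so it is tracked with git-lfs before being pushed.
with repo.commit("Add a large binary file"):
    with open("large_file.bin", "wb") as f:
        f.write(b"\0" * (20 * 1024 * 1024))
```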
- Auto track with LFS #177 (@LysandreJik)
Documentation
- Update docs structure #145 (@Pierrci)
- Update links to docs #147 (@LysandreJik)
- Add new repo guide #153 (@osanseviero)
- Add documentation for endpoints #155 (@osanseviero)
- Document hf.co webhook publicly #156 (@julien-c)
- docs: ✏️ mention the Training metrics tab #193 (@severo)
- doc for Spaces #189 (@julien-c)
Breaking changes
Reminder: the `huggingface_hub` library follows semantic versioning and is under active development. Until the first major version (v1.0.0) is out, you should expect breaking changes, and we strongly recommend pinning the library to a specific version.
Two breaking changes are introduced with version v0.0.14.
The `whoami` return value changes from a tuple to a dictionary
- Allow obtaining Inference API tokens with whoami #157 (@osanseviero)
The `whoami` method changes its returned value from a tuple of `(<user>, [<organisations>])` to a dictionary containing a lot more information.

In versions v0.0.13 and below, here was the behavior of the `whoami` method from the `HfApi` class:

```python
from huggingface_hub import HfFolder, HfApi

api = HfApi()
api.whoami(HfFolder.get_token())
# ('<user>', ['<org_0>', '<org_1>'])
```
In version v0.0.14, this is updated to the following:

```python
from huggingface_hub import HfFolder, HfApi

api = HfApi()
api.whoami(HfFolder.get_token())
# {
#     'type': str,
#     'name': str,
#     'fullname': str,
#     'email': str,
#     'emailVerified': bool,
#     'apiToken': str,
#     'plan': str,
#     'avatarUrl': str,
#     'orgs': List[str]
# }
```
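A minimal migration sketch for code that unpacked the old tuple, using the key names shown in the structure above (mapping the old `<user>` value to the `name` key is an inference):

```python
# Sketch: recover the old (user, organisations) pair from the new dictionary.
info = api.whoami(HfFolder.get_token())
user, orgs = info["name"], info["orgs"]
```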
The `Repository` class's `use_auth_token` initialization parameter now defaults to `True`

The `use_auth_token` initialization parameter of the `Repository` class now defaults to `True`. The behavior is unchanged for users who are not logged in, in which case `Repository` remains agnostic to the token stored by `huggingface_hub`.
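To restore the previous behavior and ignore any stored token, the flag can be passed explicitly; a minimal sketch:

```python
from huggingface_hub import Repository

# Sketch: opt out of the new default by disabling the stored token.
repo = Repository("local_directory", clone_from="<user>/<model_id>", use_auth_token=False)
```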
- Set use_auth_token to True by default #204 (@LysandreJik)
Improvements and bugfixes
- Add sklearn code snippet #133 (@osanseviero)
- Allow passing only model ID to clone when authenticated #150 (@LysandreJik)
- More robust endpoint with toggled staging endpoint #148 (@LysandreJik)
- Add config to list_models #152 (@osanseviero)
- Fix audio-to-audio widget and add icon #142 (@osanseviero)
- Upgrade spaCy to api 0.0.12 and remove allowlist #161 (@osanseviero)
- docs: fix webhook response format #162 (@severo)
- Update link in README.md #163 (@nateraw)
- Revert "docs: fix webhook response format (#162)" #165 (@severo)
- Add Keras docker image #117 (@osanseviero)
- Allow multiple models when testing a pipeline #124 (@osanseviero)
- scikit rebased #170 (@Narsil)
- Upgrading community frameworks to `audio-to-audio`. #94 (@Narsil)
- Add sagemaker docs #173 (@philschmid)
- Add Structured Data Classification as task #172 (@osanseviero)
- Fixing keras outputs (widgets was ignoring because of type mismatch, now testing for it) #176 (@Narsil)
- Updating spacy. #179 (@Narsil)
- Create initial superb docker image structure #181 (@osanseviero)
- Upgrading asteroid image. #175 (@Narsil)
- Removing tests on huggingface_hub for unrelated changes in api-inference-community #180 (@Narsil)
- Fixing audio-to-audio validation. #184 (@Narsil)
- rmdir `api-inference-community/src/sentence-transformers` #188 (@Pierrci)
- Allow generic inference for ASR for superb #185 (@osanseviero)
- Add timestamp to snapshot download tests #201 (@LysandreJik)
- No need for token to understand HF urls #203 (@LysandreJik)
- Remove `--no_renames` argument to list deleted files. #205 (@LysandreJik)