v0.4.0: Tag listing, Namespace Objects, Model Filter

2 years ago

Tag listing

This PR adds the ability to fetch all available tags for models or datasets. The tags are returned as a nested namespace object, for example:

>>> from huggingface_hub import HfApi

>>> api = HfApi() 
>>> tags = api.get_dataset_tags()
>>> print(tags)
Available Attributes:
 * benchmark
 * language_creators
 * languages
 * licenses
 * multilinguality
 * size_categories
 * task_categories
 * task_ids

>>> print(tags.benchmark)
Available Attributes:
 * raft
 * superb
 * test
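
The nested, printable namespace shown above can be illustrated with a small self-contained sketch. This is not the library's implementation: the class name `TagNamespace` and the sample tag values are hypothetical, chosen only to mirror the printed output.

```python
class TagNamespace:
    """Hypothetical stand-in for a nested, attribute-accessible tag container."""

    def __init__(self, entries):
        for key, value in entries.items():
            # Nested dicts become nested namespaces, so tags.benchmark.raft works.
            if isinstance(value, dict):
                value = TagNamespace(value)
            setattr(self, key, value)

    def __str__(self):
        # List attribute names in sorted order, one bullet per line.
        lines = ["Available Attributes:"]
        lines += [f" * {name}" for name in sorted(vars(self))]
        return "\n".join(lines)


# Made-up sample data shaped like the output above (assumption, not real hub data).
tags = TagNamespace({
    "benchmark": {"raft": "benchmark:raft", "superb": "benchmark:superb", "test": "benchmark:test"},
    "licenses": {"mit": "license:mit"},
})
print(tags.benchmark)
```

Because every level is a plain object with attributes, an interactive shell can offer tab-completion at each step of `tags.benchmark.<TAB>`.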

Namespace objects

With a goal of adding more tab-completion to the library, this PR introduces two objects:

  • DatasetSearchArguments
  • ModelSearchArguments

These two AttributeDictionary objects expose all the valid information we can extract from a model (or dataset) as tab-completable parameters. Through careful string splitting, they also include the author_or_organization and the dataset_name (or model_name).
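
As a rough illustration of the string splitting mentioned above, the sketch below splits repository ids into an author part and a name part and exposes both as attributes. The helpers `split_repo_id` and `attribute_key`, the key-sanitizing rule, and the use of `SimpleNamespace` are assumptions made for this example, not the library's actual code.

```python
from types import SimpleNamespace


def split_repo_id(repo_id):
    """Split 'org/name' into (author_or_organization, name); author is None if absent."""
    author, sep, name = repo_id.rpartition("/")
    return (author if sep else None), name


def attribute_key(value):
    """Turn an arbitrary string into a usable attribute name (hypothetical rule)."""
    key = "".join(ch if ch.isalnum() else "_" for ch in value)
    # Attribute names cannot start with a digit, so prefix those with "_".
    return f"_{key}" if key[:1].isdigit() else key


repo_ids = ["microsoft/wavlm-base-sd", "bert-base-uncased"]
authors, names = {}, {}
for repo_id in repo_ids:
    author, name = split_repo_id(repo_id)
    if author is not None:
        authors[attribute_key(author)] = author
    names[attribute_key(name)] = name

args = SimpleNamespace(
    author=SimpleNamespace(**authors),
    model_name=SimpleNamespace(**names),
)
print(args.author.microsoft)
print(args.model_name.wavlm_base_sd)
```

The attribute name is a sanitized key, while the attribute value keeps the original string, so the value can be passed on unchanged to a search call.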

Model Filter

This PR introduces a new way to search the hub: the ModelFilter class.

At first glance, it looks like a simple Enum to the user, letting them specify what they want to search for, such as:

f = ModelFilter(author="microsoft", model_name="wavlm-base-sd", framework="pytorch")

From there, they can pass this filter to the list_models function in HfApi, through its filter argument, to search the hub:

models = api.list_models(filter=f)

The API may then be used for complex queries:

args = ModelSearchArguments()
f = ModelFilter(
    framework=[args.library.pytorch, args.library.TensorFlow],
    model_name="bert",
    tasks=[args.pipeline_tag.Summarization, args.pipeline_tag.TokenClassification],
)


Ignoring filenames in snapshot_download

This PR introduces a way to limit the files fetched by snapshot_download. This is useful when you want to download and cache an entire repository without using git, while skipping some files based on their filenames.
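
The idea of skipping files by filename can be sketched with a small, self-contained filter. This is only an illustration under stated assumptions: the helper `filter_filenames` and its `skip_patterns` parameter are hypothetical names, and the choice of regular expressions is an assumption, not a description of the actual snapshot_download signature.

```python
import re


def filter_filenames(filenames, skip_patterns=None):
    """Return the filenames that do NOT match any of the skip patterns (regexes)."""
    if skip_patterns is None:
        return list(filenames)
    if isinstance(skip_patterns, str):
        # Accept a single pattern as well as a list of patterns.
        skip_patterns = [skip_patterns]
    compiled = [re.compile(pattern) for pattern in skip_patterns]
    return [
        name for name in filenames
        if not any(pattern.match(name) for pattern in compiled)
    ]


# Made-up file listing for a repository (assumption for the example).
repo_files = ["config.json", "pytorch_model.bin", "tf_model.h5", "README.md"]
to_download = filter_filenames(repo_files, skip_patterns=[r".*\.h5$", r".*\.md$"])
print(to_download)
```

Only the files that survive the filter would then be downloaded and cached, which avoids fetching large weights for frameworks you do not use.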

