huggingface/pytorch-image-models — v0.8.2dev0 Release (pre-release)

Partway through the conversion of models to multi-weight support (`model_arch.pretrained_tag`), a module reorganization for future building, and lots of new weights and model additions as we go...

This is considered a development release. Please stick to 0.6.x if you need stability. Some model names and tags will shift a bit, and some old names have already been deprecated without remapping support added yet. The 0.6.x branch is considered 'stable': https://github.com/rwightman/pytorch-image-models/tree/0.6.x

Dec 23, 2022 🎄☃

  • Add FlexiViT models and weights from https://github.com/google-research/big_vision (check out paper at https://arxiv.org/abs/2212.08013)
    • NOTE: patch-size resizing is currently static at model creation; on-the-fly dynamic resizing / train-time patch size sampling is a WIP
  • Many more models updated to multi-weight and downloadable via HF hub now (convnext, efficientnet, mobilenet, vision_transformer*, beit)
  • More model pretrained tags and adjustments, some model names changed (working on deprecation translations; consider the main branch a DEV branch right now, use 0.6.x for stable use)
  • More ImageNet-12k (subset of 22k) pretrain models popping up:
    • efficientnet_b5.in12k_ft_in1k - 85.9 @ 448x448
    • vit_medium_patch16_gap_384.in12k_ft_in1k - 85.5 @ 384x384
    • vit_medium_patch16_gap_256.in12k_ft_in1k - 84.5 @ 256x256
    • convnext_nano.in12k_ft_in1k - 82.9 @ 288x288
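The multi-weight naming scheme above joins an architecture name and a pretrained tag with a single `.`. A minimal sketch of how such a name could be parsed — `split_model_name` is an illustrative helper for this post, not timm's actual API:

```python
# Illustrative sketch of the arch.pretrained_tag naming scheme used by the
# new multi-weight support. split_model_name is a hypothetical helper and
# not part of timm's public API.

def split_model_name(model_name: str) -> tuple[str, str]:
    """Split 'arch.pretrained_tag' into (arch, tag); tag is '' if absent."""
    arch, _, tag = model_name.partition('.')
    return arch, tag

print(split_model_name('convnext_nano.in12k_ft_in1k'))
# ('convnext_nano', 'in12k_ft_in1k')
print(split_model_name('eva_large_patch14_336'))
# ('eva_large_patch14_336', '')
```

Passing the full `arch.pretrained_tag` string to `timm.create_model(..., pretrained=True)` is the intended way to select a specific weight set.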

Dec 8, 2022

  • Add 'EVA l' to vision_transformer.py, MAE style ViT-L/14 MIM pretrain w/ EVA-CLIP targets, FT on ImageNet-1k (w/ ImageNet-22k intermediate for some)
| model | top1 | param_count (M) | GMACs | MActs (M) | hub |
|---|---|---|---|---|---|
| eva_large_patch14_336.in22k_ft_in22k_in1k | 89.2 | 304.5 | 191.1 | 270.2 | link |
| eva_large_patch14_336.in22k_ft_in1k | 88.7 | 304.5 | 191.1 | 270.2 | link |
| eva_large_patch14_196.in22k_ft_in22k_in1k | 88.6 | 304.1 | 61.6 | 63.5 | link |
| eva_large_patch14_196.in22k_ft_in1k | 87.9 | 304.1 | 61.6 | 63.5 | link |
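The patch size and input resolution encoded in these names (`patch14_336`, `patch14_196`) determine how many tokens the ViT processes, which is why GMACs differ so much between the two resolutions. A quick sketch of that arithmetic (`patch_grid` is an illustrative helper, not a timm function):

```python
# Sketch: patch-token count implied by a ..._patchXX_YYY ViT name,
# for the 14px-patch models in the table above. Helper name is illustrative.

def patch_grid(img_size: int, patch_size: int) -> int:
    """Patches per side for a square image evenly divided into patches."""
    assert img_size % patch_size == 0, 'image size must divide evenly'
    return img_size // patch_size

# eva_large_patch14_336 -> 24x24 = 576 patch tokens (plus a class token)
print(patch_grid(336, 14) ** 2)
# eva_large_patch14_196 -> 14x14 = 196 patch tokens
print(patch_grid(196, 14) ** 2)
```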

Dec 6, 2022

  • Add EVA giant (eva_giant_patch14_*) weights:
| model | top1 | param_count (M) | GMACs | MActs (M) | hub |
|---|---|---|---|---|---|
| eva_giant_patch14_560.m30m_ft_in22k_in1k | 89.8 | 1014.4 | 1906.8 | 2577.2 | link |
| eva_giant_patch14_336.m30m_ft_in22k_in1k | 89.6 | 1013 | 620.6 | 550.7 | link |
| eva_giant_patch14_336.clip_ft_in1k | 89.4 | 1013 | 620.6 | 550.7 | link |
| eva_giant_patch14_224.clip_ft_in1k | 89.1 | 1012.6 | 267.2 | 192.6 | link |

Dec 5, 2022

  • Pre-release (0.8.0dev0) of multi-weight support (`model_arch.pretrained_tag`). Install with `pip install --pre timm`
    • vision_transformer, maxvit, convnext are the first three model impls w/ support
    • model names are changing with this (previously separate _21k etc. model fns will merge); still sorting out deprecation handling
    • bugs are likely, but I need feedback, so please try it out
    • if stability is needed, please use 0.6.x pypi releases or clone from the 0.6.x branch
  • Support for PyTorch 2.0 compile added to train/validate/inference/benchmark scripts; use the `--torchcompile` argument
  • Inference script allows more control over output: select a top-k for class index + probability, with JSON, CSV or parquet output
  • Add a full set of fine-tuned CLIP image tower weights from both LAION-2B and original OpenAI CLIP models
| model | top1 | param_count (M) | GMACs | MActs (M) | hub |
|---|---|---|---|---|---|
| vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k | 88.6 | 632.5 | 391 | 407.5 | link |
| vit_large_patch14_clip_336.openai_ft_in12k_in1k | 88.3 | 304.5 | 191.1 | 270.2 | link |
| vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k | 88.2 | 632 | 167.4 | 139.4 | link |
| vit_large_patch14_clip_336.laion2b_ft_in12k_in1k | 88.2 | 304.5 | 191.1 | 270.2 | link |
| vit_large_patch14_clip_224.openai_ft_in12k_in1k | 88.2 | 304.2 | 81.1 | 88.8 | link |
| vit_large_patch14_clip_224.laion2b_ft_in12k_in1k | 87.9 | 304.2 | 81.1 | 88.8 | link |
| vit_large_patch14_clip_224.openai_ft_in1k | 87.9 | 304.2 | 81.1 | 88.8 | link |
| vit_large_patch14_clip_336.laion2b_ft_in1k | 87.9 | 304.5 | 191.1 | 270.2 | link |
| vit_huge_patch14_clip_224.laion2b_ft_in1k | 87.6 | 632 | 167.4 | 139.4 | link |
| vit_large_patch14_clip_224.laion2b_ft_in1k | 87.3 | 304.2 | 81.1 | 88.8 | link |
| vit_base_patch16_clip_384.laion2b_ft_in12k_in1k | 87.2 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_384.openai_ft_in12k_in1k | 87 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_384.laion2b_ft_in1k | 86.6 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_384.openai_ft_in1k | 86.2 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_224.laion2b_ft_in12k_in1k | 86.2 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch16_clip_224.openai_ft_in12k_in1k | 85.9 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch32_clip_448.laion2b_ft_in12k_in1k | 85.8 | 88.3 | 17.9 | 23.9 | link |
| vit_base_patch16_clip_224.laion2b_ft_in1k | 85.5 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch32_clip_384.laion2b_ft_in12k_in1k | 85.4 | 88.3 | 13.1 | 16.5 | link |
| vit_base_patch16_clip_224.openai_ft_in1k | 85.3 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch32_clip_384.openai_ft_in12k_in1k | 85.2 | 88.3 | 13.1 | 16.5 | link |
| vit_base_patch32_clip_224.laion2b_ft_in12k_in1k | 83.3 | 88.2 | 4.4 | 5 | link |
| vit_base_patch32_clip_224.laion2b_ft_in1k | 82.6 | 88.2 | 4.4 | 5 | link |
| vit_base_patch32_clip_224.openai_ft_in1k | 81.9 | 88.2 | 4.4 | 5 | link |
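The inference-script output control mentioned above (top-k class index + probability, written as JSON/CSV/parquet) boils down to a softmax over the logits followed by a top-k selection. A standalone sketch with made-up logits — `topk_probs` is illustrative, not the actual inference.py code:

```python
import json
import math

def topk_probs(logits: list[float], k: int) -> list[dict]:
    """Softmax the logits and return the top-k (index, prob) records."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [{'index': i, 'prob': round(probs[i], 4)} for i in top]

# Records like these can then be dumped as JSON, or handed to a DataFrame
# for CSV / parquet output.
records = topk_probs([2.0, 0.5, 1.0, -1.0], k=2)
print(json.dumps(records))
```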
  • Port of MaxViT Tensorflow Weights from official impl at https://github.com/google-research/maxvit
    • There were larger-than-expected drops for the upscaled 384/512 in21k fine-tune weights; a conversion detail is possibly missing, but the 21k FT weights did seem sensitive to small preprocessing differences
| model | top1 | param_count (M) | GMACs | MActs (M) | hub |
|---|---|---|---|---|---|
| maxvit_xlarge_tf_512.in21k_ft_in1k | 88.5 | 475.8 | 534.1 | 1413.2 | link |
| maxvit_xlarge_tf_384.in21k_ft_in1k | 88.3 | 475.3 | 292.8 | 668.8 | link |
| maxvit_base_tf_512.in21k_ft_in1k | 88.2 | 119.9 | 138 | 704 | link |
| maxvit_large_tf_512.in21k_ft_in1k | 88 | 212.3 | 244.8 | 942.2 | link |
| maxvit_large_tf_384.in21k_ft_in1k | 88 | 212 | 132.6 | 445.8 | link |
| maxvit_base_tf_384.in21k_ft_in1k | 87.9 | 119.6 | 73.8 | 332.9 | link |
| maxvit_base_tf_512.in1k | 86.6 | 119.9 | 138 | 704 | link |
| maxvit_large_tf_512.in1k | 86.5 | 212.3 | 244.8 | 942.2 | link |
| maxvit_base_tf_384.in1k | 86.3 | 119.6 | 73.8 | 332.9 | link |
| maxvit_large_tf_384.in1k | 86.2 | 212 | 132.6 | 445.8 | link |
| maxvit_small_tf_512.in1k | 86.1 | 69.1 | 67.3 | 383.8 | link |
| maxvit_tiny_tf_512.in1k | 85.7 | 31 | 33.5 | 257.6 | link |
| maxvit_small_tf_384.in1k | 85.5 | 69 | 35.9 | 183.6 | link |
| maxvit_tiny_tf_384.in1k | 85.1 | 31 | 17.5 | 123.4 | link |
| maxvit_large_tf_224.in1k | 84.9 | 211.8 | 43.7 | 127.4 | link |
| maxvit_base_tf_224.in1k | 84.9 | 119.5 | 24 | 95 | link |
| maxvit_small_tf_224.in1k | 84.4 | 68.9 | 11.7 | 53.2 | link |
| maxvit_tiny_tf_224.in1k | 83.4 | 30.9 | 5.6 | 35.8 | link |

Oct 15, 2022

  • Train and validation script enhancements
  • Non-GPU (i.e. CPU) device support
  • SLURM compatibility for train script
  • HF datasets support (via ReaderHfds)
  • TFDS/WDS dataloading improvements (sample padding/wrap for distributed use fixed with respect to the sample count estimate)
  • `in_chans != 3` support for scripts / loader
  • Adan optimizer
  • Can enable per-step LR scheduling via args
  • Dataset 'parsers' renamed to 'readers', more descriptive of purpose
  • AMP args changed: APEX via `--amp-impl apex`, bfloat16 supported via `--amp-dtype bfloat16`
  • main branch switched to 0.7.x versioning; 0.6.x forked for stable releases of weight-only additions
  • master -> main branch rename
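Per-step LR scheduling, as opposed to the per-epoch default, evaluates the schedule on a global optimizer-step count so the LR moves smoothly within an epoch. A minimal cosine-decay sketch of the idea (illustrative only, not timm's scheduler classes):

```python
import math

def cosine_lr(step: int, total_steps: int, base_lr: float,
              min_lr: float = 0.0) -> float:
    """Cosine decay evaluated per optimizer step instead of per epoch."""
    t = min(step / max(total_steps, 1), 1.0)  # progress in [0, 1]
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))

print(cosine_lr(0, 1000, 0.1))     # base LR at the start
print(cosine_lr(500, 1000, 0.1))   # halfway: ~0.05
print(cosine_lr(1000, 1000, 0.1))  # min LR at the end
```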
