Optimum INC CLI
Integration of Intel Neural Compressor dynamic quantization into the Optimum command-line interface. Example commands:
```bash
optimum-cli inc --help
optimum-cli inc quantize --help
optimum-cli inc quantize --model distilbert-base-cased-distilled-squad --output int8_distilbert/
```
- Add Optimum INC CLI to apply dynamic quantization by @echarlaix in #280
Leverage past key values for OpenVINO decoder models
Enable the use of pre-computed key/values to speed up inference. This is enabled by default when exporting the model:
```python
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
```
To disable it, `use_cache` can be set to `False` when loading the model:

```python
model = OVModelForCausalLM.from_pretrained(model_id, export=True, use_cache=False)
```
- Enable the possibility to use the pre-computed key / values for OpenVINO decoder models by @echarlaix in #274
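To illustrate why reusing past key/values speeds up decoding, here is a minimal conceptual sketch (not the OpenVINO implementation; the `project` helper is hypothetical). Without a cache, every decoding step recomputes the key/value projections for the whole prefix; with a cache, each step only projects the newest token.

```python
def project(token):
    """Stand-in for the key/value projection of one token (hypothetical)."""
    return (token * 2, token * 3)  # (key, value)

def count_projections(tokens, use_cache=True):
    """Count how many key/value projections a greedy decode performs."""
    cache = []
    projections = 0
    for step in range(1, len(tokens) + 1):
        if use_cache:
            # Only the newest token is projected; past ones come from the cache.
            cache.append(project(tokens[step - 1]))
            projections += 1
        else:
            # Recompute key/values for the entire prefix at every step.
            cache = [project(t) for t in tokens[:step]]
            projections += step
    return projections

# For a 10-token sequence: 10 projections with the cache,
# versus 1 + 2 + ... + 10 = 55 without it.
```

The speedup grows with sequence length: cached decoding does linear work per token, while uncached decoding is quadratic overall.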
INC config summarizing optimization details
- Add `INCConfig` by @echarlaix in #263
Fixes
- Remove dynamic shapes restriction for GPU devices by @helena-intel in #262
- Enable OpenVINO model caching for CPU devices by @helena-intel in #281
- Fix the `.to()` method for causal language models by @helena-intel in #284
- Fix PyTorch model saving for `transformers>=4.28.0` when optimized with `OVTrainer` by @echarlaix in #285
- Update task name for ONNX and OpenVINO export for `optimum>=1.8.0` by @echarlaix in #286