huggingface/optimum-intel
v1.8.0: Optimum INC CLI, past key values for OpenVINO decoder models


Optimum INC CLI

Intel Neural Compressor dynamic quantization is now integrated into the Optimum command line interface. Example commands:

optimum-cli inc --help
optimum-cli inc quantize --help
optimum-cli inc quantize --model distilbert-base-cased-distilled-squad --output int8_distilbert/
  • Add Optimum INC CLI to apply dynamic quantization by @echarlaix in #280
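The same dynamic quantization can also be applied from Python. A minimal sketch using the INCQuantizer API together with Neural Compressor's PostTrainingQuantConfig (the model mirrors the CLI example above; the save directory is arbitrary):

from neural_compressor.config import PostTrainingQuantConfig
from transformers import AutoModelForQuestionAnswering
from optimum.intel import INCQuantizer

model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")

# Dynamic quantization requires no calibration dataset
quantization_config = PostTrainingQuantConfig(approach="dynamic")
quantizer = INCQuantizer.from_pretrained(model)
quantizer.quantize(quantization_config=quantization_config, save_directory="int8_distilbert")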

Leverage past key values for OpenVINO decoder models

Decoder models can now reuse their pre-computed key / value states to make inference faster. This is enabled by default when exporting the model.

model = OVModelForCausalLM.from_pretrained(model_id, export=True)

To disable it, set use_cache to False when loading the model:

model = OVModelForCausalLM.from_pretrained(model_id, export=True, use_cache=False)
  • Enable the possibility to use the pre-computed key / values for OpenVINO decoder models by @echarlaix in #274
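Once exported with the cache enabled, the model works with the standard generate() API. A quick usage sketch (the model ID and prompt are placeholders):

from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

model_id = "gpt2"  # placeholder model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id, export=True)  # use_cache=True by default

inputs = tokenizer("The weather today is", return_tensors="pt")
# Cached key / values are reused at each decoding step instead of being recomputed
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))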

INC config summarizing optimization details

Fixes

  • Remove dynamic shapes restriction for GPU devices by @helena-intel in #262
  • Enable OpenVINO model caching for CPU devices by @helena-intel in #281
  • Fix the .to() method for causal language models by @helena-intel in #284 (see the sketch after this list)
  • Fix PyTorch model saving for transformers>=4.28.0 when optimized with OVTrainer by @echarlaix in #285
  • Update task names for the ONNX and OpenVINO export for optimum>=1.8.0 by @echarlaix in #286
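For reference, a minimal sketch of the now-fixed .to() method (the model ID is a placeholder, and "gpu" assumes an Intel GPU with the OpenVINO GPU plugin is available):

from optimum.intel import OVModelForCausalLM

model = OVModelForCausalLM.from_pretrained("gpt2", export=True)
# Switch the OpenVINO target device; the model is compiled for it before the next inference
model.to("gpu")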
