github neuralmagic/deepsparse v1.7.0
DeepSparse v1.7.0

latest release: v1.7.1
one month ago

New Features:

  • DeepSparse Pipelines v2 was introduced, enabling more complex pipelines to be represented. Text Generation (compatible with Hugging Face Transformers) and Image Classification pipelines have been refactored to the v2 format. (#1324, #1385, #1460, #1596, #1502, #1460, #1626)
  • OpenAI Server compatibility added on top of Pipelines v2. (#1445, #1477)
  • deepsparse.evaluate APIs and CLIs added with plugins for perplexity and lm-eval-harness for LLM evaluations. (#1596)
  • An example was added demonstrating how to use LLMPerf for benchmarking DeepSparse LLM servers. (#1502)
  • Continuous batching support has been added for text generation pipelines and inference server pathways, enabling inference over multiple text streams at once. (#1569, #1571)

Changes:

  • Exposed sequence_length for greater control over text generation pipelines. (#1518)
  • deepsparse.analyze functionality has been updated to work properly with LLMs. (#1324)
  • The logging and timing infrastructure for Pipelines expanded to enable more thorough tracking and logging, in addition to furthering support for integrations with Prometheus and other standard logging platforms. (#1614)
  • UX improved for text generation pipelines to more closely match Hugging Face Transformers pipelines. (#1583, #1584, #1590, #1592, #1598)

Resolved Issues:

  • Compile time for dense LLMs is no longer very slow.
  • Text generation pipeline bug fixes: corrected sampling logic errors and inappropriate in-place logits mutation resulting in incorrect answers for LLMs when using sampling. (#1406, #1414)
  • KV cache was fixed for improper handling of the kv_cache input while using external KV cache management, which resulted in inaccurate model inference for ONNX Runtime comparison pathways. (#1337)
  • Benchmarking runs for LLMs with internal KV cache no longer crash or report inaccurate numbers. (#1512, #1514)
  • SciPy dependencies were removed to address issues for CV pipelines where they would fail on import of scipy and crash. (#1604, #1602)

Known Issues:

  • OPT models produce incorrect outputs and are no longer supported.
  • Streaming support is limited within the DeepSparse Pipeline v2 framework for tasks other than text generation.

Don't miss a new deepsparse release

NewReleases is sending notifications on new releases.