github ollama/ollama v0.1.33-rc5
v0.1.33

pre-release, 6 months ago


Models:

  • Llama 3: a new model by Meta, and the most capable openly available LLM to date
  • Phi 3 Mini: a new 3.8B-parameter, lightweight, state-of-the-art open model by Microsoft.
  • Moondream: a small vision language model designed to run efficiently on edge devices.
  • Llama 3 Gradient 1048K: A Llama 3 fine-tune by Gradient to support up to a 1M token context window.
  • Dolphin Llama 3: The uncensored Dolphin model, trained by Eric Hartford and based on Llama 3 with a variety of instruction, conversational, and coding skills.
  • Qwen 110B: The first Qwen model over 100B parameters in size with outstanding performance in evaluations

What's Changed

  • Fixed issues where the model would not terminate, causing the API to hang.
  • Fixed a series of out-of-memory errors on Apple Silicon Macs.
  • Fixed out-of-memory errors when running Mixtral architecture models.

Experimental concurrency features

New concurrency features are coming soon to Ollama. They are available experimentally in this release through two environment variables:

  • OLLAMA_NUM_PARALLEL: Handle multiple requests simultaneously for a single model
  • OLLAMA_MAX_LOADED_MODELS: Load multiple models simultaneously

To enable these features, set the environment variables when starting ollama serve, as in the command below (the original release notes link a guide with more info):

OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve
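
With a server started as above, a client can exercise OLLAMA_NUM_PARALLEL by sending several requests at once. The following is a minimal Python sketch, not part of the release itself. It assumes the server listens on the default local address http://localhost:11434, that the llama3 model has already been pulled, and that the non-streaming /api/generate endpoint is used. The helper names build_request, send, and generate_all are hypothetical and exist only for this illustration.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: a local server on Ollama's default port, started with
# OLLAMA_NUM_PARALLEL=4 so that it can serve several requests at once.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama3"):
    """Build a non-streaming POST request for the /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send one request and return the generated text from the JSON reply."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def generate_all(prompts, model="llama3", workers=4, sender=send):
    """Issue one request per prompt from a thread pool, keeping input order.

    `sender` is injectable so the function can be tried without a server.
    """
    requests = [build_request(p, model) for p in prompts]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sender, requests))
```

Calling generate_all(["Why is the sky blue?", "Name three prime numbers."]) against a running server returns the answers in the same order as the prompts. Keeping workers at or below the OLLAMA_NUM_PARALLEL value is a reasonable starting point, since any extra requests would presumably wait on the server side.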

New Contributors

Full Changelog: v0.1.32...v0.1.33-rc5
