Models:
- Llama 3: A new model by Meta, and the most capable openly available LLM to date.
- Phi 3 Mini: A new 3.8B-parameter, lightweight, state-of-the-art open model by Microsoft.
- Moondream: A small vision language model designed to run efficiently on edge devices.
- Llama 3 Gradient 1048K: A Llama 3 fine-tune by Gradient that supports a context window of up to 1M tokens.
- Dolphin Llama 3: The uncensored Dolphin model, trained by Eric Hartford and based on Llama 3, with a variety of instruction, conversational, and coding skills.
- Qwen 110B: The first Qwen model with over 100B parameters, with outstanding performance in evaluations.
What's Changed
- Fixed issues where the model would not terminate, causing the API to hang.
- Fixed a series of out-of-memory errors on Apple Silicon Macs.
- Fixed out-of-memory errors when running Mixtral-architecture models.
Experimental concurrency features
New concurrency features are coming soon to Ollama. They can already be tried as experimental settings:
- `OLLAMA_NUM_PARALLEL`: Handle multiple requests simultaneously for a single model
- `OLLAMA_MAX_LOADED_MODELS`: Load multiple models simultaneously
To enable these features, set the environment variables when starting `ollama serve`. For more info, see this guide. For example:
```shell
OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve
```
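One way to see the parallel settings in action is to send several requests at once. The script below is a minimal sketch, not part of Ollama. It assumes the server listens on the default `http://localhost:11434`, that the named models have already been pulled, and that the non-streaming `/api/generate` endpoint is used. `OLLAMA_URL`, `make_payload`, and `send_concurrently` are hypothetical names used only here.

```shell
#!/bin/sh
# Minimal sketch (not part of Ollama): fire several /api/generate requests at
# once so a server started with OLLAMA_NUM_PARALLEL > 1 can serve them concurrently.
# Assumption: server on the default address; OLLAMA_URL is a hypothetical
# variable read only by this script.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Build a non-streaming request body for model $1 and prompt $2.
# No JSON escaping is done, so prompts must not contain " or \.
make_payload() {
  printf '{"model": "%s", "prompt": "%s", "stream": false}' "$1" "$2"
}

# Send one background request per prompt to model $1, then wait for all
# background jobs of this shell to finish.
send_concurrently() {
  model="$1"
  shift
  for prompt in "$@"; do
    curl -s "$OLLAMA_URL/api/generate" -d "$(make_payload "$model" "$prompt")" &
  done
  wait
}
```

For example, `send_concurrently llama3 "Why is the sky blue?" "Write a haiku about llamas."` sends two requests to one model, which exercises `OLLAMA_NUM_PARALLEL`. To exercise `OLLAMA_MAX_LOADED_MODELS`, start two such calls with different models (say `llama3` and `phi3`, assuming both are pulled) in the background and then `wait`.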
New Contributors
- @sidxt made their first contribution in #3705
- @ChengenH made their first contribution in #3789
- @secondtruth made their first contribution in #3503
- @reid41 made their first contribution in #3612
- @ericcurtin made their first contribution in #3626
- @JT2M0L3Y made their first contribution in #3633
- @datvodinh made their first contribution in #3655
- @MapleEve made their first contribution in #3817
- @swuecho made their first contribution in #3810
- @brycereitano made their first contribution in #3895
- @bsdnet made their first contribution in #3889
- @fyxtro made their first contribution in #3855
- @natalyjazzviolin made their first contribution in #3962
Full Changelog: v0.1.32...v0.1.33-rc5