github ggml-org/llama.cpp b7852


sampling : remove sampling branching in output_reserve (#18811)

  • sampling : remove sampling branching in output_reserve

This commit updates output_reserve in llama-context.cpp to always
allocate sampling buffers regardless of whether sampling is needed for
the current batch.

The motivation is to avoid reallocations and branching that depend on the
sampling requirements of each batch.

Binary downloads: macOS/iOS, Linux, Windows, openEuler.
