llama : add adaptive-p sampler (#17927)
- initial commit for branch
- simplify constants
- add params to `struct common_params_sampling`, add reference to PR
- explicitly clamp `min_target` and `max_target` to `[0.0, 1.0]`
- add args, rename `queue_size` -> `window_size`
- improved comments
- minor
- remove old unused code from algorithm
- minor
- add power law case to `common_sampler_init`, add sampler name mappings
- clarify behaviour when `window_size = 0`
- add missing enums
- remove `target_range` param, make `target == 1` a no-op, cleanup code
- oops, straggler
- add missing parameters in `server-task.cpp`
- copy from author (ref: https://gist.github.com/MrJackSpade/9be99c7efbba7b95a41377e123b7b069)
- remove old debug log, style nit
- fix compiler warning, add commented-out logging per token
- re-write + change parameters + simplify
- oops, forgot args.cpp
- fix leftover `window_size`
- add missing values to `common_params_sampling::print()`
- with logging
- does this fix it?
- no, but does this?
- update default decay
- optimize
- fix bad merge (my git skills are lacking)
- silence "missing initializer for member" warning
- update default decay to 0.9
- fix logging
- format `(double)`
- add power law to the new `samplers` vector
- log sampler init values
- improve logging messages in `llama_sampler_power_law`
- remove extraneous logging
- simplify target computation (last commit with debug logging!)
- remove debug logging, explicitly clamp params at init
- add `use_power_law` flag + logic, minor cleanup
- update `power-law` -> `adaptive-p`
- fix cold-start EMA: `ctx->weighted_sum` is now initialized and reset to `target / (1.0f - clamped_decay)`, and `ctx->total_weight` to `1.0f / (1.0f - clamped_decay)`; this fixes a "cold start" problem with the moving average
- update `SHARPNESS` constant to `10.0f`
- minor style fixes (no functional changes)
- minor style fixes, cont.
- update `llama_sampler_adaptive_p_i` for backend sampling (ref: #17004)
- separate into `apply` + `accept` functions
- `pending_token_idx`: switch from `llama_token` to `int32`; functionally identical (llama.h has `typedef int32_t llama_token;`), but it's more correct now
- don't transform logits <= `-1e9f`
- fix masking in backend top-p, min-p
- address review comments
- fix typo in comments: RND -> RNG
- add docs
- add recommended values in completion docs
- address PR feedback
- remove trailing whitespace (for CI editorconfig)
- add adaptive-p to `common_sampler_types_from_chars`
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)