ggml-org/llama.cpp release b7748

llama : add adaptive-p sampler (#17927)
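
For orientation, here is a rough sketch of the idea, pieced together from the commit messages below: the sampler tracks a moving average of the probability assigned to each accepted token and reweights the candidate distribution with a power law so that this average drifts toward a user-chosen target in [0.0, 1.0]. The code is illustrative only; the function shape, the exponent formula, and the parameter names are assumptions, not the merged implementation (see the PR and the gist referenced below for the real math):

    // Illustrative toy version of an "adaptive-p" (power-law) reweighting step.
    // Not the code merged in #17927.
    #include <cmath>
    #include <vector>

    // probs:  normalized candidate probabilities
    // ema:    running average of the probability of previously accepted tokens
    // target: desired value of that average (1.0 disables the sampler, per the PR)
    static void adaptive_p_apply_sketch(std::vector<float> & probs, float ema, float target) {
        if (target >= 1.0f) {
            return; // target == 1 is a no-op
        }
        const float sharpness = 10.0f; // cf. "update SHARPNESS constant to 10.0f"
        // Toy mapping from (target - ema) to an exponent: alpha > 1 sharpens the
        // distribution when the average is below target, alpha < 1 flattens it.
        const float alpha = std::exp(sharpness * (target - ema));

        float sum = 0.0f;
        for (float & p : probs) {
            p    = std::pow(p, alpha);
            sum += p;
        }
        for (float & p : probs) {
            p /= sum; // renormalize
        }
    }
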

  • initial commit for branch

  • simplify constants

  • add params to struct common_params_sampling, add reference to PR

  • explicitly clamp min_target and max_target to [0.0, 1.0]

  • add args, rename queue_size -> window_size

  • improved comments

  • minor

  • remove old unused code from algorithm

  • minor

  • add power law case to common_sampler_init, add sampler name mappings

  • clarify behaviour when window_size = 0

  • add missing enums

  • remove target_range param, make target == 1 no-op, cleanup code

  • oops, straggler

  • add missing parameters in server-task.cpp

  • copy from author

ref:
https://gist.github.com/MrJackSpade/9be99c7efbba7b95a41377e123b7b069

  • remove old debug log, style nit

  • fix compiler warning, add commented-out logging per token

  • re-write + change parameters + simplify

  • oops forgot args.cpp

  • fix leftover window_size

  • add missing values to common_params_sampling::print()

  • with logging

  • does this fix it?

  • no, but does this?

  • update default decay

  • optimize

  • fix bad merge

my git skills are lacking

  • silence "missing initializer for member" warning

  • update default decay to 0.9

  • fix logging

  • format (double)

  • add power law to the new samplers vector

  • log sampler init values

  • improve logging messages in llama_sampler_power_law

  • remove extraneous logging

  • simplify target computation

last commit with debug logging!

  • remove debug logging, explicitly clamp params at init

  • add use_power_law flag + logic, minor cleanup

  • update power-law -> adaptive-p

  • fix cold start EMA

  • ctx->weighted_sum is now initialized and reset to target / (1.0f - clamped_decay)
  • ctx->total_weight is now initialized and reset to 1.0f / (1.0f - clamped_decay)

this fixes a "cold start" problem with the moving average
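
A minimal sketch of why that seeding removes the bias: only weighted_sum, total_weight, and the two reset formulas come from the commit itself; the update rule below is an assumption chosen to be consistent with that initialization.

    // Illustrative EMA state, not the merged code.
    struct adaptive_p_ema_sketch {
        float decay;
        float weighted_sum;   // EMA numerator
        float total_weight;   // EMA denominator

        adaptive_p_ema_sketch(float target, float clamped_decay) : decay(clamped_decay) {
            // Seeding at the steady-state values means the very first read of the
            // average already equals `target` instead of being dominated by zeros.
            weighted_sum = target / (1.0f - clamped_decay);
            total_weight = 1.0f  / (1.0f - clamped_decay);
        }

        void accept(float p_sampled) {
            weighted_sum = weighted_sum * decay + p_sampled;
            total_weight = total_weight * decay + 1.0f; // 1/(1-decay) is its fixed point
        }

        float average() const { return weighted_sum / total_weight; }
    };
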

  • update SHARPNESS constant to 10.0f

  • minor style fixes

no functional changes

  • minor style fixes cont.

  • update llama_sampler_adaptive_p_i for backend sampling (ref: #17004)

  • separate into apply + accept functions
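
This mirrors the two hooks of llama.cpp's sampler interface: apply() rewrites the candidate array, accept() observes the token that was actually sampled. Abridged excerpt of the vtable from llama.h; recent versions add further hooks (e.g. for backend sampling, see #17004), so treat this as a simplified reminder rather than the full interface.

    // Abridged from llama.h -- the callbacks a sampler implements.
    struct llama_sampler_i {
        const char *           (*name)  (const struct llama_sampler * smpl);
        void                   (*accept)(      struct llama_sampler * smpl, llama_token token);              // observe the sampled token
        void                   (*apply) (      struct llama_sampler * smpl, llama_token_data_array * cur_p); // modify candidate logits/probs
        void                   (*reset) (      struct llama_sampler * smpl);
        struct llama_sampler * (*clone) (const struct llama_sampler * smpl);
        void                   (*free)  (      struct llama_sampler * smpl);
    };
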

  • pending_token_idx: switch from llama_token to int32

functionally identical (llama.h has typedef int32_t llama_token;),
but it's more correct now

  • don't transform logits <= -1e9f
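
Logits at or below -1e9f act as an "already masked" sentinel for tokens filtered out by earlier samplers, so the transform leaves them untouched. A hedged sketch of that guard; the threshold comes from the commit, the surrounding function is illustrative only:

    // Illustrative only -- not the merged loop.
    #include "llama.h"

    static void adaptive_p_transform_sketch(llama_token_data_array * cur_p) {
        for (size_t i = 0; i < cur_p->size; ++i) {
            if (cur_p->data[i].logit <= -1e9f) {
                continue; // effectively -inf: already filtered out, leave untouched
            }
            // ... the actual adaptive-p reweighting of cur_p->data[i] would go here ...
        }
    }
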

  • fix masking in backend top-p, min-p

  • address review comments

  • typo in comments RND -> RNG

  • add docs

  • add recommended values in completion docs

  • address PR feedback

  • remove trailing whitespace (for CI editorconfig)

  • add adaptive-p to common_sampler_types_from_chars

Binaries: macOS/iOS, Linux, Windows, openEuler
