
Sampling

What is Sampling

If you spend any time in the LLM space, you will quickly hear LLMs described as probability machines: at each step the model calculates the probability of every token in its vocabulary being next, and the next token is then selected based on those results. Sampling is the math used in that selection process.
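To make that concrete, here is a minimal sketch of the selection step in plain Python. The four-token vocabulary and its scores are made up for illustration, and `softmax` and `sample_token` are hypothetical helper names, not any backend's API. It only shows the general idea: turn scores into probabilities, then either always take the top token (greedy) or draw one at random weighted by probability.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(vocab, logits, temperature=1.0, rng=random):
    """Pick one token at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Toy, made-up numbers: a four-token vocabulary and the model's scores for it.
vocab = ["cat", "dog", "car", "the"]
logits = [2.0, 1.5, 0.3, -1.0]

probs = softmax(logits)
greedy = vocab[probs.index(max(probs))]  # always take the most likely token
sampled = sample_token(vocab, logits, temperature=0.8)  # weighted random draw
```

Greedy selection returns the same token every time, while the weighted draw can return any token in proportion to its probability. Lowering `temperature` sharpens the distribution toward the top token, and raising it flattens the distribution.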

Different use cases benefit from selecting from the candidate tokens differently. This follows from the way LLMs assign probability scores and from what the task demands: creativity and randomness, factual accuracy, or coherence over long output.

It might not seem obvious, or it might seem like the default for whatever backend is already the 'best you can get', but let's fix this assumption. There are more to language model settings than just 'prompt engineering', and depending on your sampler settings, it can have a dramatic impact.

Your settings are (probably) hurting your model - Why sampler settings matter : r/LocalLLaMA

There are multiple samplers you can use, and you can, and likely should, use several in unison. The order and choice of these samplers have a significant impact on your output.
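As a rough illustration of why order matters, the sketch below chains two common samplers, temperature and top-p, in both orders over the same made-up scores. Everything here (`temperature`, `top_p`, `run_chain`, the toy logits) is a hypothetical stand-in written for this example and not how any particular inference server implements its sampler chain.

```python
import math

def softmax(logits):
    """Turn a {token: logit} dict into a {token: probability} dict."""
    peak = max(logits.values())
    exps = {tok: math.exp(v - peak) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def temperature(t):
    """Sampler: divide every logit by t (t > 1 flattens, t < 1 sharpens)."""
    return lambda logits: {tok: v / t for tok, v in logits.items()}

def top_p(p):
    """Sampler: keep the smallest set of top tokens whose probability reaches p."""
    def apply(logits):
        probs = softmax(logits)
        kept, cumulative = {}, 0.0
        for tok in sorted(probs, key=probs.get, reverse=True):
            kept[tok] = logits[tok]
            cumulative += probs[tok]
            if cumulative >= p:
                break
        return kept
    return apply

def run_chain(logits, samplers):
    """Apply each sampler in order; each one sees the previous one's output."""
    for sampler in samplers:
        logits = sampler(logits)
    return logits

# Toy, made-up scores for a five-token vocabulary.
logits = {"cat": 3.0, "dog": 2.0, "car": 1.0, "the": 0.0, "zzz": -1.0}

temp_first = run_chain(logits, [temperature(2.0), top_p(0.9)])
nucleus_first = run_chain(logits, [top_p(0.9), temperature(2.0)])

print(list(temp_first))     # tokens that survive when temperature runs first
print(list(nucleus_first))  # tokens that survive when top-p runs first
```

With temperature first, the distribution is flattened before top-p makes its cut, so more low-probability tokens survive. With top-p first, the cut is made on the original distribution, and temperature only reshapes what is left.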

Existing Samplers Explained

I think the existing resources here are pretty comprehensive, so I'll just provide a little map of them.

Go here for a very short primer on some basics

LLM Samplers Explained

Go here for a great explanation of the underlying premise, why it's important, and what to look for

Generation configurations: temperature, top-k, top-p, and test time compute

Go here for a nice inference-focused overview of many different samplers, with explanations and advice for usage and troubleshooting

LLM Sampling Parameters Guide | smcleod.net

LLM Sampling Parameters: Temperature, top-p, DRY, XTC (2026) | Local AI Master

Go here for a deeper understanding of how this all works in an easy-to-digest package

Your settings are (probably) hurting your model - Why sampler settings matter : r/LocalLLaMA

Go here for a much deeper understanding of how this all works, with takeaways that are easier to read but still well supported

Aman's AI Journal • Token Sampling Methods

Making Decisions

It comes down to three things: figure out your use case; evaluate how well the different options fit that use case, using the resources above; and factor in common sampler setups, such as inference server defaults and the settings recommended alongside quants, because they are popular for a reason.

For a nice short overview focused on use case, there is this article

LLM Inference Sampling Methods