Greedy Decoding
Implement greedy decoding, the simplest strategy for generating a sequence of tokens from a language model.
What is greedy decoding?
At each step, greedy decoding picks the single token with the highest logit (argmax) from the scores the model produces for the next position. Because softmax preserves ordering, this is also the most probable token. There is no randomness: the output is fully deterministic given the same model and prompt.
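To make the single-step rule concrete, here is a minimal sketch in plain Python. Lists stand in for tensors, and the helper names `argmax` and `softmax` as well as the example logits are illustrative assumptions, not part of the problem's API. It also checks that the argmax of the raw logits matches the argmax of the softmax probabilities.

```python
import math

def argmax(values):
    """Return the index of the largest value (first index wins ties)."""
    return max(range(len(values)), key=lambda i: values[i])

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.2, -0.3, 3.7, 0.5]   # hypothetical logits for a 4-token vocabulary
print(argmax(logits))            # 2
print(argmax(softmax(logits)))   # 2, same token: softmax keeps the ordering
```

Since the ordering is unchanged, a greedy decoder can skip the softmax entirely and work on logits directly.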
Algorithm
seq = list(prompt)
for _ in range(max_tokens):
    logits = logits_fn(seq)        # shape (vocab,)
    next_token = argmax(logits)
    seq.append(next_token)
    if next_token == eos_id:
        break                      # include EOS, then stop
return tensor(seq)
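The pseudocode above can be turned into a runnable function along the following lines. This is a sketch under stated assumptions rather than the reference solution: it uses plain Python lists instead of tensors, and `toy_logits_fn` is a hypothetical stand-in for a real model head.

```python
def greedy_decode(logits_fn, prompt, max_tokens, eos_id):
    """Greedy decoding over plain Python lists (a stand-in for tensors)."""
    seq = list(prompt)
    for _ in range(max_tokens):
        logits = logits_fn(seq)  # one score per vocabulary entry
        # argmax: index of the highest score (first index wins ties)
        next_token = max(range(len(logits)), key=lambda i: logits[i])
        seq.append(next_token)
        if next_token == eos_id:
            break  # EOS is kept in the output, then decoding stops
    return seq


# Hypothetical toy "model": always prefers (last token + 1) mod 5.
def toy_logits_fn(seq):
    vocab = 5
    target = (seq[-1] + 1) % vocab
    return [1.0 if i == target else 0.0 for i in range(vocab)]


print(greedy_decode(toy_logits_fn, [0], max_tokens=10, eos_id=3))  # [0, 1, 2, 3]
print(greedy_decode(toy_logits_fn, [0], max_tokens=2, eos_id=3))   # [0, 1, 2]
```

The first call stops early because token 3 is the EOS id; the second stops because the `max_tokens` budget runs out. With a tensor library, the final `return` would wrap `seq` in a 1-D tensor as the pseudocode indicates.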
Strengths
- Deterministic: same prompt always produces the same output.
- Fast: one forward pass per token, no branching.
- Simple: trivial to implement and debug.
Weaknesses
- Repetitive loops: the model can get stuck repeating the same phrase because each argmax ignores diversity.
- No global optimality: locally best tokens can lead to poor overall sequences (beam search addresses this).
- No diversity: creative tasks often benefit from sampling rather than always picking the mode.
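The "no global optimality" point can be illustrated with a tiny example. The sketch below uses a made-up two-step model over a 2-token vocabulary (the `PROBS` table is purely hypothetical) and compares the greedy sequence with the best sequence found by exhaustive search.

```python
import itertools

# Hypothetical next-token probabilities for a 2-token vocabulary {0, 1},
# keyed by the tokens generated so far.
PROBS = {
    (): [0.6, 0.4],
    (0,): [0.5, 0.5],
    (1,): [0.9, 0.1],
}

def seq_prob(seq):
    """Probability of a full sequence under the toy model."""
    p = 1.0
    for t in range(len(seq)):
        p *= PROBS[tuple(seq[:t])][seq[t]]
    return p

def greedy(steps):
    """Pick the most probable token at every step (first index wins ties)."""
    seq = []
    for _ in range(steps):
        probs = PROBS[tuple(seq)]
        seq.append(max(range(len(probs)), key=lambda i: probs[i]))
    return seq

greedy_seq = greedy(2)
best_seq = max(itertools.product([0, 1], repeat=2), key=seq_prob)

print(greedy_seq, seq_prob(greedy_seq))    # greedy picks [0, 0], probability about 0.30
print(list(best_seq), seq_prob(best_seq))  # exhaustive search finds [1, 0], about 0.36
```

Greedy commits to token 0 because it looks best at the first step, and so never reaches the higher-probability sequence that starts with token 1. Exhaustive search is only feasible here because the toy search space has four sequences.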
Inputs / Output
- logits_fn: callable(seq: list[int]) -> tensor of shape (vocab,). The model head; called once per generated token.
- prompt: 1-D tensor of starting token ids, shape (T_prompt,).
- max_tokens: int. Maximum number of tokens to generate (prompt not counted).
- eos_id: int. End-of-sequence token. Halt and include this token in the output.
Output: 1-D tensor of token ids, shape (T_prompt + n_generated,).
The prompt is included.
Where it fits
Greedy decoding is the baseline before any sampling strategy. Understanding it is the entry point to top-k sampling, nucleus (top-p) sampling, beam search, and speculative decoding.