Implement beam search decoding.
Given a log-probability matrix where log_probs[t][v] is the log-probability
of token v at time step t, find the top-k highest-scoring sequences.
At each time step:
beam_score + log_probs[t][v] beam_width candidates Input:
log_probs: shape (T, V) — log-probabilities at each time step beam_width: number of beams to keep (int) Output: A dict with: