Implement the nucleus filtering step of Top-P sampling from “The Curious Case of Neural Text Degeneration” (Holtzman et al., 2020).
Nucleus sampling restricts the vocabulary to the smallest set of most-probable tokens whose cumulative probability is at least a threshold p, then renormalizes the distribution over that set. Holtzman et al. report that this produces more natural text than top-k or pure sampling.
Given:
probs: shape (vocab_size,), probability distribution over vocabulary
p: float, cumulative probability threshold
Steps:
Output: Tensor of shape (vocab_size,) — filtered and renormalized distribution.
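
The filtering step described here can be sketched as follows. This is a minimal illustration, not a reference solution: it assumes PyTorch (suggested by the word "Tensor"), the function name top_p_filter is hypothetical, and the sort-then-cumulative-sum approach is one common way to find the nucleus rather than a procedure stated in the task. It keeps the token that crosses the threshold, so the retained mass is at least p.

```python
import torch


def top_p_filter(probs: torch.Tensor, p: float) -> torch.Tensor:
    """Zero out tokens outside the nucleus and renormalize (illustrative sketch)."""
    # Sort probabilities from most to least likely.
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)

    # Keep a token if the mass accumulated *before* it is still below p,
    # so the token that crosses the threshold is included.
    keep_sorted = (cumulative - sorted_probs) < p
    keep_sorted[0] = True  # assumption: always keep the top token, even if p <= 0

    # Map the mask from sorted order back to vocabulary order.
    keep = torch.zeros_like(keep_sorted)
    keep[sorted_idx] = keep_sorted

    # Zero out everything outside the nucleus, then renormalize to sum to 1.
    filtered = torch.where(keep, probs, torch.zeros_like(probs))
    return filtered / filtered.sum()


if __name__ == "__main__":
    probs = torch.tensor([0.15, 0.5, 0.05, 0.3])
    # With p = 0.7 the nucleus is {0.5, 0.3}; after renormalization the
    # result is approximately [0, 0.625, 0, 0.375].
    print(top_p_filter(probs, 0.7))
```

Comparing the mass before each token against p, rather than the running total, avoids a separate shift of the mask and guarantees that the nucleus is never empty for p > 0.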