BLEU-1gram
Compute a simplified BLEU score using only 1-grams plus a brevity penalty.
What is BLEU?
BLEU (Bilingual Evaluation Understudy) is a widely used automatic metric for evaluating machine-translation output. The full metric uses a weighted geometric mean of 1- to 4-gram precisions; this problem implements the 1-gram (unigram) variant, which captures simple word overlap.
Reference: Papineni et al. 2002, “BLEU: a method for automatic evaluation of machine translation.”
Algorithm
Precision component (with clipping)
For each unique token in the candidate, count how many times it appears
in the candidate (cand_count) and in the reference (ref_count). The
clipped count is min(cand_count, ref_count); this prevents the model
from gaming precision by repeating a common word many times.
clipped_sum = Σ_{t ∈ unique(candidate)} min(count_candidate(t), count_reference(t))
precision = clipped_sum / len(candidate)
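The clipping step can be sketched in a few lines of Python. This is a minimal illustration, not a reference solution: the function name `clipped_precision` and the example sentences are assumptions made here, and the function assumes a non-empty candidate.

```python
from collections import Counter


def clipped_precision(reference, candidate):
    """Clipped unigram precision (hypothetical helper; candidate must be non-empty)."""
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    # Counter returns 0 for tokens absent from the reference,
    # so unmatched candidate tokens contribute nothing.
    clipped_sum = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    return clipped_sum / len(candidate)


reference = ["the", "cat", "sat", "on", "the", "mat"]
candidate = ["the", "the", "the", "the"]
# "the" occurs 4 times in the candidate but only 2 times in the reference,
# so the clipped count is 2 and the precision is 2 / 4.
print(clipped_precision(reference, candidate))  # 0.5
```

Without clipping, the same candidate would score a precision of 1.0, since every one of its tokens appears somewhere in the reference.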
Brevity penalty (BP)
A short candidate can achieve high precision by emitting only a few safe words. The brevity penalty discourages outputs shorter than the reference:
BP = 1.0                                        if len(candidate) > len(reference)
   = exp(1 − len(reference) / len(candidate))   otherwise
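As a rough sketch, the penalty translates directly into Python. The name `brevity_penalty` and its length-based arguments are assumptions for illustration, and the candidate length is assumed to be positive (the empty case is handled separately in the problem).

```python
import math


def brevity_penalty(ref_len, cand_len):
    """Brevity penalty as defined above (hypothetical helper; cand_len must be > 0)."""
    if cand_len > ref_len:
        return 1.0
    # For equal lengths the exponent is 0, so the penalty is still 1.0.
    return math.exp(1 - ref_len / cand_len)


print(brevity_penalty(6, 8))  # candidate longer than reference: no penalty
print(brevity_penalty(6, 6))  # equal lengths: exp(0), also no penalty
print(brevity_penalty(6, 3))  # candidate half as long: exp(-1), roughly 0.368
```

Note that the "otherwise" branch covers equal lengths too, but it evaluates to exp(0) = 1.0 there, so the penalty only bites when the candidate is strictly shorter than the reference.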
BLEU score
BLEU = BP × precision
Edge case
If the candidate is empty, return 0.0.
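Putting the pieces together, one possible end-to-end sketch looks like the following. The function name `bleu_1gram` and its argument order are assumptions, since the problem statement does not fix a signature; the logic follows the precision, brevity penalty, and edge-case rules described above.

```python
import math
from collections import Counter


def bleu_1gram(reference, candidate):
    """Unigram BLEU with brevity penalty (hypothetical name and signature)."""
    # Edge case: an empty candidate scores 0.0 and avoids division by zero.
    if not candidate:
        return 0.0

    # Clipped unigram precision.
    ref_counts = Counter(reference)
    cand_counts = Counter(candidate)
    clipped_sum = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    precision = clipped_sum / len(candidate)

    # Brevity penalty: 1.0 only when the candidate is strictly longer.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))

    return bp * precision


reference = "the cat sat on the mat".split()
print(bleu_1gram(reference, reference))                # identical sentences: 1.0
print(bleu_1gram(reference, "the cat sat".split()))    # precision 1.0, BP exp(-1)
print(bleu_1gram(reference, []))                       # empty candidate: 0.0
```

In the second call every candidate token matches the reference, so precision is 1.0, yet the score drops to roughly 0.368 because the candidate is only half the reference length.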
When to use BLEU
- Evaluating machine translation, summarisation, and other text-generation tasks where a gold reference is available.
- Quick sanity checks during fine-tuning or decoding experiments.
- As a fast proxy before running human evaluation.
Inputs
- reference: list of token strings, the gold sentence.
- candidate: list of token strings, the generated sentence.
Output
Scalar float BLEU score in [0, 1].