MLM Eval — Masked Accuracy
Evaluate a masked language model (MLM) by computing the fraction of masked positions where the model correctly predicts the original token. This masked accuracy is a standard evaluation metric for BERT-style models, typically reported alongside perplexity.
What this measures
During MLM pre-training (BERT, RoBERTa), a random subset of tokens is replaced
with a [MASK] token. The model must predict the original token at each masked
position. Masked accuracy counts how often argmax(logits) at a masked
position equals the original token id.
Unlike training (which minimises cross-entropy loss), evaluation is a single forward pass followed by argmax — no backward pass, no gradient, no SGD.
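The definition above can be written as a formula. The symbols are introduced here for illustration and do not come from the problem statement: \mathcal{M} is the set of masked positions (n, t), M = |\mathcal{M}| is their count, z_{n,t,v} is the logit for vocabulary entry v at position (n, t), and y_{n,t} is the original token id. Assuming M > 0:

```latex
\[
\mathrm{acc}
  = \frac{1}{M} \sum_{(n,t) \in \mathcal{M}}
    \mathbf{1}\!\left[\, \arg\max_{v} z_{n,t,v} = y_{n,t} \,\right]
\]
```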
Pipeline
- Run mlm_forward (same as Task 7) to obtain logits_all with shape (N, T, vocab_size).
- At masked positions (mask_indicator > 0.5), take argmax over the vocab dimension, giving predicted token ids of shape (M,).
- Gather original_ids at the same masked positions, giving shape (M,).
- Compute mean(predicted == original_ids) and return it as a scalar float.
- Edge case: if no positions are masked (M = 0), return 0.0.
Inputs
- input_ids: shape (N, T), corrupted (possibly masked) token ids.
- original_ids: shape (N, T), the true token ids before masking.
- mask_indicator: shape (N, T), 1.0 at masked positions, 0.0 elsewhere.
- w_emb: shape (vocab_size, d_model).
- pos_embed: shape (T, d_model).
- blocks_weights: shape (num_blocks, 6, d_model, d_model).
- w_head: shape (d_model, vocab_size).
- num_heads: int.
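Because the shapes above must agree with each other, a quick consistency check before the forward pass can catch mistakes early. The helper below is a hypothetical sketch (check_input_shapes is not part of the problem's API). The divisibility check on num_heads is an assumption based on how multi-head attention is usually implemented, not something the problem states.

```python
import numpy as np


def check_input_shapes(input_ids, original_ids, mask_indicator,
                       w_emb, pos_embed, blocks_weights, w_head, num_heads):
    """Hypothetical helper: raise ValueError if shapes disagree with the Inputs list."""
    n, t = input_ids.shape
    vocab_size, d_model = w_emb.shape
    expected = {
        "original_ids": (original_ids.shape, (n, t)),
        "mask_indicator": (mask_indicator.shape, (n, t)),
        "pos_embed": (pos_embed.shape, (t, d_model)),
        # The leading num_blocks dimension is free; the rest is fixed.
        "blocks_weights": (blocks_weights.shape[1:], (6, d_model, d_model)),
        "w_head": (w_head.shape, (d_model, vocab_size)),
    }
    for name, (got, want) in expected.items():
        if tuple(got) != want:
            raise ValueError(f"{name}: got shape {tuple(got)}, expected {want}")
    # Assumption (not stated in the problem): heads split d_model evenly.
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    return n, t, vocab_size, d_model


# Random inputs with illustrative sizes; only the shapes matter here.
rng = np.random.default_rng(0)
N, T, V, D, B, H = 2, 5, 11, 8, 3, 2
ids = rng.integers(0, V, size=(N, T))
dims = check_input_shapes(
    input_ids=ids, original_ids=ids.copy(), mask_indicator=np.zeros((N, T)),
    w_emb=rng.normal(size=(V, D)), pos_embed=rng.normal(size=(T, D)),
    blocks_weights=rng.normal(size=(B, 6, D, D)), w_head=rng.normal(size=(D, V)),
    num_heads=H)
```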
Output
A single scalar float in [0.0, 1.0] — the fraction of masked positions
correctly predicted. The runtime returns this as {"value": <float>}.
References
- Devlin et al., “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, NAACL 2019.
- Liu et al., “RoBERTa: A Robustly Optimized BERT Pretraining Approach”, 2019.