Tokenization & Embeddings
From raw text to token tensors: BPE, subword, and the embedding matrices that turn ids into vectors.
1. Implement Embedding Lookup
2. BPE Merge Step
3. BPE Encode Text
4. Tokenize and Pad Batch
5. Learned Absolute Position Embedding
6. Tied Input/Output Embeddings
7. Subword Tokenizer: Greedy Longest-Prefix-Match
8. MLM Masking Strategy
9. MLM Forward Pass
10. Train MLM Pretraining Step
11. MLM Forward with Tied Output Head
12. MLM Eval: Masked Accuracy
13. Causal LM Forward Pass
14. Train Causal LM Pretraining Step
15. Train Tiny GPT End-to-End