CrackedAI

Tokenization & Embeddings

From raw text to token tensors: BPE, subword tokenization, and the embedding matrices that turn token ids into vectors.
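As a rough illustration of that pipeline, here is a minimal NumPy sketch. It assumes a toy character-level starting vocabulary and a randomly initialized embedding table. The helpers `most_frequent_pair`, `bpe_merge`, and `embed` are hypothetical names for this example and are not the track's reference solutions. One BPE merge step fuses the most frequent adjacent pair, the resulting tokens are mapped to integer ids, and the embedding lookup is plain row indexing into the table.

```python
import numpy as np
from collections import Counter


def most_frequent_pair(tokens):
    """Most frequent adjacent pair; ties go to the pair seen first."""
    counts = Counter(zip(tokens, tokens[1:]))
    return max(counts, key=counts.get)


def bpe_merge(tokens, pair):
    """One BPE merge step: fuse each non-overlapping occurrence of `pair`, left to right."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def embed(ids, table):
    """Embedding lookup: row k of `table` is the vector for token id k."""
    return table[ids]


# Raw text -> characters -> one merge step.
tokens = list("low lower")
pair = most_frequent_pair(tokens)   # ('l', 'o')
tokens = bpe_merge(tokens, pair)    # ['lo', 'w', ' ', 'lo', 'w', 'e', 'r']

# Tokens -> integer ids (toy vocabulary built from this text only).
vocab = {tok: k for k, tok in enumerate(sorted(set(tokens)))}
ids = np.array([vocab[t] for t in tokens])

# Ids -> vectors via a (vocab_size, d_model) table; d_model = 4 is arbitrary.
rng = np.random.default_rng(0)
table = rng.normal(size=(len(vocab), 4))
vectors = embed(ids, table)         # shape (7, 4)

# Same result as multiplying one-hot rows by the table.
one_hot = np.eye(len(vocab))[ids]
assert np.allclose(one_hot @ table, vectors)
```

The final check shows that row indexing matches the one-hot matrix product; the lookup form simply skips the multiply-by-zero work.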

1. Implement Embedding Lookup
2. BPE Merge Step
3. BPE Encode Text
4. Tokenize and Pad Batch
5. Learned Absolute Position Embedding
6. Tied Input/Output Embeddings
7. Subword Tokenizer: Greedy Longest-Prefix-Match
8. MLM Masking Strategy
9. MLM Forward Pass
10. Train MLM Pretraining Step
11. MLM Forward with Tied Output Head
12. MLM Eval: Masked Accuracy
13. Causal LM Forward Pass
14. Train Causal LM Pretraining Step
15. Train Tiny GPT End-to-End