| Title | Difficulty | Category | Tags |
|---|---|---|---|
| Implement Softmax | easy | primitives | activation, basics, numerical-stability |
| Implement ReLU | easy | primitives | activation, basics |
| Implement Sigmoid | easy | primitives | activation, basics |
| Implement Tanh | easy | primitives | activation, basics |
| Implement Leaky ReLU | easy | primitives | activation, basics |
| Implement Mean Squared Error | easy | primitives | loss, basics, regression |
| Implement Binary Cross-Entropy Loss | medium | primitives | loss, classification, binary |
| Implement Momentum Update | medium | primitives | optimization, momentum, sgd |
| Implement Scaled Dot-Product Attention | hard | primitives | attention, transformer, self-attention |
| Element-wise Operations | easy | framework | elementwise, arithmetic, torch.pow, jnp.power |
| Apply Along an Axis | medium | framework | normalization, axis-operations, keepdim, broadcasting |
| Implement Cross-Entropy Loss | medium | primitives | loss, classification, multi-class |
| Implement Adam Optimizer Step | hard | primitives | optimization, adam, advanced |
| Create a Tensor from a List | easy | framework | tensor-creation, torch.tensor, jnp.array |
| Broadcasting Addition | easy | framework | broadcasting, addition, numpy-style |
| Vectorize with vmap | medium | framework | vmap, vectorization, jax.vmap, torch.vmap, dot-product |
| Implement Linear Layer | medium | primitives | layer, linear-algebra, neural-network |
| Implement One-Hot Encoding | easy | primitives | encoding, classification, basics |
| Implement Cosine Similarity | easy | primitives | similarity, linear-algebra, basics |
| Reshape a Tensor | easy | framework | reshape, tensor-manipulation, torch.reshape, jnp.reshape |
| Transpose a Matrix | easy | framework | transpose, linear-algebra, torch.transpose, jnp.transpose |
| JIT Compile a Function | medium | framework | jit, compilation, jax.jit, torch.compile |
| Implement Batch Normalization | medium | primitives | normalization, training, neural-network |
| Implement L2 Regularization | easy | primitives | regularization, optimization, basics |
| Implement KL Divergence | medium | primitives | information-theory, divergence, probability |
| Tensor Indexing and Slicing | easy | framework | indexing, slicing, torch.index_select, jnp.take |
| Concatenate Tensors | easy | framework | concatenate, torch.cat, jnp.concatenate |
| Implement Layer Normalization | medium | primitives | normalization, transformer, neural-network |
| Implement Dropout | medium | primitives | regularization, training, neural-network |
| Implement Max Pooling 1D | medium | primitives | pooling, cnn, basics |
| Implement Average Pooling 1D | medium | primitives | pooling, cnn, basics |
| Implement 1D Convolution | hard | primitives | convolution, cnn, signal-processing |
| Implement Embedding Lookup | easy | primitives | embedding, nlp, basics |
| Rotary Position Embeddings | hard | research | rope, rotary-embeddings, su-2021, position-encoding, transformer |
| SwiGLU Activation | medium | research | swiglu, glu, shazeer-2020, activation, llm |
| Implement Gradient Descent Step | easy | primitives | optimization, gradient-descent, basics |
| Implement Positional Encoding | hard | primitives | transformer, positional-encoding, attention |
| Matrix Multiplication | easy | framework | matmul, linear-algebra, torch.matmul, jnp.matmul |
| Stack Tensors | easy | framework | stack, torch.stack, jnp.stack |
| Masked Fill | medium | framework | masking, torch.where, jnp.where, masked_fill |
| Compute Gradient | medium | framework | autograd, gradient, torch.autograd, jax.grad |
| Top-K Values | medium | framework | topk, sorting, torch.topk, jax.lax.top_k |
| Batched Matrix Multiply | medium | framework | bmm, batched-matmul, torch.bmm, jnp.matmul |
| Mixed Precision Forward Pass | hard | framework | mixed-precision, float16, torch.half, jnp.float16, performance |
| Polynomial Regression | medium | end_to_end | polynomial, regression, feature-engineering |
| Gather Elements | medium | framework | gather, indexing, torch.gather, jnp.take_along_axis |
| Compute Jacobian | hard | framework | jacobian, autograd, torch.autograd.functional.jacobian, jax.jacobian |
| Two-Layer MLP | easy | end_to_end | mlp, feedforward, relu, neural-network |
| Training Loop | easy | end_to_end | training-loop, sgd, gradient-descent, linear-regression |
| Word Embedding Model | medium | end_to_end | embedding, nlp, average-pooling, lookup |
| Cumulative Sum | easy | framework | cumsum, reduction, torch.cumsum, jnp.cumsum |
| Scatter Add | medium | framework | scatter, scatter_add, torch.scatter_add, jnp.at |
| Custom Activation with Gradient | hard | framework | custom-activation, silu, swish, autograd, jax.grad |
| Binary Classifier | easy | end_to_end | binary-classification, sigmoid, bce-loss, logistic-regression |
| Mini-Batch Training | medium | end_to_end | mini-batch, sgd, training-loop, batching |
| Parallel Map with vmap | hard | framework | vmap, per-sample-gradient, jax.vmap, jax.grad, torch.vmap |
| Multi-Class Classifier | easy | end_to_end | multi-class, softmax, cross-entropy, classification |
| Simple CNN | medium | end_to_end | cnn, convolution, max-pooling, fully-connected |
| Einstein Summation | medium | framework | einsum, trace, torch.einsum, jnp.einsum |
| Efficient Attention with Masking | hard | framework | attention, causal-mask, torch.triu, jnp.triu, softmax |
| Linear Regression | easy | end_to_end | linear-regression, mse, regression |
| Simple RNN Cell | medium | end_to_end | rnn, recurrent, sequence-model, tanh |
| Beam Search | hard | end_to_end | beam-search, decoding, sequence-generation, search |
| Depthwise Separable Convolution | hard | research | depthwise-separable, xception, chollet-2017, efficient-conv, cnn |
| LSTM Cell | medium | end_to_end | lstm, recurrent, gates, sequence-model |
| Self-Attention Layer | hard | end_to_end | self-attention, transformer, softmax, scaled-dot-product |
| Skip Connection Block | medium | end_to_end | residual, skip-connection, resnet, deep-learning |
| Gradient Clipping | easy | end_to_end | gradient-clipping, optimization, training, stability |
| GRU Cell | medium | end_to_end | gru, recurrent, gates, sequence-model |
| Transformer Encoder Block | hard | end_to_end | transformer, encoder, self-attention, layer-norm, ffn |
| Feature Normalization Pipeline | easy | end_to_end | normalization, preprocessing, feature-engineering, pipeline |
| Multi-Query Attention | medium | research | multi-query-attention, mqa, shazeer-2019, attention, transformer |
| Autoencoder | medium | end_to_end | autoencoder, encoder-decoder, reconstruction, unsupervised |
| Transformer Decoder Block | hard | end_to_end | transformer, decoder, causal-attention, cross-attention |
| Data Augmentation | medium | end_to_end | data-augmentation, preprocessing, transforms, pipeline |
| Grouped-Query Attention | hard | research | grouped-query-attention, gqa, ainslie-2023, attention, transformer |
| Sequence Classifier | medium | end_to_end | sequence-classification, rnn, embedding, nlp |
| Simple GAN Generator | hard | end_to_end | gan, generator, generative-model, adversarial |
| Weight Initialization | easy | end_to_end | initialization, xavier, he-init, kaiming |
| Sliding Window Attention | medium | research | sliding-window, longformer, beltagy-2020, local-attention, efficiency |
| Learning Rate Scheduler | medium | end_to_end | learning-rate, scheduler, cosine-annealing, training |
| Contrastive Loss (InfoNCE) | medium | research | infonce, contrastive-loss, oord-2018, self-supervised, clip |
| Squeeze-and-Excitation Block | medium | research | squeeze-excitation, se-net, hu-2018, channel-attention, cnn |
| RMSNorm | easy | research | rmsnorm, normalization, zhang-sennrich-2019, llm |
| Top-K Gating | medium | research | top-k-gating, moe, sparse-routing, gating |
| ALiBi Position Bias | medium | research | alibi, position-bias, press-2022, attention, transformer |
| Relative Position Encoding | medium | research | relative-position, shaw-2018, position-encoding, attention |
| Flash Attention Score Computation | hard | research | flash-attention, online-softmax, dao-2022, attention, efficiency |
| KV Cache for Autoregressive Decoding | hard | research | kv-cache, autoregressive, decoding, inference, transformer |
| Mixture of Experts Routing | hard | research | mixture-of-experts, moe, shazeer-2017, sparse, routing |
| Cross Attention | medium | research | cross-attention, decoder, encoder-decoder, transformer |
| Label Smoothing | easy | research | label-smoothing, szegedy-2016, regularization, classification |
| Temperature Scaling | easy | research | temperature-scaling, calibration, guo-2017, softmax |
| Prefix Tuning | hard | research | prefix-tuning, li-liang-2021, parameter-efficient, fine-tuning |
| Focal Loss | medium | research | focal-loss, lin-2017, class-imbalance, object-detection, loss |
| Nucleus (Top-P) Sampling | medium | research | nucleus-sampling, top-p, holtzman-2020, text-generation, decoding |
| LoRA Update | medium | research | lora, hu-2021, parameter-efficient, fine-tuning, low-rank |
| Stochastic Depth | medium | research | stochastic-depth, huang-2016, regularization, residual, dropout |
| Speculative Decoding Accept/Reject | hard | research | speculative-decoding, leviathan-2023, inference, acceleration, llm |
| GELU Activation | easy | research | gelu, activation, hendrycks-gimpel-2016, transformer |
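For a taste of what the entries ask for, consider the first one: Implement Softmax carries the `numerical-stability` tag because naively computing `exp(x) / exp(x).sum()` overflows for large logits. Below is a minimal sketch of the standard max-subtraction fix, written in NumPy for neutrality (the catalog's tags reference both `torch` and `jnp` equivalents); the function name and test values are illustrative, not part of any exercise spec:

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    # Subtracting the per-row max prevents overflow in exp();
    # the result is unchanged since softmax(x) == softmax(x - c) for any constant c.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

# Logits this large would overflow a naive exp(x) / exp(x).sum():
logits = np.array([[1000.0, 1001.0, 1002.0]])
print(softmax(logits))  # [[0.09003057 0.24472847 0.66524096]]
```

The invariance under a constant shift follows directly from the definition: the factor `exp(-c)` appears in both numerator and denominator and cancels.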