# Problems

| Title | Difficulty | Category | Tags |
| --- | --- | --- | --- |
| Implement Softmax | easy | primitives | activation basics numerical-stability |
| Implement ReLU | easy | primitives | activation basics |
| Implement Sigmoid | easy | primitives | activation basics |
| Implement Tanh | easy | primitives | activation basics |
| Implement Leaky ReLU | easy | primitives | activation basics |
| Implement Mean Squared Error | easy | primitives | loss basics regression |
| Implement Binary Cross-Entropy Loss | medium | primitives | loss classification binary |
| Implement Momentum Update | medium | primitives | optimization momentum sgd |
| Implement Scaled Dot-Product Attention | hard | primitives | attention transformer self-attention |
| Element-wise Operations | easy | framework | elementwise arithmetic torch.pow jnp.power |
| Apply Along an Axis | medium | framework | normalization axis-operations keepdim broadcasting |
| Implement Cross-Entropy Loss | medium | primitives | loss classification multi-class |
| Implement Adam Optimizer Step | hard | primitives | optimization adam advanced |
| Create a Tensor from a List | easy | framework | tensor-creation torch.tensor jnp.array |
| Broadcasting Addition | easy | framework | broadcasting addition numpy-style |
| Vectorize with vmap | medium | framework | vmap vectorization jax.vmap torch.vmap dot-product |
| Implement Linear Layer | medium | primitives | layer linear-algebra neural-network |
| Implement One-Hot Encoding | easy | primitives | encoding classification basics |
| Implement Cosine Similarity | easy | primitives | similarity linear-algebra basics |
| Reshape a Tensor | easy | framework | reshape tensor-manipulation torch.reshape jnp.reshape |
| Transpose a Matrix | easy | framework | transpose linear-algebra torch.transpose jnp.transpose |
| JIT Compile a Function | medium | framework | jit compilation jax.jit torch.compile |
| Implement Batch Normalization | medium | primitives | normalization training neural-network |
| Implement L2 Regularization | easy | primitives | regularization optimization basics |
| Implement KL Divergence | medium | primitives | information-theory divergence probability |
| Tensor Indexing and Slicing | easy | framework | indexing slicing torch.index_select jnp.take |
| Concatenate Tensors | easy | framework | concatenate torch.cat jnp.concatenate |
| Implement Layer Normalization | medium | primitives | normalization transformer neural-network |
| Implement Dropout | medium | primitives | regularization training neural-network |
| Implement Max Pooling 1D | medium | primitives | pooling cnn basics |
| Implement Average Pooling 1D | medium | primitives | pooling cnn basics |
| Implement 1D Convolution | hard | primitives | convolution cnn signal-processing |
| Implement Embedding Lookup | easy | primitives | embedding nlp basics |
| Rotary Position Embeddings | hard | research | rope rotary-embeddings su-2021 position-encoding transformer |
| SwiGLU Activation | medium | research | swiglu glu shazeer-2020 activation llm |
| Implement Gradient Descent Step | easy | primitives | optimization gradient-descent basics |
| Implement Positional Encoding | hard | primitives | transformer positional-encoding attention |
| Matrix Multiplication | easy | framework | matmul linear-algebra torch.matmul jnp.matmul |
| Stack Tensors | easy | framework | stack torch.stack jnp.stack |
| Masked Fill | medium | framework | masking torch.where jnp.where masked_fill |
| Compute Gradient | medium | framework | autograd gradient torch.autograd jax.grad |
| Top-K Values | medium | framework | topk sorting torch.topk jax.lax.top_k |
| Batched Matrix Multiply | medium | framework | bmm batched-matmul torch.bmm jnp.matmul |
| Mixed Precision Forward Pass | hard | framework | mixed-precision float16 torch.half jnp.float16 performance |
| Polynomial Regression | medium | end_to_end | polynomial regression feature-engineering |
| Gather Elements | medium | framework | gather indexing torch.gather jnp.take_along_axis |
| Compute Jacobian | hard | framework | jacobian autograd torch.autograd.functional.jacobian jax.jacobian |
| Two-Layer MLP | easy | end_to_end | mlp feedforward relu neural-network |
| Training Loop | easy | end_to_end | training-loop sgd gradient-descent linear-regression |
| Word Embedding Model | medium | end_to_end | embedding nlp average-pooling lookup |
| Cumulative Sum | easy | framework | cumsum reduction torch.cumsum jnp.cumsum |
| Scatter Add | medium | framework | scatter scatter_add torch.scatter_add jnp.at |
| Custom Activation with Gradient | hard | framework | custom-activation silu swish autograd jax.grad |
| Binary Classifier | easy | end_to_end | binary-classification sigmoid bce-loss logistic-regression |
| Mini-Batch Training | medium | end_to_end | mini-batch sgd training-loop batching |
| Parallel Map with vmap | hard | framework | vmap per-sample-gradient jax.vmap jax.grad torch.vmap |
| Multi-Class Classifier | easy | end_to_end | multi-class softmax cross-entropy classification |
| Simple CNN | medium | end_to_end | cnn convolution max-pooling fully-connected |
| Einstein Summation | medium | framework | einsum trace torch.einsum jnp.einsum |
| Efficient Attention with Masking | hard | framework | attention causal-mask torch.triu jnp.triu softmax |
| Linear Regression | easy | end_to_end | linear-regression mse regression |
| Simple RNN Cell | medium | end_to_end | rnn recurrent sequence-model tanh |
| Beam Search | hard | end_to_end | beam-search decoding sequence-generation search |
| Depthwise Separable Convolution | hard | research | depthwise-separable xception chollet-2017 efficient-conv cnn |
| LSTM Cell | medium | end_to_end | lstm recurrent gates sequence-model |
| Self-Attention Layer | hard | end_to_end | self-attention transformer softmax scaled-dot-product |
| Skip Connection Block | medium | end_to_end | residual skip-connection resnet deep-learning |
| Gradient Clipping | easy | end_to_end | gradient-clipping optimization training stability |
| GRU Cell | medium | end_to_end | gru recurrent gates sequence-model |
| Transformer Encoder Block | hard | end_to_end | transformer encoder self-attention layer-norm ffn |
| Feature Normalization Pipeline | easy | end_to_end | normalization preprocessing feature-engineering pipeline |
| Multi-Query Attention | medium | research | multi-query-attention mqa shazeer-2019 attention transformer |
| Autoencoder | medium | end_to_end | autoencoder encoder-decoder reconstruction unsupervised |
| Transformer Decoder Block | hard | end_to_end | transformer decoder causal-attention cross-attention |
| Data Augmentation | medium | end_to_end | data-augmentation preprocessing transforms pipeline |
| Grouped-Query Attention | hard | research | grouped-query-attention gqa ainslie-2023 attention transformer |
| Sequence Classifier | medium | end_to_end | sequence-classification rnn embedding nlp |
| Simple GAN Generator | hard | end_to_end | gan generator generative-model adversarial |
| Weight Initialization | easy | end_to_end | initialization xavier he-init kaiming |
| Sliding Window Attention | medium | research | sliding-window longformer beltagy-2020 local-attention efficiency |
| Learning Rate Scheduler | medium | end_to_end | learning-rate scheduler cosine-annealing training |
| Contrastive Loss (InfoNCE) | medium | research | infonce contrastive-loss oord-2018 self-supervised clip |
| Squeeze-and-Excitation Block | medium | research | squeeze-excitation se-net hu-2018 channel-attention cnn |
| RMSNorm | easy | research | rmsnorm normalization zhang-sennrich-2019 llm |
| Top-K Gating | medium | research | top-k-gating moe sparse-routing gating |
| ALiBi Position Bias | medium | research | alibi position-bias press-2022 attention transformer |
| Relative Position Encoding | medium | research | relative-position shaw-2018 position-encoding attention |
| Flash Attention Score Computation | hard | research | flash-attention online-softmax dao-2022 attention efficiency |
| KV Cache for Autoregressive Decoding | hard | research | kv-cache autoregressive decoding inference transformer |
| Mixture of Experts Routing | hard | research | mixture-of-experts moe shazeer-2017 sparse routing |
| Cross Attention | medium | research | cross-attention decoder encoder-decoder transformer |
| Label Smoothing | easy | research | label-smoothing szegedy-2016 regularization classification |
| Temperature Scaling | easy | research | temperature-scaling calibration guo-2017 softmax |
| Prefix Tuning | hard | research | prefix-tuning li-liang-2021 parameter-efficient fine-tuning |
| Focal Loss | medium | research | focal-loss lin-2017 class-imbalance object-detection loss |
| Nucleus (Top-P) Sampling | medium | research | nucleus-sampling top-p holtzman-2020 text-generation decoding |
| LoRA Update | medium | research | lora hu-2021 parameter-efficient fine-tuning low-rank |
| Stochastic Depth | medium | research | stochastic-depth huang-2016 regularization residual dropout |
| Speculative Decoding Accept/Reject | hard | research | speculative-decoding leviathan-2023 inference acceleration llm |
| GELU Activation | easy | research | gelu activation hendrycks-gimpel-2016 transformer |
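For a sense of what the primitives problems ask for, here is a minimal sketch of the first entry, Implement Softmax, written with `jax.numpy` since the framework tags reference `jnp.*` alongside `torch.*`. The function signature is an assumption for illustration, not the catalog's reference solution; the max-subtraction step is the standard trick behind the problem's `numerical-stability` tag.

```python
import jax.numpy as jnp

def softmax(x, axis=-1):
    # Illustrative sketch, not the catalog's reference solution.
    # Shift by the max along the reduction axis so exp() cannot
    # overflow for large logits -- the "numerical-stability" part.
    shifted = x - jnp.max(x, axis=axis, keepdims=True)
    exps = jnp.exp(shifted)
    return exps / jnp.sum(exps, axis=axis, keepdims=True)

# Example: the second row would overflow a naive exp(x) / sum(exp(x)).
logits = jnp.array([[1.0, 2.0, 3.0],
                    [1000.0, 1001.0, 1002.0]])
print(softmax(logits))  # each row sums to 1; no inf/nan
```

The shift leaves the output unchanged because the subtracted constant cancels in the ratio; both rows above therefore produce the same probabilities, while only the shifted version survives large logits.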