QUEST LOG
W1 DONE
`micrograd` — scalar autograd from scratch, no copy-paste
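The core mechanism, as a minimal sketch (my own toy version, not micrograd's code — micrograd topo-sorts the graph rather than recursing naively):

```python
# A scalar that records its parents and the local derivative toward each;
# backward() walks the chain rule. Naive recursion, fine for tiny graphs.
class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents  # (node, local_grad) pairs

    def __add__(self, other):
        return Value(self.data + other.data,
                     parents=((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Value(self.data * other.data,
                     parents=((self, other.data), (other, self.data)))

    def backward(self, grad=1.0):
        self.grad += grad
        for node, local in self._parents:
            node.backward(grad * local)  # chain rule

x, w = Value(3.0), Value(-2.0)
y = x * w + x          # dy/dx = w + 1 = -1, dy/dw = x = 3
y.backward()
print(x.grad, w.grad)  # -1.0 3.0
```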
W1 PENDING
LC: Two Sum, Contains Duplicate, Best Time to Buy and Sell Stock
W2 PENDING
`makemore` parts 1–3
W2 PENDING
Read TransformerLens source — understand activation patching implementation
W2 PENDING
LC: Valid Anagram, Group Anagrams, Valid Parentheses
W3 PENDING
`makemore` parts 4–5
W3 PENDING
`py-spy` flamegraph on a TransformerLens run — understand where time goes
W3 PENDING
LC: Maximum Subarray, Product of Array Except Self, 3Sum
W4 PENDING
Implement single-head + multi-head attention in numpy (no PyTorch)
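A shape-level sketch of the target (random weights, hypothetical dimensions — a reference for what "done" looks like, not a finished solution): scaled dot-product attention, with multi-head as a reshape around the same computation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d = Q.shape[-1]  # scale by sqrt(d_head), per "Attention Is All You Need"
    return softmax(Q @ K.swapaxes(-1, -2) / np.sqrt(d)) @ V

def multi_head(x, Wq, Wk, Wv, Wo, n_heads):
    T, d_model = x.shape
    d_head = d_model // n_heads
    def heads(W):  # (T, d_model) -> (n_heads, T, d_head)
        return (x @ W).reshape(T, n_heads, d_head).transpose(1, 0, 2)
    out = attention(heads(Wq), heads(Wk), heads(Wv))  # (n_heads, T, d_head)
    out = out.transpose(1, 0, 2).reshape(T, d_model)  # concatenate heads
    return out @ Wo

rng = np.random.default_rng(0)
T, d_model, n_heads = 5, 16, 4
Ws = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4)]
print(multi_head(rng.normal(size=(T, d_model)), *Ws, n_heads).shape)  # (5, 16)
```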
W4 PENDING
Read: "Attention Is All You Need" — understand scaled dot-product attention, multi-head mechanism, positional encoding
W4 PENDING
LC: Container With Most Water, Longest Substring Without Repeating Characters
W5 PENDING
`nanoGPT` — implement yourself, begin training on Shakespeare
W5 PENDING
Read `nanotron` (Hugging Face) — write notes on why each parallelism choice was made
W5 PENDING
LC: Search in Rotated Sorted Array, Find Minimum in Rotated Sorted Array
W6 PENDING
`nanoGPT` — finish training, evaluate results
W6 PENDING
Read `torchtitan` — same exercise as nanotron
W6 PENDING
LC: Reverse Linked List, Merge Two Sorted Lists, Linked List Cycle
W7 PENDING
Break nanoGPT in 3 different ways; diagnose each from the loss curves alone
W7 PENDING
Read: "Deep Residual Learning for Image Recognition" — skip connections, degradation problem, why depth works
W7 PENDING
LC: Invert Binary Tree, Maximum Depth of Binary Tree, Same Tree
W8 PENDING
Read: "Toy Models of Superposition" (Anthropic)
W8 PENDING
Calculate 7B model full training memory budget from first principles (weights + gradients + optimizer states + activations)
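The standard mixed-precision Adam accounting, as a worked sketch (exact bytes shift with dtype and optimizer choices):

```python
# Static memory for 7B params under mixed-precision Adam; activations are
# handled separately because they depend on batch size and sequence length.
P = 7e9
weights   = 2 * P            # bf16 parameters
grads     = 2 * P            # bf16 gradients
optimizer = (4 + 4 + 4) * P  # fp32 master weights + Adam momentum + variance

print(f"{(weights + grads + optimizer) / 2**30:.0f} GiB static")  # ~104 GiB
# Activations scale with layers * batch * seq_len * hidden (times a per-layer
# constant) and are the term that activation checkpointing trades for compute.
```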
W8 PENDING
LC: Binary Tree Level Order Traversal, Validate BST, Subtree of Another Tree
W9 PENDING
Read: "Training Language Models to Follow Instructions with Human Feedback" (InstructGPT)
W9 PENDING
Read: "ZeRO: Memory Optimizations Toward Training Trillion Parameter Models", sections 1–3 — understand the memory-savings arithmetic
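The paper's (2 + 2 + K)Ψ bookkeeping with K = 12 bytes/param, sketched for a hypothetical 8-way data-parallel setup:

```python
# ZeRO stages over N data-parallel ranks: stage 1 shards optimizer states,
# stage 2 also shards gradients, stage 3 also shards the parameters.
P, N = 7e9, 8
stages = {
    "ZeRO-0": (2 + 2 + 12) * P,
    "ZeRO-1": (2 + 2 + 12 / N) * P,
    "ZeRO-2": (2 + (2 + 12) / N) * P,
    "ZeRO-3": ((2 + 2 + 12) / N) * P,
}
for name, b in stages.items():
    print(name, f"{b / 2**30:.0f} GiB/GPU")  # 104, 36, 24, 13
```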
W9 PENDING
LC: Number of Islands, Clone Graph, Course Schedule
W10 PENDING
Read: "Direct Preference Optimization: Your Language Model Is Secretly a Reward Model" (DPO)
W10 PENDING
Read: "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness", introduction — kernel fusion, avoiding HBM round trips
W10 PENDING
LC: Pacific Atlantic Water Flow, Graph Valid Tree, Number of Connected Components
W11 PENDING
Understand GQA — Qwen3 uses 32 query / 8 KV heads
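A shape-only illustration of the grouping (toy T and d_head; real implementations index into shared KV heads rather than materializing copies):

```python
import numpy as np

n_q, n_kv, T, d = 32, 8, 5, 128
q = np.random.default_rng(0).normal(size=(n_q, T, d))
k = np.random.default_rng(1).normal(size=(n_kv, T, d))

group = n_q // n_kv                  # 4 query heads share each KV head
k_rep = np.repeat(k, group, axis=0)  # (32, T, d); copies only for clarity here
scores = q @ k_rep.swapaxes(-1, -2) / np.sqrt(d)
print(scores.shape)                  # (32, 5, 5); the KV cache is 4x smaller
```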
W11 PENDING
Read PyTorch DataLoader source — understand worker spawning model
W11 PENDING
LC: Climbing Stairs, Coin Change, Longest Increasing Subsequence
W12 PENDING
Understand RoPE positional encoding — how it differs from learned positions, why frontier models use it
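The rotation itself, in a few lines (interleaved-pair form; nothing here is learned, which is the contrast with position embeddings):

```python
import numpy as np

def rope(x, base=10000.0):
    # Rotate each (even, odd) dimension pair by a position-dependent angle,
    # so q . k ends up depending only on relative position.
    T, d = x.shape
    ang = np.arange(T)[:, None] * base ** (-np.arange(0, d, 2) / d)  # (T, d/2)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * np.cos(ang) - x2 * np.sin(ang)
    out[:, 1::2] = x1 * np.sin(ang) + x2 * np.cos(ang)
    return out

print(rope(np.random.default_rng(0).normal(size=(6, 8))).shape)  # (6, 8)
```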
W12 PENDING
Understand AllReduce, AllGather, ReduceScatter — what each does, when each is used
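Semantics only, in numpy (what each collective computes, not how NCCL moves the bytes; three hypothetical ranks):

```python
import numpy as np

ranks = [np.arange(6.0) + 10 * r for r in range(3)]  # each rank's input

all_reduce     = sum(ranks)                          # every rank: full sum
all_gather     = np.concatenate(ranks)               # every rank: all inputs
reduce_scatter = np.array_split(sum(ranks), 3)       # rank i: shard i of sum

# Identity worth memorizing: AllReduce = ReduceScatter + AllGather, which is
# how ring implementations (and ZeRO's gradient path) decompose it.
print(all_reduce, reduce_scatter[1])
```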
W12 PENDING
LC: Word Break, Combination Sum, House Robber
W13 PENDING
Understand sliding window attention — Qwen3 L0–L27 window=4096, full attention L28–L35
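The mask is the whole trick; a toy version (window 4 over 8 positions, standing in for 4096 over the real sequence):

```python
import numpy as np

def sliding_window_mask(T, window):
    i, j = np.arange(T)[:, None], np.arange(T)[None, :]
    return (j <= i) & (j > i - window)  # True = position i may attend to j

print(sliding_window_mask(8, 4).astype(int))
# Layers 0-27 would use window=4096; layers 28-35 a plain causal mask (window=T).
```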
W13 PENDING
Mac Studio setup + benchmark MLX vs PyTorch MPS for SAE workload
W13 PENDING
LC: Merge Intervals, Insert Interval, Non-overlapping Intervals
W14 PENDING
Backpropagation from scratch in numpy — derive gradient updates manually
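A sketch of the end state for a one-hidden-layer net on MSE (gradients written out by hand, which is the point of the exercise):

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 4)), rng.normal(size=(32, 1))
W1, W2 = rng.normal(size=(4, 8)) * 0.1, rng.normal(size=(8, 1)) * 0.1

z = X @ W1                       # forward
h = np.maximum(z, 0.0)           # ReLU
pred = h @ W2
loss = ((pred - y) ** 2).mean()

dpred = 2 * (pred - y) / len(X)  # dL/dpred, from the MSE definition
dW2 = h.T @ dpred                # chain rule through pred = h @ W2
dh = dpred @ W2.T
dz = dh * (z > 0)                # ReLU gates the gradient
dW1 = X.T @ dz

W1 -= 0.1 * dW1                  # one SGD step
W2 -= 0.1 * dW2
print(f"loss {loss:.4f}")
```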
W14 PENDING
Read the vLLM scheduler (`vllm/core/scheduler.py`) — understand continuous batching
W14 PENDING
CF: Div 2 A/B — greedy + implementation (rating 800–1000)
W15 PENDING
SAE architecture deep dive: expansion factor, sparsity penalty choices, dead feature problem + mitigation
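A shape-level sketch of the vanilla version for orientation (L1 is one of several sparsity penalty choices the deep dive compares; all constants are made up):

```python
import numpy as np

d_model, expansion = 512, 8
d_sae = d_model * expansion             # expansion factor widens the dictionary
rng = np.random.default_rng(0)
W_enc = rng.normal(size=(d_model, d_sae)) * 0.01
W_dec = rng.normal(size=(d_sae, d_model)) * 0.01
b_enc, b_dec = np.zeros(d_sae), np.zeros(d_model)

x = rng.normal(size=(64, d_model))      # activations to reconstruct
f = np.maximum(x @ W_enc + b_enc, 0.0)  # sparse feature activations
x_hat = f @ W_dec + b_dec
loss = ((x - x_hat) ** 2).mean() + 3e-4 * np.abs(f).mean()  # recon + L1

dead = (f.max(axis=0) == 0).mean()          # dead-feature fraction, the thing
print(f"loss {loss:.4f}, dead {dead:.1%}")  # resampling/re-init mitigates
```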
W15 PENDING
Read HuggingFace Trainer (`transformers/trainer.py`) — understand training loop abstraction
W15 PENDING
CF: Div 2 A/B — sorting + prefix sums (rating 800–1000)
W16 PENDING
Train small SAE on 2-layer transformer (before attempting Qwen3)
W16 PENDING
OTel + DCGM: collector config ingesting DCGM metrics
W16 PENDING
CF: Div 2 B — binary search + two pointers (rating 1000–1200)
W17 PENDING
SAE evaluation: activation maximization, ablation — how do you know a feature is interpretable?
W17 PENDING
OTel: custom spans — forward pass, backward pass, optimizer step, data loading
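Rough shape of the instrumentation using opentelemetry-api (SDK/exporter setup omitted; the span names and `train_step` signature are my own, not a convention):

```python
from opentelemetry import trace

tracer = trace.get_tracer("trainer")

def train_step(step, batch, model, opt):
    with tracer.start_as_current_span("train_step") as span:
        span.set_attribute("train.step", step)
        with tracer.start_as_current_span("forward"):
            loss = model(batch)
        with tracer.start_as_current_span("backward"):
            loss.backward()
        with tracer.start_as_current_span("optimizer_step"):
            opt.step()
            opt.zero_grad()
        return loss.item()

# Data loading gets its own span around fetching the next batch in the outer
# loop, so gaps between train_step spans are attributable rather than invisible.
```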
W17 PENDING
CF: Div 2 B — BFS/DFS on grids (rating 1000–1200)
W18 PENDING
Qwen3 8B architecture internals: GQA heads, SwiGLU, QK-Norm — why each design choice
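GQA is sketched under W11; of the rest, SwiGLU fits in a few lines (made-up dimensions, not Qwen3's actual ones):

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def swiglu_mlp(x, W_gate, W_up, W_down):
    # gate and up projections, elementwise SiLU gate, then down projection
    return (silu(x @ W_gate) * (x @ W_up)) @ W_down

rng = np.random.default_rng(0)
d, d_ff = 16, 43                 # d_ff is typically ~8/3 * d, rounded
x = rng.normal(size=(4, d))
out = swiglu_mlp(x, rng.normal(size=(d, d_ff)),
                 rng.normal(size=(d, d_ff)),
                 rng.normal(size=(d_ff, d)))
print(out.shape)                 # (4, 16)
```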
W18 PENDING
OTel: dashboard — "why did my loss spike at step 3400?" answerable in under 2 minutes
W18 PENDING
CF: Div 2 B/C — basic DP (rating 1200–1400)
W19 PENDING
Research Q2: Does L34 suppression replicate on Gemma3 + Phi-4? (attribution scan on the same failure cases)
W19 PENDING
CF: Div 2 C — constructive algorithms + math (rating 1200–1400)
W20 PENDING
Read: "Language Models are Few-Shot Learners" (GPT-3) — scaling laws, in-context learning, few-shot prompting
W20 PENDING
Read: "Densely Connected Convolutional Networks" (DenseNet) — feature reuse, hyperconnectivity via dense blocks
W20 PENDING
CF: Div 2 C — segment trees / BIT intro (rating 1300–1500)
W21 PENDING
Read: "An Image is Worth 16x16 Words" (ViT) — how transformers replaced CNNs, patch embedding, position encoding for vision
W21 PENDING
Read: "Highway Networks" — gating mechanisms for deep networks, precursor to residual connections
W21 PENDING
CF: Div 2 C — graphs: shortest paths, Dijkstra (rating 1400–1600)
W22 PENDING
Read: "BERT: Pre-training of Deep Bidirectional Transformers" — masked language modeling, bidirectional context, fine-tuning paradigm
W22 PENDING
Read: "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer" — conditional computation, gating, expert routing
W22 PENDING
CF: Div 2 C/D — union-find / DSU (rating 1400–1600)
W23 PENDING
Read: "Scaling Laws for Neural Language Models" — compute-optimal training, power-law relationships, Chinchilla implications
W23 PENDING
Read: "Layer Normalization" — why LayerNorm over BatchNorm for transformers, Pre-LN vs Post-LN stability
W23 PENDING
CF: Div 2 D — number theory + modular arithmetic (rating 1500–1700)
W24 PENDING
Read: "Denoising Diffusion Probabilistic Models" — forward/reverse process, noise schedules, connection to score matching
W24 PENDING
Read: "LoRA: Low-Rank Adaptation of Large Language Models" — parameter-efficient fine-tuning, rank decomposition, when to use it
W24 PENDING
CF: Div 2 D — DP on trees (rating 1600–1800)
W25 PENDING
Read: "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" — retriever-reader architecture, when RAG beats fine-tuning
W25 PENDING
Read: "Switch Transformers: Scaling to Trillion Parameter Models" — simplified MoE routing, expert parallelism, load balancing
W25 PENDING
CF: Div 2 D — bitmask DP (rating 1600–1800)
W27 PENDING
System design practice: "Design a training observability system for a 1000-GPU cluster" — answer out loud, no notes
W27 PENDING
CF: Div 2 D/E — heavy-light decomposition / centroid decomposition (rating 1800–2000)
W28 PENDING
Mock interview: explain attention from first principles to a non-ML engineer
W28 PENDING
CF: Div 2 D/E — flows and matchings (rating 1800–2000)