W1 DONE
`micrograd` — scalar autograd from scratch, no copy-paste
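Before writing it, it helps to know the shape of the thing: a scalar that records its parents and a local backward rule, plus a topological sort to apply the chain rule. A minimal sketch of that core idea (add/mul only; the real micrograd has more ops, so this is the skeleton, not a paste-able answer):

```python
class Value:
    """Scalar wrapping a float, recording the ops that produced it."""
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad          # d(a+b)/da = 1
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for c in v._prev:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# d(x*y + x)/dx = y + 1 = 4 at (x, y) = (3, 3)
x, y = Value(3.0), Value(3.0)
z = x * y + x
z.backward()
print(x.grad)  # 4.0
```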
W1 PENDING
LC: Two Sum, Contains Duplicate, Best Time to Buy and Sell Stock
W2 PENDING
Read TransformerLens source — understand activation patching implementation
W2 PENDING
LC: Valid Anagram, Group Anagrams, Valid Parentheses
W3 PENDING
`py-spy` flamegraph on a TransformerLens run — understand where time goes
W3 PENDING
LC: Maximum Subarray, Product of Array Except Self, 3Sum
W4 PENDING
Implement single-head + multi-head attention in numpy (no PyTorch)
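For checking your attempt afterwards: a minimal numpy sketch of scaled dot-product attention plus a multi-head wrapper. Shapes, weight names, and the split-concat-project layout are my own illustration, not a prescribed solution.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # shift for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention. Q, K, V: (T, d_head)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # (T, T)
    return softmax(scores) @ V                # (T, d_head)

def multi_head(x, Wq, Wk, Wv, Wo, n_heads):
    """x: (T, d_model); each W*: (d_model, d_model)."""
    T, d = x.shape
    dh = d // n_heads
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    heads = [attention(Q[:, h*dh:(h+1)*dh],
                       K[:, h*dh:(h+1)*dh],
                       V[:, h*dh:(h+1)*dh]) for h in range(n_heads)]
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(0)
T, d, H = 5, 16, 4
x = rng.normal(size=(T, d))
Ws = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(4)]
print(multi_head(x, *Ws, n_heads=H).shape)  # (5, 16)
```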
W4 PENDING
Read: "Attention Is All You Need" — understand scaled dot-product attention, multi-head mechanism, positional encoding
W4 PENDING
Derive cross-entropy loss for next-token prediction by hand (worked result after the references, to check against)
[1] Cross-Entropy Loss — PyTorch docs
[2] Softmax and Cross-Entropy — Stanford CS231n notes
[3] The Softmax Function and Its Derivative (Eli Bendersky)
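The compact form the references build up to: with logits $z$, softmax probabilities $p_i = e^{z_i}/\sum_j e^{z_j}$, and correct next token $t$,

$$
\mathcal{L} = -\log p_t = -z_t + \log\sum_j e^{z_j},
\qquad
\frac{\partial \mathcal{L}}{\partial z_i} = p_i - \mathbb{1}[i = t].
$$

That $p_i - \mathbb{1}[i=t]$ gradient is why softmax and cross-entropy are fused into one op in practice.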
W4 PENDING
LC: Container With Most Water, Longest Substring Without Repeating Characters
W5 PENDING
`nanoGPT` — implement it yourself, begin training on Shakespeare
W5 PENDING
Read `nanotron` (Hugging Face) — write notes on why each parallelism choice was made
W5 PENDING
LC: Search in Rotated Sorted Array, Find Minimum in Rotated Sorted Array
W6 PENDING
`nanoGPT` — finish training, evaluate results
W6 PENDING
LC: Reverse Linked List, Merge Two Sorted Lists, Linked List Cycle
W7 PENDING
Break nanoGPT in 3 different ways, diagnose each failure from the loss curves alone
W7 PENDING
Read: "Deep Residual Learning for Image Recognition" — skip connections, degradation problem, why depth works
W7 PENDING
LC: Invert Binary Tree, Maximum Depth of Binary Tree, Same Tree
W8 PENDING
Read: "Toy Models of Superposition" (Anthropic)
W8 PENDING
Calculate 7B model full training memory budget from first principles (weights + gradients + optimizer states + activations)
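A sanity check for the result, assuming the mixed-precision Adam bookkeeping the ZeRO paper uses (fp16 weights and grads plus fp32 master weights and two Adam moments, 16 bytes per parameter before activations):

```python
# Back-of-envelope training memory for a dense 7B model.
P = 7e9
bytes_per_param = 2 + 2 + (4 + 4 + 4)   # fp16 w, fp16 g, fp32 w/m/v = 16
static_gib = P * bytes_per_param / 2**30
print(f"weights + grads + optimizer states: {static_gib:.0f} GiB")  # ~104 GiB
# Activations come on top, scaling with batch * seq_len * layers * hidden;
# they are the term you attack with checkpointing/recomputation.
```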
W8 PENDING
LC: Binary Tree Level Order Traversal, Validate BST, Subtree of Another Tree
[1] Binary Tree Level Order Traversal (#102)
[2] Validate Binary Search Tree (#98)
[3] Subtree of Another Tree (#572)
W9 PENDING
Read: "Training Language Models to Follow Instructions with Human Feedback" (InstructGPT)
W9 PENDING
Read: ZeRO paper sections 1–3 — understand memory savings arithmetic
W9 PENDING
LC: Number of Islands, Clone Graph, Course Schedule
W10 PENDING
Read: FlashAttention paper introduction — kernel fusion, HBM roundtrip avoidance
W10 PENDING
LC: Pacific Atlantic Water Flow, Graph Valid Tree, Number of Connected Components
[1] Pacific Atlantic Water Flow (#417)
[2] Graph Valid Tree (#261)
[3] Number of Connected Components (#323)
W11 PENDING
Understand GQA — Qwen3 uses 32 query / 8 KV heads
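A minimal numpy sketch of the mechanical core, using the 32/8 split quoted above: each group of 4 query heads attends against one shared KV head, so the KV cache shrinks 4x. Names and shapes are illustrative.

```python
import numpy as np

def gqa_scores(Q, K, n_q_heads=32, n_kv_heads=8):
    """Q: (n_q_heads, T, dh), K: (n_kv_heads, T, dh).
    Broadcast each KV head to its group of query heads."""
    group = n_q_heads // n_kv_heads
    K_shared = np.repeat(K, group, axis=0)              # (n_q_heads, T, dh)
    return Q @ K_shared.transpose(0, 2, 1) / np.sqrt(Q.shape[-1])

rng = np.random.default_rng(0)
T, dh = 4, 8
Q = rng.normal(size=(32, T, dh))
K = rng.normal(size=(8, T, dh))
print(gqa_scores(Q, K).shape)  # (32, 4, 4)
```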
W11 PENDING
Read PyTorch DataLoader source — understand worker spawning model
W11 PENDING
LC: Climbing Stairs, Coin Change, Longest Increasing Subsequence
W12 PENDING
Understand RoPE positional encoding — how it differs from learned positions, why frontier models use it
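A minimal sketch of the mechanism: unlike learned position vectors added to embeddings, RoPE rotates each (even, odd) dimension pair of q and k by a position-dependent angle, so relative offsets appear as angle differences inside q·k. The base of 10000 is the common default; shapes are toy values.

```python
import numpy as np

def rope(x, base=10000.0):
    """x: (T, d) with d even. Rotate pair i of position t by t * base**(-2i/d)."""
    T, d = x.shape
    pos = np.arange(T)[:, None]                 # (T, 1)
    freq = base ** (-np.arange(0, d, 2) / d)    # (d/2,) per-pair frequencies
    theta = pos * freq                          # (T, d/2) rotation angles
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin          # 2D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

x = np.ones((6, 8))
print(rope(x).shape)  # (6, 8); position 0 is left unrotated
```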
W12 PENDING
Understand AllReduce, AllGather, ReduceScatter — what each does, when each is used
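One way to pin the semantics down before touching NCCL: simulate each collective on plain numpy arrays. Rank count and values here are arbitrary.

```python
import numpy as np

# Per-rank inputs: 4 ranks, each holding a vector of 4 elements.
ranks = [np.arange(4) + 10 * r for r in range(4)]

# AllReduce: every rank ends with the elementwise sum of all inputs
# (gradient sync in data parallelism).
all_reduce = sum(ranks)

# AllGather: every rank ends with the concatenation of all inputs
# (reassembling sharded params before a forward pass).
all_gather = np.concatenate(ranks)

# ReduceScatter: reduce elementwise, then rank r keeps only shard r.
# Note AllReduce == ReduceScatter followed by AllGather.
reduce_scatter = np.split(sum(ranks), 4)   # rank r gets reduce_scatter[r]

print(all_reduce, all_gather.shape, reduce_scatter[0])
```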
W12 PENDING
LC: Word Break, Combination Sum, House Robber
W13 PENDING
Understand sliding window attention — Qwen3 L0–L27 window=4096, full attention L28–L35
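A sketch of the mask this amounts to, assuming the usual causal-band formulation; the layer split above is from the task line, while the window and sequence length here are toy values.

```python
import numpy as np

def sliding_window_mask(T, window):
    """True where query i may attend key j: causal (j <= i) and
    within the last `window` positions (i - j < window)."""
    i = np.arange(T)[:, None]
    j = np.arange(T)[None, :]
    return (j <= i) & (i - j < window)

print(sliding_window_mask(6, 3).astype(int))
# Each row attends to at most 3 keys; a full-attention layer
# is the same mask with window = T.
```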
W13 PENDING
Mac Studio setup + benchmark MLX vs PyTorch MPS for SAE workload
W13 PENDING
LC: Merge Intervals, Insert Interval, Non-overlapping Intervals
W14 PENDING
Backpropagation from scratch in numpy — derive gradient updates manually
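For reference once the derivation is done: a minimal one-hidden-layer sketch with MSE loss, chain rule applied layer by layer. Sizes and learning rate are arbitrary; gradient-checking against finite differences is the natural sanity test.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 3))                 # batch of inputs
Y = rng.normal(size=(16, 1))                 # regression targets
W1, W2 = rng.normal(size=(3, 8)), rng.normal(size=(8, 1))

for step in range(200):
    # Forward: linear -> tanh -> linear, MSE loss.
    h = np.tanh(X @ W1)                      # (16, 8)
    pred = h @ W2                            # (16, 1)
    loss = np.mean((pred - Y) ** 2)

    # Backward: chain rule, output to input.
    dpred = 2 * (pred - Y) / len(X)          # dL/dpred
    dW2 = h.T @ dpred                        # dL/dW2
    dh = dpred @ W2.T                        # dL/dh
    dW1 = X.T @ (dh * (1 - h**2))            # tanh'(z) = 1 - tanh(z)^2

    W1 -= 0.1 * dW1
    W2 -= 0.1 * dW2

print(f"final loss: {loss:.4f}")
```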
W14 PENDING
Read vllm scheduler (`vllm/core/scheduler.py`) — understand continuous batching
W14 PENDING
CF: Div 2 A/B — greedy + implementation (rating 800–1000)
W15 PENDING
SAE architecture deep dive: expansion factor, sparsity penalty choices, dead feature problem + mitigation (minimal sketch after the references)
[1] Towards Monosemanticity (Anthropic)
[2] Scaling Monosemanticity (Anthropic)
[3] SAE tutorial by Joseph Bloom
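A minimal sketch of the recipe those references describe, as I understand it: an overcomplete ReLU autoencoder on residual-stream activations with an L1 penalty on feature activations. The sizes, tied-at-init decoder, and dead-feature check are illustrative choices, not the papers' exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, expansion, l1_coef = 64, 8, 1e-3
d_sae = d_model * expansion                       # overcomplete dictionary

W_enc = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_model)
W_dec = W_enc.T.copy()                            # tied at init, untied in training
b_enc, b_dec = np.zeros(d_sae), np.zeros(d_model)

def sae_loss(acts):
    """acts: (B, d_model) residual-stream activations."""
    f = np.maximum(acts @ W_enc + b_enc, 0.0)     # sparse features (B, d_sae)
    recon = f @ W_dec + b_dec
    mse = np.mean((recon - acts) ** 2)            # reconstruction term
    l1 = l1_coef * np.abs(f).sum(axis=-1).mean()  # sparsity penalty
    return mse + l1, f

acts = rng.normal(size=(128, d_model))
loss, f = sae_loss(acts)
dead = (f.max(axis=0) == 0).mean()                # features that never fire
print(f"loss={loss:.3f}, dead-feature fraction={dead:.2f}")
```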
W15 PENDING
Read HuggingFace Trainer (`transformers/trainer.py`) — understand training loop abstraction
W15 PENDING
CF: Div 2 A/B — sorting + prefix sums (rating 800–1000)
W16 PENDING
Train small SAE on 2-layer transformer (before attempting Qwen3)
W16 PENDING
OTel + DCGM: collector config ingesting DCGM metrics
W16 PENDING
CF: Div 2 B — binary search + two pointers (rating 1000–1200)
W17 PENDING
SAE evaluation: activation maximization, ablation — how do you know a feature is interpretable?
W17 PENDING
OTel: custom spans — forward pass, backward pass, optimizer step, data loading
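A sketch of what those spans might look like with the opentelemetry-python SDK. The tracer/span calls are the real API; the span names and step decomposition are this plan's own, and the bodies are placeholders.

```python
# pip install opentelemetry-sdk
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("trainer")

def train_step(step):
    with tracer.start_as_current_span("train_step") as span:
        span.set_attribute("step", step)
        with tracer.start_as_current_span("data_loading"):
            pass  # fetch batch
        with tracer.start_as_current_span("forward"):
            pass  # model(batch)
        with tracer.start_as_current_span("backward"):
            pass  # loss.backward()
        with tracer.start_as_current_span("optimizer_step"):
            pass  # optimizer.step()

train_step(0)
```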
W17 PENDING
CF: Div 2 B — BFS/DFS on grids (rating 1000–1200)
W18 PENDING
Qwen3 8B architecture internals: GQA heads, SwiGLU, QK-Norm — why each design choice
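SwiGLU is the piece that pins down quickest in code. A numpy sketch assuming the now-standard gate/up/down formulation; dimensions here are toy values (d_ff of roughly 8/3 times d_model is the usual sizing, so parameter count matches a 4x vanilla FFN).

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))        # SiLU / swish: x * sigmoid(x)

def swiglu_ffn(x, W_gate, W_up, W_down):
    """x: (T, d_model). The SiLU(gate) path modulates the up projection
    elementwise before projecting back down: two input projections
    where a vanilla FFN has one."""
    return (silu(x @ W_gate) * (x @ W_up)) @ W_down

rng = np.random.default_rng(0)
d, d_ff = 16, 44
x = rng.normal(size=(5, d))
out = swiglu_ffn(x,
                 rng.normal(size=(d, d_ff)),
                 rng.normal(size=(d, d_ff)),
                 rng.normal(size=(d_ff, d)))
print(out.shape)  # (5, 16)
```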
W18 PENDING
OTel: dashboard — "why did my loss spike at step 3400?" answerable in under 2 minutes
W18 PENDING
CF: Div 2 B/C — basic DP (rating 1200–1400)
W19 PENDING
Research Q2: Does L34 suppression replicate on Gemma3 + Phi-4? (attribution scan on same fail cases)
W19 PENDING
CF: Div 2 C — constructive algorithms + math (rating 1200–1400)
W20 PENDING
Read: "Language Models are Few-Shot Learners" (GPT-3) — scaling laws, in-context learning, few-shot prompting
W20 PENDING
Read: "Densely Connected Convolutional Networks" (DenseNet) — feature reuse, hyperconnectivity via dense blocks
W20 PENDING
CF: Div 2 C — segment trees / BIT intro (rating 1300–1500)
W21 PENDING
Read: "An Image is Worth 16x16 Words" (ViT) — how transformers replaced CNNs, patch embedding, position encoding for vision
W21 PENDING
Read: "Highway Networks" — gating mechanisms for deep networks, precursor to residual connections
W21 PENDING
CF: Div 2 C — graphs: shortest paths, Dijkstra (rating 1400–1600)
W22 PENDING
Read: "BERT: Pre-training of Deep Bidirectional Transformers" — masked language modeling, bidirectional context, fine-tuning paradigm
W22 PENDING
Read: "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer" — conditional computation, gating, expert routing
W22 PENDING
CF: Div 2 C/D — union-find / DSU (rating 1400–1600)
W23 PENDING
Read: "Scaling Laws for Neural Language Models" — compute-optimal training, power-law relationships, Chinchilla implications
W23 PENDING
Read: "Layer Normalization" — why LayerNorm over BatchNorm for transformers, Pre-LN vs Post-LN stability
W23 PENDING
CF: Div 2 D — number theory + modular arithmetic (rating 1500–1700)
W24 PENDING
Read: "Denoising Diffusion Probabilistic Models" — forward/reverse process, noise schedules, connection to score matching
W24 PENDING
Read: "LoRA: Low-Rank Adaptation of Large Language Models" — parameter-efficient fine-tuning, rank decomposition, when to use it
W24 PENDING
CF: Div 2 D — DP on trees (rating 1600–1800)
W25 PENDING
Read: "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" — retriever-reader architecture, when RAG beats fine-tuning
W25 PENDING
Read: "Switch Transformers: Scaling to Trillion Parameter Models" — simplified MoE routing, expert parallelism, load balancing
W25 PENDING
CF: Div 2 D — bitmask DP (rating 1600–1800)
W27 PENDING
System design practice: "Design a training observability system for a 1000-GPU cluster" — answer out loud, no notes
[1] Designing Data-Intensive Applications (Kleppmann)
[2] Google SRE Book — Monitoring Distributed Systems
[3] NVIDIA Collective Communications Library (NCCL) docs
[4] OpenTelemetry Collector architecture
W27 PENDING
CF: Div 2 D/E — heavy-light decomposition / centroid decomposition (rating 1800–2000)
W28 PENDING
Mock interview: explain attention from first principles to a non-ML engineer
[1] Attention Is All You Need
[2] The Illustrated Transformer
[3] 3Blue1Brown — Attention in transformers, visually explained
W28 PENDING
CF: Div 2 D/E — flows and matchings (rating 1800–2000)