W1 DONE
`micrograd` — scalar autograd from scratch, no copy-paste
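Before writing it, it helps to know the shape of the thing: a scalar that records its parents and a local backward rule, plus a topological sort to apply the chain rule. A minimal sketch of that core idea (add/mul only; the real micrograd has more ops, so this is the skeleton, not a paste-able answer):

```python
class Value:
    """Scalar wrapping a float, recording the ops that produced it."""
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad          # d(a+b)/da = 1
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for c in v._prev:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# d(x*y + x)/dx = y + 1 = 4 at (x, y) = (3, 3)
x, y = Value(3.0), Value(3.0)
z = x * y + x
z.backward()
print(x.grad)  # 4.0
```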
W1 PENDING
LC: Two Sum, Contains Duplicate, Best Time to Buy and Sell Stock
W2 PENDING
Read TransformerLens source — understand activation patching implementation
W2 PENDING
LC: Valid Anagram, Group Anagrams, Valid Parentheses
W3 PENDING
`py-spy` flamegraph on a TransformerLens run — understand where time goes
W3 PENDING
LC: Maximum Subarray, Product of Array Except Self, 3Sum
W4 PENDING
Implement single-head + multi-head attention in numpy (no PyTorch)
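For checking your attempt afterwards: a minimal numpy sketch of scaled dot-product attention plus a multi-head wrapper. Shapes, weight names, and the split-concat-project layout are my own illustration, not a prescribed solution.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # shift for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention. Q, K, V: (T, d_head)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # (T, T)
    return softmax(scores) @ V                # (T, d_head)

def multi_head(x, Wq, Wk, Wv, Wo, n_heads):
    """x: (T, d_model); each W*: (d_model, d_model)."""
    T, d = x.shape
    dh = d // n_heads
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    heads = [attention(Q[:, h*dh:(h+1)*dh],
                       K[:, h*dh:(h+1)*dh],
                       V[:, h*dh:(h+1)*dh]) for h in range(n_heads)]
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(0)
T, d, H = 5, 16, 4
x = rng.normal(size=(T, d))
Ws = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(4)]
print(multi_head(x, *Ws, n_heads=H).shape)  # (5, 16)
```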
W4 PENDING
Read: "Attention Is All You Need" — understand scaled dot-product attention, multi-head mechanism, positional encoding
W4 PENDING
Derive cross-entropy loss for next-token prediction by hand (worked result after the references, to check against)
[1] Cross-Entropy Loss — PyTorch docs
[2] Softmax and Cross-Entropy — Stanford CS231n notes
[3] The Softmax Function and Its Derivative (Eli Bendersky)
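The compact form the references build up to: with logits $z$, softmax probabilities $p_i = e^{z_i}/\sum_j e^{z_j}$, and correct next token $t$,

$$
\mathcal{L} = -\log p_t = -z_t + \log\sum_j e^{z_j},
\qquad
\frac{\partial \mathcal{L}}{\partial z_i} = p_i - \mathbb{1}[i = t].
$$

That $p_i - \mathbb{1}[i=t]$ gradient is why softmax and cross-entropy are fused into one op in practice.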
W4 PENDING
LC: Container With Most Water, Longest Substring Without Repeating Characters
W5 PENDING
`nanoGPT` — implement it yourself, begin training on Shakespeare
W5 PENDING
Read `nanotron` (Hugging Face) — write notes on why each parallelism choice was made
W5 PENDING
LC: Search in Rotated Sorted Array, Find Minimum in Rotated Sorted Array
W6 PENDING
`nanoGPT` — finish training, evaluate results
W6 PENDING
LC: Reverse Linked List, Merge Two Sorted Lists, Linked List Cycle
W7 PENDING
Break nanoGPT in 3 different ways, diagnose each failure from the loss curves alone
W7 PENDING
Read: "Deep Residual Learning for Image Recognition" — skip connections, degradation problem, why depth works
W7 PENDING
LC: Invert Binary Tree, Maximum Depth of Binary Tree, Same Tree
W8 PENDING
Read: "Toy Models of Superposition" (Anthropic)
W8 PENDING
Calculate 7B model full training memory budget from first principles (weights + gradients + optimizer states + activations)
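A sanity check for the result, assuming the mixed-precision Adam bookkeeping the ZeRO paper uses (fp16 weights and grads plus fp32 master weights and two Adam moments, 16 bytes per parameter before activations):

```python
# Back-of-envelope training memory for a dense 7B model.
P = 7e9
bytes_per_param = 2 + 2 + (4 + 4 + 4)   # fp16 w, fp16 g, fp32 w/m/v = 16
static_gib = P * bytes_per_param / 2**30
print(f"weights + grads + optimizer states: {static_gib:.0f} GiB")  # ~104 GiB
# Activations come on top, scaling with batch * seq_len * layers * hidden;
# they are the term you attack with checkpointing/recomputation.
```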
W8 PENDING
LC: Binary Tree Level Order Traversal, Validate BST, Subtree of Another Tree
[1] Binary Tree Level Order Traversal (#102)
[2] Validate Binary Search Tree (#98)
[3] Subtree of Another Tree (#572)
W9 PENDING
Read: "Training Language Models to Follow Instructions with Human Feedback" (InstructGPT)
W9 PENDING
Read: ZeRO paper sections 1–3 — understand memory savings arithmetic
W9 PENDING
LC: Number of Islands, Clone Graph, Course Schedule
W10 PENDING
Read: FlashAttention paper introduction — kernel fusion, HBM roundtrip avoidance
W10 PENDING
LC: Pacific Atlantic Water Flow, Graph Valid Tree, Number of Connected Components
[1] Pacific Atlantic Water Flow (#417)
[2] Graph Valid Tree (#261)
[3] Number of Connected Components (#323)
W11 PENDING
Understand GQA — Qwen3 uses 32 query / 8 KV heads
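A minimal numpy sketch of the mechanical core, using the 32/8 split quoted above: each group of 4 query heads attends against one shared KV head, so the KV cache shrinks 4x. Names and shapes are illustrative.

```python
import numpy as np

def gqa_scores(Q, K, n_q_heads=32, n_kv_heads=8):
    """Q: (n_q_heads, T, dh), K: (n_kv_heads, T, dh).
    Broadcast each KV head to its group of query heads."""
    group = n_q_heads // n_kv_heads
    K_shared = np.repeat(K, group, axis=0)              # (n_q_heads, T, dh)
    return Q @ K_shared.transpose(0, 2, 1) / np.sqrt(Q.shape[-1])

rng = np.random.default_rng(0)
T, dh = 4, 8
Q = rng.normal(size=(32, T, dh))
K = rng.normal(size=(8, T, dh))
print(gqa_scores(Q, K).shape)  # (32, 4, 4)
```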
W11 PENDING
Read PyTorch DataLoader source — understand worker spawning model
W11 PENDING
LC: Climbing Stairs, Coin Change, Longest Increasing Subsequence
W12 PENDING
Understand RoPE positional encoding — how it differs from learned positions, why frontier models use it
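A minimal sketch of the mechanism: unlike learned position vectors added to embeddings, RoPE rotates each (even, odd) dimension pair of q and k by a position-dependent angle, so relative offsets appear as angle differences inside q·k. The base of 10000 is the common default; shapes are toy values.

```python
import numpy as np

def rope(x, base=10000.0):
    """x: (T, d) with d even. Rotate pair i of position t by t * base**(-2i/d)."""
    T, d = x.shape
    pos = np.arange(T)[:, None]                 # (T, 1)
    freq = base ** (-np.arange(0, d, 2) / d)    # (d/2,) per-pair frequencies
    theta = pos * freq                          # (T, d/2) rotation angles
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin          # 2D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

x = np.ones((6, 8))
print(rope(x).shape)  # (6, 8); position 0 is left unrotated
```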
W12 PENDING
Understand AllReduce, AllGather, ReduceScatter — what each does, when each is used
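One way to pin the semantics down before touching NCCL: simulate each collective on plain numpy arrays. Rank count and values here are arbitrary.

```python
import numpy as np

# Per-rank inputs: 4 ranks, each holding a vector of 4 elements.
ranks = [np.arange(4) + 10 * r for r in range(4)]

# AllReduce: every rank ends with the elementwise sum of all inputs
# (gradient sync in data parallelism).
all_reduce = sum(ranks)

# AllGather: every rank ends with the concatenation of all inputs
# (reassembling sharded params before a forward pass).
all_gather = np.concatenate(ranks)

# ReduceScatter: reduce elementwise, then rank r keeps only shard r.
# Note AllReduce == ReduceScatter followed by AllGather.
reduce_scatter = np.split(sum(ranks), 4)   # rank r gets reduce_scatter[r]

print(all_reduce, all_gather.shape, reduce_scatter[0])
```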
W12 PENDING
LC: Word Break, Combination Sum, House Robber
W13 PENDING
Understand sliding window attention — Qwen3 L0–L27 window=4096, full attention L28–L35
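A sketch of the mask this amounts to, assuming the usual causal-band formulation; the layer split above is from the task line, while the window and sequence length here are toy values.

```python
import numpy as np

def sliding_window_mask(T, window):
    """True where query i may attend key j: causal (j <= i) and
    within the last `window` positions (i - j < window)."""
    i = np.arange(T)[:, None]
    j = np.arange(T)[None, :]
    return (j <= i) & (i - j < window)

print(sliding_window_mask(6, 3).astype(int))
# Each row attends to at most 3 keys; a full-attention layer
# is the same mask with window = T.
```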
W13 PENDING
Mac Studio setup + benchmark MLX vs PyTorch MPS for SAE workload
W13 PENDING
LC: Merge Intervals, Insert Interval, Non-overlapping Intervals
W14 PENDING
Backpropagation from scratch in numpy — derive gradient updates manually
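For reference once the derivation is done: a minimal one-hidden-layer sketch with MSE loss, chain rule applied layer by layer. Sizes and learning rate are arbitrary; gradient-checking against finite differences is the natural sanity test.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 3))                 # batch of inputs
Y = rng.normal(size=(16, 1))                 # regression targets
W1, W2 = rng.normal(size=(3, 8)), rng.normal(size=(8, 1))

for step in range(200):
    # Forward: linear -> tanh -> linear, MSE loss.
    h = np.tanh(X @ W1)                      # (16, 8)
    pred = h @ W2                            # (16, 1)
    loss = np.mean((pred - Y) ** 2)

    # Backward: chain rule, output to input.
    dpred = 2 * (pred - Y) / len(X)          # dL/dpred
    dW2 = h.T @ dpred                        # dL/dW2
    dh = dpred @ W2.T                        # dL/dh
    dW1 = X.T @ (dh * (1 - h**2))            # tanh'(z) = 1 - tanh(z)^2

    W1 -= 0.1 * dW1
    W2 -= 0.1 * dW2

print(f"final loss: {loss:.4f}")
```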
W14 PENDING
Read vllm scheduler (`vllm/core/scheduler.py`) — understand continuous batching
W14 PENDING
CF: Div 2 A/B — greedy + implementation (rating 800–1000)
W15 PENDING
SAE architecture deep dive: expansion factor, sparsity penalty choices, dead feature problem + mitigation (minimal sketch after the references)
[1] Towards Monosemanticity (Anthropic)
[2] Scaling Monosemanticity (Anthropic)
[3] SAE tutorial by Joseph Bloom
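A minimal sketch of the recipe those references describe, as I understand it: an overcomplete ReLU autoencoder on residual-stream activations with an L1 penalty on feature activations. The sizes, tied-at-init decoder, and dead-feature check are illustrative choices, not the papers' exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, expansion, l1_coef = 64, 8, 1e-3
d_sae = d_model * expansion                       # overcomplete dictionary

W_enc = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_model)
W_dec = W_enc.T.copy()                            # tied at init, untied in training
b_enc, b_dec = np.zeros(d_sae), np.zeros(d_model)

def sae_loss(acts):
    """acts: (B, d_model) residual-stream activations."""
    f = np.maximum(acts @ W_enc + b_enc, 0.0)     # sparse features (B, d_sae)
    recon = f @ W_dec + b_dec
    mse = np.mean((recon - acts) ** 2)            # reconstruction term
    l1 = l1_coef * np.abs(f).sum(axis=-1).mean()  # sparsity penalty
    return mse + l1, f

acts = rng.normal(size=(128, d_model))
loss, f = sae_loss(acts)
dead = (f.max(axis=0) == 0).mean()                # features that never fire
print(f"loss={loss:.3f}, dead-feature fraction={dead:.2f}")
```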
W15 PENDING
Read HuggingFace Trainer (`transformers/trainer.py`) — understand training loop abstraction
W15 PENDING
CF: Div 2 A/B — sorting + prefix sums (rating 800–1000)
W16 PENDING
Train small SAE on 2-layer transformer (before attempting Qwen3)
W16 PENDING
OTel + DCGM: collector config ingesting DCGM metrics
W16 PENDING
CF: Div 2 B — binary search + two pointers (rating 1000–1200)
W17 PENDING
SAE evaluation: activation maximization, ablation — how do you know a feature is interpretable?
W17 PENDING
OTel: custom spans — forward pass, backward pass, optimizer step, data loading
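A sketch of what those spans might look like with the opentelemetry-python SDK. The tracer/span calls are the real API; the span names and step decomposition are this plan's own, and the bodies are placeholders.

```python
# pip install opentelemetry-sdk
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("trainer")

def train_step(step):
    with tracer.start_as_current_span("train_step") as span:
        span.set_attribute("step", step)
        with tracer.start_as_current_span("data_loading"):
            pass  # fetch batch
        with tracer.start_as_current_span("forward"):
            pass  # model(batch)
        with tracer.start_as_current_span("backward"):
            pass  # loss.backward()
        with tracer.start_as_current_span("optimizer_step"):
            pass  # optimizer.step()

train_step(0)
```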
W17 PENDING
CF: Div 2 B — BFS/DFS on grids (rating 1000–1200)
W18 PENDING
Qwen3 8B architecture internals: GQA heads, SwiGLU, QK-Norm — why each design choice
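SwiGLU is the piece that pins down quickest in code. A numpy sketch assuming the now-standard gate/up/down formulation; dimensions here are toy values (d_ff of roughly 8/3 times d_model is the usual sizing, so parameter count matches a 4x vanilla FFN).

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))        # SiLU / swish: x * sigmoid(x)

def swiglu_ffn(x, W_gate, W_up, W_down):
    """x: (T, d_model). The SiLU(gate) path modulates the up projection
    elementwise before projecting back down: two input projections
    where a vanilla FFN has one."""
    return (silu(x @ W_gate) * (x @ W_up)) @ W_down

rng = np.random.default_rng(0)
d, d_ff = 16, 44
x = rng.normal(size=(5, d))
out = swiglu_ffn(x,
                 rng.normal(size=(d, d_ff)),
                 rng.normal(size=(d, d_ff)),
                 rng.normal(size=(d_ff, d)))
print(out.shape)  # (5, 16)
```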
W18 PENDING
OTel: dashboard — "why did my loss spike at step 3400?" answerable in under 2 minutes
W18 PENDING
CF: Div 2 B/C — basic DP (rating 1200–1400)
W19 PENDING
Research Q2: Does L34 suppression replicate on Gemma3 + Phi-4? (attribution scan on same fail cases)
W19 PENDING
CF: Div 2 C — constructive algorithms + math (rating 1200–1400)
W20 PENDING
Read: "Language Models are Few-Shot Learners" (GPT-3) — scaling laws, in-context learning, few-shot prompting
W20 PENDING
Read: "Densely Connected Convolutional Networks" (DenseNet) — feature reuse, hyperconnectivity via dense blocks
W20 PENDING
CF: Div 2 C — segment trees / BIT intro (rating 1300–1500)
W21 PENDING
Read: "An Image is Worth 16x16 Words" (ViT) — how transformers replaced CNNs, patch embedding, position encoding for vision
W21 PENDING
Read: "Highway Networks" — gating mechanisms for deep networks, precursor to residual connections
W21 PENDING
CF: Div 2 C — graphs: shortest paths, Dijkstra (rating 1400–1600)
W22 PENDING
Read: "BERT: Pre-training of Deep Bidirectional Transformers" — masked language modeling, bidirectional context, fine-tuning paradigm
W22 PENDING
Read: "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer" — conditional computation, gating, expert routing
W22 PENDING
CF: Div 2 C/D — union-find / DSU (rating 1400–1600)
W23 PENDING
Read: "Scaling Laws for Neural Language Models" — compute-optimal training, power-law relationships, Chinchilla implications
W23 PENDING
Read: "Layer Normalization" — why LayerNorm over BatchNorm for transformers, Pre-LN vs Post-LN stability
W23 PENDING
CF: Div 2 D — number theory + modular arithmetic (rating 1500–1700)
W24 PENDING
Read: "Denoising Diffusion Probabilistic Models" — forward/reverse process, noise schedules, connection to score matching
W24 PENDING
Read: "LoRA: Low-Rank Adaptation of Large Language Models" — parameter-efficient fine-tuning, rank decomposition, when to use it
W24 PENDING
CF: Div 2 D — DP on trees (rating 1600–1800)
W25 PENDING
Read: "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" — retriever-reader architecture, when RAG beats fine-tuning
W25 PENDING
Read: "Switch Transformers: Scaling to Trillion Parameter Models" — simplified MoE routing, expert parallelism, load balancing
W25 PENDING
CF: Div 2 D — bitmask DP (rating 1600–1800)
W27 PENDING
System design practice: "Design a training observability system for a 1000-GPU cluster" — answer out loud, no notes
[1] Designing Data-Intensive Applications (Kleppmann)
[2] Google SRE Book — Monitoring Distributed Systems
[3] NVIDIA Collective Communications Library (NCCL) docs
[4] OpenTelemetry Collector architecture
W27 PENDING
CF: Div 2 D/E — heavy-light decomposition / centroid decomposition (rating 1800–2000)
W28 PENDING
Mock interview: explain attention from first principles to a non-ML engineer
[1] Attention Is All You Need
[2] The Illustrated Transformer
[3] 3Blue1Brown — Attention in transformers, visually explained
W28 PENDING
CF: Div 2 D/E — flows and matchings (rating 1800–2000)