21 QUESTIONS
Q1 (1 / 21, medium)
Topological Sort in Backprop
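In micrograd, loss.backward() builds a topological ordering of the computation graph before calling _backward() on each node. The ordering is built by a recursive depth-first traversal: each node is recorded in a visited set, its children (the input nodes stored in _prev) are visited first, and the node is appended to a list only after all of its children have been appended. The list is then traversed in reverse, starting from loss.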
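The traversal described above can be sketched as follows. This is a minimal illustration loosely modeled on micrograd's Value class, not a copy of it: the topo_order helper is a hypothetical name introduced here to expose the ordering, and only addition and multiplication are supported.

```python
class Value:
    """Minimal scalar autograd node (illustrative sketch, not micrograd itself)."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._prev = set(_children)      # input nodes that produced this node
        self._backward = lambda: None    # leaves have nothing to propagate

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(out)/d(self) = d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(out)/d(self) = other.data, d(out)/d(other) = self.data
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def topo_order(self):
        """Hypothetical helper: return nodes with every input before its consumer."""
        topo, visited = [], set()

        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)           # appended only after all of its inputs

        build(self)
        return topo

    def backward(self):
        self.grad = 1.0                  # d(loss)/d(loss)
        for v in reversed(self.topo_order()):
            v._backward()


a = Value(2.0)
b = Value(3.0)
c = a * b          # a feeds c ...
loss = c + a       # ... and also feeds loss directly
loss.backward()
print(a.grad, b.grad)
```

Note that a is consumed by two nodes (c and loss), so the graph is a DAG rather than a tree; this is the kind of graph where the processing order matters.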
Why does backprop require nodes to be processed in reverse topological order? What goes wrong concretely if a node's _backward() runs before all downstream nodes have backpropped into it?