The Math Behind Neural Networks

An 8-part series.

  1. Part 1

    Vectors and Matrices: The Language Neural Networks Speak

    A ground-up introduction to vectors, matrices, dot products, and matrix multiplication — the operations every neural network is built from.

  2. Part 2

    Derivatives and Gradients: Teaching Machines to Improve

    Derivatives, the chain rule, partial derivatives, the gradient, and gradient descent — the calculus that drives every step of neural network learning.

  3. Part 3

    Probability and the Gaussian: How Neural Networks Express Uncertainty

    Probability distributions, the Gaussian, softmax, and cross-entropy loss — the tools neural networks use to express uncertainty and produce predictions.

  4. Part 4

    A Neural Network from Scratch: Perceptrons, Layers, and Forward Pass

    Build a neural network layer by layer — the perceptron, activation functions (ReLU, sigmoid, tanh), and a complete worked forward pass.

  5. Part 5

    Backpropagation: How Neural Networks Learn from Mistakes

    A complete walkthrough of backpropagation — the chain rule applied to computation graphs. Includes a worked numerical example through a 2-layer network.

  6. Part 6

    Embeddings and Similarity: Turning Words into Vectors

    How neural networks represent words as dense vectors, why dot products measure similarity, and how cosine similarity finds related concepts.

  7. Part 7

    The Attention Mechanism: How Transformers Focus

    A detailed walkthrough of scaled dot-product attention — Query, Key, and Value matrices, the softmax operation, and a complete numerical example.

  8. Part 8

    The Transformer and GPT: Putting It All Together

    Multi-head attention, positional encoding, layer normalisation, and the feed-forward sublayer. A complete step-by-step forward pass through GPT.