The Math Behind Neural Networks

An 8-part series.

Part 1
Vectors and Matrices: The Language Neural Networks Speak

A ground-up introduction to vectors, matrices, dot products, and matrix multiplication — the operations every neural network is built from.

May 10, 2026
Part 2
Derivatives and Gradients: Teaching Machines to Improve

Derivatives, the chain rule, partial derivatives, the gradient, and gradient descent — the calculus that drives every step of neural network learning.

May 10, 2026
Part 3
Probability and the Gaussian: How Neural Networks Express Uncertainty

Probability distributions, the Gaussian, softmax, and cross-entropy loss — the tools neural networks use to express uncertainty and produce predictions.

May 10, 2026
Part 4
A Neural Network from Scratch: Perceptrons, Layers, and Forward Pass

Build a neural network layer by layer — the perceptron, activation functions (ReLU, sigmoid, tanh), and a complete worked forward pass.

May 10, 2026
Part 5
Backpropagation: How Neural Networks Learn from Mistakes

A complete walkthrough of backpropagation — the chain rule applied to computation graphs. Includes a worked numerical example through a 2-layer network.

May 10, 2026
Part 6
Embeddings and Similarity: Turning Words into Vectors

How neural networks represent words as dense vectors, why dot products measure similarity, and how cosine similarity finds related concepts.

May 10, 2026
Part 7
The Attention Mechanism: How Transformers Focus

A detailed walkthrough of scaled dot-product attention — Query, Key, and Value matrices, the softmax operation, and a complete numerical example.

May 10, 2026
Part 8
The Transformer and GPT: Putting It All Together

Multi-head attention, positional encoding, layer normalisation, and the feed-forward sublayer. A complete step-by-step forward pass through GPT.

May 10, 2026