The Central Limit Theorem

math, probability

The central limit theorem says, in simple terms, that the sum of many independent samples from any distribution with finite variance, after normalizing, converges in distribution to a normal (or Gaussian) distribution. There's a lot to unpack here, so I'll start with a simple concrete example.

Let's say you're flipping a fair coin, and if it lands on heads, you add $1$ to some counter, and if it lands on tails, you subtract $1$ from the counter. The counter starts at $0$. If you perform this procedure $N$ times where $N$ is large, the central limit theorem says the probability distribution of the counter is roughly a normal distribution with mean $0$ and standard deviation $\sqrt{N}$. If the value of the counter is normalized by dividing by $\sqrt{N}$, this is equivalent to saying that the probability distribution of the normalized counter is roughly a normal distribution with mean $0$ and standard deviation $1$.
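This is easy to check empirically. Below is a quick simulation sketch in plain Python (the particular $N$ and trial count are just illustrative choices, and the seed is fixed for reproducibility):

```python
import random
import statistics

N = 2_500                 # number of coin flips per trial
TRIALS = 1_000            # number of repeated experiments
rng = random.Random(0)    # fixed seed so the run is reproducible

# Run the counter experiment many times: +1 for heads, -1 for tails.
counters = [sum(rng.choices((1, -1), k=N)) for _ in range(TRIALS)]

print(statistics.mean(counters))   # near 0
print(statistics.stdev(counters))  # near sqrt(N) = 50
```

A histogram of `counters` would show the familiar bell curve taking shape.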

This probability distribution function is given by the formula

$$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$

This function can be used to determine the probability that the counter ends up in a certain range after a large number of coin flips. For example, the probability that the normalized counter ends up between $0$ and $1$ after, say, a million coin flips is roughly

$$\int_0^1 \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx \approx 0.3413$$

As a sanity check, this can also be computed discretely by unnormalizing the counter so the range is $0$ to $\sqrt{10^6} = 1000$. After an even number of flips, the counter always lands on an even number, so this is the probability of getting $500{,}000$ to $500{,}500$ heads out of $1{,}000{,}000$ flips (i.e. $0$ to $1000$ more heads than tails).
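The same sanity check can be run numerically. A sketch with the standard library, using a smaller but still large $N$ so the exact binomial sum stays fast (a million flips works too, just more slowly):

```python
import math

N = 10_000                            # number of flips
lo = N // 2                           # counter = 0        ->  N/2 heads
hi = N // 2 + math.isqrt(N) // 2      # counter = sqrt(N)  ->  N/2 + sqrt(N)/2 heads

# Exact probability that the number of heads lands in [lo, hi].
binom = sum(math.comb(N, k) for k in range(lo, hi + 1)) / 2**N

# Normal approximation: Phi(1) - Phi(0), the integral of the pdf from 0 to 1.
normal = 0.5 * math.erf(1 / math.sqrt(2))

print(binom, normal)
```

The two numbers agree to within the endpoint effects of the discrete sum, which shrink as $N$ grows.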

Where $\pi$ comes from

You might be wondering why $\pi$ and $e$ are part of the function $f(x)$. How does flipping coins relate to circles and quadratic exponential decay?

The setup with the normalized counter value can be reformulated as sampling an $N$-dimensional vector where each dimension is an independent sample of the coin flip and thus has value $\pm 1$. Then, the normalized counter value is equal to the signed length of the projection of the sampled vector onto the unit vector $\left(\frac{1}{\sqrt{N}}, \ldots, \frac{1}{\sqrt{N}}\right)$, or alternatively just the dot product of those two vectors.

Every sampled vector has length exactly $\sqrt{N}$ since each component is $\pm 1$, so all possible sample vectors lie on a hypersphere of radius $\sqrt{N}$. The interesting intuition here is that generating the normalized counter value can be reformulated as sampling a random vector on a hypersphere and projecting it onto $\left(\frac{1}{\sqrt{N}}, \ldots, \frac{1}{\sqrt{N}}\right)$. This intuition would still apply even if the sampled distribution were continuous rather than a discrete coin flip, except it would be more like sampling from a hyperball, which is a hypersphere plus its interior. In the limit, sampling from a hypersphere or a hyperball behaves the same because basically all of the volume of a hyperball is at the surface.
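That last claim about volume concentrating at the surface follows directly from how volume scales with radius, and takes one line to see numerically:

```python
# Volume of an n-ball scales as r**n, so the fraction of the volume lying
# within 99% of the radius is 0.99**n -- which vanishes in high dimensions,
# i.e. basically all of the volume sits in the thin shell near the surface.
for n in (2, 10, 100, 1000):
    print(n, 0.99 ** n)
```

Already at $n = 1000$, less than $0.005\%$ of the ball's volume lies inside $99\%$ of its radius.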

Now, $f(0)$ can be interpreted as being proportional to the probability that a randomly sampled vector from a hyperball is orthogonal to $\left(\frac{1}{\sqrt{N}}, \ldots, \frac{1}{\sqrt{N}}\right)$, because those vectors are analogous to having the normalized counter value set to $0$. The cross section of an $N$-dimensional hyperball is an $(N-1)$-dimensional hyperball, and the volume of an $n$-dimensional hyperball of radius $r$ is proportional to $\pi^{n/2}$ (specifically, it equals $\frac{\pi^{n/2}}{\Gamma(\frac{n}{2} + 1)} r^n$), which is why $f(x)$ has $\sqrt{2\pi}$ in its denominator.
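As a tiny sanity check of that volume formula, it reproduces the familiar low-dimensional cases:

```python
import math

def ball_volume(n, r=1.0):
    """Volume of an n-dimensional ball of radius r: pi^(n/2) / Gamma(n/2 + 1) * r^n."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * r ** n

print(ball_volume(2))   # area of the unit disk: pi
print(ball_volume(3))   # volume of the unit ball: 4*pi/3
```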

Where $e$ comes from

The ratio $\frac{f(x)}{f(0)}$ can be approximated through discrete counting. A normalized counter value of $x$ after $N$ flips means getting $x\sqrt{N}$ more heads than tails, or $\frac{N + x\sqrt{N}}{2}$ heads out of $N$ flips, so $\frac{f(x)}{f(0)}$ is approximately

$$\binom{N}{\frac{N + x\sqrt{N}}{2}} \bigg/ \binom{N}{\frac{N}{2}}$$

Expanding the factorials roughly gives

$$\prod_{j=1}^{x\sqrt{N}/2} \frac{N/2 - j + 1}{N/2 + j}$$

For large $N$, this simplifies to approximately

$$\left( \left(1 - \frac{x}{\sqrt{N}}\right)^{\frac{\sqrt{N}}{x}} \right)^{\frac{x^2}{2}}$$

From the previous post on $e$, as $N \to \infty$, the expression on the RHS inside the outermost exponent, $\left(1 - \frac{x}{\sqrt{N}}\right)^{\frac{\sqrt{N}}{x}}$, approaches $\frac{1}{e}$, so this simplifies to $e^{-x^2/2}$.
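This limit can be checked directly by computing the ratio of binomial coefficients for a large $N$ (the values of $N$ and $x$ below are just illustrative):

```python
import math

N = 10_000                          # number of flips
x = 1.0                             # normalized counter value to test
m = int(x * math.isqrt(N)) // 2     # x*sqrt(N)/2 extra heads

# Ratio of the number of ways to land at x versus at 0.
ratio = math.comb(N, N // 2 + m) / math.comb(N, N // 2)

print(ratio)                   # ratio of binomial coefficients
print(math.exp(-x * x / 2))    # predicted limit e^(-x^2/2)
```

The two values already agree to several decimal places at $N = 10{,}000$.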

Properties of normal distributions

The geometric perspective also gives intuition for several well-known properties of normal distributions.

The sum of two independent normal random variables is still normal. From the vector perspective, each of the normal random variables can be reinterpreted as a high-dimensional sampled vector projected onto $\left(\frac{1}{\sqrt{N}}, \ldots, \frac{1}{\sqrt{N}}\right)$, so their sum can be reinterpreted as adding together their sampled vectors and projecting the result onto that same vector. Therefore, the sum of the two independent normal random variables can be derived by adding together the underlying distributions used to create them and applying the central limit theorem to that.

More generally, any linear combination of independent normal random variables is still normal, since the linear combination can be applied to all underlying distributions before using the central limit theorem to obtain the normal distribution.
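A simulation sketch of the linear-combination property: the combination $aX + bY$ of independent standard normals should again look normal, with standard deviation $\sqrt{a^2 + b^2}$ (the coefficients and sample count below are arbitrary):

```python
import math
import random
import statistics

rng = random.Random(0)    # fixed seed for reproducibility
a, b = 2.0, 3.0           # arbitrary coefficients of the linear combination

# Sample a*X + b*Y for independent standard normals X and Y.
samples = [a * rng.gauss(0, 1) + b * rng.gauss(0, 1) for _ in range(20_000)]

print(statistics.mean(samples))   # near 0
print(statistics.stdev(samples))  # near sqrt(a^2 + b^2) = sqrt(13)
```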

The Fourier transform of a normal distribution is also normal. In the vector perspective, the Fourier transform performs some kind of rotation on the vectors. Since the distribution on a hypersphere or hyperball is rotationally symmetric, the procedure of taking signed lengths of projections onto a fixed vector still produces a normal distribution.
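This property can be verified numerically: with the convention $\hat{f}(\omega) = \int f(x) e^{-i\omega x} \, dx$, the transform of $e^{-x^2/2}$ is $\sqrt{2\pi}\, e^{-\omega^2/2}$. A Riemann-sum sketch (the grid spacing and integration window are arbitrary choices):

```python
import math

def gaussian_ft(omega, half_width=10.0, dx=0.001):
    """Riemann-sum approximation of the Fourier transform of e^(-x^2/2).

    The imaginary part vanishes by symmetry, so only the cosine term is kept.
    """
    n = int(2 * half_width / dx)
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * dx   # midpoint rule
        total += math.exp(-x * x / 2) * math.cos(omega * x) * dx
    return total

for w in (0.0, 1.0, 2.0):
    print(w, gaussian_ft(w), math.sqrt(2 * math.pi) * math.exp(-w * w / 2))
```

The numeric transform matches $\sqrt{2\pi}\, e^{-\omega^2/2}$: another Gaussian, just rescaled.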