Dot Products

Dot products are remarkably simple to compute, yet they're very expressive in what they can represent. They're used all over the place in machine learning in the form of matrix multiplication, where each entry of the resulting matrix is a single dot product between a row of the first matrix and a column of the second.

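Let's start with the basics. Two common formulations of the dot product for vectors $a, b \in \mathbb{R}^n$ are:

$$a \cdot b = \sum_{i=1}^{n} a_i b_i$$

and

$$a \cdot b = \|a\| \|b\| \cos\theta$$

where $\theta$ is the angle between $a$ and $b$, and $\|a\| = \sqrt{a \cdot a}$.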

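Proof: Look at the squared distance between the endpoints, $\|a - b\|^2$. Expanding in coordinates gives

$$\|a - b\|^2 = \sum_{i=1}^{n} (a_i - b_i)^2 = \|a\|^2 + \|b\|^2 - 2\sum_{i=1}^{n} a_i b_i$$

Geometrically, the law of cosines on the triangle formed by $a$ and $b$ gives

$$\|a - b\|^2 = \|a\|^2 + \|b\|^2 - 2\|a\|\|b\|\cos\theta$$

Equating the two expressions and canceling terms yields

$$\sum_{i=1}^{n} a_i b_i = \|a\|\|b\|\cos\theta$$

which proves the equivalence.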
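If you'd like to see the equivalence with actual numbers, here is a minimal sketch in plain Python. The vectors and the helper names `dot` and `norm` are my own choices for illustration. The angle is measured independently of the dot product (from each vector's polar angle via `math.atan2`), so the comparison isn't circular, but that trick only works in 2D.

```python
import math

def dot(a, b):
    """Sum-of-products form of the dot product."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean length, sqrt(a . a)."""
    return math.sqrt(dot(a, a))

# Two arbitrary 2D vectors, picked only for illustration.
a = (3.0, 4.0)
b = (2.0, -1.0)

# Angle between them, computed without using the dot product:
# the difference of the two polar angles.
theta = math.atan2(a[1], a[0]) - math.atan2(b[1], b[0])

algebraic = dot(a, b)                            # sum formulation
geometric = norm(a) * norm(b) * math.cos(theta)  # cosine formulation

# Both sides of the law-of-cosines step in the proof.
diff = tuple(x - y for x, y in zip(a, b))
lhs = dot(diff, diff)
rhs = norm(a) ** 2 + norm(b) ** 2 - 2 * geometric

print(math.isclose(algebraic, geometric))  # the two formulations agree
print(math.isclose(lhs, rhs))              # and so do the two expansions
```

Floating-point rounding means the two values agree only up to a small tolerance, which is why the comparison uses `math.isclose` rather than `==`.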

The dot product is also linear, which lets you distribute the dot product over addition, e.g. $(a + b) \cdot c = a \cdot c + b \cdot c$. This follows directly from the sum definition. It is also commutative due to symmetry, so you have $a \cdot b = b \cdot a$.
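A quick sanity check of both properties, as a small sketch with made-up integer vectors (integers keep the arithmetic exact). The helpers `dot` and `add` are illustrative names, not from any library.

```python
def dot(a, b):
    """Sum-of-products form of the dot product."""
    return sum(x * y for x, y in zip(a, b))

def add(a, b):
    """Componentwise vector addition."""
    return tuple(x + y for x, y in zip(a, b))

a = (1, 2, 3)
b = (4, -5, 6)
c = (-2, 0, 7)

# Distributing over addition: (a + b) . c versus a . c + b . c
combined = dot(add(a, b), c)
distributed = dot(a, c) + dot(b, c)
print(combined == distributed)   # True

# Commutativity: a . b versus b . a
print(dot(a, b) == dot(b, a))    # True
```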

If we fix $\|a\| = \|b\| = 1$, we can think of a dot product as a computationally cheap way to determine how aligned two vectors are, with the computation being a mix of addition and multiplication. The more aligned they are, the smaller the angle between them, and the cosine of a smaller angle is closer to $1$. If they are misaligned and pointing in orthogonal directions, the angle between them is $\pi/2$, so the cosine of that angle is $0$. If they are aligned in opposite directions, the angle between them is $\pi$, so the cosine of that angle is $-1$. If $a$ and $b$ aren't unit length, you could still use $\frac{a \cdot b}{\|a\| \|b\|} = \cos\theta$ to determine alignment.
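The normalized version is usually called cosine similarity. Here is a minimal sketch of the three cases above; the function name `cosine_similarity` and the example vectors are my own, and the function assumes neither input is the zero vector (the ratio is undefined there).

```python
import math

def dot(a, b):
    """Sum-of-products form of the dot product."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """cos(theta) between a and b. Assumes both vectors are nonzero."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Same direction, different lengths: close to 1
same = cosine_similarity((1.0, 2.0), (2.0, 4.0))

# Perpendicular directions: 0
orthogonal = cosine_similarity((1.0, 0.0), (0.0, 3.0))

# Opposite directions, different lengths: close to -1
opposite = cosine_similarity((1.0, 2.0), (-3.0, -6.0))

print(same, orthogonal, opposite)
```

Note that the lengths of the inputs don't matter here; only the directions do, which is exactly what dividing by the norms buys you.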

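It's also useful in representing projections. If you had a vector $a$ that you wanted to decompose into how aligned it is with $b$ and the rest that's unrelated (orthogonal) to $b$, then you could use the dot product to determine the signed length of the projection of $a$ onto $b$, which is $\frac{a \cdot b}{\|b\|}$. Scaling the unit vector $\frac{b}{\|b\|}$ by that length gives the projection vector $\frac{a \cdot b}{b \cdot b}\, b$, and subtracting that vector from $a$ leaves the unrelated/orthogonal component.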
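The decomposition can be sketched in a few lines. The helper name `project` and the example vectors are hypothetical, and the function assumes $b$ is nonzero. It uses the standard formula: the part of $a$ along $b$ is $\frac{a \cdot b}{b \cdot b}\, b$, and the orthogonal part is whatever remains of $a$.

```python
def dot(a, b):
    """Sum-of-products form of the dot product."""
    return sum(x * y for x, y in zip(a, b))

def project(a, b):
    """Split a into (part parallel to b, part orthogonal to b).

    Assumes b is not the zero vector.
    """
    scale = dot(a, b) / dot(b, b)
    parallel = tuple(scale * x for x in b)
    orthogonal = tuple(x - p for x, p in zip(a, parallel))
    return parallel, orthogonal

a = (3.0, 4.0)
b = (2.0, 0.0)

parallel, orthogonal = project(a, b)
print(parallel)            # (3.0, 0.0)
print(orthogonal)          # (0.0, 4.0)
print(dot(orthogonal, b))  # 0.0, so the remainder really is unrelated to b
```

The two parts always add back up to $a$, so nothing is lost in the decomposition.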

Why should you care about this? There are a lot of possible interpretations and connections you can make with the different uses of dot products.

It connects the algebra and computation to the geometric interpretation of vectors.
It tells you how aligned two vectors are.
Matrix multiplication is just a lot of dot products. When the rows of the matrix form an orthonormal basis, it performs a change of basis by determining how aligned some vector is with each of those basis vectors.
It connects the vectors in Cartesian coordinates to their hyperspherical coordinates where every coordinate is an angle except for a single magnitude coordinate.
It adds meaning to vectors in machine learning where each vector represents a combination of features, and dot products can extract specific features from it.
You can add two vectors to get the combined features of both. This follows from linearity.
It gives intuition behind why the residual stream in LLMs is interpretable. The residual stream is the primary state vector of an LLM.
The attention mechanism in LLMs can be interpreted as performing a lookup on extracted features and adding features to the residual stream based on matches in the lookup.
It's useful in Fourier analysis in comparing how aligned a signal vector is to some frequency vector.
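To make the matrix multiplication point from the list above concrete, here is a minimal sketch in plain Python. The helpers `matvec` and `matmul` are illustrative names (a real project would reach for a linear algebra library), and the basis is one I picked: the standard 2D basis rotated by 45 degrees, which is orthonormal.

```python
import math

def dot(a, b):
    """Sum-of-products form of the dot product."""
    return sum(x * y for x, y in zip(a, b))

def matvec(rows, v):
    """Matrix times vector: one dot product per row of the matrix."""
    return [dot(row, v) for row in rows]

def matmul(A, B):
    """Matrix times matrix: each entry is a row of A dotted with a column of B."""
    cols = list(zip(*B))
    return [[dot(row, col) for col in cols] for row in A]

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]

# Change of basis: the rows are an orthonormal basis (assumed for this example).
s = math.sqrt(0.5)
basis = [(s, s), (-s, s)]
v = (2.0, 0.0)

# Each coordinate says how aligned v is with one basis vector.
coords = matvec(basis, v)

# Because the basis is orthonormal, weighting the basis vectors by those
# coordinates and summing recovers v (up to floating-point rounding).
rebuilt = [sum(c * e[i] for c, e in zip(coords, basis)) for i in range(len(v))]
print(coords, rebuilt)
```

The rebuild step only works this simply because the basis is orthonormal; for a general basis you would need the inverse matrix instead.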

I'm sure there are plenty more connections and interpretations I'm still missing, but this should help motivate why the dot product (and linear algebra in general) is so important.