I recently went off on a tangent trying to figure out how white noise works, and I found that there is a lot of strangeness to it that may not be apparent at first glance. The content in this post is primarily from:

TLDR: We can’t just define a continuous-time white noise process as an $$\mathbb{R}$$-indexed collection of uncorrelated normal random variables, because on a well-behaved probability space such a collection does not exist.

The Problem With White Noise

Let’s start with a few simple definitions. In the following we will assume we are working on the well-behaved probability space $$\mathcal{P} = ([0,1], \mathcal{F}, \mu)$$, where $$\mathcal{F}$$ is the Borel $$\sigma$$-algebra on $$[0,1]$$ and $$\mu$$ is Lebesgue measure.

A real-valued stochastic process $$X$$ is a random-variable-valued function of $$t$$: for each index $$t$$ (here $$t \in \mathbb{R}$$), $$X_t$$ is a real-valued random variable, that is, a measurable function from $$[0,1]$$ to $$\mathbb{R}$$. We can think of $$t$$ as representing time, but this does not need to be the case.

A real-valued stochastic process is stationary when its unconditional joint probability distribution does not change when shifted in $$t$$. That is, for any $$\tau \in \mathbb{R}$$ and $$t_1, \dots, t_n \in \mathbb{R}$$, the random vectors $$(X_{t_1}, \dots, X_{t_n})$$ and $$(X_{t_1 + \tau}, \dots, X_{t_n + \tau})$$ have the same joint distribution.

Continuous-time white noise is often defined as a stationary real-valued stochastic process in which every $$X_t \sim \mathcal{N}(0,\sigma^2)$$ and, for all $$t$$ and $$\tau$$, the autocorrelation $$E[X_t X_{t+\tau}]$$ equals $$\sigma^2$$ when $$\tau = 0$$ and $$0$$ otherwise. That is, for all $$t_1 \neq t_2$$, the random variables $$X_{t_1}$$ and $$X_{t_2}$$ are uncorrelated normal random variables with mean $$0$$ and variance $$\sigma^2$$.
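For contrast, the discrete-time version of this definition causes no trouble: a sequence of i.i.d. $$\mathcal{N}(0,\sigma^2)$$ draws indexed by the integers has exactly this autocorrelation. The minimal sketch below (standard library only; the function names and the choice $$\sigma = 2$$ are illustrative assumptions) draws such a sequence and estimates $$E[X_t X_{t+\tau}]$$ for a few integer lags, which should come out near $$\sigma^2 = 4$$ at lag $$0$$ and near $$0$$ otherwise.

```python
import random

def discrete_white_noise(n, sigma, seed=0):
    """Draw n i.i.d. N(0, sigma^2) samples: a discrete-time white noise sequence."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]

def sample_autocorrelation(xs, lag):
    """Estimate E[X_t X_{t+lag}] by averaging the products xs[i] * xs[i + lag]."""
    n = len(xs) - lag
    return sum(xs[i] * xs[i + lag] for i in range(n)) / n

xs = discrete_white_noise(100_000, sigma=2.0)
for lag in range(4):
    # Expect roughly sigma^2 = 4 at lag 0 and roughly 0 at every other lag.
    print(lag, round(sample_autocorrelation(xs, lag), 3))
```

The trouble only appears when we ask for uncountably many such variables, one for every real $$t$$.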

However, such a collection cannot exist! To see this, note that each $$X_t$$ is normal, so it has finite variance and is therefore an element of $$L^2([0,1], \mu)$$. Since the $$X_t$$ have mean $$0$$ and are uncorrelated, $$E[X_{t_1} X_{t_2}] = 0$$ for $$t_1 \neq t_2$$, so they form an uncountable family of mutually orthogonal elements of $$L^2([0,1], \mu)$$, each with norm $$\sigma > 0$$. However, $$L^2([0,1], \mu)$$ is separable, and can therefore contain only countably many mutually orthogonal nonzero elements. So not all of the $$X_t$$ can be mutually orthogonal, which is a contradiction.
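The step from orthogonality to a contradiction can be made explicit. For $$t_1 \neq t_2$$, mean zero and uncorrelatedness give

$$\lVert X_{t_1} - X_{t_2} \rVert_{L^2}^2 = E\big[(X_{t_1} - X_{t_2})^2\big] = E[X_{t_1}^2] - 2E[X_{t_1} X_{t_2}] + E[X_{t_2}^2] = \sigma^2 - 0 + \sigma^2 = 2\sigma^2.$$

So any two distinct $$X_t$$ are at distance $$\sigma\sqrt{2}$$, and the open balls of radius $$\sigma/\sqrt{2}$$ centered at the $$X_t$$ are pairwise disjoint. A countable dense subset of $$L^2([0,1], \mu)$$ would have to meet each of these uncountably many disjoint balls, which is impossible.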

Working around the Problem

To resolve this, we need to use some pretty beefy mathematical machinery. Basically, while we can’t define continuous-time white noise to be a random variable valued function over $$t$$, we can define it as a random variable valued generalized function.

To start, let’s define Brownian motion $$\mathcal{B}$$ to be a stochastic process, indexed by $$t \geq 0$$, that satisfies:

• $$\mathcal{B}_0 = 0$$.
• If $$0 \leq t_0 < t_1 < \dots < t_n$$, then the increments $$\mathcal{B}_{t_k} - \mathcal{B}_{t_{k-1}}$$ for $$k = 1, 2, \dots, n$$ are independent.
• For each $$t \geq 0$$ and $$\tau \geq 0$$, the increment $$\mathcal{B}_{t+\tau} - \mathcal{B}_t$$ has distribution $$\mathcal{N}(0, \tau)$$.
• For almost all $$\omega \in [0,1]$$, the function $$t \mapsto \mathcal{B}_t(\omega)$$ is continuous everywhere.
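A sample path of $$\mathcal{B}$$ can be approximated on a finite time grid by summing independent $$\mathcal{N}(0, \Delta t)$$ increments, which is what the first three properties describe. The sketch below is a minimal illustration under that discretization assumption; `brownian_path` and its parameters are hypothetical names, and a finite grid cannot exhibit the continuity property (one would interpolate linearly between grid points).

```python
import math
import random

def brownian_path(t_max, n_steps, seed=None):
    """Sample B on the grid t_k = k * dt, k = 0..n_steps, with dt = t_max / n_steps.

    B_0 = 0 and each increment B_{t_k} - B_{t_{k-1}} is an independent N(0, dt)
    draw. Continuity cannot be checked on a finite grid.
    """
    rng = random.Random(seed)
    dt = t_max / n_steps
    path = [0.0]
    for _ in range(n_steps):
        # gauss takes a standard deviation, so variance dt becomes sqrt(dt)
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path

path = brownian_path(t_max=1.0, n_steps=1000, seed=42)
print(len(path), path[0])  # 1001 0.0
```

Across many independent paths, the endpoint $$\mathcal{B}_1$$ should have sample mean near $$0$$ and sample variance near $$1$$.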

The formal derivative in $$t$$ of $$\mathcal{B}$$ is the continuous-time white noise process. It isn’t too hard to see why this should be the case: by the conditions above, the increments of Brownian motion over disjoint intervals are independent and normally distributed, and differentiation just “continuous-ifies” these increments. So we could reasonably hand-wave white noise to be the derivative in $$t$$ of Brownian motion. Of course, things are more complex than this: for almost every $$\omega \in [0,1]$$, the function $$t \mapsto \mathcal{B}_t(\omega)$$ is nowhere differentiable in $$t$$, so this derivative does not exist in the ordinary sense.
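One way to see why the ordinary derivative cannot exist: by the third property, the difference quotient $$(\mathcal{B}_{t+\tau} - \mathcal{B}_t)/\tau$$ has variance $$\tau/\tau^2 = 1/\tau$$, which blows up as $$\tau \to 0$$. The sketch below (hypothetical function name, sample count chosen arbitrarily) samples the increment directly from $$\mathcal{N}(0,\tau)$$ and compares the empirical variance of the quotient with $$1/\tau$$.

```python
import math
import random

def difference_quotient_variance(tau, n_samples=20_000, seed=0):
    """Empirical variance of the difference quotient (B_{t+tau} - B_t) / tau.

    By the third property, B_{t+tau} - B_t ~ N(0, tau), so we sample the
    increment directly instead of simulating a whole path. The exact
    variance of the quotient is tau / tau**2 = 1 / tau.
    """
    rng = random.Random(seed)
    quotients = [rng.gauss(0.0, math.sqrt(tau)) / tau for _ in range(n_samples)]
    mean = sum(quotients) / n_samples
    return sum((q - mean) ** 2 for q in quotients) / n_samples

for tau in (1.0, 0.1, 0.01, 0.001):
    print(f"tau={tau:<6} empirical={difference_quotient_variance(tau):10.1f} exact={1 / tau:8.1f}")
```

The variance grows without bound as the step shrinks, so the quotients do not settle down to any finite limit.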

In order to resolve this, we need to switch from talking about functions to talking about generalized functions. A generalized function is a “linear functional on a space of test functions”. This is a mouthful, but it’s essentially just a linear mapping, continuous in a suitable sense, from the space of smooth functions of compact support (the test functions) into $$\mathbb{R}$$. We can think of a generalized function as behaving somewhat like a measure on $$\mathbb{R}$$, except that we can only probe it by integrating test functions against it (although a true mathematician might crucify me for saying this…).
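To make “a linear mapping from test functions to $$\mathbb{R}$$” concrete, the sketch below represents a generalized function as a Python function that takes a test function and returns a number. It assumes the standard bump $$\psi(t) = e^{-1/(1-t^2)}$$ on $$(-1,1)$$ (zero elsewhere) as the test function, approximates integrals with a midpoint rule on a fixed window, and uses hypothetical helper names (`bump`, `integrate`, `from_function`, `dirac_delta`); it does not check the continuity requirement. The Dirac delta, $$\psi \mapsto \psi(0)$$, is the point-mass example of the measure analogy: it is a generalized function that no ordinary function induces.

```python
import math

def bump(t):
    """Smooth test function: exp(-1 / (1 - t^2)) on (-1, 1), zero elsewhere."""
    if abs(t) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - t * t))

def integrate(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def from_function(g, a=-10.0, b=10.0):
    """Generalized function induced by an ordinary function g: psi -> integral of g * psi.

    Assumption of this sketch: every test function we apply it to vanishes outside [a, b].
    """
    def apply(psi):
        return integrate(lambda t: g(t) * psi(t), a, b)
    return apply

def dirac_delta(psi):
    """Point evaluation at 0: a generalized function that no ordinary function induces."""
    return psi(0.0)

one = from_function(lambda t: 1.0)
print(one(bump))          # ≈ 0.444, the area under the bump
print(dirac_delta(bump))  # exp(-1) ≈ 0.368

# Linearity: applying a generalized function to 2 * psi doubles the result.
print(one(lambda t: 2.0 * bump(t)) / one(bump))  # ≈ 2.0
```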

We can view any continuous function as a generalized function. For example, if we write the application of the generalized function corresponding to Brownian motion to the test function $$\psi$$ as $$(\mathcal{B}, \psi)$$ then we have:

$$(\mathcal{B}, \psi) = \int_{0}^{\infty} \mathcal{B}_t \, \psi(t) \, dt$$
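The pairing $$(\mathcal{B}, \psi)$$ can be approximated numerically by simulating a path on a grid and replacing the integral with a Riemann sum. In this sketch each random seed stands in for one outcome $$\omega$$; the test function (a bump supported on $$[0.5, 1.5]$$), the grid size, and all names are illustrative assumptions.

```python
import math
import random

def bump(t, center, radius):
    """Smooth test function supported on [center - radius, center + radius]."""
    x = (t - center) / radius
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def pair_with_brownian(psi, t_max=2.0, n_steps=400, seed=None):
    """One sample of (B, psi), approximated by sum_k B(t_k) psi(t_k) dt over [0, t_max].

    Assumption of this sketch: psi vanishes outside [0, t_max].
    """
    rng = random.Random(seed)
    dt = t_max / n_steps
    b = 0.0  # B_0 = 0
    total = 0.0
    for k in range(1, n_steps + 1):
        b += rng.gauss(0.0, math.sqrt(dt))  # independent N(0, dt) increment
        total += b * psi(k * dt) * dt
    return total

def psi(t):
    return bump(t, center=1.0, radius=0.5)

# Each seed plays the role of one outcome omega; different outcomes give different numbers.
print([round(pair_with_brownian(psi, seed=s), 4) for s in range(5)])
```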

Note that $$(\mathcal{B}, \psi)$$ is itself a random variable that maps $$\omega \in [0,1]$$ to $$\mathbb{R}$$. Now we define the derivative of a generalized function $$F$$ to be the generalized function $$F'$$ such that $$(F', \psi) = -(F, \psi')$$ for every test function $$\psi$$. When $$F$$ is an ordinary differentiable function, this is just integration by parts, since the boundary terms vanish for compactly supported $$\psi$$. Therefore, the derivative of the generalized function corresponding to Brownian motion is the following random-variable-valued generalized function, which we can think of as a more formal definition of continuous-time white noise:
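The definition $$(F', \psi) = -(F, \psi')$$ can be sanity-checked numerically. In the sketch below (hypothetical helper names, midpoint-rule integrals on $$[-10, 10]$$, and a bump test function centered at $$0.5$$ are all assumptions), a smooth $$F = \sin$$ gives the same number as pairing $$\cos$$ with $$\psi$$, while the Heaviside step function, which has no ordinary derivative at $$0$$, gets the generalized derivative $$\psi \mapsto \psi(0)$$, the Dirac delta.

```python
import math

def bump(t, center=0.5, radius=1.0):
    """Smooth test function supported on [center - radius, center + radius]."""
    x = (t - center) / radius
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def bump_prime(t, center=0.5, radius=1.0):
    """Exact t-derivative of bump (chain rule through x = (t - center) / radius)."""
    x = (t - center) / radius
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x)) * (-2.0 * x / (1.0 - x * x) ** 2) / radius

def integrate(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def pair(g, psi, a=-10.0, b=10.0):
    """(g, psi) = integral of g(t) * psi(t) dt for an ordinary function g."""
    return integrate(lambda t: g(t) * psi(t), a, b)

def derivative_pair(g, psi_prime, a=-10.0, b=10.0):
    """(g', psi) computed from the definition (g', psi) = -(g, psi')."""
    return -pair(g, psi_prime, a, b)

def heaviside(t):
    """Step function: 0 for t < 0, 1 for t >= 0."""
    return 1.0 if t >= 0.0 else 0.0

# Smooth F: the generalized derivative agrees with the ordinary one (sin' = cos).
print(derivative_pair(math.sin, bump_prime), pair(math.cos, bump))

# Heaviside: (H', psi) = psi(0), so H' is the Dirac delta.
print(derivative_pair(heaviside, bump_prime), bump(0.0))
```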

$$(\mathcal{B}', \psi) = -(\mathcal{B}, \psi') = -\int_{0}^{\infty} \mathcal{B}_t \, \psi'(t) \, dt$$
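This definition recovers the property we wanted in the first place. Integrating by parts (using $$\mathcal{B}_0 = 0$$ and the compact support of $$\psi$$) turns $$-\int_0^\infty \mathcal{B}_t \, \psi'(t) \, dt$$ into the Itô integral $$\int_0^\infty \psi(t) \, d\mathcal{B}_t$$, and the Itô isometry then gives $$E[(\mathcal{B}', \psi)(\mathcal{B}', \phi)] = \int_0^\infty \psi(t) \phi(t) \, dt$$. In particular, test functions with disjoint supports give uncorrelated random variables, which is the rigorous version of “$$X_{t_1}$$ and $$X_{t_2}$$ are uncorrelated for $$t_1 \neq t_2$$”. The Monte Carlo sketch below checks this numerically; the discretization, the two bump test functions, the sample count, and all names are assumptions of the sketch, so the estimates only approximate the exact values.

```python
import math
import random

def bump(t, center, radius):
    """Smooth test function supported on [center - radius, center + radius]."""
    x = (t - center) / radius
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def bump_prime(t, center, radius):
    """Exact t-derivative of bump (chain rule through x = (t - center) / radius)."""
    x = (t - center) / radius
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x)) * (-2.0 * x / (1.0 - x * x) ** 2) / radius

def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def white_noise_samples(psi_primes, n_samples, t_max=2.0, n_steps=200, seed=0):
    """Monte Carlo samples of ((B', psi_1), ..., (B', psi_m)), one list per simulated path.

    Each (B', psi) is approximated by -sum_k B(t_k) psi'(t_k) dt on the grid t_k = k * dt.
    Assumption of this sketch: every psi vanishes outside (0, t_max).
    """
    rng = random.Random(seed)
    dt = t_max / n_steps
    # psi'(t_k) * dt, precomputed once and reused for every path
    weights = [[pp(k * dt) * dt for k in range(1, n_steps + 1)] for pp in psi_primes]
    samples = []
    for _ in range(n_samples):
        b = 0.0  # B_0 = 0
        totals = [0.0] * len(psi_primes)
        for k in range(n_steps):
            b += rng.gauss(0.0, math.sqrt(dt))  # now b = B(t_{k+1})
            for i, w in enumerate(weights):
                totals[i] -= b * w[k]
        samples.append(totals)
    return samples

PSI = (0.75, 0.25)  # (center, radius): support [0.5, 1.0]
PHI = (1.5, 0.25)   # support [1.25, 1.75], disjoint from PSI's

samples = white_noise_samples(
    [lambda t: bump_prime(t, *PSI), lambda t: bump_prime(t, *PHI)], n_samples=3000)
xs = [s[0] for s in samples]
ys = [s[1] for s in samples]

def mean(values):
    return sum(values) / len(values)

var_x = mean([x * x for x in xs]) - mean(xs) ** 2
var_y = mean([y * y for y in ys]) - mean(ys) ** 2
cov_xy = mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)
# PHI is a translate of PSI, so both test functions have this same squared norm.
norm_sq = integrate(lambda t: bump(t, *PSI) ** 2, 0.0, 2.0)

print(var_x, var_y, norm_sq)  # both variances should be close to norm_sq
print(cov_xy)                 # should be close to 0
```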