I recently went off on a tangent trying to figure out how white noise works, and I found that there is a lot of strangeness to it that may not be apparent at first glance. The content in this post is primarily from:

TLDR: We can’t just define a continuous-time white noise process as an \(\mathbb{R}\)-indexed collection of uncorrelated normal random variables because such a collection does not exist.

The Problem With White Noise

Let’s start with a few simple definitions. In the following we will assume we are working over the well-behaved probability space \(\mathcal{P} = ([0,1], \mathcal{F}, \mu)\), where \(\mu\) is the Lebesgue measure on the Borel \(\sigma\)-algebra \(\mathcal{F}\). (We reserve the symbol \(\mathcal{B}\) for Brownian motion later on.)

A real-valued stochastic process \(X\) is a collection of random variables indexed by \(t\), where each \(X_t\) is a real-valued random variable, i.e. a measurable function from \(\mathcal{P}\) to \(\mathbb{R}\). We can think of \(t\) as representing time, but this does not need to be the case.

A real-valued stochastic process is stationary when its unconditional joint probability distribution does not change when shifted in \(t\). That is, for any \(\tau \in \mathbb{R}\) and \(t_1, ..., t_n \in \mathbb{R}\) we have that the joint distributions of the sets of random variables \((X_{t_1}, ..., X_{t_n})\) and \((X_{t_1 + \tau}, ..., X_{t_n + \tau})\) are the same.

Continuous-time white noise is often defined as a stationary real-valued stochastic process where every \(X_t \sim \mathcal{N}(0,\sigma^2)\) and for all \(\tau\) the autocorrelation \(E[X_t X_{t+\tau}]\) is \(\sigma^2\) when \(\tau=0\) and \(0\) otherwise. That is, for all \(t_1 \neq t_2\), the random variables \(X_{t_1}\) and \(X_{t_2}\) are uncorrelated normal random variables with variance \(\sigma^2\).
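
In *discrete* time this definition is unproblematic: a sequence of i.i.d. \(\mathcal{N}(0,\sigma^2)\) draws is exactly discrete white noise. A minimal NumPy sketch (variable names are my own) checks that the sample autocovariance is \(\sigma^2\) at lag \(0\) and near \(0\) elsewhere:

```python
import numpy as np

# Discrete-time analogue of white noise: n i.i.d. draws from N(0, sigma^2).
rng = np.random.default_rng(0)
sigma = 1.0
n = 100_000
x = rng.normal(0.0, sigma, size=n)

def sample_autocov(x, lag):
    # Average of x_t * x_{t+lag}; at lag 0 this estimates the variance.
    return np.mean(x[: len(x) - lag] * x[lag:]) if lag > 0 else np.mean(x * x)

print(sample_autocov(x, 0))   # close to sigma^2 = 1
print(sample_autocov(x, 1))   # close to 0
print(sample_autocov(x, 10))  # close to 0
```

The trouble described below only appears when we try to index such a family by a continuum of times.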

However, such a collection cannot exist! To see this, let’s define the collection of random variables \(Y_t = X_t \cdot 1_{|X_t| \leq 1}\). Each \(Y_t\) is square integrable, and therefore an element of \(L^2([0,1], \mu)\). Moreover, the \(Y_t\) are mutually orthogonal elements of nonzero norm: uncorrelated jointly normal random variables are independent, so for \(s \neq t\) we get \(E[Y_s Y_t] = E[Y_s]E[Y_t] = 0\) (the symmetric truncation preserves mean zero), while \(E[Y_t^2] > 0\). However, \(L^2([0,1], \mu)\) is separable, and can therefore contain only countably many mutually orthogonal nonzero elements. This contradiction implies that not all \(X_t\) can be mutually uncorrelated.

Working around the Problem

To resolve this, we need to use some pretty beefy mathematical machinery. Basically, while we can’t define continuous-time white noise to be a random variable valued function over \(t\), we can define it as a random variable valued generalized function.

To start, let’s define the Brownian Motion Process \(\mathcal{B}\) to be a stochastic process that satisfies:

  • \(\mathcal{B}_0 = 0\)
  • If \(0 = t_0 < t_1 < t_2 < ... < t_n\) then the random variables \(\mathcal{B}_{t_k} - \mathcal{B}_{t_{k-1}}\) for \(k=1,2,...,n\) are independent.
  • For each \(t \geq 0\) and \(\tau > 0\) the random variable \(\mathcal{B}_{t+\tau} - \mathcal{B}_t\) has distribution \(\mathcal{N}(0, \tau)\).
  • For almost all \(\omega \in [0,1]\), the function \(\mathcal{B}_t(\omega)\) is everywhere continuous in \(t\).
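
These conditions translate directly into a simulation: sum independent Gaussian increments with variance \(dt\). A short sketch (the step counts are arbitrary choices of mine) verifies the third property, that \(\mathcal{B}_{t+\tau} - \mathcal{B}_t \sim \mathcal{N}(0, \tau)\):

```python
import numpy as np

# Simulate many independent Brownian paths on [0, 1] by cumulatively
# summing increments B_{t+dt} - B_t ~ N(0, dt), with B_0 = 0.
rng = np.random.default_rng(1)
n_paths, n_steps = 2000, 400
dt = 1.0 / n_steps
inc = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(inc, axis=1)], axis=1)

# Across paths, B_tau - B_0 should have variance tau.
tau = 0.25
lag = int(tau / dt)
diffs = B[:, lag] - B[:, 0]
print(np.var(diffs))  # roughly tau = 0.25
```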

The formal derivative in \(t\) of \(\mathcal{B}\) is the continuous-time white noise process. It isn’t too hard to see why this should be the case: by the conditions above, the increments of Brownian motion over disjoint intervals are independent and normally distributed, and differentiation is, informally, the limit of these increments over ever-shorter intervals. This suggests that we could reasonably hand wave white noise to be the derivative in \(t\) of the Brownian motion process. Of course, things are more complex than this. In fact, for almost every \(\omega \in [0,1]\) the function \(\mathcal{B}_t(\omega)\) is nowhere differentiable in \(t\).
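
We can see the failure of the naive derivative numerically: the difference quotient \((\mathcal{B}_{t+h} - \mathcal{B}_t)/h\) has distribution \(\mathcal{N}(0, 1/h)\), so its variance blows up as \(h \to 0\). A quick sketch (sampling the increment directly from its known distribution):

```python
import numpy as np

# (B_{t+h} - B_t)/h is N(0, h)/h = N(0, 1/h): the variance of the
# difference quotient diverges as h -> 0, so no pointwise limit exists.
rng = np.random.default_rng(2)
n_samples = 100_000
variances = {}
for h in [0.1, 0.01, 0.001]:
    quot = rng.normal(0.0, np.sqrt(h), size=n_samples) / h
    variances[h] = np.var(quot)
    print(h, variances[h])  # variance is roughly 1/h
```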

In order to resolve this, we need to switch from talking about functions to talking about generalized functions. A generalized function is a “linear functional on a space of test functions”. This is a mouthful, but it’s essentially just a linear mapping from a set of smooth functions of compact support (the test functions) into \(\mathbb{R}\). We can think of a generalized function as behaving somewhat like a probability measure over the set of test functions (although a true mathematician might crucify me for saying this…).

We can view any continuous function as a generalized function. For example, if we write the application of the generalized function corresponding to Brownian motion to the test function \(\psi\) as \((\mathcal{B}, \psi)\) then we have:

\[(\mathcal{B}, \psi) = \int_{0}^{\infty} \mathcal{B}(t) \psi(t) dt\]
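
For a sample path simulated on a grid, this pairing is just a Riemann sum. A sketch below uses the standard smooth bump function supported on \((0,1)\) as the test function (so the integral over \([0,\infty)\) reduces to \([0,1]\)); each run produces one sample of the random variable \((\mathcal{B}, \psi)\):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
dt = 1.0 / n
t = np.linspace(0.0, 1.0, n + 1)
B = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), size=n))])

def psi(t):
    # Smooth bump test function supported on (0, 1).
    s = 2.0 * t - 1.0
    out = np.zeros_like(t)
    inside = np.abs(s) < 1
    out[inside] = np.exp(-1.0 / (1.0 - s[inside] ** 2))
    return out

# (B, psi) as a Riemann sum of B(t) psi(t) dt.
pairing = np.sum(B * psi(t)) * dt
print(pairing)
```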

Note that \((\mathcal{B}, \psi)\) is itself a random variable that maps \(\omega \in [0,1]\) to \(\mathbb{R}\). Now we define the derivative of the generalized function \(F\) to be the generalized function \(F'\) such that \((F', f) = -(F, f')\). Therefore, the derivative of the generalized function corresponding to Brownian motion is the following random variable valued generalized function, which we can think of as a more formal definition of continuous-time white noise:

\[(\mathcal{B}', \psi) = -(\mathcal{B}, \psi') = -\int_{0}^{\infty} \mathcal{B}(t) \psi'(t) dt\]
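
On a grid, the right-hand side can be checked path-by-path: by summation by parts (with \(\psi\) vanishing at the endpoints), \(-\sum_i \mathcal{B}(t_i)\,\psi'(t_i)\,dt\) should agree with \(\sum_i \psi(t_i)\,\Delta \mathcal{B}_i\), a discretization of the stochastic integral \(\int \psi \, d\mathcal{B}\), which is exactly the "white noise paired against \(\psi\)" picture. A sketch using the same bump test function:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000
dt = 1.0 / n
t = np.linspace(0.0, 1.0, n + 1)
dB = rng.normal(0.0, np.sqrt(dt), size=n)
B = np.concatenate([[0.0], np.cumsum(dB)])

def psi(t):
    # Smooth bump supported on (0, 1).
    s = 2.0 * t - 1.0
    out = np.zeros_like(t)
    m = np.abs(s) < 1
    out[m] = np.exp(-1.0 / (1.0 - s[m] ** 2))
    return out

def psi_prime(t):
    # Derivative of the bump (chain rule: s = 2t - 1 gives a factor of 2).
    s = 2.0 * t - 1.0
    out = np.zeros_like(t)
    m = np.abs(s) < 1
    out[m] = np.exp(-1.0 / (1.0 - s[m] ** 2)) * (-2.0 * s[m] / (1.0 - s[m] ** 2) ** 2) * 2.0
    return out

# Left side: -(B, psi') as a Riemann sum.
lhs = -np.sum(B * psi_prime(t)) * dt
# Right side: sum of psi(t_i) dB_i, the discretized white-noise pairing.
rhs = np.sum(psi(t[:-1]) * dB)
print(lhs, rhs)  # the two agree up to discretization error
```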