At first glance, PCA and Laplacian Eigenmaps seem very similar. We can view both algorithms as constructing a graph from our data, choosing a matrix to represent this graph, computing the eigenvectors of this matrix, and then using these eigenvectors to determine low-dimensional embeddings of our data.
However, the two algorithms can produce very different results, primarily because PCA is a linear dimensionality reduction algorithm that makes few assumptions about the data, whereas Laplacian Eigenmaps is a nonlinear dimensionality reduction algorithm that assumes the data lies on a low-dimensional manifold.
We can explicitly characterize this divergence in terms of three key differences between the algorithms:
- The graph we construct from our data
- The choice of matrix representation of this graph
- The choice of eigenvectors of this matrix
In the following exposition, we will assume we have \(n\) data points and \(d\) features, represented in a data matrix \(X\) with shape \(n \times d\). For simplicity we will assume that our data is already normalized and zero-centered.
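For concreteness, here is a minimal NumPy sketch of this setup, using synthetic data and interpreting "normalized" as unit variance per feature (both the toy data and that interpretation are assumptions of ours):

```python
import numpy as np

# Toy data matrix: n = 100 points, d = 5 features (synthetic, for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

X = X - X.mean(axis=0)  # zero-center each feature
X = X / X.std(axis=0)   # scale each feature to unit variance
```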
PCA
In PCA, we form a fully connected graph such that the edge between each pair of vertices has weight equal to the dot product of the corresponding data vectors. We represent this graph with its adjacency matrix, the \(n \times n\) Gram matrix \(X X^T\). For some \(d' < d\), our objective is then to construct the \(n \times d'\) matrix \(X'\) such that the following quantity is minimized:
\[\sum_{i,j} (X' X'^T - X X^T)_{ij}^2\]

In order to do this, we compute the eigenvectors \(v_1, \dots, v_{d'}\) corresponding to the \(d'\) largest eigenvalues \(\lambda_1, \dots, \lambda_{d'}\) of the \(n \times n\) matrix \(X X^T\). In practice we implement this with the SVD of \(X\) rather than explicitly forming \(X X^T\). We then define \(X'_{ij} = \lambda_j^{1/2} (v_j)_i\). That is, the jth element of the ith data point’s embedding is the product of the square root of the jth largest eigenvalue and the ith element of the corresponding eigenvector.
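As a sketch of this recipe (the function name pca_embed and the variable d_prime are our own), PCA via the SVD might look like this:

```python
import numpy as np

def pca_embed(X, d_prime):
    """Embed the rows of a centered n x d matrix X into d_prime dimensions.

    Assumes d_prime <= min(n, d).
    """
    # The left singular vectors of X are the eigenvectors of X X^T, and the
    # squared singular values are its eigenvalues, so we never form X X^T.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # s is sorted in decreasing order; scaling column j of U by s_j is the
    # same as scaling the eigenvector by the square root of the eigenvalue.
    return U[:, :d_prime] * s[:d_prime]  # shape (n, d_prime)
```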
Because PCA operates over the fully connected graph, all pairwise relationships between data vectors (i.e. elements of \(X X^T\)) are considered equally important, and the optimization objective is framed in terms of reconstructing the exact dot products (and hence, for centered data, the exact distances) between all pairs of points.
Laplacian Eigenmaps
In contrast, in Laplacian Eigenmaps we form a different graph based on a nonlinear transformation of our data, and we represent this graph with its Laplacian matrix.
We begin by choosing a number \(k\) and building a graph such that there is a unit-weight edge connecting the vertices \(v_i\) and \(v_j\) if \(v_i\) is one of the \(k\) nearest neighbors of \(v_j\) or vice-versa. In some implementations of Laplacian Eigenmaps, the weight of this edge instead decays with the distance between the corresponding data vectors, e.g. via the heat kernel \(e^{-\|x_i - x_j\|^2 / t}\).
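A sketch of the unit-weight variant of this construction, using scikit-learn's kneighbors_graph (the helper name knn_adjacency is ours):

```python
from sklearn.neighbors import kneighbors_graph

def knn_adjacency(X, k):
    """Sparse, symmetric, unit-weight k-nearest-neighbor adjacency matrix."""
    # kneighbors_graph is directed: row i marks the k nearest neighbors of i.
    W = kneighbors_graph(X, n_neighbors=k, mode="connectivity")
    # Symmetrize so an edge exists if i is among the k nearest neighbors of j
    # *or* vice-versa.
    return W.maximum(W.T)
```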
For some \(d' < d\), our objective is to construct the \(n \times d'\) matrix \(X'\) such that the following quantity is minimized, subject to a few orthogonality and normalization constraints on the embedding that rule out trivial solutions:
\[\sum_{i,j \ | \ i \sim j} \|x'_i - x'_j\|^2\]

Here \(x'_i\) denotes the ith row of \(X'\), and \(i \sim j\) means there is an edge between \(v_i\) and \(v_j\).
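Why the Laplacian? Writing \(W\) for the graph's (weighted) adjacency matrix, \(D\) for the diagonal degree matrix, and \(L = D - W\) for the Laplacian, a standard identity rewrites this objective as a quadratic form in \(L\):

\[\frac{1}{2} \sum_{i,j} W_{ij} \, \|x'_i - x'_j\|^2 = \operatorname{tr}(X'^T L X')\]

Minimizing a quadratic form like this under the constraints above is exactly an eigenvalue problem.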
In order to do this, we compute the eigenvectors \(v_1, \dots, v_{d'}\) corresponding to the \(d'\) smallest nonzero eigenvalues of the generalized eigenproblem \(Ly = \lambda Dy\). (The smallest eigenvalue is always \(0\), and its constant eigenvector carries no information about the data, so it is discarded.) Note that this is equivalent to computing the eigenvectors of the matrix \(D^{-1}L\). We then define \(X'_{ij} = (v_j)_i\). That is, the jth element of the ith data point’s embedding is the ith element of the eigenvector corresponding to the jth smallest nonzero eigenvalue.
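A sketch of this step with SciPy's dense generalized eigensolver (the helper name laplacian_embed is ours, and for simplicity we assume the graph has no isolated vertices, so that \(D\) is positive definite):

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import eigh

def laplacian_embed(W, d_prime):
    """Embed the n vertices of a graph with adjacency matrix W into d_prime dims."""
    if sp.issparse(W):
        W = W.toarray()
    D = np.diag(W.sum(axis=1))
    L = D - W  # unnormalized graph Laplacian
    # Generalized eigenproblem L y = lambda D y. eigh returns eigenvalues in
    # ascending order; the first is the trivial eigenvalue 0 with a constant
    # eigenvector, which we discard.
    vals, vecs = eigh(L, D)
    return vecs[:, 1 : d_prime + 1]  # shape (n, d_prime)
```

Chaining the two sketches, laplacian_embed(knn_adjacency(X, k), d_prime) would produce the full embedding.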
Unlike PCA, the Laplacian Eigenmaps algorithm does not try to preserve exact pairwise distances, or even relative distances between distant points. It focuses exclusively on placing the embeddings of nearest neighbors as close together as possible.
Summary
Graph
- PCA: Fully connected graph whose edge weights are the dot products between data points
- Laplacian Eigenmaps: Graph where an edge only exists between \(v_i\) and \(v_j\) if \(v_i\) is one of the \(k\) nearest neighbors of \(v_j\) or vice-versa
Matrix
- PCA: The adjacency (Gram) matrix \(X X^T\)
- Laplacian Eigenmaps: The graph Laplacian \(L = D - W\)
Eigenvectors
- PCA: The eigenvectors corresponding to the \(d'\) largest eigenvalues
- Laplacian Eigenmaps: The eigenvectors corresponding to the \(d'\) smallest nonzero eigenvalues of the generalized eigenproblem \(Ly = \lambda Dy\)