Probabilistic View of Principal Component Analysis | by Saptashwa Bhattacharyya | Jul, 2023

One of the most widely used dimensionality reduction techniques in data science and machine learning is Principal Component Analysis (PCA). We have previously discussed a few examples of applying PCA in a pipeline with a Support Vector Machine, and here we will take a probabilistic perspective on PCA to build a more robust and comprehensive understanding of the underlying data structure. One of the biggest advantages of Probabilistic PCA (PPCA) is that it can handle missing values in a dataset, which is not possible with classical PCA. Since we will discuss latent variable models and the Expectation-Maximization algorithm, you can also check this detailed post.

What can you expect to learn from this post?

  1. Short Intro to PCA.
  2. Mathematical building blocks for PPCA.
  3. Expectation-Maximization (EM) algorithm or Variational Inference: which should we use for parameter estimation?
  4. Implementing PPCA with TensorFlow Probability for a toy dataset.

Let’s dive into this!

1. Singular Value Decomposition (SVD) and PCA:

One of the most important concepts in Linear Algebra is the Singular Value Decomposition (SVD), a factorization technique for real or complex matrices. A matrix (say A) can be factorized as:

A = U Σ Vᵀ

where U and V are orthogonal matrices (the transpose equals the inverse) and Σ is a diagonal matrix. A need not be a square matrix: say it is an N×D matrix, so we can already think of it as our data matrix with N instances and D features. U and V are then square matrices of sizes N×N and D×D respectively, and Σ is an N×D matrix whose top D×D block is diagonal, with all remaining entries zero.
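As a quick sanity check on these shapes, here is a minimal sketch with NumPy (the toy matrix and its dimensions are illustrative, not from the post):

```python
import numpy as np

# Toy "data matrix": N = 5 instances, D = 3 features
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))

# Full SVD: U is N x N, Vt is D x D, s holds the D singular values
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Build the N x D Sigma: diagonal D x D block on top, zeros below
Sigma = np.zeros(A.shape)
Sigma[:3, :3] = np.diag(s)

# The factorization recovers A exactly (up to floating-point error)
print(np.allclose(A, U @ Sigma @ Vt))   # True
print(U.shape, Sigma.shape, Vt.shape)   # (5, 5) (5, 3) (3, 3)
```

Note that `full_matrices=True` gives the square U and V described above; the "thin" SVD (`full_matrices=False`) would instead return a 5×3 U and drop the zero rows of Σ.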

We also know the eigenvalue decomposition: a square matrix (B) that is diagonalizable can be factorized as:

B = Q Λ Q⁻¹

where Q is the square N×N matrix whose ith column is the eigenvector q_i of B, and Λ is the diagonal matrix whose diagonal elements are the corresponding eigenvalues.
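The same check works for the eigenvalue decomposition; a small sketch, using a symmetric matrix (chosen here because symmetric matrices are always diagonalizable):

```python
import numpy as np

# A small symmetric (hence diagonalizable) matrix
B = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# Columns of Q are the eigenvectors q_i; lam holds the eigenvalues
lam, Q = np.linalg.eig(B)
Lam = np.diag(lam)

# Verify B = Q Lambda Q^{-1}
print(np.allclose(B, Q @ Lam @ np.linalg.inv(Q)))  # True
```

For a symmetric B the eigenvectors are orthogonal, so Q⁻¹ = Qᵀ and the decomposition lines up with the SVD above, which is exactly the bridge between eigendecomposition and PCA.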
