Principal Component Analysis (PCA) is one of the most widely used dimension-reduction techniques in data science and machine learning. We have previously discussed a few examples of applying PCA in a pipeline with a Support Vector Machine, and here we take a probabilistic perspective on PCA to build a more robust and comprehensive understanding of the underlying data structure. *One of the biggest advantages of Probabilistic PCA (PPCA) is that it can handle missing values in a dataset, which classical PCA cannot.* Since we will discuss the Latent Variable Model and the Expectation-Maximization algorithm, you can also check this detailed post.

What can you expect to learn from this post?

- Short Intro to PCA.
- Mathematical building blocks for PPCA.
- Expectation-Maximization (EM) algorithm or Variational Inference: which should we use for parameter estimation?
- Implementing PPCA with TensorFlow Probability for a toy dataset.

Let’s dive into this!

## 1. Singular Value Decomposition (SVD) and PCA:

SVD is one of the most important concepts in linear algebra. It is a factorization technique for real or complex matrices, where a matrix (say *A*) can be factorized as:

*A* = *U* Σ *Vᵀ*

where *U*, *Vᵀ* are orthogonal matrices (the transpose equals the inverse) and Σ is a rectangular diagonal matrix. *A* need not be a square matrix; say it is an *N*×*D* matrix, so we can already think of it as our data matrix with *N* instances and *D* features. *U*, *V* are square matrices of size *N*×*N* and *D*×*D* respectively, and Σ is then an *N*×*D* matrix whose *D*×*D* upper block is diagonal and whose remaining entries are zero.
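
To make this concrete, here is a minimal NumPy sketch (not part of the original post) that factorizes a small random matrix with these shapes and verifies that *U* Σ *Vᵀ* reconstructs it; the matrix values and sizes are purely illustrative.

```python
import numpy as np

# Toy "data" matrix with N = 5 instances and D = 3 features (illustrative values).
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))

# full_matrices=True gives U as (N, N) and Vt as (D, D), matching the shapes above.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the N x D Sigma: the upper D x D block is diagonal, the rest is zero.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

# U @ Sigma @ Vt should reconstruct A up to floating-point error.
print(np.allclose(A, U @ Sigma @ Vt))  # True
```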

We also know eigenvalue decomposition: a square matrix (*B*) that is diagonalizable can be factorized as:

*B* = *Q* Λ *Q*⁻¹

where *Q* is the square *N*×*N* matrix whose *i*th column is the eigenvector *q_i* of *B*, and Λ is the diagonal matrix whose diagonal elements are the corresponding eigenvalues.
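
As a quick illustration (again, not from the original post), the same check can be done with NumPy's eigendecomposition; the symmetric matrix below is just an example, chosen because a symmetric matrix is guaranteed to be diagonalizable.

```python
import numpy as np

# A small symmetric (hence diagonalizable) matrix, purely for illustration.
B = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# Columns of Q are the eigenvectors q_i; Lambda holds the matching eigenvalues.
eigenvalues, Q = np.linalg.eigh(B)
Lambda = np.diag(eigenvalues)

# Verify B = Q Lambda Q^{-1}; for a symmetric B, Q is orthogonal so Q^{-1} = Q^T.
print(np.allclose(B, Q @ Lambda @ np.linalg.inv(Q)))  # True
```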