The Dirichlet distribution is a multivariate generalization of the beta distribution. In Bayesian statistics, it is commonly used as the conjugate prior of the multinomial distribution, so it can model the uncertainty of a random vector of probabilities. It has a wide range of applications, including Bayesian analysis, text mining, statistical genetics, and nonparametric inference. This article gives an intuitive introduction to the Dirichlet distribution and shows how it is connected to the multinomial distribution. In addition, it shows how the distribution can be modeled and visualized in Python.
Definition
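Suppose that the continuous random variables X₁, X₂, …, Xₖ (k≥2) form the random vector X defined as:
$$\mathbf{X} = [X_1, X_2, \dots, X_k]^T$$
We also define the vector α as:
$$\boldsymbol{\alpha} = [\alpha_1, \alpha_2, \dots, \alpha_k]^T$$
where
$$\alpha_i > 0, \quad i = 1, \dots, k$$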
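Now the random vector X is said to have a Dirichlet distribution with parameter α if it has the following joint PDF:
$$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{B(\boldsymbol{\alpha})} \prod_{i=1}^{k} x_i^{\alpha_i - 1} \tag{1}$$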
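The function B(α) is called the multivariate beta function and is defined as:
$$B(\boldsymbol{\alpha}) = \frac{\prod_{i=1}^{k} \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}$$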
where Γ(x) is the gamma function. If the random vector X has a Dirichlet distribution with parameter α, this is denoted by X ~ Dir(α). The multivariate beta function appears in the joint PDF as a normalizing constant. The joint PDF should integrate to 1 over its domain:
$$\int f_{\mathbf{X}}(\mathbf{x}) \, d\mathbf{x} = 1$$
Hence, we have:
$$B(\boldsymbol{\alpha}) = \int \prod_{i=1}^{k} x_i^{\alpha_i - 1} \, d\mathbf{x}$$
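The role of B(α) as a normalizing constant can be checked numerically. The snippet below is a minimal sketch, not a reference implementation: the helper names `log_multivariate_beta` and `dirichlet_pdf` are made up for illustration, and the parameter values are arbitrary. It evaluates the joint PDF from the definitions above, compares the result with `scipy.stats.dirichlet.pdf`, and, for the special case k=2 (where x₂ = 1 - x₁ and the Dirichlet reduces to a beta distribution), integrates the unnormalized density to recover B(α).

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln
from scipy.stats import dirichlet


def log_multivariate_beta(alpha):
    """log B(alpha) = sum_i log Gamma(alpha_i) - log Gamma(sum_i alpha_i)."""
    alpha = np.asarray(alpha, dtype=float)
    return gammaln(alpha).sum() - gammaln(alpha.sum())


def dirichlet_pdf(x, alpha):
    """Joint PDF of Dir(alpha) at a point x with x_i > 0 and sum(x) == 1."""
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    # Work in log space so that large alpha values do not overflow.
    log_pdf = np.sum((alpha - 1.0) * np.log(x)) - log_multivariate_beta(alpha)
    return np.exp(log_pdf)


# Arbitrary illustrative values (not taken from the article).
alpha = np.array([2.0, 3.0, 4.0])
x = np.array([0.2, 0.3, 0.5])

manual = dirichlet_pdf(x, alpha)
reference = dirichlet.pdf(x, alpha)
print(manual, reference)

# Special case k = 2: integrate the unnormalized density over x_1 in (0, 1),
# with x_2 = 1 - x_1, and compare against B(alpha).
a = np.array([2.0, 5.0])
integral, _ = quad(lambda t: t ** (a[0] - 1) * (1 - t) ** (a[1] - 1), 0.0, 1.0)
beta_value = np.exp(log_multivariate_beta(a))
print(integral, beta_value)
```

For these values, B(α) = Γ(2)Γ(3)Γ(4)/Γ(9) = 1/3360, so the density at x is 3360 × 0.2 × 0.3² × 0.5³ = 7.56, and both the hand-written function and SciPy should agree on it. In the k=2 check, the integral and B(α) should both equal 1/30.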
Based on Equation 1, the values x₁, x₂, …, xₖ that the random variables X₁, X₂, …, Xₖ take should meet the following conditions to have fₓ(x)>0:
$$0 < x_i < 1 \quad (i = 1, \dots, k), \qquad \sum_{i=1}^{k} x_i = 1 \tag{3}$$
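In words, every component must be positive and the components must sum to 1. As a quick sanity check, the sketch below draws samples with NumPy's `Generator.dirichlet` and confirms that they satisfy these conditions. The parameter vector, seed, and sample size are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Arbitrary illustrative parameter vector with k = 3.
alpha = np.array([2.0, 3.0, 4.0])

# Each row is one draw of the random vector X ~ Dir(alpha).
samples = rng.dirichlet(alpha, size=1000)  # shape (1000, 3)

# Check the conditions: 0 < x_i < 1 and the components sum to 1.
all_positive = bool(np.all(samples > 0))
all_below_one = bool(np.all(samples < 1))
rows_sum_to_one = bool(np.allclose(samples.sum(axis=1), 1.0))

print(samples[:3])
print(all_positive, all_below_one, rows_sum_to_one)
```

Because the components sum to 1, only k-1 of them are free: fixing any two entries of a row here determines the third.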
These conditions define the support of the Dirichlet distribution. The support of X, and of its distribution, is the set of all x (the values that X can take) where fₓ(x)>0. If X has k elements and follows a Dirichlet distribution, its support is a (k-1)-dimensional simplex. This simplex is a bounded linear manifold that results from the constraints of Equation 3. A simplex is the generalization of the notion of a triangle to higher dimensions. Hence, a k-1…