Stats: Dirichlet process

In probability theory, the Dirichlet process (after Peter Gustav Lejeune Dirichlet) is a family of stochastic processes whose realizations are probability distributions. In other words, a Dirichlet process is a probability distribution over probability distributions: each draw from it is itself a probability distribution. It is often used in Bayesian inference to describe the prior knowledge about the distribution of random variables, that is, how likely it is that the random variables are distributed according to one or another particular distribution.

The Dirichlet process is specified by a base distribution and a positive real number (alpha) called the concentration parameter. The base distribution (H) is the expected value of the process, that is, the Dirichlet process draws distributions “around” the base distribution in the way that a normal distribution draws real numbers around its mean. However, even if the base distribution is continuous, the distributions drawn from the Dirichlet process are almost surely discrete. The concentration parameter specifies how strong this discretization is: in the limit of alpha → 0, the realizations are all concentrated on a single value, while in the limit of alpha → infinity the realizations approach the base distribution and so become continuous when H is. Between the two extremes the realizations are discrete distributions with less and less concentration as alpha increases.
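The effect of the concentration parameter can be illustrated with a small simulation. The sketch below uses the stick-breaking construction, a standard way of drawing from a Dirichlet process that is not described in the text above, truncated to a finite number of atoms, so it only approximates a true draw. The function names, the truncation level and the standard normal base distribution are assumptions chosen for illustration.

```python
import random


def stick_breaking_weights(alpha, num_atoms, rng):
    """Return the first `num_atoms` stick-breaking weights for concentration `alpha`.

    Each step breaks off a Beta(1, alpha) fraction of the stick that remains.
    Truncating at `num_atoms` is an approximation: the weights sum to slightly
    less than 1.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to any atom
    for _ in range(num_atoms):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights


def sample_truncated_dp(alpha, base_sampler, num_atoms=1000, seed=None):
    """Draw one (truncated) realization: atom locations from H and their weights."""
    rng = random.Random(seed)
    weights = stick_breaking_weights(alpha, num_atoms, rng)
    atoms = [base_sampler(rng) for _ in range(num_atoms)]
    return atoms, weights


def standard_normal(rng):
    """Hypothetical base distribution H: a standard normal."""
    return rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    # Small alpha: almost all mass on one atom. Large alpha: mass spread thinly.
    for alpha in (0.1, 1.0, 100.0):
        atoms, weights = sample_truncated_dp(alpha, standard_normal, seed=0)
        print(f"alpha={alpha}: largest weight = {max(weights):.3f}")
```

Each realization is a discrete distribution placing weight `weights[k]` on the point `atoms[k]`. For small alpha the largest weight is typically close to 1, while for large alpha the mass is spread over many atoms drawn from H, matching the two limits described above.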

The Dirichlet process can also be seen as the infinite-dimensional generalization of the Dirichlet distribution. In the same way as the Dirichlet distribution is the conjugate prior for the categorical distribution, the Dirichlet process is the conjugate prior for infinite, nonparametric discrete distributions.
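The parallel between the two conjugate updates can be written out explicitly. The notation here is an assumption and does not come from the text above: DP(alpha, H) denotes the Dirichlet process with concentration parameter alpha and base distribution H, n_k is the number of observations in category k, and delta_x is a point mass at x.

```latex
% Finite case: Dirichlet prior, categorical observations with counts n_1, ..., n_K
p \sim \operatorname{Dir}(a_1, \dots, a_K)
\;\Longrightarrow\;
p \mid x_1, \dots, x_n \sim \operatorname{Dir}(a_1 + n_1, \dots, a_K + n_K)

% Infinite-dimensional case: Dirichlet process prior, observations drawn i.i.d. from G
G \sim \operatorname{DP}(\alpha, H), \quad x_1, \dots, x_n \mid G \overset{\text{iid}}{\sim} G
\;\Longrightarrow\;
G \mid x_1, \dots, x_n \sim
\operatorname{DP}\!\left(\alpha + n,\; \frac{\alpha H + \sum_{i=1}^{n} \delta_{x_i}}{\alpha + n}\right)
```

In both cases the posterior stays in the same family as the prior: the observations are simply added to the prior parameters, and the updated base distribution is a weighted mixture of H and the empirical distribution of the data.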

The Dirichlet process was formally introduced by Thomas Ferguson in 1973[1] and has since been applied in data mining and machine learning, including natural language processing, computer vision and bioinformatics.