

Mathematical background

The exact knowledge of prior probabilities and conditional probabilities is generally hard to obtain, so a parametric approach is often needed. A parametric approach aims to formulate reasonable hypotheses about the data distribution and its fundamental parameters (e.g. mean, variance, …). In the following discussion we focus only on normal distributions for mathematical convenience, but the results can be easily generalized.

Given the multi-dimensional form of the Gaussian distribution:

$$ p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right] $$

where $$\mathbf{x}$$ is a $$d$$-dimensional column vector, $$\boldsymbol{\mu}$$ is the mean vector of the distribution, $$\Sigma$$ is the covariance matrix ($$d \times d$$), and $$|\Sigma|$$ and $$\Sigma^{-1}$$ are its determinant and inverse, respectively.

The exponent ($$(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})$$) is called the Mahalanobis distance of the vector $$\mathbf{x}$$ from its mean. This distance reduces to the Euclidean one when the covariance matrix is the identity matrix ($$\Sigma = I$$).
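
As a concrete illustration, both the density above and the (squared) Mahalanobis distance can be evaluated numerically. The following is a minimal NumPy sketch; the function names (`mahalanobis2`, `gaussian_pdf`) and the test values are illustrative only:

```python
import numpy as np

def mahalanobis2(x, mu, cov):
    """Squared Mahalanobis distance (x - mu)^T Sigma^-1 (x - mu)."""
    diff = x - mu
    return float(diff @ np.linalg.inv(cov) @ diff)

def gaussian_pdf(x, mu, cov):
    """Multivariate Gaussian density evaluated at x."""
    d = mu.size
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return norm * np.exp(-0.5 * mahalanobis2(x, mu, cov))

# With Sigma = I the Mahalanobis distance equals the squared Euclidean one
mu  = np.array([0.0, 0.0])
cov = np.eye(2)
x   = np.array([3.0, 4.0])
print(mahalanobis2(x, mu, cov))   # 25.0 = ||x - mu||^2
print(gaussian_pdf(x, mu, cov))
```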

The covariance matrix is by definition symmetric and positive semi-definite (a useful property for the algorithmic strategies discussed later); assuming it is strictly positive definite, it is also invertible. If the covariance matrix is diagonal, the multi-dimensional distribution can be expressed as the simple product of mono-dimensional normal distributions; in this case the main axes of the iso-density ellipsoids are parallel to the Cartesian axes.
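
The factorization for a diagonal covariance matrix can be checked numerically. The snippet below is a small illustrative verification with SciPy; the mean vector, standard deviations and test point are arbitrary values chosen for the example:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# With a diagonal covariance matrix, the multivariate density factorizes
# into the product of one-dimensional normal densities.
mu    = np.array([1.0, -2.0, 0.5])
sigma = np.array([0.5, 2.0, 1.5])          # standard deviations on the diagonal
cov   = np.diag(sigma ** 2)
x     = np.array([0.8, -1.0, 1.0])

joint   = multivariate_normal(mean=mu, cov=cov).pdf(x)
product = np.prod(norm.pdf(x, loc=mu, scale=sigma))
print(np.isclose(joint, product))          # True
```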

Starting from a multi-variate Gaussian distribution¹, the Bayesian rule for classification problems can be rewritten as:

$$ P(w_i | \mathbf{x}) = \frac{p(\mathbf{x} | w_i) \cdot P(w_i)}{p(\mathbf{x})} = \frac{P(w_i)}{(2\pi)^{d/2} |\Sigma_i|^{1/2} \, p(\mathbf{x})} \exp\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \right] $$

where, removing the constant terms (the $$(2\pi)^{d/2}$$ factor and the absolute probability density $$p(\mathbf{x}) = \sum_{i=1}^s p(\mathbf{x} | w_i)\cdot P(w_i)$$, which does not depend on the class) and using the monotonicity of the logarithm, we can extract the logarithmic relation:

$$ g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) - \frac{1}{2} \ln |\Sigma_i| + \ln P(w_i) $$

which is called the Quadratic Discriminant function.
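
A minimal sketch of how this discriminant could be used for classification is given below, assuming the class parameters are already known; the toy means, covariances and priors are hypothetical values chosen for the example, not results from the thesis:

```python
import numpy as np

def quadratic_discriminant(x, mu, cov, prior):
    """g_i(x) = -1/2 (x-mu)^T Sigma^-1 (x-mu) - 1/2 ln|Sigma| + ln P(w_i)."""
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(cov) @ diff
            - 0.5 * np.log(np.linalg.det(cov))
            + np.log(prior))

def classify(x, means, covs, priors):
    """Assign x to the class with the largest discriminant value."""
    scores = [quadratic_discriminant(x, m, c, p)
              for m, c, p in zip(means, covs, priors)]
    return int(np.argmax(scores))

# Toy two-class example with hypothetical parameters
means  = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs   = [np.eye(2), np.array([[2.0, 0.3], [0.3, 1.0]])]
priors = [0.6, 0.4]
print(classify(np.array([2.5, 2.0]), means, covs, priors))   # -> 1
```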

The dependence on the covariance matrix gives rise to 5 different cases:

The Gaussian distribution hypothesis on the data should be tested before using these classifiers. It can be evaluated using statistical tests such as the Malkovich-Afifi test, which is based on the Kolmogorov-Smirnov index, or by empirical visualization of the data points.
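
As an example of such a check, the snippet below applies a Kolmogorov-Smirnov test to each standardized feature with SciPy; it is only a rough componentwise screening, not the full Malkovich-Afifi procedure, and the generated data are purely synthetic:

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0.0, 1.0], cov=[[1.0, 0.4], [0.4, 2.0]], size=500)

# Componentwise check: compare each standardized feature against N(0, 1).
for j in range(X.shape[1]):
    z = (X[:, j] - X[:, j].mean()) / X[:, j].std(ddof=1)
    stat, p_value = kstest(z, 'norm')
    print(f"feature {j}: KS statistic = {stat:.3f}, p-value = {p_value:.3f}")
```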


  1. In Machine Learning this corresponds to the class-conditional probability density.