PRINCIPAL COMPONENT ANALYSIS

Principal component analysis (PCA) is an algebraic method to obtain a representation of the data that emphasizes the differences among the samples (i.e., their scatter). PCA is good for reconstruction: it provides a compact representation of the most important features of the samples. It is not good for discrimination: if the samples belong to two classes, PCA does not necessarily emphasize the differences between the classes. The PCA representation is decorrelated: it is a projection onto orthogonal axes, and along these axes the components of the data are uncorrelated.

Let X1, ..., Xn be the n samples, each a vector of size m; i.e., the components of the sample vectors are Xi,a, where i = 1..n is the sample index and a = 1..m is the component index. The mean of the samples is

XM,a = (1/n) ∑i=1..n Xi,a

and the reduced samples are denoted Yi,a = Xi,a - XM,a. The covariance of the samples is the mxm matrix (up to a factor 1/n)

Ca,b = ∑i Yi,a Yi,b

The principal components are the eigenvectors of Ca,b,

∑b Ca,b Uk,b = Lk Uk,a

where Lk denote the eigenvalues, k = 1..m. The covariance matrix is positive semidefinite, therefore its eigenvalues are non-negative. However its rank is at most n, and therefore it is singular when n < m, i.e., when the number of samples is smaller than the dimension of the vector space. In this case we can find the (at most) n eigenvectors with non-zero eigenvalues by considering the nxn matrix

Di,j = ∑a Yi,a Yj,a

Given the eigenvectors and eigenvalues of Di,j,

∑j Di,j Vk,j = Lk Vk,i

where k=1..n, the vectors

Uk,a = ∑i Yi,a Vk,i

are the first n eigenvectors of Ca,b, with eigenvalues Lk.
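
The procedure is straightforward to put in code. Below is a minimal sketch in Python (assuming only NumPy; the function name pca_small_sample and the tolerance used to discard zero eigenvalues are illustrative choices, not part of the method) that builds the Gram matrix Di,j when n < m and recovers the eigenvectors Uk,a of the covariance as described above.

    import numpy as np

    def pca_small_sample(X):
        """X: (n, m) array of n samples of size m, with n < m.
        Returns the eigenvalues L (descending) and the eigenvectors U of the
        covariance C, one eigenvector per row of U."""
        n, m = X.shape
        Y = X - X.mean(axis=0)                 # reduced samples Y_{i,a} = X_{i,a} - XM_a
        D = Y @ Y.T                            # n x n matrix D_{i,j} = sum_a Y_{i,a} Y_{j,a}
        L, V = np.linalg.eigh(D)               # eigenpairs of D, in ascending order
        keep = L > 1e-10 * L.max()             # discard (numerically) zero eigenvalues
        L, V = L[keep][::-1], V[:, keep][:, ::-1]
        U = V.T @ Y                            # U_{k,a} = sum_i V_{k,i} Y_{i,a}
        U /= np.linalg.norm(U, axis=1, keepdims=True)   # normalize the eigenvectors
        return L, U

    # usage: 10 samples in a 50-dimensional space (n < m)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(10, 50))
    L, U = pca_small_sample(X)
    Y = X - X.mean(axis=0)
    C = Y.T @ Y                                # full m x m covariance (up to the 1/n factor)
    print(np.allclose(C @ U[0], L[0] * U[0]))  # True: U[0] is an eigenvector of C

The final check verifies that the vector recovered through the nxn problem is indeed an eigenvector of the full mxm covariance.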


Fisher Linear Discriminant Analysis

Fisher Linear Discriminant Analysis (FDA or LDA) aims to overcome the poor discriminability of PCA, while retaining its capability of dimensionality reduction. The basic idea is to choose a basis that emphasizes the discrimination between the classes.

Given k classes, with nk samples Xk,i each (n samples in total), the intraclass covariance is the mxm matrix

CW = (1/n) ∑k,i Yk,i Yk,iT

where Yk,i = Xk,i - XkM is the i-th sample of class k reduced to the mean XkM of its class. The interclass covariance is (again a matrix of size mxm)

CB = ∑k (nk/n) (XkM - XM) (XkM - XM)T

where XkM is the mean vector of class k and XM is the global mean.
Notice that C = CB + CW, where C is the global covariance of all the samples (with the 1/n factor included).

If CW is non-singular, the optimal basis is the matrix Z with orthonormal columns that maximizes the ratio of the determinant of ZT CB Z to that of ZT CW Z. It can be shown that the columns Zi of the transformation matrix Z solve the generalized eigenvalue problem (i = 1, ..., m)

CB Zi = Li CW Zi

The eigenvectors Zi with the largest eigenvalues Li form the most discriminant basis (Fisher basis). For k classes at most k - 1 of the eigenvalues are non-zero.
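
As a concrete illustration, here is a minimal sketch of the FDA computation in Python (assuming NumPy plus scipy.linalg.eigh for the generalized symmetric eigenproblem; the function name fisher_basis and the label array y are illustrative, not part of the method). It builds CW and CB exactly as defined above, checks the decomposition C = CB + CW, and solves CB Zi = Li CW Zi.

    import numpy as np
    from scipy.linalg import eigh

    def fisher_basis(X, y):
        """X: (n, m) samples, y: (n,) integer class labels.
        Returns the generalized eigenvalues (descending) and the Fisher
        basis, one direction per column of Z."""
        n, m = X.shape
        XM = X.mean(axis=0)                    # global mean
        CW = np.zeros((m, m))                  # intraclass covariance
        CB = np.zeros((m, m))                  # interclass covariance
        for c in np.unique(y):
            Xc = X[y == c]
            nc, XcM = len(Xc), Xc.mean(axis=0)
            Yc = Xc - XcM                      # samples of class c reduced to the class mean
            CW += (Yc.T @ Yc) / n
            CB += (nc / n) * np.outer(XcM - XM, XcM - XM)
        C = (X - XM).T @ (X - XM) / n          # global covariance (1/n factor included)
        assert np.allclose(C, CB + CW)         # the decomposition noted above
        L, Z = eigh(CB, CW)                    # generalized problem CB z = L CW z (CW non-singular)
        order = np.argsort(L)[::-1]            # most discriminant directions first
        return L[order], Z[:, order]

Only the first k - 1 columns of Z carry discriminant information, since the remaining generalized eigenvalues vanish.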

The rank of CW is at most n - k. When the intraclass covariance is singular, for instance because the number of samples is smaller than their size, one can first perform a dimensionality reduction with PCA, reducing the m-dimensional vectors to vectors of size n - k, and then apply the FDA to them. The first reduction retains most of the reconstruction properties of PCA; the FDA then chooses a new, more discriminant basis in the reduced representation.
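
A sketch of this two-step reduction, reusing the pca_small_sample and fisher_basis functions from the sketches above (again with illustrative names), might look as follows: it projects onto the leading n - k principal components, computes the Fisher basis in the reduced space, and maps it back to a single m x (k - 1) projection.

    import numpy as np

    def pca_then_fda(X, y):
        """Two-step reduction for singular CW (e.g. n < m): PCA to n - k
        dimensions, then FDA in the reduced space.
        Returns an m x (k - 1) projection matrix."""
        n, m = X.shape
        k = len(np.unique(y))
        XM = X.mean(axis=0)
        _, U = pca_small_sample(X)             # PCA basis, eigenvectors as rows
        P = U[: n - k].T                       # keep the leading n - k components (m x (n - k))
        Xr = (X - XM) @ P                      # reduced representation, n x (n - k)
        _, Z = fisher_basis(Xr, y)             # FDA in the reduced space (CW non-singular there)
        return P @ Z[:, : k - 1]               # Fisher basis expressed in the original m-dim space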


Linear Feature Analysis

Linear Feature Analysis (LFA) is yet another technique for extracting a small set of significant features; it has both good reconstruction properties and small dimensionality. Like all reconstruction techniques it trades off reconstruction against generalization, and therefore it has limited discrimination capabilities.

...



Marco Corvi