CLASSIFICATION

A.F. Kohn, L.G.M. Nakano, M. Oliveira e Silva, "A class discriminability measure based on feature space partitioning," Pattern Recogn. 29 (1996) 873-887.

A classification problem consists of assigning classes to test items. Suppose we have a population of items that belong to a set of classes,

C1 C2 ... CK
The classification problem is to tell to which class any given item belongs. For example, if the population is the set of letters in a written text, and the classes are the characters of the alphabet used, the classification problem consists of telling which character each letter is.

This task is tackled by measuring some features on the items and trying to discriminate the class by the values of these features. In the letter/character example the features might be the positions and directions of the strokes of the letter.

Let the features be

X1 ... XN
We will denote by xn the value taken by the feature Xn on a given item.

Let the classes have a-priori probabilities

P(1) P(2) ... P(K)
The a-priori error is the classification error obtained by assigning the most likely class to every item, i.e., the classification made with no information but the knowledge of the a-priori probability distribution of the items among the classes:
P(e) = 1 - Maxk P(k)
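As a minimal sketch of this rule, the following Python snippet computes the a-priori error for a hypothetical three-class problem (the prior values are made up for illustration):

```python
# Hypothetical a-priori probabilities for a 3-class problem.
priors = {1: 0.5, 2: 0.3, 3: 0.2}

# With no feature information, the best strategy is to always guess the
# most likely class; the a-priori error is one minus its probability.
apriori_error = 1 - max(priors.values())
print(apriori_error)  # 0.5
```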

Given the knowledge of a feature X, a better guess can be made by maximizing the a-posteriori probabilities:

P(k|x) = P(k) P(x|k) / ∑h P(h) P(x|h)
This is Bayes' formula for the conditional probability. The denominator is simply P(X=x). In this case the error becomes
P(e|X) = ∑x p(x) ( 1 - Maxk P(k|x) )

It can be shown that P(e|X) ≤ P(e) .
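The inequality can be checked numerically. The Python sketch below uses a made-up two-class problem with a single binary feature; all probability values are illustrative assumptions:

```python
# Hypothetical two-class problem with one binary feature x in {0, 1}.
priors = {1: 0.6, 2: 0.4}                 # P(k)
cond = {                                  # P(x|k), assumed conditionals
    1: {0: 0.9, 1: 0.1},
    2: {0: 0.2, 1: 0.8},
}

# Joint P(x, k) = P(k) P(x|k); the marginal p(x) is its sum over k.
joint = {(x, k): priors[k] * cond[k][x] for k in priors for x in (0, 1)}

# A-priori error: always guess the most likely class.
apriori_error = 1 - max(priors.values())

# A-posteriori error: sum_x p(x) (1 - Maxk P(k|x)) = 1 - sum_x Maxk P(x, k).
aposteriori_error = 1 - sum(max(joint[(x, k)] for k in priors) for x in (0, 1))

print(apriori_error, round(aposteriori_error, 3))  # the error drops: 0.4 vs 0.14
```

Here the feature separates the classes well, so the a-posteriori error is much smaller than the a-priori one.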


Indeed we have
P(e) - P(e|X) = ∑x p(x) ( Maxk P(k|x) - Maxk P(k) )
  = ∑x ( Maxk P(x,k) - Maxk p(x) P(k) )
  = ∑x Maxk P(x,k) - Maxk P(k)
  ≥ Maxk ∑x P(x,k) - Maxk P(k) = 0
where the inequality holds because a sum of maxima is at least the maximum of the sums.


A feature X is irrelevant if the a-posteriori probabilities do not depend on the feature value, i.e., P(k|x) = P(k) for every class k and every value x.

It is easy to see that a feature is irrelevant if and only if this feature is identically distributed among the classes:

P(x|1) = P(x|2) = ... = P(x|K)
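This equivalence is easy to verify numerically. In the Python sketch below (with hypothetical numbers), both classes share the same conditional distribution, and the posteriors collapse onto the priors:

```python
# Hypothetical priors; both classes share the SAME conditional P(x|k),
# i.e. the feature is identically distributed among the classes.
priors = {1: 0.6, 2: 0.4}
shared_cond = {0: 0.3, 1: 0.7}            # P(x|1) = P(x|2)

for x, pxk in shared_cond.items():
    px = sum(priors[k] * pxk for k in priors)      # marginal p(x) = pxk
    posteriors = {k: priors[k] * pxk / px for k in priors}
    # The posterior equals the prior for every class: X is irrelevant.
    assert all(abs(posteriors[k] - priors[k]) < 1e-12 for k in priors)
```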

A feature X is unuseful if it does not help to reduce the classification error, i.e., P(e|X) = P(e).

If a feature is irrelevant it is also unuseful. However, a feature can be unuseful without being irrelevant.
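A small numerical example of the second case, with all numbers hypothetical: the conditionals differ between the classes, so the feature is not irrelevant, yet the same class maximizes the posterior for every value of x, so the error does not improve.

```python
# Hypothetical two-class problem; the conditionals DIFFER between classes,
# so the feature X is not irrelevant.
priors = {1: 0.7, 2: 0.3}
cond = {
    1: {0: 0.6, 1: 0.4},
    2: {0: 0.4, 1: 0.6},
}

joint = {(x, k): priors[k] * cond[k][x] for k in priors for x in (0, 1)}
px = {x: sum(joint[(x, k)] for k in priors) for x in (0, 1)}
post = {(x, k): joint[(x, k)] / px[x] for (x, k) in joint}

# The posteriors depend on x, but class 1 is the maximizer for BOTH
# x = 0 and x = 1, so knowing X never changes the guess: X is unuseful.
apriori_error = 1 - max(priors.values())
aposteriori_error = sum(px[x] * (1 - max(post[(x, k)] for k in priors))
                        for x in (0, 1))
print(apriori_error, round(aposteriori_error, 3))  # both equal 0.3
```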




Marco Corvi