A.F. Kohn, L.G.M. Nakano, M. Oliveira e Silva, "A class discriminability measure based on feature space partitioning", Pattern Recogn. 29 (1996) 873-887.
A classification problem consists of assigning classes to test items. Suppose we have a population of items, each belonging to one of a set of classes,
This task is tackled by measuring some features on the items and trying to discriminate the class by the values of these features. In the letter/character example the features might be the positions and directions of the strokes of the letter.
Let the features be
Let the classes have a-priori probabilities
Given knowledge of a feature X, a better guess can be made by maximizing the a-posteriori probabilities:
It can be shown that P(e|X) ≤ P(e):
$$
\begin{aligned}
P(e) - P(e|X) &= \sum_x p(x)\,\bigl(\max_k P(k|x) - \max_k P(k)\bigr) \\
&= \sum_x \bigl(\max_k P(x,k) - \max_k p(x)P(k)\bigr) \\
&\ge \max_k \sum_x P(x,k) - \max_k P(k)
\end{aligned}
$$
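The last expression in the derivation is zero, since ∑x P(x,k) = P(k), which is what gives P(e|X) ≤ P(e). As a numerical illustration, the following minimal sketch computes both error probabilities from a joint table P(x,k) for a discrete feature. The function names and the example table are illustrative assumptions, not taken from the source; the sketch relies only on the fact that maximizing P(k|x) over k is the same as maximizing P(x,k), because p(x) does not depend on k.

```python
from fractions import Fraction as F


def prior_error(joint):
    """P(e): error when always guessing the class with the largest prior P(k)."""
    num_classes = len(joint[0])
    # Marginalize the joint over x to recover the priors P(k).
    priors = [sum(row[k] for row in joint) for k in range(num_classes)]
    return 1 - max(priors)


def posterior_error(joint):
    """P(e|X): error when, for each x, guessing the class that maximizes P(x,k)."""
    return 1 - sum(max(row) for row in joint)


# Hypothetical joint distribution: joint[x][k] = P(x, k).
# Rows are feature values, columns are classes.
joint = [
    [F(4, 10), F(1, 10)],  # x = 0
    [F(1, 10), F(4, 10)],  # x = 1
]

p_e = prior_error(joint)
p_e_given_x = posterior_error(joint)
print(p_e, p_e_given_x, p_e - p_e_given_x)
```

Exact fractions are used here so that comparisons such as P(e|X) = P(e) are not disturbed by floating-point rounding.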
It is easy to see that a feature is irrelevant if and only if this feature is identically distributed among the classes:
A feature X is unuseful if it does not help to reduce the classification error, i.e., P(e|X) = P(e).
If a feature is irrelevant, it is also unuseful. However, a feature can be unuseful without being irrelevant.
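A small constructed example may make the second statement concrete. The sketch below uses hypothetical numbers chosen for illustration: two classes with priors 9/10 and 1/10, and a binary feature whose class-conditional distributions differ. The feature is therefore not irrelevant, yet the dominant class still wins the a-posteriori comparison for every value of x, so the error does not decrease.

```python
from fractions import Fraction as F

# Hypothetical joint distribution: joint[x][k] = P(x, k).
joint = [
    [F(36, 100), F(6, 100)],  # x = 0
    [F(54, 100), F(4, 100)],  # x = 1
]
num_classes = 2

# Priors P(k), obtained by summing the joint over x.
priors = [sum(row[k] for row in joint) for k in range(num_classes)]

# Class-conditional distributions P(x|k) = P(x,k) / P(k), one list per class.
class_conditionals = [
    [row[k] / priors[k] for row in joint] for k in range(num_classes)
]

# Irrelevant: the feature is identically distributed among the classes.
is_irrelevant = all(dist == class_conditionals[0] for dist in class_conditionals)

# Unuseful: knowing X does not reduce the error, P(e|X) = P(e).
p_e = 1 - max(priors)
p_e_given_x = 1 - sum(max(row) for row in joint)
is_unuseful = p_e_given_x == p_e

print(is_irrelevant, is_unuseful)
```

Here P(x|k) is (2/5, 3/5) for the first class and (3/5, 2/5) for the second, while both P(e) and P(e|X) equal 1/10.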