K-NEAREST NEIGHBOR

M. Goldstein, "K-Nearest Neighbor classification," IEEE Trans. Inform. Theory 18 (1972) 627-630.

Nearest Neighbor

The Nearest Neighbor (NN) classification method assigns to a sample X the class Yi of the training sample Xi that is closest to X in a suitable metric, usually the Euclidean or the Mahalanobis distance.
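As a minimal sketch of the rule, assuming the Euclidean metric and NumPy-array inputs (the function name is ours, for illustration):

```python
import numpy as np

def nn_classify(X_train, y_train, x):
    """Assign x the label of the single closest training sample (Euclidean metric)."""
    d = np.linalg.norm(X_train - x, axis=1)  # distances from x to every training sample
    return y_train[np.argmin(d)]             # label of the nearest one
```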

K Nearest Neighbor

The K-NN classification method is used in binary classification tasks and assigns to a sample X the class that appears most often among the k closest training samples; k is assumed to be an odd integer. This is equivalent to assigning X the class whose j-th closest training sample is nearer, where j = (k+1)/2.
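The two formulations can be compared directly in code. The sketch below, assuming Euclidean distances and binary 0/1 labels (function names are ours), implements both the majority vote and the j-th-distance rule; on continuous data, where distance ties have probability zero, they agree:

```python
import numpy as np

def knn_classify(X_train, y_train, x, k):
    """Majority vote among the k nearest training samples (labels 0/1, odd k)."""
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(d)[:k]]     # labels of the k closest samples
    return int(nearest.sum() > k // 2)       # class 1 wins iff it has > k/2 votes

def jth_distance_classify(X_train, y_train, x, k):
    """Equivalent rule: the class whose j-th nearest sample is closer, j = (k+1)/2."""
    j = (k + 1) // 2
    d = np.linalg.norm(X_train - x, axis=1)
    d0 = np.sort(d[y_train == 0])            # sorted distances to class-0 samples
    d1 = np.sort(d[y_train == 1])            # sorted distances to class-1 samples
    return int(d1[j - 1] < d0[j - 1])
```

The equivalence holds because if at least j of the k nearest samples belong to one class, the other class contributes at most j-1 of them, so its j-th nearest sample lies beyond the k-th neighbor.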

We denote by X'i the training samples of the first class and by X"i those of the second class. Given a test sample X, let H'(z) = Prob( d(X',X) < z ) and define H"(z) analogously. Let X'j(X) denote the j-th closest X' to X, and similarly X"j(X). We want to estimate the probability

Prob( d(X"j(X),X) < d(X'j(X),X) )

This event can be decomposed into the disjoint events

Ak = "exactly k of the X" are closer to X than X'j(X)"

for k = j, j+1, ..., n.

The probability of these events is

Prob( Ak ) = Int  n!/(k!(n-k)!)  H"(z)^k (1-H"(z))^(n-k)  Prob( j-1 X' closer to X than z, out of n-1 )  n dH'(z)

where the probability inside the integral is (n-1)!/((j-1)!(n-j)!) H'(z)^(j-1) (1-H'(z))^(n-j).
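The formula can be sanity-checked numerically. In the illustrative case H'(z) = H"(z) = z on [0,1] (both distance distributions uniform, an assumption of ours), the integral reduces to a Beta integral with an exact value, and summing Prob(Ak) over k = j, ..., n must give 1/2 by symmetry of the two classes. A sketch (function names ours):

```python
import numpy as np
from math import comb, factorial

def beta_int(a, b):
    # Exact Beta integral: Int_0^1 z^(a-1) (1-z)^(b-1) dz = (a-1)! (b-1)! / (a+b-1)!
    return factorial(a - 1) * factorial(b - 1) / factorial(a + b - 1)

def prob_Ak(n, j, k):
    # Prob(Ak) in the illustrative case H'(z) = H"(z) = z on [0, 1]:
    # n * C(n,k) * C(n-1,j-1) * Int_0^1 z^k (1-z)^(n-k) z^(j-1) (1-z)^(n-j) dz
    return n * comb(n, k) * comb(n - 1, j - 1) * beta_int(k + j, 2 * n - k - j + 1)

# Summing the disjoint events A_j, ..., A_n recovers
# Prob( d(X"j(X),X) < d(X'j(X),X) ), which is 1/2 by symmetry here.
n, j = 10, 3
total = sum(prob_Ak(n, j, k) for k in range(j, n + 1))
```

A Monte Carlo draw of the order statistics confirms both the individual Prob(Ak) values and the total of 1/2.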

From this formula it is possible to estimate an upper bound to the probability of misclassification for the K-NN. ...



Marco Corvi - Page hosted by geocities.com.