K-NEAREST NEIGHBOR

M. Goldstein, "K-Nearest Neighbor classification," IEEE Trans. Inform. Theory 18 (1972) 627-630.

Nearest Neighbor

The Nearest Neighbor (NN) classification method assigns to a sample X the class Yi of the training sample Xi that is closest to X in a suitable metric, usually the Euclidean or the Mahalanobis distance.
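As a minimal sketch of the rule, assuming the Euclidean metric and NumPy-array inputs (the function name is ours, for illustration):

```python
import numpy as np

def nn_classify(X_train, y_train, x):
    """Assign x the label of the single closest training sample (Euclidean metric)."""
    d = np.linalg.norm(X_train - x, axis=1)  # distances from x to every training sample
    return y_train[np.argmin(d)]             # label of the nearest one
```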

K Nearest Neighbor

The K-NN classification method is used in binary classification tasks and assigns to a sample X the class that appears most often among the k closest training samples; k is assumed to be an odd integer. This is equivalent to assigning X the class whose j-th closest training sample is nearer, where j = (k+1)/2.
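The two formulations can be compared directly in code. The sketch below, assuming Euclidean distances and binary 0/1 labels (function names are ours), implements both the majority vote and the j-th-distance rule; on continuous data, where distance ties have probability zero, they agree:

```python
import numpy as np

def knn_classify(X_train, y_train, x, k):
    """Majority vote among the k nearest training samples (labels 0/1, odd k)."""
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(d)[:k]]     # labels of the k closest samples
    return int(nearest.sum() > k // 2)       # class 1 wins iff it has > k/2 votes

def jth_distance_classify(X_train, y_train, x, k):
    """Equivalent rule: the class whose j-th nearest sample is closer, j = (k+1)/2."""
    j = (k + 1) // 2
    d = np.linalg.norm(X_train - x, axis=1)
    d0 = np.sort(d[y_train == 0])            # sorted distances to class-0 samples
    d1 = np.sort(d[y_train == 1])            # sorted distances to class-1 samples
    return int(d1[j - 1] < d0[j - 1])
```

The equivalence holds because if at least j of the k nearest samples belong to one class, the other class contributes at most j-1 of them, so its j-th nearest sample lies beyond the k-th neighbor.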

We denote by X'i the training samples of the first class and by X"i those of the second class. Given a test sample X, let H'(z) = Prob( d(X',X) < z ) and define H"(z) analogously. Let X'j(X) denote the j-th closest X' to X, and similarly X"j(X). We want to estimate the probability

Prob( d(X"j(X),X) < d(X'j(X),X) )

This event can be decomposed into the disjoint events

Ak = "exactly k of the X" are closer to X than X'j(X)"

for k = j, j+1, ..., n.

The probability of these events is

Prob( Ak ) = Int  n!/(k!(n-k)!)  H"(z)^k (1-H"(z))^(n-k)  Prob( j-1 X' closer to X than z, out of n-1 )  n dH'(z)

where the probability inside the integral is (n-1)!/((j-1)!(n-j)!) H'(z)^(j-1) (1-H'(z))^(n-j).
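The formula can be sanity-checked numerically. In the illustrative case H'(z) = H"(z) = z on [0,1] (both distance distributions uniform, an assumption of ours), the integral reduces to a Beta integral with an exact value, and summing Prob(Ak) over k = j, ..., n must give 1/2 by symmetry of the two classes. A sketch (function names ours):

```python
import numpy as np
from math import comb, factorial

def beta_int(a, b):
    # Exact Beta integral: Int_0^1 z^(a-1) (1-z)^(b-1) dz = (a-1)! (b-1)! / (a+b-1)!
    return factorial(a - 1) * factorial(b - 1) / factorial(a + b - 1)

def prob_Ak(n, j, k):
    # Prob(Ak) in the illustrative case H'(z) = H"(z) = z on [0, 1]:
    # n * C(n,k) * C(n-1,j-1) * Int_0^1 z^k (1-z)^(n-k) z^(j-1) (1-z)^(n-j) dz
    return n * comb(n, k) * comb(n - 1, j - 1) * beta_int(k + j, 2 * n - k - j + 1)

# Summing the disjoint events A_j, ..., A_n recovers
# Prob( d(X"j(X),X) < d(X'j(X),X) ), which is 1/2 by symmetry here.
n, j = 10, 3
total = sum(prob_Ak(n, j, k) for k in range(j, n + 1))
```

A Monte Carlo draw of the order statistics confirms both the individual Prob(Ak) values and the total of 1/2.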

From this formula it is possible to estimate an upper bound to the probability of misclassification for the K-NN. ...



Marco Corvi - Page hosted by geocities.com.