INFORMATION THEORY

We consider a random variable X over a (discrete) space X. The probability distribution of X, p(X), can be viewed as a finite measure over X. Indeed

∑_x p(x) = 1

The entropy (or information measure) of a subset A of X is

H(A) = - ∑_{x ∈ A} p(x) ln p(x)
In particular H(X), obtained by taking A to be the whole space, is the entropy of the distribution of the random variable X.
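
A minimal sketch in Python of this definition (not part of the original notes; the helper name entropy is an illustrative choice):

import math

def entropy(p):
    # Entropy, in nats, of an iterable of probabilities summing to 1;
    # zero-probability outcomes contribute nothing.
    return -sum(px * math.log(px) for px in p if px > 0.0)

print(entropy([0.5, 0.5]))   # fair coin: ln 2 ~ 0.693
print(entropy([0.9, 0.1]))   # biased coin: ~0.325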

For two random variables X and Y we can construct the joint entropy H(X,Y), the conditional entropy H(X|Y) = E_Y[ H(X|Y=y) ], and the information measure (mutual information) I(X;Y), where

H(X|Y=y) = - ∑_x p(x|y) ln p(x|y)
I(X;Y) = H(X) - E_Y[ H(X|Y=y) ]
       = H(X) + H(Y) - H(X,Y)

Notice that the information measure is symmetric in X and Y. Furthermore, if X and Y are independent, so that p(x|y) = p(x) and p(x,y) = p(x)p(y), the conditional entropy of X given Y is equal to the entropy of X and the information measure I(X;Y) = 0. Also, for independent variables the joint entropy is the sum of the entropies of the two variables.
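
A sketch of these two-variable quantities, assuming the joint distribution is given as a Python dict {(x, y): p(x, y)} (the helper names are illustrative, not from the original notes):

import math
from collections import defaultdict

def H(p):
    # Entropy, in nats, of a dict mapping outcomes to probabilities.
    return -sum(v * math.log(v) for v in p.values() if v > 0.0)

def marginal(pxy, axis):
    # Marginal of the joint distribution along coordinate 0 (X) or 1 (Y).
    m = defaultdict(float)
    for k, v in pxy.items():
        m[k[axis]] += v
    return dict(m)

def conditional_entropy(pxy):
    # H(X|Y) = H(X,Y) - H(Y)
    return H(pxy) - H(marginal(pxy, 1))

def information(pxy):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return H(marginal(pxy, 0)) + H(marginal(pxy, 1)) - H(pxy)

# Dependent pair: Y is a noisy copy of X, so I(X;Y) > 0.
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print(information(pxy), conditional_entropy(pxy))

# Independent pair: p(x,y) = p(x) p(y), so I(X;Y) = 0 (up to rounding)
# and H(X,Y) = H(X) + H(Y).
p_ind = {(x, y): px * py for x, px in ((0, 0.5), (1, 0.5))
                         for y, py in ((0, 0.3), (1, 0.7))}
print(information(p_ind))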




The entropy and information measure are amenable to a measure-theoretic point of view. To any random variable we can associate a set. Next we construct the sigma-algebra generated by these sets (i.e., we also consider all possible unions, intersections, and complements). For two random variables this is rather small, with eight elements. For N random variables it grows considerably.
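
To get a feel for how fast this grows, here is a small sketch of mine (not from the original notes), under the convention that the universal set is the union of the associated sets, as in Yeung's construction: the field generated by N sets then has 2^N - 1 atoms and therefore 2^(2^N - 1) elements.

from itertools import product

def count_atoms(n):
    # Atoms are the intersections of each set or its complement;
    # the all-complements cell is empty when the universal set is the union.
    return sum(1 for signs in product((True, False), repeat=n) if any(signs))

for n in (2, 3, 4, 5):
    atoms = count_atoms(n)
    print(n, atoms, 2 ** atoms)   # N, number of atoms, size of the field
# prints: 2 3 8 / 3 7 128 / 4 15 32768 / 5 31 2147483648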

Finally we define a measure over this sigma-algebra, by

m(X) = H(X)
m(X ∪ Y) = H(X,Y)
m(X ∩ Y) = I(X;Y)
m(X - Y) = m(X ∩ Y^c) = H(X|Y)
This point of view allows a graphical representation, via Venn diagrams, of many important information-theoretic relations. For example, the conditional information measure corresponds to m(X ∩ Y ∩ Z^c) and can be expressed as
I(X;Y|Z) = H(X|Z) - H(X|Y,Z)
         = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
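
As a numerical sanity check (a sketch of mine, not part of the original notes), the identity can be verified on a toy joint distribution p(x, y, z), comparing the joint-entropy expression with the expectation that defines I(X;Y|Z) directly:

import math
import itertools
from collections import defaultdict

def H(p):
    # Entropy, in nats, of a dict mapping outcomes to probabilities.
    return -sum(v * math.log(v) for v in p.values() if v > 0.0)

def marginal(p, axes):
    # Marginal distribution over the given coordinate positions.
    m = defaultdict(float)
    for k, v in p.items():
        m[tuple(k[a] for a in axes)] += v
    return dict(m)

# Toy chain: Z is a fair bit, Y a noisy copy of Z, X a noisy copy of Y.
p = {}
for x, y, z in itertools.product((0, 1), repeat=3):
    p[(x, y, z)] = 0.5 * (0.9 if y == z else 0.1) * (0.8 if x == y else 0.2)

# Joint-entropy form: H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z).
rhs = H(marginal(p, (0, 2))) + H(marginal(p, (1, 2))) - H(p) - H(marginal(p, (2,)))

# Direct form: I(X;Y|Z) = sum p(x,y,z) ln[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ].
pz, pxz, pyz = marginal(p, (2,)), marginal(p, (0, 2)), marginal(p, (1, 2))
lhs = sum(v * math.log(v * pz[(z,)] / (pxz[(x, z)] * pyz[(y, z)]))
          for (x, y, z), v in p.items() if v > 0.0)

print(lhs, rhs)   # the two agree up to floating-point error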



C.E. Shannon, "A mathematical theory of communication", Bell Syst. Tech. J. 27, 1948, 379-423.
R.W. Yeung, "A new outlook on Shannon's information measures", IEEE Trans. Inform. Theory 37, 1991, 466-474.


Marco Corvi