Image Processing and Recognition

HIDDEN MARKOV MODELS

Hidden Markov Models are used to describe systems whose state is not directly observable. Let N be the number of states of the system, and

S₁, S₂, ..., S_N

its states. Let M be the number of distinct observables of the system, denoted

V₁, V₂, ..., V_M

We introduce the probabilities

A(i,j) = P[s(t+1) = S_j | s(t) = S_i]   (state transition)
B(j;k) = P[v(t) = V_k | s(t) = S_j]   (observation in state S_j)
C(j) = P[s(1) = S_j]   (initial state)
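
As an illustration, the three sets of probabilities can be stored as nested lists. The sketch below is only one possible representation: the numbers are a made-up toy model with N = 2 and M = 2, indices are 0-based (state S_1 is index 0), and the helper name is_stochastic is an assumption of this example. Each row of A and B, and the vector C, should be a probability distribution.

```python
import math

# Hypothetical toy model, N = 2 states and M = 2 observables (values are invented).
A = [[0.7, 0.3],   # A[i][j] = P[s(t+1) = S_j | s(t) = S_i]
     [0.4, 0.6]]
B = [[0.9, 0.1],   # B[j][k] = P[v(t) = V_k | s(t) = S_j]
     [0.2, 0.8]]
C = [0.6, 0.4]     # C[j] = P[s(1) = S_j]

def is_stochastic(rows):
    """True if every row is non-negative and sums to 1 (up to rounding)."""
    return all(min(r) >= 0 and math.isclose(sum(r), 1.0) for r in rows)

model_ok = is_stochastic(A) and is_stochastic(B) and is_stochastic([C])
```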

Three problems are addressed in the model:

Given a model L = {A,B,C} and the observations O = O(1), ..., O(T) at times 1,2,...,T, how can we compute P[O | L]?
Given the model L and the observations O, how do we choose the best state sequence s(1), ..., s(T)?
How do we adjust the model L to maximize P[O | L]?

First problem

Let S = {s(1), ..., s(T)} be a sequence of states. Then

P[O | S, L] = Prod_t P[O(t) | s(t), L] = Prod_t B(s(t);O(t))

P[S | L] = C(s(1)) A(s(1),s(2)) A(s(2),s(3)) ... A(s(T-1),s(T))

P[O | L] = Sum_S P[O | S,L] P[S | L] = Sum_{s(1)...s(T)} C(s(1)) B(s(1);O(1)) A(s(1),s(2)) B(s(2);O(2)) ... A(s(T-1),s(T)) B(s(T);O(T))
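
For very short sequences the sum over all state sequences can be evaluated literally, which is useful as a reference when checking faster methods. The following is a minimal sketch under stated assumptions: the function name likelihood_brute_force and the toy numbers are invented for this example, states and observations are 0-based indices, and the cost grows like N^T.

```python
from itertools import product

def likelihood_brute_force(A, B, C, O):
    """P[O | L] as an explicit sum over all N**T state sequences."""
    N, T = len(C), len(O)
    total = 0.0
    for S in product(range(N), repeat=T):
        # C(s(1)) B(s(1);O(1)) for the first step ...
        p = C[S[0]] * B[S[0]][O[0]]
        # ... then A(s(t-1),s(t)) B(s(t);O(t)) for every later step.
        for t in range(1, T):
            p *= A[S[t - 1]][S[t]] * B[S[t]][O[t]]
        total += p
    return total

# Hypothetical toy model (values are invented).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
C = [0.6, 0.4]
O = [0, 1, 0]                      # observed V_1, V_2, V_1

p = likelihood_brute_force(A, B, C, O)   # approximately 0.10893
```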

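The sum above has N^T terms, so evaluating it directly is impractical. To solve this problem efficiently we introduce the forward and backward variables. The forward variable is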

a(i;t) = P[ O(1) ... O(t), s(t)=S_i | L]

for which we have

a(i;1) = C(i) B(i; O(1))
a(j;t+1) = Sum_i a(i;t) A(i,j) B(j; O(t+1))

P[O | L] = Sum_i a(i;T)
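
The forward recursion can be sketched in a few lines. This is an illustrative implementation, not a definitive one: the function name forward and the toy numbers are assumptions, and time and state indices are 0-based, so a[t][i] in the code corresponds to a(i;t+1) in the text. The cost is of order N^2 T.

```python
def forward(A, B, C, O):
    """Return the table a[t][i] = P[O(1..t+1), s(t+1) = S_i | L] (0-based t, i)."""
    N = len(C)
    # Initialization: a(i;1) = C(i) B(i;O(1))
    a = [[C[i] * B[i][O[0]] for i in range(N)]]
    # Induction: a(j;t+1) = [Sum_i a(i;t) A(i,j)] B(j;O(t+1))
    for t in range(1, len(O)):
        prev = a[t - 1]
        a.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][O[t]]
                  for j in range(N)])
    return a

# Hypothetical toy model (values are invented).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
C = [0.6, 0.4]
O = [0, 1, 0]

a = forward(A, B, C, O)
likelihood = sum(a[-1])            # P[O | L], approximately 0.10893
```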

The backward variable is

b(i;t) = P[O(t+1) ... O(T) | s(t)=S_i, L]

for which we have

b(i;T) = 1
b(i;t) = Sum_j A(i,j) B(j; O(t+1)) b(j;t+1)

P[O | L] = Sum_j C(j) B(j; O(1)) b(j;1)
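
A matching sketch of the backward recursion follows. The function name backward and the toy numbers are assumptions of this example, and indices are 0-based. Since b(j;1) does not include the first observation, the termination step used here weights it by C(j) B(j;O(1)) to obtain P[O | L].

```python
def backward(A, B, O):
    """Return the table b[t][i] = P[O(t+2..T) | s(t+1) = S_i, L] (0-based t, i)."""
    N, T = len(A), len(O)
    # Initialization: b(i;T) = 1
    b = [[1.0] * N for _ in range(T)]
    # Induction, going backwards in time:
    # b(i;t) = Sum_j A(i,j) B(j;O(t+1)) b(j;t+1)
    for t in range(T - 2, -1, -1):
        b[t] = [sum(A[i][j] * B[j][O[t + 1]] * b[t + 1][j] for j in range(N))
                for i in range(N)]
    return b

# Hypothetical toy model (values are invented).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
C = [0.6, 0.4]
O = [0, 1, 0]

b = backward(A, B, O)
# Termination: P[O | L] = Sum_j C(j) B(j;O(1)) b(j;1)
likelihood = sum(C[j] * B[j][O[0]] * b[0][j] for j in range(len(C)))
```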

Second problem

There are several possible optimization criteria. One is to choose, at each time t, the state that is individually most likely. Let

c(i;t) = P[s(t)=S_i | O, L] = a(i;t) b(i;t) / P[O | L] = a(i;t) b(i;t) / Sum_j a(j;t) b(j;t)

so that Sum_i c(i;t) = 1.

The most likely state at time t is the index i that maximizes c(i;t).
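
The posterior c(i;t) can be computed by combining the two tables. The sketch below is illustrative only: the names forward, backward and posterior, and the toy numbers, are assumptions of this example, and indices are 0-based. Normalizing by Sum_j a(j;t) b(j;t) at each time makes every row sum to 1.

```python
def forward(A, B, C, O):
    N = len(C)
    a = [[C[i] * B[i][O[0]] for i in range(N)]]
    for t in range(1, len(O)):
        prev = a[t - 1]
        a.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][O[t]]
                  for j in range(N)])
    return a

def backward(A, B, O):
    N, T = len(A), len(O)
    b = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        b[t] = [sum(A[i][j] * B[j][O[t + 1]] * b[t + 1][j] for j in range(N))
                for i in range(N)]
    return b

def posterior(A, B, C, O):
    """c[t][i] = P[s(t+1) = S_i | O, L] (0-based t, i)."""
    a, b = forward(A, B, C, O), backward(A, B, O)
    c = []
    for a_t, b_t in zip(a, b):
        w = [x * y for x, y in zip(a_t, b_t)]
        norm = sum(w)                  # equals P[O | L] at every t
        c.append([x / norm for x in w])
    return c

# Hypothetical toy model (values are invented).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
C = [0.6, 0.4]
O = [0, 1, 0]

c = posterior(A, B, C, O)
# Individually most likely state index at each time.
best = [max(range(len(C)), key=lambda i: row[i]) for row in c]
```

Note that this criterion picks each state separately, so the resulting sequence is not guaranteed to be the most probable sequence as a whole.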

Viterbi algorithm (for the best state sequence)

Let

d(i;t) = max_{s(1)...s(t-1)} P[ s(1) ... s(t-1), s(t)=S_i, O(1) ... O(t) | L]

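This quantity is initialized by

d(j;1) = C(j) B(j;O(1))
f(j;1) = 0

and obeys the recursion

d(j;t+1) = [max_i d(i;t) A(i,j)] B(j;O(t+1))
f(j;t+1) = argmax_i d(i;t) A(i,j)

The best state sequence is recovered by backtracking:

P^* = max_j d(j;T)
s^*(T) = argmax_j d(j;T)
s^*(t) = f(s^*(t+1);t+1)   for t = T-1, ..., 1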
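
The Viterbi steps can be sketched as follows. This is a minimal illustration under stated assumptions: the function name viterbi and the toy numbers are invented, indices are 0-based, and ties in the argmax are broken in favour of the lowest index.

```python
def viterbi(A, B, C, O):
    """Return (P*, best state sequence) using 0-based state indices."""
    N, T = len(C), len(O)
    # Initialization: d(j;1) = C(j) B(j;O(1)), f(j;1) = 0
    d = [[C[j] * B[j][O[0]] for j in range(N)]]
    f = [[0] * N]
    for t in range(1, T):
        d_t, f_t = [], []
        for j in range(N):
            # f(j;t+1) = argmax_i d(i;t) A(i,j)
            best_i = max(range(N), key=lambda i: d[t - 1][i] * A[i][j])
            f_t.append(best_i)
            # d(j;t+1) = [max_i d(i;t) A(i,j)] B(j;O(t+1))
            d_t.append(d[t - 1][best_i] * A[best_i][j] * B[j][O[t]])
        d.append(d_t)
        f.append(f_t)
    # Termination and backtracking.
    last = max(range(N), key=lambda j: d[-1][j])
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(f[t][path[-1]])
    path.reverse()
    return d[-1][last], path

# Hypothetical toy model (values are invented).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
C = [0.6, 0.4]
O = [0, 1, 0]

p_star, path = viterbi(A, B, C, O)   # path is [0, 1, 0], p_star about 0.046656
```

For long sequences the products become very small, and working with sums of log probabilities is a common remedy.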

Third problem

Let

z(i,j;t) = P[s(t)=S_i, s(t+1)=S_j | O, L]
= a(i;t) A(i,j) B(j;O(t+1)) b(j;t+1) / P[O | L]

where the denominator equals the sum over i,j of the numerator terms.

c(i;t) = Sum_j z(i,j;t)

Then Sum_t c(i;t), where the sum ranges from 1 to T-1, is the expected number of transitions from S_i, and Sum_t z(i,j;t), over the same range, is the expected number of transitions from S_i to S_j.

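The re-estimated model parameters are

C(i) = c(i;1)

A(i,j) = Sum_t z(i,j;t) / Sum_t c(i;t)   (sums over t from 1 to T-1)

B(j;k) = Sum_{t: O(t)=V_k} c(j;t) / Sum_t c(j;t)   where the sum in the numerator ranges over the t such that O(t) = V_k, and the sum in the denominator over all t from 1 to T. For t = T, c(j;T) = a(j;T) / P[O | L].

Each re-estimation step does not decrease P[O | L], so iterating it leads to a local, not necessarily global, maximum.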
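
One re-estimation step can be sketched as below. This is an illustration under stated assumptions, not a production implementation: the function names and the toy numbers are invented, indices are 0-based, there is no scaling against underflow, and the B update divides by the sum of c(j;t) over all t from 1 to T (with c taken as a(j;t) b(j;t) / P[O | L]).

```python
def forward(A, B, C, O):
    N = len(C)
    a = [[C[i] * B[i][O[0]] for i in range(N)]]
    for t in range(1, len(O)):
        prev = a[t - 1]
        a.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][O[t]]
                  for j in range(N)])
    return a

def backward(A, B, O):
    N, T = len(A), len(O)
    b = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        b[t] = [sum(A[i][j] * B[j][O[t + 1]] * b[t + 1][j] for j in range(N))
                for i in range(N)]
    return b

def reestimate(A, B, C, O):
    """One re-estimation step; returns (new_A, new_B, new_C)."""
    N, M, T = len(C), len(B[0]), len(O)
    a, b = forward(A, B, C, O), backward(A, B, O)
    p = sum(a[-1])                                   # P[O | L]
    # c(i;t) = a(i;t) b(i;t) / P[O | L]
    c = [[a[t][i] * b[t][i] / p for i in range(N)] for t in range(T)]
    # z(i,j;t) = a(i;t) A(i,j) B(j;O(t+1)) b(j;t+1) / P[O | L]
    z = [[[a[t][i] * A[i][j] * B[j][O[t + 1]] * b[t + 1][j] / p
           for j in range(N)] for i in range(N)] for t in range(T - 1)]
    new_C = list(c[0])
    new_A = [[sum(z[t][i][j] for t in range(T - 1)) /
              sum(c[t][i] for t in range(T - 1)) for j in range(N)]
             for i in range(N)]
    new_B = [[sum(c[t][j] for t in range(T) if O[t] == k) /
              sum(c[t][j] for t in range(T)) for k in range(M)]
             for j in range(N)]
    return new_A, new_B, new_C

# Hypothetical toy model and observation sequence (values are invented).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
C = [0.6, 0.4]
O = [0, 1, 0, 0, 1]

before = sum(forward(A, B, C, O)[-1])
A2, B2, C2 = reestimate(A, B, C, O)
after = sum(forward(A2, B2, C2, O)[-1])   # expected to be at least `before`
```

In practice the step is repeated until the increase in P[O | L] falls below a chosen tolerance.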

Marco Corvi