HIDDEN MARKOV MODELS

Hidden Markov Models are used to describe systems whose state is not directly observable. Let N be the number of states of the system, and

S1, S2, ..., SN

its states. Le M be the number of observables on the systems,

V1, V2, ..., VM

 

We introduce the probabilities

A(i,j) = P[s(t+1) = Sj | s(t) = Si]
B(j;k) = P[v(t) = Vk | s(t) = Si]
C(j) = P[v(t) = Sj]

Three problems are addressed in the model:

  1. given a model L{A,B,C} and observations at times 1,2,...,T, O(1), ..., O(T), how can we compute P[O | L] ?
  2. given the model L and the observations at times 1,2,...,T, how do we choose the best states s(1), ..., s(T) ?
  3. how do we adjust the model L to maximize P[O | L] ?


First problem

Let S = {s(1), ..., s(T)} then

P[O | S, L] = Prodt P[O(t) | s(t), L] = Prodt B(s(t);O(t))

P[S | L] = C(s(1)) A(s(1),s(2) A(s(2),s(3)) ... A(s(T-1),s(T))

P[O | L] = SumS P[O | S,L] P[S | L] = Sums(1)...s(T) C(s(1)) B(s(1);O(1)) A(s(1),s(2)) B(s(2);O(2)) ... A(s(T-1),s(T)) B(s(t);O(T))

 

To solve this problem we introduce the forward-backward variables,

a(i;t) = P[ O(1) ... O(t), s(t)=Si | L]

for which we have

a(i;1) = C(i) B(i; O(1))
a(j;t+1) = Sumi a(i;t) A(i,j) B(j; O(t+1))

P[O | L] = Sumi a(i;T)

b(i;t) = P[O(t+1) ... O(T) | s=Si L]
b(i;T) = 1

b(i; t) = Sumj A(i,j) B(j; O(t+1)) b(j, t+1)

P[O | L] = Sumj C(j) b(j; 1)


Second problem

There are several possible optimization criteria. Let

c(i;t) = P[s(t)=Si | O, L] = a(i;t) b(i;t) / P[O | L] = a(i;t) b(i;t) / Sumj a(j;t) b(j;t)

and such that Sumi c(i;t) = 1

The most likely state at time t is the index i that maximizes c(i;t).

Viterbi algorithm (for the best state sequence)

Let

d(i;t) = maxs(1)...s(t-1) P[ s(1) ... s(t-1), s(t)=Si, O(1) ... O(t) | L]

d(j;t+1) = maxi d(i;t) A(i,j) B(j;O(t+1))
f(j;t+1) = argmaxi d(i;t) A(i,j)

d(j;1) = C(j) B(j;O(1))
f(j;1) = 0

P* = maxj d(j;T)
s*(T) = argmaxj d(j;T)
s*(t) = f(s*(t+1);t+1)


Third problem

Let

z(i,j;t) = P[s(t)=Si, s(t+1)=Sj | O, L]
= a(i;t) A(i,j) B(j;O(t+1)) b(j;t+1) / P[O | L]
where the denominator is the sum over i,j of the terms at the numerator.

c(i;t) = Sumj z(i,j;t)

Then Sumt c(i;t), where the sum ranges from 1 to T-1, is the expected number of transitions from Si, and Sumt z(i,j;t), is the expected number of transitions from Si to Sj.

The best model parameters are

C(i) = c(i;1)

A(i,j) = Sum z(i,j;t) / Sum c(i;t) (sums over t)

B(j;k) = Sum c(j;t) / Sum c(i;t) where the sum at the numerator ranges over the T such that O(t) = V(k)



Marco Corvi - Page hosted by geocities.com.