What makes a Markov model hidden?

Andre Kempe (Andre.Kempe@xerox.fr)
Thu, 29 Jun 1995 16:25:26 +0200

On April 24 or 25, 1995, Chris Manning sent a message to the mailing list
"corpora@hd.uib.no" in reaction to an earlier message by Helmut Feldweg
on the same list.

The subject of both messages was: "What makes a Markov model hidden?"

Regarding Manning's message (appended below), I have some remarks and two
questions:

(1) He writes:

> taggers like Church's are mixed.
>
> For training on a tagged corpus, one can regard the outputs of such a
> tagger as pairs consisting of a Markov model state (the tag) and a word.
> Thus it isn't an HMM, because you can tell exactly which state the Markov
> model is in at any given time.

Granted, this is not an HMM, since the states can be seen; but it also
does not fit the definition of a Markov model in the sense of [1].

In that sense, a Markov model contains only:
- states that can be seen
- arcs labeled with state transition probabilities (A)
- initial state probabilities (Pi)

In contrast to an HMM, it does not contain:
- observation symbols (words) different from but related to the states (tags)
- observation symbol probabilities (lexical probabilities; B)

Training a tagger like Church's, however, necessarily handles the last
two items.
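
To make the contrast concrete, here is a minimal sketch in Python; the
names and numbers are invented for illustration:

    # A Markov model in the sense of [1]: visible states, transition
    # probabilities A, and initial state probabilities Pi -- nothing else.
    markov_model = {
        "states": ["DET", "NOUN", "VERB"],                    # the tags
        "A": {("DET", "NOUN"): 0.9, ("NOUN", "VERB"): 0.5},   # P(tag | previous tag)
        "Pi": {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1},         # P(first tag)
    }

    # An HMM additionally relates the hidden states (tags) to observable
    # symbols (words) via lexical probabilities B:
    hmm = dict(markov_model)
    hmm["B"] = {("NOUN", "dog"): 0.001, ("VERB", "barks"): 0.0005}  # P(word | tag)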

(2) He writes:

> To the extent that training is the most important part, I think the former
> class should be regarded as Markov model taggers and the latter as HMM
> taggers; but in reality the first kind is mixed.

QUESTION 1:

I agree that taggers like Church's are mixed, i.e. an HMM plus something
else; but if their training does not fit the definition of a Markov
model [1], how can those taggers be regarded as Markov model taggers?

QUESTION 2:

Can the process that precedes tagging with taggers like Church's be
called 'training'?

This process does not start from a model with default (a priori)
probabilities (e.g. derived from biases, as in the XEROX tagger) that
are then automatically refined by training on a corpus. Instead, we
simply count frequencies in the 'training' corpus and calculate
probabilities from them. From those probabilities we construct the HMM,
which is never really 'trained'.
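
A minimal sketch in Python of this counting step (the corpus format, the
names, and the absence of smoothing are my assumptions, not Church's
actual procedure):

    from collections import defaultdict

    def estimate_from_tagged_corpus(sentences):
        """Relative-frequency estimates from a tagged corpus.

        `sentences` is assumed to be a list of [(word, tag), ...] lists.
        There is no iterative refinement: counting and normalizing is
        the whole 'training' process.
        """
        trans = defaultdict(int)       # (previous tag, tag) counts -> A
        emit = defaultdict(int)        # (tag, word) counts         -> B
        init = defaultdict(int)        # sentence-initial tags      -> Pi
        tag_count = defaultdict(int)

        for sent in sentences:
            prev = None
            for word, tag in sent:
                tag_count[tag] += 1
                emit[(tag, word)] += 1
                if prev is None:
                    init[tag] += 1
                else:
                    trans[(prev, tag)] += 1
                prev = tag

        # Normalize counts into probabilities; the predecessor count is
        # approximated by the total tag count, and nothing is smoothed.
        A = {pair: n / tag_count[pair[0]] for pair, n in trans.items()}
        B = {pair: n / tag_count[pair[0]] for pair, n in emit.items()}
        Pi = {tag: n / len(sentences) for tag, n in init.items()}
        return A, B, Pi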

REFERENCE:

[1] L.R. Rabiner. 1990. A Tutorial on Hidden Markov Models and Selected
Applications in Speech Recognition. In A. Waibel and K.F. Lee (eds.),
Readings in Speech Recognition. Morgan Kaufmann Publishers, San Mateo,
California.

-- Andre

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Andre Kempe
Rank Xerox Research Centre phone: +33 -76.61.50.71
6, chemin de Maupertuis fax: +33 -76.61.50.99
38240 Meylan, France email: andre.kempe@xerox.fr
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

----- Begin Included Message -----

From: Chris Manning <manning@lcl.cmu.edu>
To: feldweg@sfs.nphil.uni-tuebingen.de (Helmut Feldweg)
Cc: corpora@hd.uib.no
Subject: What makes a Markov model hidden?
In-Reply-To: <9504241626.AA19233@bach.sfs.nphil.uni-tuebingen.de>

On 24 April 1995, Helmut Feldweg asked what exactly makes a Markov
model hidden.

A Markov model is hidden when you cannot determine the state sequence it
passed through on the basis of the outputs you observed. Classifying
taggers as HMMs or just Markov models is slightly complicated because
taggers like Church's are mixed.

For training on a tagged corpus, one can regard the outputs of such a
tagger as pairs consisting of a Markov model state (the tag) and a word.
Thus it isn't an HMM, because you can tell exactly which state the Markov
model is in at any given time. However, when you do tagging with the
Viterbi algorithm, you give the tagger only the words and ask it to tell
you which states the machine passed through, and so you are using the
tagger as an HMM.
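
For illustration, a minimal Viterbi decoder in Python (textbook
formulation with dictionary-shaped parameters as assumptions; not any
particular tagger's code), in which only the words are given and the
state sequence is recovered:

    def viterbi(words, states, A, B, Pi):
        """Most probable state (tag) sequence for `words`.

        Only the words are observed; the states are inferred, i.e. the
        model is used as an HMM. A, B, Pi are probability dicts keyed
        on (previous tag, tag), (tag, word), and tag respectively.
        """
        # delta[t][s]: probability of the best path ending in state s at t.
        delta = [{s: Pi.get(s, 0.0) * B.get((s, words[0]), 0.0) for s in states}]
        back = [{}]
        for t in range(1, len(words)):
            delta.append({})
            back.append({})
            for s in states:
                best = max(states, key=lambda p: delta[t - 1][p] * A.get((p, s), 0.0))
                delta[t][s] = (delta[t - 1][best] * A.get((best, s), 0.0)
                               * B.get((s, words[t]), 0.0))
                back[t][s] = best
        # Trace back from the best final state.
        path = [max(states, key=lambda s: delta[-1][s])]
        for t in range(len(words) - 1, 0, -1):
            path.append(back[t][path[-1]])
        return list(reversed(path))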

Taggers like the Xerox tagger are true HMM taggers in the sense that the
training is also done via an HMM -- the tagger sees only the output words,
and has to guess which part of speech sequence the HMM is moving through.
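
Such training is standardly done with Baum-Welch reestimation. Below is a
compressed single-sentence sketch in Python (generic textbook formulation
without scaling or smoothing, assuming a sentence of at least two words;
certainly not the actual XEROX implementation):

    def baum_welch_step(words, states, A, B, Pi):
        """One reestimation step on a single untagged sentence.

        The model never sees the tags: it sees only `words` and updates
        A, B, Pi from expected counts. No scaling and no smoothing, so
        underflow and zero divisions are possible on real data.
        """
        T = len(words)
        # Forward probabilities: alpha[t][s] = P(words[0..t], state s at t).
        alpha = [{s: Pi.get(s, 0.0) * B.get((s, words[0]), 0.0) for s in states}]
        for t in range(1, T):
            alpha.append({s: B.get((s, words[t]), 0.0)
                             * sum(alpha[t - 1][p] * A.get((p, s), 0.0)
                                   for p in states)
                          for s in states})
        # Backward probabilities: beta[t][s] = P(words[t+1..] | state s at t).
        beta = [dict.fromkeys(states, 1.0) for _ in range(T)]
        for t in range(T - 2, -1, -1):
            for s in states:
                beta[t][s] = sum(A.get((s, n), 0.0) * B.get((n, words[t + 1]), 0.0)
                                 * beta[t + 1][n] for n in states)
        Z = sum(alpha[T - 1][s] for s in states)   # P(words)
        # Expected state occupancies (gamma) and transitions (xi).
        gamma = [{s: alpha[t][s] * beta[t][s] / Z for s in states}
                 for t in range(T)]
        xi = [{(p, s): alpha[t][p] * A.get((p, s), 0.0)
                       * B.get((s, words[t + 1]), 0.0) * beta[t + 1][s] / Z
               for p in states for s in states}
              for t in range(T - 1)]
        # Renormalize the expected counts into new parameters.
        new_Pi = dict(gamma[0])
        new_A = {(p, s): sum(x[(p, s)] for x in xi)
                         / sum(g[p] for g in gamma[:-1])
                 for p in states for s in states}
        new_B = {(s, w): sum(g[s] for g, v in zip(gamma, words) if v == w)
                         / sum(g[s] for g in gamma)
                 for s in states for w in set(words)}
        return new_A, new_B, new_Pi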

To the extent that training is the most important part, I think the former
class should be regarded as Markov model taggers and the latter as HMM
taggers; but in reality the first kind is mixed.

> Was the Xerox tagger by Cutting, Kupiec and Sibun the first one to use
> a *hidden* Markov model?
> If so, what makes it more *hidden* than Church's and DeRose's taggers?

The Xerox tagger is more hidden than Church's and DeRose's taggers in the
sense mentioned above. I'm not an expert on the historiography, but it
wasn't the first HMM tagger -- just the first publicly available one. The
Cutting, Kupiec, Pedersen and Sibun tagger was a reimplementation of an
earlier tagger by Kupiec at Xerox. Separately, an HMM tagger was
described by G. F. Foster (1991) in a Master's thesis at McGill, and
Merialdo's tagger (1990) was a true HMM tagger, although the situation
was even more mixed in this case: initial parameter estimation was done
in Markov model mode on tagged text, with reestimation then done in HMM
mode on untagged text.

Chris

----- End Included Message -----