I seem to have trouble understanding what exactly makes a Markov
model *hidden* in the context of part-of-speech tagging.
The problem centers on the following questions:
Was the Xerox tagger by Cutting, Kupiec and Sibun the first one to use
a *hidden* Markov model?
If so, what makes it more *hidden* than Church's and DeRose's taggers?
Is it the fact that it is trained on un-tagged data? What kind of
model is used if one uses parameters obtained from pre-tagged
corpora with this tagger? (You don't have to use Baum-Welch
re-estimation to use this tagger.) Is it no longer *hidden* then?
I have tried to track these questions down in the literature but
couldn't find a definite answer. I am now looking forward to the
expertise on this list.
Helmut Feldweg