Certainty of a unique permutation in a corpus

James K. Tauber (jtauber@tartarus.uwa.edu.au)
Thu, 28 Sep 1995 10:16:27 +0800 (WST)

I have a question for people familiar with hypothesis testing in statistics.

Imagine a tagged corpus of a particular language with N words, each tagged
with one of |S| tags from a set S. If need be, let the relative frequency
of each tag s in S in the corpus be p(s).

Now say that I have a hypothesis about the ordering of words in a
particular construction in the language. For example, say I believe that
between a and d (members of S) the tags b and c (also members of S) only
ever appear in one order, i.e. [a b c d] is fine but *[a c b d] is not.

If a count on the corpus reveals that my proposed ordering appears n
times and that competing orders never appear, how certain (as a function
of N, n, p and whatever else) can I be that my hypothesis is correct?
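
A minimal sketch of one possible framing, offered as an illustration
rather than as the answer. It assumes that the four tags are contiguous,
that each a...d occurrence is an independent trial, and that the competing
order has some unknown rate q. Seeing it zero times in n trials then has
probability (1 - q)^n, so at level alpha any q above 1 - alpha^(1/n) is
ruled out. The function names and the default alpha = 0.05 are
hypothetical, and the sketch ignores N and p(s) entirely.

```python
from typing import Sequence, Tuple


def count_orders(tags: Sequence[str], a: str, b: str, c: str, d: str) -> Tuple[int, int]:
    """Count contiguous [a b c d] and [a c b d] windows in a tag sequence.

    Contiguity is an assumption of this sketch, not something the
    question states.
    """
    proposed = 0
    competing = 0
    for i in range(len(tags) - 3):
        window = tuple(tags[i:i + 4])
        if window == (a, b, c, d):
            proposed += 1
        elif window == (a, c, b, d):
            competing += 1
    return proposed, competing


def upper_bound_on_competing_rate(n: int, alpha: float = 0.05) -> float:
    """Upper bound on the rate q of the competing order.

    Assumes n independent trials with zero counterexamples and solves
    (1 - q)**n = alpha for q. With no observations the bound is trivial.
    """
    if n <= 0:
        return 1.0
    return 1.0 - alpha ** (1.0 / n)


if __name__ == "__main__":
    corpus_tags = ["a", "b", "c", "d", "x", "a", "b", "c", "d"]
    n, competing = count_orders(corpus_tags, "a", "b", "c", "d")
    if competing == 0:
        print(n, upper_bound_on_competing_rate(n))
```

For large n the bound is close to 3/n at alpha = 0.05, so it says how rare
the competing order must be, not that it is impossible.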

James K. Tauber <jtauber@tartarus.uwa.edu.au> currently at ALS 95
University Computing Services and Centre for Linguistics
University of Western Australia, Perth, AUSTRALIA
http://www.uwa.edu.au/student/jtauber finger for PGP key