This can be expressed in several different ways. One fairly natural
way is to use a scaled sum of the chi-squared score and average mutual
information. Another method is to use average mutual information with
an error bound of \alpha/(2N), where \alpha is the chi-squared cutoff
for the desired significance level and N is the sample size. This uses
the relationship
\chi^2 ~ 2 N MI = generalized log-likelihood ratio
Here MI is *average* mutual information, computed with natural logs,
MI = H(X) + H(Y) - H(X,Y) = \sum_ij \pi_ij log (\pi_ij / \mu_ij)
where \pi_ij = k_ij / N and \mu_ij = (k_i* / N) (k_*j / N).
In fact, the value \phi mentioned by Mr. Demetriou is approximately the
square root of twice the mutual information, since
\phi^2 = \chi^2 / N ~ 2 MI.
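As a rough illustration of how these quantities relate, here is a
minimal Python sketch. The function name association_scores and the
example counts are made up for illustration; it takes \mu_ij to be the
product of the row and column marginal proportions and assumes every
row and column total is nonzero.

```python
import math

def association_scores(k):
    """Compute chi-squared, average MI (in nats), G^2 = 2*N*MI and phi
    for a contingency table k given as a list of rows of counts.
    Assumes all row and column totals are nonzero."""
    n = sum(sum(row) for row in k)
    row_tot = [sum(row) for row in k]          # k_i*
    col_tot = [sum(col) for col in zip(*k)]    # k_*j
    chi2 = 0.0
    mi = 0.0
    for i, row in enumerate(k):
        for j, kij in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (kij - expected) ** 2 / expected
            if kij > 0:                        # 0 * log 0 is taken as 0
                pi = kij / n
                mu = (row_tot[i] / n) * (col_tot[j] / n)
                mi += pi * math.log(pi / mu)
    return {
        "N": n,
        "chi2": chi2,
        "MI": mi,
        "G2": 2 * n * mi,                      # generalized log-likelihood ratio
        "phi": math.sqrt(chi2 / n),
    }

# Hypothetical 2x2 table, not taken from any real corpus.
scores = association_scores([[30, 20], [20, 30]])
print(scores)
print("sqrt(2*MI) =", math.sqrt(2 * scores["MI"]))
```

For this table chi2 and G2 come out close to each other, and phi is
close to sqrt(2*MI), which is what the relationship above predicts
when the observed counts are not too far from the expected ones.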
>>>>> "gd" == George Demetriou <g.demetriou@dcs.shef.ac.uk> writes:
gd> George C. Canavos (1984), Applied Probability and Statistical
gd> Methods, Little Brown & Co.
...
gd> "However, it can be shown that for extremely large sample
gd> sizes, it is almost certain to reject the null hypothesis
gd> because one would not be able to specify H0 close enough to
gd> the true distribution. Thus the application of chi-square is
gd> questionable when extremely large sample sizes are involved."
..
gd> As a remedy, several statistics books propose the (not widely
gd> used) phi coefficient which compensates for the sample size:
gd> phi=square_root(chi-square/N) (N=sample size)
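The sample-size effect described in the quoted passage can be seen in
a small Python sketch. The helper chi2_and_phi and the counts are
hypothetical; the table is simply multiplied by a constant so that the
proportions stay fixed while N grows.

```python
import math

def chi2_and_phi(k):
    """Return (chi-squared, phi) for a contingency table of counts.
    Assumes all row and column totals are nonzero."""
    n = sum(sum(row) for row in k)
    row_tot = [sum(row) for row in k]
    col_tot = [sum(col) for col in zip(*k)]
    chi2 = 0.0
    for i, row in enumerate(k):
        for j, kij in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (kij - expected) ** 2 / expected
    return chi2, math.sqrt(chi2 / n)

base = [[30, 20], [20, 30]]   # made-up counts for illustration
results = {}
for factor in (1, 10, 100):
    table = [[factor * c for c in row] for row in base]
    results[factor] = chi2_and_phi(table)
    print(factor, results[factor])
```

With the proportions held fixed, chi-squared grows in proportion to N
while phi stays the same, so a fixed chi-squared cutoff will eventually
be exceeded by any nonzero association.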