RE: [Corpora-List] Grammar checker for English

From: Deane, Paul (pdeane@ets.org)
Date: Thu Apr 14 2005 - 14:54:11 MET DST


    The Educational Testing Service has a product, Criterion, which uses n-gram
    techniques to identify common types of grammatical errors. A good reference
    is:

    Chodorow, Martin and Claudia Leacock. (2000). An unsupervised method for
    detecting grammatical errors. In Proceedings of the 1st Annual Meeting of
    the North American Chapter of the Association for Computational Linguistics,
    140-147.

    -----Original Message-----
    From: Mike Maxwell [mailto:maxwell@ldc.upenn.edu]
    Sent: Wednesday, April 13, 2005 10:39 PM
    To: Corrin Lakeland
    Cc: D.G.Damle; CORPORA@HD.UIB.NO
    Subject: Re: [Corpora-List] Grammar checker for English

    Corrin Lakeland wrote:
    > On Tue, 12 Apr 2005 23:32, you wrote:
    >
    >> Does anyone have a technique or tool for checking the
    >> grammatical correctness of a sentence?
    >>
    >> A full parser would be computationally too expensive,
    >> so is there a computationally cheap method for this?
    >
    > I do not know of any systems which check if a sentence is
    > well-formed without parsing it, although it is
    > theoretically possible to do. However, there are many
    > parsers that are quite efficient.
    >
    > ... I'm sure there is lots of other work in the field.

    (I didn't see the original msg for some reason, but I'm
    assuming it was posted to Corpora-List, hence a reply is
    appropriate.)

    Like Corrin, I don't know of any work done on testing
    well-formedness without parsing. (Unlike him, I have a hard
    time imagining how that would work--I suppose you could do
    some sort of n-gram tests, but there would be no guarantee
    that there wouldn't be an error at n+1, or for that matter
    that back-off didn't lead to problems at larger n. But
    maybe I just lack imagination :-).)
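
    To make the n-gram idea concrete, here is a minimal sketch
    of a bigram test. It is an illustration only, not the
    method used by Criterion or by any published system; the
    toy corpus and the function names are made up. It flags a
    bigram when both words were seen in training but never next
    to each other.

```python
from collections import Counter

def train_bigrams(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def suspicious_bigrams(sentence, unigrams, bigrams):
    """Return bigrams unseen in training although both words were seen."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return [(a, b) for a, b in zip(tokens, tokens[1:])
            if unigrams[a] and unigrams[b] and not bigrams[(a, b)]]

# Toy training corpus (hypothetical, for illustration only).
corpus = [
    "the dog barks",
    "the dogs bark",
    "the dog in the park barks",
    "the dogs in the park bark",
]
uni, bi = train_bigrams(corpus)

# Adjacent agreement error: caught, prints [('dogs', 'barks')]
print(suspicious_bigrams("the dogs barks", uni, bi))

# Long-distance agreement error: every bigram was seen, prints []
print(suspicious_bigrams("the dogs in the park barks", uni, bi))
```

    The second call shows the worry about errors beyond the
    window: subject and verb disagree, but every adjacent pair
    looks fine, so the test passes the sentence.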

    At any rate, there is a considerable amount of work done on
    parsing _restricted_ English, with the intention of finding
    ungrammatical sentences where the standard of grammaticality
    is precisely some computational grammar. One domain where
    this has been used is in aircraft manuals, which must be
    read by technicians who do not have English as their first
    language. As I understand it, the version of simplified
    English used in these manuals is restricted both as to its
    vocabulary and its grammar. (I'm not sure how compound
    nouns are treated, maybe there's just a limit on nesting.)
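
    The vocabulary half of such a restriction is the easy part
    to check. A rough sketch, assuming a made-up approved word
    list (a real controlled-language standard would supply its
    own dictionary, and the grammar restrictions would still
    need a parser):

```python
import re

# Hypothetical approved vocabulary, for illustration only.
APPROVED = {"remove", "install", "the", "a", "new", "panel", "filter", "and"}

def unapproved_words(sentence, approved):
    """Return the words of `sentence` that are not in `approved`, in order."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in approved]

# Prints ['detach']
print(unapproved_words("Remove the panel and detach the filter.", APPROVED))
```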

    One of the simplified-English-for-aircraft checkers was done
    by Boeing. I wrote most of the original grammar rules back
    in the mid-1980s, without the intent of restricting it, so
    that it covered all the constructions we could come up with
    (from both generative grammars and descriptive texts like
    Quirk, Greenbaum, Leech and Svartvik, plus testing
    against various text corpora). I believe that after I left
    in 1987, and the restricted English application came up,
    many of the rules were removed so as to accept only the
    desired restricted language. Phil Harrison wrote the
    original parser in Lisp; I am told it was re-written in C
    (or C++?) for speed, and that after the re-write its speed
    was adequate for checking large manuals. (That was in the
    late 1980s or early 1990s. Moore's Law has, I would
    imagine, made its speed more adequate since then :-).)

    -- 
    	Mike Maxwell
    	Linguistic Data Consortium
    	maxwell@ldc.upenn.edu
    



