Re: [Corpora-List] distribution of types of lexical collocations

From: Stefan Evert (evert@IMS.Uni-Stuttgart.DE)
Date: Tue Jan 04 2005 - 19:52:09 MET


    Dear Philippa,

    Sorry for this late reply, but I'm very interested in your research on
    adjective-noun collocations - not least since we've been doing some
    research on the extraction of German adjective-noun collocations
    (mostly for lexicography applications) here at the IMS.

    The answer to your question (and, in fact, the difficulty of giving an
    answer) depends very much on what exactly you mean by "collocations".

    If you understand a collocation as a lexicalised word combination (or
    some subtype thereof), then you will need manually compiled lists of
    true collocations in order to answer this question. Since I am not
    aware of any systematic and comprehensive collections of such a
    nature, I believe that it is nearly impossible to give a reliable
    answer.

    If you understand a collocation in a strict Neo-Firthian sense as a
    recurrent word combination, then it's merely a matter of counting the
    types of word combinations you're interested in on a given corpus
    (such as the BNC for English). The quality of the answer you get
    depends on the accuracy of the linguistic pre-processing and the
    methods you use to extract the word combinations. While adjective-noun
    combinations can easily be identified with high accuracy, other types
    of word combinations will be much more difficult, especially noun-verb
    combinations in German.
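
    To make the counting step concrete, here is a minimal sketch. It
    assumes the corpus is already part-of-speech tagged and that an
    adjective directly followed by a noun counts as one combination. The
    toy sentences and the tag names "ADJ" and "NOUN" are assumptions for
    illustration only; a real corpus such as the BNC uses its own tagset,
    and a real extraction would need more than plain adjacency.

```python
from collections import Counter

# Hypothetical toy input: one list of (word, tag) pairs per sentence.
# The tag names "ADJ" and "NOUN" are assumed for illustration;
# substitute the tags of whatever tagset your corpus uses.
tagged_sentences = [
    [("a", "DET"), ("strong", "ADJ"), ("tea", "NOUN")],
    [("strong", "ADJ"), ("tea", "NOUN"), ("and", "CONJ"),
     ("heavy", "ADJ"), ("rain", "NOUN")],
    [("the", "DET"), ("heavy", "ADJ"), ("smoker", "NOUN"),
     ("drank", "VERB"), ("strong", "ADJ"), ("tea", "NOUN")],
]


def adj_noun_pairs(sentence):
    """Yield (adjective, noun) for each adjective directly followed by a noun."""
    for (w1, t1), (w2, t2) in zip(sentence, sentence[1:]):
        if t1 == "ADJ" and t2 == "NOUN":
            yield (w1.lower(), w2.lower())


# Count every adjacent adjective-noun pair, sentence by sentence,
# so that pairs never span a sentence boundary.
pair_counts = Counter(
    pair
    for sentence in tagged_sentences
    for pair in adj_noun_pairs(sentence)
)

for pair, freq in pair_counts.most_common():
    print(freq, " ".join(pair))
```

    Plain adjacency is the simplest possible extraction method; it is
    exactly the kind of choice that determines the quality of the
    resulting counts.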

    Another necessary clarification concerns what you mean by the
    "distribution" of collocations. Are you referring to the number of
    types (how many adj-n collocations are there?), the number of tokens
    (how often do adj-n collocations occur in the corpus?), or to the
    distribution of type frequencies/probabilities in the corpus (how many
    low-frequency collocations are there, and does their distribution
    follow Zipf's law or is it more balanced)?
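
    The three readings can be told apart with a small sketch. The pair
    frequencies below are made up purely for illustration and are not
    corpus data; the variable names are likewise hypothetical.

```python
from collections import Counter

# Hypothetical pair frequencies, invented for illustration only.
pair_counts = Counter({
    ("strong", "tea"): 5,
    ("heavy", "rain"): 3,
    ("heavy", "smoker"): 1,
    ("bitter", "cold"): 1,
    ("bright", "idea"): 1,
})

# Reading 1: number of types (distinct combinations).
n_types = len(pair_counts)

# Reading 2: number of tokens (total occurrences of all combinations).
n_tokens = sum(pair_counts.values())

# Reading 3: frequency spectrum, i.e. for each frequency m,
# the number of types that occur exactly m times.
spectrum = Counter(pair_counts.values())

# Frequencies in descending order, for a rank-frequency view.
ranked_freqs = sorted(pair_counts.values(), reverse=True)

print("types:", n_types)
print("tokens:", n_tokens)
for m in sorted(spectrum):
    print(f"types with frequency {m}: {spectrum[m]}")
```

    Plotting ranked_freqs against rank on log-log axes gives a first
    impression of the third reading: a Zipfian distribution shows up as a
    roughly straight line, with a long tail of combinations that occur
    only once.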

    If you don't mind my asking, I'm curious what kind of research
    questions you want to address with this frequency data.

    Kind regards, and a Happy New Year,
    Stefan

    > Dear members of the Corpora List,
    >
    > I am doing research on adjective-noun collocations and I wonder if there are
    > any reliable corpus data and numbers as to the distribution of this type of
    > (English and/or German) collocations in contrast to (noun-noun /)
    > adjective-adverb / verb-adverb / noun-verb collocations.
    >
    > Thanks very much in advance.
    >
    > Philippa

    -- 
    ______________________________________________________________________
    Stefan Evert                                     purl.org/stefan.evert
    http://www.collocations.de/                             schtepf@gmx.de
    


