RE: Corpora: when does a subcorpus become a corpus

From: Pearson, Jennifer (J.Pearson@unesco.org)
Date: Fri Jan 04 2002 - 09:54:27 MET

    If you look at the same publication, p.48, you will find that I argue that,
    given Sinclair's definitions, neither the term subcorpus nor the term
    component is appropriate for the sets of texts I was working with (and
    probably not for the EAP texts referred to in previous e-mails either). I
    chose therefore to use the term special purpose corpus, "a corpus whose
    composition is determined by the precise purpose for which it is to be used.
    While a special purpose corpus may be derived from a general reference
    corpus or from a monitor corpus it will not constitute a subcorpus in the
    sense defined by Sinclair because it will not have all of the properties of
    a larger corpus." I coined this particular term for two reasons: a) because
    the language of the texts I was working with could be classified as
    'language for special purposes' or 'LSP', two terms that already existed in
    applied linguistics to designate, for example, the language of business, the
    language of medicine, the language of economics, and b) because the term
    'special purpose corpus' implies that the corpus has been compiled for a
    particular purpose.
    Wishing you all a happy new year
    Jennifer

    Dr Jennifer Pearson
    Chief of Translation
    UNESCO
    7 Place de Fontenoy
    75352 Paris 07
    Tel.: 00 33 1 456 80 780
    e-mail: j.pearson@unesco.org
    http://www.unesco.org

    -----Original Message-----
    From: Sampo Nevalainen [mailto:samponev@cc.joensuu.fi]
    Sent: Thu 3 January 2002 10:36
    To: P. Kaszubski; corpora@hd.uib.no
    Subject: Re: Corpora: when does a subcorpus become a corpus

    Here is a short citation from Jennifer Pearson's "Terms in Context"
    (Amsterdam 1998), p. 45:

    --
    Sinclair, who states that corpora can be divided into subcorpora, and that 
    corpora and subcorpora can be divided into components, defines a subcorpus 
    as having "all the properties of a corpus but happens to be part of a 
    larger corpus" (1994a:4). Thus, a subcorpus must have all the properties of 
    a larger corpus. We understand this to mean that it is representative of 
    the larger corpus. A component, on the other hand, according to Sinclair, 
    illustrates a particular type of language and is selected "according to a 
    set of linguistic criteria that serve to characterize its linguistic 
    homogeneity" (Sinclair 1994a:4). It differs from a subcorpus in that it is 
    not intended to be representative of the corpus from which it is drawn and 
    is therefore not necessarily an adequate sample of a language.
    --
    

    I did not go back to Sinclair ("Corpus Typology: A Framework for Classification", EAGLES 1994), but according to Pearson, "a subcorpus must have all the properties of a larger corpus" and is thus representative of the larger corpus. How this can be achieved is another question, although it is obviously safer to state that a subcorpus is representative of the larger corpus than to argue that the larger corpus (and, consequently, the subcorpus) is representative of a language (or genre, etc.).

    Anyway, using the terms defined above (without intending to agree fully with Pearson), the set of EAP texts detached from the BNC would probably be called a "component" rather than a "subcorpus". Personally, I would like to call ANY corpus detached from another corpus a "subcorpus", regardless of its content or composition. Whatever a set of texts is called, the question of representativeness remains. Here I agree with Ute Roemer, who wrote: "The important question in this context is 'What do you want to do with the (sub)corpus?'"

    sincerely, Sampo

    P.S. Please regard this as a note from a person who tends to consider the notion of "representative of a language" an oxymoron, a "mission impossible".

    ( : ============================================= : )

    Sampo Nevalainen, M.A.
    Researcher
    University of Joensuu
    Savonlinna School of Translation Studies
    P.O.Box 48
    FIN-57101 Savonlinna
    FINLAND

    tel +358-15-511 70 (operator)
        +358-15-511 7704
    fax +358-15-515 096
    email samponev@cc.joensuu.fi
    http://www.joensuu.fi/slnkvl/


