[Corpora-List] Legal aspects of corpora compiling

From: Adam Kilgarriff (adam.kilgarriff@itri.brighton.ac.uk)
Date: Tue Oct 01 2002 - 10:19:39 MET DST


    On 25 Sept Rafal L. Górski asked:
    >
    > Does anybody know about research on legal aspects of corpora compiling
    > (copyright restrictions).

    A short answer:

    to be unequivocally, completely, totally in the clear you need to get
    copyright clearance from all copyright holders (publishers and/or
    authors, all speakers for spoken material). Some will give it to
    you, others won't, and it is a lot of work to gather. (I attended a
    rather nice talk on BNC copyright issues titled "Ladies love lupins".
    Sometimes, the only way to get the copyright clearance sought was to
    take the lady concerned a bunch of flowers.)

    HOWEVER

    the law is in its infancy and there is very little which is obviously
    right or wrong/legal or illegal. If you have an enemy with rich
    enough lawyers, you will always be found in the wrong (cf Napster -
    when you're up against the music business it's apparently illegal even
    to tell someone where they might find something) so it's
    pointless viewing the law as a set of rules. Rather, you have to
    avoid doing things which someone who is rich and inclined to sue is
    going to view as provocative.

    Considerations:

    1) PUBLISHING

    the issue is heavier if you are going to publish or pass on copies of
    the data than if you are not. If it's only for in-house use, then one
    simple issue is "who will ever know", and it is not clear that, eg,
    downloading a report onto your PC's desktop is any different to
    downloading it into a corpus. Copyright law is in general about the
    case where someone makes money from selling intellectual property: if
    you are going to sell a corpus, the issues need taking very seriously,
    as people will be upset by you making money out of selling their text
    (unless you give them a share).

    2) EXTRACT SIZE

    the issue is heavier, the larger the extracts you take. There is a
    traditional exemption from copyright for short extracts, so eg you can
    take brief quotes, eg in a review or academic book, without asking
    permission. There are different opinions about how much you can
    quote. If you are quoting a short poem, you couldn't quote it all on
    the grounds that there weren't many words, so the definition of 'short'
    has to do with 'as a proportion of the whole' as well as absolute
    length. As a general principle, keep extracts short. (In one project,
    we used "3000 words or one third of the document, whichever is the
    shorter")
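    The "3000 words or one third" rule of thumb can be sketched in a few
    lines of Python. This is only an illustration, not the project's
    actual code: the function names are hypothetical, words are counted
    by splitting on whitespace, and the extract is taken from the start
    of the document, none of which is specified above.

```python
def extract_limit(total_words: int, cap: int = 3000, divisor: int = 3) -> int:
    """Return the maximum number of words to take from a document.

    The limit is the smaller of an absolute cap (3000 words) and a
    proportion of the whole (one third), per the rule of thumb above.
    Integer division is used, so the proportion is rounded down.
    """
    return min(cap, total_words // divisor)


def take_extract(text: str, cap: int = 3000, divisor: int = 3) -> str:
    """Return an extract of `text` no longer than the limit.

    Assumption: words are whitespace-separated tokens and the extract
    starts at the beginning of the document.
    """
    words = text.split()
    limit = extract_limit(len(words), cap, divisor)
    return " ".join(words[:limit])


if __name__ == "__main__":
    # A 900-word document is limited by the one-third rule,
    # a 30000-word document by the absolute cap.
    print(extract_limit(900))
    print(extract_limit(30000))
```

    For short documents the proportional limit applies, and for long
    ones the absolute cap does, which matches the point that 'short'
    depends on both measures.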

    3) BE COOPERATIVE

    avoid including anything where there is an explicit reason not to.
    In the context of the web, the 'no robots' convention allows
    authors to say they don't want their page to be viewed by
    robots. One should also read this as "keep off" from the point of
    view of corpus compilation. Some literary authors are notoriously
    litigious.
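    As a rough sketch of honouring the robots convention before adding a
    page to a corpus, the check below uses Python's standard
    urllib.robotparser module. It assumes the convention is expressed as
    a robots.txt file whose text has already been fetched; the function
    name, the "corpus-builder" user-agent string and the example.org
    URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def may_collect(robots_txt: str, url: str, user_agent: str = "corpus-builder") -> bool:
    """Return True if the site's robots.txt permits `user_agent` to fetch `url`.

    `robots_txt` is the text of the site's robots.txt file, obtained
    separately. The user-agent name is a hypothetical placeholder.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical robots.txt asking all robots to stay out of /private/.
    robots = "User-agent: *\nDisallow: /private/\n"
    print(may_collect(robots, "http://example.org/private/report.html"))
    print(may_collect(robots, "http://example.org/public/report.html"))
```

    A page that fails this check would simply be left out of the corpus,
    even if it could technically be downloaded.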

    COURSE: Michael Rundell and I are teaching a short course on

        "Corpus Design and Use"
         =====================

    which will cover legal issues and also

    = size, balance, "representativeness"
    = text formats, data capture
    = text type information
    = corpus query programs
    = methods and measures for comparing corpora

    from linguistic and lexicographic perspectives, in Brighton, England,
    Mon 2nd to Thurs 5th December 2002. Bookings open now!

    http://www.itri.brighton.ac.uk/lexicom

    Adam

    -- 
    NEW!! MSc and Short Courses in Lexical Computing and Lexicography
    Info at http://www.itri.brighton.ac.uk/lexicom

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    Adam Kilgarriff
    Senior Research Fellow
    Information Technology Research Institute
    University of Brighton
    Lewes Road
    Brighton BN2 4GJ
    UK
    tel:   (44) 1273 642919
           (44) 1273 642900
    fax:   (44) 1273 642908
    email: Adam.Kilgarriff@itri.bton.ac.uk
    http://www.itri.bton.ac.uk/~Adam.Kilgarriff
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


