RE: [Corpora-List] rated words

From: Ute Römer (ute.roemer@anglistik.uni-hannover.de)
Date: Sun Apr 17 2005 - 17:55:25 MET DST

    Dear all,

    Here is the article Peet has just mentioned (available at
    http://www.newscientist.com/article.ns?id=dn7210&feedId=online-news_rss20).
    This Sentiment software sounds interesting --- though it's perhaps not
    'safe' for linguistic research.

    Best wishes... Ute

    __________________________________

    Software agents give out PR advice
    10:00 02 April 2005
    Exclusive from New Scientist Print Edition
    Duncan Graham-Rowe
    Governments and big business like to indulge in media spin, and that means
    knowing what is being said about them. But finding out is becoming ever more
    difficult, with thousands of news outlets, websites and blogs to monitor.

    Now a British company is about to launch a software program that can
    automatically gauge the tone of any electronic document. It can tell whether
    a newspaper article is reporting a political party’s policy in a positive or
    negative light, for instance, or whether an online review is praising a
    product or damning it. Welcome to the automation of PR.

    Till now, discovering whether the coverage you are getting is positive,
    negative or neutral has usually meant hiring a “reputation management” firm.
    Teams of people employed by the company will read through everything written
    about a chosen organisation, person, event or issue and report back on how
    favourable it is.

    As well as being expensive, this can be a long, slow process, says Nick
    Jacobi, director of research for the Corpora Software company in Surrey, UK.
    “There’s a massive information overload.” A single news agency may churn out
    more than eight articles each hour. That is almost 200 stories a day per
    news outlet.

    Machine learning
    Previous attempts to automate this kind of analysis have used one of two
    techniques. In the first, called machine learning, a program is trained by
    being given thousands of articles already determined by a human reader to be
    positive or negative in tone.

    But learning in this way can lead to mistakes. For example, if a series of
    the training articles mentions bomb attacks on a mosque in Iraq, the program
    may incorrectly conclude that all other mentions of mosques are negative
    too.
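
    The failure described above can be reproduced with a deliberately naive
    word-count classifier. This is only a sketch: the training sentences, the
    labels and the function names are invented for illustration and have
    nothing to do with any real system mentioned in the article.

```python
from collections import Counter

# Hypothetical, hand-made training set: (text, human-assigned label).
TRAINING = [
    ("bomb attack on mosque kills dozens", "neg"),
    ("second bomb attack damages mosque", "neg"),
    ("mosque hit in bomb attack", "neg"),
    ("festival praised as great success", "pos"),
    ("new school praised by parents", "pos"),
]

def train(examples):
    """Count how often each word appears under each label."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def classify(text, counts):
    """Label a text by the class in which its words were seen more often."""
    words = text.lower().split()
    pos = sum(counts["pos"][w] for w in words)
    neg = sum(counts["neg"][w] for w in words)
    if pos == neg:
        return "neutral"
    return "pos" if pos > neg else "neg"

counts = train(TRAINING)
# "mosque" only ever occurred in negative training texts, so a harmless
# sentence that mentions a mosque is pulled towards the negative class.
print(classify("mosque opens to visitors", counts))  # neg
```

    Real machine-learning classifiers are far more sophisticated, but they can
    pick up the same kind of accidental association from skewed training data.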

    The alternative is the lexicon approach, in which certain words are
    classified as either positive or negative. But plenty of words can be both.
    “The plot was unpredictable” and “the steering was unpredictable” differ by
    just one word. Yet the word “unpredictable” has a positive connotation in
    the first example and a negative meaning in the second.

    And even if that problem is solved, just picking up on positive or negative
    words can also lead to mistakes, as is demonstrated by the sentence:
    “Everyone told me it was terrible, that I would hate it, but in the end it
    wasn’t at all bad”.
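
    Both weaknesses of the lexicon approach show up in a few lines of code.
    The sketch below assumes a tiny, made-up lexicon and a scoring function
    (lexicon_score) invented for this example; it is not how any particular
    product works.

```python
# Hypothetical mini-lexicon: word -> polarity (+1 positive, -1 negative).
LEXICON = {
    "terrible": -1,
    "hate": -1,
    "bad": -1,
    "unpredictable": -1,
    "good": 1,
}

def lexicon_score(text):
    """Sum the polarity of every known word, ignoring grammar and context."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    return sum(LEXICON.get(w, 0) for w in words)

# The same word gets the same score whether it praises a plot
# or criticises a car's steering.
print(lexicon_score("The plot was unpredictable"))      # -1
print(lexicon_score("The steering was unpredictable"))  # -1

# A mildly favourable sentence is scored as strongly negative, because the
# reported opinions and the negation ("wasn't ... bad") are not understood.
review = ("Everyone told me it was terrible, that I would hate it, "
          "but in the end it wasn't at all bad")
print(lexicon_score(review))  # -3
```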

    So Corpora has come up with a program called Sentiment, which uses
    algorithms to tease out grammatical components, such as nouns, verbs and
    adjectives, and identify the subjects and objects of verbs. It can even
    analyse pronouns like “it”, “he” and “her” to work out what words or
    concepts they are referring to.

    Having an understanding of grammatical structure makes it possible to filter
    out words that are not relevant to the sentiment of the article, Jacobi
    says. So instead of assuming that certain words, such as “unpredictable” or
    “rubbish”, are positive or negative, it allows the structural context to
    disambiguate them.

    Expert readers
    It does not get it right all the time, Jacobi admits, but then neither do
    people. Three expert readers are likely to agree about an article 85% of the
    time, and about 90% of non-experts will agree with this consensus. The
    Sentiment software agrees with the same expert consensus about 80% of the
    time.
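
    The article does not say how these agreement figures were computed. One
    plausible reading, sketched here with invented labels, is to take the
    majority label of the expert readers as the consensus and count how often
    another set of labels matches it. The function names and the data are
    assumptions for illustration only.

```python
from collections import Counter

def consensus(labels):
    """Return the majority label among readers, or None if there is none."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes > len(labels) / 2 else None

def agreement_rate(predicted, reference):
    """Share of items where predicted equals reference.

    Items with no reference label (no expert majority) are skipped.
    """
    pairs = [(p, r) for p, r in zip(predicted, reference) if r is not None]
    return sum(p == r for p, r in pairs) / len(pairs)

# Made-up labels from three readers for five articles.
experts = [
    ("pos", "pos", "pos"),
    ("neg", "neg", "neu"),
    ("neu", "neu", "neu"),
    ("pos", "neg", "neu"),  # no majority, so no consensus label
    ("neg", "neg", "neg"),
]
# Made-up labels from an automatic classifier for the same articles.
software = ["pos", "neg", "pos", "neg", "neg"]

reference = [consensus(row) for row in experts]
print(agreement_rate(software, reference))  # 0.75
```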

    Sentiment was developed principally for Infonic, one of Corpora Software’s
    subsidiary companies, which provides clients with online media analysis of
    websites, chat rooms, bulletin boards and blogs. The company also hopes to
    use it to analyse the news for its clients.

    Sentiment will not take the humans out of the equation, says Orlando Plunket
    Greene of Infonic, because someone is still going to have to evaluate the
    software’s report on each article. But because the program will list items
    in terms of how positive, negative or neutral they are, it is possible to
    skip to the most relevant items.
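
    As a rough illustration of that kind of prioritising, the sketch below
    assumes each article has already been given a score between -1 (very
    negative) and +1 (very positive). The score range, the titles and the
    prioritise function are hypothetical; the article does not describe the
    actual report format.

```python
def prioritise(items):
    """Sort (title, score) pairs so the most strongly toned come first."""
    return sorted(items, key=lambda item: abs(item[1]), reverse=True)

# Made-up report entries: (title, sentiment score in [-1, 1]).
reports = [
    ("Quarterly results", 0.1),
    ("Product recall", -0.9),
    ("Award win", 0.7),
]

for title, score in prioritise(reports):
    print(f"{score:+.1f}  {title}")
```

    A human reader could then start at the top of the list and leave the
    near-neutral items for last.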

    “It will allow us to prioritise, and do the job much faster,” he says. While
    a person might be able to scan 10 articles an hour, Sentiment can zip
    through 10 a second.
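
    The volume and speed figures quoted in the article are easy to check with
    a little arithmetic. The variable names below are only for illustration.

```python
# Figures as quoted in the article.
articles_per_hour_per_agency = 8
human_articles_per_hour = 10
software_articles_per_second = 10

# Eight articles an hour, around the clock, is "almost 200" a day.
articles_per_day = articles_per_hour_per_agency * 24
print(articles_per_day)  # 192

# Ten articles a second, expressed per hour, and compared with a human reader.
software_articles_per_hour = software_articles_per_second * 60 * 60
speedup = software_articles_per_hour / human_articles_per_hour
print(software_articles_per_hour)  # 36000
print(speedup)                     # 3600.0
```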

    What makes this kind of analysis so challenging is that key words in a text
    often offer no clues as to what sentiment they carry. Some of the toughest
    challenges to comprehension, such as identifying irony and rhetoric, are
    likely to remain unsolved for some time.

    Related Articles
    Software learns to translate by reading up
    http://www.newscientist.com/article.ns?id=dn7054
    22 February 2005
    Machine learns games 'like a human'
    http://www.newscientist.com/article.ns?id=dn6914
    24 January 2005
    Voicemail software recognises callers' emotions
    http://www.newscientist.com/article.ns?id=dn6845
    11 January 2005
    Weblinks
    Corpora Software
    http://www.corporasoftware.com/
    Infonic
    http://www.infonic.com/cgi/index.php4

    > -----Original Message-----
    > From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
    > Behalf Of peetm
    > Sent: Sunday, April 17, 2005 4:40 PM
    > To: 'Stephan Gillmeier'; corpora@uib.no
    > Subject: RE: [Corpora-List] rated words
    >
    > There's an interesting and semi-related article about this kind of thing
    > in the 2nd April New Scientist: Software agents give out PR advice.
    >
    > If you've a subscription, you can find the full-text on New Scientist's
    > website of course.
    >
    > The article mentions a company, Corpora Software
    > (http://www.corporasoftware.com/Sentiment.htm)
    >
    > peetm
    >