I need a simple POS tagger (preferably freeware) for a modest corpus of
contemporary American poetry. The total corpus is about 1,500,000 words,
but the samples are mostly under 100,000 words, and I would be happy with
a program that could handle only much smaller samples, say 10,000 words.
I am mainly interested in noun and verb statistics, and do not need to
process the tags further or to use the tagged text in any other way.
Basically, I want to determine the percentage of the text tokens that
are in the various word classes.
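
As a rough illustration of that calculation, here is a minimal Python
sketch. It assumes the tagger writes plain text in which every token
looks like word/TAG, separated by whitespace. That output format, the
function name tag_percentages, and the Penn Treebank-style tags in the
sample are assumptions for illustration only, not the output of any
particular tagger.

```python
from collections import Counter


def tag_percentages(tagged_text, sep="/"):
    """Return {tag: percentage of tokens} for word<sep>TAG tokens.

    Assumes whitespace-separated tokens; tokens without a separator,
    or with an empty word or tag, are skipped.
    """
    counts = Counter()
    for token in tagged_text.split():
        # Split on the LAST separator so words containing "/" still work.
        word, found, tag = token.rpartition(sep)
        if not found or not word or not tag:
            continue
        counts[tag] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: 100.0 * n / total for tag, n in counts.items()}


# Hypothetical tagged sample, for illustration only.
sample = "The/DT dog/NN bites/VBZ the/DT man/NN"
for tag, pct in sorted(tag_percentages(sample).items()):
    print(f"{tag}: {pct:.1f}%")
```

Noun and verb totals would then come from summing the percentages of
whichever tags the chosen tagset uses for those classes.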
I'm working with a fairly robust Windows XP Professional computer, and
would prefer something that won't take a lot of extra
installation/configuration work.
I have done some research, but there are so many choices it is difficult
to know where to start.
Any favorites?
Thanks,
David Hoover
--
David L. Hoover, Director of Undergraduate Studies & Webmaster
NYU English Department, 212-998-8832
http://www.nyu.edu/gsas/dept/english/

"If you pick up a starving dog and make him prosperous, he will not
bite you. This is the principal difference between a dog and a man."
-- Mark Twain, Pudd'nhead Wilson's Calendar