The link to Lingua::Stem should be
http://search.cpan.org/~snowhare/Lingua-Stem-0.61/
(it's currently a duplicate of the Algorithm::NaiveBayes link)
I don't think trying to create mutually exclusive categories for your articles is likely to be hugely successful. Your content is very interrelated, and the categorisations you chose might not be the most natural ones.
A better approach might be to explicitly segment the data into its "emergent categories", rather than into the categories you guess you have, and to assign labels to those categories afterwards.
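
To make that concrete, here is a rough sketch of one way to find emergent categories: cluster the articles by word overlap first, then look at each cluster's most characteristic terms to pick a label. Everything in it is my own assumption rather than anything from your setup: the k-means-style loop with cosine similarity, the helper names (`vectorize`, `cluster`, `top_terms`), the choice of `k`, and the toy documents. It is in Python using only the standard library; on real articles you would want stemming and stop-word removal first.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts, scaled to unit length."""
    counts = Counter(w for w in text.lower().split() if len(w) > 2)
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {w: c / norm for w, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(v * b.get(w, 0.0) for w, v in a.items())


def centroid(vectors):
    """Unit-length sum of the member vectors."""
    total = Counter()
    for vec in vectors:
        for w, x in vec.items():
            total[w] += x
    norm = math.sqrt(sum(x * x for x in total.values())) or 1.0
    return {w: x / norm for w, x in total.items()}


def cluster(docs, k, iterations=10):
    """Group docs into k clusters; returns (assignment, centroids)."""
    vectors = [vectorize(d) for d in docs]
    # Deterministic seeding: start from doc 0, then repeatedly add the
    # doc least similar to the seeds chosen so far.
    seeds = [0]
    while len(seeds) < k:
        seeds.append(min(
            range(len(vectors)),
            key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds),
        ))
    centroids = [vectors[s] for s in seeds]
    assignment = []
    for _ in range(iterations):
        # Assign each doc to its most similar centroid.
        assignment = [
            max(range(k), key=lambda c: cosine(vec, centroids[c]))
            for vec in vectors
        ]
        # Recompute each centroid from its current members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = centroid(members)
    return assignment, centroids


def top_terms(centroid_vec, n=3):
    """Highest-weighted terms of a cluster: candidate labels."""
    ranked = sorted(centroid_vec.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:n]]


# Toy stand-ins for real articles.
docs = [
    "perl module stemming words perl",
    "perl stemming module install",
    "football match goal score",
    "football goal team score match",
]
assignment, centroids = cluster(docs, k=2)
for c, vec in enumerate(centroids):
    print(c, top_terms(vec))
```

The labels come last: you read the top terms for each cluster and name it after the fact, instead of deciding the names up front and forcing articles into them.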