
Dear Aurélien,

The resources you have pointed to are a good starting point.  But these
stop-word lists presume that you are processing unlemmatized Latin, which I
personally find to be an approach of limited interest. If you are
generating usage statistics on lemmatized Latin, you obviously need to add
common words that appear in many inflected forms.  The lemmata I have found
necessary to add to the public lists you mention are these:

sum, possum, facio, do, dico, video, fero, meus, tuus, suus, res,
ille, hic, ipse, qui, quis, venio, habeo, omnis, voco, inquam

I generated that list when looking at frequencies in a small subset of
Latin epic, so YMMV.
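As a rough illustration, here is a minimal Python sketch of how these lemmata might be merged with a base list when counting lemma frequencies. It assumes the text has already been lemmatized by some external tool, giving one lowercase lemma string per token. BASE_STOPWORDS is a hypothetical few-word stand-in for one of the public lists, and lemma_frequencies is an illustrative helper, not part of any existing library.

```python
from collections import Counter

# Hypothetical stand-in for a public (unlemmatized) stop-word list;
# only a handful of entries are shown for illustration.
BASE_STOPWORDS = {"et", "in", "est", "non", "ad", "cum", "sed", "ut"}

# Extra lemmata needed once the text is lemmatized (the list above).
LEMMA_ADDITIONS = {
    "sum", "possum", "facio", "do", "dico", "video", "fero", "meus",
    "tuus", "suus", "res", "ille", "hic", "ipse", "qui", "quis",
    "venio", "habeo", "omnis", "voco", "inquam",
}

# Union of both sets: the combined stop list applied to lemmata.
STOPWORDS = BASE_STOPWORDS | LEMMA_ADDITIONS


def lemma_frequencies(lemmata, stopwords=STOPWORDS):
    """Count lemmata, skipping any found in the stop-word set.

    `lemmata` is assumed to be a list of lemma strings produced by an
    external lemmatizer (not shown here).
    """
    return Counter(
        lemma for lemma in lemmata if lemma.lower() not in stopwords
    )


if __name__ == "__main__":
    # Toy input: a short sequence of already-lemmatized tokens.
    sample = ["arma", "vir", "cano", "qui", "venio", "sum", "arma", "et", "ille"]
    print(lemma_frequencies(sample).most_common(3))
```

Keeping the additions as a separate set makes it easy to swap in a different base list, or to extend the additions after inspecting frequencies in a new corpus.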

Best,

Peter

On 14 October 2017 at 15:31, Aurélien Berra <[log in to unmask]>
wrote:

> Dear all,
>
> When I became interested in stopwords a few years ago, I used and updated
> the lists on the Digital Classicist wiki page. I am now trying to suggest
> reasonable lists to be implemented in Voyant Tools. About a week ago, I
> opened an issue to "Add default stopwords for Greek and Latin". In the
> process I compared available lists (Perseus, CLTK and others) and tried to
> work out the principles on which such a non-specialised list should be based,
> although I am aware this is part of a broader discussion about the
> flexible, iterative use of stopwords in research.
>
> The discussion can be found here:
> https://github.com/sgsinclair/Voyant/issues/382
> https://github.com/aurelberra/stopwords/blob/master/elements_for_discussion.md
>
> I would be grateful for comments and advice.
>
> Best wishes,
>
> Aurélien
>