Stopword lists are really only justifiable by specific research interests. The default should always be no stopwords, with users able to apply any stopword list they wish, as long as that option is readily discoverable. I'm not sure there is any point in trying to develop a reasonable consensus about stopword lists, since research interests vary so greatly and unpredictably. With any modern inverted-index full-text database, speed should not be a consideration, even across all of digitized classical literature.



On Fri, Jan 26, 2018 at 11:57 AM, Aurélien Berra wrote:

I'm not sure I see your point here, Maurizio. We probably agree that there is no ideal stoplist. The lists should be corpus-based, implementing a statistical threshold (with or without a shared static core), and iterative, in relation to successive interests. Obviously, in an environment where the user cannot choose or update the stoplist, the default list can be designed in various ways. And techniques like phrase search introduce other approaches.
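
As a rough illustration of what a corpus-based list with a statistical threshold and an optional static core might look like, here is a minimal Python sketch. Everything in it is an assumption made for the example, not an established method: the function name `build_stoplist`, the use of document frequency as the statistic, whitespace tokenization, and the toy Latin snippets standing in for a corpus.

```python
from collections import Counter


def build_stoplist(documents, df_threshold=0.5, static_core=frozenset()):
    """Return the words found in at least `df_threshold` (a fraction
    between 0 and 1) of the documents, plus an optional static core.

    Hypothetical helper for illustration only; tokenization is a naive
    lowercase whitespace split.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for text in documents:
        # Count each word once per document (document frequency).
        doc_freq.update(set(text.lower().split()))
    frequent = {w for w, df in doc_freq.items() if df / n_docs >= df_threshold}
    return frequent | set(static_core)


# Toy corpus (assumed data, for demonstration only).
corpus = [
    "et arma et virum cano",
    "et terris iactatus et alto",
    "multa quoque et bello passus",
]

# Corpus-derived list only: words present in every document.
derived = build_stoplist(corpus, df_threshold=1.0)

# Same threshold, extended with a small shared static core.
with_core = build_stoplist(corpus, df_threshold=1.0, static_core={"in"})
```

The iterative part would simply mean calling the function again with a different threshold or core as the research question changes, rather than fixing one list in advance.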

Kind regards,

Aurélien


On 26 Jan 2018, at 14:21, maurizio lana wrote:

So my next question arises: can one practically define or identify the set of stopwords for one's own text(s)?