Hello,


I would say that in certain cases, stop word lists for Latin and Greek can be useful. I work on the Res Gestae Divi Augusti and I miss them too. I am preparing digital editions of the text, from Mommsen to Scheid, focusing on the primary sources and seeking to bring out its historical features.

For a historical approach to this text, and to many others, it can be useful to easily get rid of all the "little words" such as "et", "ut" and many others, before running an automatic analysis of the content or building some other, more complex representation.
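
As a rough illustration of that filtering step, here is a minimal Python sketch. The stop word set is only a small hand-picked sample, not an established list, and the names LATIN_STOP_WORDS and content_words are hypothetical; a real list would have to be compiled and checked against the corpus.

```python
import re
from collections import Counter

# Illustrative only: a tiny hand-picked sample, NOT an established Latin stop word list.
LATIN_STOP_WORDS = {"et", "ut", "in", "ad", "cum", "non", "sed"}


def content_words(text, stop_words=LATIN_STOP_WORDS):
    """Lowercase the text, split it into alphabetic tokens, and drop the stop words."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [token for token in tokens if token not in stop_words]


# A short Latin sample phrase; "et" is filtered out before counting.
sample = "Annos undeviginti natus exercitum privato consilio et privata impensa comparavi"
frequencies = Counter(content_words(sample))
```

Keeping the unfiltered text alongside the filtered tokens means that a later search for a specific use of "et" is still possible.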


Marion Lamé

PhD student,

University of Aix-en-Provence, France

University of Bologna, Italy


----- Original Message ----
From: Hugh Cayless <[log in to unmask]>
To: [log in to unmask]
Sent: Monday, 25 August 2008, 4:11:13
Subject: Re: [DIGITALCLASSICIST] Stopwords for Latin?

I don't know of one, and I wonder whether anyone's ever seen a need 
for one.  Stopwords can help as a sort of performance optimization in 
search engines with a restricted set of use cases, but once you get 
beyond a certain domain limit, they just aren't useful (you can search 
for 'a' on Google for an example of what I mean).  Philologists are 
often very interested in words that might get dropped by a stopword 
list.  I might want to find particular uses of 'et' for example, and 
be very irritated if the results told me I couldn't.

I've implemented search engines a few times now and honestly never had 
a use for stopwords in the end for any of them.  I sort of don't 
believe in them anymore...so my question would be: what's the use 
case, and do you really need one?

Hope this helps,
Hugh

On Aug 24, 2008, at 6:28 PM, Neven Jovanović wrote:

> Hello,
>
> does anybody know where one could look for a list of stop words for Latin?
> I have seen an English stop words list on Perseus
> (http://www.perseus.tufts.edu/Texts/engstop.html), but have not been able
> to find anything similar for Latin.  Yes, the Dartmouth Dante
> (http://dante.dartmouth.edu/help.php) mentions a "stopword list" for Latin,
> but does not make it available.
>
> It seems that such a list is something that always gets compiled from
> scratch.  Perhaps a version of it, made freely available, could be a
> welcome contribution to the Digital Classicist wiki.
>
> (Not to mention a Greek stop word list...)
>
> Yours,
>
> Neven Jovanovic
>
> Zagreb, Hrvatska / Croatia

