Many thanks, Neven, for doing all this work. The other data set that needs somehow to be presented in a more accessible way is the wonderful work of the LASLA group from Liège, based on a reading of nearly 800,000 words, carefully hand-parsed:
L. Delatte, Et. Evrard, S. Govaerts and J. Denooz, Dictionnaire fréquentiel et Index inverse de la langue latine (Liège: Laboratoire d'Analyse Statistique des Langues Anciennes, 1981). The "LASLA" list is available in .pdf form (http://promethee.philo.ulg.ac.be/LASLApdf/Dictionnairefrequentiel.pdf) but not, so far as I am aware, in a spreadsheet.
The larger question you raise below is crucial: to what purposes should we put such lists? My own view is that by far the best purpose is pedagogical. To the accomplished Latinist, knowledge of the relative frequency of, say, iam and nunc might be of some passing interest; even more so the relative frequency of words, especially synonyms, in poetry and prose (which is why both Diederich and LASLA make that distinction in their data).
But to the student at the beginning levels, knowledge of what words are most common is absolutely vital, and potentially transformative. Suddenly, vocabulary acquisition can be prioritized rationally, and the route to confident reading significantly shortened.
How, then, to put this knowledge in the brain of the student? Surely not by demanding sheer brute memorization of a list of the 1,000 most common words? The Dickinson College Commentary site has a list of the top 1,000 Latin and top 500 Greek words, crafted based on the data you present as well as LASLA, but with dictionary forms and definitions suited to the beginning and intermediate levels. Our texts then gloss all words except those on the high frequency lists. The idea is that instructors can work gradually up to the core list, then start testing with sight reading, guaranteeing to the students that non-core words will be glossed. The incentive is thus to master the core.
Here's some fuller explanation, along with our edited lists: http://dcc.dickinson.edu/vocab/vocabulary-lists
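For anyone who wants to experiment with the glossing rule described above (gloss every word that is not on the core list), here is a minimal sketch in Python. It is only an illustration, not how the DCC texts are actually prepared: the function name `words_to_gloss`, the tiny `core` set, and the sample passage are all hypothetical, and the sketch assumes the passage has already been reduced to dictionary forms (lemmas) by some other means.

```python
def words_to_gloss(lemmas, core):
    """Return the lemmas that are NOT on the core list.

    Each lemma is reported once, in order of first appearance.
    Comparison ignores case; `core` is assumed to hold lowercase lemmas.
    """
    seen = set()
    to_gloss = []
    for lemma in lemmas:
        key = lemma.lower()
        if key not in core and key not in seen:
            seen.add(key)
            to_gloss.append(lemma)
    return to_gloss


# Hypothetical miniature core list (a real one would have about 1,000 entries).
core = {"sum", "in", "et", "omnis", "pars", "tres"}

# Assumed hand-made lemmatization of "Gallia est omnis divisa in partes tres".
passage = ["Gallia", "sum", "omnis", "divido", "in", "pars", "tres"]

print(words_to_gloss(passage, core))
```

With a real core list loaded from a spreadsheet, the same function would produce the gloss candidates for any lemmatized passage; the lemmatization itself is the hard part and is not addressed here.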
All the best,
--Chris Francese
Christopher Francese
Professor of Classical Studies
Dickinson College
Carlisle, PA 17013
(717) 245-1202
Date: Wed, 24 Oct 2012 18:58:25 +0200
From: "Neven Jovanović"
Subject: Three lists of Latin words data
Dear colleagues,
through my Google Drive you can access three Google Fusion Tables
containing data on Latin vocabulary. Two of them deal with word frequency
(based on G. Lodge and P. B. Diederich); the third also has data on
meaning and usage (compiled by the late William Whitaker).
The addresses are:
Lodge -- Diederich (as compiled and published by J. H. Dee, cf. his article
<http://www.jstor.org/stable/3298278>)
<https://www.google.com/fusiontables/DataSource?docid=1VKDg7mVXn1EU00rPxPI-Bct3Yx4katvbmWNU5UI>
(5,498 records; Dee's page can be found through the Wayback Machine)
Diederich (as compiled on Hiberna Caroli Raetici,
<http://hiberna-cr.wikidot.com/>):
<https://www.google.com/fusiontables/DataSource?docid=1kcrVuaRV3xYoLvvT7qjHja6LD7s-RWclkVBOe1U>
(4,173 records)
Vocabulary of WORDS by William Whitaker
(<http://archives.nd.edu/whitaker/words.htm>):
<https://www.google.com/fusiontables/DataSource?docid=1cndzxWS7ynD7eroKhMnVAdQT3SLVC4ZcSvTpKUw>
(39,225 records)
Having found the data on the internet and reformatted it, I share with you
the addresses for research purposes. Don't know exactly what these
purposes could be -- but an important step is to have something around for
experimentation.
Google Fusion Tables also have an API
(<https://developers.google.com/fusiontables/>). As dealing with APIs
surpasses my abilities, it would be wonderful if somebody felt the urge to
create some kind of interface to these Latin tables, and share the
interface with us (i. e. with me).
Also, if I'm doing what has already been done several times over, please
do inform me.
Best,
Neven
Neven Jovanovic
Department of Classical Philology
Faculty of Humanities and Social Sciences
University of Zagreb
Hrvatska / Croatia
------------------------------
End of DIGITALCLASSICIST Digest - 23 Oct 2012 to 24 Oct 2012 (#2012-129)
************************************************************************