-----Original Message-----
From: Mark Davies [mailto:[log in to unmask]]
Sent: 19 February 2003 18:56
To: [log in to unmask]
Subject: [Corpus del español] 100 million words; wide range of searches

A new version of the 100 million word [Corpus del Español] is now online.  This corpus has been created by Mark Davies of Illinois State University (with funding from the National Endowment for the Humanities), and is available for free access and use at http://www.corpusdelespanol.org/.

This searchable collection of more than 10,000 texts from the 1200s-1900s allows a wider range of searches than any other corpus of Spanish.  Users can search by:

--   synonyms [30,000 word sets]: e.g. what are the most common synonyms of [inteligente] or [rico]
--   collocations [what words occur most with others]: e.g. the most common adjectives with [cara], the most common nouns that occur after [suave], or the most common verbs with [chistes]
--   frequency: e.g. what new verbs have arisen since the 1800s, or what synonyms of [roto] are more common in written than in spoken Spanish
--   grammatical category: e.g. the most common infinitives occurring after [imposible de], or the most common adjectives after [noche]
--   lemma [word forms]: e.g. the frequency of all of the forms of [decir] in the 1200s, 1500s, or 1900s
--   word patterns: e.g. words ending in [-azo], or with [-camin-] anywhere in the word
--   user-defined lists: create your own lists (e.g. words related to emotions or clothing), and then re-use them in subsequent searches
--   any combination of any of the previous searches (example: all forms of all synonyms of [decir], followed by all forms of all synonyms of [chiste])

Please feel free to pass along this information to any other teachers or students who you think might be interested.
 
------------------------------------------------------------------------
 
P.S. In addition to the preceding information, I might make a note or two for the benefit of those on MEDIBER, who may be interested in using the corpus to investigate Old Spanish.  As explained at the site, there are tens of thousands of distinct words from older stages of Spanish that are annotated (part-of-speech and lemma), but the annotation is NOT complete for these older stages (obvious reason: one developer, 600,000+ distinct forms that are not in a Modern Spanish lexicon).  Also, there are a handful of texts from the 1200s-1400s (from the 20 million words for this period) that have unfortunately been modernized and which may slightly skew the results, but these will eventually be dropped from the corpus.  In spite of these limitations, the expectation is that researchers will still find the [Corpus del español] to be useful for a wide range of searches that cannot presently be carried out with any other online corpus of historical Spanish.
 
=======================================
Mark Davies, Associate Professor, Spanish Linguistics
http://mdavies.for.ilstu.edu/
4300 Foreign Languages / Illinois State University
Normal, IL 61790-4300
309-438-7975 (voice) / 309-438-8038 (fax)
** Historical and dialectal Spanish and Portuguese syntax **
** Web-database scripting, design, and integration **
** Distance education ** Corpus design and use **
=======================================
 

