-----Original Message-----
From: Mark Davies [mailto:[log in to unmask]]
Sent: 19 February 2003 18:56
To: [log in to unmask]
Subject: [Corpus del español] 100 million words; wide range of searches



A new version of the 100-million-word [Corpus del Español] is now online.
This corpus was created by Mark Davies <http://mdavies.for.ilstu.edu/>
of Illinois State University <http://www.ilstu.edu/> (with funding from the
National Endowment for the Humanities <http://www.neh.gov/>), and is
freely available for access and use at http://www.corpusdelespanol.org/.

This searchable collection of more than 10,000 texts from the 1200s-1900s
allows a wider range of searches than any other corpus of Spanish.  Users
can search by:

--   synonyms [30,000 word sets]: e.g. the most common synonyms of
[inteligente] <http://www.corpusdelespanol.org/?ex=1> or
[rico] <http://www.corpusdelespanol.org/?ex=2>
--   collocations [which words occur most often with others]: e.g. the most
common adjectives with [cara] <http://www.corpusdelespanol.org/?ex=3>, the
most common nouns that occur after [suave]
<http://www.corpusdelespanol.org/?ex=4>, or the most common verbs with
[chistes] <http://www.corpusdelespanol.org/?ex=5>
--   frequency: e.g. which new verbs <http://www.corpusdelespanol.org/?ex=6>
have arisen since the 1800s, or which synonyms of [roto]
<http://www.corpusdelespanol.org/?ex=7> are more common in written than in
spoken Spanish
--   grammatical category: e.g. the most common infinitives occurring after
[imposible de] <http://www.corpusdelespanol.org/?ex=8>, or the most common
adjectives after [noche] <http://www.corpusdelespanol.org/?ex=9>
--   lemma [word forms]: e.g. the frequency of all of the forms of [decir] in
the 1200s <http://www.corpusdelespanol.org/?ex=10a>, 1500s
<http://www.corpusdelespanol.org/?ex=10b>, or 1900s
<http://www.corpusdelespanol.org/?ex=10c>
--   word patterns: e.g. words ending in [-azo]
<http://www.corpusdelespanol.org/?ex=11>, or words with [-camin-]
<http://www.corpusdelespanol.org/?ex=12> anywhere in the word
--   user-defined lists: create your own lists (e.g. words related to
emotions <http://www.corpusdelespanol.org/?ex=13> or clothing
<http://www.corpusdelespanol.org/?ex=14>), and then re-use them in
subsequent searches
--   any combination of the previous searches (example
<http://www.corpusdelespanol.org/?ex=15>: all forms of all synonyms of
[decir], followed by all forms of all synonyms of [chiste])

Please feel free to pass this information along to any other teachers or
students who you think might be interested.
 
------------------------------------------------------------------------
 
P.S. In addition to the preceding information, I might make a note or two
for the benefit of those on MEDIBER, who may be interested in using the
corpus to investigate Old Spanish.  As explained at the site, there are tens
of thousands of distinct words from older stages of Spanish that are
annotated (part-of-speech and lemma), but the annotation is NOT complete for
these older stages (obvious reason: one developer, 600,000+ distinct forms
that are not in a Modern Spanish lexicon).  Also, there are a handful of
texts from the 1200s-1400s (from the 20 million words for this period) that
have unfortunately been modernized and which may slightly skew the results,
but these will eventually be dropped from the corpus.  In spite of these
limitations, the expectation is that researchers will still find the [Corpus
del Español] useful for a wide range of searches that cannot presently
be carried out with any other online corpus of historical Spanish.
 
=======================================
Mark Davies, Associate Professor, Spanish Linguistics
http://mdavies.for.ilstu.edu/
4300 Foreign Languages / Illinois State University
Normal, IL 61790-4300
309-438-7975 (voice) / 309-438-8038 (fax)
** Historical and dialectal Spanish and Portuguese syntax **
** Web-database scripting, design, and integration **
** Distance education ** Corpus design and use **
=======================================
 

