
Dear Peter, dear all,


  1.  No, there is no such distinction. B ("Base") is the collation of the three GGG dictionaries (Georges, Glare and Gradenwitz), with the OLD as the main reference when they conflict;
  2.  See the value of the s_omo column in the "lessario" table: homographs are marked there. There is no connection with the corresponding entries in the dictionaries, because (a) homographs in Lemlat are fully identical from a morphological point of view, so there is nothing to distinguish them, and (b) Lemlat is not yet explicitly linked with lexical entries. I say "yet" because this is one of the objectives of the ERC Consolidator Grant of which I am currently Principal Investigator. We will build a Knowledge Base driven by Linked Open Data, whose components are resources and tools for Latin connected via standard ontologies and a dictionary for knowledge description. When this is ready, there will be a connection between the entries of Lemlat and those in various lexical resources... I hope we can make it!

All best,

Marco
________________________________
From: Peter Heslin <[log in to unmask]>
Sent: Monday, 3 September 2018 20:23:00
To: Passarotti Marco Carlo
Cc: The Digital Classicist List
Subject: Re: [DIGITALCLASSICIST] Announcing the extension of Lemlat with Du Cange Glossary

Thanks!  That's excellent news, on all three points.  I have a couple of quick follow-up questions, both of which have to do with linking back to the lexicon entries from the output of the morphological analysis tool. (If you have ever used Diogenes, you will know the functionality I am thinking of.)

  1.  Within the "base" lexicon, is it possible to distinguish between lemmas that come from the OLD and those that come from the other two older lexica?
  2.  When a word has multiple separate entries in a lexicon and those are numbered, do you preserve the number with the lemma, so that one can go directly back to the correct entry?

 P.

On Mon, 3 Sep 2018 at 17:29, Passarotti Marco Carlo <[log in to unmask]> wrote:

Dear Peter, dear all,


thank you for your interest in Lemlat.

Here are my answers to your questions:

  1.  We are currently working on making the source code available. More news (hopefully) soon;
  2.  This is a request that other scholars have also raised over the years. Your interpretation is correct: Lemlat strictly reflects its lexicographic sources. If a lemma is reported by more than one lexicon, it is recorded once per lexicon in Lemlat, too. For instance, the same lemma sometimes has several different entries in the Du Cange glossary, and these are all recorded in Lemlat. That said, we understand that, although this can be informative, it can also be confusing. So we are planning to give users the option to skip duplicate analyses, more or less following the criterion you suggested;
  3.  Not yet. We simply had not thought about it, because one can get this information by crossing the CSV output with the table named "lemmario" in the lemlat_db database (which has a column named "src" giving the label of the source), using the value in the n_id column (also reported in the CSV output). Another way to select only the lemmas that come from one specific source is the form of the n_ids: n_ids whose first character is a lowercase letter are from the base lexicon (GGG), those whose first character is a digit are from the Onomasticon, and those whose first character is an uppercase D are from Du Cange. I understand that this is quite tricky and your comment is right. So, in the coming days, we will also include the source label in the CSV and XML output files.
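For readers who want to try the two lookups described in point 3 today, here is a minimal Python sketch. It assumes the CSV output has a header row with a column named "n_id" (the header name, the sample rows and the "lemmario_src" mapping are assumptions for illustration, not documented Lemlat output). A {n_id: src} mapping exported from the "lemmario" table is used when supplied; otherwise the sketch falls back on the first-character rule.

```python
import csv
import io
from typing import Dict, List, Optional


def source_from_n_id(n_id: str) -> str:
    """Infer the lexical source from the first character of an n_id."""
    if not n_id:
        raise ValueError("empty n_id")
    first = n_id[0]
    if first == "D":          # uppercase D: Du Cange glossary
        return "Du Cange"
    if first.isdigit():       # leading digit: Forcellini's Onomasticon
        return "Onomasticon"
    if first.islower():       # lowercase letter: base lexicon (GGG)
        return "GGG"
    return "unknown"          # not covered by the convention


def annotate(csv_text: str,
             lemmario_src: Optional[Dict[str, str]] = None) -> List[dict]:
    """Add a 'source' field to every row of a Lemlat-style CSV export.

    lemmario_src is a hypothetical {n_id: src} mapping exported from the
    "lemmario" table; when it has an entry for a row, it wins over the
    first-character fallback.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        n_id = row["n_id"]  # assumed column header
        if lemmario_src and n_id in lemmario_src:
            row["source"] = lemmario_src[n_id]
        else:
            row["source"] = source_from_n_id(n_id)
        rows.append(row)
    return rows
```

For example, annotate("form,n_id\nf1,a1\nf2,D1\nf3,91\n") labels the three rows "GGG", "Du Cange" and "Onomasticon". The database lookup is preferred because it reads the "src" label directly rather than relying on an identifier convention.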

All best,

Marco
________________________________
From: Peter Heslin <[log in to unmask]>
Sent: Monday, 3 September 2018 17:43:08
To: The Digital Classicist List
Cc: Passarotti Marco Carlo
Subject: Re: [DIGITALCLASSICIST] Announcing the extension of Lemlat with Du Cange Glossary

Dear Marco,

I played around with Lemlat a while ago and I was very impressed.  Thank you for distributing the code and databases under open access licenses.  I have a few questions that might be of general interest:

  1.  The GitHub repository contains the database and binaries of the command-line tool for Linux, Mac and Windows, but no source code. Is the source available somewhere?
  2.  When you use "all lexical bases", which is the default option, there are lots of duplicate analyses.  I think this may be because Lemlat does not do any pruning of the results when duplicate lemmata are found in multiple lexica.  Is that right?  If so, is that a feature you plan on adding?  It would be nice to be able to say: "give me all the lemmata in the OLD and only those in Du Cange which are not in the OLD".
  3.  Is there in the output any indication of which lexicon a given lemma was found in, to enable the user to track back to the relevant entry for that word?
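The pruning asked for in point 2 ("all the lemmata in the OLD and only those in Du Cange which are not in the OLD") could look roughly like the Python sketch below. It is not Lemlat's planned implementation. It assumes each analysis already carries a source label and that the same spelling of a lemma identifies the same lemma across lexica; spelling variants between dictionaries would not be merged. Since the base lexicon does not separate the OLD from the other GGG dictionaries, "GGG" stands in for the OLD here, and the labels are assumed.

```python
from typing import Dict, List, NamedTuple, Tuple


class Analysis(NamedTuple):
    form: str
    lemma: str
    source: str  # assumed labels, e.g. "GGG", "Onomasticon", "Du Cange"


def prune_duplicates(analyses: List[Analysis],
                     priority: List[str]) -> List[Analysis]:
    """Keep, for each (form, lemma) pair, only the analyses coming from the
    highest-priority source that supplies it. Several entries from that
    same source (e.g. multiple Du Cange entries) are all kept, and the
    original order of the analyses is preserved."""
    rank = {src: i for i, src in enumerate(priority)}
    unranked = len(priority)  # unknown sources sort after all listed ones

    # First pass: best (lowest) rank seen for each (form, lemma) pair.
    best: Dict[Tuple[str, str], int] = {}
    for a in analyses:
        key = (a.form, a.lemma)
        r = rank.get(a.source, unranked)
        best[key] = min(r, best.get(key, r))

    # Second pass: keep only analyses from the best-ranked source.
    return [a for a in analyses
            if rank.get(a.source, unranked) == best[(a.form, a.lemma)]]
```

With priority ["GGG", "Onomasticon", "Du Cange"], a Du Cange analysis survives only when no GGG or Onomasticon analysis shares its form and lemma. A two-pass filter was chosen over sorting so that the order of the output stays the same as the order of the analyses.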

Thanks again for making this fantastic resource available.

Peter

On Mon, 3 Sep 2018 at 15:33, Passarotti Marco Carlo <[log in to unmask]> wrote:

Dear Members of the List,

We are proud to announce that the lexical basis of Lemlat has recently been extended with the Du Cange Glossary.

Lemlat is a morphological analyser and lemmatiser for Latin with a large lexical basis that includes:
- the collation of three Latin dictionaries (Georges and Georges, 1913-1918; Glare, 1982; Gradenwitz, 1904): 43,432 lemmas [also including relations between lemmas based on derivational morphology];
- Onomasticon by Forcellini (1940): 26,250 lemmas;
- Glossarium Mediae et Infimae Latinitatis by Du Cange (1883-1887): 82,556 lemmas.

Enlarging the lexical basis of Lemlat with the Du Cange Glossary significantly increases its coverage of a wide span of Latin texts from different eras.

Information about Lemlat can be found at www.lemlat3.eu.
The database and binaries of Lemlat are available at https://github.com/CIRCSE/LEMLAT3

Enjoy Lemlat!
All best,

Prof. Marco C. Passarotti
Computational Linguistics
Index Thomisticus Treebank https://itreebank.marginalia.it/
ERC Grantee, P.I. LiLa
CIRCSE Research Centre (https://centridiricerca.unicatt.it/circse_index.html)
***********************************************************
Università Cattolica del Sacro Cuore
Largo Gemelli, 1
20123 Milan, Italy
[log in to unmask]
tel. +39-02-72342380


Donate your 5 per mille to the Università Cattolica
CF 02133120150
www.unicatt.it/5permille


________________________________

To unsubscribe from the DIGITALCLASSICIST list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=DIGITALCLASSICIST&A=1


[http://Static.unicatt.it/layout/img/layout/5x1000.gif]
Destina il tuo 5 per mille all’Università Cattolica
CF 02133120150
www.unicatt.it/5permille<http://www.unicatt.it/5permille/>



[http://Static.unicatt.it/layout/img/layout/5x1000.gif]
Destina il tuo 5 per mille all’Università Cattolica
CF 02133120150
www.unicatt.it/5permille<http://www.unicatt.it/5permille/>


########################################################################

To unsubscribe from the DIGITALCLASSICIST list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=DIGITALCLASSICIST&A=1