We've done some full-text information extraction work on archaeology grey literature, generating SKOS and CIDOC CRM expressions of the resulting annotations. Andreas Vlachidis's (now at UCL) PhD research fed into our STAR project. The pipeline is available, and some key publications are listed below.


We also investigated extending this work to other languages as part of the FP7 ARIADNE project, on a broad theme relating to wooden material. An exploratory case study investigated semantic integration of extracts from archaeological datasets with information extracted via NLP across different languages, expressing the outcome as linked data (based on CIDOC CRM and Getty AAT). The main focus was to explore the technical feasibility of the semantic integration (the NLP pipelines are experimental prototypes). A working web application prototype is available via http://ariadne-lod.isti.cnr.it/description.html - queries concern wooden objects (e.g. samples of beech wood keels), optionally from a given date range, with automatic expansion over AAT hierarchies of wood types and some associative relationships.
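To give a rough idea of what hierarchical query expansion means here, the following is a minimal Python sketch. It is not the ARIADNE implementation: the tiny hierarchy, the concept labels and the sample records are all made up for illustration, and real AAT concepts would be identified by URIs rather than plain strings. It only shows the general idea that a query for a broad wood type also matches records tagged with narrower types.

```python
from collections import deque

# Toy broader -> narrower hierarchy. Labels are illustrative only;
# they are NOT real Getty AAT identifiers or the real AAT structure.
NARROWER = {
    "wood": ["hardwood", "softwood"],
    "hardwood": ["beech", "oak"],
    "softwood": ["pine"],
}

# Hypothetical records: (object description, material concept).
RECORDS = [
    ("keel sample", "beech"),
    ("plank", "pine"),
    ("bowl", "ceramic"),
]


def expand(concept, narrower=NARROWER):
    """Return the concept plus all of its narrower descendants (breadth-first)."""
    seen = [concept]
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for child in narrower.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


def search(material, records=RECORDS):
    """Return descriptions of records whose material falls under the queried concept."""
    wanted = set(expand(material))
    return [name for name, mat in records if mat in wanted]


if __name__ == "__main__":
    print(expand("hardwood"))
    print(search("wood"))
```

In this sketch a query for "wood" returns both the beech keel sample and the pine plank, even though neither record mentions "wood" directly. Associative relationships and date-range filtering are left out.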



Vlachidis A, Tudhope D. 2015. A knowledge-based approach to Information Extraction for semantic interoperability in the archaeology domain. Journal of the Association for Information Science and Technology, 67(5), pp. 1138-1152, Wiley. https://doi.org/10.1002/asi.23485 (evaluation of outcomes)


Vlachidis A, Tudhope D. 2015. Negation detection and word sense disambiguation in digital archaeology reports for the purposes of semantic annotation. Program: electronic library and information systems, 49(2), pp. 118-134, Emerald.

https://doi.org/10.1108/PROG-10-2014-0076 (currently freely available; negation detection)


See also Andreas's PhD work portal: http://andronikos.co.uk/







Can anyone point me to working examples where information is automatically extracted (or enhanced) from cultural heritage-related resources?  This might be feature recognition in images; picking out named entities from full text; converting string-value structured data to Linked Data URLs; voice recognition for audio/video; etc.



