Hi Richard


We've done some full-text information extraction work on archaeology grey literature, generating SKOS and CIDOC CRM expressions of the annotations produced. Andreas Vlachidis's (now at UCL) PhD research fed into our STAR project. The pipeline is available and some key publications are listed below.


We also investigated the extension of this work to other languages as part of the FP7 ARIADNE project, on a broad theme relating to wooden material. An exploratory case study investigated semantic integration of extracts from archaeological datasets with information extracted via NLP across different languages, expressing the outcome as linked data (based on CIDOC CRM and Getty AAT). The main focus was to explore the technical feasibility of the semantic integration (the NLP pipelines are experimental prototypes). A working web application prototype is available via http://ariadne-lod.isti.cnr.it/description.html - queries concern wooden objects (e.g. samples of beech wood keels), optionally from a given date range, with automatic expansion over AAT hierarchies of wood types and some associative relationships.



Vlachidis A, Tudhope D. 2015. A knowledge-based approach to Information Extraction for semantic interoperability in the archaeology domain. Journal of the Association for Information Science and Technology, 67(5), pp. 1138-1152, Wiley. https://doi.org/10.1002/asi.23485 (evaluation of outcomes)


Vlachidis A, Tudhope D. 2015. Negation detection and word sense disambiguation in digital archaeology reports for the purposes of semantic annotation. Program: electronic library and information systems, 49(2), pp. 118-134, Emerald.

https://doi.org/10.1108/PROG-10-2014-0076 (currently freely available; negation detection)


See also Andreas's PhD work portal: http://andronikos.co.uk/







Douglas Tudhope

Professor, Faculty of Computing, Engineering and Science

University of South Wales

Pontypridd CF37 1DL

Wales, UK


Tel +44 (0) 1443-483609



Director: Computing and Mathematics Research Institute

Editor: The New Review of Hypermedia and Multimedia


From: Museums Computer Group on behalf of Richard Light
Sent: 08 September 2017 10:39
Subject: [MCG] Automatic information extraction for cultural heritage


Can anyone point me to working examples where information is automatically extracted (or enhanced) from cultural heritage-related resources?  This might be feature recognition in images; picking out named entities from full text; converting string-value structured data to Linked Data URLs; voice recognition for audio/video; etc.



Richard Light
