Hi Richard

From a technical point of view, you might be interested in Europeana's media file checker, which extracts metadata (size, quality, duration, colour palette etc.) for both images and time-based media:
- http://www.preforma-project.eu/media-file-checker.html
- http://labs.europeana.eu/api/media-search

I've been playing around with the Google Vision API, especially face detection, on a specific dataset of portraits in the IWM collections, with some success. You can see example results at (this was never intended for a public audience and is just a demo of some queries I tried on the results).

A few practical examples of where we've been able to tease out interesting sets of images from this otherwise relatively impenetrable set of over 30,000 images, representing 16,000 objects:
- detection of multiple faces shows group portraits, which are relatively rare and often interesting (eg brothers who died)
- finding those wearing hats is useful for identifying military portraits
- we managed to identify a single image that we had a derivative print of but with no metadata; this was made possible by extracting a small set which had similar face size and positioning
- each image typically has a scan of the mounted portrait, one of the reverse, and a digital crop just showing the portrait, but these appear in random order on the record. Using face detection to find the 'largest face' and also the scan with no face allowed us to create metadata to set a sort order that meant the cropped, close-up digital derivatives were displayed first and the plain reverse images last (not yet in production but you can see the effect it has on the primary image for search results and object pages by comparing eg and
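
To make the group-portrait and sort-order ideas above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the code used at IWM: it assumes the face detection results have already been saved per scan as lists of bounding-box corner points, and the names Scan, largest_face_fraction, is_group_portrait and display_order are all hypothetical. It also assumes "largest face" means the face that fills the biggest share of its image, which is only one possible reading.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A face bounding box as a list of (x, y) corner points in pixels.
# Assumption: detection results were already exported in this shape.
Box = List[Tuple[int, int]]


@dataclass
class Scan:
    """One image attached to a record (hypothetical structure)."""
    image_id: str
    width: int
    height: int
    faces: List[Box] = field(default_factory=list)


def box_area(box: Box) -> int:
    """Area of the axis-aligned rectangle enclosing the corner points."""
    xs = [x for x, _ in box]
    ys = [y for _, y in box]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def largest_face_fraction(scan: Scan) -> float:
    """Share of the image covered by its largest face; 0.0 if no face."""
    if not scan.faces:
        return 0.0
    return max(box_area(b) for b in scan.faces) / (scan.width * scan.height)


def is_group_portrait(scan: Scan) -> bool:
    """More than one detected face suggests a group portrait."""
    return len(scan.faces) > 1


def display_order(scans: List[Scan]) -> List[Scan]:
    """Close-up crops first, scans with no detected face last."""
    return sorted(scans, key=largest_face_fraction, reverse=True)


# Made-up example record: mounted portrait, plain reverse, digital crop.
record = [
    Scan("mount", 1000, 1500, [[(400, 300), (600, 300), (600, 550), (400, 550)]]),
    Scan("reverse", 1000, 1500),
    Scan("crop", 600, 800, [[(100, 150), (500, 150), (500, 650), (100, 650)]]),
]

print([s.image_id for s in display_order(record)])
# prints ['crop', 'mount', 'reverse']
```

Sorting on the face's share of the image rather than its pixel size keeps a small but tightly cropped derivative ahead of a large scan of the whole mount; if the scans are all the same resolution, absolute face area would work equally well.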

Happy to answer any questions

Cheers, James

From: Museums Computer Group [[log in to unmask]] on behalf of Richard Light [[log in to unmask]]
Sent: 08 September 2017 10:39
To: [log in to unmask]
Subject: Automatic information extraction for cultural heritage


Can anyone point me to working examples where information is automatically extracted (or enhanced) from cultural heritage-related resources? This might be feature recognition in images; picking out named entities from full text; converting string-value structured data to Linked Data URLs; voice recognition for audio/video; etc.



Richard Light
****************************************************************
website: http://museumscomputergroup.org.uk/
Twitter: http://www.twitter.com/ukmcg
Facebook: http://www.facebook.com/museumscomputergroup
[un]subscribe: http://museumscomputergroup.org.uk/email-list/
****************************************************************
