Hi Alec,

For "Early 3rd Century BC" I would probably have said "300BC - 275BC", with
going all the way to 250 BC being the "First Half" or something similar. But I
don't know if there really is a "correct" answer.

Joe

On 06/11/13 14:35, Alec Turner wrote:
> This is a subject close to my own heart as I wrote and maintain the
> date translation software used within MuseumIndex+, etc. You're right
> about variation in the meaning of circa, not just across different
> systems, but when applied to different date spans (e.g. in
> MuseumIndex+ its default meaning, which can also be tweaked per
> customer, can be different when applied to months, years, decades and
> centuries). Ultimately we've always allowed access to the generated
> "earliest" and "latest" date values so these can be adjusted manually
> if the builtin rules get it wrong. I'd be curious to know what people
> would ideally like their systems to do.
>
> Finally, I couldn't resist testing out your "Early 3rd Century BC"
> example - MuseumIndex+ interprets this as "299 BC - 250 BC". Are we in
> the right ballpark? (because I'd be happy to try and improve on this
> if people think we're not).
>
> ... Alec Turner
>
> On 04/11/2013 14:03, Joseph Padfield wrote:
>> Hi,
>>
>> I had a little look at this a few years ago while experimenting with the
>> conversion of free text dates to semantic searchable dates. I was
>> working with the question: given the language used within the TMS
>> DisplayDate field, which paintings/artists do we want someone to find
>> when they run a date based search for a particular year or even a range
>> of years?
>>
>> I ended up using a lot of regular expressions in Perl to create an
>> internally consistent display text field and then used a set of simple
>> rules to indicate what date range the display date referred to.
>> As I said, it was a few years ago for an in-house R&D project, but if it is
>> useful you can see some of the details at:
>> http://research.ng-london.org.uk/wiki/index.php/National_Gallery_Display_Date_Descriptors
>>
>> These ranges were just an example and the actual date range used could
>> be different for different systems; ideally, though, you just need to add
>> the description of your logic into the "help" information.
>>
>> Joe
>>
>> PS the one I always like was what years are meant by: "Early 3rd Century
>> BC" :-)
>>
>> On 04/11/13 11:54, David Croft wrote:
>>> I've been working on this problem on and off for a while now, but from
>>> the other side as it were: trying to extract the dates that the record
>>> author meant from what they actually wrote.
>>> There are a LOT of different date formats out there and I've yet to
>>> see a really good solution.
>>> I'm coming at this problem from a software angle, trying to decode
>>> dates automatically, so my desires for date formats may be different
>>> to yours.
>>> But I really, really, really wish that date information was stated
>>> explicitly and consistently.
>>>
>>> Plenty of collections use modifiers like 'circa', 'early' or 'first
>>> half', but then don't use these consistently.
>>> In one record 'late 20th century' means 1950 to 2000, in another place
>>> it will mean 1975 to 2000.
>>> These sorts of date modifiers never seem to get explicitly defined for
>>> the collection, which means that what one collection means by 'circa'
>>> is different to what another collection means.
>>> The modifiers also mean different things for different dates: 'circa
>>> 1950' may mean 1945 to 1955, but is 'circa 1950s' 1950 to 1959 or 1945
>>> to 1965?
>>> There are lots of records with dates like '80s' where you just have to
>>> assume the century information or '1940-50s' where you assume it means
>>> 1940 to 1959.
>>>
>>> So for me, the best way is just to provide the upper and lower bounds
>>> for the date period in full, i.e. not 'circa 1955' but instead '1950/1/1
>>> to 1959/12/31'.
>>> Or if that's not possible, define exactly what you mean by 'circa',
>>> 'late', 'early' etc. and make that information available where anyone
>>> looking at your records can see it.
>>> For example, are you going to use the word 'circa', or just put a 'c'
>>> on the front of the date, i.e. 'c1950'?
>>> If there are two dates in a field, does the circa apply to just the
>>> first one or both? i.e. is 'circa 1950 to 1960' the same as 'circa
>>> 1950 to circa 1960'?
>>> If you are saying 'circa 19th century' do you mean up to 25 years
>>> either side? 50 years? 75?
>>> Software can decode any format you use as long as we know what the
>>> rules are.
>>>
>>> P.S.
>>> There are some truly interesting date fields out there and I've been
>>> keeping a list as part of my really tricky testing data.
>>> Some of my favourites are '25 feb ?', 'circa pre world war two',
>>> 'early or late 19th or 20 century' and 'c18-1 to c--01?'
>>>
>>> David
>>>
>>> ****************************************************************
>>> website: http://museumscomputergroup.org.uk/
>>> Twitter: http://www.twitter.com/ukmcg
>>> Facebook: http://www.facebook.com/museumscomputergroup
>>> [un]subscribe: http://museumscomputergroup.org.uk/email-list/
>>> ****************************************************************
--
*Joseph Padfield*
Conservation Scientist
Scientific Department
The National Gallery
Trafalgar Square
London WC2N 5DN
44 (0)20 7747 2553
http://research.ng-london.org.uk
http://www.twitter.com/JoePadfield
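
The thread's central point, that a modifier such as 'early', 'late' or 'circa' only becomes searchable once its rule is written down, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the names `YearRange`, `century_bounds`, `apply_modifier`, `circa` and `fmt` are hypothetical, the default fraction and margin are arbitrary, and none of it reflects the actual rules in MuseumIndex+ or the National Gallery project. It also assumes the convention that the 3rd century BC runs from 300 BC to 201 BC and the 20th century from 1901 to 2000.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YearRange:
    """Explicit earliest/latest years. Negative = BC, positive = AD (no year 0)."""
    earliest: int
    latest: int


def century_bounds(n: int, bc: bool = False) -> YearRange:
    """Full span of the n-th century, e.g. 3rd century BC -> 300 BC to 201 BC."""
    if bc:
        return YearRange(-(n * 100), -((n - 1) * 100 + 1))
    return YearRange((n - 1) * 100 + 1, n * 100)


def apply_modifier(span: YearRange, modifier: str, fraction: float = 0.25) -> YearRange:
    """Narrow a span to its 'early' or 'late' part.

    `fraction` is the documented, adjustable rule: 0.25 treats 'early' as the
    first quarter of the span, 0.5 treats it as the first half.
    """
    length = span.latest - span.earliest + 1
    width = round(length * fraction)
    if modifier == "early":
        return YearRange(span.earliest, span.earliest + width - 1)
    if modifier == "late":
        return YearRange(span.latest - width + 1, span.latest)
    raise ValueError(f"unknown modifier: {modifier!r}")


def circa(year: int, margin: int = 5) -> YearRange:
    """'circa YEAR' as YEAR plus or minus `margin` years.

    Simplification: does not handle ranges that cross the BC/AD boundary.
    """
    return YearRange(year - margin, year + margin)


def fmt(r: YearRange) -> str:
    """Render a range for display, e.g. '300 BC - 276 BC'."""
    def one(y: int) -> str:
        return f"{-y} BC" if y < 0 else f"{y} AD"
    return f"{one(r.earliest)} - {one(r.latest)}"


third_bc = century_bounds(3, bc=True)
print(fmt(apply_modifier(third_bc, "early")))        # 300 BC - 276 BC
print(fmt(apply_modifier(third_bc, "early", 0.5)))   # 300 BC - 251 BC
print(fmt(apply_modifier(century_bounds(20), "late", 0.5)))  # 1951 AD - 2000 AD
print(fmt(circa(1950)))                              # 1945 AD - 1955 AD
```

With these assumptions the quarter and half rules land within a year of the two readings quoted above ("300BC - 275BC" and "299 BC - 250 BC"); the remaining difference comes down to where a system places the century boundary, which is exactly the kind of rule that would need stating in the "help" information.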