This is quite a long post, and essentially it is in support of a standard structured like the one Ed has suggested, so if you are already convinced there is no need to read it.
Nick Boldrini is right: grey literature has a specific purpose and audience. However, these reports are public documents, and there is demonstrably wide interest in them outside the development control process, including from public and academic audiences. If grey literature is the only documentary outcome of an archaeological intervention and it is not widely discoverable or available, and so does not contribute to a broader understanding of the past, then it is reasonable to ask what the objective of the archaeological event was in the first place. As we all know, there are tens of thousands of such reports in paper form, un-indexed and pretty much inaccessible to the vast majority of archaeologists. This is bad for all sectors of the profession and for the public's and policy makers' perception of it. For this reason, thinking about how these documents (and the data they contain) are structured is important if it means they are more easily manageable in digital form in the future.
The system as it exists is not working well with regard to the dissemination of this material in paper form, not least because publication/dissemination is not adequately enshrined in the current planning legislation. OASIS allows the deposition of grey literature reports and their long-term preservation; they are then discoverable via the Grey-Lit. library (whatever its current shortcomings, enhancements are imminent). However, it is not the data in the report that is extracted via the OASIS forms; it is only the resource discovery metadata for the document/event.
Our efforts to investigate natural language processing (NLP) have specifically NOT been aimed at replacing this user-generated metadata. They have been aimed at investigating the feasibility of generating it for digitised versions of the many thousands of reports that already exist but do not have this metadata associated with them (and also for other forms of legacy literature). We did use NLP on a subset of the grey literature we archive so that we could compare the results with the user ('human') generated metadata for test purposes. The results of this were very interesting. Crispin rightly points out some of the challenges facing NLP, and although the example Catherine and he used of 'Church lane' is relatively straightforward, there is no arguing that issues exist with the technique. For born-digital material it would make little sense to use NLP, when one would reasonably imagine that the document creator should do a better job of creating the metadata for the document/event than a machine.
For interest, and in response to Lief's point, we intend that grey literature in OASIS will be assigned DOI persistent identifiers via the DataCite project, in collaboration with the British Library, in the next twelve months.
Getting back on topic, searches that might be useful to researchers, whether public, academic, LA or contractor, go way beyond what can be expressed with resource discovery metadata. For example, it might be nice to find all events where a particular type of pottery exists in association with a particular type of faunal remains. Grey literature reports contain this information, and although they are structured or semi-structured documents, they do not have a consistent structure across authorities, units or even directors. This means that the information is not easily extractable by a human researcher, let alone by some automated process. With the application of controlled vocabularies and a standard document structure this might actually be possible.
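As a rough illustration of the kind of cross-report search described above, here is a minimal Python sketch. It is not a proposal for any actual schema: the record layout (`Event`, `Context`), the two term lists and the event identifiers are all hypothetical, and 'in association' is assumed here to mean 'recorded in the same context'.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabularies. A real standard would draw on
# agreed thesauri rather than these illustrative term lists.
POTTERY_TERMS = {"samian ware", "black-burnished ware", "grey ware"}
FAUNAL_TERMS = {"cattle", "sheep/goat", "pig", "horse"}


@dataclass
class Context:
    """One context and the controlled terms recorded for it."""
    context_id: str
    pottery: set = field(default_factory=set)
    fauna: set = field(default_factory=set)


@dataclass
class Event:
    """One archaeological event, i.e. the subject of one report."""
    event_id: str
    contexts: list = field(default_factory=list)


def validate(event):
    """Return any recorded terms that are not in the controlled vocabularies."""
    unknown = set()
    for ctx in event.contexts:
        unknown |= ctx.pottery - POTTERY_TERMS
        unknown |= ctx.fauna - FAUNAL_TERMS
    return unknown


def find_associations(events, pottery_term, faunal_term):
    """Return IDs of events where both terms occur in the same context."""
    return [
        ev.event_id
        for ev in events
        if any(pottery_term in c.pottery and faunal_term in c.fauna
               for c in ev.contexts)
    ]


# Made-up example data, purely for illustration.
events = [
    Event("EV-001", [Context("101", {"samian ware"}, {"cattle"})]),
    Event("EV-002", [Context("201", {"samian ware"}, set()),
                     Context("202", set(), {"cattle"})]),
    Event("EV-003", [Context("301", {"grey ware"}, {"horse", "cattle"})]),
]

matches = find_associations(events, "samian ware", "cattle")
```

With this made-up data, only the first event matches: the second has both terms but in different contexts. The query only works because every report uses the same section layout and the same terms, which is exactly what free-text reports with inconsistent structure cannot guarantee.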
Personally I don't think that a standard structure has to be particularly prescriptive or detailed, i.e. to the point where the entire document is essentially constructed from a series of forms. I think this would be counterproductive (and would no doubt receive a hostile reception). However, a template that specifies logical document sections and what they should/must contain might be more feasible. In many ways I agree with Crispin that the data in these documents could be better handled by direct digital transfer, and, following from Nick's point about units not having access to data management tools, this would have to be web based (ASPIRE was an attempt to at least capture more of this data in digital form).
However, grey literature is very often the only place where field workers have any opportunity to engage in creating their own narrative of the site, both of the archaeological event and of the archaeological story of the site itself. I think it would be throwing the baby out with the bath water to concentrate solely on the data without continuing to offer highly skilled and experienced fieldworkers the opportunity to actually tell us what they think the data means, and this is probably not best done via a further series of data fields.
Nick rightly points out that planning guidelines refer only to HERs. On the other hand, they also (for the time being) refer to the notion of preservation by record. I would contend that neither the data on its own, nor the interpretation on its own, and certainly not an inaccessible handful of unstructured documents, constitutes a meaningful 'record' of the event. So we need data, interpretation and accessibility to fulfil this notion (plus a physical archive where appropriate, of course).
The fact that we have issues with this at all is due to the priorities given to the outputs of fieldwork in the legislation, a pressured commercial environment, and the difficulties of juggling priorities in Local Authority planning departments. It is also true that at least one template for minimally structured reports already exists (via the IfA). The problem we are likely to face is not one of agreeing a standard, or even of generating goodwill towards it as a concept; it is likely to be the more fundamental one of enforcement. Perhaps FAME is the organisation whose goodwill is critical, since for a whole range of very valid reasons no one can currently enforce adherence to a standard in practice: not the HERs, not the NMRs and especially not the academy.
A final point is that from an ADS perspective, where we have an interest in preserving grey literature and making it discoverable, searchable and accessible, the actual standard adopted is less important than the fact that there is a standard in existence and that it is adhered to (certainly true if we carry on trying to get NLP to work :-) ).
Best wishes,
Stuart
--------------------------
Dr Stuart Jeffrey
Archaeology Data Service
Department of Archaeology
University of York
The King's Manor
York, YO1 7EP, UK
Tel: +44 1904 434990
Fax: +44 1904 433939
http://ads.ahds.ac.uk
--------------------------