This is quite a long post - and essentially it is in support of a standard
structured like the one Ed has suggested, so if you are already convinced
there is no need to read it........

Nick Boldrini is right - Grey Lit. has a specific purpose and audience;
however, these reports are public documents and there is demonstrably wide
interest in them outside the development control process, including among
public and academic audiences. If grey lit is the only documentary outcome of an
archaeological intervention and it is not widely discoverable or available -
contributing to a broader understanding of the past -  then it's reasonable
to ask what was the objective of the archaeological event in the first
place? As we all know, there are tens of thousands of such reports in
paper form, un-indexed and pretty much inaccessible to the vast majority of
archaeologists. This is bad for all sectors of the profession and for the
perception of it held by the public and policy makers. For this reason,
thinking about how these documents (and the data they contain) are structured
is important if it means they are more easily manageable in digital form in
the future. The system as it exists is not working well with regard to the
dissemination of this material in paper form, not least because
publication/dissemination is not adequately enshrined in the current
planning legislation.

OASIS allows the deposition of grey literature reports and their long term
preservation; they are then discoverable via the Grey-Lit. library (whatever
its current shortcomings, enhancements imminent). However, it is not the
data in the report that is extracted via the OASIS forms; it is the resource
discovery metadata for the document/event only. Our efforts to investigate
natural language processing have specifically NOT been to replace this user
generated metadata, but to investigate the feasibility of generating it for
digitised versions of the many thousands of reports that already exist but
do not have this metadata associated with them (and also other forms of
legacy literature btw). We did use NLP on a subset of the grey literature we
archive so we could compare the results with the user ('human') generated
metadata for test purposes. The results of this were very interesting. Crispin
rightly points out some of the challenges facing NLP and although the
example Catherine and he used of 'Church lane' is relatively
straightforward, there is no arguing that issues exist with the technique,
and for born digital material it would make little sense to use NLP when one
would reasonably imagine that the document creator should do a better job of
creating the metadata for the document/event than a machine.
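
For anyone who wants to try this kind of comparison themselves, here is a
minimal sketch in Python. It is not a description of how our own test was
run: the function name, the normalisation and the example terms are all
assumptions made up for illustration. It simply treats the human and machine
generated index terms for one report as sets and measures how far they agree.

```python
def compare_terms(human_terms, machine_terms):
    """Compare two sets of index terms for one report.

    Returns (precision, recall) of the machine terms measured against
    the human terms, after lower-casing and trimming whitespace.
    Hypothetical helper, for illustration only.
    """
    human = {t.strip().lower() for t in human_terms}
    machine = {t.strip().lower() for t in machine_terms}
    agreed = human & machine
    # Precision: share of machine terms that the human also used.
    precision = len(agreed) / len(machine) if machine else 0.0
    # Recall: share of human terms that the machine also found.
    recall = len(agreed) / len(human) if human else 0.0
    return precision, recall


# Made-up terms for a single hypothetical report.
human = ["Church", "Medieval", "Evaluation"]
machine = ["church", "medieval", "lane", "post-medieval"]

precision, recall = compare_terms(human, machine)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Exact matching like this only makes sense if both sides draw on the same
controlled vocabulary; free-text terms would need something more forgiving.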

For interest and in response to Lief's point, we intend that grey lit in
OASIS will be assigned DOI persistent identifiers via the DataCite project
in collaboration with the British Library in the next twelve months.

Getting back on topic, searches that might be useful to researchers, whether
public, academic, LA or contractor, go way beyond what can be expressed with
resource discovery metadata. For example - it might be nice to find all
events where a particular type of pottery exists in association with a
particular type of faunal remain. Grey literature reports contain this
information, and although they are structured or semi-structured documents
they do not have a consistent structure across authorities, units or even
directors. This means that this information is not easily extractable by a
human researcher, let alone by some automated process. With the application of
controlled vocabularies and a standard document structure this might
actually be possible. Personally I don't think that a standard structure has
to be particularly prescriptive or detailed, i.e. to the point where the
entire document is essentially constructed from a series of forms. I think
this would be counterproductive (and would no doubt receive a hostile
reception); however, a template that specifies logical document sections and
what they should/must contain might be more feasible. In many ways I agree
with Crispin that the data in these documents could be better handled by
direct digital transfer and, following from Nick's point about units not
having access to data management tools, this would have to be web based
(ASPIRE was an attempt to at least capture more of this data in digital
form). However, Grey Literature is very often the only place where field
workers have any opportunity to engage in creating their own narrative of
the site, both of the archaeological event and of the archaeological story
of the site itself. I think it would be throwing the baby out with the bath
water to concentrate solely on the data without continuing to offer highly
skilled and experienced fieldworkers the opportunity to actually tell us
what they think the data means, and this is probably not best done via a
further series of data fields. 
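
To make the pottery/faunal example concrete, here is a minimal sketch in
Python of the sort of query that becomes trivial once finds are recorded in a
consistent structure using controlled vocabulary terms. Everything in it is
assumed for illustration: the record layout, the field names, the terms and
the example events are made up and do not come from OASIS or any existing
standard. Note also that it only tests whether both terms were recorded for
the same event, which is a much coarser notion than true stratigraphic
association.

```python
from dataclasses import dataclass, field


@dataclass
class EventRecord:
    """One archaeological event, as it might be extracted from a report
    that follows a standard structure (all field names are hypothetical)."""
    event_id: str
    pottery_types: set = field(default_factory=set)
    faunal_remains: set = field(default_factory=set)


def events_with_association(records, pottery, fauna):
    """Return ids of events where both terms were recorded.

    Terms are assumed to come from controlled vocabularies, so a plain
    exact match is used; no fuzzy matching is attempted.
    """
    return [
        r.event_id
        for r in records
        if pottery in r.pottery_types and fauna in r.faunal_remains
    ]


# Made-up example records, for illustration only.
records = [
    EventRecord("event-001", {"samian ware"}, {"cattle", "sheep/goat"}),
    EventRecord("event-002", {"samian ware", "grey ware"}, {"pig"}),
    EventRecord("event-003", {"grey ware"}, {"cattle"}),
]

print(events_with_association(records, "samian ware", "cattle"))
```

The point is less the code than the precondition: without agreed sections and
vocabularies, there is nothing reliable to populate such records from.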


Nick rightly points out that planning guidelines refer only to HERs. On the
other hand, they also (for the time being) refer to the notion of
preservation by record - I would contend that neither the data on its own
nor the interpretation on its own, and certainly not an inaccessible handful
of unstructured documents, constitutes a meaningful 'record' of the event. So
we need data, interpretation and accessibility to fulfil this notion (plus
physical archive where appropriate of course). The fact that we have issues
with this at all is due to the priorities given to the outputs of fieldwork in
the legislation, a pressured commercial environment, and the
difficulties of juggling priorities in Local Authority planning departments.
It is also true that at least one template for minimally structured reports
already exists (via the IfA). The problem we are likely to face is not one
of agreeing a standard, or even of generating good will towards it as a
concept; it is likely to be the more fundamental one of
enforcement. Perhaps FAME is the organisation whose goodwill is critical,
since for a whole range of very valid reasons no one can currently enforce
adherence to a standard in practice: not the HERs, not the NMRs and
especially not the academy.

A final point is that from an ADS perspective, where we have an interest in
preserving grey literature and making it discoverable, searchable and
accessible, the actual standard adopted is less important than the fact that
there is a standard in existence and that it is adhered to (certainly true
if we carry on trying to get NLP to work :-)

Best wishes,
Stuart

--------------------------
Dr Stuart Jeffrey

Archaeology Data Service       
Department of Archaeology              
University of York                             
The King's Manor
York, YO1 7EP, UK

Tel: +44 1904 434990
Fax: +44 1904 433939
http://ads.ahds.ac.uk
--------------------------