Hi Nick
Units do use complex software tools for collecting and manipulating digital data, and usually have significant skills in this area. However, what they lack is any standardised means of preparing the reports and gazetteers and piping the data back to the HERs and digital archives. So much of this data gets converted back into relatively useless document formats or paper... the grey lit. Equally, there is no widely adopted standard way of taking in digital data from the HERs at project start, nor are there tools for viewing/enhancing/analysing this data during the project. Each unit does its own thing here, and sometimes each project manager within each unit does their own thing. And they work with the capabilities (for providing/receiving digital data) of the HERs in their patch, which are also pretty variable (even among those who use HBSMR). This situation prevails from the smallest watching brief up to massive regional EH-funded projects worth £LOTS - I've rarely seen this dealt with efficiently, and many current large projects still seem to get stuck on this when it comes to transferring the results, undermining their stated purpose.
So perhaps rather than just saying this is a long way off, one of the outcomes of this discussion might be to agree that this situation can and should be improved for the benefit of all sides. We are discussing standards for grey lit, but (I feel) only because we see these as the way of getting at the real data, which in turn is because we cannot get at that real data before it is put into a document format; so we could expand the debate to cover the entire means of transmission of data from fieldwork to HER/archive.
I'm not advocating a single imposed solution for all units (whether a desktop software package or a web site like Oasis), but a clear definition of expectations/standards of what data should go where and in what structure (from event spatial data through to summary synthesis texts), in order to achieve the intended outcomes of the fieldwork (which often involves destruction of the primary resource). Then useful software tools will be required to achieve these standards (on all sides), and such tools could also make life easier and more productive for the unit staff in other ways, but that's another matter (and I'm not saying this would be easy to achieve by the way).
As an aside, although it is an interesting topic, I'm finding computerised document language processing a bit of a red herring here. These datasets and documents all go through intensive expert human mediation and indexing, and the backlogs are really not that enormous. Do we need a computer to learn that "Church Lane" means a street and not a church, when HER staff already know that instantly, plus where to find it? And pushing an NLP approach might even undermine the role of us humans (oops, now at serious risk of being labelled a Luddite). To clarify: if a digital fieldwork dataset has an element for the event location, and that contains "Church Lane" - fine, let the computer import that into the right place, geocode it, and make it searchable* as locational info (once it has been validated by a human), but do we really need the computer to try to figure this out from raw waffle and get it wrong half the time? Isn't that only needed when you have a true mountain of digital documents with no humans to process them? Over-stretched HER officers might say "yes please" at this stage, and I'm not saying don't give it a whirl, but I suspect this phase of having digital documents being the main transport mechanism for fieldwork data is going to be mercifully brief.
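To make the "Church Lane" case concrete, here is a minimal sketch in Python of importing a location from a structured record and holding it back from search until a human has validated it. The element names (event, location, summary), the LocationEntry class and both functions are assumptions made up for illustration; they are not taken from MIDAS, HBSMR, OASIS or any real HER schema, and geocoding is left out entirely.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical fieldwork record; the element names are illustrative only.
SAMPLE = """<event>
  <name>Watching brief</name>
  <location>Church Lane</location>
  <summary>No archaeological features observed.</summary>
</event>"""


@dataclass
class LocationEntry:
    text: str
    validated: bool = False  # stays False until a human confirms the entry


def import_location(xml_text: str) -> LocationEntry:
    """Read the location straight from its own element; no text mining needed."""
    root = ET.fromstring(xml_text)
    loc = root.findtext("location")
    if loc is None or not loc.strip():
        raise ValueError("record has no location element")
    return LocationEntry(text=loc.strip())


def searchable(entries):
    """Only entries a human has validated are exposed to search."""
    return [e.text for e in entries if e.validated]


entry = import_location(SAMPLE)
pending = searchable([entry])   # nothing is searchable before validation
entry.validated = True          # stands in for the HER officer's check
approved = searchable([entry])
```

The point of the sketch is only that the computer never has to guess what "Church Lane" is: the structure says it is a location, and the human step decides whether it is right.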
Anyway that's now more than enough from me,
yours
Crispin
* on semantic web, domestic web, desktop system etc.
________________________________
From: The Forum for Information Standards in Heritage (FISH) on behalf of Nick Boldrini
Sent: Wed 18/08/2010 13:30
To: [log in to unmask]
Subject: Re: [FISH] HEGEL - access and standards
I think Crispin's comments are interesting, but this scenario is a way off. Not all units (especially the one man bands) have fancy data management tools, which undermines the vision of a brave new world somewhat. This is likely to become increasingly useful, though.
The problem, though, is that the more diffused the data sources are, the harder it is going to be to ensure standardisation.
The idea of automating the indexing, with, VERY importantly, a human interface is a good one. However, leaving this to contractors may be problematic - experience from OASIS on how well this works would be very important to draw on e.g. how well Contractors use Thesauri etc and how comprehensively they index.
I also think Martin Locock hits the nail on the head - what are HERs' and curators' needs, and how would a standard meet those needs?
Bearing in mind that PPS5 good practice is meant to apply to all archaeological endeavours (as I understand it) and not just DC work, and that PPS5 refers repeatedly to HERs (and notably not NMRs, ADS, OASIS etc etc), then it would seem to me fairly clear where the focus of finding out user needs should lie. And yes, I am an HER officer so that helps guarantee my post, but it is also Government Policy.
-----Original Message-----
From: The Forum for Information Standards in Heritage (FISH) [mailto:[log in to unmask]] On Behalf Of Leif Isaksen
Sent: 18 August 2010 12:18
To: [log in to unmask]
Subject: Re: [FISH] HEGEL - access and standards
Hi all
I'm generally very much in agreement with Crispin's point here. I
suspect that the limitations of print-bound literature (space,
linearity, etc.) will see it ultimately replaced by more flexible
digital formats. General social trends suggest that it's likely to be
a question of how long this process will take rather than whether it
will happen. (For the horrified, no doubt a vestigial printed volume
will be XSLT'd, printed and filed as well ;-) ). I'm not suggesting that
there will be no editorial process but its concerns may differ
significantly from those today.
At risk of moving to the technical however (apologies Ed, I know you
want to hold that discussion tomorrow) I'd recommend strongly that the
emphasis should be on making the grey-lit-slash-data directly
available, ideally as XML (albeit with server access restrictions
where appropriate). There will be a vital need for tools and
mechanisms which can index, parse, search, browse, visualize and
analyze what will inevitably become a digital mountain, but we should
try to avoid walled gardens that require specialist technical
knowledge or software to use them. Grey literature is valuable
precisely because anyone can engage with and understand it without
additional apparatus. This would also have the additional benefit of
making persistent HTTP identifiers (URIs) easier to introduce which
are more or less fundamental to making any of the ontology/semantic
approaches mentioned machine-readable and thus viable on a large
scale.
Best
Leif
On Wed, Aug 18, 2010 at 11:35 AM, Crispin Flower <[log in to unmask]> wrote:
> Hi all
> I agree with Martin's comments and similar from other writers, and will forward some remarks I posted to Ed off-list yesterday, but with apologies that I've only had time to read a small proportion of the messages, so may be behind the curve.
> I'd ask whether the unpublished/able grey lit report is a useful thing at all here. Is it the correct target for this debate, or just a by-product of the process? Of course the report is necessary at the point in time of assessing and signing off a project, and it fulfils an essential purpose for producer/clients at that time, but for the medium and longer term, as the means of communicating the results of a project from those who undertook it (the contractor), to those who need the data both within and beyond the immediate casework scenario, it is rather inefficient. Perhaps instead of trying to promote the importance of this stuff with beefed-up technical standards etc, we could acknowledge how ephemeral it is, and find better ways of moving the real data around; we could aim for a scenario in which the grey lit thing can be dropped in the bin without loss, or perhaps retained only as part of the planning or project management history, because all the significant data it contained has been transmitted to the HER/NMR (or other accessible repository) in a more efficient manner (by digital transfer with human quality control and enhancement of indexing). We have in the UK a very strong network of organisations and professional staff positioned to do this essential human part, and this would work even better if the spadework could happen automatically, rather than them wasting time retyping stuff and piling up backlog. Then we achieve truly accessible data, without having to worry about the medium. And to see this from another angle, the grey lit report can be generated almost automatically from the tools the contractor is using to manage their data, as a glossy by-product that brings out the essentials for the primary consumers (e.g. planning archaeologists, EH project managers, etc).
> I do agree there must be standards governing what should be the output from fieldwork, and IfA is a good place for this, particularly if it can truly encompass built heritage recording. But for making the primary data available where it's needed, I think it may be more useful to improve direct data transfer mechanisms between HERs and fieldworkers (in both directions). Incidentally, I don't know if anyone has mentioned the Scottish "ASPIRE" project, which aims to do precisely this. I'm not sure new standards are needed here, just new tools (after all the data content is all covered by MIDAS isn't it?).
> And then keep up the progress on getting all HERs online and cross-searchable (which has come on in leaps and bounds recently).
> Yours
> Crispin
>
> -----Original Message-----
> From: The Forum for Information Standards in Heritage (FISH) [mailto:[log in to unmask]] On Behalf Of Martin Locock
> Sent: 18 August 2010 10:00
> To: [log in to unmask]
> Subject: Re: [FISH] HEGEL - access and standards
>
> There has been some overlap in the discussions between metadata about
> grey literature (for cross-searching etc) and data: the bulk of GL
> contents is data, not metadata.
>
> For metadata we can fairly freely identify elements that might promote
> searchability and re-use, but for data, we must accept that the prime
> determinant of a project report contents will be the *project's* purpose
> not the *report's.*
>
> One concern I would have from the GLADE user comments is that they
> assume that searching a corpus of grey literature is the best way to
> find out about archaeological data. We should, I hope, recognise that
> this is a workaround arising from the ease with which GL can be added to
> OASIS. In the long term, the best way to find archaeological data
> should be by examining the structured, consistent and validated data
> sets comprising the HERs, online or not. If there is currently a
> problem that needs fixing, I would say the problem is that HERs have
> backlogs of published and unpublished sources which have not been
> analysed and added to the record, of which GL is only a subset, if the
> most visible. Therefore we should be looking to HERs to tell us what
> *they* find most troublesome about current GL reports.
>
>
> Martin
>
>
>
>
> --
> Martin Locock
> Rheolwr Cymorth y Project Project Support Manager
>
> Llyfrgell Genedlaethol Cymru National Library of Wales
> [log in to unmask] Ffôn / Phone 01970 632885
>
> Un o lyfrgelloedd mawr y byd One of the great libraries of the world
> http://www.llgc.org.uk/
>