Hi Nick
Thinking about when I've done similar things in the past, to be honest I suspect that doing this 'manually' will be the only reliable method.
Basically you need to assess the likelihood that this pile of paper actually contains anything useful, and is not just a rehash of something that is already contained within your record. Then, if it does appear to contain something new, you also need to evaluate whether the gazetteer is sufficiently reliable/authoritative to be used to add to your record. Many such reworkings of existing data have very high error rates (because they were generally retyped with no validation procedures). It helps if you know who created it, when, how, and for what purpose.
To do the first part, just work through a sample of your gazetteer (perhaps 10%) searching record by record for a match in the SMR. Go to the grid ref stated, and review any monuments in the vicinity, and decide whether your gazetteer entry contributes significantly to the record, either because it is completely unrecorded or because it contains new information. Record whether each entry does or does not, then assess the overall value.
If you feel that the gazetteer contains sufficient new information to be worthwhile, and if it is sufficiently authoritative, then chuck it on the 7 foot backlog pile; if not, chuck it in the bin/archive ;-)
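
If it helps with the sampling step, here is a rough Python sketch of drawing the sample and turning the tally into an approximate range for the share of entries that add something new. The entry ids, the 500-entry gazetteer and the count of 12 "new" entries are made up for illustration, and the Wilson score interval is simply one common way to put a range on a proportion, not something specific to SMR work.

```python
import math
import random

def draw_sample(entry_ids, fraction=0.10, seed=1):
    """Pick a simple random sample of gazetteer entry ids (about 10% by default)."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn
    ids = list(entry_ids)
    k = max(1, round(len(ids) * fraction))
    return sorted(rng.sample(ids, k))

def wilson_interval(new_count, sample_size, z=1.96):
    """Approximate 95% Wilson score interval for the share of 'new' entries."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p = new_count / sample_size
    denom = 1 + z * z / sample_size
    centre = (p + z * z / (2 * sample_size)) / denom
    half = z * math.sqrt(
        p * (1 - p) / sample_size + z * z / (4 * sample_size ** 2)
    ) / denom
    return centre - half, centre + half

# Hypothetical gazetteer of 500 entries numbered 1..500.
sample = draw_sample(range(1, 501))
# Suppose the manual check found that 12 of the sampled entries added something new.
low, high = wilson_interval(12, len(sample))
print(f"{len(sample)} checked; share with new information roughly {low:.0%} to {high:.0%}")
```

A wide range is a hint that the sample is too small to decide on; a narrow one sitting near zero suggests the bin/archive.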

A completely different, possibly complementary, approach would be to create the grid references for the whole gazetteer as you suggest, giving each a unique id and page number within the gaz, add the Gazetteer as a Source within HBSMR, then use MapInfo to create a cross-reference between each gazetteer point and any monument record within a certain distance of it, and add these into HBSMR. (If you have additional info to narrow down the matching then you could use this, but I doubt if such information will be in a suitable form in the paper gazetteer.) This will end up with loads of monuments being cross-referenced to the gazetteer, with a page number/id. Thus anyone consulting the record can then quickly check out the gazetteer whenever necessary, on the off chance that it contains useful info. The downside is that this will be an irrelevant (and potentially misleading) cross-reference for many monument records, and this annoys users, but you can always delete cross-refs when found to be unhelpful.
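
For anyone wanting to try the distance matching outside MapInfo first, the following is a minimal Python sketch of the same idea: convert each NGR to an easting/northing and list every monument within a set radius. It is a stand-in for the MapInfo step, not a description of how MapInfo or HBSMR do it. The record ids, page numbers, coordinates and the 100 m radius are all invented for illustration. Note that an NGR gives the south-west corner of its grid square, so low-precision references will need a larger radius.

```python
import math

def ngr_to_easting_northing(ngr):
    """Convert an OS National Grid reference such as 'TQ 123 456' to metres."""
    s = ngr.replace(" ", "").upper()
    letters, digits = s[:2], s[2:]
    if (len(letters) != 2 or not letters.isalpha() or "I" in letters
            or not digits.isdigit() or len(digits) % 2 or len(digits) > 10):
        raise ValueError(f"not a usable grid reference: {ngr!r}")
    # Letter positions in the 25-letter grid alphabet (the letter I is skipped).
    l1 = ord(letters[0]) - ord("A")
    l2 = ord(letters[1]) - ord("A")
    if l1 > 7:
        l1 -= 1
    if l2 > 7:
        l2 -= 1
    # Index of the 100 km square, counted from the false origin.
    e100 = ((l1 - 2) % 5) * 5 + (l2 % 5)
    n100 = (19 - (l1 // 5) * 5) - (l2 // 5)
    half = len(digits) // 2
    scale = 10 ** (5 - half)  # e.g. 3 digits per axis means 100 m precision
    easting = e100 * 100000 + int(digits[:half]) * scale
    northing = n100 * 100000 + int(digits[half:]) * scale
    return easting, northing

def nearby_monuments(gazetteer, monuments, radius_m=100):
    """Return (gazetteer id, page, monument id, distance in m) for each pair in range."""
    links = []
    for gaz_id, page, ngr in gazetteer:
        ge, gn = ngr_to_easting_northing(ngr)
        for mon_id, me, mn in monuments:
            dist = math.hypot(me - ge, mn - gn)
            if dist <= radius_m:
                links.append((gaz_id, page, mon_id, round(dist)))
    return links

# Hypothetical data: (id, page, NGR) for the gazetteer, (id, easting, northing) for the SMR.
gazetteer = [("G001", 12, "TQ 123 456"), ("G002", 13, "SU 400 100")]
monuments = [("MON1", 512350, 145640), ("MON2", 530000, 180000)]
links = nearby_monuments(gazetteer, monuments)
print(links)
```

Gazetteer entries that produce no link at all are the candidates for being completely unrecorded, which is the other half of what you are after.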


An interesting one this, Nick,

The assumption is being made so far that two entries are likely to relate to
the same site if the grid references are (approximately) the same. A useful
additional approach might be to check that they relate to the same (or
similar) period or type of site as well. This assumes that your original
gazetteer is a consistent list of e.g. lithic scatters or Roman villas. Not
so much use if your gazetteer is just of 'sites in North Xshire'.

In effect there are two tasks here: getting the gazetteer data into a
digital form (which can then be worked on by computer), then the actual work
to concord that data with your own SMR. The second task is a common one in
all sorts of data exchange scenarios, e.g. it would be the same if you were
dealing with imported digital Defence of Britain project data, or PAS data,
LBS data etc.

What we could do with as a community is a general purpose tool for
concording records from different sources based on their similarities
(geospatial, subject matter, reference numbers etc). Perhaps there are
parallels with the techniques used to analyse the similarities between
artefacts in a collection? It would be a useful adjunct to the various data
exchange initiatives that are our current concerns.
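
As a very rough illustration of what such a tool might do at its core, here
is a Python sketch that scores a pair of records on distance, shared site
type/period terms, and shared reference numbers, then picks the best
candidate for each incoming record. The field names, the weights, the 200 m
cut-off and the 0.5 threshold are all assumptions made up for the example,
not an existing standard or tool.

```python
import math

def similarity(a, b, max_dist_m=200.0):
    """Score 0..1 from distance, shared type/period terms and shared reference numbers."""
    dist = math.hypot(a["e"] - b["e"], a["n"] - b["n"])
    spatial = max(0.0, 1.0 - dist / max_dist_m)  # 1 at the same point, 0 beyond the cut-off
    terms_a, terms_b = set(a["terms"]), set(b["terms"])
    union = terms_a | terms_b
    subject = len(terms_a & terms_b) / len(union) if union else 0.0  # Jaccard overlap
    refs = 1.0 if set(a["refs"]) & set(b["refs"]) else 0.0
    # Hypothetical weights; they would need tuning against hand-checked matches.
    return 0.5 * spatial + 0.3 * subject + 0.2 * refs

def best_matches(source, target, threshold=0.5):
    """Map each source id to (best target id or None, score rounded to 2 places)."""
    result = {}
    for rec in source:
        scored = [(similarity(rec, t), t["id"]) for t in target]
        score, target_id = max(scored, default=(0.0, None))
        if score < threshold:
            target_id = None  # nothing convincing enough; treat as unmatched
        result[rec["id"]] = (target_id, round(score, 2))
    return result

# Hypothetical records: the nearest monument is not the best match here.
gaz = [{"id": "G001", "e": 512300, "n": 145600, "terms": ["roman", "villa"], "refs": []}]
smr = [
    {"id": "MON1", "e": 512350, "n": 145640, "terms": ["roman", "villa"], "refs": []},
    {"id": "MON2", "e": 512310, "n": 145600, "terms": ["medieval", "church"], "refs": []},
]
matches = best_matches(gaz, smr)
print(matches)
```

The point of combining the measures is visible in the sample data: the
closest monument loses to a slightly more distant one of the same type and
period.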

Possibly a task for fish.technical

English Heritage DSU

> Yes, I'd expand that suggestion a bit more and suggest you run a buffer
> search on each NGR to accommodate any variance between unmatched NGRs of
> what might be the same site in your system.  The trick is of course getting
> the NGRs in digital format in the first place: probably best done manually
> without messing around with scanning and OCR.
> > any ideas on a good way to approach the following problem would be
> > appreciated.
> >
> > Over the eccentric development of our SMR I have been left with a number
> > of paper gazetteers of sites, without being able to easily tell if these
> > have/have not been digitised.
> >
> > I have a pile of them lurking beside me and need to think of a way of
> > dealing with them.
> >
> > The obvious way is to pick a random sample of them, bash in the NGR and
> > see if there is anything similar to the Gazetteer entry on the digital
> > SMR. This is not a task I relish having better things to do with my time
> > (such as watching paint dry....), and would take a long time.
> >
> > So I am wondering if anyone can think of a better/quicker way to do
> > this??
> >
> > I have thought of scanning the Gazetteers in, converting the NGR's into
> > Mapinfo table and running a find unmatched type query to see if any
> > aren't overlapped by a monument. For those that are, I would pick a
> > random sample to check that it is not another monument that by chance
> > overlays them, but the one I would expect from the Gazetteer.
> >
> > Hopefully by doing this we'll be able to get an idea of whether we need
> > to go through one by one to check each record, or can be confident that
> > the data has been put in.
> >
> > Does anyone have any experience of trying anything like this (scanning
> > etc) or can think of a better way?? Also, what would be a representative
> > random sample to use to get a good idea of how many records were/weren't
> > digitised (whichever method I use), or that the overlaying monuments are
> > in fact the same ones?? 2% ?? 5%?? More ?? Less??
> >
> > Any ideas appreciated.
> >
