I thought that members might find it useful to see the following which
was on a recent GENBRIT file. Jan Crowther
Background
==========
Ever since the PRO announced their plans for digitisation and Internet
access for the 1901 census of England and Wales, there has been a lively
debate in this news group and across the country in other forums. This
debate has focused on two main issues, access and quality - issues that
are inevitably linked.
On 28 Jan 2000 I had a meeting with Iain Watt (Head of the Reader
Information Services Department) and David Annal who is closely involved
in census access projects at the FRC (Family Records Centre). The
objective of the meeting from my side was to raise these issues face to
face with them, to see if a way forward could be found. From the PRO's
side they were keen to discuss these issues, and to build public
confidence in what they see as an important project.
For the record I have sent a copy of this message to the PRO in advance
for them to correct any factual inaccuracies before I publish it, but
the opinions expressed are my own.
The remainder of this message is a summary of these discussions.
News group/Internet focus
=========================
The staff at the PRO read this newsgroup <soc.genealogy.britain> as well
as all the messages sent to them at <[log in to unmask]>. They are
clearly stung by the criticisms of the project that have been voiced
here and want to find ways of working with the Family History community
towards mutually satisfactory conclusions. I got the impression that
this is a sincere commitment on their side. This has to be good news for
the future of the project and is an endorsement of the value of all the
many expressions of concern that have been aired in the news group.
Quality issues
==============
The initial concern that many of us felt was that the prison workforce
would be insufficiently motivated, skilled, experienced and trained to
become effective transcribers. To compound this there was a fear that
insufficient checking for adequate quality (where quality is defined as
fitness for purpose) would be done in QC (Quality Control) mode,
resulting in an unusable index. Couple that with the restricted access
to the microfiche facsimile of the 1901 census data (compared to the
similar data for the 1891 census, say), and you have a recipe for
disaster. I was also, and remain, concerned about the focus on QC rather
than QA (Quality Assurance), and stressed the need for independent
assessment of quality in addition to internal QC. I was not given full
details of the transcription process, mainly, I suspect, because its
final aspects are yet to be settled. The process in outline, as I
understand it, will be:
1) Two prison inmates will each independently transcribe a section of a
CEB (Census Enumerator's Book) into a database file.
2) The two transcripts will be compared by a program to produce a
difference file that will be used by an assessor to produce a master
file for that section. DERA will also carry out checks (as described in
previous press releases) on quality, supervised by an experienced family
historian.
3) The PRO will take a sample of the master files from each section
transcribed and compare it to the original CEB. If it falls below the
defined quality level (I am not yet sure what that will be), the entire
section will be rejected and returned for reworking by different
transcribers. This process (steps 1-3) will loop until the PRO's samples
meet the quality standard.
4) The PRO have now added an additional QC stage that will be
independent of themselves and DERA. They will approach Family History
groups such as the FFHS (Federation of Family History Societies), SoG
(Society of Genealogists) and BALH (British Association for Local
History) and have offered their representatives, under the terms of a
suitable non-disclosure/release agreement, the opportunity to measure
transcription accuracy on a sample of their own selection. This new move
has to be welcomed.
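As an illustration only, steps 1) to 3) above can be sketched in a few
lines of Python. This is not the PRO's or DERA's actual software: the
function names, the sample size and the 0.99 threshold used in the usage
note are all assumptions made up for the sketch, since the real quality
level has not been stated.

```python
import random


def diff_transcripts(first, second):
    """Step 2: list the positions where two independent transcripts disagree.

    Each transcript is a list of entries (here, simple tuples). The returned
    positions are what an assessor would review to build the master file.
    """
    if len(first) != len(second):
        raise ValueError("transcripts cover a different number of entries")
    return [i for i, (a, b) in enumerate(zip(first, second)) if a != b]


def sample_accuracy(master, original, sample_size, rng):
    """Step 3: compare a random sample of the master file with the original.

    Returns the fraction of sampled entries that match the original CEB.
    """
    if not master or len(master) != len(original) or sample_size < 1:
        raise ValueError("need matching, non-empty inputs and a sample size >= 1")
    picked = rng.sample(range(len(master)), min(sample_size, len(master)))
    correct = sum(1 for i in picked if master[i] == original[i])
    return correct / len(picked)


def section_accepted(master, original, threshold, sample_size, rng):
    """Accept the section only if the sampled accuracy meets the threshold.

    The threshold is hypothetical; a rejected section would go back for
    reworking by different transcribers (the loop over steps 1-3).
    """
    return sample_accuracy(master, original, sample_size, rng) >= threshold
```

For example, diff_transcripts([("John", "SMITH")], [("John", "SMYTH")])
reports position 0 as a disagreement for the assessor to resolve, and
section_accepted(master, original, 0.99, 50, random.Random()) would
decide whether a section passes an assumed 99% standard on a sample of
50 entries. Note that this only measures sampled accuracy; it says
nothing about whether the resulting index is usable for searching.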
I was given further information on transcriber recruitment. The PRO
estimate that less than 1% of the prison population will be involved.
These people should be to a large extent self-selecting as the more
intelligent and interested part of the prison population. Prison Service
staff responsible for the checking process will be trained by the PRO.
The final output of the data transcribing and checking is aimed to be a
set of database tables that will be searchable via the Internet. The
concern I repeated was that if the data sample quality meets the PRO's
standards in 3) and 4) above, yet still gives too long a list of search
results, we still have a problem. Perhaps a repetition of my previous
example here will explain this.
My great grandfather was John William SMITH (1860-1912). If he follows
the pattern of his previous entries he will appear as John SMITH. I
suspect that he could have been in either Bradford YKS or London MDX in
1901, and his precise whereabouts is of interest to me as he was in the
process of separating from my great grandmother at this time. If I were
to be restricted to a free search for John SMITH in London or John SMITH
in Bradford, I'd be presented with a hit list of >1000 for the first and
probably around 200 for the second. How would I be able to find the
right one even if the index was 100% accurate?
Free searches / Paid searches?
==============================
The original PRO suggestion was for a 3 tier approach to Internet page
access. The lowest level would be a free search on restricted numbers of
fields, the second level would be a paid for search on probably any
field at 50p a search, and the final level would be a copy of the
facsimile CEB page for around 80p found by links from either of the two
lower level searches. There was no clear definition of what we would be
able to get from a free Internet search. The PRO had at one time
suggested that it would include as a minimum the person's name and
census place, and some of us had feared the worst that this would be all
we'd get when looking for our John SMITHs. The PRO's position is that
what we will get for a free search should be set at the level that
enables us to identify the people we want to find, and that from that
free search it should be possible for us to go directly to the facsimile
page of the CEB. I suggested therefore that there really was no need for
an intermediate paid search, if this free search was to be organised in
this way.
So, on this issue of what will be accessible to us in a free search on
the Internet, the PRO is still open to receive feedback on what we as
family historians need. Here is an opportunity for all of us,
individually and more importantly collectively, to influence the PRO's
final decision on this. My opening shot on this is that as family
historians we'd most like to be able to search on combinations of name
(both parts), age, occupation, census place and birthplace, with the
ability to use either wildcards or some kind of synonym searching
(similar to the LDS 1881 CDs).
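To make the point concrete, here is a minimal Python sketch of searching
on a combination of fields with wildcards. It is only an illustration of
the idea, not a description of what the PRO will build: the search
function, the field names and the four sample records are all invented
for the example.

```python
from fnmatch import fnmatchcase

# Invented sample records, for illustration only.
RECORDS = [
    {"forename": "John", "surname": "SMITH", "age": 41,
     "occupation": "Wool sorter", "place": "Bradford", "birthplace": "Leeds"},
    {"forename": "John", "surname": "SMITH", "age": 23,
     "occupation": "Clerk", "place": "Bradford", "birthplace": "Bradford"},
    {"forename": "John", "surname": "SMYTH", "age": 40,
     "occupation": "Carter", "place": "London", "birthplace": "London"},
    {"forename": "Jane", "surname": "SMITH", "age": 41,
     "occupation": "Weaver", "place": "Bradford", "birthplace": "Halifax"},
]


def search(records, **criteria):
    """Return the records that satisfy every criterion given.

    A string criterion may contain the wildcards * and ? and is matched
    without regard to case; a (low, high) tuple is an inclusive range,
    which is useful for ages; anything else must match exactly.
    """
    def matches(record):
        for field, wanted in criteria.items():
            value = record[field]
            if isinstance(wanted, str):
                if not fnmatchcase(str(value).lower(), wanted.lower()):
                    return False
            elif isinstance(wanted, tuple):
                low, high = wanted
                if not low <= value <= high:
                    return False
            elif value != wanted:
                return False
        return True

    return [r for r in records if matches(r)]
```

With these sample records, search(RECORDS, forename="John",
surname="SMITH") returns two candidates, while adding age=(39, 42) and
place="Bradford" narrows the list to one. The pattern "SM?TH" would also
pick up the SMYTH spelling. Synonym searching of the kind offered on the
LDS CDs would need a table of name variants, which this sketch does not
attempt.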
Personally I think it's unlikely that we'll get every field returned
from a free search, but again the PRO would like to know what would be
the minimum useful set of fields to be returned by a free search. My
opening shot was that we would need relationship to head and name of
head in addition to the search field data itself.
Sending feedback on this question to the PRO
============================================
Any of us is free to send our views to the <[log in to unmask]>
address at any time. However, the PRO would particularly welcome the
majority opinions of larger groups and societies as well. If you are a
member of a local society, can I ask that you take a copy of this
message and copy it to members of your society so that they can consider
the issues and respond on behalf of that society to the PRO? Similarly
for this news group, I think it would be useful if we set up a series of
options that participants could vote for, so that the PRO could be sent
the balance of opinion over time. There'll undoubtedly be more of this
in the near future.
Pilot project
=============
The PRO plans to send out further information on the whole 1901 project
in the next few days, both on their website and to those of us who have
registered with them to receive updates by e-mail. (Let them know on
<[log in to unmask]> if you'd like to receive this). One of the
important things they'll be announcing is that a whole county (Norfolk)
of the 1891 census will be transcribed, checked, quality controlled,
indexed and placed on the Internet for search. The aim of this is that
the county will be of a reasonable size and representative of both rural
and industrial/urban areas. Those with Norfolk ancestors will be
ecstatic to learn that they'll soon have census indexes that are
computer searchable for 1851 and 1881 (LDS CD-ROMs) and 1891 and 1901
(PRO online indexes).
When this whole county 1891 data is up on the Internet, it will serve as
a testbed for the kind of free searches and charged facsimiles that will
be possible from the whole country 1901 census. The target is to have
this county up and running early in 2001, so that there will be ample
opportunity for us all to give the PRO feedback on what works and what
does not. They have then committed to analysing this feedback and coming
up with a free search/charged facsimile system that will give us what we
want as family/local historians. I've little doubt from my discussions
that this is a sincere proposal from the PRO and a practical way
forward. I hope at the end of the day that their view of what is needed
and the Family History community's views coincide, but I'll reserve
judgment on that till we see the pilot project in action.
Overall though I think this is a useful step forward, and one we should
all support with evaluation and feedback a year from now.
Access issues
=============
One of the biggest issues in the minds of many family historians is the
right of access to microfiche copies of the 1901 census. The statements
from the PRO so far show that they were prepared to put a single copy of
the whole thing on public access at one PRO site (probably Kew), and
would supply CROs (County Record Offices) and other archives with copies
(sold) of their local areas. I repeated the view that I think they
should provide sold copies to any bona fide library or research group
that wanted to buy copies from them, since this was the established
precedent in the case of the 1891 census microfiche, and anything less
would be a disenfranchisement of those family historians who do not or
cannot use the Internet.
I'm afraid the PRO has made no commitment to do this as yet, although
the positive side is that they will be willing to listen to the
arguments for it from individuals and groups (see feedback above). The
argument I put to them is that the Family History community can be
divided into two camps that have little overlap: those who will use the
Internet, and those who cannot/will not. If there genuinely is little
overlap between these two groups, what have the PRO to lose in providing
fiche to the latter? The fiche will be purchased just as they are for
the 1891 census.
The PRO does not, I think, disbelieve this argument; it's just that it
seems to me that it is not high on their agenda. If we want it to become
so, we'll have to let them know with cogent arguments and expressions of
the numbers of us that think so. I got the impression, although this was
not stated, that the PRO is so focused on getting the Internet access
right first time, that this was either on the back burner or ignored. I
put it to them that if they did not have the resources to make and
distribute fiche copies, then another independent group could be found
who could do this (SoG, FFHS, etc).
CD-ROM indexes?
===============
Similarly many of us have felt that it would be useful and desirable to
have CD-ROM versions of the indexes with similar search capabilities to
those that will be finally settled for the free Internet searches.
The PRO indicated that this is by no means impossible and that they will
be in discussion with DERA to see if this is practical and desirable.
The CD-ROM would replicate the free element of the Internet service.
Again, it's up to us as individual and collective family historians to
let the PRO know our views on this (see feedback above).
Personally I feel that CD-ROM indexes (that would need to be purchased
just as the LDS 1881 ones are) are a great boon. They allow one to sit
and think and use quality time in doing difficult searches - something
that is often difficult when the telephone clock is ticking online.
Public consultation events
==========================
In the next announcement that the PRO will be making on their website
and via e-mail, there will be a list of public family history events
that the PRO will be attending with the specific aim of soliciting
discussion and feedback on the proposals. They have chosen venues that
are in all parts of the country as the initial meetings had more of a
London and SE bias. If you get the chance to go along to one of those,
I'd urge you to do so. Your views are important and the PRO need to hear
them.
Summary
=======
On the positive side the PRO have made moves towards improving the QC
side of the transcription process, and will be providing us with a real
testbed of a whole county 1891 census pilot project on which to evaluate
it. They have also expressed a willingness to engage in dialogue with
the Family History community to agree what kind of free searches will be
ideal for finding people and families.
On the negative side the questions of access to microfiche copies of the
1901 census have still not been addressed.
The PRO have indicated that they will be soliciting and evaluating
further feedback, and I therefore feel it's up to us individually and
collectively to provide that to them so that together we can get it
right. Yes folks, I do feel that these are people we can work together
with rather than faceless bureaucrats, but it's up to us to play our
part and contribute too.
I apologise if there were other issues that some of you may have liked
to see raised, but there simply wasn't time to go through everything. If
you'd like to follow these up yourselves with the PRO, the ball is in your
court.
--
Barney Tyrwhitt-Drake
Drake Software web site: http://www.tdrake.demon.co.uk