The range checks are a good idea but don't help with incorrectly entered
data that is within range. I was involved in a clinical trial where we
entered data from paper sources (Case Report Forms and free-text medical
records) into a database. In line with Good Clinical Practice (ICH GCP)
we did double data entry for everything, with two separate data
enterers, and used a compare program to highlight the differences (you
can do this in Excel using its compare feature). Any discrepancies were
checked against the source data and amended by the original enterer,
with a third person as witness who documented the audit trail.
We then did a 5% iterative check of the entered data against the source
data, i.e. randomly selected 5% of the records, dug out the paper
documents and compared them. This process continued until a full 5%
check came up clean (so potentially you could end up doing a 100% check
if you kept picking up errors).
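The sampling loop itself is simple to script; the verification against
the paper documents is of course a manual step. A rough Python sketch of
the idea, where check_against_source() is a hypothetical stand-in for
that manual comparison:

import random

def iterative_check(record_ids, check_against_source, fraction=0.05):
    # Draw a fresh random sample each pass; stop only when a whole
    # sample verifies clean against the paper source documents.
    sample_size = max(1, round(len(record_ids) * fraction))
    passes = 0
    while True:
        passes += 1
        sample = random.sample(record_ids, sample_size)
        # check_against_source(rid) should return True if the entered
        # record matches the paper documents (manual in practice).
        failures = [r for r in sample if not check_against_source(r)]
        if not failures:
            return passes  # this sample came up clean
        print(f"pass {passes}: {len(failures)} record(s) wrong - "
              "correct them before drawing the next sample")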
These checks applied mainly to numeric data, which had to be an exact
match; free-text entries that were flagged as discrepant were only
amended if the meaning was qualitatively different, e.g. 'hypotensive'
vs. 'hypertensive'.
It sounds like a lot, but it's not as much work as trying to put
something right if you spot a problem later down the line, and we were
mandated to do it. You have to weigh up the effort of checking against
the implications of getting it wrong.
Hope that helps,
On 09/Dec/2010 11:41, Mícheál wrote:
> Hello all,
> I have collected questionnaire data from about 500 people. For each
> person, there are about 400 data points. The whole data set has been
> entered into Excel once, and a subset of 10% has been entered a second
> time by a different person. I've just compared the data sets for
> accuracy and 95% of the time the two data sets agree.
> So my question is, is this an acceptable level of accuracy?
> I could enter the data a second time and correct discrepancies against
> the paper copies, but it's going to take 2-3 tedious weeks. Most of the
> items are part of longer scales, so the odd mistake is not going to
> change the final score much, but at the same time I don't want to miss
> anything interesting in my data!
> Any advice welcome.