Dear All,
Thanks very much to all who replied to my query. I have had numerous helpful suggestions, and also requests for further information. I have to confess to being somewhat naive in thinking that what we had done was 'straightforward'!!
Our audit consisted of 2 data managers looking through 5 patients' notes each, and filling out a questionnaire for each patient. They then checked the questionnaire data against that which was previously collected and entered into a database. So, I inadvertently misled everyone into thinking we had conducted an audit of 'typing-in' data! The reality is that we conducted an audit that combined data collection accuracy with data entry accuracy: we found 3 'major' diagnostic errors, 4 'minor' errors, and 16 typos or inconsequential date/time errors. As one responder pointed out, it is the 'clinical relevance' of the errors that is important. On that basis, we had 3/504 serious errors (0.6%). The point of the audit was to set a benchmark (against external sources if possible), and also to serve as an internal control for the future. In fact the audit was very informative, for although we did "achieve" an overall error-rate of 4.6% (and hence have considerable room for improvement next time), it did highlight the problems of 'trusting' clinicians' summaries and the need for using primary sources of information.
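For anyone wanting to check the arithmetic, here is a minimal sketch in Python. It assumes the same 504 data items are the denominator for every error type, which is consistent with the 0.6% and 4.6% figures quoted above; the variable and function names are mine.

```python
# Counts as reported in the audit summary above.
TOTAL_ITEMS = 504  # assumed to be the denominator for all error types
errors = {"major": 3, "minor": 4, "typo_or_date_time": 16}


def rate_percent(count, total):
    """Return count/total expressed as a percentage."""
    return 100.0 * count / total


overall = rate_percent(sum(errors.values()), TOTAL_ITEMS)
serious = rate_percent(errors["major"], TOTAL_ITEMS)

print(f"overall error rate: {overall:.1f}%")
print(f"serious error rate: {serious:.1f}%")
```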
Thanks once again - I've learnt a lot!!
Richard
Below are a couple of good papers and other contributions to the discussion...
The recent study below goes into good detail about the different sorts of errors, accuracy, etc., and provides a 'best' error-rate of 2.3%, "a number more consistent with previous literature reports."
J Am Board Fam Med. 2007 Mar-Apr;20(2):151-9.
The "Measuring Outcomes of Clinical Connectivity" (MOCC) trial: investigating data entry errors in the Electronic Primary Care Research Network (ePCRN). Fontaine P, Mendenhall TJ, Peterson K, Speedie SM.
Department of Family Medicine and Community Health, University of Minnesota Medical School, 925 Delaware Street Southeast, Minneapolis
INTRODUCTION: The electronic Primary Care Research Network (ePCRN) enrolled PBRN researchers in a feasibility trial to test the functionality of the network's electronic architecture and investigate error rates associated with two data entry strategies used in clinical trials. METHODS: PBRN physicians and research assistants who registered with the ePCRN were eligible to participate. After online consent and randomization, participants viewed simulated patient records, presented as either abstracted data (short form) or progress notes (long form). Participants transcribed 50 data elements onto electronic case report forms (CRFs) without integrated field restrictions. Data errors were analyzed. RESULTS: Ten geographically dispersed PBRNs enrolled 100 members and completed the study in less than 7 weeks. The estimated overall error rate if field restrictions had been applied was 2.3%. Participants entering data from the short form had a higher rate of correctly entered data fields (94.5% vs 90.8%, P = .004) and significantly more error-free records (P = .003). CONCLUSIONS: Feasibility outcomes integral to completion of an Internet-based, multisite study were successfully achieved. Further development of programmable electronic safeguards is indicated. The error analysis conducted in this study will aid design of specific field restrictions for electronic CRFs, an important component of clinical trial management systems.
Whilst for 'real' typing errors, the study below sets a good benchmark:
Is double data entry necessary? The CHART trials. D Gibson et al MRC, Cambridge, England.
Abstract - There is some controversy over the need for double data entry in clinical trials. In particular, does the number and types of errors identified with this approach justify the extra effort involved? We report the results of a study carried out to address this question. Our main outcome measure was the frequency and types of errors involved in the entry of data for the CHART (continuous, hyperfractionated, accelerated radio-therapy) trials. Data were reentered for a sample of 44 patients by a data manager other than the one making the initial entry. The second entry was then compared with the first entry. The error rate for the two entries combined was 14 per 10,000 data items (fields) (95% confidence interval 10,19). The error rate for the initial entry alone was 15 per 10,000 fields (95% confidence interval 9.5, 22), and the vital/important error rate (defined as any error on a principal outcome measure or a major error on any other endpoint or variable) was 2.5 per 10,000 fields (95% confidence interval 0.68, 6.4). On this evidence double data entry is not performed for the CHART trials.
Some other contributions....
You might find some useful information in the subject area of genetic linkage mapping. I did some work on the impact of errors on linkage maps a few years ago, and plenty of other people have worked on this too. As far as I remember, a good starting point is KH Buetow 1991 Influence of aberrant observations on high-resolution linkage analysis outcomes, Am J Hum Genet 49: 985-994.
A thought - have you considered the possibility of monitoring the error rate at regular intervals and plotting the data on a Shewhart control chart? Surely the important thing is to improve rather than compare your performance with that of others?
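As a rough illustration of the control-chart suggestion, the sketch below computes 3-sigma limits for a Shewhart p-chart (proportion of erroneous items per audit) and flags audits that fall outside them. The audit counts and sample sizes are invented purely for illustration, and the function names are hypothetical.

```python
import math


def p_chart_limits(error_counts, sample_sizes):
    """Return (p_bar, [(lcl, ucl), ...]) for a Shewhart p-chart.

    p_bar is the pooled error proportion; the limits are p_bar +/- 3 sigma,
    computed per audit because sample sizes may differ.
    """
    p_bar = sum(error_counts) / sum(sample_sizes)
    limits = []
    for n in sample_sizes:
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)
        limits.append((max(0.0, p_bar - 3 * sigma), min(1.0, p_bar + 3 * sigma)))
    return p_bar, limits


def out_of_control(error_counts, sample_sizes):
    """Return the indices of audits whose error proportion is outside the limits."""
    _, limits = p_chart_limits(error_counts, sample_sizes)
    flagged = []
    for i, (x, n, (lcl, ucl)) in enumerate(zip(error_counts, sample_sizes, limits)):
        p = x / n
        if p < lcl or p > ucl:
            flagged.append(i)
    return flagged


# Hypothetical series of audits: numbers invented for illustration only.
counts = [23, 20, 25, 18, 60, 21]
sizes = [504, 500, 510, 495, 500, 505]

print(out_of_control(counts, sizes))
```

With these made-up numbers only the fifth audit (60 errors in 500 items) lies above the upper limit. In practice the limits would be recalculated as more audits accumulate.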
I guess that your problem was due to the use of the keyword "typing error" (if you did that), which would lead to many articles in genetics. Instead, I searched Scopus (www.scopus.com) for titles containing the words "data entry error" and ended up with the following references. I suppose you will get similar results from other search engines.
Fontaine, P., Mendenhall, T.J., Peterson, K., Speedie, S.M.
Erratum: The 'Measuring Outcomes of Clinical Connectivity' (MOCC) trial: Investigating data entry errors in the Electronic Primary Care Research Network (ePCRN) (2007) Journal of the American Board of Family Medicine, 20 (4), p. 426.
Kaneko, H., Fujiwara, E. A Class of M-Ary Asymmetric Symbol Error Correcting Codes for Data Entry Devices (2004) IEEE Transactions on Computers, 53 (2), pp. 159-167.
Kawado, M., Hinotsu, S., Matsuyama, Y., Yamaguchi, T., Hashimoto, S., Ohashi, Y.
A comparison of error detection rates between the reading aloud method and the double data entry method (2003) Controlled Clinical Trials, 24 (5), pp. 560-569.
Kohler, H.-P., Rodgers, J.L. DF-analyses of heritability with double-entry twin data: Asymptotic standard errors and efficient estimation (2001) Behavior Genetics, 31 (2), pp. 179-191.
PDM lets equipment maker say good-bye to data-entry errors
(2001) Machine Design, 73 (10), p. 38.
"Sorry, I don't have a reference, but as far as I know 5% is pretty standard."
If you type 'keystroke error rates' or 'data entry error rate' into Google, you'll find a fair bit.
One important thing to note is that, for fairly obvious reasons, what is normally measured is 'keystroke error' rate - whereas it sounds as if you may be talking about error rate in terms of entry fields (most of which probably consist of several characters/keystrokes).
Single-entry keystroke error-rates are usually in the range 2% - 5%, depending on the skill of the operators, the nature/quality/clarity of the data and to some extent the nature/quality of the user interface. If you are talking about keystroke error rate, then your 4.6% is therefore just about within the 'common range' - but if you are talking in terms of entry field errors, your keystroke error rate is probably considerably less than 4.6%, and hence in the lower part of the common range.
Reconciled double-entry error rates are often in the range 0.02% - 0.04% in terms of keystrokes, which usually equates to around 0.1% - 0.2% in terms of field errors (assuming an average of 5 keystrokes per field, with only one keystroke error per field). Those figures are obviously in roughly the right ball-park, given 2-5% error for single entry, but one needs to remember that first- and second-entry errors are obviously not going to be totally independent (in fact they may be highly correlated in some cases of unclear/ambiguous source data).
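The conversions between keystroke and field error rates can be sketched as follows. This assumes keystroke errors are independent and uses the 5 keystrokes per field mentioned above. The last function is only one possible reading of the "ball-park" remark: if two entries were fully independent, the chance that both are wrong on the same keystroke is the square of the single-entry rate, which is an upper bound on errors that reconciliation would miss. All function names are hypothetical.

```python
def field_error_rate(keystroke_rate, keys_per_field=5):
    """Probability that a field has at least one keystroke error,
    assuming independent keystroke errors."""
    return 1 - (1 - keystroke_rate) ** keys_per_field


def keystroke_error_rate(field_rate, keys_per_field=5):
    """Inverse of field_error_rate: the per-keystroke rate implied by a field rate."""
    return 1 - (1 - field_rate) ** (1 / keys_per_field)


def both_entries_wrong(single_entry_rate):
    """Chance that two independent entries are both wrong on the same keystroke.
    Independence is an assumption; as noted above, it will not hold exactly."""
    return single_entry_rate ** 2


# Keystroke rates of 0.02% and 0.04% expressed as field rates (5 keys per field).
for k in (0.0002, 0.0004):
    print(f"keystroke {k:.2%} -> field {field_error_rate(k):.2%}")

# A 4.6% field error rate expressed as an implied keystroke rate.
print(f"field 4.60% -> keystroke {keystroke_error_rate(0.046):.2%}")

# Single-entry rates of 2% and 5% under fully independent double entry.
for p in (0.02, 0.05):
    print(f"single {p:.0%} -> both wrong {both_entries_wrong(p):.2%}")
```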
We tend to use a <1% level for acceptability; however, this is a cell-wide error rate rather than a keystroke rate (so saying someone is 22 rather than 11 is one error, not two). Our usual approach is to do a sample of double data entry and re-enter all data if the rate is above 1%. However, sometimes it can be clearly identified that the errors are caused by one variable; in that case we would usually only resolve the problem with that variable. Afraid that I don't have any references, just thought I would share what we do!
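A minimal sketch of that sample check, assuming each record is held as a dictionary of variable name to entered value. The records, variable names and function name are invented for illustration. Note that a mismatch between two entries only shows that at least one of them is wrong, so this is a discrepancy rate used as a stand-in for the error rate.

```python
def cell_discrepancies(first_entry, second_entry):
    """Compare two entries of the same records cell by cell.

    Returns (rate, per_variable): the fraction of cells that differ and a
    count of mismatches for each variable. A whole cell counts once, so
    '22' entered as '11' is one error, not two.
    """
    if len(first_entry) != len(second_entry):
        raise ValueError("both entries must contain the same number of records")
    total = 0
    mismatches = 0
    per_variable = {}
    for rec_a, rec_b in zip(first_entry, second_entry):
        for var, value in rec_a.items():
            total += 1
            if value != rec_b.get(var):
                mismatches += 1
                per_variable[var] = per_variable.get(var, 0) + 1
    if total == 0:
        raise ValueError("no cells to compare")
    return mismatches / total, per_variable


# Hypothetical sample: two records entered twice.
first = [
    {"age": "22", "sex": "F", "dx": "A1"},
    {"age": "54", "sex": "M", "dx": "B2"},
]
second = [
    {"age": "11", "sex": "F", "dx": "A1"},
    {"age": "54", "sex": "M", "dx": "B2"},
]

THRESHOLD = 0.01  # the 1% acceptability level described above
rate, by_var = cell_discrepancies(first, second)
needs_reentry = rate > THRESHOLD
print(rate, by_var, needs_reentry)
```

The per-variable counts are what would show whether the mismatches are concentrated in a single variable, in which case only that variable would need resolving.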
I just checked with the IS unit of the University where I used to work (Aga Khan University, Karachi) and there the error rate acceptable for analysis is 0.03%. This is after data cleaning. Maybe it is of help.
I don't know of any references but I work in randomised controlled trials in Africa and we aim for less than 0.5% of data values as being acceptable. So, based on this, your entry clerks would need re-training and/or reassignment.
However, sometimes if there are lots of string variables or a data collection form is really complicated, error rates can end up being artificially high because caps lock wasn't on or something.
I'm not sure how much sense it makes to seek a single standard for data entry. Your figure would be appalling in a life-critical situation but compares very favourably with me typing in my PC password. You need to consider the importance of the task and the incentives.
If as I suspect you want to encourage your data managers to do a better job then I'd suggest this is a motivational issue not a statistical one. Try to involve them in the task, illustrate its importance, thank them when they do a good job. I have tried telling people that they are substandard, but found that on its own it is not a very constructive approach!
Typists and secretaries used to take RSA qualifications that specified maximum acceptable error rates for copy typing. I can't find this now on the RSA website but try your library - or ask a secretary if such now exist outside VC's offices.