I have read with interest some of the exchanges on this site on
discrimination indices. Can I add my pennyworth, specifically a warning?
The indices are measures of the correlation between total test scores and
individual item scores. They can be calculated as correlation coefficients
or (easier in the old days) by data from the papers with the highest and
lowest scores (e.g. the top and bottom 27%). Either way, the indices are
dangerously unreliable if they are based on only a few hundred examinees.
That is, good questions may be thrown away, while truly poor discriminators
go unrecognized as such.
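To make the two calculation methods concrete, here is a minimal sketch in Python. It uses made-up simulated data, treats the item score as 0/1, and uses the simple (uncorrected) total that includes the item itself; all names and the simulation are my own illustration, not from the paper.

```python
import random

random.seed(0)
n_items, n_examinees = 40, 200

# Simulate 0/1 responses: each examinee has a random "ability",
# and answers each item correctly with that probability.
responses = []
for _ in range(n_examinees):
    ability = random.random()
    responses.append([1 if random.random() < ability else 0
                      for _ in range(n_items)])

item = 0  # the item under study
totals = [sum(row) for row in responses]          # total test scores
item_scores = [row[item] for row in responses]    # 0/1 scores on this item

# Method 1: correlation coefficient between item score and total score
# (a point-biserial correlation, since the item score is dichotomous).
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

r_pb = pearson(item_scores, totals)

# Method 2: upper-lower index from the papers with the highest and
# lowest total scores (here the top and bottom 27%): proportion correct
# in the high group minus proportion correct in the low group.
order = sorted(range(n_examinees), key=lambda i: totals[i])
k = round(0.27 * n_examinees)
low, high = order[:k], order[-k:]
d = (sum(item_scores[i] for i in high)
     - sum(item_scores[i] for i in low)) / k

print(f"point-biserial = {r_pb:.2f}, upper-lower D = {d:.2f}")
```

Re-running this with different random seeds gives a feel for the sampling variability the warning above is about: with a few hundred examinees, both indices can swing substantially from sample to sample for the same item.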
Tables of criterion values against which to judge the acceptability of the
indices have been published. These are not valid.
Where blind guessing of answers is not discouraged (e.g. by negative
marking), the indices are even less reliable.
I do not think that significance levels for the correlation coefficients are
helpful.
I have written this up in a paper soon to be published:
'Do Item Discrimination Indices really help us to improve our tests?'
Assessment & Evaluation in Higher Education 26 (3) June 2001.
Improving the reliability of multiple choice and true/false tests is of
particular interest to me and to my colleague David Miller - see also the
next two papers:
Burton, R.F. and Miller, D.J. (2000) Statistical modelling of multiple-choice
and true/false tests: ways of considering, and of reducing, the
uncertainties attributable to guessing. Assessment & Evaluation in Higher
Education, 24 (4), 399-411.
Burton, R.F. (2001) Quantifying the effects of chance in multiple choice and
true/false tests: question selection and guessing of answers. Assessment &
Evaluation in Higher Education, 26 (1), 41-50.