Anita, you ask:
"What evidence is there that teaching CAS has the ability to change doctors'
gut instincts about certain treatments (if the gut feelings don't match the
evidence)?"
I evaluated the Critical Appraisal Skills Programme (CASP) in Oxford. As
part of this evaluation I did a quick systematic review of the evidence
about CAS teaching in order to
- compare the performance of CASP with other CAS teaching interventions
- learn from others about ways of improving the effectiveness of critical
appraisal skills teaching
- examine the methods used to assess impact: in particular to review the
outcome measures used and their validity, reliability and utility.
A comprehensive systematic review was commissioned a couple of years ago by
the HTA and was undertaken by Julie Parkes under the supervision of Ruairidh
Milne and Jon Deeks, but I don't know if it is finished and have not yet seen
its findings. A very brief summary of what I did is below. Full details of
the report can be obtained from the CASP office in Oxford (01865 226968). I
will tell you about the evaluation of CASP itself in another message; this
one is enormous already!!! There is also, of course, a systematic review
about the best way of changing professional practice!
Search:
- 7 electronic databases were searched: MEDLINE; Embase; CINAHL; HealthSTAR;
DHSS Data; ERIC; LISA.
- Others working in the field of teaching CAS were contacted.
- A forward search was run on the Science Citation Index for all relevant
published studies identified by the other methods.
Selecting relevant papers:
- Papers were selected if the objective of the study was to measure the
effect of teaching CAS on the knowledge, skills, attitudes or behaviour of
people who make healthcare decisions.
Papers identified:
16 primary published studies, 2 unpublished evaluations of CAS teaching
programmes, 1 published and 1 unpublished review were identified.
Results:
Most of the studies were of appalling quality and useless.
Gehlbach (1980)
is a methodologically weak study (small, no pre-intervention test, groups
are not comparable), which produces equivocal results about CAS training.
There are no methodological lessons about how to measure the effectiveness
of CAS teaching.
Riegelman (1983)
is a cross-sectional study that describes and compares 1st year and 4th year
medical students' skills, attitudes and behaviour. It was undertaken before
the introduction of a course teaching CAS and does not address the question
of the impact of critical appraisal teaching. There are no important
methodological insights into the reliability or validity of different
outcome measures for knowledge, attitude or behaviour from this study.
Likert scales were used to measure attitudes towards the value of original
research and the importance of different criteria for judging the value of
an article. These are reported using the mean and the standard error of the
mean (SEM) and compared using Student's t-test. There is no discussion of
the methodology, the practical significance of differences, the fact that
there were multiple comparisons, or any attempt to look at the reliability
or validity of these measures.
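For concreteness, here is what that analysis amounts to: Likert responses treated as interval data, summarised as mean and SEM, and compared with an (unequal-variance) Student's t-test. This is only a sketch; the ratings below are invented, not taken from the paper.

```python
# Sketch of the Riegelman-style analysis: Likert ratings treated as
# interval data, summarised as mean +/- SEM, compared by t-test.
# All ratings here are hypothetical, purely to illustrate the method.
from statistics import mean, stdev
from math import sqrt

year1 = [4, 5, 3, 4, 4, 5, 2, 4]  # hypothetical 1-5 Likert ratings
year4 = [3, 3, 4, 2, 3, 4, 3, 2]

def sem(xs):
    """Standard error of the mean."""
    return stdev(xs) / sqrt(len(xs))

# Welch-style t statistic for the difference in means
t = (mean(year1) - mean(year4)) / sqrt(sem(year1) ** 2 + sem(year4) ** 2)

print(f"year1: {mean(year1):.2f} +/- {sem(year1):.2f}")
print(f"year4: {mean(year4):.2f} +/- {sem(year4):.2f}")
print(f"t = {t:.2f}")
```

The objection in the text is that this treats ordinal categories as if the distances between them were equal, and reports no check of whether that assumption (or the multiple comparisons) matters.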
Cuddy (1984)
is a prospective non-randomised trial comparing the effectiveness of
slide-tapes with traditional lecture for teaching CAS. The trial had 9
students in each arm. This methodologically weak study, with poor
presentation of the results, gives no compelling evidence about the
effectiveness of CAS teaching and no methodological insights into how this
can be measured.
Gehlbach (1985)
reports a non-randomised controlled trial comparing lecture, seminar and
self-instruction for teaching CAS. There was no significant difference on
objective exam results but 60% of students rated self-instruction successful
compared to 37% of seminar and 19% of lecture students (p<0.001).
Self-assessed CAS were also significantly higher in this group. There were
no methodological discussions to inform future evaluation of CAS teaching.
Radack (1986)
is a non-randomised controlled trial of five 50-minute sessions teaching
CAS for diagnostic tests and interventions. The sessions were problem-based
using a scenario and relevant published paper. This was appraised using
published structured methodological criteria.25,26 This methodologically
poor study (small numbers, non-randomised, groups not comparable, no blind
assessment, large drop-out rates) failed to show a statistically significant
difference in CAS between intervention and control groups. Assessment was
by an objective test with a written scenario and the results section of a
relevant paper. Students had to (a) calculate the specificity, sensitivity
and predictive value of a test and (b) decide whether to use the test. There
were no details of the scoring, range of results, reliability or validity of
the instrument. There was no relationship between perceived relevance of
the course content and improvement in scores on the test exercise.
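The quantities Radack's test exercise asked students to calculate follow directly from a 2x2 table of test result against disease status. A minimal sketch (the counts are illustrative, not from the study):

```python
# The calculations Radack's test exercise required, from a hypothetical
# 2x2 table of test result vs disease status (counts are invented).
tp, fp = 90, 30    # test positive: with disease / without disease
fn, tn = 10, 170   # test negative: with disease / without disease

sensitivity = tp / (tp + fn)  # 90/100 = 0.90: diseased correctly detected
specificity = tn / (tn + fp)  # 170/200 = 0.85: healthy correctly cleared
ppv = tp / (tp + fp)          # 90/120 = 0.75: positive predictive value
npv = tn / (tn + fn)          # negative predictive value

print(sensitivity, specificity, ppv, npv)
```

Part (b) of the exercise, whether to use the test, then turns on whether these values (especially the predictive values at the relevant prevalence) are good enough for the clinical question.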
Riegelman (1986)
reports a prospective study comparing the pre and post test scores of 1st
year medical students after receiving 12 hours of lectures and 4 hours of
supervised seminars on how to read medical literature. Students' perceived
effectiveness of the course and self-assessed competence and knowledge were
measured on Likert scales (analysed as parametric data and compared using
Student's t-test). Four questions were used to objectively measure
knowledge of basic study design and statistics (and the proportions compared
using a chi-squared test). Students rated the intervention as effective and
there was a statistically significant improvement in self-assessed
competence and knowledge. Self-assessed knowledge and competence declined
over the next three years (although it remained above pre-intervention
levels). Nonetheless, as 4th year students, they assessed their knowledge
and competence at a higher level than 4th year students who had not had a
similar course. Objectively assessed knowledge also increased immediately
and, although it declined over the next 3 years, was still statistically
significantly higher than before the intervention. Intervention students in
their 4th year scored better on these tests than 4th year students who had
not had the course. This paper suggests that critical appraisal teaching is
effective in improving knowledge and self-assessed competence in CAS but
does not advance the methodology of developing or validating a measuring
instrument for CAS or the analysis of scores.
Bennett (1987)
is a non-randomised controlled trial of CAS teaching (for diagnostic tests
and intervention studies) by trained tutors. This study provides evidence
that critical appraisal teaching is effective. It is methodologically
superior to previous studies. Apart from not being randomised, the trial
was methodologically rigorous: it had
- reasonable power (n>30 in each arm)
- a good response rate (86%)
- evaluation by objective test
- a measuring instrument that was validated in pilot studies
- pre-defined criteria for assessment
- a pre-defined level for what constitutes an important change (though unspecified)
- blinded assessment of results
- examination of inter- and intra-observer reliability.
Results:
1. the ability to critically appraise an article on diagnostic tests
increased from 21% to 58% of maximum possible score (p<0.001) in the
intervention group while control students' ability fell from 32% to 27%
(p>0.05)
2. the ability to critically appraise an article on therapy went from 27% to
35% (p<0.01) in the intervention group and from 28% to 23% in the control
group (p>0.05)
3. 67% of the intervention group compared to 35% of the control group
(p<0.01) achieved an important improvement in CAS for diagnosis
4. 40% of intervention group compared to 18% of control group (p<0.01)
achieved an important improvement in CAS for therapeutic papers.
Linzer (1987)
reports a non-randomised controlled trial looking at whether a journal club
co-ordinated by a general medicine faculty member (Group 1) is better for
teaching clinical epidemiological skills than one co-ordinated by a chief
resident (Group 2). Group 2 members attended more sessions on average than
Group 1 (12.9 vs 7.0, p<0.02). Group 2 members read more articles than Group
1 (23.5 vs 14.9 per month, p<0.02). There was no significant difference in
the thoroughness with which these articles were read. Group 1 reported that
their reading habits had changed more than Group 2 (62% vs 32%, p=0.055).
Group 1 enjoyed the experience more (p<0.001). There was no difference in
the objective test results for epidemiological knowledge. The author
concludes that both formats are equally effective in teaching CAS. This
study does not provide evidence that journal clubs improve CAS because there
was no true control group. There were no methodological advances in the
assessment of the impact of CAS teaching. The self-assessment of ability to
critically appraise an article did not correlate with performance on an
objective test.
Linzer (1988)
is a well-designed RCT. An effort was made in the design, piloting and
validation of a CAS measuring instrument. Double-blinding was used where
possible. An important improvement in skills was pre-defined as a 15%
improvement. Scores were calculated as a percent of possible positive
change in score. Results:
1. a 25% improvement in objectively measured biostatistical and
epidemiological knowledge (p=0.02)
2. no improvement in objectively measured critical appraisal skills
3. 86% reported an improvement in self-reported reading habits compared to
0% in control group (p<0.001).
Scores were treated parametrically and compared by Student's t-test.
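Linzer's "percent of possible positive change" can be read as the gain a student actually achieved, scaled by the maximum gain still available given their pre-test score. A sketch under that reading (function name and numbers are mine, not the paper's):

```python
# One reading of Linzer's outcome measure: actual change as a percentage
# of the maximum possible positive change from the pre-test score.
# The helper name and the example values are hypothetical.
def percent_possible_change(pre, post, max_score):
    """Change achieved, as a percent of the largest gain available."""
    return 100 * (post - pre) / (max_score - pre)

# A student scoring 40/100 before and 55/100 after gains 15 of a
# possible 60 points: 25% of possible positive change.
print(percent_possible_change(40, 55, 100))  # 25.0
```

Scaling this way stops students who start near the ceiling from looking like non-responders simply because they had little room to improve.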
Kitchens (1989)
reports a non-randomised controlled trial of CAS teaching compared to
standard seminars in ambulatory care for doctors. The intervention was 8
CAS teaching sessions using standard McMaster guidelines25,27 to appraise an
article. It is methodologically weak: there was no randomisation;
participants were not pre-tested to see if they were comparable; the two
groups were different because of a different phase 1 intervention; there was
no reference to the validity or reliability of the test instrument; scoring
of test was not blind. An important change in score was defined as an 18%
improvement in score (achieved by 21% of the intervention group and 5% of
the control group).
Romm (1989)
reports an RCT comparing the impact of small group teaching or lecture
format for teaching CAS. Assessment of test scores was blinded. There is
no discussion about the validity or reliability of the test instrument. No
difference was found between the two groups on the test. A questionnaire
was used to measure students' self-assessed ability to read and understand
the medical literature, and the success, importance and overall quality of
the teaching. While intervention students had no difference in their
self-assessed ability to read the paper, they rated CAS as more important,
the teaching more successful and the overall quality of the course higher
than the control group (all p<0.02). These differences in attitude and
satisfaction were thought to be more conducive to long term retention and
self-directed learning and the format of courses was changed.
Frasca (1992)
reports that medical students who had a course on CAS performed
significantly higher on tests of CAS and library skills than those who did
not. It is a methodologically weak study with no randomisation, no blinding
of evaluators, no validation of the measuring instrument, no pre-intervention
measurements and no discussion of what would be an important change.
Seelig (1993)
contrasts general attitudes and behaviours reported by internists and
medical residents about keeping up with current medical knowledge and
investigates the effect of a one hour seminar on CAS on attitudes, behaviour
and skills. This paper does not provide evidence for the effectiveness of
CAS teaching as there is no comparison group. Internists' mean objective
test scores deteriorated. There are no methodological advances:
Likert scales are used to measure self reported attitudes and behaviour and
are treated as interval data; the knowledge test is not described in detail
and results are reported as proportion correct; there is no consideration of
what would constitute an important change.
Domholdt (1994)
is a cross-sectional survey looking at the factors influencing CAS rather
than the impact of CAS teaching. The response rate was only 26.7% and the
numbers are very small so no conclusions can be confidently drawn from this
paper.
Landry (1994)
is a controlled trial that showed a significant (p<0.0001) increase in
objectively assessed knowledge and improved self-reported approach to
appraising scientific literature (p<0.05) in the intervention group. A
blinded assessment of patient write-ups by students showed no increased use
of the medical literature. Apart from not being randomised, this was a
well-conducted study, with pre and post tests, and an effort to validate the
objective test instrument. Insufficient detail is given, however, to inform
future evaluations of CAS teaching.
Stern (1995)
is a cross-sectional study evaluating the CAS of internal medicine
residents. Of 62 questionnaires (evaluating a sample article) distributed,
28 replies were received (45% response rate). The composite score of
residents was 63% of the "gold standard" and was not significantly
correlated with post-graduate year, prior journal club experience or
self-assessed CAS. This paper provides no evidence about the effectiveness
of CAS teaching. It is interesting in that there was an effort to develop
and validate an instrument for objectively assessing CAS and to compare
scores against a "gold standard". The instrument has eleven statements (based
on the McMaster criteria with Likert scale responses) for evaluation of an
intervention study. A "correct" response for each statement was determined
by sending the questions and an article to a panel of 5 "experts". 98% of
the panel's responses were within one point of the median. Scores were
calculated for students by quantifying the deviation of each score from the
"correct" score and then adjusting these as a proportion of worst possible
score to give a percentage score. Data were treated as interval data. The
mean score for medical residents was 63% of the gold standard (range 24 -
94%).
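Stern's scoring scheme, as described, amounts to summing each respondent's deviation from the panel's "correct" Likert response and rescaling against the worst possible total deviation, so that 100% means perfect agreement with the gold standard. A sketch under that reading (the items and responses are invented, and only four statements are shown rather than eleven):

```python
# Sketch of Stern-style scoring: per-item deviation from the panel's
# "correct" Likert response, rescaled by the worst possible deviation.
# Item values are hypothetical; the real instrument had eleven statements.
def stern_score(correct, answers, lo=1, hi=5):
    """Percent agreement with the gold standard (100 = identical)."""
    # worst possible deviation: answering at the far end of the scale
    worst = sum(max(c - lo, hi - c) for c in correct)
    dev = sum(abs(a - c) for a, c in zip(answers, correct))
    return 100 * (1 - dev / worst)

correct = [4, 2, 5, 3]   # panel's "correct" responses per statement
answers = [4, 3, 4, 1]   # one respondent's responses
print(round(stern_score(correct, answers), 1))  # 66.7
```

On this scheme, the residents' mean of 63% means their responses sat roughly a third of the way from the panel's answers toward the worst possible ones.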
Audet (1993)
is a systematic review of CAS teaching which identified 10 studies (all
included here). Only 4 were judged to be methodologically sound (Riegelman
(1986), Linzer (1988), Romm, Bennett). It concludes that the effectiveness
of CAS teaching remains uncertain and that more rigorous methods are needed
in evaluation.
Hyde (1995)
is an unpublished internal evaluation of 15 CASP workshops. These were
evaluated with the same questionnaires as the workshops in this study. Thus
the findings are particularly relevant. It is methodologically interesting
because it attempts to produce a summary measure of impact and adjust for
confounding using information from control questions. The main outcome
measure is computed as a weighted change in score (actual change/maximum
possible change). Test and control questions on both knowledge and
attitudes improved after the workshop but the test questions improved more
than the control questions (p=0.02 for attitude, p<0.001 for knowledge).
Attitude terms improved by 0.76 (possible range -14 to +14) and knowledge
terms by 3.24 (possible range -18 to +18) on average. These scores are hard
to interpret and the author points out that while these results are
statistically significant it is hard to judge whether they are also
important.
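Hyde's summary measure, as I read it, is a weighted change score (actual change divided by maximum possible change) computed for both test and control questions, with the control questions then subtracted out to adjust for non-specific shifts. A sketch of that logic (all item values are illustrative, not from the evaluation):

```python
# Sketch of the Hyde-style summary measure as I understand it: weighted
# change per item (actual change / maximum possible change), with control
# questions used to adjust for confounding. All numbers are invented.
def weighted_change(pre, post, max_score):
    """Actual change as a fraction of the maximum possible change."""
    return (post - pre) / (max_score - pre)

test_items = [(10, 14, 18), (8, 12, 18)]     # (pre, post, max): test questions
control_items = [(9, 10, 18), (11, 11, 18)]  # same shape: control questions

test_gain = sum(weighted_change(*i) for i in test_items) / len(test_items)
ctrl_gain = sum(weighted_change(*i) for i in control_items) / len(control_items)

# Excess improvement on test items over control items
print(round(test_gain - ctrl_gain, 3))  # 0.394
```

This subtraction is what makes the design interesting: improvement on control questions estimates practice effects and general enthusiasm, so only the excess on test questions is attributed to the workshop. The author's caveat still applies: a statistically significant excess says nothing by itself about whether the change is important.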
Taylor (1996)
summarises 15 published papers on CAS teaching but provides no further
analysis.
Crowe (1997)
reports an 8-workshop CASP project. Analysis of satisfaction and
knowledge/attitude questionnaires show statistically significant changes in
the desired direction for all items. There was a low response rate (48.6%).
There is no discussion about what constitutes an important change and no
attempt made to produce a summary measure of impact or adjust for
confounding using information from control questions. The report
concentrates on a qualitative evaluation of the project and has many
interesting lessons for CAS teaching but no methodological insights for
quantitative methods of measuring impact.
Amanda
_______________________________________
Dr Amanda Burls
Senior Clinical Lecturer in Public Health & Epidemiology
Department of Public Health and Epidemiology
University of Birmingham
Edgbaston
Birmingham B15 2TT
Tel. 44 (0) 121 414 7508
Fax. 44 (0) 121 414 7878
Sec. 44 (0) 121 414 7450