Dr Doggett brings up a good point. Study designs should be tailored to
the question that needs to be answered and chosen to minimize bias and
the cost of conducting the study while maximizing its power to detect an
effect (I write "should" deliberately -- this is how things ought to
work, not necessarily how they are done).
However, every sensible design feature that limits bias in a specific
research area should be present in all studies that aim to answer that
specific question.
To keep it simple, let's restrict the discussion to studies of therapy
and try to define which design features limit bias in this type of
study. There are some conflicting reports about the role of blinding.
Allocation concealment in the context of appropriate randomization seems
pretty solid (though one could imagine a previously incurable disease in
which a new treatment cures 100% of the time, where a before-after
design may be all that is needed). For most drug trials searching for a
mid-size effect, a large RCT with appropriate allocation concealment,
complete follow-up, blinding of most if not all of those participating
in the study, and analysis by the intention-to-treat principle limits
the opportunity for bias.
Studies of the empirical evidence of bias assume that bias has a
systematic effect on outcomes (i.e., that lack of blinding will lead to
overestimation of a drug's effects). Interestingly, bias may push
results away from the truth through a systematic mechanism but in an
unsystematic direction, so bias may be present and we may not be able to
tell (see the sketch below). One could hold the most rigorous study as
the gold standard, if one is available, and compare the others against
it, but one would find that many factors other than research design also
differ. Hence the call for a heterogeneity assessment (not the
statistical type) in those instances.
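To make that concrete, here is a minimal simulation sketch in Python;
the effect size, error, and bias magnitudes are all invented for
illustration:

    import random
    from statistics import mean

    random.seed(1)
    TRUE_EFFECT = 0.5  # invented "true" treatment effect

    def trial_result(biased):
        """One trial's observed effect estimate.

        Every trial has the same sampling error; a biased trial also
        picks up a deviation whose mechanism is systematic (say,
        unblinded outcome assessment) but whose direction is not --
        it may push the estimate either way.
        """
        sampling_error = random.gauss(0, 0.1)
        bias = random.choice([-0.3, 0.3]) if biased else 0.0
        return TRUE_EFFECT + sampling_error + bias

    unbiased = [trial_result(False) for _ in range(1000)]
    biased = [trial_result(True) for _ in range(1000)]

    print("mean effect, unbiased trials:", round(mean(unbiased), 3))
    print("mean effect, biased trials:  ", round(mean(biased), 3))
    # The two means come out nearly identical: comparing averages (the
    # usual design of empirical-evidence-of-bias studies) shows no
    # systematic shift, even though every biased trial is off by 0.3.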
Nonetheless, when reviewing a certain topic, one can sometimes imagine
the ideal study to answer the question of interest, decide on the
features that should be present in all studies on that specific topic,
and then judge what is missing from the published ones. I have done this
and found that, just as I was giving up on a specific literature, along
came a report with all the bells and whistles I had hoped for, showing
me that my dream study was FEASIBLE and therefore valid to use as a
yardstick against which to compare the other studies. This clearly does
not replace the heterogeneity assessment, but this qualitative approach
further describes the studies one is looking at. Scoring systems
oversimplify and blur this more qualitative description of study designs
and may disappoint as instruments of measurement.
The instrument we are using at this time combines quantitative and
qualitative elements. We use the scoring component to help categorize
studies by quality for a specific project (not for pooling) and to
evaluate agreement across reviewers for quality-control purposes. The
detailed analysis of study design is still a big, long paragraph
describing the research design.
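For the reviewer-agreement part, the computation involved is a simple
chance-corrected agreement statistic. Here is a minimal sketch in Python
-- the quality categories and ratings are invented, and Cohen's kappa is
just one reasonable choice of statistic:

    from collections import Counter

    def cohens_kappa(rater_a, rater_b):
        """Cohen's kappa: observed agreement corrected for chance."""
        n = len(rater_a)
        observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
        freq_a, freq_b = Counter(rater_a), Counter(rater_b)
        expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
        return (observed - expected) / (1 - expected)

    # Hypothetical quality categories assigned by two reviewers
    reviewer_1 = ["A", "B", "B", "C", "A", "B", "C", "A"]
    reviewer_2 = ["A", "B", "C", "C", "A", "B", "B", "A"]
    print(round(cohens_kappa(reviewer_1, reviewer_2), 2))  # 0.62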
Cheers,
V
> ----------
> From: Doggett, David[SMTP:[log in to unmask]]
> Reply To: Doggett, David
> Sent: Monday, August 13, 2001 4:16 PM
> To: [log in to unmask]
> Subject: Re: validated instruments for critical appraisal
>
> May I interject a word of caution concerning "validated" evidence
> hierarchies. Over the years we have from time to time looked into the
> literature on the validity of evidence hierarchies. A related question,
> for
> which there is more literature, and upon which the concept of evidence
> hierarchies depends, is the question of the effect of study design on
> research outcomes; i.e., whether double-blind RCTs are always necessary,
> or
> whether in some situations more convenient study designs are adequate. In
> general we have always found that the literature shows that the effect of
> study design on research outcomes is topic specific. Because of this, the
> search for a universally valid quality rating system appears to be futile.
>
> When study design does not correlate with outcome differences, it may be
> for
> one of two reasons. In some areas there is so much subjectivity, bias
> (particularly publication bias) and fraud that the apparently best study
> designs give results just as flawed as worse study designs. This may be
> the
> case in some areas of pseudoscience where research is carried out by
> proponents. On the other hand, in some research areas there are hard
> outcomes, and conscientious researchers are sophisticated in research
> design
> and data analysis, so that the better study designs may not improve
> reliability over simpler designs. Some areas of cardiology come to mind
> here. It is not uncommon in technology assessment to find RCTs that are
> fatally flawed in terms of internal or external validity, and on the other
> hand less rigorously controlled studies that are well done and reliable.
>
> If study design does not invariably affect research outcomes, then it
> follows that there can be no universal validation of evidence hierarchies
> based on study design. In particular, whereas double-blind RCTs are in
> general more reliable than less rigorous designs, the precise points
> assigned to various study design aspects by a quality rating system are
> not
> universally appropriate, and adjusting or weighting outcomes according to
> such quality rating scores cannot be justified. Blind belief in these
> rating scales applied to uncharted areas of research is simply not
> appropriate.
>
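(To make this point concrete -- a minimal sketch with invented effect
estimates and two invented, but equally defensible, point allocations,
showing how a score-weighted pooled result hinges on the scoring scheme
rather than on the data:)

    # Three studies' effect estimates (invented numbers).
    effects = [0.30, 0.50, 0.80]
    # Quality scheme A rewards blinding heavily; scheme B rewards
    # complete follow-up heavily. Both are plausible point allocations.
    scores_a = [5, 3, 1]
    scores_b = [2, 3, 4]

    def score_weighted_mean(effects, scores):
        return sum(e * s for e, s in zip(effects, scores)) / sum(scores)

    print(round(score_weighted_mean(effects, scores_a), 2))  # 0.42
    print(round(score_weighted_mean(effects, scores_b), 2))  # 0.59
    # The pooled answer moves with the arbitrary point allocation, not
    # with the data -- weighting outcomes by quality score builds the
    # scorer's judgments into the result.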
> A more reasonable approach is to use heterogeneity analysis to empirically
> assess whether study design substantially affects outcomes in the
> particular
> set of studies at hand. Heterogeneity analysis should not be merely an
> inspection of heterogeneity test p values, because small sets of studies
> may
> not have sufficient statistical power to detect clinically significant
> differences in results. Regardless of p values, different study designs
> that give results that appear to have clinically significant differences
> in
> outcomes might best be grouped separately.
>
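(And to illustrate the last point -- a minimal sketch of a
design-stratified heterogeneity check using Cochran's Q and I^2, with
invented study data; the point is the pattern, not the numbers:)

    # (study, effect estimate, standard error, design) -- invented data
    studies = [
        ("Trial 1",  0.42, 0.10, "RCT"),
        ("Trial 2",  0.38, 0.12, "RCT"),
        ("Trial 3",  0.45, 0.09, "RCT"),
        ("Cohort 1", 0.75, 0.15, "observational"),
        ("Cohort 2", 0.70, 0.14, "observational"),
    ]

    def pool(group):
        """Fixed-effect pooled estimate plus Cochran's Q and I^2."""
        weights = [1 / se**2 for _, _, se, _ in group]
        effects = [e for _, e, _, _ in group]
        pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
        q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
        i2 = max(0.0, (q - (len(group) - 1)) / q) * 100 if q > 0 else 0.0
        return pooled, q, i2

    for design in ("RCT", "observational"):
        group = [s for s in studies if s[3] == design]
        print(design, "pooled=%.2f Q=%.2f I2=%.0f%%" % pool(group))
    print("all", "pooled=%.2f Q=%.2f I2=%.0f%%" % pool(studies))
    # Overall Q is about 6.8 on 4 df, short of the 9.49 needed for
    # p < 0.05, yet the design subgroups sit a clinically meaningful
    # 0.3 apart. Inspecting the p value alone would miss it.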
> This is not a simple subject. Unfortunately going into our files and
> putting together a comprehensive bibliography on this subject is beyond my
> time constraints at the moment.
>
> David L. Doggett, Ph.D.
> Senior Medical Research Analyst
> Health Technology Assessment and Information Services
> ECRI, a non-profit health services research organization
> 5200 Butler Pike
> Plymouth Meeting, Pennsylvania 19462, U.S.A.
> Phone: (610) 825-6000 x5509
> FAX: (610) 834-1275
> http://www.ecri.org
> e-mail: [log in to unmask]
>
>
> -----Original Message-----
> From: Gero Langer [mailto:[log in to unmask]]
> Sent: Monday, August 13, 2001 4:32 AM
> To: [log in to unmask]
> Subject: validated instruments for critical appraisal
>
>
> Hello,
>
> I am looking for some validated instruments to critically appraise
> studies. It is important to find the 'best' studies, and a (validated)
> rating system for all kinds of questions (intervention, diagnosis,
> qualitative etc.) should be used.
>
> Currently I am using the JAMA users' guides, but they are not validated
> (or are they?), and comparisons between studies are difficult and
> subjective. For RCTs I am working with the Jadad score. With all of
> these I can get a 'result', but not a comparative (e.g. ranked)
> solution.
>
> We are developing a database for nurses in Germany and trying to offer
> the best available evidence for some nursing problems -- but which is,
> without bias, the best? A scoring system would be very useful... Does
> anyone know of anything in this field?
>
> Thanks in advance,
> Gero Langer
>
> --
> Martin Luther University Halle-Wittenberg
> Institute for Nursing and Health Sciences
> German Center for Evidence-based Nursing
> Website: www.EBN-Zentrum.de E-Mail: [log in to unmask]
>