Another point here, and I think the crucial one, is that (grades of)
recommendations are directly linked to a level-of-evidence system, which
means different recommendations will be derived depending on the hierarchy
of study types used. For example, SIGN researchers might end up with
different grades of recommendation if they used the Oxford ranking scheme
instead of the "old" AHRQ system. The same goes for the instruments for
critical appraisal: many have been developed, but I have yet to see a
validated one that takes into account the empirical evidence associated
with a given methodological dimension (instead of normative beliefs about
good trial design). (SIGN folks claim that they did validate their
checklists; unfortunately their instrument appears to be available only to
the members of the group or on special

ben d

-----Original Message-----
From: Martin Dawes [mailto:[log in to unmask]]
Sent: Tuesday, August 14, 2001 2:24 AM
To: Djulbegovic, Benjamin; [log in to unmask]
Subject: Re: Re: validated instruments for critical appraisal & levels
of recommendation

There is some confusion regarding levels of evidence and recommendations.
Levels of evidence were initially developed to help people grade the
evidence already written. That is, when using the EBM approach, which sort
of articles would most likely give me the truth? It let people new to
medical research papers know that there were different types of research,
and that the results from these differed in terms of effect size etc. This
is quite a new concept (& remains largely unknown) to a lot of health care
professionals and totally new (and totally misunderstood) to most of the
medical press. So they were largely educational in purpose, guiding
clinicians to seek the most believable published evidence.

There has then been an incorrect assumption that this means that researchers
must always use the highest level of evidence as their research
methodology - and get criticised when they don't. But clearly there will
always be situations where it is impossible, or unethical, to do that. That
does not mean that that research is less valuable. It may be the highest
level of evidence there is (and will ever be).

Grades of recommendation also then sit uneasily side by side with levels of
evidence. This was a major leap from just the believability level to making
a recommendation. Realistically, to make a grade A recommendation you need
to have clear clinical benefit (large RCT or SR), AND economic and decision
analysis, AND large cohort data for harm (sorry cerivastatin). But that
almost never happens.

From being on the side of our table of levels of evidence, grades of
recommendation have moved to the bottom of the page - and perhaps should be
removed from this page altogether. If the best evidence is a case-control
study and an RCT is unethical, should the grade of recommendation be
lowish? Yes, the believability index may be quite low, BUT there is a herd
instinct in all of us. It is still reassuring in the light of weakish
evidence to have a firm

Particularly in critical conditions it is reassuring for patients to know we
are all doing the same thing - while still thinking about why we are doing
it. So with weak evidence you might have 100% consensus (I know - very
rare) - so the recommendation should be strong, n'est-ce pas? Whilst
reminding ourselves that any treatment in any patient is an experiment, and
we can predict all we like, but for people there are only three outcomes:
benefit, no benefit & harm.

So levels of evidence remain a tool to show us what sort of research has
been done.

Martin Dawes

----- Original Message -----
From: "Djulbegovic, Benjamin" <[log in to unmask]>
To: <[log in to unmask]>
Sent: Monday, August 13, 2001 11:31 PM
Subject: Re: validated instruments for critical appraisal

> By strange coincidence the exchange at this discussion group comes at the
> time when SIGN (Scottish Intercollegiate Guidelines Network) published "A
> new system for grading recommendations in evidence-based guidelines". In
> the paper (see the link), the
> authors state that they developed validated checklists for the assessment
> of the quality of evidence. If so, this will indeed represent a major
> accomplishment. I am CC'ing this message to the lead author of the SIGN
> paper, who, I hope, will find this exchange stimulating enough to get
> involved in this crucial debate within the EBM movement.
> Looking forward to an interesting discussion,
> ben
> Benjamin Djulbegovic, MD,PhD
> Associate Professor of Oncology and Medicine
> H. Lee Moffitt Cancer Center & Research Institute
> at the University of South Florida
> Interdisciplinary Oncology Program
> 12902 Magnolia Drive
> Tampa, FL 33612
> Editor: Evidence-based Oncology
> e-mail:[log in to unmask]
> phone:(813)979-7202
> fax:(813)979-3071
> --
> -----Original Message-----
> From: Doggett, David [mailto:[log in to unmask]]
> Sent: Monday, August 13, 2001 5:16 PM
> To: [log in to unmask]
> Subject: Re: validated instruments for critical appraisal
> May I interject a word of caution concerning "validated" evidence
> hierarchies?  Over the years we have from time to time looked into the
> literature on the validity of evidence hierarchies.  A related question,
> on which there is more literature, and upon which the concept of evidence
> hierarchies depends, is the question of the effect of study design on
> research outcomes; i.e., whether double-blind RCTs are always necessary,
> or whether in some situations more convenient study designs are adequate.
> In general we have always found that the literature shows that the effect
> of study design on research outcomes is topic specific.  Because of this,
> the search for a universally valid quality rating system appears to be
> futile.
> When study design does not correlate with outcome differences, it may be
> for one of two reasons.  In some areas there is so much subjectivity, bias
> (particularly publication bias) and fraud that the apparently best study
> designs give results just as flawed as worse study designs.  This may be
> the case in some areas of pseudoscience where research is carried out by
> proponents.  On the other hand, in some research areas there are hard
> outcomes, and conscientious researchers are sophisticated in research
> and data analysis, so that the better study designs may not improve
> reliability over simpler designs.  Some areas of cardiology come to mind
> here.  It is not uncommon in technology assessment to find RCTs that are
> fatally flawed in terms of internal or external validity, and on the other
> hand less rigorously controlled studies that are well done and reliable.
> If study design does not invariably affect research outcomes, then it
> follows that there can be no universal validation of evidence hierarchies
> based on study design.  In particular, whereas double-blind RCTs are in
> general more reliable than less rigorous designs, the precise points
> assigned to various study design aspects by a quality rating system are
> not universally appropriate, and adjusting or weighting outcomes according
> to such quality rating scores cannot be justified.  Blind belief in these
> rating scales applied to uncharted areas of research is simply not
> appropriate.
> A more reasonable approach is to use heterogeneity analysis to empirically
> assess whether study design substantially affects outcomes in the
> set of studies at hand.  Heterogeneity analysis should not be merely an
> inspection of heterogeneity test p values, because small sets of studies
> may not have sufficient statistical power to detect clinically significant
> differences in results.  Regardless of p values, different study designs
> that give results that appear to have clinically significant differences
> in outcomes might best be grouped separately.
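A minimal sketch of the kind of design-by-design heterogeneity check described above, offered only as an illustration. It assumes fixed-effect inverse-variance pooling with Cochran's Q and the I^2 statistic; the message does not name a specific method, so these choices, the function names and the effect sizes (hypothetical log odds ratios) are all assumptions.

```python
from collections import defaultdict


def pooled(effects, variances):
    """Return (inverse-variance pooled estimate, Cochran's Q, I^2 in percent)."""
    weights = [1.0 / v for v in variances]
    estimate = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - estimate) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I^2 is the share of variation beyond what chance alone would produce.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return estimate, q, i2


def by_design(studies):
    """Pool studies separately for each study design label."""
    groups = defaultdict(list)
    for design, effect, variance in studies:
        groups[design].append((effect, variance))
    return {
        design: pooled([e for e, _ in rows], [v for _, v in rows])
        for design, rows in groups.items()
    }


# Hypothetical data: (design, log odds ratio, variance). Not from any real review.
studies = [
    ("rct", -0.10, 0.04),
    ("rct", -0.20, 0.04),
    ("cohort", -0.60, 0.04),
    ("cohort", -0.70, 0.04),
]

if __name__ == "__main__":
    for design, (estimate, q, i2) in by_design(studies).items():
        print(f"{design}: pooled={estimate:.2f} Q={q:.3f} I2={i2:.1f}%")
    all_est, all_q, all_i2 = pooled(
        [e for _, e, _ in studies], [v for _, _, v in studies]
    )
    print(f"all designs: pooled={all_est:.2f} Q={all_q:.3f} I2={all_i2:.1f}%")
```

With so few studies a Q test has little power, which is the caution raised in the message; the point of the sketch is to compare the pooled estimates per design and judge whether the gap is clinically meaningful, rather than to rely on a p value.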
> This is not a simple subject.  Unfortunately, going into our files and
> putting together a comprehensive bibliography on this subject is beyond my
> time constraints at the moment.
> David L. Doggett, Ph.D.
> Senior Medical Research Analyst
> Health Technology Assessment and Information Services
> ECRI, a non-profit health services research organization
> 5200 Butler Pike
> Plymouth Meeting, Pennsylvania 19462, U.S.A.
> Phone: (610) 825-6000 x5509
> FAX: (610) 834-1275
> e-mail: [log in to unmask]
> -----Original Message-----
> From: Gero Langer [mailto:[log in to unmask]]
> Sent: Monday, August 13, 2001 4:32 AM
> To: [log in to unmask]
> Subject: validated instruments for critical appraisal
> Hello,
> I am looking for some validated instruments to critically appraise
> studies. It is important to find the 'best' studies, and a (validated)
> rating system for all kinds of questions (intervention, diagnosis,
> qualitative etc.) should be used.
> Currently I am using the JAMA users' guides, but they are not validated
> (or are they?) and comparisons between studies are difficult and
> subjective. For RCTs I am working with the Jadad score. With all of those
> I can get a 'result', but not a comparative (e.g. rated) solution.
> We are developing a database for nurses in Germany and trying to offer the
> best available evidence for some nursing problems -- but what is,
> without bias, the best? A scoring system would be very useful... Does
> anyone know of anything in this field?
> Thanks in advance,
>   Gero Langer
> --
> Martin Luther University Halle-Wittenberg
> Institute for Nursing and Health Sciences
> German Center for Evidence-based Nursing
> Website:   E-Mail: [log in to unmask]