Ioanna Gioni asked for advice about calculating sample-size for a
non-inferiority trial with Normal data. Specifically, she stated "I am not
sure if I need to set my type error I (alpha) at half the conventional
type I error used in two-sided confidence intervals." This would be better
asked on the MedStats group, http://groups.google.com/group/MedStats, but
some interest has already been sparked on Allstat.
Regulatory authorities such as the FDA and CHMP usually require the
one-sided Type I error in a non-inferiority trial to be 0.025. This
corresponds to requiring the lower limit of a two-sided 95% confidence
interval for the difference between the means of the two arms (new minus
comparator, with higher values better) to be greater than minus a
pre-specified tolerance.
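The link between the one-sided test and the confidence limit can be
checked numerically. What follows is only a minimal sketch: it assumes a
Normal approximation with a known standard error, takes the difference as
new minus comparator with higher responses better, and uses made-up
summary statistics; the function names are mine, not from any package.

```python
from statistics import NormalDist


def lower_limit_95(diff, se):
    """Lower limit of the two-sided 95% CI for the difference in means."""
    return diff - NormalDist().inv_cdf(0.975) * se


def one_sided_p(diff, se, t):
    """One-sided p-value for H0: difference <= -t vs H1: difference > -t."""
    z = (diff + t) / se
    return 1.0 - NormalDist().cdf(z)


# Hypothetical values, chosen only for illustration:
# observed difference, its standard error, and the tolerance.
diff, se, t = -0.5, 1.0, 3.0

ci_says_noninferior = lower_limit_95(diff, se) > -t
test_says_noninferior = one_sided_p(diff, se, t) < 0.025
print(ci_says_noninferior, test_says_noninferior)
```

Under these assumptions the two criteria always give the same answer,
because both compare (diff + t) / se with the same Normal quantile.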
Jay Warner asked for a quick guide to the medical stats jargon involved in
this query, so here's an attempt. I am writing this from memory, which is
an increasingly faulty storage mechanism in my case, so I may have made
slips; but I think it explains the main terminology.
In a clinical trial with two groups (usually referred to as "arms")
receiving different drugs (one of which may be a placebo), the primary aim
is usually to compare the means (m1 and m2, say) of a specified response
variable measured on all patients. Of most interest scientifically is an
estimate of the difference between those means and the precision of the
estimate, to show potential benefits to the patients. But in order to
satisfy the rules of regulation when making a new drug application, a
hypothesis test is carried out at a prescribed level of significance. The
most common test is referred to as a test of "superiority" of one drug
over the other; in fact, it is a test of difference, with H0: m1 = m2. The
FDA requires this test to be carried out with two-sided alpha=0.05, which
means in practice that the Type I error associated with claiming the new
drug to be better than the comparator is 0.025, because no-one is
interested in a new drug if the comparator works better.
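The halving of alpha can be seen directly from the Normal distribution.
This is a small sketch, assuming the test statistic is a standardised
difference that is standard Normal under H0; the variable names are
illustrative only.

```python
from statistics import NormalDist

alpha_two_sided = 0.05
norm = NormalDist()

# Critical value for a two-sided test at alpha = 0.05 (roughly 1.96).
z_crit = norm.inv_cdf(1 - alpha_two_sided / 2)

# Under H0 (m1 = m2): chance of a significant result in either direction,
# and chance of a significant result that favours the new drug.
p_reject_either_direction = 2 * (1 - norm.cdf(z_crit))
p_reject_favouring_new_drug = 1 - norm.cdf(z_crit)
print(p_reject_either_direction, p_reject_favouring_new_drug)
```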
When a drug is re-formulated, there is a requirement to demonstrate that
the new formulation behaves like the old (say m1 is the mean for the new
drug). A test of "equivalence" is then performed, using a pre-specified
and medically accepted level of "tolerance" on the response scale: call it
t. The hypothesis tested is H0: (m1 > m2+t) OR (m1 < m2-t). The method is
referred to by the abbreviation TOST (two one-sided tests) because it is
carried out by testing the two components. If each component is tested
using the same alpha, the Type I error of the full test is no more than
alpha, because the two cases are mutually exclusive: the true difference
can lie in at most one of the two regions. During the early development
of a drug, equivalence tests of the pharmacokinetics of the drug (referred
to as "bioequivalence tests") in small trials are usually accepted with
alpha=0.05, so each component test is carried out with alpha=0.05. But in
later development, in large confirmatory trials, alpha=0.025 is usually
required.
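As an illustration of TOST, here is a minimal sketch that assumes a
Normal approximation with a known standard error for the observed
difference m1 - m2. The function names and the example numbers are
hypothetical. The second function expresses the same decision as a
confidence interval: testing each component at alpha is the same as
asking whether the two-sided (1 - 2*alpha) interval lies inside (-t, t).

```python
from statistics import NormalDist


def tost_equivalent(diff, se, t, alpha=0.025):
    """True if both one-sided tests reject at level alpha."""
    norm = NormalDist()
    p_lower = 1 - norm.cdf((diff + t) / se)  # component H0: m1 < m2 - t
    p_upper = norm.cdf((diff - t) / se)      # component H0: m1 > m2 + t
    return max(p_lower, p_upper) < alpha


def ci_within_tolerance(diff, se, t, alpha=0.025):
    """Same decision via the two-sided (1 - 2*alpha) confidence interval."""
    half_width = NormalDist().inv_cdf(1 - alpha) * se
    return -t < diff - half_width and diff + half_width < t


# Hypothetical observed difference, standard error and tolerance.
diff, se, t = 0.2, 1.0, 2.0
for alpha in (0.05, 0.025):
    print(alpha, tost_equivalent(diff, se, t, alpha),
          ci_within_tolerance(diff, se, t, alpha))
```

With these made-up numbers the looser alpha=0.05 criterion is met but the
alpha=0.025 criterion is not, which shows why the choice matters when
planning the trial.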
A non-inferiority test is carried out when all that is needed is to
demonstrate that a new drug or formulation is no worse than another. The
hypothesis again relies on a tolerance level, and is H0: m1 < m2-t. The
regulators usually require one-sided alpha=0.025.
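For the original sample-size question, a commonly used Normal
approximation gives the number of patients per arm as
2*sd^2*(z_alpha + z_power)^2 / (t + d)^2, where d is the true difference
m1 - m2 assumed at the planning stage (often 0). The sketch below assumes
equal arm sizes, a common known standard deviation and a one-sided test;
the function name, its parameters and the example values are mine, for
illustration only.

```python
import math
from statistics import NormalDist


def n_per_arm_noninferiority(sd, t, alpha=0.025, power=0.9, true_diff=0.0):
    """Approximate patients per arm for H0: m1 < m2 - t (one-sided alpha).

    sd        : common standard deviation of the response (assumed known)
    t         : non-inferiority tolerance on the response scale
    true_diff : assumed true value of m1 - m2 (0 means truly identical)
    """
    if t + true_diff <= 0:
        raise ValueError("t + true_diff must be positive")
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha)
    z_power = norm.inv_cdf(power)
    n = 2 * sd ** 2 * (z_alpha + z_power) ** 2 / (t + true_diff) ** 2
    return math.ceil(n)


# Hypothetical planning values: sd = 10 and tolerance = 5.
print(n_per_arm_noninferiority(sd=10, t=5, alpha=0.025))
print(n_per_arm_noninferiority(sd=10, t=5, alpha=0.05))
```

Using one-sided alpha=0.025 rather than 0.05 gives the larger sample
size, so the choice of alpha should be settled before the calculation.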
The rules for significance levels are relaxed in some disease areas in
which it is hard to recruit patients, and therefore hard to achieve
statistical significance when a drug achieves a clinically important
effect. But apart from this, and from the small trials used for
bioequivalence testing, there is consistency in that there is a 1 in 40
chance of a trial being a "success" from the drug company's point of view,
if the drug actually has no efficacy at all. To satisfy the FDA about a
new drug, at least two trials have to succeed.
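As a rough piece of arithmetic on the two-trial requirement: if the two
trials are assumed to be independent (my assumption, not a regulatory
statement), the chance that both "succeed" for a drug with no efficacy
is the product of the individual chances.

```python
from fractions import Fraction

# Chance of a false "success" in one trial: 1 in 40 (alpha = 0.025).
alpha_one_trial = Fraction(1, 40)

# Assuming independent trials, both must falsely succeed.
alpha_two_trials = alpha_one_trial ** 2
print(alpha_two_trials, float(alpha_two_trials))
```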
Peter Lane
Research Statistics Unit, GlaxoSmithKline