David
-- 





blog: http://communication.org.au/blog/
web: http://communication.org.au

Professor David Sless BA MSc FRSA
CEO • Communication Research Institute •
• helping people communicate with people •

Mobile: +61 (0)412 356 795
Phone: +61 (03) 9005 5903
Skype: davidsless

60 Park Street • Fitzroy North • Melbourne • Australia • 3068

> On 13 May 2018, at 2:57 am, Don Norman <[log in to unmask]> wrote:
> 
> Having just written a comment on the proper use of statistics in
> determining Risks (for autonomous vehicles) for the mailing list RISKS (
> http://www.risks.org) I was inspired to comment on the recent interactions
> on this mailing list:
> 
> "Improving design methods (was Re: "What is Design Thinking" and
> "Improvement In and Through Design Thinking")"
> 
> Some of the discussions demonstrated a weak understanding of statistics.
> Not surprising: the normal training of designers does not include this.
> Worse, when we are taught statistics, it is often the wrong kind. (See the
> discussion "Why designers need a special kind of statistical tests" at
> the end of this note.)
> 
> First of all, many fields have developed reliable methods for assessing
> the impact of experimental manipulations. To quote Ali Ilhan:
> 
> Education researchers do these types of
> analyses all the time with controlled experiments in classrooms, that is,
> do a random assignment (or use a sampling strategy), try your "new" method
> in one group, do nothing "special" in another group, compare the end
> results statistically.
> 
> 
> Ali is correct and his description captures the spirit of appropriate
> testing. Note that the real test requires more sophistication than simple
> random assignment, but nonetheless, that is the major basis.
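> 
> To make that recipe concrete, here is a minimal sketch (in Python, with
> invented scores and group sizes) of the comparison Ali describes: randomly
> assign participants to a "new method" group and a control group, then ask
> how often a difference at least as large as the observed one would arise
> by chance alone.
> 
> import numpy as np
> 
> rng = np.random.default_rng(0)
> 
> # Hypothetical end-of-term reading scores for two randomly assigned groups.
> new_method = np.array([72, 68, 75, 80, 71, 77, 74, 69, 78, 73])
> control = np.array([65, 70, 66, 72, 64, 68, 71, 63, 67, 69])
> 
> observed_diff = new_method.mean() - control.mean()
> 
> # Permutation test: shuffle the group labels many times and count how often
> # a difference this large shows up purely by chance.
> pooled = np.concatenate([new_method, control])
> n_new = len(new_method)
> n_iter = 10_000
> hits = 0
> for _ in range(n_iter):
>     shuffled = rng.permutation(pooled)
>     if shuffled[:n_new].mean() - shuffled[n_new:].mean() >= observed_diff:
>         hits += 1
> 
> print(f"observed difference: {observed_diff:.1f}, p = {hits / n_iter:.4f}")
> 
> The blinding and baseline checks described next are about making sure the
> only systematic difference between the groups is the method itself; the
> comparison at the end can stay this simple.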
> 
> There are other potential biases, so it is important to control for them.
> It is often necessary to do double-blind studies, where neither the
> recipients nor the people doing the tests know which condition they are
> in. It is also important to ensure that the various test sites were
> (statistically) equal prior to the test.
> 
> There are several phenomena that can bias results, one of which is called
> "The Hawthorne Effect" and another is "Pygmalion."  The first refers to the
> fact that if people know they are being tested, their performance changes.
> The second refers to the fact that if the people doing the test know what
> is being tested, they are biased. (In the classic experiment, teachers were
> told the names of some students who were "unusually gifted." Those students
> outperformed the others, even though they had been randomly selected and
> were not actually special: the teachers' beliefs influenced how the
> students were treated and evaluated.)
> 
> David Sless says:
> 
> it’s a bit like clinical practice in medicine where you look for symptoms
> of pathology and then apply a treatment. You then look to see if the
> symptoms disappear.
> 
> Unfortunately, this is a dangerous practice. This kind of test is badly
> flawed, even though many physicians follow it. First, it is not blind, so
> both physician and patient are biased toward a good result. For the
> patient, this is the "placebo effect." The placebo effect is real -- give
> a patient a fake pill, and if they believe it to be a powerful new drug,
> they might very well get better (the mechanism for this is still not well
> understood). For the physician, it is the Pygmalion effect. And in any
> case, a single experiment is statistically unsound: the person might have
> gotten better with no treatment (as happens with many cases of back pain).
> 
> Most physicians are not scientists (even if the public thinks they are).
> Many do not know statistics and do not know how to do proper experiments.
> That's not in their training.
> 
> 
> David's comments also illustrate what are called N=1 (or "n of 1")
> experiments, where "n" refers to the number of people being tested: a
> single person rather than the hundreds or thousands often used in RCTs
> (Randomized Clinical Trials), which are today's gold standard. N of 1
> trials can be done, but the best way is to do a sequence of trials.
> 
> Consider my situation. For the past several decades, I have taken a statin
> pill daily to treat cholesterol. Statins have muscle weakness or soreness
> as a possible side effect. Now, after years of taking the statin, I have
> muscle soreness. So I stop taking the statin. If the soreness goes away,
> does it mean the statin was the cause? No. I have to be careful in assuming
> the statin was responsible. So I reintroduce the statin and see if the
> soreness comes back. I may have to do this several times before I can have
> confidence. (One of the graduate students in the UCSD Design Lab has
> designed a simple method of assisting people in doing n of 1 experiments on
> themselves that yield reliable results, allowing people to run their own
> trials on themselves: https://arxiv.org/pdf/1609.05763.pdf)
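> 
> A minimal sketch of that on/off logic (with invented daily soreness
> ratings, not data from the paper above): tabulate the soreness in each
> phase and see whether it tracks the statin every time it is withdrawn and
> reintroduced.
> 
> from statistics import mean
> 
> # Hypothetical daily soreness ratings (0-10) over four alternating phases.
> phases = {
>     "on_1": [6, 7, 5, 6, 7],    # taking the statin
>     "off_1": [3, 2, 4, 3, 2],   # statin withdrawn
>     "on_2": [6, 5, 7, 6, 6],    # statin reintroduced
>     "off_2": [2, 3, 2, 4, 3],   # withdrawn again
> }
> 
> for name, days in phases.items():
>     print(f"{name}: mean soreness {mean(days):.1f}")
> 
> # If soreness is consistently higher in both "on" phases than in both "off"
> # phases, the case against the statin is far stronger than after a single
> # withdrawal, where coincidence or regression to the mean could explain
> # the improvement.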
> 
> Ali sums it up well:
> 
> There are a multitude of factors that may affect the way kids learn reading
> and writing (gender, being a minority, problems at home, quality of
> teachers, peer effects, age in months etc.), and our design intervention
> here, is just one among these many things. Even the fact that they are
> using a new digital thing might make kids spend more time working on
> reading and writing. But then this is a placebo effect, it is not our
> design per se. I cannot envision any scenario that excludes using
> statistics in this example, albeit very simple tests, nothing fancy. With
> this many different possible sources of variations, five or ten
> participants will never help us to understand the role of the app and its
> design here.
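> 
> Ali's point about five or ten participants can be made concrete with a
> quick simulation (all numbers invented): when individual variation is
> large, even a sizeable true improvement is usually missed with five
> children per group, and only becomes reliably detectable with far more.
> 
> import numpy as np
> from scipy import stats
> 
> rng = np.random.default_rng(1)
> 
> def detection_rate(n_per_group, true_gain=10.0, noise_sd=15.0, trials=2000):
>     # Fraction of simulated studies in which a t-test detects the gain (p < .05).
>     detected = 0
>     for _ in range(trials):
>         control = rng.normal(50.0, noise_sd, n_per_group)
>         treated = rng.normal(50.0 + true_gain, noise_sd, n_per_group)
>         _, p = stats.ttest_ind(treated, control)
>         if p < 0.05:
>             detected += 1
>     return detected / trials
> 
> for n in (5, 10, 30, 100):
>     print(f"n = {n:3d} per group: gain detected in {detection_rate(n):.0%} of studies")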
> 
> 
> Why designers need a special kind of statistical tests
> 
> Designers need a set of simple statistical methods that can inform our
> work.
> 
> Note that we do NOT need the care and precision normally followed in
> science and medicine. Why? Because they are looking for small effects
> whereas we are looking for large ones.
> 
> To the practicing designer, if the change we are advocating does not make a
> large difference (an improvement of anywhere between 2 and 10 times), it is
> not worth pursuing.
> 
> Scientists look for statistical significance, which does not mean practical
> significance. Statistical significance means the result is unlikely to have
> occurred by chance, but the effect may still be small.
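> 
> A quick illustration of the difference (numbers invented): with enough
> participants, even a 2% improvement comes out "statistically significant,"
> yet it is nowhere near the 2 to 10 times gains a designer should be after.
> 
> import numpy as np
> from scipy import stats
> 
> rng = np.random.default_rng(2)
> 
> # 50,000 users per condition; the new design is only 2% better on average.
> old_design = rng.normal(100.0, 20.0, 50_000)
> new_design = rng.normal(102.0, 20.0, 50_000)
> 
> _, p = stats.ttest_ind(new_design, old_design)
> ratio = new_design.mean() / old_design.mean()
> 
> print(f"p = {p:.2e}  (statistically significant)")
> print(f"improvement = {ratio:.2f}x  (practically negligible)")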
> 
> We are looking for large effects. Even so, let us not be reckless. Doing
> something and seeing a large impact by itself tells us nothing. Try doing
> something that has zero relevance and presenting it to the
> client/customer/user. It might very well have the same large impact.
> Placebo effect.
> 
> We need double-blind studies. We need better research methods, ones suited
> to looking for large effects (which can, therefore, be simple, quick,
> etc.) but which nonetheless control for factors that could otherwise
> confound the results.
> 
> We need a good statistician to work with a good designer to develop a set
> of methods.
> 
> 
> Don
> 
> Don Norman
> Prof. and Director, DesignLab, UC San Diego
> [log in to unmask]  designlab.ucsd.edu/  www.jnd.org
> Executive Assistant:
> Olga McConnell, [log in to unmask]  +1 858 534-0992
> 
> 



-----------------------------------------------------------------
PhD-Design mailing list  <[log in to unmask]>
Discussion of PhD studies and related research in Design
Subscribe or Unsubscribe at https://www.jiscmail.ac.uk/phd-design
-----------------------------------------------------------------