David
--
blog: http://communication.org.au/blog/
web: http://communication.org.au
Professor David Sless BA MSc FRSA
CEO • Communication Research Institute •
• helping people communicate with people •
Mobile: +61 (0)412 356 795
Phone: +61 (03) 9005 5903
Skype: davidsless
60 Park Street • Fitzroy North • Melbourne • Australia • 3068

> On 13 May 2018, at 2:57 am, Don Norman <[log in to unmask]> wrote:
>
> Having just written a comment on the proper use of statistics in
> determining risks (for autonomous vehicles) for the mailing list RISKS
> (http://www.risks.org), I was inspired to comment on the recent
> interactions on this mailing list:
>
> "Improving design methods (was Re: "What is Design Thinking" and
> "Improvement In and Through Design Thinking")"
>
> Some of the discussions demonstrated a weak understanding of statistics.
> Not surprising: the normal training of designers does not include this.
> Worse, when we are taught statistics, it is often the wrong kind. (See
> the discussion "Why designers need a special kind of statistical tests"
> at the end of this note.)
>
> First of all, many fields have developed reliable methods of assessing
> the impact of experimental manipulations. To quote Ali Ilhan:
>
> Education researchers do these types of analyses all the time with
> controlled experiments in classrooms: do a random assignment (or use a
> sampling strategy), try your "new" method in one group, do nothing
> "special" in another group, and compare the end results statistically.
>
> Ali is correct, and his description captures the spirit of appropriate
> testing. Note that a real test requires more sophistication than simple
> random assignment, but nonetheless, that is the major basis.
>
> There are other potential biases, so it is important to control for
> them. It is often necessary to do double-blind studies, where neither
> the recipients nor the people doing the tests know what condition they
> are in. It is also important to ensure that the various test sites were
> (statistically) equal prior to the test.
>
> There are several phenomena that can bias results, one of which is
> called "the Hawthorne effect" and another "the Pygmalion effect." The
> first refers to the fact that if people know they are being tested,
> their performance changes. The second refers to the fact that if the
> people doing the test know what is being tested, they are biased. (In
> the classic experiment, teachers were told the names of some students
> who were "unusually gifted." Those students outperformed the others,
> even though they had been randomly selected and were not actually
> special: the teachers' beliefs influenced how the students were treated
> and evaluated.)
>
> David Sless says:
>
> it’s a bit like clinical practice in medicine where you look for
> symptoms of pathology and then apply a treatment. You then look to see
> if the symptoms disappear.
>
> Unfortunately, this is a dangerous practice. This kind of test is badly
> flawed, even though many physicians follow it. First, it is not blind,
> so both physician and patient are biased toward a good result. For the
> patient, this is the placebo effect. The placebo effect is real: give a
> patient a fake pill, and if they believe it to be a powerful new drug,
> they might very well get better (the mechanism for this is still not
> well understood). For the physician, it is the Pygmalion effect.
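A minimal sketch, in Python, of the two-group comparison Ali describes:
randomly assign pupils, try the "new" method in one group, and compare
end results. It assumes numpy and scipy are available; the group sizes
and scores are invented for illustration, not real data.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(seed=1)

    # Randomly assign 60 pupils: first half get the "new" method, the
    # rest are the control group (in a real study these ids would
    # determine who receives which method).
    pupils = rng.permutation(60)
    treatment_ids, control_ids = pupils[:30], pupils[30:]

    # End-of-term reading scores (fabricated numbers standing in for results).
    treatment_scores = rng.normal(loc=72, scale=10, size=30)
    control_scores = rng.normal(loc=65, scale=10, size=30)

    # Welch's t-test: is the difference bigger than chance alone would produce?
    t, p = stats.ttest_ind(treatment_scores, control_scores, equal_var=False)
    print(f"t = {t:.2f}, p = {p:.4f}")

As Don argues below, a significant p-value is only part of the story:
the size of the difference is what matters to a practising designer.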
> And in any case, a single experiment is statistically unsound: the
> person might have gotten better with no treatment (as is true of much
> back pain).
>
> Most physicians are not scientists (even if the public thinks they
> are). Many do not know statistics and do not know how to do proper
> experiments. That's not in their training.
>
> David's comments also illustrate what are called N=1 (or "n of 1")
> experiments, where "n" refers to the number of people being tested: a
> single person rather than the hundreds or thousands often used in RCTs
> (Randomized Clinical Trials), today's gold standard. N-of-1 trials can
> be done, but the best way is to run a sequence of trials.
>
> Consider my situation. For the past several decades, I have taken a
> statin pill daily to treat cholesterol. Statins have muscle weakness
> or soreness as a possible side effect. Now, after years of taking the
> statin, I have muscle soreness. So I stop taking the statin. If the
> soreness goes away, does it mean the statin was the cause? No. I have
> to be careful in assuming the statin was responsible. So I reintroduce
> the statin and see if the soreness comes back. I may have to do this
> several times before I can have confidence. (One of the graduate
> students in the UCSD Design Lab has designed a simple method of helping
> people run reliable n-of-1 trials on themselves:
> https://arxiv.org/pdf/1609.05763.pdf)
>
> Ali sums it up well:
>
> There are a multitude of factors that may affect the way kids learn
> reading and writing (gender, being a minority, problems at home,
> quality of teachers, peer effects, age in months, etc.), and our design
> intervention here is just one among these many things. Even the fact
> that they are using a new digital thing might make kids spend more time
> working on reading and writing. But then this is a placebo effect; it
> is not our design per se. I cannot envision any scenario that excludes
> using statistics in this example, albeit very simple tests, nothing
> fancy. With this many different possible sources of variation, five or
> ten participants will never help us understand the role of the app and
> its design here.
>
> *Why designers need a special kind of statistical tests*
>
> Designers need a set of simple statistical methods that can inform our
> work.
>
> Note that we do NOT need the care and precision normally followed in
> science and medicine. Why? Because they are looking for small effects
> whereas we are looking for large ones.
>
> To the practicing designer, if the change we are advocating does not
> make a large difference (a factor of anywhere between 2 and 10 times
> improvement), it is not worth pursuing.
>
> Scientists look for statistical significance, which does not mean
> practical significance. Statistical significance means a result is
> unlikely to have occurred by chance, but it may still be a small
> effect.
>
> We are looking for large effects. Even so, let us not be reckless.
> Doing something and seeing a large impact by itself tells us nothing.
> Try doing something that has zero relevance and presenting it to the
> client/customer/user: it might very well have the same large impact.
> Placebo effect.
>
> We need double-blind studies. We need better research methods, ones
> suited to looking for large effects (which can, therefore, be simple,
> quick, etc.) but which nonetheless control for factors that could
> otherwise confound the results.
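A sketch of how Don's repeated on/off statin trial might be scored,
again in Python with numpy and scipy assumed. The daily soreness
ratings are invented, and treating them as independent observations is
a simplification a statistician would want to refine. The last line
pairs the significance test with the practical check Don argues for:
is the effect a factor of two or more?

    import numpy as np
    from scipy import stats

    # Alternating phases: weeks on the statin, weeks off, repeated.
    # Entries are daily soreness ratings, 0 (none) to 10 (severe) -- fabricated.
    on_statin = np.array([6, 7, 5, 6, 7, 6, 6, 5, 6, 7, 6, 5, 6, 7])
    off_statin = np.array([2, 3, 2, 1, 3, 2, 2, 3, 2, 2, 1, 2, 3, 2])

    # Mann-Whitney U test: are ratings consistently higher on the statin
    # than off it, across the repeated withdrawals and reintroductions?
    u, p = stats.mannwhitneyu(on_statin, off_statin, alternative="greater")

    # The practical question for a designer: is the effect *large*?
    ratio = on_statin.mean() / off_statin.mean()
    print(f"U = {u:.0f}, p = {p:.4f}, on/off soreness ratio = {ratio:.1f}x")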
> We need a good statistician to work with a good designer to develop a
> set of methods.
>
> Don Norman
> Prof. and Director, DesignLab, UC San Diego
> [log in to unmask]  designlab.ucsd.edu/  www.jnd.org
> Executive Assistant:
> Olga McConnell, [log in to unmask], +1 858 534-0992

-----------------------------------------------------------------
PhD-Design mailing list <[log in to unmask]>
Discussion of PhD studies and related research in Design
Subscribe or Unsubscribe at https://www.jiscmail.ac.uk/phd-design
-----------------------------------------------------------------