Dear David and Terry,
Thanks for your replies. These shed further light on the topic.
David’s post points to the issue of diagnostics, or what a physician would call clinical research. In diagnostic research in medicine, the number of test participants is commonly as small as one, and this is often enough to support wise professional judgment in specific cases. Terry’s post describes different ways to get more information by making richer use of available data. Both approaches make good sense to me.
The one kind of data that we generally lack in design is the larger body of data that permits generalization and general conclusions. I can’t imagine conducting massive clinical trials on most questions in design: we have too few resources, too little funding, and too few researchers to undertake that kind of work. But once in a while, it would be useful. A richer foundation for the field would provide the basis for better specific decisions by working design professionals.
This foundation should also include a better collection of well-developed case studies: careful analysis of specific cases contributes to the knowledge of the field by making richer use of available data. It is also possible that researchers with appropriate skills could undertake different kinds of quantitative analysis of case-level data.
As for the rest, your two posts are clear and convincing, so I’ll stop here and paste your comments below.
Yours,
Ken
--
David Sless wrote:
—snip—
I agree very much with your post on appropriate epistemologies, and would merely add a few footnotes. Some are from earlier posts of mine to the list and some are based on other reflections and papers.
The figure of six participants is interesting. Validating six by testing with larger numbers has been done. We did a number of (unpublished) studies that did just that: we then took random selections of six from the entire sample and found that each selection contained the same results. There is also published work by a number of researchers that seems to broadly confirm this.
See, for example:
Faulkner L 2003 Beyond the five-user assumption: Benefits of increased sample sizes in usability testing. Behavior Research Methods, Instruments, & Computers 35 (3) 379-383.
Beime et al. (2007) provided an interesting study, specifically in the field of medicines information design, which confirmed the validity of small-scale testing.
Beime B, Lawrence M, & Rieke C 2007 The practical application of readability user tests in national and international marketing authorisation procedures. Regulatory Rapporteur – May Issue 2007, 8-13.
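The validation procedure described above (test a larger group, then draw random sets of six and compare what each set found with what the whole group found) can be sketched as a small simulation. This is a sketch under invented assumptions: the problem names, the rate at which each simulated participant hits each problem, the group size of 30 and the number of draws are all hypothetical, and nothing here reproduces the unpublished studies.

```python
import random

# Hypothetical problems and per-participant encounter rates, invented for
# illustration only; not data from any study mentioned in this thread.
PROBLEM_RATES = {
    "misreads dosage table": 0.60,
    "skips warning box": 0.45,
    "confuses two product names": 0.30,
    "misses small-print contraindication": 0.05,
}


def simulate_participant(rates, rng):
    """Return the set of problems one simulated participant runs into."""
    return {name for name, p in rates.items() if rng.random() < p}


def subsample_agreement(rates, n_total=30, n_sub=6, n_draws=1000, seed=1):
    """Simulate n_total participants, then draw random groups of n_sub from
    them and record what share of the full-sample problems each group found."""
    rng = random.Random(seed)
    full_sample = [simulate_participant(rates, rng) for _ in range(n_total)]
    found_by_all = set().union(*full_sample)
    shares = []
    for _ in range(n_draws):
        group = rng.sample(full_sample, n_sub)
        found_by_group = set().union(*group)
        # If the full sample found nothing, every group trivially agrees.
        shares.append(len(found_by_group) / len(found_by_all) if found_by_all else 1.0)
    return found_by_all, shares


if __name__ == "__main__":
    found, shares = subsample_agreement(PROBLEM_RATES)
    print("problems found by the full sample:", sorted(found))
    print("mean share found by a group of six:", sum(shares) / len(shares))
    print("share of groups that found everything:", sum(s == 1.0 for s in shares) / len(shares))
```

Lowering the rate of the rarest hypothetical problem shows where agreement between groups of six breaks down: a problem that few people hit is the one a random group of six is most likely to miss.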
Nielsen (2000) provides a mathematical rationale for small test numbers.
https://www.nngroup.com/articles/why-you-only-need-to-test-with-5-users/
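The rationale in the article linked above rests on a simple curve: with n test users, the expected share of usability problems found is 1 - (1 - L)^n, where L is the share a single user reveals (the article cites an average of 31%). A minimal sketch, assuming that figure and the model's simplifications (every problem equally easy to find, users independent of one another):

```python
def share_found(n_users, L=0.31):
    """Expected share of all usability problems found by n_users test users,
    using the curve 1 - (1 - L)**n; L is the share a single user reveals."""
    return 1 - (1 - L) ** n_users


if __name__ == "__main__":
    for n in (1, 3, 5, 6, 10, 15):
        print(f"{n:>2} users: {share_found(n):.0%} of problems")
```

Read with a different L, the same curve gives the chance of seeing one particular problem that a proportion L of users run into: a problem that only 5% of users hit is seen by a group of six only about a quarter of the time.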
None of these studies really explains why the numbers are so small, and I have given some thought to this question. In a paper of mine to be published shortly in the Information Design Journal, I suggest that our testing involves watching people use our designs, and that this is like watching a dance. We notice where people step nimbly from one task to the next and also where they trip up. As designers we are like choreographers writing the dance steps. Thus the people using our designs are highly constrained by the context, the dance we ask them to perform, and this in turn constrains the variations in the steps they can make.
Think of a staircase that creaks. How many people do you need to walk up the staircase to find the creaking step, and do you need a world-class athlete at one end of your sample and a paraplegic at the other? No, the carpenter or her mate will do. And how do you know when the step has stopped creaking? Again, a sample as small as one is enough. This is the practical epistemology of design.
And there is another aspect of this that makes small numbers OK. We do a lot of our testing to find out what is dysfunctional about our designs. We are like doctors looking for symptoms of pathology. When we find these symptoms, we try to correct our designs so that the pathology disappears. Our ‘evidence’ or ‘proof’ that the ‘patient’ (our design) is cured is that the symptoms are no longer present in the next round of testing. Again, this is an example of practical epistemology.
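One way to put a number on the "symptoms no longer present" evidence, offered as an illustration rather than as a description of how David's team works: if a problem still affects a proportion p of users, the chance that none of n retest participants shows it is (1 - p)^n. Setting that chance to a chosen alpha and solving gives the largest p not ruled out by a clean round, 1 - alpha^(1/n), which the familiar "rule of three" (roughly 3/n at alpha = 0.05) approximates for larger n. The sketch assumes participants behave like independent draws from one user population, which a purposive design sample only approximates.

```python
def max_hidden_rate(n, alpha=0.05):
    """Largest proportion p of users a problem could still affect while having
    probability at least alpha of showing up in none of n participants.
    Solves (1 - p)**n = alpha for p."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 - alpha ** (1 / n)


if __name__ == "__main__":
    for n in (1, 3, 6, 12, 30):
        print(f"{n:>2} clean sessions: problems affecting up to {max_hidden_rate(n):.0%} "
              f"of users are not ruled out (rule-of-three approximation: {min(3 / n, 1):.0%})")
```

For the creaking step, where nearly everyone who treads on it hears it (p close to 1), even n = 1 leaves little room for doubt; the bound matters for problems that only some people run into.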
BUT BUT. None of what I am describing seems to us to be Research, let alone Design Research. It is the routine application of investigative methods, much in the way that a doctor uses routine blood tests. However, developing these investigative methods and the techniques for applying them is, like developing blood tests, a contribution to knowledge in that wider more generalisable way that we think about scientific knowledge.
Terms like ‘testing’ give a false sense, a scientistic sense, of what we do. We tend to use the term ourselves when writing about our work for a business audience. My own view is that what we do in this type of work is have conversations with people, from which our findings emerge.
I would like to say more on this matter to round out and elaborate on Ken’s excellent post, but perhaps for later. Enough of my voice in the conversation.
—snip—
--
Terry Love wrote:
—snip—
Great post. I enjoyed reading it.
The statistics of samples of 6-10 participants are not well addressed in classic university research thinking.
The most common (and often assumed to be the only) approach to doing statistics on data is the equivalent of 'going to a posh restaurant and only eating the bread roll' (Tawnee, in Pratchett, Thud!, pp. 266-267).
The most common approach is to gather the data, apply the single research question to them, and see which way the numbers point and with how much confidence.
Any data, however, contains much more information about many other things. Typically, the majority of the information in data lies outside what is tested by the statistics.
This enormous amount of extra value in the data is usually ignored, often with good reason, as it is considered irrelevant to the question at hand.
But not always.
Here are three examples.
1. In academic assessment, students' exam marks are an indication of each student's learning. That is the most common use of exam mark data.
However, the data also contains all sorts of other information including: the relative ability and biases of the markers, and how they change over time; the relative homogeneity of the student body and their learning; the relative quality, homogeneity and bias over time of exam setters and a whole load of other factors. Drawing these other factors from the data enables accurate individualised correction of student marks and assessment of how valid the examination was as an examination.
This way of drawing more out of the data using Multifaceted Rasch Analysis is now commonplace where examinations have to be accurate and unbiased. Rasch analysis does not intrinsically depend on large sample numbers because comparison between the measures elicited also gives a measure of confidence in their reliability (in effect three analysis bites at the same data cherry). Interestingly this extended use of data via statistical methods is relatively rare in universities and hardly ever taught to research students outside Education.
2. Deanonymisation methods offer ways of building huge bodies of tightly linked data about individuals. Again, this does not need large datasets, and again the approach uses the additional information implicit in the data. SCL/CA use exactly these methods to get the data to influence the outcomes of national elections.
3. Data surveillance and information gathering in the standard ethical hacking model use data collected about network responses from devices, in multiple ways, to infer even larger bodies of information about targets.
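The first example, drawing rater severity out of exam marks, can be sketched with the dichotomous form of the many-facet Rasch model, in which the log-odds of a student getting credit on an item from a given rater is ability minus item difficulty minus rater severity. This is a deliberately simplified sketch: the students, items and three rater severities are simulated, marks are scored 0/1 rather than on a scale, and student abilities and item difficulties are treated as known so that only each rater's severity is estimated. A full multifaceted analysis (for example with Linacre's FACETS program) estimates all facets jointly and also reports fit and reliability statistics.

```python
import math
import random


def p_credit(ability, difficulty, severity):
    """Dichotomous many-facet Rasch model: logit P = ability - difficulty - severity."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty - severity)))


def simulate_ratings(abilities, difficulties, severities, rng):
    """Every rater scores every student on every item as 1 (credit) or 0."""
    ratings = []  # tuples of (student, item, rater, score)
    for s, a in enumerate(abilities):
        for i, d in enumerate(difficulties):
            for r, c in enumerate(severities):
                score = 1 if rng.random() < p_credit(a, d, c) else 0
                ratings.append((s, i, r, score))
    return ratings


def estimate_severity(rater, ratings, abilities, difficulties, iters=25):
    """Newton-Raphson maximum-likelihood estimate of one rater's severity,
    holding student abilities and item difficulties fixed (a simplification)."""
    own = [(s, i, x) for s, i, r, x in ratings if r == rater]
    c = 0.0
    for _ in range(iters):
        probs = [p_credit(abilities[s], difficulties[i], c) for s, i, _ in own]
        gradient = sum(p - x for p, (_, _, x) in zip(probs, own))  # d logL / d severity
        information = sum(p * (1 - p) for p in probs)  # minus the second derivative
        c += gradient / information
    return c


if __name__ == "__main__":
    rng = random.Random(7)
    abilities = [rng.gauss(0.0, 1.0) for _ in range(40)]        # hypothetical students
    difficulties = [rng.uniform(-1.5, 1.5) for _ in range(10)]  # hypothetical items
    true_severity = [-0.8, 0.0, 0.8]  # lenient, neutral, harsh (hypothetical raters)
    ratings = simulate_ratings(abilities, difficulties, true_severity, rng)
    for r, true_c in enumerate(true_severity):
        est = estimate_severity(r, ratings, abilities, difficulties)
        print(f"rater {r}: true severity {true_c:+.2f}, estimated {est:+.2f}")
```

Once each rater's severity is estimated, a student's expected mark can be recomputed as if it had come from a rater of average severity, which is one way of reading the "individualised correction" of marks described in the first example.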
For design research data collected from a small cadre of representative participants, the situation is very similar to the conventional Purposive Sampling approach in statistics.
First, the data available from the participants is used in multiple roles in relation to the problem itself (see, for example, http://dissertation.laerd.com/purposive-sampling.php).
Second is the secondary data, explicitly or implicitly inferable from the decisions made about the membership of the purposive sample compared with the population.
Third are Rasch-like analyses that derive metadata from comparisons between elements of the first data; these inform not only the confidence in the data and what can be derived from it, but also identify biases in both the data and its collection.
Fourth are the analyses that compare and contrast the above with more general data about populations.
Fifth are analyses that test, given the above, whether the sample size was big enough to infer the above findings.
Sixth...
Seventh...
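For the fifth kind of analysis, one simple check (an illustration, not the procedure Terry has in mind) is to put an interval around a proportion seen in a small sample, such as a hypothetical "four of six participants missed the warning". The Wilson score interval is used here because it behaves sensibly at small n and at 0 or n successes; z = 1.96 gives roughly 95% coverage. All counts below are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion successes / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    # Clamp to [0, 1] to absorb floating-point error at 0 or n successes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    for k, n in ((4, 6), (0, 6), (6, 6), (24, 36)):
        lo, hi = wilson_interval(k, n)
        print(f"{k}/{n}: roughly {lo:.0%} to {hi:.0%}")
```

With six participants, even a clear-looking result like 4 of 6 leaves a wide interval; whether that width is acceptable depends on the design decision the finding has to support.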
To recap: seen over-simplistically in terms of large-sample, associative, single-question research methods, the use of small samples in design research may appear to be faulty.
However, seen in terms of methods of analysis that make use of the rich, complex body of information available in each datum and between data, small-sample analysis methods can be both more reliable and more informative.
Two final comments. First, there is increasing criticism of statistical analysis of big data for producing false findings. I have an excellent paper on this but can't put my hand on it at the moment.
Second, the statistics providing the evidence for the Higgs boson are in essence the same kind of statistics used in David's design sampling: statistical analysis of a relatively small number of particles with very complex characteristic behaviours (which is where the maths is). I'm assuming the behaviours of David's research participants are a little less mathematically complex :-)
—snip—
-----------------------------------------------------------------
PhD-Design mailing list
Discussion of PhD studies and related research in Design
Subscribe or Unsubscribe at https://www.jiscmail.ac.uk/phd-design
-----------------------------------------------------------------