I do not think that random sampling is required for significance
testing. If it is, then I have been wrong all my life! Clinical trials,
for example, are never done on random samples, but on the patients who
happen to turn up and agree to take part. Agricultural experiments,
where Fisher started, are not done on random samples. We would hire a
corner of a field and set up a series of plots. Treatments were applied
to plots chosen randomly, but they were a random sample only of the plots,
not of the field, let alone of the crop across the nation. We might say that
the results apply to the population of which the sample might be
considered a random sample. Where random sampling is more important is
in estimation, as a confidence interval can apply only to a population
of which this sample is representative. Even so, when estimating the
difference between means of two groups, as in both clinical and
agricultural experiments, we assume that as the two randomised groups
come from the same population before treatment, a difference is likely
to apply to other bits of the field too. Even in non-randomised
studies, when it was found that people with lung cancer were
significantly more likely to have smoked cigarettes than people who had
other diseases, we did not worry that they were not a random sample.
The point was that we would be unlikely to get a difference of this size
in a sample if, in the population which they represent, smoking and lung
cancer were not related. In these studies we are most concerned about
patients we haven't yet met, crops we haven't yet planted, people who
have not started to smoke. The population is an infinite one stretching
into the future and we could never take a random sample of it.
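
To make the plots example concrete, here is a small sketch of a
randomisation test (the yields and plot numbers below are invented for
illustration, not taken from any real experiment). The treatment labels
are re-allocated at random among the same fixed plots, and the p-value
measures how surprising the observed difference would be under that
re-randomisation alone; no random sampling from a wider population is
assumed.

import random

random.seed(1)

# Invented yields for twenty plots in one (non-random) corner of a field.
yields = [4.1, 3.8, 5.0, 4.4, 3.9, 4.7, 4.2, 4.0, 4.8, 4.3,
          5.1, 4.9, 5.4, 4.6, 5.2, 5.0, 4.5, 5.3, 4.9, 5.5]
# Suppose the last ten plots happened to be allocated the new treatment.
treated = [0] * 10 + [1] * 10

def mean_diff(values, groups):
    # Difference between the mean yields of treated and untreated plots.
    a = [v for v, g in zip(values, groups) if g == 1]
    b = [v for v, g in zip(values, groups) if g == 0]
    return sum(a) / len(a) - sum(b) / len(b)

observed = mean_diff(yields, treated)

# Re-randomise the treatment labels many times; under the null hypothesis
# of no treatment effect the observed allocation is just one of these.
n_perm = 10000
extreme = sum(
    abs(mean_diff(yields, random.sample(treated, len(treated)))) >= abs(observed)
    for _ in range(n_perm)
)

print("observed difference:", round(observed, 2))
print("randomisation p-value:", extreme / n_perm)

The inference here refers to the random allocation of treatments, which
is why the plots, or the patients who happen to turn up, do not need to
be a random sample of anything.
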
Where your students would go wrong is in estimating a mean or a
proportion from a convenience sample and then applying it to the general
population. If you ask what proportion of the population think there
should be a congestion charge, a convenience sample at a bus stop might
give a very different answer to a convenience sample at a car park. The
confidence intervals would be meaningless for the general population.
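As a small illustration of that last point (the figures are invented:
suppose 50% of the whole population support the charge, but 75% of bus
users and 25% of car-park users do), each convenience sample can yield a
perfectly well computed, perfectly narrow confidence interval that
describes only the subpopulation it was drawn from:

import math
import random

random.seed(2)

def ci_for_proportion(sample):
    # Normal-approximation 95% confidence interval for a proportion.
    n = len(sample)
    p = sum(sample) / n
    se = math.sqrt(p * (1 - p) / n)
    return p - 1.96 * se, p + 1.96 * se

# Hypothetical support for a congestion charge in two convenience samples.
bus_stop = [1 if random.random() < 0.75 else 0 for _ in range(200)]
car_park = [1 if random.random() < 0.25 else 0 for _ in range(200)]

print("bus stop 95% CI:", ci_for_proportion(bus_stop))
print("car park 95% CI:", ci_for_proportion(car_park))
# Both intervals will typically exclude the population value of 0.50:
# each is valid only for the sort of people who were actually sampled.

Neither interval is wrong as a description of its own sample; the error
is in treating it as an estimate for the general population.
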
Martin
Demack, Sean wrote:
> Hi All
>
> I am primarily seeking advice regarding the use of tests of statistical significance to generalise from social surveys. My concerns relate to the use of the survey method and the assumption of a random sample. Many surveys (most?) do not use random sampling. This may be due to practicalities such as a lack of (or difficulty in obtaining) a sample frame. Student feedback 'surveys' are often attempts at a census - questionnaires are emailed out or stuck on a website (or virtual learning environment) and all are encouraged to participate - but the response rate is never (even close to) 100%. The national student survey also uses this census approach. Within my department, psychology students and staff (in particular) use some fairly sophisticated statistical techniques (awash with p-values) on non-random (often convenience or self-selecting) samples.
>
> These approaches are pragmatic. Their widespread use, and the seeming lack of concern from those who use them, has made me ponder my own dogmatic perspective. The psychology degree has (a highly valued) accreditation from the British Psychological Society and is designed and developed in consultation with that society's guidelines. Am I precious for being concerned? I see psychology as a subject area with an increasing social influence (on social policy, for example), and a lack of concern for fundamental assumptions (or only cursory consideration of them) makes me wonder.
>
> My background is in applied statistics; I am part of the social science research methods group and have responsibility for teaching (primarily quantitative) research methods across the undergrad and postgrad programmes. A few years ago it became fairly apparent that our students commonly had a rather underdeveloped idea of randomness - and limited appreciation of how this relates to statistical inference. As a group we focused on this - stressing the need for a sample frame and some form of random selection, and that standing on a street or a part of the campus and selecting the sample (perhaps in a haphazard way) was not random.
>
> As well as developing students' understanding of sampling (and how it relates to generalisation through statistical inference), we really wanted to deter students from undertaking a final-year (undergrad) dissertation based solely on a student-designed, non-random-sampled survey, and to get them to appreciate that statistics from such surveys cannot validly be generalised from. Students are now (strongly) encouraged to supplement a survey with another methodology (such as in-depth interviewing, focus groups or secondary data analysis), to discuss their sampling honestly, and to avoid entering the world of test-led analyses. Students who wish to undertake a dissertation with an essentially quantitative methodology are directed towards data archives and secondary data sources. At one point I became so inundated with queries from students asking 'which test to use' on their (non-random) samples that I put together a two-page handout that attempted to dissuade them (fairly strongly) and to explain why (I have attached this).
>
> Things seemed to be fairly successful (although the widespread media use of 'margins of error', 'significant difference' etc. on clearly non-random samples must serve to confuse or scupper this). Within the dissertation we run drop-in workshops (at the design and analysis stages), and in one of the analysis workshops the discussion (inevitably) came round to statistical significance and generalisation. The result was that a number of (psychology joint) students became anxious about how this impacted on their final submission. This was followed by a plethora of emails from supervisors and students in which the assumption of random sampling was regarded as 'philosophical' and the student was told not to be concerned about it and to proceed with their MANOVA or whatever. I see it differently, but I also did not see it as a reason for bringing a student's marks down (as they were following their tutors' advice). I then went to discuss the issue informally with the head of methods for psychology students, who stated that much (most) psychological peer-reviewed quantitative research ignored the random sampling assumption but still went ahead with tests of statistical significance (even calculating power). I thought this might be an issue related to differences between experimentalists and survey researchers, but it became clear that surveys and generalisation were the main reasons for the use of p-values etc.
>
> It seems to be a tension between the pragmatic and the dogmatic - and my main reason for emailing is to seek comment:
>
> - does it matter?
>
> - am I over obsessing about something that is so widely ignored?
>
> There seems to be a (kind of macho) perspective that quantitative analyses need to be complex and p-value-heavy to be regarded as 'quality' - and hence to attract high marks. This runs counter to my perspective - simplicity, clarity and critical thinking are all; p-values (when appropriate) can be useful additions, but the main story lies within the descriptive analyses. The most complex technique our students use is (binary) logistic regression - p-values are present in assessing the model, but the story comes from the (simpler and clearer) odds ratios. If they used this technique on a non-random sample they would not use confidence intervals and would stress that the findings related solely to their sample; if they used the British Crime Survey, Youth Cohort Study etc. they would include the intervals and talk about statistical significance and generalisation.
>
> Sorry this is a long one - this has been a nagging issue and I would really appreciate members' perspectives as the new academic year arrives.
>
> Best Wishes
>
> Sean Demack
>
> Senior Lecturer in Sociological Research Methods
>
>
>
>
>
>
--
***************************************************
J. Martin Bland
Prof. of Health Statistics
Dept. of Health Sciences
Seebohm Rowntree Building Area 2
University of York
Heslington
York YO10 5DD
Email: [log in to unmask]
Phone: 01904 321334 Fax: 01904 321382
Web site: http://martinbland.co.uk/
***************************************************