ALLSTAT Archives - allstat@JISCMAIL.AC.UK - November 2008

Subject: Re: Sample size/percentage
From: Ken Masters <[log in to unmask]>
Reply-To: Ken Masters <[log in to unmask]>
Date: Sun, 30 Nov 2008 08:59:28 -0700
Content-Type: text/plain
Parts/Attachments: text/plain (419 lines)

Hi All

I posted a query to the list regarding sample size, and received several
replies, for which I am really grateful.  As I understand it, the idea
of the list is for people who receive replies to then post a compendium
of the replies back to the list.  I have produced that here, in the
order in which I received the replies (these are initial replies only -
in 4 cases, there were follow-up discussions).  I have removed the names
of posters from their postings, but have listed (in alphabetical order)
the posters' names at the end.

Thanks very much to all who participated - you helped a great deal, and
I hope that others on the list will find the comments very useful.

Ken


----

I think that you (or anybody [no disrespect meant]) deserves hammering
for regarding the 259 units as a random sample. The 2,600 would have
been one, but you did not get what you planned to get from about 90% of
them. This is probably not your fault, but not knowing that the
response rate (10%) would be low ... is arguable.

The appropriate sample size is the one that uses the resources you have
(money, time, equipment, etc.) wisely. That is, the information you get
for the resources would be worth more than the investment. If the
sampling and recording (measurement) were cheap (requiring a minimum of
resources), then the whole population should be in the sample.

The example about the Gallup poll is not a good one; the sample size
reflects the need to have estimates with a particular (prescribed)
precision AND the survey has to be done quickly, because the information
(ratings) is a perishable good -- it is valuable only when timely (e.g.,
for the next day's papers). In your case, the perishability may not be
an issue, but regard the necessity to have the results as 'time being of
the essence' -- time is a valuable resource that is in short supply [in
Gallup's case].

I think that your database is not exactly the population that you would
like to survey, but that is the best [the closest thing] that you have.
I conjecture that you have a list of contacts (customers/patients), and
you would like to know about customers/patients (current and
prospective) in general. So your bosses are also imperfect ... (but
don't tell them, because that would be yet another [costly]
imperfection).

Should have done: try to contact everybody in your database, because
only about 2,000 would respond. But this is (my) dishonest hindsight.
Even if you get the 2,000 responses, which may appear to be a lot, they
may be a poor representation of the 18,000, because there may be
systematic differences between respondents and non-respondents. What to
do then -- look for a competent statistician. This is a non-trivial
business.

----

I think the only way you can salvage anything from this is to do a BIG
study of non-responders and show that the responding group is very
similar in all characteristics relevant to your study.

------

The problem is the low response rate, not the proportion of the
population which you have. You have only 10% of your random sample
providing information. This could cause a huge bias. For example, you
might have a survey where only people who hold strong views bother to
respond and hence conclude quite wrongly that most people have strong
views. You need to consider whether there is a possibility of such bias
in your survey. You will find this discussed in books on survey
sampling. As for your Ph.D., you should be OK if you understand this
and discuss it in your thesis.

------

A couple of relevant points, one helpful to you, one rather
less so.

Your basic view is consistent with the mathematics behind
estimation from sample surveys. If N is the population
size and n is the sample size, the variance of an estimate
of the population mean is

( 1 - n/N ) SS / n ,

where SS is the square of the population standard deviation.
This also applies to estimates of proportions (which are in
effect special cases of means).

The effect of the sample size *relative* to the population
size comes in the ( 1 - n/N ) component, which varies only
very slightly with n, unless n is a very substantial fraction
of N (which almost never happens). The real effect of n itself
comes from its position in the denominator.

To make this point yourself, you can point out to those doing
the hammering that the Gallup example (1,200 out of 250m) would
give essentially the same accuracy as it would if the US
population were 2.5m or 25,000m, since (1 - 1200/2500000) and
(1 - 1200/25000000000) are both effectively 1.
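
As a concrete check, here is a minimal Python sketch of the standard
error implied by the formula above (the population standard deviation S
is set to 1 purely for illustration):

    import math

    # Standard error of an estimated mean under simple random sampling,
    # with the finite population correction (1 - n/N) discussed above.
    def se_mean(n, N, S=1.0):
        return math.sqrt((1 - n / N) * S**2 / n)

    n = 1200
    for N in (2500000, 250000000, 25000000000):
        print(N, round(se_mean(n, N), 6))
    # All three print roughly 0.0289: the population size barely moves
    # the answer, because (1 - n/N) is effectively 1 in every case.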

The negative point is that your response rate (259 out of 2600)
does mean that you are pretty reliant on those who *did*
answer being representative of the population. In practice,
it's very hard to do this in a way which would satisfy an
academic statistician. I put it that way because lots of
real-world surveys do have awful response rates, so you do
seem to be in quite good company. And your query makes it
plain that you have done all the sorts of check which are
reasonable.

On a *very* minor point, you say in your "current response"
that a random sample (by definition) is a small group [from]
a population. Strictly, this isn't right; there's nothing
in the definition to require it to be small. Also, the
"strict" statistical term is "simple random sample", when the
population is finite - but there's no need to be too fussy
about that point.


----

I guess you've probably been swamped by responses about why what you've
got is not particularly useful.


----

A priori there is no problem with the fact that your usable sample is
1.5% of the population. However, there are two issues. Will you still
get significant results? There may be selection bias, and this relates
to the 10% response rate given your random sample of 2,600. I
understand that you are happy with your confidence intervals, but are
they based on asymptotic theory (using the t test) or are they exact?
In the latter case there is no problem; in the former case, 259 might
not be large enough, in particular if you have covariates. If you use
the t test, do you test for normality?

So I cannot advise you on defences until I learn more about whether
there is something objectionable. Give me more info.
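
To illustrate the asymptotic-versus-exact distinction for a simple
proportion, a minimal Python sketch (the 104 'yes' answers out of 259
are an assumed figure, not the actual survey data):

    import math
    from scipy.stats import beta

    k, n = 104, 259            # assumed: 104 'yes' answers out of 259
    p = k / n

    # Asymptotic (Wald) 95% interval, from the normal approximation
    half = 1.96 * math.sqrt(p * (1 - p) / n)
    wald = (p - half, p + half)

    # Exact (Clopper-Pearson) 95% interval
    exact = (beta.ppf(0.025, k, n - k + 1), beta.ppf(0.975, k + 1, n - k))

    print(wald)   # about (0.342, 0.461)
    print(exact)  # similar here; the two diverge for small n or extreme p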

-----

I don't normally reply, but was moved by your sad face :-(.

What % of the population you have surveyed is pretty much irrelevant (it
is only important when the population size is small). What are important
are (a) that the sample was truly random (which you say it was), (b) the
sample size you obtain (259 is okay) and (c) the response rate. If
someone was going to criticise your study, then they should have picked
on the response rate: 10%, which is, to be honest, low for a social
survey, although it might be more usual for what you're doing.

Given that you have a database of 18,000 cases, you have, I assume, good
'population' estimates - i.e. you have good estimates for everyone that
was eligible for your survey. You could therefore use these to generate
post-survey adjustment weights so that the distributions of the
characteristics of your survey match those of the population. By making
your sample look more like the population for the measures for which you
know the population estimates, you (hope to) make it more likely that
the other estimates derived from your survey sample are accurate. Google
'rim weighting' for more information about this.
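
A minimal sketch of the raking idea behind rim weighting (iterative
proportional fitting); the categories and target proportions below are
invented for illustration, not Ken's figures:

    # Rim weighting (raking): adjust case weights until the weighted
    # sample margins match known population margins, one variable at a
    # time, cycling until convergence.
    def rake(records, targets, iters=50):
        w = [1.0] * len(records)
        for _ in range(iters):
            for field, levels in targets.items():
                total = sum(w)
                factors = {}
                for level, target in levels.items():
                    cur = sum(w[i] for i, r in enumerate(records)
                              if r[field] == level) / total
                    factors[level] = target / cur if cur > 0 else 1.0
                for i, r in enumerate(records):
                    w[i] *= factors[r[field]]
        return w

    # Invented example: one responder per cell, but the population is
    # 70% 'old', so the 'old' cases get upweighted.
    sample = [{"gender": "F", "age": "young"},
              {"gender": "M", "age": "young"},
              {"gender": "F", "age": "old"},
              {"gender": "M", "age": "old"}]
    targets = {"gender": {"F": 0.5, "M": 0.5},
               "age": {"young": 0.3, "old": 0.7}}
    print(rake(sample, targets))   # weights ~ [0.6, 0.6, 1.4, 1.4]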

----

It is not possible to answer without knowing more about why you are
sampling in the first place. What is it you want to know from the data
that you couldn't find out from the entire data set? I suppose your
random sample of 2,600 is okay, but when the response rate is included
you only get 10% of the random sample, and that sample is not
representative of the population (being younger).

---

Just a quick bit of support and only minor help, but possibly not good
news (?).

259 can be an excellent sample size, especially when fitting simple
statistical models: constant means, regression with few explanatory
variables, proportions, logit models, simple time series, etc. 259 can
lead to very tight confidence intervals on parameter estimates and
strong conclusions. No problem there.

The problem is the low response rate, I'm afraid. You chose 2,600
randomly, which should nicely represent the population of 18K, but only
259 responded. This is effectively a self-selected sample (the 259). It
is likely to be biased in how it responds to the questions asked, even
if not in any demographic characteristics (like age, location, etc.).
Your attackers may feel this survey is like a TV survey question (e.g.
Is Jesus relevant in modern society?), to which only highly interested
people (i.e. Catholics and atheists) are likely to bother responding:
thus giving a highly biased sample and set of responses (but since
atheists and Catholics are of all ages, locations, etc. this will appear
to be a demographically accurate sample!). Can you see the similarity
with your survey? Having a very low response rate usually points to bias
and self-selection issues.

Can you do anything to increase the response rate somewhat (a
reward/incentive of chocolate, raffle tickets, a free pass to the zoo,
something)? Such considerations, and methods/tactics to help ensure an
acceptable response rate, should have been considered at the absolute
birth of the decision to use surveys (before construction of questions,
etc.) and in consultation with an experienced statistician. Sorry to
point this out at this time, but it is an oft-repeated theme, something
that users of this list see every day, unfortunately. Perhaps this
actually was done and your PhD supervisor was not deficient on this
issue, in which case I apologise for my speculation. Regardless, I
sincerely hope you can resolve the low response rate: increasing it is
the only action I could recommend.

The only way to check the validity of your current 259 sample, with
regard to your research questions, is to somehow get the non-responders
to answer the survey and compare results :-(. Which is a bit silly,
since you would just combine the samples and analyse the lot together in
that case.

-----

By coincidence, I read about just this question yesterday morning and
attach the cover and sample page from "Teaching Statistical Concepts".
I'm sure your inquisitors will be pleased to hear they are naïve and
confused in their understanding of statistical inference.

Moore (1990) is "The skills challenges of the nineties", JRSS A, 153(3),
265-85.

Your comment about non-responders should be followed up, to look for
evidence that they are "missing at random" (MAR) or might indicate a
direction of bias.

----

>"A random sample, by definition, is a small group of a population."

No, it isn't. A random sample can be large or small, and can be a large
or small proportion of the population. In the (fairly rare)
circumstances where you do sampling with replacement it can even be
larger than the population.

>"A sample’s size (in this case 15%) relative to the population is not an indication...

Good.

However, the big problem is not the sample size as such, but the
possibility of BIAS. You selected a random sample, but the people who
responded are probably *not* a random sample. With a high rate of
non-response this has to be a real concern.

-----

Firstly, this is a PR problem that many, many statisticians have
struggled with.

You are right in saying that it is the sample size, rather than the
sample fraction, that is the key determinant of the accuracy of a random
sample.

One non-technical explanation is the following. The validity of an
opinion poll (etc) doesn't depend on asking everyone. It depends on
whether the people who weren't asked would have given similar answers to
those who were asked. If the question is a simple yes/no or
Labour/Conservative/Democrat choice, then 2500 randomly chosen people
are more than enough to get a reasonably accurate estimate of the
population proportion of 'yes' answers, etc.
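
To put a rough number on it: for a yes/no question the worst-case 95%
margin of error is about 1.96 * sqrt(0.5 * 0.5 / n); with n = 2500 that
is 1.96 * 0.01, i.e. roughly plus or minus 2 percentage points - and
this holds whether the population is 18,000 or 250 million.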

What is however of greater concern is that the response rate was quite
low. Again it's not the fraction of people who responded that is of
concern; rather it's the possibility that the people who did respond
were different from those who didn't respond. This is actually a far
greater problem than the sample size.

----

Ken, what you say here seems generally reasonable to me. There is
nothing wrong in seeking to recruit a random sample, nor in assessing
whether responders differed systematically from non-responders - which
is a useful check on whether non-response was random.

But, of course, a 10% response rate among those invited is very
regrettable. There would be nothing wrong in sampling 1.5% of a very
large population, all of whom respond - but this is different, with a
much greater risk of differential non-response. Since you sent out your
email last week, a colleague of mine has mentioned to me a follow-up
study to see how much of what dentists learned on a day course on
radiation protection was retained 6 months later. In that study, they
attempted to contact 284 dentists who had done the course; only 65 (23%)
responded to an invitation to do the follow-up, even after a reminder,
and even though a randomised carrot (£100 store vouchers) was offered.
So you're not alone in getting a low response rate. My reaction to my
colleague was: the results are unlikely to appeal to a highly rated
journal, but nevertheless let's see what they tell us. Even the fact of
the low response rate is useful information - it should be possible to
get some idea of the main reasons for it, and to take these into account
in future studies - if only by upgrading the size of the random sample
you invite to take part. Best practice is to plan the size of the sample
in such a way that the confidence interval on the outcome of primary
interest is narrowed down to a specified size. If you anticipate a low
response rate, the number you invite to participate should take this
into account, e.g. if you anticipated a 50% response, this would prompt
doubling the size of the sample.
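
A back-of-envelope version of that planning step, as a minimal Python
sketch (the plus-or-minus-5-point target and the response rates are
invented for illustration):

    import math

    # Invitations needed: sample size for a target CI half-width on a
    # proportion (worst case p = 0.5), inflated by the anticipated
    # response rate.
    def invites_needed(half_width, anticipated_response, p=0.5):
        n = (1.96 / half_width) ** 2 * p * (1 - p)
        return math.ceil(n / anticipated_response)

    print(invites_needed(0.05, 0.50))   # 769 invitations at 50% response
    print(invites_needed(0.05, 0.10))   # 3842 at the 10% seen here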

Arguably, you can get a slightly purer check on differential response by
comparing the 259 vs. the 2600-259, rather than vs. the 18000-259. These
two comparisons will be practically identical in sensitivity to detect a
difference.
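
One way to run that purer check is a two-proportion z-test; a minimal
Python sketch, where the under-40 counts are hypothetical, not taken
from the survey:

    import math

    # Compare the share of under-40s among the 259 responders with the
    # same share among the 2,341 non-responders (counts are made up).
    def two_prop_z(k1, n1, k2, n2):
        p1, p2 = k1 / n1, k2 / n2
        p = (k1 + k2) / (n1 + n2)                  # pooled proportion
        se = math.sqrt(p * (1 - p) * (1/n1 + 1/n2))
        return (p1 - p2) / se                      # |z| > 1.96 => differ at 5%

    print(round(two_prop_z(120, 259, 850, 2341), 2))   # ~3.16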


----
The posters (in alphabetical order)

Adrian Baddeley
Eryl Bassett
Martin Bland
Ben Carter
Blaise F Egan
Richard Gerlach
Nick Longford
Robert G. Newcombe
Zoann Nugent
Kevin Pickering
Allan Reese
Karl Schlag
Paul R. Swank

----

Regards

Ken

---------------------------
Ken Masters
IT Health Education
http://www.ithealthed.com
____/\/********\/\____

> -------- Original Message --------
> Subject: Sample size/percentage
> From: Ken Masters <[log in to unmask]>
> Date: Wed, November 19, 2008 6:10 pm
> To: [log in to unmask]
>
>
> Hi All
>
> (Thanks to Claire and Ali for their clarification).
>
> I have conducted a survey from a database.  The total size of the database
> is roughly 18,000.  My random sample size was 2,600.  The number of usable
> responses was 259.
>
> When I presented my results, I was hammered (and I mean HAMMERED) on the
> fact that my usable sample size is 1.5% of the population, and that the
> number is 259.  I'm not a statistician, so my response is probably
> ham-fisted, and all comments are welcome.
>
> To the 1.5%, my response is currently: "A random sample, by definition,
> is a small group of a population.  A sample’s size (in this case 15%)
> relative to the population is not an indication of the statistical
> validity of any arguments on the data obtained from a sample.  For
> example, Gallup polls typically have a sample size of some 1,200 people
> to represent the opinions of more than 250 million Americans.  If the
> American public is taken as the database, then this means that the
> sample size is 0.0005% of the database.  What is important in a sample
> of this nature is the number of individuals (n).  When the data are
> presented, the confidence interval at the confidence level is given,
> and gives an indication of the applicability of the statements to the
> wider population."
>
> Can anyone advise on this response?  If this is a reasonable argument,
> does anyone also have examples that are "better" (i.e. academically
> effective) than a Gallup poll?
>
> To the low number of 259, I acknowledged that this was a low number, and
> ran comparisons from my sample against the database based on
> geographical location, rural/urban, age and gender which, research has
> shown, might affect the type of response.  Geographical location,
> rural/urban and gender were consistent, and age was statistically
> different, but in a way that would overstate rather than understate
> the case (i.e. the survey measured usage, my sample was statistically
> younger than the database, and, in this case, other research indicates
> that usage amongst the younger population is higher).  Is this a fair
> way to check validity and representativeness if the response rate is so
> low?
>
> (I also contacted a number of non-responders to ask their reasons for
> non-response, but that's a question for another time, I think).
>
> BTW - not to put any pressure on anyone, but my PhD is riding on
> this.....:-(.
>
> All comments will be much appreciated.
>
> Regards
>
> Ken
>
> ---------------------------
> Ken Masters
> IT Health Education
> http://www.ithealthed.com
> ____/\/********\/\____
