Folks,
The term "representative" is commonly used, especially in
specifications for sampling industrial processes (particularly
in pollution control) and occurs, for instance, in many
British Standards for sampling procedures.
Nevertheless, it lacks a general and precise definition.
At best, a sample of a particular type of process will be deemed
"representative" provided it was obtained by a procedure carried
out according to a detailed and precise specification.
This still leaves hanging in the air what "representative" means
as a term, since in such a case it simply means "the sampling
was done according to these specific rules". Often, however, the
rules are not sufficiently definite to obviate interpretation of
"representative", quite often only going as far as saying that
"the sample shall be representative" as if it were generally
understood what this meant.
Engineers and process technologists have a notion of "sample"
which has little to do with a statistician's notion: to them it is
essentially "a part extracted from the whole"; and their notion
of "representative" basically means that the technical procedures
used to extract the part should be such that anything which happens
in the whole should give rise to a corresponding happening in
the part.
For a statistician, accustomed to the idea that taking a part
from a whole means that you risk not observing things present
in the whole, and therefore you have to quantify this risk,
such notions are elusive. Nevertheless they are ingrained in
regulations. So what is a statistician to do? I have yet to
see a definition of "representative" in the statistical world
which really gets hold of the question in a satisfying way.
But -- if for instance a statistician were an expert witness
on whether a sampling procedure was satisfactory -- then the
statistician would be required to take a definite statistical
view of the term "representative".
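
The risk in question can at least be quantified in the simplest
case. Below is a minimal sketch, assuming simple random sampling
without replacement from a finite population of discrete units (an
assumption that process sampling need not satisfy; the function
name is hypothetical):

```python
from math import comb


def prob_feature_missed(population_size, units_with_feature, sample_size):
    """Probability that a simple random sample drawn without replacement
    contains none of the units that carry some feature of interest.

    ASSUMPTION: every subset of `sample_size` units is equally likely,
    so the count of feature-carrying units in the sample is hypergeometric.
    """
    if not 0 <= units_with_feature <= population_size:
        raise ValueError("units_with_feature must lie in [0, population_size]")
    if not 0 <= sample_size <= population_size:
        raise ValueError("sample_size must lie in [0, population_size]")
    # Ways to choose the sample entirely from units without the feature,
    # divided by all ways to choose the sample.
    return (comb(population_size - units_with_feature, sample_size)
            / comb(population_size, sample_size))


if __name__ == "__main__":
    # Illustrative numbers only: 100 units, 5 carry the feature, 10 sampled.
    print(prob_feature_missed(100, 5, 10))
```

Under these assumptions the statistician's question "how likely is
it that the part fails to show what is in the whole?" has a definite
answer; the engineering notion of "representative" offers no such
number.
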
So I am putting the question to the statistical community: What
do you think "representative sample" means? Do you know any
clean and sensible general definition of it?
An ongoing mailing-list discussion of the question has
elicited the statement:
"In the survey sampling literature the term "representative"
has a different meaning than the comments I have read indicate.
A sampling procedure is "representative" if (and only if) each
unit in the population has a fixed non-zero probability of
selection."
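
Read literally, the quoted condition is easy to check for any design
whose selection probabilities are known. Below is a minimal sketch,
assuming the design is summarised as a mapping from unit labels to
selection probabilities (the function name and the shift labels are
hypothetical, not taken from any standard):

```python
def has_nonzero_selection_probabilities(selection_probs):
    """Return True if every unit has a selection probability in (0, 1].

    ASSUMPTION: `selection_probs` maps each population unit to the fixed
    probability with which the design selects it.
    """
    return all(0.0 < p <= 1.0 for p in selection_probs.values())


if __name__ == "__main__":
    # Hypothetical process sampled only on weekday shifts: the weekend
    # shift can never be selected, so the condition fails.
    weekday_only = {"mon_day": 0.2, "tue_day": 0.2, "sat_night": 0.0}
    every_shift = {"mon_day": 0.2, "tue_day": 0.2, "sat_night": 0.05}
    print(has_nonzero_selection_probabilities(weekday_only))
    print(has_nonzero_selection_probabilities(every_shift))
```

The check says nothing about how well the selected part mirrors the
whole; it only rules out designs in which some unit can never be
drawn.
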
Well, that's definite enough, but falls a good bit short of
encapsulating the properties envisaged in process-sampling
regulations, for instance. And I'd be interested to learn of
references to the survey sampling literature which discuss
"representative" in terms of such a definition.
In due course I will summarise to the list. Meanwhile, thanks in
advance for your input.
Best wishes,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding)
Fax-to-email: +44 (0)870 167 1972
Date: 16-Oct-02 Time: 11:39:02
------------------------------ XFMail ------------------------------