Dear all
I would like to thank all those who took the time to reply to my email.
Please find below a list of the replies. I particularly like the
solution proposed by Richard Chandler.
The question was:
In theory, the more fish there are on a reef, the more species are
expected to occur by chance (if fish of different species have a random
chance of joining the community). My own hypothesis is different, so I
am trying to disprove the hypothesis that different fish species appear
at random in a fish community.
So he took 150 samples, each containing up to 12 fish.
For example (out of a maximum of 7 species), he has:
sample 1: 6 fish, 2 species
sample 2: 4 fish, 1 species
sample 3: 8 fish, 5 species
He needs to determine the expected number of species in each sample or
the probability of observing sample 1.
As I (and he) understand it, he is testing the hypothesis
P(getting species i) = 1/7 for every species i. So, in principle, a
sample of 6 fish should contain 6 species.
This seems intuitively wrong.
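To see why the intuition fails, here is a minimal Python sketch, assuming each fish is independently one of S = 7 species with probability 1/7 each (the hypothesis as stated). It computes the exact probability that all n fish in a sample belong to different species; the function name is illustrative.

```python
from fractions import Fraction


def prob_all_distinct(n: int, s: int) -> Fraction:
    """Exact probability that n fish, each independently one of s equally
    likely species, all belong to different species."""
    if n > s:
        return Fraction(0)  # more fish than species: a repeat is certain
    p = Fraction(1)
    for j in range(n):
        # the (j+1)-th fish must avoid the j species already caught
        p *= Fraction(s - j, s)
    return p


if __name__ == "__main__":
    p = prob_all_distinct(6, 7)
    print(p, float(p))
```

For a sample of 6 this is 720/16807, about 0.043, so even under the equal-probability hypothesis a sample of 6 fish would contain 6 different species only about 4% of the time.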
------------------------------------------------------------------------
I think you need to consider the biological reality of the community of
interest. You need a model of the species-abundance curve for this
particular community, where the relative abundance of each species on
the reef is known. So, instead of writing "Sample 1, 6 fish, 2 species",
you need to determine how many fish of each species there are. Simply
noting how many species are in a catch doesn't help. You can then use
the resulting species-abundance curve to determine the probability of
getting one fish of each of the seven species in any one sample.
Dan
____________________________
Mr. Daniel P. Bebber
Department of Plant Sciences
University of Oxford
South Parks Road
Oxford OX1 3RB
Tel. 01865 275000
Fax. 01865 275074
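Dan's suggestion can be sketched in Python, assuming the relative abundances are already known from a species-abundance curve and that fish are caught independently (a large-population approximation). It uses inclusion-exclusion over the sets of species that are missed; the function name and the example abundances are hypothetical.

```python
from itertools import combinations


def prob_all_species_present(abundances: list[float], n: int) -> float:
    """Probability that a sample of n fish contains at least one fish of
    every species, assuming independent draws with the given relative
    abundances (inclusion-exclusion over the sets of missed species)."""
    s = len(abundances)
    total = 0.0
    for k in range(s + 1):
        for missed in combinations(abundances, k):
            # probability that none of the k species in `missed` is caught
            total += (-1) ** k * (1.0 - sum(missed)) ** n
    return total


if __name__ == "__main__":
    # Hypothetical, uneven abundances for 7 species (they sum to 1).
    abundances = [0.4, 0.2, 0.15, 0.1, 0.08, 0.05, 0.02]
    print(prob_all_species_present(abundances, 12))
```

With seven equally abundant species and a sample of 7 fish this gives 7!/7^7, about 0.006; with fewer than 7 fish it is zero, as it must be.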
------------------------------------------------------------------------
You need to consider the relative sizes of the initial populations of
the different species. For example, if there are 100 red fish for every
blue fish, you wouldn't expect to find many blue fish in a sample of 6.
So you can't necessarily assume that every species has an equal chance
of joining the community; it depends on prior probabilities.
Sara Pearson
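Sara's red-and-blue example can be made concrete with a short Python sketch. It assumes fish are drawn independently from a large population, so the chance of catching a blue fish stays at 1/101 on every draw; the function name is illustrative.

```python
def prob_at_least_one(p: float, n: int) -> float:
    """Probability that a sample of n independently drawn fish contains at
    least one fish of a species with relative abundance p."""
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p_blue = 1 / 101  # 100 red fish for every blue fish
    print(prob_at_least_one(p_blue, 6))  # blue fish in a sample of 6
    print(prob_at_least_one(1 / 7, 6))   # one of 7 equally common species
```

A sample of 6 contains a blue fish only about 6% of the time (1 - (100/101)^6 is roughly 0.058), compared with about 60% for a species that makes up 1/7 of the population.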
------------------------------------------------------------------------
A couple of people have replied to you, quite correctly, pointing out
that your null hypothesis is wrong and that you need to account for the
different population sizes of each species. There is another error, I
think: you are supposed to be studying the number of species observed,
but you seem to be implicitly relating the expected number of species to
the expected number of fish of each species, using something like
E(1/X) = 1/E(X), which is incorrect. You need to find the probability
distribution of the number of species in a sample. I thought this was
straightforward when I started replying to you, but now I'm not so sure!
I'll get back to you if/when I figure it out.
Richard E Chandler
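A quick check shows why E(1/X) = 1/E(X) fails in general: 1/x is convex for x > 0, so by Jensen's inequality E(1/X) >= 1/E(X), with equality only when X is constant. The distribution below (X equal to 1, 2 or 3 with equal probability) is a hypothetical illustration, not fish data.

```python
from fractions import Fraction


def mean(values, weights):
    """Weighted mean of `values` under probabilities `weights` (summing to 1)."""
    return sum(v * w for v, w in zip(values, weights))


if __name__ == "__main__":
    xs = [Fraction(1), Fraction(2), Fraction(3)]
    ws = [Fraction(1, 3)] * 3
    e_inv = mean([1 / x for x in xs], ws)  # E(1/X)
    inv_e = 1 / mean(xs, ws)               # 1/E(X)
    print(e_inv, inv_e)                    # 11/18 1/2
```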
------------------------------------------------------------------------
This is a multinomial probability problem. You should be able to find it
described in most introductory statistics texts. With three species A, B
and C, the probability of counting X of species A, Y of species B and Z
of species C, where X+Y+Z = N and P(A) + P(B) + P(C) = 1, is
[(N!)/(X!Y!Z!)]*P(A)^X*P(B)^Y*P(C)^Z
You should be able to see how this extends to seven species without too
much trouble. The only criterion is that you know the probability of
observing each species. If one species is more abundant than the others,
then the assumption that each species has a 1/7 probability of being
observed fails.
Larry MacNabb
Statistician
NWT Bureau of Statistics
Phone: (867) 920-3147
Fax: (867) 873-0275
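Larry's formula extends directly to any number of species. A minimal Python sketch, using exact fractions so that the probabilities sum to exactly 1; the function name and the example counts are illustrative.

```python
from fractions import Fraction
from math import factorial, prod


def multinomial_pmf(counts: list[int], probs: list[Fraction]) -> Fraction:
    """Probability of exactly counts[i] fish of species i in a sample of
    N = sum(counts) fish, when each fish is species i with probability
    probs[i]. Pass exact Fractions so the sum-to-1 check is reliable."""
    if len(counts) != len(probs):
        raise ValueError("need one probability per species")
    if sum(probs) != 1:
        raise ValueError("probabilities must sum to 1")
    coefficient = factorial(sum(counts))
    for c in counts:
        coefficient //= factorial(c)  # N!/(X!Y!Z!...) stays an integer
    return coefficient * prod(p ** c for p, c in zip(probs, counts))


if __name__ == "__main__":
    # One fish of each of three equally likely species: 3!/(1!1!1!) * (1/3)^3
    print(multinomial_pmf([1, 1, 1], [Fraction(1, 3)] * 3))
    # A hypothetical split of 6 fish into 4 + 2 among 7 equally likely species
    print(multinomial_pmf([4, 2, 0, 0, 0, 0, 0], [Fraction(1, 7)] * 7))
```

The first call gives 2/9 and the second 15/117649. Unequal abundances are handled by passing a different `probs` list.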
------------------------------------------------------------------------
As a postscript to my last reply, I have now thought a bit about your
problem and got some results. The problem is this: S fish species occur
in a large population, with relative frequencies p_{1}, ..., p_{S}, say.
A random sample of size n is drawn, and X_{n} is the number of species
represented in this sample. You want to find the expected value of
X_{n}.
I have to admit, I can't see a way of doing this analytically unless the
relative frequencies are all equal (and therefore equal to 1/S). In this
case the problem can be regarded as an occupancy problem (see, for
example, Feller, "An Introduction to Probability Theory and its
Applications", third edition, Wiley, 1968, pp.38-). It's equivalent to
putting n balls into S boxes and then counting how many boxes have balls
in them. Using the non-obvious but standard results in Feller, you find
that
    P(X_{n} = k) = \frac{\binom{S}{k} \binom{n-1}{k-1}}{\binom{S+n-1}{S-1}}
for k = 1, ..., min(n, S). Having got this far, I don't feel so bad about
not being able to immediately see the answer earlier on!
With this expression, you can calculate probability distributions and
expectations numerically. I attach an Excel spreadsheet which will do
the calculations for you. Here's a table of expected numbers of species
when there are 7 species to choose from:
Sample size   Expected number of fish species
          1   1
          2   1.75
          3   2.33
          4   2.80
          5   3.18
          6   3.50
          7   3.77
          8   4.00
          9   4.20
         10   4.38
         11   4.53
         12   4.67
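The table can be reproduced directly from the formula. The Python sketch below stands in for the spreadsheet (it is not the spreadsheet itself); it uses exact fractions, and the function names are illustrative. Summing k P(X_{n} = k) simplifies to E(X_{n}) = nS/(n+S-1), e.g. 42/12 = 3.5 for n = 6 and S = 7.

```python
from fractions import Fraction
from math import comb


def occupancy_pmf(k: int, n: int, s: int) -> Fraction:
    """P(X_n = k) = C(s, k) C(n-1, k-1) / C(s+n-1, s-1), as in the formula."""
    return Fraction(comb(s, k) * comb(n - 1, k - 1), comb(s + n - 1, s - 1))


def occupancy_mean(n: int, s: int) -> Fraction:
    """Expected number of species: sum of k * P(X_n = k), k = 1..min(n, s)."""
    return sum(k * occupancy_pmf(k, n, s) for k in range(1, min(n, s) + 1))


if __name__ == "__main__":
    for n in range(1, 13):
        print(n, f"{float(occupancy_mean(n, 7)):.2f}")
```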
Hope this is useful. You will appreciate that it took me a bit of time
and is not entirely trivial (although it was an interesting problem, and
obviously entirely voluntary on my part). If you end up using these
results in any publications relating to this, it would be nice to be
acknowledged ;-)
Best wishes,
Richard E. Chandler
^^^^^^^^^^^^^^^^^^^
Room 115, Dept of Statistical Science, University College London,
Gower Street, London WC1E 6BT, UK
Tel: +44 (0)171 419 3650 (from 22nd April 2000: +44 (0)207 679 3650)
Fax: +44 (0)171 383 4703 (from 22nd April 2000: +44 (0)207 383 4703)
Internet: http://www.ucl.ac.uk/Stats (department)
http://www.ucl.ac.uk/~ucakarc (personal)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
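For unequal relative frequencies the expected value (though not the whole distribution) has a simple form. Writing X_{n} as a sum of indicators, one per species, each equal to 1 if that species appears in the sample, linearity of expectation gives E(X_{n}) = sum over i of [1 - (1 - p_{i})^n]. This assumes each fish is an independent draw with probabilities p_{1}, ..., p_{S}, i.e. the large-population multinomial model in Larry MacNabb's reply. Below is a minimal Python sketch with a simulation cross-check; the function names are illustrative.

```python
import random


def expected_species(abundances: list[float], n: int) -> float:
    """E(X_n) for n independent draws: sum over species of P(species caught)."""
    return sum(1.0 - (1.0 - p) ** n for p in abundances)


def simulated_species(abundances: list[float], n: int,
                      trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of E(X_n), as a cross-check."""
    rng = random.Random(seed)
    species = range(len(abundances))
    total = 0
    for _ in range(trials):
        sample = rng.choices(species, weights=abundances, k=n)
        total += len(set(sample))  # number of distinct species caught
    return total / trials


if __name__ == "__main__":
    equal = [1 / 7] * 7
    for n in (2, 6, 12):
        print(n, expected_species(equal, n), simulated_species(equal, n))
```

Under this model, seven equally common species give 13/7 (about 1.86) species for a sample of 2 and about 4.22 for a sample of 6, larger than the 1.75 and 3.50 in the table above. The difference comes from the modelling assumption: the occupancy formula above counts each way of spreading n indistinguishable balls over S boxes as equally likely (what Feller calls Bose-Einstein statistics), whereas with independent draws two fish fall in the same species with probability 1/7, not the formula's 7/28 = 1/4 for n = 2. It is worth checking which model matches the sampling scheme before relying on either set of numbers.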