JISCMail - ALLSTAT Archives


allstat@JISCMAIL.AC.UK

Subject: SUMMARY: A-level statistics question
From: Charles Taylor <[log in to unmask]>
Reply-To: Charles Taylor <[log in to unmask]>
Date: Tue, 22 Jan 2008 15:45:34 +0000
As promised, I have collated the responses to my email of
last week on the Statistics A-level exam question.  Further down
this list is a summary followed by all the responses (so it is rather
a long email).

Over 20 people responded on allstat, and I had verbal responses from a
further 5 or so. I do not have the "model solutions", but I am sure that
fewer than half of the respondents gave the right answer. Perversely,
I think the "experts" who "knew" what "frequency density" means were
less likely to get the answer correct than those who started from
first principles. However, there was nearly unanimous agreement that
this was a poorly set question (though for a variety of reasons).

I am sure the term "frequency density" did not exist when I was a
student....  We had "histograms" which either had equal width classes
(in which case it was acceptable to use "frequency" on the y axis)
or could have unequal classes (in which case the y axis had to be
labelled "density", always understood by reference to the units of
measurement given on the x axis, so that the total area was one).
Incoming A-level students educated me in the use of "frequency
density", which was defined so that the total area under the
histogram was equal to the sample size - again I understood this
area in terms of the units given on the x axis.

My belief is that many A-level students were taught this way
(and it seems to be prevalent in text books), and so could be "thrown"
by this question. Note that there are no units of measurement
on the x axis but if times are given to the nearest minute, does
that imply that the unit of measurement is 30 seconds?  However, if I accept
this as a valid question then I am left with the following questions:

- are "frequency density" or "density" (with no units specified) acceptable
labels on a graph?
- if no units are given, is there any information in the scale, or do
we ALWAYS have to check that the histogram integrates to the right thing?

It seems to me that if this question is valid, then frequency density
is equivalent to "scaled density", as the units can be completely arbitrary.
In this case, it is merely a convenient measure for integer arithmetic.

I realize that I may have started another discussion here for which I may
get into trouble with Allstat.  I am happy to receive answers, but make no
promise to summarize again!

Charles


Contents of the rest
--------------------

0. Original post

1. A summary of the answers

2. A summary of the reasons given

3. A summary of the comments


Appendix A:  (Almost) verbatim  responses

Appendix B:  Previous Discussion of Frequency Density on Allstat (Oct 2007)


###################################################################

0. Original post
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

To understand this email requires reading the following link

www.maths.leeds.ac.uk/~charles/S1Jan08Q3.pdf

which is a scanned page from yesterday's maths A-level paper (Statistics 1).
(allstat does not allow attachments).

I am sending this as a follow-up to the discussion on "frequency density"
which appeared a few months ago on allstat, but anyone who teaches first
year students should find it of interest.

If you have any comments, or would like to submit your answer, please send it
to me and I will collate and circulate.  However, for starters I offer the
following comment from an allstat colleague:

     "Placing trick questions in an exam is inexcusable, and the
      examiner and scrutineer (person who checked the paper) should
      be sacked."

###################################################################

1. A summary of the initial answers (freq in brackets)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

No answer given (6)
One (1)
6 (7)
12 (8)
21.8 (1)
24 (2)

###################################################################

2. A summary of the reasons given
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

(i) Some people "knew" (or looked up on the web) the definition of
frequency density as given by

frequency = frequency density * width (which gives 6)

(ii) Some found the area under the graph to be 70 and so doubled
the above answer (to get 12).  Many of these people were (I think)
unfamiliar with the term "frequency density", and so worked out the
answer from first principles.
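
The two readings can be sketched in a few lines of Python. This is only
an illustration, not the examiners' method: the helper name
runners_in_bar is hypothetical, and the inputs are the figures quoted in
the responses (bar height about 0.5, bar width 12, total area 70, and
140 runners).

```python
def runners_in_bar(height, width, total_area, total_runners):
    """Rescale one bar's area so that all bars together give total_runners."""
    runners_per_unit_area = total_runners / total_area
    return height * width * runners_per_unit_area


# Reading (i): frequency = frequency density * width, ignoring the total.
naive = 0.5 * 12

# Reading (ii): use the stated total of 140 to fix the unit of the y axis.
rescaled = runners_in_bar(0.5, 12, total_area=70, total_runners=140)

print(naive, rescaled)
```

Reading (i) never uses the stated total of 140; reading (ii) uses it to
pin down the otherwise unstated unit on the y axis.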

###################################################################

3. A summary of the comments
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Nearly all respondents thought the question was a poor one - for
reasons which included:

- Labelling of BOTH axes
- The jagged x-axis (which some thought could mean missing histogram bars)
- Terminology of "to the nearest minute" - why state this?
- Why use unequal class widths for these data?
- Given that frequencies are used to create a histogram, this would
  be the natural way to retrieve them
- A histogram which does not integrate to 1 has a "strange measure"
- This style of question will put people off statistics
- Perverse to use 30 seconds for a UNIT of time, rather than a minute
- The term "frequency density" was unfamiliar to many

###################################################################

Appendix A:  (Almost) verbatim  responses
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

 From [log in to unmask]  Thu Jan 17 14:01:54 2008


Yes, I agree that "frequency density" is an unusual term - should be
frequency only (or probability density for a curve).

Using the standard formula frequency density = frequency/width would indeed
give 6. But I would still say that one should check/work it out to see
whether, given the total N of 140, the answer makes sense.

Clearly the frequency presented is ½ frequency (maybe this is why the
term frequency density - i.e. not actual frequency just a representation
of it). Silly and can't see the point of presenting the data this way
but one should always check.

It is a 5 mark question though, so perhaps those who answered 6 got ½
marks!!!

 >>> Charles Taylor <charles@maths.leeds.ac.uk> 17/01/2008 13:50 >>>
Unfortunately, almost any textbook will say that a "frequency density"
scale is such that the area in each bar is the frequency.  That is,
frequency density = frequency/width.  Using this standard formula gives
the answer 6.  I have to say that most of my colleagues (all
statisticians by profession) have never heard of the term "frequency
density" - one even called it an oxymoron.  Those that look it up in a
book give the answer 6 (and conclude that there is missing data cut off
on the left of the plot, or that the 140 runners is a mistake), those
that "work it out" using the information given, give the answer as 12.
But the real question is: "what is a frequency density" - we are told
that times were recorded to the nearest minute, so surely it is
frequency per minute on the scale rather than frequency per 30 seconds?
An alternative suggestion is that it was a three-legged race!



On Thu, 17 Jan 2008, Helen Doll wrote:

 > I have recently been helping my daughter with her maths A level S1
 > module (taken this week).
 >
 > I actually don't think this is a trick question at all. The answer is
 > clearly 12. If the student answers 6 then they have not used all the
 > information they are given and they have not shown that they understand
 > the fundamental nature of a histogram. They are told that the total
 > number of individuals is 140. If the frequencies are totted up (taking
 > the smallest width as 1) then you get an answer of 70. Clearly then you
 > need to double the answer from 6 to 12. This took me a couple of minutes
 > to do.
 >
 > I would say this is an entirely fair question! Sorry!
 >

###################################################################

 From [log in to unmask]  Thu Jan 17 13:20:22 2008

Charles,
I recall the discussion - mainly that it was voluminous and so do not recall
if it reached a conclusion! Part of it I do recall was about the ability of
the statistics 'profession' to communicate informatively with the general
public. In part this question reflects its inability to do that and thereby
generates employment for statistics teachers and an argument as to why
dealing with that subject's conventions needs to be included in a general
curriculum so that all will understand!
I eventually sussed the answer to be 24 reinforced by the evidence that the
area under the 'curve' adds up to 70, a simple relationship to the total of
140 runners!  The weak communication lies principally in the labelling of the
vertical axis, which I find is a fundamental conceptual weakness of the
output of our current education system! Frequency density is found in
textbooks because it is a generic concept and textbooks try to cover many
situations.  For example, in a different context, were the axis labelled
'Speed' this would be uninformative since the reader would not know whether
the scale read kilometres per second or miles per hour.  Given the
application of this question, the vertical axis should read something like
'Arrival rate per quarter-minute'. I suggest that the level of communication
is thereby so improved as to make the question almost a matter of common
sense i.e. that over a particular period of 12 minutes people finished the
race at the equivalent rate of 0.5 people per quarter-minute. The diagram
records the differing arrival RATES across race 'duration', itself a better
word in this application than 'time'.
The question as written merely panders to bogus obscurity.
Regards,
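
How much the answer depends on the unstated unit can be made concrete
with a short Python sketch. The function name count_in_interval is
hypothetical; the bar height (0.5) and the interval 78.5 to 90.5 are
taken from the responses, and the three candidate units are the ones
respondents proposed (per minute, per half-minute, per quarter-minute).

```python
def count_in_interval(rate, lower, upper, unit_minutes):
    """Finishers implied by a rate quoted per `unit_minutes` of elapsed time."""
    return rate * (upper - lower) / unit_minutes


per_minute = count_in_interval(0.5, 78.5, 90.5, unit_minutes=1.0)
per_half_minute = count_in_interval(0.5, 78.5, 90.5, unit_minutes=0.5)
per_quarter_minute = count_in_interval(0.5, 78.5, 90.5, unit_minutes=0.25)

print(per_minute, per_half_minute, per_quarter_minute)
```

The same bar yields 6, 12 or 24 runners depending only on the unit
assumed for the vertical axis, which is why an unlabelled "frequency
density" scale cannot settle the question by itself.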


###################################################################

From:         Robert Newcombe <[log in to unmask]>


Pretty appalling really. Not a trick question, simply a bad one. A good exam
question should either demonstrate good practice or expect the candidate to
critique bad practice. Here the candidate is expected to draw an inference
from a badly-constructed diagram. Well, I suppose that's a real-life skill we
all need, and which would be hard to examine in any other way - but surely
this isn't what an A level exam is about!

###################################################################

 From [log in to unmask]  Thu Jan 17 14:00:05 2008

If 12*0.5 = 6 runners is a wrong answer, then I blame my teacher; that's what
we were taught: height * width.

###################################################################

 From [log in to unmask]  Thu Jan 17 14:05:34 2008

Hi Charles and Graham,

I'm not meant to respond to mailing lists, newsgroups etc. from work, so I
hope you'll excuse this personal e-mail, rather than to the group (which
IIRC is not intended for discussion anyway).

I couldn't see where you got an answer 6. I reasoned thus:

In a frequency density histogram the (frequency) count represented by a bar
is the area of the bar, not its height
[i.e. area of that bar = count of runners taking {78.5:90.5}]
Frequency density = count/range
  bar height = count/ width
  I measure the height as approximately 0.5
  width = 90.5-78.5 = 2 (minutes)
  area =  2 x 0.5 = 1
  (only) correct answer = 1

The units of frequency density cannot be other than count/x-axis-units; the
x-axis-units are clearly (albeit implicitly) minutes; the frequency density
units are count/minute - I see no need to specify them.

I agree that a histogram with bars of unequal width is unusual (and
potentially misleading) but I don't see this question as ambiguous. It would
be unfair if such unusual histograms had not been covered in the course, but
(assuming they had been) I see this as a penetrating question testing the
interpretation of histograms and their distinction from bar charts.

Or have I got hold of totally the wrong end of the stick (again!)?

Best regards,

Keith Jewell
mailto:[log in to unmask] telephone (direct) +44 (0)1386 842055
------------------------------------------------------------
From: "Graham Upton" <[log in to unmask]>
To: <[log in to unmask]>
Subject: Re: A-level statistics question
Date: 17 January 2008 11:35

I agree that this is an unfair question. It would be acceptable (I suppose)
if frequency density was spelt out as individuals 0.5 minute range --- but
since the times were measured to the nearest minute, this would be a very
curious choice.

I hope that students who reply 6 will get full marks!

I was mortified to find that in my own A level books I too have written
frequency density without any units --- so I cannot be too critical!

g

###################################################################

 From [log in to unmask]  Thu Jan 17 11:59:29 2008


Hi Charles,

OK, it was a wee bit tricky. I remembered insisting on the density analogy
of frequency charts when one wants to get them with varying category
ranges when teaching (to undergraduates, not at school). The only
important thing is that pupils/students are aware that "area under the
curve" has to reflect frequency. And understand what it means!
In courses I taught (in France), we would never have pictured frequency
but instead a density (in order to make the full area under the curve
equal to 1 and not 70 like here if you assume no unit on the Y-axis).

I am curious: any other way to get "the" result (12; Hal 9000 of a space
odyssey wouldn't have liked it) than summing over the area to have the
relationship with the total population?

Don't worry, I am sure that examiners will have to accept many different
answers given the reasoning is (nearly) consistent! Even a disrespectful
answer like "I cannot say, unit is missing and I am too lazy to do YOUR
job" should do. But don't say it to the young person who brought you the
exercise!!

Cheers.

Matthieu

###################################################################

 From [log in to unmask]  Thu Jan 17 11:16:59 2008


I think sacking might not be a bad idea, actually.

                                                     Nick Longford


###################################################################

 From [log in to unmask]  Thu Jan 17 09:49:04 2008


My answer is 12 runners!! It is a bit of a shocking question that would
trick most students.



###################################################################

 From [log in to unmask]  Thu Jan 17 10:07:01 2008


Hello Charles

At first glance I said the answer was 6 - the width of the bar from 78.5 to
90.5 is 12, the height is 0.5, and 12x0.5=6.

I felt that five marks was a little generous for such a small amount of
work, and on closer inspection realised that the area of all the bars was 70.

Therefore, my answer would be '12 runners took between 78.5 and 90.5
minutes to complete the fun run'.

Personally I think this is a bit of a cheat. I had always thought that
where frequency density is measured on the vertical axis the area of the
bar is equal to the frequency.

I hope this is the sort of feedback you were after.

Regards,
Paul Newell

Research Assistant - Applied Statistics
University of Plymouth

###################################################################

 From [log in to unmask]  Thu Jan 17 09:13:03 2008

Charles, it seems to me whether this is a fair question depends on what
they've been taught.  Provided they have been taught this way of
presenting frequency data, then the question is very straightforward.
(I'm assuming that the number is 0.5 times 12, or 6 runners.)  Our
software for distance sampling (http://www.ruwpa.st-and.ac.uk/distance/)
routinely plots histograms in this way, because we can then plot on the
fitted curve, to provide a visual assessment of fit.

Steve Buckland
  ### thread continues ###

 From [log in to unmask]  Thu Jan 17 10:14:08 2008


Oops - yes, I don't see how this can come to 140 ... I get 71 too.

We don't label our y-axis 'frequency density', but your textbook defn
doesn't make sense to me.  For the histogram shown, this would be 6,
which I would take to be the frequency, not the frequency density.  But
suppose you plot the frequencies - assuming this is 6 for the last bar,
and width 12, isn't frequency density 6 DIVIDED BY 12 = 0.5?  And isn't
this what they have plotted?

Steve
 > On a quick sample of colleagues, most have never heard of the term
 > "frequency density" (there was a brief discussion on allstat in
 > October), but _nearly_ all textbooks uniformly define it such that
 > the frequency ###density  Woops! ## is simply the area (that is width x height)
 > in a histogram bar.  I would have no problem with the question if
 > the word "frequency" was changed to "scaled", or if the y-axis scale
 > was multiplied by 2.  One of the respondents said that the answer was
 > 6 and noticed that the total area was 71, so they concluded that the 140
 > was a simple mistake...
 >
 > Best wishes,
 > Charles
 >
 >
 > On Thu, 17 Jan 2008, Steve Buckland wrote:
 >
 >> Charles, it seems to me whether this is a fair question depends on
 >> what they've been taught.  Provided they have been taught this way of
 >> presenting frequency data, then the question is very
 >> straightforward.  (I'm assuming that the number is 0.5 times 12, or 6
 >> runners.)  Our software for distance sampling
 >> (http://www.ruwpa.st-and.ac.uk/distance/) routinely plots histograms
 >> in this way, because we can then plot on the fitted curve, to provide
 >> a visual assessment of fit.
 >>
 >> Steve Buckland



ps perhaps we're intended to assume that 69 of the obsns have been
truncated at the left end of the distribution - this could explain the
lack of a left tail.


###################################################################

 From [log in to unmask]  Thu Jan 17 10:02:00 2008


Dear Charles,

How are you?

I don't have measuring implements about my person, but my guess is that
the intended answer is '12'. (Total area of the histogram blocks looks
to be 70 units, so all areas need to be doubled to arrive at
frequencies).

The sadness is that in the context of the surreal world which is the
statistics part of A-level Maths this really is not a 'trick' question
for the candidates at all. When my son took statistics at A-level, I
abandoned any attempt to help him with his work. Much of it was very odd
when it was not wrong.

Cheers, Kevin

##### thread continues ###

 From [log in to unmask]  Thu Jan 17 10:18:24 2008


I've seen worse.

I once came across an A-level question which specified a form of
probability density function, involving 2 or 3 (cannot remember the
detail) unknown constants. These constants had to be determined by
forcing continuity (!!!) onto the pdf as well as a requirement that it
integrate to one.

The joke was that the resulting 'pdf' went negative !!!!

Beat that !!

Best, Kevin

-----Original Message-----
From: Charles Taylor [mailto:[log in to unmask]]

Dear Kevin,

I am fine thanks.  I agree that your answer is probably what was
intended.
However, the responses I get (for the answer) are either 6 or 12.  Those
people who (think!) they know that "frequency density" means that the
area of a bar gives the frequency (as it is defined in nearly all
textbooks that use this dreadful term) have immediately responded 6 (as
I did when I first did the question).  Another person who then bothered
to compute the total area (using this definition) got 71 and concluded
that there was a simple error in that 140 should have been 71.  I have
wondered if this "fun run" was actually a three-legged race, but over an
hour sounds a bit more like agony than fun!

Best wishes,
Charles

###################################################################

 From [log in to unmask]  Thu Jan 17 10:35:08 2008


Which exam board?  Would this make a TES story?  Points to add to what I
wrote yesterday could be:
* dividing the time axis into irregular intervals may be a distinguishing
  feature of histograms, but here it appears to be done only to make the
  question more difficult
* histograms and other graphs are used to display patterns.  If the expected
  use is to look up actual values, a table should have been used.

On both counts, the example suggests that graph use is being taught and
examined inappropriately.

Let me know if you would like to approach the TES yourself, jointly, or not
at all.  Any chance of getting the marking scheme from the exam board?  I
could approach them as someone not involved in teaching.

Allan

###################################################################

 From [log in to unmask]  Thu Jan 17 10:56:25 2008


I did not see the discussion, but the question seems straightforward to
me.  Frequency density is the number of observations per unit of the
variable, the frequency density = 0.5, the interval is width 12, there
are 6 runners.

It is an awful histogram, which no sane person would draw, and which
standard software such as SPSS or Stata would not do, but I don't see a
trick.  Please enlighten me.

Martin

#### thread continues ######

 From [log in to unmask]  Thu Jan 17 14:03:27 2008


I didn't look at it carefully enough, clearly.  However, what other
definition could there be for frequency density?   So what other answer
could there be to the question?  I entirely agree that it is a
thoroughly bad piece of educational material.  I do not think we should
ever use invented data.  There is so much of the real thing around.

Martin

Charles Taylor wrote:
 > The "trick" is that if you use your definition, then there are only 71
 > observations (assuming that each bar must give an integer answer, so the
 > frequencies are 6, 7, 8, 12, 17, 10, 5, 6) and not the stated 140.
 > I note also that you did not (need to) use the information of 140.
 >
 > On Thu, 17 Jan 2008, Bland, M. wrote:
 >
 >> I did not see the discussion, but the question seems straightforward
 >> to me. Frequency density is the number of observations per unit of
 >> the variable, the frequency density = 0.5, the interval is width 12,
 >> there are 6 runners. It is an awful histogram, which no sane person
 >> would draw, and which standard software such as SPSS or Stata would
 >> not do, but I don't see a trick.  Please enlighten me.
 >>
 >> Martin
 >>


###################################################################


 From [log in to unmask]  Wed Jan 16 21:28:05 2008


24?

Good graphics are supposed to convey information with a minimum amount
of decoding and effort by the viewer.  This is not a good graphic.

It probably doesn't help that I was one of those who was mystified by
the term "frequency density".

I've been a student and practitioner of statistics for some 20 years
now, but I guess I won't pass my A-levels.

Thanks,
Scott

###################################################################

 From [log in to unmask]  Wed Jan 16 22:07:07 2008


As I remember it, on a density scale, the area under the histogram over
an interval equals the number of cases (frequency) in that interval. The
total then should be 140 but I get 71 by my estimation so the graph is
incorrectly drawn. Is this a trick question? Depends on whether one
feels a statistician should be able to recognize an incorrectly drawn
graph.

Paul

Paul R. Swank, Ph.D.
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center - Houston


###################################################################

 From [log in to unmask]  Thu Jan 17 08:42:19 2008


Dear Charles,

I bet you get a lot of responses to your allstat post. Anyway, I think the
answer is 12.

It's a horrible question: not because it's difficult, just because it's so
inane. That's the kind of question that will turn a student off statistics,
perhaps forever. I agree with your colleague that those responsible for the
question should never be allowed to do it again.

Alex

-- 

###################################################################

 From [log in to unmask]  Fri Jan 18 09:29:25 2008


Hi Charles

The width of the right hand bar is 12 and its height appears to be around
0.5 so presumably 12*0.5 = 6 is what the examiners were looking for.  It
seems to me unfair to use the word "calculate" when one can only estimate
the height of the bar.

Best Wishes

Robin

###################################################################


 From [log in to unmask]  Fri Jan 18 08:36:41 2008


Dr Taylor,

I do not subscribe to ALLSTAT, but came across the question in a
cross-posted reply.

I think the answer is ... 21.8
I was expecting an integer.

###################################################################



From: "Allan Reese (Cefas)" <[log in to unmask]>

Bits of my body as well as my mind are boggling.  I'm glad you pointed out
the columns total to 70ish.  In fact, if you assume the column heights are
integer or half, they total to 70 but the group 67.5-70.5 total to 16.5, so
the frequency density scale is pairs of runners and the correct answer must
be 12.

The example appears therefore to show that "frequency density" is not an
ignorant oxymoron, but is a deliberate linguistic trap used by those who
would lie with statistics (being economical with the truth).  I would
strongly mark down that graph for the x-axis labelling as well, since the
data are times to the nearest minute.  Labelling the half-minutes, especially
only alternates, is just perverse.  The broken axis is an affectation (as are
the arrowheads), as the histogram demonstrates the shape of the distribution,
not the relative sizes of the x values.  The y axis should be accurately
titled, the units given, and scaled better to avoid 30% wasted plotting
space.

Placing trick questions in an exam is inexcusable, and the examiner and
scrutineer (person who checked the paper) should be sacked.

Will you forward this example to allstat, asking for solutions, and send the
results to the exam board concerned?  Add my comment above if you wish - I'll
stand by it.

Many thanks
Allan

###################################################################

 From [log in to unmask]  Fri Jan 18 11:36:01 2008


Hi.
I think the answer is 12. I'm not familiar with what the syllabus would
expect from the term frequency density - but a student presumably would.

It is a bit odd to have to add all the areas up to discover that it only
comes to 70, so you need to double the area in question (but there are
clues such as some sub areas totalling 4.5 and 16.5 which could alert the
student that a factor of 2 is needed).
Also from a pedantic point of view, the question says times were taken to
the nearest minute, so the X axis scale is a bit odd.
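
The factor-of-two clue can be checked with exact arithmetic. The sketch
below rests on a loudly stated assumption: the bar areas are taken to be
the frequencies 6, 7, 8, 12, 17, 10, 5, 6 quoted elsewhere in this
thread, with 17 and 5 replaced by the half-integer areas 16.5 and 4.5
that respondents mention. That reconstruction is a guess that happens to
total 70; the exam paper itself may differ.

```python
from fractions import Fraction

# Assumed bar areas (height * width); 33/2 and 9/2 are the reported
# half-integer areas 16.5 and 4.5. The order is not confirmed by the source.
areas = [Fraction(6), Fraction(7), Fraction(8), Fraction(12),
         Fraction(33, 2), Fraction(10), Fraction(9, 2), Fraction(6)]

total_area = sum(areas)
scale = Fraction(140) / total_area          # runners per unit of area
frequencies = [a * scale for a in areas]

# A sensible scale should turn every bar area into a whole number of runners.
all_whole = all(f.denominator == 1 for f in frequencies)

print(total_area, scale, [int(f) for f in frequencies], all_whole)
```

Under this assumption a scale of one unit of area per runner leaves two
bars with half a runner each, whereas two runners per unit of area makes
every bar a whole number and recovers the stated total.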

###################################################################




 From [log in to unmask]  Sat Jan 19 14:35:05 2008


That's an A level question?

Good grief, it looks to this aged lady like a gcse intermediate question,
at highest
Or key stage 3.

I find the level implied by this question far more horrifying than the poor
specification of frequency density
Still, they won't need to take their shoes off & use their toes as a
counting aid

Best

Diana

###################################################################

 From [log in to unmask]  Sat Jan 19 20:12:44 2008


In addition, I find the variable width of the bars to be rather
disturbing as well.  I hope people don't get the idea that this is "OK."

Jay
###################################################################
###################################################################

Appendix B:  Previous Discussion of Frequency Density on Allstat (Oct 2007)


###################################################################


From:         Sandy MacRae


A few weeks ago I posted the following enquiry to Allstat:
 > I have been told that GCSE and A-level statistics examinations require the Y
 > axis of a histogram to be labelled "Frequency density", with the appropriate
 > units mentioned..
 >
 > When I asked for examples of this usage in published histograms displaying real
 > data I was referred to textbooks for these examinations. Can anyone point me to
 > examples from professional practice in any field?

I had a few replies, of which some indicated familiarity with this labelling
for histograms. However, none yielded any reference to a published example. If
examinations demand this type of labelling, textbooks will obviously use it and
it is appropriate for histograms with variable bin size or vanishingly narrow bin
size. But does anyone else use it when reporting data? I don't need a full
reference (though it would be useful) because I am willing to search on the
basis of even a vague indication of where to look.

###################################################################


From:         "Allan Reese (Cefas)"

Sandy MacRae suggests: If examinations demand this type of labelling,
textbooks will obviously use it and it is appropriate for histograms with
variable bin size or vanishingly narrow bin size. But does anyone else use
it when reporting data? However, I had commented to him off-list that I would
regard the label "frequency density" as an oxymoron. In terms of usage, Stata
histogram command offers options: " density, fraction, frequency, and percent
specify whether you want the histogram scaled to density units, fractional
units, frequencies, or percentages. density is the default." The distinction
I would draw is that density units imply the sum of column areas is normalized
as 1, while frequency units imply the sum is the number of observations.

Who  examines the examiners?

###################################################################


From:         "R.Thomas"

Surely the label 'frequency density' is formally correct? The fact that
it is not instantly understandable, even to statisticians, is a problem
that belongs to the statistics profession as much as to teachers of statistics.

Statisticians do not take serious interest in descriptive statistics and have
not developed any widely understood vocabulary in the area. If statisticians
don't make themselves clear how can they expect others to do so. What is the
user of official statistics, for example, to make of explanations that include
phrases like "vanishingly narrow bin size"? Do they use bins in the Office for
National Statistics? What is your average sixth-former supposed to make of
"density units imply the sum of column areas is normalized as 1"? For most of
the world density does not imply the use of 1 as a denominator. Why should
statisticians think differently? Is it because the statistics profession does
not recognise the concept of denominator? And "normalised"???? Is this some
mysterious process? Wouldn't saying 'the vertical scale gives
percentage/proportion' be more widely understood? The width of the columns is
not relevant to the vertical scale and it would be appropriate that the
horizontal scale specifies that column widths are proportional to numbers.

###################################################################


From:         "Hooper, Richard"

I don't think that "frequency density" is an oxymoron, since "density" just
means "divided by the width of the interval". The kind of density with which
statisticians are most familiar is a PROBABILITY DENSITY. A histogram can be
viewed as an estimate of the probability density function - in this case the
vertical axis should show the RELATIVE FREQUENCY DENSITY (relative frequency
is an estimate of probability). When Stata includes the "density" option in
its histogram command, this is a short-hand for relative frequency density.
FREQUENCY DENSITY is another alternative that can be plotted on the vertical
axis of a histogram. Frequency and relative frequency (not in density form)
can only be shown unambiguously on the vertical axis if all the bins have
equal width. Of course, since this is overwhelmingly the most common situation
we come across, these are what we most commonly see.
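
The three vertical scales described here can be compared side by side.
The following Python sketch is illustrative only: the function name
histogram_heights and the class edges and counts are made-up example
data, not figures from the exam question.

```python
def histogram_heights(edges, counts):
    """Return class widths, frequency densities and relative frequency densities."""
    n = sum(counts)
    widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
    freq_density = [c / w for c, w in zip(counts, widths)]   # frequency / width
    rel_freq_density = [d / n for d in freq_density]         # divided by sample size
    return widths, freq_density, rel_freq_density


# Hypothetical data: three classes of unequal width holding 20 observations.
edges = [0, 2, 4, 9]
counts = [4, 6, 10]

widths, fd, rfd = histogram_heights(edges, counts)
area_fd = sum(d * w for d, w in zip(fd, widths))     # equals the sample size
area_rfd = sum(d * w for d, w in zip(rfd, widths))   # equals one

print(fd, rfd, area_fd, area_rfd)
```

With frequency density the bar areas sum to the sample size; with
relative frequency density they sum to one. Raw frequencies as heights
would make the widest class look largest here even though its density is
no higher than that of the first class.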

###################################################################


From:         "R.Thomas"

The problems highlighted in this discussion seem to stem from the use
of the word 'bin' in place of column. The subject matter is charts. Use
of the word bin seems to stem from the computerisation of charts. It is a
bit of computer jargon inappropriately imported into statistics. As far as
charts are concerned 'bin' has no meaning that cannot be better covered by
the purely descriptive word column. It makes sense to talk and write about
column-width. Bin-width just illustrates that bin is not the right word.
Would it make sense to say 'density means divided by the width of the bin'?
Statisticians should keep in mind that they need to communicate with the
public. Yes frequency density can be plotted on the vertical axis. But 'number
of occurrences per ...' is much more intelligible.

###################################################################

From:     Thomas Chu


Please correct me if I am wrong. I can remember from my GCSE days that the
widths of those columns in a histogram are known as 'class intervals'.
Nowadays, software packages call them 'bins'. I still prefer calling them
'class intervals'.

 From "A Concise Course in A level Statistics" by J Crawshaw & J Chambers:
1. In a histogram, rectangles are drawn so that the area of each rectangle
is proportional to the frequency.
2. When all the 'class intervals' are of equal width, the frequency can be
used for the height of each rectangle.
3. frequency density = frequency / class width

###################################################################