Dear Jane and other PhoNet people,
I agree with the responses so far: if you are looking at intonation or
speech F0, a logarithmic scale (e.g. semitones) is more likely to enable
you to compare meaningfully across speakers than a linear scale (i.e.
Hertz). However, not everyone would agree, and anyway there are issues of
cross-speaker normalisation that are not captured by using a log scale.
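For anyone who wants to try the Hz-to-semitone conversion, here is a minimal sketch in Python. The function names and the 100 Hz reference are illustrative assumptions, not a standard; any fixed reference works, because only differences in semitones matter when comparing excursions.

```python
import math

def hz_to_semitones(f_hz, ref_hz=100.0):
    """Express a frequency in semitones relative to ref_hz (an arbitrary reference)."""
    return 12.0 * math.log2(f_hz / ref_hz)

def excursion_st(f_low_hz, f_high_hz):
    """Size of a pitch excursion in semitones; independent of any reference."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)

# Two excursions with the same frequency ratio but different sizes in Hz
# (50 Hz vs 100 Hz) come out the same size in semitones.
print(round(excursion_st(100.0, 150.0), 2))  # 7.02
print(round(excursion_st(200.0, 300.0), 2))  # 7.02
```

The point of the example is only that a log scale treats equal frequency ratios as equal distances, which a scale in Hertz does not.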
With regard to lack of general agreement: A number of Dutch researchers
use the ERB scale mentioned by David Deterding for F0/intonation, on the
basis of findings (Hermes and van Gestel 1991 in JASA) that it gives a
closer approximation to the perception of pitch movements. However, this
has not been widely adopted outside the Netherlands. H & vG's results
were based primarily on the perceptual equivalence of pitch excursions in
different parts of a speaker's range. Arguably this is a less natural
and/or "ecologically valid" task than comparing pitch excursions between
one speaker and another, or comparing pitch excursions in the same part of
the speaker's range.
My own findings and those of various students (unfortunately largely
unpublished) suggest that a log scale is more successful at normalising away
differences between speakers. In particular, average pitch ranges
are more similar between adult males and adult females when expressed in
semitones than when expressed in ERB units, and more similar in ERB than
in Hz. On the face of it, this suggests that a semitone scale is most
appropriate for normalising. (And as already noted in the discussion, we
KNOW that in music, in all known musical cultures, the log scale is the
correct choice.) However, even this needs a closer look. Different
individuals, even ones with very similar pitch LEVELS, can have markedly
different pitch RANGES. Someone with an animated voice may use a pitch
excursion of 6 st. to express the same amount of emphasis (or whatever it
is) that someone with a monotonous voice expresses with an excursion of 2
st. Do we want to normalise this difference away? If so, why? If not,
why not? These are complex issues relating to the meaning of intonation,
the connection between "paralinguistic" communication and linguistically
structured sound patterns, etc.
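The comparison of ranges across the three scales can be sketched as follows. Two things here are assumptions rather than anything from the data mentioned above: the ERB-rate formula is the one commonly quoted in the intonation literature in connection with Hermes and van Gestel (1991), and should be checked against the source before being relied on; and the two speakers' ranges are invented so that each spans exactly one octave, which real data will not do so neatly.

```python
import math

def hz_to_semitones(f_hz, ref_hz=100.0):
    """Semitones relative to an arbitrary reference frequency."""
    return 12.0 * math.log2(f_hz / ref_hz)

def hz_to_erb_rate(f_hz):
    """ERB-rate as commonly quoted for intonation work (assumed formula)."""
    return 16.7 * math.log10(1.0 + f_hz / 165.4)

def pitch_range(low_hz, high_hz, scale):
    """Width of a pitch range after converting both ends with `scale`."""
    return scale(high_hz) - scale(low_hz)

# Hypothetical speakers, for illustration only (not measured values).
male = (80.0, 160.0)
female = (160.0, 320.0)

scales = {
    "Hz": lambda f: f,
    "ERB": hz_to_erb_rate,
    "st": hz_to_semitones,
}

for name, scale in scales.items():
    m = pitch_range(*male, scale)
    f = pitch_range(*female, scale)
    # A ratio of 1.0 means the two ranges are the same size on this scale.
    print(f"{name:>3}: male {m:.2f}, female {f:.2f}, ratio {f / m:.2f}")
```

With these invented values the female range is twice the male range in Hz, identical in semitones, and somewhere in between in ERB. That ordering follows from how the example was constructed, so it illustrates the calculation rather than supporting the empirical claim.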
One thing that also emerges clearly from my own research and that of my
students - again, still largely unpublished - is that the PROPORTIONS
within a given speaker's pitch range are remarkably constant. This can be
seen most clearly in a tone language, but analogous facts are true of
non-tonal languages as well. Imagine a language with H, M and L tones.
For a speaker with a wide range, H and L may be 9 st apart, while for a
speaker with a narrow range, H and L may be 4 st apart, but M will be the
same proportional distance for both. That is, suppose M is 3 st above L
in the speaker with the wide range - i.e. one third of the speaker's
overall range. We can confidently predict that in the speaker with the
narrow range M will also be scaled at one third of the overall range,
i.e. 1.33 st above L. (I really do intend to publish this someday....)
What this means, clearly, is that differences between speakers are NOT all
normalised away by the use of a log scale.
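The arithmetic of the H/M/L example can be written out as a short sketch. The function names are hypothetical, and values are in semitones above each speaker's own L tone, as in the example above.

```python
def proportional_position(tone_st, low_st, high_st):
    """Position of a tone within a speaker's range, as a fraction from 0 (L) to 1 (H)."""
    return (tone_st - low_st) / (high_st - low_st)

def predict_tone(proportion, low_st, high_st):
    """Semitone value of a tone at a given proportion of another speaker's range."""
    return low_st + proportion * (high_st - low_st)

# Wide-range speaker: L = 0 st, H = 9 st, M = 3 st above L.
p = proportional_position(3.0, 0.0, 9.0)

# Narrow-range speaker: L = 0 st, H = 4 st; predict M at the same proportion.
m_narrow = predict_tone(p, 0.0, 4.0)

print(round(p, 2))         # 0.33
print(round(m_narrow, 2))  # 1.33
```

So the two speakers' M tones differ in semitones (3 st vs 1.33 st above L) even though both sit at the same proportion of the range, which is the sense in which the log scale alone leaves a between-speaker difference behind.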
Finally, since the most serious issues of normalisation arise in comparing
adult males and females, it may be that Jane can avoid these issues to
some extent if she's working with children. But I personally would still
use a log scale, even on child data.
Bob Ladd