On Aug 20, 2010, at 12:52 AM, J. B. Lethbridge wrote:
> If I divide the number of distinct words (=lexicon) in FQ, Harington's
> Ariosto and Fairfax's Tasso, by the respective number of lines, I get
> lexical densities of FQ = 0.491; H = 0.419; F = 0.636.
>
> This seems very low. Have I done something wrong, David (W-O)?
I think it's just that these are longish texts, and as I mentioned in
an earlier post, the longer the text, the lower this particular
ratio. If you take any of these texts, chop it in half, and calculate
the density for each piece, I think you'll find that each half is much
"denser" than the whole.
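A minimal Python sketch of the length effect, offered as an illustration rather than anyone's actual procedure in this thread. It computes the ratio as described above: distinct words divided by number of lines. The tokenizer is an assumption, a crude lowercase regex with no lemmatization or spelling normalization. A recurring word such as "the" enters the lexicon only once, so the lexicon grows more slowly than the line count, and the ratio tends to fall as a text gets longer.

```python
import re

def tokens(line):
    """Crude lowercase word tokens (assumption: no lemmatization or spelling normalization)."""
    return re.findall(r"[a-z']+", line.lower())

def lexical_density(lines):
    """Number of distinct words (the lexicon) divided by the number of lines."""
    lexicon = {w for line in lines for w in tokens(line)}
    return len(lexicon) / len(lines)

def half_densities(lines):
    """Density of the first half and of the second half, computed separately."""
    mid = len(lines) // 2
    return lexical_density(lines[:mid]), lexical_density(lines[mid:])

# Toy text in which "a" and "b" recur on every line.
sample = ["a b c", "a b d", "a b e", "a b f"]
print(lexical_density(sample))  # 1.5
print(half_densities(sample))   # (2.0, 2.0)
```

On this toy text the whole scores 1.5 while each half scores 2.0, so each half comes out denser than the whole.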
The more I think about it, the less comfortable I am with this whole
approach to measuring density. If I were any good at statistics, I
might know how to normalize these numbers so that texts of different
lengths could be compared, but as it is, I think comparing texts (or
chunks thereof) of equal length is the only way to go.
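One way to act on the equal-length idea is sketched below under stated assumptions. Each text is cut into consecutive chunks with the same number of lines, the ratio is computed within each chunk, and the results are averaged. This is a line-based variant of what corpus linguists sometimes call a standardized type-token ratio. The `chunk_size` default of 1000 lines is an arbitrary placeholder, and the tokenizer is a crude lowercase regex (also an assumption).

```python
import re
from statistics import mean

def tokens(line):
    """Crude lowercase word tokens (assumption: no lemmatization or spelling normalization)."""
    return re.findall(r"[a-z']+", line.lower())

def chunked_density(lines, chunk_size=1000):
    """Average distinct-words-per-line over consecutive chunks of exactly chunk_size lines.

    Any trailing partial chunk is discarded so that every chunk has the same
    length, which keeps texts of different total length comparable.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    n_chunks = len(lines) // chunk_size
    if n_chunks == 0:
        raise ValueError("text is shorter than one chunk")
    densities = []
    for i in range(n_chunks):
        chunk = lines[i * chunk_size:(i + 1) * chunk_size]
        lexicon = {w for line in chunk for w in tokens(line)}
        densities.append(len(lexicon) / chunk_size)
    return mean(densities)

sample = ["a b c", "a b d", "a b e", "a b f", "x"]
print(chunked_density(sample, chunk_size=2))  # 2.0 (the trailing "x" line is dropped)
```

If you apply this with the same `chunk_size` to FQ, Harington and Fairfax, total length no longer biases the three scores. It does nothing, however, about lines of differing length.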
> Lineation etc:
> Fairfax 15496 lines
> Harington 33288
> FQ 34984
Lines of differing length will naturally skew things in different
directions, which raises some interesting questions. For example,
how does Spenser use the extra foot in every ninth line? Does he
stretch these lines out with "filler" words so they contribute less to
density than one might expect, or does he use them to pack in more
unique words? Or perhaps both, depending on whatever local interests
are in play in the stanza or the stanza transition?
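The alexandrine question could be put to the text roughly as follows. This is a hypothetical sketch, not an established method. Walking through the poem in order, it counts for each line the words that enter the running lexicon for the first time, then averages those counts (and raw word counts) by position in the nine-line stanza, so position 9 is the alexandrine. Suppose line 9 carries more words but no more new ones than lines 1 to 8. That would point toward "filler", while more new words would point toward packing. The sketch assumes `lines` holds only stanza lines, in order, with headings, canto arguments and blank lines already stripped. It also uses a crude lowercase regex tokenizer, which is likewise an assumption.

```python
import re
from collections import defaultdict

def tokens(line):
    """Crude lowercase word tokens (assumption: no lemmatization or spelling normalization)."""
    return re.findall(r"[a-z']+", line.lower())

def new_words_by_position(lines, stanza_len=9):
    """Map each stanza position (1..stanza_len) to (mean tokens, mean first-occurrence words).

    Assumes every stanza_len consecutive lines form one stanza.
    """
    seen = set()                   # running lexicon of the whole text so far
    tok_counts = defaultdict(list)
    new_counts = defaultdict(list)
    for i, line in enumerate(lines):
        pos = i % stanza_len + 1   # 1-based position within the stanza
        words = tokens(line)
        new = {w for w in words if w not in seen}  # words never seen earlier in the text
        seen.update(words)
        tok_counts[pos].append(len(words))
        new_counts[pos].append(len(new))
    return {
        pos: (sum(tok_counts[pos]) / len(tok_counts[pos]),
              sum(new_counts[pos]) / len(new_counts[pos]))
        for pos in sorted(tok_counts)
    }

# Toy three-line "stanza" for illustration; use stanza_len=9 for FQ.
sample = ["a b", "a c", "a b c d"]
print(new_words_by_position(sample, stanza_len=3))
# {1: (2.0, 2.0), 2: (2.0, 1.0), 3: (4.0, 1.0)}
```

In the toy output, the long last line has twice as many words as the others but adds only one new word. That is the "filler" pattern in miniature.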
> Harington 13940 Lexicon
> Fairfax 9848
> FQ 17191
A lemmatized FQ has about 12,000 lemmata.
> By any count I make Fairfax has a much larger lexicon.
Perhaps not; its higher ratio more likely just reflects a significantly shorter text.
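The figures quoted in the thread bear this out. A quick check in Python, using only the lexicon and line counts given above, shows the pattern. Fairfax has the smallest lexicon of the three in absolute terms yet the highest ratio, because the poem has fewer than half as many lines as either of the other two.

```python
# (distinct words, lines) as quoted in the thread
counts = {
    "FQ": (17191, 34984),
    "Harington": (13940, 33288),
    "Fairfax": (9848, 15496),
}

densities = {name: lexicon / lines for name, (lexicon, lines) in counts.items()}
for name, d in densities.items():
    print(f"{name}: {d:.3f}")
# FQ: 0.491
# Harington: 0.419
# Fairfax: 0.636

# Smallest absolute lexicon, despite the highest ratio.
smallest_lexicon = min(counts, key=lambda name: counts[name][0])
print(smallest_lexicon)  # Fairfax
```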
________________________________________
Craig A. Berry
"... getting out of a sonnet is much more
difficult than getting in."
Brad Leithauser