On 24 October 2013 08:18, Cat Davies <[log in to unmask]> wrote:

> Some data I’m working on contains unequal numbers of observations per
> participant. The data come from an open-ended writing task and we want to
> compare the number of times which participants across 4 groups use
> different types of articles (a, the, etc). The writing samples are of
> differing lengths and so contain different numbers of article use.
>
> What would be the best way of coming up with a comparable score for each
> type of article per participant and later per group? We could calculate
> percentages of say ‘the’ use from the total number of articles produced,
> but that feels unsatisfactory as the percentage score would be more
> accurate for those participants who produced longer writing samples.
>
>
Yes, you're right, it would affect your reliability.  Most people don't
realize that.



> Then I suppose this would have implications for the statistical test
> employed.
>
>
>
I might need to understand your data better, but here are a few thoughts.

If you can estimate the reliability, you can use some form of weighted
least squares regression, where you have a variable that rates the
"importance" of different rows in the dataset. The more important rows are
weighted higher (counted more) than the less important rows.
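To make the weighting idea concrete, here is a minimal sketch in plain Python. Everything in it is made up for illustration: the four participants, their proportions of 'the', and the choice to use each participant's total number of articles as the weight (a stand-in for reliability, on the assumption that a proportion based on more articles is more precise). With a single 0/1 group predictor, the weighted fit just recovers the weighted group means.

```python
# Hypothetical data: one row per participant.
group = [0, 0, 1, 1]                 # group membership coded 0/1
prop_the = [0.5, 0.7, 0.4, 0.2]      # proportion of articles that were 'the'
n_articles = [10, 40, 20, 5]         # total articles produced (used as weights)


def weighted_least_squares(x, y, w):
    """Closed-form weighted least squares for y = intercept + slope * x."""
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = weighted_least_squares(group, prop_the, n_articles)
# intercept is the weighted mean proportion in group 0;
# slope is the weighted difference between group 1 and group 0.
print(round(intercept, 2), round(slope, 2))  # 0.66 -0.3
```

Note that the participant who wrote 40 articles pulls the group 0 estimate towards 0.7, which is the point of weighting: the longer samples count for more.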

Another possibility is to use an offset in a Poisson regression. If your
variables are counts, Poisson regression is often appropriate. An offset
adds in a predictor variable (usually the log of the number of
opportunities), but fixes its parameter to be 1, so the model describes a
rate rather than a raw count. So it might say "Cat hit the target 4 times,
and Jeremy hit the target 5 times, but Cat had 5 shots and Jeremy had 20".
It takes into account the 5 and 20.
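Here is a rough sketch of what the offset does, using the shots example and nothing but the Python standard library. It fits an intercept-only Poisson model by Newton-Raphson, with log(shots) as the offset. The function name and the fitting loop are my own illustration, not any package's API; in practice you would use a statistics package and add your group variable as a predictor.

```python
import math

hits = [4, 5]     # Cat, Jeremy: number of times the target was hit
shots = [5, 20]   # Cat, Jeremy: number of attempts (the exposure)


def fit_intercept_with_offset(counts, exposures, iterations=50):
    """Intercept-only Poisson fit: log(mu_i) = b0 + 1 * log(exposure_i).

    The offset log(exposure_i) enters with its coefficient fixed at 1,
    so exp(b0) is a rate per attempt rather than a raw count.
    """
    offsets = [math.log(e) for e in exposures]
    b0 = 0.0
    for _ in range(iterations):
        expected = [math.exp(b0 + o) for o in offsets]
        score = sum(counts) - sum(expected)   # gradient of the log-likelihood
        information = sum(expected)           # negative second derivative
        b0 += score / information             # Newton-Raphson step
    return b0


b0 = fit_intercept_with_offset(hits, shots)
pooled_rate = math.exp(b0)                    # 9 hits / 25 shots

# Per-person rates, which is what the offset lets the model compare.
rates = [h / s for h, s in zip(hits, shots)]

print(round(pooled_rate, 2), rates)  # 0.36 [0.8, 0.25]
```

Without the offset, Jeremy looks better (5 hits against 4). With it, Cat's rate of 0.8 per shot is well above Jeremy's 0.25. For the writing data, the count would be the number of times 'the' is used and the exposure would be the total number of articles.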

Third (if I've understood correctly) you could use a multilevel model.  A
regular multilevel model is used when you have (say) kids in classrooms,
and you want to know how a characteristic of the teacher is predictive of
the outcomes at the kid level. You've got different numbers of kids in each
classroom though. Same deal here, but instead of kids in classrooms, you've
got tasks in people.

(It might be that you need a combination of Poisson regression with an
offset AND a multilevel model, in which case you might consider (a) crying,
or (b) finding someone knowledgeable to help you out.)

Also, in response to Takao's later comment, RM ANOVA is equivalent to a
multilevel model when you have the same measurements from everyone, but
when one person misses one measure, RM ANOVA will throw them out. A
multilevel model won't.

Jeremy