Some data I’m working on contain unequal numbers of observations per participant. The data come from an open-ended writing task, and we want to compare how often participants in 4 groups use different types of articles (a, the, etc.). The writing samples differ in length and so contain different numbers of article uses.
What would be the best way to come up with a comparable score for each type of article per participant, and later per group? We could calculate the percentage of, say, ‘the’ uses out of the total number of articles produced, but that feels unsatisfactory, as the percentage would be a more reliable estimate for participants who produced longer writing samples.
Then I suppose this would have implications for the statistical test employed.
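To make the concern concrete, here is a minimal sketch in Python. The counts are made up for illustration (they are not our real data), and the helper name `article_proportion` is hypothetical. It assumes each article use can be treated as an independent trial, which may not hold for running text. It computes the proportion of ‘the’ among all articles for two imaginary participants, along with the usual binomial standard error sqrt(p(1-p)/n):

```python
import math

def article_proportion(count, total):
    """Return (proportion, binomial standard error) for one article type.

    count: uses of the article of interest (e.g. 'the')
    total: all articles the participant produced
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = count / total
    # Standard error shrinks as the number of articles grows.
    se = math.sqrt(p * (1 - p) / total)
    return p, se

# Hypothetical participants: same proportion of 'the', different sample lengths.
samples = {
    "short_sample": (6, 10),    # 6 'the' out of 10 articles
    "long_sample": (60, 100),   # 60 'the' out of 100 articles
}

results = {name: article_proportion(c, n) for name, (c, n) in samples.items()}

for name, (p, se) in results.items():
    print(f"{name}: proportion={p:.2f}, SE={se:.3f}")
```

Both participants get the same percentage, but the standard error for the shorter sample is roughly three times larger, which is the imbalance I am worried about when the percentages are treated as equally trustworthy scores.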
Thanks for any suggestions.