Sarah,
If you are going to get useful answers, you probably need to be a bit
more explicit.
Firstly, how have you identified outliers?
Personally, unless the data is impossible (in the context of your
research design), I would be hesitant to discard information. Outliers
are often some of the most interesting data that we have gathered,
because they go against our expectations and allow us to discover more
about the phenomenon under study.
That being said, if you have a sample of 7 students, 6 of whom earn 13K
per year and one of whom earns 100K per year, the final student is an
outlier, and the mean will not be an accurate reflection of a typical
student's income. However, the median will not be affected by this one
unusual person.
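To make the arithmetic concrete, here is a minimal sketch using Python's
standard-library `statistics` module and the hypothetical figures above
(six incomes of 13,000 and one of 100,000):

```python
from statistics import mean, median

# Hypothetical sample from the example: six students on 13K, one on 100K.
incomes = [13_000] * 6 + [100_000]

print(f"mean:   {mean(incomes):,.0f}")    # → mean:   25,429 (pulled up by one case)
print(f"median: {median(incomes):,.0f}")  # → median: 13,000 (unaffected)
```

The mean is 178,000 / 7, about 25,429, which is roughly double what six
of the seven students actually earn.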
I suppose what I am getting at is that you need to understand your data
(plot absolutely everything!) before you start removing "outliers". In
addition, I would probably use robust or nonparametric methods in order
to reduce the influence of these anomalous observations.
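As one example of a robust method, a trimmed mean drops a fixed share
of the most extreme observations from each tail before averaging. The
sketch below assumes a 20% trim, which is a commonly used default
rather than a fixed rule, and the name `trimmed_mean` is just
illustrative. On the hypothetical income sample it discards the 100K
value (along with one 13K value from the bottom tail).

```python
from statistics import mean

def trimmed_mean(values, proportion=0.2):
    """Mean after discarding `proportion` of the observations from each tail."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of observations cut from each end
    return mean(ordered[k:len(ordered) - k])

incomes = [13_000] * 6 + [100_000]  # hypothetical sample from the example
print(trimmed_mean(incomes))  # → 13000
```

Unlike deleting "outliers" by hand, the trimming rule is fixed in
advance and applied symmetrically, so no individual case is singled out.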
I hope this helps.
Best Wishes,
Richie.
On Sun, 2011-11-13 at 17:57 +0000, Sarah Azam wrote:
> Hi everyone,
>
> I am in a tricky situation with regard to data screening because I
> have grouped data and ungrouped data.
>
> For the 1st hypothesis the data set is looked at as a whole, i.e.
> all patients with chronic pain in the data set, and I have identified
> the outliers on that basis (15 seen).
>
> For the 2nd hypothesis the data is grouped, as I'm looking at
> differences according to pain type. I have 3 pain groups classified. I
> have explored the data for outliers within these groups (25 found).
>
> I have different univariate outliers identified in each scenario, but
> the multivariate outliers were the same for both. However,
> textbooks say that the data needs to be either grouped or ungrouped.
> To deal with these, I feel I would now need 2 versions of the data
> once the outliers have been dealt with, so that when I analyse each
> hypothesis I use the appropriately transformed data. It would not work
> to have the outliers amended for everyone all together.
>
> What is the correct method, and what are the hard and fast rules? I
> don't see a clear solution in the textbooks (Tabachnick & Fidell, Andy
> Field); rather, the advice is to deal with outliers and normality in
> the ungrouped initial data first, then in the grouped data, and to run
> the analyses with the different versions.
>
> Also, when doing score alterations for the grouped data, I take it
> that when you look at the extreme scores, you change the outlier to one
> unit larger (or smaller) than the next most extreme score, taking that
> score ONLY FROM THE SAME PAIN GROUP?? For example, if a back pain case
> is an outlier on a variable such as disability, for score alteration I
> would look at the next most extreme disability score from back pain
> ONLY and not across ALL pain groups?? I.e., I would ignore disability
> scores from the 2 remaining pain groups?
>
>
> Also, would I report the results of data screening separately? When
> examining data for normality, would this also be done separately for
> ungrouped and grouped data, along with any transformations etc.?
>
> Also, what is the best solution for multivariate outliers? I only have
> 3 in the data set and want to retain them; the textbook says these
> scores can be replaced.
>
> Please let me have your thoughts or suggestions ASAP.
>
> Sarah
>
> Sent from my iPhone
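The within-group score alteration described in the question can be
sketched in Python. This is a minimal illustration under stated
assumptions, not a textbook procedure: it flags univariate outliers with
|z| > 3.29 (roughly p < .001, two-tailed, one common convention),
replaces each with the next most extreme non-outlying score plus or
minus one unit, and takes that replacement score only from the case's
own group. The scores, group labels and helper names (`z_scores`,
`alter_outliers`, `alter_within_groups`) are hypothetical. It also shows
how screening the whole sample and screening by group can flag
different cases.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize using the sample mean and sample standard deviation."""
    m, s = mean(values), stdev(values)  # stdev needs at least 2 values
    return [(v - m) / s for v in values]

def alter_outliers(values, cutoff=3.29, unit=1):
    """Replace each outlier with the next most extreme non-outlying score
    plus (or minus) `unit`, keeping the case but reducing its influence."""
    zs = z_scores(values)
    inliers = [v for v, z in zip(values, zs) if abs(z) <= cutoff]
    high, low = max(inliers) + unit, min(inliers) - unit
    return [high if z > cutoff else low if z < -cutoff else v
            for v, z in zip(values, zs)]

def alter_within_groups(scores, groups, **kwargs):
    """Screen and alter scores separately inside each group, so the
    'next most extreme score' comes only from the same group."""
    result = list(scores)
    for g in set(groups):
        idx = [i for i, label in enumerate(groups) if label == g]
        altered = alter_outliers([scores[i] for i in idx], **kwargs)
        for i, v in zip(idx, altered):
            result[i] = v
    return result

# Hypothetical disability scores for two pain groups (illustration only).
scores = [10] * 7 + [12] * 7 + [40] + [30, 35, 38]
groups = ["back"] * 15 + ["neck"] * 3

print(alter_outliers(scores))               # whole sample: 40 is not flagged
print(alter_within_groups(scores, groups))  # back pain only: 40 becomes 13
```

Note that with the sample standard deviation no score in a group of n
cases can exceed |z| = (n - 1) / sqrt(n), so a group needs at least 13
cases before anything can be flagged at the 3.29 cutoff.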