Hi All,
I have a question about an outlier filtering algorithm for extremely
skewed data. I am currently using the following filter limits:
1) ( Q1-1.5*IQR, Q3+1.5*IQR )
where Q1 = 25th percentile, Q2 = 50th percentile = median, Q3 = 75th percentile,
and IQR = Q3 - Q1.
As I understand it, this is the standard formula used on box-and-whisker
plots to flag outliers. It works fine as long as my data are not too skewed.
However, I have data with lots of zeroes and long right tails. If
Q1=Q2=Q3=0, then IQR=0 and both limits collapse to 0, so every non-zero
value gets flagged as an outlier!
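
To make the problem concrete, here is a minimal Python sketch of the filter and of the degenerate case. The function name tukey_fences, the quartile interpolation method, and the sample values are made up for illustration and are not my real code or data.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return the limits (Q1 - k*IQR, Q3 + k*IQR) for a list of numbers."""
    # "inclusive" treats the data as the full population and interpolates
    # linearly between order statistics (an assumption; other quartile
    # definitions give slightly different values).
    q1, _q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical sample: 80 zeroes followed by a long right tail of 20 values.
tail = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89,
        150, 300, 700, 1500, 4000, 9000, 20000, 50000, 120000, 400000]
sample = [0] * 80 + tail

lower, upper = tukey_fences(sample)
flagged = [x for x in sample if x < lower or x > upper]

print(lower, upper)   # both limits are 0.0 because Q1 = Q3 = 0
print(len(flagged))   # all 20 non-zero values are flagged
```

With more than 75% zeroes, Q1 and Q3 are both 0, so the filter cannot distinguish an ordinary non-zero value from a genuine outlier.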
I have tried applying a transformation such as log(1+x) to the data,
computing the outlier filter in the transformed space, and then
back-transforming the limits. This helps, but I still have situations in the
transformed space where Q1=Q2=Q3=0!
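
A rough sketch of the transform approach, again in Python and again with a made-up function name (log1p_fences) and made-up data. It assumes non-negative values, so that log(1+x) is defined, and it shows why the transform cannot rescue the zero-heavy case: log(1+0) = 0, so the zeroes stay zeroes.

```python
import math
from statistics import quantiles


def log1p_fences(values, k=1.5):
    """Compute the Q1/Q3 +/- k*IQR limits on log(1+x), then back-transform."""
    transformed = [math.log1p(x) for x in values]  # requires x > -1
    q1, _q2, q3 = quantiles(transformed, n=4, method="inclusive")
    iqr = q3 - q1
    lower_t, upper_t = q1 - k * iqr, q3 + k * iqr
    # expm1 is the inverse of log1p, so the limits are back on the raw scale.
    return math.expm1(lower_t), math.expm1(upper_t)


# Hypothetical zero-heavy sample: 80 zeroes plus 20 tail values.
zero_heavy = [0] * 80 + [1, 2, 3, 5, 8, 13, 21, 34, 55, 89,
                         150, 300, 700, 1500, 4000, 9000,
                         20000, 50000, 120000, 400000]

print(log1p_fences(zero_heavy))  # the limits are still (0.0, 0.0)
```

Any monotone transformation that maps 0 to a single value has the same weakness, because the quartiles all land on the transformed zeroes.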
Are there other transformations I can use or alternative methods?
I know there is no substitute for plotting the data and visually sanity
checking it, but that is not possible in this case.
Thanks in advance
Regards,
Richard