Thanks for these references, Janice.

Murray

On 16/08/2011 12:55 p.m., Janice Scealy wrote:
> As you say, there are various approaches available for analysing compositional data. The choice of method or transformation will depend on the data. There are 3 main approaches available for continuous compositions:
>
> 1) The logratio approach. The main reference is Aitchison (1986), who discusses 3 specific transformations: the clr (centred logratio), alr (additive logratio) and mlr (multiplicative logratio). More recently, Egozcue et al. (2003) defined the ilr (isometric logratio) transformation. I've seen all 4 of these used in recent papers, and the choice will depend to some extent on the purpose of the analysis. The standard approach is to assume that the logratios have a multivariate normal distribution. Another approach is to assume the logratios have a skew-normal distribution (e.g. see Mateu-Figueras and Pawlowsky-Glahn (2011)).
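
As a rough illustration of the logratio transformations named above, here is a minimal NumPy sketch of the clr, alr and ilr. The function names, the choice of the last part as the alr divisor, and the Helmert-type basis used for the ilr are assumptions made for this example; other divisors and other orthonormal bases are equally valid.

```python
import numpy as np

def closure(x):
    """Rescale positive parts so that each composition sums to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def clr(x):
    """Centred logratio: log of each part over the geometric mean of the parts."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=-1, keepdims=True)

def alr(x):
    """Additive logratio, using the last part as divisor (an arbitrary choice)."""
    logx = np.log(closure(x))
    return logx[..., :-1] - logx[..., -1:]

def helmert_basis(p):
    """(p-1) x p matrix whose rows are orthonormal and orthogonal to the ones vector."""
    basis = np.zeros((p - 1, p))
    for i in range(1, p):
        basis[i - 1, :i] = 1.0 / i
        basis[i - 1, i] = -1.0
        basis[i - 1] *= np.sqrt(i / (i + 1.0))
    return basis

def ilr(x):
    """Isometric logratio: clr coordinates expressed in an orthonormal basis."""
    x = np.asarray(x, dtype=float)
    return clr(x) @ helmert_basis(x.shape[-1]).T

composition = np.array([0.6, 0.3, 0.1])
clr_coords = clr(composition)   # 3 coordinates that sum to zero
alr_coords = alr(composition)   # 2 coordinates
ilr_coords = ilr(composition)   # 2 coordinates, same Euclidean norm as clr_coords
```

All three require strictly positive parts, since they take logs.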
>
> 2) Box-Cox type transformations. See Barceló et al. (1996) and Tsagris et al. (2011) for further details. Tsagris et al. (2011) also give some references for the approach where you leave the variables untransformed, and they discuss something similar to your suggestion of analysing the ratios of the variables to their arithmetic mean. Box-Cox type transformations are power transformations which map the p-dimensional compositional data on the simplex to a subset of the (p-1)-dimensional reals. The standard approach is then to use the (p-1)-dimensional normal distribution to model the transformed data. One issue, however, is that the normal distribution is defined on the entire (p-1)-dimensional real space, whereas the Box-Cox transformation maps the data to only a subset of it.
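
To make the "subset" issue concrete, here is a small sketch that applies a Box-Cox transformation to the ratios of the first p-1 parts to the last part. This is only one possible form of a Box-Cox type transformation, assumed here for illustration; the power `lam`, the Dirichlet test data and the function name are all hypothetical and not taken from the papers cited.

```python
import numpy as np

def box_cox_ratios(x, lam):
    """Box-Cox transform of the ratios x_i / x_p for i = 1..p-1, with lam > 0."""
    x = np.asarray(x, dtype=float)
    ratios = x[..., :-1] / x[..., -1:]
    return (ratios**lam - 1.0) / lam

lam = 0.5  # hypothetical power, chosen only for the example
rng = np.random.default_rng(0)
compositions = rng.dirichlet([1.0, 1.0, 1.0], size=1000)  # simulated 3-part data

transformed = box_cox_ratios(compositions, lam)

# The ratios are positive, so for lam > 0 every transformed coordinate
# is bounded below by -1/lam and cannot cover the whole real line.
lower_bound = -1.0 / lam
```

With `lam = 0.5` no transformed value can fall below -2, whereas a fitted normal distribution always places some probability there.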
>
> 3) The square root transformation. This approach transforms the compositional data onto the surface of the hypersphere and then one can use distributions for directional data to model compositional data. See Scealy and Welsh (2011) who proposed using the Kent distribution to model the transformed data. One advantage of this approach is that it can handle zeros directly (unlike the logratio approach) and it may work better than the logratio approach for data distributed close to zero since the logratios could be highly skewed in this case due to taking logs of small values.
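
A minimal sketch of the square root transformation described above: because the parts of a closed composition sum to 1, their square roots have unit Euclidean norm, so the transformed point lies on the unit hypersphere (in the positive orthant). The function name is an assumption, and fitting a Kent distribution to the transformed data is not shown.

```python
import numpy as np

def sqrt_transform(x):
    """Map a composition to the unit hypersphere by taking square roots of the parts."""
    x = np.asarray(x, dtype=float)
    x = x / x.sum(axis=-1, keepdims=True)  # close the composition to sum 1
    return np.sqrt(x)

# A composition with an exact zero, which a logratio transformation cannot handle.
composition = np.array([0.7, 0.3, 0.0])
point_on_sphere = sqrt_transform(composition)
squared_norm = np.sum(point_on_sphere**2)  # equals 1 up to rounding
```

The zero part simply maps to a zero coordinate, with no replacement step needed.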
>
> You mention that your dataset contains large numbers of trace elements and you have detection limit problems. I'm assuming this means that you have censored data near 0. This is a common problem in geochemical samples. I recently attended a workshop on compositional data analysis and I recall some of the speakers were talking about this issue. See the detailed program papers at the following link:
> http://congress.cimne.com/codawork11/frontal/Home.asp
> Some additional references which might be useful to you are Martín-Fernández et al. (2003), Palarea-Albaladejo et al. (2007), and Hron et al. (2010).
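
As one simple illustration of handling values below a detection limit before taking logratios, here is a sketch of a multiplicative zero replacement: each zero is replaced by a small value and the non-zero parts are shrunk by a common factor so that the composition still sums to 1. This is a sketch under stated assumptions rather than the exact procedure of any of the papers above; the function name, the single replacement value `delta` for every part, and the example data are hypothetical.

```python
import numpy as np

def multiplicative_replacement(x, delta):
    """Replace zeros by delta and rescale the non-zero parts (1-D input)."""
    x = np.asarray(x, dtype=float)
    x = x / x.sum()                     # close the composition to sum 1
    is_zero = x == 0
    total_imputed = delta * is_zero.sum()
    # Non-zero parts share one scaling factor, so their ratios are unchanged.
    return np.where(is_zero, delta, x * (1.0 - total_imputed))

sample = np.array([0.55, 0.45, 0.0, 0.0])  # two parts below detection
delta = 0.001                              # hypothetical replacement value
imputed = multiplicative_replacement(sample, delta)
```

In practice the replacement value would be tied to each element's detection limit, and the results can be sensitive to that choice.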
>
> References
> Aitchison, J. (1986). The Statistical Analysis of Compositional Data. London: Chapman and Hall.
> Egozcue, J. J., Pawlowsky-Glahn, V., Mateu-Figueras, G., and Barceló-Vidal, C. (2003). Isometric logratio transformations for compositional data analysis. Mathematical Geology, 35:3, 279-300.
> Mateu-Figueras, G. and Pawlowsky-Glahn, V. (2011). The Skew-Normal Distribution on the Simplex. Communications in Statistics - Theory and Methods, 36:9, 1787-1802.
> Barceló, C., Pawlowsky, V., and Grunsky, E. (1996). Some aspects of transformations of compositional data and the identification of outliers. Mathematical Geology, 28:4, 501-518.
> Tsagris, M. T., Preston, S., and Wood, A. T. A. (2011). A data-based power transformation for compositional data. Compositional data analysis workshop, Sant Feliu de Guixols Girona, Spain. http://congress.cimne.com/codawork11/frontal/Home.asp.
> Scealy, J. L. and Welsh, A. H. (2011). Regression for Compositional Data by Using Distributions Defined on the Hypersphere. Journal of the Royal Statistical Society Series B, 73, 351-375.
> Martín-Fernández, J. A., Barceló-Vidal, C., and Pawlowsky-Glahn, V. (2003). Dealing with zeros and missing values in compositional data sets using nonparametric imputation. Mathematical Geology, 35:3, 253-278.
> Palarea-Albaladejo, J., Martín-Fernández, J. A., and Gómez-García, J. (2007). A parametric approach for dealing with compositional rounded zeros. Mathematical Geology, 39, 625-645.
> Hron, K., Templ, M., and Filzmoser, P. (2010). Imputation of missing values for compositional data using classical and robust methods. Computational Statistics and Data Analysis, 54, 3095-3107.
>
>
>
>
> On 15/08/2011, at 9:47 PM, Murray Jorgensen wrote:
>
>> Pardon this re-post! Gilbert McKenzie asks if the data is discrete or continuous. They are continuous. The data on my mind at the moment are chemical analyses of geological samples. The variables are elements in oxidized form. Some data sets have just 9 or so major constituents; others include very large numbers of trace elements, where the rarer ones may present problems relating to the limits of detection.
>>
>> Murray
>> ==============
>> When analysing compositional data that sum across variables to a constant, it is well known that Aitchison recommends analysing the log of the ratios of the variables to their geometric mean. Others leave the variables untransformed.
>>
>> A third approach might be to analyse the logged proportions, i.e. the log of the ratios of the variables to their arithmetic mean. Can anyone point me to discussions in the literature about why this might be a good or a bad thing to do?
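
A small numerical sketch of the two options in the question above, using a made-up 3-part composition that sums to 1. For a closed composition the arithmetic mean is fixed at 1/p, so the log of the ratio to the arithmetic mean is just the log proportion plus the constant log(p), while the log of the ratio to the geometric mean subtracts a quantity that changes from sample to sample. The variable names and data are assumptions for illustration only.

```python
import numpy as np

composition = np.array([0.6, 0.3, 0.1])  # hypothetical sample, sums to 1
p = composition.size

# Log of the ratio of each part to the arithmetic mean of the parts.
log_ratio_arith = np.log(composition / composition.mean())

# Log of the ratio of each part to the geometric mean of the parts.
log_ratio_geom = np.log(composition) - np.log(composition).mean()

# Within a single sample the two versions differ by the same shift for every part.
shift = log_ratio_arith - log_ratio_geom
```

Here `log_ratio_arith` equals `np.log(composition) + np.log(p)`, and only `log_ratio_geom` is constrained to sum to zero.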
>>
>> Cheers,  Murray
>> --
>> Dr Murray Jorgensen      http://www.stats.waikato.ac.nz/Staff/maj.html
>> Department of Statistics, University of Waikato, Hamilton, New Zealand
>> Email: [log in to unmask]  [log in to unmask]        Fax 7 838 4155
>> Phone  +64 7 838 4773 wk    Home +64 7 825 0441   Mobile 021 0200 8350
>>
>> ----
>>
>> FOR INFORMATION ABOUT "ANZSTAT", INCLUDING UNSUBSCRIBING, PLEASE VISIT http://www.maths.uq.edu.au/anzstat/
>

-- 
Dr Murray Jorgensen      http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [log in to unmask]                                Fax 7 838 4155
Phone  +64 7 838 4773 wk    Home +64 7 825 0441   Mobile 021 0200 8350
