
Research Group in Statistics and Compositional Data Analysis

Presentation

Compositional data (CoDa) have historically been described as random vectors with strictly positive components whose sum is a constant (for example, one, 100, or a million). More recently, the term CoDa has come to cover all vectors that represent the parts of a whole, so that the information they carry is relative. This includes not only parts per unit or percentages, but also compositions in their original units (for example, kg, euros, or minutes).
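The idea that the information in a composition is relative can be illustrated with a short sketch. The function name `closure` and the body-composition figures below are hypothetical and chosen only for illustration. The sketch rescales a vector of strictly positive parts to a constant sum and shows that the same parts recorded in kilograms or in grams lead to the same proportions.

```python
def closure(parts, total=1.0):
    """Rescale strictly positive parts so that they sum to `total`."""
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    s = sum(parts)
    return [total * p / s for p in parts]


# Hypothetical body-composition measurements (fat, bone, muscle) in kg.
in_kg = [18.0, 3.0, 39.0]
# The same measurements expressed in grams.
in_g = [1000.0 * x for x in in_kg]

proportions_kg = closure(in_kg)        # parts per unit
proportions_g = closure(in_g)          # same proportions, different input units
percentages = closure(in_kg, 100.0)    # the same composition as percentages
```

Changing the units or the constant sum changes the numbers stored, but not the ratios between parts, which is the sense in which such vectors are treated as the same composition.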

These types of data appear in many applications, and the importance of processing them with consistent statistical methods cannot be overstated. Concern for the problems they pose has long been sustained by researchers in the geosciences, and awareness of the need for coherent methods has been growing in other fields, such as the medical and environmental sciences. Typical examples from these fields are: economics (income and expenditure distribution), medicine (body composition: fat, bone, muscle), survey data (ipsative preference data), the food industry (food composition: fat, sugar, etc.), chemistry (chemical composition), ecology (abundance of various species), palaeontology (Foraminifera taxa), agriculture (ionomic nutrient balance), sociology (time-use surveys), environmental sciences (soil, water and air pollution), microbiome studies (OTU composition), health sciences (daily time spent in various physical activities) and genetics (genotype frequencies).

Research in CoDa is today having an extensive impact on these fields. However, it took a long time to find a suitable way to analyse this type of data statistically, that is, to solve the problem of spurious correlation, as Karl Pearson called it in 1897, or the problem of data closure, as Felix Chayes called it in the 1960s. In short, these authors noted that standard statistical techniques lose their applicability and classical interpretation when applied to CoDa, so new techniques had to be developed. No theoretically sound solution had been proposed until 1980. It was John Aitchison who put forward a consistent theory based on log-ratios. Subsequent developments have shown that the mathematical foundation of a statistical analysis suitable for these types of data is the definition of a specific geometry on the simplex (the sample space of CoDa). On this basis, any multivariate statistical analysis can be developed rigorously, including, among others, cluster analysis, discriminant analysis, factor analysis and linear regression models.
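As a minimal sketch of the log-ratio idea, the example below uses the centred log-ratio (clr) transformation. The text above does not say which log-ratio transformation is meant, so clr is an illustrative choice, and the function names and sample values are hypothetical. Each part is divided by the geometric mean of all parts before taking logarithms, and the distance between two compositions is computed as the ordinary Euclidean distance between their clr vectors (commonly called the Aitchison distance).

```python
import math


def clr(parts):
    """Centred log-ratio: log of each part over the geometric mean of all parts."""
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr coordinates of two compositions."""
    cx, cy = clr(x), clr(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))


# Two hypothetical three-part compositions in arbitrary units.
x = [18.0, 3.0, 39.0]
y = [25.0, 5.0, 20.0]

d_original = aitchison_distance(x, y)
# Rescaling either vector (a change of units or of the constant sum)
# leaves the distance unchanged, because only ratios enter the calculation.
d_rescaled = aitchison_distance([10.0 * v for v in x], [0.5 * v for v in y])
```

Once the data are expressed in log-ratio coordinates such as these, standard multivariate methods can be applied to the transformed values. This is only a sketch of the general approach, not the group's own implementation.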

This line of research is currently being developed by members of the CoDa-Research Group. The core of the group belongs to the University of Girona (UdG), and the group includes members of the Polytechnic University of Catalonia (UPC) and Biomathematics & Statistics Scotland (BioSS). Dissemination and transfer of the research results include the following activities: the international CoDaCourse, the CoDaPack statistical package, the biennial CoDaWork workshop and the CoDaWeb website. Visit the website for further information!

All researchers working on real case studies or on the mathematical foundations of CoDa are welcome. Join us!
