Research Group in Compositional Data Statistics and Analysis (GR-EADC)

Lines of research

Statistical analysis of compositional data

Nowadays there are several lines of research that complement and generalise compositional methods and offer new problems that require supplementary research efforts. We highlight the following among them:

Among other fields, these methods can be applied in the omics sciences.

These methods seek the composition that optimises a response variable, as in the optimal design of experiments (DOE) for mixtures. Among other fields, this approach can be applied in the food industry, economics, medicine, time use in physical activity, and sustainable development.

The foundations of these methods require the study of the mathematical properties of optimisation constrained to the simplex under Aitchison’s geometry.

Zeros pose difficulties when applying log-ratio techniques. This "issue with zeros" is an active line of research being developed through several projects headed by this team, with high-impact publications.

However, two important problems are still to be resolved. The first is the development of a methodology for the treatment of absolute zeros, for which missing-data imputation techniques are not suitable; such zeros commonly arise in the optimisation setting described above. The second is the extension of the methodology for treating zeros to high-dimensional data.
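A small sketch can show why zeros are troublesome: a log-ratio with a zero part cannot be computed at all. The plain-Python example below illustrates this and then applies one simple strategy, a multiplicative replacement, which is only meaningful for zeros that stand for small unobserved values and is not a treatment for absolute zeros. The function names and the value of `delta` are illustrative assumptions, not the group's methodology.

```python
import math

def closure(x, total=1.0):
    """Rescale the parts so that they sum to `total`."""
    s = sum(x)
    return [total * v / s for v in x]

def multiplicative_replacement(x, delta):
    """Replace each zero by `delta` and shrink the non-zero parts.

    Assumes `x` is closed to 1. `delta` is a hypothetical small value,
    for example a fraction of a detection limit.
    """
    n_zeros = sum(1 for v in x if v == 0)
    shrink = 1.0 - n_zeros * delta
    return [delta if v == 0 else v * shrink for v in x]

x = closure([0.0, 30.0, 70.0])

# A log-ratio with a zero numerator is undefined: math.log(0.0) raises.
try:
    math.log(x[0] / x[1])
    log_ratio_defined = True
except ValueError:
    log_ratio_defined = False

y = multiplicative_replacement(x, delta=0.005)
```

After the replacement the parts still sum to one, and the ratio between the two non-zero parts is unchanged, which is the reason for shrinking them multiplicatively rather than subtracting a constant.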

Research into compositional data (CoDa) dates back to the end of the 19th century, when K. Pearson warned of the dangers of using linear correlation to analyse the relationship between indices that share a common denominator: thus was born the concept of spurious correlation. Much effort went into solving this difficulty, but it was not until the 1980s that J. Aitchison, drawing on his knowledge of the lognormal probability distribution, presented a methodology for the statistical analysis of CoDa. One of Aitchison’s most important legacies is an initial definition of the invariance of statistical analysis under a change of scale, a property closely linked to spurious correlation.

Moreover, building on the common use of the ternary diagram (or de Finetti’s diagram) in biology and geology to represent three proportions, Aitchison defined the simplex as the sample space of CoDa: random vectors of positive components with a constant sum (one for proportions, 100 for percentages, one million for ppm). This space is equipped with its own internal operation, perturbation, which reflects the idea of comparing proportions through their ratio rather than their difference. Aitchison also introduced the concept of sub-composition (a subset of the parts of a composition) to define a relation between spaces and subspaces in terms of geometric projection. This definition is crucial for the property of sub-compositional coherence in statistical analyses. In essence, this property requires that the results obtained from analysing a sub-composition do not contradict those obtained from a composition that contains it. The failure of linear correlation to satisfy this property explains the spurious correlation found by K. Pearson.
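The operations named in this paragraph can be sketched in a few lines of plain Python. The function names (`closure`, `perturb`, `subcomposition`) and the sample values are assumptions chosen for illustration; the sketch only shows that perturbation acts through ratios and that forming a sub-composition leaves the ratios between the retained parts untouched.

```python
def closure(x, total=1.0):
    """Rescale positive parts so that they sum to `total`."""
    s = sum(x)
    return [total * v / s for v in x]

def perturb(x, y):
    """Perturbation: component-wise product followed by closure."""
    return closure([a * b for a, b in zip(x, y)])

def subcomposition(x, parts):
    """Keep the parts at the given indices and close them again."""
    return closure([x[i] for i in parts])

x = closure([10.0, 30.0, 60.0])
y = closure([2.0, 1.0, 1.0])

# Perturbing x by y, then by the inverse of y, recovers x.
z = perturb(x, y)
inverse_y = closure([1.0 / v for v in y])
x_back = perturb(z, inverse_y)

# The ratio between parts 0 and 1 is the same in the sub-composition.
sub = subcomposition(x, [0, 1])
ratio_full = x[0] / x[1]
ratio_sub = sub[0] / sub[1]
```

Perturbing by the inverse composition plays the role that subtraction plays for ordinary vectors, which is one way to read the remark about using the ratio instead of the difference.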
The formation of sub-compositions, as a geometric projection, is also related to the metric aspects of CoDa analysis. In this sense, the distance between two samples computed on a sub-composition has to be less than or equal to the distance computed on the original composition. Metric aspects play a very relevant role in non-parametric statistical techniques, such as some cluster analysis techniques. The relative nature of the information contained in CoDa suggests that ratios have to be used both when we compare two compositions (distances between samples) and when we analyse the association between the parts of a composition (correlation between parts): ratios between rows or between columns of the data matrix, respectively. These ratios take positive real values, either in the interval (0, 1) or, for the inverse ratio, in the interval (1, +∞). However, once logarithms are taken, log-ratios are defined on the whole real line, with a symmetric relation between the intervals (-∞, 0) and (0, +∞). With this approach, based on log-ratios of the parts of the composition, the basic variables for any statistical analysis can be constructed.
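Both points above, the symmetry gained by taking logarithms and the behaviour of distances under sub-compositions, can be checked numerically. The sketch below assumes the usual log-ratio distance, computed as the Euclidean distance between centred log-ratio vectors; the function names and the two sample compositions are illustrative assumptions.

```python
import math

def clr(x):
    """Centred log-ratio: log of each part over the geometric mean."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr vectors of x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))

x = [0.1, 0.3, 0.6]
y = [0.5, 0.25, 0.25]

# Distance on the full compositions and on the first two parts only.
# clr is scale invariant, so the sub-vectors need no re-closing here.
d_full = aitchison_distance(x, y)
d_sub = aitchison_distance(x[:2], y[:2])

# A ratio and its inverse map to opposite values once logs are taken.
log_r = math.log(x[0] / x[1])
log_inv_r = math.log(x[1] / x[0])
```

For these two samples the sub-composition distance does not exceed the full distance, and `log_r` and `log_inv_r` differ only in sign.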

At the beginning of the 21st century, and particularly as a result of the projects led by members of the CoDa-Research Group, the mathematical and statistical foundations of compositional analysis were consolidated. A series of contributions establishes these foundations on the geometry of the simplex and describes its Euclidean structure through the operations introduced by Aitchison in the eighties. The metric structure of Aitchison’s space leads to log-ratio coordinates with respect to an orthonormal basis for the representation of compositional vectors. Following the principle of “working in coordinates”, compositional analysis applies classical methods to these coordinates. The need to find an orthonormal basis has motivated the introduction of three algorithms that construct one automatically. Two of these algorithms can be applied to high-dimensional compositions.
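As a rough illustration of “working in coordinates”, the sketch below computes log-ratio coordinates with respect to one fixed orthonormal basis, in which each coordinate contrasts a part with the geometric mean of the parts that follow it. This particular basis is an assumption made for the example and is not claimed to be one of the three algorithms mentioned above; the function names are likewise hypothetical. The check at the end compares the ordinary Euclidean distance between coordinate vectors with the distance between centred log-ratio vectors.

```python
import math

def clr(x):
    """Centred log-ratio: log of each part over the geometric mean."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def orthonormal_coordinates(x):
    """D-1 log-ratio coordinates for a D-part composition.

    Coordinate i contrasts part i with the geometric mean of the
    parts after it, scaled so that the basis is orthonormal.
    """
    logs = [math.log(v) for v in x]
    coords = []
    for i in range(len(x) - 1):
        rest = logs[i + 1:]
        r = len(rest)                      # number of remaining parts
        scale = math.sqrt(r / (r + 1))
        coords.append(scale * (logs[i] - sum(rest) / r))
    return coords

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

x = [0.1, 0.3, 0.6]
y = [0.5, 0.25, 0.25]

zx = orthonormal_coordinates(x)
zy = orthonormal_coordinates(y)

d_coords = euclidean(zx, zy)        # classical distance on coordinates
d_clr = euclidean(clr(x), clr(y))   # log-ratio distance on the simplex
```

Because the two distances agree, classical methods that rely on Euclidean geometry (for example clustering or regression) can be applied to `zx` and `zy` directly.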

A crucial result for the consolidation of compositional methods described above is the definition of the compositional space as equivalence classes of vectors in the positive real space. With this step, compositional data analysis is introduced as a methodology for researchers interested in the relative information carried by the parts, without the constraint that the parts sum to a constant. This proposal broadens the use of compositional methods and opens the door to their general use in other fields. Moreover, for more complete information on the data, compositional analysis has to be complemented with information about the "compositional total". This approach is applied to develop a methodology designed to solve the problem of "restoring original units". This difficulty arises when results obtained in compositional coordinates are transformed back into the original units of the parts of the composition.
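The idea of equivalence classes can be made concrete with a short sketch: two positive vectors that differ only by a scale factor carry the same relative information, so they have the same closed form and the same log-ratios. The example also keeps a total alongside the composition to recover the original units. Using the plain sum as the "compositional total" is an assumption made for illustration, as are the function names and the sample amounts.

```python
import math

def closure(x, total=1.0):
    """Rescale positive parts so that they sum to `total`."""
    s = sum(x)
    return [total * v / s for v in x]

def log_ratios(x):
    """All pairwise log-ratios ln(x_i / x_j) with i < j."""
    n = len(x)
    return [math.log(x[i] / x[j]) for i in range(n) for j in range(i + 1, n)]

grams = [12.0, 30.0, 18.0]                   # hypothetical amounts
kilograms = [g / 1000.0 for g in grams]      # same data, other scale

# Both vectors belong to the same equivalence class.
same_composition = all(
    abs(a - b) < 1e-12 for a, b in zip(closure(grams), closure(kilograms))
)
same_log_ratios = all(
    abs(a - b) < 1e-12 for a, b in zip(log_ratios(grams), log_ratios(kilograms))
)

# Keeping a total next to the composition allows the original
# units to be restored after a compositional analysis.
total = sum(grams)
restored = [total * p for p in closure(grams)]
```

Here `restored` matches `grams` up to floating-point error, while the composition alone would not distinguish the gram and kilogram versions of the data.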
