
Current projects

Biplot visualisations based on the generalised singular value decomposition

Sugnet Lubbe & Raeesa Ganey

The Generalised Singular Value Decomposition (GSVD), also termed the quotient SVD, simultaneously decomposes two matrices A and B with an equal number of columns into the product of three matrices each.

A = UCH

B = VSH

As with the SVD, the matrices U and V are orthonormal and C and S are diagonal. The matrix H is generally not orthogonal but is non-singular, and the same H appears in the decompositions of both A and B.
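
The decomposition can be illustrated numerically. The following is a minimal sketch, not the project's implementation: it assumes that A and B each have at least as many rows as columns and that B has full column rank, and it builds the factors from a QR decomposition of the stacked matrix followed by an ordinary SVD. The function name `gsvd_sketch` and the random test matrices are illustrative only.

```python
import numpy as np

def gsvd_sketch(A, B):
    """Return U, C, V, S, H such that A = U @ C @ H and B = V @ S @ H.

    Assumptions (for this sketch only): A and B share the same number of
    columns p, each has at least p rows, and B has full column rank so
    that every diagonal entry of S is strictly positive.
    """
    m, p = A.shape
    # Reduced QR of the stacked matrix: [A; B] = Q R, with Q'Q = I.
    Q, R = np.linalg.qr(np.vstack([A, B]))
    Q1, Q2 = Q[:m], Q[m:]
    # Ordinary SVD of the top block: Q1 = U diag(c) Wt.
    U, c, Wt = np.linalg.svd(Q1, full_matrices=False)
    # Since Q1'Q1 + Q2'Q2 = I, the columns of Q2 W are orthogonal
    # with lengths sqrt(1 - c^2).
    s = np.sqrt(np.clip(1.0 - c**2, 0.0, None))
    V = (Q2 @ Wt.T) / s          # rescale each column to unit length
    H = Wt @ R                   # shared non-singular factor
    return U, np.diag(c), V, np.diag(s), H

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))
B = rng.normal(size=(5, 3))
U, C, V, S, H = gsvd_sketch(A, B)
```

In this construction U and V have orthonormal columns, C and S are diagonal with C² + S² equal to the identity, and H is shared by both products, which matches the structure described above.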

This project explores different avenues of constructing biplots associated with analyses based on the GSVD.

Optimal CVA biplots for the two-group discrimination problem

Niël le Roux & Sugnet Lubbe

CVA biplots are useful for visualising group separation and overlap associated with linear discriminant analysis (LDA). Since LDA is based on maximising the ratio of between-group to within-group variance, the dimension of the canonical space depends on the rank of the between-group sums-of-squares-and-cross-products matrix. Assuming more variables than groups, the rank of this matrix is the number of groups minus one. This means that the canonical space reduces to a one-dimensional line in the two-group case. A transformation can be made from the p-dimensional original space to a p-dimensional canonical space, but since all but the first eigenvalue are zero, the second, third, etc. dimensions are not ordered and not uniquely defined. In this project an optimal second dimension is found for a useful 2D CVA biplot.
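
The rank argument can be checked on simulated data. The sketch below is an illustration under assumed inputs (two simulated groups, `X1` and `X2`, with p = 4 variables). It forms the between-group and within-group sums-of-squares-and-cross-products matrices and inspects the eigenvalues of the two-sided eigenproblem. It does not attempt the optimal second dimension that the project proposes.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4
# Two simulated groups (illustrative data only).
X1 = rng.normal(size=(30, p))
X2 = rng.normal(size=(25, p)) + np.array([2.0, 0.0, 1.0, 0.0])
groups = [X1, X2]

grand_mean = np.vstack(groups).mean(axis=0)
group_means = [G.mean(axis=0) for G in groups]

# Between-group and within-group SSCP matrices.
B = sum(len(G) * np.outer(m - grand_mean, m - grand_mean)
        for G, m in zip(groups, group_means))
W = sum((G - m).T @ (G - m) for G, m in zip(groups, group_means))

# Rank of B with an explicit tolerance to guard against rounding error.
rank_B = np.linalg.matrix_rank(B, tol=1e-8 * np.linalg.norm(B))

# Eigenvalues of W^{-1} B, sorted in decreasing order.
eigvals = np.sort(np.linalg.eigvals(np.linalg.solve(W, B)).real)[::-1]
```

With two groups, `rank_B` is one and only the first entry of `eigvals` is non-zero (up to rounding), so any orthogonal completion of the remaining p - 1 canonical directions fits the data equally well.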

Extending GPAbin to visualise missing multivariate continuous data

Johané Nienkemper-Swanepoel, Sugnet Lubbe & Niël le Roux

Multiple imputation is a well-established technique for analysing missing data. Multiple imputed data sets are obtained and analysed separately using standard complete data techniques. The estimates from the separate analyses are then combined for inference. However, the exploratory analysis options for multiple imputed data sets are limited. Biplots are regarded as generalised scatterplots which provide a simultaneous configuration of both samples and variables. Therefore, a visualisation for each of the multiple imputed data sets can be constructed and interpreted individually, but in order to formulate an unbiased conclusion, the visualisations have to be appropriately combined for a unified interpretation. The GPAbin technique has been developed to address this problem for multiple correspondence analysis biplots of multiple imputed data sets. Generalised orthogonal Procrustes analysis (GPA) is used to align the biplots before combining them in a mean coordinate matrix. The name GPAbin is derived from the amalgamation of GPA and Rubin’s rules, which are the combining steps used after multiple imputation. Simulation studies have confirmed the usefulness of the GPAbin method for categorical data. In this project the GPAbin methodology is extended to multivariate continuous data using principal component analysis biplots.
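
The align-then-average step can be sketched as follows. This is a simplified illustration rather than the GPAbin implementation: it uses centring and orthogonal rotation only (no scaling), and the function names and the simulated configurations are hypothetical stand-ins for the coordinate matrices of biplots built from separate imputed data sets.

```python
import numpy as np

def procrustes_rotation(X, target):
    """Orthogonal matrix Q minimising the Frobenius norm of X @ Q - target."""
    U, _, Vt = np.linalg.svd(X.T @ target)
    return U @ Vt

def gpa_mean(configs, n_iter=50, tol=1e-10):
    """Rotate centred configurations towards their running mean, then average."""
    configs = [X - X.mean(axis=0) for X in configs]
    mean = configs[0]
    for _ in range(n_iter):
        aligned = [X @ procrustes_rotation(X, mean) for X in configs]
        new_mean = np.mean(aligned, axis=0)
        converged = np.linalg.norm(new_mean - mean) < tol
        mean = new_mean
        if converged:
            break
    return mean, aligned

# Illustrative input: one 2D configuration seen under five different rotations.
rng = np.random.default_rng(0)
base = rng.normal(size=(10, 2))
configs = []
for angle in (0.0, 0.4, 1.1, 2.0, 3.0):
    rot = np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle),  np.cos(angle)]])
    configs.append(base @ rot)

mean_config, aligned = gpa_mean(configs)
```

Because the five inputs differ only by rotation, the aligned configurations coincide and `mean_config` reproduces the common shape. With genuinely different imputations the mean would instead summarise the aligned displays.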

Exploding biplots with density axes in Plotly

Carel van der Merwe & Delia Sandilands

Biplots are useful for visualising multivariate data. They can, however, sometimes be challenging to interpret, for example when the axes and points cause overcrowding of the plot. This overcrowding is often due to the presence of many variables, highly correlated variables, or merely data sets with a large number of observations. In this paper improvements to the biplot are made to address these shortcomings. These improvements include: i) the automatic parallel translation, or "explosion", of axes, ii) the use of densities on the axes to improve interpretation and representation of large data sets, and iii) the introduction of interactive biplots via the Plotly package in R. These improvements result in a better composition of the plot: it appears less crowded, is more easily interpretable, offers additional information that can get lost in the case of a high volume of data, and allows the user to inspect the biplot element-wise. An accompanying Shiny web-based application was also created and is available at https://carelvdmerwe.shinyapps.io/ExplodingBiplots/.
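
The first two improvements can be illustrated with a small numerical sketch. It is written in Python with NumPy for consistency with the other examples on this page, whereas the project itself uses R and Plotly, and it is an assumption-laden illustration rather than the authors' method. It takes a PCA biplot of simulated data, shifts one axis by a hypothetical `offset` along its normal, and computes a Gaussian kernel density of the sample projections along that axis using a normal-reference bandwidth.

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated correlated data (illustrative only).
mix = np.array([[1.0, 0.8, 0.2],
                [0.0, 0.6, 0.3],
                [0.0, 0.0, 0.5]])
X = rng.normal(size=(200, 3)) @ mix
Xc = X - X.mean(axis=0)

# PCA biplot coordinates from the SVD of the centred data.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
V2 = Vt[:2].T                 # loadings of the first two components
Z = Xc @ V2                   # 2D sample coordinates

j = 0                                          # variable whose axis is shifted
direction = V2[j] / np.linalg.norm(V2[j])      # unit vector along the axis
normal = np.array([-direction[1], direction[0]])
offset = 3.0                                   # hypothetical translation distance

# Position of each sample's orthogonal projection along the axis,
# and the matching point on the translated axis.
t = Z @ direction
foot_on_shifted_axis = np.outer(t, direction) + offset * normal

# Gaussian kernel density of the projections along the axis.
grid = np.linspace(t.min(), t.max(), 100)
h = 1.06 * t.std() * len(t) ** (-1 / 5)        # normal-reference bandwidth
kernel = np.exp(-0.5 * ((grid[:, None] - t[None, :]) / h) ** 2)
density = kernel.sum(axis=1) / (len(t) * h * np.sqrt(2 * np.pi))
```

The translation leaves the positions `t` along the axis unchanged, so readings on the shifted axis agree with those on the original one, while `density` could be drawn on the axis to summarise where the projected samples concentrate.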

Correspondence analysis related biplot visualisations to aid analyses of categorical data containing missing values

Johané Nienkemper-Swanepoel, Carel van der Merwe, Sugnet Lubbe & Niël le Roux

Visualising incomplete data enables the recognition of response patterns and the evaluation of the effect of the unobserved information on the interpretable information. Multiple imputation is the preferred method for imputing missing values, but this poses a real challenge for the visualisation of multiple completed data sets. The GPAbin procedure has been developed to combine multiple correspondence analysis (MCA) biplots, Procrustes analysis and Rubin's rules in creating a single display for nominal categorical data with missing values. Instead of imputation, subset MCA can be used to visualise the observed and missing values together or separately. In addition, the procedure also allows for investigation of the underlying missing data mechanism. Interactive software using the shiny package in R is being developed for the visualisation.