R365: Day 11 – {obliclus} package

I will continue the analysis of the JohnsonJohnson dataset that I began on day 10 in the next issue. Today I wanted to look at a random package to see what else is out there. After flipping through the list of R packages, I came across {obliclus}, a package used for ‘cluster based factor rotation’. I had no idea what that was, so I figured that I would learn something by looking the package over. The package was developed in 2012, and its vignette is poorly edited and difficult to understand in places. The package seems designed to apply rotation techniques that identify groups on important factors.
The basis of this work, going by Wikipedia, seems to be factor analysis, a method that is often compared to principal component analysis. Factor analysis seems to be a fairly common technique in psychology and other social sciences. It takes a list of observed variables that are supposed to be affected by a (presumably) smaller number of unobserved factors. Factor loadings seem to be coefficients that describe how strongly each observed variable is related to each factor. Analogous to Pearson’s r (think r^2), the square of a loading is the proportion of variance in an indicator variable that the factor explains.
‘Rotation’ seems to refer to how the loadings are distributed across the factors. An unrotated solution tends to have most of the variables loading on the first few factors, and often on more than one factor, which makes the results difficult to interpret. Rotation seeks a more ‘simple structure’, with each variable loading strongly on one factor and more weakly on the others. While rotation makes the factors easier to interpret, every rotation of a solution is equally valid, which makes the final data analysis subjective.
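To make the idea of rotation concrete for myself, here is a minimal sketch in plain Python (no packages). It is not what {obliclus} does: the package works with oblique, cluster-based rotation, while this only tries an ordinary orthogonal rotation of two factors. The loading values are made up for illustration, and the helper names (rotate, communalities, varimax_criterion) are my own. The sketch searches over rotation angles for the one that maximizes a varimax-style criterion (the variance of the squared loadings within each factor), and checks that the sum of squared loadings for each variable does not change under rotation.

```python
import math

def rotate(loadings, theta):
    """Rotate a two-factor loading matrix by the angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[a * c - b * s, a * s + b * c] for a, b in loadings]

def communalities(loadings):
    """Sum of squared loadings per variable (variance explained by both factors)."""
    return [a * a + b * b for a, b in loadings]

def varimax_criterion(loadings):
    """Variance of the squared loadings within each factor, summed over factors."""
    total = 0.0
    for column in zip(*loadings):
        squared = [x * x for x in column]
        mean = sum(squared) / len(squared)
        total += sum((v - mean) ** 2 for v in squared) / len(squared)
    return total

# Made-up unrotated loadings: every variable loads on both factors.
unrotated = [
    [0.57, 0.57],
    [0.50, 0.49],
    [0.42, -0.43],
    [0.64, -0.63],
]

# Crude grid search over whole degrees; the criterion repeats every 90 degrees.
angles = [math.radians(d) for d in range(90)]
best = max(angles, key=lambda t: varimax_criterion(rotate(unrotated, t)))
rotated = rotate(unrotated, best)

print("best angle (degrees):", round(math.degrees(best)))
for before, after in zip(unrotated, rotated):
    print(before, "->", [round(x, 2) for x in after])
print("communalities before:", [round(h, 3) for h in communalities(unrotated)])
print("communalities after: ", [round(h, 3) for h in communalities(rotated)])
```

After the rotation each variable loads strongly on one factor and close to zero on the other, yet the communalities are the same before and after. That is the sense in which every rotation explains the data equally well, and why choosing among them is a judgment call.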
The package is poorly documented and the examples leave much to be desired. Without additional explanation of what happens in the examples, it is difficult to understand exactly what the function obliclus does. It seems to take a dataset and a set of explanatory variables and cluster the variables by how well the dataset fits them. I still feel that there is much to learn about this style of analysis, but this package is definitely NOT the tool to use, at least not for novices.
