Blake Nicholson, Seelio
March 12, 2015
\[ \text{PC}_1 = \alpha_{11}x_1 + \alpha_{12}x_2 + \cdot\cdot\cdot + \alpha_{1n}x_n \] \[\vdots\] \[ \text{PC}_k = \alpha_{k1}x_1 + \alpha_{k2}x_2 + \cdot\cdot\cdot + \alpha_{kn}x_n, \]
where \(k << n\)
prcomp
(or princomp
)library(ggplot2)
qplot(data=cars, x=speed, y=dist)
pca <- prcomp(cars)
print(pca)
## Standard deviations:
## [1] 26.12524 3.08084
##
## Rotation:
## PC1 PC2
## speed -0.1656479 -0.9861850
## dist -0.9861850 0.1656479
print(summary(pca))
## Importance of components:
## PC1 PC2
## Standard deviation 26.1252 3.08084
## Proportion of Variance 0.9863 0.01372
## Cumulative Proportion 0.9863 1.00000
Principal component 1 accounts for nearly all of the variance due to the fact that the variances of the two columns are quite different. In general, one should scale their data to have unit variance by passing “scale.=TRUE” to prcomp.
plot(pca)
Normally, you would be considering more than two components and the scree plot would look more like:
ev.slopes <- pca$rotation[2, ]/pca$rotation[1, ]
cars.centered <- transform(cars, speed=speed-mean(speed),
dist=dist-mean(dist))
qplot(data=cars.centered, x=speed, y=dist) +
geom_abline(intercept=0, slope=ev.slopes[1])
scale.=TRUE
to prcomp
or cor=TRUE
to princomp
GPArotation
princomp$scores
or prcomp$x
Check out the psych
package
Provides:
principal()
fa.parallel()
fa()
(Recall that in PCA the assumption was that principal components were a linear combination of observed variables)
factanal
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
sapply(1:3, function(f)
factanal(mtcars, factors=f, method="mle")$PVAL)
## objective objective objective
## 1.496220e-17 4.047510e-04 2.051923e-01
Find the first number of factors where the p-value is not significant. The results above suggest that we should use 3 factors.
factanal
factanal(mtcars, factors=3, method="mle")
##
## Call:
## factanal(x = mtcars, factors = 3, method = "mle")
##
## Uniquenesses:
## mpg cyl disp hp drat wt qsec vs am gear carb
## 0.135 0.055 0.090 0.127 0.290 0.060 0.051 0.223 0.208 0.125 0.158
##
## Loadings:
## Factor1 Factor2 Factor3
## mpg 0.643 -0.478 -0.473
## cyl -0.618 0.703 0.261
## disp -0.719 0.537 0.323
## hp -0.291 0.725 0.513
## drat 0.804 -0.241
## wt -0.778 0.248 0.524
## qsec -0.177 -0.946 -0.151
## vs 0.295 -0.805 -0.204
## am 0.880
## gear 0.908 0.224
## carb 0.114 0.559 0.719
##
## Factor1 Factor2 Factor3
## SS loadings 4.380 3.520 1.578
## Proportion Var 0.398 0.320 0.143
## Cumulative Var 0.398 0.718 0.862
##
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 30.53 on 25 degrees of freedom.
## The p-value is 0.205