R Pruim
JMM 2016
The support for documentation creation in RStudio is great.
These slides are HTML, but I created them in RMarkdown (+ a little bit of HTML fiddling)
A single RMarkdown file can generate PDF, HTML, or Word
No need to know HTML, LaTeX, or Word
But if you do, you can take advantage of them
We’ll say more about RMarkdown later
Assign values with <- (or ->)
product <- 5 * 3 * 27
product
## [1] 405
sqrt(100)
## [1] 10
log10(product)
## [1] 2.607455
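A small sketch of the two assignment directions, using only base R. The name product2 is made up for illustration and does not appear in the slides.

```r
# Leftward assignment: the name comes first.
product <- 5 * 3 * 27

# Rightward assignment: the value comes first (product2 is a hypothetical name).
5 * 3 * 27 -> product2

identical(product, product2)   # both names hold the same value
log10(product)                 # compare with the value printed above
```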
RStudio provides several ways to get help
? followed by the name of a function or data set
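A few standard ways to ask for help from the console, as a sketch. These are base R tools and may not match the full list that was on the original slide.

```r
?sqrt              # help page for a function
help("sqrt")       # the same page, written as a regular function call
apropos("sqrt")    # names of visible objects that contain "sqrt"
```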
Head coach, Green Bay Packers (NFL Football)
Joe from Fitchburg, WI:
Do you have a favorite Mike McCarthy quote? Mine is “statistics are for losers”.
Vic Ketchman (packers.com):
“Less volume, more creativity.”
Source: Ask Vic @ packers.com
“You’ve got to watch that you don’t do too much. We have a philosophy on our coaching staff about less volume, more creativity.
A lot of times you end up putting in a lot more volume, because you are teaching fundamentals and you are teaching concepts that you need to put in, but you may not necessarily use because they are building blocks for other concepts and variations that will come off of that … In the offseason you have a chance to take a step back and tailor it more specifically towards your team and towards your players.”
Q. (for McCarthy) How many offensive and defensive plays might you have coming into a game on average?
A. That’s an excellent question because years ago when I first got into the NFL we had 150 passes in our game plan. I’ve put a sign on all of the coordinators’ doors - less volume, more creativity. We function with more concepts with less volume. We’re more around 50 (passes) into a game plan.
Source: http://www.jsonline.com/packerinsider/106968233.html (Nov 10, 2010)
List every R command used throughout a course
Organize by syntactic similarity and by purpose
Scratch everything you could have done without
Replace dissimilar tools with more similar tools
Aim for a set of commands that is
Result: Minimal R for Intro Stats
It is not enough to use R; it must be used efficiently and elegantly.
The mosaic package attempts to be part of one solution.
“Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.”
Antoine de Saint-Exupéry (writer, poet, pioneering aviator)
require(mosaic)
Note: R purists might prefer library(mosaic), but require() is easier for novices to remember.
# simpler version
goal( ~ x, data = mydata )
# fancier version
goal( y ~ x | z , data = mydata )
# unified version
goal( formula , data = mydata )
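The template works because a formula is an ordinary R object that a function can take apart. A minimal base R sketch follows; x, y, and z are placeholder names, as in the template above.

```r
simple <- ~ x          # one-sided: a single variable
fancy  <- y ~ x | z    # two-sided, with a conditioning variable

all.vars(simple)   # variable names mentioned in the simpler version
all.vars(fancy)    # variable names mentioned in the fancier version
length(simple)     # 2: the ~ operator and its right-hand side
length(fancy)      # 3: the ~ operator, the left-hand side, the right-hand side
```

Neither formula is evaluated when it is created, so x, y, and z do not need to exist until a function looks them up in data.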
Command: xyplot()
Formula: births ~ dayofyear
Data: data=Births78
Use ?Births78 for documentation
xyplot( births ~ dayofyear, data=Births78)
Command: bwplot()
Data: HELPrct
Variables: age, substance
Use ?HELPrct for info about data
bwplot( age ~ substance, data=HELPrct )
Command: bwplot()
Data: HELPrct
Variables: age, substance
Use ?HELPrct for info about data
bwplot( substance ~ age, data=HELPrct )
histogram( ~ age, data=HELPrct)
Note: When there is one variable it is on the right side of the formula.
histogram( ~age, data=HELPrct )
densityplot( ~age, data=HELPrct )
bwplot( ~age, data=HELPrct )
qqmath( ~age, data=HELPrct )
freqpolygon( ~age, data=HELPrct )
bargraph( ~sex, data=HELPrct )
xyplot( i1 ~ age, data=HELPrct )
bwplot( age ~ substance, data=HELPrct )
bwplot( substance ~ age, data=HELPrct )
One variable: histogram(), qqmath(), densityplot(), freqpolygon(), bargraph()
Two variables: xyplot(), bwplot()
Create a plot of your own choosing with one of these data sets
names(KidsFeet) # 4th graders' feet
?KidsFeet
names(Utilities) # utility bill data
?Utilities
names(NHANES) # body shape, etc.
?NHANES
Add groups = group to overlay.
Use y ~ x | z to create multipanel plots.
densityplot( ~ age | sex, data=HELPrct,
             groups=substance, auto.key=TRUE)
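The same two ideas can be tried with data that ships with R. In this sketch mtcars is a stand-in chosen for illustration, not a data set from the talk, and only the lattice package is needed.

```r
library(lattice)   # ships with standard R installations

# Recode two numeric columns as factors so they can define panels and groups.
cars <- transform(mtcars,
                  am  = factor(am, labels = c("automatic", "manual")),
                  cyl = factor(cyl))

# One panel per transmission type; one overlaid density curve per cylinder count.
p <- densityplot( ~ mpg | am, data = cars,
                  groups = cyl, auto.key = TRUE)
print(p)
```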
My approach:
xyplot( births ~ dayofyear, data=Births78,
        groups=dayofyear %% 7, type='l',
        auto.key=list(columns=4, lines=TRUE, points=FALSE),
        par.settings=list(
          superpose.line=list( lty=1 ) ))
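Grouping by dayofyear %% 7 works because the remainder repeats every seven days, so each remainder picks out one day of the week. A quick base R check, assuming dayofyear counts from 1 on January 1, 1978:

```r
# The first two weeks of 1978, built with base R (Births78 is not needed here).
dates     <- as.Date("1978-01-01") + 0:13
dayofyear <- as.integer(format(dates, "%j"))

check <- data.frame(
  dayofyear = dayofyear,
  remainder = dayofyear %% 7,
  iso_wday  = as.integer(format(dates, "%u"))   # 1 = Monday, ..., 7 = Sunday
)
check

# Each remainder lines up with exactly one weekday.
table(check$remainder, check$iso_wday)
```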
Big idea: Replace plot name with summary name
histogram( ~ age, data=HELPrct )
mean( ~ age, data=HELPrct )
## [1] 35.7
The mosaic package includes formula-aware versions of mean(), sd(), var(), min(), max(), sum(), IQR(), …
Also provides favstats() to compute our favorites.
favstats( ~ age, data=HELPrct )
## min Q1 median Q3 max mean sd n missing
## 19 30 35 40 60 35.7 7.71 453 0
tally( ~ sex, data=HELPrct)
##
## female male
## 107 346
tally( ~ substance, data=HELPrct)
##
## alcohol cocaine heroin
## 177 152 124
Three ways to think about this. All do the same thing.
sd( age ~ substance, data=HELPrct )
sd( ~ age | substance, data=HELPrct )
sd( ~ age, groups=substance, data=HELPrct )
## alcohol cocaine heroin
## 7.65 6.69 7.99
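For comparison, the formula version can be checked against a base R computation. This sketch assumes the mosaic package is installed; tapply() is a base R alternative, not something the slides use.

```r
library(mosaic)   # also makes the HELPrct data available

by_formula <- sd( age ~ substance, data = HELPrct )
by_tapply  <- tapply( HELPrct$age, HELPrct$substance, sd )   # base R version

round(by_formula, 2)
all.equal(as.numeric(by_formula), as.numeric(by_tapply))
```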
tally( sex ~ substance, data=HELPrct )
## substance
## sex alcohol cocaine heroin
## female 36 41 30
## male 141 111 94
tally( ~ sex + substance, data=HELPrct )
## substance
## sex alcohol cocaine heroin
## female 36 41 30
## male 141 111 94
mean( age ~ substance | sex, data=HELPrct )
## A.F C.F H.F A.M C.M H.M F M
## 39.2 34.9 34.7 38.0 34.4 33.1 36.3 35.5
mean( age ~ substance | sex, data=HELPrct, .format="table" )
## substance sex mean
## 1 A F 39.2
## 2 A M 38
## 3 C F 34.9
## 4 C M 34.4
## 5 H F 34.7
## 6 H M 33.1
median(), min(), max(), sd(), var(), favstats(), etc.
mean( age ~ sex, data=HELPrct )
bwplot( age ~ sex, data=HELPrct )
lm( age ~ sex, data=HELPrct )
## female male
## 36.3 35.5
## (Intercept) sexmale
## 36.252 -0.784
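The group means and the lm() coefficients carry the same information: the intercept is the mean of the first group, and the second coefficient is the difference between the two group means. A base R sketch of that relationship, using the built-in ToothGrowth data as a stand-in for HELPrct (an assumption made here so the code runs without extra packages):

```r
# ToothGrowth is a built-in data set, used here only as a stand-in for HELPrct.
group_means <- tapply(ToothGrowth$len, ToothGrowth$supp, mean)
fit         <- lm(len ~ supp, data = ToothGrowth)

group_means
coef(fit)

# Intercept = mean of the first group; intercept + slope = mean of the second.
all.equal(unname(coef(fit)[1]),                unname(group_means[1]))
all.equal(unname(coef(fit)[1] + coef(fit)[2]), unname(group_means[2]))
```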
Answer each question with both a numerical summary and a graphical summary.
Are boys' feet larger than girls' feet? (KidsFeet)
Do boys and girls have differently shaped feet? (KidsFeet)
The mosaic package includes some other things, too
t.test()
xchisq.test(), xpnorm(), xqqmath()
mplot()
mplot(HELPrct): interactive plot creation
plot() in some situations
dotPlot(), ashplot()
histogram() controls (e.g., width)
xpnorm( 700, mean=500, sd=100)
##
## If X ~ N(500,100), then
##
## P(X <= 700) = P(Z <= 2) = 0.9772
## P(X > 700) = P(Z > 2) = 0.0228
## [1] 0.977
xpnorm( c(300, 700), mean=500, sd=100)
# text output suppressed
xpbinom( 40, size = 100, prob = 0.5)
## [1] 0.0284
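The numbers returned by the x-versions can be reproduced with the base R functions pnorm() and pbinom(). A short sketch follows; the variable names are made up for illustration.

```r
p_norm  <- pnorm(700, mean = 500, sd = 100)     # P(X <= 700)
p_z     <- pnorm(2)                             # same probability on the z scale
p_binom <- pbinom(40, size = 100, prob = 0.5)   # P(X <= 40) for Binomial(100, 0.5)

round(c(normal = p_norm, z = p_z, binomial = p_binom), 4)
```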
xchisq.test(phs)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: x
## X-squared = 20, df = 1, p-value = 8e-07
##
## 104.00 10933.00
## ( 146.52) (10890.48)
## [12.05] [ 0.16]
## <-3.51> < 0.41>
##
## 189.00 10845.00
## ( 146.48) (10887.52)
## [12.05] [ 0.16]
## < 3.51> <-0.41>
##
## key:
## observed
## (expected)
## [contribution to X-squared]
## <residual>
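The slides do not show how phs was created. The sketch below rebuilds a 2 x 2 table from the observed counts printed in the output (an assumption about its layout) and uses base R's chisq.test() to recover the expected counts and residuals.

```r
# Observed counts copied from the output above; the original definition of phs
# is not shown in the slides, so this construction is an assumption.
phs <- rbind( c(104, 10933),
              c(189, 10845) )

result <- chisq.test(phs)    # Yates' correction is the default for 2 x 2 tables
round(result$expected, 2)    # compare with the values in parentheses above
round(result$residuals, 2)   # Pearson residuals; compare with the <...> values
```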
Modeling is really the starting point for the mosaic design.
lm() and glm() defined the template
model <- lm(width ~ length * sex,
            data=KidsFeet)
Width <- makeFun(model)
Width( length=25, sex="B")
## 1
## 9.17
Width( length=25, sex="G")
## 1
## 8.94
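The function built by makeFun() should agree with base R's predict() on the same inputs. A quick check, assuming the mosaic package is installed; from_makeFun and from_predict are names made up for this sketch.

```r
library(mosaic)   # provides makeFun() and the KidsFeet data

model <- lm(width ~ length * sex, data = KidsFeet)
Width <- makeFun(model)

from_makeFun <- Width( length = 25, sex = "B" )
from_predict <- predict( model, newdata = data.frame(length = 25, sex = "B") )

all.equal(unname(from_makeFun), unname(from_predict))
```

makeFun() gives the model an ordinary function interface, so the same object can be passed to plotFun() or evaluated by hand.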
xyplot( width ~ length, data=KidsFeet,
groups=sex, auto.key=TRUE )
plotFun( Width(length, sex="B") ~ length,
col=1, add=TRUE)
plotFun( Width(length, sex="G") ~ length,
col=2, add=TRUE)
This is still experimental, but works in many simple situations.
model <- lm(width ~ length * sex, data=KidsFeet)
plotModel(model)