R Pruim
JMM 2016
The support for documentation creation in RStudio is great.
These slides are HTML, but I created them in RMarkdown (+ a little bit of HTML fiddling)
A single RMarkdown file can generate PDF, HTML, or Word
No need to know HTML, LaTeX or Word
But if you do, you can take advantage of it
We’ll say more about RMarkdown later
Assignment: <- (or ->)
product <- 5 * 3 * 27
product
## [1] 405
sqrt(100)
## [1] 10
log10(product)
## [1] 2.607455
RStudio provides several ways to get help
? followed by name of function or data set
Mike McCarthy: Head coach, Green Bay Packers (NFL Football)
Joe from Fitchburg, WI:
Do you have a favorite Mike McCarthy quote? Mine is “statistics are for losers”.
Vic Ketchman (packers.com):
“Less volume, more creativity.”
Source: Ask Vic @ packers.com
“You’ve got to watch that you don’t do too much. We have a philosophy on our coaching staff about less volume, more creativity.
A lot of times you end up putting in a lot more volume, because you are teaching fundamentals and you are teaching concepts that you need to put in, but you may not necessarily use because they are building blocks for other concepts and variations that will come off of that … In the offseason you have a chance to take a step back and tailor it more specifically towards your team and towards your players.”
Q. (for McCarthy) How many offensive and defensive plays might you have coming into a game on average?
A. That’s an excellent question because years ago when I first got into the NFL we had 150 passes in our game plan. I’ve put a sign on all of the coordinators’ doors - less volume, more creativity. We function with more concepts with less volume. We’re more around 50 (passes) into a game plan.
Source: http://www.jsonline.com/packerinsider/106968233.html (Nov 10, 2010)
List every R command used throughout a course
Organize by syntactic similarity and by purpose
Scratch everything you could have done without
Replace dissimilar tools with more similar tools
Aim for a set of commands that is
Result: Minimal R for Intro Stats
It is not enough to use R; it must be used efficiently and elegantly.
The mosaic package attempts to be part of one solution.
Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.
(Antoine de Saint-Exupery: writer, poet, pioneering aviator)
require(mosaic)
Note: R purists might prefer library(mosaic), but require() is easier for novices to remember.
# simpler version
goal( ~ x, data = mydata )          
# fancier version
goal( y ~ x | z , data = mydata )   
# unified version
goal( formula , data = mydata )     
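To make the template concrete, here is a minimal sketch using the lattice package (which ships with R and provides xyplot() and histogram()). The built-in mtcars data and its variable names are stand-ins chosen for illustration; they are not from the talk.

```r
library(lattice)  # provides xyplot(), bwplot(), histogram(), ...

# A formula is an ordinary R object, so it can be stored and reused.
simple <- ~ mpg                    # one variable: right-hand side only
fancy  <- mpg ~ wt | factor(cyl)   # y ~ x | z: one panel per value of z

# Both calls follow goal( formula, data = mydata )
p1 <- histogram(simple, data = mtcars)
p2 <- xyplot(fancy, data = mtcars)

class(fancy)      # "formula"
all.vars(fancy)   # "mpg" "wt" "cyl"
```

Printing p1 or p2 at the console draws the plot; only the function name changes between the two calls.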
Command: xyplot()
Formula: births ~ dayofyear
Data: data=Births78
?Births78 for documentation
xyplot( births ~ dayofyear, data=Births78 )
Command: bwplot()
Data: HELPrct
Variables: age, substance
?HELPrct for info about data
bwplot( age ~ substance, data=HELPrct )
bwplot( substance ~ age, data=HELPrct )
histogram( ~ age, data=HELPrct )
Note: When there is one variable it is on the right side of the formula.
  histogram( ~age, data=HELPrct ) 
densityplot( ~age, data=HELPrct ) 
     bwplot( ~age, data=HELPrct ) 
     qqmath( ~age, data=HELPrct ) 
freqpolygon( ~age, data=HELPrct ) 
   bargraph( ~sex, data=HELPrct )
xyplot(  i1 ~ age,       data=HELPrct )
bwplot( age ~ substance, data=HELPrct )
bwplot( substance ~ age, data=HELPrct )
One-variable plots: histogram(), qqmath(), densityplot(), freqpolygon(), bargraph()
Two-variable plots: xyplot(), bwplot()
Create a plot of your own choosing with one of these data sets
names(KidsFeet)    # 4th graders' feet
?KidsFeet
names(Utilities)   # utility bill data
?Utilities
names(NHANES)      # body shape, etc.
?NHANES
Add groups = group to overlay.
Use y ~ x | z to create multipanel plots.
densityplot( ~ age | sex, data=HELPrct,
               groups=substance,  auto.key=TRUE)
My approach:
xyplot( births ~ dayofyear, data=Births78,
  groups=dayofyear %% 7, type='l',
  auto.key=list(columns=4, lines=TRUE, points=FALSE),
  par.settings=list(
    superpose.line=list( lty=1 ) ))
Big idea: Replace plot name with summary name
histogram( ~ age, data=HELPrct )
     mean( ~ age, data=HELPrct )
## [1] 35.7
The mosaic package includes formula aware versions of mean(), sd(), var(), min(), max(), sum(), IQR(), …
Also provides favstats() to compute our favorites.
favstats( ~ age, data=HELPrct )
##  min Q1 median Q3 max mean   sd   n missing
##   19 30     35 40  60 35.7 7.71 453       0
tally( ~ sex, data=HELPrct)
## 
## female   male 
##    107    346
tally( ~ substance, data=HELPrct)
## 
## alcohol cocaine  heroin 
##     177     152     124
Three ways to think about this. All do the same thing.
sd(   age ~ substance, data=HELPrct )
sd( ~ age | substance, data=HELPrct )
sd( ~ age, groups=substance, data=HELPrct )
## alcohol cocaine  heroin 
##    7.65    6.69    7.99
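As a rough cross-check of what a grouped summary computes, the sketch below does the same kind of calculation in base R. It uses the built-in iris data as a stand-in (HELPrct needs the mosaic data sets), and it is not how mosaic implements sd(); it only illustrates "summary of y within each level of x".

```r
# Standard deviation of a numeric variable within each level of a factor,
# computed two ways in base R. iris is a stand-in, not data from the talk.
by_tapply <- tapply(iris$Sepal.Length, iris$Species, sd)

# aggregate() also accepts a y ~ x formula together with data=
by_formula <- aggregate(Sepal.Length ~ Species, data = iris, FUN = sd)

by_tapply
by_formula
```

Both calls return one standard deviation per species; the formula version reads much like the sd( age ~ substance, ... ) call above.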
tally( sex ~ substance, data=HELPrct )
##         substance
## sex      alcohol cocaine heroin
##   female      36      41     30
##   male       141     111     94
tally( ~ sex + substance, data=HELPrct )
##         substance
## sex      alcohol cocaine heroin
##   female      36      41     30
##   male       141     111     94
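The one-sided form ~ sex + substance mirrors base R's xtabs(), which also builds a two-way table of counts from a formula. A small sketch on the built-in mtcars data (a stand-in, not data from the talk):

```r
# Two-way table of counts from a one-sided formula (base R)
counts <- xtabs(~ cyl + gear, data = mtcars)
counts

# The same counts without a formula, for comparison
counts2 <- table(cyl = mtcars$cyl, gear = mtcars$gear)
all(counts == counts2)
```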
mean( age ~ substance | sex, data=HELPrct )
##  A.F  C.F  H.F  A.M  C.M  H.M    F    M 
## 39.2 34.9 34.7 38.0 34.4 33.1 36.3 35.5
mean( age ~ substance | sex, data=HELPrct, .format="table" )
##   substance sex mean
## 1         A   F 39.2
## 2         A   M   38
## 3         C   F 34.9
## 4         C   M 34.4
## 5         H   F 34.7
## 6         H   M 33.1
The same syntax works for median(), min(), max(), sd(), var(), favstats(), etc.
  mean( age ~ sex, data=HELPrct )
bwplot( age ~ sex, data=HELPrct )
    lm( age ~ sex, data=HELPrct )
## female   male 
##   36.3   35.5
## (Intercept)     sexmale 
##      36.252      -0.784
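The two outputs above are related. With a two-level factor as the only predictor (and R's default contrasts), lm() reports the first group's mean as the intercept and the difference in group means as the other coefficient: here 36.252 for females and, up to rounding, 35.5 - 36.3 for sexmale. The base-R sketch below checks the same relationship on the built-in ToothGrowth data, which is a stand-in and not a data set from the talk.

```r
# Group means of len for each level of supp ("OJ", "VC")
means <- tapply(ToothGrowth$len, ToothGrowth$supp, mean)

# The same formula, handed to lm()
fit <- lm(len ~ supp, data = ToothGrowth)
b <- coef(fit)

means
b

# Intercept = mean of the first level; suppVC = mean(VC) - mean(OJ)
all.equal(as.numeric(b["(Intercept)"]), as.numeric(means["OJ"]))
all.equal(as.numeric(b["suppVC"]), as.numeric(means["VC"] - means["OJ"]))
```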
Answer each question with both a numerical summary and a graphical summary.
Are boys' feet larger than girls' feet? (KidsFeet)
Do boys and girls have differently shaped feet? (KidsFeet)
The mosaic package includes some other things, too
t.test()
xchisq.test(), xpnorm(), xqqmath()
mplot()
mplot(HELPrct) interactive plot creation
plot() in some situations
dotPlot(), ashplot()
histogram() controls (e.g., width)
xpnorm( 700, mean=500, sd=100)
## 
## If X ~ N(500,100), then 
## 
##  P(X <= 700) = P(Z <= 2) = 0.9772
##  P(X >  700) = P(Z >  2) = 0.0228
## [1] 0.977
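The probabilities reported by xpnorm() can be reproduced with base R's pnorm(), which makes the standardization step explicit. A small sketch (variable names are illustrative):

```r
x <- 700; mu <- 500; sigma <- 100

z <- (x - mu) / sigma                        # standardize: z = 2
p_below <- pnorm(x, mean = mu, sd = sigma)   # P(X <= 700)
p_above <- 1 - p_below                       # P(X >  700)

z
round(p_below, 4)   # 0.9772
round(p_above, 4)   # 0.0228
```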
xpnorm( c(300, 700), mean=500, sd=100)
# text output suppressed
xpbinom( 40, size = 100, prob = 0.5)
## [1] 0.0284
xchisq.test(phs)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  x
## X-squared = 20, df = 1, p-value = 8e-07
## 
##    104.00   10933.00 
## (  146.52) (10890.48)
## [12.05]  [ 0.16] 
## <-3.51>  < 0.41> 
##    
##    189.00   10845.00 
## (  146.48) (10887.52)
## [12.05]  [ 0.16] 
## < 3.51>  <-0.41> 
##    
## key:
##  observed
##  (expected)
##  [contribution to X-squared]
##  <residual>
Modeling is really the starting point for the mosaic design.
lm() and glm() defined the template
model <- lm(width ~ length * sex,
            data=KidsFeet)
Width <- makeFun(model)
Width( length=25, sex="B")
##    1 
## 9.17
Width( length=25, sex="G")
##    1 
## 8.94
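One way to picture what makeFun() provides is a thin wrapper around predict(). The sketch below is a base-R approximation on the built-in ToothGrowth data (a stand-in), not mosaic's implementation; predict_len is a hypothetical helper name.

```r
model <- lm(len ~ dose * supp, data = ToothGrowth)

# Hypothetical helper: turn the fitted model into a function of its inputs.
predict_len <- function(dose, supp) {
  newdata <- data.frame(
    dose = dose,
    supp = factor(supp, levels = levels(ToothGrowth$supp))
  )
  as.numeric(predict(model, newdata = newdata))
}

predict_len(dose = 1, supp = "OJ")
predict_len(dose = 1, supp = "VC")
```

The convenience of makeFun() is that it builds such a function automatically from the model, so students can evaluate or plot the fit by name.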
xyplot( width ~ length, data=KidsFeet, 
        groups=sex, auto.key=TRUE )
plotFun( Width(length, sex="B") ~ length, 
         col=1, add=TRUE)
plotFun( Width(length, sex="G") ~ length, 
         col=2, add=TRUE)
plotModel() is still experimental, but works in many simple situations.
model <- lm(width ~ length * sex, data=KidsFeet)
plotModel(model)