Less Volume, More Creativity

R Pruim

JMM 2016

A note about these slides

The support for documentation creation in RStudio is great.

Getting Oriented to RStudio

Access to RStudio Server

Panes and Tabs

The Console – ephemeral, interactive commands

R is case sensitive

Arrows and Tab

If all else fails, try ESC

Calculation and Assignment

product <- 5 * 3 * 27
product
## [1] 405
sqrt(100)
## [1] 10
log10(product)
## [1] 2.607455
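
Because R is case sensitive, a name that differs only in capitalization refers to a different object. A minimal sketch, assuming a fresh session in which no object called `Product` has been defined:

```r
product <- 5 * 3 * 27

# `Product` (capital P) is a different name from `product`.
# In a session where it was never assigned, looking it up is an error;
# tryCatch() captures the error message instead of stopping.
msg <- tryCatch(Product, error = function(e) conditionMessage(e))
msg
```

The captured message should report that the object `Product` was not found, while `product` still holds 405.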

Environment Tab

RStudio Environment Tab

History Tab

RStudio History Tab

Packages Tab

RStudio Packages Tab

Help!

RStudio provides several ways to get help

RStudio Help Tab

Less Volume, More Creativity

Mike McCarthy

Head coach, Green Bay Packers (NFL Football)

  • Packers subscribe to “draft and develop”
  • Among the youngest teams in the league every year
  • Coaching staff constantly teaching young players

Joe from Fitchburg, WI:

Do you have a favorite Mike McCarthy quote? Mine is “statistics are for losers”.

Vic Ketchman (packers.com):

“Less volume, more creativity.”

Source: Ask Vic @ packers.com

More Mike McCarthy Quotes

You’ve got to watch that you don’t do too much. We have a philosophy on our coaching staff about less volume, more creativity.

A lot of times you end up putting in a lot more volume, because you are teaching fundamentals and you are teaching concepts that you need to put in, but you may not necessarily use because they are building blocks for other concepts and variations that will come off of that … In the offseason you have a chance to take a step back and tailor it more specifically towards your team and towards your players.

More Mike McCarthy Quotes

Q. (for McCarthy) How many offensive and defensive plays might you have coming into a game on average?

A. That’s an excellent question because years ago when I first got into the NFL we had 150 passes in our game plan. I’ve put a sign on all of the coordinators’ doors - less volume, more creativity. We function with more concepts with less volume. We’re more around 50 (passes) into a game plan.

Source: http://www.jsonline.com/packerinsider/106968233.html (Nov 10, 2010)

The Minimal R Exercise

List every R command used throughout a course

Organize by syntactic similarity and by purpose

Scratch everything you could have done without

Replace dissimilar tools with more similar tools

Aim for a set of commands that is

Result: Minimal R for Intro Stats

Less Volume, More Creativity

It is not enough to use R; it must be used efficiently and elegantly.

The mosaic package attempts to be part of one solution.

Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.

— Antoine de Saint-Exupery (writer, poet, pioneering aviator)

Make sure the mosaic package is loaded

require(mosaic)  


Note: R purists might prefer

library(mosaic)

but require() is easier for novices to remember.
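
One practical difference between the two, for reference: `library()` stops with an error when a package is not installed, while `require()` only warns and returns `FALSE`. A hedged sketch of a defensive load using base R's `requireNamespace()`; the variable name and the message wording are illustrative only:

```r
# Check whether mosaic is installed without attaching it.
mosaic_available <- requireNamespace("mosaic", quietly = TRUE)

if (mosaic_available) {
  library(mosaic)   # attach the package for interactive use
} else {
  message("mosaic is not installed; try install.packages(\"mosaic\")")
}
```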

The Most Important Template

 

goal ( yyy ~ xxx , data = mydata )

 

The Most Important Template

 

goal (  y  ~  x  , data = mydata , …)

 

Other versions:

# simpler version
goal( ~ x, data = mydata )          
# fancier version
goal( y ~ x | z , data = mydata )   
# unified version
goal( formula , data = mydata )     
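
The unified version works because a formula is an ordinary R object that can be stored and inspected before it is handed to any function. A small base R sketch; the names `x`, `y`, and `z` are placeholders and are never evaluated here:

```r
f1 <- ~ x           # one-sided: a single variable
f2 <- y ~ x         # two-sided: y "depends on" x
f3 <- y ~ x | z     # two-sided with a conditioning variable

class(f3)           # "formula"
all.vars(f3)        # "y" "x" "z"
length(f1)          # 2 (the ~ and its right-hand side)
length(f2)          # 3 (the ~, the left side, the right side)
```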

2 Questions

 

goal (  y  ~  x  , data = mydata )

 

What do you want R to do? (goal)

 

What must R know to do that?

How do we make this plot?

What is the Goal?

What does R need to know?

How do we make this plot?

 

goal (  y  ~  x  , data = mydata )

 

xyplot( births ~ dayofyear, data=Births78) 

Your turn: How do you make this plot?

Two Questions?

Your turn: How do you make this plot?

  1. Command: bwplot()

  2. The data: HELPrct

Your turn: How do you make this plot?

bwplot( age ~ substance, data=HELPrct)

Your turn: How about this one?

  1. Command: bwplot()

  2. The data: HELPrct

Your turn: How about this one?

bwplot( substance ~ age, data=HELPrct )

Graphical Summaries: One Variable

histogram( ~ age, data=HELPrct) 

Note: When there is only one variable, it goes on the right side of the formula.

Graphical Summaries: Overview

One Variable

  histogram( ~age, data=HELPrct ) 
densityplot( ~age, data=HELPrct ) 
     bwplot( ~age, data=HELPrct ) 
     qqmath( ~age, data=HELPrct ) 
freqpolygon( ~age, data=HELPrct ) 
   bargraph( ~sex, data=HELPrct )

Two Variables

xyplot(  i1 ~ age,       data=HELPrct ) 
bwplot( age ~ substance, data=HELPrct ) 
bwplot( substance ~ age, data=HELPrct ) 

The Graphics Template

One variable

plotname ( ~  x  , data = mydata , …)

 

Two Variables

plotname (  y  ~  x  , data = mydata , …)

Your turn

Create a plot of your own choosing with one of these data sets

names(KidsFeet)    # 4th graders' feet
?KidsFeet
names(Utilities)   # utility bill data
?Utilities
names(NHANES)      # body shape, etc.
?NHANES
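
One possible starting point, using only `KidsFeet` variables that appear elsewhere in these slides (`length`, `width`, and `sex`). This sketch assumes the lattice and mosaicData packages are installed; it loads them directly so that it does not depend on what `require(mosaic)` attaches:

```r
library(lattice)                         # histogram(), xyplot(), bwplot()
data(KidsFeet, package = "mosaicData")   # also available after require(mosaic)

# one variable: distribution of foot length
p1 <- histogram( ~ length, data = KidsFeet )

# two variables: foot width against foot length
p2 <- xyplot( width ~ length, data = KidsFeet )

# two variables: foot length split by sex
p3 <- bwplot( length ~ sex, data = KidsFeet )

print(p1); print(p2); print(p3)
```

Each call follows the graphics template: only the plot name and the formula change.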

groups and panels

densityplot( ~ age | sex, data=HELPrct,  
               groups=substance,  auto.key=TRUE)   

Bells & Whistles

My approach:

Bells and Whistles

xyplot( births ~ dayofyear, data=Births78,  
  groups=dayofyear %% 7, type='l',
  auto.key=list(columns=4, lines=TRUE, points=FALSE),
  par.settings=list(
    superpose.line=list( lty=1 ) ))
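
The `groups=dayofyear %% 7` argument works because the remainder after division by 7 repeats every seven days, so all days that fall on the same weekday share a group. A small base R illustration on made-up day numbers:

```r
days <- 1:14          # two weeks of hypothetical day-of-year values
days %% 7
## [1] 1 2 3 4 5 6 0 1 2 3 4 5 6 0

# days 1 and 8 are one week apart, so they land in the same group
(1 %% 7) == (8 %% 7)
## [1] TRUE
```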

Numerical Summaries: One Variable

Big idea: Replace plot name with summary name

histogram( ~ age, data=HELPrct )
     mean( ~ age, data=HELPrct )
## [1] 35.7

Other Summaries

The mosaic package includes formula-aware versions of mean(), sd(), var(), min(), max(), sum(), IQR(), …

Also provides favstats() to compute our favorites.

favstats( ~ age, data=HELPrct )
##  min Q1 median Q3 max mean   sd   n missing
##   19 30     35 40  60 35.7 7.71 453       0

Tallying

tally( ~ sex, data=HELPrct)
## 
## female   male 
##    107    346
tally( ~ substance, data=HELPrct)
## 
## alcohol cocaine  heroin 
##     177     152     124

Numerical Summaries: Two Variables

Three ways to think about this. All do the same thing.

sd(   age ~ substance, data=HELPrct )
sd( ~ age | substance, data=HELPrct )
sd( ~ age, groups=substance, data=HELPrct )
## alcohol cocaine  heroin 
##    7.65    6.69    7.99
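
For comparison, roughly the same grouped summary can be written in base R with `tapply()`. This is a sketch, not part of the mosaic interface, and it assumes the mosaicData package (which supplies `HELPrct`) is installed:

```r
data(HELPrct, package = "mosaicData")

# standard deviation of age within each substance group
sd_by_substance <- tapply(HELPrct$age, HELPrct$substance, sd)
round(sd_by_substance, 2)
```

The formula versions above express the same computation without `$` or an explicit grouping function, which is the point of the template.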

Numerical Summaries: Tables

tally( sex ~ substance, data=HELPrct )
##         substance
## sex      alcohol cocaine heroin
##   female      36      41     30
##   male       141     111     94
tally( ~ sex + substance, data=HELPrct )
##         substance
## sex      alcohol cocaine heroin
##   female      36      41     30
##   male       141     111     94
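
Counts like these are often easier to compare as proportions within each column. A base R sketch that re-enters the counts shown above by hand (so it runs without any data set) and converts them with `prop.table()`; the object names are illustrative:

```r
# counts copied from the tally() output above, filled column by column
counts <- matrix(c(36, 141, 41, 111, 30, 94), nrow = 2,
                 dimnames = list(sex = c("female", "male"),
                                 substance = c("alcohol", "cocaine", "heroin")))

# proportion of each sex within each substance group
col_props <- prop.table(counts, margin = 2)
round(col_props, 3)

# each column of proportions sums to 1
colSums(col_props)
```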

Numerical Summaries

mean( age ~ substance | sex, data=HELPrct )
##  A.F  C.F  H.F  A.M  C.M  H.M    F    M 
## 39.2 34.9 34.7 38.0 34.4 33.1 36.3 35.5
mean( age ~ substance | sex, data=HELPrct, .format="table" )
##   substance sex mean
## 1         A   F 39.2
## 2         A   M   38
## 3         C   F 34.9
## 4         C   M 34.4
## 5         H   F 34.7
## 6         H   M 33.1

One Template to Rule a Lot

  mean( age ~ sex, data=HELPrct )
bwplot( age ~ sex, data=HELPrct ) 
    lm( age ~ sex, data=HELPrct )
## female   male 
##   36.3   35.5
## (Intercept)     sexmale 
##      36.252      -0.784
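
The `lm()` coefficients connect directly to the group means: with `sex` as the only predictor, the intercept is the mean age of the reference group (`female`), and the `sexmale` coefficient is the male mean minus the female mean (36.252 - 0.784 ≈ 35.5). A base R check of that relationship, assuming the mosaicData package is installed:

```r
data(HELPrct, package = "mosaicData")

group_means <- tapply(HELPrct$age, HELPrct$sex, mean)
fit <- lm(age ~ sex, data = HELPrct)

# intercept + sexmale coefficient reproduces the male group mean
sum(coef(fit))
group_means["male"]
```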

Exercises

Answer each question with both a numerical summary and a graphical summary.

  1. Are boys' feet larger than girls' feet? (KidsFeet)

  2. Do boys and girls have differently shaped feet? (KidsFeet)

Some Other Things in mosaic

Some other things

The mosaic package includes some other things, too

xpnorm()

xpnorm( 700, mean=500, sd=100)
## 
## If X ~ N(500,100), then 
## 
##  P(X <= 700) = P(Z <= 2) = 0.9772
##  P(X >  700) = P(Z >  2) = 0.0228

## [1] 0.977

xpnorm()

xpnorm( c(300, 700), mean=500, sd=100)

# text output suppressed

Other distributions, too

xpbinom( 40, size = 100, prob = 0.5)

## [1] 0.0284
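
The returned values are the same cumulative probabilities that base R's `pnorm()` and `pbinom()` compute; the `x` versions add the annotated text and plot. A short base R sketch for comparison:

```r
# P(X <= 700) for X ~ N(500, 100)
pnorm(700, mean = 500, sd = 100)
## [1] 0.9772499

# P(300 <= X <= 700): difference of the two cumulative probabilities
diff(pnorm(c(300, 700), mean = 500, sd = 100))
## [1] 0.9544997

# P(X <= 40) for X ~ Binom(100, 0.5); agrees with the value shown above after rounding
pbinom(40, size = 100, prob = 0.5)
```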

xchisq.test()

phs <- cbind(c(104, 189), c(10933, 10845))  # observed counts, as shown in the output below
xchisq.test(phs)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  x
## X-squared = 20, df = 1, p-value = 8e-07
## 
##    104.00   10933.00 
## (  146.52) (10890.48)
## [12.05]  [ 0.16] 
## <-3.51>  < 0.41> 
##    
##    189.00   10845.00 
## (  146.48) (10887.52)
## [12.05]  [ 0.16] 
## < 3.51>  <-0.41> 
##    
## key:
##  observed
##  (expected)
##  [contribution to X-squared]
##  <residual>
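
The pieces in the key can also be pulled out of base R's `chisq.test()` result one at a time. A sketch that rebuilds the 2x2 table from the observed counts printed above; the object name `result` is illustrative:

```r
# observed counts taken from the output above
phs <- cbind(c(104, 189), c(10933, 10845))

result <- chisq.test(phs)   # Yates' correction is applied by default for 2x2 tables
result$expected             # expected counts under independence
result$residuals            # Pearson residuals: (observed - expected) / sqrt(expected)
```

The expected counts and residuals should match the parenthesized and angle-bracketed values in the annotated output.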

Modeling

Modeling is really the starting point for the mosaic design.

Models as Functions

model <- lm(width ~ length * sex, 
            data=KidsFeet)
Width <- makeFun(model)
Width( length=25, sex="B")
##    1 
## 9.17
Width( length=25, sex="G")
##    1 
## 8.94
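
For comparison, the same predictions can be obtained in base R with `predict()` and a small data frame of new values. This is a sketch of the base R route, not of how `makeFun()` is implemented; it assumes the mosaicData package is installed, and `new_feet` and `pred` are illustrative names:

```r
data(KidsFeet, package = "mosaicData")

model <- lm(width ~ length * sex, data = KidsFeet)

# predicted width for a boy and a girl, each with foot length 25
new_feet <- data.frame(length = c(25, 25), sex = c("B", "G"))
pred <- predict(model, newdata = new_feet)
pred
```

Wrapping the model as a function with `makeFun()` hides the data frame bookkeeping, which keeps the call close to ordinary function notation.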

Models as Functions – Plotting

xyplot( width ~ length, data=KidsFeet, 
        groups=sex, auto.key=TRUE )
plotFun( Width(length, sex="B") ~ length, 
         col=1, add=TRUE)
plotFun( Width(length, sex="G") ~ length, 
         col=2, add=TRUE)

Models as Functions – Auto Plotting

This is still experimental, but works in many simple situations.

model <- lm(width ~ length * sex, data=KidsFeet)
plotModel(model)