The mosaic package is installed from CRAN (or GitHub):
# install from CRAN
install.packages("mosaic")
You can use a temporary RStudio server account.
Be sure to attach the mosaic package:
require(mosaic)
Macalester | Amherst | Calvin
Aim for an R toolkit that is
Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away. - Antoine de Saint-Exupéry (writer, poet, pioneering aviator)
The mosaic package facilitates a Less Volume, More Creativity approach.
lattice and other core R packages
do()
List every R command used throughout a course
Organize by syntactic similarity and by purpose
Scratch everything you could have done without
Replace dissimilar tools with more similar tools
Aim for a set of commands that is
Result: Minimal R for Intro Stats
# simpler version
goal( ~ x, data = mydata )
# fancier version
goal( y ~ x | z , data = mydata )
# unified version
goal( formula , data = mydata )
Command: xyplot()
Formula: births ~ date
Data: data = Births78
Use ?Births78 for documentation.
xyplot(births ~ date, data = Births78)
Command: bwplot()
Data: HELPrct
Variables: age, substance
Use ?HELPrct for info about the data.
bwplot(age ~ substance, data = HELPrct)
bwplot(substance ~ age, data = HELPrct)
histogram( ~ age, data = HELPrct)
Note: When there is one variable it is on the right side of the formula.
histogram( ~age, data = HELPrct )
densityplot( ~age, data = HELPrct )
bwplot( ~age, data = HELPrct )
qqmath( ~age, data = HELPrct )
freqpolygon( ~age, data = HELPrct )
bargraph( ~sex, data = HELPrct )
xyplot( i1 ~ age, data = HELPrct )
bwplot( age ~ substance, data = HELPrct )
bwplot( substance ~ age, data = HELPrct )
One variable: histogram(), qqmath(), densityplot(), freqpolygon(), bargraph()
Two variables: xyplot(), bwplot()
Use groups = group to overlay.
Use y ~ x | z to create multipanel plots.
densityplot( ~ age | sex, data = HELPrct,
             groups = substance, auto.key = TRUE)
My approach:
mplot()
xyplot(births ~ date, data = Births78, groups = wday,
type = 'l',
auto.key = list(columns = 4, lines = TRUE, points = FALSE),
par.settings = list(superpose.line = list( lty = 1 ) ))
Big idea: Replace plot name with summary name
histogram( ~ age, data = HELPrct )
mean( ~ age, data = HELPrct )
## [1] 35.65
The mosaic package includes formula-aware versions of mean(), sd(), var(), min(), max(), sum(), IQR(), …
It also provides favstats() to compute our favorites.
favstats( ~ age, data = HELPrct )
## min Q1 median Q3 max mean sd n missing
## 19 30 35 40 60 35.65 7.71 453 0
tally( ~ sex, data = HELPrct)
##
## female male
## 107 346
tally( ~ substance, data = HELPrct)
##
## alcohol cocaine heroin
## 177 152 124
Three ways to create a plot with two variables. All three can be used to get corresponding numerical summaries.
bwplot(age ~ substance, data = HELPrct)
sd(age ~ substance, data = HELPrct)
histogram( ~ age | substance, data = HELPrct)
sd( ~ age | substance, data = HELPrct)
densityplot( ~ age , groups = substance, data = HELPrct)
sd( ~ age , groups = substance, data = HELPrct)
## alcohol cocaine heroin
## 7.652 6.693 7.986
tally( sex ~ substance, data = HELPrct )
## substance
## sex alcohol cocaine heroin
## female 36 41 30
## male 141 111 94
tally( sex ~ substance, data = HELPrct,
format = "prop", margins = TRUE )
## substance
## sex alcohol cocaine heroin
## female 0.2034 0.2697 0.2419
## male 0.7966 0.7303 0.7581
## Total 1.0000 1.0000 1.0000
mean( age ~ homeless | sex, data = HELPrct )
## Yes.F No.F Yes.M No.M F M
## 35.95 36.43 36.47 34.51 36.25 35.47
mean( age ~ homeless | sex, data = HELPrct, .format = "table" )
## homeless sex mean
## 1 Yes F 35.95
## 2 Yes M 36.47
## 3 No F 36.43
## 4 No M 34.51
median(), sd(), favstats(), …
bwplot(age ~ sex, data = HELPrct)
mean(age ~ sex, data = HELPrct)
sd(age ~ sex, data = HELPrct)
favstats(age ~ sex, data = HELPrct)
t.test(age ~ sex, data = HELPrct)
lm(age ~ sex, data = HELPrct)
The mosaic package includes some other things, too:
do()
xchisq.test(), xpnorm(), xqqmath()
mplot()
mplot(HELPrct): interactive plot creation
plot() in some situations
histogram() controls (e.g., width)
xpnorm(700, mean = 500, sd = 100)
##
## If X ~ N(500, 100), then
##
## P(X <= 700) = P(Z <= 2) = 0.9772
## P(X > 700) = P(Z > 2) = 0.02275
## [1] 0.9772
xpnorm( c(300, 700), mean = 500, sd = 100)
##
## If X ~ N(500, 100), then
##
## P(X <= 300) = P(Z <= -2) = 0.02275
## P(X <= 700) = P(Z <= 2) = 0.97725
## P(X > 300) = P(Z > -2) = 0.97725
## P(X > 700) = P(Z > 2) = 0.02275
## [1] 0.02275 0.97725
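# observed counts, matching the table printed in the output below
phs <- cbind(c(104, 189), c(10933, 10845))
xchisq.test(phs)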
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: x
## X-squared = 24, df = 1, p-value = 8e-07
##
## 104.00 10933.00
## ( 146.52) (10890.48)
## [12.05] [ 0.16]
## <-3.51> < 0.41>
##
## 189.00 10845.00
## ( 146.48) (10887.52)
## [12.05] [ 0.16]
## < 3.51> <-0.41>
##
## key:
## observed
## (expected)
## [contribution to X-squared]
## <Pearson residual>
Modeling is really the starting point for the mosaic design.
lm() and glm() defined the template.
model <- lm(width ~ length * sex, data = KidsFeet)
Width <- makeFun(model)
Width(length = 25, sex = "B")
## 1
## 9.168
Width(length = 25, sex = "G")
## 1
## 8.939
xyplot( width ~ length, data = KidsFeet,
groups = sex, auto.key = TRUE )
plotFun( Width(length, sex = "B") ~ length,
col = 1, add = TRUE )
plotFun( Width(length, sex = "G") ~ length,
col = 2, add = TRUE )
plotModel(model)
do(r) * mean(~ age, data = resample(HELPrct))
do(r) * prop(~ homeless, data = resample(HELPrct))
do(r) * diffmean(age ~ sex,
data = resample(HELPrct))
do(r) * diffmean(age ~ sex,
data = resample(HELPrct, groups = sex))
do(r) * diffprop(homeless ~ sex,
data = resample(HELPrct))
do(r) * diffprop(homeless ~ sex,
data = resample(HELPrct, groups = sex))
# residual resampling
Kids.lm <- lm(length ~ width * sex, data = KidsFeet)
do(r) * lm(length ~ width * sex, data = resample(Kids.lm))
Often used on first day of class
Story: A woman claims she can tell whether milk has been poured into tea or vice versa.
Question: How do we test this claim?
Use rflip() to simulate flipping coins:
rflip()
##
## Flipping 1 coin [ Prob(Heads) = 0.5 ] ...
##
## H
##
## Number of Heads: 1 [Proportion Heads: 1]
Faster if we flip multiple coins at once:
rflip(10)
##
## Flipping 10 coins [ Prob(Heads) = 0.5 ] ...
##
## H H H T T T H H H T
##
## Number of Heads: 6 [Proportion Heads: 0.6]
It is easier to treat heads = correct and tails = incorrect than to compare with a given pattern.
rflip(10) simulates 1 guessing lady tasting 10 cups.
We can do that many times to see how guessing ladies do:
do(2) * rflip(10)
## n heads tails prop
## 1 10 6 4 0.6
## 2 10 5 5 0.5
do() is clever about what it remembers:
Ladies <- do(5000) * rflip(10)
head(Ladies, 1)
## n heads tails prop
## 1 10 4 6 0.4
histogram( ~ heads, data = Ladies, width = 1 )
tally( ~(heads >= 9) , data = Ladies)
##
## TRUE FALSE
## 52 4948
tally( ~(heads >= 9) , data = Ladies, format = "prop")
##
## TRUE FALSE
## 0.0104 0.9896
prop( ~(heads >= 9), data = Ladies)
## TRUE
## 0.0104
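The simulated proportion can be checked against an exact calculation. This is a minimal sketch in base R, assuming each of the 10 cups is an independent guess that is correct with probability 0.5, which is the model rflip(10) simulates. The names p_exact and p_count are illustrative only.

```r
# Exact probability that a guessing lady gets 9 or 10 of 10 cups right.
# Assumption: guesses are independent, each correct with probability 0.5.
p_exact <- pbinom(8, size = 10, prob = 0.5, lower.tail = FALSE)  # P(X >= 9)
p_exact

# Same value by direct counting:
# choose(10, 9) + choose(10, 10) favorable outcomes out of 2^10.
p_count <- (choose(10, 9) + choose(10, 10)) / 2^10
p_count
```

Both expressions equal 11/1024, about 0.0107, which is close to the simulated 0.0104.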
shuffle() or resample()
diffmean(age ~ sex, data = HELPrct)
## diffmean
## -0.7841
do(2) * diffmean(age ~ shuffle(sex), data = HELPrct)
## diffmean
## 1 0.1091
## 2 0.3171
Null <-
do(5000) * diffmean(age ~ shuffle(sex), data = HELPrct)
prop( ~(abs(diffmean) > 0.7841), data = Null )
## TRUE
## 0.3616
histogram(~ diffmean, data = Null, v = -.7841)
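For comparison, the same shuffling idea can be written out in base R. This is only a sketch of the logic, not how mosaic implements do() or shuffle(). It assumes the built-in ToothGrowth data as a stand-in for HELPrct (len in the role of age, supp in the role of sex), and diff_mean is a hypothetical helper.

```r
# Permutation test for a difference in group means, written in base R.
# Assumption: ToothGrowth (a built-in dataset) stands in for HELPrct.
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical helper: mean of the second group minus mean of the first.
diff_mean <- function(y, g) {
  m <- tapply(y, g, mean)   # group means, in factor-level order
  unname(m[2] - m[1])
}

observed <- diff_mean(ToothGrowth$len, ToothGrowth$supp)

# Shuffle the group labels many times, recomputing the statistic each time.
null_stats <- replicate(
  2000,
  diff_mean(ToothGrowth$len, sample(ToothGrowth$supp))
)

# Two-sided p-value: share of shuffles at least as extreme as the observed value.
p_value <- mean(abs(null_stats) >= abs(observed))
c(observed = observed, p_value = p_value)
```

The one-line mosaic version hides the helper function and the replicate() bookkeeping, which is the point of the Less Volume approach.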
Bootstrap <- do(5000) * diffmean(age~sex, data= resample(HELPrct))
sd( ~diffmean, data = Bootstrap)
## [1] 0.8463
histogram( ~diffmean, data = Bootstrap, v = -.7841, glwd = 4 )
cdata(~diffmean, data = Bootstrap, p = .95)
## low hi central.p
## -2.4787 0.8895 0.9500
confint(Bootstrap, method = c("quantile","se"))
## name lower upper level method estimate margin.of.error df
## 1 diffmean -2.479 0.8895 0.95 percentile -0.7841 NA NA
## 2 diffmean -2.433 0.8931 0.95 stderr -0.7841 1.663 452
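The bootstrap standard error and percentile interval above can also be sketched by hand in base R. This is not mosaic's implementation. It assumes the built-in ToothGrowth data as a stand-in for HELPrct, and diff_mean is a hypothetical helper.

```r
# Bootstrap of a difference in means in base R.
# Assumption: ToothGrowth (a built-in dataset) stands in for HELPrct.
set.seed(2)  # arbitrary seed, only for reproducibility

# Hypothetical helper: mean of the second group minus mean of the first.
diff_mean <- function(d) {
  m <- tapply(d$len, d$supp, mean)
  unname(m[2] - m[1])
}

boot_stats <- replicate(2000, {
  rows <- sample(nrow(ToothGrowth), replace = TRUE)  # resample whole rows
  diff_mean(ToothGrowth[rows, ])
})

se_boot <- sd(boot_stats)                         # bootstrap standard error
ci_pct  <- quantile(boot_stats, c(0.025, 0.975))  # 95% percentile interval
se_boot
ci_pct
```

Here sd() of the bootstrap statistics plays the role of sd( ~diffmean, data = Bootstrap), and quantile() plays the role of the percentile interval reported by cdata() and confint().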
do(1) * lm(width ~ length, data = KidsFeet)
## Source: local data frame [1 x 9]
##
## Intercept length sigma r.squared F numdf dendf .row .index
## (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (int) (dbl)
## 1 2.862 0.2479 0.3963 0.411 25.82 1 37 1 1
do(3) * lm( width ~ shuffle(length), data = KidsFeet)
## Source: local data frame [3 x 9]
##
## Intercept length sigma r.squared F numdf dendf .row .index
## (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (int) (dbl)
## 1 9.136 -0.005807 0.5164 0.0002254 0.008343 1 37 1 1
## 2 9.762 -0.031122 0.5147 0.0064752 0.241144 1 37 1 2
## 3 11.639 -0.107066 0.4962 0.0766357 3.070856 1 37 1 3
do(1) * lm(width ~ length + sex, data = KidsFeet)
## Source: local data frame [1 x 10]
##
## Intercept length sexG sigma r.squared F numdf dendf .row .index
## (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (int) (dbl)
## 1 3.641 0.221 -0.2325 0.3849 0.4595 15.31 2 36 1 1
do(3) * lm( width ~ length + shuffle(sex), data = KidsFeet)
## Source: local data frame [3 x 10]
##
## Intercept length sexG sigma r.squared F numdf dendf .row .index
## (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (int) (dbl)
## 1 2.524 0.2546 0.3553 0.3569 0.5353 20.74 2 36 1 1
## 2 2.835 0.2458 0.1677 0.3922 0.4388 14.07 2 36 1 2
## 3 3.078 0.2432 -0.2032 0.3877 0.4516 14.82 2 36 1 3
Null <-
do(5000) * lm( width ~ length + shuffle(sex), data = KidsFeet)
histogram( ~ sexG, data = Null, v = -0.2325, glwd = 4)
prop(~ (sexG <= -0.2325), data = Null)
## TRUE
## 0.037
Kids.lm <- lm(length ~ width * sex, data = KidsFeet)
head(resample(Kids.lm), 2)
## length width sex
## 1 24.14 8.4 B
## 2 24.32 8.8 B
do(2) * lm(length ~ width * sex, data = resample(Kids.lm))
## Source: local data frame [2 x 11]
##
## Intercept width sexG width.sexG sigma r.squared F numdf dendf .row .index
## (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (int) (dbl)
## 1 11.825 1.485 -6.949 0.7384 0.8894 0.5987 17.4 3 35 1 1
## 2 4.018 2.336 11.269 -1.3057 0.8069 0.6292 19.8 3 35 1 2
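One common way to do residual resampling is to keep the predictors fixed and rebuild the response as fitted value plus a resampled residual. The sketch below shows that idea in base R. It is an assumption about the general technique, not a description of mosaic's internals. It uses the built-in cars data as a stand-in for KidsFeet, and resample_residuals is a hypothetical helper.

```r
# Residual resampling by hand.
# Assumption: the built-in cars data (dist ~ speed) stands in for KidsFeet.
set.seed(3)  # arbitrary seed, only for reproducibility
fit <- lm(dist ~ speed, data = cars)

# Hypothetical helper: predictors stay fixed; the response becomes the fitted
# value plus a residual drawn with replacement from the model's residuals.
resample_residuals <- function(model, data, response) {
  new_data <- data
  new_data[[response]] <- fitted(model) +
    sample(resid(model), size = nrow(data), replace = TRUE)
  new_data
}

# Refit on resampled data and collect coefficients, one row per replicate.
boot_coefs <- t(replicate(
  1000,
  coef(lm(dist ~ speed, data = resample_residuals(fit, cars, "dist")))
))
head(boot_coefs, 2)
apply(boot_coefs, 2, sd)  # bootstrap standard errors of the coefficients
```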
For a few common transformations, resample() knows how to invert them. The transformation
argument can be used to provide the transformation in other cases.
Kids.lm2 <- lm(log(length) ~ width * sex, data = KidsFeet)
head(resample(Kids.lm2), 2)
## length width sex
## 1 23.40 8.4 B
## 2 25.83 8.8 B
do(2) * lm(log(length) ~ width * sex, data = resample(Kids.lm2))
## Source: local data frame [2 x 11]
##
## Intercept width sexG width.sexG sigma r.squared F numdf dendf .row .index
## (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (int) (dbl)
## 1 2.763 0.04997 -0.2263 0.02484 0.03884 0.4433 9.29 3 35 1 1
## 2 2.355 0.09381 0.5952 -0.06717 0.03969 0.4643 10.11 3 35 1 2
Equipped with a modest subset of R, students can be quite creative:
goal(y ~ x, data = mydata)
xyplot(), bwplot(), histogram(), etc.
mean(), sd(), tally(), favstats(), etc.
lm(), glm()
t.test(), binom.test(), etc.
do()
Additional R skills can be added as needed later.
If R is the hardest part of your course, then your R is too hard and your questions are too easy.
Tue @ 2:15 - 5:00 Breakout Sessions: Teaching with R
Technology lowering barriers: get started with R at the snap of a finger with Mine Cetinkaya-Rundel and Nicholas Horton
Notebooks with R Markdown with JJ Allaire
Lowering the barriers to inclusive, collaborative, reproducible analyses with Chester Ismay and Andrew Bray
Wed @ 2:00 - 2:45 Panel: Teaching with R: Free and Extendable with JJ Allaire, Mine Cetinkaya-Rundel, Chester Ismay, and R Pruim
Fri @ 1:30 - 2:00 BOF: Teaching with R and RStudio with R Pruim and N Horton
R-Shiny Apps to Empower a Simulation-Based Curriculum, Allison Theobold & Jim Robison-Cox, Montana State University
Dynamic Data in the Classroom, Jo Hardin, Pomona College
Writing about Simulations in a Theoretical Statistics Course, Amy Wagaman, Amherst College
Interactive Math Stat Visualizations Using R Shiny, Justin Post, North Carolina State University