Introduction to R

Open source language and environment for analysis, statistics & visualization

Tutorial goals

  • Why is it worthwhile to learn R?
  • Get ready to work with R in the RStudio environment.
  • Learn the most basic concepts of R.
  • Learn about R resources.
  • Be able to work through exercises on your own or start your own project.
  • Together with our tutorials on GitHub, building EML with R, and the PASTA REST API, prepare to catalog your data in EDI.

Why is it worthwhile to learn R?

  • User-friendly data analysis and statistics.
  • Excellent tool for visualization.
  • Widely used in the research and data science communities.
  • Free, open source scripting language.
  • CRAN: the Comprehensive R Archive Network, the central repository for R packages.
  • Compiles and runs on a wide variety of computer platforms: Windows, macOS, Unix, Linux.
  • Extensive support network and tools.

Examples of R visualization


Use the data set “mpg” (provided with the ggplot2 package): fuel economy data from 1999 and 2008 for 38 popular models of car.

# load ggplot2, which provides the mpg data set and the plotting functions
library(ggplot2)
# list the structure of mpg
str(mpg)
Classes 'tbl_df', 'tbl' and 'data.frame':   234 obs. of  11 variables:
 $ manufacturer: chr  "audi" "audi" "audi" "audi" ...
 $ model       : chr  "a4" "a4" "a4" "a4" ...
 $ displ       : num  1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
 $ year        : int  1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
 $ cyl         : int  4 4 4 4 6 6 6 4 4 4 ...
 $ trans       : chr  "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
 $ drv         : chr  "f" "f" "f" "f" ...
 $ cty         : int  18 21 20 21 16 18 18 18 16 20 ...
 $ hwy         : int  29 29 31 30 26 26 27 26 25 28 ...
 $ fl          : chr  "p" "p" "p" "p" ...
 $ class       : chr  "compact" "compact" "compact" "compact" ...
# print mpg
mpg
# A tibble: 234 x 11
   manufacturer      model displ  year   cyl      trans   drv   cty   hwy
          <chr>      <chr> <dbl> <int> <int>      <chr> <chr> <int> <int>
 1         audi         a4   1.8  1999     4   auto(l5)     f    18    29
 2         audi         a4   1.8  1999     4 manual(m5)     f    21    29
 3         audi         a4   2.0  2008     4 manual(m6)     f    20    31
 4         audi         a4   2.0  2008     4   auto(av)     f    21    30
 5         audi         a4   2.8  1999     6   auto(l5)     f    16    26
 6         audi         a4   2.8  1999     6 manual(m5)     f    18    26
 7         audi         a4   3.1  2008     6   auto(av)     f    18    27
 8         audi a4 quattro   1.8  1999     4 manual(m5)     4    18    26
 9         audi a4 quattro   1.8  1999     4   auto(l5)     4    16    25
10         audi a4 quattro   2.0  2008     4 manual(m6)     4    20    28
# ... with 224 more rows, and 2 more variables: fl <chr>, class <chr>
# help on mpg
?mpg
  • manufacturer: manufacturer name
  • model: model name
  • displ: engine displacement, in litres
  • drv: f = front-wheel drive, r = rear wheel drive, 4 = 4wd
  • cty: city miles per gallon
  • hwy: highway miles per gallon
  • class: “type” of car
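
The variables listed above can be summarized before plotting. The following is a minimal sketch using base R functions, assuming ggplot2 is installed so that mpg is available. The object names drv_counts and hwy_by_class are made up for illustration and are not part of the original tutorial.

```r
library(ggplot2)  # provides the mpg data set

# Count how many cars fall into each drive type (f, r, 4)
drv_counts <- table(mpg$drv)
print(drv_counts)

# Average highway miles per gallon for each class of car
hwy_by_class <- aggregate(hwy ~ class, data = mpg, FUN = mean)
print(hwy_by_class)
```

Summaries like these give a first impression of the data that the plots then show in more detail.
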
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), size = 8) +
  theme_bw(base_size = 40)


ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class), size = 8) +
  theme_bw(base_size = 40)
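
The two plots differ only in the color = class mapping inside aes(), which colors each point by the class of car. As a sketch of how such a plot can be stored in a variable and extended, assuming ggplot2 is installed: the variable name p and the axis label wording are illustrative choices, not part of the original tutorial.

```r
library(ggplot2)

# Build the plot and store it in a variable instead of drawing it right away
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class)) +
  labs(x = "Engine displacement (litres)", y = "Highway miles per gallon")

# Printing the object draws the plot
print(p)
```

Storing the plot in a variable makes it possible to add further layers or themes later with +, for example p + theme_bw().
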