Structure

  • A little bit about me

  • Reasons for joining R community

  • Some recent work using Tidyverse methods

  • Future plans

A bit about me

  • Research Associate at Heriot-Watt University

  • Background in Psychology

  • Currently working on an EPSRC project called SoCoRo (Socially Competent Robots)

  • Developing a socially competent robot to modify social signal processing

  • Feel free to visit our Twitter: [@socoro2017](https://twitter.com/socoro2017)

Reasons for joining R community

  • Psychologist by trade

  • Interested in data modelling but not satisfied with SPSS

  • Wanted a fully integrated system

  • raw data -> data processing -> analysis -> report writing

  • Clear and colourful graphs

  • Improved workflow: tidy, wrangle, model

Recent work using tidyverse methods

Parts of R I'll discuss today

  • Introduce main verbs of Tidyverse

  • Tidying data

  • Dealing with dates

  • Dealing with the autism-spectrum quotient (AQ)

  • Summarising data

  • Plotting data

  • Analysis: Binary logistic regression (brief)

Main verbs of tidyverse

  • filter: extract rows that meet given conditions
  • select: extract columns by name
  • gather/spread: gather columns into rows / spread rows into columns
  • mutate: create new columns or modify existing ones
  • summarise: collapse data into summary statistics
  • ntile: rank a vector and split it into n groups
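
A minimal sketch of these verbs on a toy tibble (illustrative data, not from the study; assumes dplyr and tidyr are installed):

```r
library(dplyr)
library(tidyr)

# Toy data: one row per participant per questionnaire item (illustrative)
scores <- tibble(ID       = c(1, 1, 2, 2),
                 question = c("q1", "q2", "q1", "q2"),
                 resp     = c(3, 5, 2, 4))

scores %>%
  filter(ID == 1) %>%                 # filter: keep participant 1's rows
  select(question, resp) %>%          # select: keep only these columns
  mutate(resp_sq = resp^2) %>%        # mutate: compute a new column
  summarise(mean_resp = mean(resp))   # summarise: collapse to one row (mean = 4)

# spread rows into columns: one column per question (gather reverses this)
scores %>% spread(question, resp)
```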

Tidying data using Tidyverse methods pt1

# A tibble: 6 x 7
     ID item    expression resp    date       resp_acc  resp_time
  <int> <chr>        <int> <chr>   <chr>      <chr>         <dbl>
1     1 Toast            1 Like    05/09/2017 Correct        6.02
2     1 Milk             2 Like    05/09/2017 Correct        6.45
3     1 Beans            3 Dislike 05/09/2017 Correct        5.48
4     1 Oatmeal          4 Dislike 05/09/2017 Correct        4.21
5     2 Beans            1 Dislike 05/09/2017 Incorrect     11.5 
6     2 Toast            2 Like    05/09/2017 Correct        3.58

Tidying data using Tidyverse methods pt 2

resp_data %>%
  filter(ID != "pilot", ID != "3") %>% 
  mutate(pad = ifelse(expression %in% c("1", "2"),
                      "Approval", "Disapproval")) %>% 
  mutate(ID = as.character(ID)) %>% 
  mutate(expression = as.character(expression)) %>%
  select(ID:resp, pad, resp_acc, resp_time) %>%
  arrange(ID)

# remove pilot data and outliers
# clearer category labels make for better graphs
# vectors with numeric values may really be categorical; change the data type

Tidying data using Tidyverse methods pt 3

## # A tibble: 6 x 7
##   ID    item    expression resp    pad         resp_acc resp_time
##   <chr> <chr>   <chr>      <chr>   <chr>       <chr>        <dbl>
## 1 1     Toast   1          Like    Approval    Correct       6.02
## 2 1     Milk    2          Like    Approval    Correct       6.45
## 3 1     Beans   3          Dislike Disapproval Correct       5.48
## 4 1     Oatmeal 4          Dislike Disapproval Correct       4.21
## 5 10    Oatmeal 2          Like    Approval    Correct      19.0 
## 6 10    Beans   3          Dislike Disapproval Correct       6.57

Dealing with dates pt 1

## # A tibble: 6 x 2
##   dob        expdate   
##   <chr>      <chr>     
## 1 12/04/1980 05/09/2017
## 2 19/09/1971 05/09/2017
## 3 05/11/1954 05/09/2017
## 4 31/05/1989 05/09/2017
## 5 11/10/1987 05/09/2017
## 6 29/07/1988 07/09/2017

Dealing with dates pt 2

q_data %>%
  select(dob, expdate)   %>%
  mutate(date = as.Date(expdate, "%d/%m/%Y")) %>%
  mutate(birth = as.Date(dob, "%d/%m/%Y")) %>%
  mutate(age = as.numeric(date - birth)/365.2422) %>%
  head
## # A tibble: 6 x 5
##   dob        expdate    date       birth        age
##   <chr>      <chr>      <date>     <date>     <dbl>
## 1 12/04/1980 05/09/2017 2017-09-05 1980-04-12  37.4
## 2 19/09/1971 05/09/2017 2017-09-05 1971-09-19  46.0
## 3 05/11/1954 05/09/2017 2017-09-05 1954-11-05  62.8
## 4 31/05/1989 05/09/2017 2017-09-05 1989-05-31  28.3
## 5 11/10/1987 05/09/2017 2017-09-05 1987-10-11  29.9
## 6 29/07/1988 07/09/2017 2017-09-07 1988-07-29  29.1

Dealing with the AQ: Using gather and ntile pt 1

## # A tibble: 6 x 4
##   q1                q2                  q3                q4              
##   <chr>             <chr>               <chr>             <chr>           
## 1 slightly disagree slightly agree      slightly agree    slightly agree  
## 2 slightly agree    slightly disagree   definitely agree  slightly agree  
## 3 slightly agree    slightly disagree   slightly agree    slightly disagr~
## 4 slightly disagree definitely disagree definitely agree  slightly disagr~
## 5 slightly agree    slightly disagree   definitely agree  slightly disagr~
## 6 slightly agree    slightly disagree   slightly disagree slightly disagr~

Dealing with the AQ: Using gather and ntile pt 2

q_data %>% 
  filter(ID != "pilot", ID != "3") %>%
  select(ID, q1:q50) %>%                                                                         
  gather(AQ_item, AQ_resp, q1:q50) %>% 
  mutate(AQ_num = substring(AQ_item,2)) %>%                                                 
  mutate(AQ_score =
           ifelse(AQ_num %in%
                    c(2,4:7,9,12:13,16,18:23,26,33,35,39,41:43,45:46) 
                  & AQ_resp %in% 
                    c("definitely agree","slightly agree"), 1,
                  ifelse(AQ_num %in% 
                           c(1,3,8,10:11,14:15,17,24:25,27:32,34,36:38,40,44,47:50)
                         & AQ_resp %in% 
                           c("slightly disagree","definitely disagree"), 1, 0)))

Dealing with the AQ: Using gather and ntile pt 3

## # A tibble: 6 x 5
##   ID    AQ_item AQ_resp           AQ_num AQ_score
##   <chr> <chr>   <chr>             <chr>     <dbl>
## 1 1     q1      slightly agree    1          0   
## 2 2     q1      slightly disagree 1          1.00
## 3 4     q1      slightly agree    1          0   
## 4 5     q1      slightly agree    1          0   
## 5 6     q1      slightly disagree 1          1.00
## 6 7     q1      slightly agree    1          0

Dealing with the AQ: Using gather and ntile pt 4

aq %>%
  group_by(ID) %>%
  mutate(AQ_tot = sum(AQ_score)) %>%
  ungroup() %>%
  select(ID, AQ_num, AQ_resp, AQ_score, AQ_tot) %>%
  head

Dealing with the AQ: Using gather and ntile pt 5

## # A tibble: 6 x 5
##   ID    AQ_num AQ_resp           AQ_score AQ_tot
##   <chr> <chr>  <chr>                <dbl>  <dbl>
## 1 pilot 1      slightly disagree     1.00  28.0 
## 2 pilot 1      slightly agree        0     28.0 
## 3 1     1      slightly agree        0     16.0 
## 4 2     1      slightly disagree     1.00  17.0 
## 5 3     1      slightly agree        0     10.0 
## 6 4     1      slightly agree        0      8.00

Dealing with the AQ: Using gather and ntile pt 6

aq %>% 
  group_by(ID) %>%
  mutate(AQ_tot = sum(AQ_score)) %>%
  ungroup() %>%
  select(ID, AQ_tot) %>%
  distinct() %>%
  mutate(medAQ = ntile(AQ_tot, 2)) %>%
  mutate(AQ_group = recode(medAQ,
                           "1" = "Low AQ",
                           "2" = "High AQ")) %>%
  select(-medAQ) %>%
  head

Dealing with the AQ: Using gather and ntile pt 7

## # A tibble: 56 x 3
##    ID    AQ_tot AQ_group
##    <chr>  <dbl> <chr>   
##  1 1      16.0  Low AQ  
##  2 10      9.00 Low AQ  
##  3 11     17.0  Low AQ  
##  4 12      9.00 Low AQ  
##  5 13     20.0  High AQ 
##  6 14      6.00 Low AQ  
##  7 15     19.0  High AQ 
##  8 16     11.0  Low AQ  
##  9 17     22.0  High AQ 
## 10 18     23.0  High AQ 
## # ... with 46 more rows

Using summarise to generate frequencies and proportions pt 1

## # A tibble: 6 x 5
##   ID    eng                resp    resp_acc resp_time
##   <chr> <chr>              <chr>   <chr>        <dbl>
## 1 1     Native English     Like    Correct       6.02
## 2 1     Native English     Like    Correct       6.45
## 3 1     Native English     Dislike Correct       5.48
## 4 1     Native English     Dislike Correct       4.21
## 5 10    Non-native English Like    Correct      19.0 
## 6 10    Non-native English Dislike Correct       6.57

Using summarise to generate frequencies and proportions pt 2

demog_resp %>%
  select(ID, eng, resp) %>%
  group_by(eng, resp) %>%
  summarise(n = n()) %>%
  group_by(eng) %>%
  mutate(freq = n/ sum(n)) %>%
  ungroup

Using summarise to generate frequencies and proportions pt 3

eng                  resp     n     freq
Native English       Dislike  72    0.5294
Native English       Like     63    0.4632
Native English       Miss      1    0.0074
Non-native English   Dislike  43    0.4886
Non-native English   Like     41    0.4659
Non-native English   Miss      4    0.0455

Plotting resp data using barplot

Plotting resp data with facet

Plotting questionnaire data pt 1

nat_rob %>%
  group_by(eng, robot_q_item) %>%
  summarise(mean = mean(robot_q_resp),
            median = median(robot_q_resp),
            IQR = IQR(robot_q_resp),
            sd = sd(robot_q_resp),
            n = n()) %>%
  mutate(se = sd / sqrt(n),
         lower_ci = mean - qt(1 - (0.05 / 2), n - 1) * se,
         upper_ci = mean + qt(1 - (0.05 / 2), n - 1) * se) %>%
  mutate("Native Language" = recode(eng,
                      "No" = "Non-native English",
                      "Yes" = "Native English"))

Plotting questionnaire data pt 2

eng                 robot_q_item            mean   median  IQR  sd      n    se      lower_ci  upper_ci  Native Language
Native English      Friendliness            3.794  4       2    1.0826  136  0.0928  3.611     3.978     Native English
Native English      Interaction_rating      4.147  5       1    1.0923  136  0.0937  3.962     4.332     Native English
Native English      Likeability             3.618  4       1    1.1159  136  0.0957  3.428     3.807     Native English
Native English      Perceived_positiveness  3.382  3       1    1.0615  136  0.0910  3.202     3.562     Native English
Native English      Performance_rating      3.559  4       1    1.0093  136  0.0865  3.388     3.730     Native English
Native English      Voice_clarity           4.559  5       1    0.8144  136  0.0698  4.421     4.697     Native English
Non-native English  Friendliness            3.682  4       1    0.8241   88  0.0879  3.507     3.856     Non-native English
Non-native English  Interaction_rating      4.273  4       1    0.7540   88  0.0804  4.113     4.433     Non-native English
Non-native English  Likeability             4.227  4       1    0.6013   88  0.0641  4.100     4.355     Non-native English
Non-native English  Perceived_positiveness  3.727  4       1    0.9189   88  0.0980  3.533     3.922     Non-native English
Non-native English  Performance_rating      3.273  3       1    0.8126   88  0.0866  3.100     3.445     Non-native English
Non-native English  Voice_clarity           4.682  5       1    0.5580   88  0.0595  4.564     4.800     Non-native English

Plotting questionnaire data pt 3

Plotting response time data by group

Data analysis

  • I tend to use binary logistic regression; it accommodates binary (non-normal) outcomes
  • At present I am not using the Tidyverse for analysis. Work in progress…

Analysis procedure pt 1

  • Step one: generate a correlation matrix for numeric vectors

correl <- cor(aq_a[ , c(8:14)], use = "pairwise.complete.obs")

symnum(correl)

  • Evaluate collinearity and decide whether to drop or modify predictors

Analysis procedure pt 2

  • Fit the main-effects model, then perform a stepwise AIC check

mainmod <- glm(AQ_group ~ expression + eng + sex + resp ...,
               data = aq_a, family = binomial(link = 'logit'))

summary(mainmod)

  • Run a stepwise check of the model predictors

mod1 <- step(mainmod)

  • Extract the lowest-AIC model and summarise it with summary()

Analysis procedure pt 3

  • Check whether the model is significant using a chi-square difference test

chidiff <- mod2$null.deviance - mod2$deviance  # chi-square difference
dfdiff <- mod2$df.null - mod2$df.residual      # degrees of freedom difference

pchisq(chidiff, dfdiff, lower.tail = FALSE)

Analysis procedure pt 4

  • Calculate effect size

  • Use the PseudoR2() function from the BaylorEdPsych package

PseudoR2(mod2)

Analysis procedure pt 5

  • Calculate correctness of model

correct <- mod2$fitted.values

  • Fitted values are continuous probabilities of falling into the second group (High AQ)
  • These must be converted to binary values
  • Use a 0.5 cut-off point, as that is chance level for two groups

binarycorrect <- ifelse(correct > 0.5, 1, 0)

binarycorrect <- factor(binarycorrect, levels = c(0,1), labels = c("Low AQ", "High AQ"))

table(aq_a$AQ_group, binarycorrect)

  • Perform matrix calculations on the resulting confusion table
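
The matrix calculations can be sketched from the confusion table: overall classification accuracy is the sum of the diagonal over the total. The labels below are illustrative, not study data:

```r
# Illustrative observed vs predicted group labels (not study data)
observed  <- c("Low AQ", "Low AQ", "High AQ", "High AQ", "Low AQ")
predicted <- c("Low AQ", "High AQ", "High AQ", "High AQ", "Low AQ")

conf <- table(observed, predicted)       # rows = observed, columns = predicted

accuracy <- sum(diag(conf)) / sum(conf)  # proportion classified correctly
accuracy                                 # 0.8 for these illustrative labels
```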

Concluding thoughts

  • The Tidyverse uses a logical, readable syntax

  • Can tidy, wrangle, and model data with relative ease

  • The graphics package ggplot2 helps visualise trends from multiple perspectives

Future work

  • Eventually I want to use R and RStudio as an integrated environment for all my research activities

  • Writing functions to shorten code chunks

  • Learn to use tidy methods for analysis

  • Setting up a Github repo for version control

  • Looking at methods to integrate .Rmd documents with Overleaf

Thank you