European R User Meeting, Poznań 2016

LimeSurvey: introduction

  • one of the most advanced open-source tools for on-line surveys (GNU GPLv2)
  • available in 82 languages (UTF-8 encoding)
  • developed in PHP since 2003 by Carsten Schmitz and others
  • in 2012 version 2.0 was introduced (current stable 2.52):
    • completely rewritten using an MVC architecture and the Yii PHP framework
    • new AJAX-based GUI and a new API
  • very popular: ~3 000 weekly downloads
  • also used by:
    • professional opinion and market research agencies
    • hundreds of universities worldwide
  • www.limesurvey.org

LimeSurvey + R: previous approaches

  • LimeSurvey plug-in for manually exporting data to R
    • each export requires two files: the data in CSV and an R syntax file
    • first attempt at R integration:
      • parsing HTML and sending data via RCurl
  • since LimeSurvey 2.0, a new version of the API is available

LimeRick: motivation

  • need for a bridge that tightly connects two important open-source projects (R and LimeSurvey)
  • need for an advanced tool that allows:
    • quickly importing responses into R from active on-line surveys
    • automatically accessing properties of surveys and questions
    • monitoring survey status and responses directly from R
    • adding new responses to a survey directly from R
    • simplifying the workflow for reproducible analysis
      (by providing a structured data schema in advance)
    • developing data products based on real-time declarative data collection
      (e.g. continuous on-line tracking studies)
    • collecting metadata of respondents' interactions with on-line surveys
      at a uniquely fine-grained level

LimeRick: installing and configuring

# devtools::install_github("kalimu/LimeRick")
suppressPackageStartupMessages(library(LimeRick))
# set link to the LimeSurvey API on the demo remote server
options(lsAPIurl = 'http://odgar.net/survey/index.php/admin/remotecontrol')
# set LimeSurvey user login data for survey testing purposes
options(lsUser = "LimeRickDemo"); options(lsPass = "LimeRickDemo")

# low-level API call
lsAPI(method = "release_session_key")
## [1] "OK"
# API call using a wrapper function
lsSessionKey("release")
## Connecting to: http://odgar.net/survey/index.php/admin/remotecontrol 
## Releasing session key...
## [1] "OK"
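Under the hood, the low-level call talks to LimeSurvey's RemoteControl 2 API, which accepts JSON-RPC requests. The sketch below shows roughly what a request body looks like for `get_session_key` and `release_session_key`. The helper `rpcBody()` is hypothetical (not part of LimeRick), builds the JSON with base R only, and assumes every parameter is a string.

```r
# Hypothetical helper (not LimeRick's API): build a JSON-RPC request body
# for LimeSurvey RemoteControl 2; all params are assumed to be strings.
rpcBody = function(method, params = list(), id = 1) {
  # escape double quotes and backslashes so each value is a valid JSON string
  esc = function(s) gsub('(["\\\\])', '\\\\\\1', s)
  quoted = vapply(params, function(p) paste0('"', esc(as.character(p)), '"'),
                  character(1))
  sprintf('{"method":"%s","params":[%s],"id":%d}',
          method, paste(quoted, collapse = ","), as.integer(id))
}

rpcBody("get_session_key", list("LimeRickDemo", "LimeRickDemo"))
rpcBody("release_session_key", list("wjmacgfnzfh9jwuzsamzhmcft4ejg9n2"))
```

Such a string would then be POSTed with a JSON content type to the URL stored in `getOption("lsAPIurl")` (for example via `httr::POST()`); how LimeRick itself sends it is not shown here.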

# getting session key for the user and saving it inside a special environment
lsSessionKey("set")
## Connecting to: http://odgar.net/survey/index.php/admin/remotecontrol 
## Obtaining session key...
## [1] "wjmacgfnzfh9jwuzsamzhmcft4ejg9n2"
# setting locale for Polish characters in responses
# ("Polish" is a Windows locale name; on Linux or macOS use e.g. "pl_PL.UTF-8")
Sys.setlocale("LC_ALL", "Polish")
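`lsSessionKey("set")` keeps the key so that later calls can reuse it. A common way to hold such state in R is a private environment: environments are modified in place, so no global assignment is needed. The functions below are a hypothetical sketch of that pattern, not LimeRick's internals.

```r
# Hypothetical private cache (a sketch, not LimeRick's internals)
.lsCache = new.env(parent = emptyenv())

setSessionKey = function(key) assign("key", key, envir = .lsCache)

getSessionKey = function() {
  if (!exists("key", envir = .lsCache, inherits = FALSE))
    stop("no session key cached; obtain one first")
  get("key", envir = .lsCache, inherits = FALSE)
}

releaseSessionKey = function() {
  if (exists("key", envir = .lsCache, inherits = FALSE))
    rm("key", envir = .lsCache)
  invisible("OK")
}

setSessionKey("wjmacgfnzfh9jwuzsamzhmcft4ejg9n2")   # after obtaining a key
getSessionKey()
releaseSessionKey()                                  # cache is empty again
```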

LimeRick: listing surveys

# listing available surveys
(surveyList = lsList("surveys"))
##      sid                 surveyls_title startdate expires active
## 1 683736 Feedback survey for R Packages        NA      NA      Y
# extracting ID of demo survey
surveyID = surveyList$sid[1] 

You can submit your own answers to the demo survey: http://odgar.net/survey/index.php/survey/index/sid/683736
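Picking the survey by position (`sid[1]`) silently selects the wrong survey once an account holds several. A hypothetical alternative is to match on the title. The data frame below is rebuilt by hand from the printed output and assumes `sid` is returned as text.

```r
# Rebuilt from the printed lsList("surveys") output; sid assumed to be text
surveys = data.frame(sid = "683736",
                     surveyls_title = "Feedback survey for R Packages",
                     active = "Y", stringsAsFactors = FALSE)

# Hypothetical helper: return the ID of the one survey with this exact title
surveyIdByTitle = function(surveys, title) {
  hit = surveys$sid[surveys$surveyls_title == title]
  if (length(hit) != 1)
    stop("expected exactly one survey titled '", title, "', found ", length(hit))
  hit
}

demoID = surveyIdByTitle(surveys, "Feedback survey for R Packages")
demoID   # "683736"
```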

LimeRick: listing questions

questionList = lsList("questions", surveyID)
questionList[, names(questionList) %in% 
    c("sid", "qid", "gid", "title", "question","question_order")]
##   qid    sid gid       title                         question
## 1  16 683736   2      sector    What sector do you represent?
## 2  27 683736   2 packageName               The R package name
## 3  26 683736   2    feedback Your feedback about the package 
## 4  21 683736   2     country    Which country do you live in?
##   question_order
## 1              2
## 2              0
## 3              1
## 4              3
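The rows come back in `qid` order, not in the order respondents see them. Sorting by `question_order` restores the questionnaire order. Converting to integer first matters if the API returns the field as text (an assumption here), because `"10"` sorts before `"2"` as a string. The data frame is rebuilt from the printed output.

```r
# Rebuilt from the printed lsList("questions", surveyID) output
qs = data.frame(
  qid            = c(16, 27, 26, 21),
  title          = c("sector", "packageName", "feedback", "country"),
  question_order = c("2", "0", "1", "3"),   # assumed to arrive as text
  stringsAsFactors = FALSE)

# sort numerically: as text, "10" would sort before "2"
qs = qs[order(as.integer(qs$question_order)), ]
qs$title   # "packageName" "feedback" "sector" "country"
```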

LimeRick: accessing properties

We can access 22 question properties and 58 survey properties. For example:

# Is the survey active? (Y - Yes)
lsGetProperties('survey', surveyID)$active
## [1] "Y"
# What is the main text of a given question?
lsGetProperties('question', surveyID, 16)$question
## [1] "What sector do you represent?"
# Is the question mandatory? (Y - Yes)
lsGetProperties('question', surveyID, 16)$mandatory
## [1] "N"
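Many LimeSurvey properties, like `active` and `mandatory` above, are `"Y"`/`"N"` strings. A hypothetical helper that turns them into R logicals and maps anything else to `NA`:

```r
# Hypothetical helper: map LimeSurvey "Y"/"N" flags to TRUE/FALSE, anything else to NA
ynToLogical = function(x) {
  out = rep(NA, length(x))
  out[which(x == "Y")] = TRUE
  out[which(x == "N")] = FALSE
  out
}

ynToLogical(c("Y", "N", "", NA))   # TRUE FALSE NA NA
```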

LimeRick: checking survey responses

lsGetSummary(surveyID)
## $completed_responses
## [1] "502"
## 
## $incomplete_responses
## [1] "7"
## 
## $full_responses
## [1] "509"
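The summary values arrive as character strings. Converting them makes it possible to check that completed plus incomplete responses add up to `full_responses` (502 + 7 = 509) and to compute a completion rate. The list is rebuilt from the printed output.

```r
# Values as printed by lsGetSummary(surveyID) above
s = list(completed_responses = "502", incomplete_responses = "7",
         full_responses = "509")
n = vapply(s, as.integer, integer(1))   # named integer vector

# sanity check: completed + incomplete should equal the total
stopifnot(n[["completed_responses"]] + n[["incomplete_responses"]] ==
          n[["full_responses"]])

completionRate = n[["completed_responses"]] / n[["full_responses"]]
round(completionRate, 3)  # 0.986
```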

LimeRick: accessing responses

d = lsGetResponses(surveyID, completionStatus = 'complete')
tail(d[, c("id", "submitdate", "startdate", "feedback", "sector", "country")])
##      id          submitdate           startdate
## 497 531 2016-10-08 07:49:07 2016-10-08 07:49:07
## 498 532 2016-10-08 07:49:07 2016-10-08 07:49:07
## 499 533 2016-10-08 07:52:01 2016-10-08 07:52:01
## 500 534 2016-10-08 07:52:01 2016-10-08 07:52:01
## 501 535 2016-10-08 08:33:01 2016-10-08 08:33:01
## 502 536 2016-10-08 08:33:01 2016-10-08 08:33:01
##                            feedback   sector country
## 497 Adding feedback directly from R academia  Poland
## 498       Good job! (Kamil, Poland) academia  Poland
## 499 Adding feedback directly from R academia  Poland
## 500       Good job! (Kamil, Poland) academia  Poland
## 501 Adding feedback directly from R academia  Poland
## 502       Good job! (Kamil, Poland) academia  Poland
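In the rows shown, `submitdate` equals `startdate`. This is consistent with these responses having been added directly from R rather than filled in by a person. The sketch below computes the time spent on each response from the two timestamps, assuming both are in the same time zone (UTC here). The two rows are copied from the output above.

```r
# Two rows copied from the output above; time zone assumed to be UTC
times = data.frame(
  submitdate = c("2016-10-08 07:49:07", "2016-10-08 08:33:01"),
  startdate  = c("2016-10-08 07:49:07", "2016-10-08 08:33:01"),
  stringsAsFactors = FALSE)

fmt = "%Y-%m-%d %H:%M:%S"
submitted = as.POSIXct(times$submitdate, format = fmt, tz = "UTC")
started   = as.POSIXct(times$startdate,  format = fmt, tz = "UTC")

# seconds between starting and submitting each response
times$duration = as.numeric(difftime(submitted, started, units = "secs"))
times$duration   # 0 0
```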

LimeRick: passive measurement

# manually logging a usage record (d holds the responses downloaded above)
lsAddPackageStats(packageName = "LimeRick",
                  functionName = "lsGetProperties",
                  functionStats = NROW(d))
dStats = lsGetPackageStats(usageStats = FALSE); tail(dStats, 3)
## Sending usage statistics for function lsGetResponses is disabled.
##        id          submitdate startlanguage packageName packageVer
## 4115 4623 2016-10-11 06:48:20            en    LimeRick 0.0.1.9000
## 4116 4624 2016-10-11 06:48:21            en    LimeRick 0.0.1.9000
## 4117 4625 2016-10-11 06:48:21            en    LimeRick 0.0.1.9000
##         functionName functionStats
## 4115 lsGetProperties      question
## 4116    lsGetSummary             1
## 4117  lsGetResponses           502
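Judging from the output, LimeRick records its own function calls (`packageName`, `packageVer`, `functionName`, `functionStats`) as responses to a LimeSurvey survey. The sketch below imitates that idea locally. A hypothetical wrapper, `withUsageStats()`, logs the function name and a summary statistic (here, the number of rows returned) to an in-memory environment instead of a remote survey.

```r
# Hypothetical in-memory log standing in for the remote usage-statistics survey
usageLog = new.env()
usageLog$functionName  = character(0)
usageLog$functionStats = character(0)

# Wrap f so that every call appends (name, statFun(result)) to usageLog
withUsageStats = function(f, name, statFun = NROW) {
  force(f); force(name); force(statFun)
  function(...) {
    result = f(...)
    usageLog$functionName  = c(usageLog$functionName, name)
    usageLog$functionStats = c(usageLog$functionStats,
                               as.character(statFun(result)))
    result
  }
}

# Example: a stand-in data-fetching function returning n rows
getToyResponses = withUsageStats(function(n) data.frame(id = seq_len(n)),
                                 "getToyResponses")
invisible(getToyResponses(502))
data.frame(functionName = usageLog$functionName,
           functionStats = usageLog$functionStats)
```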

LimeRick: stats for R package functions

library(dplyr)
dStats %>% group_by(functionName) %>% summarise('usageCount' = n())
## # A tibble: 6 x 2
##         functionName usageCount
##                <chr>      <int>
## 1      lsAddResponse        498
## 2 lsGetAnswerOptions        270
## 3    lsGetProperties       1685
## 4     lsGetResponses        606
## 5       lsGetSummary        301
## 6             lsList        757
dStats %>% filter(functionName == 'lsGetResponses') %>% group_by(functionName) %>% 
    summarise('usageCount' = n(), 
              'responsesDownloaded' = sum(as.numeric(functionStats)))
## # A tibble: 1 x 3
##     functionName usageCount responsesDownloaded
##            <chr>      <int>               <dbl>
## 1 lsGetResponses        606              162809
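`functionStats` is a text column that mixes counts with values such as `"question"`. That is why the filter comes before `sum(as.numeric(...))`: converting the whole column would produce `NA`s with a warning. Below is a base-R sketch of the same two summaries, using the three rows printed earlier.

```r
# The three rows printed by tail(dStats, 3) above
statsTail = data.frame(
  functionName  = c("lsGetProperties", "lsGetSummary", "lsGetResponses"),
  functionStats = c("question", "1", "502"),
  stringsAsFactors = FALSE)

# usage count per function (base-R counterpart of group_by() + summarise(n()))
usageCount = table(statsTail$functionName)

# filter first: as.numeric("question") would give NA plus a warning
resp = statsTail[statsTail$functionName == "lsGetResponses", ]
responsesDownloaded = sum(as.numeric(resp$functionStats))
responsesDownloaded   # 502
```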

LimeRick: the most frequently used functions

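library(ggplot2)
reorder_size = function(x) factor(x, levels = names(sort(table(x))))  # helper, not in ggplot2: order bars by frequency
ggplot(data = dStats, aes(x = reorder_size(functionName))) + geom_bar(fill = "#56B4E9") + coord_flip()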

Data Products (LimeSurvey + R + LimeRick + Shiny)

Metadata analysis: traditional

  • timings page-by-page
  • finest granularity possible: one question per page
  • a matrix question with multiple sub-questions still yields a single timing

Metadata analysis: low-granular

  • timing of every interaction between the respondent and the survey
  • finest granularity possible: a single keystroke or mouse click
  • changes to answers are logged as well
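To make the contrast with page-level timings concrete, here is a sketch over a hypothetical event log for one respondent on a page holding a three-row matrix question. The column names and values are illustrative assumptions and do not reflect the format used by the metadata scripts.

```r
# Hypothetical event log: one respondent, one page with matrix question Q1 (rows a-c)
events = data.frame(
  t      = c(0, 2.0, 4.2, 6.0, 9.5, 12.1),   # seconds since the page loaded
  item   = c("load", "Q1_a", "Q1_b", "Q1_b", "Q1_c", "submit"),
  answer = c(NA, "3", "2", "4", "5", NA),
  stringsAsFactors = FALSE)

# Traditional metadata: a single timing for the whole page
pageTime = max(events$t) - min(events$t)

# Low-granular metadata: time elapsed before each interaction
events$gap = c(NA, diff(events$t))

# Answer changes: every click on a sub-question after the first one
clicks = table(events$item[!events$item %in% c("load", "submit")])
answerChanges = clicks - 1
answerChanges
```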

Metadata analysis: summary

  • with a prototype set of scripts that integrate with LimeSurvey (by Jarosław Szkoła)
  • we can perform very detailed metadata analysis of on-line surveys by
    • merging existing and missing answers (non-response)
      with detailed timings
    • reproducing each respondent's interaction with a survey at a very low level
    • aggregating low-level data to build Key Performance Indicators for on-line surveys
  • in the future:
    • benchmarks for surveys and questions in pilot studies
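A sketch of the merging and aggregation steps: each respondent is crossed with every question so that unanswered items appear as rows with `NA`, and a simple KPI (the item non-response rate) is then computed per question. The question titles come from the demo survey; the answers and timings are toy values.

```r
# Question titles from the demo survey; answers and timings are toy values
questions = data.frame(title = c("packageName", "feedback", "sector", "country"),
                       stringsAsFactors = FALSE)
answers = data.frame(
  respondent = c(1, 1, 1, 2, 2),
  title      = c("packageName", "sector", "country", "packageName", "feedback"),
  seconds    = c(5.1, 2.3, 3.0, 4.4, 20.5),
  stringsAsFactors = FALSE)

# merge() with no shared columns returns the Cartesian product:
# every respondent paired with every question
grid = merge(data.frame(respondent = unique(answers$respondent)), questions)

# left join keeps unanswered items as rows with seconds = NA (non-response)
full = merge(grid, answers, by = c("respondent", "title"), all.x = TRUE)

# KPI per question: share of respondents who left it unanswered
nonResponse = tapply(is.na(full$seconds), full$title, mean)
nonResponse
```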

Call for collaboration

Still much work to do:

  • developing & testing new functionalities
  • integrating the tool for metadata analysis
  • submitting the package to CRAN
  • publishing an academic paper
      • possibly with real-world case studies
  • developing data products
    for non-profit and commercial partners

The LimeRick GitHub repository is now open!
github.com/kalimu/LimeRick

Thank you!