Hitchhikers Need Free Vehicles!

Gregory M. Kapfhammer

Phil McMinn and Chris J. Wright


Flickr photo shared by BergsPix under a Creative Commons (BY-ND) license

Randomization

Inherent in search-based software testing (SBST) techniques

Necessitates careful experiment design

Statistical analysis of results required!
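
A minimal sketch of what careful design can mean in practice: run the randomized technique many times, each with its own recorded seed, and analyze the whole sample rather than a single run. The sketch is in Python with only the standard library so it is self-contained. The `random_search` function, its toy objective, and the choice of 30 trials are all illustrative assumptions, not taken from the talk.

```python
import random
import statistics


def random_search(seed, budget=200):
    """Hypothetical stand-in for an SBST technique: minimize |x - 42| by random sampling."""
    rng = random.Random(seed)  # a private generator keeps each trial repeatable
    best = float("inf")
    for _ in range(budget):
        candidate = rng.uniform(0, 100)
        best = min(best, abs(candidate - 42))
    return best


def run_trials(algorithm, trials=30):
    """Run the algorithm once per seed; the seed list doubles as a replication record."""
    return [algorithm(seed) for seed in range(trials)]


# One result per seed: the sample that any later statistical test would consume.
results = run_trials(random_search)
print("trials:", len(results))
print("median best distance:", statistics.median(results))
print("range:", min(results), "to", max(results))
```

Rerunning the script with the same seeds reproduces the same sample, which is what makes the later analysis checkable by others.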

When I say "statistics" ...

But, we need statistical analysis!

Statistical Analysis

A Hitchhiker's Guide to Statistical Tests for Assessing Randomized Algorithms in Software Engineering

Arcuri and Briand recommend statistical techniques

Code snippets provided in the R language

A tremendous asset to the SBST community!
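
One of the techniques the guide recommends is the Vargha-Delaney Â12 effect size, which estimates the probability that a run of one technique yields a larger value than a run of another. The guide's own snippets are in R. The sketch below is a plain Python rendering of the textbook definition, offered as an illustration rather than a port of that code. The function name and the sample data are invented for the example.

```python
def vargha_delaney_a12(first, second):
    """Estimate P(value from first > value from second), counting each tie as one half."""
    if not first or not second:
        raise ValueError("both samples must be non-empty")
    wins = 0.0
    for x in first:
        for y in second:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(first) * len(second))


# Invented scores for two hypothetical techniques (higher is better here).
technique_a = [10, 12, 11, 14, 13]
technique_b = [9, 8, 12, 7, 10]

print(vargha_delaney_a12(technique_a, technique_b))
print(vargha_delaney_a12(technique_b, technique_a))
```

A value of 0.5 means neither technique tends to score higher. When lower scores are better, the reading of the statistic flips, so the direction should be stated alongside the number.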


Flickr photo shared by Michael Kappel under a Creative Commons (BY-NC) license

Subtleties of Statistical Analysis

Well-meaning researchers may make small mistakes

Marco Torchiano revealed paradoxical effect sizes



Hitchhikers Need Vehicles

Shared repositories of statistical code

Well-tested implementations of procedures

Additional documentation and guidelines

Replication packages for completed analyses
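
As a rough picture of what a well-tested implementation in a shared repository could look like, the sketch below pairs a small statistical helper with unit tests that pin down its edge cases. The helper computes mid-ranks, the tie-aware ranking that rank-based tests rely on. The names `midranks` and `MidranksTest` are hypothetical, and the code is written in Python purely for illustration.

```python
import unittest


def midranks(values):
    """Assign 1-based ranks, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = average
        start = end + 1
    return ranks


class MidranksTest(unittest.TestCase):
    def test_ties_share_average_rank(self):
        self.assertEqual(midranks([5, 3, 5, 1]), [3.5, 2.0, 3.5, 1.0])

    def test_ranks_sum_to_triangular_number(self):
        values = [4, 4, 4, 2, 9, 2]
        n = len(values)
        self.assertEqual(sum(midranks(values)), n * (n + 1) / 2)

    def test_empty_input(self):
        self.assertEqual(midranks([]), [])
```

Saved to a file, the tests run with `python -m unittest`. Shipping such tests next to the analysis code lets other researchers confirm the procedure before they rely on it.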

Why is This Important?

Enhance the Maturity of the SBST Field

Supporting Tools and Platforms?

Suggestions

Use GitHub to store data and analysis code

Create R packages using devtools

Reveal your full analysis with RMarkdown

Use "best of breed" tools to support your work!


Flickr photo shared by sunface13 under a Creative Commons (BY-NC-ND) license

Carefully pick your analysis team ...

"Hadleyverse"

dplyr for fast data manipulation

tidyr for disciplined data restructuring

ggplot2 for impressive data visualization

Or, use the languages and packages you prefer

But, seriously, Hadley Wickham's code is awesome!


Publicly available photo shared by Hadley Wickham

Where do we go next?

Let's Talk

What statistical analysis do you regularly perform?

What is needed to move the SBST community forward?

What types of vehicles do hitchhikers really need?


Stocksnap.io photo shared by Alejandro Escamilla under a Public Domain license

Questions

Sharing data sets larger than what GitHub supports?

Use Git Large File Storage (LFS)

Why don't we release scripts for running experiments?

They are often customized. But, yes, we should!

