Data Presentation and Visualization

Sean Davis, MD, PhD
National Cancer Institute
National Institutes of Health
Bethesda, MD, USA

September 12, 2017

Overview


This section is based on a talk by Karl W. Broman titled “How to Display Data Badly”, in which he described how the default plots offered by Microsoft Excel “obscure your data and annoy your readers”. His lecture was inspired by the 1984 paper by H. Wainer, “How to display data badly”. Dr. Wainer was the first to elucidate the principles of the bad display of data. But according to Karl, “The now widespread use of Microsoft Excel has resulted in remarkable advances in the field.”

Rules for bad data display (Rafa)

  • Display as little information as possible.
  • Obscure what you do show (with chart junk).
  • Use pseudo-3d and color gratuitously.
  • Make a pie chart (preferably in color and 3d).
  • Use a poorly chosen scale.
  • Ignore significant digits (more is always better, right?).

Rules for better data display (Karl)

  • Show the data
  • Avoid chart junk
  • Consider taking logs and/or differences
  • Put the things to be compared next to each other
  • Use color to set things apart, but consider color blind folks
  • Use position rather than angle or area to represent quantities
  • Align things vertically to ease comparisons
  • Use common axis limits to ease comparisons
  • Use labels rather than legends
  • Sort on meaningful variables (not alphabetically)
  • Must 0 be included in the axis limits?

Show the data









Boxplots and violin plots



Boxplot vs. actual datapoints



Boxplot vs. actual datapoints, but with some spice



ggplot2 vs `base` theme



ggplot2 vs `Tufte` theme


Avoid Pie Charts






Consider logs




Make the point




Log is better

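To make the case for logs concrete, here is a minimal sketch of where values land along an axis on a linear scale versus a log10 scale. The values are hypothetical (not data from the talk), and `axis_fraction` is a helper name introduced only for this illustration.

```python
import math

# Hypothetical values spanning several orders of magnitude (illustrative only).
values = [1, 10, 100, 1000, 10000]

def axis_fraction(v, lo, hi):
    """Position of v along an axis running from lo to hi, as a fraction of its length."""
    return (v - lo) / (hi - lo)

# Linear axis: positions are proportional to the raw values.
linear = [axis_fraction(v, min(values), max(values)) for v in values]

# Log axis: positions are proportional to log10 of the values.
log_lo, log_hi = math.log10(min(values)), math.log10(max(values))
logged = [axis_fraction(math.log10(v), log_lo, log_hi) for v in values]

for v, lin, lg in zip(values, linear, logged):
    print(f"{v:>6}  linear={lin:.4f}  log10={lg:.2f}")
```

On the linear axis the three smallest values are squeezed into the first 1% of the axis, while on the log axis all five values are evenly spaced.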


Focus reader on important aspects of data


Guide and ease comparisons








Lots of choices

Don’t leave out important factors








Choose appropriate scales and axes

Value ~ Radius

Value ~ Area

Value ~ Length
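The difference between the radius and area encodings can be checked with a little arithmetic. The sketch below assumes values are drawn as circles and compares the drawn areas when one value is twice another; the function names and the values are hypothetical.

```python
import math

def radius_when_value_is_radius(v):
    """Radius proportional to the value, so area grows with the square of the value."""
    return v

def radius_when_value_is_area(v):
    """Area proportional to the value, so radius grows with the square root of the value."""
    return math.sqrt(v / math.pi)

def circle_area(r):
    return math.pi * r ** 2

small, large = 1.0, 2.0  # hypothetical values; large is twice small

# How much bigger the larger circle is drawn under each encoding.
ratio_radius_encoded = circle_area(radius_when_value_is_radius(large)) / circle_area(radius_when_value_is_radius(small))
ratio_area_encoded = circle_area(radius_when_value_is_area(large)) / circle_area(radius_when_value_is_area(small))

print(ratio_radius_encoded, ratio_area_encoded)
```

With value mapped to radius, a value twice as large is drawn with four times the area, which exaggerates the difference. With value mapped to area the drawn ratio matches the data, and with length (a bar) the comparison is by position, which is the easiest to read.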

Miscellaneous

Sorting

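As a small illustration of sorting on a meaningful variable rather than alphabetically, the sketch below orders categories by their value. The category names and counts are hypothetical.

```python
# Hypothetical category totals (illustrative only, not data from the talk).
counts = {"alpha": 42, "bravo": 87, "charlie": 15, "delta": 63}

# Alphabetical order: easy to produce, but the order carries no information.
alphabetical = sorted(counts)

# Ordered by value, largest first: the ranking is visible at a glance.
by_value = sorted(counts, key=counts.get, reverse=True)

print(alphabetical)
print(by_value)
```

Plotting the categories in the `by_value` order lets the reader see the ranking directly instead of reconstructing it from bar heights.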

Compare distributions

[Four figures omitted: three histograms (`stat_bin()`, default `bins = 30`) and one density plot (`stat_density`). In each, 184 rows with non-finite values were removed.]
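Histograms are sensitive to the bin width, so it is worth choosing one deliberately rather than accepting a default. One common heuristic, which is an assumption here and not necessarily what the talk used, is the Freedman-Diaconis rule: width = 2 × IQR / n^(1/3). The sketch below also drops non-finite values first; the sample data and the function name are hypothetical, and the quartile method is the Python standard library default, which can differ slightly from other software.

```python
import math
import statistics

def freedman_diaconis_binwidth(values):
    """Bin width = 2 * IQR / n^(1/3), computed on the finite values only."""
    finite = [v for v in values if math.isfinite(v)]
    q1, _, q3 = statistics.quantiles(finite, n=4)  # quartiles, default method
    return 2 * (q3 - q1) / len(finite) ** (1 / 3)

# Hypothetical sample with a missing (NaN) and an infinite value mixed in.
sample = [1, 2, 3, 4, 5, 6, 7, 8, float("nan"), float("inf")]
width = freedman_diaconis_binwidth(sample)
print(f"suggested bin width: {width:.2f}")
```

The result is only a starting point; it is still worth trying a few widths and checking that the shape of the distribution does not depend on the choice.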

Include Zero?

[Figure omitted: boxplot (`stat_boxplot`); 18 rows with non-finite values were removed.]

Blow it with 3D

Colors and [R]ColorBrewer

Wrap up

Encoding data

The human eye interprets some graphical features more easily than others. When encoding data, think about using these features, listed here in order from easiest to most difficult.

  • Quantitative data
    • Position
    • Length
    • Angle
    • Area
    • Luminance (light/dark)
    • Chroma (amount of color)
  • Categorical data
    • Shape
    • Hue (which color)
    • Texture
    • Width

Rules for bad data display

  • Display as little information as possible.
  • Obscure what you do show (with chart junk).
  • Use pseudo-3d and color gratuitously.
  • Make a pie chart (preferably in color and 3d).
  • Use a poorly chosen scale.
  • Ignore significant digits (more is always better, right?).

Rules for better data display

  • Show the data
  • Avoid chart junk
  • Consider taking logs and/or differences
  • Put the things to be compared next to each other
  • Use color to set things apart, but consider color blind folks
  • Use position rather than angle or area to represent quantities
  • Align things vertically to ease comparisons
  • Use common axis limits to ease comparisons
  • Use labels rather than legends
  • Sort on meaningful variables (not alphabetically)
  • Must 0 be included in the axis limits?
