Homework 4: Logistic regression and judges

This assignment is based on an assignment created by German Rodriguez. Topics covered include:

reading in tabular data
ggplot2
logistic regression

For this assignment you will be extending the analysis in this article:

Greene and Schaffer (1992) “Leave to Appeal and Leave to Commence Judicial Review in Canada’s Refugee-Determination System: Is the Process Fair?” International Journal of Refugee Law, 4.1: 71-83.

Here’s how the authors describe the paper: “The purpose of this study is to determine whether refugees and other persons who are applying for leave to appeal a decision of the Convention Refugee Determination Division of the Canadian Immigration and Refugee Board, a decision of the Appeal Division of the Immigration and Refugee Board, or to commence an action for judicial review have a relatively equal chance to convince the Federal Court of Appeal of the merits of their applications.”"

The data and documentation were made avaialble by John Fox. I have stored a copy of the data in Greene.txt in the subfolder data.

library(dplyr)

## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

data <- read.table("data/Greene.txt", header=TRUE)
data <- tbl_df(data)

Before doing anything else, just play with the data to get a sense of it.

The main outcome of interest is decision which records whether the judge granted an appeal or not. In what proprotion of cases did the judge grant an appeal.

# convert decision to a number rather a factor
data$appeal_granted <- as.numeric(data$decision=="yes")
summarize(data, yes_rate = mean(appeal_granted))

## Source: local data frame [1 x 1]
## 
##   yes_rate
## 1 0.296875

There are 12 different judges in the data. A key question is whether different judges have different rates of granting an appeal. Make a plot showing the number of cases per judge.

library(ggplot2)
p <- ggplot(data, aes(judge))
p + geom_bar(stat="bin")

Now plot the rate at which the judges grant appeals.

data_to_plot <- group_by(data, judge) %>% 
                  summarize(yes_rate = mean(appeal_granted)) %>%
                  arrange(desc(yes_rate))
data_to_plot$judge <- reorder(data_to_plot$judge, data_to_plot$yes_rate)
p <- ggplot(data_to_plot, aes(x = yes_rate, y = judge))
p + geom_point()

Now let’s try this with logistic regression because we are going to move to more complicated models. Fit a logistic regression predicting whetheran appeal will be granted using judge as a categorical predictor. Use Iacobucci as the reference judge to facilitate the comparisons that follow. For more on how to control the reference level of a factor, check out this post on StackOverflow: http://stackoverflow.com/questions/3872070/how-to-force-r-to-use-a-specified-factor-level-as-reference-in-a-regression

data$judge <- relevel(data$judge, ref="Iacobucci")
fit <- glm(appeal_granted ~ judge, data = data, family = "binomial")
library(broom)
tidy(fit)

##               term   estimate std.error  statistic      p.value
## 1      (Intercept) -2.1594842 0.6097150 -3.5417930 0.0003974172
## 2  judgeDesjardins  2.2464956 0.6774021  3.3163401 0.0009120477
## 3       judgeHeald  1.3385037 0.7089861  1.8879124 0.0590377139
## 4    judgeHugessen  0.7323679 0.6892646  1.0625352 0.2879927768
## 5   judgeMacGuigan  1.0224057 0.6704056  1.5250554 0.1272452851
## 6     judgeMahoney  1.8912203 0.7123896  2.6547555 0.0079365903
## 7     judgeMarceau  2.5649493 0.7337704  3.4955748 0.0004730419
## 8      judgePratte  0.3677248 0.7524605  0.4886965 0.6250566009
## 9       judgeStone  1.0200500 0.7326337  1.3923056 0.1638298693
## 10       judgeUrie  1.9771627 0.8593131  2.3008641 0.0213993135

Using the results of the model calculate the predicted probability that Judge Marceau will grant an appeal.

fit_tidy <- tidy(fit)
predicted_probability <- 1 / (1 + exp(-(fit_tidy[fit_tidy$term=="(Intercept)", "estimate"] + fit_tidy[fit_tidy$term=="judgeMarceau", "estimate"])))
print(paste("Predicted probability for Judge Marceau is:", predicted_probability))

## [1] "Predicted probability for Judge Marceau is: 0.599999999999999"

Perhaps the different judges have different types of cases. That might be why they seem to be making such different decisions. Fit a model treating judge as a factor plus the following predictors: (i) an indicator for whether the expert rater thought the case had merit, (ii) location indicators using ‘other’ as the reference category, (iii) a language dummy with French as omitted category. Note that as above you might have to relevel the factors to get this to work as you want.

data$location <- relevel(data$location, ref="other")
data$language <- relevel(data$language, ref="French")
fit <- glm(appeal_granted ~ judge + rater + location + language, data = data, family = "binomial")
tidy(fit)

##                term   estimate std.error  statistic      p.value
## 1       (Intercept) -3.1099344 0.8821732 -3.5253103 4.229871e-04
## 2   judgeDesjardins  2.5118507 0.7134964  3.5204812 4.307645e-04
## 3        judgeHeald  1.3288574 0.7348309  1.8083852 7.054657e-02
## 4     judgeHugessen  1.2371875 0.7559655  1.6365660 1.017212e-01
## 5    judgeMacGuigan  1.3108339 0.7030106  1.8646005 6.223742e-02
## 6      judgeMahoney  1.7326906 0.7400300  2.3413788 1.921267e-02
## 7      judgeMarceau  3.4303647 0.8025491  4.2743364 1.917074e-05
## 8       judgePratte  0.6970888 0.7892341  0.8832471 3.771028e-01
## 9        judgeStone  0.9768491 0.7582761  1.2882500 1.976590e-01
## 10        judgeUrie  2.3313476 0.8969069  2.5993195 9.340879e-03
## 11         rateryes  1.4247201 0.2610852  5.4569166 4.844737e-08
## 12 locationMontreal -0.4276074 0.6158915 -0.6942903 4.875002e-01
## 13  locationToronto  0.1029737 0.3672029  0.2804273 7.791497e-01
## 14  languageEnglish  0.3268456 0.5580564  0.5856856 5.580868e-01

For a case that was ruled to have merit, that took place in Toronto, and that was in English, what is the predicted probability that Judge Marceau will grant the appeal?

fit_tidy <- tidy(fit)
predictor_value <- fit_tidy[fit_tidy$term=="(Intercept)", "estimate"] +
  fit_tidy[fit_tidy$term=="rateryes", "estimate"] +
  fit_tidy[fit_tidy$term=="judgeMarceau", "estimate"] +
  fit_tidy[fit_tidy$term=="locationToronto", "estimate"] +
  fit_tidy[fit_tidy$term=="languageEnglish", "estimate"] 
predicted_probability <- 1 / (1 + exp(-(predictor_value)))
print(paste("The predicted probability for this case is:", predicted_probability))

## [1] "The predicted probability for this case is: 0.897979154560357"

The command below is helpful for debugging, please don’t change it

## R version 3.1.2 (2014-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] broom_0.3.5   ggplot2_1.0.0 dplyr_0.4.1  
## 
## loaded via a namespace (and not attached):
##  [1] assertthat_0.1   colorspace_1.2-4 DBI_0.3.1        digest_0.6.8    
##  [5] evaluate_0.5.5   formatR_1.0      grid_3.1.2       gtable_0.1.2    
##  [9] htmltools_0.2.6  knitr_1.9        labeling_0.3     lazyeval_0.1.10 
## [13] magrittr_1.5     MASS_7.3-40      mnormt_1.5-1     munsell_0.4.2   
## [17] parallel_3.1.2   plyr_1.8.1       proto_0.3-10     psych_1.5.1     
## [21] Rcpp_0.11.4      reshape2_1.4.1   rmarkdown_0.5.1  scales_0.2.4    
## [25] stringr_0.6.2    tidyr_0.2.0      tools_3.1.2      yaml_2.1.13

Homework 4: Logistic regression and judges

your name here

Due: Sunday, April 19, 2015

The command below is helpful for debugging, please don’t change it