- Background on the development of targeted learning
- Theory of TMLE
- Application of TMLE in R
- Extensions of TMLE
This presentation, the data (with documentation) and R code is available at: https://github.com/sfgrey/Super-Learner-Presentation.git
April 12, 2016
"Essentially, all models are wrong, but some are useful"
- George Box, 1979
For many years, this was the mantra of statisticians regarding the development of statistical models
In the 1990s, an awareness developed among statisticians (Breiman, Harrell) that relying on convenient but misspecified parametric models was the wrong approach
Simultaneously, computer scientists and some statisticians developed the field of machine learning to address the limitations of parametric models
Targeted learning combines advanced machine learning with efficient semiparametric estimation to provide a framework for answering causal questions from data
A central motivation is the belief that statisticians too often treat estimation as art, not science
Random variable \(O\), observed \(n\) times; in a simple case without common issues such as missingness and censoring, it can be defined as \(O = (W, A, Y) \sim P_0\), where \(W\) denotes baseline covariates, \(A\) a treatment indicator, and \(Y\) an outcome
This data structure makes for an effective example, but data structures found in practice are much more complicated
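As a concrete illustration, here is a minimal R sketch (purely hypothetical simulated data; the variable names are ours, not from the presentation) that draws \(n\) i.i.d. copies of \(O = (W, A, Y)\); it is reused in later examples:

set.seed(1)
n <- 1000
W <- rnorm(n)                                # a single baseline covariate
A <- rbinom(n, 1, plogis(0.5 * W))           # binary treatment depends on W
Y <- rbinom(n, 1, plogis(-1 + A + 0.8 * W))  # binary outcome
O <- data.frame(W = W, A = A, Y = Y)         # n i.i.d. copies of O = (W, A, Y)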
General case: Observe \(n\) i.i.d. copies of random variable \(O\) with probability distribution \(P_0\)
The data-generating distribution \(P_0\) is also known to be an element of a statistical model \(M\): \(P_0 \in M\)
A statistical model \(M\) is the set of possible probability distributions for \(P_0\); it is a collection of probability distributions
If all we know is that we have \(n\) i.i.d. copies of \(O\), this can be our statistical model, which we call a non-parametric statistical model
A statistical model can be augmented with additional non-testable assumptions, allowing one to enrich the interpretation of \(\Psi(P_0)\); this does not change the statistical model
We refer to the statistical model augmented with possibly additional non-testable assumptions as a causal model
In the Neyman-Rubin causal inference framework, assumptions include
Potential outcomes: every individual \(i\) has a different potential outcome depending on their treatment "assignment": \(Y_i(1)\) if treated and \(Y_i(0)\) if not treated
The "fundamental problem of causal inference" is that, for each individual, we can only observe one of these potential outcomes
If we randomly assign individuals to receive \(A\), then the treatment groups will be equivalent (exchangeable) and the causal effect can be identified: \(E[Y_i(1)] - E[Y_i(0)] = E[Y_i \mid A_i = 1] - E[Y_i \mid A_i = 0]\)
Define the parameter of the probability distribution \(P\) as a function of \(P\): \(\Psi(P)\)
In a causal inference framework, a target parameter for the effect of \(A\) could be \(\Psi(P_0) = E_W\left[E_0(Y \mid A = 1, W) - E_0(Y \mid A = 0, W)\right]\)
Or, if we wish to use a ratio instead of a difference: \(\Psi(P_0) = E_W[\bar{Q}_0(1, W)] / E_W[\bar{Q}_0(0, W)]\), where \(\bar{Q}_0(A, W) = E_0(Y \mid A, W)\)
The target parameter depends on \(P_0\) through the conditional mean \(\bar{Q}_0(A, W)\) and the marginal distribution of \(W\); we can therefore also write \(\Psi(Q_0)\), where \(Q_0\) denotes these two components
An estimate is written \(\psi_n = \hat{\Psi}(P_n)\), where \(\hat{\Psi}(P_n)\) is an estimator of \(\Psi(P_0)\), shortened to \(\psi_n\)
An estimator is an algorithm that can be applied to any empirical distribution to provide a mapping from the empirical distribution to the parameter space
Both effect and prediction research questions are inherently estimation questions, but they are distinct in their goals
Maximum-likelihood-based substitution estimators are of the type \(\psi_n = \Psi(Q_n)\), where the estimate is obtained by plugging \(Q_n\) into the mapping \(\Psi\)
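As an illustration, a minimal sketch of a substitution (G-computation) estimator of the additive effect, using the simulated data above (glm() stands in for a more flexible estimator of \(\bar{Q}_0\)):

# Substitution (plug-in) estimator: fit Q-bar, then average predicted contrasts
Qfit  <- glm(Y ~ A + W, data = O, family = binomial)
Q1    <- predict(Qfit, newdata = transform(O, A = 1), type = "response")
Q0    <- predict(Qfit, newdata = transform(O, A = 0), type = "response")
psi_n <- mean(Q1 - Q0)  # plug Q_n and the empirical distribution of W into Psi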
Estimating-equation-based methods use an estimating function, a function of the data \(O\) and the parameter of interest. If \(D(\psi)(O)\) is an estimating function, then \(\psi_n\) is a solution that satisfies: \(\sum_{i=1}^n D(\psi_n)(O_i) = 0\)
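For contrast, a sketch of an estimating-equation-based estimator (inverse-probability weighting, chosen here purely for illustration) on the same simulated data:

# IPW estimator: psi_n solves the empirical estimating equation
# mean( (A/g(1|W) - (1-A)/g(0|W)) * Y - psi ) = 0
gfit    <- glm(A ~ W, data = O, family = binomial)  # estimate g(1|W) = P(A=1|W)
g1      <- predict(gfit, type = "response")
psi_ipw <- mean((O$A / g1 - (1 - O$A) / (1 - g1)) * O$Y)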
TMLE is an iterative procedure that:
Generates an initial (super learner) estimate of the relevant part of the data-generating distribution \(P_0\), denoted \(\bar{Q}_n^0\)
Updates the initial estimate, possibly using an estimate of a nuisance parameter (the propensity score \(g_0\))
Produces a well-defined, unbiased, efficient substitution estimator of a target parameter \(\Psi\)
- Is semi-parametric: there is no need to make parametric assumptions about the data-generating distribution
- Uses machine learning techniques to get initial estimates
Step 1: Use the super learner procedure to generate an initial estimate of \(\bar{Q}_0(A, W) = E_0(Y \mid A, W)\), denoted \(\bar{Q}_n^0(A, W)\)
Step 2: Estimate \(g_0(A \mid W)\), the conditional distribution of \(A\) given \(W\) (a propensity score, called a nuisance parameter if \(A\) is randomized), denoted \(g_n\)
Step 3: Construct a "clever covariate" that will be used to fluctuate the initial estimate: \(H_n^*(A, W) = \frac{I(A = 1)}{g_n(1 \mid W)} - \frac{I(A = 0)}{g_n(0 \mid W)}\)
Step 4: Use maximum likelihood to obtain \(\epsilon_n\), the estimated coefficient of \(H_n^*(A, W)\) in: \(\operatorname{logit} \bar{Q}_n^1(A, W) = \operatorname{logit} \bar{Q}_n^0(A, W) + \epsilon_n H_n^*(A, W)\)
Step 5: Plug the updated estimate \(\bar{Q}_n^1\) and the empirical distribution of \(W\) into the parameter mapping to obtain the substitution estimator: \(\psi_n = \frac{1}{n} \sum_{i=1}^n \left[\bar{Q}_n^1(1, W_i) - \bar{Q}_n^1(0, W_i)\right]\)
Step 6: Inference using an influence curve (IC)
The IC is a function that describes the estimator's behavior under slight perturbations of the empirical distribution
The IC has mean 0 at the true parameter value, so it can be used as an estimating equation: \(\sum_{i=1}^n IC_n(O_i) = 0\); for the additive effect, \(IC_n(O) = H_n^*(A, W)\,(Y - \bar{Q}_n^1(A, W)) + \bar{Q}_n^1(1, W) - \bar{Q}_n^1(0, W) - \psi_n\)
The empirical mean of the IC for a regular asymptotically linear (RAL) estimator provides a linear approximation of the estimator; thus, the variance of the IC provides the asymptotic variance of the estimator
We then calculate the sample variance of the estimated influence curve values, \(S^2(IC_n) = \frac{1}{n} \sum_{i=1}^n \left(IC_n(O_i) - \overline{IC}_n\right)^2\), and the standard error \(\sqrt{S^2(IC_n)/n}\)
After which standard errors, confidence intervals and p-values can be calculated in the standard fashion
Also possible to utilize bootstrapping to calculate standard errors, but computationally expensive
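To make Steps 1-6 concrete, here is a minimal by-hand sketch for the additive effect using the simulated data from earlier (glm() stands in for super learner in Steps 1 and 2; all object names are ours, not from the presentation):

# Step 1: initial estimate of Q-bar (glm stands in for super learner here)
Qfit <- glm(Y ~ A + W, data = O, family = binomial)
Q0A  <- predict(Qfit, type = "response")  # Q_n^0(A, W)
Q01  <- predict(Qfit, newdata = transform(O, A = 1), type = "response")
Q00  <- predict(Qfit, newdata = transform(O, A = 0), type = "response")

# Step 2: estimate the propensity score g(1|W) = P(A = 1 | W)
g1W <- predict(glm(A ~ W, data = O, family = binomial), type = "response")

# Step 3: clever covariate H*(A, W), and its values at A = 1 and A = 0
H_AW <- O$A / g1W - (1 - O$A) / (1 - g1W)
H_1W <- 1 / g1W
H_0W <- -1 / (1 - g1W)

# Step 4: epsilon is the MLE coefficient of H* in a logistic fluctuation model
eps <- coef(glm(O$Y ~ -1 + H_AW + offset(qlogis(Q0A)), family = binomial))

# Step 5: updated estimates, then the substitution (plug-in) estimator
Q11   <- plogis(qlogis(Q01) + eps * H_1W)  # Q_n^1(1, W)
Q10   <- plogis(qlogis(Q00) + eps * H_0W)  # Q_n^1(0, W)
psi_n <- mean(Q11 - Q10)

# Step 6: influence-curve-based inference
Q1A <- plogis(qlogis(Q0A) + eps * H_AW)    # Q_n^1(A, W)
IC  <- H_AW * (O$Y - Q1A) + Q11 - Q10 - psi_n
se  <- sqrt(var(IC) / n)
c(estimate = psi_n, lower = psi_n - 1.96 * se, upper = psi_n + 1.96 * se)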
The tmle package was created by Susan Gruber in collaboration with Mark van der Laan
library(tmle)
effA1 <- tmle(Y = Y, A = A, W = W,
              Q.SL.library = c("SL.glm"),  # placeholder; list prediction algorithms here
              g.SL.library = c("SL.glm"),  # placeholder; list prediction algorithms here
              family = "binomial", cvQinit = TRUE, verbose = TRUE)
- Y: the outcome
- A: binary treatment indicator, 1 = treatment, 0 = control
- W: a matrix of covariates
- Q.SL.library: a character vector of prediction algorithms for the initial \(Q\)
- g.SL.library: a character vector of prediction algorithms for \(g\)
- family: 'gaussian' or 'binomial' to describe the error distribution
- cvQinit: estimates cross-validated predicted values for the initial \(Q\), if TRUE
- id: subject or group identifier if observations are related; causes corrected standard errors to be calculated
- verbose: helpful to set this to TRUE to see the progress of the estimation
- Delta: indicator of missing outcome or treatment assignment
- Z: binary mediating variable
Permits the use of multiple machine learning algorithms to generate the initial estimate of \(Q\)
Currently, SL should not be used to estimate \(g\)
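A runnable toy call on the simulated data from earlier (the library choices are illustrative only, not a recommendation):

library(tmle)
fit <- tmle(Y = O$Y, A = O$A, W = data.frame(W = O$W),
            Q.SL.library = c("SL.glm", "SL.mean"),
            g.SL.library = c("SL.glm"),
            family = "binomial", cvQinit = TRUE)
print(fit)  # reports additive effect, relative risk, and odds ratio with IC-based CIs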
Does placing a right heart catheter change 30-day mortality?
The ARF dataset has 2490 patients admitted to an ICU and 47 variables
tmle() only works with numeric matrices; arguments can be specified in-line, e.g., Y = dataset$Y
Data must be pre-processed:
- Missing \(Y\) or \(X\) values must be removed or imputed

# Impute missing X values #
library("VIM")

# Scale cont vars #
library(arm)
cont <- c("age","edu","das2d3pc","aps1","scoma1","meanbp1","wblc1","hrt1",
          "resp1","temp1","pafi1","alb1","hema1","bili1","crea1","sod1",
          "pot1","paco21","ph1","wtkilo1")
arf[,cont] <- data.frame(apply(arf[cont], 2, function(x) {
  x <- rescale(x, "full")  # standardizes by centering and dividing by 2 sd
}))
rm(cont)

# Create dummy vars #
arf$rhc   <- ifelse(arf$swang1 == "RHC", 1, 0)
arf$white <- ifelse(arf$race == "white", 1, 0)
arf$swang1 <- arf$race <- NULL
system.time({
  eff <- tmle(Y = arf$death, A = arf$rhc, W = arf[1:44],
              Q.SL.library = c("SL.gam", "SL.knn", "SL.step"),
              g.SL.library = c("SL.glmnet"),
              family = "binomial", cvQinit = TRUE, verbose = TRUE)
})[[3]]  # Obtain computation time
Run time on laptop: 15.43 min.
print(eff)
Odds Ratio
Parameter Estimate: 1.207
p-value: 0.063956
95% Conf Interval: (0.98914, 1.4728)
Interpretation: since the 95% confidence interval for the odds ratio includes 1, right heart catheterization does not appear to change 30-day mortality
Incorporates machine learning so the limitations of parametric methods are avoided
Is “double robust”, meaning that estimates are asymptotically unbiased if either the initial SL estimate of \(Q\) or the propensity score \(g\) is correctly specified
Can be extended to a variety of situations
van der Laan, M.J. and Rubin, D. (2006), Targeted Maximum Likelihood Learning. The International Journal of Biostatistics, 2(1). http://www.bepress.com/ijb/vol2/iss1/11/
van der Laan, M.J. and Rose, S. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer, Berlin Heidelberg New York, 2011. http://www.targetedlearningbook.com/
van der Laan, M.J., Polley, E.C. and Hubbard, A.E. (2007), Super Learner. Statistical Applications in Genetics and Molecular Biology, 6(1), Article 25.
Gruber, S. and van der Laan, M.J. (2012), tmle: An R Package for Targeted Maximum Likelihood Estimation. Journal of Statistical Software, 51(13), 1-35. http://www.jstatsoft.org/v51/i13/
Sekhon, Jasjeet (2007). "The Neyman-Rubin Model of Causal Inference and Estimation via Matching Methods" (PDF). The Oxford Handbook of Political Methodology. http://sekhon.berkeley.edu/papers/SekhonOxfordHandbook.pdf
Hampel, F.R. (1974), The Influence Curve and its Role in Robust Estimation. Journal of the American Statistical Association, 69(346): 383-393.
tmle: Targeted Maximum Likelihood Estimation https://cran.r-project.org/web/packages/tmle/index.html
SuperLearner: Super Learner Prediction https://cran.r-project.org/web/packages/SuperLearner/index.html
M. Petersen and L. Balzer. Introduction to Causal Inference. UC Berkeley, August 2014. http://www.ucbbiostat.com/
This presentation, the data (with documentation) and R code is available at: https://github.com/sfgrey/Super-Learner-Presentation.git