Last updated: 2018-11-14
This is a broad list of literature on smoothing Gaussian/non-Gaussian data.
Nonparametric methods
KNN
Kernel smoothing methods: Chapter 6 of ESL.
Local regression (linear and higher order): Loader, C. (2006). Local regression and likelihood. Springer Science & Business Media.
Note: Local regression makes no global assumptions about the function, but assumes that locally it can be well approximated by a member of a simple class of parametric functions. Only observations within a certain window are used (see the sketch at the end of this list).
Splines: regression splines, smoothing splines; more generally, reproducing kernel Hilbert spaces: Chapter 5 of ESL.
Locally adaptive estimators: wavelets (Mallat, S. (1999). A wavelet tour of signal processing. Elsevier), locally adaptive regression splines (a variant of smoothing splines that achieves better local adaptivity; Mammen, E., & van de Geer, S. (1997). Locally adaptive regression splines. The Annals of Statistics, 25(1), 387-413), and trend filtering (Kim, S. J., Koh, K., Boyd, S., & Gorinevsky, D. (2009). l_1 trend filtering. SIAM Review, 51(2), 339-360).
Additive models: Sparse additive models, Generalized additive mixed models.
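As a concrete illustration of the first two smoothers above, here is a minimal sketch using only base R: a local quadratic regression via loess() and a smoothing spline via smooth.spline(), fit to simulated Gaussian data. The simulated function and the span value are arbitrary choices made for illustration.

```r
set.seed(1)
n <- 200
x <- seq(0, 1, length.out = n)
f <- sin(2 * pi * x) + 0.5 * cos(6 * pi * x)   # true mean function
y <- f + rnorm(n, sd = 0.3)

# Local regression: quadratic fits inside a moving window (span = fraction of data used)
fit_loess  <- loess(y ~ x, degree = 2, span = 0.3)

# Smoothing spline: roughness penalty, smoothing parameter chosen by generalized cross-validation
fit_spline <- smooth.spline(x, y)

plot(x, y, col = "grey", pch = 16)
lines(x, f, lty = 2)                          # truth
lines(x, predict(fit_loess), col = "red")     # local regression fit
lines(fit_spline, col = "blue")               # smoothing spline fit
```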
More on trend filtering and additive models:
Trend filtering:
Wang, Y. X., Sharpnack, J., Smola, A. J., & Tibshirani, R. J. (2016). Trend filtering on graphs. The Journal of Machine Learning Research, 17(1), 3651-3691.
Ramdas, A., & Tibshirani, R. J. (2016). Fast and flexible ADMM algorithms for trend filtering. Journal of Computational and Graphical Statistics, 25(3), 839-858. The R package implementing this algorithm is glmgen (a related sketch follows this list).
Tibshirani, R. J. (2014). Adaptive piecewise polynomial estimation via trend filtering. The Annals of Statistics, 42(1), 285-323.
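Below is a small sketch of univariate trend filtering. It uses the genlasso package rather than glmgen (an assumption made here only because genlasso offers a simple path-plus-cross-validation interface); the piecewise-linear truth and noise level are made up for illustration.

```r
library(genlasso)

set.seed(1)
n <- 100
f <- c(seq(0, 1, length.out = 40), seq(1, 0.2, length.out = 30), rep(0.2, 30))  # piecewise-linear truth
y <- f + rnorm(n, sd = 0.1)

fit <- trendfilter(y, ord = 1)     # ord = 0: piecewise constant (fused lasso); ord = 1: piecewise linear
cv  <- cv.trendfilter(fit)         # k-fold cross-validation over the solution path
beta_hat <- as.numeric(coef(fit, lambda = cv$lambda.min)$beta)

plot(y, col = "grey", pch = 16)
lines(f, lty = 2)
lines(beta_hat, col = "red")
```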
Additive models:
Sadhanala, V., & Tibshirani, R. J. (2017). Additive Models with Trend Filtering. arXiv preprint arXiv:1702.05037.
Petersen, A., Witten, D., & Simon, N. (2016). Fused lasso additive model. Journal of Computational and Graphical Statistics, 25(4), 1005-1025.
Lou, Y., Bien, J., Caruana, R., & Gehrke, J. (2016). Sparse partially linear additive models. Journal of Computational and Graphical Statistics, 25(4), 1126-1140.
Generalized Sparse Additive Models
Chouldechova, A., & Hastie, T. (2015). Generalized additive model selection. arXiv preprint arXiv:1506.03850.
Exponential family
Nonparametric regression for exponential family:
Brown, L. D., Cai, T. T., & Zhou, H. H. (2010). Nonparametric regression in exponential families. The Annals of Statistics, 38(4), 2005-2046.
Cleveland, W. S., Mallows, C. L., & McRae, J. E. (1993). ATS methods: Nonparametric regression for non-Gaussian data. Journal of the American Statistical Association, 88(423), 821-835.
Zhang, H. H., & Lin, Y. (2006). Component selection and smoothing for nonparametric regression in exponential families. Statistica Sinica, 1021-1041.
Bianco, A. M., Boente, G., & Sombielle, S. (2011). Robust estimation for nonparametric generalized regression. Statistics & Probability Letters, 81(12), 1986-1994.
Fryzlewicz, P. (2017). Likelihood ratio Haar variance stabilization and normalization for Poisson and other non-Gaussian noise removal. arXiv preprint arXiv:1701.07263.
Local Likelihood Estimation
Generalized additive model
O'Sullivan, F., Yandell, B. S., & Raynor Jr, W. J. (1986). Automatic smoothing of regression functions in generalized linear models. Journal of the American Statistical Association, 81(393), 96-103.
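For exponential-family responses, a penalized-spline generalized additive model is conveniently fit with the mgcv package. The sketch below (simulated Poisson data, default thin-plate spline basis, REML smoothing-parameter selection) is only an illustration of this class of smoothers, not the method of any particular paper cited here.

```r
library(mgcv)

set.seed(1)
n  <- 300
x  <- runif(n)
mu <- exp(1 + sin(2 * pi * x))     # true Poisson mean
y  <- rpois(n, mu)

fit <- gam(y ~ s(x), family = poisson, method = "REML")
plot(fit, shade = TRUE)            # estimated smooth on the linear-predictor (log) scale

newd   <- data.frame(x = seq(0, 1, length.out = 200))
mu_hat <- predict(fit, newd, type = "response")
```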
Poisson
Kolaczyk, E. (1999). Bayesian Multiscale Models for Poisson Processes. Journal of the American Statistical Association, 94(447), 920-933. doi:10.2307/2670007
Fryzlewicz, P., & Nason, G. P. (2004). A Haar-Fisz algorithm for Poisson intensity estimation. Journal of Computational and Graphical Statistics, 13(3), 621-638.
Timmermann, K. E., & Nowak, R. D. (1999). Multiscale modeling and estimation of Poisson processes with application to photon-limited imaging. IEEE Transactions on Information Theory, 45(3), 846-862.
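A very simple baseline related to the variance-stabilization ideas above (of which Haar-Fisz is a more refined multiscale version) is to transform the counts, smooth on the approximately Gaussian scale, and transform back. The sketch below uses only base R; the intensity function and the naive inverse of the Anscombe transform are illustrative assumptions.

```r
set.seed(1)
n      <- 256
lambda <- 5 * exp(-((1:n - 128) / 40)^2) + 1   # smooth Poisson intensity
y      <- rpois(n, lambda)

z   <- 2 * sqrt(y + 3/8)                       # Anscombe transform: noise ~ Gaussian, unit variance
fit <- smooth.spline(1:n, z)                   # any Gaussian smoother can be used here
lambda_hat <- (fit$y / 2)^2 - 3/8              # naive inverse transform (ignores bias correction)

plot(1:n, y, col = "grey", pch = 16)
lines(1:n, lambda, lty = 2)
lines(1:n, lambda_hat, col = "red")
```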
Binomial
Marchand, P., & Marmet, L. (1983). Binomial smoothing filter: A way to avoid some pitfalls of least‐squares polynomial smoothing. Review of scientific instruments, 54(8), 1034-1041.
Hansen, K. D., Langmead, B., & Irizarry, R. A. (2012). BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions. Genome biology, 13(10), R83.
Note: BSmooth actually uses a local likelihood smoother and assumes that \(\text{logit}(\pi)\) is locally approximated by a second-degree polynomial. The data are assumed to follow a binomial distribution, and the coefficients defining the polynomial are estimated by fitting a weighted generalized linear model to the data inside a genomic window (a simplified sketch follows).
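The following is a hypothetical, much simplified sketch of that local-likelihood idea, not the bsseq/BSmooth implementation: at each target position, loci within a window are weighted by coverage times a tricube kernel, and a quasibinomial GLM with a second-degree polynomial in position is fit on the logit scale. The variable names (pos, meth, cov) and the window half-width are made up for illustration.

```r
smooth_one_position <- function(target, pos, meth, cov, halfwidth = 1000) {
  in_win <- abs(pos - target) <= halfwidth
  d   <- data.frame(pos = pos[in_win], meth = meth[in_win], cov = cov[in_win])
  tri <- (1 - (abs(d$pos - target) / halfwidth)^3)^3   # tricube kernel weight
  # quasibinomial avoids warnings about non-integer prior weights (coverage * kernel weight)
  fit <- glm(meth / cov ~ poly(pos, 2), family = quasibinomial,
             weights = cov * tri, data = d)
  predict(fit, newdata = data.frame(pos = target), type = "response")
}

# Example on simulated methylation-like data
set.seed(1)
pos  <- sort(sample(1:1e5, 2000))
pi0  <- plogis(2 * sin(pos / 5000))                    # true methylation level
cov  <- rpois(length(pos), 10) + 1                     # read coverage per locus
meth <- rbinom(length(pos), cov, pi0)                  # methylated counts
pi_hat <- sapply(pos[500:510], smooth_one_position, pos = pos, meth = meth, cov = cov)
```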
This reproducible R Markdown analysis was created with workflowr 1.1.1.