Welcome to the homepage of the **Statistics and Machine Learning Working Group** at **Carnegie Mellon University**!

We are group of faculty and students in Statistics and Machine Learning broadly interested in research at the intersection of these two disciplines.

Unless otherwise notified, our regular meeting is on **Tuesdays at noon in GHC-8102** every week. Please email *vsadhana AT cs.cmu.edu* if you would like join our mailing list.

Supported graciously by Microsoft Research.

**Abstract:**
In many real world problems such as online advertisement, web search, movie recommendations, personalized medical treatment. One can only observe outcomes of the actions (be it ads, search results, movies or treatments) that were taken. Let the algorithm used to generate these actions be a “policy”, we consider the problem of **off-policy** evaluation, where we collect data using a policy and then we try to evaluate the performance of a different policy. In other word, this is to answer the “What-If” question: what if the other policy was deployed, what would the outcomes be?
In this talk, I will do the following:

- Formulate the problem as a minimax-estimation problem and states its minimax risk. The result suggests that the simplest possible approach: “Importance Sampling” a.k.a “Inverse Propensity Scoring” in some sense cannot be improved, if no model assumptions are made.
- Illustrate that the state-of-the-art “Doubly Robust” estimator that incorporates model-based approaches is strictly suboptimal.
- Describe a new adaptive estimator that overcomes the issues with “Doubly Robust”.

Relevant slides: http://www.cs.cmu.edu/~yuxiangw/docs/minimax_eval_talk.pdf

**Abstract:**
Continuous treatments (e.g., doses) arise often in practice, but many available causal effect estimators are limited by either requiring parametric models for the effect curve, or by not allowing doubly robust covariate adjustment. We develop a novel kernel smoothing approach that requires only mild smoothness assumptions on the effect curve, and still allows for misspecification of either the treatment density or outcome regression. We derive asymptotic properties and give a procedure for data-driven bandwidth selection. The methods are illustrated via simulation and in a study of the effect of nurse staffing on hospital readmissions penalties.

Preprint: http://arxiv.org/abs/1507.00747

**Abstract:**
In this talk I will introduce a framework for constructing goodness of fit tests in both low and high-dimensional linear models. The idea involves applying regression methods to the scaled residuals following either an ordinary least squares or Lasso fit to the data, and using some proxy for prediction error as the final test statistic. We call this family Residual Prediction (RP) tests. We show that simulation can be used to obtain the critical values for such tests in the low-dimensional setting, and demonstrate that some form of the parametric bootstrap can do the same when the high-dimensional linear model is under consideration. We show that RP tests can be used to test for significance of groups or individual variables as special cases, and here they compare favourably with state of the art methods, but we also argue that they can be designed to test for as diverse model misspecifications as heteroscedasticity and different types of nonlinearity.
This is joint work with Peter Bühlmann.

Preprint: http://www.statslab.cam.ac.uk/~rds37/papers/RPtests

**Abstract:**
Modern learning algorithms are often seen as prediction-only tools, meaning that the interpretability and intuition provided by a more traditional modeling approach are sacrificed in order to achieve superior predictions. In this talk, we argue that this black-box perspective need not always be the case and develop formal statistical inference procedures for predictions generated by supervised learning ensembles. Ensemble methods based on bootstrapping, such as bagging and random forests, usually improve the predictive accuracy of individual trees, but fail to provide a framework in which distributional results can be easily determined. Instead of aggregating full bootstrap samples, we consider predicting by averaging over trees built on subsamples of the training set and demonstrate that the resulting estimator takes the form of a U-statistic. As such, predictions for individual feature vectors are asymptotically normal, allowing for confidence intervals to accompany predictions. In practice, a subset of subsamples is used for computational speed; here our estimators take the form of incomplete U-statistics and equivalent results are derived. We further demonstrate that this setup provides a framework for testing the significance of features. Moreover, the internal estimation method we develop allows us to estimate the variance parameters and perform these inference procedures at no additional computational cost. Demonstrations are provided using data from the ebird project hosted at Cornell University.

Here is the link to the relevant paper: http://jmlr.org/papers/v17/14-168.html

If you would like to present in an upcoming meeting, please signup here.

Topics of choice are flexible. As a guideline, here is a list of interesting papers that we hope to read this semster.