Active and compressive queries are powerful tools for minimizing sample complexity in many problems. Active queries refer to feedback-driven data collection, while compressive queries refer to collecting data in a compressed representation. In this project, we focus on leveraging these types of queries to efficiently learn structure in data, in the form of clusters, graphs, subspaces, and manifolds.
In the active query setting, our work extends previous research, which has mostly focused on "supervised" active learning, where the learner can only choose whether or not to observe the label of a data point. In the unsupervised context, we focus on whether or not to observe a data point, or a particular feature of a data point, given that the data has some (unknown) structure. The questions we are investigating include: How many active pairwise similarity measurements are needed for active clustering? How many actively chosen entries of a matrix or tensor are needed for completion and prototype selection? How many active samples are needed for learning a graphical model? And in these problems, what other benefits (better computation, storage, a larger model class, ...) do active queries provide?
In the compressive query setting, we focus on active compressive queries, where the subset of data points or features that are compressed can be adaptively changed based on feedback from prior measurements. Another theme is to use a different compression operator per data point, which can reduce the number of queries below what is possible with fixed compression, provided there are enough data points.
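As a toy illustration of compressive queries (our own sketch, not code from the papers below; the dimensions and recovery step are hypothetical choices): each query observes a random linear projection of a data point, and when the points lie on a known low-dimensional subspace, far fewer measurements than ambient dimensions suffice to recover a point.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, m = 50, 3, 10          # ambient dim, subspace dim, measurements per point

# A k-dimensional subspace containing the data points.
U, _ = np.linalg.qr(rng.standard_normal((d, k)))
x = U @ rng.standard_normal(k)               # a data point in the subspace

# Compressive query: observe only m << d random projections of x.
A = rng.standard_normal((m, d)) / np.sqrt(m)
y = A @ x

# With the subspace known, m >= k measurements suffice:
# solve min_c ||A U c - y||_2 and set x_hat = U c.
c, *_ = np.linalg.lstsq(A @ U, y, rcond=None)
x_hat = U @ c
print(np.allclose(x_hat, x))                 # recovery up to numerical error
```

The interesting regime, studied in the papers below, is when the subspace itself is unknown and must be learned from such compressed observations.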
This research has been supported in part by NSF CAREER award IIS-1247658, grants IIS-1252412, IIS-1116458, AFOSR Young Investigator award FA9550-14-1-0285 and grant FA9550-10-1-0382.
Some relevant papers:
"Column Subset Selection with Missing Data via Active Sampling", Yining Wang and Aarti Singh, 2015.
"Subspace Learning from Extremely Compressed Measurements", Akshay Krishnamurthy, Martin Azizyan, and Aarti Singh, 2014. (Code)
"Low-Rank Matrix and Tensor Completion via Adaptive Sampling", Akshay Krishnamurthy and Aarti Singh, 2013. (Code)
"Recovering Graph-Structured Activations using Adaptive Compressive Measurements", Akshay Krishnamurthy, James Sharpnack, and Aarti Singh, 2013.
"Efficient Active Algorithms for Hierarchical Clustering", Akshay Krishnamurthy, Sivaraman Balakrishnan, Min Xu, and Aarti Singh, 2012. (Code)
"Trend filtering" refers to a family of estimators for locally adaptive nonparametric regression, defined by the minimization of a penalized least squares criterion. The solutions of this problem take the form of piecewise polynomial functions, with adaptively chosen knot points. Trend filtering balances the strengths of other common piecewise polynomial estimators: smoothing splines are highly computationally efficient but are not minimax optimal (for estimating functions whose derivatives are of bounded variation); locally adaptive regression splines are minimax optimal but are relatively inefficient in terms of computation; trend filtering is both minimax optimal and computationally comparable to smoothing splines.
Our current work is focused on extending the trend filtering framework to nonparametric estimation in high dimensions and over graphs.
Some relevant papers:
"Fast and Flexible ADMM Algorithms for Trend Filtering", Aaditya Ramdas and Ryan Tibshirani, 2014.
"Trend Filtering on Graphs", Yu-Xiang Wang, James Sharpnack, Alex Smola, and Ryan Tibshirani, 2014.
"The Falling Factorial Basis and Its Statistical Applications", Yu-Xiang Wang, Alex Smola, and Ryan Tibshirani, 2014.
"Adaptive Piecewise Polynomial Estimation via Trend Filtering", Ryan Tibshirani, 2013.
Shards are low-dimensional structures in the ambient space that represent high-density regions. Common examples of shards include density level sets, local modes of the density, filaments, and density ridges. Detecting shards in data is important in many scientific areas, including astronomy, biology, and neuroscience. Our work focuses on (i) deriving the asymptotic behavior of shard estimators, (ii) establishing valid procedures for statistical inference in shard hunting, and (iii) visualizing shards in high dimensions.
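Local modes, among the simplest shards, can be found with the classical mean shift algorithm, which ascends a kernel density estimate; the following minimal sketch (our own illustration, with an arbitrary bandwidth) clusters points by the mode they converge to.

```python
import numpy as np

def mean_shift(X, h, n_iter=100):
    """Shift each point toward a local mode of the Gaussian-kernel density
    estimate with bandwidth h: a basic tool for mode clustering."""
    Z = X.copy()
    for _ in range(n_iter):
        for i in range(len(Z)):
            w = np.exp(-np.sum((X - Z[i]) ** 2, axis=1) / (2 * h ** 2))
            Z[i] = w @ X / w.sum()          # kernel-weighted mean of the data
    return Z

# Two well-separated blobs: mean shift should find two density modes.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
modes = mean_shift(X, h=0.5)
labels = (modes[:, 0] > 1.5).astype(int)     # cluster by which mode was reached
```

The papers below go well beyond this, treating ridges and level sets and, crucially, the inferential question of which detected structures are statistically significant.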
Some relevant papers:
"Density Level Sets: Asymptotics, Inference, and Visualization", Yen-Chi Chen, Christopher Genovese, and Larry Wasserman, 2015.
"Nonparametric Modal Regression", Yen-Chi Chen, Christopher Genovese, Ryan Tibshirani, and Larry Wasserman, 2014.
"Asymptotic Theory for Density Ridges", Yen-Chi Chen, Christopher Genovese, and Larry Wasserman, 2014.
"Generalized Mode and Ridge Estimation", Yen-Chi Chen, Christopher Genovese, and Larry Wasserman, 2014.
"A Comprehensive Approach to Mode Clustering", Yen-Chi Chen, Christopher Genovese, and Larry Wasserman, 2014.
"Nonparametric Ridge Estimation", Christopher Genovese, Marco Perone-Pacifico, Isabella Verdinelli, and Larry Wasserman, 2014.
Sparse principal component analysis (PCA) is a tool for simultaneous dimension reduction and variable selection in high-dimensional multivariate data. In addition to finding a few directions in the high-dimensional space that explain most of the variability in the observed data, sparse PCA requires that these directions involve only a few of the variables. The theoretical and methodological questions we are trying to answer include: (i) How does the assumption of sparsity help in accurately estimating the principal components? (ii) How can the sparsity structure be efficiently incorporated into algorithms? (iii) How can we tell whether the sparsity assumption is appropriate for a given PCA problem? (iv) How can sparse PCA algorithms be used in other relevant problems, such as high-dimensional clustering?
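To make the goal concrete, here is a simple thresholded power-iteration heuristic on a spiked covariance model (an illustrative sketch with arbitrary tuning, not the Fantope relaxation or the estimators analyzed in the papers below): the leading direction is recovered while involving only a few of the variables.

```python
import numpy as np

def sparse_pc(S, lam, n_iter=100):
    """Leading sparse principal component by alternating a power step
    with soft-thresholding, then renormalizing."""
    v = np.linalg.eigh(S)[1][:, -1]            # warm start: ordinary leading PC
    for _ in range(n_iter):
        w = S @ v
        w = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)   # soft-threshold
        nrm = np.linalg.norm(w)
        if nrm == 0:
            break
        v = w / nrm
    return v

# Spiked covariance: the signal direction u touches only 5 of 30 variables.
rng = np.random.default_rng(3)
u = np.zeros(30)
u[:5] = 1 / np.sqrt(5)
Z = rng.standard_normal((200, 30))
X = Z + 2.0 * np.outer(rng.standard_normal(200), u)   # cov = I + 4 u u^T
S = X.T @ X / 200
v = sparse_pc(S, lam=0.3)
print(np.count_nonzero(v), round(float(abs(v @ u)), 2))
```

Question (i) above is visible even in this toy: thresholding zeroes out the many pure-noise coordinates that would otherwise contaminate the estimated direction.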
Some relevant papers:
"Localized Functional Principal Component Analysis", Kehui Chen and Jing Lei, 2015.
"Sparsistency and Agnostic Inference in Sparse PCA", Jing Lei and Vincent Vu, 2014.
"Minimax Sparse Principal Subspace Estimation in High Dimensions", Vincent Vu, and Jing Lei, 2012.
"Fantope Projection and Selection: A Near-Optimal Convex Relaxation of Sparse PCA", Vincent Vu, Juhee Cho, Jing Lei, and Karl Rohe, 2013.
Two-sample testing deals with deciding whether two random variables have the same distribution, given samples from both. This problem has applications in biology and neuroscience, and one is often interested in the nonparametric setting, where no parametric assumptions are made about the underlying distributions. We are trying to understand the behavior of various two-sample tests in the high-dimensional setting, as well as other directions, including connecting two-sample testing with classification and creating provably valid sequential testing procedures.
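A nonparametric two-sample test can be built from almost any statistic via permutations; a minimal sketch (using a simple mean-difference statistic under a mean-shift alternative, as a stand-in for the kernel and distance statistics studied below):

```python
import numpy as np

def perm_test(x, y, stat, n_perm=1000, seed=0):
    """Permutation two-sample test: under the null, pooling and reshuffling
    the samples leaves the statistic's distribution unchanged."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    obs = stat(x, y)
    exceed = 0
    for _ in range(n_perm):
        z = rng.permutation(pooled)
        exceed += stat(z[:len(x)], z[len(x):]) >= obs
    return (1 + exceed) / (1 + n_perm)         # permutation p-value

mean_diff = lambda a, b: abs(a.mean() - b.mean())

rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, 100)
y = rng.normal(0.7, 1.0, 100)                  # mean-shift alternative
pval = perm_test(x, y, mean_diff)
print(pval)
```

The high-dimensional questions studied in the papers below concern how the power of such tests behaves as the dimension of x and y grows.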
Some relevant papers:
"On the High Dimensional Power of a Linear-Time Two Sample Test under Mean-Shift Alternatives", Sashank Reddi, Aaditya Ramdas, Barnabas Poczos, Aarti Singh, and Larry Wasserman, 2015.
"On the Decreasing Power of Kernel and Distance based Nonparametric Hypothesis Tests in High Dimensions", Sashank Reddi, Aaditya Ramdas, Barnabas Poczos, Aarti Singh, and Larry Wasserman, 2015.
Another direction being pursued is hypothesis testing after model selection. In regression, for example, basic selection and estimation procedures like forward stepwise regression and the lasso are well understood in terms of their estimation capabilities, but not in terms of their inferential properties. Our work has focused on formulating test statistics for forward stepwise and the lasso, for the purpose of post-model-selection hypothesis testing, whose null distributions are known either exactly (in finite samples) or asymptotically, under a normal error model. Currently we are broadening this perspective to allow for non-normal or even heteroskedastic errors, by calling on tools like the bootstrap and sample splitting.
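Sample splitting, the simplest of these fixes, is easy to sketch (a toy illustration, with marginal-correlation screening standing in for forward stepwise or the lasso): select variables on one half of the data, then run classical inference on the other half, so that the selection event is independent of the data used for testing.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[0] = 1.0                                  # only the first variable matters
y = X @ beta + rng.standard_normal(n)

# Select on the first half: keep the 2 predictors most correlated with y.
half = n // 2
score = np.abs(X[:half].T @ y[:half]) / half
selected = np.argsort(score)[-2:]

# Test on the second half: because selection never saw this half, ordinary
# least squares t-statistics retain their classical null distribution.
X2, y2 = X[half:][:, selected], y[half:]
bhat, *_ = np.linalg.lstsq(X2, y2, rcond=None)
resid = y2 - X2 @ bhat
sigma2 = resid @ resid / (half - len(selected))
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X2.T @ X2)))
tstat = bhat / se
print(selected, np.round(tstat, 1))
```

Splitting pays for its validity with a loss of power, which is one motivation for the exact post-selection tests developed in the papers below, which use all of the data.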
Some relevant papers:
"Uniform Asymptotic Inference and the Bootstrap After Model Selection", Ryan Tibshirani, Alessandro Rinaldo, Rob Tibshirani, and Larry Wasserman, 2015.
"Exact Post-selection Inference for Sequential Regression Procedures"Ryan Tibshirani, Jonathan Taylor, Richard Lockhart, and Rob Tibshirani, 2014.
"A Significance Test for the Lasso", Richard Lockhart, Jonathan Taylor, Ryan Tibshirani, and Rob Tibshirani, 2014.
Topological Data Analysis (TDA) is a collection of methods for extracting useful features from data. Clustering is a simple example of TDA. More sophisticated forms of TDA include manifold estimation, ridge hunting and persistent homology. The latter is concerned with multiscale methods for finding holes (low density regions) in point clouds.
Please visit http://www.stat.cmu.edu/topstat/ for more information.