Thesis Defense: Statistical Inference for Topological Data Analysis

April 16 (Thursday) at 12 noon
Baker Hall 232 M

Speaker: Fabrizio Lecci

Abstract: Topological Data Analysis (TDA) is an emerging area of research at the intersection of algebraic topology and computational geometry, aimed at describing, summarizing and analyzing possibly high-dimensional data using low-dimensional algebraic representations. Recent advances in computational topology have made it possible to compute topological invariants from data. These novel data summaries have been used successfully in a variety of applied problems, and their potential for high-dimensional statistical inference appears to be significant. Nonetheless, the statistical properties of the data summaries produced in TDA and, more generally, of the usually heuristic data-analytic methods they are part of, have remained largely unexplored by statisticians. Our analysis relies on the tools of persistent homology, the main method of TDA for measuring the topological features of shapes and functions at different resolutions. A major part of our research also focuses on cluster trees, which provide a simple yet meaningful abstraction of the input domain of a function by means of the topological changes in its level sets. The main goal of this thesis is to contribute to the development of a statistical theory for TDA and to propose new, statistically principled methodologies that improve and extend the applicability of TDA algorithms. In particular, we (1) construct confidence sets to separate topological signal from topological noise; (2) explore new methods for topological dimension reduction; and (3) determine how our methods can help reduce computational costs, which currently represent an obstacle in TDA.