Discovery and Visualization of Nonstationary Causal Models

Speaker: Yu-Xiang Wang

Abstract: Machine learning (ML) has become one of the most powerful classes of tools for artificial intelligence, personalized web services and data science problems across fields. However, the use of ML on sensitive data sets involving medical, financial and behavioral data are greatly limited due to privacy concern. In this talk, we consider the problem of statistical learning with privacy constraints. Under Vapnik's general learning setting and the formalism of differential privacy (DP), we establish simple conditions that characterizes the private learnability, which reveals a mixture of positive and negative insight. We then identify generic methods that reuse existing randomness to effectively solve private learning in practice; and discuss a weaker notion of privacy — on-avg KL-privacy — that allows for orders-of-magnitude more favorable privacy-utility tradeoff, while preserving key properties of differential privacy. Moreover, we show that On-Average KL-Privacy is **equivalent** to generalization for a large class of commonly-used tools in statistics and machine learning that sample from Gibbs distributions---a class of distributions that arises naturally from the maximum entropy principle. Finally, I will describe a few exciting future directions that use statistics/machine learning tools to advance he state-of-the-art for privacy, and use privacy (and privacy inspired techniques) to formally address the problem of p-hacking in scientific discovery. References:

Yu-Xiang Wang, Jing Lei, and Stephen E. Fienberg. "Learning with differential privacy: Stability, learnability and the sufficiency and necessity of ERM principle." *Journal of Machine Learning Research* 17.183 (2016): 1-40.
Yu-Xiang Wang, Stephen E. Fienberg, and Alexander J. Smola. "Privacy for Free: Posterior Sampling and Stochastic Gradient Monte Carlo." *ICML*. 2015.
Yu-Xiang Wang, Jing Lei, and Stephen E. Fienberg. "On-Average KL-Privacy and Its Equivalence to Generalization for Max-Entropy Mechanisms." *International Conference on Privacy in Statistical Databases*. Springer International Publishing, 2016.

Towards practical machine learning with differential privacy and beyond