Using Knockoffs to Find Important Variables with Statistical Guarantees // Lucas Janson, Stats Dept., Harvard University

Date: Friday, November 17, 2017, 12:30pm to 2:00pm

Location: Harvard University, Jefferson 250, 17 Oxford Street, Cambridge MA 02138

Many contemporary large-scale applications, from genomics to advertising, involve linking a response of interest to a large set of potential explanatory variables in a nonlinear fashion, such as when the response is binary. Although this modeling problem has been extensively studied, it remains unclear how to effectively select important variables while controlling the fraction of false discoveries, even in high-dimensional logistic regression, not to mention general high-dimensional nonlinear models. To address this practical problem, Dr. Janson and his colleagues propose a new framework of model-X knockoffs, which reinterprets the knockoff procedure (Barber and Candès, 2015), originally designed for controlling the false discovery rate in linear models. Model-X knockoffs can deal with arbitrary (and unknown) conditional models and any dimension, including when the number of explanatory variables p exceeds the sample size n. Their approach requires that the design matrix be random (with independent and identically distributed rows) and that the distribution of the explanatory variables be known, although they show preliminary evidence that their procedure is robust to unknown or estimated distributions. Because they require no knowledge of, or assumptions about, the conditional distribution of the response, they effectively shift the burden of knowledge from the response to the explanatory variables. This contrasts with the canonical model-based approach, which assumes a parametric model for the response but very little about the explanatory variables. To their knowledge, no other procedure solves the controlled variable selection problem in such generality. In the restricted settings where competitors exist, they demonstrate the superior power of knockoffs through simulations. They have applied their procedure to data from a case-control study of Crohn’s disease in the United Kingdom, making twice as many discoveries as the original analysis of the same data.

This is joint work with Emmanuel Candès at Stanford, and Yingying Fan and Jinchi Lv at USC.

IACS Seminars are free and open to the public. Lunch will be served from 12:30pm to 1:00pm on a first-come, first-served basis. The talk will begin promptly at 1:00pm.
