Previous Seminars

November 17, 2017

Using Knockoffs to Find Important Variables with Statistical Guarantees

Speaker: Lucas Janson, Assistant Professor in Statistics, Harvard University

Many contemporary large-scale applications, from genomics to advertising, involve linking a response of interest to a large set of potential explanatory variables in a nonlinear fashion, such as when the response is binary. Although this modeling problem has been extensively studied, it remains unclear how to effectively select important variables while controlling the fraction of false discoveries, even in high-dimensional logistic regression, not to mention general high-dimensional nonlinear models. To address this practical problem, Dr. Janson and his colleagues propose a new framework of model-X knockoffs, which reinterprets, from a different perspective, the knockoff procedure (Barber and Candès, 2015) originally designed for controlling the false discovery rate in linear models. Model-X knockoffs can deal with arbitrary (and unknown) conditional models and any dimension, including when the number of explanatory variables p exceeds the sample size n. Their approach requires the design matrix to be random (independent and identically distributed rows) with a known distribution for the explanatory variables, although they show preliminary evidence that their procedure is robust to unknown/estimated distributions. Because they require no knowledge of or assumptions about the conditional distribution of the response, they effectively shift the burden of knowledge from the response to the explanatory variables, in contrast to the canonical model-based approach, which assumes a parametric model for the response but very little about the explanatory variables. To their knowledge, no other procedure solves the controlled variable selection problem in such generality, but in the restricted settings where competitors exist, they demonstrate the superior power of knockoffs through simulations. They have applied their procedure to data from a case-control study of Crohn’s disease in the United Kingdom, making twice as many discoveries as the original analysis of the same data.

This is joint work with Emmanuel Candès at Stanford and Yingying Fan and Jinchi Lv at USC.

November 10, 2017

Extreme Scale Computing, Big Data Science and Web of Life Network Science

Speaker: Manju Manjunathaiah, Lecturer on Computation, Harvard University

The first part of Professor Manjunathaiah’s talk will explore two leading formal models of concurrency in computer science, the polyhedral model and CSP, as a distinct approach to extreme scale computing. In the second part, he will present three grand challenge areas as exemplars of extreme scale big data science: environmental science (climate modelling), genomics life science (tree of life) and computational neuroscience (deep learning). Here the underlying scaling characteristics are energy-efficiency, resilience and predictive capability. Prof. Manjunathaiah will highlight some current research that explores distance minimisation, self-organising and asynchronous data flow computational principles for extreme scale data science.

The final part of the talk presents curiosity-driven exploratory research under the premise of “theoretical computer science meets biological phenomena”: radical advancements in deep measurements of all life on this planet are bringing two grand biological phenomena into the realm of computer science and, combined with deep computations at extreme scales, offer new avenues for a big data science built on productive cross-collaboration between the sciences. Professor Manjunathaiah will highlight some computational principles for investigating this grand goal of modelling the continuous ecosystem dynamics of the “web of life”, accounting for the “information domain” (network dynamics) of biological phenomena along with matter and energy. Networks permeate all scales of life, from genes to the web of life.

Dr. Manjunathaiah's presentation slides can be found here.

November 3, 2017

Adventures in Analytics

Speaker: Bob Rogers, Chief Data Scientist for Analytics and AI, Intel

The world is an amazing place for data scientists. Bob Rogers, Chief Data Scientist for Analytics and AI at Intel Corporation, will describe his experiences as a leader in analytics and AI. He will share his perspective on what makes a great data scientist, how he defines data science, analytics and Artificial Intelligence, real insights into the day-to-day life of a data scientist, and an overview of the model creation pipeline. Bob believes that the opportunities to apply advanced analytics to improve the lives of people are boundless, and will demonstrate his work with the “Intel Inside, Safer Children Outside” program, which applies analytics and AI to fight child sex trafficking and child exploitation online.

October 27, 2017

Reinforcement Learning for Healthcare

Speaker: Finale Doshi-Velez, Assistant Professor of Computer Science, Harvard University

Many healthcare problems require thinking not only about the immediate effect of a treatment, but also about its possible long-term ramifications. For example, a certain drug cocktail may cause an immediate drop in viral load in HIV, but also give rise to resistance mutations that will reduce the number of viable treatment options in the future. Within machine learning, the reinforcement learning framework is designed to think about decision-making under uncertainty when decisions may have long-lasting effects. However, translating these formalisms to real settings with messy, partial data creates many challenges. Prof. Doshi-Velez will discuss innovations in her research group to apply these paradigms to real problems in healthcare: treating patients with sepsis and managing patients with HIV.

October 20, 2017

Theory Methods to Describe Transport and Dynamics in Quantum Materials

Speaker: Prineha Narang, Assistant Professor of Computational Materials Science

There is consensus in the field that in the post-Moore’s law era of electronics, there is a critical need to understand the ultrafast dynamics of materials and non-equilibrium transport, and to discover new quantum-engineered materials with which to design the devices of the future. In this context, Dr. Narang will share her research group’s recent computational work in two interconnected areas: quantum materials-by-design, including electron-electron and electron-phonon calculations in van der Waals heterostructures, and a new far-from-equilibrium transport method, applied to faceted nanostructures. Narang will also present some ideas her group is working on for using defects as engineered quantum emitters that surpass the vacancy centers in diamond.

September 15, 2017

Machine Learning for Small Business Lending

Speaker: Thomson Nguyen, Head of Data Science, Square Capital

In the era of Electronic Health Records, it is possible to examine the outcomes of decisions made by doctors during clinical practice to identify patterns of care—generating evidence from the collective experience of patients. We will discuss methods that transform unstructured EHR data into a de-identified, temporally ordered, patient-feature matrix.  We will also review use-cases, which use the resulting de-identified data, for pharmacovigilance, to reposition drugs, build predictive models, and drive comparative effectiveness studies in a learning health system.

September 8, 2017

Big Data Software: What's Next?

Speaker: Mike Franklin, Chairman, Department of Computer Science, University of Chicago

Starting a business is hard: at least 65% of small businesses in the United States fail in their first five years of operation. Among the biggest reasons cited for business failure is a lack of working capital to get started or to scale. In this talk, Nguyen will share his team's current work in machine learning on small business loan eligibility as it pertains to credit default risk mitigation, as well as challenges and opportunities in the lending space with some of the more esoteric ML approaches (e.g. why a deep learning black box isn't going to cut it).