2016-2017

April 14, 2017

Exascale and Extreme Data Science at NERSC

Speaker: Sudip Dosanjh, Lawrence Berkeley National Laboratory

Dr. Dosanjh's talk will explore the extreme data science occurring at the National Energy Research Scientific Computing Center (NERSC). NERSC’s primary mission is to accelerate scientific discovery at the U.S. Department of Energy's Office of Science through high performance computing and data analysis. NERSC supports the largest and most diverse research community of any computing facility within the DOE complex, providing large-scale, state-of-the-art computing for DOE’s unclassified research programs in alternative energy sources, environmental science, materials research, astrophysics, and other science areas related to DOE’s science mission.

NERSC’s new supercomputer, Cori, is deployed in Berkeley Laboratory’s new Computational Research and Theory (CRT) Facility. Cori has over 9300 manycore Intel Knights Landing processors, which introduce several technological advances, including higher intra-node parallelism; high-bandwidth, on-package memory; and longer hardware vector lengths. These enhanced features are expected to yield significant performance improvements for applications running on Cori. To take advantage of the new features, however, application developers will need to modify their code, because many of today’s applications are not optimized for the manycore architecture and on-package memory.

April 7, 2017

Building A Machine Learning Health System

Speaker: Dr. Nigam Shah, Stanford University

In the era of Electronic Health Records, it is possible to examine the outcomes of decisions made by doctors during clinical practice to identify patterns of care, generating evidence from the collective experience of patients. We will discuss methods that transform unstructured EHR data into a de-identified, temporally ordered, patient-feature matrix. We will also review use cases that apply the resulting de-identified data to pharmacovigilance, drug repositioning, predictive modeling, and comparative effectiveness studies in a learning health system.

March 31, 2017

Computing..Data..Science..Society: On Connecting the Dots

Speaker: Dr. Petros Koumoutsakos, ETH Zurich

Big data and exascale computers suggest tremendous potential for progress in science and society —  they are instrumental in the design of every car, in the mapping of the human genome, and in raising awareness of climate change. Large-scale computing can assist commuting and shopping and fuels the growth of companies that can decipher patterns of our personal preferences. The power of big data and exascale computers stems from their potential for pattern recognition and prediction that surpasses human capabilities for handling complexity. 

We often hear that machines that learn can offer major advances, yet also create danger for mankind. In this talk, Dr. Koumoutsakos will take a critical stand on this premise, considering that data and computers are powerful tools for advancing science across all disciplines and have always been embedded in human thinking that strives for understanding and prediction assisted by observations. The focus on digital-everything may create false impressions about science for society, and even more, endanger the personality, liberty and property of the members of society. We must be reminded that it is the acquisition and processing of data that largely defines the data’s value. Our own assumptions largely determine computer outputs. Computers can assist but cannot replace fundamental forms of human thinking, including philosophy and mathematics. Computing, the process of integrating computers and transforming data and thinking into insight and prediction, is an essential part of humanity that we must continue to curate.

March 24, 2017

Building Training Sets for Astronomical Data: A Bayesian Feature Transformation for Domain Adaptation

Speaker: Dr. Pavlos Protopapas, IACS Scientific Program Director, Harvard University

Supervised data mining and machine learning rely on the availability of labeled data. When sufficient training data is available, supervised models achieve high performance in many domains. However, labeled data is scarcer than unlabeled data and much more expensive and difficult to obtain. Moreover, when models that perform well in one setting are applied to data from a different but related domain (e.g., from a different telescope or sensor), performance often drops significantly. Additionally, the enormous rate at which unlabeled data is being generated in astronomy greatly surpasses the rate at which labeled data becomes available. Domain adaptation aims to learn from a domain where labeled data is available, the 'source domain', and through some adaptation perform well on a different domain, the 'target domain'. In this talk, I will present a new probabilistic model that represents the source and target distributions as two Gaussian mixtures and finds a transformation between the feature spaces of the domains to transfer labeled data between them. Our approach allows working with data available in one domain as if it belonged to the other, enabling the training of models in the target domain from training sets adapted from the source domain. We evaluate our proposal on simulated data and on the problem of variable star classification. In the latter, we use data from multiple astronomical surveys that differ in sensor sensitivity, atmospheric conditions, and data sampling frequency, among other characteristics.

March 3, 2017

Socially Assistive Robotics: Creating Robots That Care

Speaker: Dr. Maja Matarić, University of Southern California

Annual Dean's Lecture on Computational Science and Engineering.

Socially assistive robotics (SAR) is a new field of intelligent robotics that focuses on developing machines capable of assisting users through social rather than physical interaction. The robot’s physical embodiment is at the heart of SAR’s effectiveness, as it hinges on the inherently human tendency to engage with lifelike (but not necessarily human-like or otherwise biomimetic) agents. People readily ascribe intention, personality, and emotion to robots; SAR leverages this engagement stemming from non-contact social interaction involving speech, gesture, movement demonstration and imitation, and encouragement, to develop robots capable of monitoring, motivating, and sustaining user activities and improving human learning, training, performance and health outcomes. Human-robot interaction (HRI) for SAR is a growing multifaceted research area at the intersection of engineering, health sciences, neuroscience, and the social and cognitive sciences. This talk will describe our research into embodiment, modeling and steering social dynamics, and long-term user adaptation for SAR. The research will be grounded in projects involving analysis of multi-modal activity data, modeling personality and engagement, formalizing social use of space and non-verbal communication, and personalizing the interaction with the user over a period of months, among others. The presented methods and algorithms will be validated on implemented SAR systems evaluated by human subject cohorts from a variety of user populations, including stroke patients, children with autism spectrum disorder, and elderly people with Alzheimer's and other forms of dementia.

February 24, 2017

Computer Vision for Connectomics

Speaker: Toufiq Parag, Harvard University

Connectomics is an area of neuroscience that aims to discover the shapes and interconnections of neurons (the neural network) within an animal brain. A comprehensive knowledge of the biological neural network is critically important for a complete understanding of brain functionality. Recent advances in electron microscopy (EM) imaging have enabled us to capture minuscule cellular processes at nanometer scale. However, such ultra-high-resolution recording also produces a massive amount of data that must be analyzed to extract the neurons. Practical neural reconstruction approaches rely on machine learning and computer vision algorithms such as segmentation and synaptic junction detection to perform this task. In this talk, I will describe how these tasks are addressed in multiple EM neural reconstruction efforts and mention a few biological discoveries made from some of these efforts.

February 17, 2017

Boltzmann and The Lattice: A Very Happy Computational Marriage

Speaker: Sauro Succi, IAC-CNR Rome, Italy and Harvard IACS

Over the last three decades, lattice formulations of Boltzmann's kinetic equation have blossomed into a very powerful tool for the numerical simulation of complex states of flowing matter across a broad range of scales, from fully developed turbulence to multiphase microflows, all the way down to biopolymer translocation in nanopores and, lately, even electronic structure calculations and quantum-relativistic hydrodynamics in curved spaces. After a brief introduction to the main ideas behind the Lattice Boltzmann (LB) method, we shall illustrate a few representative applications and outline prospects for future large-scale LB simulations in physics, biology, and the frontier thereof.

December 2, 2016

Opportunities and Perils in Data Science

Speaker: Alfred Z. Spector, Two Sigma Investments

Over the last few decades, empiricism has become the third leg of computer science, adding to the field’s traditional bases in mathematical analysis and engineering. This shift has occurred due to the sheer growth in the scale of computation, networking and usage as well as progress in machine learning and related technologies. Resulting data-driven approaches have led to extremely powerful prediction and optimization techniques and hold great promise, even in the humanities and social sciences. However, no new technology arrives without complications. In this presentation, I will balance the opportunities provided by big data and associated A.I. approaches with a discussion of the various challenges. I’ll enumerate ten categories, including those that are technical (e.g., resilience and complexity), societal (e.g., difficulties in setting objective functions or understanding causation), and humanist (e.g., issues relating to free will or privacy). I’ll provide many example problems and make suggestions on how to address some of the unanticipated consequences of big data.

November 18, 2016

Controlling Multi-Contact Robot Behaviors with Optimization

Speaker: Scott Kuindersma, Harvard University

Despite the existence of incredibly capable robot hardware, the limitations of our best planning and control algorithms have prevented us from unleashing these machines in critical exploration, automation, and disaster response applications. Many key behaviors, including locomotion and manipulation, involve robots making intermittent frictional contact with their environments. This simple fact has significant computational ramifications, often leading to challenging mixed-integer or nonlinear complementarity problems. This talk will summarize the Harvard Agile Robotics Lab's research on designing optimization algorithms that improve our ability to plan and control contact-rich motions with humanoid robots.

November 11, 2016

Random Sampling: From Surveys to Big Data

Speaker: Edith Cohen, Research scientist at Google (Mountain View); Visiting Professor at the School of Computer Science at Tel Aviv University in Israel

Random sampling is a classic tool for surveying properties and statistics of populations: samples capture the essence of the data so that properties of the data can be approximated by estimators applied to the sample. Sampling schemes are tailored to the tasks at hand and seek to balance size, approximation quality, and computation.

Historically, sampling is as old as human learning. Some landmarks are its use by Laplace (1802) to estimate the population of France, and its first use by the U.S. Census (1938) to estimate the unemployment rate. With the emergence of massive data sets, sampling became an essential tool for scaling up computation (numerical optimization, clustering, submodular maximization) and leveraging data such as traffic or activity logs that are too large to process or store longer term.

In this talk, Dr. Cohen will highlight some favorite applications and sampling schemes. In particular, she will discuss samples as locality-sensitive hashes, multi-objective samples, and sampling of streamed or distributed data.

November 4, 2016

Making Data Matter: Visualization As Communication Medium

Speakers: Fernanda Viégas and Martin Wattenberg, Google

Data is ubiquitous in our lives. It describes our neighborhoods, our cities, and weather patterns; it helps track illnesses and contextualize social patterns. In an increasingly data-rich society, there’s a critical need for tools to help people understand and reason about complex information. Our research seeks to make data visualization accessible to everyone, from lay users to data experts. We will present work that exposes kids to complex data, explores the artistic expressiveness of data, uncovers the underworld of cybercrime, and augments our knowledge of scientific fields such as machine learning. This approach to visualization as an inclusive communication medium points the way to a future where every citizen can more fully participate in a data-driven society.

October 21, 2016

Socially Assistive Robotics: Creating Robots That Care

Speaker: Dr. Maja Matarić, University of Southern California

Socially assistive robotics (SAR) is a new field of intelligent robotics that focuses on developing machines capable of assisting users through social rather than physical interaction. The robot’s physical embodiment is at the heart of SAR’s effectiveness, as it hinges on the inherently human tendency to engage with lifelike (but not necessarily human-like or otherwise biomimetic) agents. People readily ascribe intention, personality, and emotion to robots; SAR leverages this engagement stemming from non-contact social interaction involving speech, gesture, movement demonstration and imitation, and encouragement, to develop robots capable of monitoring, motivating, and sustaining user activities and improving human learning, training, performance and health outcomes. Human-robot interaction (HRI) for SAR is a growing multifaceted research area at the intersection of engineering, health sciences, neuroscience, and the social and cognitive sciences. This talk will describe our research into embodiment, modeling and steering social dynamics, and long-term user adaptation for SAR. The research will be grounded in projects involving analysis of multi-modal activity data, modeling personality and engagement, formalizing social use of space and non-verbal communication, and personalizing the interaction with the user over a period of months, among others. The presented methods and algorithms will be validated on implemented SAR systems evaluated by human subject cohorts from a variety of user populations, including stroke patients, children with autism spectrum disorder, and elderly people with Alzheimer's and other forms of dementia.

October 7, 2016

Computational Lensing Imaging: Using Optics for Computation and Computation for Optics in Miniature Sensors and Imagers

Speaker: David G. Stork, Rambus Labs

The central insight underlying the field of computational sensing and imaging is that the joint design of optics and signal processing to yield a final digital image or estimate of some property of the scene can relax the traditional constraints on the optical elements needed to make an optical image that "looks good." In our lensless imagers, binary diffraction gratings with special mathematical properties yield blurry, blob-like optical images that nevertheless contain sufficient information that a digital image of the scene can be computed.

September 23, 2016

Machine Learning for Materials Discovery: Low-LTC Compounds, Grain Boundaries and Superlattices

Speaker: Koji Tsuda, Professor, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo

Materials discovery driven by machine learning is a reality. I report successful case studies in the discovery of compounds with low lattice thermal conductivity (LTC) from a database, grain boundary optimization, and automated design of Si-Ge superlattices.

September 9, 2016

Achieving Superintelligence: Definitions, Datasets, and Defenses

Speaker: Alex Wissner-Gross, President and Chief Scientist of Gemedy

What is the critical path to achieving artificial superintelligence? This talk will explore the computational science and engineering issues associated with defining intelligence, the role of large datasets in accelerating AI breakthroughs, and strategies for detecting and managing the emergence of superhuman AI.