IACS Seminars

IACS seminars are generally held every other Friday at lunchtime during the academic year. (Lunch is served at 12:30pm on a first-come, first-served basis, with the seminar beginning promptly at 1pm.) Unless otherwise indicated, all seminars will be held in Maxwell Dworkin G115. Students, faculty, and others interested in computational science and applied computation are welcome to attend. Join the IACS Mailing List to find out about future seminars.

2014-15 IACS Seminar Series:


Spring 2015

  • 2/6 Brian Hayes (IACS) on Orderly Randomness: Quasirandom Numbers and Quasi-Monte Carlo
  • 2/13 Ray Jones (IACS) on Connectomics: Extracting Neural Connectivity From Very Large Data Sets
  • 2/20 Delaney Granizo-Mackenzie & Rich Frank (Quantopian) on Free Software in Finance
  • 3/6 Harvard Graduate Students on Chile-Harvard Innovative Learning Exchange
  • 3/13 Alan Aspuru-Guzik (Harvard University) on Billions and Billions of Molecules: Exploring Chemical Space
  • 3/27 Jeff Bilmes (University of Washington) on Summarizing Large Data Sets ***Location: Harvard Univ. Science Ctr. Hall E (1 Oxford Street, Cambridge MA 02138)***
  • 4/10 Budhendra Bhaduri (Oak Ridge National Laboratory) on Big Data, Geospatial Computing, and My 2 Cents in an Open Data Economy ***Time change: 12-1pm***
  • 4/24 Christian Rudder (OkCupid)

February 6, 2015

Brian Hayes
Associate at Harvard IACS

Title: Orderly Randomness: Quasirandom Numbers and Quasi–Monte Carlo
VIDEO: .MP4

Modern computing has an insatiable appetite for randomness. Cryptography and other kinds of adversarial computation demand “true” random numbers, which have three key properties: They are unpredictable, uncorrelated, and unbiased. Most other applications rely on pseudorandom numbers, which give up unpredictability but are still uncorrelated and unbiased. A third kind of randomness is even weaker. Quasirandom numbers are neither unpredictable nor uncorrelated; they claim only to be unbiased. They don’t even “look” random. Nevertheless, in some circumstances quasirandom numbers seem to be superior to pseudorandom ones. For example, they allow faster convergence or better error bounds in certain Monte Carlo simulations. Although quasirandom numbers have been known since the 1950s, some of their useful properties have been recognized only in the past few years, and they are not yet fully understood. 
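The contrast between pseudorandom and quasirandom sampling is easy to see in a few lines of Python. The sketch below (not code from the talk) compares Python's Mersenne-Twister pseudorandom generator with a hand-rolled Halton sequence, a standard quasirandom construction, using both to estimate π by sampling the unit square:

```python
import random

def halton(i, base):
    """Radical-inverse (van der Corput) value of index i in the given base."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

def estimate_pi(points):
    """Estimate pi from the fraction of (x, y) points inside the quarter circle."""
    inside = sum(1 for x, y in points if x * x + y * y <= 1.0)
    return 4.0 * inside / len(points)

n = 100_000
random.seed(0)
pseudo = [(random.random(), random.random()) for _ in range(n)]
# Halton points in 2D: base 2 for x, base 3 for y.
quasi = [(halton(i, 2), halton(i, 3)) for i in range(1, n + 1)]

print("pseudorandom:", estimate_pi(pseudo))
print("quasirandom :", estimate_pi(quasi))
```

With the same number of samples, the quasirandom estimate is typically closer to π, reflecting the faster convergence the abstract mentions.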

February 13, 2015

Ray Jones
IACS Lecturer, Harvard SEAS

Title: Connectomics: Extracting Neural Connectivity From Very Large Data Sets
VIDEO: .MP4

Connectomics is the study of neural connectivity. Using images from electron microscopes, we automatically identify nanometer-scale structures in microtomed brain tissue. We work with slices and images at a resolution where a cubic sample one millimeter on a side produces more than one petabyte of image data.
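The petabyte figure can be checked with back-of-envelope arithmetic. The voxel dimensions below (4 nm lateral resolution, 30 nm section thickness, 1 byte per voxel) are illustrative assumptions, not necessarily the lab's exact acquisition parameters:

```python
# Back-of-envelope check of the "one cubic millimeter -> more than a
# petabyte" claim, using illustrative (assumed) acquisition parameters.
mm_in_nm = 1_000_000          # 1 mm = 10^6 nm
lateral_nm = 4                # assumed in-plane voxel size
section_nm = 30               # assumed slice thickness
bytes_per_voxel = 1           # assumed 8-bit grayscale

voxels = (mm_in_nm / lateral_nm) ** 2 * (mm_in_nm / section_nm)
total_bytes = voxels * bytes_per_voxel
print(f"{total_bytes / 1e15:.1f} PB")  # roughly 2 PB for a 1 mm cube
```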


To deal with such large datasets requires a mostly-automated approach. Our analysis pipeline uses computer vision, machine learning, and cluster computing to produce an initial version of the connectome within a sample. We then use a web-based proofreading and annotation tool to allow multiple simultaneous users to explore and correct the data within the volume.

February 20, 2015

Delaney Granizo-Mackenzie, Rich Frank, and Andrew Campbell
Quantopian

Title: Free Software in Finance
VIDEO: .MP4

Write code to invest money. Quantopian offers a free platform for you to develop, backtest, and execute trading strategies against the market. Our platform is based entirely on Python, and we are currently getting ready to launch our newest tool, an in-browser IPython notebook with access to historical market data. We will go over how our platform can be used to develop a trading strategy and how you can run a hedge fund from your home.
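For a flavor of what a backtest does, here is a toy moving-average-crossover backtest in plain Python. It is a hand-rolled sketch on synthetic prices, not the Quantopian API:

```python
# A toy backtest: go long when the short moving average crosses above
# the long one, flat otherwise. Hand-rolled sketch, not Quantopian code.

def moving_average(prices, window):
    return sum(prices[-window:]) / window

def backtest(prices, short=5, long=20, cash=10_000.0):
    """Return final portfolio value after trading the crossover signal."""
    shares = 0.0
    for t in range(long, len(prices)):
        hist = prices[:t]
        signal = moving_average(hist, short) > moving_average(hist, long)
        price = prices[t]
        if signal and shares == 0:        # enter position
            shares = cash / price
            cash = 0.0
        elif not signal and shares > 0:   # exit position
            cash = shares * price
            shares = 0.0
    return cash + shares * prices[-1]

# Synthetic upward-drifting price series for demonstration only.
prices = [100 + 0.1 * t + 5 * ((t * 7919) % 13 - 6) / 6 for t in range(250)]
print(f"final value: {backtest(prices):.2f}")
```

A real platform adds the pieces this sketch omits: transaction costs, slippage, position sizing, and point-in-time historical data.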

March 6, 2015

Diana Zhang, Sabrina Zhou, Xufei Wang, Lyla Fadden, Kim Minjae

Title: Chile-Harvard Innovative Learning Exchange
VIDEO:

Five Harvard graduate students will discuss their recent research trip to Chile to analyze data from the Dark Energy Camera (DECam). The DECam is one of the instruments used in the Dark Energy Survey (DES), an international effort "designed to probe the origin of the accelerating universe and help uncover the nature of dark energy." Collaborating with students at the University of Chile, the Harvard students were separated into small teams and asked to identify and classify objects as galaxies, stars, or asteroids.

Designed by IACS Scientific Program Director and Lecturer Pavlos Protopapas, the two-week Chile-Harvard Innovative Learning Exchange Program is in its second year and aims to give students experience working in international teams on noisy and imperfect data sets.

March 13, 2015

Alan Aspuru-Guzik
Department of Chemistry and Chemical Biology, Harvard University

Title: Billions and Billions of Molecules: Exploring Chemical Space
VIDEO:

Many of the challenges of the twenty-first century are related to molecular processes, such as the generation, transmission, and storage of clean energy, and water purification and desalination. These transformations require a next generation of more efficient and ecologically friendly materials. In the life sciences, we face similar challenges; for example, drug-resistant bacterial strains require novel antibiotics. One of the paradigm shifts that theoretical and experimental chemists need to embrace is accelerated molecular discovery: design cycles need to be sped up by the constant interaction of theoreticians and experimentalists, the use of high-throughput computational techniques, tools from machine learning and big data, and the development of public materials databases. I will describe three projects from my research group that aim to operate in this accelerated design cycle. First, I will describe our efforts on the Harvard Clean Energy Project, a search for materials for organic solar cells. I will continue by talking about our work on developing organic molecules for energy storage in flow batteries. Finally, I will describe our work towards the discovery of novel molecules for organic light-emitting diodes.

March 27, 2015

Location Change: Harvard Univ. Science Ctr. Hall E (1 Oxford Street, Cambridge MA 02138)

Jeffrey A. Bilmes
University of Washington

Title: Summarizing Large Data Sets
VIDEO:


The recent growth of available data is both a blessing and a curse for the field of data science. While large data sets can lead to improved predictive accuracy and can motivate research in parallel computing, they can also be plagued with redundancy, leading to wasted computation. In this talk we will discuss a class of approaches to data summarization and subset selection based on submodular functions. We will see how a form of "combinatorial dependence" over data sets can be naturally induced via submodular functions, and how resulting submodular programs (that often have approximation guarantees) can yield practical and high-quality data summarization strategies. The effectiveness of this approach will be demonstrated based on results from a wide range of applications, including document summarization, machine learning training data subset selection (for speech recognition, machine translation, and handwritten digit recognition), image summarization, and assay selection in functional genomics.
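The greedy algorithm at the heart of many of these methods is short enough to sketch. The facility-location objective below (pick k items so every element is well covered by its most similar selected item) is one standard monotone submodular function, and the classic greedy algorithm carries a (1 - 1/e) approximation guarantee for such objectives. The similarity values are illustrative, not data from the talk:

```python
# Greedy maximization of a facility-location submodular function:
# a minimal data-summarization sketch with made-up similarities.

def facility_location(selected, sim):
    """f(S) = sum over all items of max similarity to any selected item."""
    if not selected:
        return 0.0
    return sum(max(sim[i][j] for j in selected) for i in range(len(sim)))

def greedy(sim, k):
    """Classic greedy: (1 - 1/e)-approximation for monotone submodular f."""
    selected = []
    for _ in range(k):
        base = facility_location(selected, sim)
        best = max((j for j in range(len(sim)) if j not in selected),
                   key=lambda j: facility_location(selected + [j], sim) - base)
        selected.append(best)
    return selected

# Similarity matrix for 4 "sentences"; items 0 and 1 are near-duplicates.
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.3],
    [0.2, 0.1, 0.3, 1.0],
]
print(greedy(sim, 2))  # -> [0, 2]: diverse items, not the duplicate pair
```

Because covering an item a second time adds little value, the greedy choice naturally skips the near-duplicate, which is exactly the redundancy-avoiding behavior the abstract describes.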

April 10, 2015

Budhendra Bhaduri
Corporate Research Fellow and Leader of the Geographic Information Science and Technology (GIST) Group, Oak Ridge National Laboratory

Title: Big Data, Geospatial Computing, and My 2 Cents in an Open Data Economy

VIDEO:

In this rapidly urbanizing world, an unprecedented rate of population growth is not only mirrored by increasing demand for energy, food, water, and other natural resources, but also has detrimental impacts on environmental and human security. Much of our scientific and technological focus has been on ensuring a sustainable future with healthy people living on a healthy planet, where energy, environment, and mobility interests are simultaneously optimized. Current geoanalytics are limited in dealing with temporal dynamics that describe observed and/or predicted behaviors of entities, i.e., physical and socioeconomic processes. With the increasing temporal resolution of geographic data, there is a compelling motivation to couple the powerful modeling and analytical capability of a GIS with spatial-temporal analysis and visualization of dynamic data streams. However, the challenge of processing large volumes of high-resolution earth observation and simulation data with traditional GIS has been compounded by the drive towards real-time applications and decision support. The ability to observe and measure through direct instrumentation of our environment and infrastructures, from buildings to planet scale, coupled with the explosion of data from citizen sensors, brings much promise for capturing the social and behavioral dimension. Additionally, it provides a unique opportunity to manage and increase the efficiency of existing built environments as well as to design a more sustainable future. This presentation will explore intriguing developments in the world of Big Data and geospatial computing, and plausible ways citizens can become part of the open data economy to advance science and society.

This talk is also part of the Geography Colloquium hosted at the Center for Geographic Analysis: http://gis.harvard.edu/events/seminar-series/colloquium.

April 24, 2015

Christian Rudder
OkCupid

Title: tbd
VIDEO:

Fall 2014

September 12, 2014

Mauricio Santillana
Lecturer in applied mathematics at the Harvard School of Engineering and Applied Sciences and an instructor at the Harvard Medical School

Title: Using big data in epidemiology for digital disease detection: Lessons learned and new directions
VIDEO: .MP4

Preventing outbreaks of communicable diseases is one of the top priorities of public health officials all over the world. Although traditional clinical methods to track the incidence of diseases are essential to prevent outbreaks, they frequently take weeks to spot critical epidemiological events. This is mainly due to the multiple clinical steps needed to confirm the appearance and incidence of diseases. Recently, the real-time analysis of big data sets, such as search queries from Google, posts from Facebook, tweets from Twitter, and article views from Wikipedia, has allowed researchers to identify epidemic events in multiple communities, giving rise to the creation of internet-based public health surveillance tools. These new tools often provide timely epidemiological information to public health decision makers up to two or three weeks ahead of traditional reports.
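The core idea, regressing a search-query signal onto clinically confirmed incidence to "nowcast" the current week, can be sketched with ordinary least squares. The data below are synthetic, and real surveillance systems combine many query terms with autoregressive clinical data:

```python
# Toy digital-disease-detection nowcast: fit incidence ~ a*queries + b
# by ordinary least squares on synthetic weekly data.

def ols(xs, ys):
    """Slope and intercept minimizing squared error of y ~ a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    return a, my - a * mx

# Weekly flu-related query volume vs. clinically confirmed incidence
# (synthetic numbers, for illustration only).
queries   = [120, 150, 200, 260, 310, 280, 220]
incidence = [2.1, 2.7, 3.6, 4.8, 5.9, 5.2, 4.0]
a, b = ols(queries, incidence)

# Query data arrive immediately; clinical confirmation lags by weeks,
# so the fitted line gives an early estimate for the current week.
this_week_queries = 240
print(f"nowcast: {a * this_week_queries + b:.2f} cases per 10k")
```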

September 19, 2014

Chris Rycroft
Assistant Professor of Applied Mathematics, Harvard SEAS

Title: High-throughput screening of crystalline porous materials
VIDEO: .MP4

Crystalline porous materials, such as zeolites, contain complex networks of void channels that are exploited in many industrial applications. A key requirement for the success of any nanoporous material is that its chemical composition and pore topology must be optimal for a given application. However, this is a difficult task, since the number of possible pore topologies is extremely large: thousands of materials have already been synthesized, and databases of millions of hypothetical structures are available.

This talk will describe the development of tools for rapid screening of these large databases, to automatically select materials whose pore topology may make them most appropriate for a given application. The methods are based on computing the Voronoi tessellation, which provides a map of void channels in a given structure. This is carried out using the software library Voro++, which has been modified to properly account for three-dimensional non-orthogonal periodic boundary conditions. Algorithms to characterize and screen the databases will be described, and an application of the library to search for materials for carbon capture and storage will be discussed.

October 3, 2014

Nima Dehghani
Wyss Institute
Title: Computational network dynamics of the neocortex
VIDEO: .MP4

Network activity is a key aspect of neocortical computation. Whether the system forms spatiotemporal assemblies, acts in a balanced regime, or follows a self-organized critical regime are all questions that bear on the fundamental organizing principles of neocortical computation. This talk will give an overview of our recent findings from high-density ensemble recordings from the neocortex of humans and higher mammals such as the monkey and cat. I will present a detailed morpho-functional characterization of neuronal activity, functional connectivity at the microcircuit level, and the interplay of excitation and inhibition in the human neocortex. The discussion will extend to the examination of self-organized criticality in neural avalanche dynamics in different in vivo preparations during wakefulness, slow-wave sleep, and REM sleep, from cat to monkey to man. I will then show that large ensembles of units exhibit a remarkable excitatory-inhibitory balance, at multiple temporal scales and across all brain states except seizures, indicating that balanced excitation and inhibition is a fundamental feature of normal brain activity.

October 10, 2014

Gennette Gill & Alexander Ramek
D. E. Shaw Research

Title: D. E. Shaw Research Information Session

D. E. Shaw Research is an independent research laboratory that conducts basic scientific research in the field of computational biochemistry under the direct scientific leadership of Dr. David E. Shaw. Our group is currently focusing on molecular simulations involving proteins and other biological macromolecules of potential interest from both a scientific and a pharmaceutical perspective. Members of the lab include computational chemists and biologists, computer scientists and applied mathematicians, and computer architects and engineers, all working collaboratively within a tightly coupled interdisciplinary research environment.

Our lab has designed and constructed multiple generations of a massively parallel supercomputer called Anton specifically for the execution of molecular dynamics (MD) simulations. Each Anton computer can simulate a single MD trajectory as much as a millisecond or so in duration -- a timescale at which biologically significant phenomena occur. Anton has already generated the world’s longest MD trajectory.

Join us for an overview of our work on parallel algorithms and machine architectures for high-speed MD simulations and a description of the simulations that have helped elucidate the dynamics and functional mechanisms of biologically important proteins.

October 17, 2014

Ashish Mahabal
Staff Scientist in Computational Astronomy at Caltech
Title: Marrying domain knowledge and computational methods
VIDEO: .MP4

Astronomy datasets have long been large and are getting larger by the day (TB to PB). This necessitates the use of advanced statistics and machine learning for many purposes. However, the datasets are often so large that even small contamination rates imply a large number of wrong results, which makes blind application of these methodologies unattractive. Astronomical transients are one area where rapid follow-up observations are required based on very little data. We show how the use of domain knowledge, in the right measure at the right juncture, can improve classification performance. We will discuss various computational methods, some established and some less so, that are being used for detecting outliers and for choosing the optimal candidates for best science returns. With an eye on the PB-sized datasets coming soon, we use time-series data from existing sky surveys like the Catalina Real-Time Transient Survey (along with auxiliary data), which has covered 80% of the sky several tens to a few hundred times over the last decade. We will also bring up a seemingly unconnected problem with some parallels: our JPL collaboration on the search for cancer biomarkers in the Early Detection Research Network (EDRN).

October 31, 2014

Chris Wiggins
Chief Data Scientist, The New York Times
Title: Data Science at The New York Times
VIDEO: .MP4

The New York Times is a technology company that aims not only to produce great content, but also to ensure the reach and impact of its journalism. Within the engineering division there is a growing effort to reframe many of the central business goals as machine learning tasks aimed at maximizing the paper's reach. Chris will give examples of machine learning challenges he has addressed in his role as Chief Data Scientist at The New York Times, and illustrate how they compare with data science as understood in the natural sciences. He will also answer questions about working at the Times.

November 14, 2014

William D. Henshaw
Margaret A. Darrin Distinguished Professor in Applied Mathematics at Rensselaer Polytechnic Institute
Title: Overcoming the fluid-structure added-mass instability for incompressible flows
VIDEO: .MP4

The added-mass instability has, for decades, plagued partitioned fluid-structure interaction (FSI) simulations of incompressible flows coupled to light solids and structures. Many current approaches require tens or hundreds of expensive sub-iterations per time step. In this talk, two new stable partitioned algorithms for coupling incompressible flows with both compressible elastic bulk solids and thin structural shells are described. These added-mass partitioned (AMP) schemes require no sub-iterations, can be made fully second- or higher-order accurate, and remain stable even in the presence of strong added-mass effects. Extensions of the schemes to treat large solid motions using deforming overlapping grids and the Overture framework will also be described.

November 21, 2014

Aaron Adcock and Shankar Kalyanaraman
Facebook

Talk 1: Tree-like Structure in Social and Information Networks
Presenter: Aaron Adcock

It is often noted that social and information networks exhibit tree-like structure and properties.  In the past few years several tools have been developed to more closely quantify this structure.  I will discuss some of the results of applying these tools to real-world social and information networks.  In particular, I will discuss two alternatives for measuring this structure: Gromov hyperbolicity and tree-width.
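Gromov's four-point delta can be made concrete with a brute-force computation on a small graph. This sketch is exponential in the number of nodes; the tools the talk refers to use far smarter algorithms:

```python
# Brute-force Gromov hyperbolicity of an unweighted graph: for every
# 4-tuple, compare the three pairwise-sum distances; delta is half the
# gap between the two largest sums, maximized over all 4-tuples.
from itertools import combinations
from collections import deque

def bfs_distances(adj, src):
    """Unweighted shortest-path distances from src via breadth-first search."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def hyperbolicity(adj):
    nodes = list(adj)
    d = {u: bfs_distances(adj, u) for u in nodes}
    delta = 0.0
    for x, y, u, v in combinations(nodes, 4):
        s = sorted([d[x][y] + d[u][v], d[x][u] + d[y][v], d[x][v] + d[y][u]])
        delta = max(delta, (s[2] - s[1]) / 2)
    return delta

# A tree is 0-hyperbolic (maximally tree-like); a 4-cycle is 1-hyperbolic.
tree = {0: [1, 2], 1: [0, 3, 4], 2: [0], 3: [1], 4: [1]}
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(hyperbolicity(tree), hyperbolicity(cycle4))
```

The smaller the delta relative to the graph's diameter, the more tree-like the metric, which is the sense in which social and information networks are said to exhibit tree-like structure.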

Talk 2: Data-mining for development
Presenter: Shankar Kalyanaraman

Over the last few years, we have witnessed innovative uses of big data to model and predict complex human behavior and patterns. Google's use of search query data to accurately forecast flu incidence and Ushahidi's crowdsourced crisis maps following the Haiti earthquake in 2010 for quicker and more effective deployment of humanitarian aid are two leading examples in this domain. My research interests draw inspiration from these examples; and in this talk, I will showcase some previous work I have done in disease surveillance and post-conflict violence prevention.

Finally, time permitting, we'll briefly chat about data science at Facebook.

2013-14 IACS Seminars
2012-13 IACS Seminars
2011-12 IACS Seminars