IACS seminars are generally held every other Friday at lunchtime during the academic year. (Lunch is served at 12:30pm on a first-come, first-served basis, with the seminar beginning promptly at 1pm.) Unless otherwise indicated, all seminars will be held in Maxwell Dworkin G115. Students, faculty and others interested in computational science and applied computation are welcome to attend. Join the IACS Mailing List to find out about future seminars.
2014-15 IACS Seminar Series:
- 2/6 Brian Hayes (IACS) on Orderly Randomness: Quasirandom Numbers and Quasi-Monte Carlo
- 2/13 Ray Jones (IACS) on Connectomics: Extracting Neural Connectivity From Very Large Data Sets
- 2/20 Delaney Granizo-Mackenzie & Rich Frank (Quantopian) on Free Software in Finance
- 3/6 Harvard Graduate Students on Chile-Harvard Innovative Learning Exchange
- 3/13 Alan Aspuru-Guzik (Harvard) on Billions and Billions of Molecules: Exploring Chemical Space
- 3/27 Jeff Bilmes (University of Washington) on Summarizing Large Data Sets ***Location: Harvard Univ. Science Ctr. Hall E (1 Oxford Street, Cambridge MA 02138)***
- 4/10 Budhendra Bhaduri (Oak Ridge National Laboratory) on Big Data, Geospatial Computing, and My 2 Cents in an Open Data Economy ***Time change: 12-1pm***
- 4/24 Christian Rudder (OkCupid) on Data: A Love Story: How data science, and a great deal of tinkering, created the biggest dating site in the U.S.
February 6, 2015
Title: Orderly Randomness: Quasirandom Numbers and Quasi–Monte Carlo
Modern computing has an insatiable appetite for randomness. Cryptography and other kinds of adversarial computation demand “true” random numbers, which have three key properties: They are unpredictable, uncorrelated, and unbiased. Most other applications rely on pseudorandom numbers, which give up unpredictability but are still uncorrelated and unbiased. A third kind of randomness is even weaker. Quasirandom numbers are neither unpredictable nor uncorrelated; they claim only to be unbiased. They don’t even “look” random. Nevertheless, in some circumstances quasirandom numbers seem to be superior to pseudorandom ones. For example, they allow faster convergence or better error bounds in certain Monte Carlo simulations. Although quasirandom numbers have been known since the 1950s, some of their useful properties have been recognized only in the past few years, and they are not yet fully understood.
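As a rough illustration of the convergence claim above, the following sketch compares a pseudorandom and a quasirandom estimate of a simple integral. It is not taken from the talk: the choice of the Halton sequence (bases 2 and 3), the integrand x*y, and the helper names radical_inverse, halton_2d, and mean_of are all assumptions made for this example.

```python
import random


def radical_inverse(n: int, base: int) -> float:
    """Van der Corput radical inverse: mirror the base-`base` digits of n about the radix point."""
    inv, scale = 0.0, 1.0 / base
    while n > 0:
        n, digit = divmod(n, base)
        inv += digit * scale
        scale /= base
    return inv


def halton_2d(count: int):
    """First `count` points of a 2-D Halton sequence (bases 2 and 3), starting at index 1."""
    return [(radical_inverse(i, 2), radical_inverse(i, 3)) for i in range(1, count + 1)]


def mean_of(f, points):
    """Plain Monte Carlo estimator: average of f over the sample points."""
    return sum(f(x, y) for x, y in points) / len(points)


def integrand(x, y):
    # Illustrative integrand (an assumption); its exact integral over the unit square is 1/4.
    return x * y


if __name__ == "__main__":
    exact = 0.25
    rng = random.Random(0)  # fixed seed so the pseudorandom run is repeatable
    for count in (64, 1024, 4096):
        pseudo = [(rng.random(), rng.random()) for _ in range(count)]
        err_pseudo = abs(mean_of(integrand, pseudo) - exact)
        err_quasi = abs(mean_of(integrand, halton_2d(count)) - exact)
        print(f"N={count:5d}  pseudorandom error={err_pseudo:.5f}  quasirandom error={err_quasi:.5f}")
```

For a smooth, low-dimensional integrand like this one, the quasirandom error usually shrinks faster as N grows, but the advantage depends on the integrand and the dimension, so the output should be read as an illustration rather than a benchmark.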
February 13, 2015
Title: Connectomics: Extracting Neural Connectivity From Very Large Data Sets
February 20, 2015
Delaney Granizo-Mackenzie, Rich Frank, and Andrew Campbell
Title: Free Software in Finance
Write code to invest money. Quantopian offers a free platform for you to develop, backtest, and execute trading strategies against the market. Our platform is based entirely in Python and we are currently getting ready to launch our newest tool, an in-browser IPython notebook with access to historical market data. We will go over how our platform can be used to develop a trading strategy and how you can run a hedge fund from your home.
March 6, 2015
Diana Zhang, Sabrina Zhou, Xufei Wang, Lyla Fadden, Kim Minjae
Title: Chile-Harvard Innovative Learning Exchange
Five Harvard graduate students will discuss their recent research trip to Chile to analyze data from the Dark Energy Camera (DECam). The DECam is one of the instruments used in the Dark Energy Survey (DES), an international effort “designed to probe the origin of the accelerating universe and help uncover the nature of dark energy.” Collaborating with students at the University of Chile, Harvard students were separated into small teams and asked to identify and classify objects as galaxies, stars or asteroids.
Designed by IACS Scientific Program Director and Lecturer Pavlos Protopapas, the two-week Chile-Harvard Innovative Learning Exchange Program is in its second year and aims to provide students the experience of working on international teams with noisy and imperfect data sets.
March 13, 2015
Title: Billions and Billions of Molecules: Exploring Chemical Space
Many of the challenges of the twenty-first century are related to molecular processes such as the generation, transmission, and storage of clean energy, water purification and desalination. These transformations require a next generation of more efficient and ecologically-friendly materials. In the life sciences, we face similar challenges; for example, drug-resistant bacterial strains require novel antibiotics. One of the paradigm shifts that theoretical and experimental chemists need to embrace is that of accelerated molecular discovery: The design cycles need to be sped up by the constant interaction of theoreticians and experimentalists, the use of high-throughput computational techniques, tools from machine learning and big data, and the development of public materials databases. I will describe three projects from my research group that aim to operate in this accelerated design cycle. First, I will describe our efforts on the Harvard Clean Energy Project, a search for materials for organic solar cells. I will continue by talking about our work on developing organic molecules for energy storage in flow batteries. Finally, I will describe our work towards the discovery of novel molecules for organic light-emitting diodes.
March 27, 2015
Location Change: Harvard Univ. Science Ctr. Hall E (1 Oxford Street, Cambridge MA 02138)
Title: Summarizing Large Data Sets
April 10, 2015
Title: Big Data, Geospatial Computing, and My 2 Cents in an Open Data Economy
VIDEO: No video available; presentation slides can be downloaded here.
In this rapidly urbanizing world, an unprecedented rate of population growth is not only mirrored by increasing demand for energy, food, water, and other natural resources, but also has detrimental impacts on environmental and human security. Much of our scientific and technological focus has been to ensure a sustainable future with healthy people living on a healthy planet where energy, environment, and mobility interests are simultaneously optimized. Current geoanalytics are limited in dealing with temporal dynamics that describe observed and/or predicted behaviors of entities, i.e., physical and socioeconomic processes. With increasing temporal resolution of geographic data, there is a compelling motivation to couple the powerful modeling and analytical capability of a GIS to perform spatial-temporal analysis and visualization on dynamic data streams. However, the challenge in processing large volumes of high-resolution earth observation and simulation data by traditional GIS has been compounded by the drive towards real-time applications and decision support. The ability to observe and measure through direct instrumentation of our environment and infrastructures, from buildings to planet scale, coupled with an explosion of data from citizen sensors, brings much promise for capturing the social/behavioral dimension. Additionally, it provides a unique opportunity to manage and increase efficiencies of existing built environments as well as design a more sustainable future. This presentation will explore the intriguing developments in the world of Big Data, geospatial computing, and plausible ways citizens can all become part of the open data economy for advancing science and society.
This talk is also part of the Geography Colloquium hosted at the Center for Geographic Analysis: http://gis.harvard.edu/events/seminar-series/colloquium.
April 24, 2015
Title: Data: A Love Story: How data science, and a great deal of tinkering, created the biggest dating site in the U.S.
On Friday April 24th, Christian Rudder, Co-founder of OkCupid, will present the Second Annual Dean's Lecture on Computational Science and Engineering (CSE). As part of his talk, Christian will discuss OkCupid's data-based social engineering and online experiments. Free & open to the public.
September 12, 2014
Title: Using big data in epidemiology for digital disease detection: Lessons learned and new directions
Preventing outbreaks of communicable diseases is one of the top priorities of public health officials from all over the world. Although traditional clinical methods to track the incidence of diseases are essential to prevent outbreaks, they frequently take weeks to spot critical epidemiological events. This is mainly due to the multiple clinical steps needed to confirm the appearance and incidence of diseases. Recently, the real-time analysis of big data sets such as search queries from Google, posts from Facebook, tweets from Twitter, and article views from Wikipedia has allowed researchers to identify epidemic events in multiple communities, giving rise to the creation of internet-based public health surveillance tools. These new tools often provide timely epidemiological information to public health decision makers up to two or three weeks ahead of traditional reports.
September 19, 2014
Title: High-throughput screening of crystalline porous materials
This talk will describe the development of tools for rapid screening of these large databases, to automatically select materials whose pore topology may make them most appropriate for a given application. The methods are based on computing the Voronoi tessellation, which provides a map of void channels in a given structure. This is carried out using the software library Voro++, which has been modified to properly account for three-dimensional non-orthogonal periodic boundary conditions. Algorithms to characterize and screen the databases will be described, and an application of the library to search for materials for carbon capture and storage will be discussed.
October 3, 2014
Title: Computational network dynamics of the neocortex
October 10, 2014
D.E. Shaw Research
Title: D. E. Shaw Research Information Session
Our lab has designed and constructed multiple generations of a massively parallel supercomputer called Anton specifically for the execution of molecular dynamics (MD) simulations. Each Anton computer can simulate a single MD trajectory as much as a millisecond or so in duration -- a timescale at which biologically significant phenomena occur. Anton has already generated the world’s longest MD trajectory.
Join us for an overview of our work on parallel algorithms and machine architectures for high-speed MD simulations and a description of the simulations that have helped elucidate the dynamics and functional mechanisms of biologically important proteins.
October 17, 2014
Title: Marrying domain knowledge and computational methods
Astronomy datasets have been large and are getting larger by the day (TB to PB). This necessitates the use of advanced statistics and machine learning for many purposes. However, the datasets are often so large that small contamination rates imply a large number of wrong results. This makes blind applications of methodologies unattractive. Astronomical transients are one area where rapid follow-up observations are required based on very little data. We show how the use of domain knowledge in the right measure at the right juncture can improve classification performance. We will bring up various computational methods, some established, some not so established, that are being used for detecting outliers and choosing optimal ones for best science returns. With an eye on PB-sized datasets coming up soon, we use time-series data from existing sky surveys like the Catalina Real-Time Transient Survey (along with auxiliary data), which has covered 80% of the sky several tens to a few hundreds of times over the last decade. We will also bring up an unconnected problem with some parallels: our JPL collaboration on the search for cancer biomarkers in the Early Detection Research Network (EDRN).
October 31, 2014
Title: Data Science at The New York Times
November 14, 2014
William D. Henshaw
Title: Over-coming the fluid-structure added-mass instability for incompressible flows
The added-mass instability has, for decades, plagued partitioned fluid-structure interaction (FSI) simulations of incompressible flows coupled to light solids and structures. Many current approaches require tens or hundreds of expensive sub-iterations per time-step. In this talk two new stable partitioned algorithms for coupling incompressible flows with both compressible elastic bulk solids and thin structural shells are described. These added-mass partitioned (AMP) schemes require no sub-iterations, can be made fully second- or higher-order accurate, and remain stable even in the presence of strong added-mass effects. Extensions of the schemes to treat large solid motions using deforming overlapping grids and the Overture framework will also be described.
November 21, 2014
Aaron Adcock and Shankar Kalyanaraman
Talk 1: Tree-like Structure in Social and Information Networks
Presenter: Aaron Adcock
It is often noted that social and information networks exhibit tree-like structure and properties. In the past few years several tools have been developed to more closely quantify this structure. I will discuss some of the results of applying these tools to real-world social and information networks. In particular, I will discuss two alternatives for measuring this structure: Gromov hyperbolicity and tree-width.
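To make one of the two measures concrete, the sketch below computes Gromov hyperbolicity on a small graph using the standard four-point condition. It is an illustrative brute-force version, not the tooling discussed in the talk: the adjacency-dictionary input and the function names bfs_distances and four_point_delta are assumptions, the graph is assumed to be connected and unweighted, and the cost grows with the fourth power of the node count, so it only suits tiny examples.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Shortest-path hop counts from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def four_point_delta(adj):
    """Gromov hyperbolicity via the four-point condition.

    For every four nodes, form the three pairwise distance sums; half the
    gap between the two largest sums is that quadruple's delta. The graph's
    delta is the maximum over all quadruples. Assumes a connected graph.
    """
    d = {u: bfs_distances(adj, u) for u in adj}
    delta = 0.0
    for x, y, z, w in combinations(adj, 4):
        sums = sorted((d[x][y] + d[z][w], d[x][z] + d[y][w], d[x][w] + d[y][z]))
        delta = max(delta, (sums[2] - sums[1]) / 2)
    return delta


if __name__ == "__main__":
    # A star is a tree; a 4-cycle is the smallest graph that is not tree-like here.
    star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
    cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print("star delta:", four_point_delta(star))
    print("4-cycle delta:", four_point_delta(cycle4))
```

Under this definition every tree has delta 0, and larger values indicate a less tree-like metric structure. Tree-width captures a different, cut-based notion of tree-likeness and is not computed here.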
Talk 2: Data-mining for development
Presenter: Shankar Kalyanaraman
Over the last few years, we have witnessed innovative uses of big data to model and predict complex human behavior and patterns. Google's use of search query data to accurately forecast flu incidence and Ushahidi's crowdsourced crisis maps following the Haiti earthquake in 2010 for quicker and more effective deployment of humanitarian aid are two leading examples in this domain. My research interests draw inspiration from these examples; and in this talk, I will showcase some previous work I have done in disease surveillance and post-conflict violence prevention.
At the end, time permitting, we’ll briefly chat about data science at Facebook.