IACS seminars are generally held every other Friday at lunchtime during the academic year. Students, faculty and others interested in computational science and applied computation are welcome to attend. See the calendar page for ways to find out about future seminars.
Spring 2014 IACS Seminar Series
2014 
January 31, 2014: Bo Peng, Datascope Analytics (video linked below)
February 14, 2014: Leslie Greengard, NYU
February 28, 2014: Stratos Idreos, Harvard SEAS (video linked below)
March 7, 2014: Johan Bollen, Indiana University (video linked below)
March 14, 2014: Raul Jimenez, University of Barcelona (video linked below)
March 28, 2014: Devavrat Shah, MIT (video linked below)
April 4, 2014: Yaron Singer, Harvard SEAS (video linked below)
April 11, 2014: Hadley Wickham, RStudio and Rice University (video linked below)
April 25, 2014: Spiros Mancoridis, Drexel University


April 25, 2014
Spiros Mancoridis
The complex computing systems employed by governments, corporations, and other institutions are frequently targeted by cyberattacks designed for espionage and sabotage. The malicious software used in such attacks is typically custom-designed or obfuscated to avoid detection by traditional antivirus software. Our goal is to create a malware detection and classification system that can quickly and accurately detect and classify such malware. We pose the problem of malware detection as a multi-channel change-point detection problem, wherein the goal is to identify the point in time when a system changes from a known clean state to an infected state.
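The change-point formulation can be illustrated with a one-sided CUSUM detector run independently on each monitored channel. This is a generic textbook sketch, not the speaker's system; the sample values, `drift`, and `threshold` below are made up.

```python
# Minimal one-sided CUSUM change-point detector, applied per monitored
# channel (e.g. CPU load, syscall rate). All parameters are hypothetical.

def cusum(samples, target_mean, drift=0.5, threshold=5.0):
    """Return the index at which a sustained upward shift is flagged,
    or None if no change is detected."""
    s = 0.0
    for i, x in enumerate(samples):
        # Accumulate evidence of an upward shift, allowing slack `drift`.
        s = max(0.0, s + (x - target_mean - drift))
        if s > threshold:
            return i
    return None

# A clean regime around mean 10, then an "infected" regime around mean 14.
clean = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2]
infected = [13.9, 14.2, 14.1, 13.8, 14.0]
alarm = cusum(clean + infected, target_mean=10.0)
```

The detector stays silent on the clean prefix and raises an alarm shortly after the shift; in a multi-channel setting one such statistic would run per channel.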


April 11, 2014
Hadley Wickham
Expressing yourself in R
There are three main time sinks in any data analysis: 1. Figuring out what you want to do. A well-designed domain-specific language (or DSL) tightly coupled to the problem domain can make all three pieces faster. In this talk, I'll discuss two DSLs built in R: ggvis for visualisation and dplyr for data manipulation. These build on my previous packages ggplot2 and plyr, improving both expressivity and speed. Data visualisation and manipulation are key parts of data analysis. ggvis makes it easy to declaratively describe interactive web graphics. It combines a declarative syntax based on ggplot2 with shiny's reactive programming model and vega's declarative JS rendering.
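The appeal of a chained-verb DSL can be sketched even outside R. The toy Python class below mimics the dplyr-style filter/mutate/summarise pipeline over rows of data; it is an illustrative analogy only, and none of these names reflect dplyr's actual implementation.

```python
# A toy "data-manipulation DSL" in Python that mimics the chained-verb
# style of dplyr. Purely illustrative.

class Table:
    def __init__(self, rows):
        self.rows = rows  # list of dicts, one per observation

    def filter(self, pred):
        return Table([r for r in self.rows if pred(r)])

    def mutate(self, **cols):
        return Table([{**r, **{k: f(r) for k, f in cols.items()}}
                      for r in self.rows])

    def summarise(self, **aggs):
        return {k: f(self.rows) for k, f in aggs.items()}

flights = Table([
    {"carrier": "AA", "delay": 12},
    {"carrier": "UA", "delay": 3},
    {"carrier": "AA", "delay": 30},
])

# The pipeline reads in the order you think about the problem.
result = (flights
          .filter(lambda r: r["carrier"] == "AA")
          .mutate(late=lambda r: r["delay"] > 15)
          .summarise(mean_delay=lambda rows:
                     sum(r["delay"] for r in rows) / len(rows)))
```

Because each verb returns a new table, the code mirrors the analyst's mental sequence of operations, which is the property the talk attributes to a well-designed DSL.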

April 4, 2014
Yaron Singer
Information Diffusion Through Adaptive Seeding
In recent years social networking platforms have developed into extraordinary channels for spreading and consuming information. Along with the rise of such infrastructure, there is continuous progress on techniques for spreading information effectively through influential users. 

March 28, 2014
Devavrat Shah
Effective Crowdsourcing
Crowdsourcing systems provide a means to harness human ability at large scale to solve a variety of problems effectively. Examples abound, from classical surveys collecting the opinions of a group to the modern setting of social recommendations. In this talk, we shall discuss effective ways to design crowdsourcing experiments as well as to aggregate the information collected. In the context of the Mechanical Turk framework, this leads to an automated approach for getting a task done at the minimum possible cost. Time permitting, different variations on the theme will be discussed. This is based on joint work with D. Karger (MIT) and S. Oh (UIUC).
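A minimal picture of the aggregation step is plain majority voting over redundant worker answers; the Karger-Oh-Shah line of work replaces this baseline with a more sophisticated iterative, message-passing estimator. The tasks and answers below are invented.

```python
from collections import Counter

# Majority-vote aggregation over redundant worker answers -- the
# simplest baseline for the crowdsourcing setup described above.

def aggregate(answers):
    """answers: {task_id: [worker answers]} -> {task_id: majority answer}"""
    return {task: Counter(votes).most_common(1)[0][0]
            for task, votes in answers.items()}

# Each task is assigned to three workers (made-up data).
answers = {
    "img1": ["cat", "cat", "dog"],
    "img2": ["dog", "dog", "dog"],
}
labels = aggregate(answers)
```

The design question the talk addresses is how much redundancy to buy per task, and how to weight workers of unknown reliability, so that the total cost is minimized for a target accuracy.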

March 14, 2014
Raul Jimenez
Too Many Data and Too Few Parameters: Mapping and Analysing the Whole Universe, A Challenge in Computational Science


March 7, 2014
Johan Bollen
Quantifying Collective States from Online Social Networking Data

February 28, 2014
Stratos Idreos
Soon Everyone Will Be a "Data Scientist" or "Data Explorer." What Data Systems Will They Be Using?
How far away are we from a future where a data management system sits in the critical path of everything we do? Already today we need to go through a data system in order to do several basic tasks, e.g., to pay at the grocery store, to book a flight, to find out where our friends are, and even to get coffee. Businesses and sciences are increasingly recognizing the value of storing and analyzing vast amounts of data. Beyond the expected explosion in the number of data-driven businesses and scientific scenarios in the next few years, in this talk we also envision a future where data becomes readily available and its power can be harnessed by everyone. What both scenarios have in common is a need for new kinds of data systems that are tailored for data exploration, that are easy to use, and that can quickly absorb and adjust to new data and access patterns on-the-fly. In this talk, we will discuss this vision as well as recent and ongoing advances towards systems tailored for data exploration.

February 14, 2014
Leslie Greengard
Dean's Lecture on Computational Science 

January 31, 2014
Bo Peng
Data Science and Design: Fickleness and How We Solve It
When solving problems, data scientists often encounter added layers of complexity when the problems to be solved are not well defined, and their solutions unclear. In these cases, standard, more straightforward approaches fall short, as they are not amenable to vague problems, and are thus not guaranteed to reliably produce useful results. At Datascope Analytics, we adopt methodologies from the design community and use a "continuous feedback loop" to iteratively improve dashboards, algorithms, and data sources to ensure that the resulting tool will be useful and well received. During this talk, I will illustrate our approach by sharing a detailed example from one of our projects. I will end by showing a live demo version of our final visualization tool, using movie data from the Internet Movie Database (IMDB). 

2013 

November 22, 2013
Hugo Larochelle
Deep Learning for Distribution Estimation
Deep learning methods attempt to learn a deep and distributed representation of data directly from its low-level representation. The motivating argument is that high-dimensional data in AI-related domains (speech, computer vision, natural language) can take a more meaningful representation as a decomposition into several layers of abstraction, decomposing its different factors of variation. Deep learning methods thus try to discover and learn this representation directly from data.
In this talk, I will first discuss the basic concepts and methods behind deep learning, reviewing in particular the impressive advancements to the state-of-the-art it has recently permitted in speech recognition and visual object recognition.
I will then present my recent research on using neural networks for the task of distribution/density estimation, a fundamental problem in machine learning. Specifically, I will discuss the neural autoregressive distribution estimator (NADE), a state-of-the-art estimator of the probability distribution of data. I will also describe a deep version of NADE, which again illustrates the statistical modelling power of deep models.
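The autoregressive idea behind NADE can be sketched directly: model p(x) as a product of per-dimension conditionals, each computed from a hidden layer whose pre-activations are shared and updated incrementally as dimensions are consumed. The dimensions and weights below are random placeholders, not a trained model.

```python
import math, random

# Schematic NADE for binary vectors: p(x) = prod_d p(x_d | x_<d),
# with hidden-layer pre-activations reused across dimensions.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def nade_log_prob(x, W, V, b, c):
    """log p(x). W: H x D input weights, V: D x H output weights,
    b: D output biases, c: H hidden biases."""
    H, D = len(c), len(x)
    a = list(c)              # shared running pre-activations
    logp = 0.0
    for d in range(D):
        h = [sigmoid(a_j) for a_j in a]
        p_d = sigmoid(b[d] + sum(V[d][j] * h[j] for j in range(H)))
        logp += math.log(p_d if x[d] == 1 else 1.0 - p_d)
        # Fold x_d into the pre-activations used by later conditionals.
        for j in range(H):
            a[j] += W[j][d] * x[d]
    return logp

random.seed(0)
D, H = 4, 3
W = [[random.gauss(0, 0.5) for _ in range(D)] for _ in range(H)]
V = [[random.gauss(0, 0.5) for _ in range(H)] for _ in range(D)]
b = [0.0] * D
c = [0.0] * H

# By construction, the probabilities over all 2^D binary vectors sum to 1.
total = sum(math.exp(nade_log_prob([(i >> d) & 1 for d in range(D)],
                                   W, V, b, c))
            for i in range(2 ** D))
```

The normalization holds for any weights, which is the practical appeal of the autoregressive factorization: no intractable partition function to estimate.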

November 8, 2013
Ben Vigoda
Probabilistic Programming and Probability Processing
We are developing a computing stack for Bayesian inference and machine learning, including integrated circuits, probabilistic programming languages, compilers, and applications. Our first probability-processor hardware demonstrates orders-of-magnitude wins on machine learning and statistical inference benchmarks. We are developing open-source probabilistic programming languages that help enable rapid prototyping and development of statistical machine learning applications. We will demonstrate some applications that we are building on top of the probability processing stack.
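The programming-model half of such a stack can be illustrated generically: express a generative model as ordinary code, then hand conditioning to an inference engine. Here the "engine" is naive rejection sampling, and the coin model is invented; the compiler and hardware described in the talk are of course far more sophisticated.

```python
import random

# Probabilistic programming in miniature: the model is plain code,
# inference is generic. Rejection sampling stands in for the engine.

random.seed(42)

def model():
    # Prior: a coin is either fair or biased toward heads.
    biased = random.random() < 0.5
    p_heads = 0.9 if biased else 0.5
    flips = [random.random() < p_heads for _ in range(8)]
    return biased, flips

def infer(observation, n=20000):
    """Estimate P(biased | observed flips) by rejection."""
    accepted = [biased for biased, flips in (model() for _ in range(n))
                if flips == observation]
    return sum(accepted) / len(accepted)

# After observing eight heads in a row, "biased" is nearly certain.
posterior = infer([True] * 8)
```

The separation matters: the modeler changes `model()` without touching inference, and the inference substrate (here a sampler, in the talk a probability processor) can be swapped independently.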

October 25, 2013
Mercè Crosas
10 Simple Rules for the Care and Feeding of Scientific Data
Scientific publications and claims are increasingly based on ever-larger volumes of data. Once the publication is complete, it is often difficult for others to locate the data and accompanying analyses, and once located, often challenging to make sense of them. For scientific results to continue being subject to verification and extension, we in the scientific community must ensure that good data management, with sufficient transparency and accessibility of data and analyses, becomes an essential and ordinary element of the research cycle. In this talk, we present 10 simple rules to help scientists towards this goal.

October 11, 2013
Dmitri "Mitya" Chklovskii
Our brains constantly handle big data streamed by our sensory organs. Yet how this is done in neurons, the elementary building blocks of the brain, is not understood. We propose to view a neuron as a signal processing device representing its high-dimensional input by a synaptic weight vector scaled by its output. A neuron accomplishes this task by running two online algorithms: a slow algorithm, which adjusts synaptic weights to extract the most non-Gaussian projection of the high-dimensional input, and a fast algorithm, which estimates the projection amplitude. Both online algorithms rely on sparsity-inducing regularizers and have provable regret bounds. The steps of these algorithms account for the salient physiological features of neurons such as leaky integration, a nonlinear output function, Hebbian synaptic plasticity rules, and sparse connectivity and activity. Thus, our work should help model biological neural circuits and develop biologically inspired computing.
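One schematic reading of the "fast" algorithm: given fixed synaptic weights, the output is the input's projection amplitude, shrunk by a soft-threshold so weak responses are exactly zero (the effect of a sparsity-inducing regularizer). This sketch only illustrates the idea and is not the authors' algorithm; `lam`, the weights, and the inputs are made up.

```python
# Soft-thresholded projection: the closed-form minimizer of
# 0.5 * ||x - a*w||^2 + lam*|a| in the amplitude a (for unit-norm w,
# here handled by dividing by ||w||^2). Illustrative only.

def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def neuron_output(w, x, lam=0.5):
    # Projection amplitude of input x onto the weight vector w.
    norm2 = sum(wi * wi for wi in w)
    proj = sum(wi * xi for wi, xi in zip(w, x)) / norm2
    return soft_threshold(proj, lam)

w = [1.0, 0.0, 1.0]                         # sparse connectivity: two synapses
weak = neuron_output(w, [0.2, 5.0, 0.1])    # poor match -> neuron stays silent
strong = neuron_output(w, [2.0, 0.0, 2.0])  # strong match -> neuron fires
```

The thresholding produces sparse activity (exact zeros for poorly matching inputs), which is one of the physiological features the abstract lists.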

September 27, 2013
Sadasivan Shankar
Prediction, Renaissance, and Cognition: 3 Questions for Computing
With the increasing power of computing, humans appear to be on the verge of a golden era in the use of computing to address problems in all areas, including energy, health, and information. Extrapolating the ever-increasing efficacy of hardware and software, it appears that we are moving towards being totally predictive and even exceeding the computing power of the brain. Based on our work on several aspects of modeling covering areas of chemistry and materials science, we will address the feasibility of such a vision and look back to history and the Renaissance to distill lessons for the future of computing. In this journey, we hope to take you back to the future, in which prediction has been one of the most sought-after goals for humans.

September 13, 2013
Efthimios Kaxiras
Using Computation to Diagnose and Predict Heart Disease
The patterns of blood flow in arteries are crucial in determining the onset and progression of heart disease. These patterns can only be captured by simulations, assuming that the important details at different scales are properly described. This presentation will give an overview of our efforts to construct multiscale models of arterial blood flow based on the lattice Boltzmann equation.
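The computational kernel underlying such simulations is the lattice Boltzmann collision step. The sketch below performs a single-site BGK relaxation in the standard D2Q9 model, omitting streaming, geometry, and boundary conditions; `tau` and the initial state are arbitrary choices, not values from the talk.

```python
# Single-site BGK collision of the D2Q9 lattice Boltzmann model:
# relax the nine distributions toward the local equilibrium.

# D2Q9 lattice velocities and weights.
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4/9] + [1/9] * 4 + [1/36] * 4

def moments(f):
    """Density and velocity recovered from the distributions."""
    rho = sum(f)
    ux = sum(fi * e[0] for fi, e in zip(f, E)) / rho
    uy = sum(fi * e[1] for fi, e in zip(f, E)) / rho
    return rho, ux, uy

def collide(f, tau=0.8):
    """BGK collision: f <- f + (f_eq - f) / tau."""
    rho, ux, uy = moments(f)
    usq = ux * ux + uy * uy
    f_new = []
    for fi, wi, e in zip(f, W, E):
        eu = e[0] * ux + e[1] * uy
        feq = wi * rho * (1 + 3 * eu + 4.5 * eu * eu - 1.5 * usq)
        f_new.append(fi + (feq - fi) / tau)
    return f_new

f0 = [0.1] * 9          # an arbitrary non-equilibrium state
f1 = collide(f0)
```

Collision conserves mass and momentum by construction; a full solver alternates this local step with streaming of each `f_i` along its lattice direction, which is what makes the method attractive for complex arterial geometries.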
