Building Training Sets for Astronomical Data: Bayesian Feature Transformation for Domain Adaptation // Dr. Protopapas, IACS Scientific Program Director


Friday, March 24, 2017, 12:30pm to 2:00pm



Harvard University, Northwest B103, 52 Oxford Street, Cambridge MA 02138

Lunch WILL NOT be served at this seminar. The talk will begin promptly at 1pm.

Abstract: Supervised data mining and machine learning rely on the availability of labeled data. When sufficient training data is available, supervised models achieve high performance in many domains. However, labeled data is scarcer than unlabeled data and much more expensive and difficult to obtain. Moreover, when models that perform well in one setting are applied to data from a different but related domain (e.g., data from a different telescope or sensor), performance often drops significantly. Additionally, the enormous rate at which unlabeled data is being generated in astronomy greatly surpasses the rate at which labeled data becomes available. Domain adaptation aims to learn from a domain where labeled data is available, the 'source domain', and, through some adaptation, perform well on a different domain, the 'target domain'. In this talk, I will present a new probabilistic model that represents the source and target distributions as two Gaussian mixtures and finds a transformation between the feature spaces of the domains to transfer labeled data between them. Our approach allows working with data available in one domain as if it belonged to the other, enabling the training of models in the target domain from training sets adapted from the source domain. We evaluate our proposal on simulated data and on the problem of variable star classification. For the latter, we use data from several astronomical surveys that differ in sensor sensitivity, atmospheric conditions, and data sampling frequency, among other characteristics.
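The abstract does not spell out the model, so the following is only a simplified illustration of the general idea of transferring labeled data through a transformation between feature spaces. It replaces the two Gaussian mixtures with a single Gaussian per domain and fits an affine map by matching means and covariances. This is a plain moment-matching baseline, not the speaker's Bayesian mixture model; the function names (`fit_gaussian`, `affine_transfer`) and the synthetic "survey" data are hypothetical.

```python
import numpy as np

def fit_gaussian(X):
    """Estimate the mean and covariance of the rows of X (a one-component 'mixture')."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    return mu, cov

def affine_transfer(X_src, X_tgt, eps=1e-6):
    """Map source features into the target feature space by matching the first two
    moments: x' = mu_t + A (x - mu_s), with A = L_t inv(L_s), where L_s and L_t are
    Cholesky factors of the source and target covariances."""
    d = X_src.shape[1]
    mu_s, cov_s = fit_gaussian(X_src)
    mu_t, cov_t = fit_gaussian(X_tgt)
    # A small ridge keeps the Cholesky factorization stable for near-singular covariances.
    L_s = np.linalg.cholesky(cov_s + eps * np.eye(d))
    L_t = np.linalg.cholesky(cov_t + eps * np.eye(d))
    A = L_t @ np.linalg.inv(L_s)
    b = mu_t - A @ mu_s
    # Rows are samples, so apply the map as X @ A.T + b.
    return X_src @ A.T + b, A, b

# Hypothetical example: labeled "survey A" features (source) and unlabeled
# "survey B" features (target) with different offsets and correlations.
rng = np.random.default_rng(0)
X_src = rng.normal(size=(500, 2)) @ np.array([[1.0, 0.3], [0.0, 0.5]]) + [2.0, -1.0]
X_tgt = rng.normal(size=(500, 2)) @ np.array([[0.7, 0.0], [0.2, 1.5]]) + [0.0, 3.0]

X_adapted, A, b = affine_transfer(X_src, X_tgt)
print(np.round(X_adapted.mean(axis=0), 3), np.round(X_tgt.mean(axis=0), 3))
```

Because the map is applied row by row, each adapted source sample keeps its original label, so the adapted set can serve as a training set in the target feature space. A single affine map cannot capture multimodal or class-dependent shifts, which is one motivation for modeling each domain as a Gaussian mixture instead.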

Presenter Bio: Pavlos Protopapas is the scientific program director at the Institute for Applied Computational Science (IACS) and a lecturer at the Harvard John A. Paulson School of Engineering and Applied Sciences. He holds a Ph.D. in theoretical physics from the University of Pennsylvania. During his time at Penn he served as the associate director of the National Scalable Cluster Project (NSCP), one of the early attempts at large-scale distributed computing on a grid-like model. An active collaborator and mentor in the astrostatistics research community, Dr. Protopapas holds a research appointment at the Harvard-Smithsonian Center for Astrophysics and served as senior scientist/project leader for the Time Series Center, a project launched by the Harvard Initiative in Innovative Computing. His general research interests lie in planetary transits, the outer solar system, photometric variability, and microlensing; his computer science interests include large databases and data mining in astronomy, with emphasis on feature extraction, anomaly detection, and similarity searches in time series. Protopapas teaches several courses at Harvard, including Applied Math 207, Extreme Computing, and the CSE Capstone Project course.

Free and open to the public; no registration required.