The Big Data revolution has been enabled by a wealth of innovation in software platforms for data storage, analytics, and machine learning. The design of Big Data platforms such as Hadoop and Spark focused on scalability, fault tolerance, and performance. As these and other systems increasingly become part of the mainstream, the next set of challenges is becoming clearer. Requirements for performance are changing as workloads and hardware evolve. But more fundamentally, other issues are moving to the forefront. These include ease of use for a wide range of users, security, concerns about privacy and potential bias, and the perennial problems of data quality and integration from heterogeneous sources. In this talk, Dr. Franklin will give an overview of how we got here, with an emphasis on the development of the Apache Spark system. He will then focus on these emerging issues with an eye towards where the academic research community can most effectively engage.
Michael Franklin is the Liew Family Chair of Computer Science at the University of Chicago, where he also serves as senior advisor to the provost on computation and data science. Previously, he was at UC Berkeley, where he was the Thomas M. Siebel Professor of Computer Science and Chair of the Computer Science Division. He co-founded Berkeley’s AMPLab, a leading academic big data analytics research center, and served as an executive committee member for the Berkeley Institute for Data Science, a campus-wide initiative to advance data science environments. Michael is a Fellow of ACM and a two-time recipient of the ACM SIGMOD “Test of Time” award.
IACS Seminars are free and open to the public. Lunch will be served from 12:30 to 1pm on a first-come, first-served basis. The talk will begin promptly at 1pm.