Spark and other modern analytics platforms are based on an in-memory computing model. Thus, it is no surprise that making effective use of memory is key to good performance of Spark jobs. But how does Spark use memory? And what can be done to use it more effectively? In this talk I’ll give an overview of memory usage in Spark, identify some memory management challenges, and consider how they might be addressed. This talk is based on joint work with Michael Mior.
Ken Salem is a professor in the Cheriton School of Computer Science at the University of Waterloo, where he is a member of the Data Systems group. He has been involved in data systems research for more than thirty years. Recent honors include best paper awards at VLDB’11 and at ACM’s Symposium on Cloud Computing (SoCC) in 2015. He’s proud to be serving as program co-chair for the 2017 VLDB conference in Munich.