Machine learning on big data - Mahout, SAMOA and Spark

15:25 - 16:10

Ann Chen

Senior Research Engineer

Ann is a research engineer at Yahoo. She was responsible for implementing recommendation and personalization for Yahoo eCommerce sites. Selected projects include U Want Wall (慾望牆), the Yahoo store homepage, and the Yahoo shopping mobile homepage. She has since shifted her focus to improving the eCommerce search experience at Yahoo. Her current interest is boosting CTR and/or conversion by building machine learning models from big data.

Different tools suit different situations when running machine learning algorithms on big data. In this talk, we will introduce two machine learning tools, Mahout and SAMOA, which are used for batch training and online streaming computation, respectively. Furthermore, Spark is a next-generation MapReduce-style platform that offers a large performance gain over Hadoop, and Mahout and other machine learning tools are moving toward it. Though it is not yet very mature, we will take a look at some of the progress so far.