Deploy your own Spark cluster in 4 minutes using sbt.

Pishen Tsai

Engineer
KKBOX

I'm a software developer at the research center of KKBOX, where we work on data analysis, data engineering, and machine learning projects around the listening behavior of KKBOX users. I have 5 years of Java and 2 years of Scala experience. Recently I've been interested in Scala-related topics, including functional programming, big data technology (Spark), event-driven distributed systems (Akka), and web applications (Play Framework). I enjoy using Scala to solve the problems I encounter both at KKBOX and in my daily life.

Setting up multiple machines for a Spark cluster is always a pain, especially for a Hadoop newbie like me. There are too many commands and steps to memorize! As a Scala programmer, I know how to use sbt and how to write a Spark job as an sbt project, and that should be all I need to know. In this talk, I will demo an sbt plugin called "spark-deployer", which handles all the dirty work for you, from creating machines to submitting Spark jobs, on a cloud service like AWS EC2.
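To give a rough idea of the workflow described above, an sbt plugin like this is typically wired into an existing Spark sbt project through `project/plugins.sbt`, after which cluster management becomes a handful of sbt tasks. This is only a hedged sketch: the plugin coordinates, version number, and task names below are illustrative guesses, not confirmed from the spark-deployer documentation.

```scala
// project/plugins.sbt
// Register the deployment plugin for this sbt project.
// NOTE: the organization, artifact name, and version here are
// illustrative placeholders -- check the plugin's README for the
// actual coordinates before using them.
addSbtPlugin("net.pishen" % "spark-deployer-sbt" % "x.y.z")
```

With the plugin loaded, the idea is that the whole lifecycle runs from the sbt shell with tasks along the lines of a "create cluster" task (spins up EC2 machines and installs Spark) and a "submit job" task (assembles the project jar and runs it on the cluster); the exact task names are defined by the plugin itself.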

1. This session will be conducted in Mandarin.