In recent years, advances in hardware and software systems have enabled us to train complex Machine Learning (ML) models on massive datasets. These advances include new generations of GPUs as well as open-source frameworks such as Apache Spark, TensorFlow, and Ray. Moreover, progress in parallelization, job scheduling, and robustness has made it possible to build complex ML models more efficiently and at scale. In this course we will carry out a comprehensive survey of the latest trends in ML systems design and present techniques for building such systems. The course covers the main components of ML systems, from fundamental ML concepts to more advanced topics such as parallelization and robustness in ML systems design. Participants will be required to reflect on how different techniques, rules, and guidelines combine to build ML systems, and to suggest possible extensions to the technology from their own research domains.


Course Staff