Random Forests going Serverless
Serverless computing has received growing interest in recent years for supporting machine learning tasks. This computational model has desirable advantages: it allows training tasks to run in parallel, scales seamlessly and elastically with the applications' demands, and improves manageability without requiring knowledge of the internals of the underlying technology. Training a machine learning model on top of a serverless environment is a nontrivial procedure since several challenges must be addressed, such as the communication cost of the training data, the communication patterns, the training time, and the cost of execution. In this work, we focus on Random Forests, a state-of-the-art technique in many machine learning applications. We propose STRATA, a cost-effective framework to train Random Forests on top of a serverless environment that addresses the aforementioned training challenges practically and efficiently by at least 57% on average, as we illustrate in our extensive experimental evaluation.
- Published in: MIDDLEWARE '24: Proceedings of the 25th International Middleware Conference
- Type: Inproceedings
- Authors: Tomaras, Dimitrios; Buschjäger, Sebastian; Kalogeraki, Vana; Morik, Katharina; Gunopulos, Dimitrios
- Year: 2024
Citation information
Tomaras, Dimitrios; Buschjäger, Sebastian; Kalogeraki, Vana; Morik, Katharina; Gunopulos, Dimitrios: Random Forests going Serverless, MIDDLEWARE '24: Proceedings of the 25th International Middleware Conference, 2024, https://dl.acm.org/doi/abs/10.1145/3652892.3654791
@Inproceedings{Tomaras.etal.2024a,
author={Tomaras, Dimitrios and Buschjäger, Sebastian and Kalogeraki, Vana and Morik, Katharina and Gunopulos, Dimitrios},
title={Random Forests going Serverless},
booktitle={MIDDLEWARE '24: Proceedings of the 25th International Middleware Conference},
url={https://dl.acm.org/doi/abs/10.1145/3652892.3654791},
year={2024},
abstract={Serverless computing has received growing interest in recent years for supporting machine learning tasks. This computational model has desirable advantages as it allows for parallelism of training tasks, exploiting the undoubtedly seamless mechanism for scaling and elastic usage of resources based on the applications' demands, and improves manageability without the need to know the internals of...}}