How to Be Fast and Not Furious: Looking Under the Hood of CPU Cache Prefetching
Software-based prefetching is a powerful method for tolerating access penalties that are encountered by data processing systems: memory latency. Although the idea appears straightforward—simply informing the CPU about upcoming data accesses—the intricacies of its implementation remain insufficiently understood. Existing works demonstrate how to rewrite algorithms for prefetching, yet they often overlook the limitations and hardware implications of bringing data into the cache hierarchy. In this paper, we examine software-based prefetching thoroughly by delving into its implementation and identifying pitfalls across various platforms. Furthermore, we provide actionable insights and recommendations for developers seeking to boost their applications through this technique.
- Published in: DaMoN '24: Proceedings of the 20th International Workshop on Data Management on New Hardware
- Type: Inproceedings
- Authors: Kühn, Roland; Mühlig, Jan; Teubner, Jens
- Year: 2024
Citation information
Kühn, Roland; Mühlig, Jan; Teubner, Jens: How to Be Fast and Not Furious: Looking Under the Hood of CPU Cache Prefetching, DaMoN '24: Proceedings of the 20th International Workshop on Data Management on New Hardware, 2024, https://dl.acm.org/doi/abs/10.1145/3662010.3663451
@Inproceedings{Kuehn.etal.2024a,
author={Kühn, Roland and Mühlig, Jan and Teubner, Jens},
title={How to Be Fast and Not Furious: Looking Under the Hood of CPU Cache Prefetching},
booktitle={DaMoN '24: Proceedings of the 20th International Workshop on Data Management on New Hardware},
url={https://dl.acm.org/doi/abs/10.1145/3662010.3663451},
year={2024},
abstract={Software-based prefetching is a powerful method for tolerating access penalties that are encountered by data processing systems: memory latency. Although the idea appears straightforward---simply informing the CPU about upcoming data accesses---the intricacies of its implementation remain insufficiently understood. Existing works demonstrate how to rewrite algorithms for prefetching, yet they...}}