#92 Divide and Conquer Instruction Cache Misses


Rejected

[PDF] Submission (886kB) May 1, 2020, 11:38:01 AM UTC

L1 instruction cache (L1i) misses are a major source of performance degradation when a program's instruction footprint does not fit in the L1i. Sequential prefetchers, such as a next-line prefetcher, are a common way to mitigate this problem, but they fall short when the program frequently exercises complex control flow. This observation has motivated researchers to suggest a myriad of sophisticated proposals, yet we find that significant room for improvement remains. Hence, in this paper, we introduce a new instruction prefetcher to exploit the available potential.

We address the L1i cache miss problem using a divide-and-conquer approach. We carefully analyze why an instruction cache miss occurs and how it can be eliminated, and we divide instruction cache misses into sequential and discontinuity misses. A sequential miss is a miss to the cache block spatially right after the last accessed block; all remaining misses are discontinuity misses. While sequential and discontinuity prefetchers have already been proposed, we show that their conventional implementations cannot adequately cover these misses because of their shortcomings. Accordingly, we recommend an enhanced implementation of each prefetcher.

We find that for a sequential prefetcher, there is a trade-off between timeliness and accuracy. Consequently, we propose the SN4L prefetcher, which attempts to provide prefetches that are both accurate and timely. Moreover, a conventional discontinuity prefetcher uses a single discontinuity target for each record, so its lookahead is limited to a single discontinuity ahead of the execution stream, which limits its efficiency. On top of that, it records an instruction block address per record, which incurs a considerable storage cost. We introduce the Dis prefetcher to address these shortcomings.
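The sequential/discontinuity split above can be made concrete with a short sketch. This is our own illustration, not the paper's code; the function name and trace representation are hypothetical. A miss is classified as sequential when its block immediately follows the last accessed block, and as a discontinuity otherwise.

```python
# Illustrative sketch (hypothetical helper, not from the paper): classify
# L1i misses into sequential and discontinuity misses, per the definition
# that a sequential miss is to the block spatially right after the last
# accessed block, and every other miss is a discontinuity miss.

def classify_misses(accessed_blocks, missed):
    """accessed_blocks: instruction block addresses in access order.
    missed: parallel list of booleans (True if the access missed in L1i).
    Returns (sequential_misses, discontinuity_misses) as lists of blocks."""
    sequential, discontinuity = [], []
    prev = None
    for block, miss in zip(accessed_blocks, missed):
        if miss:
            if prev is not None and block == prev + 1:
                sequential.append(block)
            else:
                discontinuity.append(block)
        prev = block  # classification is relative to the last ACCESS, not the last miss
    return sequential, discontinuity

# Example trace: 10 -> 11 is sequential; the jump 11 -> 40 is a discontinuity.
seq, dis = classify_misses([10, 11, 40, 41], [False, True, True, True])
# seq == [11, 41], dis == [40]
```

Note that the comparison is against the previously accessed block, hit or miss, which matches the definition of "spatially right after the last accessed block."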
Given a 128 KB storage budget, our proposal offers a 25% speedup over a baseline without any prefetcher, and with a small 8 KB storage budget, it outperforms the state-of-the-art prefetcher by 3%.
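To make the conventional discontinuity prefetcher's limitation concrete, here is a minimal sketch of one plausible design, assumed by us for illustration (the class and table layout are not from the paper): each record maps a trigger block to exactly one target block, so the prefetcher can only look a single discontinuity ahead and pays a full block address per record.

```python
# Hedged sketch (assumed design, not the paper's Dis prefetcher): a
# conventional discontinuity prefetcher. Each record holds ONE target
# block, so lookahead is limited to a single discontinuity, and every
# record stores a full instruction block address.

class ConventionalDiscontinuityPrefetcher:
    def __init__(self):
        self.table = {}        # trigger block -> single discontinuity target
        self.prev_block = None

    def access(self, block):
        """Observe an instruction-block access; return prefetch candidates."""
        prefetches = []
        if self.prev_block is not None and block != self.prev_block + 1:
            # A discontinuity occurred: record it, keyed by the block we left.
            self.table[self.prev_block] = block
        # Predict: if this block previously triggered a discontinuity,
        # prefetch its single recorded target (and nothing beyond it).
        if block in self.table:
            prefetches.append(self.table[block])
        self.prev_block = block
        return prefetches
```

After training on the trace 10, 11, 40, a later access to block 11 yields the prefetch candidate 40, but the prefetcher cannot also issue 40's own discontinuity target in the same step; that one-discontinuity lookahead is the shortcoming the abstract attributes to conventional designs.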

A. Ansari, F. Golshan, P. Lotfi-Kamran, H. Sarbazi-Azad
