IPC1 Paper #92 Reviews and Comments
===========================================================================
Paper #92 Divide and Conquer Instruction Cache Misses


Review #92A
===========================================================================

Overall merit
-------------
3. Weak accept

Reviewer expertise
------------------
3. Knowledgeable

Paper summary
-------------
The paper proposes a combination of an enhanced next-4-lines prefetcher (to capture sequential accesses) and a discontinuity prefetcher. The next-4-lines prefetcher uses two tagless direct-mapped tables: the first table attempts to sequentially prefetch only previously demanded lines, while the second table aims to catch sequential misses that were incorrectly filtered out by the first table. The discontinuity prefetcher expands on previously published work (HPCA 2005) by adding tags to the previously proposed tagless direct-mapped table and by making each entry a circular buffer to capture instruction fan-outs.

Comments for author
-------------------
I like the use of tagless direct-mapped tables for the SeqTable and the wrong_SNL_table. I also like the detailed analysis and explanation that led to the inclusion of the wrong_SNL_table to catch lines filtered away by the SeqTable.

Does the SeqTable also need to be periodically reset at some interval? Since it is tagless and direct-mapped, over long periods it will also suffer from aliasing, which can incorrectly predict a line as demanded.

Is there any filtering that prevents the wrong_SNL table from storing lines that are actually discontinuities? Since the condition for inserting into wrong_SNL seems to be absence from the RLU filtering queue, it seems that both discontinuities and missed sequential lines can go into the wrong_SNL table.


Review #92B
===========================================================================

Overall merit
-------------
2. Weak reject

Reviewer expertise
------------------
3. Knowledgeable

Paper summary
-------------
This paper proposes a hybrid instruction prefetcher that combines an extended sequential prefetcher and a discontinuity prefetcher. The proposed algorithm uses three tables to track instruction prefetch state: SeqTable, Dis_single, and Dis_multiple. According to the performance evaluation, the proposed prefetcher improves performance by 25% over a baseline without any prefetching.

Comments for author
-------------------
(1) The idea of a hybrid instruction prefetcher consisting of an extended sequential prefetcher and a BTB-based discontinuity prefetcher is interesting.
(2) However, the evaluated results are not impressive, since the 25% improvement is measured against a baseline without any prefetching.
(3) Each component of the hybrid prefetcher is well known and has been published before.


Review #92C
===========================================================================

Overall merit
-------------
3. Weak accept

Reviewer expertise
------------------
3. Knowledgeable

Paper summary
-------------
The prefetcher combines two versions of a sequential prefetcher with a discontinuity prefetcher that is enhanced over previous work to consider more than one discontinuous target.

Comments for author
-------------------
The proposal is an improvement over previous work on sequential+discontinuity prefetching. It performs favorably compared to several configurations and to next-line prefetching from previous work. The speedup is reasonable.

Section II-B is a little hard to follow, but I believe I got the gist of the idea.


Review #92D
===========================================================================

Overall merit
-------------
3. Weak accept

Reviewer expertise
------------------
4. Expert

Paper summary
-------------
The authors propose a hybrid prefetching scheme composed of an improved version of a sequential prefetcher and an improved version of a discontinuity prefetcher.
The authors describe how the discontinuity prefetcher could use pre-decode bits to reduce its storage requirements, but the current design does not rely on this. The proposed scheme is shown to outperform RDIP and Shotgun.

Comments for author
-------------------
For the experiments with Boomerang, what indirect predictor do you assume? The patterns captured by the discontinuity component could potentially also be captured by an indirect predictor, which would let Boomerang capture them as well.

When you enhance the prefetchers to support SeqonDis and DisonDis, how many cycles do you assume per generated prefetch request?

It would be useful to see the performance of each component and its contribution (perhaps as a stacked bar in Figure 2).
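Review #92A's question about whether the SeqTable needs a periodic reset can be illustrated with a minimal Python sketch. It assumes the SeqTable holds one "previously demanded" bit per entry, indexed by cache-line address modulo the table size; the class name, method names, 1024-entry size, and 64-byte line size are hypothetical and are not taken from the paper. Because no tag is stored, a line that was never demanded can hit an entry set by an aliasing line, and in this model only a reset clears that stale state.

```python
class TaglessDirectMappedTable:
    """Hypothetical model of a tagless, direct-mapped table of 1-bit
    'previously demanded' flags, indexed by instruction cache-line address.
    Table size, line size, and indexing are illustrative assumptions."""

    def __init__(self, num_entries: int = 1024, line_bytes: int = 64) -> None:
        self.num_entries = num_entries
        self.line_bytes = line_bytes
        self.bits = [False] * num_entries

    def _index(self, addr: int) -> int:
        # No tag is stored: only the line address modulo the table size selects
        # an entry, so any two lines that differ by a multiple of
        # num_entries * line_bytes share (alias to) the same flag.
        return (addr // self.line_bytes) % self.num_entries

    def mark_demanded(self, addr: int) -> None:
        # Record that the line containing addr was demand-fetched.
        self.bits[self._index(addr)] = True

    def predicts_demanded(self, addr: int) -> bool:
        # True if the entry is set, whether by this line or by an alias.
        return self.bits[self._index(addr)]

    def reset(self) -> None:
        # Periodic reset: clear every flag, discarding stale and aliased state.
        self.bits = [False] * self.num_entries


if __name__ == "__main__":
    table = TaglessDirectMappedTable(num_entries=1024, line_bytes=64)
    table.mark_demanded(0x1000)
    alias = 0x1000 + 1024 * 64  # a different line that maps to the same entry
    print(table.predicts_demanded(alias))  # True: false positive from aliasing
    table.reset()
    print(table.predicts_demanded(alias))  # False after the reset
```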