COMP3211/COMP9211 Computer Architecture
Week 7 Tutorial Exercises

Q01. [P7.7,7.8,7.20,7.21,7.22] Here is a series of address references given as word addresses:

1, 4, 8, 5, 20, 17, 19, 56, 9, 11, 4, 43, 5, 6, 9, 17

a) Assuming a direct-mapped cache with 16 one-word blocks that is initially empty, label each reference in the list as a hit or a miss and show the final contents of the cache.
b) Show the hits and misses and final cache contents for a direct-mapped cache with four-word blocks and a total size of 16 words.
c) Show the hits and misses and final cache contents for a two-way set-associative cache with one-word blocks and a total size of 16 words. Assume LRU replacement.
d) Show the hits and misses and final cache contents for a fully associative cache with one-word blocks and a total size of 16 words. Assume LRU replacement.
e) Show the hits and misses and final cache contents for a fully associative cache with four-word blocks and a total size of 16 words. Assume LRU replacement.
f) Which of the above cache organisations is best for the sequence of references given?

Q02. [P7.11] Consider a memory hierarchy using one of the three organisations for main memory shown in the figure of Slide 28 in Lecture 10, http://www.cse.unsw.edu.au/~cs3211/coursework/L101_slides.pdf. Assume that the cache block size is 16 words, that the width of the "Wide memory organisation" of the figure is four words, and that the number of banks in the "Interleaved memory organisation" is four. If the main memory latency for a new access is 10 cycles and the transfer time is 1 cycle, what are the miss penalties for each of these organisations?

Q03. [P7.12] Suppose a processor with a 16-word block size has an effective miss rate per instruction of 0.5%. Assume that the CPI without cache misses is 1.2. Using the memories described in the figure of Q02, how much faster is the processor when using the wide memory than when using the narrow or interleaved memories?

Q04. [Based on P7.38] If all misses are classified into one of three categories - compulsory, capacity, or conflict - which misses are likely to be reduced when a program is rewritten so as to require less memory? What if the clock rate of the machine that the program is running on is increased? What if the main data structure used is changed from an array to a tree structure in order to aid searching?

Q05. With reference to Slide 27 of Lecture 11, http://www.cse.unsw.edu.au/~cs3211/coursework/L11_slides.pdf, which depicts a five-stage pipelined datapath, what factors should be considered in the design of the I- and D-caches? What style and size of cache would you recommend? State any assumptions you have made.
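
Hand-worked answers to Q01 can be checked with a small simulator. The following is a minimal sketch, not part of the original exercise sheet: the function name simulate and its parameters are hypothetical, the cache is assumed to start empty, and the set index is assumed to be the block number modulo the number of sets. A direct-mapped cache is modelled as one block per set (ways = 1), and a fully associative cache as a single set holding every block.

```python
from collections import OrderedDict

# Word-address reference stream from Q01.
REFS = [1, 4, 8, 5, 20, 17, 19, 56, 9, 11, 4, 43, 5, 6, 9, 17]


def simulate(refs, total_words, block_words, ways):
    """Simulate an initially empty cache with LRU replacement.

    Returns (hits, contents): hits is a list of booleans, one per
    reference; contents lists, for each set, the block numbers it
    holds at the end, ordered from least to most recently used.
    """
    num_blocks = total_words // block_words
    num_sets = num_blocks // ways
    # One ordered dict per set; insertion order tracks recency of use.
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = []
    for addr in refs:
        block = addr // block_words          # block number of this word
        cache_set = sets[block % num_sets]   # set the block maps to
        if block in cache_set:
            cache_set.move_to_end(block)     # mark as most recently used
            hits.append(True)
        else:
            if len(cache_set) == ways:       # set full: evict the LRU block
                cache_set.popitem(last=False)
            cache_set[block] = None
            hits.append(False)
    return hits, [list(s) for s in sets]


if __name__ == "__main__":
    # (label, block size in words, ways) for Q01 parts a) to e), 16 words total.
    configs = [("a", 1, 1), ("b", 4, 1), ("c", 1, 2), ("d", 1, 16), ("e", 4, 4)]
    for label, block_words, ways in configs:
        hits, contents = simulate(REFS, 16, block_words, ways)
        marks = ["hit" if h else "miss" for h in hits]
        print(label, sum(hits), "hits:", marks)
        print("  final sets (block numbers):", contents)
```

With four-word blocks, a block number b in the output covers word addresses 4b to 4b+3. The sketch only counts hits and misses; it does not model the miss penalties asked about in Q02 and Q03.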