COMP3211/COMP9211 Computer Architecture
Week 10 Tutorial Exercises

Q01. [P3.2] The following code fragment processes an array and produces two important values in registers $v0 and $v1. Assume that the array consists of 5000 words indexed 0 through 4999, and its base address is stored in $a0 and its size (5000) in $a1. Describe in one sentence what this code does. Specifically, what will be returned in $v0 and $v1?

            add  $a1, $a1, $a1
            add  $a1, $a1, $a1
            add  $v0, $zero, $zero
            add  $t0, $zero, $zero
    outer:  add  $t4, $a0, $t0
            lw   $t4, 0($t4)
            add  $t5, $zero, $zero
            add  $t1, $zero, $zero
    inner:  add  $t3, $a0, $t1
            lw   $t3, 0($t3)
            bne  $t3, $t4, skip
            addi $t5, $t5, 1
    skip:   addi $t1, $t1, 4
            bne  $t1, $a1, inner
            slt  $t2, $t5, $v0
            bne  $t2, $zero, next
            add  $v0, $t5, $zero
            add  $v1, $t4, $zero
    next:   addi $t0, $t0, 4
            bne  $t0, $a1, outer

Q02. [P3.3] Assume that the code from Q01 is run on a machine with a 500-MHz clock that requires the following number of cycles for each instruction:

    Instruction       Cycles
    -----------       ------
    add, addi, slt    1
    lw, bne           2

In the worst case, how many seconds will it take to execute this code?

Q03. [P5.1] Describe the effect that a single stuck-at-0 fault (i.e., regardless of what it should be, the signal is always 0) would have on the multiplexors in the single-cycle datapath in Fig 5.19 on p. 360 [similar to Slide 33, Week 9 Friday lecture]. Which instructions [of those we based our design upon], if any, would still work? Consider each of the following faults separately: RegDst = 0, ALUSrc = 0, MemtoReg = 0, Zero [Equal] = 0.

Q04. [P5.12] Consider the following idea: Let's modify the instruction set architecture and remove the ability to specify an offset for memory access instructions. Specifically, all load-store instructions with non-zero offsets would become pseudoinstructions and would be implemented using two instructions.
For example:

    addi $at, $t1, 104    # add the offset to a temporary
    lw   $t0, $at         # new way of doing lw $t0, 104($t1)

What changes would you make to the single-cycle datapath and control if this simplified architecture were to be used?

Q05. [P5.13] If the modifications described in Q04 are implemented, there are some definite trade-offs with regard to performance. Specifically, the cycle time may be affected, and all load-store instructions with nonzero offsets would now require an extra addi instruction (a good compiler might find ways to reduce the need for extra addi instructions, but you can ignore this). If there are too many load-store instructions with nonzero offsets, it is likely that the modification would not improve performance. Assuming delays as specified on page 373, what is the highest percentage of load-store instructions with offsets that could be tolerated (i.e., that would still result in the modification having a positive impact on performance)?

Q06. [P5.14] In estimating the performance of the single-cycle implementation, we assumed that only the major functional units had any delay (i.e., the delay of the multiplexors, control unit, PC access, sign extension unit, and wires was considered to be negligible). Assume that we change the delays specified on page 373 such that we use a different type of adder for simple addition:

    * ALU: 2 ns
    * adder for PC + 4: X ns
    * adder for branch address computation: Y ns

    a. What would the cycle time be if X = 3 and Y = 3?
    b. What would the cycle time be if X = 5 and Y = 5?
    c. What would the cycle time be if X = 1 and Y = 8?