SGI F77 COMPILER BASIC TEST - O2 R10000 175 MHz

Updated 99/04/13

OBJECTIVE

First tests on O2/R10000, for comparison with Origin 2000.

DESCRIPTION

Machine: O2, R10000, IP32: 175 MHz (350 MFLOPS).
  Data cache size: 32 Kbytes
  Instruction cache size: 32 Kbytes
  Secondary unified instruction/data cache size: 1 Mbyte on Processor 0
  Main memory size: 128 Mbytes
  The machine was running a normal multi-desktop X-windows session
  during the tests.

Program: Matrix multiplication using nested DO loops.
  Matrix dimensions: ND = 800 (reserved array size).
  Timing: dtime (tempd.f)

Program details:

P25: I-loop inside (vector style), no directives:

      PROGRAM P25
      ...
      IMPLICIT REAL*8 (A-H,O-Z)
      PARAMETER (ND = 800, NIN = 5, NOUT = 6)
      DIMENSION A(ND,ND), B(ND,ND), C(ND,ND)
      ...
      DO 26 J = 1,N
        DO 24 K = 1,N
          DO 22 I = 1,N
            C(I,J) = C(I,J) + A(I,K) * B(K,J)
   22     CONTINUE
   24   CONTINUE
   26 CONTINUE
      ...

P25V: calls DGEMM

RESULTS

Clearly this is not primarily a computational machine: its best results
reach about 67% of the theoretical peak, compared to over 80% on other
SGI R8000 and R10000 machines.

Table I. ND = 800, N = 800. Theoretical CPU time lower limit is
2.9 seconds (175 MHz R10000). Rows showing only an -LNO option add that
option to the compiler call above them.

--------------------------------------------------------------------------
Program  Compiler call                                  Threads  CPU time
--------------------------------------------------------------------------
p25      f77 -Ofast=ip32                                         6.7
             -LNO:ou=2                                           8.7
             -LNO:ou=4                                           7.0
         f77 -r10000 -mips4 -n32 -O3                             6.4 *1
             -LNO:ou=2                                           8.7
             -LNO:ou=4                                           6.9
             -LNO:ou=6                                           7.5
p25v     f77 -Ofast=ip32 -lcomplib.sgimath                       4.4 *2
         f77 -r10000 -mips4 -n32 -O3 -lcomplib.sgimath           4.4
--------------------------------------------------------------------------
*1 R = 2.9/6.4 = 0.46, while on R8000 the best R without DGEMM was 0.53
   using advanced options.
*2 R = 2.9/4.4 = 0.66.

Table II. ND = 1600, N = 1600. Theoretical CPU time lower limit is
23.4 seconds (175 MHz R10000). There was no swapping activity associated
with the main calculation.

--------------------------------------------------------------------------
Program  Compiler call                                  Threads  CPU time
--------------------------------------------------------------------------
p25      f77 -Ofast=ip32                                         54   *1
             -LNO:ou=2                                           197
             -LNO:ou=4                                           93
             -LNO:ou=6                                           79
         f77 -r10000 -mips4 -n32 -O3                             53.7 *1
             -LNO:ou=2                                           188
             -LNO:ou=4                                           93
             -LNO:ou=6                                           78
p25v     f77 -Ofast=ip32 -lcomplib.sgimath                       35.0 *2
         f77 -r10000 -mips4 -n32 -O3 -lcomplib.sgimath           35.4
--------------------------------------------------------------------------
*1 R = 23.4/54 = 0.43, while on R8000 the best R without DGEMM was 0.53
   using advanced options.
*2 R = 23.4/35.0 = 0.67.
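
NOTE ON THE THEORETICAL LIMIT

The report does not state how the lower limits of 2.9 and 23.4 seconds
were obtained. They are consistent with counting 2*N**3 floating-point
operations (one multiply and one add per inner-loop iteration) executed
at the quoted peak rate of 350 MFLOPS. The sketch below reproduces the
limits and the ratio R under that assumption. The program name LIMIT and
all variable names are illustrative and do not come from the original
test programs.

```fortran
      PROGRAM LIMIT
C     Sketch only: lower bound on CPU time for an N x N matrix
C     multiply, ASSUMING 2*N**3 floating-point operations at the
C     quoted peak rate of 350 MFLOPS (175 MHz R10000).
      IMPLICIT REAL*8 (A-H,O-Z)
      PARAMETER (PEAK = 350.0D6)
      DIMENSION NS(2), TBEST(2)
C     Matrix sizes used in Tables I and II.
      DATA NS /800, 1600/
C     Best measured p25v (DGEMM) CPU times from the tables, seconds.
      DATA TBEST /4.4D0, 35.0D0/
      DO 10 M = 1, 2
C        Operation count: one multiply and one add per iteration.
         FLOPS = 2.0D0 * DBLE(NS(M))**3
C        Lower limit on CPU time at peak speed.
         TMIN = FLOPS / PEAK
C        Efficiency ratio R = theoretical limit / measured time.
         R = TMIN / TBEST(M)
         WRITE (*,100) NS(M), TMIN, R
   10 CONTINUE
  100 FORMAT (' N =', I5, '  Tmin =', F6.1, ' s  R =', F5.2)
      END
```

Saved in fixed form (for example as limit.f, a hypothetical file name),
this should build with any Fortran 77 compiler. Replacing the entries in
TBEST with other CPU times from the tables gives the corresponding R.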