COMPUTER BENCHMARKS


INTRODUCTION

This page presents benchmarks and code-efficiency data measured at the Department of Theoretical Physics, together with comparisons against well-known benchmarks. Lack of time and portability requirements often prevent code from being optimized, but a minimum level of efficiency should still be reached to make good use of the facilities.

SUMMARY

I. USER BENCHMARKS

I.1. Operation counts, absolute efficiencies

  • The CFHHM benchmark (R. Krivec). Measurements of my physics code, which can solve the three-particle Schroedinger equation with the highest precision available. The code mostly performs linear algebra, so its performance correlates with the Linpack benchmark. Example: on a Power Challenge L its efficiency reaches up to 70 percent (matrix multiplication reaches 83 percent), and the same compiler options are favorable in both cases.
  • Matrix multiplication summary (R. Krivec) on different machines. It also exposes the computational sancta simplicitas. For more detail see here.
  • Ladhall (P. Prelovsek) (in progress).
  • QLM (R. Krivec) (in progress).
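The absolute efficiencies quoted above compare a measured operation rate with the machine's theoretical peak. A minimal Python sketch of that bookkeeping follows, assuming the conventional count of 2n^3 floating-point operations for an n x n matrix product. The peak value and the helper names (`matmul_flops`, `efficiency`, `naive_matmul`) are hypothetical, and pure Python reaches only a tiny fraction of any machine's peak, so only the accounting carries over to compiled code.

```python
import time


def matmul_flops(n):
    """Conventional operation count of an n x n matrix product:
    n^3 multiplications plus n^3 additions (exactly, n^2 * (n - 1) additions)."""
    return 2 * n ** 3


def efficiency(flops, seconds, peak_mflops):
    """Fraction of theoretical peak reached: (flops / seconds) / peak."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    measured_mflops = flops / seconds / 1e6
    return measured_mflops / peak_mflops


def naive_matmul(a, b):
    """Textbook triple loop on lists of lists (illustration only).

    Uses i-k-j loop order, which in compiled code keeps the innermost
    access to b and c stride-1.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        ci = c[i]
        for k in range(n):
            aik = a[i][k]
            bk = b[k]
            for j in range(n):
                ci[j] += aik * bk[j]
    return c


if __name__ == "__main__":
    n = 120
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    t0 = time.perf_counter()
    naive_matmul(a, b)
    dt = time.perf_counter() - t0
    PEAK_MFLOPS = 400.0  # hypothetical theoretical peak of the machine under test
    print(f"{matmul_flops(n) / dt / 1e6:.1f} MFLOPS, "
          f"efficiency {efficiency(matmul_flops(n), dt, PEAK_MFLOPS):.1%}")
```

The same two lines of arithmetic apply to any code whose operation count is known; only the timed kernel changes.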

I.2. No operation counts, correlations only

  • Wiener (I. Vilfan). Correlated with MIPS on most machines.
  • Klrp (A. Ramsak). Correlated with theoretical MFLOPS.
  • Quantum Monte Carlo (D. Veberic). Correlated with theoretical MFLOPS.
  • Kondo (K. Haule). Correlated with theoretical MFLOPS using gcc or SGI CC.
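Where no operation count is available, the entries above only state a correlation with a reference figure such as MIPS or theoretical MFLOPS. One way to quantify such a statement, sketched here as an assumption rather than the method actually used for these entries, is the Pearson correlation between a code's speed (inverse run time) and the reference figure across machines. The helper names `pearson` and `speeds` and the sample data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def speeds(run_times):
    """Convert run times to speeds (1 / time) so that larger means faster."""
    return [1.0 / t for t in run_times]


if __name__ == "__main__":
    # Hypothetical data for four machines, not measurements from this page.
    run_times_s = [120.0, 80.0, 60.0, 45.0]
    peak_mflops = [200.0, 300.0, 400.0, 533.0]
    print(f"r = {pearson(speeds(run_times_s), peak_mflops):.3f}")
```

Run times are inverted first because a code that tracks the reference figure gets faster, not slower, as that figure grows; correlating raw times would give a negative coefficient.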

II. GENERAL BENCHMARKS

II.1. HINT benchmark

  • HINT benchmark: comparison on linear and logarithmic scales for selected machines (Saturn: SGI Power Challenge L, 6 proc., 2 GB; Dune: Sun Enterprise 4500, 8 proc., 8 GB; Tink: Digital PW 433au (Alpha), 128 MB).
  • HINT benchmark: results overview on local machines, logarithmic scale.
  • HINT benchmark: gcc vs. kcc comparison on Dune (Sun Enterprise 4500, 8 proc., 8 GB). The Kondo (K. Haule) results give the opposite outcome.
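HINT rates a machine by how quickly it improves the answer to a fixed problem: bounding the integral of (1 - x)/(1 + x) over [0, 1] from above and below, with quality defined as the reciprocal of the gap between the bounds, and speed reported in QUIPS (quality improvements per second). The Python function below is a much-simplified sketch of that idea, not the reference HINT code: it assumes plain rectangle bounds and always splits the interval with the largest gap. The name `hint_like` and the single "QUIPS-like" figure (quality divided by elapsed time) are hypothetical simplifications.

```python
import heapq
import time


def f(x):
    """Integrand used by HINT: (1 - x) / (1 + x), decreasing on [0, 1]."""
    return (1.0 - x) / (1.0 + x)


def hint_like(max_intervals):
    """Refine bounds on the integral of f over [0, 1] until max_intervals exist.

    Because f is decreasing, on [a, b] the rectangle f(a) * (b - a) is an
    upper bound and f(b) * (b - a) a lower bound. At each step the interval
    with the largest gap between the two is halved, which halves that gap.
    Returns (lower, upper, quality, seconds), quality = 1 / (upper - lower).
    """
    if max_intervals < 1:
        raise ValueError("max_intervals must be >= 1")
    t0 = time.perf_counter()
    # heapq is a min-heap, so gaps are stored negated: (-gap, a, b).
    heap = [(-(f(0.0) - f(1.0)), 0.0, 1.0)]
    while len(heap) < max_intervals:
        _, a, b = heapq.heappop(heap)
        m = 0.5 * (a + b)
        h = m - a
        for lo, hi in ((a, m), (m, b)):
            heapq.heappush(heap, (-(f(lo) - f(hi)) * h, lo, hi))
    upper = sum(f(a) * (b - a) for _, a, b in heap)
    lower = sum(f(b) * (b - a) for _, a, b in heap)
    seconds = max(time.perf_counter() - t0, 1e-9)  # avoid a zero timer reading
    return lower, upper, 1.0 / (upper - lower), seconds


if __name__ == "__main__":
    for n in (16, 256, 4096):
        lo, hi, q, s = hint_like(n)
        print(f"{n:5d} intervals: [{lo:.6f}, {hi:.6f}], "
              f"quality {q:.0f}, QUIPS-like {q / s:.3g}")
```

The exact value of the integral is 2 ln 2 - 1, so the printed bounds can be checked directly; the real benchmark reports how quality grows with time (and memory) rather than a single ratio.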

II.2. Memory throughput

  • Memory throughput benchmark by D. Veberic (a variant of HINT): overviews 1, 2 and 3 on different machines, logarithmic scale; Sun E4500 (14 proc., 28 GB RAM), which is labeled "Sun 400 MHz" in the preceding overview graphs. For the GS160 (32 GB), it shows the crossbar speed to local memory (500 MB/s out of a theoretical 800 MB/s, unidirectional) and, presumably, a 1:3 latency to non-local memory (200 MB/s out of a theoretical 400 MB/s, unidirectional), appearing above 32 GB. This benchmark correlates with half the STREAM COPY result, and also with processor clock frequency (MHz)! It does reveal poor scaling in some crossbar machines when going from 1 to 2 processors; see the attempted explanation.
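Throughput curves of this kind are plotted against data size on a logarithmic scale, so cache and memory levels show up as plateaus. A minimal Python sketch of such a sweep follows, assuming a plain buffer copy as the kernel (the actual kernel of the Veberic benchmark is not described here). It reports the rate twice: once counting each copied byte once, and once counting one read plus one write per byte, as STREAM COPY does. If a benchmark counts bytes only once, that accounting alone produces a factor of 2 between the two figures, which is one possible reading of the correlation with half the STREAM COPY result. The function name `copy_throughput` and the size range are hypothetical, and interpreter overhead dominates the smallest sizes.

```python
import time


def copy_throughput(n_bytes, repeats=5):
    """Best-of-`repeats` throughput of copying an n_bytes buffer.

    Returns (copy_mb_s, stream_mb_s): copy_mb_s counts each byte once
    (bytes copied per second); stream_mb_s counts it twice (one read plus
    one write), the way STREAM COPY accounts for memory traffic.
    """
    src = bytearray(n_bytes)
    dst = bytearray(n_bytes)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        dst[:] = src  # same-length slice assignment copies the whole buffer
        best = min(best, time.perf_counter() - t0)
    best = max(best, 1e-9)  # guard against a zero timer reading
    copy_mb_s = n_bytes / best / 1e6
    return copy_mb_s, 2.0 * copy_mb_s


if __name__ == "__main__":
    size = 4 * 1024
    while size <= 64 * 1024 * 1024:  # 4 KiB .. 64 MiB, factor 4 per step
        copy, stream = copy_throughput(size)
        print(f"{size:>10d} B  copy {copy:9.1f} MB/s  "
              f"STREAM-style {stream:9.1f} MB/s")
        size *= 4
```

Taking the best of several repeats filters out one-off disturbances such as page faults on the first touch of the destination buffer.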

III. TESTS AND OPTIMIZATIONS ON LOCAL MACHINES

IV. REFERENCE BENCHMARKS

IV.1. Local

IV.2. Selected data from well-known benchmarks

IV.3. Related information

IV.4. Benchmark code (R. Krivec)

ACKNOWLEDGEMENT

I thank my colleagues for supplying their measurements and Mark Martinec for HINT measurements on our machines.

R. Krivec