Authors:
Jen-Shiang K. Yu, Jenn-Kang Hwang, Chuan Yi Tang and Chin-Hui Yu
Bioinformatics Center, Department of Biological Science and Technology,
National Chiao Tung University, Hsinchu 300, TAIWAN.

Introduction

This benchmark update (with G98 rev. A11.3) was performed on new platforms, including IA32, IA64, and AMD64 running in 32-bit mode (AMD64-32), using the following revisions of Fortran compilers and numerical libraries: PGI Fortran compiler version 5.0 (IA32 and AMD64-32); Intel Fortran compiler version 7.1, build 20030307Z (IA32, IA64 and AMD64-32); ATLAS 3.4.1 (IA32); ATLAS 3.5.7 (AMD64-32); Intel MKL 6.0 (IA32, IA64 and AMD64-32); Kazushige Goto's threaded GOTO 0.6 (IA32) and threaded GOTO 0.7 (IA64); and the AMD Core Math Library (ACML 1.0, AMD64-32). The results on an IBM P690 system are listed as a reference. All benchmarks followed a strategy similar to that of our previous publication. In addition to single-CPU tests, we ran multiple copies of an identical job concurrently (not parallel processing!) on the same machine to quantify the performance impact of multitasking. These results show how well each memory bus architecture sustains a multitasking workload.
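As a minimal sketch of how the concurrent-execution results can be reduced to numbers, the Python below compares the CPU time of each concurrently running copy with the CPU time of the same job run alone. The function names and both metrics (mean per-copy slowdown and relative throughput) are illustrative assumptions, not necessarily the quantities reported in the tables. The throughput estimate further assumes that each copy has its own CPU, so that CPU time approximates elapsed time.

```python
from typing import Sequence


def multitask_slowdown(single_minutes: float, concurrent_minutes: Sequence[float]) -> float:
    """Mean CPU time of the concurrent copies divided by the single-run time.

    1.0 means no penalty from multitasking; 1.10 means each copy ran ~10% slower.
    """
    if single_minutes <= 0:
        raise ValueError("single-run time must be positive")
    if not concurrent_minutes:
        raise ValueError("need at least one concurrent timing")
    mean_concurrent = sum(concurrent_minutes) / len(concurrent_minutes)
    return mean_concurrent / single_minutes


def relative_throughput(single_minutes: float, concurrent_minutes: Sequence[float]) -> float:
    """Jobs finished per unit time, relative to running one copy at a time.

    Assumes one CPU per copy, so the batch ends when the slowest copy ends.
    The ideal value equals the number of copies.
    """
    if single_minutes <= 0 or not concurrent_minutes:
        raise ValueError("need a positive single-run time and concurrent timings")
    return len(concurrent_minutes) * single_minutes / max(concurrent_minutes)


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the benchmark tables.
    print(multitask_slowdown(100.0, [110.0, 114.0]))
    print(relative_throughput(100.0, [110.0, 114.0]))
```

With these made-up timings the slowdown is 1.12 (each copy about 12% slower than a solo run) and the relative throughput is about 1.75, against an ideal of 2 for two copies.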

For further details, please refer to our publication in J. Chem. Inf. Comput. Sci., 44, 635-642 (2004).

 

Tables

Table 1. Hardware Specifications and the Software Configurations of the Tested Platforms in Detail.
Table X. Description of GAUSSIAN 98 Test Jobs.
Table 2. CPU Time Consumption (in Minutes) of Each Test Job by the Alpha500 Machine and Intel Xeon Systems with CL2.5.
Table 3. CPU Time Consumption (in Minutes) of Each Test Job by the Intel Xeon Systems with CL2 as well as the IA64 System.
Table 4. CPU Time Consumption (in Minutes) of Each Test Job by the AMD Opteron System and IBM P690.
Table 5. Performance Correlation between the SpecFP2000 Benchmark and GAUSSIAN 98 Results.
Table 6. The CPU Time Consumption (in Minutes) of Each Test Job Concurrently Executed in Duplicate in the E7505 System.
Table 7. The CPU Time Consumption (in Minutes) of Each Test Job Concurrently Executed in Duplicate in the zx6000 and K8-32 Systems.
Table 8. Throughput Correlation between the SpecFP2000rate Benchmark and GAUSSIAN 98 Results.
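Tables 5 and 8 relate SPECfp2000 and SPECfp2000rate scores to the GAUSSIAN 98 results. The exact statistic is not restated on this page, so the following Python is only a sketch under a stated assumption: it computes the Pearson correlation coefficient between SPEC scores and a speed figure taken as the reciprocal of CPU minutes. The function names are hypothetical.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a sample with zero variance has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def spec_vs_gaussian(spec_scores: Sequence[float], cpu_minutes: Sequence[float]) -> float:
    """Correlate SPEC scores with speed, assumed here to be 1 / CPU minutes."""
    speeds = [1.0 / t for t in cpu_minutes]
    return pearson(spec_scores, speeds)
```

The reciprocal is used because a higher SPEC score means a faster machine; correlating scores with 1/time keeps both quantities pointing the same way, whereas correlating with raw CPU time would give a negative coefficient.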

 

Conclusion

  1. The new revisions of both Fortran compilers (PGI 3.3 to 5.0 and Intel Fortran 6.0 to 7.1) deliver a performance improvement of about 3%.
  2. For 32-bit executables, the Intel Fortran compiler accelerates processors with the SSE2 instruction set equally well regardless of the CPU manufacturer (IA32 or AMD64-32), and it generates better-performing binaries than PGI Fortran.
  3. On IA32 systems, the improvements from the optimized numerical libraries (ATLAS, GOTO, and MKL) are nearly identical, differing by less than 2% on the system with the Intel E7505 chipset. On the AMD64 architecture running 32-bit applications under a 64-bit Linux OS, ifc tunes binaries as it would for Pentium 4 clones and consistently accelerates double-precision FP operations. However, the numerical libraries differ significantly in speed on the AMD64 platform.
  4. Setting the CAS latency to CL2 on the E7505 system gives an additional speedup of 5% over the default setting of CL2.5.
  5. The IA64 and AMD64 machines handle multiple concurrent computations more efficiently than the IA32 architecture, probably owing to their larger memory bandwidth.
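The percentages in the conclusions compare CPU times between two configurations, for example CL2 versus CL2.5, or successive compiler revisions. The text does not say how per-job differences were aggregated, so the following Python sketch assumes one common choice: the geometric mean of the per-job time ratios, reported as a percent reduction in CPU time. The function name is hypothetical.

```python
import math
from typing import Sequence


def percent_faster(old_minutes: Sequence[float], new_minutes: Sequence[float]) -> float:
    """Percent reduction in CPU time from the old to the new configuration.

    Per-job ratios new/old are combined with a geometric mean (an assumption),
    so every job counts equally and a very long job does not dominate.
    Positive results mean the new configuration is faster.
    """
    if len(old_minutes) != len(new_minutes) or not old_minutes:
        raise ValueError("need matching, non-empty lists of timings")
    if any(t <= 0 for t in list(old_minutes) + list(new_minutes)):
        raise ValueError("timings must be positive")
    log_sum = sum(math.log(new / old) for old, new in zip(old_minutes, new_minutes))
    geo_ratio = math.exp(log_sum / len(old_minutes))
    return (1.0 - geo_ratio) * 100.0
```

For instance, if each of two jobs runs in 95% of its former time, the function reports 5.0, that is, 5% less CPU time. That corresponds to a speedup factor of about 1.053, so a quoted percentage is only meaningful together with its definition.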

Technical Notes

The following compilation experiences may be useful to scientists who want to tune the performance of number-crunching codes with different C and Fortran compilers and numerical libraries. However, the resulting executables must be examined with careful tests to make sure that they give correct answers. Furthermore, linking the target binaries statically is strongly recommended with ifc 7.1, because it prevents the executables from referencing a mix of shared libraries from different compiler revisions, which may cause trouble at run time. Although static binaries occupy more disk space, static linking makes everything clearer, especially when multiple revisions of the Intel or PGI compilers are installed on the system. For Intel compiler 8.0, dynamic linking against libguide.so is the default; the release notes cite performance issues as the reason for linking libguide dynamically.
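One way to check whether a binary is affected by the mixed-library problem described above is to inspect the output of ldd. The Python sketch below parses that output and collects the directories from which compiler runtime libraries are resolved; more than one directory suggests that libraries from different compiler revisions are being mixed. The markers "intel" and "pgi" and the example paths are assumptions about a typical installation layout and should be adapted locally. Because ldd versions differ in whether they report static binaries on stdout or stderr, pass both streams to the function.

```python
import re
from pathlib import PurePosixPath
from typing import Optional, Set

# Assumed substrings identifying compiler runtime libraries in resolved paths;
# adjust to the actual install prefixes on your system.
RUNTIME_MARKERS = ("intel", "pgi")

# Matches ldd lines such as "\tlibguide.so => /path/libguide.so (0x40017000)".
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)")


def compiler_runtime_dirs(ldd_output: str) -> Optional[Set[str]]:
    """Directories that supply compiler runtime libraries, or None if static."""
    if "not a dynamic executable" in ldd_output or "statically linked" in ldd_output:
        return None
    dirs: Set[str] = set()
    for line in ldd_output.splitlines():
        match = _LDD_LINE.match(line)
        if match is None:
            continue  # e.g. the dynamic loader line, which has no "=>"
        path = match.group(2)
        if any(marker in path.lower() for marker in RUNTIME_MARKERS):
            dirs.add(str(PurePosixPath(path).parent))
    return dirs


def mixes_runtimes(ldd_output: str) -> bool:
    """True if compiler runtime libraries come from more than one directory."""
    dirs = compiler_runtime_dirs(ldd_output)
    return dirs is not None and len(dirs) > 1


if __name__ == "__main__":
    # Hypothetical ldd output for a binary picking up two compiler revisions.
    sample = (
        "\tlibguide.so => /opt/intel/compiler70/ia32/lib/libguide.so (0x40017000)\n"
        "\tlibcxa.so.5 => /opt/intel_fc_80/lib/libcxa.so.5 (0x40050000)\n"
        "\tlibc.so.6 => /lib/tls/libc.so.6 (0x42000000)\n"
        "\t/lib/ld-linux.so.2 (0x40000000)\n"
    )
    print(sorted(compiler_runtime_dirs(sample)))
    print(mixes_runtimes(sample))
```

In practice the input could come from `subprocess.run(["ldd", path], capture_output=True, text=True)`, concatenating `.stdout` and `.stderr`. A result of None means the binary is statically linked, which is the setup recommended above for ifc 7.1.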

The authors are NOT responsible for any numerical errors, data loss, or system damage resulting from the suggestions that follow.