Authors:
Jen-Shiang K. Yu, Jenn-Kang Hwang, Chuan Yi Tang and Chin-Hui Yu
Bioinformatics Center, Department of Biological Science and Technology,
National Chiao Tung University,
Hsinchu 300, TAIWAN.
Introduction
This benchmark update (with G98 rev.A11.3) was performed with the following revisions of Fortran compilers and numerical libraries on new platforms, including IA32, IA64 and AMD64 in 32-bit mode (AMD64-32): PGI Fortran compiler version 5.0 (IA32 and AMD64-32) and Intel Fortran compiler version 7.1, build 20030307Z (IA32, IA64 and AMD64-32); ATLAS 3.4.1 (IA32), ATLAS 3.5.7 (AMD64-32), Intel MKL 6.0 (IA32, IA64 and AMD64-32), Kazushige Goto's threaded GOTO 0.6 (IA32), threaded GOTO 0.7 (IA64), and the AMD Core Math Library (ACML 1.0, AMD64-32). The test results for the IBM P690 system are listed for reference. All benchmarks followed a strategy similar to that of our previous publication. In addition to the single-CPU tests, we ran multiple copies of an identical job concurrently (not parallel processing!) on the same machine to quantify the performance impact of multitasking. The outcome indicates how well the memory bus architecture of each platform sustains a multitasking computational load.
For further details, please refer to our publication in J. Chem. Inf. Comput. Sci., 44, 635-642 (2004).
Tables
Table 1. Hardware Specifications and the Software Configurations of the Tested
Platforms in Detail.
Table X. Description of GAUSSIAN 98 Test Jobs.
Table 2. CPU Time Consumption (in Minutes) of Each Test Job by the Alpha500
Machine and Intel Xeon Systems with CL2.5.
Table 3. CPU Time Consumption (in Minutes) of Each Test Job by the Intel Xeon
Systems with CL2 as well as the IA64 System.
Table 4. CPU Time Consumption (in Minutes) of Each Test Job by the AMD Opteron
System and IBM P690.
Table 5. Performance Correlation between the SpecFP2000 Benchmark and GAUSSIAN
98 Results.
Table 6. The CPU Time Consumption (in Minutes) of Each Test Job Concurrently
Executed in Duplicate in the E7505 System.
Table 7. The CPU Time Consumption (in Minutes) of Each Test Job Concurrently
Executed in Duplicate in the zx6000 and K8-32 Systems.
Table 8. Throughput Correlation between the SpecFP2000rate Benchmark and GAUSSIAN
98 Results.
Conclusion
Technical Notes
The following compilation experiences may be useful to scientists who would like to tune the performance of number-crunching codes with different C and Fortran compilers and numerical libraries. The resulting executables must nevertheless be tested carefully to make sure that they give correct answers. Furthermore, linking the target binaries statically is strongly recommended with ifc 7.1, because it prevents the executables from referencing a mixture of shared libraries from different compiler revisions, which may cause trouble at run time. Although the binaries occupy more disk space, static linking keeps things clear, especially when multiple revisions of the Intel or PGI compilers are installed on the system. For Intel compiler 8.0, dynamic linking against libguide.so is the default, and the release notes report performance issues in connection with linking libguide dynamically.
The authors are NOT responsible for any numerical errors, data loss
or system damage resulting from the suggestions that follow.
For 32-bit Linux distributions that incorporate the new native POSIX threading
library, such as RedHat 9.0, an undefined reference to "__ctype_b"
may appear when linking with ifc 7.1 and MKL; it can be resolved by adding
the "-i_dynamic" option at the linking stage. When linking against the GOTO
library, the options "-lpthread -lsvml" help resolve other undefined
references.
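The two workarounds above can be summarized as a small lookup. The following Python sketch only assembles a link command as a list of strings and never runs it. The function name, the dictionary keys "mkl" and "goto", and the `library_flags` argument are hypothetical; the flags that name MKL or GOTO themselves are left to the caller because the text does not give them.

```python
# Extra ifc 7.1 link-stage options reported in the text, keyed by library.
# The keys "mkl" and "goto" are labels chosen for this sketch only.
EXTRA_LINK_OPTIONS = {
    "mkl": ["-i_dynamic"],            # resolves the undefined "__ctype_b"
    "goto": ["-lpthread", "-lsvml"],  # resolves other undefined references
}


def ifc_link_command(objects, library, library_flags, output):
    """Assemble (not execute) an ifc link command as an argument list.

    library_flags is whatever the caller uses to name the numerical
    library itself; it is an assumption of this sketch.
    """
    return (["ifc", "-o", output] + list(objects)
            + list(library_flags) + EXTRA_LINK_OPTIONS[library])


if __name__ == "__main__":
    # "main.o" and "a.out" are placeholder names.
    print(" ".join(ifc_link_command(["main.o"], "goto", [], "a.out")))
```

Keeping the workaround options in one table makes it easier to see which option was added for which library when a build is revisited later.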
Generating 32-bit binaries on the AMD64 system requires several special compiling
and linking options, because 64-bit executables are produced by default
on a 64-bit Linux system. The architectural tuning option of the PGI compiler should
be set to "-tp k8-32", while the option "-m32" is required
to request 32-bit compilation with the GNU compilers. Furthermore, the "-melf_i386"
option is necessary at the linking stage to produce executables in 32-bit ELF format
(ELF being the native Linux binary format). If the linking is done through a compiler
driver (gcc, g77, ifc, etc.) rather than by ld directly, the option must be
passed on to the linker in the form "-Wl,-melf_i386". Note that 32-bit and 64-bit
object files cannot be mixed in one link.
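As an illustration of the options just described, the Python sketch below builds the corresponding command lines as argument lists without executing anything. The function names and the file name in the usage lines are hypothetical, and only the compiler drivers named in the text are covered.

```python
# Options for producing 32-bit objects on a 64-bit AMD64 Linux host,
# as named in the text. Only these four drivers are covered here.
ARCH_OPTIONS = {
    "pgf77": ["-tp", "k8-32"],  # PGI architectural tuning option
    "pgf90": ["-tp", "k8-32"],
    "gcc": ["-m32"],            # GNU 32-bit compilation
    "g77": ["-m32"],
}


def compile_command(compiler, source):
    """Return the argument list that compiles one source file to an object."""
    return [compiler] + ARCH_OPTIONS[compiler] + ["-c", source]


def link_emulation_option(linker):
    """Return the option that selects the 32-bit ELF format.

    ld takes the option directly; a compiler driver forwards it to the
    linker through the -Wl, prefix.
    """
    return ["-melf_i386"] if linker == "ld" else ["-Wl,-melf_i386"]


if __name__ == "__main__":
    # "job.f" is a placeholder file name.
    print(" ".join(compile_command("g77", "job.f")))
    print(" ".join(link_emulation_option("g77")))
```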
On the AMD64 system, complaints about undefined references to "e_wsfe",
"s_wsfe" and "do_fio" appear at the linking stage when
ifc is used in combination with the ACML (v1.0) gnu32 library. The
errors can be cleared by additionally linking the object file of GOTO's
xerbla.f, recompiled by ifc with the "-c" option.
On the other hand, to link binaries successfully against the ACML pgi32 library,
pgf90 should be used instead of pgf77 to eliminate various undefined
references, since the pgi32 version of ACML is built with Fortran 90 rather
than Fortran 77.
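The two ACML cases can be written down as a short build plan. This is a minimal Python sketch that returns the commands as argument lists and executes nothing; the function name and the variant labels are hypothetical, and both the flags that name the ACML library itself and the 32-bit options are omitted because this passage does not list them.

```python
def acml_link_plan(variant, objects, output):
    """Return a list of commands (argument lists) for one ACML variant.

    variant is "gnu32" or "pgi32". Nothing is executed, and the options
    that name the ACML library itself are deliberately left out.
    """
    if variant == "gnu32":
        return [
            # Recompile GOTO's xerbla.f with ifc to obtain xerbla.o ...
            ["ifc", "-c", "xerbla.f"],
            # ... and add that object file to the ifc link step.
            ["ifc", "-o", output] + list(objects) + ["xerbla.o"],
        ]
    if variant == "pgi32":
        # Link with pgf90 rather than pgf77.
        return [["pgf90", "-o", output] + list(objects)]
    raise ValueError("unknown ACML variant: " + variant)


if __name__ == "__main__":
    # "main.o" and "a.out" are placeholder names.
    for command in acml_link_plan("gnu32", ["main.o"], "a.out"):
        print(" ".join(command))
```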
Using ifc to generate 32-bit executables for AMD64 simply requires
the "-Wl,-melf_i386" option at the linking stage. The optimization
options "-tpp7 -axW" activate SSE2 support for double-precision
floating-point acceleration, since ifc is able to treat the AMD64
hardware as a Pentium 4 compatible derivative when performing the 32-bit compilation.
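Put together, an ifc build for AMD64-32 might look like the sketch below. It is written in Python, only constructs the command lines, and assumes that all sources sit in the current directory so that each compile step leaves a matching ".o" file there. The function name and file names are hypothetical.

```python
IFC_OPTIMIZATION = ["-tpp7", "-axW"]  # SSE2 support, per the text
IFC_LINK_32BIT = ["-Wl,-melf_i386"]   # one token, no space after the comma


def ifc_build_commands(sources, output):
    """Return compile commands followed by one link command.

    Assumes each source "name.f" in the current directory compiles
    to "name.o" in the same directory.
    """
    objects = [s.rsplit(".", 1)[0] + ".o" for s in sources]
    compiles = [["ifc"] + IFC_OPTIMIZATION + ["-c", s] for s in sources]
    link = ["ifc"] + IFC_LINK_32BIT + ["-o", output] + objects
    return compiles + [link]


if __name__ == "__main__":
    # "job.f" and "job.exe" are placeholder names.
    for command in ifc_build_commands(["job.f"], "job.exe"):
        print(" ".join(command))
```

The linker option is a single argument: a space after the comma would split it into two separate arguments on the command line.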