The openQCD-FASTSUM build system is an improvement of the original openQCD build system, and allows for compiling the code in a different folder to where the source directory is located. This makes it easier to compile multiple versions of the code, something that is often necessary due to the fact that the lattice dimension is hard coded at compile time.
During compilation a source file containing the current date and the compile flags used will be generated and this information will be prepended to the log file of every run. If the code location also is a git repository, the current SHA will also be added to the log (along with a release tag and a dirty specifier if the source files have been modified).
The makefile is written so that it is easy to compile exactly the component one needs. There are two targets listing all targets:
help: prints a formatted help message with general information
list-targets: automatically generated list of non-rule targets
The main executables can either be specified individually (
…), or compile all executables with the
all is also the
default target if no other target is specified.
The tests can be compiled with the
tests target. One can also compile the
tests individually, for example
will build all stout smearing tests, while for example
will build a specific test in that module.
It is also possible to build an archive of a single module as every module is a valid target. However this is not all that useful due to the way the modules are structured.
The compilation is configured by editing the
compile_settings.txt file. An
example file could read:
CODELOC .. COMPILER /usr/bin/mpicc MPI_INCLUDE /usr/include/mpi CFLAGS -std=c89 -O2 -DAVX -DFMA3 -Werror -Wall LDFLAGS NPROC0_TOT 1 NPROC1_TOT 1 NPROC2_TOT 1 NPROC3_TOT 1 L0 8 L1 8 L2 8 L3 8 NPROC0_BLK 1 NPROC1_BLK 1 NPROC2_BLK 1 NPROC3_BLK 1
The options are as follows:
CODELOC: The location of the openqcd-fastsum root directory, accepts both relative and absolute paths
COMPILER: Specify which compiler to use
MPI_INCLUDE: Location of the
CFLAGS: Compiler flags passed to the compilation stages, see the next section for a list of available flags for the openQCD software
LDFLAGS: Compiler flags passed to the linker
NPROCx_TOT: Number of nodes in the x’th direction. Corresponds to the
NPROCxmacros in the source code.
Lx: Local number of lattice sites in the x’th direction. Corresponds to the
Lxmacros in the source code.
NPROCx_BLK: Processor block size. Corresponds to the
NPROCx_BLKmacros in the source code.
The code can be further specialised through a set of compiler flags specified
CFLAGS option in the
compile_settings.txt file. There are two
different types of these flags: intrinsics, specifying which intrinsic
vectorisation instructions to use, and debugging, specifying the debugging
Current Intel and AMD processors are able to perform arithmetic operations on short vectors of floating-point numbers in just one or two machine cycles, using SSE and/or AVX instructions.
Many programs in the module directories include inline-assembly SSE and AVX
code. Inline assembly is a GCC extension of the C language that may not be
supported by other compilers. On 64bit systems the code can be activated by
setting the compiler flags
-DAVX, respectively. Furthermore, one
can enable AVX-512 instructions with the flag
-DAVX512. These are only
available on modern architectures (currently Xeon Phi x200 and Skylake-X), and
are also only supported by modern Intel and GNU compilers. The BlueGeneQ
optimizations are enabled by using the flag
-DQPX, and are only supported by
the IBM xlc compiler. In addition, SSE prefetch instructions will be used if
one of the following options is specified:
-DP4: Assume that prefetch instructions fetch 128 bytes at a time (Pentium 4 and related Xeons).
-DPM: Assume that prefetch instructions fetch 64 bytes at a time (Athlon, Opteron, Pentium M, Core, Core 2 and related Xeons).
-DP3: Assume that prefetch instructions fetch 32 bytes at a time (Pentium III).
These options have an effect only if
-DAVX is set. The option
-Dx64. If none of these options is set, the programs do not
make use of any C language extensions and are fully portable.
The latest x86 processors furthermore support fused multiply-add (FMA3)
instructions. OpenQCD makes use of these if the option
-DFMA3 is set in
-DFMA3 alone has no effect).
On recent x86-64 machines the recommended compiler flags are thus
-std=c89 -O -mno-avx -DAVX -DFMA3 -DPM
For older machines that do not support the AVX instruction set, the recommended flags are
-std=c89 -O -mno-avx -Dx64 -DPM
Aggressive optimization levels such as
-O3 tend to have little
effect on the execution speed of the programs, but the risk of generating wrong
code is higher.
AVX instructions and the option
-mno-avx may not be known to old versions of
the GCC compiler, in which case one may be limited to SSE accelerations with
-std=c89 -O -Dx64 -DPM (or no acceleration at all).
If compilers other than GCC are used together with the option
-DAVX, it is strongly recommended to verify the correctness of the compilation
using the check programs in the devel directory.
For troubleshooting and parameter tuning, it may helpful to switch on some debugging flags at compilation time. The simulation program then prints a detailed report to the log file on the progress made in specified subprogram.
The available flags are:
-DCGNE_DBG: CGNE solver.
-DFGCR_DBG: GCR solver.
-DFGCR4VD_DBG: GCR solver for the little Dirac equation.
-DMSCG_DBG: MSCG solver.
-DDFL_MODES_DBG: Deflation subspace generation.
-DMDINT_DBG: Integration of the molecular-dynamics equations.
-DRWRAT_DBG: Computation of the rational function reweighting factor.