openQCD-FASTSUM build system
The openQCD-FASTSUM build system improves on the original openQCD build system by allowing the code to be compiled in a folder separate from the source directory. This makes it easier to compile multiple versions of the code, which is often necessary because the lattice dimensions are hard-coded at compile time.
During compilation, a source file containing the current date and the compile flags used is generated, and this information is prepended to the log file of every run. If the code location is also a git repository, the current SHA is added to the log as well (along with a release tag and a dirty specifier if the source files have been modified).
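Because the lattice size is fixed at compile time, one convenient arrangement is a separate build directory per lattice size. The following is a minimal sketch, not part of openQCD-FASTSUM: write_settings is a hypothetical helper, the directory names are invented for illustration, and a temporary directory stands in for the source root. The settings values are copied from the example in the Configuration section, and placing the build directories directly under the root so that CODELOC .. resolves is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: create build directory $1 containing a
# compile_settings.txt with local lattice extents $2 $3 $4 $5.
write_settings() {
    dir=$1; l0=$2; l1=$3; l2=$4; l3=$5
    mkdir -p "$dir" || return 1
    cat > "$dir/compile_settings.txt" <<EOF
CODELOC ..
COMPILER /usr/bin/mpicc
MPI_INCLUDE /usr/include/mpi
CFLAGS -std=c89 -O2 -DAVX -DFMA3 -Werror -Wall
LDFLAGS
NPROC0_TOT 1
NPROC1_TOT 1
NPROC2_TOT 1
NPROC3_TOT 1
L0 $l0
L1 $l1
L2 $l2
L3 $l3
NPROC0_BLK 1
NPROC1_BLK 1
NPROC2_BLK 1
NPROC3_BLK 1
EOF
}

# Assumption: a temporary directory stands in for the source root here.
root=$(mktemp -d)
write_settings "$root/build_8x8x8x8" 8 8 8 8
write_settings "$root/build_16x8x8x8" 16 8 8 8
ls "$root"
```

Each directory then holds its own settings file, so the two configurations can be compiled independently. How make is invoked inside each directory depends on where the makefile lives, which this sketch does not cover.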
Targets
The makefile is written so that it is easy to compile exactly the component one needs. Two informational targets describe what is available:
- help: prints a formatted help message with general information
- list-targets: prints an automatically generated list of non-rule targets
The main executables can either be specified individually (qcd1, ms1, ym1, …) or all be compiled at once with the all target. all is also the default target if no other target is specified.
The tests can be compiled with the tests target. One can also compile the tests individually, for example
make devel/stout_smearing
will build all stout smearing tests, while for example
make devel/stout_smearing/test01_omega_matrices
will build a specific test in that module.
It is also possible to build an archive of a single module, as every module is a valid target. However, this is rarely useful because of the way the modules are structured.
Configuration
The compilation is configured by editing the compile_settings.txt file. An example file could read:
CODELOC ..
COMPILER /usr/bin/mpicc
MPI_INCLUDE /usr/include/mpi
CFLAGS -std=c89 -O2 -DAVX -DFMA3 -Werror -Wall
LDFLAGS
NPROC0_TOT 1
NPROC1_TOT 1
NPROC2_TOT 1
NPROC3_TOT 1
L0 8
L1 8
L2 8
L3 8
NPROC0_BLK 1
NPROC1_BLK 1
NPROC2_BLK 1
NPROC3_BLK 1
The options are as follows:
- CODELOC: The location of the openqcd-fastsum root directory; accepts both relative and absolute paths.
- COMPILER: Specifies which compiler to use.
- MPI_INCLUDE: Location of the mpi.h header file.
- CFLAGS: Compiler flags passed to the compilation stages; see the next section for a list of available flags for the openQCD software.
- LDFLAGS: Compiler flags passed to the linker.
- NPROCx_TOT: Number of nodes in the x’th direction. Corresponds to the NPROCx macros in the source code.
- Lx: Local number of lattice sites in the x’th direction. Corresponds to the Lx macros in the source code.
- NPROCx_BLK: Processor block size. Corresponds to the NPROCx_BLK macros in the source code.
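Since Lx is the local extent and NPROCx_TOT the number of nodes in direction x, the global lattice extent in each direction is their product, and the total number of MPI processes is the product of the four NPROCx_TOT values. The sketch below reads these numbers from a settings file; get_setting and lattice_summary are hypothetical helper names, and the assumption is that each setting sits on its own line as "KEY value", as in the example above.

```shell
#!/bin/sh
# Hypothetical helper: print the value that follows key $2 in file $1.
get_setting() {
    awk -v key="$2" '$1 == key { print $2 }' "$1"
}

# Hypothetical helper: print the process count and global lattice size.
lattice_summary() {
    file=$1
    nproc=1
    global=""
    for mu in 0 1 2 3; do
        np=$(get_setting "$file" "NPROC${mu}_TOT")
        l=$(get_setting "$file" "L${mu}")
        nproc=$((nproc * np))
        # Append "x" between extents, but not before the first one
        global="${global:+${global}x}$((np * l))"
    done
    echo "processes=$nproc global=$global"
}

# Demonstration on a temporary file with two nodes in direction 0
settings=$(mktemp)
cat > "$settings" <<EOF
NPROC0_TOT 2
NPROC1_TOT 1
NPROC2_TOT 1
NPROC3_TOT 1
L0 8
L1 8
L2 8
L3 8
EOF
summary=$(lattice_summary "$settings")
echo "$summary"   # prints: processes=2 global=16x8x8x8
```

Running such a check before compiling is a cheap way to confirm that a settings file describes the intended lattice and matches the number of processes requested at launch.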
Compile flags
The code can be further specialised through a set of compiler flags specified through the CFLAGS option in the compile_settings.txt file. There are two different types of these flags: intrinsics, specifying which intrinsic vectorisation instructions to use, and debugging, specifying the debugging output.
Intrinsics
Current Intel and AMD processors are able to perform arithmetic operations on short vectors of floating-point numbers in just one or two machine cycles, using SSE and/or AVX instructions.
Many programs in the module directories include inline-assembly SSE and AVX code. Inline assembly is a GCC extension of the C language that may not be supported by other compilers. On 64-bit systems the code can be activated by setting the compiler flags -Dx64 or -DAVX, respectively. Furthermore, one can enable AVX-512 instructions with the flag -DAVX512. These are only available on modern architectures (currently Xeon Phi x200 and Skylake-X), and are also only supported by modern Intel and GNU compilers. The BlueGeneQ optimizations are enabled by using the flag -DQPX, and are only supported by the IBM xlc compiler. In addition, SSE prefetch instructions will be used if one of the following options is specified:
- -DP4: Assume that prefetch instructions fetch 128 bytes at a time (Pentium 4 and related Xeons).
- -DPM: Assume that prefetch instructions fetch 64 bytes at a time (Athlon, Opteron, Pentium M, Core, Core 2 and related Xeons).
- -DP3: Assume that prefetch instructions fetch 32 bytes at a time (Pentium III).
These options have an effect only if -Dx64 or -DAVX is set. The option -DAVX implies -Dx64. If none of these options is set, the programs do not make use of any C language extensions and are fully portable.
The latest x86 processors furthermore support fused multiply-add (FMA3) instructions. OpenQCD makes use of these if the option -DFMA3 is set in addition to -DAVX (setting -DFMA3 alone has no effect).
On recent x86-64 machines the recommended compiler flags are thus
-std=c89 -O -mno-avx -DAVX -DFMA3 -DPM
For older machines that do not support the AVX instruction set, the recommended flags are
-std=c89 -O -mno-avx -Dx64 -DPM
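The choice between these flag sets can be automated from the CPU feature list. The following sketch assumes an x86-64 Linux machine, where /proc/cpuinfo reports the features avx and fma; suggest_cflags is a hypothetical helper, and the middle case (AVX without FMA3) is obtained simply by dropping -DFMA3 from the recommended string, which is an inference rather than a documented recommendation.

```shell
#!/bin/sh
# Hypothetical helper: $1 is a space-separated CPU feature list, such as
# the "flags" line of /proc/cpuinfo. Assumes an x86-64 machine.
suggest_cflags() {
    base="-std=c89 -O -mno-avx"
    case " $1 " in
        *" avx "*)
            case " $1 " in
                *" fma "*) echo "$base -DAVX -DFMA3 -DPM" ;;
                *)         echo "$base -DAVX -DPM" ;;
            esac ;;
        *)  echo "$base -Dx64 -DPM" ;;
    esac
}

# Fixed feature lists, so that the result does not depend on the host
suggest_cflags "fpu sse sse2 avx avx2 fma"
suggest_cflags "fpu sse sse2"

# On Linux, the feature list of the build machine can be used instead
if [ -r /proc/cpuinfo ]; then
    suggest_cflags "$(awk -F: '/^flags/ { print $2; exit }' /proc/cpuinfo)"
fi
```

The feature list is padded with spaces before matching so that avx2 is not mistaken for avx. The result is only a starting point, since the machine that compiles the code may differ from the nodes that run it.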
Aggressive optimization levels such as -O2 and -O3 tend to have little effect on the execution speed of the programs, but the risk of generating wrong code is higher.
AVX instructions and the option -mno-avx may not be known to old versions of the GCC compiler, in which case one may be limited to SSE accelerations with option string -std=c89 -O -Dx64 -DPM (or no acceleration at all).
If compilers other than GCC are used together with the option -Dx64 or -DAVX, it is strongly recommended to verify the correctness of the compilation using the check programs in the devel directory.
Debugging
For troubleshooting and parameter tuning, it may be helpful to switch on some debugging flags at compilation time. The simulation program then prints a detailed report to the log file on the progress made in the specified subprogram.
The available flags are:
- -DCGNE_DBG: CGNE solver.
- -DFGCR_DBG: GCR solver.
- -DFGCR4VD_DBG: GCR solver for the little Dirac equation.
- -DMSCG_DBG: MSCG solver.
- -DDFL_MODES_DBG: Deflation subspace generation.
- -DMDINT_DBG: Integration of the molecular-dynamics equations.
- -DRWRAT_DBG: Computation of the rational function reweighting factor.
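A debugging flag is switched on by adding it to the CFLAGS line of compile_settings.txt and recompiling. The sketch below does the edit with a small script; add_cflag is a hypothetical helper, not part of the build system, and it assumes CFLAGS occupies a single line that begins with the word CFLAGS, as in the example file.

```shell
#!/bin/sh
# Hypothetical helper: append flag $2 to the CFLAGS line of file $1,
# unless the flag is already present on that line.
add_cflag() {
    file=$1; flag=$2
    tmp=$(mktemp) || return 1
    awk -v flag="$flag" '
        $1 == "CFLAGS" {
            present = 0
            for (i = 2; i <= NF; i++) if ($i == flag) present = 1
            if (!present) $0 = $0 " " flag
        }
        { print }
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Demonstration on a temporary two-line settings file
settings=$(mktemp)
printf 'CFLAGS -std=c89 -O -DAVX\nLDFLAGS\n' > "$settings"
add_cflag "$settings" -DCGNE_DBG
add_cflag "$settings" -DCGNE_DBG   # second call leaves the file unchanged
cat "$settings"
```

Because the flags act at compile time, the affected executables have to be rebuilt before the extra log output appears. Keeping a separate build directory for the debugging build avoids disturbing the production binaries.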