These are as similar to dscal_microk_skylakex-2.c as possible
for consistency.
Note that before this change SKYLAKEX+ uses generic C functions for
cscal/zscal via commit 2271c350 from #2610 (which is masked by
commit 086d87a30). However now #3799 disables FMAs (in turn enabled
by `-march=skylake-avx512`) in the plain C code which fixes excessive
LAPACK test failures more nicely.
If e.g. -march=haswell is set in CFLAGS, GCC generates FMAs by default, which
is inconsistent with the microkernels, none of which use FMAs. These
inconsistencies cause a few failures in the LAPACK testcases, where
eigenvalue results with/without eigenvectors are compared.
Moreover using FMAs for multiplication of complex numbers can give surprising
results, see 22aa81f for more information.
This uses the same syntax as used in 22aa81f for zarch (s390x).
Benchmarks should allocate with cacheline (often 64 bytes) alignment
to avoid unreliable timings. This technique, storing the offset in the
byte before the pointer, doesn't require C11's aligned_alloc for
compatibility with older compilers.
For example, Glibc's x86_64 malloc returns 16-byte aligned buffers, which is
not sufficient for AVX/AVX2 (32-byte preferred) or AVX512 (64-byte).
This allows Julia to set a default number of threads (usually `1`) to be
used when no other thread counts are specified [0], to short-circuit the
default OpenBLAS thread initialization routine that spins up a different
number of threads than Julia would otherwise choose.
The reason to add a new environment variable is that we want to be able
to configure OpenBLAS to avoid performing its initial memory
allocation/thread startup, as that can consume significant amounts of
memory, but we still want to be sensitive to legacy codebases that set
things like `OMP_NUM_THREADS` or `GOTOBLAS_NUM_THREADS`. Creating a new
environment variable that is openblas-specific and is not already
publicly used to control the overall number of threads of programs like
Julia seems to be the best way forward.
[0] https://github.com/JuliaLang/julia/pull/46844