OpenBLAS/kernel
Marius Hillenbrand 22aa81f3e5 s390x: fix cscal and zscal implementations
The implementation of complex scalar * vector multiplication for Z14
makes some LAPACK tests fail because the numerical differences to the
reference implementation exceed the threshold (as can be seen by running
make lapack-test and replacing kernel/zarch/cscal.c with a generic
implementation for comparison).

The complex multiplication uses terms of the form a * b + c * d for both
real and imaginary parts. The assembly code (and compiler-emitted code
as well) uses fused multiply add operations for the second product and
sum. The results can be "surprising", for example when both terms in the
imaginary part nearly cancel each other out. In that case, the second
product contributes more digits to the sum than the first product that
has been rounded before.

One option is to use separate multiplications (which then round the same
way) and a distinct add. Change the code to pursue that path, by (1)
requesting the compiler not to contract the operations into FMAs and (2)
replacing the assembly kernel with corresponding vectorized C code
(where change 1 also applies).

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-21 13:10:05 +02:00
..
alpha Add implementations of ssum/dsum and csum/zsum 2019-03-30 22:05:11 +01:00
arm Use OPENBLAS_MAKE_COMPLEX_FLOAT on PPC only 2020-07-23 20:40:13 +00:00
arm64 Rename KERNEL.SILICON to KERNEL.VORTEX 2020-09-03 08:44:20 +02:00
generic powerpc: Optimized SHGEMM kernel for POWER10 2020-06-25 22:19:08 -05:00
ia64 Add ia64 implementation of ?sum 2019-03-30 22:18:03 +01:00
mips Delete KERNEL.1004K 2020-04-19 15:44:30 +02:00
mips64 Fix compilation problem on loongson platform 2020-04-09 19:28:15 +08:00
power Optimize daxpy/zaxpy for POWER10 2020-09-17 12:56:28 -05:00
sparc Add SPARC implementation of ?sum 2019-03-30 22:25:06 +01:00
x86 Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
x86_64 Merge pull request #2796 from Guobing-Chen/BF16_dot_coversion_apis 2020-09-14 15:00:19 +02:00
zarch s390x: fix cscal and zscal implementations 2020-09-21 13:10:05 +02:00
CMakeLists.txt Merge pull request #2780 from Guobing-Chen/CPL_build_support 2020-08-20 19:54:29 +02:00
Makefile Fix typo 2020-08-19 16:36:55 +02:00
Makefile.L1 Add bfloat16 based dot and conversion with single/double 2020-09-04 02:31:25 +08:00
Makefile.L2 Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Makefile.L3 Merge pull request #2780 from Guobing-Chen/CPL_build_support 2020-08-20 19:54:29 +02:00
Makefile.LA Support NO_LAPACK=1 to build the lib without LAPACK functions. 2011-03-04 11:51:32 +08:00
setparam-ref.c Add bfloat16 based dot and conversion with single/double 2020-09-04 02:31:25 +08:00