OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Marius Hillenbrand	22aa81f3e5	s390x: fix cscal and zscal implementations The implementation of complex scalar * vector multiplication for Z14 makes some LAPACK tests fail because the numerical differences to the reference implementation exceed the threshold (as can be seen by running make lapack-test and replacing kernel/zarch/cscal.c with a generic implementation for comparison). The complex multiplication uses terms of the form a * b + c * d for both real and imaginary parts. The assembly code (and compiler-emitted code as well) uses fused multiply add operations for the second product and sum. The results can be "surprising", for example when both terms in the imaginary part nearly cancel each other out. In that case, the second product contributes more digits to the sum than the first product that has been rounded before. One option is to use separate multiplications (which then round the same way) and a distinct add. Change the code to pursue that path, by (1) requesting the compiler not to contract the operations into FMAs and (2) replacing the assembly kernel with corresponding vectorized C code (where change 1 also applies). Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-21 13:10:05 +02:00
Marius Hillenbrand	87e5bbd887	s390x: avoid variable-length arrays in struct for asm operands ... since it is not required and clang does not support that gcc extension. Instead, use a variable-length array directly for these operands. Note that, while the actual inline assembly code does not directly use these memory operands, they serve to inform the compiler that it cannot reorder reads or writes to/from the input and output data across the inline asm statements. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-02 13:49:31 +02:00
maamountki	77fe70019f	[ZARCH] Fix constraints and source code formatting	2019-02-11 16:01:13 +02:00
maamountki	7039770165	[ZARCH] Undo the last commit	2019-02-06 20:11:44 +02:00
maamountki	11a43e8116	[ZARCH] Set alignment hint for vl/vst	2019-02-05 19:17:08 +02:00
maamountki	81daf6bc38	[ZARCH] Format source code, Fix constraints	2019-02-05 07:30:38 +02:00
maamountki	23229011db	[ZARCH] Z14 support, BLAS 1/2 single precision implementations, Some missing double precision implementations, Gemv optimization	2018-08-06 18:20:40 +03:00
QWR QWR	28ca97015d	power8:Added initial zgemv_(t|n) ,i(d|z)amax,i(d|z)amin,dgemv_t(transposed),zrot z13: improved zgemv_(t|n)_4,zscal,zaxpy	2018-03-27 14:54:41 +00:00
the mslm	f946a89432	zscal (case: real alpha=0) microkernel shift&mem fix, da_i as input reg. Small typo fixes	2018-01-26 19:25:27 -08:00
the mslm	2619ad7ea5	Blas1 microkernels can be inlined by gcc. Refactoring (symbolic operand names). Some fixes and tunings	2018-01-19 19:24:35 -08:00
Abdurrauf	1cfdb2295d	Optimized standard Blas Level-1,2 (excluding nrm2 functions) for z13 (double precision)	2017-09-06 16:41:08 +04:00

11 Commits