Martin Kroeker
2367726578
Remove redundant status message
2020-09-30 23:28:49 +02:00
Martin Kroeker
5464eb13ea
Change ifdef linux to __linux for C11 compatibility
2020-09-30 22:59:41 +02:00
Martin Kroeker
e1574cbc83
Change ifdef linux to __linux for C11 compatibility
...
and add a fallback for unsupported operating systems in detect()
2020-09-30 22:50:21 +02:00
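The motivation for the `__linux` change above: in strict ISO modes such as `-std=c11`, GCC-compatible compilers do not predefine the bare `linux` macro, only the underscore-prefixed spellings. The following is a minimal sketch of that kind of check with a fallback branch. The function name `detect_os` and the returned strings are hypothetical and are not taken from the actual detect() code.

```c
/* Hypothetical stand-in for a detect()-style routine; not the project's code. */
static const char *detect_os(void)
{
#if defined(__linux) || defined(__linux__)
    /* Reserved-namespace spellings stay defined under -std=c11;
       the bare "linux" macro does not. */
    return "linux";
#elif defined(__APPLE__)
    return "darwin";
#else
    /* Fallback so unsupported operating systems still get a defined result. */
    return "unknown";
#endif
}
```

Testing for both `__linux` and `__linux__` is an assumption made here for robustness; the commit itself only mentions `__linux`.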
Martin Kroeker
0b2bb5696a
Change ifdef linux to __linux for C11 compatibility
2020-09-30 22:47:25 +02:00
Martin Kroeker
a7d5d0078d
Change ifdef linux to __linux for C11 compatibility
2020-09-30 22:46:25 +02:00
Martin Kroeker
be40440ec5
Change ifdef linux to __linux for C11 compatibility
2020-09-30 22:45:18 +02:00
Martin Kroeker
2bf70c8e3b
Change ifdef linux to __linux for C11 compatibility
2020-09-30 22:43:25 +02:00
Qiyu8
60e6c68e38
Adapt to ARM architecture
2020-09-29 16:36:14 +08:00
Martin Kroeker
64629cb5c7
Merge pull request #91 from xianyi/develop
...
rebase
2020-09-28 22:48:53 +02:00
Qiyu8
1b1a757f5f
Optimize the performance of dot by using universal intrinsics in X86/ARM
2020-09-28 20:36:53 +08:00
Martin Kroeker
0d98ce202c
Merge pull request #2866 from RajalakshmiSR/p10_dcopy
...
Optimize dcopy/zcopy for POWER10
2020-09-28 07:22:54 +02:00
Rajalakshmi Srinivasaraghavan
2df4235e00
Optimize dcopy/zcopy for POWER10
...
This patch makes use of new POWER10 vector pair instructions for
loads and stores. Tested in simulator and no new failures.
2020-09-27 21:42:32 -05:00
Thomas Hisch
fe8cd5ae7e
Consolidate usage of backticks for build options
...
There were some build options in the README that were not
highlighted. Now all are highlighted.
2020-09-28 00:42:17 +02:00
Martin Kroeker
ba31c8f5f9
Merge pull request #2853 from Qiyu8/usimd-daxpy
...
Optimize the performance of daxpy by using universal intrinsics
2020-09-27 23:19:59 +02:00
Martin Kroeker
e961d4d609
Merge pull request #2864 from martin-frbg/lapack445
...
Fix underflow/rounding errors in LAPACK (S,D)LANV2
2020-09-27 23:11:17 +02:00
Martin Kroeker
7ed25e9e10
Fix underflow/rounding errors in LAPACK (S,D)LANV2
...
Reference-LAPACK PR 445, fixing their issue 263
2020-09-27 22:59:20 +02:00
Martin Kroeker
7b169379e0
Merge pull request #2863 from martin-frbg/readmefixes
...
Readmefixes
2020-09-27 22:50:25 +02:00
Martin Kroeker
7f539fb850
Update cpu list, outline cmake build, clarify scope of set_num_threads extension
2020-09-27 22:48:41 +02:00
Martin Kroeker
caf7a12295
Merge pull request #90 from xianyi/develop
...
rebase
2020-09-27 22:35:45 +02:00
Martin Kroeker
72b5b73647
Merge pull request #2850 from xiaojiayuan111/develop
...
Fix a bug in trmm
2020-09-27 12:12:35 +02:00
Qiyu8
881c15179f
Remove default support for FMA4 on the Zen architecture
2020-09-27 09:35:50 +08:00
Martin Kroeker
896bbd55e1
Add support for building only selected variable types
2020-09-26 23:25:55 +02:00
Martin Kroeker
c5a32288c6
Work around sgemm_r/dgemm_r not being properly defined with BUILD_COMPLEX/BUILD_COMPLEX16
2020-09-26 23:24:37 +02:00
Martin Kroeker
dfaafd3b55
Merge pull request #2854 from martin-frbg/travis-graviton
...
Add an AWS-Graviton2 build to Travis CI
2020-09-23 21:59:18 +02:00
Martin Kroeker
f2e9a24e1a
Add AWS Graviton2 build
2020-09-23 19:02:20 +02:00
Martin Kroeker
98153875e9
Adapt tests to having only a subset of types in the library
2020-09-22 23:28:57 +02:00
Martin Kroeker
0eaae30e8c
Adapt tests to having only a subset of types in the build
2020-09-22 23:28:03 +02:00
Martin Kroeker
dfbc62ef7e
Support building only a subset of types
2020-09-22 23:25:59 +02:00
Martin Kroeker
b475b4bd0d
Support building only a subset of types
2020-09-22 23:25:04 +02:00
Martin Kroeker
357bff06b5
Add BUILD_vartype defines
2020-09-22 23:24:22 +02:00
Martin Kroeker
988a6f429e
Add BUILD_vartype defines
2020-09-22 23:23:33 +02:00
Martin Kroeker
e5e2fbd593
Support building only selected types
2020-09-22 23:21:30 +02:00
Martin Kroeker
3287848c8f
Support building only selected types
2020-09-22 23:20:51 +02:00
Martin Kroeker
26611af8e1
fix grouping of sources used for more than one type
2020-09-22 23:20:05 +02:00
Martin Kroeker
b886bd672b
add defines for building a subset of types
2020-09-22 23:18:55 +02:00
Martin Kroeker
61fae59298
Merge pull request #88 from xianyi/develop
...
rebase
2020-09-22 23:15:33 +02:00
Martin Kroeker
33d22f99f1
Merge pull request #2851 from martin-frbg/travis-xcode12
...
Add an OSX build with xcode12
2020-09-22 21:44:55 +02:00
Martin Kroeker
5ba01dd1a8
Add an OSX build with xcode12
2020-09-22 17:26:19 +02:00
Qiyu8
14f7dad3b7
performance improved
2020-09-22 16:52:15 +08:00
y00512012
06cf73a239
Fix a bug in trmm
2020-09-22 16:47:10 +08:00
Qiyu8
325b539c26
Optimize the performance of daxpy by using universal intrinsics
2020-09-22 10:38:35 +08:00
Martin Kroeker
0f112077e6
Merge pull request #2847 from mhillenibm/fixup_cscal
...
s390x: fix cscal and zscal implementations
2020-09-21 22:22:43 +02:00
Marius Hillenbrand
22aa81f3e5
s390x: fix cscal and zscal implementations
...
The implementation of complex scalar * vector multiplication for Z14
makes some LAPACK tests fail because the numerical differences to the
reference implementation exceed the threshold (as can be seen by running
make lapack-test and replacing kernel/zarch/cscal.c with a generic
implementation for comparison).
The complex multiplication uses terms of the form a * b + c * d for both
real and imaginary parts. The assembly code (and compiler-emitted code
as well) uses fused multiply add operations for the second product and
sum. The results can be "surprising", for example when both terms in the
imaginary part nearly cancel each other out. In that case, the second
product contributes more digits to the sum than the first product that
has been rounded before.
One option is to use separate multiplications (which then round the same
way) and a distinct add. Change the code to pursue that path, by (1)
requesting the compiler not to contract the operations into FMAs and (2)
replacing the assembly kernel with corresponding vectorized C code
(where change 1 also applies).
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-21 13:10:05 +02:00
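The rounding effect described in the cscal/zscal commit above can be reproduced in a few lines. The sketch below is illustrative only and is not the code from kernel/zarch/cscal.c. The helper names are hypothetical, and the fused step is emulated by widening to double (the product of two floats is exact in double) instead of relying on a hardware FMA instruction.

```c
/* Hypothetical helper: a*b + c*d with both products rounded to float
   before the add, as in the "separate multiplications" approach. */
static float sum_of_products_separate(float a, float b, float c, float d)
{
    volatile float p1 = a * b;  /* volatile forces rounding to float and */
    volatile float p2 = c * d;  /* keeps the compiler from contracting   */
    return p1 + p2;
}

/* Hypothetical helper: the first product is rounded, the second product is
   kept exact until the add, which is what a fused multiply-add does. */
static float sum_of_products_fused(float a, float b, float c, float d)
{
    volatile float p1 = a * b;
    /* Exact: two 24-bit significands multiply into at most 48 bits,
       which fits in the 53-bit significand of a double. */
    double exact_product = (double)c * (double)d;
    return (float)(exact_product + (double)p1);
}
```

With x = 1 + 2^-13, calling both helpers as (x, x, -x, x) evaluates x*x - x*x. The separately rounded version returns 0, while the fused version returns -2^-26, which is exactly the rounding error of the first product. This is the "second product contributes more digits" effect from the commit message.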
Marius Hillenbrand
77ea73f5e5
s390x: for clang use fp-contract=on instead of fast
...
Make clang slightly more cautious when contracting floating-point
operations (e.g., when applying fused multiply add) by setting
-ffp-contract=on (instead of fast).
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-21 11:32:08 +02:00
Marius Hillenbrand
f91057cbad
s390x: move common vector definitions and utils into header
...
... to facilitate reuse beyond gemm_vec.c and avoid code duplication.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-21 11:32:08 +02:00
Martin Kroeker
992d7ca63d
Merge pull request #2845 from martin-frbg/lapack443
...
Fix workspace query in LAPACK xGELQ (Reference-LAPACK 443)
2020-09-18 23:18:41 +02:00
Martin Kroeker
7e4d5c237c
Fix workspace query in xGELQ (Reference-LAPACK PR443)
2020-09-18 09:19:46 +02:00
Martin Kroeker
8d12027a79
Merge pull request #86 from xianyi/develop
...
rebase
2020-09-18 09:17:49 +02:00
Martin Kroeker
b1e0bcceec
Merge pull request #2844 from RajalakshmiSR/daxpy_p10
...
Optimize daxpy/zaxpy for POWER10
2020-09-17 23:46:32 +02:00
Rajalakshmi Srinivasaraghavan
be43d2cb96
Optimize daxpy/zaxpy for POWER10
...
This patch makes use of new POWER10 vector pair instructions for
loads and stores. Tested in simulator and no new failures.
2020-09-17 12:56:28 -05:00