Martin Kroeker
d3555d2e50
Add workaround for LAPACK test failures with the NVIDIA HPC compiler
2021-03-19 11:44:31 +01:00
Martin Kroeker
251a09ec90
Typo fix
2020-07-24 16:04:58 +00:00
Martin Kroeker
95d37e1575
Regroup the 32 and 64bit sections and restore 64bit CAXPY
2020-07-24 10:13:46 +00:00
Martin Kroeker
f308e741b2
remove debug output and revert changes to cdot and crot
2020-07-15 10:00:07 +02:00
Martin Kroeker
f8c2697701
Use POWER6 GEMM, TRMM and DTRSM on 32bit POWER8
2020-07-14 18:11:19 +02:00
Rajalakshmi Srinivasaraghavan
bd9ff820bc
Fix cmake compilation issue - POWER9
...
This patch removes an extra space in the sgemmotcopy filename,
thereby allowing it to create an entry in the kernel/Makefile
created by cmake.
2020-05-08 20:31:56 -05:00
Martin Kroeker
06208c8d01
Limit this fix to ELFv2 builds
2020-04-22 14:16:40 +02:00
Martin Kroeker
f5c4c28b98
Work around POWER8BE bugs on FreeBSD (ELFv2)
...
for #2299
2020-04-21 17:17:17 +02:00
Martin Kroeker
0b39cf95b0
Fix endianness conditionals
2020-02-19 18:09:54 +01:00
Martin Kroeker
9f39f0a2c3
Specify ismin/ismax assembly kernels for POWER8 directly
...
to fix utest failure in new ismin test - Makefile.L1 defaults look wrong
2020-02-17 19:55:39 +01:00
Martin Kroeker
d483e9270a
Update KERNEL.POWER8
2020-02-16 17:29:35 +01:00
Martin Kroeker
01834aee33
Merge pull request #29 from xianyi/develop
...
rebase
2020-02-16 17:28:10 +01:00
Martin Kroeker
d92bd5be24
Update KERNEL.POWER8
2020-02-15 23:07:50 +01:00
Martin Kroeker
46e4b12946
Update KERNEL.POWER8
2020-02-15 23:06:51 +01:00
Martin Kroeker
dc345d84df
Fix syntax of endianness conditional and add gcc version check for workaround
2020-02-12 19:56:52 +01:00
Martin Kroeker
cad0d150db
Define alternate kernels for big-endian POWER8
2019-11-17 23:12:10 +01:00
Martin Kroeker
673e5a0495
Replace several POWER8/9 C kernels with their gcc7-generated assembly versions (#2263)
...
* Add gcc7-generated assembly files for POWER8/9 isa/ica-min/max and POWER9 caxpy
To work around internal compiler errors encountered when compiling the original C source with gcc 4 and 5, and wrong code generated by gcc 8.3.0
* Use gcc-generated assembly instead of original C sources
to work around internal compiler errors encountered with gcc 4.8/5.4 and wrong code generation by gcc 8.3
* Use gcc-generated assembly instead of the original C source
to work around internal compiler errors encountered with gcc 4.8 and 5.4, and wrong code generation by gcc 8.3
* Add gcc7-generated assembler version of caxpy for power8
to work around wrong code generated by gcc 8.3
* Handle CONJ define for caxpyc
* Handle CONJ define for caxpyc
* Add gcc7-generated assembly cdot for POWER9
* Use prebuilt assembly for POWER9 cdot
created with gcc 7.3.1 to work around ICE in older gcc versions
* Exclude POWER9 from DYNAMIC_ARCH when the gcc version is lower than 6
* Update Makefile.system
* Use PROLOGUE macro to ensure correct function name for DYNAMIC_ARCH
* Disable POWER9 with old gcc versions
2019-09-22 22:35:22 +02:00
Rashmica Gupta
bcdf1d4917
Add in runtime CPU detection for POWER.
2019-04-09 14:20:16 +10:00
Ubuntu
4abc375a91
sgemv cgemv pairs
2019-02-01 13:45:00 +00:00
Ubuntu
8c3386be87
Added missing BLAS1 single fp {saxpy, caxpy, cdot, crot (refactored version of srot), isamax, isamin, icamax, icamin}
...
Fixed idamin, icamin to choose the index of the first occurrence among equal minima
2019-01-16 15:16:21 +00:00
QWR QWR
28ca97015d
power8: Added initial zgemv_(t|n), i(d|z)amax, i(d|z)amin, dgemv_t (transposed), zrot
...
z13: improved zgemv_(t|n)_4, zscal, zaxpy
2018-03-27 14:54:41 +00:00
martin
7a4b3cfbf8
Add trivially optimized DSDOT for POWER8
2017-11-28 18:38:07 +01:00
Zhang Xianyi
515bc56ea9
Refs #946. Use nrm2 reference implementation for Power8.
2016-08-18 18:59:43 -07:00
Zhang Xianyi
ae70b916f4
Refs #929. Deal with zero and NaNs for scale.
2016-08-18 10:24:42 -07:00
Werner Saar
8fb5a1aaff
added optimized dtrsm_LT kernel for POWER8
2016-05-22 13:09:05 +02:00
Werner Saar
56948dbf0f
optimized dgemm for POWER8
2016-04-29 12:52:47 +02:00
Werner Saar
0d0c6f7d7d
optimized dgemm for POWER8
2016-04-27 14:01:08 +02:00
Werner Saar
a3da10662f
added sgemm_tcopy_8_power8.S
2016-04-23 10:04:41 +02:00
Werner Saar
d46f07bb4e
added cgemm_tcopy_8_power8.S
2016-04-23 07:37:18 +02:00
Werner Saar
879a51165f
Optimized zgemm and tested zgemm again
2016-04-22 13:07:12 +02:00
Werner Saar
9276c9012f
Optimized sgemm and dgemm and tested again.
2016-04-21 11:37:57 +02:00
Werner Saar
3c6294ca3d
added optimized sgemm_tcopy for power8
2016-04-19 16:08:54 +02:00
Werner Saar
68a69c5b50
added optimized dgemv_n kernel for POWER8
2016-03-30 11:10:53 +02:00
Werner Saar
c2464a7c4a
added optimized casum kernel for POWER8
2016-03-28 14:12:08 +02:00
Werner Saar
294f933869
added optimized zasum kernel for POWER8
2016-03-28 13:37:32 +02:00
Werner Saar
f59c9bd6ef
added optimized sasum kernel for POWER8
2016-03-28 12:44:25 +02:00
Werner Saar
c53be46d78
added optimized dasum kernel for POWER8
2016-03-28 12:17:15 +02:00
Werner Saar
659ed16591
added optimized cswap and zswap kernels for POWER8
2016-03-27 18:31:37 +02:00
Werner Saar
35c98a3556
added optimized zscal kernel for POWER8
2016-03-27 16:31:50 +02:00
Werner Saar
f1a5dd06c5
added optimized sscal kernel for POWER8
2016-03-27 11:05:56 +02:00
Werner Saar
35f1f21a7f
added drot and srot kernels optimized for POWER8
2016-03-27 08:57:11 +02:00
Werner Saar
3d9a50e841
added optimized sswap kernel for POWER8
2016-03-25 17:34:55 +01:00
Werner Saar
828c849b44
added optimized ccopy kernel for POWER8
2016-03-25 16:54:25 +01:00
Werner Saar
ecc0bc9813
added optimized scopy kernel for POWER8
2016-03-25 16:06:56 +01:00
Werner Saar
12f209b7b0
added optimized zswap kernel for POWER8
2016-03-25 15:27:34 +01:00
Werner Saar
7316a87930
added optimized dswap kernel for POWER8
2016-03-25 14:35:43 +01:00
Werner Saar
0bff057a87
added optimized dcopy kernel for POWER8
2016-03-25 13:03:02 +01:00
Werner Saar
1e6cf9808c
added optimized dscal kernel for POWER8
2016-03-25 09:42:08 +01:00
Werner Saar
55eda3813b
added optimized zaxpy kernel for POWER8
2016-03-23 11:20:23 +01:00
Werner Saar
0664ba4c97
added optimized daxpy kernel for POWER8
2016-03-22 14:50:03 +01:00