Commit Graph

50 Commits

Author SHA1 Message Date
Martin Kroeker 874df65491
Fix incorrect sgemv results for IBM z14
part of PR #1993 that was inadvertently misplaced into the toplevel directory
2019-02-01 12:58:59 +01:00
Martin Kroeker 877023e1e1
Fix precision of zarch DSDOT
from patch provided by aarnez in #991
2019-01-31 21:22:26 +01:00
Martin Kroeker 265142edd5
Fix typo in the zarch min/max kernels
from patch provided by aarnez in #991
2019-01-31 21:21:40 +01:00
maamountki 29416cb5a3
[ZARCH] Add Z13 version for max/min functions 2019-01-31 19:11:11 +02:00
maamountki 48b9b94f7f
[ZARCH] Improve loading performance for camax/icamax 2019-01-31 18:52:11 +02:00
maamountki fcd814a8d2
[ZARCH] Fix bug in max/min functions 2019-01-29 17:59:38 +02:00
maamountki dc4d3bccd5
[ZARCH] Fix icamax/icamin 2019-01-29 03:47:49 +02:00
maamountki c7143c1019
[ZARCH] Fix iamax/imax single precision 2019-01-28 17:52:23 +02:00
maamountki 04873bb174
[ZARCH] Undo the last commit 2019-01-28 17:32:24 +02:00
maamountki c8ef9fb220
[ZARCH] Fix bug in iamax/iamin/imax/imin 2019-01-28 17:16:18 +02:00
maamountki b111829226
[ZARCH] Update max/min functions 2019-01-21 15:56:04 +02:00
maamountki b815a04c87
[ZARCH] fix a bug in max/min functions 2019-01-15 21:04:22 +02:00
maamountki 1a7925b3a3
[ZARCH] Update dgemv_n_4.c 2019-01-11 17:43:11 +02:00
maamountki 406f835f00
[ZARCH] update cgemv_n_4.c 2019-01-11 17:39:17 +02:00
maamountki 621dedb37b
[ZARCH] Update cgemv_t_4.c 2019-01-11 17:37:11 +02:00
maamountki b731e8246f
Update sgemv_t_4.c 2019-01-11 17:14:04 +02:00
maamountki ecc31b743f
Update dgemv_t_4.c 2019-01-11 17:13:02 +02:00
maamountki 5d89d6b143
[ZARCH] fix sgemv_n_4.c 2019-01-11 17:08:24 +02:00
maamountki 67432b23c2
[ZARCH] fix cgemv_n_4.c 2019-01-11 16:44:46 +02:00
maamountki be66f5d5c2
[ZARCH] fix data prefetch type in sdot 2019-01-09 16:50:07 +02:00
maamountki c2ffef8156
[ZARCH] fix data prefetch type in ddot 2019-01-09 16:49:44 +02:00
maamountki e7455f500c
[ZARCH] fix dsdot.c 2019-01-09 16:33:54 +02:00
maamountki 3eafcfa650
[ZARCH] fix cgemv_n_4.c 2019-01-09 07:43:45 +02:00
maamountki 94cd946b96
[ZARCH] fix cgemv_n_4.c 2019-01-04 17:45:56 +02:00
maamountki 1aa840a0a2
[ZARCH] fix sgemv_t_4.c 2019-01-04 01:38:18 +02:00
maamountki e6c0e39492
Optimize Zgemv 2018-08-13 12:23:40 +03:00
maamountki 23229011db
[ZARCH] Z14 support, BLAS 1/2 single precision implementations, Some missing double precision implementations, Gemv optimization 2018-08-06 18:20:40 +03:00
Martin Kroeker c7b55b6082
Merge pull request #1499 from quickwritereader/develop
Implemented missing vsx simd  kernels for power8 blas1/2 double. z13 modifications
2018-03-27 21:43:23 +02:00
QWR QWR 28ca97015d power8:Added initial zgemv_(t|n) ,i(d|z)amax,i(d|z)amin,dgemv_t(transposed),zrot
z13: improved zgemv_(t|n)_4,zscal,zaxpy
2018-03-27 14:54:41 +00:00
Martin Kroeker 22167170b3
Merge pull request #1477 from quickwritereader/develop
Power8 blas3 copy-pack routines
2018-02-28 18:46:54 +01:00
Martin Kroeker 58f236ad73
Use generic/dot.c for DSDOT on zarch 2018-02-25 19:52:14 +01:00
Martin Kroeker e207107150
Use generic/dot.c for DSDOT on z13
The implementation in arm/dot.c has lower precision, as shown by the utest for dsdot.
2018-02-25 19:51:25 +01:00
the mslm c5425daa6b power8 ?gemm_tcopy save/restore 2018-02-16 23:36:46 +00:00
Abdelrauf 60596a1abc
Merge branch 'develop' into develop 2018-01-31 16:17:04 -08:00
Abdelrauf afd514c25d small fix inside ifdef z13mvc . (z13mvc code is not used in production) 2018-01-31 18:30:59 -05:00
Martin Kroeker f45776ec1f
Merge pull request #1440 from quickwritereader/develop
small corrections
2018-01-31 23:48:47 +01:00
Abdelrauf f653e7a18d
small fix
small fix inside ifdef z13mvc . (z13mvc code is not used in production)
2018-01-31 07:49:38 -08:00
the mslm f946a89432 zscal (case: real alpha=0 ) mikrokernel shift&mem fix , da_i as input reg. small typo fixes 2018-01-26 19:25:27 -08:00
Martin Kroeker e4c71a799a
Merge pull request #1426 from quickwritereader/develop
(Z13 ) Blas1 mikrokernels can be inlined by gcc. Refactoring,fixes,tunings
2018-01-20 17:34:54 +01:00
the mslm 2619ad7ea5 Blas1 mikrokernels can be inlined by gcc. Refactoring ( symbolic operan
names). Some fixes and tunings
2018-01-19 19:24:35 -08:00
Martin Kroeker 3d23f45107
Merge pull request #1415 from quickwritereader/develop
(Z systems Z13) small fixes, some (i(dz)amin,i(dz)amax,(dz)dot,(dz)asum) mikrokernels…
2018-01-11 08:35:02 +01:00
Abdelrauf 87669d1c0a small fixes, some (i(dz)amin,i(dz)amax,(dz)dot,(dz)asum) mikrokernels can be inlined 2018-01-10 20:36:53 -05:00
Andrew 7e9b29b9b8 fix spurious compiler warning (no code change) 2017-11-24 18:36:37 +01:00
Abdurrauf 1cfdb2295d Optimized standard Blas Level-1,2 (excluding nrm2 functions) for z13 (double precision) 2017-09-06 16:41:08 +04:00
Abdurrauf 08786c4b95 strmm and ctrmm 2017-03-13 01:23:16 +04:00
Abdurrauf 82e80fa82b initial strmm(sgemm). not tuned yet 2017-03-06 04:27:40 +04:00
Abdurrauf e831d6924e changed to conventional register save area 2017-03-01 03:13:21 +04:00
Abdurrauf 848cb27b1e ztrmm kernel. 2017-02-26 06:14:12 +04:00
Abdurrauf 6418667818 dtrmm and dgemm for z13 2017-01-04 19:32:33 +04:00
Zhang Xianyi dd43661cfd Init IBM z system (s390x) porting. 2016-04-15 18:02:24 -04:00