Martin Kroeker
a17cf36225
Merge pull request #2153 from quickwritereader/develop
...
improved power9 zgemm,sgemm
2019-06-06 07:42:56 +02:00
AbdelRauf
148c4cc5fd
conflict resolve
2019-06-05 20:50:50 +00:00
AbdelRauf
d0c3543c3f
power9 zgemm ztrmm optimized
2019-06-05 20:07:16 +00:00
AbdelRauf
a469b32cf4
sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52
2019-06-04 07:11:30 +00:00
AbdelRauf
8fe794f059
improved zgemm power9 based on power8
2019-05-30 15:31:25 +00:00
Martin Kroeker
74c10b57c6
Use generic kernels for complex (I)AMAX to support softfp
2019-05-30 11:38:11 +02:00
Martin Kroeker
c5495d2056
Ensure correct output for DAMAX with softfp
2019-05-30 11:25:43 +02:00
Martin Kroeker
c70496b108
Separate implementations of AMAX and IAMAX on arm
...
As noted in #1912 and comment on #1942 , the combined implementation happens to "do the right thing" on hardfp, but cannot return both value and index on softfp where they would have to share the return register
2019-05-29 15:02:51 +02:00
Martin Kroeker
9ea30f3788
Replace ISMIN and ISAMIN kernels on all x86_64 platforms ( #2125 )
...
* Mark iamax_sse.S as unsuitable for MIN due to issue #2116
* Use iamax.S rather than iamax_sse.S for ISMIN/ISAMIN on all x86_64 as workaround for #2116
2019-05-09 14:42:36 +02:00
Martin Kroeker
6a8b4269b5
Merge pull request #2111 from martin-frbg/issue1955
...
Disable the SkyLakeX DGEMMIxCOPY kernels as well
2019-05-05 18:08:49 +02:00
Martin Kroeker
b1561ecc68
Disable DGEMMINCOPY as well for now
...
#1955
2019-05-05 15:52:01 +02:00
Martin Kroeker
7ed8431527
Disable the SkyLakeX DGEMMITCOPY kernel as well
...
as a stopgap measure for https://github.com/numpy/numpy/issues/13401 as mentioned in #1955
2019-05-04 22:54:41 +02:00
Martin Kroeker
3f427c0cf9
Merge pull request #2107 from quickwritereader/develop
...
sgemm/strmm kernel for power9
2019-05-02 07:56:57 +02:00
AbdelRauf
47f892198c
conflict resolve
2019-05-01 19:36:22 +00:00
AbdelRauf
628b335e83
Merge branch 'develop' of https://github.com/quickwritereader/OpenBLAS into develop
2019-04-29 08:57:44 +00:00
AbdelRauf
0f105dd8a5
sgemm/strmm
2019-04-29 08:49:50 +00:00
Martin Kroeker
ccfb7ead15
Merge pull request #2072 from martin-frbg/sum
...
Add (C)BLAS extension ?sum
2019-04-23 20:11:36 +02:00
Rashmica Gupta
bcdf1d4917
Add in runtime CPU detection for POWER.
2019-04-09 14:20:16 +10:00
Martin Kroeker
c04a729081
Add ?sum definitions for generic kernel
2019-03-31 13:55:49 +02:00
Martin Kroeker
100d94f94e
Add ?sum
2019-03-31 13:55:05 +02:00
Martin Kroeker
246ca29679
Add ZARCH implementation of ?sum
...
as trivial copies of the respective ?asum kernels with the ABS and vflpsb calls removed
2019-03-30 22:49:05 +01:00
Martin Kroeker
9d717cb5ee
Add x86_64 implementation of ?sum
...
as trivial copy of ?asum with the fabs calls removed
2019-03-30 22:27:04 +01:00
Martin Kroeker
e3bc83f2a8
Add x86 implementation of ?sum
...
as trivial copy of ?asum with the fabs calls removed
2019-03-30 22:26:10 +01:00
Martin Kroeker
70f2a4e0d7
Add SPARC implementation of ?sum
...
as trivial copy of ?asum with the fabs replaced by fmov to preserve code structure
2019-03-30 22:25:06 +01:00
Martin Kroeker
706dfe263b
Add POWER implementation of ?sum
...
as trivial copy of ?asum with the fabs replaced by fmr to preserve code structure
2019-03-30 22:23:42 +01:00
Martin Kroeker
688fa9201c
Add MIPS64 implementation of ?sum
...
as trivial copy of ?asum with the fabs replaced by mov to preserve code structure
2019-03-30 22:22:15 +01:00
Martin Kroeker
cdbe0f0235
Add MIPS implementation of ?sum
...
as trivial copy of ?asum with the fabs calls removed
2019-03-30 22:20:14 +01:00
Martin Kroeker
f8b82bc6dc
Add ia64 implementation of ?sum
...
as trivial copy of asum with the fabs calls removed
2019-03-30 22:18:03 +01:00
Martin Kroeker
3e3ccb9011
Add ARM64 implementations of ?sum
...
as trivial copies of the respective ?asum kernels with the fabs calls removed
2019-03-30 22:13:36 +01:00
Martin Kroeker
94ab4e6fb2
Add ARM implementations of ?sum
...
(trivial copies of the respective ?asum with the fabs calls removed)
2019-03-30 22:11:38 +01:00
Martin Kroeker
c3cfc6986b
Add implementations of ssum/dsum and csum/zsum
...
as trivial copies of asum/zsasum with the fabs calls replaced by fmov to preserve code structure
2019-03-30 22:05:11 +01:00
Martin Kroeker
b9f4943a14
Add ?sum
2019-03-30 22:01:13 +01:00
Martin Kroeker
32c7063cb0
Merge pull request #2061 from martin-frbg/martin-frbg-patch-1
...
Disable the AVX512 DGEMM kernel (again)
2019-03-30 21:21:38 +01:00
Martin Kroeker
7c51cc8527
Merge branch 'develop' into develop
2019-03-29 19:36:29 +01:00
AbdelRauf
853a18bc17
power9 makefile. dgemm based on power8 kernel with following changes : 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). improvement from 17 to 22-23gflops. dtrmm cases were added into dgemm itself
2019-03-29 15:49:40 +00:00
Martin Kroeker
e608d4f7fe
Disable the AVX512 DGEMM kernel (again)
...
Due to as yet unresolved errors seen in #1955 and #2029
2019-03-13 22:10:28 +01:00
Martin Kroeker
03d7110900
Merge pull request #2042 from maomao194313/develop
...
add TARGET support for HiSilicon tsv110 CPUs
2019-03-12 22:57:39 +01:00
Martin Kroeker
f18ab6c17b
Merge pull request #2051 from martin-frbg/issue2048
...
Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1
2019-03-09 16:39:35 +01:00
Martin Kroeker
5b95534afc
Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1
...
for issue #2048
2019-03-09 11:21:16 +01:00
Celelibi
b7f59da42d
Fix crash in sgemm SSE/nano kernel on x86_64
...
Fix bug #2047 .
Signed-off-by: Celelibi <celelibi@gmail.com>
2019-03-07 16:55:13 +01:00
maomao194313
783ba8058f
HiSilicon tsv110 CPUs optimization branch
...
add HiSilicon tsv110 CPUs optimization branch
2019-03-04 16:30:50 +08:00
Andrew
6eee1beac5
move fix to right place
2019-02-24 20:41:02 +02:00
Martin Kroeker
e12cdf58ef
Merge pull request #2024 from martin-frbg/gcc9fixes4
...
Fix inline assembly constraints in Bulldozer TRSM kernels
2019-02-17 11:49:15 +01:00
Martin Kroeker
1860c9456d
Merge pull request #2023 from martin-frbg/gcc9fixes3
...
Fix inline assembly constraints in various x86_64 GEMVN kernels
2019-02-17 11:48:57 +01:00
Martin Kroeker
f9bb76d29a
Fix inline assembly constraints in Bulldozer TRSM kernels
...
rework indices to allow marking i,as and bs as both input and output (marked operand n1 as well for simplicity). For #2009
2019-02-16 20:06:48 +01:00
Martin Kroeker
efb9038f72
Fix inline assembly constraints
2019-02-16 18:46:17 +01:00
Martin Kroeker
e976557d29
Fix inline assembly constraints
...
rework indices to allow marking argument lda as input and output.
2019-02-16 18:36:39 +01:00
Martin Kroeker
9d8be15789
Fix inline assembly constraints
...
rework indices to allow marking argument lda4 as input and output. For #2009
2019-02-16 18:24:11 +01:00
Martin Kroeker
d752799a0f
Merge pull request #2021 from martin-frbg/gcc9fixes2
...
Fix wrong constraints in inline assembly of Haswell DTRSM kernel
2019-02-16 18:05:40 +01:00
Martin Kroeker
c26c0b77a7
Fix wrong constraints in inline assembly
...
for #2009
2019-02-15 15:08:16 +01:00