Commit Graph

1194 Commits

Author SHA1 Message Date
wjc404 9b04baeaee
Update dgemm_kernel_4x8_haswell.S 2019-07-17 23:50:03 +08:00
wjc404 8a074b3965
Update dgemm_kernel_4x8_haswell.S 2019-07-17 23:47:30 +08:00
wjc404 211ab03b14
Update dgemm_kernel_4x8_haswell.S 2019-07-17 22:39:15 +08:00
wjc404 1733f927e6
Update dgemm_kernel_4x8_haswell.S 2019-07-17 21:27:41 +08:00
wjc404 182b06d6ad
Update dgemm_kernel_4x8_haswell.S 2019-07-17 17:02:35 +08:00
wjc404 7a9050d681
Update dgemm_kernel_4x8_haswell.S 2019-07-17 00:55:06 +08:00
wjc404 0ba29fd262
Update dgemm_kernel_4x8_haswell.S for zen2
replaced a bunch of vpermpd instructions with vpermilpd and vperm2f128
2019-07-17 00:46:51 +08:00
Martin Kroeker 6b6c9b1441
Merge pull request #2172 from quickwritereader/develop
power9 cgemm/ctrmm. new sgemm 8x16
2019-07-01 21:06:02 +02:00
AbdelRauf a97b301aaa cgemm/ctrmm power9 2019-07-01 14:07:54 +00:00
Piotr Kubaj eebfeba768 Fix build on FreeBSD/powerpc64.
Signed-off-by: Piotr Kubaj <pkubaj@anongoth.pl>
2019-06-25 10:58:56 +02:00
kavanabhat a575f1e4c7
Update dtrmm_kernel_16x4_power8.S 2019-06-19 15:27:14 +05:30
AbdelRauf cdbfb891da new sgemm 8x16 2019-06-17 15:33:38 +00:00
Martin Kroeker a17cf36225
Merge pull request #2153 from quickwritereader/develop
improved power9 zgemm,sgemm
2019-06-06 07:42:56 +02:00
AbdelRauf 148c4cc5fd conflict resolve 2019-06-05 20:50:50 +00:00
AbdelRauf d0c3543c3f power9 zgemm ztrmm optimized 2019-06-05 20:07:16 +00:00
AbdelRauf a469b32cf4 sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52 2019-06-04 07:11:30 +00:00
AbdelRauf 8fe794f059 improved zgemm power9 based on power8 2019-05-30 15:31:25 +00:00
Martin Kroeker 74c10b57c6
Use generic kernels for complex (I)AMAX to support softfp 2019-05-30 11:38:11 +02:00
Martin Kroeker c5495d2056
Ensure correct output for DAMAX with softfp 2019-05-30 11:25:43 +02:00
Martin Kroeker c70496b108
Separate implementations of AMAX and IAMAX on arm
As noted in #1912 and comment on #1942, the combined implementation happens to "do the right thing" on hardfp, but cannot return both value and index on softfp where they would have to share the return register
2019-05-29 15:02:51 +02:00
Martin Kroeker 9ea30f3788
Replace ISMIN and ISAMIN kernels on all x86_64 platforms (#2125)
* Mark iamax_sse.S as unsuitable for MIN due to issue #2116
* Use iamax.S rather than iamax_sse.S for ISMIN/ISAMIN on all x86_64 as workaround for #2116
2019-05-09 14:42:36 +02:00
Martin Kroeker 6a8b4269b5
Merge pull request #2111 from martin-frbg/issue1955
Disable the SkyLakeX DGEMMIxCOPY kernels as well
2019-05-05 18:08:49 +02:00
Martin Kroeker b1561ecc68
Disable DGEMMINCOPY as well for now
#1955
2019-05-05 15:52:01 +02:00
Martin Kroeker 7ed8431527
Disable the SkyLakeX DGEMMITCOPY kernel as well
as a stopgap measure for https://github.com/numpy/numpy/issues/13401 as mentioned in #1955
2019-05-04 22:54:41 +02:00
Martin Kroeker 3f427c0cf9
Merge pull request #2107 from quickwritereader/develop
sgemm/strmm kernel for power9
2019-05-02 07:56:57 +02:00
AbdelRauf 47f892198c conflict resolve 2019-05-01 19:36:22 +00:00
AbdelRauf 628b335e83 Merge branch 'develop' of https://github.com/quickwritereader/OpenBLAS into develop 2019-04-29 08:57:44 +00:00
AbdelRauf 0f105dd8a5 sgemm/strmm 2019-04-29 08:49:50 +00:00
Martin Kroeker ccfb7ead15
Merge pull request #2072 from martin-frbg/sum
Add (C)BLAS extension ?sum
2019-04-23 20:11:36 +02:00
Rashmica Gupta bcdf1d4917 Add in runtime CPU detection for POWER. 2019-04-09 14:20:16 +10:00
Martin Kroeker c04a729081
Add ?sum definitions for generic kernel 2019-03-31 13:55:49 +02:00
Martin Kroeker 100d94f94e
Add ?sum 2019-03-31 13:55:05 +02:00
Martin Kroeker 246ca29679
Add ZARCH implementation of ?sum
as trivial copies of the respective ?asum kernels with the ABS and vflpsb calls removed
2019-03-30 22:49:05 +01:00
Martin Kroeker 9d717cb5ee
Add x86_64 implementation of ?sum
as trivial copy of ?asum with the fabs calls removed
2019-03-30 22:27:04 +01:00
Martin Kroeker e3bc83f2a8
Add x86 implementation of ?sum
as trivial copy of ?asum with the fabs calls removed
2019-03-30 22:26:10 +01:00
Martin Kroeker 70f2a4e0d7
Add SPARC implementation of ?sum
as trivial copy of ?asum with the fabs replaced by fmov to preserve code structure
2019-03-30 22:25:06 +01:00
Martin Kroeker 706dfe263b
Add POWER implementation of ?sum
as trivial copy of ?asum with the fabs replaced by fmr to preserve code structure
2019-03-30 22:23:42 +01:00
Martin Kroeker 688fa9201c
Add MIPS64 implementation of ?sum
as trivial copy of ?asum with the fabs replaced by mov to preserve code structure
2019-03-30 22:22:15 +01:00
Martin Kroeker cdbe0f0235
Add MIPS implementation of ?sum
as trivial copy of ?asum with the fabs calls removed
2019-03-30 22:20:14 +01:00
Martin Kroeker f8b82bc6dc
Add ia64 implementation of ?sum
as trivial copy of asum with the fabs calls removed
2019-03-30 22:18:03 +01:00
Martin Kroeker 3e3ccb9011
Add ARM64 implementations of ?sum
as trivial copies of the respective ?asum kernels with the fabs calls removed
2019-03-30 22:13:36 +01:00
Martin Kroeker 94ab4e6fb2
Add ARM implementations of ?sum
(trivial copies of the respective ?asum with the fabs calls removed)
2019-03-30 22:11:38 +01:00
Martin Kroeker c3cfc6986b
Add implementations of ssum/dsum and csum/zsum
as trivial copies of asum/zsasum with the fabs calls replaced by fmov to preserve code structure
2019-03-30 22:05:11 +01:00
Martin Kroeker b9f4943a14
Add ?sum 2019-03-30 22:01:13 +01:00
Martin Kroeker 32c7063cb0
Merge pull request #2061 from martin-frbg/martin-frbg-patch-1
Disable the AVX512 DGEMM kernel (again)
2019-03-30 21:21:38 +01:00
Martin Kroeker 7c51cc8527
Merge branch 'develop' into develop 2019-03-29 19:36:29 +01:00
AbdelRauf 853a18bc17 power9 makefile. dgemm based on power8 kernel with following changes : 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). improvement from 17 to 22-23gflops. dtrmm cases were added into dgemm itself 2019-03-29 15:49:40 +00:00
Martin Kroeker e608d4f7fe
Disable the AVX512 DGEMM kernel (again)
Due to as yet unresolved errors seen in #1955 and #2029
2019-03-13 22:10:28 +01:00
Martin Kroeker 03d7110900
Merge pull request #2042 from maomao194313/develop
add TARGET support for HiSilicon tsv110 CPUs
2019-03-12 22:57:39 +01:00
Martin Kroeker f18ab6c17b
Merge pull request #2051 from martin-frbg/issue2048
Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1
2019-03-09 16:39:35 +01:00