Wangyang Guo
ca7682e3a3
Small Matrix: skylakex: sgemm nn: fix n6 conflicts with n4
2021-08-02 07:06:54 +00:00
Wangyang Guo
9967e61abb
Small Matrix: skylakex: sgemm nn: fix error when beta not zero
2021-08-02 07:06:54 +00:00
Wangyang Guo
a87736346f
Small Matrix: skylakex: sgemm nn: add n6 to improve performance
2021-08-02 07:06:54 +00:00
Wangyang Guo
4c9d9940fd
Small Matrix: skylakex: sgemm nn: reduce store 4 N at a time
2021-08-02 07:06:54 +00:00
Wangyang Guo
13b32f69b7
Small Matrix: skylakex: sgemm nn: reduce store 4 M at a time
2021-08-02 07:06:54 +00:00
Wangyang Guo
3d8c6d9607
Small Matrix: skylakex: sgemm nn: clean up unused code
2021-08-02 07:06:54 +00:00
Wangyang Guo
49b61a3f30
Small Matrix: skylakex: sgemm_nn: optimize for M <= 8
2021-08-02 07:06:54 +00:00
Wangyang Guo
f88470323b
Optimize M < 16 using AVX512 mask
2021-08-02 07:06:54 +00:00
Wangyang Guo
9186456a12
small matrix: SkylakeX: add SGEMM NN kernel
2021-08-02 07:06:54 +00:00
Xianyi Zhang
6022e5629c
Refs #2587 fix small matrix c/zgemm bug.
2021-08-02 07:06:54 +00:00
Xianyi Zhang
57ed58cefe
Refs #2587 Add small matrix optimization reference kernel for c/zgemm.
2021-08-02 07:06:54 +00:00
Xianyi Zhang
17d32a4a82
Change a1b0 gemm to b0 gemm.
2021-08-02 07:06:54 +00:00
Xianyi Zhang
59cb5de46b
Refs #2587 Fix typos.
2021-08-02 07:06:54 +00:00
Xianyi Zhang
be3349405d
Add alpha=1.0 beta=0.0 for small gemm.
2021-08-02 07:01:47 +00:00
Xianyi Zhang
0a2077901c
Add small matrix optimization kernel interface.
...
make SMALL_MATRIX_OPT=1
2021-08-02 07:01:47 +00:00
gxw
0b8f7c8c10
Add cmake support for LOONGARCH64
2021-08-02 10:00:41 +08:00
gxw
af0a69f355
Add support for LOONGARCH64
2021-07-27 15:29:12 +08:00
Martin Kroeker
49bbf330ca
Empirical workaround for numpy SVD NaN problem from issue 3318
2021-07-18 22:19:19 +02:00
Martin Kroeker
5b4b385ecf
Temporarily disable the SkylakeX sgemv_t microkernel due to LAPACK testsuite failures
2021-07-14 20:50:14 +02:00
User User-User
39ef0880ae
copy conf
2021-06-19 21:49:58 +02:00
Martin Kroeker
c4b464cac6
Merge pull request #3273 from austinpagan/sbgemm_gcc10_fix
...
Power10: Fix for SBGEMM
2021-06-15 22:58:48 +02:00
Gordon Fossum
e6dd44d989
Power10: Fix for SBGEMM
...
While testing bfloat16 sbgemm kernel, there are some failures for odd value inputs due to updating result for
additional bytes.
2021-06-15 13:07:47 -05:00
Gilles Gouaillardet
9d292d37b2
arm64: add the missing d9 register to the clobber list
...
Refs. numpy/numpy#18422
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2021-06-14 17:01:28 +09:00
Martin Kroeker
2e8ff4a781
Merge pull request #3266 from martin-frbg/powerparam
...
Remove spurious casts from PPC parameters and fix compilation for older targets
2021-06-10 18:05:47 +02:00
Martin Kroeker
dbba381dc3
Merge pull request #3260 from intelmy/sgemv_t_opt
...
Optimized sgemv_t for small N based on AVX512
2021-06-10 16:08:24 +02:00
Martin Kroeker
efdbdd8f82
Add prefetch values for power3
2021-06-10 11:20:29 +02:00
Martin Kroeker
3906ef3b0f
Add prefetch values for power3
2021-06-10 11:19:40 +02:00
Martin Kroeker
8adf0971d8
Add prefetch values for power3
2021-06-10 11:18:22 +02:00
Martin Kroeker
08e2e60762
Add prefetch values for power3
2021-06-10 11:17:33 +02:00
Martin Kroeker
fb9e678235
Fix caxpy/zaxpy for big-endian
2021-06-10 11:15:48 +02:00
Martin Kroeker
dc4fcb48df
Fix inverted conditional for caxpy/zaxpy
2021-06-10 11:14:03 +02:00
Martin Kroeker
7a48247761
fix c/zrot and sgemv for POWER5
2021-06-10 11:11:56 +02:00
Rajalakshmi Srinivasaraghavan
cbb70438df
POWER10: Fixes for sbgemm kernel
...
While testing bfloat16 sbgemm kernel, there are some failures
for odd value inputs due to array access beyond the boundary.
2021-06-09 12:20:09 -05:00
Ma, Yu
706a08d4a0
Optimized sgemv_t for small N based on AVX512
2021-06-08 15:08:28 -04:00
Zhaofeng Li
590be3fae3
riscv64: Add Makefile
2021-06-07 22:55:56 +00:00
Zhaofeng Li
3521cd48cb
RISCV64_GENERIC: Use generic kernel for DSDOT for better precision
...
The implementation in `riscv64/dot.c` fails the `test_dsdot` test, and
the generic kernel seems to have better precision. Tested on SiFive
FU740 (HiFive Unmatched) and QEMU.
Also see #1469 .
2021-06-07 22:50:23 +00:00
Zhaofeng Li
1e0192a5cc
riscv64/imin: Fix wrong comparison
...
Same as #1990 .
2021-06-07 22:49:39 +00:00
Martin Kroeker
5f677e782e
Merge pull request #3196 from guowangy/skylakex-gemm-batch-k
...
GEMM: skylake: improve the performance when m is small
2021-05-22 19:25:28 +02:00
Martin Kroeker
02087a62e7
Merge pull request #3205 from intelmy/sgemv_n_opt
...
optimize on sgemv_n for small n
2021-05-17 17:49:01 +02:00
Martin Kroeker
4ecf631f95
Merge pull request #3228 from martin-frbg/issue3226
...
filter out -mavx flag on Sandybridge zgemm/ztrmm kernels
2021-05-15 09:06:12 +02:00
Martin Kroeker
310b76aad7
Merge pull request #3231 from martin-frbg/issue3227
...
Support compilation with pre-C99 versions of MSVC
2021-05-14 23:28:06 +02:00
Martin Kroeker
c4da892ba0
Only filter out -mavx on Sandybridge ZGEMM/ZTRMM kernels
2021-05-14 23:19:10 +02:00
Martin Kroeker
8b90e5f202
Drop redundant inclusion of complex.h
2021-05-14 15:06:44 +02:00
Martin Kroeker
bd60fb6ffc
filter out -mavx flag on zgemm kernels as it can cause problems with older gcc
2021-05-13 23:05:00 +02:00
Martin Kroeker
37ea8702ee
Merge pull request #3192 from damonyu1989/develop
...
Update the intrinsic API to the official name.
2021-05-11 16:00:45 +02:00
Martin Kroeker
c0ca63ea46
Fix missing conditionals for non-SKX kernels
2021-05-05 14:55:36 +02:00
pnp
3d4ccd2a13
fix for build error
2021-04-30 12:25:33 -04:00
pnp
c59652f0ce
optimize on sgemv_n for small n
2021-04-30 12:14:58 -04:00
Wangyang Guo
aa7b3dc3db
GEMM: skylake: improve the performance when m is small
2021-04-28 13:56:06 +00:00
damonyu
ceb44bef14
update the intrinsic API to the official name.
2021-04-27 11:12:29 +08:00