Bart Oldeman
5ceca1a4d8
Add sscal.c + microkernels for Haswell, Zen, Skylake and newer.
Unlike [dcz]scal, sscal still used the original GotoBLAS SSE code from scal_sse.S.
This code follows dscal as closely as possible, except for the inc_x > 1 code
for which a plain C loop is used much like the one in cscal.c, instead of an
adaptation of the SSE2 asm code of dscal.c (I tried but the performance wasn't
better than the plain C loop).
2022-12-06 14:05:49 -05:00
Gengxin Xie
d9ba49165a
Improve the performance of rot by using AVX512 and AVX2 intrinsics
2020-11-05 15:12:36 +08:00
Gengxin Xie
cb3c190a3a
Implementation of dasum, sasum with AVX2 & AVX512 intrinsics
2020-08-31 11:44:08 +08:00
wjc404
fa049d49c2
AVX2 STRSM kernel
2020-03-17 00:34:08 +08:00
wjc404
97a32cb0a5
Update KERNEL.HASWELL
2020-02-22 23:39:20 +08:00
wjc404
b73bf01378
optimize AVX2 SGEMM
2020-01-06 12:09:14 +08:00
wjc404
109e18cd96
Update KERNEL.HASWELL
2019-12-30 16:03:24 +08:00
wjc404
ed9af2f7da
Update KERNEL.HASWELL
2019-12-27 18:01:38 +08:00
wjc404
c418c81224
Update KERNEL.HASWELL
2019-12-23 23:41:44 +08:00
wjc404
f41d52665d
Fast Haswell ZGEMM kernel
2019-12-21 14:37:06 +08:00
Arjan van de Ven
d321448a63
dgemm: use dgemm_ncopy_8_skylakex.c also for Haswell
The dgemm_ncopy_8_skylakex.c code is not avx512 specific and gives
a nice performance boost for medium sized matrices
2018-12-16 23:09:22 +00:00
Arjan van de Ven
c43331ad0a
dgemm: Use the skylakex beta function also for haswell
it's more efficient for certain tall/skinny matrices
2018-12-16 23:09:17 +00:00
Arjan van de Ven
0586899a10
Use sgemm_ncopy_4_skylakex.c also for Haswell
sgemm_ncopy_4_skylakex.c uses SSE transpose operations where the
real perf win happens; this also works great for Haswell.
This gives double digit percentage gains on small and skinny matrices
2018-12-15 13:49:19 +00:00
Arjan van de Ven
00dc09ad19
Use the skylake sgemm beta code also for haswell
With a few small changes it's possible to use the skylake sgemm code
also for haswell; this gives a modest gain (10% range) for smallish
matrices but does wonders for very skinny matrices
2018-12-15 13:49:13 +00:00
Martin Kroeker
28c3fa8950
Add dsdot
2017-10-16 23:29:03 +02:00
Werner Saar
c8f2c5d636
added optimized trsm_kernels
2016-01-05 13:05:05 +01:00
Werner Saar
e7c969e164
added optimized dtrmm_kernel for haswell
2015-06-13 16:16:29 +02:00
Werner Saar
9bd962f655
modified haswell parameter dgemm_unroll_n
2015-06-13 10:28:27 +02:00
Werner Saar
31c9e399e9
added optimized cscal kernel for haswell
2015-05-17 13:44:09 +02:00
Werner Saar
d63034303b
added optimized zscal kernel for haswell
2015-05-16 16:41:45 +02:00
Werner Saar
02e772c7e4
added optimized dscal kernel for haswell
2015-05-12 17:19:58 +02:00
Werner Saar
1c4b0eeae3
added optimized ssymv kernels for haswell
2015-04-23 10:23:13 +02:00
Werner Saar
3814bf60d3
added optimized dsymv kernels for haswell
2015-04-22 10:42:50 +02:00
Werner Saar
6d0db0151f
added optimized zaxpy-kernels
2015-04-16 11:19:37 +02:00
Werner Saar
248c9340c3
added optimized caxpy-kernel for haswell
2015-04-15 15:16:31 +02:00
Werner Saar
fd838c75bc
add optimized cdot- and zdot-kernel for haswell
2015-04-09 15:13:52 +02:00
Werner Saar
53bb924287
added optimized saxpy- and daxpy-kernel for haswell
2015-04-06 12:33:16 +02:00
Werner Saar
701b9d7556
added optimized sdot- and ddot-kernel for HASWELL
2015-04-05 17:57:53 +02:00
wernsaar
8f100a14f2
optimized cgemv_t kernel for haswell
2014-09-13 16:13:27 +02:00
wernsaar
1a352b24e6
updated KERNEL.HASWELL
2014-09-13 12:23:27 +02:00
wernsaar
e0192a6914
bugfix in zgemv_n_4.c
2014-09-11 13:18:00 +02:00
wernsaar
baa46e4fba
added and tested optimized dgemv_n kernel for haswell
2014-09-09 16:17:45 +02:00
wernsaar
debc6d1a05
bugfix in KERNEL.HASWELL
2014-09-09 14:04:44 +02:00
wernsaar
e73a0113ec
added optimized gemv kernels
2014-09-09 13:54:55 +02:00
wernsaar
80f7786875
enabled optimized sgemv kernels for piledriver
2014-09-07 21:13:57 +02:00
wernsaar
d143f84dd2
added optimized sgemv_n kernel for haswell
2014-09-06 12:08:48 +02:00
wernsaar
11eab4c019
added optimized cgemv_n for haswell
2014-08-14 19:00:30 +02:00
wernsaar
4568d32b6b
added optimized cgemv_t kernel for haswell
2014-08-14 14:10:29 +02:00
wernsaar
8c582d362d
optimized zgemv_t_microk_haswell-2.c
2014-08-13 13:42:22 +02:00
wernsaar
11e34ddd1b
bugfix for zgemv_n_microk_haswell-2.c
2014-08-13 12:54:18 +02:00
wernsaar
58b075daef
added optimized zgemv_t kernel for haswell
2014-08-11 16:57:52 +02:00
wernsaar
dbc2eff029
disabled optimized haswell zgemv_n kernel for windows (bad rounding)
2014-08-10 11:57:24 +02:00
wernsaar
462b4885ff
added optimized zgemv_n kernel for haswell
2014-08-10 08:39:17 +02:00
wernsaar
006ef3ea01
added optimized dgemv_t kernel for haswell
2014-08-07 10:08:54 +02:00
wernsaar
60f17628cc
added optimized dgemv_n kernel for haswell
2014-08-07 09:18:02 +02:00
wernsaar
7aa43c8928
enabled optimized sgemv kernels for windows
2014-08-06 14:06:30 +02:00
wernsaar
95a8caa2f3
added optimized sgemv_t kernel
2014-08-06 12:12:17 +02:00
wernsaar
2bab92961f
enabled optimized sgemv_n kernels for windows
2014-08-05 14:52:54 +02:00
wernsaar
3fbc13eb65
modified sgemv_n for haswell
2014-08-04 16:22:11 +02:00
wernsaar
6acbafe45b
added sgemv_n microkernel for haswell
2014-07-20 14:52:25 +02:00