Werner Saar
c22068c406
optimized sdot.c for increments != 1
2015-04-24 13:13:20 +02:00
Werner Saar
dee100d0e4
optimized saxpy.c for increments != 1
2015-04-24 11:52:59 +02:00
Werner Saar
0273966abb
optimized daxpy kernel for increments != 1
2015-04-24 11:39:17 +02:00
Werner Saar
3a67daa954
optimized ddot.c for increments != 1
2015-04-24 10:56:55 +02:00
Werner Saar
b4f2153dcd
added optimized ssymv kernels for sandybridge
2015-04-23 12:19:24 +02:00
Werner Saar
1c4b0eeae3
added optimized ssymv kernels for haswell
2015-04-23 10:23:13 +02:00
Werner Saar
1bec9abb9a
added optimized dsymv kernels for sandybridge
2015-04-22 12:09:43 +02:00
Werner Saar
3814bf60d3
added optimized dsymv kernels for haswell
2015-04-22 10:42:50 +02:00
Werner Saar
6d0db0151f
added optimized zaxpy-kernels
2015-04-16 11:19:37 +02:00
Zhang Xianyi
37b9033c90
Merge pull request #543 from jeromerobert/develop
...
Fix a buffer overflow with MAX_STACK_ALLOC size in dgemv_t
2015-04-15 11:18:14 -05:00
Werner Saar
13889515b3
added optimized caxpy-kernel for sandybridge
2015-04-15 16:29:25 +02:00
Werner Saar
248c9340c3
added optimized caxpy-kernel for haswell
2015-04-15 15:16:31 +02:00
Werner Saar
e9f33b4ca7
added optimized caxpy-kernel for steamroller
2015-04-15 13:49:23 +02:00
Werner Saar
f5d847122a
updated caxpy_microk_bulldozer-2.c and caxpy.c
2015-04-15 11:59:38 +02:00
Jerome Robert
a4c96eca67
Fix a buffer overflow with MAX_STACK_ALLOC size in dgemv_t
...
Refs #478, #482, 9798481, fd9fd42
2015-04-15 11:46:48 +02:00
Werner Saar
baa0363ea2
add optimized ddot-kernel for piledriver
2015-04-14 15:09:13 +02:00
Werner Saar
34ba66606a
add optimized daxpy-kernel for piledriver
2015-04-14 14:23:29 +02:00
Werner Saar
f615dc7603
added optimized saxpy kernel for steamroller
2015-04-14 09:09:39 +02:00
Werner Saar
331c417637
optimized saxpy for piledriver
2015-04-14 08:34:11 +02:00
Werner Saar
d7a17ad85d
optimized sdot-kernel for piledriver
2015-04-13 13:19:21 +02:00
Werner Saar
d35f6c63c2
add optimized daxpy-kernel for steamroller
2015-04-13 12:22:43 +02:00
Werner Saar
166d76e864
added optimized sdot-kernel for steamroller
2015-04-11 08:48:18 +02:00
Werner Saar
f9f127d838
added optimized ddot kernel for steamroller
2015-04-10 16:18:03 +02:00
wernsaar
62231ab337
Merge pull request #538 from wernsaar/develop
...
Added optimized cdot- and zdot-kernels
2015-04-10 16:03:37 +02:00
Werner Saar
3119def9a7
updated cdot and zdot
2015-04-10 11:10:31 +02:00
Werner Saar
33b332372a
add optimized cdot- and zdot-kernel for sandybridge
2015-04-10 09:37:26 +02:00
Werner Saar
fd838c75bc
add optimized cdot- and zdot-kernel for haswell
2015-04-09 15:13:52 +02:00
Werner Saar
b57a60dac8
updated cdot and zdot for piledriver
2015-04-09 10:33:46 +02:00
Werner Saar
5c51163972
added optimized cdot- and zdot-kernel for steamroller
2015-04-09 09:45:23 +02:00
Werner Saar
9299d8cfd6
added optimized cdot- and zdot-kernels for bulldozer
2015-04-08 16:29:55 +02:00
Zhang Xianyi
0a3d3b945d
Refs #535. Fix the wrong vector instruction in sgemm sandy bridge kernel.
2015-04-08 03:55:49 +08:00
Werner Saar
60c6dec6e6
updated some lines for bulldozer
2015-04-06 18:47:16 +02:00
Werner Saar
47898cca35
added optimized saxpy- and daxpy-kernel for sandybridge
2015-04-06 16:05:16 +02:00
Werner Saar
53bb924287
added optimized saxpy- and daxpy-kernel for haswell
2015-04-06 12:33:16 +02:00
Werner Saar
a901b065d3
added optimized ddot-kernel for sandybridge
2015-04-05 20:19:38 +02:00
Werner Saar
3937e2a0a0
add optimized sdot-kernel for sandybridge
2015-04-05 19:47:05 +02:00
Werner Saar
9707d608d5
removed double definition line
2015-04-05 18:35:34 +02:00
Werner Saar
701b9d7556
added optimized sdot- and ddot-kernel for HASWELL
2015-04-05 17:57:53 +02:00
Zhang Xianyi
e5b96e55a7
Fix build bug for ARM64.
2015-03-24 15:27:17 -05:00
Zhang Xianyi
ea7f9dacf4
Refs #509. Fixed geadd building bug with DYNAMIC_ARCH=1.
2015-02-26 01:47:11 +08:00
Martin Koehler
39cc6b21d3
Add ATLAS-style ?geadd function
2015-02-16 13:46:20 +01:00
Zhang Xianyi
229ce2ccd1
Add cortex-a9 and cortex-a15 targets.
2015-01-12 08:55:29 +00:00
Zhang Xianyi
41aad0407f
Merge pull request #482 from jeromerobert/develop
...
Allow gemv and ger buffer allocation on the stack
2015-01-02 02:26:17 +08:00
Werner Saar
ddf983d643
added optimizations for steamroller
2014-12-30 20:14:45 +08:00
Werner Saar
4319769b79
added target processor STEAMROLLER
2014-12-28 20:16:46 +08:00
Jerome Robert
e9d9a8eae3
Allow gemv and ger buffer allocation on the stack
...
ger and gemv call blas_memory_alloc/free, which in turn
call blas_lock. blas_lock creates thread contention when matrices
are small and the number of threads is high enough. We avoid
calling blas_memory_alloc by replacing it with stack allocation.
This can be enabled with:
make -DMAX_STACK_ALLOC=2048
The given size (in bytes) must be high enough to avoid thread contention
and small enough to avoid stack overflow.
Fix #478
2014-12-27 14:33:12 +01:00
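The stack-allocation idea described in the commit above can be sketched roughly as follows. This is an illustrative sketch only, not the OpenBLAS code: `scaled_sum` and its `used_stack` out-parameter are hypothetical names, `malloc`/`free` stand in for `blas_memory_alloc`/`blas_memory_free`, and the default of 2048 bytes is simply the value from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed threshold in bytes; 2048 is the value used in the commit message. */
#ifndef MAX_STACK_ALLOC
#define MAX_STACK_ALLOC 2048
#endif

/*
 * Hypothetical routine that needs an n-element scratch buffer.
 * Small requests use a fixed-size stack array, so no shared allocator
 * (and no lock) is involved. Larger requests fall back to the heap;
 * malloc/free are only stand-ins for the library's own allocator.
 * *used_stack reports which path was taken.
 */
static double scaled_sum(const double *x, size_t n, double alpha, int *used_stack)
{
    double stack_buf[MAX_STACK_ALLOC / sizeof(double)];
    double *buf = stack_buf;
    double sum = 0.0;
    size_t i;
    int on_stack = (n <= MAX_STACK_ALLOC / sizeof(double));

    assert(x != NULL && used_stack != NULL);
    *used_stack = on_stack;

    if (!on_stack) {
        buf = malloc(n * sizeof(double));
        if (buf == NULL) {
            return 0.0; /* allocation failed; real code would report an error */
        }
    }

    for (i = 0; i < n; i++) {
        buf[i] = alpha * x[i]; /* scratch work in the temporary buffer */
    }
    for (i = 0; i < n; i++) {
        sum += buf[i];
    }

    if (!on_stack) {
        free(buf);
    }
    return sum;
}
```

With the default threshold, a call on 4 doubles takes the stack path, while a call on 1000 doubles (8000 bytes, assuming 8-byte doubles) takes the heap path. The trade-off is the one the commit message states: a larger threshold keeps more calls away from the locked allocator, but it increases stack usage in every calling thread.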
Werner Saar
587e16fba3
Ref #458: Backport, sandybridge uses nehalem zgemm kernel
2014-12-22 17:01:18 +01:00
Werner Saar
6261342de3
small optimization on dgemm_kernel for N=1
2014-12-18 20:35:51 +01:00
Werner Saar
bc5fff7085
changed inline assembler labels to short form
2014-12-07 12:38:54 +01:00
Zhang Xianyi
0cf29ba6d2
Fixed a bug of sgemm sandy bridge kernel.
...
Reported by Julia project. JuliaLang/julia#9084
2014-12-03 17:38:41 +08:00