Commit Graph

499 Commits

Author SHA1 Message Date
wernsaar
62231ab337 Merge pull request #538 from wernsaar/develop
Added optimized cdot- and zdot-kernels
2015-04-10 16:03:37 +02:00
Werner Saar
3119def9a7 updated cdot and zdot 2015-04-10 11:10:31 +02:00
Werner Saar
33b332372a add optimized cdot- and zdot-kernel for sandybridge 2015-04-10 09:37:26 +02:00
Werner Saar
fd838c75bc add optimized cdot- and zdot-kernel for haswell 2015-04-09 15:13:52 +02:00
Werner Saar
b57a60dac8 updated cdot and zdot for piledriver 2015-04-09 10:33:46 +02:00
Werner Saar
5c51163972 added optimized cdot- and zdot-kernel for steamroller 2015-04-09 09:45:23 +02:00
Werner Saar
9299d8cfd6 added optimized cdot- and zdot-kernels for bulldozer 2015-04-08 16:29:55 +02:00
Zhang Xianyi
0a3d3b945d Refs #535. Fix the wrong vector instruction in sgemm sandy bridge kernel. 2015-04-08 03:55:49 +08:00
Werner Saar
60c6dec6e6 updated some lines for bulldozer 2015-04-06 18:47:16 +02:00
Werner Saar
47898cca35 added optimized saxpy- and daxpy-kernel for sandybridge 2015-04-06 16:05:16 +02:00
Werner Saar
53bb924287 added optimized saxpy- and daxpy-kernel for haswell 2015-04-06 12:33:16 +02:00
Werner Saar
a901b065d3 added optimized ddot-kernel for sandybridge 2015-04-05 20:19:38 +02:00
Werner Saar
3937e2a0a0 add optimized sdot-kernel for sandybridge 2015-04-05 19:47:05 +02:00
Werner Saar
9707d608d5 removed double definition line 2015-04-05 18:35:34 +02:00
Werner Saar
701b9d7556 added optimized sdot- and ddot-kernel for HASWELL 2015-04-05 17:57:53 +02:00
Zhang Xianyi
e5b96e55a7 Fix build bug for ARM64. 2015-03-24 15:27:17 -05:00
Zhang Xianyi
ea7f9dacf4 Refs #509. Fixed geadd building bug with DYNAMIC_ARCH=1. 2015-02-26 01:47:11 +08:00
Martin Koehler
39cc6b21d3 Add ATLAS-style ?geadd function 2015-02-16 13:46:20 +01:00
Zhang Xianyi
229ce2ccd1 Add cortex-a9 and cortex-a15 targets. 2015-01-12 08:55:29 +00:00
Zhang Xianyi
41aad0407f Merge pull request #482 from jeromerobert/develop
Allow to do gemv and ger buffer allocation on the stack
2015-01-02 02:26:17 +08:00
Werner Saar
ddf983d643 added optimizations for steamroller 2014-12-30 20:14:45 +08:00
Werner Saar
4319769b79 added target processor STEAMROLLER 2014-12-28 20:16:46 +08:00
Jerome Robert
e9d9a8eae3 Allow to do gemv and ger buffer allocation on the stack
ger and gemv call blas_memory_alloc/free which in their turn
call blas_lock. blas_lock create thread contention when matrices
are small and the number of thread is high enough. We avoid
call blas_memory_alloc by replacing it with stack allocation.
This can be enabled with:
make -DMAX_STACK_ALLOC=2048
The given size (in byte) must be high enough to avoid thread contention
and small enough to avoid stack overflow.

Fix #478
2014-12-27 14:33:12 +01:00
Werner Saar
587e16fba3 Ref #458: Backport, sandybrigde uses nehalem zgemm kernel 2014-12-22 17:01:18 +01:00
Werner Saar
6261342de3 small optimization on dgemm_kernel for N=1 2014-12-18 20:35:51 +01:00
Werner Saar
bc5fff7085 changed inline assembler labels to short form 2014-12-07 12:38:54 +01:00
Zhang Xianyi
0cf29ba6d2 Fixed a bug of sgemm sandy bridge kernel.
Reported by Julia project. JuliaLang/julia#9084
2014-12-03 17:38:41 +08:00
Zhang Xianyi
2fb02626da Update organization info. 2014-11-25 15:28:58 +08:00
Zhang Xianyi
a85c2785ae Refs #467. Added generic kernel file for x86_64. 2014-11-24 15:34:48 +08:00
Benedikt Huber
58c90d5937 # The first commit's message is:
Optimizations for APM's xgene-1 (aarch64).

1) general system updates to support armv8 better.  Make all did not work, one needed to supply TARGET=ARMV8.
2) sgem 4x4 kernel in assembler using SIMD, and configuration changes to use it.
3) strmm 4x4 kernel in C.  Since the sgem kernel does 4x4, the trmm kernel must also do 4xN.

Added Dave Nuechterlein to the contributors list.
2014-11-11 22:19:23 +08:00
wernsaar
7aae4a62e7 enabled use of GEMM3M functions 2014-09-20 14:27:10 +02:00
wernsaar
b7c9566eea removed obsolete gemv kernel files 2014-09-14 11:00:53 +02:00
wernsaar
6df1b0be81 optimized zgemv_n_microk_sandy-4.c 2014-09-14 10:21:22 +02:00
wernsaar
2ac1e076c1 added optimized zgemv_n kernel for sandybridge 2014-09-14 09:02:05 +02:00
wernsaar
9908b6031c bugfix in KERNEL.PILEDRIVER 2014-09-13 16:26:53 +02:00
wernsaar
8f100a14f2 optimized cgemv_t kernel for haswell 2014-09-13 16:13:27 +02:00
wernsaar
53b5726b04 added optimized cgemv_t kernel for haswell 2014-09-13 15:14:12 +02:00
wernsaar
1a352b24e6 updated KERNEL.HASWELL 2014-09-13 12:23:27 +02:00
wernsaar
5194818d4b updated zgemv_t_4.c 2014-09-13 09:48:34 +02:00
wernsaar
8a39cdb1c1 added optimized zgemv_t kernel for haswell 2014-09-13 09:47:07 +02:00
wernsaar
0a1390f2d8 enabled optimized zgemv_t kernel for bulldozer 2014-09-12 17:43:47 +02:00
wernsaar
a8b0812feb optimized zgemv_t for bulldozer 2014-09-12 17:42:25 +02:00
wernsaar
a0fb68ab42 added optimized zgemv_t kernel for bulldozer 2014-09-12 17:04:22 +02:00
wernsaar
44c11165d5 bugfix in cgemv_t_4.c 2014-09-12 14:12:24 +02:00
wernsaar
564be4eb72 added optimized cgemv_t kernel 2014-09-12 13:38:01 +02:00
wernsaar
107c3ea7d5 added optimized zgemv_t routine 2014-09-12 12:35:20 +02:00
wernsaar
bb8d698335 optimized zgemv_n_microk_haswell-4.c for small size 2014-09-11 13:44:55 +02:00
wernsaar
e0192a6914 bugfix in zgemv_n_4.c 2014-09-11 13:18:00 +02:00
wernsaar
bced4594bb added optimized zgemv_n kernel 2014-09-11 12:34:57 +02:00
wernsaar
cafba99b6b bufix in cgemv_n_microk_haswell-4.c 2014-09-11 11:12:44 +02:00