Zhang Xianyi
e5b96e55a7
Fix build bug for ARM64.
2015-03-24 15:27:17 -05:00
Zhang Xianyi
ea7f9dacf4
Refs #509 . Fixed geadd building bug with DYNAMIC_ARCH=1.
2015-02-26 01:47:11 +08:00
Martin Koehler
39cc6b21d3
Add ATLAS-style ?geadd function
2015-02-16 13:46:20 +01:00
Zhang Xianyi
229ce2ccd1
Add cortex-a9 and cortex-a15 targets.
2015-01-12 08:55:29 +00:00
Zhang Xianyi
41aad0407f
Merge pull request #482 from jeromerobert/develop
Allow to do gemv and ger buffer allocation on the stack
2015-01-02 02:26:17 +08:00
Werner Saar
ddf983d643
added optimizations for steamroller
2014-12-30 20:14:45 +08:00
Werner Saar
4319769b79
added target processor STEAMROLLER
2014-12-28 20:16:46 +08:00
Jerome Robert
e9d9a8eae3
Allow to do gemv and ger buffer allocation on the stack
ger and gemv call blas_memory_alloc/free, which in turn
call blas_lock. blas_lock creates thread contention when matrices
are small and the number of threads is high enough. We avoid
calling blas_memory_alloc by replacing it with stack allocation.
This can be enabled with:
make -DMAX_STACK_ALLOC=2048
The given size (in bytes) must be high enough to avoid thread contention
and small enough to avoid stack overflow.
Fix #478
2014-12-27 14:33:12 +01:00
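A minimal sketch of the idea described in commit e9d9a8eae3 above, not the code from that commit: a scratch buffer is taken from the stack when the request fits under a compile-time limit, and from the heap otherwise. The function scaled_sum, its parameters, and the used_stack flag are hypothetical names chosen for illustration; only the macro name MAX_STACK_ALLOC comes from the commit message, and plain malloc stands in here for the library's locked allocator. The sketch assumes MAX_STACK_ALLOC is at least sizeof(double).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Byte limit for the on-stack scratch buffer; 2048 is the value used
 * in the commit message's example. Assumed to be >= sizeof(double). */
#ifndef MAX_STACK_ALLOC
#define MAX_STACK_ALLOC 2048
#endif

/* Hypothetical routine that needs n doubles of scratch space.
 * It writes alpha * x[i] into the scratch buffer and returns the sum.
 * *used_stack is set to 1 (stack), 0 (heap), or -1 (allocation failed). */
double scaled_sum(const double *x, size_t n, double alpha, int *used_stack)
{
    double stack_buf[MAX_STACK_ALLOC / sizeof(double)];
    double *buf;
    double sum = 0.0;
    size_t i;

    assert(x != NULL && used_stack != NULL);

    if (n <= sizeof(stack_buf) / sizeof(stack_buf[0])) {
        /* Small request: no allocator call, so no lock is taken. */
        buf = stack_buf;
        *used_stack = 1;
    } else {
        /* Large request: fall back to the heap to avoid stack overflow. */
        buf = malloc(n * sizeof(double));
        if (buf == NULL) {
            *used_stack = -1;
            return 0.0;
        }
        *used_stack = 0;
    }

    for (i = 0; i < n; i++) {
        buf[i] = alpha * x[i];
        sum += buf[i];
    }

    if (buf != stack_buf) {
        free(buf);
    }
    return sum;
}
```

With the default limit of 2048 bytes, a 4-element input stays on the stack, while a 1000-element input (8000 bytes of scratch) takes the heap path. This is the trade-off the commit message names: a larger limit keeps more calls off the shared allocator, a smaller one reduces the risk of stack overflow.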
Werner Saar
587e16fba3
Ref #458 : Backport, sandybridge uses nehalem zgemm kernel
2014-12-22 17:01:18 +01:00
Werner Saar
6261342de3
small optimization on dgemm_kernel for N=1
2014-12-18 20:35:51 +01:00
Werner Saar
bc5fff7085
changed inline assembler labels to short form
2014-12-07 12:38:54 +01:00
Zhang Xianyi
0cf29ba6d2
Fixed a bug of sgemm sandy bridge kernel.
Reported by the Julia project. JuliaLang/julia#9084
2014-12-03 17:38:41 +08:00
Zhang Xianyi
2fb02626da
Update organization info.
2014-11-25 15:28:58 +08:00
Zhang Xianyi
a85c2785ae
Refs #467 . Added generic kernel file for x86_64.
2014-11-24 15:34:48 +08:00
Benedikt Huber
58c90d5937
# The first commit's message is:
Optimizations for APM's xgene-1 (aarch64).
1) General system updates to support armv8 better. Make all did not work; one needed to supply TARGET=ARMV8.
2) sgemm 4x4 kernel in assembler using SIMD, and configuration changes to use it.
3) strmm 4x4 kernel in C. Since the sgemm kernel does 4x4, the trmm kernel must also do 4xN.
Added Dave Nuechterlein to the contributors list.
2014-11-11 22:19:23 +08:00
wernsaar
7aae4a62e7
enabled use of GEMM3M functions
2014-09-20 14:27:10 +02:00
wernsaar
b7c9566eea
removed obsolete gemv kernel files
2014-09-14 11:00:53 +02:00
wernsaar
6df1b0be81
optimized zgemv_n_microk_sandy-4.c
2014-09-14 10:21:22 +02:00
wernsaar
2ac1e076c1
added optimized zgemv_n kernel for sandybridge
2014-09-14 09:02:05 +02:00
wernsaar
9908b6031c
bugfix in KERNEL.PILEDRIVER
2014-09-13 16:26:53 +02:00
wernsaar
8f100a14f2
optimized cgemv_t kernel for haswell
2014-09-13 16:13:27 +02:00
wernsaar
53b5726b04
added optimized cgemv_t kernel for haswell
2014-09-13 15:14:12 +02:00
wernsaar
1a352b24e6
updated KERNEL.HASWELL
2014-09-13 12:23:27 +02:00
wernsaar
5194818d4b
updated zgemv_t_4.c
2014-09-13 09:48:34 +02:00
wernsaar
8a39cdb1c1
added optimized zgemv_t kernel for haswell
2014-09-13 09:47:07 +02:00
wernsaar
0a1390f2d8
enabled optimized zgemv_t kernel for bulldozer
2014-09-12 17:43:47 +02:00
wernsaar
a8b0812feb
optimized zgemv_t for bulldozer
2014-09-12 17:42:25 +02:00
wernsaar
a0fb68ab42
added optimized zgemv_t kernel for bulldozer
2014-09-12 17:04:22 +02:00
wernsaar
44c11165d5
bugfix in cgemv_t_4.c
2014-09-12 14:12:24 +02:00
wernsaar
564be4eb72
added optimized cgemv_t kernel
2014-09-12 13:38:01 +02:00
wernsaar
107c3ea7d5
added optimized zgemv_t routine
2014-09-12 12:35:20 +02:00
wernsaar
bb8d698335
optimized zgemv_n_microk_haswell-4.c for small size
2014-09-11 13:44:55 +02:00
wernsaar
e0192a6914
bugfix in zgemv_n_4.c
2014-09-11 13:18:00 +02:00
wernsaar
bced4594bb
added optimized zgemv_n kernel
2014-09-11 12:34:57 +02:00
wernsaar
cafba99b6b
bugfix in cgemv_n_microk_haswell-4.c
2014-09-11 11:12:44 +02:00
wernsaar
ac8f232b2a
more optimizations
2014-09-11 10:25:48 +02:00
wernsaar
f98e1244c4
optimized cgemv_n_4.c
2014-09-10 19:26:14 +02:00
wernsaar
be95700b30
added optimized cgemv_kernel for haswell
2014-09-10 14:11:24 +02:00
wernsaar
4aa534ae93
added cgemv_n kernel, optimized for small sizes
2014-09-10 13:45:13 +02:00
wernsaar
baa46e4fba
added and tested optimized dgemv_n kernel for haswell
2014-09-09 16:17:45 +02:00
wernsaar
faab7a181d
added optimized dgemv_n kernel for haswell
2014-09-09 15:32:32 +02:00
wernsaar
8109d8232c
optimized dgemv_t kernel for haswell
2014-09-09 14:38:08 +02:00
wernsaar
debc6d1a05
bugfix in KERNEL.HASWELL
2014-09-09 14:04:44 +02:00
wernsaar
e73a0113ec
added optimized gemv kernels
2014-09-09 13:54:55 +02:00
wernsaar
44f2bf9bae
added optimized dgemv_t kernel for haswell
2014-09-09 13:34:22 +02:00
wernsaar
cd34e9701b
removed obsolete files
2014-09-08 19:15:31 +02:00
wernsaar
658939faaa
optimized dgemv_n kernel for small sizes
2014-09-08 15:22:35 +02:00
wernsaar
c4d9d4e5f8
added haswell optimized kernel
2014-09-08 12:25:16 +02:00
wernsaar
7c0a94ff47
bugfix in sgemv_n_microk_haswell-4.c
2014-09-08 10:54:33 +02:00
wernsaar
cbbc80aad3
added optimized sgemv_t kernel for haswell
2014-09-08 10:13:39 +02:00
wernsaar
2be5c7a640
bugfix for windows
2014-09-07 21:48:42 +02:00
wernsaar
80f7786875
enabled optimized sgemv kernels for piledriver
2014-09-07 21:13:57 +02:00
wernsaar
553e275407
optimized sgemv_n kernel for sandybridge
2014-09-07 20:53:30 +02:00
wernsaar
7b3932b3f3
optimized sgemv_n kernel for nehalem
2014-09-07 19:20:08 +02:00
wernsaar
75207b1148
optimized sgemv_n for very small size of m
2014-09-07 18:23:48 +02:00
wernsaar
274828fa50
optimizations for very small sizes
2014-09-07 13:45:03 +02:00
wernsaar
5ae1731fe6
better optimizations for sgemv_t kernel
2014-09-06 21:28:57 +02:00
wernsaar
c8eaf3ae2d
optimized sgemv_t_4 kernel for very small sizes
2014-09-06 19:41:57 +02:00
wernsaar
3a7ab47ee9
optimized sgemv_t
2014-09-06 18:34:25 +02:00
wernsaar
cf5544b417
optimization for small size
2014-09-06 13:17:56 +02:00
wernsaar
d143f84dd2
added optimized sgemv_n kernel for haswell
2014-09-06 12:08:48 +02:00
wernsaar
a64fe9bcc9
added optimized sgemv_n kernel for sandybridge
2014-09-06 08:41:53 +02:00
wernsaar
6df7a88930
optimized sgemv_t for sandybridge
2014-09-05 10:22:50 +02:00
wernsaar
53de943690
bugfix for sgemv_n_4.c
2014-09-04 18:55:52 +02:00
wernsaar
7f910010a0
optimized sgemv_n kernel for small sizes
2014-09-04 13:09:27 +02:00
wernsaar
3a5d8dbff9
optimized sgemv_n_4.c
2014-09-03 15:34:30 +02:00
wernsaar
2a60c6d4b0
optimized sgemv_n for small sizes
2014-09-03 14:48:45 +02:00
wernsaar
0fc560ba23
bugfix for buffer overflow
2014-09-03 10:13:47 +02:00
wernsaar
f3b50dcf5b
removed obsolete instructions from sgemv_t_4.c
2014-09-02 13:35:41 +02:00
wernsaar
93eaba959d
optimized sgemv_t for bulldozer
2014-09-02 12:42:36 +02:00
wernsaar
9570e56965
optimized sgemv_t_4.c for small sizes
2014-09-01 15:11:37 +02:00
wernsaar
bc99faef1b
optimized sgemv_t_4.c for uneven sizes
2014-08-31 14:33:15 +02:00
wernsaar
848c0f16f7
optimized sgemv_t_4.c for small size
2014-08-31 13:23:44 +02:00
wernsaar
53e6dbf6ca
optimized sgemv_t kernel for small sizes
2014-08-30 13:36:27 +02:00
wernsaar
20cd850125
modification for clang compiler
2014-08-27 09:00:20 +02:00
wernsaar
3885eebdb8
added optimized zaxpy bulldozer kernel
2014-08-25 15:52:35 +02:00
wernsaar
ee74445155
added optimized caxpy kernel for bulldozer
2014-08-25 14:53:28 +02:00
wernsaar
9d2ace8bac
added optimized daxpy kernel for bulldozer
2014-08-24 10:57:12 +02:00
wernsaar
b55f997302
added optimized daxpy kernel for nehalem
2014-08-23 17:53:07 +02:00
wernsaar
e45c960c2c
added optimized saxpy kernel for nehalem
2014-08-23 17:15:21 +02:00
wernsaar
ac76b6267f
added optimized dgemv_n kernel for nehalem
2014-08-23 10:40:57 +02:00
wernsaar
f1b96c4846
added optimized ddot kernel for bulldozer
2014-08-22 21:19:29 +02:00
wernsaar
16d6be852d
added optimized ddot kernel for nehalem
2014-08-22 20:34:41 +02:00
wernsaar
95a707ced3
update of KERNEL.BULLDOZER
2014-08-22 17:01:27 +02:00
wernsaar
5d97b0754c
added optimized sdot kernel for nehalem
2014-08-22 17:00:26 +02:00
wernsaar
8a9e868919
added optimized sdot for bulldozer
2014-08-22 14:29:17 +02:00
wernsaar
c8b0645266
added optimized symv_L kernels for nehalem
2014-08-21 14:27:00 +02:00
wernsaar
ec05ff3f64
added optimized ssymv_L kernel for bulldozer
2014-08-21 13:32:06 +02:00
wernsaar
f6f9122660
added optimized dsymv_L kernel for bulldozer
2014-08-21 13:02:53 +02:00
wernsaar
8247f38dc1
added optimized dsymv_U kernel for nehalem
2014-08-20 09:58:04 +02:00
wernsaar
ef6374196d
updated optimized dsymv_U kernel for bulldozer
2014-08-20 09:00:56 +02:00
wernsaar
f824c2b751
updated optimized ssymv_U for bulldozer
2014-08-19 19:25:03 +02:00
wernsaar
4ba4ab623f
added optimized ssymv_U kernel for nehalem
2014-08-19 17:09:45 +02:00
wernsaar
4f39447c05
added optimized ssymv_U kernel for bulldozer
2014-08-18 13:52:24 +02:00
wernsaar
74c9465672
added optimized dsymv_U kernel for bulldozer
2014-08-18 12:18:10 +02:00
wernsaar
101dd08173
add reference in C for symv_U
2014-08-16 13:52:50 +02:00
wernsaar
493d4fe7e5
added reference in C for symv_L
2014-08-16 11:36:48 +02:00
wernsaar
11eab4c019
added optimized cgemv_n for haswell
2014-08-14 19:00:30 +02:00
wernsaar
4568d32b6b
added optimized cgemv_t kernel for haswell
2014-08-14 14:10:29 +02:00
wernsaar
c1a6374c6f
optimized zgemv_n kernel for sandybridge
2014-08-13 16:10:03 +02:00