Zhang Xianyi
c44ff4d648
Refs #714. Avoid compiler warnings.
2016-01-28 04:38:07 +08:00
Werner Saar
c8f2c5d636
added optimized trsm_kernels
2016-01-05 13:05:05 +01:00
Zhang Xianyi
69363622a8
Fix DYNAMIC_ARCH=1 bug.
2015-10-27 05:10:40 +08:00
Zhang Xianyi
f874465bb8
Use cmake to build OpenBLAS GENERIC Target on MSVC x86 64-bit.
...
Disable CBLAS and LAPACK.
2015-08-10 14:10:44 -05:00
Zhang Xianyi
ab0a0a75fc
Merge branch 'develop' into cmake
2015-08-03 23:59:01 -05:00
Zhang Xianyi
1cf2b10224
Use pure C generic target on x86 and x86_64.
...
make TARGET=GENERIC
?gemm3m is unimplemented on generic target.
2015-08-03 23:55:56 -05:00
Zhang Xianyi
7ac7e147d4
Fixed cmake building bugs on Linux. Disable LAPACK by default.
2015-08-04 04:37:05 +08:00
Werner Saar
e7c969e164
added optimized dtrmm_kernel for haswell
2015-06-13 16:16:29 +02:00
Werner Saar
9bd962f655
modified haswell parameter dgemm_unroll_n
2015-06-13 10:28:27 +02:00
Werner Saar
24f58c8bb1
added optimized cscal and zscal kernels for steamroller
2015-05-18 12:40:07 +02:00
Werner Saar
95b1faf667
added optimized cscal and zscal kernels for steamroller and piledriver
2015-05-18 10:50:57 +02:00
Werner Saar
2d9e406050
added optimized cscal kernel for sandybridge
2015-05-18 08:46:06 +02:00
Werner Saar
59083e3ce1
added optimized cscal kernel for bulldozer
2015-05-18 07:33:52 +02:00
wernsaar
685be40339
Merge pull request #571 from wernsaar/develop
...
added optimized cscal and zscal functions
2015-05-17 14:09:14 +02:00
Werner Saar
31c9e399e9
added optimized cscal kernel for haswell
2015-05-17 13:44:09 +02:00
Werner Saar
7de6bb9889
added optimized zscal kernel for bulldozer
2015-05-17 11:45:19 +02:00
Werner Saar
d63034303b
added optimized zscal kernel for haswell
2015-05-16 16:41:45 +02:00
Zhang Xianyi
51ff17d46e
Add AMD Excavator target.
2015-05-13 16:16:30 -05:00
Werner Saar
18e90ee2e3
bugfix: added static to functions
2015-05-13 13:31:26 +02:00
Werner Saar
e00cccc41e
added optimized dscal kernel for piledriver
2015-05-13 13:05:35 +02:00
Werner Saar
73f09bf64f
optimized dscal kernel for increment != 1
2015-05-13 12:14:39 +02:00
Werner Saar
02e772c7e4
added optimized dscal kernel for haswell
2015-05-12 17:19:58 +02:00
Werner Saar
7aee913991
added optimized dscal kernel for sandybridge
2015-05-12 16:27:43 +02:00
Werner Saar
e50a933037
added optimized dscal kernel for bulldozer
2015-05-12 12:28:44 +02:00
Werner Saar
133c11a156
updated dgemv_n kernel for nehalem
2015-04-30 14:38:06 +02:00
Werner Saar
30f52d53df
optimized dgemv_n kernel for haswell
2015-04-30 12:11:39 +02:00
Werner Saar
5e83d80725
optimized dger kernel for sandybridge
2015-04-28 16:58:11 +02:00
Werner Saar
b2e1797dc6
added optimized sger kernel for sandybridge
2015-04-28 15:33:38 +02:00
Werner Saar
e216f686cb
optimized saxpy and daxpy for sandybridge
2015-04-28 10:18:32 +02:00
Werner Saar
fc0e0391f3
bugfixes: replaced int with BLASLONG
2015-04-24 14:30:44 +02:00
Werner Saar
c22068c406
optimized sdot.c for increments != 1
2015-04-24 13:13:20 +02:00
Werner Saar
dee100d0e4
optimized saxpy.c for increments != 1
2015-04-24 11:52:59 +02:00
Werner Saar
0273966abb
optimized daxpy kernel for increments != 1
2015-04-24 11:39:17 +02:00
Werner Saar
3a67daa954
optimized ddot.c for increments != 1
2015-04-24 10:56:55 +02:00
Werner Saar
b4f2153dcd
added optimized ssymv kernels for sandybridge
2015-04-23 12:19:24 +02:00
Werner Saar
1c4b0eeae3
added optimized ssymv kernels for haswell
2015-04-23 10:23:13 +02:00
Werner Saar
1bec9abb9a
added optimized dsymv kernels for sandybridge
2015-04-22 12:09:43 +02:00
Werner Saar
3814bf60d3
added optimized dsymv kernels for haswell
2015-04-22 10:42:50 +02:00
Werner Saar
6d0db0151f
added optimized zaxpy-kernels
2015-04-16 11:19:37 +02:00
Zhang Xianyi
37b9033c90
Merge pull request #543 from jeromerobert/develop
...
Fix a buffer overflow with MAX_STACK_ALLOC size in dgemv_t
2015-04-15 11:18:14 -05:00
Werner Saar
13889515b3
added optimized caxpy-kernel for sandybridge
2015-04-15 16:29:25 +02:00
Werner Saar
248c9340c3
added optimized caxpy-kernel for haswell
2015-04-15 15:16:31 +02:00
Werner Saar
e9f33b4ca7
added optimized caxpy-kernel for steamroller
2015-04-15 13:49:23 +02:00
Werner Saar
f5d847122a
updated caxpy_microk_bulldozer-2.c and caxpy.c
2015-04-15 11:59:38 +02:00
Jerome Robert
a4c96eca67
Fix a buffer overflow with MAX_STACK_ALLOC size in dgemv_t
...
Refs #478, #482, 9798481, fd9fd42
2015-04-15 11:46:48 +02:00
Werner Saar
baa0363ea2
add optimized ddot-kernel for piledriver
2015-04-14 15:09:13 +02:00
Werner Saar
34ba66606a
add optimized daxpy-kernel for piledriver
2015-04-14 14:23:29 +02:00
Werner Saar
f615dc7603
added optimized saxpy kernel for steamroller
2015-04-14 09:09:39 +02:00
Werner Saar
331c417637
optimized saxpy for piledriver
2015-04-14 08:34:11 +02:00
Werner Saar
d7a17ad85d
optimized sdot-kernel for piledriver
2015-04-13 13:19:21 +02:00
Werner Saar
d35f6c63c2
add optimized daxpy-kernel for steamroller
2015-04-13 12:22:43 +02:00
Werner Saar
166d76e864
added optimized sdot-kernel for steamroller
2015-04-11 08:48:18 +02:00
Werner Saar
f9f127d838
added optimized ddot kernel for steamroller
2015-04-10 16:18:03 +02:00
wernsaar
62231ab337
Merge pull request #538 from wernsaar/develop
...
Added optimized cdot- and zdot-kernels
2015-04-10 16:03:37 +02:00
Werner Saar
3119def9a7
updated cdot and zdot
2015-04-10 11:10:31 +02:00
Werner Saar
33b332372a
add optimized cdot- and zdot-kernel for sandybridge
2015-04-10 09:37:26 +02:00
Werner Saar
fd838c75bc
add optimized cdot- and zdot-kernel for haswell
2015-04-09 15:13:52 +02:00
Werner Saar
b57a60dac8
updated cdot and zdot for piledriver
2015-04-09 10:33:46 +02:00
Werner Saar
5c51163972
added optimized cdot- and zdot-kernel for steamroller
2015-04-09 09:45:23 +02:00
Werner Saar
9299d8cfd6
added optimized cdot- and zdot-kernels for bulldozer
2015-04-08 16:29:55 +02:00
Zhang Xianyi
0a3d3b945d
Refs #535. Fix the wrong vector instruction in sgemm sandy bridge kernel.
2015-04-08 03:55:49 +08:00
Werner Saar
60c6dec6e6
updated some lines for bulldozer
2015-04-06 18:47:16 +02:00
Werner Saar
47898cca35
added optimized saxpy- and daxpy-kernel for sandybridge
2015-04-06 16:05:16 +02:00
Werner Saar
53bb924287
added optimized saxpy- and daxpy-kernel for haswell
2015-04-06 12:33:16 +02:00
Werner Saar
a901b065d3
added optimized ddot-kernel for sandybridge
2015-04-05 20:19:38 +02:00
Werner Saar
3937e2a0a0
add optimized sdot-kernel for sandybridge
2015-04-05 19:47:05 +02:00
Werner Saar
9707d608d5
removed double definition line
2015-04-05 18:35:34 +02:00
Werner Saar
701b9d7556
added optimized sdot- and ddot-kernel for HASWELL
2015-04-05 17:57:53 +02:00
Zhang Xianyi
41aad0407f
Merge pull request #482 from jeromerobert/develop
...
Allow gemv and ger buffer allocation on the stack
2015-01-02 02:26:17 +08:00
Werner Saar
ddf983d643
added optimizations for steamroller
2014-12-30 20:14:45 +08:00
Werner Saar
4319769b79
added target processor STEAMROLLER
2014-12-28 20:16:46 +08:00
Jerome Robert
e9d9a8eae3
Allow gemv and ger buffer allocation on the stack
...
ger and gemv call blas_memory_alloc/free, which in turn
call blas_lock. blas_lock creates thread contention when matrices
are small and the number of threads is high enough. We avoid
calling blas_memory_alloc by replacing it with stack allocation.
This can be enabled with:
make -DMAX_STACK_ALLOC=2048
The given size (in bytes) must be high enough to avoid thread contention
and small enough to avoid stack overflow.
Fix #478
2014-12-27 14:33:12 +01:00
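The stack-allocation idea described in commit e9d9a8eae3 can be illustrated with a minimal sketch. This is not the OpenBLAS code: `shared_alloc`, `shared_free`, `heap_allocs`, and `sum_scaled` are hypothetical names, `malloc` stands in for the lock-protected `blas_memory_alloc`, and the 2048-byte default simply mirrors the value in the commit message. The sketch assumes a scratch buffer is taken from the stack when the request fits under the threshold and from the shared allocator otherwise.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed threshold in bytes; mirrors the MAX_STACK_ALLOC value from the commit message. */
#ifndef MAX_STACK_ALLOC
#define MAX_STACK_ALLOC 2048
#endif

/* Hypothetical stand-ins for a shared, lock-protected allocator. */
static int heap_allocs = 0;                 /* counts trips to the shared allocator */
static void *shared_alloc(size_t bytes) { heap_allocs++; return malloc(bytes); }
static void shared_free(void *p) { free(p); }

/* Scales x into a scratch buffer and returns the sum of the scaled values.
 * Small requests use a fixed-size stack buffer and never touch the allocator. */
static double sum_scaled(const double *x, size_t n, double alpha)
{
    double stack_buf[MAX_STACK_ALLOC / sizeof(double)];
    size_t bytes = n * sizeof(double);
    double *buf;

    if (bytes <= sizeof stack_buf) {
        buf = stack_buf;                    /* fast path: no lock, no allocator call */
    } else {
        buf = shared_alloc(bytes);          /* slow path: too large for the stack */
        if (buf == NULL) return 0.0;
    }

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) buf[i] = alpha * x[i];
    for (size_t i = 0; i < n; i++) sum += buf[i];

    if (buf != stack_buf) shared_free(buf); /* only free what the allocator handed out */
    return sum;
}
```

The trade-off stated in the commit message shows up in the size of `stack_buf`: a larger threshold keeps more calls off the shared allocator, but every call then reserves that much stack space.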
Werner Saar
587e16fba3
Ref #458: Backport, sandybridge uses nehalem zgemm kernel
2014-12-22 17:01:18 +01:00
Werner Saar
6261342de3
small optimization on dgemm_kernel for N=1
2014-12-18 20:35:51 +01:00
Werner Saar
bc5fff7085
changed inline assembler labels to short form
2014-12-07 12:38:54 +01:00
Zhang Xianyi
0cf29ba6d2
Fixed a bug in the sgemm sandy bridge kernel.
...
Reported by Julia project. JuliaLang/julia#9084
2014-12-03 17:38:41 +08:00
Zhang Xianyi
2fb02626da
Update organization info.
2014-11-25 15:28:58 +08:00
Zhang Xianyi
a85c2785ae
Refs #467. Added generic kernel file for x86_64.
2014-11-24 15:34:48 +08:00
wernsaar
b7c9566eea
removed obsolete gemv kernel files
2014-09-14 11:00:53 +02:00
wernsaar
6df1b0be81
optimized zgemv_n_microk_sandy-4.c
2014-09-14 10:21:22 +02:00
wernsaar
2ac1e076c1
added optimized zgemv_n kernel for sandybridge
2014-09-14 09:02:05 +02:00
wernsaar
9908b6031c
bugfix in KERNEL.PILEDRIVER
2014-09-13 16:26:53 +02:00
wernsaar
8f100a14f2
optimized cgemv_t kernel for haswell
2014-09-13 16:13:27 +02:00
wernsaar
53b5726b04
added optimized cgemv_t kernel for haswell
2014-09-13 15:14:12 +02:00
wernsaar
1a352b24e6
updated KERNEL.HASWELL
2014-09-13 12:23:27 +02:00
wernsaar
5194818d4b
updated zgemv_t_4.c
2014-09-13 09:48:34 +02:00
wernsaar
8a39cdb1c1
added optimized zgemv_t kernel for haswell
2014-09-13 09:47:07 +02:00
wernsaar
0a1390f2d8
enabled optimized zgemv_t kernel for bulldozer
2014-09-12 17:43:47 +02:00
wernsaar
a8b0812feb
optimized zgemv_t for bulldozer
2014-09-12 17:42:25 +02:00
wernsaar
a0fb68ab42
added optimized zgemv_t kernel for bulldozer
2014-09-12 17:04:22 +02:00
wernsaar
44c11165d5
bugfix in cgemv_t_4.c
2014-09-12 14:12:24 +02:00
wernsaar
564be4eb72
added optimized cgemv_t kernel
2014-09-12 13:38:01 +02:00
wernsaar
107c3ea7d5
added optimized zgemv_t routine
2014-09-12 12:35:20 +02:00
wernsaar
bb8d698335
optimized zgemv_n_microk_haswell-4.c for small size
2014-09-11 13:44:55 +02:00
wernsaar
e0192a6914
bugfix in zgemv_n_4.c
2014-09-11 13:18:00 +02:00
wernsaar
bced4594bb
added optimized zgemv_n kernel
2014-09-11 12:34:57 +02:00
wernsaar
cafba99b6b
bugfix in cgemv_n_microk_haswell-4.c
2014-09-11 11:12:44 +02:00
wernsaar
ac8f232b2a
more optimizations
2014-09-11 10:25:48 +02:00
wernsaar
f98e1244c4
optimized cgemv_n_4.c
2014-09-10 19:26:14 +02:00
wernsaar
be95700b30
added optimized cgemv_kernel for haswell
2014-09-10 14:11:24 +02:00
wernsaar
4aa534ae93
added cgemv_n kernel, optimized for small sizes
2014-09-10 13:45:13 +02:00
wernsaar
baa46e4fba
added and tested optimized dgemv_n kernel for haswell
2014-09-09 16:17:45 +02:00
wernsaar
faab7a181d
added optimized dgemv_n kernel for haswell
2014-09-09 15:32:32 +02:00
wernsaar
8109d8232c
optimized dgemv_t kernel for haswell
2014-09-09 14:38:08 +02:00
wernsaar
debc6d1a05
bugfix in KERNEL.HASWELL
2014-09-09 14:04:44 +02:00
wernsaar
e73a0113ec
added optimized gemv kernels
2014-09-09 13:54:55 +02:00
wernsaar
44f2bf9bae
added optimized dgemv_t kernel for haswell
2014-09-09 13:34:22 +02:00
wernsaar
cd34e9701b
removed obsolete files
2014-09-08 19:15:31 +02:00
wernsaar
658939faaa
optimized dgemv_n kernel for small sizes
2014-09-08 15:22:35 +02:00
wernsaar
c4d9d4e5f8
added haswell optimized kernel
2014-09-08 12:25:16 +02:00
wernsaar
7c0a94ff47
bugfix in sgemv_n_microk_haswell-4.c
2014-09-08 10:54:33 +02:00
wernsaar
cbbc80aad3
added optimized sgemv_t kernel for haswell
2014-09-08 10:13:39 +02:00
wernsaar
2be5c7a640
bugfix for windows
2014-09-07 21:48:42 +02:00
wernsaar
80f7786875
enabled optimized sgemv kernels for piledriver
2014-09-07 21:13:57 +02:00
wernsaar
553e275407
optimized sgemv_n kernel for sandybridge
2014-09-07 20:53:30 +02:00
wernsaar
7b3932b3f3
optimized sgemv_n kernel for nehalem
2014-09-07 19:20:08 +02:00
wernsaar
75207b1148
optimized sgemv_n for very small size of m
2014-09-07 18:23:48 +02:00
wernsaar
274828fa50
optimizations for very small sizes
2014-09-07 13:45:03 +02:00
wernsaar
5ae1731fe6
better optimizations for sgemv_t kernel
2014-09-06 21:28:57 +02:00
wernsaar
c8eaf3ae2d
optimized sgemv_t_4 kernel for very small sizes
2014-09-06 19:41:57 +02:00
wernsaar
3a7ab47ee9
optimized sgemv_t
2014-09-06 18:34:25 +02:00
wernsaar
cf5544b417
optimization for small size
2014-09-06 13:17:56 +02:00
wernsaar
d143f84dd2
added optimized sgemv_n kernel for haswell
2014-09-06 12:08:48 +02:00
wernsaar
a64fe9bcc9
added optimized sgemv_n kernel for sandybridge
2014-09-06 08:41:53 +02:00
wernsaar
6df7a88930
optimized sgemv_t for sandybridge
2014-09-05 10:22:50 +02:00
wernsaar
53de943690
bugfix for sgemv_n_4.c
2014-09-04 18:55:52 +02:00
wernsaar
7f910010a0
optimized sgemv_n kernel for small sizes
2014-09-04 13:09:27 +02:00
wernsaar
3a5d8dbff9
optimized sgemv_n_4.c
2014-09-03 15:34:30 +02:00
wernsaar
2a60c6d4b0
optimized sgemv_n for small sizes
2014-09-03 14:48:45 +02:00
wernsaar
0fc560ba23
bugfix for buffer overflow
2014-09-03 10:13:47 +02:00
wernsaar
f3b50dcf5b
removed obsolete instructions from sgemv_t_4.c
2014-09-02 13:35:41 +02:00
wernsaar
93eaba959d
optimized sgemv_t for bulldozer
2014-09-02 12:42:36 +02:00
wernsaar
9570e56965
optimized sgemv_t_4.c for small sizes
2014-09-01 15:11:37 +02:00
wernsaar
bc99faef1b
optimized sgemv_t_4.c for uneven sizes
2014-08-31 14:33:15 +02:00
wernsaar
848c0f16f7
optimized sgemv_t_4.c for small size
2014-08-31 13:23:44 +02:00
wernsaar
53e6dbf6ca
optimized sgemv_t kernel for small sizes
2014-08-30 13:36:27 +02:00
wernsaar
20cd850125
modification for clang compiler
2014-08-27 09:00:20 +02:00
wernsaar
3885eebdb8
added optimized zaxpy bulldozer kernel
2014-08-25 15:52:35 +02:00
wernsaar
ee74445155
added optimized caxpy kernel for bulldozer
2014-08-25 14:53:28 +02:00
wernsaar
9d2ace8bac
added optimized daxpy kernel for bulldozer
2014-08-24 10:57:12 +02:00
wernsaar
b55f997302
added optimized daxpy kernel for nehalem
2014-08-23 17:53:07 +02:00
wernsaar
e45c960c2c
added optimized saxpy kernel for nehalem
2014-08-23 17:15:21 +02:00
wernsaar
ac76b6267f
added optimized dgemv_n kernel for nehalem
2014-08-23 10:40:57 +02:00
wernsaar
f1b96c4846
added optimized ddot kernel for bulldozer
2014-08-22 21:19:29 +02:00
wernsaar
16d6be852d
added optimized ddot kernel for nehalem
2014-08-22 20:34:41 +02:00
wernsaar
95a707ced3
update of KERNEL.BULLDOZER
2014-08-22 17:01:27 +02:00
wernsaar
5d97b0754c
added optimized sdot kernel for nehalem
2014-08-22 17:00:26 +02:00
wernsaar
8a9e868919
added optimized sdot for bulldozer
2014-08-22 14:29:17 +02:00
wernsaar
c8b0645266
added optimized symv_L kernels for nehalem
2014-08-21 14:27:00 +02:00
wernsaar
ec05ff3f64
added optimized ssymv_L kernel for bulldozer
2014-08-21 13:32:06 +02:00