wernsaar
53f1277b6b
modified benchmark/gemv.c
2014-08-31 15:38:18 +02:00
wernsaar
bc99faef1b
optimized sgemv_t_4.c for uneven sizes
2014-08-31 14:33:15 +02:00
wernsaar
848c0f16f7
optimized sgemv_t_4.c for small size
2014-08-31 13:23:44 +02:00
wernsaar
e2fc8c8c2c
changed 1 test value (bug in lapack-testing?)
2014-08-30 13:58:02 +02:00
wernsaar
53e6dbf6ca
optimized sgemv_t kernel for small sizes
2014-08-30 13:36:27 +02:00
Zhang Xianyi
868f8a8756
Merge pull request #443 from idunham/fix
...
Workaround PIC limitations in cpuid.
2014-08-29 13:31:06 +08:00
Isaac Dunham
db7e6366cd
Workaround PIC limitations in cpuid.
...
cpuid uses register ebx, but ebx is reserved in PIC.
So save ebx, swap ebx & edi, and return edi.
Copied from Igor Pavlov's equivalent fix for 7zip (in CpuArch.c),
which is public domain and thus OK license-wise.
2014-08-28 13:05:07 -07:00
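A minimal sketch of the workaround described in the commit message above, not the committed code itself. It assumes GCC- or Clang-style inline assembly; the function name `cpuid_pic_sketch` and its return convention are hypothetical. On 32-bit x86 built as PIC, `ebx` is saved into `edi` before `cpuid`, then the two registers are swapped so that `ebx` is restored and the result is read from `edi`.

```c
#include <stdint.h>

/* Hypothetical helper: runs cpuid for `leaf` (sub-leaf 0).
 * Returns 1 if cpuid was executed, 0 on non-x86 targets. */
static int cpuid_pic_sketch(uint32_t leaf, uint32_t *eax, uint32_t *ebx,
                            uint32_t *ecx, uint32_t *edx)
{
#if defined(__i386__) && defined(__PIC__)
    /* ebx holds the GOT pointer in 32-bit PIC code, so it must not be
     * listed as an output. Save it in edi, run cpuid, then swap:
     * ebx gets its old value back and edi carries cpuid's ebx result. */
    __asm__ __volatile__(
        "movl %%ebx, %%edi\n\t"
        "cpuid\n\t"
        "xchgl %%ebx, %%edi\n\t"
        : "=a"(*eax), "=D"(*ebx), "=c"(*ecx), "=d"(*edx)
        : "0"(leaf), "2"(0u)
        : "cc");
    return 1;
#elif defined(__i386__) || defined(__x86_64__)
    /* ebx is not reserved here, so it can be used as a plain output. */
    __asm__ __volatile__(
        "cpuid"
        : "=a"(*eax), "=b"(*ebx), "=c"(*ecx), "=d"(*edx)
        : "0"(leaf), "2"(0u)
        : "cc");
    return 1;
#else
    /* Not an x86 target: no cpuid instruction, report zeros. */
    (void)leaf;
    *eax = *ebx = *ecx = *edx = 0;
    return 0;
#endif
}
```

Calling `cpuid_pic_sketch(0, &a, &b, &c, &d)` on an x86 machine leaves the highest supported basic leaf in `a`; the PIC branch is only compiled for 32-bit builds with `-fPIC` or `-fpic`.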
Zhang Xianyi
2702323f7d
Merge pull request #440 from wernsaar/develop
...
optimizations for level1 and level2 blas functions
2014-08-28 12:43:54 +08:00
wernsaar
20cd850125
modification for clang compiler
2014-08-27 09:00:20 +02:00
wernsaar
5fa6158731
removed flag no-integrated-as, because not working on macosx
2014-08-26 18:29:40 +02:00
wernsaar
84badf8086
EXPERIMENTAL: added the flag -no-integrated-as for clang compiler in Makefile.system
2014-08-26 17:36:32 +02:00
Zhang Xianyi
c8cc4a0d22
Fixed the typo in Changelog.txt
2014-08-26 16:14:34 +08:00
wernsaar
3885eebdb8
added optimized zaxpy bulldozer kernel
2014-08-25 15:52:35 +02:00
wernsaar
ee74445155
added optimized caxpy kernel for bulldozer
2014-08-25 14:53:28 +02:00
wernsaar
9d2ace8bac
added optimized daxpy kernel for bulldozer
2014-08-24 10:57:12 +02:00
wernsaar
b55f997302
added optimized daxpy kernel for nehalem
2014-08-23 17:53:07 +02:00
wernsaar
29125864b3
updated gemm.c
2014-08-23 17:28:01 +02:00
wernsaar
e45c960c2c
added optimized saxpy kernel for nehalem
2014-08-23 17:15:21 +02:00
wernsaar
55e81da379
added axpy benchmark-test
2014-08-23 13:12:44 +02:00
wernsaar
ac76b6267f
added optimized dgemv_n kernel for nehalem
2014-08-23 10:40:57 +02:00
wernsaar
f1b96c4846
added optimized ddot kernel for bulldozer
2014-08-22 21:19:29 +02:00
wernsaar
16d6be852d
added optimized ddot kernel for nehalem
2014-08-22 20:34:41 +02:00
wernsaar
53ec5789e2
bugfix for Makefile
2014-08-22 17:02:55 +02:00
wernsaar
95a707ced3
update of KERNEL.BULLDOZER
2014-08-22 17:01:27 +02:00
wernsaar
5d97b0754c
added optimized sdot kernel for nehalem
2014-08-22 17:00:26 +02:00
wernsaar
8a9e868919
added optimized sdot for bulldozer
2014-08-22 14:29:17 +02:00
wernsaar
7e404de3de
bugfix in Makefile
2014-08-22 11:51:30 +02:00
wernsaar
e4472ad850
added sdot and ddot benchmarks
2014-08-22 11:42:07 +02:00
wernsaar
fb0b4552a5
added hemv benchmark
2014-08-22 10:00:09 +02:00
wernsaar
6f73ffc114
added benchmarks for csymv and zsymv
2014-08-21 19:33:57 +02:00
wernsaar
c8b0645266
added optimized symv_L kernels for nehalem
2014-08-21 14:27:00 +02:00
wernsaar
ec05ff3f64
added optimized ssymv_L kernel for bulldozer
2014-08-21 13:32:06 +02:00
wernsaar
f6f9122660
added optimized dsymv_L kernel for bulldozer
2014-08-21 13:02:53 +02:00
wernsaar
8247f38dc1
added optimized dsymv_U kernel for nehalem
2014-08-20 09:58:04 +02:00
wernsaar
ef6374196d
updated optimized dsymv_U kernel for bulldozer
2014-08-20 09:00:56 +02:00
wernsaar
f824c2b751
updated optimized ssymv_U for bulldozer
2014-08-19 19:25:03 +02:00
wernsaar
4ba4ab623f
added optimized ssymv_U kernel for nehalem
2014-08-19 17:09:45 +02:00
wernsaar
4f39447c05
added optimized ssymv_U kernel for bulldozer
2014-08-18 13:52:24 +02:00
wernsaar
74c9465672
added optimized dsymv_U kernel for bulldozer
2014-08-18 12:18:10 +02:00
Zhang Xianyi
a7126c2ce4
Merge branch 'develop'
2014-08-18 11:16:14 +08:00
Zhang Xianyi
a69dd3fbc5
OpenBLAS 0.2.11 version.
2014-08-18 11:15:42 +08:00
wernsaar
101dd08173
add reference in C for symv_U
2014-08-16 13:52:50 +02:00
wernsaar
493d4fe7e5
added reference in C for symv_L
2014-08-16 11:36:48 +02:00
wernsaar
0a22816e70
Ref #433 : removed obsolete lapack entries from common_interface.h
2014-08-15 12:40:10 +02:00
Zhang Xianyi
c3cd6e7e32
Merge pull request #434 from wernsaar/develop
...
A lot of performance enhancements
2014-08-15 08:07:27 +08:00
wernsaar
11eab4c019
added optimized cgemv_n for haswell
2014-08-14 19:00:30 +02:00
wernsaar
4568d32b6b
added optimized cgemv_t kernel for haswell
2014-08-14 14:10:29 +02:00
wernsaar
c1a6374c6f
optimized zgemv_n kernel for sandybridge
2014-08-13 16:10:03 +02:00
wernsaar
dc05937313
added additional test values
2014-08-13 14:54:50 +02:00
wernsaar
2470129132
added fast return, if m or n < 1
2014-08-13 13:54:19 +02:00