Commit Graph

1318 Commits

Author SHA1 Message Date
wernsaar 7f910010a0 optimized sgemv_n kernel for small sizes 2014-09-04 13:09:27 +02:00
wernsaar 3a5d8dbff9 optimized sgemv_n_4.c 2014-09-03 15:34:30 +02:00
wernsaar 2a60c6d4b0 optimized sgemv_n for small sizes 2014-09-03 14:48:45 +02:00
wernsaar 0fc560ba23 bugfix for buffer overflow 2014-09-03 10:13:47 +02:00
wernsaar d1800397f5 optimized interface/gemv.c for multithreading 2014-09-02 17:36:07 +02:00
wernsaar f4ff889491 updated interface/gemv.c for multithreading 2014-09-02 16:30:04 +02:00
wernsaar 210bec9111 added plot-header to compare multithreading 2014-09-02 14:11:42 +02:00
wernsaar f3b50dcf5b removed obsolete instructions from sgemv_t_4.c 2014-09-02 13:35:41 +02:00
wernsaar 93eaba959d optimized sgemv_t for bulldozer 2014-09-02 12:42:36 +02:00
wernsaar 9570e56965 optimized sgemv_t_4.c for small sizes 2014-09-01 15:11:37 +02:00
wernsaar d7f91f8b4f extended gemv.c benchmark 2014-09-01 15:07:36 +02:00
wernsaar 53f1277b6b modified benchmark/gemv.c 2014-08-31 15:38:18 +02:00
wernsaar bc99faef1b optimized sgemv_t_4.c for uneven sizes 2014-08-31 14:33:15 +02:00
wernsaar 848c0f16f7 optimized sgemv_t_4.c for small size 2014-08-31 13:23:44 +02:00
wernsaar e2fc8c8c2c changed 1 test value (bug in lapack-testing?) 2014-08-30 13:58:02 +02:00
wernsaar 53e6dbf6ca optimized sgemv_t kernel for small sizes 2014-08-30 13:36:27 +02:00
Zhang Xianyi 868f8a8756 Merge pull request #443 from idunham/fix
Workaround PIC limitations in cpuid.
2014-08-29 13:31:06 +08:00
Isaac Dunham db7e6366cd Workaround PIC limitations in cpuid.
cpuid uses register ebx, but ebx is reserved in PIC.
So save ebx, swap ebx & edi, and return edi.

Copied from Igor Pavlov's equivalent fix for 7zip (in CpuArch.c),
which is public domain and thus OK license-wise.
2014-08-28 13:05:07 -07:00
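
The workaround described in the commit above can be sketched as follows. This is a minimal illustration for GCC or Clang, not the committed OpenBLAS code: the names `cpuid_pic_safe` and `cpu_vendor` are hypothetical, and the x86-64 branch and the non-x86 fallback are assumptions added so the sketch builds everywhere. The idea is that the compiler is never told EBX is written: the register is parked in EDI before `cpuid` runs, swapped back afterwards, and the EBX result is read from EDI.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: execute CPUID for `leaf` (sub-leaf 0) while leaving
 * EBX/RBX intact from the compiler's point of view.
 * Returns 1 on x86 with GCC/Clang, 0 (outputs zeroed) anywhere else. */
static int cpuid_pic_safe(uint32_t leaf, uint32_t *eax, uint32_t *ebx,
                          uint32_t *ecx, uint32_t *edx)
{
    uint32_t a = 0, b = 0, c = 0, d = 0;
    int supported = 0;

    assert(eax && ebx && ecx && edx);

#if defined(__GNUC__) && defined(__i386__)
    __asm__ __volatile__(
        "movl %%ebx, %%edi\n\t"   /* save EBX (the PIC register) in EDI   */
        "cpuid\n\t"               /* overwrites EAX, EBX, ECX and EDX     */
        "xchgl %%ebx, %%edi\n\t"  /* restore EBX; CPUID's EBX is now EDI  */
        : "=a"(a), "=D"(b), "=c"(c), "=d"(d)
        : "0"(leaf), "2"(0u)
        : "cc");
    supported = 1;
#elif defined(__GNUC__) && defined(__x86_64__)
    /* Assumption: same trick with 64-bit registers, so the upper half of
     * RBX also survives. */
    __asm__ __volatile__(
        "movq %%rbx, %%rdi\n\t"
        "cpuid\n\t"
        "xchgq %%rbx, %%rdi\n\t"
        : "=a"(a), "=D"(b), "=c"(c), "=d"(d)
        : "0"(leaf), "2"(0u)
        : "cc");
    supported = 1;
#else
    (void)leaf;                   /* no CPUID instruction on this target */
#endif

    *eax = a; *ebx = b; *ecx = c; *edx = d;
    return supported;
}

/* Hypothetical helper: leaf 0 returns the vendor string in EBX, EDX, ECX
 * (in that order). `out` must hold at least 13 bytes. */
static int cpu_vendor(char out[13])
{
    uint32_t a, b, c, d;

    assert(out != NULL);
    memset(out, 0, 13);
    if (!cpuid_pic_safe(0, &a, &b, &c, &d))
        return 0;
    memcpy(out, &b, 4);
    memcpy(out + 4, &d, 4);
    memcpy(out + 8, &c, 4);
    return 1;
}
```

Calling `cpu_vendor(buf)` with a 13-byte buffer fills in the vendor string on x86 and leaves the buffer empty elsewhere. Routing the EBX result through EDI costs one extra register, but it keeps the inline assembly valid when the compiler has reserved EBX for position-independent code.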
Zhang Xianyi 2702323f7d Merge pull request #440 from wernsaar/develop
optimizations for level1 and level2 blas functions
2014-08-28 12:43:54 +08:00
wernsaar 20cd850125 modification for clang compiler 2014-08-27 09:00:20 +02:00
wernsaar 5fa6158731 removed flag no-integrated-as, because not working on macosx 2014-08-26 18:29:40 +02:00
wernsaar 84badf8086 EXPERIMENTAL: added the flag -no-integrated-as for clang compiler in Makefile.system 2014-08-26 17:36:32 +02:00
Zhang Xianyi c8cc4a0d22 Fixed the typo in Changelog.txt 2014-08-26 16:14:34 +08:00
wernsaar 3885eebdb8 added optimized zaxpy bulldozer kernel 2014-08-25 15:52:35 +02:00
wernsaar ee74445155 added optimized caxpy kernel for bulldozer 2014-08-25 14:53:28 +02:00
wernsaar 9d2ace8bac added optimized daxpy kernel for bulldozer 2014-08-24 10:57:12 +02:00
wernsaar b55f997302 added optimized daxpy kernel for nehalem 2014-08-23 17:53:07 +02:00
wernsaar 29125864b3 updated gemm.c 2014-08-23 17:28:01 +02:00
wernsaar e45c960c2c added optimized saxpy kernel for nehalem 2014-08-23 17:15:21 +02:00
wernsaar 55e81da379 added axpy benchmark-test 2014-08-23 13:12:44 +02:00
wernsaar ac76b6267f added optimized dgemv_n kernel for nehalem 2014-08-23 10:40:57 +02:00
wernsaar f1b96c4846 added optimized ddot kernel for bulldozer 2014-08-22 21:19:29 +02:00
wernsaar 16d6be852d added optimized ddot kernel for nehalem 2014-08-22 20:34:41 +02:00
wernsaar 53ec5789e2 bugfix for Makefile 2014-08-22 17:02:55 +02:00
wernsaar 95a707ced3 update of KERNEL.BULLDOZER 2014-08-22 17:01:27 +02:00
wernsaar 5d97b0754c added optimized sdot kernel for nehalem 2014-08-22 17:00:26 +02:00
wernsaar 8a9e868919 added optimized sdot for bulldozer 2014-08-22 14:29:17 +02:00
wernsaar 7e404de3de bugfix in Makefile 2014-08-22 11:51:30 +02:00
wernsaar e4472ad850 added sdot and ddot benchmarks 2014-08-22 11:42:07 +02:00
wernsaar fb0b4552a5 added hemv benchmark 2014-08-22 10:00:09 +02:00
wernsaar 6f73ffc114 added benchmarks for csymv and zsymv 2014-08-21 19:33:57 +02:00
wernsaar c8b0645266 added optimized symv_L kernels for nehalem 2014-08-21 14:27:00 +02:00
wernsaar ec05ff3f64 added optimized ssymv_L kernel for bulldozer 2014-08-21 13:32:06 +02:00
wernsaar f6f9122660 added optimized dsymv_L kernel for bulldozer 2014-08-21 13:02:53 +02:00
wernsaar 8247f38dc1 added optimized dsymv_U kernel for nehalem 2014-08-20 09:58:04 +02:00
wernsaar ef6374196d updated optimized dsymv_U kernel for bulldozer 2014-08-20 09:00:56 +02:00
wernsaar f824c2b751 updated optimized ssymv_U for bulldozer 2014-08-19 19:25:03 +02:00
wernsaar 4ba4ab623f added optimized ssymv_U kernel for nehalem 2014-08-19 17:09:45 +02:00
wernsaar 4f39447c05 added optimized ssymv_U kernel for bulldozer 2014-08-18 13:52:24 +02:00
wernsaar 74c9465672 added optimized dsymv_U kernel for bulldozer 2014-08-18 12:18:10 +02:00