Commit Graph

1451 Commits

Author SHA1 Message Date
Isaac Dunham db7e6366cd Workaround PIC limitations in cpuid.
cpuid uses register ebx, but ebx is reserved in PIC.
So save ebx, swap ebx & edi, and return edi.

Copied from Igor Pavlov's equivalent fix for 7zip (in CpuArch.c),
which is public domain and thus OK license-wise.
2014-08-28 13:05:07 -07:00
Zhang Xianyi 2702323f7d Merge pull request #440 from wernsaar/develop
optimizations for leve1 and level2 blas functions
2014-08-28 12:43:54 +08:00
wernsaar 20cd850125 modification for clang compiler 2014-08-27 09:00:20 +02:00
wernsaar 5fa6158731 renoved flag no-integrated-as, because not working on macosx 2014-08-26 18:29:40 +02:00
wernsaar 84badf8086 EXPERIMENTAL: added the flag -no-integrated-as for clang compiler in Makefile.system 2014-08-26 17:36:32 +02:00
Zhang Xianyi c8cc4a0d22 Fixed the typo in Changelog.txt 2014-08-26 16:14:34 +08:00
wernsaar 3885eebdb8 added optimized zaxpy bulldozer kernel 2014-08-25 15:52:35 +02:00
wernsaar ee74445155 added optimized caxpy kernel for bulldozer 2014-08-25 14:53:28 +02:00
wernsaar 9d2ace8bac added optimized daxpy kernel for bulldozer 2014-08-24 10:57:12 +02:00
wernsaar b55f997302 added optimized daxpy kernel for nehalem 2014-08-23 17:53:07 +02:00
wernsaar 29125864b3 updated gemm.c 2014-08-23 17:28:01 +02:00
wernsaar e45c960c2c added optimized saxpy kernel for nehalem 2014-08-23 17:15:21 +02:00
wernsaar 55e81da379 added axpy benchmark-test 2014-08-23 13:12:44 +02:00
wernsaar ac76b6267f added optimized dgemv_n kernel for nehalem 2014-08-23 10:40:57 +02:00
wernsaar f1b96c4846 added optimized ddot kernel for bulldozer 2014-08-22 21:19:29 +02:00
wernsaar 16d6be852d added optimized ddot kernel for nehalem 2014-08-22 20:34:41 +02:00
wernsaar 53ec5789e2 bugfix for Makefile 2014-08-22 17:02:55 +02:00
wernsaar 95a707ced3 update of KERNEL.BULLDOZER 2014-08-22 17:01:27 +02:00
wernsaar 5d97b0754c added optimized sdot kernel for nehalem 2014-08-22 17:00:26 +02:00
wernsaar 8a9e868919 added optimized sdot for bulldozer 2014-08-22 14:29:17 +02:00
wernsaar 7e404de3de bugfix in Makefile 2014-08-22 11:51:30 +02:00
wernsaar e4472ad850 added sdot and ddot benchmarks 2014-08-22 11:42:07 +02:00
wernsaar fb0b4552a5 added hemv benchmark 2014-08-22 10:00:09 +02:00
wernsaar 6f73ffc114 added benchmarks for csymv and zsymv 2014-08-21 19:33:57 +02:00
wernsaar c8b0645266 added optimized symv_L kernels for nehalem 2014-08-21 14:27:00 +02:00
wernsaar ec05ff3f64 added optimized ssymv_L kernel for bulldozer 2014-08-21 13:32:06 +02:00
wernsaar f6f9122660 added optimized dsymv_L kernel for bulldozer 2014-08-21 13:02:53 +02:00
wernsaar 8247f38dc1 added optimized dsymv_U kernel for nehalem 2014-08-20 09:58:04 +02:00
wernsaar ef6374196d updated optimized dsymv_U kernel for bulldozer 2014-08-20 09:00:56 +02:00
wernsaar f824c2b751 updated optimized ssymv_U for bulldozer 2014-08-19 19:25:03 +02:00
wernsaar 4ba4ab623f added optimized ssymv_U kernel for nehalem 2014-08-19 17:09:45 +02:00
wernsaar 4f39447c05 added optimized ssymv_U kernel for bulldozer 2014-08-18 13:52:24 +02:00
wernsaar 74c9465672 added optimized dsymv_U kernel for bulldozer 2014-08-18 12:18:10 +02:00
Zhang Xianyi a69dd3fbc5 OpenBLAS 0.2.11 version. 2014-08-18 11:15:42 +08:00
wernsaar 101dd08173 add reference in C for symv_U 2014-08-16 13:52:50 +02:00
wernsaar 493d4fe7e5 added reference in C for symv_L 2014-08-16 11:36:48 +02:00
wernsaar 0a22816e70 Ref #433: removed obsolete lapack entries from common_interface.h 2014-08-15 12:40:10 +02:00
Zhang Xianyi c3cd6e7e32 Merge pull request #434 from wernsaar/develop
A lot of performance enhancements
2014-08-15 08:07:27 +08:00
wernsaar 11eab4c019 added optimized cgemv_n for haswell 2014-08-14 19:00:30 +02:00
wernsaar 4568d32b6b added optimized cgemv_t kernel for haswell 2014-08-14 14:10:29 +02:00
wernsaar c1a6374c6f optimized zgemv_n kernel for sandybridge 2014-08-13 16:10:03 +02:00
wernsaar dc05937313 added additional test values 2014-08-13 14:54:50 +02:00
wernsaar 2470129132 added fast return, if m or n < 1 2014-08-13 13:54:19 +02:00
wernsaar 8c582d362d optimized zgemv_t_microk_haswell-2.c 2014-08-13 13:42:22 +02:00
wernsaar 11e34ddd1b bugfix for zgemv_n_microk_haswell-2.c 2014-08-13 12:54:18 +02:00
wernsaar 9528f0d9ee bugfix in zgemv_n_microk_sandy-2.c 2014-08-13 12:18:03 +02:00
wernsaar b06550519e added optimized cgemv_t c-kernel 2014-08-12 12:15:41 +02:00
wernsaar 6093ee5363 bugfix in zgemv_n_microk_haswell-2.c 2014-08-12 10:02:25 +02:00
wernsaar 07c66b1960 modified algorithm for better numerical stability 2014-08-12 08:35:42 +02:00
wernsaar 58b075daef added optimized zgemv_t kernel for haswell 2014-08-11 16:57:52 +02:00