Compare commits

..

328 Commits

Author SHA1 Message Date
Zhang Xianyi
034ffa93fa Provide iaxpy and cblas_iaxpy for integer vectors. make INTEGER_PRECISION=1 2015-07-01 03:11:27 +08:00
Zhang Xianyi
3f1b57668e Fix blas lock bug on AArch64. 2015-06-26 11:54:41 +08:00
Zhang Xianyi
d8f18d32c3 Merge pull request #595 from tanderson92/fixTests
Fix test execution when USE_OPENMP=0
2015-06-22 21:54:51 -05:00
wernsaar
bdb5c842fc Merge pull request #596 from wernsaar/develop
optimizations for haswell
2015-06-13 16:44:48 +02:00
Werner Saar
e7c969e164 added optimized dtrmm_kernel for haswell 2015-06-13 16:16:29 +02:00
Werner Saar
9bd962f655 modified haswell parameter dgemm_unroll_n 2015-06-13 10:28:27 +02:00
Thomas Anderson
4f5691e5c0 Fix test execution when USE_OPENMP=0
The standard way to disable OpenMP support is to set USE_OPENMP=0,
as indicated by other checks to see if USE_OPENMP equals 1. The
problem is obviously then that `ifdef USE_OPENMP` is very much not
what we want to test for. This causes tests to fail when no OpenMP
library is installed.
2015-06-12 23:52:07 -07:00
Zhang Xianyi
29293160a4 Fix #593. Change MACOSX_DEPLOYMENT_TARGET to 10.6. 2015-06-08 10:53:50 -05:00
wernsaar
3e33afef2e Merge pull request #592 from wernsaar/develop
added benchmark scripts
2015-06-08 14:22:02 +02:00
Werner Saar
8614057ea9 added benchmark scripts for numpy, octave and R 2015-06-08 14:06:38 +02:00
Werner Saar
7f375f9e8f updated geev benchmark 2015-06-08 12:58:38 +02:00
wernsaar
69c5169e7d Merge pull request #589 from wernsaar/develop
small modification of gemm.c
2015-06-03 12:14:09 +02:00
Werner Saar
e19948baa1 small modification of gemm.c 2015-06-03 09:11:51 +02:00
wernsaar
a2eaf234fc Merge pull request #587 from wernsaar/develop
added gesv benchmark
2015-06-02 15:29:49 +02:00
Werner Saar
6a13a94e71 added gesv benchmark 2015-06-02 13:35:49 +02:00
wernsaar
eff43d3289 Merge pull request #585 from wernsaar/develop
bugfix for benchmark Makefile on MAC
2015-05-31 15:01:54 +02:00
Werner Saar
9c4817d07b bugfix for Makefile on mac 2015-05-31 14:16:51 +02:00
wernsaar
319f3a0451 Merge pull request #584 from wernsaar/develop
bugfixes, to build benchmarks with mingw on Windows OS
2015-05-29 13:27:20 +02:00
Werner Saar
02c7766f68 bugfixes, to build benchmarks with mingw on Windows OS 2015-05-29 12:56:22 +02:00
wernsaar
f38cb67ca8 Merge pull request #581 from wernsaar/develop
bugfix for arm locking
2015-05-23 12:58:15 +02:00
Werner Saar
eea2e30b74 bugfix for arm locking 2015-05-23 11:40:40 +02:00
Werner Saar
19b8fd2aed smp lock bugfix 2015-05-23 10:58:38 +02:00
wernsaar
0cc5212741 Merge pull request #580 from wernsaar/develop
added blas level1 swap  benchmark
2015-05-23 09:46:39 +02:00
Werner Saar
c47c8e8cf5 added blas level1 swap benchmark 2015-05-21 08:51:42 +02:00
Zhang Xianyi
a11555c715 Support Android NDK armeabi-v7a-hard ABI. (-mfloat-abi=hard)
e.g.
make HOSTCC=gcc CC=arm-linux-androideabi-gcc NO_LAPACK=1 TARGET=ARMV7

In Android NDK, it uses armeabi-v7a-hard ABI.
TARGET_CFLAGS += -mhard-float -D_NDK_MATH_NO_SOFTFP=1
TARGET_LDFLAGS += -Wl,--no-warn-mismatch -lm_hard
For more information, please check hard-float example at
android_ndk/tests/device/hard-float/jni/.
2015-05-20 21:57:27 -05:00
wernsaar
897d03518e Merge pull request #578 from wernsaar/develop
added blas level1 copy benchmark
2015-05-20 11:56:02 +02:00
Werner Saar
23fbc5728e added blas level1 copy benchmark 2015-05-20 11:05:00 +02:00
Zhang Xianyi
6d40fa587f Fix f_check bug. 2015-05-19 12:04:45 -05:00
wernsaar
22dcd79959 Merge pull request #577 from wernsaar/develop
Bugfix for armv6 memory barrier
2015-05-19 10:59:24 +02:00
Werner Saar
ea4df0aad3 Ref #574: Bugfix for armv6 memory barrier 2015-05-19 10:43:12 +02:00
Zhang Xianyi
e127fb8fd8 1) Refs #575. Remove g77 from compiler list.
2) If OpenBLAS cannot find Fortran compiler, it will only build BLAS
(without LAPACK).
2015-05-19 00:01:04 -05:00
wernsaar
7fb718a7d8 Merge pull request #572 from wernsaar/develop
added optimized cscal and zscal functions for steamroller
2015-05-18 13:47:38 +02:00
Werner Saar
24f58c8bb1 added optimized cscal and zscal kernels for steamroller 2015-05-18 12:40:07 +02:00
Werner Saar
95b1faf667 added optimized cscal and zscal kernels for steamroller and piledriver 2015-05-18 10:50:57 +02:00
Werner Saar
2d9e406050 added optimized cscal kernel for sandybridge 2015-05-18 08:46:06 +02:00
Werner Saar
59083e3ce1 added optimized cscal kernel for bulldozer 2015-05-18 07:33:52 +02:00
wernsaar
685be40339 Merge pull request #571 from wernsaar/develop
added optimized cscal and zscal functions
2015-05-17 14:09:14 +02:00
Werner Saar
31c9e399e9 added optimized cscal kernel for haswell 2015-05-17 13:44:09 +02:00
Werner Saar
7de6bb9889 added optimized zscal kernel for bulldozer 2015-05-17 11:45:19 +02:00
Werner Saar
d63034303b added optimized zscal kernel for haswell 2015-05-16 16:41:45 +02:00
Zhang Xianyi
51ff17d46e Add AMD Excavator target. 2015-05-13 16:16:30 -05:00
wernsaar
905534942a Merge pull request #568 from wernsaar/develop
added optimized dscal kernel
2015-05-13 13:48:08 +02:00
Werner Saar
18e90ee2e3 bugfix: added static to functions 2015-05-13 13:31:26 +02:00
Werner Saar
e00cccc41e added optimized dscal kernel for piledriver 2015-05-13 13:05:35 +02:00
Werner Saar
73f09bf64f optimized dscal kernel for increment != 1 2015-05-13 12:14:39 +02:00
Werner Saar
02e772c7e4 added optimized dscal kernel for haswell 2015-05-12 17:19:58 +02:00
Werner Saar
7aee913991 added optimized dscal kernel for sandybridge 2015-05-12 16:27:43 +02:00
Werner Saar
e50a933037 added optimized dscal kernel for bulldozer 2015-05-12 12:28:44 +02:00
Zhang Xianyi
5f9011d6ef Merge pull request #566 from powderluv/develop
Fix build with ALLOC_SHM=0 (Android NDK)
2015-05-11 20:59:12 -05:00
powderluv
ebb9eba987 Fix build with ALLOC_SHM=0 (Android NDK)
Refactor such that you can build with ALLOC_SHM=0. HughTLB
implicity depends on ALLOC_SHM=1. This patch allows
building for Android NDK r10d.
2015-05-10 00:10:26 -07:00
Zhang Xianyi
8e5a1083bb Refs #532. Improve gemv paralel with small m and large n case.
Splite the matrix and reduction.
2015-05-08 05:33:17 +08:00
Zhang Xianyi
6743beb748 Refs #565. Fix the bug of generate FEXTRALIB. 2015-05-07 13:06:53 +08:00
Zhang Xianyi
bcabf72c08 Refs #565. Merge branch 'andreasnoack-anj/bench' into develop 2015-05-07 12:52:14 +08:00
Andreas Noack
cda29f183b Add vecLib benchmarks 2015-05-06 21:52:34 -04:00
wernsaar
e52d36450a Merge pull request #564 from wernsaar/develop
Use only 1 thread in trsm if m or n < 2*GEMM_MULTITHREAD_THRESHOLD
2015-05-06 11:10:31 +02:00
Werner Saar
f8f2e261fe use only 1 thread if m or n < 2*GEMM_MULTITHREAD_THRESHOLD 2015-05-06 10:41:53 +02:00
Werner Saar
be3c843700 added loops to trsm.c 2015-05-06 09:21:19 +02:00
wernsaar
e6f57db846 Merge pull request #563 from wernsaar/develop
Bugfix for gemm3m tests
2015-05-05 12:13:35 +02:00
Werner Saar
9bfd267d51 bugfix for gemm3m tests 2015-05-05 11:58:59 +02:00
Werner Saar
924bc5372e removed gemm3m functions from normal checks 2015-05-05 11:39:43 +02:00
wernsaar
2b83a69650 Merge pull request #561 from wernsaar/develop
updated dgemv_n sgemv_n kernels
2015-05-04 11:11:13 +02:00
Werner Saar
133c11a156 updated dgemv_n kernel for nehalem 2015-04-30 14:38:06 +02:00
Werner Saar
30f52d53df optimized dgemv_n kernel for haswell 2015-04-30 12:11:39 +02:00
Zhang Xianyi
a124637329 Merge pull request #560 from sebastien-villemot/develop
Fix detection of ARM architectures in c_check.
2015-04-29 11:36:47 -05:00
Sébastien Villemot
642aaba2e0 Fix detection of ARM architectures in c_check.
This is necessary to avoid the false detection of a cross-compiling environment.
2015-04-29 18:14:21 +02:00
wernsaar
4c616173e4 Merge pull request #558 from wernsaar/develop
optimizations for sandybridge
2015-04-28 17:30:16 +02:00
Werner Saar
5e83d80725 optimized dger kernel for sandybridge 2015-04-28 16:58:11 +02:00
Werner Saar
b2e1797dc6 added optimized sger kernel for sandybridge 2015-04-28 15:33:38 +02:00
Werner Saar
e216f686cb optimized saxpy and daxpy for sandybridge 2015-04-28 10:18:32 +02:00
Zhang Xianyi
e42652f772 Merge pull request #554 from wernsaar/develop
added benchmarks for zgeru and cgeru
2015-04-25 08:11:36 -05:00
Werner Saar
e77db2af31 add benchmarks for zgeru and cgeru 2015-04-25 14:53:07 +02:00
Zhang Xianyi
37b00841ac Merge pull request #552 from jeromerobert/develop
gemv: Ensure stack buffer is large enough to handle memory alignment
2015-04-24 14:12:12 -05:00
Werner Saar
fc0e0391f3 bugfixes: replaced int with BLASLONG 2015-04-24 14:30:44 +02:00
wernsaar
da0f27b9ac Merge pull request #553 from wernsaar/develop
optimized some blas level1 kernels for increments != 1
2015-04-24 13:57:48 +02:00
Werner Saar
c22068c406 optimized sdot.c for increments != 1 2015-04-24 13:13:20 +02:00
Werner Saar
dee100d0e4 optimized saxpy.c for increments != 1 2015-04-24 11:52:59 +02:00
Werner Saar
0273966abb optimized daxpy kernel for increments != 1 2015-04-24 11:39:17 +02:00
Werner Saar
3a67daa954 optimized ddot.c for increments != 1 2015-04-24 10:56:55 +02:00
Jerome Robert
ab567d8443 gemv: Ensure stack buffer is large enough to handle memory alignment
Ref #478
2015-04-24 10:12:49 +02:00
wernsaar
3c09cea4b2 Merge pull request #550 from wernsaar/develop
added optimized ssymv kernels for haswell and sandybridge
2015-04-23 13:27:38 +02:00
Werner Saar
b4f2153dcd added optimized ssymv kernels for sandybridge 2015-04-23 12:19:24 +02:00
Werner Saar
1c4b0eeae3 added optimized ssymv kernels for haswell 2015-04-23 10:23:13 +02:00
wernsaar
406d9d64e9 Merge pull request #549 from wernsaar/develop
added optimized dsymv kernels for haswell and sandybridge
2015-04-22 12:36:13 +02:00
Werner Saar
1bec9abb9a added optimized dsymv kernels for sandybridge 2015-04-22 12:09:43 +02:00
Werner Saar
3814bf60d3 added optimized dsymv kernels for haswell 2015-04-22 10:42:50 +02:00
Zhang Xianyi
847e19c04e Refs #478,#482, Enable stack alloc for s/dgemv_t.(revert 9798491) 2015-04-20 23:22:40 -05:00
Werner Saar
46c7b4d5c8 added asum benchmark 2015-04-19 11:24:07 +02:00
Werner Saar
8e05d291b5 added scal benchmark 2015-04-18 08:41:41 +02:00
wernsaar
9da555e5f7 Merge pull request #546 from wernsaar/develop
added optimized zaxpy-kernels
2015-04-16 11:36:51 +02:00
Werner Saar
6d0db0151f added optimized zaxpy-kernels 2015-04-16 11:19:37 +02:00
Zhang Xianyi
37b9033c90 Merge pull request #543 from jeromerobert/develop
Fix a buffer overflow with MAX_STACK_ALLOC size in dgemv_t
2015-04-15 11:18:14 -05:00
wernsaar
59e7a518c6 Merge pull request #544 from wernsaar/develop
Optimized  caxpy-kernels
2015-04-15 17:04:02 +02:00
Werner Saar
13889515b3 added optimized caxpy-kernel for sandybridge 2015-04-15 16:29:25 +02:00
Werner Saar
248c9340c3 added optimized caxpy-kernel for haswell 2015-04-15 15:16:31 +02:00
Werner Saar
e9f33b4ca7 added optimized caxpy-kernel for steamroller 2015-04-15 13:49:23 +02:00
Werner Saar
f5d847122a updated caxpy_microk_bulldozer-2.c and caxpy.c 2015-04-15 11:59:38 +02:00
Jerome Robert
a4c96eca67 Fix a buffer overflow with MAX_STACK_ALLOC size in dgemv_t
Refs #478, #482, 9798481, fd9fd42
2015-04-15 11:46:48 +02:00
wernsaar
fb02cb0a41 Merge pull request #540 from wernsaar/develop
Optimized dot- and axpy-kernels
2015-04-14 15:53:09 +02:00
Werner Saar
baa0363ea2 add optimized ddot-kernel for piledriver 2015-04-14 15:09:13 +02:00
Werner Saar
34ba66606a add optimized daxpy-kernel for piledriver 2015-04-14 14:23:29 +02:00
Werner Saar
f615dc7603 added optimized saxpy kernel for steamroller 2015-04-14 09:09:39 +02:00
Werner Saar
331c417637 optimized saxpy for piledriver 2015-04-14 08:34:11 +02:00
Zhang Xianyi
6c3a0b5d46 Enable MAX_STACK_ALLOC by default. 2015-04-13 23:23:40 -05:00
Zhang Xianyi
fd9fd42936 Refs #478, #482. Fixed bug on previous commit. 2015-04-13 23:22:27 -05:00
Zhang Xianyi
9798481979 Refs #478, #482. Fix segfault bug for gemv_t with MAX_ALLOC_STACK flag.
For gemv_t, directly use malloc to create the buffer.
2015-04-13 19:45:27 -05:00
Werner Saar
d7a17ad85d optimized sdot-kernel for pilediver 2015-04-13 13:19:21 +02:00
Werner Saar
d35f6c63c2 add optimized daxpy-kernel for steamroller 2015-04-13 12:22:43 +02:00
Werner Saar
166d76e864 added optimized sdot-kernel for steamroller 2015-04-11 08:48:18 +02:00
Werner Saar
f9f127d838 added optimized ddot kernel for steamroller 2015-04-10 16:18:03 +02:00
wernsaar
62231ab337 Merge pull request #538 from wernsaar/develop
Added optimized cdot- and zdot-kernels
2015-04-10 16:03:37 +02:00
Werner Saar
3119def9a7 updated cdot and zdot 2015-04-10 11:10:31 +02:00
Werner Saar
33b332372a add optimized cdot- and zdot-kernel for sandybridge 2015-04-10 09:37:26 +02:00
Werner Saar
fd838c75bc add optimized cdot- and zdot-kernel for haswell 2015-04-09 15:13:52 +02:00
Werner Saar
b57a60dac8 updated cdot and zdot for piledriver 2015-04-09 10:33:46 +02:00
Werner Saar
5c51163972 added optimized cdot- and zdot-kernel for steamroller 2015-04-09 09:45:23 +02:00
Werner Saar
9299d8cfd6 added optimized cdot- and zdot-kernels for bulldozer 2015-04-08 16:29:55 +02:00
Zhang Xianyi
0a3d3b945d Refs #535. Fix the wrong vector instruction in sgemm sandy bridge kernel. 2015-04-08 03:55:49 +08:00
Zhang Xianyi
4f680a7d61 Merge pull request #534 from wernsaar/develop
Refs #533. added optimized saxpy- and daxpy-kernel for haswell and sandybridge
2015-04-07 12:48:11 -05:00
Werner Saar
ba926e807c added cdot- and zdot benchmark 2015-04-07 11:56:06 +02:00
Werner Saar
60c6dec6e6 updated some lines for bulldozer 2015-04-06 18:47:16 +02:00
Werner Saar
47898cca35 added optimized saxpy- and daxpy-kernel for sandybridge 2015-04-06 16:05:16 +02:00
Werner Saar
53bb924287 added optimized saxpy- and daxpy-kernel for haswell 2015-04-06 12:33:16 +02:00
Zhang Xianyi
1e80b8b0d3 Merge pull request #531 from wernsaar/develop
added optimized sdot- and ddot-kernels for Haswell and Sandybridge
2015-04-05 16:42:39 -05:00
Werner Saar
a901b065d3 added optimized ddot-kernel for sandybridge 2015-04-05 20:19:38 +02:00
Werner Saar
3937e2a0a0 add optimized sdot-kernel for sandybridge 2015-04-05 19:47:05 +02:00
Werner Saar
9707d608d5 removed double definition line 2015-04-05 18:35:34 +02:00
Werner Saar
701b9d7556 added optimized sdot- and ddot-kernel for HASWELL 2015-04-05 17:57:53 +02:00
Zhang Xianyi
8977b3f235 Refs #529. Support Intel Broadwell by Haswell kernels. 2015-04-02 11:08:03 -05:00
Zhang Xianyi
f6426395ea Merge pull request #527 from xantares/patch-1
fix mingw install
2015-03-30 10:16:11 -05:00
xantares
0ac787eefe fix mingw install 2015-03-30 09:30:55 +02:00
Zhang Xianyi
e5b96e55a7 Fix build bug for ARM64. 2015-03-24 15:27:17 -05:00
Zhang Xianyi
a3491e1e88 Update the doc for 0.2.14. 2015-03-24 15:05:59 -05:00
Zhang Xianyi
e81a5d61e4 Merge branch 'develop' of github.com:xianyi/OpenBLAS into develop 2015-03-24 12:17:12 -05:00
Zhang Xianyi
c674fa32be Add ARM targets. 2015-03-24 12:17:04 -05:00
Zhang Xianyi
e34911a73d Fix compiling bug for ARM with setting BINARY. 2015-03-24 17:15:33 +00:00
Zhang Xianyi
76dcaf2281 Merge pull request #521 from maxlevesque/patch-1
Correct typo /proc/ instead of /pros/
2015-03-21 12:26:35 -05:00
Maximilien Levesque
770fac92eb Correct typo /proc/ instead of /pros/ 2015-03-20 23:25:11 +01:00
Zhang Xianyi
e95d64333a Refs #519. Avoid calling strncpy. 2015-03-19 15:57:22 -05:00
Zhang Xianyi
75c40bcc48 Refs #520. Fixed ONLY_CBLAS=1 compiling bug on OSX. 2015-03-19 11:52:09 -05:00
Zhang Xianyi
b62f9f4120 Merge pull request #518 from ton/issue-508
Fix issue #508
2015-03-18 13:00:07 -05:00
Ton van den Heuvel
b6438dedea Fix issue #508
Fix race condition during shutdown causing a crash in
gotoblas_set_affinity().
2015-03-18 13:22:43 +01:00
Zhang Xianyi
cdefdb21cd Refs #492. Fixed c/zsyr bug with negative incx. 2015-02-26 06:37:03 +08:00
Zhang Xianyi
ea7f9dacf4 Refs #509. Fixed geadd building bug with DYNAMIC_ARCH=1. 2015-02-26 01:47:11 +08:00
Zhang Xianyi
bf5dbb7e2a Refs#509. Merge branch 'grisuthedragon-develop' into develop 2015-02-26 01:44:19 +08:00
Martin Koehler
39cc6b21d3 Add ATLAS-style ?geadd function 2015-02-16 13:46:20 +01:00
Zhang Xianyi
771b18ae9c Detect the wrong combined flags of USE_OPENMP=1 and USE_THREAD=0. 2015-02-08 01:42:48 -06:00
Zhang Xianyi
cfa9392ffa Fix openblas_get_num_threads and openblas_get_num_procs bug with single thread. 2015-02-08 01:30:23 -06:00
Zhang Xianyi
1ccd57ce80 Merge pull request #497 from eschnett/develop
Introduce openblas_get_num_threads and openblas_get_num_procs
2015-02-03 23:09:38 -06:00
Erik Schnetter
65a847cd36 Introduce openblas_get_num_threads and openblas_get_num_procs 2015-02-03 12:23:41 -05:00
Zhang Xianyi
07ff001981 Merge pull request #495 from jeromerobert/develop
Fix a segfault in gemv when MAX_STACK_ALLOC is set
2015-01-29 18:23:50 +08:00
Jerome Robert
b17ccb4c5c Fix a segfault in gemv when MAX_STACK_ALLOC is set
* stack_alloc_size is needed after the implementation call
but it may be overwritten if it's optimized to a register,
because some gemv implementation (ex: dgemv_n.S) do not
restore all register (ex: r10).
* do the same in ger.c for the same reasons even if the bug
has not been observed.
2015-01-29 09:55:57 +01:00
Zhang Xianyi
63c6fcfa0a Merge pull request #490 from eschnett/develop
Move #include statements outside extern "C" blocks
2015-01-13 15:43:56 +08:00
Erik Schnetter
29cb47fc06 Move #include statements outside extern "C" blocks 2015-01-12 21:27:52 -05:00
Zhang Xianyi
4e6c4046f7 Fix cortex-a15 detecting bug. 2015-01-12 09:35:16 +00:00
Zhang Xianyi
229ce2ccd1 Add cortex-a9 and cortex-a15 targets. 2015-01-12 08:55:29 +00:00
Zhang Xianyi
ef75be0e51 Merge pull request #487 from kortschak/dromtg-test
Add test for drotmg bug fixed by 692b14c
2015-01-07 14:13:11 +08:00
kortschak
5344f335a8 Add test for drotmg bug fixed by 692b14c
Test requested in issue xianyi/OpenBLAS#484.

Run tests by applying the following change and then make:

	diff --git a/Makefile.rule b/Makefile.rule
	index bea1fe1..9852ff3 100644
	--- a/Makefile.rule
	+++ b/Makefile.rule
	@@ -140,7 +140,7 @@ NO_AFFINITY = 1

	-# UTEST_CHECK = 1
	+UTEST_CHECK = 1
2015-01-07 10:06:55 +10:30
Zhang Xianyi
5cb5af9333 Add configuration options. 2015-01-02 02:42:32 +08:00
Zhang Xianyi
41aad0407f Merge pull request #482 from jeromerobert/develop
Allow to do gemv and ger buffer allocation on the stack
2015-01-02 02:26:17 +08:00
Zhang Xianyi
f8f2e84659 Merge pull request #486 from wernsaar/develop
Optimizations for steamroller
2014-12-31 02:36:23 +08:00
Werner Saar
34633fef01 Merge branch 'develop' of github.com:wernsaar/OpenBLAS into develop 2014-12-30 20:16:53 +08:00
Werner Saar
ddf983d643 added optimizations for steamroller 2014-12-30 20:14:45 +08:00
Zhang Xianyi
17b9db20f1 Merge pull request #483 from wernsaar/develop
added Steamroller as a  cpu target
2014-12-29 12:00:16 +08:00
Werner Saar
0dc559ed30 bugfix in dynamic.c 2014-12-28 17:15:42 +01:00
Werner Saar
9566f5fdb0 added Steamroller as a target processor 2014-12-28 13:45:19 +01:00
Werner Saar
4319769b79 added target processor STEAMROLLER 2014-12-28 20:16:46 +08:00
Jerome Robert
e9d9a8eae3 Allow to do gemv and ger buffer allocation on the stack
ger and gemv call blas_memory_alloc/free which in their turn
call blas_lock. blas_lock create thread contention when matrices
are small and the number of thread is high enough. We avoid
call blas_memory_alloc by replacing it with stack allocation.
This can be enabled with:
make -DMAX_STACK_ALLOC=2048
The given size (in byte) must be high enough to avoid thread contention
and small enough to avoid stack overflow.

Fix #478
2014-12-27 14:33:12 +01:00
Zhang Xianyi
cbb3ab80e7 Merge pull request #481 from eschnett/develop
Correct ilaver C declaration
2014-12-26 10:09:19 +08:00
Erik Schnetter
cd9868b1b4 Correct ilaver C declaration 2014-12-25 17:41:17 -05:00
Zhang Xianyi
eb738148fe Merge pull request #479 from wernsaar/develop
workaround for sandybridge zgemm kernel
2014-12-23 00:59:41 +08:00
Werner Saar
587e16fba3 Ref #458: Backport, sandybrigde uses nehalem zgemm kernel 2014-12-22 17:01:18 +01:00
Werner Saar
4de7b9ae47 increased NMAX to 128 2014-12-22 14:04:27 +01:00
Werner Saar
887aed634d modified sources for OS Darwin 2014-12-19 12:40:46 +01:00
Werner Saar
6261342de3 small optimization on dgemm_kernel for N=1 2014-12-18 20:35:51 +01:00
Werner Saar
1e566223ed added code for the size of n 2014-12-17 15:02:11 +01:00
Werner Saar
113b48ca22 modified makefile for acml6.1 2014-12-17 14:12:21 +01:00
Zhang Xianyi
3e81c99b6b Fixed installation bug on Mac OSX. 2014-12-13 13:05:06 +08:00
Werner Saar
ec85c4a51d Increased the Threshold value in sep.in 2014-12-11 14:57:41 +01:00
Werner Saar
97de657d38 added tests to sep.as as workaround for gfortran-4.8.x 2014-12-11 13:53:59 +01:00
Zhang Xianyi
71966eba6c Merge pull request #475 from xantares/patch-2
add OpenBLAS_VERSION to cmake config file
2014-12-09 17:57:43 +08:00
Zhang Xianyi
a359979e17 Merge pull request #474 from xantares/patch-1
set OPENBLAS_CMAKE_DIR to <prefix>/lib/cmake/<package_name>
2014-12-09 17:57:16 +08:00
xantares
7a6a141bc4 add OpenBLAS_VERSION to cmake config file 2014-12-09 10:34:41 +01:00
xantares
b8ff6892f6 set OPENBLAS_CMAKE_DIR to <prefix>/lib/cmake/<package_name>
usually these files are more often located in this subdir
2014-12-09 10:18:18 +01:00
Zhang Xianyi
8fe7a9ce6f Merge pull request #473 from wernsaar/develop
changed inline assembler labels to short form
2014-12-08 13:22:18 +08:00
Werner Saar
bc5fff7085 changed inline assembler labels to short form 2014-12-07 12:38:54 +01:00
Zhang Xianyi
1943ea91a8 Merge branch 'develop' of github.com:xianyi/OpenBLAS into develop 2014-12-03 23:03:48 +08:00
Zhang Xianyi
f5424fc9de Update the doc for 0.2.13 version. 2014-12-03 23:00:29 +08:00
Zhang Xianyi
0cf29ba6d2 Fixed a bug of sgemm sandy bridge kernel.
Reported by Julia project. JuliaLang/julia#9084
2014-12-03 17:38:41 +08:00
Zhang Xianyi
50e18033e6 Merge pull request #471 from nolta/patch-4
c_check: set $hostarch to x86_64 instead of amd64
2014-12-03 12:53:20 +08:00
Zhang Xianyi
551b55d1c7 Merge pull request #470 from nolta/patch-3
fix fortran compiler detection on FreeBSD
2014-12-03 12:50:46 +08:00
Mike Nolta
271ceb8bae c_check: set $hostarch to x86_64 instead of amd64
`uname -m` returns "amd64" on some systems.
2014-12-02 21:23:23 -05:00
Mike Nolta
5f846be2e4 fix fortran compiler detection on FreeBSD
On FreeBSD, passing extra options to `which` causes it to report a non-zero status:

```
$ which gfortran48 -m64
/usr/local/bin/gfortran48
$ echo $?
1
```

```
$ which gfortran48
/usr/local/bin/gfortran48
$ echo $?
0
```
2014-12-02 20:47:40 -05:00
Zhang Xianyi
fe7dcf98f3 Refs #461. Provide OpenBLASConfig.cmake to support CMake.
If you "make PREFIX=/path/to/OpenBLAS install" ,
The config file will be located in /path/to/OpenBLAS/cmake

Then, you can use "find_package(OpenBLAS)" at CMake.
cmake -DOpenBLAS_DIR=/path/to/OpenBLAS/cmake ..
2014-11-29 02:16:40 +08:00
Zhang Xianyi
2fb02626da Update organization info. 2014-11-25 15:28:58 +08:00
Zhang Xianyi
a85c2785ae Refs #467. Added generic kernel file for x86_64. 2014-11-24 15:34:48 +08:00
Zhang Xianyi
4806715c97 Fixed #456. Merged the optimizations for APM's
xgene-1 (aarch64).
Merge branch 'benedikt-huber-dave-patch' into develop
2014-11-11 22:21:04 +08:00
Benedikt Huber
58c90d5937 # The first commit's message is:
Optimizations for APM's xgene-1 (aarch64).

1) general system updates to support armv8 better.  Make all did not work, one needed to supply TARGET=ARMV8.
2) sgem 4x4 kernel in assembler using SIMD, and configuration changes to use it.
3) strmm 4x4 kernel in C.  Since the sgem kernel does 4x4, the trmm kernel must also do 4xN.

Added Dave Nuechterlein to the contributors list.
2014-11-11 22:19:23 +08:00
Zhang Xianyi
2987bc7b40 refs #464. Fixed the bug of detecting L2 associative on x86. 2014-11-10 17:15:34 +08:00
Zhang Xianyi
695e0fa649 #463 fixed a compiling bug on AIX. 2014-11-10 14:39:56 +08:00
Zhang Xianyi
cbb23c46c2 Merge pull request #459 from tkelman/symbol-rename
add SYMBOLPREFIX and SYMBOLSUFFIX makefile options
2014-10-25 19:49:03 +08:00
Tony Kelman
0b4602b753 add SYMBOLPREFIX and SYMBOLSUFFIX makefile options
for adding a prefix or suffix to all exported symbol names in the shared library
Useful to avoid conflicts with other BLAS libraries, especially when using
64 bit integer interfaces in OpenBLAS

Note that since OSX does not have the objcopy utility, setting these options
to non-empty values on Mac requires the objconv tool, available (GPL license)
from http://www.agner.org/optimize/#objconv
2014-10-24 22:27:09 -07:00
Zhang Xianyi
ac5a7e1c1b Update dot to 0.2.12 version. 2014-10-13 17:10:12 +08:00
wernsaar
f1b9a4a1ca Ref #454: fixed bug in common_param.h 2014-09-23 11:34:29 +02:00
Zhang Xianyi
ae6b7caf32 Merge pull request #453 from wernsaar/develop
Enabled GEMM3M functions
2014-09-22 16:47:54 +08:00
wernsaar
f446d2368a updated cblas.h and cblas_noconst.h 2014-09-21 13:39:15 +02:00
wernsaar
dab4edd069 added benchmark for gemm3m functions 2014-09-21 12:00:41 +02:00
wernsaar
9d7057366d bugfix for GEMM3M functions 2014-09-21 11:41:43 +02:00
wernsaar
7f234f8ed1 added GEMM3M tests 2014-09-21 10:55:08 +02:00
wernsaar
9e829ce98f enabled cblas gemm3m functions 2014-09-20 17:20:02 +02:00
wernsaar
d49fd33885 disabled SYMM3M and HEMM3M functions because segment violations 2014-09-20 15:27:40 +02:00
wernsaar
f0f9b25bb6 added test for CGEMM3M function 2014-09-20 14:53:30 +02:00
wernsaar
7aae4a62e7 enabled use of GEMM3M functions 2014-09-20 14:27:10 +02:00
wernsaar
7a911569b8 added test for GEMM3M functions 2014-09-20 14:21:42 +02:00
wernsaar
466bfb8b86 updated README.md 2014-09-17 16:01:07 +02:00
Zhang Xianyi
70d1ba09b2 Update the doc for target list. 2014-09-17 14:29:21 +08:00
Zhang Xianyi
d293b78b64 Merge pull request #451 from eshelman/patch-1
Add HASWELL to TargetList.txt
2014-09-17 14:20:06 +08:00
Eliot Eshelman
9912dbbcf9 Add HASWELL to TargetList.txt
The Intel "Haswell" architecture is missing from the list of build targets.
2014-09-16 18:26:45 -04:00
Zhang Xianyi
01bc462e8e Merge pull request #449 from wernsaar/develop
optimized multithreading lower limits
2014-09-16 14:33:48 +08:00
wernsaar
3300f5ebff optimized multithreading lower limits 2014-09-15 11:38:25 +02:00
Zhang Xianyi
59e2c20557 Merge pull request #448 from wernsaar/develop
Optimized cgemv and zgemv kernels
2014-09-15 13:12:14 +08:00
wernsaar
b7c9566eea removed obsolete gemv kernel files 2014-09-14 11:00:53 +02:00
wernsaar
6df1b0be81 optimized zgemv_n_microk_sandy-4.c 2014-09-14 10:21:22 +02:00
wernsaar
2ac1e076c1 added optimized zgemv_n kernel for sandybridge 2014-09-14 09:02:05 +02:00
wernsaar
9908b6031c bugfix in KERNEL.PILEDRIVER 2014-09-13 16:26:53 +02:00
wernsaar
8f100a14f2 optimized cgemv_t kernel for haswell 2014-09-13 16:13:27 +02:00
wernsaar
53b5726b04 added optimized cgemv_t kernel for haswell 2014-09-13 15:14:12 +02:00
wernsaar
1a352b24e6 updated KERNEL.HASWELL 2014-09-13 12:23:27 +02:00
wernsaar
5194818d4b updated zgemv_t_4.c 2014-09-13 09:48:34 +02:00
wernsaar
8a39cdb1c1 added optimized zgemv_t kernel for haswell 2014-09-13 09:47:07 +02:00
wernsaar
fd2478c9e2 optimized interface/zgemv.c for multithreading 2014-09-12 19:18:23 +02:00
wernsaar
0a1390f2d8 enabled optimized zgemv_t kernel for bulldozer 2014-09-12 17:43:47 +02:00
wernsaar
a8b0812feb optimized zgemv_t for bulldozer 2014-09-12 17:42:25 +02:00
wernsaar
a0fb68ab42 added optimized zgemv_t kernel for bulldozer 2014-09-12 17:04:22 +02:00
wernsaar
44c11165d5 bugfix in cgemv_t_4.c 2014-09-12 14:12:24 +02:00
wernsaar
564be4eb72 added optimized cgemv_t kernel 2014-09-12 13:38:01 +02:00
wernsaar
107c3ea7d5 added optimized zgemv_t routine 2014-09-12 12:35:20 +02:00
wernsaar
bb8d698335 optimized zgemv_n_microk_haswell-4.c for small size 2014-09-11 13:44:55 +02:00
wernsaar
e0192a6914 bugfix in zgemv_n_4.c 2014-09-11 13:18:00 +02:00
wernsaar
bced4594bb added optimized zgemv_n kernel 2014-09-11 12:34:57 +02:00
wernsaar
cafba99b6b bufix in cgemv_n_microk_haswell-4.c 2014-09-11 11:12:44 +02:00
wernsaar
ac8f232b2a more optimizations 2014-09-11 10:25:48 +02:00
wernsaar
f98e1244c4 optimized cgemv_n_4.c 2014-09-10 19:26:14 +02:00
wernsaar
be95700b30 added optimized cgemv_kernel for haswell 2014-09-10 14:11:24 +02:00
wernsaar
4aa534ae93 added cgemv_n kernel, optimized for small sizes 2014-09-10 13:45:13 +02:00
Zhang Xianyi
1cba8e7b11 Merge pull request #446 from grisuthedragon/cblas_matcopy
Add a CBLAS interface for the BLAS extension s/d/c/z*matcopy routines.
2014-09-10 16:31:31 +08:00
Zhang Xianyi
d13e92f07e Merge pull request #445 from wernsaar/develop
A lot of optimizations for gemv kernels
2014-09-10 16:28:14 +08:00
wernsaar
baa46e4fba added and tested optimized dgemv_n kernel for haswell 2014-09-09 16:17:45 +02:00
wernsaar
faab7a181d added optimized dgemv_n kernel for haswell 2014-09-09 15:32:32 +02:00
wernsaar
8109d8232c optimized dgemv_t kernel for haswell 2014-09-09 14:38:08 +02:00
wernsaar
debc6d1a05 bugfix in KERNEL.HASWELL 2014-09-09 14:04:44 +02:00
wernsaar
e73a0113ec added optimized gemv kernels 2014-09-09 13:54:55 +02:00
wernsaar
44f2bf9bae added optimized dgemv_t kernel for haswell 2014-09-09 13:34:22 +02:00
Martin Koehler
a057e5434d add CBLAS interface for s/d/c/zimatcopy 2014-09-09 09:52:13 +02:00
wernsaar
cd34e9701b removed obsolete files 2014-09-08 19:15:31 +02:00
Martin Köhler
7794766d3c Add cblas_(s/d/c/z)omatcopy in order to have cblas interface for them. 2014-09-08 17:57:44 +02:00
wernsaar
658939faaa optimized dgemv_n kernel for small sizes 2014-09-08 15:22:35 +02:00
wernsaar
f511807fc0 modified multithreading threshold 2014-09-08 12:27:32 +02:00
wernsaar
c4d9d4e5f8 added haswell optimized kernel 2014-09-08 12:25:16 +02:00
wernsaar
7c0a94ff47 bugfix in sgemv_n_microk_haswell-4.c 2014-09-08 10:54:33 +02:00
wernsaar
cbbc80aad3 added optimized sgemv_t kernel for haswell 2014-09-08 10:13:39 +02:00
wernsaar
2be5c7a640 bugfix for windows 2014-09-07 21:48:42 +02:00
wernsaar
80f7786875 enabled optimized sgemv kernels for piledriver 2014-09-07 21:13:57 +02:00
wernsaar
553e275407 optimized sgemv_n kernel for sandybridge 2014-09-07 20:53:30 +02:00
wernsaar
7b3932b3f3 optimized sgemv_n kernel for nehalem 2014-09-07 19:20:08 +02:00
wernsaar
75207b1148 optimized sgemv_n for very small size of m 2014-09-07 18:23:48 +02:00
wernsaar
274828fa50 optimizations for very small sizes 2014-09-07 13:45:03 +02:00
wernsaar
5ae1731fe6 better optimzations for sgemv_t kernel 2014-09-06 21:28:57 +02:00
wernsaar
c8eaf3ae2d optimized sgemv_t_4 kernel for very small sizes 2014-09-06 19:41:57 +02:00
wernsaar
3a7ab47ee9 optimized sgemv_t 2014-09-06 18:34:25 +02:00
wernsaar
cf5544b417 optimization for small size 2014-09-06 13:17:56 +02:00
wernsaar
d143f84dd2 added optimized sgemv_n kernel for haswell 2014-09-06 12:08:48 +02:00
wernsaar
7794237475 undef WHEREAMI 2014-09-06 11:01:42 +02:00
wernsaar
a64fe9bcc9 added optimized sgemv_n kernel for sandybridge 2014-09-06 08:41:53 +02:00
wernsaar
2021d0f9d6 experimentally removed expensive function calls 2014-09-05 15:05:53 +02:00
wernsaar
6df7a88930 optimized sgemv_t for sandybridge 2014-09-05 10:22:50 +02:00
wernsaar
53de943690 bugfix for sgemv_n_4.c 2014-09-04 18:55:52 +02:00
wernsaar
7f910010a0 optimized sgemv_n kernel for small sizes 2014-09-04 13:09:27 +02:00
wernsaar
3a5d8dbff9 optimized sgemv_n_4.c 2014-09-03 15:34:30 +02:00
wernsaar
2a60c6d4b0 optimized sgemv_n for small sizes 2014-09-03 14:48:45 +02:00
wernsaar
0fc560ba23 bugfix for buffer overflow 2014-09-03 10:13:47 +02:00
wernsaar
d1800397f5 optimized interface/gemv.c for multithreading 2014-09-02 17:36:07 +02:00
wernsaar
f4ff889491 updated interface/gemv.c for multithreading 2014-09-02 16:30:04 +02:00
wernsaar
210bec9111 added plot-header to compare multithreading 2014-09-02 14:11:42 +02:00
wernsaar
f3b50dcf5b removed obsolete instructions from sgemv_t_4.c 2014-09-02 13:35:41 +02:00
wernsaar
93eaba959d optimized sgemv_t for bulldozer 2014-09-02 12:42:36 +02:00
wernsaar
9570e56965 optimized sgemv_t_4.c for small sizes 2014-09-01 15:11:37 +02:00
wernsaar
d7f91f8b4f extended gemv.c benchmark 2014-09-01 15:07:36 +02:00
wernsaar
53f1277b6b modified benchmark/gemv.c 2014-08-31 15:38:18 +02:00
wernsaar
bc99faef1b optimized sgemv_t_4.c for uneven sizes 2014-08-31 14:33:15 +02:00
wernsaar
848c0f16f7 optimized sgemv_t_4.c for small size 2014-08-31 13:23:44 +02:00
wernsaar
e2fc8c8c2c changed 1 test value (bug in lapack-testing?) 2014-08-30 13:58:02 +02:00
wernsaar
53e6dbf6ca optimized sgemv_t kernel for small sizes 2014-08-30 13:36:27 +02:00
Zhang Xianyi
868f8a8756 Merge pull request #443 from idunham/fix
Workaround PIC limitations in cpuid.
2014-08-29 13:31:06 +08:00
Isaac Dunham
db7e6366cd Workaround PIC limitations in cpuid.
cpuid uses register ebx, but ebx is reserved in PIC.
So save ebx, swap ebx & edi, and return edi.

Copied from Igor Pavlov's equivalent fix for 7zip (in CpuArch.c),
which is public domain and thus OK license-wise.
2014-08-28 13:05:07 -07:00
Zhang Xianyi
2702323f7d Merge pull request #440 from wernsaar/develop
optimizations for leve1 and level2 blas functions
2014-08-28 12:43:54 +08:00
wernsaar
20cd850125 modification for clang compiler 2014-08-27 09:00:20 +02:00
wernsaar
5fa6158731 renoved flag no-integrated-as, because not working on macosx 2014-08-26 18:29:40 +02:00
wernsaar
84badf8086 EXPERIMENTAL: added the flag -no-integrated-as for clang compiler in Makefile.system 2014-08-26 17:36:32 +02:00
Zhang Xianyi
c8cc4a0d22 Fixed the typo in Changelog.txt 2014-08-26 16:14:34 +08:00
wernsaar
3885eebdb8 added optimized zaxpy bulldozer kernel 2014-08-25 15:52:35 +02:00
wernsaar
ee74445155 added optimized caxpy kernel for bulldozer 2014-08-25 14:53:28 +02:00
wernsaar
9d2ace8bac added optimized daxpy kernel for bulldozer 2014-08-24 10:57:12 +02:00
wernsaar
b55f997302 added optimized daxpy kernel for nehalem 2014-08-23 17:53:07 +02:00
wernsaar
29125864b3 updated gemm.c 2014-08-23 17:28:01 +02:00
wernsaar
e45c960c2c added optimized saxpy kernel for nehalem 2014-08-23 17:15:21 +02:00
wernsaar
55e81da379 added axpy benchmark-test 2014-08-23 13:12:44 +02:00
wernsaar
ac76b6267f added optimized dgemv_n kernel for nehalem 2014-08-23 10:40:57 +02:00
wernsaar
f1b96c4846 added optimized ddot kernel for bulldozer 2014-08-22 21:19:29 +02:00
wernsaar
16d6be852d added optimized ddot kernel for nehalem 2014-08-22 20:34:41 +02:00
wernsaar
53ec5789e2 bugfix for Makefile 2014-08-22 17:02:55 +02:00
wernsaar
95a707ced3 update of KERNEL.BULLDOZER 2014-08-22 17:01:27 +02:00
wernsaar
5d97b0754c added optimized sdot kernel for nehalem 2014-08-22 17:00:26 +02:00
wernsaar
8a9e868919 added optimized sdot for bulldozer 2014-08-22 14:29:17 +02:00
wernsaar
7e404de3de bugfix in Makefile 2014-08-22 11:51:30 +02:00
wernsaar
e4472ad850 added sdot and ddot benchmarks 2014-08-22 11:42:07 +02:00
wernsaar
fb0b4552a5 added hemv benchmark 2014-08-22 10:00:09 +02:00
wernsaar
6f73ffc114 added benchmarks for csymv and zsymv 2014-08-21 19:33:57 +02:00
wernsaar
c8b0645266 added optimized symv_L kernels for nehalem 2014-08-21 14:27:00 +02:00
wernsaar
ec05ff3f64 added optimized ssymv_L kernel for bulldozer 2014-08-21 13:32:06 +02:00
wernsaar
f6f9122660 added optimized dsymv_L kernel for bulldozer 2014-08-21 13:02:53 +02:00
wernsaar
8247f38dc1 added optimized dsymv_U kernel for nehalem 2014-08-20 09:58:04 +02:00
wernsaar
ef6374196d updated optimized dsymv_U kernel for bulldozer 2014-08-20 09:00:56 +02:00
wernsaar
f824c2b751 updated optimized ssymv_U for bulldozer 2014-08-19 19:25:03 +02:00
wernsaar
4ba4ab623f added optimized ssymv_U kernel for nehalem 2014-08-19 17:09:45 +02:00
wernsaar
4f39447c05 added optimized ssymv_U kernel for bulldozer 2014-08-18 13:52:24 +02:00
wernsaar
74c9465672 added optimized dsymv_U kernel for bulldozer 2014-08-18 12:18:10 +02:00
wernsaar
101dd08173 add reference in C for symv_U 2014-08-16 13:52:50 +02:00
wernsaar
493d4fe7e5 added reference in C for symv_L 2014-08-16 11:36:48 +02:00
348 changed files with 59680 additions and 5948 deletions

View File

@@ -117,5 +117,15 @@ In chronological order:
* Isaac Dunham <https://github.com/idunham>
* [2014-08-03] Fixed link error on Linux/musl
* Dave Nuechterlein
* [2014-10-10] trmm and sgemm kernels (optimized for APM's X-Gene 1).
ARMv8 support.
* Dan Kortschak
* [2015-01-07] Added test for drotmg bug #484.
* Ton van den Heuvel <https://github.com/ton>
* [2015-03-18] Fix race condition during shutdown causing a crash in gotoblas_set_affinity().
* [Your name or handle] <[email or website]>
* [Date] [Brief summary of your changes]

View File

@@ -1,6 +1,62 @@
OpenBLAS ChangeLog
====================================================================
Version 0.2.10
Version 0.2.14
24-Mar-2015
common:
* Improve OpenBLASConfig.cmake. (#474, #475. Thanks, xantares.)
* Improve ger and gemv for small matrices by stack allocation.
e.g. make -DMAX_STACK_ALLOC=2048 (#482. Thanks, Jerome Robert.)
* Introduce openblas_get_num_threads and openblas_get_num_procs.
(#497. Thanks, Erik Schnetter.)
* Add ATLAS-style ?geadd function. (#509. Thanks, Martin Köhler.)
* Fix c/zsyr bug with negative incx. (#492.)
* Fix race condition during shutdown causing a crash in
gotoblas_set_affinity(). (#508. Thanks, Ton van den Heuvel.)
x86/x86-64:
* Support AMD Streamroller.
ARM:
* Add Cortex-A9 and Cortex-A15 targets.
====================================================================
Version 0.2.13
3-Dec-2014
common:
* Add SYMBOLPREFIX and SYMBOLSUFFIX makefile options
for adding a prefix or suffix to all exported symbol names
in the shared library.(#459, Thanks Tony Kelman)
* Provide OpenBLASConfig.cmake at installation.
* Fix Fortran compiler detection on FreeBSD.
(#470, Thanks Mike Nolta)
x86/x86-64:
* Add generic kernel files for x86-64. make TARGET=GENERIC
* Fix a bug of sgemm kernel on Intel Sandy Bridge.
* Fix c_check bug on some amd64 systems. (#471, Thanks Mike Nolta)
ARM:
* Support APM's X-Gene 1 AArch64 processors.
Optimize trmm and sgemm. (#465, Thanks Dave Nuechterlein)
====================================================================
Version 0.2.12
13-Oct-2014
common:
* Added CBLAS interface for ?omatcopy and ?imatcopy.
* Enable ?gemm3m functions.
* Added benchmark for ?gemm3m.
* Optimized multithreading lower limits.
* Disabled SYMM3M and HEMM3M functions
because of segment violations.
x86/x86-64:
* Improved axpy and symv performance on AMD Bulldozer.
* Improved gemv performance on modern Intel and AMD CPUs.
====================================================================
Version 0.2.11
18-Aug-2014
common:
* Added some benchmark codes.

View File

@@ -9,10 +9,10 @@
If you want to allocate 64 large pages,
$shell> echo 0 > /pros/sys/vm/nr_hugepages # need to be reset
$shell> echo 65 > /pros/sys/vm/nr_hugepages # add 1 extra page
$shell> echo 3355443200 > /pros/sys/kernel/shmmax # just large number
$shell> echo 3355443200 > /pros/sys/kernel/shmall
$shell> echo 0 > /proc/sys/vm/nr_hugepages # need to be reset
$shell> echo 65 > /proc/sys/vm/nr_hugepages # add 1 extra page
$shell> echo 3355443200 > /proc/sys/kernel/shmmax # just large number
$shell> echo 3355443200 > /proc/sys/kernel/shmall
Also may add a few lines into /etc/security/limits.conf file.

View File

@@ -1,4 +1,4 @@
Copyright (c) 2011,2012 Lab of Parallel Software and Computational Science,ISCAS
Copyright (c) 2011-2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
@@ -12,9 +12,10 @@ met:
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the ISCAS nor the names of its contributors may
be used to endorse or promote products derived from this software
without specific prior written permission.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE

View File

@@ -20,6 +20,8 @@ ifneq ($(NO_LAPACK), 1)
SUBDIRS += lapack
endif
LAPACK_NOOPT := $(filter-out -O0 -O1 -O2 -O3 -Ofast,$(LAPACK_FFLAGS))
SUBDIRS_ALL = $(SUBDIRS) test ctest utest exports benchmark ../laswp ../bench
.PHONY : all libs netlib test ctest shared install
@@ -131,7 +133,7 @@ ifeq ($(CORE), UNKOWN)
$(error OpenBLAS: Detecting CPU failed. Please set TARGET explicitly, e.g. make TARGET=your_cpu_target. Please read README for the detail.)
endif
ifeq ($(NOFORTRAN), 1)
$(error OpenBLAS: Detecting fortran compiler failed. Please install fortran compiler, e.g. gfortran, ifort, openf90.)
$(info OpenBLAS: Detecting fortran compiler failed. Cannot compile LAPACK. Only compile BLAS.)
endif
ifeq ($(NO_STATIC), 1)
ifeq ($(NO_SHARED), 1)
@@ -231,7 +233,7 @@ ifndef NOFORTRAN
-@echo "FORTRAN = $(FC)" > $(NETLIB_LAPACK_DIR)/make.inc
-@echo "OPTS = $(LAPACK_FFLAGS)" >> $(NETLIB_LAPACK_DIR)/make.inc
-@echo "POPTS = $(LAPACK_FPFLAGS)" >> $(NETLIB_LAPACK_DIR)/make.inc
-@echo "NOOPT = $(LAPACK_FFLAGS) -O0" >> $(NETLIB_LAPACK_DIR)/make.inc
-@echo "NOOPT = -O0 $(LAPACK_NOOPT)" >> $(NETLIB_LAPACK_DIR)/make.inc
-@echo "PNOOPT = $(LAPACK_FPFLAGS) -O0" >> $(NETLIB_LAPACK_DIR)/make.inc
-@echo "LOADOPTS = $(FFLAGS) $(EXTRALIB)" >> $(NETLIB_LAPACK_DIR)/make.inc
-@echo "CC = $(CC)" >> $(NETLIB_LAPACK_DIR)/make.inc

View File

@@ -1,8 +1,23 @@
ifeq ($(CORE), ARMV7)
# ifeq logical or
ifeq ($(CORE), $(filter $(CORE),CORTEXA9 CORTEXA15))
ifeq ($(OSNAME), Android)
CCOMMON_OPT += -marm -mfpu=neon -mfloat-abi=hard -march=armv7-a
FCOMMON_OPT += -marm -mfpu=neon -mfloat-abi=hard -march=armv7-a
else
CCOMMON_OPT += -marm -mfpu=vfpv3 -mfloat-abi=hard -march=armv7-a
FCOMMON_OPT += -marm -mfpu=vfpv3 -mfloat-abi=hard -march=armv7-a
endif
endif
ifeq ($(CORE), ARMV7)
ifeq ($(OSNAME), Android)
CCOMMON_OPT += -marm -mfpu=neon -mfloat-abi=hard -march=armv7-a
FCOMMON_OPT += -marm -mfpu=neon -mfloat-abi=hard -march=armv7-a
else
CCOMMON_OPT += -marm -mfpu=vfpv3 -mfloat-abi=hard -march=armv7-a
FCOMMON_OPT += -marm -mfpu=vfpv3 -mfloat-abi=hard -march=armv7-a
endif
endif
ifeq ($(CORE), ARMV6)
CCOMMON_OPT += -marm -mfpu=vfp -mfloat-abi=hard -march=armv6

View File

@@ -9,6 +9,8 @@ OPENBLAS_INCLUDE_DIR := $(PREFIX)/include
OPENBLAS_LIBRARY_DIR := $(PREFIX)/lib
OPENBLAS_BINARY_DIR := $(PREFIX)/bin
OPENBLAS_BUILD_DIR := $(CURDIR)
OPENBLAS_CMAKE_DIR := $(OPENBLAS_LIBRARY_DIR)/cmake/openblas
OPENBLAS_CMAKE_CONFIG := OpenBLASConfig.cmake
.PHONY : install
.NOTPARALLEL : install
@@ -21,6 +23,7 @@ install : lib.grd
@-mkdir -p $(DESTDIR)$(OPENBLAS_INCLUDE_DIR)
@-mkdir -p $(DESTDIR)$(OPENBLAS_LIBRARY_DIR)
@-mkdir -p $(DESTDIR)$(OPENBLAS_BINARY_DIR)
@-mkdir -p $(DESTDIR)$(OPENBLAS_CMAKE_DIR)
@echo Generating openblas_config.h in $(DESTDIR)$(OPENBLAS_INCLUDE_DIR)
#for inc
@echo \#ifndef OPENBLAS_CONFIG_H > $(DESTDIR)$(OPENBLAS_INCLUDE_DIR)/openblas_config.h
@@ -43,11 +46,11 @@ ifndef NO_CBLAS
endif
ifndef NO_LAPACKE
@echo Copying LAPACKE header files to $(DESTDIR)$(OPENBLAS_LIBRARY_DIR)
@-install -pDm644 $(NETLIB_LAPACK_DIR)/lapacke/include/lapacke.h $(DESTDIR)$(OPENBLAS_INCLUDE_DIR)/lapacke.h
@-install -pDm644 $(NETLIB_LAPACK_DIR)/lapacke/include/lapacke_config.h $(DESTDIR)$(OPENBLAS_INCLUDE_DIR)/lapacke_config.h
@-install -pDm644 $(NETLIB_LAPACK_DIR)/lapacke/include/lapacke_mangling_with_flags.h $(DESTDIR)$(OPENBLAS_INCLUDE_DIR)/lapacke_mangling.h
@-install -pDm644 $(NETLIB_LAPACK_DIR)/lapacke/include/lapacke_utils.h $(DESTDIR)$(OPENBLAS_INCLUDE_DIR)/lapacke_utils.h
@echo Copying LAPACKE header files to $(DESTDIR)$(OPENBLAS_INCLUDE_DIR)
@-install -pm644 $(NETLIB_LAPACK_DIR)/lapacke/include/lapacke.h $(DESTDIR)$(OPENBLAS_INCLUDE_DIR)/lapacke.h
@-install -pm644 $(NETLIB_LAPACK_DIR)/lapacke/include/lapacke_config.h $(DESTDIR)$(OPENBLAS_INCLUDE_DIR)/lapacke_config.h
@-install -pm644 $(NETLIB_LAPACK_DIR)/lapacke/include/lapacke_mangling_with_flags.h $(DESTDIR)$(OPENBLAS_INCLUDE_DIR)/lapacke_mangling.h
@-install -pm644 $(NETLIB_LAPACK_DIR)/lapacke/include/lapacke_utils.h $(DESTDIR)$(OPENBLAS_INCLUDE_DIR)/lapacke_utils.h
endif
#for install static library
@@ -83,13 +86,31 @@ ifeq ($(OSNAME), Darwin)
ln -fs $(LIBDYNNAME) $(LIBPREFIX).dylib
endif
ifeq ($(OSNAME), WINNT)
@-cp $(LIBDLLNAME) $(OPENBLAS_BINARY_DIR)
@-cp $(LIBDLLNAME).a $(OPENBLAS_LIBRARY_DIR)
@-cp $(LIBDLLNAME) $(DESTDIR)$(OPENBLAS_BINARY_DIR)
@-cp $(LIBDLLNAME).a $(DESTDIR)$(OPENBLAS_LIBRARY_DIR)
endif
ifeq ($(OSNAME), CYGWIN_NT)
@-cp $(LIBDLLNAME) $(OPENBLAS_BINARY_DIR)
endif
endif
#Generating OpenBLASConfig.cmake
@echo Generating $(OPENBLAS_CMAKE_CONFIG) in $(DESTDIR)$(OPENBLAS_CMAKE_DIR)
@echo "SET(OpenBLAS_VERSION \"${VERSION}\")" > $(DESTDIR)$(OPENBLAS_CMAKE_DIR)/$(OPENBLAS_CMAKE_CONFIG)
@echo "SET(OpenBLAS_INCLUDE_DIRS ${OPENBLAS_INCLUDE_DIR})" >> $(DESTDIR)$(OPENBLAS_CMAKE_DIR)/$(OPENBLAS_CMAKE_CONFIG)
ifndef NO_SHARED
#ifeq logical or
ifeq ($(OSNAME), $(filter $(OSNAME),Linux FreeBSD NetBSD))
@echo "SET(OpenBLAS_LIBRARIES ${OPENBLAS_LIBRARY_DIR}/$(LIBPREFIX).so)" >> $(DESTDIR)$(OPENBLAS_CMAKE_DIR)/$(OPENBLAS_CMAKE_CONFIG)
endif
ifeq ($(OSNAME), $(filter $(OSNAME),WINNT CYGWIN_NT))
@echo "SET(OpenBLAS_LIBRARIES ${OPENBLAS_BINARY_DIR}/$(LIBDLLNAME))" >> $(DESTDIR)$(OPENBLAS_CMAKE_DIR)/$(OPENBLAS_CMAKE_CONFIG)
endif
ifeq ($(OSNAME), Darwin)
@echo "SET(OpenBLAS_LIBRARIES ${OPENBLAS_LIBRARY_DIR}/$(LIBPREFIX).dylib)" >> $(DESTDIR)$(OPENBLAS_CMAKE_DIR)/$(OPENBLAS_CMAKE_CONFIG)
endif
else
#only static
@echo "SET(OpenBLAS_LIBRARIES ${OPENBLAS_LIBRARY_DIR}/$(LIBPREFIX).$(LIBSUFFIX))" >> $(DESTDIR)$(OPENBLAS_CMAKE_DIR)/$(OPENBLAS_CMAKE_CONFIG)
endif
@echo Install OK!

View File

@@ -3,7 +3,7 @@
#
# This library's version
VERSION = 0.2.11
VERSION = 0.2.14
# If you set the suffix, the library name will be libopenblas_$(LIBNAMESUFFIX).a
# and libopenblas_$(LIBNAMESUFFIX).so. Meanwhile, the soname in shared library
@@ -114,6 +114,9 @@ NO_AFFINITY = 1
# Support for IEEE quad precision(it's *real* REAL*16)( under testing)
# QUAD_PRECISION = 1
# Support for integer matrix and vector (e.g. iaxpy)
# INTEGER_PRECISION = 1
# Theads are still working for a while after finishing BLAS operation
# to reduce thread activate/deactivate overhead. You can determine
# time out to improve performance. This number should be from 4 to 30
@@ -159,6 +162,19 @@ COMMON_PROF = -pg
# Build Debug version
# DEBUG = 1
# Improve GEMV and GER for small matrices by stack allocation.
# For details, https://github.com/xianyi/OpenBLAS/pull/482
#
MAX_STACK_ALLOC=2048
# Add a prefix or suffix to all exported symbol names in the shared library.
# Avoid conflicts with other BLAS libraries, especially when using
# 64 bit integer interfaces in OpenBLAS.
# For details, https://github.com/xianyi/OpenBLAS/pull/459
#
# SYMBOLPREFIX=
# SYMBOLSUFFIX=
#
# End of user configuration
#

View File

@@ -23,6 +23,7 @@ CC = gcc
UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S),Darwin)
CC = clang
# EXTRALIB += -Wl,-no_compact_unwind
endif
endif
@@ -61,6 +62,12 @@ endif
ifeq ($(TARGET), PILEDRIVER)
GETARCH_FLAGS := -DFORCE_BARCELONA
endif
ifeq ($(TARGET), STEAMROLLER)
GETARCH_FLAGS := -DFORCE_BARCELONA
endif
ifeq ($(TARGET), EXCAVATOR)
GETARCH_FLAGS := -DFORCE_BARCELONA
endif
endif
@@ -85,6 +92,12 @@ endif
ifeq ($(TARGET_CORE), PILEDRIVER)
GETARCH_FLAGS := -DFORCE_BARCELONA
endif
ifeq ($(TARGET_CORE), STEAMROLLER)
GETARCH_FLAGS := -DFORCE_BARCELONA
endif
ifeq ($(TARGET_CORE), EXCAVATOR)
GETARCH_FLAGS := -DFORCE_BARCELONA
endif
endif
@@ -186,13 +199,21 @@ LD = $(CROSS_SUFFIX)ld
RANLIB = $(CROSS_SUFFIX)ranlib
NM = $(CROSS_SUFFIX)nm
DLLWRAP = $(CROSS_SUFFIX)dllwrap
OBJCOPY = $(CROSS_SUFFIX)objcopy
OBJCONV = $(CROSS_SUFFIX)objconv
# For detect fortran failed, only build BLAS.
ifeq ($(NOFORTRAN), 1)
NO_LAPACK = 1
endif
#
# OS dependent settings
#
ifeq ($(OSNAME), Darwin)
export MACOSX_DEPLOYMENT_TARGET=10.2
export MACOSX_DEPLOYMENT_TARGET=10.6
MD5SUM = md5 -r
endif
@@ -288,6 +309,10 @@ CCOMMON_OPT += -DQUAD_PRECISION
NO_EXPRECISION = 1
endif
ifdef INTEGER_PRECISION
CCOMMON_OPT += -DINTEGER_PRECISION
endif
ifneq ($(ARCH), x86)
ifneq ($(ARCH), x86_64)
NO_EXPRECISION = 1
@@ -303,6 +328,10 @@ ifdef SANITY_CHECK
CCOMMON_OPT += -DSANITY_CHECK -DREFNAME=$(*F)f$(BU)
endif
ifdef MAX_STACK_ALLOC
CCOMMON_OPT += -DMAX_STACK_ALLOC=$(MAX_STACK_ALLOC)
endif
#
# Architecture dependent settings
#
@@ -339,7 +368,7 @@ FCOMMON_OPT += -m128bit-long-double
endif
ifeq ($(C_COMPILER), CLANG)
EXPRECISION = 1
CCOMMON_OPT += -DEXPRECISION
CCOMMON_OPT += -DEXPRECISION
FCOMMON_OPT += -m128bit-long-double
endif
endif
@@ -350,7 +379,14 @@ ifeq ($(C_COMPILER), INTEL)
CCOMMON_OPT += -wd981
endif
ifeq ($(USE_OPENMP), 1)
#check
ifeq ($(USE_THREAD), 0)
$(error OpenBLAS: Cannot set both USE_OPENMP=1 and USE_THREAD=0. The USE_THREAD=0 is only for building single thread version.)
endif
# ifeq logical or. GCC or LSB
ifeq ($(C_COMPILER), $(filter $(C_COMPILER),GCC LSB))
CCOMMON_OPT += -fopenmp
@@ -389,7 +425,7 @@ endif
ifeq ($(ARCH), x86_64)
DYNAMIC_CORE = PRESCOTT CORE2 PENRYN DUNNINGTON NEHALEM OPTERON OPTERON_SSE3 BARCELONA BOBCAT ATOM NANO
ifneq ($(NO_AVX), 1)
DYNAMIC_CORE += SANDYBRIDGE BULLDOZER PILEDRIVER
DYNAMIC_CORE += SANDYBRIDGE BULLDOZER PILEDRIVER STEAMROLLER EXCAVATOR
endif
ifneq ($(NO_AVX2), 1)
DYNAMIC_CORE += HASWELL
@@ -559,7 +595,7 @@ else
FCOMMON_OPT += -m32
endif
endif
ifdef USE_OPENMP
ifeq ($(USE_OPENMP), 1)
FCOMMON_OPT += -fopenmp
endif
endif
@@ -571,14 +607,14 @@ ifneq ($(INTERFACE64), 0)
FCOMMON_OPT += -i8
endif
endif
ifdef USE_OPENMP
ifeq ($(USE_OPENMP), 1)
FCOMMON_OPT += -openmp
endif
endif
ifeq ($(F_COMPILER), FUJITSU)
CCOMMON_OPT += -DF_INTERFACE_FUJITSU
ifdef USE_OPENMP
ifeq ($(USE_OPENMP), 1)
FCOMMON_OPT += -openmp
endif
endif
@@ -596,7 +632,7 @@ endif
else
FCOMMON_OPT += -q32
endif
ifdef USE_OPENMP
ifeq ($(USE_OPENMP), 1)
FCOMMON_OPT += -openmp
endif
endif
@@ -614,7 +650,7 @@ FCOMMON_OPT += -tp p7-64
else
FCOMMON_OPT += -tp p7
endif
ifdef USE_OPENMP
ifeq ($(USE_OPENMP), 1)
FCOMMON_OPT += -mp
endif
endif
@@ -643,7 +679,7 @@ FCOMMON_OPT += -mabi=n32
endif
endif
ifdef USE_OPENMP
ifeq ($(USE_OPENMP), 1)
FCOMMON_OPT += -mp
endif
endif
@@ -680,7 +716,7 @@ FCOMMON_OPT += -m64
endif
endif
ifdef USE_OPENMP
ifeq ($(USE_OPENMP), 1)
FEXTRALIB += -lstdc++
FCOMMON_OPT += -mp
endif
@@ -728,14 +764,14 @@ FCOMMON_OPT += -m32
else
FCOMMON_OPT += -m64
endif
ifdef USE_OPENMP
ifeq ($(USE_OPENMP), 1)
FCOMMON_OPT += -xopenmp=parallel
endif
endif
ifeq ($(F_COMPILER), COMPAQ)
CCOMMON_OPT += -DF_INTERFACE_COMPAQ
ifdef USE_OPENMP
ifeq ($(USE_OPENMP), 1)
FCOMMON_OPT += -openmp
endif
endif
@@ -844,6 +880,14 @@ else
LIBPREFIX = libopenblas_$(LIBNAMESUFFIX)
endif
ifndef SYMBOLPREFIX
SYMBOLPREFIX =
endif
ifndef SYMBOLSUFFIX
SYMBOLSUFFIX =
endif
KERNELDIR = $(TOPDIR)/kernel/$(ARCH)
include $(TOPDIR)/Makefile.$(ARCH)

View File

@@ -4,6 +4,7 @@ QBLASOBJS_P = $(QBLASOBJS:.$(SUFFIX)=.$(PSUFFIX))
CBLASOBJS_P = $(CBLASOBJS:.$(SUFFIX)=.$(PSUFFIX))
ZBLASOBJS_P = $(ZBLASOBJS:.$(SUFFIX)=.$(PSUFFIX))
XBLASOBJS_P = $(XBLASOBJS:.$(SUFFIX)=.$(PSUFFIX))
IBLASOBJS_P = $(IBLASOBJS:.$(SUFFIX)=.$(PSUFFIX))
COMMONOBJS_P = $(COMMONOBJS:.$(SUFFIX)=.$(PSUFFIX))
@@ -22,12 +23,18 @@ BLASOBJS += $(QBLASOBJS) $(XBLASOBJS)
BLASOBJS_P += $(QBLASOBJS_P) $(XBLASOBJS_P)
endif
ifdef INTEGER_PRECISION
BLASOBJS += $(IBLASOBJS)
BLASOBJS_P += $(IBLASOBJS_P)
endif
$(SBLASOBJS) $(SBLASOBJS_P) : override CFLAGS += -UDOUBLE -UCOMPLEX
$(DBLASOBJS) $(DBLASOBJS_P) : override CFLAGS += -DDOUBLE -UCOMPLEX
$(QBLASOBJS) $(QBLASOBJS_P) : override CFLAGS += -DXDOUBLE -UCOMPLEX
$(CBLASOBJS) $(CBLASOBJS_P) : override CFLAGS += -UDOUBLE -DCOMPLEX
$(ZBLASOBJS) $(ZBLASOBJS_P) : override CFLAGS += -DDOUBLE -DCOMPLEX
$(XBLASOBJS) $(XBLASOBJS_P) : override CFLAGS += -DXDOUBLE -DCOMPLEX
$(IBLASOBJS) $(IBLASOBJS_P) : override CFLAGS += -DINTEGER -UCOMPLEX
$(SBLASOBJS_P) : override CFLAGS += -DPROFILE $(COMMON_PROF)
$(DBLASOBJS_P) : override CFLAGS += -DPROFILE $(COMMON_PROF)
@@ -35,6 +42,7 @@ $(QBLASOBJS_P) : override CFLAGS += -DPROFILE $(COMMON_PROF)
$(CBLASOBJS_P) : override CFLAGS += -DPROFILE $(COMMON_PROF)
$(ZBLASOBJS_P) : override CFLAGS += -DPROFILE $(COMMON_PROF)
$(XBLASOBJS_P) : override CFLAGS += -DPROFILE $(COMMON_PROF)
$(IBLASOBJS_P) : override CFLAGS += -DPROFILE $(COMMON_PROF)
libs :: $(BLASOBJS) $(COMMONOBJS)
$(AR) $(ARFLAGS) -ru $(TOPDIR)/$(LIBNAME) $^

View File

@@ -55,16 +55,24 @@ Please read GotoBLAS_01Readme.txt
#### x86/x86-64:
- **Intel Xeon 56xx (Westmere)**: Used GotoBLAS2 Nehalem codes.
- **Intel Sandy Bridge**: Optimized Level-3 BLAS with AVX on x86-64.
- **Intel Haswell**: Optimized Level-3 BLAS with AVX on x86-64 (identical to Sandy Bridge).
- **Intel Sandy Bridge**: Optimized Level-3 and Level-2 BLAS with AVX on x86-64.
- **Intel Haswell**: Optimized Level-3 and Level-2 BLAS with AVX2 and FMA on x86-64.
- **AMD Bobcat**: Used GotoBLAS2 Barcelona codes.
- **AMD Bulldozer**: x86-64 S/DGEMM AVX kernels. (Thank Werner Saar)
- **AMD PILEDRIVER**: Used Bulldozer codes.
- **AMD Bulldozer**: x86-64 ?GEMM FMA4 kernels. (Thank Werner Saar)
- **AMD PILEDRIVER**: Uses Bulldozer codes with some optimizations.
- **AMD STEAMROLLER**: Uses Bulldozer codes with some optimizations.
#### MIPS64:
- **ICT Loongson 3A**: Optimized Level-3 BLAS and the part of Level-1,2.
- **ICT Loongson 3B**: Experimental
#### ARM:
- **ARMV6**: Optimized BLAS for vfpv2 and vfpv3-d16 ( e.g. BCM2835, Cortex M0+ )
- **ARMV7**: Optimized BLAS for vfpv3-d32 ( e.g. Cortex A8, A9 and A15 )
#### ARM64:
- **ARMV8**: Experimental
### Support OS:
- **GNU/Linux**
- **MingWin/Windows**: Please read <https://github.com/xianyi/OpenBLAS/wiki/How-to-use-OpenBLAS-in-Microsoft-Visual-Studio>.
@@ -116,8 +124,8 @@ Please see Changelog.txt to obtain the differences between GotoBLAS2 1.13 BSD ve
* Please read [Faq](https://github.com/xianyi/OpenBLAS/wiki/Faq) at first.
* Please use gcc version 4.6 and above to compile Sandy Bridge AVX kernels on Linux/MingW/BSD.
* Please use Clang version 3.1 and above to compile the library on Sandy Bridge microarchitecture. The Clang 3.0 will generate the wrong AVX binary code.
* The number of CPUs/Cores should less than or equal to 256.
* On Linux, OpenBLAS sets the processor affinity by default. This may cause [the conflict with R parallel](https://stat.ethz.ch/pipermail/r-sig-hpc/2012-April/001348.html). You can build the library with NO_AFFINITY=1.
* The number of CPUs/Cores should less than or equal to 256. On Linux x86_64(amd64), there is experimental support for up to 1024 CPUs/Cores and 128 numa nodes if you build the library with BIGNUMA=1.
* OpenBLAS does not set processor affinity by default. On Linux, you can enable processor affinity by commenting the line NO_AFFINITY=1 in Makefile.rule. But this may cause [the conflict with R parallel](https://stat.ethz.ch/pipermail/r-sig-hpc/2012-April/001348.html).
* On Loongson 3A. make test would be failed because of pthread_create error. The error code is EAGAIN. However, it will be OK when you run the same testcase on shell.
## Contributing

View File

@@ -19,6 +19,7 @@ PENRYN
DUNNINGTON
NEHALEM
SANDYBRIDGE
HASWELL
ATOM
b)AMD CPU:
@@ -30,6 +31,9 @@ SHANGHAI
ISTANBUL
BOBCAT
BULLDOZER
PILEDRIVER
STEAMROLLER
EXCAVATOR
c)VIA CPU:
SSE_GENERIC
@@ -59,3 +63,12 @@ ITANIUM2
SPARC
SPARCV7
6.ARM CPU:
CORTEXA15
CORTEXA9
ARMV7
ARMV6
ARMV5
7.ARM 64-bit CPU:
ARMV8

9
benchmark/Make_exe.sh Executable file
View File

@@ -0,0 +1,9 @@
#!/bin/bash
for f in *.goto *.acml *.mkl *.atlas
do
if [ -f "$f" ]; then
mv $f `echo $f|tr '.' '_'`.exe
fi
done

File diff suppressed because it is too large Load Diff

196
benchmark/asum.c Normal file
View File

@@ -0,0 +1,196 @@
/***************************************************************************
Copyright (c) 2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef ASUM
#ifdef COMPLEX
#ifdef DOUBLE
#define ASUM BLASFUNC(dzasum)
#else
#define ASUM BLASFUNC(scasum)
#endif
#else
#ifdef DOUBLE
#define ASUM BLASFUNC(dasum)
#else
#define ASUM BLASFUNC(sasum)
#endif
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *x;
FLOAT result;
blasint m, i;
blasint inc_x=1;
int loops = 1;
int l;
char *p;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1,timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
fprintf(stderr, "From : %3d To : %3d Step = %3d Inc_x = %d Loops = %d\n", from, to, step,inc_x,loops);
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step)
{
timeg=0;
fprintf(stderr, " %6d : ", (int)m);
for (l=0; l<loops; l++)
{
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
result = ASUM (&m, x, &inc_x);
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
timeg /= loops;
#ifdef COMPLEX
fprintf(stderr, " %10.2f MFlops\n", 4. * (double)m / timeg * 1.e-6);
#else
fprintf(stderr, " %10.2f MFlops\n", 2. * (double)m / timeg * 1.e-6);
#endif
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

201
benchmark/axpy.c Normal file
View File

@@ -0,0 +1,201 @@
/***************************************************************************
Copyright (c) 2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef AXPY
#ifdef COMPLEX
#ifdef DOUBLE
#define AXPY BLASFUNC(zaxpy)
#else
#define AXPY BLASFUNC(caxpy)
#endif
#else
#ifdef DOUBLE
#define AXPY BLASFUNC(daxpy)
#else
#define AXPY BLASFUNC(saxpy)
#endif
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *x, *y;
FLOAT alpha[2] = { 2.0, 2.0 };
blasint m, i;
blasint inc_x=1,inc_y=1;
int loops = 1;
int l;
char *p;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1,timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
if ((p = getenv("OPENBLAS_INCY"))) inc_y = atoi(p);
fprintf(stderr, "From : %3d To : %3d Step = %3d Inc_x = %d Inc_y = %d Loops = %d\n", from, to, step,inc_x,inc_y,loops);
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( y = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_y) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step)
{
timeg=0;
fprintf(stderr, " %6d : ", (int)m);
for (l=0; l<loops; l++)
{
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for(i = 0; i < m * COMPSIZE * abs(inc_y); i++){
y[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
AXPY (&m, alpha, x, &inc_x, y, &inc_y );
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
timeg /= loops;
fprintf(stderr,
" %10.2f MFlops\n",
COMPSIZE * COMPSIZE * 2. * (double)m / timeg * 1.e-6);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -71,8 +71,14 @@ double fabs(double);
#endif
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
@@ -99,6 +105,7 @@ int gettimeofday(struct timeval *tv, void *tz){
#endif
static __inline double getmflops(int ratio, int m, double secs){
double mm = (double)m;
@@ -117,7 +124,7 @@ static __inline double getmflops(int ratio, int m, double secs){
}
int MAIN__(int argc, char *argv[]){
int main(int argc, char *argv[]){
#ifndef COMPLEX
char *trans[] = {"T", "N"};
@@ -273,4 +280,4 @@ int MAIN__(int argc, char *argv[]){
return 0;
}
void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

201
benchmark/copy.c Normal file
View File

@@ -0,0 +1,201 @@
/***************************************************************************
Copyright (c) 2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef COPY
#ifdef COMPLEX
#ifdef DOUBLE
#define COPY BLASFUNC(zcopy)
#else
#define COPY BLASFUNC(ccopy)
#endif
#else
#ifdef DOUBLE
#define COPY BLASFUNC(dcopy)
#else
#define COPY BLASFUNC(scopy)
#endif
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *x, *y;
FLOAT alpha[2] = { 2.0, 2.0 };
blasint m, i;
blasint inc_x=1,inc_y=1;
int loops = 1;
int l;
char *p;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1,timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
if ((p = getenv("OPENBLAS_INCY"))) inc_y = atoi(p);
fprintf(stderr, "From : %3d To : %3d Step = %3d Inc_x = %d Inc_y = %d Loops = %d\n", from, to, step,inc_x,inc_y,loops);
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( y = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_y) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step)
{
timeg=0;
fprintf(stderr, " %6d : ", (int)m);
for (l=0; l<loops; l++)
{
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for(i = 0; i < m * COMPSIZE * abs(inc_y); i++){
y[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
COPY (&m, x, &inc_x, y, &inc_y );
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
timeg /= loops;
fprintf(stderr,
" %10.2f MBytes\n",
COMPSIZE * sizeof(FLOAT) * 1. * (double)m / timeg * 1.e-6);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

195
benchmark/dot.c Normal file
View File

@@ -0,0 +1,195 @@
/***************************************************************************
Copyright (c) 2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef DOT
#ifdef DOUBLE
#define DOT BLASFUNC(ddot)
#else
#define DOT BLASFUNC(sdot)
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *x, *y;
FLOAT result;
blasint m, i;
blasint inc_x=1,inc_y=1;
int loops = 1;
int l;
char *p;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1,timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
if ((p = getenv("OPENBLAS_INCY"))) inc_y = atoi(p);
fprintf(stderr, "From : %3d To : %3d Step = %3d Inc_x = %d Inc_y = %d Loops = %d\n", from, to, step,inc_x,inc_y,loops);
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( y = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_y) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step)
{
timeg=0;
fprintf(stderr, " %6d : ", (int)m);
for (l=0; l<loops; l++)
{
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for(i = 0; i < m * COMPSIZE * abs(inc_y); i++){
y[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
result = DOT (&m, x, &inc_x, y, &inc_y );
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
timeg /= loops;
fprintf(stderr,
" %10.2f MFlops\n",
COMPSIZE * COMPSIZE * 2. * (double)m / timeg * 1.e-6);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -139,11 +139,12 @@ static void *huge_malloc(BLASLONG size){
#endif
int MAIN__(int argc, char *argv[]){
int main(int argc, char *argv[]){
FLOAT *a,*vl,*vr,*wi,*wr,*work,*rwork;
FLOAT wkopt[4];
char job='V';
char jobr='N';
char *p;
blasint m, i, j, info,lwork;
@@ -202,9 +203,9 @@ int MAIN__(int argc, char *argv[]){
lwork = -1;
m=to;
#ifndef COMPLEX
GEEV (&job, &job, &m, a, &m, wr, wi, vl, &m, vr, &m, wkopt, &lwork, &info);
GEEV (&job, &jobr, &m, a, &m, wr, wi, vl, &m, vr, &m, wkopt, &lwork, &info);
#else
GEEV (&job, &job, &m, a, &m, wr, vl, &m, vr, &m, wkopt, &lwork,rwork, &info);
GEEV (&job, &jobr, &m, a, &m, wr, vl, &m, vr, &m, wkopt, &lwork,rwork, &info);
#endif
lwork = (blasint)wkopt[0];
@@ -226,16 +227,16 @@ int MAIN__(int argc, char *argv[]){
lwork = -1;
#ifndef COMPLEX
GEEV (&job, &job, &m, a, &m, wr, wi, vl, &m, vr, &m, wkopt, &lwork, &info);
GEEV (&job, &jobr, &m, a, &m, wr, wi, vl, &m, vr, &m, wkopt, &lwork, &info);
#else
GEEV (&job, &job, &m, a, &m, wr, vl, &m, vr, &m, wkopt, &lwork,rwork, &info);
GEEV (&job, &jobr, &m, a, &m, wr, vl, &m, vr, &m, wkopt, &lwork,rwork, &info);
#endif
lwork = (blasint)wkopt[0];
#ifndef COMPLEX
GEEV (&job, &job, &m, a, &m, wr, wi, vl, &m, vr, &m, work, &lwork, &info);
GEEV (&job, &jobr, &m, a, &m, wr, wi, vl, &m, vr, &m, work, &lwork, &info);
#else
GEEV (&job, &job, &m, a, &m, wr, vl, &m, vr, &m, work, &lwork,rwork, &info);
GEEV (&job, &jobr, &m, a, &m, wr, vl, &m, vr, &m, work, &lwork,rwork, &info);
#endif
gettimeofday( &stop, (struct timezone *)0);
@@ -257,4 +258,4 @@ int MAIN__(int argc, char *argv[]){
return 0;
}
void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -118,14 +118,15 @@ static void *huge_malloc(BLASLONG size){
#endif
int MAIN__(int argc, char *argv[]){
int main(int argc, char *argv[]){
FLOAT *a, *b, *c;
FLOAT alpha[] = {1.0, 1.0};
FLOAT beta [] = {1.0, 1.0};
FLOAT beta [] = {0.0, 0.0};
char trans='N';
blasint m, i, j;
blasint m, n, i, j;
int loops = 1;
int has_param_n=0;
int l;
char *p;
@@ -142,7 +143,9 @@ int MAIN__(int argc, char *argv[]){
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
fprintf(stderr, "From : %3d To : %3d Step = %3d\n", from, to, step);
if ((p = getenv("OPENBLAS_TRANS"))) trans=*p;
fprintf(stderr, "From : %3d To : %3d Step=%d : Trans=%c\n", from, to, step, trans);
if (( a = (FLOAT *)malloc(sizeof(FLOAT) * to * to * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
@@ -160,51 +163,61 @@ int MAIN__(int argc, char *argv[]){
if ( p != NULL )
loops = atoi(p);
if ((p = getenv("OPENBLAS_PARAM_N"))) {
n = atoi(p);
has_param_n=1;
}
#ifdef linux
srandom(getpid());
#endif
for(j = 0; j < m; j++){
for(i = 0; i < to * COMPSIZE; i++){
a[i + j * to * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
b[i + j * to * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
c[i + j * to * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
fprintf(stderr, " SIZE Flops\n");
fprintf(stderr, " SIZE Flops Time\n");
for(m = from; m <= to; m += step)
{
timeg=0;
fprintf(stderr, " %6d : ", (int)m);
if ( has_param_n == 1 && n <= m )
n=n;
else
n=m;
fprintf(stderr, " %6dx%d : ", (int)m, (int)n);
gettimeofday( &start, (struct timezone *)0);
for (l=0; l<loops; l++)
{
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
b[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
c[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
gettimeofday( &start, (struct timezone *)0);
GEMM (&trans, &trans, &m, &n, &m, alpha, a, &m, b, &m, beta, c, &m );
GEMM (&trans, &trans, &m, &m, &m, alpha, a, &m, b, &m, beta, c, &m );
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg /= loops;
timeg = time1/loops;
fprintf(stderr,
" %10.2f MFlops\n",
COMPSIZE * COMPSIZE * 2. * (double)m * (double)m * (double)m / timeg * 1.e-6);
" %10.2f MFlops %10.6f sec\n",
COMPSIZE * COMPSIZE * 2. * (double)m * (double)m * (double)n / timeg * 1.e-6, time1);
}
return 0;
}
void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

212
benchmark/gemm3m.c Normal file
View File

@@ -0,0 +1,212 @@
/***************************************************************************
Copyright (c) 2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef GEMM
#ifndef COMPLEX
#ifdef DOUBLE
#define GEMM BLASFUNC(dgemm)
#else
#define GEMM BLASFUNC(sgemm)
#endif
#else
#ifdef DOUBLE
#define GEMM BLASFUNC(zgemm3m)
#else
#define GEMM BLASFUNC(cgemm3m)
#endif
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *a, *b, *c;
FLOAT alpha[] = {1.0, 1.0};
FLOAT beta [] = {1.0, 1.0};
char trans='N';
blasint m, i, j;
int loops = 1;
int l;
char *p;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1,timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
if ((p = getenv("OPENBLAS_TRANS"))) trans=*p;
fprintf(stderr, "From : %3d To : %3d Step=%d : Trans=%c\n", from, to, step, trans);
if (( a = (FLOAT *)malloc(sizeof(FLOAT) * to * to * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( b = (FLOAT *)malloc(sizeof(FLOAT) * to * to * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( c = (FLOAT *)malloc(sizeof(FLOAT) * to * to * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
p = getenv("OPENBLAS_LOOPS");
if ( p != NULL )
loops = atoi(p);
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step)
{
timeg=0;
fprintf(stderr, " %6d : ", (int)m);
for (l=0; l<loops; l++)
{
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
b[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
c[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
gettimeofday( &start, (struct timezone *)0);
GEMM (&trans, &trans, &m, &m, &m, alpha, a, &m, b, &m, beta, c, &m );
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
timeg /= loops;
fprintf(stderr,
" %10.2f MFlops\n",
COMPSIZE * COMPSIZE * 2. * (double)m * (double)m * (double)m / timeg * 1.e-6);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -118,7 +118,7 @@ static void *huge_malloc(BLASLONG size){
#endif
int MAIN__(int argc, char *argv[]){
int main(int argc, char *argv[]){
FLOAT *a, *x, *y;
FLOAT alpha[] = {1.0, 1.0};
@@ -128,6 +128,7 @@ int MAIN__(int argc, char *argv[]){
blasint inc_x=1,inc_y=1;
blasint n=0;
int has_param_n = 0;
int has_param_m = 0;
int loops = 1;
int l;
char *p;
@@ -145,29 +146,38 @@ int MAIN__(int argc, char *argv[]){
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
int tomax = to;
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
if ((p = getenv("OPENBLAS_INCY"))) inc_y = atoi(p);
if ((p = getenv("OPENBLAS_TRANS"))) trans=*p;
if ((p = getenv("OPENBLAS_PARAM_N"))) {
n = atoi(p);
if ((n>0) && (n<=to)) has_param_n = 1;
if ((n>0)) has_param_n = 1;
if ( n > tomax ) tomax = n;
}
if ( has_param_n == 0 )
if ((p = getenv("OPENBLAS_PARAM_M"))) {
m = atoi(p);
if ((m>0)) has_param_m = 1;
if ( m > tomax ) tomax = m;
}
if ( has_param_n == 1 )
fprintf(stderr, "From : %3d To : %3d Step = %3d Trans = '%c' N = %d Inc_x = %d Inc_y = %d Loops = %d\n", from, to, step,trans,n,inc_x,inc_y,loops);
else
fprintf(stderr, "From : %3d To : %3d Step = %3d Trans = '%c' Inc_x = %d Inc_y = %d Loops = %d\n", from, to, step,trans,inc_x,inc_y,loops);
if (( a = (FLOAT *)malloc(sizeof(FLOAT) * to * to * COMPSIZE)) == NULL){
fprintf(stderr, "From : %3d To : %3d Step = %3d Trans = '%c' Inc_x = %d Inc_y = %d Loops = %d\n", from, to, step,trans,inc_x,inc_y,loops);
if (( a = (FLOAT *)malloc(sizeof(FLOAT) * tomax * tomax * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * tomax * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( y = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_y) * COMPSIZE)) == NULL){
if (( y = (FLOAT *)malloc(sizeof(FLOAT) * tomax * abs(inc_y) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
@@ -177,53 +187,83 @@ int MAIN__(int argc, char *argv[]){
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step)
if (has_param_m == 0)
{
timeg=0;
for(m = from; m <= to; m += step)
{
timeg=0;
if ( has_param_n == 0 ) n = m;
fprintf(stderr, " %6dx%d : ", (int)m,(int)n);
for(j = 0; j < m; j++){
for(i = 0; i < n * COMPSIZE; i++){
a[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
if ( has_param_n == 0 ) n = m;
for (l=0; l<loops; l++)
{
fprintf(stderr, " %6dx%d : ", (int)m,(int)n);
for(i = 0; i < n * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for(j = 0; j < m; j++){
for(i = 0; i < n * COMPSIZE; i++){
a[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
for(i = 0; i < n * COMPSIZE * abs(inc_y); i++){
y[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
GEMV (&trans, &m, &n, alpha, a, &m, x, &inc_x, beta, y, &inc_y );
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
for (l=0; l<loops; l++)
{
timeg /= loops;
for(i = 0; i < n * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
fprintf(stderr, " %10.2f MFlops\n", COMPSIZE * COMPSIZE * 2. * (double)m * (double)n / timeg * 1.e-6);
for(i = 0; i < n * COMPSIZE * abs(inc_y); i++){
y[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
}
}
else
{
GEMV (&trans, &m, &n, alpha, a, &m, x, &inc_x, beta, y, &inc_y );
for(n = from; n <= to; n += step)
{
timeg=0;
fprintf(stderr, " %6dx%d : ", (int)m,(int)n);
for(j = 0; j < m; j++){
for(i = 0; i < n * COMPSIZE; i++){
a[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
gettimeofday( &stop, (struct timezone *)0);
for (l=0; l<loops; l++)
{
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
for(i = 0; i < n * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
timeg += time1;
for(i = 0; i < n * COMPSIZE * abs(inc_y); i++){
y[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
GEMV (&trans, &m, &n, alpha, a, &m, x, &inc_x, beta, y, &inc_y );
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
}
timeg /= loops;
timeg /= loops;
fprintf(stderr,
" %10.2f MFlops\n",
COMPSIZE * COMPSIZE * 2. * (double)m * (double)n / timeg * 1.e-6);
fprintf(stderr, " %10.2f MFlops\n", COMPSIZE * COMPSIZE * 2. * (double)m * (double)n / timeg * 1.e-6);
}
}
return 0;
}
void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -35,12 +35,19 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#undef GER
#ifdef COMPLEX
#ifdef DOUBLE
#define GER BLASFUNC(zgeru)
#else
#define GER BLASFUNC(cgeru)
#endif
#else
#ifdef DOUBLE
#define GER BLASFUNC(dger)
#else
#define GER BLASFUNC(sger)
#endif
#endif
#if defined(__WIN32__) || defined(__WIN64__)
@@ -108,7 +115,7 @@ static void *huge_malloc(BLASLONG size){
#endif
int MAIN__(int argc, char *argv[]){
int main(int argc, char *argv[]){
FLOAT *a, *x, *y;
FLOAT alpha[] = {1.0, 1.0};
@@ -214,5 +221,5 @@ int MAIN__(int argc, char *argv[]){
return 0;
}
void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

218
benchmark/gesv.c Normal file
View File

@@ -0,0 +1,218 @@
/*********************************************************************/
/* Copyright 2009, 2010 The University of Texas at Austin. */
/* All rights reserved. */
/* */
/* Redistribution and use in source and binary forms, with or */
/* without modification, are permitted provided that the following */
/* conditions are met: */
/* */
/* 1. Redistributions of source code must retain the above */
/* copyright notice, this list of conditions and the following */
/* disclaimer. */
/* */
/* 2. Redistributions in binary form must reproduce the above */
/* copyright notice, this list of conditions and the following */
/* disclaimer in the documentation and/or other materials */
/* provided with the distribution. */
/* */
/* THIS SOFTWARE IS PROVIDED BY THE UNIVERSITY OF TEXAS AT */
/* AUSTIN ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, */
/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF */
/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE */
/* DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY OF TEXAS AT */
/* AUSTIN OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, */
/* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES */
/* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE */
/* GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR */
/* BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF */
/* LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT */
/* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT */
/* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE */
/* POSSIBILITY OF SUCH DAMAGE. */
/* */
/* The views and conclusions contained in the software and */
/* documentation are those of the authors and should not be */
/* interpreted as representing official policies, either expressed */
/* or implied, of The University of Texas at Austin. */
/*********************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
double fabs(double);
#undef GESV
#undef GETRS
#ifndef COMPLEX
#ifdef XDOUBLE
#define GESV BLASFUNC(qgesv)
#elif defined(DOUBLE)
#define GESV BLASFUNC(dgesv)
#else
#define GESV BLASFUNC(sgesv)
#endif
#else
#ifdef XDOUBLE
#define GESV BLASFUNC(xgesv)
#elif defined(DOUBLE)
#define GESV BLASFUNC(zgesv)
#else
#define GESV BLASFUNC(cgesv)
#endif
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *a, *b;
blasint *ipiv;
blasint m, i, j, info;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
fprintf(stderr, "From : %3d To : %3d Step = %3d\n", from, to, step);
if (( a = (FLOAT *)malloc(sizeof(FLOAT) * to * to * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( b = (FLOAT *)malloc(sizeof(FLOAT) * to * to * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( ipiv = (blasint *)malloc(sizeof(blasint) * to * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops Time\n");
for(m = from; m <= to; m += step){
fprintf(stderr, " %dx%d : ", (int)m, (int)m);
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
b[i + j * m * COMPSIZE] = 0.0;
}
}
for (j = 0; j < m; ++j) {
for (i = 0; i < m * COMPSIZE; ++i) {
b[i] += a[i + j * m * COMPSIZE];
}
}
gettimeofday( &start, (struct timezone *)0);
GESV (&m, &m, a, &m, ipiv, b, &m, &info);
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
fprintf(stderr,
"%10.2f MFlops %10.6f s\n",
COMPSIZE * COMPSIZE * (2. / 3. * (double)m * (double)m * (double)m + 2. * (double)m * (double)m * (double)m ) / (time1) * 1.e-6 , time1);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -137,7 +137,7 @@ static void *huge_malloc(BLASLONG size){
#endif
int MAIN__(int argc, char *argv[]){
int main(int argc, char *argv[]){
FLOAT *a,*work;
FLOAT wkopt[4];
@@ -231,4 +231,4 @@ int MAIN__(int argc, char *argv[]){
return 0;
}
void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -107,7 +107,7 @@ static void *huge_malloc(BLASLONG size){
#endif
int MAIN__(int argc, char *argv[]){
int main(int argc, char *argv[]){
FLOAT *a, *b, *c;
FLOAT alpha[] = {1.0, 1.0};
@@ -189,4 +189,4 @@ int MAIN__(int argc, char *argv[]){
return 0;
}
void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

208
benchmark/hemv.c Normal file
View File

@@ -0,0 +1,208 @@
/***************************************************************************
Copyright (c) 2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef HEMV
#ifdef DOUBLE
#define HEMV BLASFUNC(zhemv)
#else
#define HEMV BLASFUNC(chemv)
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *a, *x, *y;
FLOAT alpha[] = {1.0, 1.0};
FLOAT beta [] = {1.0, 1.0};
char uplo='L';
blasint m, i, j;
blasint inc_x=1,inc_y=1;
int loops = 1;
int l;
char *p;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1,timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
if ((p = getenv("OPENBLAS_INCY"))) inc_y = atoi(p);
if ((p = getenv("OPENBLAS_UPLO"))) uplo=*p;
fprintf(stderr, "From : %3d To : %3d Step = %3d Uplo = '%c' Inc_x = %d Inc_y = %d Loops = %d\n", from, to, step,uplo,inc_x,inc_y,loops);
if (( a = (FLOAT *)malloc(sizeof(FLOAT) * to * to * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( y = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_y) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step)
{
timeg=0;
fprintf(stderr, " %6dx%d : ", (int)m,(int)m);
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
for (l=0; l<loops; l++)
{
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for(i = 0; i < m * COMPSIZE * abs(inc_y); i++){
y[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
HEMV (&uplo, &m, alpha, a, &m, x, &inc_x, beta, y, &inc_y );
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
timeg /= loops;
fprintf(stderr,
" %10.2f MFlops\n",
COMPSIZE * COMPSIZE * 2. * (double)m * (double)m / timeg * 1.e-6);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -106,7 +106,7 @@ static void *huge_malloc(BLASLONG size){
#endif
int MAIN__(int argc, char *argv[]){
int main(int argc, char *argv[]){
FLOAT *a, *b, *c;
FLOAT alpha[] = {1.0, 1.0};
@@ -188,4 +188,4 @@ int MAIN__(int argc, char *argv[]){
return 0;
}
void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -108,7 +108,7 @@ static void *huge_malloc(BLASLONG size){
#endif
int MAIN__(int argc, char *argv[]){
int main(int argc, char *argv[]){
FLOAT *a, *c;
FLOAT alpha[] = {1.0, 1.0};
@@ -186,4 +186,4 @@ int MAIN__(int argc, char *argv[]){
return 0;
}
void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -137,7 +137,7 @@ static void *huge_malloc(BLASLONG size){
#endif
int MAIN__(int argc, char *argv[]){
int main(int argc, char *argv[]){
FLOAT *a, *b;
blasint *ipiv;
@@ -270,4 +270,4 @@ int MAIN__(int argc, char *argv[]){
return 0;
}
void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -52,6 +52,11 @@ C)
awk '/MFlops/ { print $3,int($9) }'|tail --lines=+2
;;
B)
# Copy Benchmark
awk '/MBytes/ { print $1,int($3) }'|tail --lines=+2
;;
*)
awk '/MFlops/ { print $1,int($3) }'|tail --lines=+2

View File

@@ -88,6 +88,10 @@ double fabs(double);
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
@@ -114,7 +118,7 @@ int gettimeofday(struct timeval *tv, void *tz){
#endif
int MAIN__(int argc, char *argv[]){
int main(int argc, char *argv[]){
#ifndef COMPLEX
char *trans[] = {"T", "N"};
@@ -278,5 +282,5 @@ int MAIN__(int argc, char *argv[]){
return 0;
}
void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

202
benchmark/scal.c Normal file
View File

@@ -0,0 +1,202 @@
/***************************************************************************
Copyright (c) 2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef SCAL
#ifdef COMPLEX
#ifdef DOUBLE
#define SCAL BLASFUNC(zscal)
#else
#define SCAL BLASFUNC(cscal)
#endif
#else
#ifdef DOUBLE
#define SCAL BLASFUNC(dscal)
#else
#define SCAL BLASFUNC(sscal)
#endif
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *x, *y;
FLOAT alpha[2] = { 2.0, 2.0 };
blasint m, i;
blasint inc_x=1,inc_y=1;
int loops = 1;
int l;
char *p;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1,timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
fprintf(stderr, "From : %3d To : %3d Step = %3d Inc_x = %d Inc_y = %d Loops = %d\n", from, to, step,inc_x,inc_y,loops);
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( y = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_y) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step)
{
timeg=0;
fprintf(stderr, " %6d : ", (int)m);
for (l=0; l<loops; l++)
{
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for(i = 0; i < m * COMPSIZE * abs(inc_y); i++){
y[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
SCAL (&m, alpha, x, &inc_x);
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
timeg /= loops;
#ifdef COMPLEX
fprintf(stderr, " %10.2f MFlops\n", 6. * (double)m / timeg * 1.e-6);
#else
fprintf(stderr, " %10.2f MFlops\n", 1. * (double)m / timeg * 1.e-6);
#endif
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -0,0 +1,56 @@
#!/usr/bin/python
import os
import sys
import time
import numpy
from numpy.random import randn
def run_cgemm(N,l):
A = randn(N,N).astype('float32') + randn(N,N).astype('float32') * 1j;
B = randn(N,N).astype('float32') + randn(N,N).astype('float32') * 1j;
start = time.time();
for i in range(0,l):
ref = numpy.dot(A,B)
end = time.time()
timediff = (end -start)
mflops = ( 8*N*N*N) *l / timediff
mflops *= 1e-6
size = "%dx%d" % (N,N)
print("%14s :\t%20f MFlops\t%20f sec" % (size,mflops,timediff))
if __name__ == "__main__":
N=128
NMAX=2048
NINC=128
LOOPS=1
z=0
for arg in sys.argv:
if z == 1:
N = int(arg)
elif z == 2:
NMAX = int(arg)
elif z == 3:
NINC = int(arg)
elif z == 4:
LOOPS = int(arg)
z = z + 1
if 'OPENBLAS_LOOPS' in os.environ:
p = os.environ['OPENBLAS_LOOPS']
if p:
LOOPS = int(p);
print("From: %d To: %d Step=%d Loops=%d" % (N, NMAX, NINC, LOOPS))
print("\tSIZE\t\t\tFlops\t\t\t\t\tTime")
for i in range (N,NMAX+NINC,NINC):
run_cgemm(i,LOOPS)

View File

@@ -0,0 +1,56 @@
#!/usr/bin/python
import os
import sys
import time
import numpy
from numpy.random import randn
def run_cgemv(N,l):
A = randn(N,N).astype('float32') + randn(N,N).astype('float32') * 1j;
B = randn(N).astype('float32') + randn(N).astype('float32') * 1j;
start = time.time();
for i in range(0,l):
ref = numpy.dot(A,B)
end = time.time()
timediff = (end -start)
mflops = ( 8*N*N) *l / timediff
mflops *= 1e-6
size = "%dx%d" % (N,N)
print("%14s :\t%20f MFlops\t%20f sec" % (size,mflops,timediff))
if __name__ == "__main__":
N=128
NMAX=2048
NINC=128
LOOPS=1
z=0
for arg in sys.argv:
if z == 1:
N = int(arg)
elif z == 2:
NMAX = int(arg)
elif z == 3:
NINC = int(arg)
elif z == 4:
LOOPS = int(arg)
z = z + 1
if 'OPENBLAS_LOOPS' in os.environ:
p = os.environ['OPENBLAS_LOOPS']
if p:
LOOPS = int(p);
print("From: %d To: %d Step=%d Loops=%d" % (N, NMAX, NINC, LOOPS))
print("\tSIZE\t\t\tFlops\t\t\t\t\tTime")
for i in range (N,NMAX+NINC,NINC):
run_cgemv(i,LOOPS)

View File

@@ -0,0 +1,58 @@
#!/usr/bin/python
import os
import sys
import time
import numpy
from numpy.random import randn
from scipy.linalg.blas import daxpy
def run_daxpy(N,l):
x = randn(N).astype('float64')
y = randn(N).astype('float64')
start = time.time();
for i in range(0,l):
y = daxpy(x,y, a=2.0 )
end = time.time()
timediff = (end -start)
mflops = ( 2*N ) *l / timediff
mflops *= 1e-6
size = "%d" % (N)
print("%14s :\t%20f MFlops\t%20f sec" % (size,mflops,timediff))
if __name__ == "__main__":
N=128
NMAX=2048
NINC=128
LOOPS=1
z=0
for arg in sys.argv:
if z == 1:
N = int(arg)
elif z == 2:
NMAX = int(arg)
elif z == 3:
NINC = int(arg)
elif z == 4:
LOOPS = int(arg)
z = z + 1
if 'OPENBLAS_LOOPS' in os.environ:
p = os.environ['OPENBLAS_LOOPS']
if p:
LOOPS = int(p);
print("From: %d To: %d Step=%d Loops=%d" % (N, NMAX, NINC, LOOPS))
print("\tSIZE\t\t\tFlops\t\t\t\t\tTime")
for i in range (N,NMAX+NINC,NINC):
run_daxpy(i,LOOPS)

56
benchmark/scripts/NUMPY/ddot.py Executable file
View File

@@ -0,0 +1,56 @@
#!/usr/bin/python
import os
import sys
import time
import numpy
from numpy.random import randn
def run_ddot(N,l):
A = randn(N).astype('float64')
B = randn(N).astype('float64')
start = time.time();
for i in range(0,l):
ref = numpy.dot(A,B)
end = time.time()
timediff = (end -start)
mflops = ( 2*N ) *l / timediff
mflops *= 1e-6
size = "%d" % (N)
print("%14s :\t%20f MFlops\t%20f sec" % (size,mflops,timediff))
if __name__ == "__main__":
N=128
NMAX=2048
NINC=128
LOOPS=1
z=0
for arg in sys.argv:
if z == 1:
N = int(arg)
elif z == 2:
NMAX = int(arg)
elif z == 3:
NINC = int(arg)
elif z == 4:
LOOPS = int(arg)
z = z + 1
if 'OPENBLAS_LOOPS' in os.environ:
p = os.environ['OPENBLAS_LOOPS']
if p:
LOOPS = int(p);
print("From: %d To: %d Step=%d Loops=%d" % (N, NMAX, NINC, LOOPS))
print("\tSIZE\t\t\tFlops\t\t\t\t\tTime")
for i in range (N,NMAX+NINC,NINC):
run_ddot(i,LOOPS)

55
benchmark/scripts/NUMPY/deig.py Executable file
View File

@@ -0,0 +1,55 @@
#!/usr/bin/python
import os
import sys
import time
import numpy
from numpy.random import randn
def run_deig(N,l):
A = randn(N,N).astype('float64')
start = time.time();
for i in range(0,l):
la,v = numpy.linalg.eig(A)
end = time.time()
timediff = (end -start)
mflops = ( 26.33 *N*N*N) *l / timediff
mflops *= 1e-6
size = "%dx%d" % (N,N)
print("%14s :\t%20f MFlops\t%20f sec" % (size,mflops,timediff))
if __name__ == "__main__":
N=128
NMAX=2048
NINC=128
LOOPS=1
z=0
for arg in sys.argv:
if z == 1:
N = int(arg)
elif z == 2:
NMAX = int(arg)
elif z == 3:
NINC = int(arg)
elif z == 4:
LOOPS = int(arg)
z = z + 1
if 'OPENBLAS_LOOPS' in os.environ:
p = os.environ['OPENBLAS_LOOPS']
if p:
LOOPS = int(p);
print("From: %d To: %d Step=%d Loops=%d" % (N, NMAX, NINC, LOOPS))
print("\tSIZE\t\t\tFlops\t\t\t\t\tTime")
for i in range (N,NMAX+NINC,NINC):
run_deig(i,LOOPS)

View File

@@ -0,0 +1,56 @@
#!/usr/bin/python
import os
import sys
import time
import numpy
from numpy.random import randn
def run_dgemm(N,l):
A = randn(N,N).astype('float64')
B = randn(N,N).astype('float64')
start = time.time();
for i in range(0,l):
ref = numpy.dot(A,B)
end = time.time()
timediff = (end -start)
mflops = ( 2*N*N*N) *l / timediff
mflops *= 1e-6
size = "%dx%d" % (N,N)
print("%14s :\t%20f MFlops\t%20f sec" % (size,mflops,timediff))
if __name__ == "__main__":
N=128
NMAX=2048
NINC=128
LOOPS=1
z=0
for arg in sys.argv:
if z == 1:
N = int(arg)
elif z == 2:
NMAX = int(arg)
elif z == 3:
NINC = int(arg)
elif z == 4:
LOOPS = int(arg)
z = z + 1
if 'OPENBLAS_LOOPS' in os.environ:
p = os.environ['OPENBLAS_LOOPS']
if p:
LOOPS = int(p);
print("From: %d To: %d Step=%d Loops=%d" % (N, NMAX, NINC, LOOPS))
print("\tSIZE\t\t\tFlops\t\t\t\t\tTime")
for i in range (N,NMAX+NINC,NINC):
run_dgemm(i,LOOPS)

View File

@@ -0,0 +1,56 @@
#!/usr/bin/python
import os
import sys
import time
import numpy
from numpy.random import randn
def run_dgemv(N,l):
A = randn(N,N).astype('float64')
B = randn(N).astype('float64')
start = time.time();
for i in range(0,l):
ref = numpy.dot(A,B)
end = time.time()
timediff = (end -start)
mflops = ( 2*N*N) *l / timediff
mflops *= 1e-6
size = "%dx%d" % (N,N)
print("%14s :\t%20f MFlops\t%20f sec" % (size,mflops,timediff))
if __name__ == "__main__":
N=128
NMAX=2048
NINC=128
LOOPS=1
z=0
for arg in sys.argv:
if z == 1:
N = int(arg)
elif z == 2:
NMAX = int(arg)
elif z == 3:
NINC = int(arg)
elif z == 4:
LOOPS = int(arg)
z = z + 1
if 'OPENBLAS_LOOPS' in os.environ:
p = os.environ['OPENBLAS_LOOPS']
if p:
LOOPS = int(p);
print("From: %d To: %d Step=%d Loops=%d" % (N, NMAX, NINC, LOOPS))
print("\tSIZE\t\t\tFlops\t\t\t\t\tTime")
for i in range (N,NMAX+NINC,NINC):
run_dgemv(i,LOOPS)

View File

@@ -0,0 +1,58 @@
#!/usr/bin/python
import os
import sys
import time
import numpy
from numpy.random import randn
from scipy.linalg.lapack import dgesv
def run_dgesv(N,l):
a = randn(N,N).astype('float64')
b = randn(N,N).astype('float64')
start = time.time();
for i in range(0,l):
dgesv(a,b,1,1)
end = time.time()
timediff = (end -start)
mflops = ( 2.0/3.0 *N*N*N + 2.0*N*N*N) *l / timediff
mflops *= 1e-6
size = "%dx%d" % (N,N)
print("%14s :\t%20f MFlops\t%20f sec" % (size,mflops,timediff))
if __name__ == "__main__":
N=128
NMAX=2048
NINC=128
LOOPS=1
z=0
for arg in sys.argv:
if z == 1:
N = int(arg)
elif z == 2:
NMAX = int(arg)
elif z == 3:
NINC = int(arg)
elif z == 4:
LOOPS = int(arg)
z = z + 1
if 'OPENBLAS_LOOPS' in os.environ:
p = os.environ['OPENBLAS_LOOPS']
if p:
LOOPS = int(p);
print("From: %d To: %d Step=%d Loops=%d" % (N, NMAX, NINC, LOOPS))
print("\tSIZE\t\t\tFlops\t\t\t\t\tTime")
for i in range (N,NMAX+NINC,NINC):
run_dgesv(i,LOOPS)

View File

@@ -0,0 +1,56 @@
#!/usr/bin/python
import os
import sys
import time
import numpy
from numpy.random import randn
def run_dsolve(N,l):
A = randn(N,N).astype('float64')
B = randn(N,N).astype('float64')
start = time.time();
for i in range(0,l):
ref = numpy.linalg.solve(A,B)
end = time.time()
timediff = (end -start)
mflops = ( 2.0/3.0 *N*N*N + 2.0*N*N*N) *l / timediff
mflops *= 1e-6
size = "%dx%d" % (N,N)
print("%14s :\t%20f MFlops\t%20f sec" % (size,mflops,timediff))
if __name__ == "__main__":
N=128
NMAX=2048
NINC=128
LOOPS=1
z=0
for arg in sys.argv:
if z == 1:
N = int(arg)
elif z == 2:
NMAX = int(arg)
elif z == 3:
NINC = int(arg)
elif z == 4:
LOOPS = int(arg)
z = z + 1
if 'OPENBLAS_LOOPS' in os.environ:
p = os.environ['OPENBLAS_LOOPS']
if p:
LOOPS = int(p);
print("From: %d To: %d Step=%d Loops=%d" % (N, NMAX, NINC, LOOPS))
print("\tSIZE\t\t\tFlops\t\t\t\t\tTime")
for i in range (N,NMAX+NINC,NINC):
run_dsolve(i,LOOPS)

56
benchmark/scripts/NUMPY/sdot.py Executable file
View File

@@ -0,0 +1,56 @@
#!/usr/bin/python
import os
import sys
import time
import numpy
from numpy.random import randn
def run_sdot(N,l):
A = randn(N).astype('float32')
B = randn(N).astype('float32')
start = time.time();
for i in range(0,l):
ref = numpy.dot(A,B)
end = time.time()
timediff = (end -start)
mflops = ( 2*N ) *l / timediff
mflops *= 1e-6
size = "%d" % (N)
print("%14s :\t%20f MFlops\t%20f sec" % (size,mflops,timediff))
if __name__ == "__main__":
N=128
NMAX=2048
NINC=128
LOOPS=1
z=0
for arg in sys.argv:
if z == 1:
N = int(arg)
elif z == 2:
NMAX = int(arg)
elif z == 3:
NINC = int(arg)
elif z == 4:
LOOPS = int(arg)
z = z + 1
if 'OPENBLAS_LOOPS' in os.environ:
p = os.environ['OPENBLAS_LOOPS']
if p:
LOOPS = int(p);
print("From: %d To: %d Step=%d Loops=%d" % (N, NMAX, NINC, LOOPS))
print("\tSIZE\t\t\tFlops\t\t\t\t\tTime")
for i in range (N,NMAX+NINC,NINC):
run_sdot(i,LOOPS)

View File

@@ -0,0 +1,56 @@
#!/usr/bin/python
import os
import sys
import time
import numpy
from numpy.random import randn
def run_sgemm(N,l):
A = randn(N,N).astype('float32')
B = randn(N,N).astype('float32')
start = time.time();
for i in range(0,l):
ref = numpy.dot(A,B)
end = time.time()
timediff = (end -start)
mflops = ( 2*N*N*N) *l / timediff
mflops *= 1e-6
size = "%dx%d" % (N,N)
print("%14s :\t%20f MFlops\t%20f sec" % (size,mflops,timediff))
if __name__ == "__main__":
N=128
NMAX=2048
NINC=128
LOOPS=1
z=0
for arg in sys.argv:
if z == 1:
N = int(arg)
elif z == 2:
NMAX = int(arg)
elif z == 3:
NINC = int(arg)
elif z == 4:
LOOPS = int(arg)
z = z + 1
if 'OPENBLAS_LOOPS' in os.environ:
p = os.environ['OPENBLAS_LOOPS']
if p:
LOOPS = int(p);
print("From: %d To: %d Step=%d Loops=%d" % (N, NMAX, NINC, LOOPS))
print("\tSIZE\t\t\tFlops\t\t\t\t\tTime")
for i in range (N,NMAX+NINC,NINC):
run_sgemm(i,LOOPS)

View File

@@ -0,0 +1,56 @@
#!/usr/bin/python
import os
import sys
import time
import numpy
from numpy.random import randn
def run_sgemv(N,l):
A = randn(N,N).astype('float32')
B = randn(N).astype('float32')
start = time.time();
for i in range(0,l):
ref = numpy.dot(A,B)
end = time.time()
timediff = (end -start)
mflops = ( 2*N*N) *l / timediff
mflops *= 1e-6
size = "%dx%d" % (N,N)
print("%14s :\t%20f MFlops\t%20f sec" % (size,mflops,timediff))
if __name__ == "__main__":
N=128
NMAX=2048
NINC=128
LOOPS=1
z=0
for arg in sys.argv:
if z == 1:
N = int(arg)
elif z == 2:
NMAX = int(arg)
elif z == 3:
NINC = int(arg)
elif z == 4:
LOOPS = int(arg)
z = z + 1
if 'OPENBLAS_LOOPS' in os.environ:
p = os.environ['OPENBLAS_LOOPS']
if p:
LOOPS = int(p);
print("From: %d To: %d Step=%d Loops=%d" % (N, NMAX, NINC, LOOPS))
print("\tSIZE\t\t\tFlops\t\t\t\t\tTime")
for i in range (N,NMAX+NINC,NINC):
run_sgemv(i,LOOPS)

View File

@@ -0,0 +1,56 @@
#!/usr/bin/python
import os
import sys
import time
import numpy
from numpy.random import randn
def run_zgemm(N,l):
A = randn(N,N).astype('float64') + randn(N,N).astype('float64') * 1j;
B = randn(N,N).astype('float64') + randn(N,N).astype('float64') * 1j;
start = time.time();
for i in range(0,l):
ref = numpy.dot(A,B)
end = time.time()
timediff = (end -start)
mflops = ( 8*N*N*N) *l / timediff
mflops *= 1e-6
size = "%dx%d" % (N,N)
print("%14s :\t%20f MFlops\t%20f sec" % (size,mflops,timediff))
if __name__ == "__main__":
N=128
NMAX=2048
NINC=128
LOOPS=1
z=0
for arg in sys.argv:
if z == 1:
N = int(arg)
elif z == 2:
NMAX = int(arg)
elif z == 3:
NINC = int(arg)
elif z == 4:
LOOPS = int(arg)
z = z + 1
if 'OPENBLAS_LOOPS' in os.environ:
p = os.environ['OPENBLAS_LOOPS']
if p:
LOOPS = int(p);
print("From: %d To: %d Step=%d Loops=%d" % (N, NMAX, NINC, LOOPS))
print("\tSIZE\t\t\tFlops\t\t\t\t\tTime")
for i in range (N,NMAX+NINC,NINC):
run_zgemm(i,LOOPS)

View File

@@ -0,0 +1,56 @@
#!/usr/bin/python
import os
import sys
import time
import numpy
from numpy.random import randn
def run_zgemv(N,l):
A = randn(N,N).astype('float64') + randn(N,N).astype('float64') * 1j;
B = randn(N).astype('float64') + randn(N).astype('float64') * 1j;
start = time.time();
for i in range(0,l):
ref = numpy.dot(A,B)
end = time.time()
timediff = (end -start)
mflops = ( 8*N*N) *l / timediff
mflops *= 1e-6
size = "%dx%d" % (N,N)
print("%14s :\t%20f MFlops\t%20f sec" % (size,mflops,timediff))
if __name__ == "__main__":
N=128
NMAX=2048
NINC=128
LOOPS=1
z=0
for arg in sys.argv:
if z == 1:
N = int(arg)
elif z == 2:
NMAX = int(arg)
elif z == 3:
NINC = int(arg)
elif z == 4:
LOOPS = int(arg)
z = z + 1
if 'OPENBLAS_LOOPS' in os.environ:
p = os.environ['OPENBLAS_LOOPS']
if p:
LOOPS = int(p);
print("From: %d To: %d Step=%d Loops=%d" % (N, NMAX, NINC, LOOPS))
print("\tSIZE\t\t\tFlops\t\t\t\t\tTime")
for i in range (N,NMAX+NINC,NINC):
run_zgemv(i,LOOPS)

View File

@@ -0,0 +1,56 @@
#!/usr/bin/octave --silent
nfrom = 128 ;
nto = 2048;
nstep = 128;
loops = 1;
arg_list = argv();
for i = 1:nargin
switch(i)
case 1
nfrom = str2num(arg_list{i});
case 2
nto = str2num(arg_list{i});
case 3
nstep = str2num(arg_list{i});
case 4
loops = str2num(arg_list{i});
endswitch
endfor
p = getenv("OPENBLAS_LOOPS");
if p
loops = str2num(p);
endif
printf("From %d To %d Step=%d Loops=%d\n",nfrom, nto, nstep, loops);
printf(" SIZE FLOPS TIME\n");
n = nfrom;
while n <= nto
A = single(rand(n,n)) + single(rand(n,n)) * 1i;
B = single(rand(n,n)) + single(rand(n,n)) * 1i;
start = clock();
l=0;
while l < loops
C = A * B;
l = l + 1;
endwhile
timeg = etime(clock(), start);
mflops = ( 4.0 * 2.0*n*n*n *loops ) / ( timeg * 1.0e6 );
st1 = sprintf("%dx%d : ", n,n);
printf("%20s %10.2f MFlops %10.6f sec\n", st1, mflops, timeg);
n = n + nstep;
endwhile

View File

@@ -0,0 +1,56 @@
#!/usr/bin/octave --silent
nfrom = 128 ;
nto = 2048;
nstep = 128;
loops = 1;
arg_list = argv();
for i = 1:nargin
switch(i)
case 1
nfrom = str2num(arg_list{i});
case 2
nto = str2num(arg_list{i});
case 3
nstep = str2num(arg_list{i});
case 4
loops = str2num(arg_list{i});
endswitch
endfor
p = getenv("OPENBLAS_LOOPS");
if p
loops = str2num(p);
endif
printf("From %d To %d Step=%d Loops=%d\n",nfrom, nto, nstep, loops);
printf(" SIZE FLOPS TIME\n");
n = nfrom;
while n <= nto
A = single(rand(n,n)) + single(rand(n,n)) * 1i;
B = single(rand(n,1)) + single(rand(n,1)) * 1i;
start = clock();
l=0;
while l < loops
C = A * B;
l = l + 1;
endwhile
timeg = etime(clock(), start);
mflops = ( 4.0 * 2.0*n*n *loops ) / ( timeg * 1.0e6 );
st1 = sprintf("%dx%d : ", n,n);
printf("%20s %10.2f MFlops %10.6f sec\n", st1, mflops, timeg);
n = n + nstep;
endwhile

56
benchmark/scripts/OCTAVE/deig.m Executable file
View File

@@ -0,0 +1,56 @@
#!/usr/bin/octave --silent
nfrom = 128 ;
nto = 2048;
nstep = 128;
loops = 1;
arg_list = argv();
for i = 1:nargin
switch(i)
case 1
nfrom = str2num(arg_list{i});
case 2
nto = str2num(arg_list{i});
case 3
nstep = str2num(arg_list{i});
case 4
loops = str2num(arg_list{i});
endswitch
endfor
p = getenv("OPENBLAS_LOOPS");
if p
loops = str2num(p);
endif
printf("From %d To %d Step=%d Loops=%d\n",nfrom, nto, nstep, loops);
printf(" SIZE FLOPS TIME\n");
n = nfrom;
while n <= nto
A = double(rand(n,n));
start = clock();
l=0;
while l < loops
[V,lambda] = eig(A);
l = l + 1;
endwhile
timeg = etime(clock(), start);
mflops = ( 26.33 *n*n*n ) *loops / ( timeg * 1.0e6 );
st1 = sprintf("%dx%d : ", n,n);
printf("%20s %10.2f MFlops %10.6f sec\n", st1, mflops, timeg );
n = n + nstep;
endwhile

View File

@@ -0,0 +1,56 @@
#!/usr/bin/octave --silent
nfrom = 128 ;
nto = 2048;
nstep = 128;
loops = 1;
arg_list = argv();
for i = 1:nargin
switch(i)
case 1
nfrom = str2num(arg_list{i});
case 2
nto = str2num(arg_list{i});
case 3
nstep = str2num(arg_list{i});
case 4
loops = str2num(arg_list{i});
endswitch
endfor
p = getenv("OPENBLAS_LOOPS");
if p
loops = str2num(p);
endif
printf("From %d To %d Step=%d Loops=%d\n",nfrom, nto, nstep, loops);
printf(" SIZE FLOPS TIME\n");
n = nfrom;
while n <= nto
A = double(rand(n,n));
B = double(rand(n,n));
start = clock();
l=0;
while l < loops
C = A * B;
l = l + 1;
endwhile
timeg = etime(clock(), start);
mflops = ( 2.0*n*n*n *loops ) / ( timeg * 1.0e6 );
st1 = sprintf("%dx%d : ", n,n);
printf("%20s %10.2f MFlops %10.6f sec\n", st1, mflops, timeg);
n = n + nstep;
endwhile

View File

@@ -0,0 +1,56 @@
#!/usr/bin/octave --silent
nfrom = 128 ;
nto = 2048;
nstep = 128;
loops = 1;
arg_list = argv();
for i = 1:nargin
switch(i)
case 1
nfrom = str2num(arg_list{i});
case 2
nto = str2num(arg_list{i});
case 3
nstep = str2num(arg_list{i});
case 4
loops = str2num(arg_list{i});
endswitch
endfor
p = getenv("OPENBLAS_LOOPS");
if p
loops = str2num(p);
endif
printf("From %d To %d Step=%d Loops=%d\n",nfrom, nto, nstep, loops);
printf(" SIZE FLOPS TIME\n");
n = nfrom;
while n <= nto
A = double(rand(n,n));
B = double(rand(n,1));
start = clock();
l=0;
while l < loops
C = A * B;
l = l + 1;
endwhile
timeg = etime(clock(), start);
mflops = ( 2.0*n*n *loops ) / ( timeg * 1.0e6 );
st1 = sprintf("%dx%d : ", n,n);
printf("%20s %10.2f MFlops %10.6f sec\n", st1, mflops, timeg);
n = n + nstep;
endwhile

View File

@@ -0,0 +1,59 @@
#!/usr/bin/octave --silent
nfrom = 128 ;
nto = 2048;
nstep = 128;
loops = 1;
arg_list = argv();
for i = 1:nargin
switch(i)
case 1
nfrom = str2num(arg_list{i});
case 2
nto = str2num(arg_list{i});
case 3
nstep = str2num(arg_list{i});
case 4
loops = str2num(arg_list{i});
endswitch
endfor
p = getenv("OPENBLAS_LOOPS");
if p
loops = str2num(p);
endif
printf("From %d To %d Step=%d Loops=%d\n",nfrom, nto, nstep, loops);
printf(" SIZE FLOPS TIME\n");
n = nfrom;
while n <= nto
A = double(rand(n,n));
B = double(rand(n,n));
start = clock();
l=0;
while l < loops
x = linsolve(A,B);
#x = A / B;
l = l + 1;
endwhile
timeg = etime(clock(), start);
#r = norm(A*x - B)/norm(B)
mflops = ( 2.0/3.0 *n*n*n + 2.0*n*n*n ) *loops / ( timeg * 1.0e6 );
st1 = sprintf("%dx%d : ", n,n);
printf("%20s %10.2f MFlops %10.6f sec\n", st1, mflops, timeg );
n = n + nstep;
endwhile

View File

@@ -0,0 +1,56 @@
#!/usr/bin/octave --silent
nfrom = 128 ;
nto = 2048;
nstep = 128;
loops = 1;
arg_list = argv();
for i = 1:nargin
switch(i)
case 1
nfrom = str2num(arg_list{i});
case 2
nto = str2num(arg_list{i});
case 3
nstep = str2num(arg_list{i});
case 4
loops = str2num(arg_list{i});
endswitch
endfor
p = getenv("OPENBLAS_LOOPS");
if p
loops = str2num(p);
endif
printf("From %d To %d Step=%d Loops=%d\n",nfrom, nto, nstep, loops);
printf(" SIZE FLOPS TIME\n");
n = nfrom;
while n <= nto
A = single(rand(n,n));
B = single(rand(n,n));
start = clock();
l=0;
while l < loops
C = A * B;
l = l + 1;
endwhile
timeg = etime(clock(), start);
mflops = ( 2.0*n*n*n *loops ) / ( timeg * 1.0e6 );
st1 = sprintf("%dx%d : ", n,n);
printf("%20s %10.2f MFlops %10.6f sec\n", st1, mflops, timeg);
n = n + nstep;
endwhile

View File

@@ -0,0 +1,56 @@
#!/usr/bin/octave --silent
nfrom = 128 ;
nto = 2048;
nstep = 128;
loops = 1;
arg_list = argv();
for i = 1:nargin
switch(i)
case 1
nfrom = str2num(arg_list{i});
case 2
nto = str2num(arg_list{i});
case 3
nstep = str2num(arg_list{i});
case 4
loops = str2num(arg_list{i});
endswitch
endfor
p = getenv("OPENBLAS_LOOPS");
if p
loops = str2num(p);
endif
printf("From %d To %d Step=%d Loops=%d\n",nfrom, nto, nstep, loops);
printf(" SIZE FLOPS TIME\n");
n = nfrom;
while n <= nto
A = single(rand(n,n));
B = single(rand(n,1));
start = clock();
l=0;
while l < loops
C = A * B;
l = l + 1;
endwhile
timeg = etime(clock(), start);
mflops = ( 2.0*n*n *loops ) / ( timeg * 1.0e6 );
st1 = sprintf("%dx%d : ", n,n);
printf("%20s %10.2f MFlops %10.6f sec\n", st1, mflops, timeg);
n = n + nstep;
endwhile

View File

@@ -0,0 +1,56 @@
#!/usr/bin/octave --silent
nfrom = 128 ;
nto = 2048;
nstep = 128;
loops = 1;
arg_list = argv();
for i = 1:nargin
switch(i)
case 1
nfrom = str2num(arg_list{i});
case 2
nto = str2num(arg_list{i});
case 3
nstep = str2num(arg_list{i});
case 4
loops = str2num(arg_list{i});
endswitch
endfor
p = getenv("OPENBLAS_LOOPS");
if p
loops = str2num(p);
endif
printf("From %d To %d Step=%d Loops=%d\n",nfrom, nto, nstep, loops);
printf(" SIZE FLOPS TIME\n");
n = nfrom;
while n <= nto
A = double(rand(n,n)) + double(rand(n,n)) * 1i;
B = double(rand(n,n)) + double(rand(n,n)) * 1i;
start = clock();
l=0;
while l < loops
C = A * B;
l = l + 1;
endwhile
timeg = etime(clock(), start);
mflops = ( 4.0 * 2.0*n*n*n *loops ) / ( timeg * 1.0e6 );
st1 = sprintf("%dx%d : ", n,n);
printf("%20s %10.2f MFlops %10.6f sec\n", st1, mflops, timeg);
n = n + nstep;
endwhile

View File

@@ -0,0 +1,56 @@
#!/usr/bin/octave --silent
nfrom = 128 ;
nto = 2048;
nstep = 128;
loops = 1;
arg_list = argv();
for i = 1:nargin
switch(i)
case 1
nfrom = str2num(arg_list{i});
case 2
nto = str2num(arg_list{i});
case 3
nstep = str2num(arg_list{i});
case 4
loops = str2num(arg_list{i});
endswitch
endfor
p = getenv("OPENBLAS_LOOPS");
if p
loops = str2num(p);
endif
printf("From %d To %d Step=%d Loops=%d\n",nfrom, nto, nstep, loops);
printf(" SIZE FLOPS TIME\n");
n = nfrom;
while n <= nto
A = double(rand(n,n)) + double(rand(n,n)) * 1i;
B = double(rand(n,1)) + double(rand(n,1)) * 1i;
start = clock();
l=0;
while l < loops
C = A * B;
l = l + 1;
endwhile
timeg = etime(clock(), start);
mflops = ( 4.0 * 2.0*n*n *loops ) / ( timeg * 1.0e6 );
st1 = sprintf("%dx%d : ", n,n);
printf("%20s %10.2f MFlops %10.6f sec\n", st1, mflops, timeg);
n = n + nstep;
endwhile

62
benchmark/scripts/R/deig.R Executable file
View File

@@ -0,0 +1,62 @@
#!/usr/bin/Rscript
argv <- commandArgs(trailingOnly = TRUE)
nfrom = 128
nto = 2048
nstep = 128
loops = 1
if ( length(argv) > 0 ) {
for ( z in 1:length(argv) ) {
if ( z == 1 ) {
nfrom <- as.numeric(argv[z])
} else if ( z==2 ) {
nto <- as.numeric(argv[z])
} else if ( z==3 ) {
nstep <- as.numeric(argv[z])
} else if ( z==4 ) {
loops <- as.numeric(argv[z])
}
}
}
p=Sys.getenv("OPENBLAS_LOOPS")
if ( p != "" ) {
loops <- as.numeric(p)
}
cat(sprintf("From %.0f To %.0f Step=%.0f Loops=%.0f\n",nfrom, nto, nstep, loops))
cat(sprintf(" SIZE Flops Time\n"))
n = nfrom
while ( n <= nto ) {
A <- matrix(runif(n*n), ncol = n, nrow = n, byrow = TRUE)
l = 1
start <- proc.time()[3]
while ( l <= loops ) {
ev <- eigen(A)
l = l + 1
}
end <- proc.time()[3]
timeg = end - start
mflops = (26.66 *n*n*n ) * loops / ( timeg * 1.0e6 )
st = sprintf("%.0fx%.0f :",n , n)
cat(sprintf("%20s %10.2f MFlops %10.6f sec\n", st, mflops, timeg))
n = n + nstep
}

63
benchmark/scripts/R/dgemm.R Executable file
View File

@@ -0,0 +1,63 @@
#!/usr/bin/Rscript
argv <- commandArgs(trailingOnly = TRUE)
nfrom = 128
nto = 2048
nstep = 128
loops = 1
if ( length(argv) > 0 ) {
for ( z in 1:length(argv) ) {
if ( z == 1 ) {
nfrom <- as.numeric(argv[z])
} else if ( z==2 ) {
nto <- as.numeric(argv[z])
} else if ( z==3 ) {
nstep <- as.numeric(argv[z])
} else if ( z==4 ) {
loops <- as.numeric(argv[z])
}
}
}
p=Sys.getenv("OPENBLAS_LOOPS")
if ( p != "" ) {
loops <- as.numeric(p)
}
cat(sprintf("From %.0f To %.0f Step=%.0f Loops=%.0f\n",nfrom, nto, nstep, loops))
cat(sprintf(" SIZE Flops Time\n"))
n = nfrom
while ( n <= nto ) {
A <- matrix(runif(n*n), ncol = n, nrow = n, byrow = TRUE)
B <- matrix(runif(n*n), ncol = n, nrow = n, byrow = TRUE)
l = 1
start <- proc.time()[3]
while ( l <= loops ) {
C <- A %*% B
l = l + 1
}
end <- proc.time()[3]
timeg = end - start
mflops = ( 2.0 *n*n*n ) * loops / ( timeg * 1.0e6 )
st = sprintf("%.0fx%.0f :",n , n)
cat(sprintf("%20s %10.2f MFlops %10.6f sec\n", st, mflops, timeg))
n = n + nstep
}

63
benchmark/scripts/R/dsolve.R Executable file
View File

@@ -0,0 +1,63 @@
#!/usr/bin/Rscript
argv <- commandArgs(trailingOnly = TRUE)
nfrom = 128
nto = 2048
nstep = 128
loops = 1
if ( length(argv) > 0 ) {
for ( z in 1:length(argv) ) {
if ( z == 1 ) {
nfrom <- as.numeric(argv[z])
} else if ( z==2 ) {
nto <- as.numeric(argv[z])
} else if ( z==3 ) {
nstep <- as.numeric(argv[z])
} else if ( z==4 ) {
loops <- as.numeric(argv[z])
}
}
}
p=Sys.getenv("OPENBLAS_LOOPS")
if ( p != "" ) {
loops <- as.numeric(p)
}
cat(sprintf("From %.0f To %.0f Step=%.0f Loops=%.0f\n",nfrom, nto, nstep, loops))
cat(sprintf(" SIZE Flops Time\n"))
n = nfrom
while ( n <= nto ) {
A <- matrix(runif(n*n), ncol = n, nrow = n, byrow = TRUE)
B <- matrix(runif(n*n), ncol = n, nrow = n, byrow = TRUE)
l = 1
start <- proc.time()[3]
while ( l <= loops ) {
solve(A,B)
l = l + 1
}
end <- proc.time()[3]
timeg = end - start
mflops = (2.0/3.0 *n*n*n + 2.0 *n*n*n ) * loops / ( timeg * 1.0e6 )
st = sprintf("%.0fx%.0f :",n , n)
cat(sprintf("%20s %10.2f MFlops %10.6f sec\n", st, mflops, timeg))
n = n + nstep
}

201
benchmark/swap.c Normal file
View File

@@ -0,0 +1,201 @@
/***************************************************************************
Copyright (c) 2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above swapright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above swapright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE SWAPRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef SWAP
#ifdef COMPLEX
#ifdef DOUBLE
#define SWAP BLASFUNC(zswap)
#else
#define SWAP BLASFUNC(cswap)
#endif
#else
#ifdef DOUBLE
#define SWAP BLASFUNC(dswap)
#else
#define SWAP BLASFUNC(sswap)
#endif
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *x, *y;
FLOAT alpha[2] = { 2.0, 2.0 };
blasint m, i;
blasint inc_x=1,inc_y=1;
int loops = 1;
int l;
char *p;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1,timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
if ((p = getenv("OPENBLAS_INCY"))) inc_y = atoi(p);
fprintf(stderr, "From : %3d To : %3d Step = %3d Inc_x = %d Inc_y = %d Loops = %d\n", from, to, step,inc_x,inc_y,loops);
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( y = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_y) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step)
{
timeg=0;
fprintf(stderr, " %6d : ", (int)m);
for (l=0; l<loops; l++)
{
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for(i = 0; i < m * COMPSIZE * abs(inc_y); i++){
y[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
SWAP (&m, x, &inc_x, y, &inc_y );
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
timeg /= loops;
fprintf(stderr,
" %10.2f MBytes\n",
COMPSIZE * sizeof(FLOAT) * 1. * (double)m / timeg * 1.e-6);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -118,7 +118,7 @@ static void *huge_malloc(BLASLONG size){
#endif
int MAIN__(int argc, char *argv[]){
int main(int argc, char *argv[]){
FLOAT *a, *b, *c;
FLOAT alpha[] = {1.0, 1.0};
@@ -200,4 +200,4 @@ int MAIN__(int argc, char *argv[]){
return 0;
}
void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -118,7 +118,7 @@ static void *huge_malloc(BLASLONG size){
#endif
int MAIN__(int argc, char *argv[]){
int main(int argc, char *argv[]){
FLOAT *a, *x, *y;
FLOAT alpha[] = {1.0, 1.0};
@@ -215,4 +215,4 @@ int MAIN__(int argc, char *argv[]){
return 0;
}
void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -118,7 +118,7 @@ static void *huge_malloc(BLASLONG size){
#endif
int MAIN__(int argc, char *argv[]){
int main(int argc, char *argv[]){
FLOAT *a, *b, *c;
FLOAT alpha[] = {1.0, 1.0};
@@ -200,4 +200,4 @@ int MAIN__(int argc, char *argv[]){
return 0;
}
void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -118,7 +118,7 @@ static void *huge_malloc(BLASLONG size){
#endif
int MAIN__(int argc, char *argv[]){
int main(int argc, char *argv[]){
FLOAT *a, *c;
FLOAT alpha[] = {1.0, 1.0};
@@ -196,4 +196,4 @@ int MAIN__(int argc, char *argv[]){
return 0;
}
void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

42
benchmark/tplot-header Normal file
View File

@@ -0,0 +1,42 @@
# **********************************************************************************
# Copyright (c) 2014, The OpenBLAS Project
# All rights reserved.
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# 3. Neither the name of the OpenBLAS project nor the names of
# its contributors may be used to endorse or promote products
# derived from this software without specific prior written permission.
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
# USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# **********************************************************************************
set term x11 font sans;
set ylabel "MFlops";
set xlabel "Size";
set grid xtics;
set grid ytics;
set key left;
set timestamp "generated on %Y-%m-%d by `whoami`"
set title "Sgemv\nTRANS=T\nBulldozer"
plot '1-THREAD' smooth bezier, '2-THREADS' smooth bezier, '4-THREADS' smooth bezier;
set output "print.png";
show title;
show plot;
show output;

View File

@@ -118,7 +118,7 @@ static void *huge_malloc(BLASLONG size){
#endif
int MAIN__(int argc, char *argv[]){
int main(int argc, char *argv[]){
FLOAT *a, *b;
FLOAT alpha[] = {1.0, 1.0};
@@ -199,4 +199,4 @@ int MAIN__(int argc, char *argv[]){
return 0;
}
void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -118,7 +118,7 @@ static void *huge_malloc(BLASLONG size){
#endif
int MAIN__(int argc, char *argv[]){
int main(int argc, char *argv[]){
FLOAT *a, *b;
FLOAT alpha[] = {1.0, 1.0};
@@ -130,11 +130,21 @@ int MAIN__(int argc, char *argv[]){
char trans='N';
char diag ='U';
int l;
int loops = 1;
double timeg;
if ((p = getenv("OPENBLAS_SIDE"))) side=*p;
if ((p = getenv("OPENBLAS_UPLO"))) uplo=*p;
if ((p = getenv("OPENBLAS_TRANS"))) trans=*p;
if ((p = getenv("OPENBLAS_DIAG"))) diag=*p;
p = getenv("OPENBLAS_LOOPS");
if ( p != NULL )
loops = atoi(p);
blasint m, i, j;
int from = 1;
@@ -150,7 +160,7 @@ int MAIN__(int argc, char *argv[]){
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
fprintf(stderr, "From : %3d To : %3d Step = %3d Side = %c Uplo = %c Trans = %c Diag = %c\n", from, to, step,side,uplo,trans,diag);
fprintf(stderr, "From : %3d To : %3d Step = %3d Side = %c Uplo = %c Trans = %c Diag = %c Loops = %d\n", from, to, step,side,uplo,trans,diag,loops);
if (( a = (FLOAT *)malloc(sizeof(FLOAT) * to * to * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
@@ -171,32 +181,39 @@ int MAIN__(int argc, char *argv[]){
for(m = from; m <= to; m += step)
{
fprintf(stderr, " %6d : ", (int)m);
timeg=0.0;
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
b[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
fprintf(stderr, " %6d : ", (int)m);
gettimeofday( &start, (struct timezone *)0);
for (l=0; l<loops; l++)
{
TRSM (&side, &uplo, &trans, &diag, &m, &m, alpha, a, &m, b, &m);
gettimeofday( &stop, (struct timezone *)0);
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
b[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
gettimeofday( &start, (struct timezone *)0);
gettimeofday( &start, (struct timezone *)0);
TRSM (&side, &uplo, &trans, &diag, &m, &m, alpha, a, &m, b, &m);
fprintf(stderr,
" %10.2f MFlops\n",
COMPSIZE * COMPSIZE * 1. * (double)m * (double)m * (double)m / time1 * 1.e-6);
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
time1 = timeg/loops;
fprintf(stderr, " %10.2f MFlops\n", COMPSIZE * COMPSIZE * 1. * (double)m * (double)m * (double)m / time1 * 1.e-6);
}
return 0;
}
void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

196
benchmark/zdot-intel.c Normal file
View File

@@ -0,0 +1,196 @@
/***************************************************************************
Copyright (c) 2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#define RETURN_BY_STACK 1
#include "common.h"
#undef DOT
#ifdef DOUBLE
#define DOT BLASFUNC(zdotu)
#else
#define DOT BLASFUNC(cdotu)
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *x, *y;
FLOAT _Complex result;
blasint m, i;
blasint inc_x=1,inc_y=1;
int loops = 1;
int l;
char *p;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1,timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
if ((p = getenv("OPENBLAS_INCY"))) inc_y = atoi(p);
fprintf(stderr, "From : %3d To : %3d Step = %3d Inc_x = %d Inc_y = %d Loops = %d\n", from, to, step,inc_x,inc_y,loops);
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( y = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_y) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step)
{
timeg=0;
fprintf(stderr, " %6d : ", (int)m);
for (l=0; l<loops; l++)
{
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for(i = 0; i < m * COMPSIZE * abs(inc_y); i++){
y[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
DOT (&result, &m, x, &inc_x, y, &inc_y );
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
timeg /= loops;
fprintf(stderr,
" %10.2f MFlops\n",
COMPSIZE * COMPSIZE * 2. * (double)m / timeg * 1.e-6);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

195
benchmark/zdot.c Normal file
View File

@@ -0,0 +1,195 @@
/***************************************************************************
Copyright (c) 2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef DOT
#ifdef DOUBLE
#define DOT BLASFUNC(zdotu)
#else
#define DOT BLASFUNC(cdotu)
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *x, *y;
FLOAT _Complex result;
blasint m, i;
blasint inc_x=1,inc_y=1;
int loops = 1;
int l;
char *p;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1,timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
if ((p = getenv("OPENBLAS_INCY"))) inc_y = atoi(p);
fprintf(stderr, "From : %3d To : %3d Step = %3d Inc_x = %d Inc_y = %d Loops = %d\n", from, to, step,inc_x,inc_y,loops);
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( y = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_y) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step)
{
timeg=0;
fprintf(stderr, " %6d : ", (int)m);
for (l=0; l<loops; l++)
{
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for(i = 0; i < m * COMPSIZE * abs(inc_y); i++){
y[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
result = DOT (&m, x, &inc_x, y, &inc_y );
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
timeg /= loops;
fprintf(stderr,
" %10.2f MFlops\n",
COMPSIZE * COMPSIZE * 2. * (double)m / timeg * 1.e-6);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -3,6 +3,9 @@
# Checking cross compile
$hostos = `uname -s | sed -e s/\-.*//`; chop($hostos);
$hostarch = `uname -m | sed -e s/i.86/x86/`;chop($hostarch);
$hostarch = "x86_64" if ($hostarch eq "amd64");
$hostarch = "arm" if ($hostarch =~ /^arm.*/);
$hostarch = "arm64" if ($hostarch eq "aarch64");
$binary = $ENV{"BINARY"};
@@ -54,6 +57,7 @@ $os = osf if ($data =~ /OS_OSF/);
$os = WINNT if ($data =~ /OS_WINNT/);
$os = CYGWIN_NT if ($data =~ /OS_CYGWIN_NT/);
$os = Interix if ($data =~ /OS_INTERIX/);
$os = Android if ($data =~ /OS_ANDROID/);
$architecture = x86 if ($data =~ /ARCH_X86/);
$architecture = x86_64 if ($data =~ /ARCH_X86_64/);
@@ -80,6 +84,10 @@ if (($architecture eq "mips32") || ($architecture eq "mips64")) {
$defined = 1;
}
if (($architecture eq "arm") || ($architecture eq "arm64")) {
$defined = 1;
}
if ($architecture eq "alpha") {
$defined = 1;
$binary = 64;

39
cblas.h
View File

@@ -13,6 +13,12 @@ extern "C" {
void openblas_set_num_threads(int num_threads);
void goto_set_num_threads(int num_threads);
/*Get the number of threads on runtime.*/
int openblas_get_num_threads(void);
/*Get the number of physical processors (cores).*/
int openblas_get_num_procs(void);
/*Get the build configure on runtime.*/
char* openblas_get_config(void);
@@ -243,8 +249,13 @@ void cblas_dgemm(OPENBLAS_CONST enum CBLAS_ORDER Order, OPENBLAS_CONST enum CBLA
OPENBLAS_CONST double alpha, OPENBLAS_CONST double *A, OPENBLAS_CONST blasint lda, OPENBLAS_CONST double *B, OPENBLAS_CONST blasint ldb, OPENBLAS_CONST double beta, double *C, OPENBLAS_CONST blasint ldc);
void cblas_cgemm(OPENBLAS_CONST enum CBLAS_ORDER Order, OPENBLAS_CONST enum CBLAS_TRANSPOSE TransA, OPENBLAS_CONST enum CBLAS_TRANSPOSE TransB, OPENBLAS_CONST blasint M, OPENBLAS_CONST blasint N, OPENBLAS_CONST blasint K,
OPENBLAS_CONST float *alpha, OPENBLAS_CONST float *A, OPENBLAS_CONST blasint lda, OPENBLAS_CONST float *B, OPENBLAS_CONST blasint ldb, OPENBLAS_CONST float *beta, float *C, OPENBLAS_CONST blasint ldc);
void cblas_cgemm3m(OPENBLAS_CONST enum CBLAS_ORDER Order, OPENBLAS_CONST enum CBLAS_TRANSPOSE TransA, OPENBLAS_CONST enum CBLAS_TRANSPOSE TransB, OPENBLAS_CONST blasint M, OPENBLAS_CONST blasint N, OPENBLAS_CONST blasint K,
OPENBLAS_CONST float *alpha, OPENBLAS_CONST float *A, OPENBLAS_CONST blasint lda, OPENBLAS_CONST float *B, OPENBLAS_CONST blasint ldb, OPENBLAS_CONST float *beta, float *C, OPENBLAS_CONST blasint ldc);
void cblas_zgemm(OPENBLAS_CONST enum CBLAS_ORDER Order, OPENBLAS_CONST enum CBLAS_TRANSPOSE TransA, OPENBLAS_CONST enum CBLAS_TRANSPOSE TransB, OPENBLAS_CONST blasint M, OPENBLAS_CONST blasint N, OPENBLAS_CONST blasint K,
OPENBLAS_CONST double *alpha, OPENBLAS_CONST double *A, OPENBLAS_CONST blasint lda, OPENBLAS_CONST double *B, OPENBLAS_CONST blasint ldb, OPENBLAS_CONST double *beta, double *C, OPENBLAS_CONST blasint ldc);
void cblas_zgemm3m(OPENBLAS_CONST enum CBLAS_ORDER Order, OPENBLAS_CONST enum CBLAS_TRANSPOSE TransA, OPENBLAS_CONST enum CBLAS_TRANSPOSE TransB, OPENBLAS_CONST blasint M, OPENBLAS_CONST blasint N, OPENBLAS_CONST blasint K,
OPENBLAS_CONST double *alpha, OPENBLAS_CONST double *A, OPENBLAS_CONST blasint lda, OPENBLAS_CONST double *B, OPENBLAS_CONST blasint ldb, OPENBLAS_CONST double *beta, double *C, OPENBLAS_CONST blasint ldc);
void cblas_ssymm(OPENBLAS_CONST enum CBLAS_ORDER Order, OPENBLAS_CONST enum CBLAS_SIDE Side, OPENBLAS_CONST enum CBLAS_UPLO Uplo, OPENBLAS_CONST blasint M, OPENBLAS_CONST blasint N,
OPENBLAS_CONST float alpha, OPENBLAS_CONST float *A, OPENBLAS_CONST blasint lda, OPENBLAS_CONST float *B, OPENBLAS_CONST blasint ldb, OPENBLAS_CONST float beta, float *C, OPENBLAS_CONST blasint ldc);
@@ -318,6 +329,34 @@ void cblas_caxpby(OPENBLAS_CONST blasint n, OPENBLAS_CONST float *alpha, OPENBLA
void cblas_zaxpby(OPENBLAS_CONST blasint n, OPENBLAS_CONST double *alpha, OPENBLAS_CONST double *x, OPENBLAS_CONST blasint incx,OPENBLAS_CONST double *beta, double *y, OPENBLAS_CONST blasint incy);
void cblas_somatcopy(OPENBLAS_CONST enum CBLAS_ORDER CORDER, OPENBLAS_CONST enum CBLAS_TRANSPOSE CTRANS, OPENBLAS_CONST blasint crows, OPENBLAS_CONST blasint ccols, OPENBLAS_CONST float calpha, OPENBLAS_CONST float *a,
OPENBLAS_CONST blasint clda, float *b, OPENBLAS_CONST blasint cldb);
void cblas_domatcopy(OPENBLAS_CONST enum CBLAS_ORDER CORDER, OPENBLAS_CONST enum CBLAS_TRANSPOSE CTRANS, OPENBLAS_CONST blasint crows, OPENBLAS_CONST blasint ccols, OPENBLAS_CONST double calpha, OPENBLAS_CONST double *a,
OPENBLAS_CONST blasint clda, double *b, OPENBLAS_CONST blasint cldb);
void cblas_comatcopy(OPENBLAS_CONST enum CBLAS_ORDER CORDER, OPENBLAS_CONST enum CBLAS_TRANSPOSE CTRANS, OPENBLAS_CONST blasint crows, OPENBLAS_CONST blasint ccols, OPENBLAS_CONST float* calpha, OPENBLAS_CONST float* a,
OPENBLAS_CONST blasint clda, float*b, OPENBLAS_CONST blasint cldb);
void cblas_zomatcopy(OPENBLAS_CONST enum CBLAS_ORDER CORDER, OPENBLAS_CONST enum CBLAS_TRANSPOSE CTRANS, OPENBLAS_CONST blasint crows, OPENBLAS_CONST blasint ccols, OPENBLAS_CONST double* calpha, OPENBLAS_CONST double* a,
OPENBLAS_CONST blasint clda, double *b, OPENBLAS_CONST blasint cldb);
void cblas_simatcopy(OPENBLAS_CONST enum CBLAS_ORDER CORDER, OPENBLAS_CONST enum CBLAS_TRANSPOSE CTRANS, OPENBLAS_CONST blasint crows, OPENBLAS_CONST blasint ccols, OPENBLAS_CONST float calpha, float *a,
OPENBLAS_CONST blasint clda, OPENBLAS_CONST blasint cldb);
void cblas_dimatcopy(OPENBLAS_CONST enum CBLAS_ORDER CORDER, OPENBLAS_CONST enum CBLAS_TRANSPOSE CTRANS, OPENBLAS_CONST blasint crows, OPENBLAS_CONST blasint ccols, OPENBLAS_CONST double calpha, double *a,
OPENBLAS_CONST blasint clda, OPENBLAS_CONST blasint cldb);
void cblas_cimatcopy(OPENBLAS_CONST enum CBLAS_ORDER CORDER, OPENBLAS_CONST enum CBLAS_TRANSPOSE CTRANS, OPENBLAS_CONST blasint crows, OPENBLAS_CONST blasint ccols, OPENBLAS_CONST float* calpha, float* a,
OPENBLAS_CONST blasint clda, OPENBLAS_CONST blasint cldb);
void cblas_zimatcopy(OPENBLAS_CONST enum CBLAS_ORDER CORDER, OPENBLAS_CONST enum CBLAS_TRANSPOSE CTRANS, OPENBLAS_CONST blasint crows, OPENBLAS_CONST blasint ccols, OPENBLAS_CONST double* calpha, double* a,
OPENBLAS_CONST blasint clda, OPENBLAS_CONST blasint cldb);
void cblas_sgeadd(OPENBLAS_CONST enum CBLAS_ORDER CORDER,OPENBLAS_CONST blasint crows, OPENBLAS_CONST blasint ccols, OPENBLAS_CONST float calpha, float *a, OPENBLAS_CONST blasint clda, OPENBLAS_CONST float cbeta,
float *c, OPENBLAS_CONST blasint cldc);
void cblas_dgeadd(OPENBLAS_CONST enum CBLAS_ORDER CORDER,OPENBLAS_CONST blasint crows, OPENBLAS_CONST blasint ccols, OPENBLAS_CONST double calpha, double *a, OPENBLAS_CONST blasint clda, OPENBLAS_CONST double cbeta,
double *c, OPENBLAS_CONST blasint cldc);
void cblas_cgeadd(OPENBLAS_CONST enum CBLAS_ORDER CORDER,OPENBLAS_CONST blasint crows, OPENBLAS_CONST blasint ccols, OPENBLAS_CONST float *calpha, float *a, OPENBLAS_CONST blasint clda, OPENBLAS_CONST float *cbeta,
float *c, OPENBLAS_CONST blasint cldc);
void cblas_zgeadd(OPENBLAS_CONST enum CBLAS_ORDER CORDER,OPENBLAS_CONST blasint crows, OPENBLAS_CONST blasint ccols, OPENBLAS_CONST double *calpha, double *a, OPENBLAS_CONST blasint clda, OPENBLAS_CONST double *cbeta,
double *c, OPENBLAS_CONST blasint cldc);
#ifdef __cplusplus
}
#endif /* __cplusplus */

View File

@@ -13,6 +13,12 @@ extern "C" {
void openblas_set_num_threads(int num_threads);
void goto_set_num_threads(int num_threads);
/*Get the number of threads on runtime.*/
int openblas_get_num_threads(void);
/*Get the number of physical processors (cores).*/
int openblas_get_num_procs(void);
/*Get the build configure on runtime.*/
char* openblas_get_config(void);
@@ -231,8 +237,12 @@ void cblas_dgemm(enum CBLAS_ORDER Order, enum CBLAS_TRANSPOSE TransA, enum CBLAS
double alpha, double *A, blasint lda, double *B, blasint ldb, double beta, double *C, blasint ldc);
void cblas_cgemm(enum CBLAS_ORDER Order, enum CBLAS_TRANSPOSE TransA, enum CBLAS_TRANSPOSE TransB, blasint M, blasint N, blasint K,
float *alpha, float *A, blasint lda, float *B, blasint ldb, float *beta, float *C, blasint ldc);
void cblas_cgemm3m(enum CBLAS_ORDER Order, enum CBLAS_TRANSPOSE TransA, enum CBLAS_TRANSPOSE TransB, blasint M, blasint N, blasint K,
float *alpha, float *A, blasint lda, float *B, blasint ldb, float *beta, float *C, blasint ldc);
void cblas_zgemm(enum CBLAS_ORDER Order, enum CBLAS_TRANSPOSE TransA, enum CBLAS_TRANSPOSE TransB, blasint M, blasint N, blasint K,
double *alpha, double *A, blasint lda, double *B, blasint ldb, double *beta, double *C, blasint ldc);
void cblas_zgemm3m(enum CBLAS_ORDER Order, enum CBLAS_TRANSPOSE TransA, enum CBLAS_TRANSPOSE TransB, blasint M, blasint N, blasint K,
double *alpha, double *A, blasint lda, double *B, blasint ldb, double *beta, double *C, blasint ldc);
void cblas_ssymm(enum CBLAS_ORDER Order, enum CBLAS_SIDE Side, enum CBLAS_UPLO Uplo, blasint M, blasint N,
float alpha, float *A, blasint lda, float *B, blasint ldb, float beta, float *C, blasint ldc);
@@ -306,6 +316,32 @@ void cblas_caxpby(blasint n, float *alpha, float *x, blasint incx,float *beta, f
void cblas_zaxpby(blasint n, double *alpha, double *x, blasint incx,double *beta, double *y, blasint incy);
void cblas_somatcopy( enum CBLAS_ORDER CORDER, enum CBLAS_TRANSPOSE CTRANS, blasint crows, blasint ccols, float calpha, float *a,
blasint clda, float *b, blasint cldb);
void cblas_domatcopy( enum CBLAS_ORDER CORDER, enum CBLAS_TRANSPOSE CTRANS, blasint crows, blasint ccols, double calpha, double *a,
blasint clda, double *b, blasint cldb);
void cblas_comatcopy( enum CBLAS_ORDER CORDER, enum CBLAS_TRANSPOSE CTRANS, blasint crows, blasint ccols, void* calpha, void* a,
blasint clda, void *b, blasint cldb);
void cblas_zomatcopy( enum CBLAS_ORDER CORDER, enum CBLAS_TRANSPOSE CTRANS, blasint crows, blasint ccols, void* calpha, void* a,
blasint clda, void *b, blasint cldb);
void cblas_simatcopy( enum CBLAS_ORDER CORDER, enum CBLAS_TRANSPOSE CTRANS, blasint crows, blasint ccols, float calpha, float *a,
blasint clda, blasint cldb);
void cblas_dimatcopy( enum CBLAS_ORDER CORDER, enum CBLAS_TRANSPOSE CTRANS, blasint crows, blasint ccols, double calpha, double *a,
blasint clda, blasint cldb);
void cblas_cimatcopy( enum CBLAS_ORDER CORDER, enum CBLAS_TRANSPOSE CTRANS, blasint crows, blasint ccols, float* calpha, float* a,
blasint clda, blasint cldb);
void cblas_zimatcopy( enum CBLAS_ORDER CORDER, enum CBLAS_TRANSPOSE CTRANS, blasint crows, blasint ccols, double* calpha, double* a,
blasint clda, blasint cldb);
void cblas_sgeadd( enum CBLAS_ORDER CORDER, blasint crows, blasint ccols, float calpha, float *a, blasint clda, float cbeta,
float *c, blasint cldc);
void cblas_dgeadd( enum CBLAS_ORDER CORDER, blasint crows, blasint ccols, double calpha, double *a, blasint clda, double cbeta,
double *c, blasint cldc);
void cblas_cgeadd( enum CBLAS_ORDER CORDER, blasint crows, blasint ccols, float *calpha, float *a, blasint clda, float *cbeta,
float *c, blasint cldc);
void cblas_zgeadd( enum CBLAS_ORDER CORDER, blasint crows, blasint ccols, double *calpha, double *a, blasint clda, double *cbeta,
double *c, blasint cldc);
#ifdef __cplusplus
}

View File

@@ -93,6 +93,10 @@ extern "C" {
#include <sched.h>
#endif
#ifdef OS_ANDROID
#define NO_SYSV_IPC
#endif
#ifdef OS_WINDOWS
#ifdef ATOM
#define GOTO_ATOM ATOM
@@ -106,7 +110,9 @@ extern "C" {
#endif
#else
#include <sys/mman.h>
#ifndef NO_SYSV_IPC
#include <sys/shm.h>
#endif
#include <sys/time.h>
#include <unistd.h>
#include <math.h>
@@ -270,6 +276,11 @@ typedef int blasint;
#define SIZE 8
#define BASE_SHIFT 3
#define ZBASE_SHIFT 4
#elif defined(INTEGER) //extend for integer matrix
#define FLOAT int
#define SIZE 4
#define BASE_SHIFT 2
#define ZBASE_SHIFT 3
#else
#define FLOAT float
#define SIZE 4
@@ -327,6 +338,14 @@ typedef int blasint;
#endif
#endif
/*
#ifdef STEAMROLLER
#ifndef YIELDING
#define YIELDING __asm__ __volatile__ ("nop;nop;nop;nop;nop;nop;nop;nop;\n");
#endif
#endif
*/
#ifndef YIELDING
#define YIELDING sched_yield()
#endif
@@ -491,6 +510,8 @@ void blas_set_parameter(void);
int blas_get_cpu_number(void);
void *blas_memory_alloc (int);
void blas_memory_free (void *);
void *blas_memory_alloc_nolock (int); //use malloc without blas_lock
void blas_memory_free_nolock (void *);
int get_num_procs (void);

View File

@@ -1,5 +1,5 @@
/*****************************************************************************
Copyright (c) 2011, Lab of Parallel Software and Computational Science,ICSAS
Copyright (c) 2011-2015, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
@@ -13,9 +13,10 @@ met:
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the ISCAS nor the names of its contributors may
be used to endorse or promote products derived from this software
without specific prior written permission.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
@@ -27,53 +28,23 @@ SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************************/
/*********************************************************************/
/* Copyright 2009, 2010 The University of Texas at Austin. */
/* All rights reserved. */
/* */
/* Redistribution and use in source and binary forms, with or */
/* without modification, are permitted provided that the following */
/* conditions are met: */
/* */
/* 1. Redistributions of source code must retain the above */
/* copyright notice, this list of conditions and the following */
/* disclaimer. */
/* */
/* 2. Redistributions in binary form must reproduce the above */
/* copyright notice, this list of conditions and the following */
/* disclaimer in the documentation and/or other materials */
/* provided with the distribution. */
/* */
/* THIS SOFTWARE IS PROVIDED BY THE UNIVERSITY OF TEXAS AT */
/* AUSTIN ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, */
/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF */
/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE */
/* DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY OF TEXAS AT */
/* AUSTIN OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, */
/* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES */
/* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE */
/* GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR */
/* BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF */
/* LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT */
/* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT */
/* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE */
/* POSSIBILITY OF SUCH DAMAGE. */
/* */
/* The views and conclusions contained in the software and */
/* documentation are those of the authors and should not be */
/* interpreted as representing official policies, either expressed */
/* or implied, of The University of Texas at Austin. */
/*********************************************************************/
#ifndef COMMON_ARM
#define COMMON_ARM
#if defined(ARMV5) || defined(ARMV6)
#define MB
#define WMB
#else
#define MB __asm__ __volatile__ ("dmb ish" : : : "memory")
#define WMB __asm__ __volatile__ ("dmb ishst" : : : "memory")
#endif
#define INLINE inline
#define RETURN_BY_COMPLEX
@@ -88,9 +59,12 @@ static void __inline blas_lock(volatile BLASULONG *address){
while (*address) {YIELDING;};
__asm__ __volatile__(
"1: \n\t"
"ldrex r2, [%1] \n\t"
"mov r2, #0 \n\t"
"strex r3, r2, [%1] \n\t"
"cmp r3, #0 \n\t"
"bne 1b \n\t"
"mov %0 , r3 \n\t"
: "=r"(ret), "=r"(address)
: "1"(address)

View File

@@ -1,5 +1,5 @@
/*****************************************************************************
Copyright (c) 2011, Lab of Parallel Software and Computational Science,ICSAS
Copyright (c) 2011-2015, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
@@ -13,9 +13,10 @@ met:
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the ISCAS nor the names of its contributors may
be used to endorse or promote products derived from this software
without specific prior written permission.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
@@ -27,52 +28,14 @@ SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************************/
/*********************************************************************/
/* Copyright 2009, 2010 The University of Texas at Austin. */
/* All rights reserved. */
/* */
/* Redistribution and use in source and binary forms, with or */
/* without modification, are permitted provided that the following */
/* conditions are met: */
/* */
/* 1. Redistributions of source code must retain the above */
/* copyright notice, this list of conditions and the following */
/* disclaimer. */
/* */
/* 2. Redistributions in binary form must reproduce the above */
/* copyright notice, this list of conditions and the following */
/* disclaimer in the documentation and/or other materials */
/* provided with the distribution. */
/* */
/* THIS SOFTWARE IS PROVIDED BY THE UNIVERSITY OF TEXAS AT */
/* AUSTIN ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, */
/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF */
/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE */
/* DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY OF TEXAS AT */
/* AUSTIN OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, */
/* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES */
/* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE */
/* GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR */
/* BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF */
/* LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT */
/* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT */
/* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE */
/* POSSIBILITY OF SUCH DAMAGE. */
/* */
/* The views and conclusions contained in the software and */
/* documentation are those of the authors and should not be */
/* interpreted as representing official policies, either expressed */
/* or implied, of The University of Texas at Austin. */
/*********************************************************************/
#ifndef COMMON_ARM64
#define COMMON_ARM64
#define MB
#define WMB
#define MB __asm__ __volatile__ ("dmb ish" : : : "memory")
#define WMB __asm__ __volatile__ ("dmb ishst" : : : "memory")
#define INLINE inline
@@ -81,26 +44,30 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#ifndef ASSEMBLER
static void __inline blas_lock(volatile BLASULONG *address){
/*
int register ret;
int register tmp;
do {
while (*address) {YIELDING;};
__asm__ __volatile__(
"ldrex r2, [%1] \n\t"
"mov r2, #0 \n\t"
"strex r3, r2, [%1] \n\t"
"mov %0 , r3 \n\t"
: "=r"(ret), "=r"(address)
"1: \n\t"
"ldaxr %2, [%1] \n\t"
"mov %2, #0 \n\t"
"stlxr %w0, %2, [%1] \n\t"
"cbnz %w0, 1b \n\t"
"mov %0 , #0 \n\t"
: "=r"(ret), "=r"(address), "=r"(tmp)
: "1"(address)
: "memory", "r2" , "r3"
: "memory", "%w0"
//, "%r2" , "%r3"
);
} while (ret);
*/
}
@@ -119,9 +86,9 @@ static inline int blas_quickdivide(blasint x, blasint y){
}
#if defined(DOUBLE)
#define GET_IMAGE(res) __asm__ __volatile__("vstr.f64 d1, %0" : "=m"(res) : : "memory")
#define GET_IMAGE(res) __asm__ __volatile__("str d1, %0" : "=m"(res) : : "memory")
#else
#define GET_IMAGE(res) __asm__ __volatile__("vstr.f32 s1, %0" : "=m"(res) : : "memory")
#define GET_IMAGE(res) __asm__ __volatile__("str s1, %0" : "=m"(res) : : "memory")
#endif
#define GET_IMAGE_CANCEL
@@ -138,7 +105,6 @@ static inline int blas_quickdivide(blasint x, blasint y){
#if defined(ASSEMBLER) && !defined(NEEDPARAM)
#define PROLOGUE \
.arm ;\
.global REALNAME ;\
.func REALNAME ;\
REALNAME:
@@ -167,3 +133,4 @@ REALNAME:
#endif
#endif

View File

@@ -220,6 +220,7 @@
#define COMATCOPY_K_CTC comatcopy_k_ctc
#define COMATCOPY_K_RTC comatcopy_k_rtc
#define CGEADD_K cgeadd_k
#else
@@ -402,6 +403,7 @@
#define COMATCOPY_K_RNC gotoblas -> comatcopy_k_rnc
#define COMATCOPY_K_CTC gotoblas -> comatcopy_k_ctc
#define COMATCOPY_K_RTC gotoblas -> comatcopy_k_rtc
#define CGEADD_K gotoblas -> cgeadd_k
#endif

View File

@@ -149,6 +149,7 @@
#define DOMATCOPY_K_RN domatcopy_k_rn
#define DOMATCOPY_K_CT domatcopy_k_ct
#define DOMATCOPY_K_RT domatcopy_k_rt
#define DGEADD_K dgeadd_k
#else
@@ -267,6 +268,8 @@
#define DOMATCOPY_K_CT gotoblas -> domatcopy_k_ct
#define DOMATCOPY_K_RT gotoblas -> domatcopy_k_rt
#define DGEADD_K gotoblas -> dgeadd_k
#endif
#define DGEMM_NN dgemm_nn

9
common_i.h Normal file
View File

@@ -0,0 +1,9 @@
#ifndef COMMON_I_H
#define COMMON_I_H
#ifndef DYNAMIC_ARCH
#define IAXPYU_K iaxpy_k
#else
#error
#endif
#endif

View File

@@ -93,6 +93,7 @@ openblas_complex_xdouble BLASFUNC(xdotc) (blasint *, xdouble *, blasint *, xdo
void BLASFUNC(saxpy) (blasint *, float *, float *, blasint *, float *, blasint *);
void BLASFUNC(daxpy) (blasint *, double *, double *, blasint *, double *, blasint *);
void BLASFUNC(iaxpy) (blasint *, int *, int *, blasint *, int *, blasint *);
void BLASFUNC(qaxpy) (blasint *, xdouble *, xdouble *, blasint *, xdouble *, blasint *);
void BLASFUNC(caxpy) (blasint *, float *, float *, blasint *, float *, blasint *);
void BLASFUNC(zaxpy) (blasint *, double *, double *, blasint *, double *, blasint *);
@@ -754,6 +755,12 @@ void BLASFUNC(dimatcopy) (char *, char *, blasint *, blasint *, double *, do
void BLASFUNC(cimatcopy) (char *, char *, blasint *, blasint *, float *, float *, blasint *, blasint *);
void BLASFUNC(zimatcopy) (char *, char *, blasint *, blasint *, double *, double *, blasint *, blasint *);
void BLASFUNC(sgeadd) (blasint *, blasint *, float *, float *, blasint *, float *, float *, blasint*);
void BLASFUNC(dgeadd) (blasint *, blasint *, double *, double *, blasint *, double *, double *, blasint*);
void BLASFUNC(cgeadd) (blasint *, blasint *, float *, float *, blasint *, float *, float *, blasint*);
void BLASFUNC(zgeadd) (blasint *, blasint *, double *, double *, blasint *, double *, double *, blasint*);
#ifdef __cplusplus
}

View File

@@ -60,6 +60,8 @@ int daxpy_k (BLASLONG, BLASLONG, BLASLONG, double,
double *, BLASLONG, double *, BLASLONG, double *, BLASLONG);
int qaxpy_k (BLASLONG, BLASLONG, BLASLONG, xdouble,
xdouble *, BLASLONG, xdouble *, BLASLONG, xdouble *, BLASLONG);
int iaxpy_k (BLASLONG, BLASLONG, BLASLONG, int,
int *, BLASLONG, int *, BLASLONG, int *, BLASLONG);
int caxpy_k (BLASLONG, BLASLONG, BLASLONG, float, float,
float *, BLASLONG, float *, BLASLONG, float *, BLASLONG);
int zaxpy_k (BLASLONG, BLASLONG, BLASLONG, double, double,

View File

@@ -1762,6 +1762,11 @@ int zomatcopy_k_rnc(BLASLONG, BLASLONG, double, double, double *, BLASLONG, dou
int zomatcopy_k_ctc(BLASLONG, BLASLONG, double, double, double *, BLASLONG, double *, BLASLONG);
int zomatcopy_k_rtc(BLASLONG, BLASLONG, double, double, double *, BLASLONG, double *, BLASLONG);
int sgeadd_k(BLASLONG, BLASLONG, float, float*, BLASLONG, float, float *, BLASLONG);
int dgeadd_k(BLASLONG, BLASLONG, double, double*, BLASLONG, double, double *, BLASLONG);
int cgeadd_k(BLASLONG, BLASLONG, float, float, float*, BLASLONG, float, float, float *, BLASLONG);
int zgeadd_k(BLASLONG, BLASLONG, double,double, double*, BLASLONG, double, double, double *, BLASLONG);
#ifdef __CUDACC__
}

View File

@@ -47,6 +47,10 @@
#include "common_z.h"
#include "common_x.h"
#ifdef INTEGER_PRECISION
#include "common_i.h"
#endif
#ifndef COMPLEX
#ifdef XDOUBLE
@@ -634,7 +638,10 @@
#define OMATCOPY_K_RN DOMATCOPY_K_RN
#define OMATCOPY_K_CT DOMATCOPY_K_CT
#define OMATCOPY_K_RT DOMATCOPY_K_RT
#define GEADD_K DGEADD_K
#elif defined(INTEGER)
#define AXPYU_K IAXPYU_K
#else
#define AMAX_K SAMAX_K
@@ -932,6 +939,7 @@
#define OMATCOPY_K_CT SOMATCOPY_K_CT
#define OMATCOPY_K_RT SOMATCOPY_K_RT
#define GEADD_K SGEADD_K
#endif
#else
#ifdef XDOUBLE
@@ -1746,6 +1754,7 @@
#define OMATCOPY_K_RNC ZOMATCOPY_K_RNC
#define OMATCOPY_K_CTC ZOMATCOPY_K_CTC
#define OMATCOPY_K_RTC ZOMATCOPY_K_RTC
#define GEADD_K ZGEADD_K
#else
@@ -2159,6 +2168,8 @@
#define OMATCOPY_K_CTC COMATCOPY_K_CTC
#define OMATCOPY_K_RTC COMATCOPY_K_RTC
#define GEADD_K CGEADD_K
#endif
#endif

View File

@@ -1,5 +1,5 @@
/*****************************************************************************
Copyright (c) 2011, Lab of Parallel Software and Computational Science,ICSAS
Copyright (c) 2011-2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
@@ -13,9 +13,10 @@ met:
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the ISCAS nor the names of its contributors may
be used to endorse or promote products derived from this software
without specific prior written permission.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
@@ -27,7 +28,6 @@ SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************************/
/*********************************************************************/

View File

@@ -435,6 +435,9 @@ BLASLONG (*icamin_k)(BLASLONG, float *, BLASLONG);
int (*chemm_outcopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, BLASLONG, float *);
int (*chemm_oltcopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, BLASLONG, float *);
int cgemm3m_p, cgemm3m_q, cgemm3m_r;
int cgemm3m_unroll_m, cgemm3m_unroll_n, cgemm3m_unroll_mn;
int (*cgemm3m_kernel)(BLASLONG, BLASLONG, BLASLONG, float, float, float *, float *, float *, BLASLONG);
int (*cgemm3m_incopyb)(BLASLONG, BLASLONG, float *, BLASLONG, float *);
@@ -595,6 +598,9 @@ BLASLONG (*izamin_k)(BLASLONG, double *, BLASLONG);
int (*zhemm_outcopy)(BLASLONG, BLASLONG, double *, BLASLONG, BLASLONG, BLASLONG, double *);
int (*zhemm_oltcopy)(BLASLONG, BLASLONG, double *, BLASLONG, BLASLONG, BLASLONG, double *);
int zgemm3m_p, zgemm3m_q, zgemm3m_r;
int zgemm3m_unroll_m, zgemm3m_unroll_n, zgemm3m_unroll_mn;
int (*zgemm3m_kernel)(BLASLONG, BLASLONG, BLASLONG, double, double, double *, double *, double *, BLASLONG);
int (*zgemm3m_incopyb)(BLASLONG, BLASLONG, double *, BLASLONG, double *);
@@ -757,6 +763,9 @@ BLASLONG (*ixamin_k)(BLASLONG, xdouble *, BLASLONG);
int (*xhemm_outcopy)(BLASLONG, BLASLONG, xdouble *, BLASLONG, BLASLONG, BLASLONG, xdouble *);
int (*xhemm_oltcopy)(BLASLONG, BLASLONG, xdouble *, BLASLONG, BLASLONG, BLASLONG, xdouble *);
int xgemm3m_p, xgemm3m_q, xgemm3m_r;
int xgemm3m_unroll_m, xgemm3m_unroll_n, xgemm3m_unroll_mn;
int (*xgemm3m_kernel)(BLASLONG, BLASLONG, BLASLONG, xdouble, xdouble, xdouble *, xdouble *, xdouble *, BLASLONG);
int (*xgemm3m_incopyb)(BLASLONG, BLASLONG, xdouble *, BLASLONG, xdouble *);
@@ -846,6 +855,10 @@ BLASLONG (*ixamin_k)(BLASLONG, xdouble *, BLASLONG);
int (*zomatcopy_k_rnc) (BLASLONG, BLASLONG, double, double, double*, BLASLONG, double*, BLASLONG);
int (*zomatcopy_k_rtc) (BLASLONG, BLASLONG, double, double, double*, BLASLONG, double*, BLASLONG);
int (*sgeadd_k) (BLASLONG, BLASLONG, float, float *, BLASLONG, float, float *, BLASLONG);
int (*dgeadd_k) (BLASLONG, BLASLONG, double, double *, BLASLONG, double, double *, BLASLONG);
int (*cgeadd_k) (BLASLONG, BLASLONG, float, float, float *, BLASLONG, float, float, float *, BLASLONG);
int (*zgeadd_k) (BLASLONG, BLASLONG, float, double, double *, BLASLONG, double, double, double *, BLASLONG);
} gotoblas_t;
@@ -900,6 +913,27 @@ extern gotoblas_t *gotoblas;
#define XGEMM_UNROLL_N gotoblas -> xgemm_unroll_n
#define XGEMM_UNROLL_MN gotoblas -> xgemm_unroll_mn
#define CGEMM3M_P gotoblas -> cgemm3m_p
#define CGEMM3M_Q gotoblas -> cgemm3m_q
#define CGEMM3M_R gotoblas -> cgemm3m_r
#define CGEMM3M_UNROLL_M gotoblas -> cgemm3m_unroll_m
#define CGEMM3M_UNROLL_N gotoblas -> cgemm3m_unroll_n
#define CGEMM3M_UNROLL_MN gotoblas -> cgemm3m_unroll_mn
#define ZGEMM3M_P gotoblas -> zgemm3m_p
#define ZGEMM3M_Q gotoblas -> zgemm3m_q
#define ZGEMM3M_R gotoblas -> zgemm3m_r
#define ZGEMM3M_UNROLL_M gotoblas -> zgemm3m_unroll_m
#define ZGEMM3M_UNROLL_N gotoblas -> zgemm3m_unroll_n
#define ZGEMM3M_UNROLL_MN gotoblas -> zgemm3m_unroll_mn
#define XGEMM3M_P gotoblas -> xgemm3m_p
#define XGEMM3M_Q gotoblas -> xgemm3m_q
#define XGEMM3M_R gotoblas -> xgemm3m_r
#define XGEMM3M_UNROLL_M gotoblas -> xgemm3m_unroll_m
#define XGEMM3M_UNROLL_N gotoblas -> xgemm3m_unroll_n
#define XGEMM3M_UNROLL_MN gotoblas -> xgemm3m_unroll_mn
#else
#define DTB_ENTRIES DTB_DEFAULT_ENTRIES
@@ -972,6 +1006,55 @@ extern gotoblas_t *gotoblas;
#define XGEMM_UNROLL_N XGEMM_DEFAULT_UNROLL_N
#define XGEMM_UNROLL_MN MAX((XGEMM_UNROLL_M), (XGEMM_UNROLL_N))
#ifdef CGEMM3M_DEFAULT_UNROLL_N
#define CGEMM3M_P CGEMM3M_DEFAULT_P
#define CGEMM3M_Q CGEMM3M_DEFAULT_Q
#define CGEMM3M_R CGEMM3M_DEFAULT_R
#define CGEMM3M_UNROLL_M CGEMM3M_DEFAULT_UNROLL_M
#define CGEMM3M_UNROLL_N CGEMM3M_DEFAULT_UNROLL_N
#define CGEMM3M_UNROLL_MN MAX((CGEMM3M_UNROLL_M), (CGEMM3M_UNROLL_N))
#else
#define CGEMM3M_P SGEMM_DEFAULT_P
#define CGEMM3M_Q SGEMM_DEFAULT_Q
#define CGEMM3M_R SGEMM_DEFAULT_R
#define CGEMM3M_UNROLL_M SGEMM_DEFAULT_UNROLL_M
#define CGEMM3M_UNROLL_N SGEMM_DEFAULT_UNROLL_N
#define CGEMM3M_UNROLL_MN MAX((CGEMM_UNROLL_M), (CGEMM_UNROLL_N))
#endif
#ifdef ZGEMM3M_DEFAULT_UNROLL_N
#define ZGEMM3M_P ZGEMM3M_DEFAULT_P
#define ZGEMM3M_Q ZGEMM3M_DEFAULT_Q
#define ZGEMM3M_R ZGEMM3M_DEFAULT_R
#define ZGEMM3M_UNROLL_M ZGEMM3M_DEFAULT_UNROLL_M
#define ZGEMM3M_UNROLL_N ZGEMM3M_DEFAULT_UNROLL_N
#define ZGEMM3M_UNROLL_MN MAX((ZGEMM_UNROLL_M), (ZGEMM_UNROLL_N))
#else
#define ZGEMM3M_P DGEMM_DEFAULT_P
#define ZGEMM3M_Q DGEMM_DEFAULT_Q
#define ZGEMM3M_R DGEMM_DEFAULT_R
#define ZGEMM3M_UNROLL_M DGEMM_DEFAULT_UNROLL_M
#define ZGEMM3M_UNROLL_N DGEMM_DEFAULT_UNROLL_N
#define ZGEMM3M_UNROLL_MN MAX((ZGEMM_UNROLL_M), (ZGEMM_UNROLL_N))
#endif
#define XGEMM3M_P QGEMM_DEFAULT_P
#define XGEMM3M_Q QGEMM_DEFAULT_Q
#define XGEMM3M_R QGEMM_DEFAULT_R
#define XGEMM3M_UNROLL_M QGEMM_DEFAULT_UNROLL_M
#define XGEMM3M_UNROLL_N QGEMM_DEFAULT_UNROLL_N
#define XGEMM3M_UNROLL_MN MAX((QGEMM_UNROLL_M), (QGEMM_UNROLL_N))
#endif
#endif
@@ -1054,14 +1137,14 @@ extern gotoblas_t *gotoblas;
#endif
#ifdef XDOUBLE
#define GEMM3M_UNROLL_M QGEMM_UNROLL_M
#define GEMM3M_UNROLL_N QGEMM_UNROLL_N
#define GEMM3M_UNROLL_M XGEMM3M_UNROLL_M
#define GEMM3M_UNROLL_N XGEMM3M_UNROLL_N
#elif defined(DOUBLE)
#define GEMM3M_UNROLL_M DGEMM_UNROLL_M
#define GEMM3M_UNROLL_N DGEMM_UNROLL_N
#define GEMM3M_UNROLL_M ZGEMM3M_UNROLL_M
#define GEMM3M_UNROLL_N ZGEMM3M_UNROLL_N
#else
#define GEMM3M_UNROLL_M SGEMM_UNROLL_M
#define GEMM3M_UNROLL_N SGEMM_UNROLL_N
#define GEMM3M_UNROLL_M CGEMM3M_UNROLL_M
#define GEMM3M_UNROLL_N CGEMM3M_UNROLL_N
#endif
@@ -1123,31 +1206,31 @@ extern gotoblas_t *gotoblas;
#ifndef GEMM3M_P
#ifdef XDOUBLE
#define GEMM3M_P QGEMM_P
#define GEMM3M_P XGEMM3M_P
#elif defined(DOUBLE)
#define GEMM3M_P DGEMM_P
#define GEMM3M_P ZGEMM3M_P
#else
#define GEMM3M_P SGEMM_P
#define GEMM3M_P CGEMM3M_P
#endif
#endif
#ifndef GEMM3M_Q
#ifdef XDOUBLE
#define GEMM3M_Q QGEMM_Q
#define GEMM3M_Q XGEMM3M_Q
#elif defined(DOUBLE)
#define GEMM3M_Q DGEMM_Q
#define GEMM3M_Q ZGEMM3M_Q
#else
#define GEMM3M_Q SGEMM_Q
#define GEMM3M_Q CGEMM3M_Q
#endif
#endif
#ifndef GEMM3M_R
#ifdef XDOUBLE
#define GEMM3M_R QGEMM_R
#define GEMM3M_R XGEMM3M_R
#elif defined(DOUBLE)
#define GEMM3M_R DGEMM_R
#define GEMM3M_R ZGEMM3M_R
#else
#define GEMM3M_R SGEMM_R
#define GEMM3M_R CGEMM3M_R
#endif
#endif

View File

@@ -1,5 +1,5 @@
/*****************************************************************************
Copyright (c) 2011, Lab of Parallel Software and Computational Science,ICSAS
Copyright (c) 2011-2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
@@ -13,9 +13,10 @@ met:
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the ISCAS nor the names of its contributors may
be used to endorse or promote products derived from this software
without specific prior written permission.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE

View File

@@ -153,6 +153,7 @@
#define SOMATCOPY_K_CT somatcopy_k_ct
#define SOMATCOPY_K_RT somatcopy_k_rt
#define SGEADD_K sgeadd_k
#else
@@ -274,6 +275,7 @@
#define SOMATCOPY_K_CT gotoblas -> somatcopy_k_ct
#define SOMATCOPY_K_RT gotoblas -> somatcopy_k_rt
#define SGEADD_K gotoblas -> sgeadd_k
#endif

View File

@@ -65,6 +65,7 @@ extern int blas_omp_linked;
#define BLAS_XDOUBLE 0x0002U
#define BLAS_REAL 0x0000U
#define BLAS_COMPLEX 0x0004U
#define BLAS_INTEGER 0x0008U
#define BLAS_TRANSA 0x0030U /* 2bit */
#define BLAS_TRANSA_N 0x0000U

View File

@@ -171,7 +171,7 @@ static __inline int blas_quickdivide(unsigned int x, unsigned int y){
#define MMXSTORE movd
#endif
#if defined(PILEDRIVER) || defined(BULLDOZER)
#if defined(PILEDRIVER) || defined(BULLDOZER) || defined(STEAMROLLER) || defined(EXCAVATOR)
//Enable some optimazation for barcelona.
#define BARCELONA_OPTIMIZATION
#endif

View File

@@ -46,6 +46,7 @@
#define __volatile__
#endif
/*
#ifdef HAVE_SSE2
#define MB __asm__ __volatile__ ("mfence");
#define WMB __asm__ __volatile__ ("sfence");
@@ -53,6 +54,10 @@
#define MB
#define WMB
#endif
*/
#define MB
#define WMB
static void __inline blas_lock(volatile BLASULONG *address){
@@ -99,7 +104,9 @@ static __inline void cpuid(int op, int *eax, int *ebx, int *ecx, int *edx){
: "0" (op));
}
/*
#define WHEREAMI
*/
static inline int WhereAmI(void){
int eax, ebx, ecx, edx;
@@ -111,6 +118,7 @@ static inline int WhereAmI(void){
return apicid;
}
#ifdef CORE_BARCELONA
#define IFLUSH gotoblas_iflush()
#define IFLUSH_HALF gotoblas_iflush_half()
@@ -218,7 +226,7 @@ static __inline int blas_quickdivide(unsigned int x, unsigned int y){
#ifdef ASSEMBLER
#if defined(PILEDRIVER) || defined(BULLDOZER)
#if defined(PILEDRIVER) || defined(BULLDOZER) || defined(STEAMROLLER) || defined(EXCAVATOR)
//Enable some optimazation for barcelona.
#define BARCELONA_OPTIMIZATION
#endif

View File

@@ -220,6 +220,7 @@
#define ZOMATCOPY_K_CTC zomatcopy_k_ctc
#define ZOMATCOPY_K_RTC zomatcopy_k_rtc
#define ZGEADD_K zgeadd_k
#else
@@ -403,6 +404,8 @@
#define ZOMATCOPY_K_CTC gotoblas -> zomatcopy_k_ctc
#define ZOMATCOPY_K_RTC gotoblas -> zomatcopy_k_rtc
#define ZGEADD_K gotoblas -> zgeadd_k
#endif
#define ZGEMM_NN zgemm_nn

12
cpuid.h
View File

@@ -104,10 +104,12 @@
#define CORE_ATOM 18
#define CORE_NANO 19
#define CORE_SANDYBRIDGE 20
#define CORE_BOBCAT 21
#define CORE_BULLDOZER 22
#define CORE_BOBCAT 21
#define CORE_BULLDOZER 22
#define CORE_PILEDRIVER 23
#define CORE_HASWELL 24
#define CORE_HASWELL 24
#define CORE_STEAMROLLER 25
#define CORE_EXCAVATOR 26
#define HAVE_SSE (1 << 0)
#define HAVE_SSE2 (1 << 1)
@@ -200,6 +202,8 @@ typedef struct {
#define CPUTYPE_BOBCAT 45
#define CPUTYPE_BULLDOZER 46
#define CPUTYPE_PILEDRIVER 47
#define CPUTYPE_HASWELL 48
#define CPUTYPE_HASWELL 48
#define CPUTYPE_STEAMROLLER 49
#define CPUTYPE_EXCAVATOR 50
#endif

View File

@@ -30,16 +30,27 @@
#define CPU_UNKNOWN 0
#define CPU_ARMV6 1
#define CPU_ARMV7 2
#define CPU_CORTEXA15 3
#define CPU_CORTEXA9 3
#define CPU_CORTEXA15 4
static char *cpuname[] = {
"UNKOWN",
"ARMV6",
"ARMV7",
"CORTEXA9",
"CORTEXA15"
};
static char *cpuname_lower[] = {
"unknown",
"armv6",
"armv7",
"cortexa9",
"cortexa15"
};
int get_feature(char *search)
{
@@ -85,6 +96,29 @@ int detect(void)
char buffer[512], *p;
p = (char *) NULL ;
infile = fopen("/proc/cpuinfo", "r");
while (fgets(buffer, sizeof(buffer), infile))
{
if (!strncmp("CPU part", buffer, 8))
{
p = strchr(buffer, ':') + 2;
break;
}
}
fclose(infile);
if(p != NULL) {
if (strstr(p, "0xc09")) {
return CPU_CORTEXA9;
}
if (strstr(p, "0xc0f")) {
return CPU_CORTEXA15;
}
}
p = (char *) NULL ;
infile = fopen("/proc/cpuinfo", "r");
while (fgets(buffer, sizeof(buffer), infile))
@@ -142,21 +176,7 @@ void get_architecture(void)
void get_subarchitecture(void)
{
int d = detect();
switch (d)
{
case CPU_ARMV7:
printf("ARMV7");
break;
case CPU_ARMV6:
printf("ARMV6");
break;
default:
printf("UNKNOWN");
break;
}
printf("%s", cpuname[d]);
}
void get_subdirname(void)
@@ -170,6 +190,36 @@ void get_cpuconfig(void)
int d = detect();
switch (d)
{
case CPU_CORTEXA9:
printf("#define CORTEXA9\n");
printf("#define HAVE_VFP\n");
printf("#define HAVE_VFPV3\n");
if ( get_feature("neon")) printf("#define HAVE_NEON\n");
if ( get_feature("vfpv4")) printf("#define HAVE_VFPV4\n");
printf("#define L1_DATA_SIZE 32768\n");
printf("#define L1_DATA_LINESIZE 32\n");
printf("#define L2_SIZE 1048576\n");
printf("#define L2_LINESIZE 32\n");
printf("#define DTB_DEFAULT_ENTRIES 128\n");
printf("#define DTB_SIZE 4096\n");
printf("#define L2_ASSOCIATIVE 4\n");
break;
case CPU_CORTEXA15:
printf("#define CORTEXA15\n");
printf("#define HAVE_VFP\n");
printf("#define HAVE_VFPV3\n");
if ( get_feature("neon")) printf("#define HAVE_NEON\n");
if ( get_feature("vfpv4")) printf("#define HAVE_VFPV4\n");
printf("#define L1_DATA_SIZE 32768\n");
printf("#define L1_DATA_LINESIZE 32\n");
printf("#define L2_SIZE 1048576\n");
printf("#define L2_LINESIZE 32\n");
printf("#define DTB_DEFAULT_ENTRIES 128\n");
printf("#define DTB_SIZE 4096\n");
printf("#define L2_ASSOCIATIVE 4\n");
break;
case CPU_ARMV7:
printf("#define ARMV7\n");
@@ -206,18 +256,7 @@ void get_libname(void)
{
int d = detect();
switch (d)
{
case CPU_ARMV7:
printf("armv7\n");
break;
case CPU_ARMV6:
printf("armv6\n");
break;
}
printf("%s", cpuname_lower[d]);
}

217
cpuid_arm64.c Normal file
View File

@@ -0,0 +1,217 @@
/**************************************************************************
Copyright (c) 2013, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <string.h>
#define CPU_UNKNOWN 0
#define CPU_ARMV8 1
static char *cpuname[] = {
"UNKOWN",
"ARMV8"
};
int get_feature(char *search)
{
#ifdef linux
FILE *infile;
char buffer[2048], *p,*t;
p = (char *) NULL ;
infile = fopen("/proc/cpuinfo", "r");
while (fgets(buffer, sizeof(buffer), infile))
{
if (!strncmp("Features", buffer, 8))
{
p = strchr(buffer, ':') + 2;
break;
}
}
fclose(infile);
if( p == NULL ) return;
t = strtok(p," ");
while( t = strtok(NULL," "))
{
if (!strcmp(t, search)) { return(1); }
}
#endif
return(0);
}
int detect(void)
{
#ifdef linux
FILE *infile;
char buffer[512], *p;
p = (char *) NULL ;
infile = fopen("/proc/cpuinfo", "r");
while (fgets(buffer, sizeof(buffer), infile))
{
if ((!strncmp("model name", buffer, 10)) || (!strncmp("Processor", buffer, 9)))
{
p = strchr(buffer, ':') + 2;
break;
}
}
fclose(infile);
if(p != NULL)
{
if (strstr(p, "AArch64"))
{
return CPU_ARMV8;
}
}
#endif
return CPU_UNKNOWN;
}
char *get_corename(void)
{
return cpuname[detect()];
}
void get_architecture(void)
{
printf("ARM");
}
void get_subarchitecture(void)
{
int d = detect();
switch (d)
{
case CPU_ARMV8:
printf("ARMV8");
break;
default:
printf("UNKNOWN");
break;
}
}
void get_subdirname(void)
{
printf("arm64");
}
void get_cpuconfig(void)
{
int d = detect();
switch (d)
{
case CPU_ARMV8:
printf("#define ARMV8\n");
printf("#define L1_DATA_SIZE 32768\n");
printf("#define L1_DATA_LINESIZE 64\n");
printf("#define L2_SIZE 262144\n");
printf("#define L2_LINESIZE 64\n");
printf("#define DTB_DEFAULT_ENTRIES 64\n");
printf("#define DTB_SIZE 4096\n");
printf("#define L2_ASSOCIATIVE 4\n");
break;
}
}
void get_libname(void)
{
int d = detect();
switch (d)
{
case CPU_ARMV8:
printf("armv8\n");
break;
}
}
void get_features(void)
{
#ifdef linux
FILE *infile;
char buffer[2048], *p,*t;
p = (char *) NULL ;
infile = fopen("/proc/cpuinfo", "r");
while (fgets(buffer, sizeof(buffer), infile))
{
if (!strncmp("Features", buffer, 8))
{
p = strchr(buffer, ':') + 2;
break;
}
}
fclose(infile);
if( p == NULL ) return;
t = strtok(p," ");
while( t = strtok(NULL," "))
{
}
#endif
return;
}

View File

@@ -1,5 +1,5 @@
/*****************************************************************************
Copyright (c) 2011,2012 Lab of Parallel Software and Computational Science,ISCAS
Copyright (c) 2011-2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
@@ -13,9 +13,10 @@ met:
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the ISCAS nor the names of its contributors may
be used to endorse or promote products derived from this software
without specific prior written permission.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE

View File

@@ -59,9 +59,16 @@
void cpuid(int op, int *eax, int *ebx, int *ecx, int *edx);
#else
static inline void cpuid(int op, int *eax, int *ebx, int *ecx, int *edx){
#if defined(__i386__) && defined(__PIC__)
__asm__ __volatile__
("mov %%ebx, %%edi;"
"cpuid;"
"xchgl %%ebx, %%edi;"
: "=a" (*eax), "=D" (*ebx), "=c" (*ecx), "=d" (*edx) : "a" (op) : "cc");
#else
__asm__ __volatile__
("cpuid": "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx) : "a" (op) : "cc");
#endif
}
#endif
@@ -846,11 +853,24 @@ int get_cacheinfo(int type, cache_info_t *cacheinfo){
if (get_vendor() == VENDOR_INTEL) {
cpuid(0x80000000, &cpuid_level, &ebx, &ecx, &edx);
if (cpuid_level >= 0x80000006) {
cpuid(0x80000006, &eax, &ebx, &ecx, &edx);
if(L2.size<=0){
//If we didn't detect L2 correctly before,
cpuid(0x80000006, &eax, &ebx, &ecx, &edx);
L2.size = BITMASK(ecx, 16, 0xffff);
L2.associative = BITMASK(ecx, 12, 0x0f);
L2.linesize = BITMASK(ecx, 0, 0xff);
L2.size = BITMASK(ecx, 16, 0xffff);
L2.associative = BITMASK(ecx, 12, 0x0f);
switch (L2.associative){
case 0x06:
L2.associative = 8;
break;
case 0x08:
L2.associative = 16;
break;
}
L2.linesize = BITMASK(ecx, 0, 0xff);
}
}
}
@@ -909,10 +929,22 @@ int get_cacheinfo(int type, cache_info_t *cacheinfo){
if (L2ITB.associative == 0xff) L2ITB.associative = 0;
L2ITB.linesize = BITMASK(ebx, 0, 0xff);
L2.size = BITMASK(ecx, 16, 0xffff);
L2.associative = BITMASK(ecx, 12, 0xf);
if (L2.associative == 0xff) L2.associative = 0;
L2.linesize = BITMASK(ecx, 0, 0xff);
if(L2.size <= 0){
//If we didn't detect L2 correctly before,
L2.size = BITMASK(ecx, 16, 0xffff);
L2.associative = BITMASK(ecx, 12, 0xf);
switch (L2.associative){
case 0x06:
L2.associative = 8;
break;
case 0x08:
L2.associative = 16;
break;
}
if (L2.associative == 0xff) L2.associative = 0;
L2.linesize = BITMASK(ecx, 0, 0xff);
}
L3.size = BITMASK(edx, 18, 0x3fff) * 512;
L3.associative = BITMASK(edx, 12, 0xf);
@@ -1066,6 +1098,16 @@ int get_cpuname(void){
return CPUTYPE_HASWELL;
#else
return CPUTYPE_SANDYBRIDGE;
#endif
else
return CPUTYPE_NEHALEM;
case 13:
//Broadwell
if(support_avx())
#ifndef NO_AVX2
return CPUTYPE_HASWELL;
#else
return CPUTYPE_SANDYBRIDGE;
#endif
else
return CPUTYPE_NEHALEM;
@@ -1080,11 +1122,36 @@ int get_cpuname(void){
return CPUTYPE_HASWELL;
#else
return CPUTYPE_SANDYBRIDGE;
#endif
else
return CPUTYPE_NEHALEM;
case 7:
case 15:
//Broadwell
if(support_avx())
#ifndef NO_AVX2
return CPUTYPE_HASWELL;
#else
return CPUTYPE_SANDYBRIDGE;
#endif
else
return CPUTYPE_NEHALEM;
}
break;
case 5:
switch (model) {
case 6:
//Broadwell
if(support_avx())
#ifndef NO_AVX2
return CPUTYPE_HASWELL;
#else
return CPUTYPE_SANDYBRIDGE;
#endif
else
return CPUTYPE_NEHALEM;
}
break;
}
break;
case 0x7:
@@ -1130,6 +1197,21 @@ int get_cpuname(void){
return CPUTYPE_PILEDRIVER;
else
return CPUTYPE_BARCELONA; //OS don't support AVX.
case 0:
switch(exmodel){
case 3:
if(support_avx())
return CPUTYPE_STEAMROLLER;
else
return CPUTYPE_BARCELONA; //OS don't support AVX.
case 6:
if(support_avx())
return CPUTYPE_EXCAVATOR;
else
return CPUTYPE_BARCELONA; //OS don't support AVX.
}
break;
}
break;
case 5:
@@ -1258,6 +1340,8 @@ static char *cpuname[] = {
"BULLDOZER",
"PILEDRIVER",
"HASWELL",
"STEAMROLLER",
"EXCAVATOR",
};
static char *lowercpuname[] = {
@@ -1309,6 +1393,8 @@ static char *lowercpuname[] = {
"bulldozer",
"piledriver",
"haswell",
"steamroller",
"excavator",
};
static char *corename[] = {
@@ -1337,6 +1423,8 @@ static char *corename[] = {
"BULLDOZER",
"PILEDRIVER",
"HASWELL",
"STEAMROLLER",
"EXCAVATOR",
};
static char *corename_lower[] = {
@@ -1365,6 +1453,8 @@ static char *corename_lower[] = {
"bulldozer",
"piledriver",
"haswell",
"steamroller",
"excavator",
};
@@ -1483,6 +1573,16 @@ int get_coretype(void){
return CORE_HASWELL;
#else
return CORE_SANDYBRIDGE;
#endif
else
return CORE_NEHALEM;
case 13:
//broadwell
if(support_avx())
#ifndef NO_AVX2
return CORE_HASWELL;
#else
return CORE_SANDYBRIDGE;
#endif
else
return CORE_NEHALEM;
@@ -1497,11 +1597,36 @@ int get_coretype(void){
return CORE_HASWELL;
#else
return CORE_SANDYBRIDGE;
#endif
else
return CORE_NEHALEM;
case 7:
case 15:
//broadwell
if(support_avx())
#ifndef NO_AVX2
return CORE_HASWELL;
#else
return CORE_SANDYBRIDGE;
#endif
else
return CORE_NEHALEM;
}
break;
case 5:
switch (model) {
case 6:
//broadwell
if(support_avx())
#ifndef NO_AVX2
return CORE_HASWELL;
#else
return CORE_SANDYBRIDGE;
#endif
else
return CORE_NEHALEM;
}
break;
}
break;
@@ -1530,7 +1655,25 @@ int get_coretype(void){
return CORE_PILEDRIVER;
else
return CORE_BARCELONA; //OS don't support AVX.
case 0:
switch(exmodel){
case 3:
if(support_avx())
return CORE_STEAMROLLER;
else
return CORE_BARCELONA; //OS don't support AVX.
case 6:
if(support_avx())
return CORE_EXCAVATOR;
else
return CORE_BARCELONA; //OS don't support AVX.
}
break;
}
}else return CORE_BARCELONA;
}
}

View File

@@ -44,6 +44,10 @@ COMPILER_DEC
COMPILER_GNU
#endif
#if defined(__ANDROID__)
OS_ANDROID
#endif
#if defined(__linux__)
OS_LINUX
#endif

View File

@@ -27,12 +27,18 @@ ctestl2o = c_cblas2.o c_c2chke.o auxiliary.o c_xerbla.o constant.o
ctestl3o = c_cblas3.o c_c3chke.o auxiliary.o c_xerbla.o constant.o
ctestl3o_3m = c_cblas3_3m.o c_c3chke_3m.o auxiliary.o c_xerbla.o constant.o
ztestl1o = c_zblas1.o
ztestl2o = c_zblas2.o c_z2chke.o auxiliary.o c_xerbla.o constant.o
ztestl3o = c_zblas3.o c_z3chke.o auxiliary.o c_xerbla.o constant.o
ztestl3o_3m = c_zblas3_3m.o c_z3chke_3m.o auxiliary.o c_xerbla.o constant.o
all :: all1 all2 all3
all1: xscblat1 xdcblat1 xccblat1 xzcblat1
@@ -74,6 +80,18 @@ else
OPENBLAS_NUM_THREADS=2 ./xzcblat3 < zin3
endif
all3_3m: xzcblat3_3m xccblat3_3m
ifeq ($(USE_OPENMP), 1)
OMP_NUM_THREADS=2 ./xccblat3_3m < cin3_3m
OMP_NUM_THREADS=2 ./xzcblat3_3m < zin3_3m
else
OPENBLAS_NUM_THREADS=2 ./xccblat3_3m < cin3_3m
OPENBLAS_NUM_THREADS=2 ./xzcblat3_3m < zin3_3m
endif
clean ::
rm -f x*
@@ -103,6 +121,9 @@ xccblat2: $(ctestl2o) c_cblat2.o $(TOPDIR)/$(LIBNAME)
xccblat3: $(ctestl3o) c_cblat3.o $(TOPDIR)/$(LIBNAME)
$(FC) $(FLDFLAGS) -o xccblat3 c_cblat3.o $(ctestl3o) $(LIB) $(EXTRALIB) $(CEXTRALIB)
xccblat3_3m: $(ctestl3o_3m) c_cblat3_3m.o $(TOPDIR)/$(LIBNAME)
$(FC) $(FLDFLAGS) -o xccblat3_3m c_cblat3_3m.o $(ctestl3o_3m) $(LIB) $(EXTRALIB) $(CEXTRALIB)
# Double complex
xzcblat1: $(ztestl1o) c_zblat1.o $(TOPDIR)/$(LIBNAME)
$(FC) $(FLDFLAGS) -o xzcblat1 c_zblat1.o $(ztestl1o) $(LIB) $(EXTRALIB) $(CEXTRALIB)
@@ -111,4 +132,9 @@ xzcblat2: $(ztestl2o) c_zblat2.o $(TOPDIR)/$(LIBNAME)
xzcblat3: $(ztestl3o) c_zblat3.o $(TOPDIR)/$(LIBNAME)
$(FC) $(FLDFLAGS) -o xzcblat3 c_zblat3.o $(ztestl3o) $(LIB) $(EXTRALIB) $(CEXTRALIB)
xzcblat3_3m: $(ztestl3o_3m) c_zblat3_3m.o $(TOPDIR)/$(LIBNAME)
$(FC) $(FLDFLAGS) -o xzcblat3_3m c_zblat3_3m.o $(ztestl3o_3m) $(LIB) $(EXTRALIB) $(CEXTRALIB)
include $(TOPDIR)/Makefile.tail

View File

@@ -45,8 +45,10 @@ void F77_c3chke(char * rout) {
F77_xerbla(cblas_rout,&cblas_info);
}
if (strncmp( sf,"cblas_cgemm" ,11)==0) {
cblas_rout = "cblas_cgemm" ;
cblas_rout = "cblas_cgemm" ;
cblas_info = 1;
cblas_cgemm( INVALID, CblasNoTrans, CblasNoTrans, 0, 0, 0,

Some files were not shown because too many files have changed in this diff Show More