Compare commits

...

692 Commits

Author SHA1 Message Date
Zhang Xianyi
77460ac255 Fix gemm_batch bug for SMALL_MATRIX_OPT=1. 2020-12-14 09:50:47 +08:00
Zhang Xianyi
88e6806e3f Init cblas_?gemm_batch implementation. 2020-12-14 09:50:32 +08:00
Xianyi Zhang
4130d1732e Refs #2587 fix small matrix c/zgemm bug. 2020-08-28 22:36:36 +08:00
Xianyi Zhang
255b6dd0fa Merge branch 'develop' into small_matrices 2020-08-28 21:38:58 +08:00
Xianyi Zhang
741d6c5cb8 Refs #2587 Add small matrix optimization reference kernel for c/zgemm. 2020-08-28 21:00:54 +08:00
Martin Kroeker
514a3d7d63 Merge pull request #2798 from kadler/aix-cpuid
Fix compile error on AIX cpuid detection
2020-08-28 08:30:59 +02:00
Kevin Adler
085aae8bdb Fix compile error on AIX cpuid detection
In 589c74a the cpuid detection was changed to use systemcfg, but a copy
and paste error was introduced during some refactoring that caused
POWER7 detection to reference CPUTYPE_POWER7 (which doesn't exist)
instead of CPUTYPE_POWER6.
2020-08-27 23:10:45 -05:00
Xianyi Zhang
712ca43069 Change a1b0 gemm to b0 gemm. 2020-08-28 07:55:27 +08:00
Martin Kroeker
5c6c2cd4f6 Merge pull request #2775 from Guobing-Chen/Fix_OMP_threads_specify
Fix OMP num specify issue
2020-08-24 20:18:09 +02:00
Martin Kroeker
e54be4ba1c Merge pull request #2792 from pkubaj/patch-1
Add aliases for armv6, armv7
2020-08-24 08:03:39 +02:00
pkubaj
48a1364e10 Add aliases for armv6, armv7
FreeBSD uses those names for 32-bit ARM variants.
2020-08-23 18:50:19 +00:00
Chen, Guobing
0c1c903f1e Fix OMP num specify issue
In current code, no matter what number of threads specified, all
available CPU count is used when invoking OMP, which leads to very bad
performance if the workload is small while all available CPUs are big.
Lots of time are wasted on inter-thread sync. Fix this issue by really
using the number specified by the variable 'num' from calling API.

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-08-24 02:45:54 +08:00
Martin Kroeker
a073fa870e Merge pull request #2791 from martin-frbg/issue2787
Fix crashes in parallelized x86_64 ZDOT particularly on Windows
2020-08-23 19:33:03 +02:00
Martin Kroeker
b2053239fc Fix mssing dummy parameter (imag part of alpha) of zdot_thread_function 2020-08-23 15:08:16 +02:00
Martin Kroeker
b11bb6e728 Merge pull request #2790 from martin-frbg/issue2789
Add OpenMP dependency to pkgconfig information if needed
2020-08-23 14:42:35 +02:00
Martin Kroeker
1840bc5b52 Add OpenMP dependency to pkgconfig file if needed 2020-08-22 13:55:18 +02:00
Martin Kroeker
7c0977c267 Add OpenMP dependency to pkgconfig file if needed 2020-08-22 13:53:44 +02:00
Martin Kroeker
fb3d80c42a Merge pull request #78 from xianyi/develop
rebase
2020-08-22 13:52:29 +02:00
Martin Kroeker
9ee21a0a39 Merge pull request #2780 from Guobing-Chen/CPL_build_support
Enable COOPERLAKE build target
2020-08-20 19:54:29 +02:00
Martin Kroeker
bd3207b4b4 Update system.cmake 2020-08-19 22:51:10 +02:00
Martin Kroeker
b8ebfc9335 Update system.cmake 2020-08-19 22:30:19 +02:00
Martin Kroeker
7c1986640b fallback from cooperlake to skylake if gcc<10 2020-08-19 20:48:39 +02:00
Martin Kroeker
71d33c952d Typo fix 2020-08-19 17:44:23 +02:00
Martin Kroeker
6a3c074786 -march=cooperlake requires gcc10 2020-08-19 17:22:12 +02:00
Martin Kroeker
430f741b30 -march=cooperlake requires gcc10 2020-08-19 17:17:53 +02:00
Martin Kroeker
6f4dc7445d Fix typo 2020-08-19 16:36:55 +02:00
Martin Kroeker
81fbe8d088 -march=cooperlake only available in gcc >= 10 2020-08-19 16:10:15 +02:00
Martin Kroeker
bb9cf766f5 make march=cooperlake option conditional on gcc >= 10.1 2020-08-19 15:06:30 +02:00
Martin Kroeker
75eeb265d7 [WIP] Refactor the driver code for direct SGEMM (#2782)
Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available
(on x86_64 targets only for now) in DYNAMIC_ARCH builds
* Add  sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt
* Add direct_sgemm functions to the gotoblas struct in common_param.h
* Move sgemm_direct_performant helper to separate file
* Update gemm.c  to macros for sgemm_direct to support dynamic_arch naming via common_s,h
* (Conditionally) add sgemm_direct functions in setparam-ref.c
2020-08-19 14:51:09 +02:00
Martin Kroeker
2c72972570 Merge pull request #2785 from albertziegenhagel/always-generate-pkg-config
Do not require pkg-config to generate the *.pc file
2020-08-19 14:42:58 +02:00
Albert Ziegenhagel
6b731d917f Do not require pkg-config to generate the *.pc file
Generating the pkg-config file does not actually depend on pkg-config being available.
2020-08-18 08:48:48 +02:00
Martin Kroeker
5dcf47cd97 Merge pull request #2784 from martin-frbg/issue2783
Add fallback typedef for bfloat16 to openblas_config.h template
2020-08-17 19:06:13 +02:00
Martin Kroeker
aa286e301b Add typedef for bfloat16 if needed 2020-08-17 15:32:14 +02:00
Martin Kroeker
9f0ef9cdfc Merge pull request #77 from xianyi/develop
rebase
2020-08-17 15:28:15 +02:00
Martin Kroeker
6bfc66663c revert 2020-08-17 15:20:41 +02:00
Martin Kroeker
a8c6fb9e1c revert 2020-08-17 15:20:16 +02:00
Martin Kroeker
5ec8f716cf revert 2020-08-17 15:19:40 +02:00
Martin Kroeker
82f8a0aeba Update .drone.yml 2020-08-15 15:46:18 +02:00
Martin Kroeker
d57d503c15 Update Makefile 2020-08-15 14:46:26 +02:00
Martin Kroeker
37ac23e8a3 Add simple MT sgemm precision test and INTERFACE64 build 2020-08-15 13:38:05 +02:00
Martin Kroeker
6a93e3b2ba Add simple sgemm preicsion test 2020-08-15 13:33:52 +02:00
Martin Kroeker
47ce1dd08f Update gemm64.cpp 2020-08-15 13:31:28 +02:00
Martin Kroeker
f5fcc5baec Add trivial gemm test for multithread consistency 2020-08-15 13:30:29 +02:00
Chen, Guobing
e740c4873d Enable COOPERLAKE build target
Enable new build target platform -- COOPERLAKE. This target platform
supports all the SKYLAKEX supported ISAs + avx512bf16. So all the
SKYLAKEX specific kernels/drivers and related code are now extended
to be also active on COOPERLAKE. Besides, new BF16 related kernels
are active under this target.
2020-08-13 06:18:00 +08:00
Martin Kroeker
efdd237a91 Add a dedicated POWER9 build to the Travis CI (#2774)
* Add dedicated POWER9 build (using new syntax to ensure it runs as a P9-only containerized job rather than a VM that
might end up on P8 hardware half of the time)
* Bump gcc version for POWER9 build
2020-08-12 23:08:38 +02:00
Martin Kroeker
4573cb2f43 Merge pull request #2765 from martin-frbg/issue2760
Add memory barrier to the PPC blas_lock implementation for Linux
2020-08-11 22:40:17 +02:00
Martin Kroeker
2a4bb797db Merge pull request #2773 from martin-frbg/issue2770
Fix Makefiles still mishandling NO_CBLAS=0 and NO_LAPACKE=0
2020-08-11 21:02:55 +02:00
Martin Kroeker
cbbe38bb88 Merge pull request #2772 from mhillenibm/s390x_gemm_tuning
s390x: GEMM tuning for z14
2020-08-11 18:14:09 +02:00
Martin Kroeker
619343278d Fix mishandling of NO_CBLAS=0 and NO_LAPACKE=0 2020-08-11 13:40:40 +02:00
Martin Kroeker
fee361ae64 fix another source of NO_CBLAS=0 surprise 2020-08-11 13:27:19 +02:00
Martin Kroeker
62f4c84f27 Merge pull request #76 from xianyi/develop
rebase
2020-08-11 13:25:12 +02:00
Marius Hillenbrand
e115c97e05 s390x/SGEMM: adjust default P and Q to multiples of M
We recently changed the register blocking for SGEMM on s390x to 16x4.
However, we did not adjust Q to a multiple of 16 and thus fell back to
the 8x4 kernel at each block's margin, without need. Adjust P and Q to
multiples of 16 to employ the faster 16x4 kernel for complete full-sized
blocks.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-08-11 12:56:46 +02:00
Marius Hillenbrand
07c334e7be s390x: Factor out small block sizes for SGEMM/DGEMM on z14
For small register blockings that are too small to fill up vector
registers with column vectors, we currently use a generic code block.
Replace that with instantiations of the generic code as individual
functions, so that the compiler can optimize each one separately.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-08-11 12:56:39 +02:00
Marius Hillenbrand
e2828e30aa s390x: Optimize SGEMM/DGEMM blocks for z14 with explicit loop unrolling/interleaving
Improve performance of SGEMM and DGEMM on z14 and z15 by unrolling and
interleaving the inner loop of the SGEMM 16x4 and DGEMM 8x4 blocks.
Specifically, we explicitly interleave vector register loads and
computation of two iterations.

Note that this change only adds one C function, since SGEMM 16x4 and
DGEMM 8x4 actually map to the same C code: they both hold intermediate
results in a 4x4 grid of vector registers, and the C implementation is
built around that.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-08-11 12:55:42 +02:00
Martin Kroeker
7219c9cb87 Merge pull request #2764 from martin-frbg/lapacktests
Fix array overruns in the LIN part of the LAPACK testsuite
2020-08-10 13:27:51 +02:00
Martin Kroeker
c9d32674ea Add memory barrier to the blas_lock implementation for Linux
as recommended by cparrott73 in #2760
2020-08-09 19:17:04 +02:00
Martin Kroeker
64259d521a Fix use of unallocated array in workspace query and wrong type of argument to xSCAL 2020-08-09 13:02:27 +02:00
Martin Kroeker
6f5ca44c1a Expand TAU array as SGEMQR/DGEMQR read elements 2 and 3 2020-08-09 12:59:20 +02:00
Martin Kroeker
d28b3f2776 Create Jenkinsfile for OSUOSL PowerCI 2020-08-08 18:05:20 +02:00
Martin Kroeker
ba3f7b3acf Merge pull request #2761 from RajalakshmiSR/Makefile_err
Remove extra symbol in Makefile
2020-08-08 12:20:04 +02:00
Rajalakshmi Srinivasaraghavan
475b5c95b9 Remove extra symbol in Makefile
While trying out different unroll values, noted that
make failed due to this extra symbol.
2020-08-07 15:27:44 -05:00
Martin Kroeker
cd60080d4a Merge pull request #2758 from martin-frbg/undef_shift
Fix GCC ubsan warnings in x86_64 complex dot and gemv_t kernels
2020-08-03 23:30:26 +02:00
Martin Kroeker
4847bfdddd Merge pull request #2757 from martin-frbg/cmake64
Fix lapack-tests linking to a suffixed libopenblas in cmake builds
2020-08-02 23:05:21 +02:00
Martin Kroeker
81dcfdcf39 Multiply by 2 instead of left-shifting a potentially negative number
fixes GCC ubsan warning in the BLAS tests
2020-08-02 18:29:56 +02:00
Martin Kroeker
0ef4b3f1f2 Multiply instead of doing a left shift of a potentially negative number
fixes GCC ubsan report in the BLAS tests
2020-08-02 18:27:40 +02:00
Martin Kroeker
aa53a8a5cb Multiply by two instead of left-shifting one place
fixes GCC ubsan report of "left shift of negative value -2" in the BLAS tests
2020-08-02 18:25:09 +02:00
Martin Kroeker
aa3a1e7d8c Multiply by two rather than left shift by one place
fixes GCC ubsan report of "left shift of negative value -2" in the BLAS tests
2020-08-02 18:22:31 +02:00
Martin Kroeker
aaf1a17168 Apply current library name suffix 2020-08-02 17:58:33 +02:00
Martin Kroeker
53add6a80d Apply library name suffix to openblas if any 2020-08-02 17:57:12 +02:00
Martin Kroeker
9eb897cc01 Merge pull request #75 from xianyi/develop
rebase
2020-08-02 17:50:06 +02:00
Martin Kroeker
7cead56258 Merge pull request #2753 from martin-frbg/issue2751
Add SYMBOLPREFIX and/or SYMBOLSUFFIX to cblas prototypes
2020-08-02 15:32:46 +02:00
Martin Kroeker
6794ac3415 Add SYMBOLPREFIX and/or -SUFFIX to cblas.h if needed 2020-08-02 11:20:08 +02:00
Martin Kroeker
ecf4b9e0fc Improve substitution rules for SYMBOLPREFIX and -SUFFIX addition 2020-08-01 17:06:03 +02:00
Martin Kroeker
dfe5d09641 Merge pull request #2756 from martin-frbg/issue2755
Protect against inadvertent activation of USE_CUDA
2020-08-01 15:19:02 +02:00
Martin Kroeker
60cd5e55fc Protect against inadvertent activation of USE_CUDA 2020-08-01 12:31:39 +02:00
Martin Kroeker
da9e2a7ada Add SYMBOLPREFIX and/or SYMBOLSUFFIX to cblas prototypes 2020-07-31 16:03:33 +02:00
Martin Kroeker
c88cbc5e0d Merge pull request #2752 from kadler/cpuid_aix
Use systemcfg APIs for CPU detection on AIX
2020-07-31 12:52:24 +02:00
Kevin Adler
589c74aed3 Use systemcfg APIs for CPU detection on AIX
AIX libc already provides ready access to an integer that contains a bit
identifying the CPU it's running on, so there's no need to call a
program and grep its output. Additionally, prtconf is not available in
the PASE runtime, which provides an AIX emulation layer on the IBM i
operating system.

The AIX systemcfg.h also provides macro definitions like POWER_8,
POWER_9, etc for all the bits defining the CPUs as well as macros like
__power_8(), __power_9_andup() that return booleans, but I did not use
them. Since these macros depend on the level of the OS in which it is
built, they may not be defined and instead the associated hex literals
are used directly.
2020-07-30 21:06:40 -05:00
Martin Kroeker
104aa678b0 Fix inadvertent version number reversal to 0.3.9.dev caused by #2710 2020-07-30 11:40:52 +02:00
Martin Kroeker
c6b48e0394 Merge pull request #2749 from martin-frbg/make_ppc
Reorganize OpenMP build options for POWER and allow compiling for POWER9 with old gcc
2020-07-30 11:35:53 +02:00
Martin Kroeker
4927251298 Merge pull request #2750 from RajalakshmiSR/dgemv_p10
dgemv optimization for POWER10
2020-07-30 10:13:19 +02:00
Rajalakshmi Srinivasaraghavan
f77b6a83f4 dgemv optimization for POWER10
Making use of new vector pair POWER10 instructions in dgemv_n and dgemv_t.
Also adding a new block 4x128 to make use of Matrix-Multiply Assist (MMA)
feature introduced in POWER ISA v3.1.  Tested on simulator and there
are no new test failures.
2020-07-29 18:59:32 -05:00
Martin Kroeker
39724e8128 Separate OpenMP handling and allow compilation of Power9 code with older gcc 2020-07-30 01:14:08 +02:00
Martin Kroeker
525db5401c Merge pull request #74 from xianyi/develop
rebase
2020-07-30 01:04:09 +02:00
Martin Kroeker
cb097beba2 Merge pull request #2741 from martin-frbg/issue2739
Adjust A53 SGEMM parameters to reflect recent switch to 8x8 kernel
2020-07-29 10:01:14 +02:00
Martin Kroeker
7c02f4b1f7 Merge pull request #2744 from martin-frbg/issue2738
Add AMD Renoir/Matisse cpu autodetection and preliminary support for Zen3
2020-07-28 19:32:04 +02:00
Martin Kroeker
383262035d Merge pull request #2740 from RajalakshmiSR/clang-power
Fix compilation issues with clang on POWER
2020-07-28 18:15:25 +02:00
Martin Kroeker
5fa581c87e Put hint to use git develop rather than master branch in README 2020-07-28 14:22:41 +00:00
Martin Kroeker
12918358aa Add AMD Renoir/Matisse and preliminary support for Zen3 as Zen2
also support AMD family 22 Jaguar/Puma as Bobcat
2020-07-28 13:53:17 +00:00
Martin Kroeker
200f5c44cc Add AMD Renoir models and preliminary support for ZEN3 as ZEN2
also remap erroneous family 16 entry to BOBCAT and reclaim erroneous family 25 "Barcelona" for Zen3
2020-07-28 13:45:23 +00:00
Martin Kroeker
64e2e4aaf3 missing braces 2020-07-27 20:19:22 +00:00
Martin Kroeker
921ec4e9e2 Adjust A53 SGEMM parameters to reflect move to 8x8 kernel 2020-07-27 19:54:46 +00:00
Rajalakshmi Srinivasaraghavan
d557584b71 Fix compilation issues with clang on POWER
As gcc defaults to -malign-power, removing that option. Also
adding -fno-integrated-as to use GNU assembler for powerpc
assembly optimization files. Fixed other compilation errors
reported in dgemv_t.c file.
2020-07-27 14:11:07 -05:00
Martin Kroeker
a4ceb1ade9 Merge pull request #2737 from ashwinyes/add_thunderx3_target
ARM64: Add THUNDERX3T110 Target
2020-07-27 15:19:47 +02:00
Ashwin Sekhar T K
4e1be0e481 ARM64: Add THUNDERX3T110 Target 2020-07-26 23:32:24 -07:00
Martin Kroeker
49b83e00b7 Merge pull request #2735 from martin-frbg/move_potrf
Move potrf_parallel.c from lapack/getrf to lapack/potrf where it belongs
2020-07-26 19:54:11 +02:00
Martin Kroeker
769ed9ffad Merge pull request #2734 from RajalakshmiSR/p10_fix
Fix to store results in correct order for POWER10 GEMM kernels
2020-07-25 09:02:32 +02:00
Martin Kroeker
f194ad59e1 Use _Atomic instead of volatile where available (file moved from ../getrf)
must have misplaced this in ../getrf when I made that change in March 2018 (40160ff)
the only changes since then were 
RFC : Add half precision gemm for bfloat16 in OpenBLAS Rajalakshmi Srinivasaraghavan
Rajalakshmi Srinivasaraghavan committed on 14 Apr 2020 as 7ebbb50

    Change _STDC_VERSION__ to __STDC_VERSION__ 
Zhiyong Dang committed on 11 May 2018 as 3716267
2020-07-25 08:52:24 +02:00
Martin Kroeker
4fda217f99 Delete potrf_parallel.c (moving it to ../potrf) 2020-07-25 06:42:39 +00:00
Rajalakshmi Srinivasaraghavan
9be2688c78 Fix to store results in correct order for POWER10 GEMM kernels
There is a recent compiler change in __builtin_mma_disassemble_acc() which
affects the order of storing result in POWER10. Also removing new LDFLAG
-mno-power10-stub as it is handled by linker automatically.
2020-07-24 23:08:11 -05:00
Martin Kroeker
6a2a60038c Merge pull request #2720 from martin-frbg/issue2694
WIP Further fixes for 32bit POWER8
2020-07-24 23:19:45 +02:00
Martin Kroeker
251a09ec90 Typo fix 2020-07-24 16:04:58 +00:00
Martin Kroeker
95d37e1575 Regroup the 32 and 64bit sections and restore 64bit CAXPY 2020-07-24 10:13:46 +00:00
Martin Kroeker
3523bb778e Merge pull request #2721 from martin-frbg/p8align
Fix alignment errors in the power8 saxpy kernel
2020-07-24 11:06:20 +02:00
Martin Kroeker
a50d0e29c8 Merge pull request #2731 from martin-frbg/pgippc
Fixes for compilation on POWER with PGI compilers
2020-07-24 11:05:16 +02:00
Martin Kroeker
bf1f0734ff Use OPENBLAS_MAKE_COMPLEX_FLOAT on PPC only 2020-07-23 20:40:13 +00:00
Martin Kroeker
ca3561cab9 Add ifdefs around call to altivec microkernel 2020-07-23 18:30:42 +00:00
Martin Kroeker
21072e502a Typo fix 2020-07-23 17:34:56 +00:00
Martin Kroeker
7c6e56b5df Rewrite assignment to complex for better portability 2020-07-23 17:10:59 +02:00
Martin Kroeker
661c6bfa5a Exclude altivec code paths if the compiler does not support them 2020-07-23 17:08:20 +02:00
Martin Kroeker
9796e552ea Avoid undefining NAME,CNAME etc for pgcc as it makes it ignore the new defininitions 2020-07-23 17:03:28 +02:00
Martin Kroeker
d6b6e5ccd7 Merge pull request #73 from xianyi/develop
rebase
2020-07-23 16:59:06 +02:00
Martin Kroeker
349b722d8d Merge pull request #2729 from martin-frbg/issue2728
Unify BUFFER_SIZE settings for x86_64 again to fix DYNAMIC_ARCH crashes
2020-07-22 22:45:57 +02:00
Martin Kroeker
6c33764ca4 Unify BUFFER_SIZE settings for x86_64 again to fix potentially fatal mismatch in DYNAMIC_ARCH builds 2020-07-22 17:30:55 +00:00
Martin Kroeker
d1b9613fd4 Merge pull request #2727 from wyphan/develop
Patch for building on POWERPC with PGI compilers (was Patch for building on Summit)
2020-07-21 17:06:53 +02:00
Martin Kroeker
3cfc74b1a0 Merge pull request #2726 from martin-frbg/2725-2
Add detection of stdatomic.h for cmake
2020-07-21 16:42:06 +02:00
Wileam Phan
9ae154ba89 Patch for building on Summit 2020-07-20 23:30:28 -04:00
Martin Kroeker
9e21a100e3 Add trivial check for stdatomic.h 2020-07-20 22:52:09 +00:00
Martin Kroeker
31d30312dc Merge pull request #72 from xianyi/develop
rebase
2020-07-21 00:49:12 +02:00
Martin Kroeker
fcfb7ffafb Merge pull request #2725 from martin-frbg/ccheck_c11
Have c_check probe availability of C11 atomics support and stdatomic.h
2020-07-18 23:08:08 +02:00
Martin Kroeker
bbe119ee3b Update conditional for atomics to use HAVE_C11 2020-07-18 17:19:59 +00:00
Martin Kroeker
f4f74941bd Update conditional for atomics to use HAVE_C11 2020-07-18 17:14:50 +00:00
Martin Kroeker
a36eb19ae0 Update conditional for C11 atomics to use HAVE_C11 2020-07-18 17:13:24 +00:00
Martin Kroeker
ce45af8151 Update conditional for atomics to use HAVE_C11 2020-07-18 17:09:56 +00:00
Martin Kroeker
6f38de06d2 Update conditional for atomics to use HAVE_C11 2020-07-18 17:09:01 +00:00
Martin Kroeker
09eb9d2584 Update conditional for atomics to HAVE_C11 2020-07-18 17:07:38 +00:00
Martin Kroeker
791e046744 Update conditional for atomics to use HAVE_C11 2020-07-18 17:05:59 +00:00
Martin Kroeker
94bab9d1f9 Update conditional for atomics to use HAVE_C11 2020-07-18 17:03:31 +00:00
Martin Kroeker
97d6eb97b1 Report availability of C11 support 2020-07-18 16:59:33 +00:00
Martin Kroeker
4afd11dae5 Add a check for C11 atomics and stdatomic.h 2020-07-18 16:57:41 +00:00
Martin Kroeker
72ec6280c7 Merge pull request #2724 from martin-frbg/loongsonreadme
Update cross-compiling example in README to reflect change in Loongson gcc
2020-07-18 18:08:40 +02:00
Martin Kroeker
26b7f24d16 Update cross-compiling example to reflect change in Loongson gcc
for #2723
2020-07-18 12:51:37 +00:00
Martin Kroeker
0db4218fed Merge pull request #2722 from martin-frbg/cmakefcheck
Handle lack of fortran compiler more gracefully in cmake
2020-07-17 10:33:03 +02:00
Martin Kroeker
9d000ecaa2 include CheckLanguage module 2020-07-16 22:36:35 +00:00
Martin Kroeker
a847d00366 handle missing lack of fortran compiler more gracefully 2020-07-16 22:17:39 +00:00
Martin Kroeker
0033f8be0d Use vec_vsx_ld/st to fix misaligned accesses flagged by asan 2020-07-16 23:32:54 +02:00
Martin Kroeker
f308e741b2 remove debug output and revert changes to cdot and crot 2020-07-15 10:00:07 +02:00
Martin Kroeker
4f5d26bb02 Merge pull request #2716 from RajalakshmiSR/p10_ldflag
Add new linker option for POWER10
2020-07-15 01:20:54 +02:00
Rajalakshmi Srinivasaraghavan
417c4e8af8 Add new linker option for POWER10
While building with DYNAMIC_ARCH on POWER9 with POWER10
aware toolchain, new LDFLAG is needed to avoid POWER10
instructions on PLT calls .
2020-07-14 11:54:04 -05:00
Martin Kroeker
da17abec87 fix trailing whitespace 2020-07-14 18:20:03 +02:00
Martin Kroeker
f8c2697701 Use POWER6 GEMM, TRMM and DTRSM on 32bit POWER8 2020-07-14 18:11:19 +02:00
Martin Kroeker
b144423f0f Do not define USE_TRMM for 32bit POWER8 2020-07-14 18:10:12 +02:00
Martin Kroeker
bd2498c886 Use POWER6 GEMM parameters on 32bit POWER8 2020-07-14 18:07:58 +02:00
Martin Kroeker
d8e2edfc20 Merge pull request #71 from xianyi/develop
rebase
2020-07-14 18:01:34 +02:00
Martin Kroeker
419b8686d1 Merge pull request #2682 from martin-frbg/aix
[WIP] fix compilation on AIX
2020-07-13 14:43:24 +02:00
Martin Kroeker
3ab15ff34c Merge pull request #2651 from leezu/actionsflang
Add flang build to Github Actions
2020-07-13 13:00:39 +02:00
Martin Kroeker
8916c4ae2c Merge branch 'develop' into actionsflang 2020-07-12 20:37:29 +02:00
Martin Kroeker
4fa283de66 Merge pull request #2706 from jussienko/use-always-omp-threads
Fix OpenMP builds defaulting to singlethreading with OMP_PLACES or OMP_PROC_BIND set
2020-07-12 20:17:11 +02:00
Martin Kroeker
5865c7d4d6 Make 32bit POWER8 use POWER6 kernels for now 2020-07-12 18:59:01 +02:00
Martin Kroeker
ae3a90f78f merge overwritten part of power10 support 2020-07-12 18:51:58 +02:00
Martin Kroeker
009864edde Merge pull request #2710 from martin-frbg/cmake-lapacktest
Add LAPACK-TESTING to the cmake build
2020-07-10 12:06:50 +02:00
Martin Kroeker
3de80b3f5a Merge pull request #2713 from RajalakshmiSR/p10-gcc10
Change minimum gcc version for POWER10
2020-07-10 10:43:33 +02:00
Rajalakshmi Srinivasaraghavan
af1e140e35 Change minimum gcc version for POWER10
As the MMA patches for POWER10 are backported to gcc10.2, changing
the minimum gcc version needed to build OpenBLAS for POWER10.
2020-07-09 21:46:06 -05:00
Martin Kroeker
d4a0299e16 Do not build lapack-test on MSVC for now (same as with BLAS test) 2020-07-09 13:57:27 +02:00
Martin Kroeker
f766024749 enable fortran for cmake 2020-07-09 13:44:25 +02:00
Martin Kroeker
c502760bef Modify for building with OpenBLAS 2020-07-09 13:13:16 +02:00
Martin Kroeker
29b5887d5f Modify for building with OpenBLAS 2020-07-09 13:12:35 +02:00
Martin Kroeker
60188a8c82 Append crude hack for enabling lapack tests in the OpenBLAS build 2020-07-09 11:44:31 +02:00
Martin Kroeker
1d63631afe Add lapack-test 2020-07-09 11:42:02 +02:00
Martin Kroeker
e82bb953a7 Merge pull request #2708 from RajalakshmiSR/p10_future
Changing mcpu option as power10
2020-07-08 12:26:44 +02:00
Martin Kroeker
ed7e155c35 Merge branch 'develop' into aix 2020-07-07 18:52:06 +02:00
Rajalakshmi Srinivasaraghavan
45d819ca82 Changing mcpu option as power10
As compiler enabled mcpu option as power10, changing it from future.
2020-07-07 11:25:20 -05:00
Martin Kroeker
8751a69271 Obtain actual cpu count on AIX and suppress spurious NO_AVX512 on non-x86 2020-07-07 15:46:32 +02:00
Jussi Enkovaara
10a2923f64 fixes #2238
Always obey omp_get_max_threads() when build with USE_OPENMP
2020-07-07 13:35:43 +03:00
Martin Kroeker
5ff83a4261 Merge pull request #2670 from mhillenibm/dumpfullversion_on_gcc7
RFC: Use -dumpfullversion to get minor version on gcc-7 and newer
2020-07-07 00:12:28 +02:00
Martin Kroeker
5bc9680a86 Merge pull request #2703 from martin-frbg/issue2702
Compatibility fix for gcc < 4.7
2020-07-02 22:32:51 +02:00
Martin Kroeker
4ab3651591 Option -mavx2 requires at least gcc 4.7 2020-07-02 17:00:15 +02:00
Martin Kroeker
a83680b40b Merge pull request #69 from xianyi/develop
rebase
2020-07-02 16:56:00 +02:00
Martin Kroeker
c3aa036e99 Merge pull request #2693 from EGuesnet/AIX-build-on-POWER8-32bits
AIX build on POWER8 32bits
2020-07-01 08:29:52 +02:00
EGuesnet
634e1305f9 Update cgemm_kernel_8x4_power8.S 2020-06-30 15:16:39 +02:00
Martin Kroeker
c467516132 Merge pull request #2688 from martin-frbg/cometlake
Add autodetection of Intel Comet Lake H and S models
2020-06-27 17:47:24 +02:00
Martin Kroeker
83f4746825 Add support for Comet Lake H and S 2020-06-27 14:41:24 +02:00
Martin Kroeker
584ef8d4ae Add support for Comet Lake H & S 2020-06-27 14:36:37 +02:00
Martin Kroeker
8dfda02e89 Merge pull request #68 from xianyi/develop
rebase
2020-06-27 14:29:29 +02:00
Martin Kroeker
28d69e0097 Merge pull request #2687 from martin-frbg/utfbom
Strip UTF8 byte order marker from source files
2020-06-26 22:53:09 +02:00
Martin Kroeker
c2467c9619 Merge pull request #2686 from RajalakshmiSR/p10_shgemm
powerpc: Optimized SHGEMM kernel for POWER10
2020-06-26 22:52:45 +02:00
Martin Kroeker
f86e749df4 Merge pull request #2683 from mtreinish/add-comet-lake-support
Add cpu detection support for comet lake U
2020-06-26 12:11:03 +02:00
Martin Kroeker
d199c2787d Merge pull request #2680 from kavanabhat/aix_makefile_fix
Fix for #2671
2020-06-26 11:27:28 +02:00
Martin Kroeker
e30ad0e521 Strip UTF8 byte order marker from source 2020-06-26 09:00:43 +02:00
Rajalakshmi Srinivasaraghavan
d23419accc powerpc: Optimized SHGEMM kernel for POWER10
This patch introduces new optimized version of SHGEMM kernel
using power10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. This patch makes use of new POWER10 compute instructions
for matrix multiplication operation.

Tested on simulator and there are no new test failures.
2020-06-25 22:19:08 -05:00
Matthew Treinish
2f9c10810c Also set CPUTYPE in get_cpuname() 2020-06-25 15:53:56 -04:00
Matthew Treinish
f37e941d52 Add support to driver/others/dynamic.c too 2020-06-25 11:56:49 -04:00
Matthew Treinish
2a91452bdd Add cpu detection support for comet lake U
Comet Lake U CPUs have family: 6, model: 6, extended family: 0, and
extended model: 10 were not being correctly detected by GETARCH during
openblas builds and would show CORE=UNKNOWN and LIBCORE=unknown. This
commit adds the necessary information to cpuid_x86 to detect extended
family 10 model 6 and return the proper core information. It's
essentially just a skylake cpu, not skylake x, so I just took the used
the same return fields as skylake.
2020-06-25 11:32:09 -04:00
Martin Kroeker
c854ef5471 Fix variable names in conditional 2020-06-25 13:29:52 +02:00
Martin Kroeker
c0afc11742 Fix POWERPC builds on AIX (gcc/gfortran 7)
1. macro preprocessing for POWER8 and later kernels only
2. default buffer size used by AIX version of m4 is too small
2020-06-25 13:12:36 +02:00
Martin Kroeker
c592f0f80a Fix utest build on AIX 2020-06-25 12:58:13 +02:00
Martin Kroeker
3f613b1301 Tentative changes for building on AIX 2020-06-25 12:57:00 +02:00
Martin Kroeker
72a0ec8e75 Fix reading of CPU name from prtconf output on AIX 2020-06-25 12:55:10 +02:00
Martin Kroeker
3446e58daf Fix handling of uname output on AIX 2020-06-25 12:31:35 +02:00
Martin Kroeker
4ca8becc4b Merge pull request #67 from xianyi/develop
rebase
2020-06-25 10:33:03 +02:00
Martin Kroeker
dfe819f3bd Merge pull request #2679 from RajalakshmiSR/P10_GEMM
POWER10 Optimized GEMM kernels
2020-06-25 08:31:38 +02:00
Martin Kroeker
4369e52555 Merge pull request #2677 from brada4/develop
address vs2019 C4293 warning
2020-06-25 08:31:17 +02:00
Gordon Fossum
bb2f52844b powerpc: Optimized ZGEMM kernel for POWER10
This patch introduces new optimized version of ZGEMM kernel
using power10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. This patch makes use of new POWER10 compute instructions
for matrix multiplication operation.

Tested on simulator and there are no new test failures.
Cycles count reduced by 30-50%  compared to POWER9 version depending on
M/N/K sizes.
2020-06-24 14:50:12 -05:00
Rajalakshmi Srinivasaraghavan
571eadb880 powerpc: Optimized SGEMM/DGEMM/CGEMM for POWER10
This patch introduces new optimized version of SGEMM, CGEMM and DGEMM
using power10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. This patch makes use of new POWER10 compute instructions
for matrix multiplication operation.

Tested on simulator and there are no new test failures.
Cycles count reduced by 30-50%  compared to POWER9 version depending on
M/N/K sizes.
MMA GCC patch for reference:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=8ee2640bfdc62f835ec9740278f948034bc7d9f1
2020-06-24 14:48:15 -05:00
Kavana Bhat
df4ade070f Fix for #2671 2020-06-24 04:25:47 -05:00
User User-User
e6b9275034 address vs2019 C4293 2020-06-24 09:12:23 +03:00
Martin Kroeker
53ea5bfece Merge pull request #66 from xianyi/develop
rebase
2020-06-23 10:13:44 +02:00
Martin Kroeker
93592d1260 Merge pull request #2675 from wjc404/develop
AVX512 DGEMM TCOPY_16 Function
2020-06-23 09:29:02 +02:00
Martin Kroeker
6eaeb01263 Merge pull request #2658 from RajalakshmiSR/p10
powerpc: Add support for future processor
2020-06-23 00:02:37 +02:00
Martin Kroeker
45d542c9d1 Merge pull request #65 from xianyi/develop
rebase
2020-06-21 12:41:01 +02:00
wjc404
086d87a302 AVX512 dgemm tcopy_16 function 2020-06-20 00:07:43 +08:00
Martin Kroeker
af501eb753 Merge pull request #2669 from mhillenibm/zarch_fix_gcc_detection
Zarch fix gcc detection
2020-06-17 17:55:25 +02:00
Martin Kroeker
0eb6c4dded Merge pull request #2672 from mhillenibm/test_num_threads
cpp_thread_test: Change adjustment of concurrency on systems with <52 hw threads
2020-06-17 17:54:31 +02:00
Marius Hillenbrand
de838c38ef cpp_thread_test/dgemv: fail early if concurrency is zero
The two test cases dgemv_tester and dgemm_tester accept the degree of
concurrency as command line argument (amongst others). Fail early if
value 0 has been specified, instead of later with less-clear symptoms.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-06-17 16:15:44 +02:00
Marius Hillenbrand
478898b37a cpp_thread_test/dgemv: cap concurrency to number of hw threads on small systems
... instead of (number of hw threads - 4) to avoid invalid numbers on
smaller systems. Currently, systems with 4 or fewer CPUs (e.g., small CI
VMs) would fail the test. Fixes one of the issues discussed in #2668

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-06-17 16:08:48 +02:00
Marius Hillenbrand
cde4690721 RFC: Use gcc -dumpfullversion to get minor version with gcc-7.x
In gcc-7.1, the behavior of -dumpversion changed to be configured
at compile-time. On some distributions it only dumps the major version
(e.g., Ubuntu), so the current checks for the gcc minor version report
false negatives. As a replacement, gcc-7.1 introduced -dumpfullversion
which always prints the full version.

Update the gcc version detection in Makefile.system to employ
-dumpfullversion with gcc-7 and newer.

Posting this patch for discussion, since it emerged from discussions
around issue #2668 and PR #2669. It is not solving a problem right now,
but may be useful in the future.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-06-16 15:45:59 +02:00
Marius Hillenbrand
2389291766 Makefile.system: remove duplicate variable GCCVERSIONGT5
... to bring unified gcc version detection with common variables to the
one remaining spot in Makefile.system.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-06-16 15:06:03 +02:00
Marius Hillenbrand
a2d13ea611 Fix gcc version detection for zarch
Employ common variables for gcc version detection and fix the broken
check for gcc >= 5.2.
Fixes #2668

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-06-16 15:06:03 +02:00
Martin Kroeker
1bd3cd66c2 Increment version to 0.3.10.dev 2020-06-14 22:05:19 +02:00
Martin Kroeker
1c53e1366d Increment version to 0.3.10.dev 2020-06-14 22:04:37 +02:00
Martin Kroeker
95dbeff66d Merge branch 'release-0.3.0' into develop 2020-06-14 22:02:45 +02:00
Martin Kroeker
3b673a24b7 Increment version to 0.3.10.dev 2020-06-14 21:57:52 +02:00
Martin Kroeker
1eb1979050 Increment version to 0.3.10.dev 2020-06-14 21:57:15 +02:00
Martin Kroeker
efc53b6e7e Merge pull request #2665 from martin-frbg/flang-fixes-2a
Fix spelling of flang option -Mrecursive, add -Kieee and workaround for AOCC optimizer bug
2020-06-14 21:56:08 +02:00
Martin Kroeker
72888497e2 Update with 0.3.10 changes 2020-06-14 21:55:31 +02:00
Martin Kroeker
7e3e006af6 Merge pull request #2666 from martin-frbg/blastest
Update BLAS tests to what netlib 3.9.0 uses
2020-06-14 18:28:37 +02:00
Martin Kroeker
d906d14402 Merge pull request #2664 from ACSimon33/exported_symbols
Add missing exported symbols.
2020-06-14 18:27:03 +02:00
Martin Kroeker
3785c0e82b Merge pull request #2663 from martin-frbg/issue2654
Respect predefined defaults for AR, AS, LD and RANLIB
2020-06-14 18:26:43 +02:00
Martin Kroeker
f2d8879af6 Merge pull request #2661 from martin-frbg/issue2660
Report selected DYNAMIC_ARCH kernel rather than one of its aliases in gotoblas_corename
2020-06-14 18:25:37 +02:00
Martin Kroeker
6876221cf3 Remove optimization level limit for flang again and add -fno-unroll-loops for AOCC flang 2.x instead 2020-06-14 17:40:24 +02:00
Martin Kroeker
79cdcde717 Re-enable higher optimization levels for flang while disabling loop unrolling for AOCC flang 2020-06-14 17:18:16 +02:00
Martin Kroeker
18a11137f1 Update BLAS tests to correspond to Reference-LAPACK 3.9.0
replaces calculation of machine precision with call to epsilon intrinsic and removes the requirement for previous output files to be removed before rerunning tests
2020-06-14 10:26:25 +02:00
Martin Kroeker
1dd712131e Fix spelling of flang option -Mrecursive and add -Kieee 2020-06-14 00:09:31 +02:00
Martin Kroeker
0ed2adf0b2 Fix spelling of flang option -Mrecursive and add -Kieee 2020-06-14 00:01:20 +02:00
Martin Kroeker
abf670757b Respect predefined defaults for AR, AS, LD and RANLIB 2020-06-13 23:21:13 +02:00
Simon Märtens
41fc6f3cd2 Added missing exported symbols. 2020-06-13 22:37:39 +02:00
Martin Kroeker
007d9f97d7 Make gotoblas_corename report the name of the selected TARGET rather than its aliases 2020-06-13 19:25:28 +02:00
Martin Kroeker
63d26090f5 Merge pull request #64 from xianyi/develop
rebase
2020-06-13 19:14:47 +02:00
Rajalakshmi Srinivasaraghavan
9fe930f205 powerpc: Add support for future processor
This is the initial patch to support build infrastructure
for POWER10 architecture.
2020-06-11 15:47:20 -05:00
Martin Kroeker
3a1b58d54a Merge pull request #2653 from craft-zhang/cortex-a53
fix INIT8x4 of SGEMM on Arm Cortex-A53
2020-06-10 12:19:33 +02:00
Martin Kroeker
f7659be4a0 Merge pull request #2652 from martin-frbg/flang-fixes
Fixes for compilation with flang binary release 20190329
2020-06-09 20:31:06 +02:00
ZhangDanfeng
bc6fd20a40 fix INIT8x4
Signed-off-by: ZhangDanfeng <467688405@qq.com>
2020-06-10 01:01:16 +08:00
Martin Kroeker
3ce469a34f Limit optimization level to O1 for flang and add -frecursive 2020-06-09 16:11:13 +02:00
Martin Kroeker
ba2c5b404d When building with flang, use it also for the final link step to get dependencies right 2020-06-09 16:09:34 +02:00
Martin Kroeker
f07a80354b Apply previously AOCC-specific workaround to all versions of flang 2020-06-09 16:07:03 +02:00
Martin Kroeker
fdd1b50263 Merge pull request #63 from xianyi/develop
rebase
2020-06-09 15:54:30 +02:00
Leonard Lausen
b98923f33a Test enforce -O1 for flang 2020-06-09 06:54:47 +00:00
Leonard Lausen
4cb1db0e3b Test flang build 2020-06-09 06:31:17 +00:00
Martin Kroeker
430e8b45fe Merge pull request #2648 from martin-frbg/lapack411
Break out of potentially infinite rescaling loop in LAPACK xLARGV/xLARTG/xLARTGP
2020-06-07 19:45:52 +02:00
Martin Kroeker
88fe85f4e0 Merge pull request #2647 from martin-frbg/aocc-flang
Small fixes for flang in general and the AMD AOCC version of it in particular
2020-06-07 19:45:11 +02:00
Martin Kroeker
89091e6b64 Merge pull request #2645 from martin-frbg/misc_fixes
Miscellaneous fixes
2020-06-07 19:44:50 +02:00
Martin Kroeker
522aaf53bf Break out of potentially infinite rescaling loop in LAPACK xLARGV/xLARTG/xLARTGP
Reference-LAPACK issue 411
2020-06-07 14:30:20 +02:00
Martin Kroeker
c3574ffe53 Merge pull request #2646 from wjc404/develop
Optimize AVX512 parallel DGEMM performance
2020-06-07 13:18:22 +02:00
Martin Kroeker
4e28dc6353 Use only -O1 with AMD AOCC version of flang
to prevent miscompilation of LAPACK codes and tests on Ryzen
2020-06-07 00:05:02 +02:00
Martin Kroeker
13c28889a2 Update "cosmetic fixes for non-C99 compilers" 2020-06-06 15:22:27 +02:00
wjc404
0e3ac4a06b Add files via upload 2020-06-06 14:56:57 +08:00
Martin Kroeker
28915eed72 Cosmetic fixes for non-C99 compilers 2020-06-05 10:05:34 +02:00
Martin Kroeker
7f60fb6b91 Delete spurious copy of common_param.h 2020-06-05 10:04:16 +02:00
Martin Kroeker
0464e662ad make blas_quickdivide unsigned and guard against miscompilation 2020-06-05 10:03:36 +02:00
Martin Kroeker
0f9a935a5a Merge pull request #62 from xianyi/develop
rebase
2020-06-05 09:51:06 +02:00
Martin Kroeker
79cd69fea4 Merge pull request #2644 from martin-frbg/cmake-maxstack
Add CMAKE support for MAX_STACK_ALLOC setting
2020-06-05 08:33:48 +02:00
Martin Kroeker
bb12c2c854 Limit MAX_STACK_ALLOC availability to non-Wndows 2020-06-04 19:07:27 +02:00
Martin Kroeker
32c1c1e125 Update azure-pipelines.yml 2020-06-04 19:03:46 +02:00
Martin Kroeker
f1953b8b81 Update azure-pipelines.yml 2020-06-04 17:58:13 +02:00
Martin Kroeker
6e97df7b47 Add CMAKE support for MAX_STACK_ALLOC setting 2020-06-04 14:45:31 +02:00
Martin Kroeker
729303e5ed Merge pull request #2643 from craft-zhang/cortex-a53
Improve performance of SGEMM on Arm Cortex-A53
2020-06-04 07:58:45 +02:00
Martin Kroeker
547965530f Merge pull request #2638 from leezu/actions
Add Github Actions test for DYNAMIC_ARCH builds on Linux and macOS
2020-06-04 00:02:37 +02:00
ZhangDanfeng
9b7877ccf1 sgemm copy source init
Signed-off-by: ZhangDanfeng <467688405@qq.com>
2020-06-04 02:10:45 +08:00
ZhangDanfeng
f82fa802d1 Insert prefetch
Signed-off-by: ZhangDanfeng <467688405@qq.com>
2020-06-04 02:08:48 +08:00
Martin Kroeker
3eda3d34c3 Merge pull request #2641 from martin-frbg/ppcg4
Work around PPC G4 test failures
2020-06-03 16:43:46 +02:00
Martin Kroeker
a8f42ae85c set cmake build type to Release 2020-06-03 15:28:59 +02:00
Martin Kroeker
e6e2e531bc revert clang pragma 2020-06-03 15:16:27 +02:00
Martin Kroeker
456dc04441 Update sgemm_kernel_16x4_skylakex_3.c 2020-06-03 15:15:41 +02:00
Martin Kroeker
89323458a9 preset optimization level for apple clang 2020-06-03 15:07:25 +02:00
Martin Kroeker
e153bdeb70 Update dynamic_arch.yml 2020-06-03 13:46:43 +02:00
Martin Kroeker
c2001f7756 Make cmake build verbose to see options in use 2020-06-03 12:18:15 +02:00
Martin Kroeker
c2b3f0b3f6 Revert "keep Apple Clang from optimizing this" 2020-06-03 10:22:15 +02:00
Martin Kroeker
f16e39554d Change PPCG4 CGEMM_M to match kernel change 2020-06-03 09:15:29 +02:00
Martin Kroeker
b1ee81228a Change complex DOT and ROT to generic kernels and switch CGEMM
in response to test failures seen in #2628 and BLAS-Tester
2020-06-03 09:13:29 +02:00
Martin Kroeker
9f7358d7dc Keep Apple Clang from optimizing this 2020-06-03 08:52:53 +02:00
Martin Kroeker
54fa90fb25 Keep apple clang 11.0.3 from trying to optimize this (and running out of registers) 2020-06-02 17:31:45 +02:00
Leonard Lausen
5a709b8340 Print CPU info in output 2020-06-01 20:51:11 +00:00
Leonard Lausen
b31a68b835 Add Github Actions test for DYNAMIC_ARCH builds 2020-06-01 20:16:59 +00:00
Martin Kroeker
86552bf4c7 Update f_check 2020-05-31 15:22:12 +02:00
Martin Kroeker
a349d48d89 Merge pull request #2636 from martin-frbg/issue2634
Fix CMAKE build Issues on OS X
2020-05-31 15:16:09 +02:00
Martin Kroeker
4db00121dc Disable EXPRECISION and add -lm on OSX (same as the BSDs and Linux) 2020-05-31 12:39:36 +02:00
Martin Kroeker
909897f13b Document option USE_LOCKING 2020-05-31 12:37:57 +02:00
Martin Kroeker
e79245acd9 Merge pull request #2635 from ilayn/patch-1
BUG: Fix the loop range in ZHEEQUB.f
2020-05-30 14:37:12 +02:00
Ilhan Polat
76d2612e0c BUG: Fix the loop range in ZHEEQUB.f 2020-05-30 14:11:11 +02:00
Martin Kroeker
ced49466f0 Use the fortran compiler to link LAPACK-related benchmarks
to fix linking problems with (at least) the AMD version of flang that creates dependencies on more than just the fortran runtime.
2020-05-29 13:35:51 +02:00
Martin Kroeker
6e270f91ec add support for RETURN_BY_STACK semantics, e.g. clang 2020-05-29 13:29:10 +02:00
Martin Kroeker
200296b0f4 remove libomp from link list only for pgfortran
at least the AMD (aocc) flavor of flang wants to link to a (real or dummy) libomp by default
2020-05-29 13:23:51 +02:00
Martin Kroeker
dd7a650792 Merge pull request #59 from xianyi/develop
rebase
2020-05-29 13:06:25 +02:00
Martin Kroeker
4a4c50a7ce Merge pull request #2627 from pkubaj/patch-1
Add powerpc (32-bit)
2020-05-26 08:36:24 +02:00
Martin Kroeker
d069780e63 Merge pull request #2626 from docularxu/working-gcc-version-detections
make GCC version detection OS-independent
2020-05-26 08:35:58 +02:00
pkubaj
33c8790603 Add powerpc (32-bit)
Only powerpc64 is present.
2020-05-25 13:14:09 +02:00
Guodong Xu
06387ac0e6 make GCC version detection OS-independent
Previous design put GCC version detection inside of OSNAME 'WINNT'.
However, such detections are required for 'Linux' and possibly other
OS'es as well. For example, there is usage of the GCC versions
in Makefile.arm64. When compiling on Linux machine, in the previous
design, Markfile.arm64 will not know the correct GCC version.

The fix is to move GCC version detection into common part, not
wrapped by anything.

Signed-off-by: Guodong Xu <guodong.xu@linaro.com>
2020-05-25 10:57:03 +00:00
Martin Kroeker
f1a18d245b Merge pull request #2618 from craft-zhang/cortex-A53
Improve performance of SGEMM and STRMM on Arm Cortex-A53
2020-05-25 12:14:46 +02:00
张丹枫
2a3aa91354 update CONTRIBUTORS.md, adding myself 2020-05-20 22:35:26 +08:00
张丹枫
ea5bdc3f72 split cortex-a53 param to match 8x8 kernel 2020-05-20 22:34:47 +08:00
张丹枫
9df79ae9a3 update sgemm and strmm kernel selecting strategy 2020-05-20 22:26:58 +08:00
张丹枫
a1fc6041cd use general register to speedup 2020-05-20 22:26:58 +08:00
张丹枫
edb423d772 align general register using to strmm_kernel_8x8 2020-05-20 22:26:58 +08:00
zhangdanfeng
0e6eb8c247 sgemm kernel use sgemm_kernel_8x8_cortexa53
Signed-off-by: zhangdanfeng <zhangdanfeng@cloudwalk.cn>
2020-05-20 22:26:58 +08:00
zhangdanfeng
d475db29c6 optimized for cortex-a53
Signed-off-by: zhangdanfeng <zhangdanfeng@cloudwalk.cn>
2020-05-20 22:26:58 +08:00
Martin Kroeker
729ac6bd4a Merge pull request #2623 from mhillenibm/zarch_dgemm_z14
s390x: Use new sgemm kernel also for DGEMM and DTRMM on Z14 (+ small cleanup)
2020-05-20 14:51:04 +02:00
Marius Hillenbrand
89fe17f20e s390x: Use new sgemm kernel also for DGEMM and DTRMM on Z14
Apply our new GEMM kernel implementation, written in C with vector intrinsics,
also for DGEMM and DTRMM on Z14 and newer (i.e., architectures with FP32 SIMD
instructions). As a result, we gain around 10% in performance on z15, in
addition to improving maintainability.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-20 10:23:35 +02:00
Marius Hillenbrand
bdd795ed03 s390x/GEMM: replace 0-init with peeled first iteration
... since it gains another ~2% of SGEMM and DGEMM performance on z15;
also, the code just called for that cleanup.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-20 10:23:35 +02:00
Martin Kroeker
e1038ea836 Merge pull request #2622 from martin-frbg/issue2619
Improve declaration of LAPACKE_get_nancheck
2020-05-19 23:07:22 +02:00
Martin Kroeker
6baa9a778d Improve declaration of LAPACKE_get_nancheck 2020-05-19 17:59:31 +02:00
Martin Kroeker
cf46c9f84e Merge pull request #2617 from martin-frbg/issue2616
Add workaround for unhandled  gmake jobserver flags in c_check/f_check
2020-05-18 13:23:58 +02:00
Martin Kroeker
55602fce56 Ignore spurious all-numeric library names derived from mishandled jobserver flags 2020-05-17 15:28:14 +02:00
Martin Kroeker
3d5e159e7a Ignore spurious all-numeric library names derived from mishandled jobserver flags 2020-05-17 15:26:57 +02:00
Martin Kroeker
2931feb575 Merge pull request #58 from xianyi/develop
rebase
2020-05-17 15:23:32 +02:00
Martin Kroeker
20245ded5f Merge pull request #2615 from mhillenibm/z14_alignment_hints
s390x: improvise vector alignment hints for older compilers
2020-05-14 21:06:34 +02:00
Marius Hillenbrand
2840432e49 s390x: improvise vector alignment hints for older compilers
Introduce inline assembly so that we can employ vector loads with
alignment hints on older compilers (pre gcc-9), since these are still
used in distributions such as RHEL 8 and Ubuntu 18.04 LTS.

Informing the hardware about alignment can speed up vector loads. For
that purpose, we can encode hints about 8-byte or 16-byte alignment of
the memory operand into the opcodes. gcc-9 and newer automatically emit
such hints, where applicable. Add a bit of inline assembly that achieves
the same for older compilers. Since an older binutils may not know about
the additional operand for the hints, we explicitly encode the opcode in
hex.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-14 15:36:03 +02:00
Martin Kroeker
ea78106c71 Merge pull request #2614 from mhillenibm/gemm_vec_z14
s390x: Improve performance of SGEMM and STRMM on z14 and newer
2020-05-13 15:09:23 +02:00
Marius Hillenbrand
cb9dc36dd5 Update CONTRIBUTORS.md
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 16:14:00 +02:00
Marius Hillenbrand
1b0b4349a1 s390x/Z14: Change register blocking for SGEMM to 16x4
Change register blocking for SGEMM (and STRMM) on z14 from 8x4 to 16x4
by adjusting SGEMM_DEFAULT_UNROLL_M and choosing the appropriate copy
implementations. Actually make KERNEL.Z14 more flexible, so that the
change in param.h suffices. As a result, performance for SGEMM improves
by around 30% on z15.

On z14, FP SIMD instructions can operate on float-sized scalars in
vector registers, while z13 could do that for double-sized scalars only.
Thus, we can double the amount of elements of C that are held in
registers in an SGEMM kernel.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 15:59:51 +02:00
Marius Hillenbrand
71b6eaf459 s390x: Use new sgemm kernel also for strmm on Z14 and newer
Employ the newly added GEMM kernel also for STRMM on Z14. The
implementation in C with vector intrinsics exploits FP32 SIMD operations
and thereby gains performance over the existing assembly code. Extend
the implementation for handling triangular matrix multiplication,
accordingly. As added benefit, the more flexible C code enables us to
adjust register blocking in the subsequent commit.

Tested via make -C test / ctest / utest and by a couple of additional
unit tests that exercise blocking.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 15:59:51 +02:00
Marius Hillenbrand
43c0d4f312 s390x: Add vectorized sgemm kernel for Z14 and newer
Add a new GEMM kernel implementation to exploit the FP32 SIMD
operations introduced with z14 and employ it for SGEMM on z14 and newer
architectures.

The SIMD extensions introduced with z13 support operations on
double-sized scalars in vector registers. Thus, the existing SGEMM code
would extend floats to doubles before operating on them. z14 extended
SIMD support to operations on 32-bit floats. By employing these
instructions, we can operate on twice the number of scalars per
instruction (four floats in each vector registers) and avoid the
conversion operations.

The code is written in C with explicit vectorization. In experiments,
this kernel improves performance on z14 and z15 by around 2x over the
current implementation in assembly. The flexibilty of the C code paves
the way for adjustments in subsequent commits.

Tested via make -C test / ctest / utest and by a couple of additional
unit tests that exercise blocking (e.g., partial register blocks with
fewer than UNROLL_M rows and/or fewer than UNROLL_N columns).

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 15:59:51 +02:00
Marius Hillenbrand
d7c1677c20 Update CONTRIBUTORS.md, adding myself
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 11:09:28 +02:00
Marius Hillenbrand
0dbe61a612 s390x: choose SIMD kernels at run-time based on OS and compiler support
Extend and simplify the run-time detection for dynamic architecture support for z
to check HW_CAP and only use SIMD features if advertised by the OS.
While at it, also honor the env variable LD_HWCAP_MASK and do not use
the CPU features masked there.

Note that we can only use the SIMD features on z13 or newer (i.e.,
Vector Facility or Vector-Enhancements Facilities) when the operating
system supports properly context-switching the vector registers. The OS
advertises that support as a bit in the HW_CAP value in the auxiliary
vector. While all recent Linux kernels have that support, we should
maintain compatibility with older versions that may still be in use.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 11:01:16 +02:00
Marius Hillenbrand
62cf391cbb s390x: only build kernels supported by gcc with dynamic arch support
When building with dynamic arch support, only build kernels for
architectures that are supported by the gcc we are building with.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 11:01:16 +02:00
Marius Hillenbrand
8c338616f9 s390x: gate dynamic arch detection on gcc version and add generic
When building OpenBLAS with DYNAMIC_ARCH=1 on s390x (aka zarch), make
sure to include support for systems without the facilities introduced
with z13 (i.e., zarch_generic). Adjust runtime detection to fallback to
that generic code when running on a unknown platform other than Z13
through Z15.

When detecting a Z13 or newer system, add a check for gcc support for
the architecture-specific features before selecting the respective
kernel. Fallback to Z13 or generic code, in case.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 11:01:16 +02:00
Martin Kroeker
f94c53ec0a Merge pull request #2612 from RajalakshmiSR/testshgemm
Improve shgemm test
2020-05-12 08:34:02 +02:00
Rajalakshmi Srinivasaraghavan
8efba9b7c0 Improve shgemm test
This patch adds another check to test shgemm results.
2020-05-11 17:15:10 -05:00
Martin Kroeker
4fffa556d8 Merge pull request #2611 from RajalakshmiSR/bench_half
Include shgemm in benchtest
2020-05-11 21:08:41 +02:00
Rajalakshmi Srinivasaraghavan
ce90e2bd3f Include shgemm in benchtest
This patch is to enable benchtest for half precision gemm
when BUILD_HALF is set during make.
2020-05-11 09:57:46 -05:00
Martin Kroeker
948b6712ba Merge pull request #2610 from martin-frbg/issue2552-3
Temporary workaround for excessive LAPACK test failures with COMPLEX on Skylake-X
2020-05-10 13:10:31 +02:00
Martin Kroeker
2271c3506b Work around excessive LAPACK test failures on Skylake-X
Something in the plain C parts of x86_64 cscal.c and zscal.c appears to be miscompiled by both gfortran9 and ifort when compiling for skylakex-avx512, even when the optimized Haswell microkernel is not in use.
2020-05-09 23:49:18 +02:00
Martin Kroeker
db00b21445 Merge pull request #2609 from martin-frbg/issue2552-2
Correct ifort options
2020-05-09 21:33:02 +02:00
Martin Kroeker
58d26b4448 Correct ifort options
to same as suggested by reference-lapack
2020-05-09 17:15:36 +02:00
Martin Kroeker
8e47d14053 Merge pull request #2608 from martin-frbg/issue2604
Handle trailing whitespace and empty variables in KERNEL files
2020-05-09 16:36:14 +02:00
Martin Kroeker
cd10b35fe9 Handle trailing spaces and empty condition variables 2020-05-09 13:42:33 +02:00
Martin Kroeker
9472dd99cd Merge pull request #57 from xianyi/develop
rebase
2020-05-09 13:20:44 +02:00
Martin Kroeker
7181665452 Merge pull request #2605 from RajalakshmiSR/cmake-power
Fix cmake compilation issue - POWER9
2020-05-09 11:29:28 +02:00
Rajalakshmi Srinivasaraghavan
bd9ff820bc Fix cmake compilation issue - POWER9
This patch removes extra space in the sgemmotcopy filename
thereby allowing it to create entry in kernel/Makefile
created by cmake.
2020-05-08 20:31:56 -05:00
Martin Kroeker
63e45def70 Merge pull request #2603 from martin-frbg/issue2552
Add FFLAGS_DRV entry to the generated make.inc to fix lapack-test failure with Intel compilers
2020-05-08 22:08:39 +02:00
Martin Kroeker
ec0f228632 Add FFLAGS_DRV to the generated make.inc to fix lapack-test on x86_64 with icc/ifort
fixes #2552
2020-05-08 18:06:12 +02:00
Martin Kroeker
90e2941c61 Merge pull request #56 from xianyi/develop
rebase
2020-05-07 22:43:48 +02:00
Martin Kroeker
10d5f3c87b Merge pull request #2602 from ashwinyes/thunderx2_develop
DAXPY Optimizations for ThunderX2
2020-05-07 22:06:41 +02:00
Ashwin Sekhar T K
8353cb245a ARM64: Improve DAXPY for ThunderX2
Improve performance of DAXPY for ThunderX2
when the vector fits in L1 Cache.
2020-05-07 09:22:50 -07:00
Martin Kroeker
ec2dd7b875 Merge pull request #2601 from martin-frbg/issue818
Undefine NAME/CNAME etc  in Makefile.system before defining them
2020-05-07 10:12:33 +02:00
Martin Kroeker
4e82eb9f8a Undefine ASMNAME/NAME/CNAME before defining them
to avoid redefinition warning when environment variables like CFLAGS are being used (fixes #818)
2020-05-07 00:31:32 +02:00
Martin Kroeker
61300bb735 Merge pull request #55 from xianyi/develop
rebase
2020-05-07 00:27:14 +02:00
Martin Kroeker
33e9b12464 Merge pull request #2597 from martin-frbg/appleclang
Use Clang 9.0.0 miscompilation fix for corresponding AppleClang version as well
2020-05-05 13:55:08 +02:00
Martin Kroeker
90dba9f716 Duplicate earlier Clang 9.0.0 workaround for corresponding Apple Clang version
As discussed on the original PR #2329, the "Apple Clang 11.0.3" that appears to be based the same LLVM release produces the same miscompilation of this file.
2020-05-05 10:44:50 +02:00
Martin Kroeker
424d551e01 Merge pull request #53 from xianyi/develop
rebase
2020-05-01 15:18:46 +02:00
Martin Kroeker
596f5df9e8 Merge pull request #2591 from RajalakshmiSR/testhalf
Add test for shgemm
2020-05-01 09:59:39 +02:00
Martin Kroeker
5dd14e3d48 Make building the bfloat16 functions conditional on option BUILD_HALF (#2590)
* make building the bfloat16 BLAS functions conditional on BUILD_HALF

* pass the BUILD_HALF option to gensymbol

* Pass BUILD_HALF as a compiler define for dynamic_arch builds
2020-05-01 09:58:30 +02:00
Martin Kroeker
a54e35e780 Merge pull request #2586 from martin-frbg/miscfixes
Trivial fix for compiler warnings
2020-04-29 22:01:41 +02:00
Rajalakshmi Srinivasaraghavan
564b0d39ef Add test for shgemm
This patch has Makefile changes to add test for shgemm which
compares sgemm and shgemm result.
2020-04-29 13:40:34 -05:00
Martin Kroeker
5d58b11101 Merge pull request #52 from xianyi/develop
rebase
2020-04-29 14:36:15 +02:00
Martin Kroeker
d394d4e677 Merge pull request #2585 from martin-frbg/mips64fix
Increase default BUFFER_SIZE on MIPS64
2020-04-28 19:47:55 +02:00
Xianyi Zhang
9d3a317abc Refs #2587 Fix typos. 2020-04-29 00:19:19 +08:00
Xianyi Zhang
92372c70fc Fix gemm interface bug for small matrix. 2020-04-28 23:15:20 +08:00
Xianyi Zhang
43bef4aaac Add alpha=1.0 beta=0.0 for small gemm. 2020-04-28 22:35:36 +08:00
Xianyi Zhang
aae6af94bb Add small marix optimization kernel interface.
make SMALL_MATRIX_OPT=1
2020-04-28 19:02:41 +08:00
Martin Kroeker
f4248af26e Fix compiler warnings 2020-04-28 10:43:12 +02:00
Martin Kroeker
2d89603e9d Increase BUFFER_SIZE on mips64 to match SGEMM parameters 2020-04-28 10:40:40 +02:00
Martin Kroeker
26bc15258a Merge pull request #51 from xianyi/develop
rebase
2020-04-28 10:38:50 +02:00
Martin Kroeker
141998dce2 Merge pull request #2584 from martin-frbg/issue2583
[WIP] Have CMAKE parse conditional lines in KERNEL files
2020-04-28 10:35:12 +02:00
Martin Kroeker
3bd56846bb Silence a debug message 2020-04-27 16:27:09 +02:00
Martin Kroeker
e7bbdfdf84 Have CMAKE parse conditional lines in KERNEL files
Supports ifeq and ifneq, but requires both to have an else branch
2020-04-27 15:20:03 +02:00
Martin Kroeker
b6795db731 Merge pull request #2582 from martin-frbg/mips32fix
Increase BUFFER_SIZE on MIPS32 to accomodate SGEMM requirements
2020-04-27 09:18:34 +02:00
Martin Kroeker
5e0dbf8dfe Increase default BUFFER_SIZE to accomodate SGEMM parameters
in response to compile-time warning from #2551
2020-04-26 22:21:05 +02:00
Martin Kroeker
955d73127f Merge pull request #50 from xianyi/develop
rebase
2020-04-26 22:17:56 +02:00
Martin Kroeker
a8c1bea7ae Merge pull request #2581 from martin-frbg/raji
Fix travis configuration and update CONTRIBUTORS.md
2020-04-25 19:57:10 +02:00
Martin Kroeker
e43b49e064 Drop the set -e from travis scripts 2020-04-25 16:18:54 +02:00
Martin Kroeker
3e28db7f38 Update CONTRIBUTORS.md 2020-04-25 13:51:44 +02:00
Martin Kroeker
4b69ee31af Merge pull request #2580 from martin-frbg/issue2538-3
Increase POWER8 ZGEMM_R and use same R values for POWER9
2020-04-25 00:28:18 +02:00
Martin Kroeker
03ff213c51 Increase POWER8 ZGEMM_R and use same R values for POWER9
fixes lapack-test zger failures seen in #2299 after application of my PR #2551
2020-04-24 21:46:54 +02:00
Martin Kroeker
299d1c8de0 Merge pull request #2578 from martin-frbg/issue2576
Quote getarch include paths in prebuild.cmake
2020-04-24 14:32:46 +02:00
Martin Kroeker
70869d571f Quote include paths for getarch to protect any embedded spaces 2020-04-24 10:30:44 +02:00
Martin Kroeker
cba87222b2 Merge pull request #49 from xianyi/develop
rebase
2020-04-24 10:21:48 +02:00
Martin Kroeker
f80dd2151e xcode 11.4.1 for homebrew ? 2020-04-23 14:31:09 +02:00
Martin Kroeker
4412ee1754 Switch homebrew build env to new xcode 11.4
default 11.3.1 in the github image is causing brew to fail with "outdated xcode" message
2020-04-23 10:54:46 +02:00
Martin Kroeker
f6104b68c1 Merge pull request #2571 from martin-frbg/issue2299
Work around IDAMAX/IZAMAX bugs on POWER8BE with ELFv2 FreeBSD
2020-04-22 18:27:13 +02:00
Martin Kroeker
84f2c71e93 Merge pull request #2573 from martin-frbg/issue2572
Enable cblas interfaces to GEMM3M in CMAKE builds
2020-04-22 15:04:49 +02:00
Martin Kroeker
06208c8d01 Limit this fix to ELFv2 builds 2020-04-22 14:16:40 +02:00
Martin Kroeker
c90b28dee6 Export ELF_VERSION for use in powerpc kernel configurations 2020-04-22 14:14:20 +02:00
Martin Kroeker
6275b43918 Avoid duplicate printout of byte order and report ELF_VERSION 2020-04-22 14:12:27 +02:00
Martin Kroeker
2db5178e2d enable cblas interfaces to GEMM3M in CMAKE builds 2020-04-22 11:01:28 +02:00
Martin Kroeker
57549f5c92 Merge pull request #2569 from martin-frbg/issue2472-2
Fix linker option passing for MSVS and ReLAPACK
2020-04-21 20:26:53 +02:00
Martin Kroeker
f5c4c28b98 Work around POWER8BE bugs on FreeBSD (ELFv2)
for #2299
2020-04-21 17:17:17 +02:00
Martin Kroeker
239282d5e2 Use CMAKE_SHARED_LINKER_FLAGS to pass MSVC linker option
target_link_libraries does not work here according to issue 2472
2020-04-20 22:30:51 +02:00
Martin Kroeker
568674477c Merge pull request #48 from xianyi/develop
rebase
2020-04-20 21:51:59 +02:00
Martin Kroeker
fa42588e1f Merge pull request #2565 from martin-frbg/mips24k
Support MIPS32 24K family as P5600
2020-04-20 17:13:53 +02:00
Martin Kroeker
8a6d26458b Merge pull request #2559 from RajalakshmiSR/shgemm
Add half precision gemm for bfloat16 in OpenBLAS
2020-04-19 22:09:55 +02:00
Martin Kroeker
db86f516b9 Merge pull request #2568 from martin-frbg/azure-win
Add a Windows/CL build job to the Azure CI
2020-04-19 19:06:33 +02:00
Martin Kroeker
aec353b5a7 Add a Windows/CL build to the Azure Ci configuration 2020-04-19 19:04:33 +02:00
Martin Kroeker
c62fbefad4 Merge pull request #2567 from xianyi/revert-2566-azurewin
Revert "Add Windows build job on Azure CI"
2020-04-19 19:01:58 +02:00
Martin Kroeker
04706e760d Revert "Add Windows build job on Azure CI (#2566)"
This reverts commit e1e543b145.
2020-04-19 19:00:37 +02:00
Martin Kroeker
e1e543b145 Add Windows build job on Azure CI (#2566)
* Add Windows-CL build job on Azure
2020-04-19 16:16:15 +02:00
Martin Kroeker
e55ec82bb9 Delete KERNEL.1004K 2020-04-19 15:44:30 +02:00
Martin Kroeker
7353ea5afc Delete KERNEL.24K 2020-04-19 15:44:19 +02:00
Martin Kroeker
6a04efb122 Rename KERNEL files to include MIPS prefix 2020-04-19 15:43:54 +02:00
Martin Kroeker
5afb66812f Update getarch.c 2020-04-19 14:55:31 +02:00
Martin Kroeker
0d18f231fc Update getarch.c 2020-04-19 13:52:58 +02:00
Martin Kroeker
2f4a8e5bc4 Rename the FORCE entries for 24K and 1004K to include the MIPS prefix 2020-04-19 13:22:19 +02:00
Martin Kroeker
4f70512b97 Update kernel.cmake 2020-04-19 08:10:26 +02:00
Martin Kroeker
8792fc4d5f Disable RPCC macro on MIPS24K 2020-04-19 07:21:48 +02:00
Martin Kroeker
577c5d9f8f Update README.md 2020-04-19 06:54:52 +02:00
Martin Kroeker
6721f2750e Update TargetList.txt 2020-04-19 06:51:57 +02:00
Martin Kroeker
b0b02a080d Add compiler options for MIPS32 24K/1004K 2020-04-19 06:50:51 +02:00
Martin Kroeker
a1fc98dc57 rename 1004K, 24K to MIPS1004K, MIPS24K to avoid identifier naming problem 2020-04-18 23:50:23 +02:00
Martin Kroeker
d0737b0142 Update kernel.cmake 2020-04-18 21:36:28 +02:00
Martin Kroeker
7dbb59b256 Update common_macro.h 2020-04-18 21:34:14 +02:00
Martin Kroeker
00172d440b Typo fix in MIPS24K addition 2020-04-18 21:16:49 +02:00
Martin Kroeker
d712ea724c Add MIPS24K support 2020-04-18 21:10:18 +02:00
Martin Kroeker
61bbae3ac1 Handle MIPS24K like P5600
and allow enforcing TARGET=1004K as well (omission from earlier 1004K merge and later introduction of TARGET check)
2020-04-18 21:09:32 +02:00
Martin Kroeker
1c1ca2bc0a Merge pull request #47 from xianyi/develop
rebase
2020-04-18 21:07:14 +02:00
Martin Kroeker
c7d668c248 Update common_macro.h 2020-04-18 16:04:38 +02:00
Martin Kroeker
a83a59b038 Use generic kernels for ishama,shasum,shdot,shrot 2020-04-18 15:53:51 +02:00
Martin Kroeker
0a19bd813c Use generic codes for shamax and shcopy 2020-04-18 12:52:51 +02:00
Martin Kroeker
e7afe8a969 Define AXPBY_K fallback for float16 2020-04-18 11:10:15 +02:00
Martin Kroeker
f361de30a3 Use generic axpy.c for SHAXPY as x86 lacks saxpy.c 2020-04-18 11:07:16 +02:00
Martin Kroeker
9f6d6f6cb6 use saxpy.c instead of axpy.S for SHAXPY 2020-04-17 22:27:58 +02:00
Rajalakshmi Srinivasaraghavan
22bb50fb81 cmake fixes 2020-04-17 13:35:17 -05:00
Martin Kroeker
236a3d8ce6 Merge pull request #2563 from zelong-1024/develop
[OpenBLAS]: benchmark error of potrf
2020-04-16 11:45:32 +02:00
l00536773
6b7ef6543a [OpenBLAS]: benchmark error of potrf
[description]: when the matrix size goes higher than 5800 during the cpotrf test, error info, such as "Potrf info = 5679", will be returned on ARM64 and x86 machines. Uplo = L & F.
[solution]: changed the func for building the matrix so that the complex Hermitian matrix can stay positive definite during the computation.
[dts]:
2020-04-16 10:55:10 +08:00
Rajalakshmi Srinivasaraghavan
67cc4b9e16 Fix warnings in clang and export symbol 2020-04-15 19:15:23 -05:00
Martin Kroeker
250e6f8039 Merge pull request #2557 from martin-frbg/dronebadge
Update and reformat README
2020-04-15 20:23:43 +02:00
Martin Kroeker
7a6d0016b0 Merge pull request #2556 from martin-frbg/epicdrone
Add a drone.io multithread test for x86_64
2020-04-15 20:23:17 +02:00
Martin Kroeker
e8e8a6e608 Restore USE_OPENMP in the x86 thread test 2020-04-15 19:26:12 +02:00
Martin Kroeker
579811fb6a Move all 19.04-based jobs back to ubuntu 18.04 2020-04-15 17:38:33 +02:00
Rajalakshmi Srinivasaraghavan
a87793e03c Fix DYNAMIC_ARCH compilation errors 2020-04-15 09:09:50 -05:00
Rajalakshmi Srinivasaraghavan
ac6a22ae78 Update header 2020-04-14 22:58:39 -05:00
Rajalakshmi Srinivasaraghavan
ff010f496e Build shgemm for all architecture 2020-04-14 20:38:53 -05:00
Rajalakshmi Srinivasaraghavan
7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.

Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.

This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
Martin Kroeker
84a9614345 try x86_64 test without openmp 2020-04-14 19:18:35 +02:00
Martin Kroeker
b969533703 Add drone.io badge, mention EMAG8180 support, reformat the DYNAMIC_ARCH paragraph 2020-04-14 10:53:28 +02:00
Martin Kroeker
0f08f3efa6 Add a multithread test for x86_64 2020-04-13 22:46:12 +02:00
Martin Kroeker
c861b2a7bd Merge pull request #2553 from martin-frbg/issue2444
Add a read memory barrier to the traversal of the buffer slot list
2020-04-13 21:28:59 +02:00
Martin Kroeker
cf62adffbb Merge pull request #2555 from martin-frbg/issue1137
Handle unaligned data in the SSE2 copy kernel
2020-04-13 18:29:56 +02:00
Martin Kroeker
3eec7d382c ARMV7 does not support DMB ISHLD, use DMB ISH 2020-04-13 15:56:31 +02:00
Martin Kroeker
5b0093b5fe Convert aligned moves to unaligned
should have no performance impact on reasonably modern cpus and fixes occasional crashes in actual user code.
2020-04-13 14:58:52 +02:00
Martin Kroeker
f41600e66f Add a read barrier in the traversing of the buffer list
Needed on systems with weak memory ordering - the inferior, partially working fix from #2544 was already removed in #2551
2020-04-13 12:34:02 +02:00
Martin Kroeker
f5efecb7ca Add (empty) read barrier definition 2020-04-13 12:24:10 +02:00
Martin Kroeker
a52bdd9d7b Add (empty) read barrier definition 2020-04-13 12:22:35 +02:00
Martin Kroeker
db3226a646 Add (empty) read barrier definition 2020-04-13 12:18:48 +02:00
Martin Kroeker
69b6e258d8 Add (empty) read barrier definition 2020-04-13 12:17:41 +02:00
Martin Kroeker
3d4db4d002 Add read barrier definition 2020-04-13 12:16:44 +02:00
Martin Kroeker
99dde1d2c9 Add read barrier definition 2020-04-13 12:14:58 +02:00
Martin Kroeker
ee6b3df02c Add read barrier definition 2020-04-13 12:14:06 +02:00
Martin Kroeker
25e879fe92 Add (empty) read barrier definition 2020-04-13 12:12:54 +02:00
Martin Kroeker
d237dc1360 Add read barrier definition 2020-04-13 12:11:58 +02:00
Martin Kroeker
8692456226 Add read barrier definition 2020-04-13 12:10:37 +02:00
Martin Kroeker
d1d69e1b9a Add read barrier definition 2020-04-13 12:09:24 +02:00
Martin Kroeker
20d0cb2f65 Merge pull request #46 from xianyi/develop
rebase
2020-04-13 12:06:40 +02:00
Martin Kroeker
e7f0da9295 Merge pull request #2551 from martin-frbg/issue2538-2
Increase BUFFER_SIZEs and add a safeguard; supply GEMM_R for POWER8/9
2020-04-12 22:34:41 +02:00
Martin Kroeker
e9bfa2291a Fix parameter overflow 2020-04-12 19:47:02 +02:00
Martin Kroeker
2a28448a96 Add safeguards for sufficient BUFFER_SIZE 2020-04-12 19:45:36 +02:00
Martin Kroeker
a33d177430 Increase default BUFFER_SIZE on ARM, ZARCH and newer x86_64, add GEMM_R for POWER8/9
As shown in #2538, default buffersizes on some platforms were smaller than required in memory.c
and the requirement could never be fulfilled for a calculated GEMM_R on PPC given the fomula used
2020-04-12 19:44:48 +02:00
Martin Kroeker
f73391c9c9 Merge pull request #45 from xianyi/develop
rebase
2020-04-12 19:39:05 +02:00
Martin Kroeker
7905383cb5 Merge pull request #2547 from sharvil/develop
Add API to set thread affinity on Linux.
2020-04-11 00:35:38 +02:00
Martin Kroeker
a8cbd451bf Merge pull request #2541 from bapt/develop
libname: treat FreeBSD and DragonFly like linux and sunos
2020-04-11 00:35:07 +02:00
Martin Kroeker
eecd8c3204 Merge pull request #2548 from gxw-loongson/develop
Add a GENERIC target for 64bit MIPS
2020-04-11 00:34:04 +02:00
Martin Kroeker
ea85eb2e02 Merge pull request #2549 from martin-frbg/fixthreadtest
Match thread count in cpp_thread_test to host capability
2020-04-10 23:54:40 +02:00
Martin Kroeker
66f89c0aaf Match thread count to machine capability 2020-04-10 22:06:44 +02:00
gxw
8d07cf9b67 Fix compilation problem on loongson platform
Using "make TARGET=GENERIC" on loongson platform will get the following
error messages:
"make[1]: *** No rule to make target 'sgemm_incopy.o', needed by 'libs'"
Add kernel/mips64/KERNEL.generic to slove the problem.
2020-04-09 19:28:15 +08:00
Sharvil Nanavati
7b4773b24d Add API to set thread affinity on Linux.
Issue: #2545
2020-04-08 12:49:35 -07:00
Martin Kroeker
69f277f8ee Add another memory barrier for ARM and a multicore test run on ThunderX to help detect such issues (#2544)
* Add another memory barrier in memory.c to prevent races in memory slot allocation

* Add an all-core test on Drone.io's ThunderX platform and modify dgemm_tester to use all 96 cores
2020-04-08 11:04:51 +02:00
Martin Kroeker
3a6d51c2fd Merge pull request #44 from xianyi/develop
Add a Z13 build to the Travis configuration (#2542)
2020-04-04 22:48:53 +02:00
Martin Kroeker
1c7771df96 Merge pull request #43 from martin-frbg/revert-42-z12ci
Revert 42 z12ci to keep forked develop clean
2020-04-04 22:46:58 +02:00
Martin Kroeker
a56c9ec52a Revert "Add IBM Z to Travis configuration (#42)"
This reverts commit 7972beb375.
2020-04-04 22:45:01 +02:00
Martin Kroeker
4ae6d1a01b Add a Z13 build to the Travis configuration (#2542)
* Add IBM Z to Travis configuration
2020-04-03 16:02:11 +02:00
Martin Kroeker
7972beb375 Add IBM Z to Travis configuration (#42)
* Add IBM Z to Travis configuration
2020-04-03 15:59:18 +02:00
Baptiste Daroussin
41e802443a libname: treat FreeBSD and DragonFly like linux and sunos
There is no difference in the way libnames are handle between FreeBSD
and linux or sunos. FreeBSD and DragonFly prefers having sonames as well
2020-04-03 06:20:42 +02:00
Martin Kroeker
7bd8624b79 Merge pull request #41 from xianyi/develop
rebase
2020-04-02 10:32:19 +02:00
Martin Kroeker
806f89166e Make ARMV7 compile with xcode and add a CI job for it (#2537)
* Add an ARMV7 iOS build on Travis

* thread_local appears to be unavailable on ARMV7 iOS

* Add no-thumb option for ARMV7 IOS build to get it to accept DMB ISH

* Make local labels in macros of nrm2_vfpv3.S compatible with the xcode assembler
2020-04-02 10:30:37 +02:00
Martin Kroeker
f059e614eb Merge pull request #2536 from martin-frbg/recurs
Add "recursive" option for LAPACK builds with ifort or pgfort as well
2020-04-01 20:00:13 +02:00
Martin Kroeker
e13b6773ee ifort and pgfort need "recursive" for safe compilation of LAPACK as well 2020-04-01 15:39:16 +02:00
Martin Kroeker
a05243d0f2 ifort and pgfort need "recursive" for compiling LAPACK as well
as shown in Reference-LAPACK issue 401 (their PR 403)
2020-04-01 15:38:07 +02:00
Martin Kroeker
c6af9bbb32 Merge pull request #2534 from martin-frbg/issue2496
Fix zero initialization for beta=0 case
2020-03-31 20:53:13 +02:00
Martin Kroeker
144be81ca1 fix initialization to zero in the NEON SGEMM_BETA kernel as well 2020-03-31 16:53:56 +02:00
Martin Kroeker
07cdd5d05c Fix zero initialization for beta=0 case
use immediate initialization instead of multiplication in case register content is a NaN
2020-03-31 00:21:02 +02:00
Martin Kroeker
567d2760e6 Merge pull request #2520 from wjc404/develop
Fix avx512 sgemm performance bug when ldc is a multiple of 1024
2020-03-30 20:15:59 +02:00
Martin Kroeker
018bb3e433 Merge pull request #2533 from martin-frbg/gemmdirect2
Use runtime check for AVX512 capability in DYNAMIC_ARCH builds made on SKX
2020-03-30 20:15:37 +02:00
Martin Kroeker
79fd006c58 Expose the support_avx512 function provided in dynamic.c 2020-03-26 21:25:39 +01:00
Martin Kroeker
8229c163b7 Use runtime check for AVX512 (sgemm_direct) capability when using DYNAMIC_ARCH 2020-03-26 21:12:56 +01:00
Martin Kroeker
a986d42ea6 Merge pull request #39 from xianyi/develop
rebase
2020-03-26 21:06:51 +01:00
Martin Kroeker
b6a948fbee Merge pull request #2530 from martin-frbg/dynmsg
Add message highlighting minimum target choice at end of DYNAMIC_ARCH…
2020-03-24 15:44:46 +01:00
Martin Kroeker
0cc352417e Merge pull request #2529 from shengyang-3390/dev1
add ctest for drotm and modified ctest for drot.
2020-03-24 15:44:27 +01:00
Martin Kroeker
fe47dc8673 Add message highlighting minimum target choice at end of DYNAMIC_ARCH builds
related to #2526
2020-03-23 19:35:51 +01:00
Martin Kroeker
9f67d03d3b Merge pull request #2527 from martin-frbg/gemmdirect
Avoid calling DIRECT codepath in DYNAMIC_ARCH on non-SKX
2020-03-23 12:47:19 +01:00
shengyang
50f4fb2fbd add ctest for drotm and modified ctest for drot.
make sure that test cases cover all code path when kernel uses looping unrolling.
2020-03-23 10:47:05 +08:00
Martin Kroeker
6a14b34c20 Avoid calling DIRECT codepath in DYNAMIC_ARCH on non-SKX 2020-03-22 14:33:16 +01:00
Martin Kroeker
8c7c1395da Merge pull request #2521 from martin-frbg/cm-avx512
Use proper extension on the avx512 testcase filename
2020-03-22 01:03:42 +01:00
Martin Kroeker
5f6f6a2c7d Merge pull request #2525 from andreas-schwab/develop
Fix ARCHCONFIG for Neoverse-N1
2020-03-21 18:47:48 +01:00
Andreas Schwab
71cf2acdef Fix ARCHCONFIG for Neoverse-N1
../config_kernel.h:24:9: warning: missing whitespace after the macro name
   24 | #define ARMV8-march armv8.2-a
      |         ^~~~~
2020-03-21 17:35:42 +01:00
Martin Kroeker
1d9773b800 Use proper extension on the avx512 testcase filename
The need to call it .tmp existed only when it was generated by a tmpfile call, and the "-x c" option to tell the compiler it is actually a C source is not universally supported (this broke the test with clang-cl at least)
2020-03-20 23:05:53 +01:00
Martin Kroeker
a46a8c4956 Merge pull request #2518 from shengyang-3390/dev
add ctest for srotm and modified ctest for srot.
2020-03-20 23:00:06 +01:00
Martin Kroeker
7ae737e04c Merge pull request #2519 from martin-frbg/issue2472
Fix cmake compilation with ifort on Windows
2020-03-20 22:57:44 +01:00
wjc404
64daad4365 Update param.h 2020-03-20 21:46:18 +00:00
wjc404
b8307768e2 Add files via upload 2020-03-21 05:42:10 +08:00
Martin Kroeker
6d54c94760 Make ifort on Windows create lowercase symbols with appended underscore
tentative fix for #2472
2020-03-20 01:08:10 +01:00
Martin Kroeker
c0da205412 Merge pull request #38 from xianyi/develop
rebase
2020-03-20 01:05:22 +01:00
shengyang
a06d78556d add ctest for srotm and modified ctest for srot.
make sure that test cases cover all code path when kernel uses looping unrolling.
2020-03-18 14:17:32 +08:00
Martin Kroeker
af8a619e1f Merge pull request #2517 from wjc404/develop
Temporary fix for SKX STRSM
2020-03-17 10:12:53 +01:00
wjc404
62b9608986 Update KERNEL.SKYLAKEX 2020-03-17 12:52:55 +08:00
Martin Kroeker
717c604aeb Merge pull request #2515 from zelong-1024/develop
[OpenBLAS]: benchmark for her/her2 LEVEL2 functions
2020-03-16 21:59:55 +01:00
Martin Kroeker
ce33da4cab Merge pull request #2513 from aaawuanjun/develop
[OpenBlas]: Add benchmark tpsv file and modify benchmark/Makefile
2020-03-16 21:58:55 +01:00
Martin Kroeker
a1b181cea2 Merge pull request #2516 from wjc404/develop
AVX2 STRSM kernels
2020-03-16 21:58:34 +01:00
wjc404
cdc0e9011e Update KERNEL.ZEN 2020-03-16 16:39:37 +00:00
wjc404
fa049d49c2 AVX2 STRSM kernel 2020-03-17 00:34:08 +08:00
l00536773
d45c53ecf1 [OpenBLAS]: benchmark for her/her2 LEVEL2 functions
[description]: benchmark for her/her2
[solution]: added benchmark for her/her2, modified makefile in benchmark
[dts]:
2020-03-16 11:19:05 +08:00
Martin Kroeker
c2840997db Merge pull request #2508 from liujingjue/develop
[OpenBLAS]:fix the iamax benchmark error
2020-03-14 14:21:30 +01:00
Martin Kroeker
c775458299 Merge pull request #2512 from martin-frbg/lapackh
Move declarations of lapack_complex_custom types outside the extern C
2020-03-14 13:27:40 +01:00
Martin Kroeker
c0649aa694 Merge pull request #2506 from xiaofengF/develop
Add benchmark for SPMV and fix segmentation fault when data size >= 50000
2020-03-14 13:08:36 +01:00
wuanjun 00447568
2428dc9fd3 [OpenBlas]: Add benchmark tpsv file and modify benchmark/Makefile
[Description]: Solve lack of tpsv benchmark.
2020-03-14 09:11:08 +08:00
Martin Kroeker
0fe0914437 Merge pull request #2505 from aaawuanjun/develop
[OpenBlas]:Add benchmark tpmv.c and modify benchmark/Makefile
2020-03-13 23:16:35 +01:00
Martin Kroeker
4cfd8a3f57 Merge pull request #2511 from martin-frbg/fixppctest
Prevent attempts to run ctest or test when fortran is not available
2020-03-13 23:04:01 +01:00
Martin Kroeker
ee2e758278 Move declarations of lapack_complex_custom types outside the extern C
fixes #2510
2020-03-13 20:34:13 +01:00
Martin Kroeker
2d8781b0dc Do not attempt to run test without fortran 2020-03-13 20:11:19 +01:00
Martin Kroeker
c436e8af7b Do not attempt to run ctest without fortran
The main Makefile takes care of this in the build process, but users or CI jobs may try to run this directly
2020-03-13 20:10:26 +01:00
l00546269
a0a3bf7c81 [OpenBLAS]:fix the iamax benchmark error
[Description]:the result for i?amax is not MFlops, it is MBytes
2020-03-13 10:58:39 +08:00
jayfely@qq.com
ae3f2c2e49 Remove cspmv and zspmv to remove the error occured in travis CI 2020-03-11 17:02:34 +08:00
jayfely@qq.com
83ecf9fea7 Modify Makefile in interface to remove the error occured in travis CI 2020-03-11 16:36:45 +08:00
jayfely@qq.com
649733ff15 Only keep spmv.goto and spmv.atlas 2020-03-11 15:48:58 +08:00
wuanjun 00447568
3e8f1c6cc5 [OpenBlas]:Add benchmark tpmv.c and modify Makefile
[Description]:Solve the problem of missing tpmv.c benchmark file
2020-03-11 12:31:48 +08:00
jayfely@qq.com
2f4c5bb3a9 Update spmv.c: solve segmentation fault when m and n are larger than 50000 2020-03-11 10:30:09 +08:00
Martin Kroeker
4e1c4e67d4 Merge pull request #2503 from martin-frbg/xerbl
Apply fix for LAPACK issue 394 (fixed-form code beyond column 72)
2020-03-10 23:38:07 +01:00
Martin Kroeker
b9a2a3c540 Merge pull request #2502 from martin-frbg/issue2497
Fix INTERFACE64 not propagating to the fortran codes on ARMV8
2020-03-10 20:01:23 +01:00
Martin Kroeker
047dfb216d Merge pull request #2501 from jijiwawa/Fix_mistakes
Fix  pr #2487 error
2020-03-10 16:44:40 +01:00
s00527847
cd8871f1a1 Use the correct unit of measure 2020-03-10 19:26:06 -04:00
Martin Kroeker
b25ae1fc60 Apply fix for Reference-LAPACK issue 394
reference to XERBLA extending beyond column 72, breaking builds with compilers that default to traditional punch card format
2020-03-10 13:37:41 +01:00
Martin Kroeker
3f7f7ab7e2 Restore INTERFACE64 for arm64 2020-03-10 12:51:07 +01:00
Martin Kroeker
9c22170f52 Merge pull request #37 from xianyi/develop
rebase
2020-03-10 12:49:21 +01:00
jayfely@qq.com
08e1d8cbae Modify Makefile in Benchmark 2020-03-10 14:32:18 +08:00
jayfely@qq.com
ff40a4e726 Add benchmark for SPMV 2020-03-10 14:22:18 +08:00
Zhang Xianyi
51019feae1 Merge pull request #2498 from njutcz/develop
Add benchmark for ?amax, ?max, ?amin, ?min, i?max, i?amin and i?min.
2020-03-09 16:04:33 +08:00
s00548429
bec7923a0d Fix the functional bugs for zamax. 2020-03-09 15:36:50 +08:00
s00548429
c5bdd21352 Add benchmark for ?amax, ?max, ?amin, ?min, i?max, i?amin and i?min. 2020-03-09 14:59:03 +08:00
njutcz
d2d16d091e Merge pull request #1 from xianyi/develop
update
2020-03-09 10:39:40 +08:00
Martin Kroeker
b6a6ccbbea Merge pull request #2495 from ZuoQ3/develop
add benchmark for axpby test
2020-03-08 08:09:58 +01:00
Martin Kroeker
8b720f7365 Merge pull request #2494 from shengyang-3390/develop
add benchmark for csrot and zdrot
2020-03-07 23:04:21 +01:00
Martin Kroeker
14df234edb Merge pull request #2489 from jijiwawa/brightness
Remove redundant code
2020-03-07 22:26:00 +01:00
s00527847
bbeda55b7b add trmm.c 2020-03-07 13:09:19 -05:00
s00527847
efcf89aec7 Remove redundant code 2020-03-07 12:03:05 -05:00
Martin Kroeker
37d456f7e0 Merge pull request #2493 from martin-frbg/plainmake
Fix use of make vs $(MAKE) in building lapack-testing
2020-03-07 16:55:53 +01:00
Martin Kroeker
0b9e96922b Merge pull request #2488 from liujingjue/develop
Modify the main Makefile in OpenBLAS
2020-03-07 16:52:29 +01:00
zq
0c8162eba6 Add benchmark file axpby.c and modify benchmark/Makefile to test s/d/c/zaxpby 2020-03-07 17:48:55 +08:00
zq
9a94a30132 Merge pull request #1 from xianyi/develop
update
2020-03-07 17:04:59 +08:00
shengyang
09c7a191bd add benchmark for csrot and zdrot
modified:   benchmark/Makefile
	modified:   benchmark/rot.c
2020-03-07 15:17:49 +08:00
l00546269
8a8df530e2 [OpenBLAS]:modifed the Makefile
[Description]: check the compiler version and show the detail info
2020-03-07 10:14:33 +08:00
Martin Kroeker
37f46f2fa0 Fix another spot where make was used instead of $(MAKE)
Broke lapack-testing on BSD as their default "make" does not support GNU Makefile syntax
2020-03-06 15:37:26 +01:00
Martin Kroeker
9afc561be4 Merge pull request #36 from xianyi/develop
rebase
2020-03-06 15:32:27 +01:00
Martin Kroeker
dca3e0cf20 Merge pull request #2491 from chenxuqiang/hbmv_benchmark
benchmark/hpmv&hbmv: add benchmark/hpmv.c and benchmark/hbmv.c
2020-03-06 15:06:42 +01:00
Martin Kroeker
c9f8db979b Merge pull request #2490 from shengyang-3390/develop
Add benchmark file rotm.c and modify benchmark/Makefile to test s/drotm
2020-03-06 15:05:55 +01:00
Martin Kroeker
18099de976 Merge pull request #2487 from jijiwawa/develop
add benchmark for spr/spr2
2020-03-06 14:42:25 +01:00
Martin Kroeker
97c36ca58c Merge branch 'develop' into develop 2020-03-06 14:41:40 +01:00
Martin Kroeker
9f5a74f3c7 Merge pull request #2486 from qqqil/develop
add benchmark for trsv
2020-03-06 14:30:09 +01:00
Martin Kroeker
2afb10975d Merge pull request #2485 from Darkness303/develop
Add syr2 benchmark
2020-03-06 14:29:27 +01:00
Martin Kroeker
dbef479227 Merge pull request #2469 from AGSaidi/acq-rel-2
Use acq/rel semantics to pass flags/pointers in getrf_parallel.
2020-03-06 14:28:58 +01:00
Ali Saidi
208c7e7ca5 Use acq/rel semantics to pass flags/pointers in getrf_parallel.
The current implementation has locks, but the locks each only
have a critical section of one variable so atomic reads/writes
with barriers can be used to achieve the same behavior.

Like the previous patch, pthread_mutex_lock isn't fair, so in a
tight loop the previous thread that has the lock can keep it
starving another thread, even if that thread is about to write
the data that will stop the current thread from spinning.

On a 64c Arm system this improves performance by 20x on sgesv.goto.
2020-03-06 06:22:31 +00:00
chenxuqiang
32c847df45 benchmark/hpmv&hbmv: add benchmark/hpmv.c and benchmark/hbmv.c
Signed-off-by: Xuqiang Chen chenxuqiang3@hisilicon.com
2020-03-06 01:02:02 -05:00
shengyang
e0df9485d4 Add benchmark file rotm.c and modify benchmark/Makefile to test s/drotm
modified:   benchmark/Makefile
	new file:   benchmark/rotm.c
2020-03-05 10:05:59 +08:00
s00527847
0f1a2b12f9 add benchmark for spr/spr2 2020-03-04 15:50:19 -05:00
q00437336
233838b4bc change clock to CLOCK_PROCESS_CPUTIME_ID 2020-03-04 03:54:40 -05:00
l00546269
13f9afbd99 [OpenBLAS]:modifed the Makefile
[Description]:add c/fortran compiler version information in final note
2020-03-04 16:47:23 +08:00
q00437336
de74e11641 add benchmark for trsv 2020-03-04 03:23:22 -05:00
Martin Kroeker
ad9e53154d Merge pull request #2484 from RajalakshmiSR/power-dynamic
Fix DYNAMIC_ARCH build for POWER9
2020-03-04 08:06:06 +01:00
Martin Kroeker
6e70621b0d Merge pull request #2483 from aaawuanjun/develop
Add benchmark file trmv.c and modify benchmark/Makefile to test s/d/c/ztrmv
2020-03-04 07:59:56 +01:00
Martin Kroeker
e6edb7431f Merge pull request #2466 from AGSaidi/acq-rel-1
Switch blas_server to use acq/rel semantics
2020-03-04 07:59:31 +01:00
Darkness303
114dbec947 1.Add syr2 benchmark
2.Fixed some errors
2020-03-04 14:09:10 +08:00
Martin Kroeker
d68e4ba59b Fix cut/paste glitch 2020-03-03 21:37:48 +01:00
Martin Kroeker
635c9e4e09 Restore initializers for mutex and conditional 2020-03-03 21:04:12 +01:00
Rajalakshmi Srinivasaraghavan
2afc074803 Fix DYNAMIC_ARCH build for POWER9
Setting DYNAMIC_ARCH=1 on POWER9 does not build POWER9 files due to some
compiler version checks.  This patch fixes some of the macros that are used
to check compiler version.  On fixing those checks, there are some new make
failures related to icamin, icamax, isamin, isamax and caxpy files on POWER9.
This patch fixes those failures as well.
2020-03-03 12:35:10 -06:00
wuanjun 00447568
5d6c688a7e Merge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop 2020-03-03 19:03:57 +08:00
wuanjun 00447568
87baf9cfe6 Merge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop 2020-03-03 19:03:28 +08:00
wuanjun 00447568
c0ca7d6258 Merge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop 2020-03-03 17:39:26 +08:00
wuanjun 00447568
f682d19ed4 [OpenBlas]: add benchmark file trmv.c and modify benchmark/Makefile to test s/d/c/ztrmv 2020-03-03 17:37:33 +08:00
wuanjun 00447568
790d50fbba [OpenBlas]: add benchmark file trmv.c and modify benchmark/Makefile to test s/d/c/ztrmv 2020-03-03 17:13:49 +08:00
Martin Kroeker
59243d49ab Merge pull request #2479 from Darkness303/develop
Fix potential index overflows at large matrix sizes in the benchmark codes
2020-03-03 08:46:49 +01:00
Martin Kroeker
d41f83e128 Merge pull request #2436 from marxin/improve-utest-coverage
Improve test coverage for utests.
2020-03-03 08:43:00 +01:00
Martin Kroeker
ee4ca7ca6b Merge pull request #2481 from ChinouneMehdi/fix2480
Fix #2480
2020-03-02 21:21:29 +01:00
Martin Kroeker
e326c89ae8 Merge pull request #2478 from MacChen02/develop
Update benchmark statistical time function
2020-03-02 21:20:51 +01:00
مهدي شينون (Mehdi Chinoune)
21f6c4b5a9 fixes #2480 2020-03-02 17:22:28 +01:00
Martin Liska
7ca4ffdbdd Improve test coverage for utests. 2020-03-02 13:38:17 +01:00
jianghesong
0f65c05cd1 fix core dumped error 2020-03-02 19:13:45 +08:00
MacChen02
917d243580 Update benchmark statistical time function
The function gettimeofday does not count the time,when testing the axpy small data volume use case.
Use the function clock_gettime to replace the gettimeofday function to count the time.
2020-03-02 14:36:27 +08:00
Ali Saidi
43c2e845ab Switch blas_server to use acq/rel semantics
Heavy-weight locking isn't required to pass the work queue
pointer between threads and simple atomic acquire/release
semantics can be used instead. This is especially important as
pthread_mutex_lock() isn't fair.

We've observed substantial variation in runtime because of the
the unfairness of these locks which complety goes away with
this implementation.

The locks themselves are left to provide a portable way for
idling threads to sleep/wakeup after many unsuccessful iterations
waiting.
2020-03-02 02:52:49 +00:00
Martin Kroeker
33f76a6c37 Version 0.3.9 2020-03-02 00:10:20 +01:00
Martin Kroeker
960dec234f Version 0.3.9 2020-03-02 00:09:49 +01:00
Martin Kroeker
6b92979f35 Merge pull request #2476 from xianyi/develop
Update from develop in preparation for 0.3.9
2020-03-02 00:08:32 +01:00
Martin Kroeker
014fc13995 Merge pull request #2475 from martin-frbg/039changes
Update ChangeLog for 0.3.9
2020-03-02 00:04:26 +01:00
Martin Kroeker
f1e05676a0 Merge pull request #2474 from martin-frbg/p9be
Use POWER8 kernels on big-endian POWER9 for now
2020-03-02 00:04:08 +01:00
Martin Kroeker
d221c50f27 Add Ampere EMAG8180 2020-03-02 00:02:36 +01:00
Martin Kroeker
f14013da7f Update with 0.3.9 changes 2020-03-02 00:01:22 +01:00
Martin Kroeker
4f371b0fbf Use POWER8 kernels on big-endian POWER9 for now 2020-03-01 23:45:58 +01:00
Martin Kroeker
02d60c1563 Merge pull request #35 from xianyi/develop
rebase
2020-03-01 23:44:10 +01:00
Martin Kroeker
2e6963259b Merge pull request #2471 from AGSaidi/l3-fix-2
Fix barriers in level3_thread
2020-03-01 19:41:07 +01:00
Martin Kroeker
e94590e400 Merge pull request #2468 from AGSaidi/wfe
Use wait-for-event to not spin in the blas_lock
2020-03-01 19:40:46 +01:00
Martin Kroeker
69d4687142 Merge pull request #2464 from Darkness303/develop
Add syr benchmark
2020-03-01 13:02:34 +01:00
Martin Kroeker
a731a9bbb9 Merge pull request #2467 from AGSaidi/rpcc
Make rpcc() on arm64 get closer to what x86 returns
2020-02-29 22:43:02 +01:00
Martin Kroeker
1aa5907a2c Merge pull request #2463 from martin-frbg/mingwfix
Apply MinGW AVX512 compilation fix to fortran options as well
2020-02-29 19:08:03 +01:00
Martin Kroeker
ea8eec5d17 Merge pull request #2422 from wjc404/develop
Adjust SkylakeX GEMM3M parameters, add an AVX512 STRMM kernel and fix performance bugs in AVX2 s/c/z GEMM
2020-02-29 19:07:35 +01:00
Ali Saidi
97ce6bbce2 Fix barriers in level3_thread 2020-02-29 17:45:17 +00:00
Martin Kroeker
a9aeb6745c Merge pull request #2465 from AGSaidi/neoverse-n1
Add Neoverse-N1 core
2020-02-29 13:24:44 +01:00
Ali Saidi
0af9991cc9 Use wait-for-event to not spin in the blas_lock 2020-02-29 04:23:48 +00:00
Ali Saidi
19f3a4091c Make rpcc() on arm64 get closer to what x86 returns
The Arm implementation of rpcc() uses the architected timer
which is defined by the SBSA to be between 10-400MHz. These numbers
are much smaller than the cycle counter frequency used by x86. Make
the numbers closer by shifting the cycle counter up by the number of
leading zeros in the cntfrq_el0 register which gets us closer to a
noraml cpu clock cycle range.
2020-02-29 04:23:22 +00:00
Ali Saidi
c623a965f9 Add Neoverse-N1 core
The implementation is a hybird of the ARMV8 one with some of the
improved TX2 rountines along with specifying -march=v8.2-a
2020-02-29 03:22:04 +00:00
j00520245
e1062400c4 New add syr benchmark 2020-02-28 16:36:53 +08:00
Martin Kroeker
a66f4d80c8 Apply MinGW AVX512 compilation fix to fortran options as well
original issue was #1708, I see now that the same problem affects gfortran compilation. The underlying issue is said to be fixed (but not yet released) on all branches of gcc as of a few days ago but it will certainly take time to reach mingw/msys.
2020-02-27 23:09:40 +01:00
wjc404
dd22eb7621 Update cgemm_kernel_8x2_haswell.c 2020-02-27 22:26:15 +08:00
wjc404
2352331e60 Update zgemm_kernel_4x2_haswell.c 2020-02-27 22:25:19 +08:00
Martin Kroeker
430ee31e66 Merge pull request #2447 from martin-frbg/issue2446
Always select ARMV8 parameters for big servers when cpu is TSV110 or EMAG8180
2020-02-27 15:07:02 +01:00
Martin Kroeker
8164fd1328 Always assume server-class cpu count for TSV110 and EMAG8180 2020-02-26 22:19:57 +01:00
Martin Kroeker
531c6b96d6 Merge pull request #34 from xianyi/develop
rebase
2020-02-26 22:16:28 +01:00
wjc404
1b980001dd Update zgemm_kernel_4x2_haswell.c 2020-02-26 18:38:12 +08:00
wjc404
2515e1152f Update cgemm_kernel_8x2_haswell.c 2020-02-26 18:36:54 +08:00
Martin Kroeker
ddcbed6690 Merge pull request #2437 from martin-frbg/issue2434
[WIP] Add support for Ampere EMAG8180 ARMV8 cpu
2020-02-25 18:42:52 +01:00
Martin Kroeker
f8ec538c82 Add Ampere EMAG8180 2020-02-25 14:30:00 +01:00
Martin Kroeker
ca4f7dceff Add parameters for EMAG8180 DYNAMIC_ARCH support with cmake 2020-02-24 20:23:18 +01:00
Martin Kroeker
1ddf9f1067 Add EMAG8180 to arm64 DYNAMIC_ARCH list for cmake 2020-02-24 20:16:18 +01:00
Martin Kroeker
4c5fac5a2b Typo fix 2020-02-24 20:15:04 +01:00
Martin Kroeker
320e2648cd Add EMAG8180 to DYNAMIC_CORE list for ARM64 2020-02-24 19:23:46 +01:00
Martin Kroeker
9b732696c6 Add DYNAMIC_ARCH support for ARMV8 EMAG8180 2020-02-24 19:20:00 +01:00
Martin Kroeker
c9dcb3d4a4 Merge pull request #2443 from aaawuanjun/develop
[OpenBlas]:benchmark/copy.c has time,x,y data loop problems
2020-02-24 13:14:51 +01:00
Martin Kroeker
3bb7f0138e Merge pull request #2442 from martin-frbg/lapackpr390
Apply fix from Reference-LAPACK PR 390
2020-02-24 12:27:01 +01:00
wuanjun 00447568
c93ae92579 [OpenBlas]:benchmark/copy.c has time,x,y data loop problems 2020-02-24 11:23:39 +08:00
Martin Kroeker
87ac1ceb0b Apply fix from Reference-LAPACK PR390, NaN not propagating 2020-02-23 22:40:40 +01:00
Martin Kroeker
9e40c080f2 Apply fix from Reference-LAPACK PR390, NaN not propagating 2020-02-23 22:39:01 +01:00
wjc404
903854c168 Add files via upload 2020-02-22 23:40:02 +08:00
wjc404
a2ff577a30 Update KERNEL.ZEN 2020-02-22 23:39:43 +08:00
wjc404
97a32cb0a5 Update KERNEL.HASWELL 2020-02-22 23:39:20 +08:00
wjc404
f1746e7284 Delete sgemm_kernel_8x4_haswell_2.c 2020-02-22 23:38:48 +08:00
wjc404
f6fcbd7906 Fix performance bug when LDC is a multiple of 1024 2020-02-22 23:37:45 +08:00
Martin Kroeker
1e8410f18c Merge pull request #2441 from martin-frbg/ismin2
Add proper defaults for the IxMIN/IxMAX kernels on mips64 and power
2020-02-22 11:21:03 +01:00
Martin Kroeker
07454bf4d5 Add proper defaults for IxMIN/IxMAX kernels
the fallbacks from Makefile.L1 assume a combined source for absolute value and non-absolute (with ifdef USE_ABS) but here we have separate implementations
2020-02-21 11:58:15 +01:00
Martin Kroeker
4046985913 Add proper defaults for IxMIN/IxMAX kernels
the fallbacks from Makefile.L1 assume a combined source for absolute value and non-absolute (with ifdef USE_ABS) but here we have separate implementations
2020-02-21 11:55:52 +01:00
Martin Kroeker
75577f95a7 Merge pull request #33 from xianyi/develop
rebase
2020-02-21 09:56:05 +01:00
Martin Kroeker
33d92c7a37 Merge pull request #2435 from martin-frbg/issue2433
Fix handling of ppc endianness
2020-02-21 00:01:58 +01:00
Martin Kroeker
e57b11acca Add preliminary support for EMAG8180 2020-02-19 19:00:28 +01:00
Martin Kroeker
71e5669c3e Add preliminary support for EMAG8180 ARMV8 processor 2020-02-19 18:57:26 +01:00
Martin Kroeker
e8d82c01d4 Recognize Ampere EMAG8180 2020-02-19 18:49:13 +01:00
Martin Kroeker
0b39cf95b0 Fix endianness conditionals 2020-02-19 18:09:54 +01:00
Martin Kroeker
76b2cec6ce Get endianness into Makefile variable 2020-02-19 18:08:20 +01:00
Martin Kroeker
276c1791ea Merge pull request #32 from xianyi/develop
rebase
2020-02-19 18:06:39 +01:00
Martin Kroeker
c5bbfd8fee Merge pull request #2432 from isuruf/install_name
Fix install name on osx again
2020-02-19 08:14:28 +01:00
Isuru Fernando
130c1741e5 Fix install name on osx again 2020-02-18 10:22:49 -08:00
Martin Kroeker
8f782f0673 Merge pull request #2426 from zbeekman/nightly-homebrew-check
Nightly homebrew check
2020-02-18 12:09:15 +01:00
Martin Kroeker
6a517dcb6a Merge pull request #2427 from martin-frbg/powermin
Fix ISMIN and ISMAX kernel choices for POWER8
2020-02-18 08:15:02 +01:00
Martin Kroeker
9f39f0a2c3 Specify ismin/ismax assembly kernels for POWER8 directly
to fix utest failure in new ismin test -  Makefile.L1 defaults look wrong
2020-02-17 19:55:39 +01:00
Izaak Beekman
1a88c4ab26 Fix bottle upload problem & typo 2020-02-17 13:36:17 -05:00
Izaak Beekman
0b44802164 Test push & PRs only when workflow file changes
Also, add comments to clarify what the test is testing
2020-02-17 13:22:09 -05:00
Izaak Beekman
2c242b4cef Add Github Action to build development branch nightly with Homebrew 2020-02-17 12:36:37 -05:00
Martin Kroeker
0bfb7336d2 Merge pull request #2424 from isuruf/osx
Fix building on osx
2020-02-17 17:00:08 +01:00
Martin Kroeker
403cde104e Merge pull request #30 from xianyi/develop
rebase
2020-02-17 14:53:46 +01:00
Martin Kroeker
634f2bddda Merge pull request #2414 from marxin/fix-iamax_sse-implementation
Fix iamax sse implementation and add utests
2020-02-17 14:50:18 +01:00
Martin Liska
aeea14ee40 Come up with LOAD_AND_COMPARE_TO_MXX macro in iamax_sse.S. 2020-02-17 09:01:53 +01:00
Martin Liska
18bcc36a69 Fix implementation of iamax_sse.S as reported in #2116.
The was a typo in iamax_sse.S where one of the comparison
was cmpeqps instead of cmpeqss. That misdetected index
for sequences where the minimum value was 0.
2020-02-17 09:01:53 +01:00
Martin Liska
0e7f43c898 Add missing USE_MIN in kernel/CMakeLists.txt. 2020-02-17 09:01:53 +01:00
Martin Kroeker
79e201fbba Merge pull request #2423 from xianyi/issue2419
Restore -march flag for Android builds
2020-02-17 07:24:02 +01:00
Isuru Fernando
4326dcb460 Pass CFLAGS from env to Makefile.prebuild and remove iOS hack 2020-02-16 15:13:01 -06:00
Martin Kroeker
e32f3b1447 Restore -march flag for Android builds
fixes #2419 - renewed discussion in #2112 suggests removal of the option was primarily aimed at non-Android builds
2020-02-16 17:32:13 +01:00
Martin Kroeker
d483e9270a Update KERNEL.POWER8 2020-02-16 17:29:35 +01:00
Martin Kroeker
01834aee33 Merge pull request #29 from xianyi/develop
rebase
2020-02-16 17:28:10 +01:00
wjc404
b0558c11b9 Update param.h 2020-02-16 23:01:31 +08:00
wjc404
f566787e6e Update KERNEL.SKYLAKEX 2020-02-16 22:58:44 +08:00
wjc404
e3368cbf18 AVX512 STRMM kernel 2020-02-16 22:58:00 +08:00
Martin Kroeker
d92bd5be24 Update KERNEL.POWER8 2020-02-15 23:07:50 +01:00
Martin Kroeker
46e4b12946 Update KERNEL.POWER8 2020-02-15 23:06:51 +01:00
Martin Kroeker
5e94aa4877 Merge pull request #2417 from marxin/make-ctest-verbose-for-drone
Make ctest verbose for drone
2020-02-15 21:57:41 +01:00
Martin Kroeker
93f3e27574 Merge pull request #2415 from marxin/add-cmake-to-gitignore
Add CMake related files to .gitignore.
2020-02-15 21:57:03 +01:00
Martin Kroeker
785c389b0e Merge pull request #2420 from martin-frbg/issue2396
Correct generation of GETRF files by the CMAKE build
2020-02-15 21:56:16 +01:00
Martin Kroeker
c222b25b81 Correct generation of GETRF files by the CMAKE build
fixes #2396
2020-02-15 19:29:14 +01:00
Martin Kroeker
221da8bf05 Merge pull request #2411 from martin-frbg/fix2254-038
Fix pre-processed POWER8 codes and wrong conditionals in the POWER8,PPC440 and PPC970 KERNEL files
2020-02-14 23:07:43 +01:00
Martin Liska
eb285b4d20 Make ctest verbose for drone builder. 2020-02-14 10:46:37 +01:00
Martin Kroeker
cafdd999b8 Update caxpy_power8.S 2020-02-13 22:44:09 +01:00
Martin Kroeker
92ca92a46c Update caxpy_power8.S 2020-02-13 21:24:54 +01:00
Martin Kroeker
486c35c5dc Update icamin_power8.S 2020-02-13 18:38:43 +01:00
Martin Liska
0e05ea9bac Add CMake related files to .gitignore. 2020-02-13 14:51:55 +01:00
Martin Kroeker
5ba3699f41 Update isamin_power8.S 2020-02-13 00:00:32 +01:00
Martin Kroeker
8eefa530cd Update isamax_power8.S 2020-02-12 23:59:50 +01:00
Martin Kroeker
de40d47edf Update isamin_power8.S 2020-02-12 23:57:48 +01:00
Martin Kroeker
7c162b8a21 Update isamax_power8.S 2020-02-12 23:56:57 +01:00
Martin Kroeker
0544cbc806 Fix syntax of endianness conditional 2020-02-12 20:00:29 +01:00
Martin Kroeker
120d20731f Fix syntax of endianness conditional 2020-02-12 19:58:42 +01:00
Martin Kroeker
dc345d84df Fix syntax of endianness conditional and add gcc version check for workaround 2020-02-12 19:56:52 +01:00
Martin Kroeker
616921fd91 Merge pull request #27 from xianyi/develop
rebase
2020-02-12 19:16:14 +01:00
Martin Kroeker
8a9e9a82a1 Merge pull request #2410 from bartoldeman/fix-dscal-inline-asm
Fix inline asm in dscal: mark x, x1 as clobbered. Fixes #2408
2020-02-12 15:38:37 +01:00
Bart Oldeman
7ea5e07d1c Fix inline asm in dscal: mark x, x1 as clobbered. Fixes #2408
The leaq instructions in dscal_kernel_inc_8 modify x and x1 so they
must be declared as input/output constraints, otherwise the compiler
may assume the corresponding registers are not modified.
2020-02-12 14:11:44 +00:00
Martin Kroeker
cb6ef49857 Merge pull request #2407 from susilehtola/patch-2
Patch out instances of Z15 in dynamic_zarch.c
2020-02-11 13:04:44 +01:00
Martin Kroeker
63994e1cdb Merge pull request #2405 from susilehtola/patch-1
Fix typo in dynamic_zarch.c
2020-02-11 13:03:35 +01:00
Martin Kroeker
496e3019bc Merge pull request #2404 from martin-frbg/issue2395
Fix spurious application of USE_TRMM in cmake builds
2020-02-11 13:00:36 +01:00
Martin Kroeker
169be3f097 Merge pull request #2403 from martin-frbg/issue2400
Fix coretype identification of Intel Cannon Lake, Ice Lake and Goldmont
2020-02-11 13:00:16 +01:00
Martin Kroeker
6ccbb089c2 Merge pull request #2402 from gxw-loongson/develop
Avoid printing the following information on mips and mips64 when check msa
2020-02-11 12:59:53 +01:00
Martin Kroeker
59ebe3636a Merge pull request #2399 from martin-frbg/buffersize
Make BUFFER_SIZE configurable at build time
2020-02-11 12:56:56 +01:00
Susi Lehtola
5a6bba3061 Patch out instances of Z15 in dynamic_zarch.c
There does not appear to be a Z15 kernel yet, causing link errors from the code. This patch fixes the issue.
2020-02-11 15:07:33 +13:00
Susi Lehtola
dff173e50e Fix typo in dynamic_zarch.c 2020-02-11 14:46:30 +13:00
Martin Kroeker
7e5cbb6f35 Fix bad conditional syntax that caused spurious application of USE_TRMM 2020-02-10 21:17:39 +01:00
Martin Kroeker
303bdb673b Fix coretype detection for Intel extended models 6 and 7
affecting Goldmont, Cannon Lake, Ice Lake autodetection
2020-02-10 19:17:32 +01:00
gxw
754433f420 Avoid printing the following information on mips and mips64 when check msa:
"unrecognized command line option ‘-mmsa’"
2020-02-10 19:11:45 +08:00
Martin Kroeker
7f0d523b42 Make BUFFER_SIZE configurable 2020-02-09 23:32:57 +01:00
Martin Kroeker
c353d8b106 Make BUFFER_SIZE configurable 2020-02-09 23:30:22 +01:00
Martin Kroeker
579be3aa9d Add configuration option for BUFFER_SIZE 2020-02-09 23:28:04 +01:00
Martin Kroeker
449e8ea443 Merge pull request #26 from xianyi/develop
rebase
2020-02-09 23:23:55 +01:00
Martin Kroeker
3bec250cf9 Increment version to 0.3.9.dev 2020-02-09 23:18:44 +01:00
Martin Kroeker
f03dd23e90 Increment version to 0.3.9.dev 2020-02-09 23:18:07 +01:00
375 changed files with 41250 additions and 5373 deletions

View File

@@ -8,7 +8,7 @@ platform:
steps:
- name: Build and Test
image: ubuntu:19.04
image: ubuntu:18.04
environment:
CC: gcc
COMMON_FLAGS: 'DYNAMIC_ARCH=1 TARGET=ARMV8 NUM_THREADS=32'
@@ -32,7 +32,7 @@ platform:
steps:
- name: Build and Test
image: ubuntu:19.04
image: ubuntu:18.04
environment:
CC: gcc
COMMON_FLAGS: 'DYNAMIC_ARCH=1 TARGET=ARMV6 NUM_THREADS=32'
@@ -92,7 +92,7 @@ steps:
- mkdir build && cd build
- cmake $CMAKE_FLAGS ..
- make -j
- ctest
- ctest -V
---
kind: pipeline
@@ -116,7 +116,7 @@ steps:
- mkdir build && cd build
- cmake $CMAKE_FLAGS ..
- make -j
- ctest
- ctest -V
---
kind: pipeline
@@ -140,4 +140,53 @@ steps:
- mkdir build && cd build
- cmake $CMAKE_FLAGS ..
- make -j
- ctest
- ctest -V
---
kind: pipeline
name: arm64_native_test
platform:
os: linux
arch: arm64
steps:
- name: Build and Test
image: ubuntu:18.04
environment:
CC: gcc
COMMON_FLAGS: 'USE_OPENMP=1'
commands:
- echo "MAKE_FLAGS:= $COMMON_FLAGS"
- apt-get update -y
- apt-get install -y make $CC gfortran perl python g++
- $CC --version
- make QUIET_MAKE=1 $COMMON_FLAGS
- make -C test $COMMON_FLAGS
- make -C ctest $COMMON_FLAGS
- make -C utest $COMMON_FLAGS
- make -C cpp_thread_test dgemm_tester
---
kind: pipeline
name: epyc_native_test
platform:
os: linux
arch: amd64
steps:
- name: Build and Test
image: ubuntu:18.04
environment:
CC: gcc
COMMON_FLAGS: 'USE_OPENMP=1'
commands:
- echo "MAKE_FLAGS:= $COMMON_FLAGS"
- apt-get update -y
- apt-get install -y make $CC gfortran perl python g++
- $CC --version
- make QUIET_MAKE=1 $COMMON_FLAGS
- make -C test $COMMON_FLAGS
- make -C ctest $COMMON_FLAGS
- make -C utest $COMMON_FLAGS
- make -C cpp_thread_test dgemm_tester

103
.github/workflows/dynamic_arch.yml vendored Normal file
View File

@@ -0,0 +1,103 @@
name: continuous build
on: [push, pull_request]
jobs:
build:
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, macos-latest]
fortran: [gfortran, flang]
build: [cmake, make]
steps:
- name: Checkout repository
uses: actions/checkout@v2
- name: Compilation cache
uses: actions/cache@v2
with:
path: ~/.ccache
# We include the commit sha in the cache key, as new cache entries are
# only created if there is no existing entry for the key yet.
key: ${{ runner.os }}-ccache-${{ github.sha }}
# Restore any ccache cache entry, if none for
# ${{ runner.os }}-ccache-${{ github.sha }} exists
restore-keys: |
${{ runner.os }}-ccache-
- name: Print system information
run: |
if [ "$RUNNER_OS" == "Linux" ]; then
cat /proc/cpuinfo
elif [ "$RUNNER_OS" == "macOS" ]; then
sysctl -a | grep machdep.cpu
else
echo "$RUNNER_OS not supported"
exit 1
fi
- name: Install Dependencies
run: |
if [ "$RUNNER_OS" == "Linux" ]; then
sudo apt-get install -y gfortran cmake ccache
elif [ "$RUNNER_OS" == "macOS" ]; then
brew install coreutils cmake ccache
else
echo "$RUNNER_OS not supported"
exit 1
fi
ccache -M 300M # Limit the ccache size; Github's overall cache limit is 5GB
- name: gfortran build
if: matrix.build == 'make' && matrix.fortran == 'gfortran'
run: |
if [ "$RUNNER_OS" == "Linux" ]; then
export PATH="/usr/lib/ccache:${PATH}"
elif [ "$RUNNER_OS" == "macOS" ]; then
export PATH="$(brew --prefix)/opt/ccache/libexec:${PATH}"
else
echo "$RUNNER_OS not supported"
exit 1
fi
make -j$(nproc) DYNAMIC_ARCH=1 USE_OPENMP=0
- name: flang build
if: matrix.build == 'make' && matrix.fortran == 'flang'
run: |
if [ "$RUNNER_OS" == "Linux" ]; then
export PATH="/usr/lib/ccache:${PATH}"
elif [ "$RUNNER_OS" == "macOS" ]; then
exit 0
else
echo "$RUNNER_OS not supported"
exit 1
fi
cd /usr/
sudo wget -nv https://github.com/flang-compiler/flang/releases/download/flang_20190329/flang-20190329-x86-70.tgz
sudo tar xf flang-20190329-x86-70.tgz
sudo rm flang-20190329-x86-70.tgz
cd -
make -j$(nproc) DYNAMIC_ARCH=1 USE_OPENMP=0 FC=flang
- name: CMake gfortran build
if: matrix.build == 'cmake' && matrix.fortran == 'gfortran'
run: |
if [ "$RUNNER_OS" == "Linux" ]; then
export PATH="/usr/lib/ccache:${PATH}"
elif [ "$RUNNER_OS" == "macOS" ]; then
export PATH="$(brew --prefix)/opt/ccache/libexec:${PATH}"
else
echo "$RUNNER_OS not supported"
exit 1
fi
mkdir build
cd build
cmake -DDYNAMIC_ARCH=1 -DNOFORTRAN=0 -DBUILD_WITHOUT_LAPACK=0 -DCMAKE_VERBOSE_MAKEFILE=ON -DCMAKE_BUILD_TYPE=Release ..
make -j$(nproc)

View File

@@ -0,0 +1,79 @@
# Only the "head" branch of the OpenBLAS package is tested
on:
push:
paths:
- '**/nightly-Homebrew-build.yml'
pull_request:
branches:
- develop
paths:
- '**/nightly-Homebrew-build.yml'
schedule:
- cron: 45 7 * * *
# This is 7:45 AM UTC daily, late at night in the USA
# Since push and pull_request will still always be building and testing the `develop` branch,
# it only makes sense to test if this file has been changed
name: Nightly-Homebrew-Build
jobs:
build-OpenBLAS-with-Homebrew:
runs-on: macos-latest
env:
DEVELOPER_DIR: /Applications/Xcode_11.4.1.app/Contents/Developer
HOMEBREW_DEVELOPER: "ON"
HOMEBREW_DISPLAY_INSTALL_TIMES: "ON"
HOMEBREW_NO_ANALYTICS: "ON"
HOMEBREW_NO_AUTO_UPDATE: "ON"
HOMEBREW_NO_BOTTLE_SOURCE_FALLBACK: "ON"
HOMEBREW_NO_INSTALL_CLEANUP: "ON"
steps:
- name: Random delay for cron job
run: |
delay=$(( RANDOM % 600 ))
printf 'Delaying for %s seconds on event %s' ${delay} "${{ github.event_name }}"
sleep ${delay}
if: github.event_name == 'schedule'
- uses: actions/checkout@v2
# This isn't even needed, technically. Homebrew will get `develop` via git
- name: Update Homebrew
if: github.event_name != 'pull_request'
run: brew update || true
- name: Install prerequisites
run: brew install --fetch-HEAD --HEAD --only-dependencies --keep-tmp openblas
- name: Install and bottle OpenBLAS
run: brew install --fetch-HEAD --HEAD --build-bottle --keep-tmp openblas
# the HEAD flags tell Homebrew to build the develop branch fetch via git
- name: Create bottle
run: |
brew bottle -v openblas
mkdir bottles
mv *.bottle.tar.gz bottles
- name: Upload bottle
uses: actions/upload-artifact@v1
with:
name: openblas--HEAD.catalina.bottle.tar.gz
path: bottles
- name: Show linkage
run: brew linkage -v openblas
- name: Test openblas
run: brew test --HEAD --verbose openblas
- name: Audit openblas formula
run: |
brew audit --strict openblas
brew cat openblas
- name: Post logs on failure
if: failure()
run: brew gist-logs --with-hostname -v openblas

5
.gitignore vendored
View File

@@ -70,6 +70,7 @@ test/SBLAT2.SUMM
test/SBLAT3.SUMM
test/ZBLAT2.SUMM
test/ZBLAT3.SUMM
test/SHBLAT3.SUMM
test/cblat1
test/cblat2
test/cblat3
@@ -79,6 +80,7 @@ test/dblat3
test/sblat1
test/sblat2
test/sblat3
test/test_shgemm
test/zblat1
test/zblat2
test/zblat3
@@ -87,4 +89,5 @@ build.*
*.swp
benchmark/*.goto
benchmark/smallscaling
CMakeCache.txt
CMakeFiles/*

View File

@@ -16,7 +16,6 @@ matrix:
before_script: &common-before
- COMMON_FLAGS="DYNAMIC_ARCH=1 TARGET=NEHALEM NUM_THREADS=32"
script:
- set -e
- make QUIET_MAKE=1 $COMMON_FLAGS $BTYPE
- make -C test $COMMON_FLAGS $BTYPE
- make -C ctest $COMMON_FLAGS $BTYPE
@@ -34,6 +33,16 @@ matrix:
- TARGET_BOX=PPC64LE_LINUX
- BTYPE="BINARY=64 USE_OPENMP=1"
- <<: *test-ubuntu
os: linux
arch: s390x
before_script:
- COMMON_FLAGS="DYNAMIC_ARCH=1 TARGET=Z13 NUM_THREADS=32"
env:
# for matrix annotation only
- TARGET_BOX=IBMZ_LINUX
- BTYPE="BINARY=64 USE_OPENMP=1"
- <<: *test-ubuntu
env:
- TARGET_BOX=LINUX64
@@ -66,6 +75,23 @@ matrix:
- TARGET_BOX=LINUX32
- BTYPE="BINARY=32"
- os: linux
arch: ppc64le
dist: bionic
compiler: gcc
before_script:
- sudo add-apt-repository 'ppa:ubuntu-toolchain-r/test' -y
- sudo apt-get update
- sudo apt-get install gcc-9 gfortran-9 -y
script:
- make QUIET_MAKE=1 BINARY=64 USE_OPENMP=1 CC=gcc-9 FC=gfortran-9
- make -C test $COMMON_FLAGS $BTYPE
- make -C ctest $COMMON_FLAGS $BTYPE
- make -C utest $COMMON_FLAGS $BTYPE
env:
# for matrix annotation only
- TARGET_BOX=PPC64LE_LINUX_P9
- os: linux
compiler: gcc
addons:
@@ -98,7 +124,6 @@ matrix:
- sudo sh alpine-chroot-install -p 'build-base gfortran perl linux-headers'
before_script: *common-before
script:
- set -e
# XXX: Disable some warnings for now to avoid exceeding Travis limit for log size.
- alpine make QUIET_MAKE=1 $COMMON_FLAGS $BTYPE
CFLAGS="-Wno-misleading-indentation -Wno-sign-conversion -Wno-incompatible-pointer-types"
@@ -141,7 +166,6 @@ matrix:
before_script:
- COMMON_ARGS="-DTARGET=NEHALEM -DNUM_THREADS=32"
script:
- set -e
- mkdir build
- CONFIG=Release
- cmake -Bbuild -H. $CMAKE_ARGS $COMMON_ARGS -DCMAKE_BUILD_TYPE=$CONFIG
@@ -176,10 +200,16 @@ matrix:
- <<: *test-macos
osx_image: xcode10.1
env:
- CC="/Applications/Xcode-10.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang -isysroot /Applications/Xcode-10.1.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS12.1.sdk"
- CC="/Applications/Xcode-10.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang"
- CFLAGS="-O2 -Wno-macro-redefined -isysroot /Applications/Xcode-10.1.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS12.1.sdk -arch arm64 -miphoneos-version-min=10.0"
- BTYPE="TARGET=ARMV8 BINARY=64 HOSTCC=clang NOFORTRAN=1"
- <<: *test-macos
osx_image: xcode10.1
env:
- CC="/Applications/Xcode-10.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang"
- CFLAGS="-O2 -mno-thumb -Wno-macro-redefined -isysroot /Applications/Xcode-10.1.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS12.1.sdk -arch armv7 -miphoneos-version-min=5.1"
- BTYPE="TARGET=ARMV7 HOSTCC=clang NOFORTRAN=1"
# whitelist
branches:
only:

View File

@@ -6,7 +6,7 @@ cmake_minimum_required(VERSION 2.8.5)
project(OpenBLAS C ASM)
set(OpenBLAS_MAJOR_VERSION 0)
set(OpenBLAS_MINOR_VERSION 3)
set(OpenBLAS_PATCH_VERSION 8)
set(OpenBLAS_PATCH_VERSION 10.dev)
set(OpenBLAS_VERSION "${OpenBLAS_MAJOR_VERSION}.${OpenBLAS_MINOR_VERSION}.${OpenBLAS_PATCH_VERSION}")
# Adhere to GNU filesystem layout conventions
@@ -23,6 +23,7 @@ option(BUILD_WITHOUT_CBLAS "Do not build the C interface (CBLAS) to the BLAS fun
option(DYNAMIC_ARCH "Include support for multiple CPU targets, with automatic selection at runtime (x86/x86_64, aarch64 or ppc only)" OFF)
option(DYNAMIC_OLDER "Include specific support for older x86 cpu models (Penryn,Dunnington,Atom,Nano,Opteron) with DYNAMIC_ARCH" OFF)
option(BUILD_RELAPACK "Build with ReLAPACK (recursive implementation of several LAPACK functions on top of standard LAPACK)" OFF)
option(USE_LOCKING "Use locks even in single-threaded builds to make them callable from multiple threads" OFF)
if(${CMAKE_SYSTEM_NAME} MATCHES "Linux")
option(NO_AFFINITY "Disable support for CPU affinity masks to avoid binding processes from e.g. R or numpy/scipy to a single core" ON)
else()
@@ -86,9 +87,13 @@ if (NOT NO_LAPACK)
list(APPEND SUBDIRS lapack)
endif ()
if (NOT DEFINED BUILD_HALF)
set (BUILD_HALF false)
endif ()
# set which float types we want to build for
if (NOT DEFINED BUILD_SINGLE AND NOT DEFINED BUILD_DOUBLE AND NOT DEFINED BUILD_COMPLEX AND NOT DEFINED BUILD_COMPLEX16)
# if none are defined, build for all
# set(BUILD_HALF true)
set(BUILD_SINGLE true)
set(BUILD_DOUBLE true)
set(BUILD_COMPLEX true)
@@ -120,6 +125,11 @@ if (BUILD_COMPLEX16)
list(APPEND FLOAT_TYPES "ZCOMPLEX") # defines COMPLEX and DOUBLE
endif ()
if (BUILD_HALF)
message(STATUS "Building Half Precision")
list(APPEND FLOAT_TYPES "HALF") # defines nothing
endif ()
if (NOT DEFINED CORE OR "${CORE}" STREQUAL "UNKNOWN")
message(FATAL_ERROR "Detecting CPU failed. Please set TARGET explicitly, e.g. make TARGET=your_cpu_target. Please read README for details.")
endif ()
@@ -223,6 +233,7 @@ if (NOT MSVC AND NOT NOFORTRAN)
if(NOT NO_CBLAS)
add_subdirectory(ctest)
endif()
add_subdirectory(lapack-netlib/TESTING)
endif()
set_target_properties(${OpenBLAS_LIBNAME} PROPERTIES
@@ -234,11 +245,11 @@ if (BUILD_SHARED_LIBS AND BUILD_RELAPACK)
if (NOT MSVC)
target_link_libraries(${OpenBLAS_LIBNAME} "-Wl,-allow-multiple-definition")
else()
target_link_libraries(${OpenBLAS_LIBNAME} "/FORCE:MULTIPLE")
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} /FORCE:MULTIPLE")
endif()
endif()
if (BUILD_SHARED_LIBS AND NOT ${SYMBOLPREFIX}${SYMBOLSUFIX} STREQUAL "")
if (BUILD_SHARED_LIBS AND NOT ${SYMBOLPREFIX}${SYMBOLSUFFIX} STREQUAL "")
if (NOT DEFINED ARCH)
set(ARCH_IN "x86_64")
else()
@@ -347,10 +358,21 @@ endif()
if(NOT NO_CBLAS)
message (STATUS "Generating cblas.h in ${CMAKE_INSTALL_INCLUDEDIR}")
set(CBLAS_H ${CMAKE_BINARY_DIR}/generated/cblas.h)
file(READ ${CMAKE_CURRENT_SOURCE_DIR}/cblas.h CBLAS_H_CONTENTS)
string(REPLACE "common" "openblas_config" CBLAS_H_CONTENTS_NEW "${CBLAS_H_CONTENTS}")
if (NOT ${SYMBOLPREFIX} STREQUAL "")
string(REPLACE " cblas" " ${SYMBOLPREFIX}cblas" CBLAS_H_CONTENTS "${CBLAS_H_CONTENTS_NEW}")
string(REPLACE " openblas" " ${SYMBOLPREFIX}openblas" CBLAS_H_CONTENTS_NEW "${CBLAS_H_CONTENTS}")
string (REPLACE " ${SYMBOLPREFIX}openblas_complex" " openblas_complex" CBLAS_H_CONTENTS "${CBLAS_H_CONTENTS_NEW}")
string(REPLACE " goto" " ${SYMBOLPREFIX}goto" CBLAS_H_CONTENTS_NEW "${CBLAS_H_CONTENTS}")
endif()
if (NOT ${SYMBOLSUFFIX} STREQUAL "")
string(REGEX REPLACE "(cblas[^ (]*)" "\\1${SYMBOLSUFFIX}" CBLAS_H_CONTENTS "${CBLAS_H_CONTENTS_NEW}")
string(REGEX REPLACE "(openblas[^ (]*)" "\\1${SYMBOLSUFFIX}" CBLAS_H_CONTENTS_NEW "${CBLAS_H_CONTENTS}")
string(REGEX REPLACE "(openblas_complex[^ ]*)${SYMBOLSUFFIX}" "\\1" CBLAS_H_CONTENTS "${CBLAS_H_CONTENTS_NEW}")
string(REGEX REPLACE "(goto[^ (]*)" "\\1${SYMBOLSUFFIX}" CBLAS_H_CONTENTS_NEW "${CBLAS_H_CONTENTS}")
endif()
file(WRITE ${CBLAS_H} "${CBLAS_H_CONTENTS_NEW}")
install (FILES ${CBLAS_H} DESTINATION ${CMAKE_INSTALL_INCLUDEDIR})
endif()
@@ -367,11 +389,9 @@ if(NOT NO_LAPACKE)
install (FILES ${CMAKE_BINARY_DIR}/lapacke_mangling.h DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/openblas${SUFFIX64})
endif()
include(FindPkgConfig QUIET)
if(PKG_CONFIG_FOUND)
configure_file(${PROJECT_SOURCE_DIR}/cmake/openblas.pc.in ${PROJECT_BINARY_DIR}/openblas${SUFFIX64}.pc @ONLY)
install (FILES ${PROJECT_BINARY_DIR}/openblas${SUFFIX64}.pc DESTINATION ${CMAKE_INSTALL_LIBDIR}/pkgconfig/)
endif()
# Install pkg-config files
configure_file(${PROJECT_SOURCE_DIR}/cmake/openblas.pc.in ${PROJECT_BINARY_DIR}/openblas${SUFFIX64}.pc @ONLY)
install (FILES ${PROJECT_BINARY_DIR}/openblas${SUFFIX64}.pc DESTINATION ${CMAKE_INSTALL_LIBDIR}/pkgconfig/)
# GNUInstallDirs "DATADIR" wrong here; CMake search path wants "share".

View File

@@ -180,3 +180,13 @@ In chronological order:
* [2019-12-23] optimize AVX2 CGEMM and ZGEMM
* [2019-12-30] AVX2 CGEMM3M & ZGEMM3M kernels
* [2020-01-07] optimize AVX2 SGEMM and STRMM
* Rajalakshmi Srinivasaraghavan <https://github.com/RajalakshmiSR>
* [2020-04-15] Half-precision GEMM for bfloat16
* Marius Hillenbrand <https://github.com/mhillenibm>
* [2020-05-12] Revise dynamic architecture detection for IBM z
* [2020-05-12] Add new sgemm and strmm kernel for IBM z14
* Danfeng Zhang <https://github.com/craft-zhang>
* [2020-05-20] Improve performance of SGEMM and STRMM on Arm Cortex-A53

View File

@@ -1,4 +1,121 @@
OpenBLAS ChangeLog
====================================================================
Version 0.3.10
14-Jun-2020
common:
* Improved thread locking behaviour in blas_server and parallel getrf
* Imported bugfix 394 from LAPACK (spurious reference to "XERBL"
due to overlong lines)
* Imported bugfix 403 from LAPACK (compile option "recursive" required
for correctness with Intel and PGI)
* Imported bugfix 408 from LAPACK (wrong scaling in ZHEEQUB)
* Imported bugfix 411 from LAPACK (infinite loop in LARGV/LARTG/LARTGP)
* Fixed mismatches between BUFFERSIZE and GEMM_UNROLL parameters that
could lead to crashes at large matrix sizes
* Restored internal soname in dynamic libraries on FreeBSD and Dragonfly
* Added API (openblas_setaffinity) to set the thread affinity on Linux
* Added initial infrastructure for half-precision floating point
(bfloat16) support with a generic implementation of SHGEMM
* Added CMAKE build system support for building the cblas_Xgemm3m
functions
* Fixed CMAKE support for building in a path with embedded spaces
* Fixed CMAKE (non)handling of NO_EXPRECISION and MAX_STACK_ALLOC
* Fixed GCC version detection in the Makefiles
* Allowed overriding the names of AR, AS and LD in Makefile builds
POWER:
* Fixed big-endian POWER8 ELFv2 builds on FreeBSD
* Fixed GCC version checks and DYNAMIC_ARCH builds on POWER9
* Fixed CMAKE build support for POWER9
* fixed a potential race condition in the thread buffer allocation
* Worked around LAPACK test failures on PPC G4
MIPS:
* Fixed a potential race condition in the thread buffer allocation
* Added support for MIPS 24K/24KE family based on P5600 kernels
MIPS64:
* fixed a potential race condition in the thread buffer allocation
* Added TARGET=GENERIC
ARMV7:
* Fixed a race condition in the thread buffer allocation
ARMV8:
* Fixed a race condition in the thread buffer allocation
* Fixed zero initialisation in the assembly for SGEMM and DGEMM BETA
* Improved performance of the ThunderX2 DAXPY kernel
* Added an optimized SGEMM kernel for Cortex A53
* Fixed Makefile support for INTERFACE64 (8-byte integer)
x86_64:
* Fixed a syntax error in the CMAKE setup for SkylakeX
* Improved performance of STRSM on Haswell, SkylakeX and Ryzen
* Improved SGEMM performance on SGEMM for workloads with ldc a
multiple of 1024
* Improved DGEMM performance on Skylake X
* Fixed unwanted AVX512-dependency of SGEMM in DYNAMIC_ARCH
builds created on SkylakeX
* Removed data alignment requirement in the SSE2 copy kernels
that could cause spurious crashes
* Added a workaround for an optimizer bug in AppleClang 11.0.3
* Fixed LAPACK test failures due to wrong options for Intel Fortran
* Fixed compilation and LAPACK test results with recent Flang
and AMD AOCC
* Fixed DYNAMIC_ARCH builds with CMAKE on OS X
* Fixed missing exports of cblas_i?amin, cblas_i?min, cblas_i?max,
cblas_?sum, cblas_?gemm3m in the shared library on OS
* Fixed reporting of cpu name in DYNAMIC_ARCH builds (would sometimes
show the name of an older generation chip supported by the same kernels)
IBM Z:
* Improved performance of SGEMM/STRMM and DGEMM/DTRMM on Z14
====================================================================
Version 0.3.9
1-Mar-2020
common:
* Fixed a miscompilation of the GETRF functions with CMAKE
* Imported bugfix 390 from LAPACK (missing NaN propagation in xCOMBSSQ)
* The size of the memory buffer used for splitting GEMM tasks across
multiple threads can now be configured in the build system.
POWER:
* Fixed several compilation problems related to endianness
and ELF version on POWER8 and POWER9
* Fixed use of the absolute value IAMIN/IAMAX instead of IMIN/IMAX
* Fixed a race condition in the level3 blas code
MIPS64:
* Fixed use of the absoltute value IAMIN/IAMAX instead of IMIN/IMAX
ARMV7:
* Fixed a race condition in the level3 blas code
* Fixed compilation on Android
ARMV8:
* Added support for Ampere EMAG8180
* Added support for Neoverse N1
* Improved performance of the blas_lock function
* Fixed a race condition in the level3 blas code
* Fixed a performance regression on TSV110-based servers
x86_64:
* Fixed a long-standing error with undeclared register overwrites
in the DSCAL microkernel for HASWELL,SKYLAKEX and ZEN
* Fixed a long-standing bug in the SSE implementation of IAMAX
* Fixed a CMAKE build failure with DYNAMIC_ARCH
* Fixed cpu autodetection of Goldmont+, Cannon Lake and Ice Lake
* Fixed a compilation failure on OSX with compiler name containing dash
* Fixed compilation with MinGW on SkylakeX
* Improved speed of the AVX512 GEMM3M kernel on SkylakeX
* Added an AVX512 STRMM kernel for SkylakeX
* Improved GEMM performance on Haswell and Zen
zarch:
* fixed compilation of the DYNAMIC_ARCH code
====================================================================
Version 0.3.8
9-Feb-2020

9
Jenkinsfile vendored Normal file
View File

@@ -0,0 +1,9 @@
node {
stage('Checkout') {
checkout
}
stage('Build') {
sh("make")
}
}

View File

@@ -56,10 +56,21 @@ ifneq ($(INTERFACE64), 0)
@echo " Use 64 bits int (equivalent to \"-i8\" in Fortran) "
endif
endif
@echo " C compiler ... $(C_COMPILER) (command line : $(CC))"
@$(CC) --version > /dev/null 2>&1;\
if [ $$? -eq 0 ]; then \
cverinfo=`$(CC) --version | sed -n '1p'`; \
echo " C compiler ... $(C_COMPILER) (cmd & version : $${cverinfo})";\
else \
echo " C compiler ... $(C_COMPILER) (command line : $(CC))";\
fi
ifeq ($(NOFORTRAN), $(filter 0,$(NOFORTRAN)))
@echo " Fortran compiler ... $(F_COMPILER) (command line : $(FC))"
@$(FC) --version > /dev/null 2>&1;\
if [ $$? -eq 0 ]; then \
fverinfo=`$(FC) --version | sed -n '1p'`; \
echo " Fortran compiler ... $(F_COMPILER) (cmd & version : $${fverinfo})";\
else \
echo " Fortran compiler ... $(F_COMPILER) (command line : $(FC))";\
fi
endif
ifneq ($(OSNAME), AIX)
@echo -n " Library Name ... $(LIBNAME)"
@@ -68,9 +79,13 @@ else
endif
ifndef SMP
@echo " (Single threaded) "
@echo " (Single-threading) "
else
@echo " (Multi threaded; Max num-threads is $(NUM_THREADS))"
@echo " (Multi-threading; Max num-threads is $(NUM_THREADS))"
endif
ifeq ($(DYNAMIC_ARCH), 1)
@echo " Supporting multiple $(ARCH) cpu models with minimum requirement for the common code being $(CORE)"
endif
ifeq ($(USE_OPENMP), 1)
@@ -97,12 +112,12 @@ endif
shared :
ifneq ($(NO_SHARED), 1)
ifeq ($(OSNAME), $(filter $(OSNAME),Linux SunOS Android Haiku))
ifeq ($(OSNAME), $(filter $(OSNAME),Linux SunOS Android Haiku FreeBSD DragonFly))
@$(MAKE) -C exports so
@ln -fs $(LIBSONAME) $(LIBPREFIX).so
@ln -fs $(LIBSONAME) $(LIBPREFIX).so.$(MAJOR_VERSION)
endif
ifeq ($(OSNAME), $(filter $(OSNAME),FreeBSD OpenBSD NetBSD DragonFly))
ifeq ($(OSNAME), $(filter $(OSNAME),OpenBSD NetBSD))
@$(MAKE) -C exports so
@ln -fs $(LIBSONAME) $(LIBPREFIX).so
endif
@@ -126,7 +141,7 @@ ifndef NO_FBLAS
$(MAKE) -C test all
endif
$(MAKE) -C utest all
ifndef NO_CBLAS
ifneq ($(NO_CBLAS), 1)
$(MAKE) -C ctest all
ifeq ($(CPP_THREAD_SAFETY_TEST), 1)
$(MAKE) -C cpp_thread_test all
@@ -229,7 +244,7 @@ ifeq ($(NOFORTRAN), $(filter 0,$(NOFORTRAN)))
@$(MAKE) -C $(NETLIB_LAPACK_DIR) lapacklib
@$(MAKE) -C $(NETLIB_LAPACK_DIR) tmglib
endif
ifndef NO_LAPACKE
ifneq ($(NO_LAPACKE), 1)
@$(MAKE) -C $(NETLIB_LAPACK_DIR) lapackelib
endif
endif
@@ -249,6 +264,7 @@ lapack_prebuild :
ifeq ($(NOFORTRAN), $(filter 0,$(NOFORTRAN)))
-@echo "FC = $(FC)" > $(NETLIB_LAPACK_DIR)/make.inc
-@echo "FFLAGS = $(LAPACK_FFLAGS)" >> $(NETLIB_LAPACK_DIR)/make.inc
-@echo "FFLAGS_DRV = $(LAPACK_FFLAGS)" >> $(NETLIB_LAPACK_DIR)/make.inc
-@echo "POPTS = $(LAPACK_FPFLAGS)" >> $(NETLIB_LAPACK_DIR)/make.inc
-@echo "FFLAGS_NOOPT = -O0 $(LAPACK_NOOPT)" >> $(NETLIB_LAPACK_DIR)/make.inc
-@echo "PNOOPT = $(LAPACK_FPFLAGS) -O0" >> $(NETLIB_LAPACK_DIR)/make.inc
@@ -317,7 +333,7 @@ lapack-test :
$(MAKE) -j 1 -C $(NETLIB_LAPACK_DIR)/TESTING/EIG xeigtstc xeigtstd xeigtsts xeigtstz
$(MAKE) -j 1 -C $(NETLIB_LAPACK_DIR)/TESTING/LIN xlintstc xlintstd xlintstds xlintstrfd xlintstrfz xlintsts xlintstz xlintstzc xlintstrfs xlintstrfc
ifneq ($(CROSS), 1)
( cd $(NETLIB_LAPACK_DIR)/INSTALL; make all; ./testlsame; ./testslamch; ./testdlamch; \
( cd $(NETLIB_LAPACK_DIR)/INSTALL; $(MAKE) all; ./testlsame; ./testslamch; ./testdlamch; \
./testsecond; ./testdsecnd; ./testieee; ./testversion )
(cd $(NETLIB_LAPACK_DIR); ./lapack_testing.py -r -b TESTING)
endif
@@ -349,11 +365,12 @@ clean ::
@$(MAKE) -C kernel clean
#endif
@$(MAKE) -C reference clean
@rm -f *.$(LIBSUFFIX) *.so *~ *.exe getarch getarch_2nd *.dll *.lib *.$(SUFFIX) *.dwf $(LIBPREFIX).$(LIBSUFFIX) $(LIBPREFIX)_p.$(LIBSUFFIX) $(LIBPREFIX).so.$(MAJOR_VERSION) *.lnk myconfig.h
@rm -f *.$(LIBSUFFIX) *.so *~ *.exe getarch getarch_2nd *.dll *.lib *.$(SUFFIX) *.dwf $(LIBPREFIX).$(LIBSUFFIX) $(LIBPREFIX)_p.$(LIBSUFFIX) $(LIBPREFIX).so.$(MAJOR_VERSION) *.lnk myconfig.h *.so.renamed *.a.renamed *.so.0
ifeq ($(OSNAME), Darwin)
@rm -rf getarch.dSYM getarch_2nd.dSYM
endif
@rm -f Makefile.conf config.h Makefile_kernel.conf config_kernel.h st* *.dylib
@rm -f cblas.tmp cblas.tmp2
@touch $(NETLIB_LAPACK_DIR)/make.inc
@$(MAKE) -C $(NETLIB_LAPACK_DIR) clean
@rm -f $(NETLIB_LAPACK_DIR)/make.inc $(NETLIB_LAPACK_DIR)/lapacke/include/lapacke_mangling.h

View File

@@ -1,7 +1,7 @@
ifeq ($(CORE), $(filter $(CORE),ARMV7 CORTEXA9 CORTEXA15))
ifeq ($(OSNAME), Android)
CCOMMON_OPT += -mfpu=neon
FCOMMON_OPT += -mfpu=neon
CCOMMON_OPT += -mfpu=neon -march=armv7-a
FCOMMON_OPT += -mfpu=neon -march=armv7-a
else
CCOMMON_OPT += -mfpu=vfpv3 -march=armv7-a
FCOMMON_OPT += -mfpu=vfpv3 -march=armv7-a

View File

@@ -24,6 +24,23 @@ CCOMMON_OPT += -march=armv8-a -mtune=cortex-a73
FCOMMON_OPT += -march=armv8-a -mtune=cortex-a73
endif
# Use a72 tunings because Neoverse-N1 is only available
# in GCC>=9
ifeq ($(CORE), NEOVERSEN1)
ifeq ($(GCCVERSIONGTEQ7), 1)
ifeq ($(GCCVERSIONGTEQ9), 1)
CCOMMON_OPT += -march=armv8.2-a -mtune=neoverse-n1
FCOMMON_OPT += -march=armv8.2-a -mtune=neoverse-n1
else
CCOMMON_OPT += -march=armv8.2-a -mtune=cortex-a72
FCOMMON_OPT += -march=armv8.2-a -mtune=cortex-a72
endif
else
CCOMMON_OPT += -march=armv8-a -mtune=cortex-a72
FCOMMON_OPT += -march=armv8-a -mtune=cortex-a72
endif
endif
ifeq ($(CORE), THUNDERX)
CCOMMON_OPT += -march=armv8-a -mtune=thunderx
FCOMMON_OPT += -march=armv8-a -mtune=thunderx
@@ -39,6 +56,16 @@ CCOMMON_OPT += -march=armv8.1-a -mtune=thunderx2t99
FCOMMON_OPT += -march=armv8.1-a -mtune=thunderx2t99
endif
ifeq ($(CORE), THUNDERX3T110)
ifeq ($(GCCVERSIONGTEQ10), 1)
CCOMMON_OPT += -march=armv8.3-a -mtune=thunderx3t110
FCOMMON_OPT += -march=armv8.3-a -mtune=thunderx3t110
else
CCOMMON_OPT += -march=armv8.1-a -mtune=thunderx2t99
FCOMMON_OPT += -march=armv8.1-a -mtune=thunderx2t99
endif
endif
ifeq ($(GCCVERSIONGTEQ9), 1)
ifeq ($(CORE), TSV110)
CCOMMON_OPT += -march=armv8.2-a -mtune=tsv110

View File

@@ -13,6 +13,14 @@ OPENBLAS_CMAKE_DIR := $(OPENBLAS_LIBRARY_DIR)/cmake/openblas
OPENBLAS_CMAKE_CONFIG := OpenBLASConfig.cmake
OPENBLAS_CMAKE_CONFIG_VERSION := OpenBLASConfigVersion.cmake
OPENBLAS_PKGCONFIG_DIR := $(OPENBLAS_LIBRARY_DIR)/pkgconfig
PKG_EXTRALIB := $(EXTRALIB)
ifeq ($(USE_OPENMP), 1)
ifeq ($(C_COMPILER), PGI)
PKG_EXTRALIB += -lomp
else
PKG_EXTRALIB += -lgomp
endif
endif
.PHONY : install
.NOTPARALLEL : install
@@ -45,7 +53,22 @@ install : lib.grd
ifndef NO_CBLAS
@echo Generating cblas.h in $(DESTDIR)$(OPENBLAS_INCLUDE_DIR)
@sed 's/common/openblas_config/g' cblas.h > "$(DESTDIR)$(OPENBLAS_INCLUDE_DIR)/cblas.h"
@cp cblas.h cblas.tmp
ifdef SYMBOLPREFIX
@sed 's/cblas[^( ]*/$(SYMBOLPREFIX)&/g' cblas.tmp > cblas.tmp2
@sed 's/openblas[^( ]*/$(SYMBOLPREFIX)&/g' cblas.tmp2 > cblas.tmp
#change back any openblas_complex_float and double that got hit
@sed 's/$(SYMBOLPREFIX)openblas_complex_/openblas_complex_/g' cblas.tmp > cblas.tmp2
@sed 's/goto[^( ]*/$(SYMBOLPREFIX)&/g' cblas.tmp2 > cblas.tmp
endif
ifdef SYMBOLSUFFIX
@sed 's/cblas[^( ]*/&$(SYMBOLSUFFIX)/g' cblas.tmp > cblas.tmp2
@sed 's/openblas[^( ]*/&$(SYMBOLSUFFIX)/g' cblas.tmp2 > cblas.tmp
#change back any openblas_complex_float and double that got hit
@sed 's/\(openblas_complex_\)\([^ ]*\)$(SYMBOLSUFFIX)/\1\2 /g' cblas.tmp > cblas.tmp2
@sed 's/goto[^( ]*/&$(SYMBOLSUFFIX)/g' cblas.tmp2 > cblas.tmp
endif
@sed 's/common/openblas_config/g' cblas.tmp > "$(DESTDIR)$(OPENBLAS_INCLUDE_DIR)/cblas.h"
endif
ifneq ($(OSNAME), AIX)
@@ -68,21 +91,21 @@ endif
#for install shared library
ifneq ($(NO_SHARED),1)
@echo Copying the shared library to $(DESTDIR)$(OPENBLAS_LIBRARY_DIR)
ifeq ($(OSNAME), $(filter $(OSNAME),Linux SunOS Android Haiku))
ifeq ($(OSNAME), $(filter $(OSNAME),Linux SunOS Android Haiku FreeBSD DragonFly))
@install -pm755 $(LIBSONAME) "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)"
@cd "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" ; \
ln -fs $(LIBSONAME) $(LIBPREFIX).so ; \
ln -fs $(LIBSONAME) $(LIBPREFIX).so.$(MAJOR_VERSION)
endif
ifeq ($(OSNAME), $(filter $(OSNAME),FreeBSD OpenBSD NetBSD DragonFly))
ifeq ($(OSNAME), $(filter $(OSNAME),OpenBSD NetBSD))
@cp $(LIBSONAME) "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)"
@cd "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" ; \
ln -fs $(LIBSONAME) $(LIBPREFIX).so
endif
ifeq ($(OSNAME), Darwin)
@-cp $(LIBDYNNAME) "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)"
@-install_name_tool -id "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)/$(LIBDYNNAME)" "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)/$(LIBDYNNAME)"
@-install_name_tool -id "$(OPENBLAS_LIBRARY_DIR)/$(LIBPREFIX).$(MAJOR_VERSION).dylib" "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)/$(LIBDYNNAME)"
@cd "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" ; \
ln -fs $(LIBDYNNAME) $(LIBPREFIX).dylib ; \
ln -fs $(LIBDYNNAME) $(LIBPREFIX).$(MAJOR_VERSION).dylib
@@ -132,7 +155,7 @@ endif
@echo 'includedir='$(OPENBLAS_INCLUDE_DIR) >> "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/openblas.pc"
@echo 'openblas_config= USE_64BITINT='$(USE_64BITINT) 'DYNAMIC_ARCH='$(DYNAMIC_ARCH) 'DYNAMIC_OLDER='$(DYNAMIC_OLDER) 'NO_CBLAS='$(NO_CBLAS) 'NO_LAPACK='$(NO_LAPACK) 'NO_LAPACKE='$(NO_LAPACKE) 'NO_AFFINITY='$(NO_AFFINITY) 'USE_OPENMP='$(USE_OPENMP) $(CORE) 'MAX_THREADS='$(NUM_THREADS)>> "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/openblas.pc"
@echo 'version='$(VERSION) >> "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/openblas.pc"
@echo 'extralib='$(EXTRALIB) >> "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/openblas.pc"
@echo 'extralib='$(PKG_EXTRALIB) >> "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/openblas.pc"
@cat openblas.pc.in >> "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/openblas.pc"
@@ -168,4 +191,3 @@ endif
@echo " endif ()" >> "$(DESTDIR)$(OPENBLAS_CMAKE_DIR)/$(OPENBLAS_CMAKE_CONFIG_VERSION)"
@echo "endif ()" >> "$(DESTDIR)$(OPENBLAS_CMAKE_DIR)/$(OPENBLAS_CMAKE_CONFIG_VERSION)"
@echo Install OK!

View File

@@ -9,23 +9,63 @@ else
USE_OPENMP = 1
endif
ifeq ($(CORE), POWER10)
COMMON_OPT += -Ofast -mcpu=power10 -mtune=power10 -mvsx -fno-fast-math
FCOMMON_OPT += -O2 -frecursive -mcpu=power10 -mtune=power10 -fno-fast-math
endif
ifeq ($(CORE), POWER9)
ifeq ($(USE_OPENMP), 1)
COMMON_OPT += -Ofast -mcpu=power9 -mtune=power9 -mvsx -malign-power -DUSE_OPENMP -fno-fast-math -fopenmp
FCOMMON_OPT += -O2 -frecursive -mcpu=power9 -mtune=power9 -malign-power -DUSE_OPENMP -fno-fast-math -fopenmp
ifneq ($(C_COMPILER), PGI)
CCOMMON_OPT += -Ofast -mvsx -fno-fast-math
ifneq ($(GCCVERSIONGT4), 1)
$(warning your compiler is too old to fully support POWER9, getting a newer version of gcc is recommended)
CCOMMON_OPT += -mcpu=power8 -mtune=power8
else
COMMON_OPT += -Ofast -mcpu=power9 -mtune=power9 -mvsx -malign-power -fno-fast-math
FCOMMON_OPT += -O2 -frecursive -mcpu=power9 -mtune=power9 -malign-power -fno-fast-math
CCOMMON_OPT += -mcpu=power9 -mtune=power9
endif
else
CCOMMON_OPT += -fast -Mvect=simd -Mcache_align
endif
ifneq ($(F_COMPILER), PGI)
FCOMMON_OPT += -O2 -frecursive -fno-fast-math
ifneq ($(GCCVERSIONGT4), 1)
$(warning your compiler is too old to fully support POWER9, getting a newer version of gcc is recommended)
FCOMMON_OPT += -mcpu=power8 -mtune=power8
else
FCOMMON_OPT += -mcpu=power9 -mtune=power9
endif
else
FCOMMON_OPT += -O2 -Mrecursive
endif
endif
ifeq ($(CORE), POWER8)
ifeq ($(USE_OPENMP), 1)
COMMON_OPT += -Ofast -mcpu=power8 -mtune=power8 -mvsx -malign-power -DUSE_OPENMP -fno-fast-math -fopenmp
FCOMMON_OPT += -O2 -frecursive -mcpu=power8 -mtune=power8 -malign-power -DUSE_OPENMP -fno-fast-math -fopenmp
ifneq ($(C_COMPILER), PGI)
CCOMMON_OPT += -Ofast -mcpu=power8 -mtune=power8 -mvsx -fno-fast-math
else
COMMON_OPT += -Ofast -mcpu=power8 -mtune=power8 -mvsx -malign-power -fno-fast-math
FCOMMON_OPT += -O2 -frecursive -mcpu=power8 -mtune=power8 -malign-power -fno-fast-math
CCOMMON_OPT += -fast -Mvect=simd -Mcache_align
endif
ifneq ($(F_COMPILER), PGI)
ifeq ($(OSNAME), AIX)
FCOMMON_OPT += -O1 -frecursive -mcpu=power8 -mtune=power8 -fno-fast-math
else
FCOMMON_OPT += -O2 -frecursive -mcpu=power8 -mtune=power8 -fno-fast-math
endif
else
FCOMMON_OPT += -O2 -Mrecursive
endif
endif
ifeq ($(USE_OPENMP), 1)
ifneq ($(C_COMPILER), PGI)
CCOMMON_OPT += -DUSE_OPENMP -fopenmp
else
CCOMMON_OPT += -DUSE_OPENMP -mp
endif
ifneq ($(F_COMPILER), PGI)
FCOMMON_OPT += -DUSE_OPENMP -fopenmp
else
FCOMMON_OPT += -DUSE_OPENMP -mp
endif
endif
@@ -68,6 +108,9 @@ CCOMMON_OPT += -mpowerpc64 -maix64
ifeq ($(COMPILER_F77), g77)
FCOMMON_OPT += -mpowerpc64 -maix64
endif
ifeq ($(F_COMPILER), GFORTRAN)
FCOMMON_OPT += -mpowerpc64 -maix64
endif
ifeq ($(COMPILER_F77), xlf)
FCOMMON_OPT += -q64
endif

View File

@@ -17,7 +17,11 @@ ifdef CPUIDEMU
EXFLAGS = -DCPUIDEMU -DVENDOR=99
endif
ifeq ($(TARGET), 1004K)
ifeq ($(TARGET), MIPS24K)
TARGET_FLAGS = -mips32r2
endif
ifeq ($(TARGET), MIPS1004K)
TARGET_FLAGS = -mips32r2
endif
@@ -42,7 +46,7 @@ all: getarch_2nd
./getarch_2nd 1 >> $(TARGET_CONF)
config.h : c_check f_check getarch
perl ./c_check $(TARGET_MAKE) $(TARGET_CONF) $(CC) $(TARGET_FLAGS)
perl ./c_check $(TARGET_MAKE) $(TARGET_CONF) $(CC) $(TARGET_FLAGS) $(CFLAGS)
ifneq ($(ONLY_CBLAS), 1)
perl ./f_check $(TARGET_MAKE) $(TARGET_CONF) $(FC) $(TARGET_FLAGS)
else
@@ -59,13 +63,13 @@ endif
getarch : getarch.c cpuid.S dummy $(CPUIDEMU)
$(HOSTCC) $(CFLAGS) $(EXFLAGS) -o $(@F) getarch.c cpuid.S $(CPUIDEMU)
$(HOSTCC) $(HOST_CFLAGS) $(EXFLAGS) -o $(@F) getarch.c cpuid.S $(CPUIDEMU)
getarch_2nd : getarch_2nd.c config.h dummy
ifndef TARGET_CORE
$(HOSTCC) -I. $(CFLAGS) -o $(@F) getarch_2nd.c
$(HOSTCC) -I. $(HOST_CFLAGS) -o $(@F) getarch_2nd.c
else
$(HOSTCC) -I. $(CFLAGS) -DBUILD_KERNEL -o $(@F) getarch_2nd.c
$(HOSTCC) -I. $(HOST_CFLAGS) -DBUILD_KERNEL -o $(@F) getarch_2nd.c
endif
dummy:

View File

@@ -3,7 +3,7 @@
#
# This library's version
VERSION = 0.3.8
VERSION = 0.3.10.dev
# If you set the suffix, the library name will be libopenblas_$(LIBNAMESUFFIX).a
# and libopenblas_$(LIBNAMESUFFIX).so. Meanwhile, the soname in shared library
@@ -97,6 +97,15 @@ VERSION = 0.3.8
# they need to wait for the preceding API calls to finish or risk data corruption.
# NUM_PARALLEL = 2
# When multithreading, OpenBLAS needs to use a memory buffer for communicating
# and collating results for individual subranges of the original matrix. Since
# the original GotoBLAS of the early 2000s, the default size of this buffer has
# been set at a value of 32<<20 (which is 32MB) on x86_64 , twice that on PPC.
# If you expect to handle large problem sizes (beyond about 30000x30000) uncomment
# this line and adjust the (32<<n) factor if necessary. Usually an insufficient value
# manifests itself as a crash in the relevant scal kernel (sscal_k, dscal_k etc)
# BUFFERSIZE = 25
# If you don't need to install the static library, please comment this in.
# NO_STATIC = 1
@@ -264,6 +273,9 @@ COMMON_PROF = -pg
#
# CPP_THREAD_SAFETY_TEST = 1
# If you want to enable the experimental BFLOAT16 support
# BUILD_HALF = 1
#
# End of user configuration
#

View File

@@ -21,8 +21,14 @@ ifeq ($(ARCH), amd64)
override ARCH=x86_64
else ifeq ($(ARCH), powerpc64)
override ARCH=power
else ifeq ($(ARCH), powerpc)
override ARCH=power
else ifeq ($(ARCH), i386)
override ARCH=x86
else ifeq ($(ARCH), armv6)
override ARCH=arm
else ifeq ($(ARCH), armv7)
override ARCH=arm
else ifeq ($(ARCH), aarch64)
override ARCH=arm64
else ifeq ($(ARCH), zarch)
@@ -86,6 +92,9 @@ endif
ifeq ($(TARGET), SKYLAKEX)
GETARCH_FLAGS := -DFORCE_NEHALEM
endif
ifeq ($(TARGET), COOPERLAKE)
GETARCH_FLAGS := -DFORCE_NEHALEM
endif
ifeq ($(TARGET), SANDYBRIDGE)
GETARCH_FLAGS := -DFORCE_NEHALEM
endif
@@ -107,6 +116,9 @@ endif
ifeq ($(TARGET), ARMV8)
GETARCH_FLAGS := -DFORCE_ARMV7
endif
ifeq ($(TARGET), POWER8)
GETARCH_FLAGS := -DFORCE_POWER6
endif
endif
@@ -125,6 +137,9 @@ endif
ifeq ($(TARGET_CORE), SKYLAKEX)
GETARCH_FLAGS := -DFORCE_NEHALEM
endif
ifeq ($(TARGET_CORE), COOPERLAKE)
GETARCH_FLAGS := -DFORCE_NEHALEM
endif
ifeq ($(TARGET_CORE), SANDYBRIDGE)
GETARCH_FLAGS := -DFORCE_NEHALEM
endif
@@ -209,12 +224,17 @@ else
ONLY_CBLAS = 0
endif
#For small matrix optimization
ifeq ($(SMALL_MATRIX_OPT), 1)
CCOMMON_OPT += -DSMALL_MATRIX_OPT
endif
# This operation is expensive, so execution should be once.
ifndef GOTOBLAS_MAKEFILE
export GOTOBLAS_MAKEFILE = 1
# Generating Makefile.conf and config.h
DUMMY := $(shell $(MAKE) -C $(TOPDIR) -f Makefile.prebuild CC="$(CC)" FC="$(FC)" HOSTCC="$(HOSTCC)" CFLAGS="$(GETARCH_FLAGS)" BINARY=$(BINARY) USE_OPENMP=$(USE_OPENMP) TARGET_CORE=$(TARGET_CORE) ONLY_CBLAS=$(ONLY_CBLAS) TARGET=$(TARGET) all)
DUMMY := $(shell $(MAKE) -C $(TOPDIR) -f Makefile.prebuild CC="$(CC)" FC="$(FC)" HOSTCC="$(HOSTCC)" HOST_CFLAGS="$(GETARCH_FLAGS)" CFLAGS="$(CFLAGS)" BINARY=$(BINARY) USE_OPENMP=$(USE_OPENMP) TARGET_CORE=$(TARGET_CORE) ONLY_CBLAS=$(ONLY_CBLAS) TARGET=$(TARGET) all)
ifndef TARGET_CORE
include $(TOPDIR)/Makefile.conf
@@ -261,10 +281,10 @@ endif
ARFLAGS =
CPP = $(COMPILER) -E
AR = $(CROSS_SUFFIX)ar
AS = $(CROSS_SUFFIX)as
LD = $(CROSS_SUFFIX)ld
RANLIB = $(CROSS_SUFFIX)ranlib
AR ?= $(CROSS_SUFFIX)ar
AS ?= $(CROSS_SUFFIX)as
LD ?= $(CROSS_SUFFIX)ld
RANLIB ?= $(CROSS_SUFFIX)ranlib
NM = $(CROSS_SUFFIX)nm
DLLWRAP = $(CROSS_SUFFIX)dllwrap
OBJCOPY = $(CROSS_SUFFIX)objcopy
@@ -277,6 +297,26 @@ NO_LAPACK = 1
override FEXTRALIB =
endif
ifeq ($(C_COMPILER), GCC)
GCCVERSIONGTEQ4 := $(shell expr `$(CC) -dumpversion | cut -f1 -d.` \>= 4)
GCCVERSIONGT4 := $(shell expr `$(CC) -dumpversion | cut -f1 -d.` \> 4)
GCCVERSIONEQ5 := $(shell expr `$(CC) -dumpversion | cut -f1 -d.` = 5)
GCCVERSIONGT5 := $(shell expr `$(CC) -dumpversion | cut -f1 -d.` \> 5)
GCCVERSIONGTEQ7 := $(shell expr `$(CC) -dumpversion | cut -f1 -d.` \>= 7)
GCCVERSIONGTEQ9 := $(shell expr `$(CC) -dumpversion | cut -f1 -d.` \>= 9)
GCCVERSIONGTEQ11 := $(shell expr `$(CC) -dumpversion | cut -f1 -d.` \>= 11)
GCCVERSIONGTEQ10 := $(shell expr `$(CC) -dumpversion | cut -f1 -d.` \>= 10)
# Note that the behavior of -dumpversion is compile-time-configurable for
# gcc-7.x and newer. Use -dumpfullversion there
ifeq ($(GCCVERSIONGTEQ7),1)
GCCDUMPVERSION_PARAM := -dumpfullversion
else
GCCDUMPVERSION_PARAM := -dumpversion
endif
GCCMINORVERSIONGTEQ2 := $(shell expr `$(CC) $(GCCDUMPVERSION_PARAM) | cut -f2 -d.` \>= 2)
GCCMINORVERSIONGTEQ7 := $(shell expr `$(CC) $(GCCDUMPVERSION_PARAM) | cut -f2 -d.` \>= 7)
endif
#
# OS dependent settings
#
@@ -323,13 +363,7 @@ ifeq ($(C_COMPILER), CLANG)
CCOMMON_OPT += -DMS_ABI
endif
ifeq ($(C_COMPILER), GCC)
#Version tests for supporting specific features (MS_ABI, POWER9 intrinsics)
GCCVERSIONGTEQ4 := $(shell expr `$(CC) -dumpversion | cut -f1 -d.` \>= 4)
GCCVERSIONGT4 := $(shell expr `$(CC) -dumpversion | cut -f1 -d.` \> 4)
GCCVERSIONGT5 := $(shell expr `$(CC) -dumpversion | cut -f1 -d.` \> 5)
GCCVERSIONGTEQ9 := $(shell expr `$(CC) -dumpversion | cut -f1 -d.` \>= 9)
GCCMINORVERSIONGTEQ7 := $(shell expr `$(CC) -dumpversion | cut -f2 -d.` \>= 7)
ifeq ($(GCCVERSIONGT4), 1)
# GCC Major version > 4
# It is compatible with MSVC ABI.
@@ -343,7 +377,6 @@ ifeq ($(GCCMINORVERSIONGTEQ7), 1)
CCOMMON_OPT += -DMS_ABI
endif
endif
endif
# Ensure the correct stack alignment on Win32
# http://permalink.gmane.org/gmane.comp.lib.openblas.general/97
@@ -535,7 +568,7 @@ DYNAMIC_CORE += HASWELL ZEN
endif
ifneq ($(NO_AVX512), 1)
ifneq ($(NO_AVX2), 1)
DYNAMIC_CORE += SKYLAKEX
DYNAMIC_CORE += SKYLAKEX COOPERLAKE
endif
endif
endif
@@ -554,15 +587,44 @@ DYNAMIC_CORE += CORTEXA53
DYNAMIC_CORE += CORTEXA57
DYNAMIC_CORE += CORTEXA72
DYNAMIC_CORE += CORTEXA73
DYNAMIC_CORE += NEOVERSEN1
DYNAMIC_CORE += FALKOR
DYNAMIC_CORE += THUNDERX
DYNAMIC_CORE += THUNDERX2T99
DYNAMIC_CORE += TSV110
DYNAMIC_CORE += EMAG8180
DYNAMIC_CORE += THUNDERX3T110
endif
ifeq ($(ARCH), zarch)
DYNAMIC_CORE = Z13
DYNAMIC_CORE = ZARCH_GENERIC
# Z13 is supported since gcc-5.2, gcc-6, and in RHEL 7.3 and newer
ifeq ($(GCCVERSIONGT5), 1)
ZARCH_SUPPORT_Z13 := 1
else ifeq ($(GCCVERSIONEQ5), 1)
ifeq ($(GCCMINORVERSIONGTEQ2), 1)
ZARCH_SUPPORT_Z13 := 1
endif
endif
ifeq ($(wildcard /etc/redhat-release), /etc/redhat-release)
ifeq ($(shell source /etc/os-release ; expr $$VERSION_ID \>= "7.3"), 1)
ZARCH_SUPPORT_Z13 := 1
endif
endif
ifeq ($(ZARCH_SUPPORT_Z13), 1)
DYNAMIC_CORE += Z13
else
$(info OpenBLAS: Not building Z13 kernels because gcc is older than 5.2 or 6.x)
endif
ifeq ($(GCCVERSIONGTEQ7), 1)
DYNAMIC_CORE += Z14
else
$(info OpenBLAS: Not building Z14 kernels because gcc is older than 7.x)
endif
endif
ifeq ($(ARCH), power)
@@ -570,6 +632,7 @@ DYNAMIC_CORE = POWER6
DYNAMIC_CORE += POWER8
ifneq ($(C_COMPILER), GCC)
DYNAMIC_CORE += POWER9
DYNAMIC_CORE += POWER10
endif
ifeq ($(C_COMPILER), GCC)
ifeq ($(GCCVERSIONGT5), 1)
@@ -577,6 +640,15 @@ DYNAMIC_CORE += POWER9
else
$(info, OpenBLAS: Your gcc version is too old to build the POWER9 kernels.)
endif
ifeq ($(GCCVERSIONGTEQ11), 1)
DYNAMIC_CORE += POWER10
else ifeq ($(GCCVERSIONGTEQ10), 1)
ifeq ($(GCCMINORVERSIONGTEQ2), 1)
DYNAMIC_CORE += POWER10
endif
else
$(info, OpenBLAS: Your gcc version is too old to build the POWER10 kernels.)
endif
endif
endif
@@ -632,6 +704,16 @@ endif
ifeq ($(ARCH), arm64)
NO_BINARY_MODE = 1
BINARY_DEFINED = 1
ifdef INTERFACE64
ifneq ($(INTERFACE64), 0)
ifeq ($(F_COMPILER), GFORTRAN)
FCOMMON_OPT += -fdefault-integer-8
endif
ifeq ($(F_COMPILER), FLANG)
FCOMMON_OPT += -i8
endif
endif
endif
endif
@@ -677,7 +759,12 @@ CCOMMON_OPT += -march=mips64
FCOMMON_OPT += -march=mips64
endif
ifeq ($(CORE), 1004K)
ifeq ($(CORE), MIPS24K)
CCOMMON_OPT += -mips32r2 -mtune=24kc $(MSA_FLAGS)
FCOMMON_OPT += -mips32r2 -mtune=24kc $(MSA_FLAGS)
endif
ifeq ($(CORE), MIPS1004K)
CCOMMON_OPT += -mips32r2 $(MSA_FLAGS)
FCOMMON_OPT += -mips32r2 $(MSA_FLAGS)
endif
@@ -722,8 +809,19 @@ endif
ifeq ($(C_COMPILER), PGI)
ifdef BINARY64
ifeq ($(ARCH), x86_64)
CCOMMON_OPT += -tp p7-64 -D__MMX__ -Mnollvm
else
ifeq ($(ARCH), power)
ifeq ($(CORE), POWER8)
CCOMMON_OPT += -tp pwr8
endif
ifeq ($(CORE), POWER9)
CCOMMON_OPT += -tp pwr9
endif
endif
endif
else
CCOMMON_OPT += -tp p7
endif
endif
@@ -742,6 +840,15 @@ endif
ifeq ($(F_COMPILER), FLANG)
CCOMMON_OPT += -DF_INTERFACE_FLANG
FCOMMON_OPT += -Mrecursive -Kieee
ifeq ($(OSNAME), Linux)
ifeq ($(ARCH), x86_64)
FLANG_VENDOR := $(shell expr `$(FC) --version|cut -f 1 -d "."|head -1`)
ifeq ($(FLANG_VENDOR),AOCC)
FCOMMON_OPT += -fno-unroll-loops
endif
endif
endif
ifdef BINARY64
ifdef INTERFACE64
ifneq ($(INTERFACE64), 0)
@@ -837,6 +944,7 @@ ifneq ($(INTERFACE64), 0)
FCOMMON_OPT += -i8
endif
endif
FCOMMON_OPT += -recursive -fp-model strict -assume protect-parens
ifeq ($(USE_OPENMP), 1)
FCOMMON_OPT += -fopenmp
endif
@@ -876,10 +984,22 @@ ifneq ($(INTERFACE64), 0)
FCOMMON_OPT += -i8
endif
endif
ifeq ($(ARCH), x86_64)
FCOMMON_OPT += -tp p7-64
else
ifeq ($(ARCH), power)
ifeq ($(CORE), POWER8)
FCOMMON_OPT += -tp pwr8
endif
ifeq ($(CORE), POWER9)
FCOMMON_OPT += -tp pwr9
endif
endif
endif
else
FCOMMON_OPT += -tp p7
endif
FCOMMON_OPT += -Mrecursive
ifeq ($(USE_OPENMP), 1)
FCOMMON_OPT += -mp
endif
@@ -1104,6 +1224,10 @@ ifeq ($(USE_TLS), 1)
CCOMMON_OPT += -DUSE_TLS
endif
ifeq ($(BUILD_HALF), 1)
CCOMMON_OPT += -DBUILD_HALF
endif
CCOMMON_OPT += -DVERSION=\"$(VERSION)\"
ifndef SYMBOLPREFIX
@@ -1130,6 +1254,9 @@ KERNELDIR = $(TOPDIR)/kernel/$(ARCH)
include $(TOPDIR)/Makefile.$(ARCH)
ifneq ($(C_COMPILER), PGI)
CCOMMON_OPT += -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME
endif
CCOMMON_OPT += -DASMNAME=$(FU)$(*F) -DASMFNAME=$(FU)$(*F)$(BU) -DNAME=$(*F)$(BU) -DCNAME=$(*F) -DCHAR_NAME=\"$(*F)$(BU)\" -DCHAR_CNAME=\"$(*F)\"
ifeq ($(CORE), PPC440)
@@ -1222,7 +1349,6 @@ endif
override CFLAGS += $(COMMON_OPT) $(CCOMMON_OPT) -I$(TOPDIR)
override PFLAGS += $(COMMON_OPT) $(CCOMMON_OPT) -I$(TOPDIR) -DPROFILE $(COMMON_PROF)
override FFLAGS += $(COMMON_OPT) $(FCOMMON_OPT)
override FPFLAGS += $(FCOMMON_OPT) $(COMMON_PROF)
#MAKEOVERRIDES =
@@ -1328,6 +1454,8 @@ export OSNAME
export ARCH
export CORE
export LIBCORE
export __BYTE_ORDER__
export ELF_VERSION
export PGCPATH
export CONFIG
export CC
@@ -1373,7 +1501,10 @@ export KERNELDIR
export FUNCTION_PROFILE
export TARGET_CORE
export NO_AVX512
export BUILD_HALF
export SHGEMM_UNROLL_M
export SHGEMM_UNROLL_N
export SGEMM_UNROLL_M
export SGEMM_UNROLL_N
export DGEMM_UNROLL_M

View File

@@ -1,3 +1,4 @@
SHBLASOBJS_P = $(SHBLASOBJS:.$(SUFFIX)=.$(PSUFFIX))
SBLASOBJS_P = $(SBLASOBJS:.$(SUFFIX)=.$(PSUFFIX))
DBLASOBJS_P = $(DBLASOBJS:.$(SUFFIX)=.$(PSUFFIX))
QBLASOBJS_P = $(QBLASOBJS:.$(SUFFIX)=.$(PSUFFIX))
@@ -9,8 +10,8 @@ COMMONOBJS_P = $(COMMONOBJS:.$(SUFFIX)=.$(PSUFFIX))
HPLOBJS_P = $(HPLOBJS:.$(SUFFIX)=.$(PSUFFIX))
BLASOBJS = $(SBLASOBJS) $(DBLASOBJS) $(CBLASOBJS) $(ZBLASOBJS)
BLASOBJS_P = $(SBLASOBJS_P) $(DBLASOBJS_P) $(CBLASOBJS_P) $(ZBLASOBJS_P)
BLASOBJS = $(SHBLASOBJS) $(SBLASOBJS) $(DBLASOBJS) $(CBLASOBJS) $(ZBLASOBJS)
BLASOBJS_P = $(SHBLASOBJS_P) $(SBLASOBJS_P) $(DBLASOBJS_P) $(CBLASOBJS_P) $(ZBLASOBJS_P)
ifdef EXPRECISION
BLASOBJS += $(QBLASOBJS) $(XBLASOBJS)
@@ -22,6 +23,7 @@ BLASOBJS += $(QBLASOBJS) $(XBLASOBJS)
BLASOBJS_P += $(QBLASOBJS_P) $(XBLASOBJS_P)
endif
$(SHBLASOBJS) $(SHBLASOBJS_P) : override CFLAGS += -DHALF -UDOUBLE -UCOMPLEX
$(SBLASOBJS) $(SBLASOBJS_P) : override CFLAGS += -UDOUBLE -UCOMPLEX
$(DBLASOBJS) $(DBLASOBJS_P) : override CFLAGS += -DDOUBLE -UCOMPLEX
$(QBLASOBJS) $(QBLASOBJS_P) : override CFLAGS += -DXDOUBLE -UCOMPLEX
@@ -29,6 +31,7 @@ $(CBLASOBJS) $(CBLASOBJS_P) : override CFLAGS += -UDOUBLE -DCOMPLEX
$(ZBLASOBJS) $(ZBLASOBJS_P) : override CFLAGS += -DDOUBLE -DCOMPLEX
$(XBLASOBJS) $(XBLASOBJS_P) : override CFLAGS += -DXDOUBLE -DCOMPLEX
$(SHBLASOBJS_P) : override CFLAGS += -DPROFILE $(COMMON_PROF)
$(SBLASOBJS_P) : override CFLAGS += -DPROFILE $(COMMON_PROF)
$(DBLASOBJS_P) : override CFLAGS += -DPROFILE $(COMMON_PROF)
$(QBLASOBJS_P) : override CFLAGS += -DPROFILE $(COMMON_PROF)

View File

@@ -15,10 +15,38 @@ CCOMMON_OPT += -march=skylake-avx512
FCOMMON_OPT += -march=skylake-avx512
ifeq ($(OSNAME), CYGWIN_NT)
CCOMMON_OPT += -fno-asynchronous-unwind-tables
FCOMMON_OPT += -fno-asynchronous-unwind-tables
endif
ifeq ($(OSNAME), WINNT)
ifeq ($(C_COMPILER), GCC)
CCOMMON_OPT += -fno-asynchronous-unwind-tables
FCOMMON_OPT += -fno-asynchronous-unwind-tables
endif
endif
endif
endif
endif
ifeq ($(CORE), COOPERLAKE)
ifndef DYNAMIC_ARCH
ifndef NO_AVX512
ifeq ($(C_COMPILER), GCC)
# cooperlake support was added in 10.1
GCCVERSIONGTEQ10 := $(shell expr `$(CC) -dumpversion | cut -f1 -d.` \>= 10)
GCCMINORVERSIONGTEQ1 := $(shell expr `$(CC) -dumpversion | cut -f2 -d.` \>= 1)
ifeq ($(GCCVERSIONGTEQ10)$(GCCMINORVERSIONGTEQ1), 11)
CCOMMON_OPT += -march=cooperlake
FCOMMON_OPT += -march=cooperlake
endif
endif
ifeq ($(OSNAME), CYGWIN_NT)
CCOMMON_OPT += -fno-asynchronous-unwind-tables
FCOMMON_OPT += -fno-asynchronous-unwind-tables
endif
ifeq ($(OSNAME), WINNT)
ifeq ($(C_COMPILER), GCC)
CCOMMON_OPT += -fno-asynchronous-unwind-tables
FCOMMON_OPT += -fno-asynchronous-unwind-tables
endif
endif
endif
@@ -29,14 +57,24 @@ ifeq ($(CORE), HASWELL)
ifndef DYNAMIC_ARCH
ifndef NO_AVX2
ifeq ($(C_COMPILER), GCC)
# AVX2 support was added in 4.7.0
GCCVERSIONGTEQ4 := $(shell expr `$(CC) -dumpversion | cut -f1 -d.` \>= 4)
GCCMINORVERSIONGTEQ7 := $(shell expr `$(CC) -dumpversion | cut -f2 -d.` \>= 7)
ifeq ($(GCCVERSIONGTEQ4)$(GCCMINORVERSIONGTEQ7), 11)
CCOMMON_OPT += -mavx2
endif
endif
ifeq ($(F_COMPILER), GFORTRAN)
# AVX2 support was added in 4.7.0
GCCVERSIONGTEQ4 := $(shell expr `$(FC) -dumpversion | cut -f1 -d.` \>= 4)
GCCMINORVERSIONGTEQ7 := $(shell expr `$(FC) -dumpversion | cut -f2 -d.` \>= 7)
ifeq ($(GCCVERSIONGTEQ4)$(GCCMINORVERSIONGTEQ7), 11)
FCOMMON_OPT += -mavx2
endif
endif
endif
endif
endif

View File

@@ -5,6 +5,6 @@ FCOMMON_OPT += -march=z13 -mzvector
endif
ifeq ($(CORE), Z14)
CCOMMON_OPT += -march=z14 -mzvector
CCOMMON_OPT += -march=z14 -mzvector -O3
FCOMMON_OPT += -march=z14 -mzvector
endif

View File

@@ -6,8 +6,11 @@ Travis CI: [![Build Status](https://travis-ci.org/xianyi/OpenBLAS.svg?branch=dev
AppVeyor: [![Build status](https://ci.appveyor.com/api/projects/status/09sohd35n8nkkx64/branch/develop?svg=true)](https://ci.appveyor.com/project/xianyi/openblas/branch/develop)
Drone CI: [![Build Status](https://cloud.drone.io/api/badges/xianyi/OpenBLAS/status.svg?branch=develop)](https://cloud.drone.io/xianyi/OpenBLAS/)
[![Build Status](https://dev.azure.com/xianyi/OpenBLAS/_apis/build/status/xianyi.OpenBLAS?branchName=develop)](https://dev.azure.com/xianyi/OpenBLAS/_build/latest?definitionId=1&branchName=develop)
## Introduction
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
@@ -25,7 +28,8 @@ You can download them from [file hosting on sourceforge.net](https://sourceforge
## Installation from Source
Download from project homepage, https://xianyi.github.com/OpenBLAS/, or check out the code
using Git from https://github.com/xianyi/OpenBLAS.git.
using Git from https://github.com/xianyi/OpenBLAS.git. (If you want the most up to date version, be
sure to use the develop branch - master is several years out of date due to a change of maintainership.)
Buildtime parameters can be chosen in Makefile.rule, see there for a short description of each option.
Most can also be given directly on the make or cmake command line.
@@ -55,6 +59,10 @@ Examples:
```sh
make BINARY=64 CC=mips64el-unknown-linux-gnu-gcc FC=mips64el-unknown-linux-gnu-gfortran HOSTCC=gcc TARGET=LOONGSON3A
```
or same with the newer mips-crosscompiler put out by Loongson that defaults to the 32bit ABI:
```sh
make HOSTCC=gcc CC='/opt/mips-loongson-gcc7.3-linux-gnu/2019.06-29/bin/mips-linux-gnu-gcc -mabi=64' FC='/opt/mips-loongson-gcc7.3-linux-gnu/2019.06-29/bin/mips-linux-gnu-gfortran -mabi=64' TARGET=LOONGSON3A
```
* On an x86 box, compile this library for a loongson3a CPU with loongcc (based on Open64) compiler:
```sh
@@ -119,6 +127,11 @@ Please read `GotoBLAS_01Readme.txt` for older CPU models already supported by th
- **AMD STEAMROLLER**: Uses Bulldozer codes with some optimizations.
- **AMD ZEN**: Uses Haswell codes with some optimizations.
#### MIPS32
- **MIPS 1004K**: uses P5600 codes
- **MIPS 24K**: uses P5600 codes
#### MIPS64
- **ICT Loongson 3A**: Optimized Level-3 BLAS and the part of Level-1,2.
@@ -140,6 +153,7 @@ Please read `GotoBLAS_01Readme.txt` for older CPU models already supported by th
- **ThunderX**: Optimized some Level-1 functions
- **ThunderX2T99**: Optimized Level-3 BLAS and parts of Levels 1 and 2
- **TSV110**: Optimized some Level-3 helper functions
- **EMAG 8180**: preliminary support based on A57
#### PPC/PPC64
@@ -154,11 +168,16 @@ Please read `GotoBLAS_01Readme.txt` for older CPU models already supported by th
### Support for multiple targets in a single library
OpenBLAS can be built for multiple targets with runtime detection of the target cpu by specifiying DYNAMIC_ARCH=1 in Makefile.rule, on the gmake command line or as -DDYNAMIC_ARCH=TRUE in cmake.
For **x86_64**, the list of targets this activates contains Prescott, Core2, Nehalem, Barcelona, Sandybridge, Bulldozer, Piledriver, Steamroller, Excavator, Haswell, Zen, SkylakeX. For cpu generations not included in this list, the corresponding older model is used. If you also specify DYNAMIC_OLDER=1, specific support for Penryn, Dunnington, Opteron, Opteron/SSE3, Bobcat, Atom and Nano is added. Finally there is an option DYNAMIC_LIST that allows to specify an individual list of targets to include instead of the default.
DYNAMIC_ARCH is also supported on **x86**, where it translates to Katmai, Coppermine, Northwood, Prescott, Banias,
Core2, Penryn, Dunnington, Nehalem, Athlon, Opteron, Opteron_SSE3, Barcelona, Bobcat, Atom and Nano.
On **ARMV8**, it enables support for CortexA53, CortexA57, CortexA72, CortexA73, Falkor, ThunderX, ThunderX2T99, TSV110 as well as generic ARMV8 cpus.
For **POWER**, the list encompasses POWER6, POWER8 and POWER9, on **ZARCH** it comprises Z13 and Z14.
The TARGET option can be used in conjunction with DYNAMIC_ARCH=1 to specify which cpu model should be assumed for all the
common code in the library, usually you will want to set this to the oldest model you expect to encounter.
Please note that it is not possible to combine support for different architectures, so no combined 32 and 64 bit or x86_64 and arm64 in the same library.

View File

@@ -22,6 +22,7 @@ SANDYBRIDGE
HASWELL
SKYLAKEX
ATOM
COOPERLAKE
b)AMD CPU:
ATHLON
@@ -49,6 +50,7 @@ POWER6
POWER7
POWER8
POWER9
POWER10
PPCG4
PPC970
PPC970MP
@@ -58,7 +60,8 @@ CELL
3.MIPS CPU:
P5600
1004K
MIPS1004K
MIPS24K
4.MIPS64 CPU:
SICORTEX
@@ -88,10 +91,13 @@ CORTEXA53
CORTEXA57
CORTEXA72
CORTEXA73
NEOVERSEN1
EMAG8180
FALKOR
THUNDERX
THUNDERX2T99
TSV110
THUNDERX3T110
9.System Z:
ZARCH_GENERIC

View File

@@ -49,3 +49,23 @@ jobs:
# we need a privileged docker run for sde process attachment
docker run --privileged intel_sde
displayName: 'Run AVX512 SkylakeX docker build / test'
- job: Windows_cl
pool:
vmImage: 'windows-latest'
steps:
- task: CMake@1
inputs:
workingDirectory: 'build' # Optional
cmakeArgs: '-G "Visual Studio 16 2019" ..'
- task: CMake@1
inputs:
cmakeArgs: '--build . --config Release'
workingDirectory: 'build'
- script: |
cd build
cd utest
dir
openblas_utest.exe

File diff suppressed because it is too large Load Diff

191
benchmark/amax.c Normal file
View File

@@ -0,0 +1,191 @@
/***************************************************************************
Copyright (c) 2016, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef AMAX
#ifdef COMPLEX
#ifdef DOUBLE
#define AMAX BLASFUNC(dzamax)
#else
#define AMAX BLASFUNC(scamax)
#endif
#else
#ifdef DOUBLE
#define AMAX BLASFUNC(damax)
#else
#define AMAX BLASFUNC(samax)
#endif
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *x;
blasint m, i;
blasint inc_x=1;
int loops = 1;
int l;
char *p;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1,timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
fprintf(stderr, "From : %3d To : %3d Step = %3d Inc_x = %d Loops = %d\n", from, to, step,inc_x,loops);
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step)
{
timeg=0;
fprintf(stderr, " %6d : ", (int)m);
for (l=0; l<loops; l++)
{
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
AMAX (&m, x, &inc_x);
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
timeg /= loops;
fprintf(stderr,
" %10.2f MFlops %10.6f sec\n",
COMPSIZE * sizeof(FLOAT) * 1. * (double)m / timeg * 1.e-6, timeg);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

192
benchmark/amin.c Normal file
View File

@@ -0,0 +1,192 @@
/***************************************************************************
Copyright (c) 2016, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef AMIN
#ifdef COMPLEX
#ifdef DOUBLE
#define AMIN BLASFUNC(dzamin)
#else
#define AMIN BLASFUNC(scamin)
#endif
#else
#ifdef DOUBLE
#define AMIN BLASFUNC(damin)
#else
#define AMIN BLASFUNC(samin)
#endif
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *x;
blasint m, i;
blasint inc_x=1;
int loops = 1;
int l;
char *p;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1,timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
fprintf(stderr, "From : %3d To : %3d Step = %3d Inc_x = %d Loops = %d\n", from, to, step,inc_x,loops);
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step)
{
timeg=0;
fprintf(stderr, " %6d : ", (int)m);
for (l=0; l<loops; l++)
{
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
AMIN (&m, x, &inc_x);
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
timeg /= loops;
fprintf(stderr,
" %10.2f MFlops %10.6f sec\n",
COMPSIZE * sizeof(FLOAT) * 1. * (double)m / timeg * 1.e-6, timeg);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

202
benchmark/axpby.c Normal file
View File

@@ -0,0 +1,202 @@
/***************************************************************************
Copyright (c) 2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef AXPBY
#ifdef COMPLEX
#ifdef DOUBLE
#define AXPBY BLASFUNC(zaxpby)
#else
#define AXPBY BLASFUNC(caxpby)
#endif
#else
#ifdef DOUBLE
#define AXPBY BLASFUNC(daxpby)
#else
#define AXPBY BLASFUNC(saxpby)
#endif
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *x, *y;
FLOAT alpha[2] = { 2.0, 2.0 };
FLOAT beta[2] = {2.0, 2.0};
blasint m, i;
blasint inc_x=1,inc_y=1;
int loops = 1;
int l;
char *p;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1,timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
if ((p = getenv("OPENBLAS_INCY"))) inc_y = atoi(p);
fprintf(stderr, "From : %3d To : %3d Step = %3d Inc_x = %d Inc_y = %d Loops = %d\n", from, to, step,inc_x,inc_y,loops);
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( y = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_y) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step)
{
timeg=0;
fprintf(stderr, " %6d : ", (int)m);
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for(i = 0; i < m * COMPSIZE * abs(inc_y); i++){
y[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for (l=0; l<loops; l++)
{
gettimeofday( &start, (struct timezone *)0);
AXPBY (&m, alpha, x, &inc_x, beta, y, &inc_y );
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
timeg /= loops;
fprintf(stderr,
" %10.2f MFlops %10.6f sec\n",
(COMPSIZE * COMPSIZE * 4. - COMPSIZE) * (double)m / timeg * 1.e-6, timeg);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -128,7 +128,7 @@ int main(int argc, char *argv[]){
int to = 200;
int step = 1;
struct timeval start, stop;
struct timespec start, stop;
double time1,timeg;
argc--;argv++;
@@ -175,13 +175,13 @@ int main(int argc, char *argv[]){
for(i = 0; i < m * COMPSIZE * abs(inc_y); i++){
y[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
clock_gettime( CLOCK_REALTIME, &start);
AXPY (&m, alpha, x, &inc_x, y, &inc_y );
gettimeofday( &stop, (struct timezone *)0);
clock_gettime( CLOCK_REALTIME, &stop);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_nsec - start.tv_nsec)) * 1.e-9;
timeg += time1;
@@ -190,7 +190,7 @@ int main(int argc, char *argv[]){
timeg /= loops;
fprintf(stderr,
" %10.2f MFlops %10.6f sec\n",
" %10.2f MFlops %10.9f sec\n",
COMPSIZE * COMPSIZE * 2. * (double)m / timeg * 1.e-6, timeg);
}

View File

@@ -173,46 +173,46 @@ int main(int argc, char *argv[]){
#ifndef COMPLEX
if (uplos & 1) {
for (j = 0; j < m; j++) {
for(i = 0; i < j; i++) a[i + j * m] = 0.;
a[j + j * m] = ((double) rand() / (double) RAND_MAX) + 8.;
for(i = j + 1; i < m; i++) a[i + j * m] = ((double) rand() / (double) RAND_MAX) - 0.5;
for(i = 0; i < j; i++) a[(long)i + (long)j * (long)m] = 0.;
a[(long)j + (long)j * (long)m] = ((double) rand() / (double) RAND_MAX) + 8.;
for(i = j + 1; i < m; i++) a[(long)i + (long)j * (long)m] = ((double) rand() / (double) RAND_MAX) - 0.5;
}
} else {
for (j = 0; j < m; j++) {
for(i = 0; i < j; i++) a[i + j * m] = ((double) rand() / (double) RAND_MAX) - 0.5;
a[j + j * m] = ((double) rand() / (double) RAND_MAX) + 8.;
for(i = j + 1; i < m; i++) a[i + j * m] = 0.;
for(i = 0; i < j; i++) a[(long)i + (long)j * (long)m] = ((double) rand() / (double) RAND_MAX) - 0.5;
a[(long)j + (long)j * (long)m] = ((double) rand() / (double) RAND_MAX) + 8.;
for(i = j + 1; i < m; i++) a[(long)i + (long)j * (long)m] = 0.;
}
}
#else
if (uplos & 1) {
for (j = 0; j < m; j++) {
for(i = 0; i < j; i++) {
a[(i + j * m) * 2 + 0] = 0.;
a[(i + j * m) * 2 + 1] = 0.;
a[((long)i + (long)j * (long)m) * 2 + 0] = 0.;
a[((long)i + (long)j * (long)m) * 2 + 1] = 0.;
}
a[(j + j * m) * 2 + 0] = ((double) rand() / (double) RAND_MAX) + 8.;
a[(j + j * m) * 2 + 1] = 0.;
a[((long)j + (long)j * (long)m) * 2 + 0] = ((double) rand() / (double) RAND_MAX) + 8.;
a[((long)j + (long)j * (long)m) * 2 + 1] = 0.;
for(i = j + 1; i < m; i++) {
a[(i + j * m) * 2 + 0] = ((double) rand() / (double) RAND_MAX) - 0.5;
a[(i + j * m) * 2 + 1] = ((double) rand() / (double) RAND_MAX) - 0.5;
a[((long)i + (long)j * (long)m) * 2 + 0] = ((double) rand() / (double) RAND_MAX) - 0.5;
a[((long)i + (long)j * (long)m) * 2 + 1] = ((double) rand() / (double) RAND_MAX) - 0.5;
}
}
} else {
for (j = 0; j < m; j++) {
for(i = 0; i < j; i++) {
a[(i + j * m) * 2 + 0] = ((double) rand() / (double) RAND_MAX) - 0.5;
a[(i + j * m) * 2 + 1] = ((double) rand() / (double) RAND_MAX) - 0.5;
a[((long)i + (long)j * (long)m) * 2 + 0] = ((double) rand() / (double) RAND_MAX) - 0.5;
a[((long)i + (long)j * (long)m) * 2 + 1] = ((double) rand() / (double) RAND_MAX) - 0.5;
}
a[(j + j * m) * 2 + 0] = ((double) rand() / (double) RAND_MAX) + 8.;
a[(j + j * m) * 2 + 1] = 0.;
a[((long)j + (long)j * (long)m) * 2 + 0] = ((double) rand() / (double) RAND_MAX) + 8.;
a[((long)j + (long)j * (long)m) * 2 + 1] = 0.;
for(i = j + 1; i < m; i++) {
a[(i + j * m) * 2 + 0] = 0.;
a[(i + j * m) * 2 + 1] = 0.;
a[((long)i + (long)j * (long)m) * 2 + 0] = 0.;
a[((long)i + (long)j * (long)m) * 2 + 1] = 0.;
}
}
}
@@ -239,10 +239,13 @@ int main(int argc, char *argv[]){
for (j = 0; j < m; j++) {
for(i = 0; i <= j; i++) {
#ifndef COMPLEX
if (maxerr < fabs(a[i + j * m] - b[i + j * m])) maxerr = fabs(a[i + j * m] - b[i + j * m]);
if (maxerr < fabs(a[(long)i + (long)j * (long)m] - b[(long)i + (long)j * (long)m]))
maxerr = fabs(a[(long)i + (long)j * (long)m] - b[(long)i + (long)j * (long)m]);
#else
if (maxerr < fabs(a[(i + j * m) * 2 + 0] - b[(i + j * m) * 2 + 0])) maxerr = fabs(a[(i + j * m) * 2 + 0] - b[(i + j * m) * 2 + 0]);
if (maxerr < fabs(a[(i + j * m) * 2 + 1] - b[(i + j * m) * 2 + 1])) maxerr = fabs(a[(i + j * m) * 2 + 1] - b[(i + j * m) * 2 + 1]);
if (maxerr < fabs(a[((long)i + (long)j * (long)m) * 2 + 0] - b[((long)i + (long)j * (long)m) * 2 + 0]))
maxerr = fabs(a[((long)i + (long)j * (long)m) * 2 + 0] - b[((long)i + (long)j * (long)m) * 2 + 0]);
if (maxerr < fabs(a[((long)i + (long)j * (long)m) * 2 + 1] - b[((long)i + (long)j * (long)m) * 2 + 1]))
maxerr = fabs(a[((long)i + (long)j * (long)m) * 2 + 1] - b[((long)i + (long)j * (long)m) * 2 + 1]);
#endif
}
}
@@ -250,10 +253,13 @@ int main(int argc, char *argv[]){
for (j = 0; j < m; j++) {
for(i = j; i < m; i++) {
#ifndef COMPLEX
if (maxerr < fabs(a[i + j * m] - b[i + j * m])) maxerr = fabs(a[i + j * m] - b[i + j * m]);
if (maxerr < fabs(a[(long)i + (long)j * (long)m] - b[(long)i + (long)j * (long)m]))
maxerr = fabs(a[(long)i + (long)j * (long)m] - b[(long)i + (long)j * (long)m]);
#else
if (maxerr < fabs(a[(i + j * m) * 2 + 0] - b[(i + j * m) * 2 + 0])) maxerr = fabs(a[(i + j * m) * 2 + 0] - b[(i + j * m) * 2 + 0]);
if (maxerr < fabs(a[(i + j * m) * 2 + 1] - b[(i + j * m) * 2 + 1])) maxerr = fabs(a[(i + j * m) * 2 + 1] - b[(i + j * m) * 2 + 1]);
if (maxerr < fabs(a[((long)i + (long)j * (long)m) * 2 + 0] - b[((long)i + (long)j * (long)m) * 2 + 0]))
maxerr = fabs(a[((long)i + (long)j * (long)m) * 2 + 0] - b[((long)i + (long)j * (long)m) * 2 + 0]);
if (maxerr < fabs(a[((long)i + (long)j * (long)m) * 2 + 1] - b[((long)i + (long)j * (long)m) * 2 + 1]))
maxerr = fabs(a[((long)i + (long)j * (long)m) * 2 + 1] - b[((long)i + (long)j * (long)m) * 2 + 1]);
#endif
}
}

View File

@@ -129,7 +129,10 @@ int main(int argc, char *argv[]){
int step = 1;
struct timeval start, stop;
double time1,timeg;
double time1 = 0.0, timeg = 0.0;
long nanos = 0;
time_t seconds = 0;
struct timespec time_start = { 0, 0 }, time_end = { 0, 0 };
argc--;argv++;
@@ -163,35 +166,32 @@ int main(int argc, char *argv[]){
timeg=0;
fprintf(stderr, " %6d : ", (int)m);
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for(i = 0; i < m * COMPSIZE * abs(inc_y); i++){
y[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for (l=0; l<loops; l++)
{
clock_gettime(CLOCK_REALTIME, &time_start);
COPY (&m, x, &inc_x, y, &inc_y );
clock_gettime(CLOCK_REALTIME, &time_end);
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
nanos = time_end.tv_nsec - time_start.tv_nsec;
seconds = time_end.tv_sec - time_start.tv_sec;
for(i = 0; i < m * COMPSIZE * abs(inc_y); i++){
y[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
time1 = seconds + nanos / 1.e9;
timeg += time1;
}
COPY (&m, x, &inc_x, y, &inc_y );
timeg /= loops;
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
timeg /= loops;
fprintf(stderr,
" %10.2f MBytes %10.6f sec\n",
COMPSIZE * sizeof(FLOAT) * 1. * (double)m / timeg * 1.e-6, timeg);
fprintf(stderr,
" %10.2f MBytes %12.9f sec\n",
COMPSIZE * sizeof(FLOAT) * 1. * (double)m / timeg / 1.e6, timeg);
}

View File

@@ -195,7 +195,7 @@ int main(int argc, char *argv[]){
for(j = 0; j < to; j++){
for(i = 0; i < to * COMPSIZE; i++){
a[i + j * to * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
a[(long)i + (long)j * (long)to * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}

View File

@@ -39,6 +39,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#ifdef DOUBLE
#define GEMM BLASFUNC(dgemm)
#elif defined(HALF)
#define GEMM BLASFUNC(shgemm)
#else
#define GEMM BLASFUNC(sgemm)
#endif
@@ -120,7 +122,8 @@ static void *huge_malloc(BLASLONG size){
int main(int argc, char *argv[]){
FLOAT *a, *b, *c;
IFLOAT *a, *b;
FLOAT *c;
FLOAT alpha[] = {1.0, 0.0};
FLOAT beta [] = {0.0, 0.0};
char transa = 'N';
@@ -184,10 +187,10 @@ int main(int argc, char *argv[]){
k = to;
}
if (( a = (FLOAT *)malloc(sizeof(FLOAT) * m * k * COMPSIZE)) == NULL) {
if (( a = (IFLOAT *)malloc(sizeof(IFLOAT) * m * k * COMPSIZE)) == NULL) {
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( b = (FLOAT *)malloc(sizeof(FLOAT) * k * n * COMPSIZE)) == NULL) {
if (( b = (IFLOAT *)malloc(sizeof(IFLOAT) * k * n * COMPSIZE)) == NULL) {
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( c = (FLOAT *)malloc(sizeof(FLOAT) * m * n * COMPSIZE)) == NULL) {
@@ -199,10 +202,10 @@ int main(int argc, char *argv[]){
#endif
for (i = 0; i < m * k * COMPSIZE; i++) {
a[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
a[i] = ((IFLOAT) rand() / (IFLOAT) RAND_MAX) - 0.5;
}
for (i = 0; i < k * n * COMPSIZE; i++) {
b[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
b[i] = ((IFLOAT) rand() / (IFLOAT) RAND_MAX) - 0.5;
}
for (i = 0; i < m * n * COMPSIZE; i++) {
c[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;

View File

@@ -181,9 +181,9 @@ int main(int argc, char *argv[]){
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
b[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
c[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
a[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
b[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
c[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}

View File

@@ -197,7 +197,7 @@ int main(int argc, char *argv[]){
fprintf(stderr, " %6dx%d : ", (int)m,(int)n);
for(j = 0; j < m; j++){
for(i = 0; i < n * COMPSIZE; i++){
a[j + i * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
a[(long)j + (long)i * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
@@ -234,7 +234,7 @@ int main(int argc, char *argv[]){
fprintf(stderr, " %6dx%d : ", (int)m,(int)n);
for(j = 0; j < m; j++){
for(i = 0; i < n * COMPSIZE; i++){
a[j + i * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
a[(long)j + (long)i * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}

View File

@@ -182,7 +182,7 @@ int main(int argc, char *argv[]){
for(j = 0; j < m; j++){
for(i = 0; i < n * COMPSIZE; i++){
a[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
a[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}

View File

@@ -177,20 +177,20 @@ int main(int argc, char *argv[]){
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
a[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
b[i + j * m * COMPSIZE] = 0.0;
b[(long)i + (long)j * (long)m * COMPSIZE] = 0.0;
}
}
for (j = 0; j < m; ++j) {
for (i = 0; i < m * COMPSIZE; ++i) {
b[i] += a[i + j * m * COMPSIZE];
b[i] += a[(long)i + (long)j * (long)m * COMPSIZE];
}
}

View File

@@ -172,7 +172,7 @@ int main(int argc, char *argv[]){
for(j = 0; j < to; j++){
for(i = 0; i < to * COMPSIZE; i++){
a[i + j * to * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
a[(long)i + (long)j * (long)to * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}

210
benchmark/hbmv.c Normal file
View File

@@ -0,0 +1,210 @@
/***************************************************************************
Copyright (c) 2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef HBMV
#ifdef DOUBLE
#define HBMV BLASFUNC(zhbmv)
#else
#define HBMV BLASFUNC(chbmv)
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz) {
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size) {
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *a, *x, *y;
FLOAT alpha[] = {1.0, 1.0};
FLOAT beta [] = {0.0, 0.0};
blasint k = 1;
char uplo='L';
blasint m, i, j;
blasint inc_x=1, inc_y=1;
int loops = 1;
int l;
char *p;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1,timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
if ((p = getenv("OPENBLAS_INCY"))) inc_y = atoi(p);
if ((p = getenv("OPENBLAS_UPLO"))) uplo=*p;
if ((p = getenv("OPENBLAS_K"))) k = atoi(p);
fprintf(stderr, "From : %3d To : %3d Step = %3d Uplo = '%c' k = %d Inc_x = %d Inc_y = %d Loops = %d\n",
from, to, step, uplo, k, inc_x, inc_y, loops);
if (( a = (FLOAT *)malloc(sizeof(FLOAT) * to * to * COMPSIZE)) == NULL) {
fprintf(stderr,"Out of Memory!!\n");
exit(1);
}
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL) {
fprintf(stderr,"Out of Memory!!\n");
exit(1);
}
if (( y = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_y) * COMPSIZE)) == NULL) {
fprintf(stderr,"Out of Memory!!\n");
exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step) {
timeg=0;
fprintf(stderr, " %6dx%d : ", (int)m, (int)m);
for(j = 0; j < m; j++) {
for(i = 0; i < m * COMPSIZE; i++) {
a[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
for (l = 0; l < loops; l++) {
for (i = 0; i < m * COMPSIZE * abs(inc_x); i++) {
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for (i = 0; i < m * COMPSIZE * abs(inc_y); i++) {
y[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
HBMV (&uplo, &m, &k, alpha, a, &m, x, &inc_x, beta, y, &inc_y );
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
timeg /= loops;
fprintf(stderr, " %10.2f MFlops\n",
COMPSIZE * COMPSIZE * 2. * (double)(2 * k + 1) * (double)m / timeg * 1.e-6);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -164,9 +164,9 @@ int main(int argc, char *argv[]){
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
b[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
c[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
a[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
b[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
c[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
@@ -178,8 +178,6 @@ int main(int argc, char *argv[]){
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
gettimeofday( &start, (struct timezone *)0);
fprintf(stderr,
" %10.2f MFlops\n",
COMPSIZE * COMPSIZE * 2. * (double)m * (double)m * (double)m / time1 * 1.e-6);

View File

@@ -167,7 +167,7 @@ int main(int argc, char *argv[]){
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
a[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}

186
benchmark/her.c Normal file
View File

@@ -0,0 +1,186 @@
/***************************************************************************
Copyright (c) 2020, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef HER
#ifdef DOUBLE
#define HER BLASFUNC(zher)
#else
#define HER BLASFUNC(cher)
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *a, *x;
FLOAT alpha[] = {1.0, 1.0};
blasint incx = 1;
char *p;
char uplo='U';
char trans='N';
if ((p = getenv("OPENBLAS_UPLO"))) uplo=*p;
if ((p = getenv("OPENBLAS_TRANS"))) trans=*p;
blasint m, i, j;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
fprintf(stderr, "From : %3d To : %3d Step = %3d Uplo = %c Trans = %c\n", from, to, step,uplo,trans);
if (( a = (FLOAT *)malloc(sizeof(FLOAT) * to * to * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step)
{
fprintf(stderr, " %6d : ", (int)m);
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
x[ (long)j * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
HER (&uplo, &m, alpha, x, &incx, a, &m );
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
gettimeofday( &start, (struct timezone *)0);
fprintf(stderr,
" %10.2f MFlops\n",
COMPSIZE * COMPSIZE * 1. * (double)m * (double)m / time1 * 1.e-6);
}
return 0;
}

190
benchmark/her2.c Normal file
View File

@@ -0,0 +1,190 @@
/***************************************************************************
Copyright (c) 2020, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef HER2
#ifdef DOUBLE
#define HER2 BLASFUNC(zher2)
#else
#define HER2 BLASFUNC(cher2)
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *a, *x, *y;
FLOAT alpha[] = {1.0, 1.0};
blasint inc = 1;
char *p;
char uplo='U';
char trans='N';
if ((p = getenv("OPENBLAS_UPLO"))) uplo=*p;
if ((p = getenv("OPENBLAS_TRANS"))) trans=*p;
blasint m, i, j;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
fprintf(stderr, "From : %3d To : %3d Step = %3d Uplo = %c Trans = %c\n", from, to, step,uplo,trans);
if (( a = (FLOAT *)malloc(sizeof(FLOAT) * to * to * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( y = (FLOAT *)malloc(sizeof(FLOAT) * to * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step)
{
fprintf(stderr, " %6d : ", (int)m);
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
x[ (long)j * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
y[ (long)j * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
HER2 (&uplo, &m, alpha, x, &inc, y, &inc, a, &m );
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
gettimeofday( &start, (struct timezone *)0);
fprintf(stderr,
" %10.2f MFlops\n",
COMPSIZE * COMPSIZE * 2. * (double)m * (double)m / time1 * 1.e-6);
}
return 0;
}

View File

@@ -163,9 +163,9 @@ int main(int argc, char *argv[]){
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
b[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
c[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
a[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
b[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
c[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
@@ -177,8 +177,6 @@ int main(int argc, char *argv[]){
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
gettimeofday( &start, (struct timezone *)0);
fprintf(stderr,
" %10.2f MFlops\n",
COMPSIZE * COMPSIZE * 2. * (double)m * (double)m * (double)m / time1 * 1.e-6);

View File

@@ -162,8 +162,8 @@ int main(int argc, char *argv[]){
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
c[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
a[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
c[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
@@ -175,8 +175,6 @@ int main(int argc, char *argv[]){
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
gettimeofday( &start, (struct timezone *)0);
fprintf(stderr,
" %10.2f MFlops\n",
COMPSIZE * COMPSIZE * 1. * (double)m * (double)m * (double)m / time1 * 1.e-6);

207
benchmark/hpmv.c Normal file
View File

@@ -0,0 +1,207 @@
/***************************************************************************
Copyright (c) 2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef HPMV
#ifdef DOUBLE
#define HPMV BLASFUNC(zhpmv)
#else
#define HPMV BLASFUNC(chpmv)
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz) {
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size) {
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *a, *x, *y;
FLOAT alpha[] = {1.0, 1.0};
FLOAT beta [] = {1.0, 1.0};
char uplo='L';
blasint m, i, j;
blasint inc_x=1, inc_y=1;
int loops = 1;
int l;
char *p;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1,timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
if ((p = getenv("OPENBLAS_INCY"))) inc_y = atoi(p);
if ((p = getenv("OPENBLAS_UPLO"))) uplo=*p;
fprintf(stderr, "From : %3d To : %3d Step = %3d Uplo = '%c' Inc_x = %d Inc_y = %d Loops = %d\n", from, to, step,uplo,inc_x,inc_y,loops);
if (( a = (FLOAT *)malloc(sizeof(FLOAT) * to * to * COMPSIZE)) == NULL) {
fprintf(stderr,"Out of Memory!!\n");
exit(1);
}
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL) {
fprintf(stderr,"Out of Memory!!\n");
exit(1);
}
if (( y = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_y) * COMPSIZE)) == NULL) {
fprintf(stderr,"Out of Memory!!\n");
exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step) {
timeg=0;
fprintf(stderr, " %6dx%d : ", (int)m, (int)m);
for(j = 0; j < m; j++) {
for(i = 0; i < m * COMPSIZE; i++) {
a[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
for (l = 0; l < loops; l++) {
for (i = 0; i < m * COMPSIZE * abs(inc_x); i++) {
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for (i = 0; i < m * COMPSIZE * abs(inc_y); i++) {
y[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
HPMV (&uplo, &m, alpha, a, x, &inc_x, beta, y, &inc_y );
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
timeg /= loops;
fprintf(stderr, " %10.2f MFlops\n",
COMPSIZE * COMPSIZE * 2. * (double)m * (double)m / timeg * 1.e-6);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -181,7 +181,7 @@ int main(int argc, char *argv[]){
timeg /= loops;
fprintf(stderr,
" %10.2f MFlops %10.6f sec\n",
" %10.2f MBytes %10.6f sec\n",
COMPSIZE * sizeof(FLOAT) * 1. * (double)m / timeg * 1.e-6, timeg);
}

192
benchmark/iamin.c Normal file
View File

@@ -0,0 +1,192 @@
/***************************************************************************
Copyright (c) 2016, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef IAMIN
#ifdef COMPLEX
#ifdef DOUBLE
#define IAMIN BLASFUNC(izamin)
#else
#define IAMIN BLASFUNC(icamin)
#endif
#else
#ifdef DOUBLE
#define IAMIN BLASFUNC(idamin)
#else
#define IAMIN BLASFUNC(isamin)
#endif
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *x;
blasint m, i;
blasint inc_x=1;
int loops = 1;
int l;
char *p;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1,timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
fprintf(stderr, "From : %3d To : %3d Step = %3d Inc_x = %d Loops = %d\n", from, to, step,inc_x,loops);
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step)
{
timeg=0;
fprintf(stderr, " %6d : ", (int)m);
for (l=0; l<loops; l++)
{
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
IAMIN (&m, x, &inc_x);
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
timeg /= loops;
fprintf(stderr,
" %10.2f MFlops %10.6f sec\n",
COMPSIZE * sizeof(FLOAT) * 1. * (double)m / timeg * 1.e-6, timeg);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

186
benchmark/imax.c Normal file
View File

@@ -0,0 +1,186 @@
/***************************************************************************
Copyright (c) 2016, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef IMAX
#ifndef COMPLEX
#ifdef DOUBLE
#define IMAX BLASFUNC(idmax)
#else
#define IMAX BLASFUNC(ismax)
#endif
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *x;
blasint m, i;
blasint inc_x=1;
int loops = 1;
int l;
char *p;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1,timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
fprintf(stderr, "From : %3d To : %3d Step = %3d Inc_x = %d Loops = %d\n", from, to, step,inc_x,loops);
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step)
{
timeg=0;
fprintf(stderr, " %6d : ", (int)m);
for (l=0; l<loops; l++)
{
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
IMAX (&m, x, &inc_x);
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
timeg /= loops;
fprintf(stderr,
" %10.2f MFlops %10.6f sec\n",
COMPSIZE * sizeof(FLOAT) * 1. * (double)m / timeg * 1.e-6, timeg);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

186
benchmark/imin.c Normal file
View File

@@ -0,0 +1,186 @@
/***************************************************************************
Copyright (c) 2016, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef IMIN
#ifndef COMPLEX
#ifdef DOUBLE
#define IMIN BLASFUNC(idmin)
#else
#define IMIN BLASFUNC(ismin)
#endif
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *x;
blasint m, i;
blasint inc_x=1;
int loops = 1;
int l;
char *p;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1,timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
fprintf(stderr, "From : %3d To : %3d Step = %3d Inc_x = %d Loops = %d\n", from, to, step,inc_x,loops);
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step)
{
timeg=0;
fprintf(stderr, " %6d : ", (int)m);
for (l=0; l<loops; l++)
{
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
IMIN (&m, x, &inc_x);
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
timeg /= loops;
fprintf(stderr,
" %10.2f MFlops %10.6f sec\n",
COMPSIZE * sizeof(FLOAT) * 1. * (double)m / timeg * 1.e-6, timeg);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -186,7 +186,7 @@ int main(int argc, char *argv[]){
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
a[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
@@ -194,7 +194,7 @@ int main(int argc, char *argv[]){
for (j = 0; j < m; ++j) {
for (i = 0; i < m * COMPSIZE; ++i) {
b[i] += a[i + j * m * COMPSIZE];
b[i] += a[(long)i + (long)j * (long)m * COMPSIZE];
}
}

185
benchmark/max.c Normal file
View File

@@ -0,0 +1,185 @@
/***************************************************************************
Copyright (c) 2016, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef NAMAX
#ifndef COMPLEX
#ifdef DOUBLE
#define NAMAX BLASFUNC(dmax)
#else
#define NAMAX BLASFUNC(smax)
#endif
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *x;
blasint m, i;
blasint inc_x=1;
int loops = 1;
int l;
char *p;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1,timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
fprintf(stderr, "From : %3d To : %3d Step = %3d Inc_x = %d Loops = %d\n", from, to, step,inc_x,loops);
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step)
{
timeg=0;
fprintf(stderr, " %6d : ", (int)m);
for (l=0; l<loops; l++)
{
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
NAMAX (&m, x, &inc_x);
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
timeg /= loops;
fprintf(stderr,
" %10.2f MFlops %10.6f sec\n",
COMPSIZE * sizeof(FLOAT) * 1. * (double)m / timeg * 1.e-6, timeg);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

185
benchmark/min.c Normal file
View File

@@ -0,0 +1,185 @@
/***************************************************************************
Copyright (c) 2016, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef NAMIN
#ifndef COMPLEX
#ifdef DOUBLE
#define NAMIN BLASFUNC(dmin)
#else
#define NAMIN BLASFUNC(smin)
#endif
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *x;
blasint m, i;
blasint inc_x=1;
int loops = 1;
int l;
char *p;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1,timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
fprintf(stderr, "From : %3d To : %3d Step = %3d Inc_x = %d Loops = %d\n", from, to, step,inc_x,loops);
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step)
{
timeg=0;
fprintf(stderr, " %6d : ", (int)m);
for (l=0; l<loops; l++)
{
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
NAMIN (&m, x, &inc_x);
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
timeg /= loops;
fprintf(stderr,
" %10.2f MFlops %10.6f sec\n",
COMPSIZE * sizeof(FLOAT) * 1. * (double)m / timeg * 1.e-6, timeg);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -170,46 +170,46 @@ int main(int argc, char *argv[]){
#ifndef COMPLEX
if (uplos & 1) {
for (j = 0; j < m; j++) {
for(i = 0; i < j; i++) a[i + j * m] = 0.;
a[j + j * m] = ((double) rand() / (double) RAND_MAX) + 8.;
for(i = j + 1; i < m; i++) a[i + j * m] = ((double) rand() / (double) RAND_MAX) - 0.5;
for(i = 0; i < j; i++) a[(long)i + (long)j * (long)m] = 0.;
a[(long)j + (long)j * (long)m] = ((double) rand() / (double) RAND_MAX) + 8.;
for(i = j + 1; i < m; i++) a[(long)i + (long)j * (long)m] = ((double) rand() / (double) RAND_MAX) - 0.5;
}
} else {
for (j = 0; j < m; j++) {
for(i = 0; i < j; i++) a[i + j * m] = ((double) rand() / (double) RAND_MAX) - 0.5;
a[j + j * m] = ((double) rand() / (double) RAND_MAX) + 8.;
for(i = j + 1; i < m; i++) a[i + j * m] = 0.;
for(i = 0; i < j; i++) a[(long)i + (long)j * (long)m] = ((double) rand() / (double) RAND_MAX) - 0.5;
a[(long)j + (long)j * (long)m] = ((double) rand() / (double) RAND_MAX) + 8.;
for(i = j + 1; i < m; i++) a[(long)i + (long)j * (long)m] = 0.;
}
}
#else
if (uplos & 1) {
for (j = 0; j < m; j++) {
for(i = 0; i < j; i++) {
a[(i + j * m) * 2 + 0] = 0.;
a[(i + j * m) * 2 + 1] = 0.;
a[((long)i + (long)j * (long)m) * 2 + 0] = 0.;
a[((long)i + (long)j * (long)m) * 2 + 1] = 0.;
}
a[(j + j * m) * 2 + 0] = ((double) rand() / (double) RAND_MAX) + 8.;
a[(j + j * m) * 2 + 1] = 0.;
a[((long)j + (long)j * (long)m) * 2 + 0] = ((double) rand() / (double) RAND_MAX) + 8.;
a[((long)j + (long)j * (long)m) * 2 + 1] = 0.;
for(i = j + 1; i < m; i++) {
a[(i + j * m) * 2 + 0] = ((double) rand() / (double) RAND_MAX) - 0.5;
a[(i + j * m) * 2 + 1] = ((double) rand() / (double) RAND_MAX) - 0.5;
a[((long)i + (long)j * (long)m) * 2 + 0] = 0;
a[((long)i + (long)j * (long)m) * 2 + 1] = ((double) rand() / (double) RAND_MAX) - 0.5;
}
}
} else {
for (j = 0; j < m; j++) {
for(i = 0; i < j; i++) {
a[(i + j * m) * 2 + 0] = ((double) rand() / (double) RAND_MAX) - 0.5;
a[(i + j * m) * 2 + 1] = ((double) rand() / (double) RAND_MAX) - 0.5;
a[((long)i + (long)j * (long)m) * 2 + 0] = 0.;
a[((long)i + (long)j * (long)m) * 2 + 1] = ((double) rand() / (double) RAND_MAX) - 0.5;
}
a[(j + j * m) * 2 + 0] = ((double) rand() / (double) RAND_MAX) + 8.;
a[(j + j * m) * 2 + 1] = 0.;
a[((long)j + (long)j * (long)m) * 2 + 0] = ((double) rand() / (double) RAND_MAX) + 8.;
a[((long)j + (long)j * (long)m) * 2 + 1] = 0.;
for(i = j + 1; i < m; i++) {
a[(i + j * m) * 2 + 0] = 0.;
a[(i + j * m) * 2 + 1] = 0.;
a[((long)i + (long)j * (long)m) * 2 + 0] = 0.;
a[((long)i + (long)j * (long)m) * 2 + 1] = 0.;
}
}
}

View File

@@ -32,9 +32,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#endif
#include "common.h"
#undef ROT
#undef DOT
#ifndef COMPLEX
#ifdef DOUBLE
#define ROT BLASFUNC(drot)
@@ -42,6 +42,15 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#define ROT BLASFUNC(srot)
#endif
#else
#ifdef DOUBLE
#define ROT BLASFUNC(zdrot)
#else
#define ROT BLASFUNC(csrot)
#endif
#endif
#if defined(__WIN32__) || defined(__WIN64__)
@@ -160,17 +169,16 @@ int main(int argc, char *argv[]){
fprintf(stderr, " %6d : ", (int)m);
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for(i = 0; i < m * COMPSIZE * abs(inc_y); i++){
y[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for (l=0; l<loops; l++)
{
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for(i = 0; i < m * COMPSIZE * abs(inc_y); i++){
y[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
ROT (&m, x, &inc_x, y, &inc_y, c, s);
@@ -179,13 +187,13 @@ int main(int argc, char *argv[]){
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
timeg += time1;
}
}
timeg /= loops;
timeg /= loops;
fprintf(stderr,
fprintf(stderr,
" %10.2f MFlops %10.6f sec\n",
COMPSIZE * COMPSIZE * 6. * (double)m / timeg * 1.e-6, timeg);

210
benchmark/rotm.c Normal file
View File

@@ -0,0 +1,210 @@
/***************************************************************************
Copyright (c) 2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef ROTM
#ifdef DOUBLE
#define ROTM BLASFUNC(drotm)
#else
#define ROTM BLASFUNC(srotm)
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz)
{
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv) {
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size)
{
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =
shmget(IPC_PRIVATE, (size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT | 0600)) < 0) {
printf("Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1) {
printf("Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[])
{
FLOAT *x, *y;
// FLOAT result;
blasint m, i;
blasint inc_x = 1, inc_y = 1;
FLOAT param[5] = {1, 2.0, 3.0, 4.0, 5.0};
int loops = 1;
int l;
char *p;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1, timeg;
argc--;
argv++;
if (argc > 0) {
from = atol(*argv);
argc--;
argv++;
}
if (argc > 0) {
to = MAX(atol(*argv), from);
argc--;
argv++;
}
if (argc > 0) {
step = atol(*argv);
argc--;
argv++;
}
if ((p = getenv("OPENBLAS_LOOPS")))
loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX")))
inc_x = atoi(p);
if ((p = getenv("OPENBLAS_INCY")))
inc_y = atoi(p);
fprintf(
stderr,
"From : %3d To : %3d Step = %3d Inc_x = %d Inc_y = %d Loops = %d\n",
from, to, step, inc_x, inc_y, loops);
if ((x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) ==
NULL) {
fprintf(stderr, "Out of Memory!!\n");
exit(1);
}
if ((y = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_y) * COMPSIZE)) ==
NULL) {
fprintf(stderr, "Out of Memory!!\n");
exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for (m = from; m <= to; m += step) {
timeg = 0;
fprintf(stderr, " %6d : ", (int)m);
for (i = 0; i < m * COMPSIZE * abs(inc_x); i++) {
x[i] = ((FLOAT)rand() / (FLOAT)RAND_MAX) - 0.5;
}
for (i = 0; i < m * COMPSIZE * abs(inc_y); i++) {
y[i] = ((FLOAT)rand() / (FLOAT)RAND_MAX) - 0.5;
}
for (l = 0; l < loops; l++) {
gettimeofday(&start, (struct timezone *)0);
ROTM(&m, x, &inc_x, y, &inc_y, param);
gettimeofday(&stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) +
(double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
timeg /= loops;
fprintf(stderr, " %10.2f MFlops %10.6f sec\n",
COMPSIZE * COMPSIZE * 6. * (double)m / timeg * 1.e-6, timeg);
}
return 0;
}

219
benchmark/spmv.c Normal file
View File

@@ -0,0 +1,219 @@
/***************************************************************************
Copyright (c) 2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef SPMV
#ifndef COMPLEX
#ifdef DOUBLE
#define SPMV BLASFUNC(dspmv)
#else
#define SPMV BLASFUNC(sspmv)
#endif
#else
#ifdef DOUBLE
#define SPMV BLASFUNC(zspmv)
#else
#define SPMV BLASFUNC(cspmv)
#endif
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *a, *x, *y;
FLOAT alpha[] = {1.0, 1.0};
FLOAT beta [] = {1.0, 1.0};
char uplo='L';
blasint m, i, j;
blasint inc_x=1,inc_y=1;
int loops = 1;
int l;
char *p;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1,timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
if ((p = getenv("OPENBLAS_INCY"))) inc_y = atoi(p);
if ((p = getenv("OPENBLAS_UPLO"))) uplo=*p;
fprintf(stderr, "From : %3d To : %3d Step = %3d Uplo = '%c' Inc_x = %d Inc_y = %d Loops = %d\n", from, to, step,uplo,inc_x,inc_y,loops);
if (( a = (FLOAT *)malloc(sizeof(FLOAT) * to * to * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( y = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_y) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step)
{
timeg=0;
fprintf(stderr, " %6dx%d : ", (int)m,(int)m);
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
for (l=0; l<loops; l++)
{
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for(i = 0; i < m * COMPSIZE * abs(inc_y); i++){
y[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
SPMV (&uplo, &m, alpha, a, x, &inc_x, beta, y, &inc_y );
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
timeg /= loops;
fprintf(stderr,
" %10.2f MFlops\n",
COMPSIZE * COMPSIZE * 2. * (double)m * (double)m / timeg * 1.e-6);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

198
benchmark/spr.c Executable file
View File

@@ -0,0 +1,198 @@
/***************************************************************************
Copyright (c) 2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef SPR
#ifdef DOUBLE
#define SPR BLASFUNC(dspr)
#else
#define SPR BLASFUNC(sspr)
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *a,*c;
FLOAT alpha[] = {1.0, 1.0};
blasint inc_x=1;
int loops = 1;
int l;
char *p;
char uplo='U';
if ((p = getenv("OPENBLAS_UPLO"))) uplo=*p;
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
blasint m, i, j;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1,timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
fprintf(stderr, "From : %3d To : %3d Step = %3d Uplo = %c Inc_x = %d\n", from, to, step,uplo,inc_x);
if (( a = (FLOAT *)malloc(sizeof(FLOAT) * to * to * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( c = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops Time\n");
for(m = from; m <= to; m += step)
{
timeg=0;
fprintf(stderr, " %6d : ", (int)m);
for (l=0; l<loops; l++)
{
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
c[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
SPR (&uplo, &m, alpha, c, &inc_x, a);
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
timeg /= loops;
fprintf(stderr,
" %10.2f MFlops %10.6f sec\n",
COMPSIZE * COMPSIZE * 1. * (double)m * (double)m / timeg * 1.e-6, timeg);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

207
benchmark/spr2.c Executable file
View File

@@ -0,0 +1,207 @@
/***************************************************************************
Copyright (c) 2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef SPR2
#ifdef DOUBLE
#define SPR2 BLASFUNC(dspr2)
#else
#define SPR2 BLASFUNC(sspr2)
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *a,*b,*c;
FLOAT alpha[] = {1.0, 1.0};
blasint inc_x=1,inc_y=1;
int loops = 1;
int l;
char *p;
char uplo='U';
if ((p = getenv("OPENBLAS_UPLO"))) uplo=*p;
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
blasint m, i, j;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1,timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
fprintf(stderr, "From : %3d To : %3d Step = %3d Uplo = %c Inc_x = %d Inc_y = %d\n", from, to, step,uplo,inc_x,inc_y);
if (( a = (FLOAT *)malloc(sizeof(FLOAT) * to * to * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( b = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_y) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( c = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops Time\n");
for(m = from; m <= to; m += step)
{
timeg=0;
fprintf(stderr, " %6d : ", (int)m);
for (l=0; l<loops; l++)
{
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
for(i = 0; i < m * COMPSIZE * abs(inc_y); i++){
b[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
c[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
SPR2 (&uplo, &m, alpha, c, &inc_x, b, &inc_y, a);
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
timeg += time1;
}
timeg /= loops;
fprintf(stderr,
" %10.2f MFlops %10.6f sec\n",
COMPSIZE * COMPSIZE * 2. * (double)m * (double)m / timeg * 1.e-6, timeg);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -175,9 +175,9 @@ int main(int argc, char *argv[]){
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
b[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
c[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
a[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
b[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
c[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
@@ -189,8 +189,6 @@ int main(int argc, char *argv[]){
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
gettimeofday( &start, (struct timezone *)0);
fprintf(stderr,
" %10.2f MFlops\n",
COMPSIZE * COMPSIZE * 2. * (double)m * (double)m * (double)m / time1 * 1.e-6);

View File

@@ -177,7 +177,7 @@ int main(int argc, char *argv[]){
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
a[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}

185
benchmark/syr.c Normal file
View File

@@ -0,0 +1,185 @@
/***************************************************************************
Copyright (c) 2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef SYR
#ifdef DOUBLE
#define SYR BLASFUNC(dsyr)
#else
#define SYR BLASFUNC(ssyr)
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *x,*a;
FLOAT alpha[] = {1.0, 1.0};
char *p;
char uplo='U';
if ((p = getenv("OPENBLAS_UPLO"))) uplo=*p;
blasint m, i, j;
blasint inc_x= 1;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
fprintf(stderr, "From : %3d To : %3d Step = %3d Uplo = %c Inc_x = %d\n", from, to, step,uplo,inc_x);
if (( a = (FLOAT *)malloc(sizeof(FLOAT) * to * to * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step)
{
fprintf(stderr, " %6d : ", (int)m);
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
gettimeofday( &start, (struct timezone *)0);
SYR (&uplo, &m, alpha, x, &inc_x, a, &m );
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
fprintf(stderr,
" %10.2f MFlops\n",
COMPSIZE * COMPSIZE * 1. * (double)m * (double)m / time1 * 1.e-6);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

194
benchmark/syr2.c Normal file
View File

@@ -0,0 +1,194 @@
/***************************************************************************
Copyright (c) 2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef SYR2
#ifdef DOUBLE
#define SYR2 BLASFUNC(dsyr2)
#else
#define SYR2 BLASFUNC(ssyr2)
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *x, *y, *a;
FLOAT alpha[] = {1.0, 1.0};
char *p;
char uplo='U';
if ((p = getenv("OPENBLAS_UPLO"))) uplo=*p;
blasint m, i, j;
blasint inc_x= 1;
blasint inc_y= 1;
int from = 1;
int to = 200;
int step = 1;
struct timeval start, stop;
double time1;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
fprintf(stderr, "From : %3d To : %3d Step = %3d Uplo = %c Inc_x = %d Inc_y = %d\n", from, to, step,uplo,inc_x,inc_y);
if (( a = (FLOAT *)malloc(sizeof(FLOAT) * to * to * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( y = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_y) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(m = from; m <= to; m += step)
{
fprintf(stderr, " %6d : ", (int)m);
for(i = 0; i < m * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for(i = 0; i < m * COMPSIZE * abs(inc_y); i++){
y[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
gettimeofday( &start, (struct timezone *)0);
SYR2 (&uplo, &m, alpha, x, &inc_x, y, &inc_y, a, &m );
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
fprintf(stderr,
" %10.2f MFlops\n",
COMPSIZE * COMPSIZE * 2. * (double)m * (double)m / time1 * 1.e-6);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -175,9 +175,9 @@ int main(int argc, char *argv[]){
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
b[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
c[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
a[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
b[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
c[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
@@ -189,8 +189,6 @@ int main(int argc, char *argv[]){
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
gettimeofday( &start, (struct timezone *)0);
fprintf(stderr,
" %10.2f MFlops\n",
COMPSIZE * COMPSIZE * 2. * (double)m * (double)m * (double)m / time1 * 1.e-6);

View File

@@ -172,8 +172,8 @@ int main(int argc, char *argv[]){
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
c[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
a[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
c[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
@@ -185,8 +185,6 @@ int main(int argc, char *argv[]){
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
gettimeofday( &start, (struct timezone *)0);
fprintf(stderr,
" %10.2f MFlops\n",
COMPSIZE * COMPSIZE * 1. * (double)m * (double)m * (double)m / time1 * 1.e-6);

172
benchmark/tpmv.c Normal file
View File

@@ -0,0 +1,172 @@
/***************************************************************************
Copyright (c) 2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef TPMV
#ifndef COMPLEX
#ifdef DOUBLE
#define TPMV BLASFUNC(dtpmv)
#else
#define TPMV BLASFUNC(stpmv)
#endif
#else
#ifdef DOUBLE
#define TPMV BLASFUNC(ztpmv)
#else
#define TPMV BLASFUNC(ctpmv)
#endif
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size)
{
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1) {
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[])
{
FLOAT *a, *x;
char *p;
char uplo ='U';
char trans='N';
char diag ='U';
int loops = 1;
int l;
blasint inc_x=1;
if ((p = getenv("OPENBLAS_UPLO"))) uplo=*p;
if ((p = getenv("OPENBLAS_TRANS"))) trans=*p;
if ((p = getenv("OPENBLAS_DIAG"))) diag=*p;
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
blasint n, i, j;
int from = 1;
int to = 200;
int step = 1;
struct timespec start = { 0, 0 }, stop = { 0, 0 };
double time1, timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
fprintf(stderr, "From : %3d To : %3d Step = %3d Uplo = %c Trans = %c Diag = %c Loops=%d Inc_x=%d\n", from,
to, step, uplo, trans, diag, loops, inc_x);
if (( a = (FLOAT *)malloc(sizeof(FLOAT) * to * to * COMPSIZE)) == NULL) {
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(n = from; n <= to; n += step) {
timeg=0;
fprintf(stderr, " %6d : ", (int)n);
for(j = 0; j < n; j++) {
for(i = 0; i < n * COMPSIZE; i++) {
a[(long)i + (long)j * (long)n * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
for (i = 0; i < n * COMPSIZE * abs(inc_x); i++) {
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for (l = 0; l < loops; l++) {
clock_gettime(CLOCK_REALTIME, &start);
TPMV (&uplo, &trans, &diag, &n, a, x, &inc_x);
clock_gettime(CLOCK_REALTIME, &stop);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_nsec - start.tv_nsec)) / 1.e9;
timeg += time1;
}
timeg /= loops;
fprintf(stderr, " %10.2f MFlops %12.9f sec\n",
COMPSIZE * COMPSIZE * 1. * (double)n * (double)n / timeg / 1.e6, timeg);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

172
benchmark/tpsv.c Normal file
View File

@@ -0,0 +1,172 @@
/***************************************************************************
Copyright (c) 2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef TPSV
#ifndef COMPLEX
#ifdef DOUBLE
#define TPSV BLASFUNC(dtpsv)
#else
#define TPSV BLASFUNC(stpsv)
#endif
#else
#ifdef DOUBLE
#define TPSV BLASFUNC(ztpsv)
#else
#define TPSV BLASFUNC(ctpsv)
#endif
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size)
{
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1) {
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[])
{
FLOAT *a, *x;
char *p;
char uplo ='U';
char trans='N';
char diag ='U';
int loops = 1;
int l;
blasint inc_x=1;
if ((p = getenv("OPENBLAS_UPLO"))) uplo=*p;
if ((p = getenv("OPENBLAS_TRANS"))) trans=*p;
if ((p = getenv("OPENBLAS_DIAG"))) diag=*p;
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
blasint n, i, j;
int from = 1;
int to = 200;
int step = 1;
struct timespec start = { 0, 0 }, stop = { 0, 0 };
double time1, timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
fprintf(stderr, "From : %3d To : %3d Step = %3d Uplo = %c Trans = %c Diag = %c Loops=%d Inc_x=%d\n", from,
to, step, uplo, trans, diag, loops, inc_x);
if (( a = (FLOAT *)malloc(sizeof(FLOAT) * to * to * COMPSIZE)) == NULL) {
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(n = from; n <= to; n += step) {
timeg=0;
fprintf(stderr, " %6d : ", (int)n);
for(j = 0; j < n; j++) {
for(i = 0; i < n * COMPSIZE; i++) {
a[(long)i + (long)j * (long)n * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
for (i = 0; i < n * COMPSIZE * abs(inc_x); i++) {
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for (l = 0; l < loops; l++) {
clock_gettime(CLOCK_REALTIME, &start);
TPSV (&uplo, &trans, &diag, &n, a, x, &inc_x);
clock_gettime(CLOCK_REALTIME, &stop);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_nsec - start.tv_nsec)) / 1.e9;
timeg += time1;
}
timeg /= loops;
fprintf(stderr, " %10.2f MFlops %12.9f sec\n",
COMPSIZE * COMPSIZE * 1. * (double)n * (double)n / timeg / 1.e6, timeg);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -175,8 +175,8 @@ int main(int argc, char *argv[]){
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
b[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
a[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
b[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
@@ -188,8 +188,6 @@ int main(int argc, char *argv[]){
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;
gettimeofday( &start, (struct timezone *)0);
fprintf(stderr,
" %10.2f MFlops %10.6f sec\n",
COMPSIZE * COMPSIZE * 1. * (double)m * (double)m * (double)m / time1 * 1.e-6, time1);

172
benchmark/trmv.c Normal file
View File

@@ -0,0 +1,172 @@
/***************************************************************************
Copyright (c) 2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include "common.h"
#undef TRMV
#ifndef COMPLEX
#ifdef DOUBLE
#define TRMV BLASFUNC(dtrmv)
#else
#define TRMV BLASFUNC(strmv)
#endif
#else
#ifdef DOUBLE
#define TRMV BLASFUNC(ztrmv)
#else
#define TRMV BLASFUNC(ctrmv)
#endif
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size)
{
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1) {
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[])
{
FLOAT *a, *x;
char *p;
char uplo ='U';
char trans='N';
char diag ='U';
int loops = 1;
int l;
blasint inc_x=1;
if ((p = getenv("OPENBLAS_UPLO"))) uplo=*p;
if ((p = getenv("OPENBLAS_TRANS"))) trans=*p;
if ((p = getenv("OPENBLAS_DIAG"))) diag=*p;
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
blasint n, i, j;
int from = 1;
int to = 200;
int step = 1;
struct timespec start = { 0, 0 }, stop = { 0, 0 };
double time1, timeg;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
fprintf(stderr, "From : %3d To : %3d Step = %3d Uplo = %c Trans = %c Diag = %c Loops=%d Inc_x=%d\n", from,
to, step, uplo, trans, diag, loops, inc_x);
if (( a = (FLOAT *)malloc(sizeof(FLOAT) * to * to * COMPSIZE)) == NULL) {
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * to * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
for(n = from; n <= to; n += step) {
timeg=0;
fprintf(stderr, " %6d : ", (int)n);
for(j = 0; j < n; j++) {
for(i = 0; i < n * COMPSIZE; i++) {
a[(long)i + (long)j * (long)n * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
for (i = 0; i < n * COMPSIZE * abs(inc_x); i++) {
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for (l = 0; l < loops; l++) {
clock_gettime(CLOCK_REALTIME, &start);
TRMV (&uplo, &trans, &diag, &n, a, &n, x, &inc_x);
clock_gettime(CLOCK_REALTIME, &stop);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_nsec - start.tv_nsec)) / 1.e9;
timeg += time1;
}
timeg /= loops;
fprintf(stderr, " %10.2f MFlops %12.9f sec\n",
COMPSIZE * COMPSIZE * 1. * (double)n * (double)n / timeg / 1.e6, timeg);
}
return 0;
}
// void main(int argc, char *argv[]) __attribute__((weak, alias("MAIN__")));

View File

@@ -191,8 +191,8 @@ int main(int argc, char *argv[]){
for(j = 0; j < m; j++){
for(i = 0; i < m * COMPSIZE; i++){
a[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
b[i + j * m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
a[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
b[(long)i + (long)j * (long)m * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}

222
benchmark/trsv.c Normal file
View File

@@ -0,0 +1,222 @@
/***************************************************************************
Copyright (c) 2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#include <sys/time.h>
#endif
#include <time.h>
#include "common.h"
#undef GEMV
#undef TRSV
#ifndef COMPLEX
#ifdef DOUBLE
#define TRSV BLASFUNC(dtrsv)
#else
#define TRSV BLASFUNC(strsv)
#endif
#else
#ifdef DOUBLE
#define TRSV BLASFUNC(ztrsv)
#else
#define TRSV BLASFUNC(ctrsv)
#endif
#endif
#if defined(__WIN32__) || defined(__WIN64__)
#ifndef DELTA_EPOCH_IN_MICROSECS
#define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL
#endif
int gettimeofday(struct timeval *tv, void *tz){
FILETIME ft;
unsigned __int64 tmpres = 0;
static int tzflag;
if (NULL != tv)
{
GetSystemTimeAsFileTime(&ft);
tmpres |= ft.dwHighDateTime;
tmpres <<= 32;
tmpres |= ft.dwLowDateTime;
/*converting file time to unix epoch*/
tmpres /= 10; /*convert into microseconds*/
tmpres -= DELTA_EPOCH_IN_MICROSECS;
tv->tv_sec = (long)(tmpres / 1000000UL);
tv->tv_usec = (long)(tmpres % 1000000UL);
}
return 0;
}
#endif
#if !defined(__WIN32__) && !defined(__WIN64__) && !defined(__CYGWIN32__) && 0
static void *huge_malloc(BLASLONG size){
int shmid;
void *address;
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
if ((shmid =shmget(IPC_PRIVATE,
(size + HUGE_PAGESIZE) & ~(HUGE_PAGESIZE - 1),
SHM_HUGETLB | IPC_CREAT |0600)) < 0) {
printf( "Memory allocation failed(shmget).\n");
exit(1);
}
address = shmat(shmid, NULL, SHM_RND);
if ((BLASLONG)address == -1){
printf( "Memory allocation failed(shmat).\n");
exit(1);
}
shmctl(shmid, IPC_RMID, 0);
return address;
}
#define malloc huge_malloc
#endif
int main(int argc, char *argv[]){
FLOAT *a, *x;
blasint n = 0, i, j;
blasint inc_x=1;
int loops = 1;
int l;
char *p;
int from = 1;
int to = 200;
int step = 1;
struct timespec time_start, time_end;
time_t seconds = 0;
double time1,timeg;
long long nanos = 0;
argc--;argv++;
if (argc > 0) { from = atol(*argv); argc--; argv++;}
if (argc > 0) { to = MAX(atol(*argv), from); argc--; argv++;}
if (argc > 0) { step = atol(*argv); argc--; argv++;}
char uplo ='L';
char transa = 'N';
char diag ='U';
if ((p = getenv("OPENBLAS_LOOPS"))) loops = atoi(p);
if ((p = getenv("OPENBLAS_INCX"))) inc_x = atoi(p);
if ((p = getenv("OPENBLAS_TRANSA"))) transa=*p;
if ((p = getenv("OPENBLAS_DIAG"))) diag=*p;
if ((p = getenv("OPENBLAS_UPLO"))) uplo=*p;
fprintf(stderr, "From : %3d To : %3d Step = %3d Transa = '%c' Inc_x = %d uplo=%c diag=%c loop = %d\n", from, to, step,transa,inc_x,
uplo,diag,loops);
#ifdef linux
srandom(getpid());
#endif
fprintf(stderr, " SIZE Flops\n");
fprintf(stderr, "============================================\n");
for(n = from; n <= to; n += step)
{
timeg=0;
if (( a = (FLOAT *)malloc(sizeof(FLOAT) * n * n * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
if (( x = (FLOAT *)malloc(sizeof(FLOAT) * n * abs(inc_x) * COMPSIZE)) == NULL){
fprintf(stderr,"Out of Memory!!\n");exit(1);
}
for(j = 0; j < n; j++){
for(i = 0; i < n * COMPSIZE; i++){
a[i + j * n * COMPSIZE] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
}
for(i = 0; i < n * COMPSIZE * abs(inc_x); i++){
x[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
for(l =0;l< loops;l++){
clock_gettime(CLOCK_PROCESS_CPUTIME_ID,&time_start);
TRSV(&uplo,&transa,&diag,&n,a,&n,x,&inc_x);
clock_gettime(CLOCK_PROCESS_CPUTIME_ID,&time_end);
nanos = time_end.tv_nsec - time_start.tv_nsec;
seconds = time_end.tv_sec - time_start.tv_sec;
time1 = seconds + nanos /1.e9;
timeg += time1;
}
timeg /= loops;
long long muls = n*(n+1)/2.0;
long long adds = (n - 1.0)*n/2.0;
fprintf(stderr, "%10d %10.2f MFlops %10.6f sec\n", n,(muls+adds) / timeg * 1.e-6, timeg);
if(a != NULL){
free(a);
}
if( x != NULL){
free(x);
}
}
return 0;
}

View File

@@ -170,9 +170,11 @@ int main(int argc, char *argv[]){
y[i] = ((FLOAT) rand() / (FLOAT) RAND_MAX) - 0.5;
}
gettimeofday( &start, (struct timezone *)0);
#ifdef RETURN_BY_STACK
DOT (&result , &m, x, &inc_x, y, &inc_y );
#else
result = DOT (&m, x, &inc_x, y, &inc_y );
#endif
gettimeofday( &stop, (struct timezone *)0);
time1 = (double)(stop.tv_sec - start.tv_sec) + (double)((stop.tv_usec - start.tv_usec)) * 1.e-6;

54
c_check
View File

@@ -6,6 +6,7 @@
# Checking cross compile
$hostos = `uname -s | sed -e s/\-.*//`; chop($hostos);
$hostarch = `uname -m | sed -e s/i.86/x86/`;chop($hostarch);
$hostarch = `uname -p` if ($hostos eq "AIX");
$hostarch = "x86_64" if ($hostarch eq "amd64");
$hostarch = "arm" if ($hostarch =~ /^arm.*/);
$hostarch = "arm64" if ($hostarch eq "aarch64");
@@ -18,11 +19,12 @@ $binary = $ENV{"BINARY"};
$makefile = shift(@ARGV);
$config = shift(@ARGV);
$compiler_name = join(" ", @ARGV);
$compiler_name = shift(@ARGV);
$flags = join(" ", @ARGV);
# First, we need to know the target OS and compiler name
$data = `$compiler_name -E ctest.c`;
$data = `$compiler_name $flags -E ctest.c`;
if ($?) {
printf STDERR "C Compiler ($compiler_name) is something wrong.\n";
@@ -175,7 +177,7 @@ if ($defined == 0) {
# Do again
$data = `$compiler_name -E ctest.c`;
$data = `$compiler_name $flags -E ctest.c`;
if ($?) {
printf STDERR "C Compiler ($compiler_name) is something wrong.\n";
@@ -195,7 +197,7 @@ if (($architecture eq "mips") || ($architecture eq "mips64")) {
print $tmpf "void main(void){ __asm__ volatile($code); }\n";
$args = "$msa_flags -o $tmpf.o $tmpf";
my @cmd = ("$compiler_name $args");
my @cmd = ("$compiler_name $flags $args >/dev/null 2>/dev/null");
system(@cmd) == 0;
if ($? != 0) {
$have_msa = 0;
@@ -236,7 +238,7 @@ if (($architecture eq "x86") || ($architecture eq "x86_64")) {
if ($compiler eq "PGI") {
$args = " -tp skylake -c -o $tmpf.o $tmpf";
}
my @cmd = ("$compiler_name $args >/dev/null 2>/dev/null");
my @cmd = ("$compiler_name $flags $args >/dev/null 2>/dev/null");
system(@cmd) == 0;
if ($? != 0) {
$no_avx512 = 1;
@@ -247,7 +249,29 @@ if (($architecture eq "x86") || ($architecture eq "x86_64")) {
}
}
$data = `$compiler_name -S ctest1.c && grep globl ctest1.s | head -n 1 && rm -f ctest1.s`;
$c11_atomics = 0;
if ($data =~ /HAVE_C11/) {
eval "use File::Temp qw(tempfile)";
if ($@){
warn "could not load PERL module File::Temp, so could not check compiler compatibility with C11";
$c11_atomics = 0;
} else {
($fh,$tmpf) = tempfile( SUFFIX => '.c' , UNLINK => 1 );
print $tmpf "#include <stdatomic.h>\nint main(void){}\n";
$args = " -c -o $tmpf.o $tmpf";
my @cmd = ("$compiler_name $flags $args >/dev/null 2>/dev/null");
system(@cmd) == 0;
if ($? != 0) {
$c11_atomics = 0;
} else {
$c11_atomics = 1;
}
unlink("$tmpf.o");
}
}
$data = `$compiler_name $flags -S ctest1.c && grep globl ctest1.s | head -n 1 && rm -f ctest1.s`;
$data =~ /globl\s([_\.]*)(.*)/;
@@ -263,19 +287,6 @@ if ($architecture ne $hostarch) {
$cross = 1 if ($os ne $hostos);
# rework cross suffix and architecture if we are on OSX cross-compiling for ARMV8-based IOS
# the initial autodetection will have been confused by the command-line arguments to clang
# and the cross-compiler apparently still claims to build for x86_64 in its CC -E output
if (($os eq "Darwin") && ($cross_suffix ne "")) {
my $tmpnam = `xcrun --sdk iphoneos --find clang`;
$cross_suffix = substr($tmpnam, 0, rindex($tmpnam, "/")+1 );
# this should produce something like $cross_suffix="/Applications/Xcode-10.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/";
$cross =1;
$architecture = arm64;
}
$openmp = "" if $ENV{USE_OPENMP} != 1;
$linker_L = "";
@@ -283,7 +294,7 @@ $linker_l = "";
$linker_a = "";
{
$link = `$compiler_name -c ctest2.c -o ctest2.o 2>&1 && $compiler_name $openmp -v ctest2.o -o ctest2 2>&1 && rm -f ctest2.o ctest2 ctest2.exe`;
$link = `$compiler_name $flags -c ctest2.c -o ctest2.o 2>&1 && $compiler_name $flags $openmp -v ctest2.o -o ctest2 2>&1 && rm -f ctest2.o ctest2 ctest2.exe`;
$link =~ s/\-Y\sP\,/\-Y/g;
@@ -322,6 +333,7 @@ $linker_a = "";
&& ($flags !~ /advapi32/)
&& ($flags !~ /shell32/)
&& ($flags !~ /omp/)
&& ($flags !~ /[0-9]+/)
) {
$linker_l .= $flags . " "
}
@@ -362,6 +374,8 @@ print CONFFILE "#define __32BIT__\t1\n" if $binformat eq bin32;
print CONFFILE "#define __64BIT__\t1\n" if $binformat eq bin64;
print CONFFILE "#define FUNDERSCORE\t$need_fu\n" if $need_fu ne "";
print CONFFILE "#define HAVE_MSA\t1\n" if $have_msa eq 1;
print CONFFILE "#define HAVE_C11\t1\n" if $c11_atomics eq 1;
if ($os eq "LINUX") {

16
cblas.h
View File

@@ -25,6 +25,11 @@ char* openblas_get_config(void);
/*Get the CPU corename on runtime.*/
char* openblas_get_corename(void);
#ifdef OPENBLAS_OS_LINUX
/* Sets thread affinity for OpenBLAS threads. `thread_idx` is in [0, openblas_get_num_threads()-1]. */
int openblas_setaffinity(int thread_idx, size_t cpusetsize, cpu_set_t* cpu_set);
#endif
/* Get the parallelization type which is used by OpenBLAS */
int openblas_get_parallel(void);
/* OpenBLAS is compiled for sequential use */
@@ -377,6 +382,17 @@ void cblas_cgeadd(OPENBLAS_CONST enum CBLAS_ORDER CORDER,OPENBLAS_CONST blasint
void cblas_zgeadd(OPENBLAS_CONST enum CBLAS_ORDER CORDER,OPENBLAS_CONST blasint crows, OPENBLAS_CONST blasint ccols, OPENBLAS_CONST double *calpha, double *a, OPENBLAS_CONST blasint clda, OPENBLAS_CONST double *cbeta,
double *c, OPENBLAS_CONST blasint cldc);
void cblas_sgemm_batch(OPENBLAS_CONST enum CBLAS_ORDER Order, OPENBLAS_CONST enum CBLAS_TRANSPOSE * TransA_array, OPENBLAS_CONST enum CBLAS_TRANSPOSE * TransB_array, OPENBLAS_CONST blasint * M_array, OPENBLAS_CONST blasint * N_array, OPENBLAS_CONST blasint * K_array,
OPENBLAS_CONST float * alpha_array, OPENBLAS_CONST float ** A_array, OPENBLAS_CONST blasint * lda_array, OPENBLAS_CONST float ** B_array, OPENBLAS_CONST blasint * ldb_array, OPENBLAS_CONST float * beta_array, float ** C_array, OPENBLAS_CONST blasint * ldc_array, OPENBLAS_CONST blasint group_count, OPENBLAS_CONST blasint * group_size);
void cblas_dgemm_batch(OPENBLAS_CONST enum CBLAS_ORDER Order, OPENBLAS_CONST enum CBLAS_TRANSPOSE * TransA_array, OPENBLAS_CONST enum CBLAS_TRANSPOSE * TransB_array, OPENBLAS_CONST blasint * M_array, OPENBLAS_CONST blasint * N_array, OPENBLAS_CONST blasint * K_array,
OPENBLAS_CONST double * alpha_array, OPENBLAS_CONST double ** A_array, OPENBLAS_CONST blasint * lda_array, OPENBLAS_CONST double ** B_array, OPENBLAS_CONST blasint * ldb_array, OPENBLAS_CONST double * beta_array, double ** C_array, OPENBLAS_CONST blasint * ldc_array, OPENBLAS_CONST blasint group_count, OPENBLAS_CONST blasint * group_size);
void cblas_cgemm_batch(OPENBLAS_CONST enum CBLAS_ORDER Order, OPENBLAS_CONST enum CBLAS_TRANSPOSE * TransA_array, OPENBLAS_CONST enum CBLAS_TRANSPOSE * TransB_array, OPENBLAS_CONST blasint * M_array, OPENBLAS_CONST blasint * N_array, OPENBLAS_CONST blasint * K_array,
OPENBLAS_CONST void * alpha_array, OPENBLAS_CONST void ** A_array, OPENBLAS_CONST blasint * lda_array, OPENBLAS_CONST void ** B_array, OPENBLAS_CONST blasint * ldb_array, OPENBLAS_CONST void * beta_array, void ** C_array, OPENBLAS_CONST blasint * ldc_array, OPENBLAS_CONST blasint group_count, OPENBLAS_CONST blasint * group_size);
void cblas_zgemm_batch(OPENBLAS_CONST enum CBLAS_ORDER Order, OPENBLAS_CONST enum CBLAS_TRANSPOSE * TransA_array, OPENBLAS_CONST enum CBLAS_TRANSPOSE * TransB_array, OPENBLAS_CONST blasint * M_array, OPENBLAS_CONST blasint * N_array, OPENBLAS_CONST blasint * K_array,
OPENBLAS_CONST void * alpha_array, OPENBLAS_CONST void ** A_array, OPENBLAS_CONST blasint * lda_array, OPENBLAS_CONST void ** B_array, OPENBLAS_CONST blasint * ldb_array, OPENBLAS_CONST void * beta_array, void ** C_array, OPENBLAS_CONST blasint * ldc_array, OPENBLAS_CONST blasint group_count, OPENBLAS_CONST blasint * group_size);
#ifdef __cplusplus
}

View File

@@ -45,11 +45,11 @@ endif ()
if (DYNAMIC_ARCH)
if (ARM64)
set(DYNAMIC_CORE ARMV8 CORTEXA53 CORTEXA57 CORTEXA72 CORTEXA73 FALKOR THUNDERX THUNDERX2T99 TSV110)
set(DYNAMIC_CORE ARMV8 CORTEXA53 CORTEXA57 CORTEXA72 CORTEXA73 FALKOR THUNDERX THUNDERX2T99 TSV110 EMAG8180 NEOVERSEN1 THUNDERX3T110)
endif ()
if (POWER)
set(DYNAMIC_CORE POWER6 POWER8 POWER9)
set(DYNAMIC_CORE POWER6 POWER8 POWER9 POWER10)
endif ()
if (X86)
@@ -76,9 +76,9 @@ if (DYNAMIC_ARCH)
set(DYNAMIC_CORE ${DYNAMIC_CORE} HASWELL ZEN)
endif ()
if (NOT NO_AVX512)
set(DYNAMIC_CORE ${DYNAMIC_CORE} SKYLAKEX)
set(DYNAMIC_CORE ${DYNAMIC_CORE} SKYLAKEX COOPERLAKE)
string(REGEX REPLACE "-march=native" "" CMAKE_C_FLAGS "${CMAKE_C_FLAGS}")
endif ()
endif ()
if (DYNAMIC_LIST)
set(DYNAMIC_CORE PRESCOTT ${DYNAMIC_LIST})
endif ()

View File

@@ -99,7 +99,20 @@ endif ()
if (${CORE} STREQUAL "SKYLAKEX")
if (NOT DYNAMIC_ARCH)
if (NOT NO_AVX512)
set (CCOMMON_OPT = "${CCOMMON_OPT} -march=skylake-avx512")
set (CCOMMON_OPT "${CCOMMON_OPT} -march=skylake-avx512")
endif ()
endif ()
endif ()
if (${CORE} STREQUAL "COOPERLAKE")
if (NOT DYNAMIC_ARCH)
if (NOT NO_AVX512)
execute_process(COMMAND ${CMAKE_C_COMPILER} -dumpversion OUTPUT_VARIABLE GCC_VERSION)
if (${GCC_VERSION} VERSION_GREATER 10.1 OR ${GCC_VERSION} VERSION_EQUAL 10.1)
set (CCOMMON_OPT = "${CCOMMON_OPT} -march=cooperlake")
else ()
set (CCOMMON_OPT "${CCOMMON_OPT} -march=skylake-avx512")
endif()
endif ()
endif ()
endif ()

View File

@@ -21,7 +21,15 @@
# NEED2UNDERSCORES
if (NOT NO_LAPACK)
enable_language(Fortran)
include(CheckLanguage)
check_language(Fortran)
if(CMAKE_Fortran_COMPILER)
enable_language(Fortran)
else()
message(STATUS "No Fortran compiler found, can build only BLAS but not LAPACK")
set (NOFORTRAN 1)
set (NO_LAPACK 1)
endif()
else()
include(CMakeForceCompiler)
CMAKE_FORCE_Fortran_COMPILER(gfortran GNU)

View File

@@ -16,6 +16,7 @@ if (${F_COMPILER} STREQUAL "FLANG")
if (USE_OPENMP)
set(FCOMMON_OPT "${FCOMMON_OPT} -fopenmp")
endif ()
set(FCOMMON_OPT "${FCOMMON_OPT} -Mrecursive -Kieee")
endif ()
if (${F_COMPILER} STREQUAL "G77")
@@ -81,6 +82,7 @@ if (${F_COMPILER} STREQUAL "INTEL")
if (INTERFACE64)
set(FCOMMON_OPT "${FCOMMON_OPT} -i8")
endif ()
set(FCOMMON_OPT "${FCOMMON_OPT} -recursive")
if (USE_OPENMP)
set(FCOMMON_OPT "${FCOMMON_OPT} -openmp")
endif ()
@@ -120,6 +122,7 @@ if (${F_COMPILER} STREQUAL "PGI")
else ()
set(FCOMMON_OPT "${FCOMMON_OPT} -tp p7")
endif ()
set(FCOMMON_OPT "${FCOMMON_OPT} -Mrecursive")
if (USE_OPENMP)
set(FCOMMON_OPT "${FCOMMON_OPT} -mp")
endif ()

View File

@@ -113,11 +113,31 @@ macro(SetDefaultL1)
set(ZSUMKERNEL zsum.S)
set(QSUMKERNEL sum.S)
set(XSUMKERNEL zsum.S)
if (BUILD_HALF)
set(SHAMINKERNEL ../arm/amin.c)
set(SHAMAXKERNEL ../arm/amax.c)
set(SHMAXKERNEL ../arm/max.c)
set(SHMINKERNEL ../arm/min.c)
set(ISHAMAXKERNEL ../arm/iamax.c)
set(ISHAMINKERNEL ../arm/iamin.c)
set(ISHMAXKERNEL ../arm/imax.c)
set(ISHMINKERNEL ../arm/imin.c)
set(SHASUMKERNEL ../arm/asum.c)
set(SHAXPYKERNEL ../arm/axpy.c)
set(SHAXPBYKERNEL ../arm/axpby.c)
set(SHCOPYKERNEL ../arm/copy.c)
set(SHDOTKERNEL ../arm/dot.c)
set(SHROTKERNEL ../arm/rot.c)
set(SHSCALKERNEL ../arm/scal.c)
set(SHNRM2KERNEL ../arm/nrm2.c)
set(SHSUMKERNEL ../arm/sum.c)
set(SHSWAPKERNEL ../arm/swap.c)
endif ()
endmacro ()
macro(SetDefaultL2)
set(SGEMVNKERNEL gemv_n.S)
set(SGEMVTKERNEL gemv_t.S)
set(SGEMVNKERNEL ../arm/gemv_n.c)
set(SGEMVTKERNEL ../arm/gemv_t.c)
set(DGEMVNKERNEL gemv_n.S)
set(DGEMVTKERNEL gemv_t.S)
set(CGEMVNKERNEL zgemv_n.S)
@@ -161,6 +181,11 @@ macro(SetDefaultL2)
set(XHEMV_L_KERNEL ../generic/zhemv_k.c)
set(XHEMV_V_KERNEL ../generic/zhemv_k.c)
set(XHEMV_M_KERNEL ../generic/zhemv_k.c)
if (BUILD_HALF)
set(SHGEMVNKERNEL ../arm/gemv_n.c)
set(SHGEMVTKERNEL ../arm/gemv_t.c)
set(SHGERKERNEL ../generic/ger.c)
endif ()
endmacro ()
macro(SetDefaultL3)
@@ -168,4 +193,18 @@ macro(SetDefaultL3)
set(DGEADD_KERNEL ../generic/geadd.c)
set(CGEADD_KERNEL ../generic/zgeadd.c)
set(ZGEADD_KERNEL ../generic/zgeadd.c)
if (BUILD_HALF)
set(SHGEADD_KERNEL ../generic/geadd.c)
set(SHGEMMKERNEL ../generic/gemmkernel_2x2.c)
set(SHGEMM_BETA ../generic/gemm_beta.c)
set(SHGEMMINCOPY ../generic/gemm_ncopy_2.c)
set(SHGEMMITCOPY ../generic/gemm_tcopy_2.c)
set(SHGEMMONCOPY ../generic/gemm_ncopy_2.c)
set(SHGEMMOTCOPY ../generic/gemm_tcopy_2.c)
set(SHGEMMINCOPYOBJ shgemm_incopy.o)
set(SHGEMMITCOPYOBJ shgemm_itcopy.o)
set(SHGEMMONCOPYOBJ shgemm_oncopy.o)
set(SHGEMMOTCOPYOBJ shgemm_otcopy.o)
endif ()
endmacro ()

View File

@@ -7,5 +7,5 @@ Name: OpenBLAS
Description: OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version
Version: @OPENBLAS_VERSION@
URL: https://github.com/xianyi/OpenBLAS
Libs: -L${libdir} -lopenblas${libsuffix}
Libs: @OpenMP_C_FLAGS@ -L${libdir} -lopenblas${libsuffix}
Cflags: -I${includedir}

View File

@@ -8,7 +8,7 @@ if (${CMAKE_SYSTEM_NAME} STREQUAL "Linux")
set(NO_EXPRECISION 1)
endif ()
if (${CMAKE_SYSTEM_NAME} MATCHES "FreeBSD|OpenBSD|NetBSD|DragonFly")
if (${CMAKE_SYSTEM_NAME} MATCHES "FreeBSD|OpenBSD|NetBSD|DragonFly|Darwin")
set(EXTRALIB "${EXTRALIB} -lm")
set(NO_EXPRECISION 1)
endif ()

View File

@@ -16,6 +16,8 @@
# HAVE_SSE2
# HAVE_SSE3
# MAKE
# SHGEMM_UNROLL_M
# SHGEMM_UNROLL_N
# SGEMM_UNROLL_M
# SGEMM_UNROLL_N
# DGEMM_UNROLL_M
@@ -193,8 +195,13 @@ if (DEFINED CORE AND CMAKE_CROSSCOMPILING AND NOT (${HOST_OS} STREQUAL "WINDOWSS
"#define HAVE_VFP\n"
"#define HAVE_NEON\n"
"#define ARMV8\n")
if ("${TCORE}" STREQUAL "CORTEXA57")
set(SGEMM_UNROLL_M 16)
set(SGEMM_UNROLL_N 4)
else ()
set(SGEMM_UNROLL_M 8)
set(SGEMM_UNROLL_N 8)
endif ()
set(DGEMM_UNROLL_M 8)
set(DGEMM_UNROLL_N 4)
set(CGEMM_UNROLL_M 8)
@@ -229,6 +236,33 @@ if (DEFINED CORE AND CMAKE_CROSSCOMPILING AND NOT (${HOST_OS} STREQUAL "WINDOWSS
set(ZGEMM_UNROLL_M 4)
set(ZGEMM_UNROLL_N 4)
set(SYMV_P 16)
elseif ("${TCORE}" STREQUAL "NEOVERSEN1")
file(APPEND ${TARGET_CONF_TEMP}
"#define L1_CODE_SIZE\t65536\n"
"#define L1_CODE_LINESIZE\t64\n"
"#define L1_CODE_ASSOCIATIVE\t4\n"
"#define L1_DATA_SIZE\t65536\n"
"#define L1_DATA_LINESIZE\t64\n"
"#define L1_DATA_ASSOCIATIVE\t2\n"
"#define L2_SIZE\t1048576\n\n"
"#define L2_LINESIZE\t64\n"
"#define L2_ASSOCIATIVE\t16\n"
"#define DTB_DEFAULT_ENTRIES\t64\n"
"#define DTB_SIZE\t4096\n"
"#define HAVE_VFPV4\n"
"#define HAVE_VFPV3\n"
"#define HAVE_VFP\n"
"#define HAVE_NEON\n"
"#define ARMV8\n")
set(SGEMM_UNROLL_M 16)
set(SGEMM_UNROLL_N 4)
set(DGEMM_UNROLL_M 8)
set(DGEMM_UNROLL_N 4)
set(CGEMM_UNROLL_M 8)
set(CGEMM_UNROLL_N 4)
set(ZGEMM_UNROLL_M 4)
set(ZGEMM_UNROLL_N 4)
set(SYMV_P 16)
elseif ("${TCORE}" STREQUAL "FALKOR")
file(APPEND ${TARGET_CONF_TEMP}
"#define L1_CODE_SIZE\t65536\n"
@@ -309,6 +343,33 @@ if (DEFINED CORE AND CMAKE_CROSSCOMPILING AND NOT (${HOST_OS} STREQUAL "WINDOWSS
set(ZGEMM_UNROLL_M 4)
set(ZGEMM_UNROLL_N 4)
set(SYMV_P 16)
elseif ("${TCORE}" STREQUAL "THUNDERX3T110")
file(APPEND ${TARGET_CONF_TEMP}
"#define THUNDERX3T110\n"
"#define L1_CODE_SIZE\t65536\n"
"#define L1_CODE_LINESIZE\t64\n"
"#define L1_CODE_ASSOCIATIVE\t8\n"
"#define L1_DATA_SIZE\t65536\n"
"#define L1_DATA_LINESIZE\t64\n"
"#define L1_DATA_ASSOCIATIVE\t8\n"
"#define L2_SIZE\t524288\n"
"#define L2_LINESIZE\t64\n"
"#define L2_ASSOCIATIVE\t8\n"
"#define L3_SIZE\t94371840\n"
"#define L3_LINESIZE\t64\n"
"#define L3_ASSOCIATIVE\t32\n"
"#define DTB_DEFAULT_ENTRIES\t64\n"
"#define DTB_SIZE\t4096\n"
"#define ARMV8\n")
set(SGEMM_UNROLL_M 16)
set(SGEMM_UNROLL_N 4)
set(DGEMM_UNROLL_M 8)
set(DGEMM_UNROLL_N 4)
set(CGEMM_UNROLL_M 8)
set(CGEMM_UNROLL_N 4)
set(ZGEMM_UNROLL_M 4)
set(ZGEMM_UNROLL_N 4)
set(SYMV_P 16)
elseif ("${TCORE}" STREQUAL "TSV110")
file(APPEND ${TARGET_CONF_TEMP}
"#define ARMV8\n"
@@ -332,6 +393,29 @@ if (DEFINED CORE AND CMAKE_CROSSCOMPILING AND NOT (${HOST_OS} STREQUAL "WINDOWSS
set(ZGEMM_UNROLL_M 4)
set(ZGEMM_UNROLL_N 4)
set(SYMV_P 16)
elseif ("${TCORE}" STREQUAL "EMAG8180")
file(APPEND ${TARGET_CONF_TEMP}
"#define ARMV8\n"
"#define L1_CODE_SIZE\t32768\n"
"#define L1_CODE_LINESIZE\t64\n"
"#define L1_CODE_ASSOCIATIVE\t4\n"
"#define L1_DATA_SIZE\t32768\n"
"#define L1_DATA_LINESIZE\t64\n"
"#define L1_DATA_ASSOCIATIVE\t4\n"
"#define L2_SIZE\t5262144\n"
"#define L2_LINESIZE\t64\n"
"#define L2_ASSOCIATIVE\t8\n"
"#define DTB_DEFAULT_ENTRIES\t64\n"
"#define DTB_SIZE\t4096\n")
set(SGEMM_UNROLL_M 16)
set(SGEMM_UNROLL_N 4)
set(DGEMM_UNROLL_M 8)
set(DGEMM_UNROLL_N 4)
set(CGEMM_UNROLL_M 8)
set(CGEMM_UNROLL_N 4)
set(ZGEMM_UNROLL_M 4)
set(ZGEMM_UNROLL_N 4)
set(SYMV_P 16)
elseif ("${TCORE}" STREQUAL "POWER6")
file(APPEND ${TARGET_CONF_TEMP}
"#define L1_DATA_SIZE 32768\n"
@@ -368,7 +452,7 @@ if (DEFINED CORE AND CMAKE_CROSSCOMPILING AND NOT (${HOST_OS} STREQUAL "WINDOWSS
set(ZGEMM_UNROLL_M 8)
set(ZGEMM_UNROLL_N 2)
set(SYMV_P 8)
elseif ("${TCORE}" STREQUAL "POWER9")
elseif ("${TCORE}" STREQUAL "POWER9" OR "${TCORE}" STREQUAL "POWER10")
file(APPEND ${TARGET_CONF_TEMP}
"#define L1_DATA_SIZE 32768\n"
"#define L1_DATA_LINESIZE 128\n"
@@ -387,6 +471,8 @@ if (DEFINED CORE AND CMAKE_CROSSCOMPILING AND NOT (${HOST_OS} STREQUAL "WINDOWSS
set(ZGEMM_UNROLL_N 2)
set(SYMV_P 8)
endif()
set(SHGEMM_UNROLL_M 8)
set(SHGEMM_UNROLL_N 4)
# Or should this actually be NUM_CORES?
if (${NUM_THREADS} GREATER 0)
@@ -438,7 +524,7 @@ else(NOT CMAKE_CROSSCOMPILING)
if (NOT "${CMAKE_SYSTEM_NAME}" STREQUAL "WindowsStore")
try_compile(GETARCH_RESULT ${GETARCH_DIR}
SOURCES ${GETARCH_SRC}
COMPILE_DEFINITIONS ${EXFLAGS} ${GETARCH_FLAGS} -I${GETARCH_DIR} -I"${PROJECT_SOURCE_DIR}" -I"${PROJECT_BINARY_DIR}"
COMPILE_DEFINITIONS ${EXFLAGS} ${GETARCH_FLAGS} -I"${GETARCH_DIR}" -I"${PROJECT_SOURCE_DIR}" -I"${PROJECT_BINARY_DIR}"
OUTPUT_VARIABLE GETARCH_LOG
COPY_FILE ${PROJECT_BINARY_DIR}/${GETARCH_BIN}
)
@@ -466,7 +552,7 @@ execute_process(COMMAND "${PROJECT_BINARY_DIR}/${GETARCH_BIN}" 1 OUTPUT_VARIABLE
if (NOT "${CMAKE_SYSTEM_NAME}" STREQUAL "WindowsStore")
try_compile(GETARCH2_RESULT ${GETARCH2_DIR}
SOURCES ${PROJECT_SOURCE_DIR}/getarch_2nd.c
COMPILE_DEFINITIONS ${EXFLAGS} ${GETARCH_FLAGS} ${GETARCH2_FLAGS} -I${GETARCH2_DIR} -I"${PROJECT_SOURCE_DIR}" -I"${PROJECT_BINARY_DIR}"
COMPILE_DEFINITIONS ${EXFLAGS} ${GETARCH_FLAGS} ${GETARCH2_FLAGS} -I"${GETARCH2_DIR}" -I"${PROJECT_SOURCE_DIR}" -I"${PROJECT_BINARY_DIR}"
OUTPUT_VARIABLE GETARCH2_LOG
COPY_FILE ${PROJECT_BINARY_DIR}/${GETARCH2_BIN}
)

View File

@@ -33,7 +33,7 @@ endif ()
if (DEFINED BINARY AND DEFINED TARGET AND BINARY EQUAL 32)
message(STATUS "Compiling a ${BINARY}-bit binary.")
set(NO_AVX 1)
if (${TARGET} STREQUAL "HASWELL" OR ${TARGET} STREQUAL "SANDYBRIDGE" OR ${TARGET} STREQUAL "SKYLAKEX")
if (${TARGET} STREQUAL "HASWELL" OR ${TARGET} STREQUAL "SANDYBRIDGE" OR ${TARGET} STREQUAL "SKYLAKEX" OR ${TARGET} STREQUAL "COOPERLAKE")
set(TARGET "NEHALEM")
endif ()
if (${TARGET} STREQUAL "BULLDOZER" OR ${TARGET} STREQUAL "PILEDRIVER" OR ${TARGET} STREQUAL "ZEN")
@@ -45,6 +45,18 @@ if (DEFINED BINARY AND DEFINED TARGET AND BINARY EQUAL 32)
endif ()
if (DEFINED TARGET)
if (${TARGET} STREQUAL "COOPERLAKE" AND NOT NO_AVX512)
# if (${CMAKE_C_COMPILER_ID} STREQUAL "GNU")
execute_process(COMMAND ${CMAKE_C_COMPILER} -dumpversion OUTPUT_VARIABLE GCC_VERSION)
if (${GCC_VERSION} VERSION_GREATER 10.1 OR ${GCC_VERSION} VERSION_EQUAL 10.1)
set (KERNEL_DEFINITIONS "${KERNEL_DEFINITIONS} -march=cooperlake")
else()
set (KERNEL_DEFINITIONS "${KERNEL_DEFINITIONS} -march=skylake-avx512")
endif()
# elseif (${CMAKE_C_COMPILER_ID} STREQUAL "CLANG")
# set (KERNEL_DEFINITIONS "${KERNEL_DEFINITIONS} -mavx2")
# endif()
endif()
if (${TARGET} STREQUAL "SKYLAKEX" AND NOT NO_AVX512)
set (KERNEL_DEFINITIONS "${KERNEL_DEFINITIONS} -march=skylake-avx512")
endif()
@@ -289,10 +301,24 @@ set(CCOMMON_OPT "${CCOMMON_OPT} -DMAX_CPU_NUMBER=${NUM_THREADS}")
set(CCOMMON_OPT "${CCOMMON_OPT} -DMAX_PARALLEL_NUMBER=${NUM_PARALLEL}")
if (BUFFERSIZE)
set(CCOMMON_OPT "${CCOMMON_OPT} -DBUFFERSIZE=${BUFFERSIZE}")
endif ()
if (USE_SIMPLE_THREADED_LEVEL3)
set(CCOMMON_OPT "${CCOMMON_OPT} -DUSE_SIMPLE_THREADED_LEVEL3")
endif ()
if (NOT ${CMAKE_SYSTEM_NAME} STREQUAL "Windows")
if (DEFINED MAX_STACK_ALLOC)
if (NOT ${MAX_STACK_ALLOC} EQUAL 0)
set(CCOMMON_OPT "${CCOMMON_OPT} -DMAX_STACK_ALLOC=${MAX_STACK_ALLOC}")
endif ()
else ()
set(CCOMMON_OPT "${CCOMMON_OPT} -DMAX_STACK_ALLOC=2048")
endif ()
endif ()
if (DEFINED LIBNAMESUFFIX)
set(LIBPREFIX "libopenblas_${LIBNAMESUFFIX}")
else ()
@@ -403,6 +429,14 @@ if (${CMAKE_C_COMPILER} STREQUAL "LSB" OR ${CMAKE_SYSTEM_NAME} STREQUAL "Windows
set(LAPACK_CFLAGS "${LAPACK_CFLAGS} -DLAPACK_COMPLEX_STRUCTURE")
endif ()
if ("${CMAKE_BUILD_TYPE}" STREQUAL "Release")
if ("${F_COMPILER}" STREQUAL "FLANG")
if (${CMAKE_Fortran_COMPILER_VERSION} VERSION_LESS_EQUAL 3)
set(CMAKE_Fortran_FLAGS_RELEASE "${CMAKE_Fortran_FLAGS_RELEASE} -fno-unroll-loops")
endif ()
endif ()
endif ()
if (NOT DEFINED SUFFIX)
set(SUFFIX o)
endif ()
@@ -526,6 +560,8 @@ endif ()
#export FUNCTION_PROFILE
#export TARGET_CORE
#
#export SHGEMM_UNROLL_M
#export SHGEMM_UNROLL_N
#export SGEMM_UNROLL_M
#export SGEMM_UNROLL_N
#export DGEMM_UNROLL_M

View File

@@ -109,10 +109,17 @@ else()
endif()
if (X86_64 OR X86)
file(WRITE ${PROJECT_BINARY_DIR}/avx512.tmp "#include <immintrin.h>\n\nint main(void){ __asm__ volatile(\"vbroadcastss -4 * 4(%rsi), %zmm2\"); }")
execute_process(COMMAND ${CMAKE_C_COMPILER} -march=skylake-avx512 -c -v -o ${PROJECT_BINARY_DIR}/avx512.o -x c ${PROJECT_BINARY_DIR}/avx512.tmp OUTPUT_QUIET ERROR_QUIET RESULT_VARIABLE NO_AVX512)
file(WRITE ${PROJECT_BINARY_DIR}/avx512.c "#include <immintrin.h>\n\nint main(void){ __asm__ volatile(\"vbroadcastss -4 * 4(%rsi), %zmm2\"); }")
execute_process(COMMAND ${CMAKE_C_COMPILER} -march=skylake-avx512 -c -v -o ${PROJECT_BINARY_DIR}/avx512.o ${PROJECT_BINARY_DIR}/avx512.c OUTPUT_QUIET ERROR_QUIET RESULT_VARIABLE NO_AVX512)
if (NO_AVX512 EQUAL 1)
set (CCOMMON_OPT "${CCOMMON_OPT} -DNO_AVX512")
endif()
file(REMOVE "avx512.tmp" "avx512.o")
file(REMOVE "avx512.c" "avx512.o")
endif()
include(CheckIncludeFile)
CHECK_INCLUDE_FILE("stdatomic.h" HAVE_C11)
if (HAVE_C11 EQUAL 1)
message (STATUS found stdatomic.h)
set (CCOMMON_OPT "${CCOMMON_OPT} -DHAVE_C11")
endif()

View File

@@ -15,12 +15,36 @@ endfunction ()
# Reads a Makefile into CMake vars.
macro(ParseMakefileVars MAKEFILE_IN)
message(STATUS "Reading vars from ${MAKEFILE_IN}...")
set (IfElse 0)
set (ElseSeen 0)
file(STRINGS ${MAKEFILE_IN} makefile_contents)
foreach (makefile_line ${makefile_contents})
#message(STATUS "parsing ${makefile_line}")
if (${IfElse} GREATER 0)
string(REGEX MATCH "endif[ \t]*" line_match "${makefile_line}")
if (NOT "${line_match}" STREQUAL "")
# message(STATUS "ENDIF ${makefile_line}")
set (IfElse 0)
set (ElseSeen 0)
continue ()
endif ()
string(REGEX MATCH "else[ \t]*" line_match "${makefile_line}")
if (NOT "${line_match}" STREQUAL "")
# message(STATUS "ELSE ${makefile_line}")
set (ElseSeen 1)
continue ()
endif()
if ( (${IfElse} EQUAL 2 AND ${ElseSeen} EQUAL 0) OR ( ${IfElse} EQUAL 1 AND ${ElseSeen} EQUAL 1))
# message(STATUS "skipping ${makefile_line}")
continue ()
endif ()
endif ()
string(REGEX MATCH "([0-9_a-zA-Z]+)[ \t]*=[ \t]*(.+)$" line_match "${makefile_line}")
if (NOT "${line_match}" STREQUAL "")
#message(STATUS "match on ${line_match}")
set(var_name ${CMAKE_MATCH_1})
set(var_value ${CMAKE_MATCH_2})
# set(var_value ${CMAKE_MATCH_2})
string(STRIP ${CMAKE_MATCH_2} var_value)
# check for Makefile variables in the string, e.g. $(TSUFFIX)
string(REGEX MATCHALL "\\$\\(([0-9_a-zA-Z]+)\\)" make_var_matches ${var_value})
foreach (make_var ${make_var_matches})
@@ -33,7 +57,31 @@ macro(ParseMakefileVars MAKEFILE_IN)
else ()
string(REGEX MATCH "include \\$\\(KERNELDIR\\)/(.+)$" line_match "${makefile_line}")
if (NOT "${line_match}" STREQUAL "")
#message(STATUS "match on include ${line_match}")
ParseMakefileVars(${KERNELDIR}/${CMAKE_MATCH_1})
else ()
# message(STATUS "unmatched line ${line_match}")
string(REGEX MATCH "ifeq \\(\\$\\(([_A-Z]+)\\),[ \t]*([0-9_A-Z]+)\\)" line_match "${makefile_line}")
if (NOT "${line_match}" STREQUAL "")
# message(STATUS "IFEQ: ${line_match} first: ${CMAKE_MATCH_1} second: ${CMAKE_MATCH_2}")
if (DEFINED ${${CMAKE_MATCH_1}} AND ${${CMAKE_MATCH_1}} STREQUAL ${CMAKE_MATCH_2})
# message (STATUS "condition is true")
set (IfElse 1)
else ()
set (IfElse 2)
endif ()
else ()
string(REGEX MATCH "ifneq \\(\\$\\(([_A-Z]+)\\),[ \t]*([0-9_A-Z]+)\\)" line_match "${makefile_line}")
if (NOT "${line_match}" STREQUAL "")
# message(STATUS "IFNEQ: ${line_match} first: ${CMAKE_MATCH_1} second: ${CMAKE_MATCH_2}")
if (NOT ( ${${CMAKE_MATCH_1}} STREQUAL ${CMAKE_MATCH_2}))
# message (STATUS "condition is true")
set (IfElse 1)
else ()
set (IfElse 2)
endif ()
endif ()
endif ()
endif ()
endif ()
endforeach ()
@@ -163,6 +211,7 @@ function(GenerateNamedObjects sources_in)
if (complex_only)
list(REMOVE_ITEM float_list "SINGLE")
list(REMOVE_ITEM float_list "DOUBLE")
list(REMOVE_ITEM float_list "HALF")
elseif (real_only)
list(REMOVE_ITEM float_list "COMPLEX")
list(REMOVE_ITEM float_list "ZCOMPLEX")
@@ -176,6 +225,9 @@ function(GenerateNamedObjects sources_in)
if (NOT no_float_type)
string(SUBSTRING ${float_type} 0 1 float_char)
string(TOLOWER ${float_char} float_char)
if (${float_type} STREQUAL "HALF")
set (float_char "sh")
endif ()
endif ()
if (NOT name_in)
@@ -210,6 +262,9 @@ function(GenerateNamedObjects sources_in)
if (${float_type} STREQUAL "DOUBLE" OR ${float_type} STREQUAL "ZCOMPLEX")
list(APPEND obj_defines "DOUBLE")
endif ()
if (${float_type} STREQUAL "HALF")
list(APPEND obj_defines "HALF")
endif ()
if (${float_type} STREQUAL "COMPLEX" OR ${float_type} STREQUAL "ZCOMPLEX")
list(APPEND obj_defines "COMPLEX")
if (mangle_complex_sources)

View File

@@ -257,6 +257,11 @@ typedef long BLASLONG;
typedef unsigned long BLASULONG;
#endif
#ifndef BFLOAT16
typedef unsigned short bfloat16;
#define HALFCONVERSION 1
#endif
#ifdef USE64BITINT
typedef BLASLONG blasint;
#if defined(OS_WINDOWS) && defined(__64BIT__)
@@ -297,6 +302,13 @@ typedef int blasint;
#define SIZE 8
#define BASE_SHIFT 3
#define ZBASE_SHIFT 4
#elif defined(HALF)
#define IFLOAT bfloat16
#define XFLOAT IFLOAT
#define FLOAT float
#define SIZE 2
#define BASE_SHIFT 1
#define ZBASE_SHIFT 2
#else
#define FLOAT float
#define SIZE 4
@@ -308,6 +320,10 @@ typedef int blasint;
#define XFLOAT FLOAT
#endif
#ifndef IFLOAT
#define IFLOAT FLOAT
#endif
#ifndef COMPLEX
#define COMPSIZE 1
#else
@@ -344,13 +360,8 @@ typedef int blasint;
#endif
#endif
#ifdef POWER8
#ifndef YIELDING
#define YIELDING __asm__ __volatile__ ("nop;nop;nop;nop;nop;nop;nop;nop;\n");
#endif
#endif
#ifdef POWER9
#if defined(POWER8) || defined(POWER9) || defined(POWER10)
#ifndef YIELDING
#define YIELDING __asm__ __volatile__ ("nop;nop;nop;nop;nop;nop;nop;nop;\n");
#endif
@@ -657,6 +668,8 @@ void gotoblas_dynamic_init(void);
void gotoblas_dynamic_quit(void);
void gotoblas_profile_init(void);
void gotoblas_profile_quit(void);
int support_avx512(void);
#ifdef USE_OPENMP
@@ -668,7 +681,7 @@ __declspec(dllimport) int __cdecl omp_in_parallel(void);
__declspec(dllimport) int __cdecl omp_get_num_procs(void);
#endif
#if (__STDC_VERSION__ >= 201112L)
#ifdef HAVE_C11
#if defined(C_GCC) && ( __GNUC__ < 7)
// workaround for GCC bug 65467
#ifndef _Atomic

View File

@@ -43,6 +43,7 @@
#define MB asm("mb")
#define WMB asm("wmb")
#define RMB asm("rmb")
static void __inline blas_lock(unsigned long *address){
#ifndef __DECC

View File

@@ -37,11 +37,13 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#define MB
#define WMB
#define RMB
#else
#define MB __asm__ __volatile__ ("dmb ish" : : : "memory")
#define WMB __asm__ __volatile__ ("dmb ishst" : : : "memory")
#define RMB __asm__ __volatile__ ("dmb ish" : : : "memory")
#endif
@@ -121,7 +123,7 @@ REALNAME:
#endif
#define HUGE_PAGESIZE ( 4 << 20)
#define BUFFER_SIZE (16 << 20)
#define BUFFER_SIZE (32 << 20)
#define BASE_ADDRESS (START_ADDRESS - BUFFER_SIZE * MAX_CPU_NUMBER)

View File

@@ -35,7 +35,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#define MB __asm__ __volatile__ ("dmb ish" : : : "memory")
#define WMB __asm__ __volatile__ ("dmb ishst" : : : "memory")
#define RMB __asm__ __volatile__ ("dmb ishld" : : : "memory")
#define INLINE inline
@@ -53,16 +53,16 @@ static void __inline blas_lock(volatile BLASULONG *address){
BLASULONG ret;
do {
while (*address) {YIELDING;};
__asm__ __volatile__(
"mov x4, #1 \n\t"
"sevl \n\t"
"1: \n\t"
"wfe \n\t"
"2: \n\t"
"ldaxr x2, [%1] \n\t"
"cbnz x2, 1b \n\t"
"2: \n\t"
"stxr w3, x4, [%1] \n\t"
"cbnz w3, 1b \n\t"
"cbnz w3, 2b \n\t"
"mov %0, #0 \n\t"
: "=r"(ret), "=r"(address)
: "1"(address)
@@ -81,10 +81,12 @@ static void __inline blas_lock(volatile BLASULONG *address){
#if !defined(OS_DARWIN) && !defined (OS_ANDROID)
static __inline BLASULONG rpcc(void){
BLASULONG ret = 0;
blasint shift;
__asm__ __volatile__ ("isb; mrs %0,cntvct_el0":"=r"(ret));
__asm__ __volatile__ ("mrs %0,cntfrq_el0; clz %w0, %w0":"=&r"(shift));
return ret;
return ret << shift;
}
#define RPCC_DEFINED
@@ -139,12 +141,17 @@ REALNAME:
#endif
#define HUGE_PAGESIZE ( 4 << 20)
#ifndef BUFFERSIZE
#if defined(CORTEXA57)
#define BUFFER_SIZE (20 << 20)
#elif defined(TSV110) || defined(EMAG8180)
#define BUFFER_SIZE (32 << 20)
#else
#define BUFFER_SIZE (16 << 20)
#endif
#else
#define BUFFER_SIZE (32 << BUFFERSIZE)
#endif
#define BASE_ADDRESS (START_ADDRESS - BUFFER_SIZE * MAX_CPU_NUMBER)

View File

@@ -232,6 +232,46 @@
#define CGEADD_K cgeadd_k
#define CGEMM_SMALL_KERNEL_NN cgemm_small_kernel_nn
#define CGEMM_SMALL_KERNEL_NT cgemm_small_kernel_nt
#define CGEMM_SMALL_KERNEL_NR cgemm_small_kernel_nr
#define CGEMM_SMALL_KERNEL_NC cgemm_small_kernel_nc
#define CGEMM_SMALL_KERNEL_TN cgemm_small_kernel_tn
#define CGEMM_SMALL_KERNEL_TT cgemm_small_kernel_tt
#define CGEMM_SMALL_KERNEL_TR cgemm_small_kernel_tr
#define CGEMM_SMALL_KERNEL_TC cgemm_small_kernel_tc
#define CGEMM_SMALL_KERNEL_RN cgemm_small_kernel_rn
#define CGEMM_SMALL_KERNEL_RT cgemm_small_kernel_rt
#define CGEMM_SMALL_KERNEL_RR cgemm_small_kernel_rr
#define CGEMM_SMALL_KERNEL_RC cgemm_small_kernel_rc
#define CGEMM_SMALL_KERNEL_CN cgemm_small_kernel_cn
#define CGEMM_SMALL_KERNEL_CT cgemm_small_kernel_ct
#define CGEMM_SMALL_KERNEL_CR cgemm_small_kernel_cr
#define CGEMM_SMALL_KERNEL_CC cgemm_small_kernel_cc
#define CGEMM_SMALL_KERNEL_B0_NN cgemm_small_kernel_b0_nn
#define CGEMM_SMALL_KERNEL_B0_NT cgemm_small_kernel_b0_nt
#define CGEMM_SMALL_KERNEL_B0_NR cgemm_small_kernel_b0_nr
#define CGEMM_SMALL_KERNEL_B0_NC cgemm_small_kernel_b0_nc
#define CGEMM_SMALL_KERNEL_B0_TN cgemm_small_kernel_b0_tn
#define CGEMM_SMALL_KERNEL_B0_TT cgemm_small_kernel_b0_tt
#define CGEMM_SMALL_KERNEL_B0_TR cgemm_small_kernel_b0_tr
#define CGEMM_SMALL_KERNEL_B0_TC cgemm_small_kernel_b0_tc
#define CGEMM_SMALL_KERNEL_B0_RN cgemm_small_kernel_b0_rn
#define CGEMM_SMALL_KERNEL_B0_RT cgemm_small_kernel_b0_rt
#define CGEMM_SMALL_KERNEL_B0_RR cgemm_small_kernel_b0_rr
#define CGEMM_SMALL_KERNEL_B0_RC cgemm_small_kernel_b0_rc
#define CGEMM_SMALL_KERNEL_B0_CN cgemm_small_kernel_b0_cn
#define CGEMM_SMALL_KERNEL_B0_CT cgemm_small_kernel_b0_ct
#define CGEMM_SMALL_KERNEL_B0_CR cgemm_small_kernel_b0_cr
#define CGEMM_SMALL_KERNEL_B0_CC cgemm_small_kernel_b0_cc
#else
#define CAMAX_K gotoblas -> camax_k

View File

@@ -157,6 +157,17 @@
#define DIMATCOPY_K_RT dimatcopy_k_rt
#define DGEADD_K dgeadd_k
#define DGEMM_SMALL_KERNEL_NN dgemm_small_kernel_nn
#define DGEMM_SMALL_KERNEL_NT dgemm_small_kernel_nt
#define DGEMM_SMALL_KERNEL_TN dgemm_small_kernel_tn
#define DGEMM_SMALL_KERNEL_TT dgemm_small_kernel_tt
#define DGEMM_SMALL_KERNEL_B0_NN dgemm_small_kernel_b0_nn
#define DGEMM_SMALL_KERNEL_B0_NT dgemm_small_kernel_b0_nt
#define DGEMM_SMALL_KERNEL_B0_TN dgemm_small_kernel_b0_tn
#define DGEMM_SMALL_KERNEL_B0_TT dgemm_small_kernel_b0_tt
#else
#define DAMAX_K gotoblas -> damax_k

View File

@@ -47,6 +47,7 @@
#define MB
#define WMB
#define RMB
#ifdef __ECC
#include <ia64intrin.h>

View File

@@ -469,6 +469,8 @@ void BLASFUNC(xhbmv)(char *, blasint *, blasint *, xdouble *, xdouble *, blasint
/* Level 3 routines */
void BLASFUNC(shgemm)(char *, char *, blasint *, blasint *, blasint *, float *,
bfloat16 *, blasint *, bfloat16 *, blasint *, float *, float *, blasint *);
void BLASFUNC(sgemm)(char *, char *, blasint *, blasint *, blasint *, float *,
float *, blasint *, float *, blasint *, float *, float *, blasint *);
void BLASFUNC(dgemm)(char *, char *, blasint *, blasint *, blasint *, double *,

View File

@@ -47,14 +47,16 @@ __global__ void cuda_dgemm_kernel(int, int, int, double *, double *, double *);
extern "C" {
#endif
extern void sgemm_kernel_direct(BLASLONG M, BLASLONG N, BLASLONG K,
void sgemm_direct(BLASLONG M, BLASLONG N, BLASLONG K,
float * A, BLASLONG strideA,
float * B, BLASLONG strideB,
float * R, BLASLONG strideR);
extern int sgemm_kernel_direct_performant(BLASLONG M, BLASLONG N, BLASLONG K);
int sgemm_direct_performant(BLASLONG M, BLASLONG N, BLASLONG K);
int shgemm_beta(BLASLONG, BLASLONG, BLASLONG, float,
bfloat16 *, BLASLONG, bfloat16 *, BLASLONG, float *, BLASLONG);
int sgemm_beta(BLASLONG, BLASLONG, BLASLONG, float,
float *, BLASLONG, float *, BLASLONG, float *, BLASLONG);
int dgemm_beta(BLASLONG, BLASLONG, BLASLONG, double,
@@ -76,6 +78,10 @@ int xgemm_beta(BLASLONG, BLASLONG, BLASLONG, xdouble *,
xdouble *, BLASLONG, xdouble *, BLASLONG, xdouble *, BLASLONG);
#endif
int shgemm_incopy(BLASLONG m, BLASLONG n, bfloat16 *a, BLASLONG lda, bfloat16 *b);
int shgemm_itcopy(BLASLONG m, BLASLONG n, bfloat16 *a, BLASLONG lda, bfloat16 *b);
int shgemm_oncopy(BLASLONG m, BLASLONG n, bfloat16 *a, BLASLONG lda, bfloat16 *b);
int shgemm_otcopy(BLASLONG m, BLASLONG n, bfloat16 *a, BLASLONG lda, bfloat16 *b);
int sgemm_incopy(BLASLONG m, BLASLONG n, float *a, BLASLONG lda, float *b);
int sgemm_itcopy(BLASLONG m, BLASLONG n, float *a, BLASLONG lda, float *b);
int sgemm_oncopy(BLASLONG m, BLASLONG n, float *a, BLASLONG lda, float *b);
@@ -499,6 +505,7 @@ int xher2k_kernel_UC(BLASLONG m, BLASLONG n, BLASLONG k, xdouble alpha_r, xdoubl
int xher2k_kernel_LN(BLASLONG m, BLASLONG n, BLASLONG k, xdouble alpha_r, xdouble alpha_i, xdouble *a, xdouble *b, xdouble *c, BLASLONG ldc, BLASLONG offset, int flag);
int xher2k_kernel_LC(BLASLONG m, BLASLONG n, BLASLONG k, xdouble alpha_r, xdouble alpha_i, xdouble *a, xdouble *b, xdouble *c, BLASLONG ldc, BLASLONG offset, int flag);
int shgemm_kernel(BLASLONG, BLASLONG, BLASLONG, float, bfloat16 *, bfloat16 *, float *, BLASLONG);
int sgemm_kernel(BLASLONG, BLASLONG, BLASLONG, float, float *, float *, float *, BLASLONG);
int dgemm_kernel(BLASLONG, BLASLONG, BLASLONG, double, double *, double *, double *, BLASLONG);
@@ -508,6 +515,109 @@ int qgemm_kernel(BLASLONG, BLASLONG, BLASLONG, xidouble *, xidouble *, xidouble
int qgemm_kernel(BLASLONG, BLASLONG, BLASLONG, xdouble, xdouble *, xdouble *, xdouble *, BLASLONG);
#endif
#ifdef SMALL_MATRIX_OPT
int sgemm_small_kernel_nn(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha, float * B, BLASLONG ldb, float beta, float * C, BLASLONG ldc);
int sgemm_small_kernel_nt(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha, float * B, BLASLONG ldb, float beta, float * C, BLASLONG ldc);
int sgemm_small_kernel_tn(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha, float * B, BLASLONG ldb, float beta, float * C, BLASLONG ldc);
int sgemm_small_kernel_tt(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha, float * B, BLASLONG ldb, float beta, float * C, BLASLONG ldc);
int dgemm_small_kernel_nn(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha, double * B, BLASLONG ldb, double beta, double * C, BLASLONG ldc);
int dgemm_small_kernel_nt(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha, double * B, BLASLONG ldb, double beta, double * C, BLASLONG ldc);
int dgemm_small_kernel_tn(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha, double * B, BLASLONG ldb, double beta, double * C, BLASLONG ldc);
int dgemm_small_kernel_tt(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha, double * B, BLASLONG ldb, double beta, double * C, BLASLONG ldc);
int sgemm_small_kernel_b0_nn(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha, float * B, BLASLONG ldb, float * C, BLASLONG ldc);
int sgemm_small_kernel_b0_nt(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha, float * B, BLASLONG ldb, float * C, BLASLONG ldc);
int sgemm_small_kernel_b0_tn(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha, float * B, BLASLONG ldb, float * C, BLASLONG ldc);
int sgemm_small_kernel_b0_tt(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha, float * B, BLASLONG ldb, float * C, BLASLONG ldc);
int dgemm_small_kernel_b0_nn(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha, double * B, BLASLONG ldb, double * C, BLASLONG ldc);
int dgemm_small_kernel_b0_nt(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha, double * B, BLASLONG ldb, double * C, BLASLONG ldc);
int dgemm_small_kernel_b0_tn(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha, double * B, BLASLONG ldb, double * C, BLASLONG ldc);
int dgemm_small_kernel_b0_tt(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha, double * B, BLASLONG ldb, double * C, BLASLONG ldc);
int cgemm_small_kernel_nn(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float beta0, float beta1, float * C, BLASLONG ldc);
int cgemm_small_kernel_nt(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float beta0, float beta1, float * C, BLASLONG ldc);
int cgemm_small_kernel_nr(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float beta0, float beta1, float * C, BLASLONG ldc);
int cgemm_small_kernel_nc(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float beta0, float beta1, float * C, BLASLONG ldc);
int cgemm_small_kernel_tn(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float beta0, float beta1, float * C, BLASLONG ldc);
int cgemm_small_kernel_tt(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float beta0, float beta1, float * C, BLASLONG ldc);
int cgemm_small_kernel_tr(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float beta0, float beta1, float * C, BLASLONG ldc);
int cgemm_small_kernel_tc(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float beta0, float beta1, float * C, BLASLONG ldc);
int cgemm_small_kernel_rn(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float beta0, float beta1, float * C, BLASLONG ldc);
int cgemm_small_kernel_rt(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float beta0, float beta1, float * C, BLASLONG ldc);
int cgemm_small_kernel_rr(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float beta0, float beta1, float * C, BLASLONG ldc);
int cgemm_small_kernel_rc(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float beta0, float beta1, float * C, BLASLONG ldc);
int cgemm_small_kernel_cn(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float beta0, float beta1, float * C, BLASLONG ldc);
int cgemm_small_kernel_ct(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float beta0, float beta1, float * C, BLASLONG ldc);
int cgemm_small_kernel_cr(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float beta0, float beta1, float * C, BLASLONG ldc);
int cgemm_small_kernel_cc(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float beta0, float beta1, float * C, BLASLONG ldc);
int zgemm_small_kernel_nn(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double beta0, double beta1, double * C, BLASLONG ldc);
int zgemm_small_kernel_nt(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double beta0, double beta1, double * C, BLASLONG ldc);
int zgemm_small_kernel_nr(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double beta0, double beta1, double * C, BLASLONG ldc);
int zgemm_small_kernel_nc(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double beta0, double beta1, double * C, BLASLONG ldc);
int zgemm_small_kernel_tn(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double beta0, double beta1, double * C, BLASLONG ldc);
int zgemm_small_kernel_tt(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double beta0, double beta1, double * C, BLASLONG ldc);
int zgemm_small_kernel_tr(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double beta0, double beta1, double * C, BLASLONG ldc);
int zgemm_small_kernel_tc(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double beta0, double beta1, double * C, BLASLONG ldc);
int zgemm_small_kernel_rn(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double beta0, double beta1, double * C, BLASLONG ldc);
int zgemm_small_kernel_rt(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double beta0, double beta1, double * C, BLASLONG ldc);
int zgemm_small_kernel_rr(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double beta0, double beta1, double * C, BLASLONG ldc);
int zgemm_small_kernel_rc(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double beta0, double beta1, double * C, BLASLONG ldc);
int zgemm_small_kernel_cn(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double beta0, double beta1, double * C, BLASLONG ldc);
int zgemm_small_kernel_ct(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double beta0, double beta1, double * C, BLASLONG ldc);
int zgemm_small_kernel_cr(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double beta0, double beta1, double * C, BLASLONG ldc);
int zgemm_small_kernel_cc(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double beta0, double beta1, double * C, BLASLONG ldc);
int cgemm_small_kernel_b0_nn(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float * C, BLASLONG ldc);
int cgemm_small_kernel_b0_nt(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float * C, BLASLONG ldc);
int cgemm_small_kernel_b0_nr(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float * C, BLASLONG ldc);
int cgemm_small_kernel_b0_nc(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float * C, BLASLONG ldc);
int cgemm_small_kernel_b0_tn(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float * C, BLASLONG ldc);
int cgemm_small_kernel_b0_tt(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float * C, BLASLONG ldc);
int cgemm_small_kernel_b0_tr(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float * C, BLASLONG ldc);
int cgemm_small_kernel_b0_tc(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float * C, BLASLONG ldc);
int cgemm_small_kernel_b0_rn(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float * C, BLASLONG ldc);
int cgemm_small_kernel_b0_rt(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float * C, BLASLONG ldc);
int cgemm_small_kernel_b0_rr(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float * C, BLASLONG ldc);
int cgemm_small_kernel_b0_rc(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float * C, BLASLONG ldc);
int cgemm_small_kernel_b0_cn(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float * C, BLASLONG ldc);
int cgemm_small_kernel_b0_ct(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float * C, BLASLONG ldc);
int cgemm_small_kernel_b0_cr(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float * C, BLASLONG ldc);
int cgemm_small_kernel_b0_cc(BLASLONG m, BLASLONG n, BLASLONG k, float * A, BLASLONG lda, float alpha0, float alpha1, float * B, BLASLONG ldb, float * C, BLASLONG ldc);
int zgemm_small_kernel_b0_nn(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double * C, BLASLONG ldc);
int zgemm_small_kernel_b0_nt(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double * C, BLASLONG ldc);
int zgemm_small_kernel_b0_nr(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double * C, BLASLONG ldc);
int zgemm_small_kernel_b0_nc(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double * C, BLASLONG ldc);
int zgemm_small_kernel_b0_tn(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double * C, BLASLONG ldc);
int zgemm_small_kernel_b0_tt(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double * C, BLASLONG ldc);
int zgemm_small_kernel_b0_tr(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double * C, BLASLONG ldc);
int zgemm_small_kernel_b0_tc(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double * C, BLASLONG ldc);
int zgemm_small_kernel_b0_rn(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double * C, BLASLONG ldc);
int zgemm_small_kernel_b0_rt(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double * C, BLASLONG ldc);
int zgemm_small_kernel_b0_rr(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double * C, BLASLONG ldc);
int zgemm_small_kernel_b0_rc(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double * C, BLASLONG ldc);
int zgemm_small_kernel_b0_cn(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double * C, BLASLONG ldc);
int zgemm_small_kernel_b0_ct(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double * C, BLASLONG ldc);
int zgemm_small_kernel_b0_cr(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double * C, BLASLONG ldc);
int zgemm_small_kernel_b0_cc(BLASLONG m, BLASLONG n, BLASLONG k, double * A, BLASLONG lda, double alpha0, double alpha1, double * B, BLASLONG ldb, double * C, BLASLONG ldc);
#endif
int cgemm_kernel_n(BLASLONG, BLASLONG, BLASLONG, float, float, float *, float *, float *, BLASLONG);
int cgemm_kernel_l(BLASLONG, BLASLONG, BLASLONG, float, float, float *, float *, float *, BLASLONG);
int cgemm_kernel_r(BLASLONG, BLASLONG, BLASLONG, float, float, float *, float *, float *, BLASLONG);
@@ -527,6 +637,11 @@ int cgemm3m_kernel(BLASLONG, BLASLONG, BLASLONG, float, float, float *, float
int zgemm3m_kernel(BLASLONG, BLASLONG, BLASLONG, double, double, double *, double *, double *, BLASLONG);
int xgemm3m_kernel(BLASLONG, BLASLONG, BLASLONG, xdouble, xdouble, xdouble *, xdouble *, xdouble *, BLASLONG);
int shgemm_nn(blas_arg_t *, BLASLONG *, BLASLONG *, bfloat16 *, bfloat16 *, BLASLONG);
int shgemm_nt(blas_arg_t *, BLASLONG *, BLASLONG *, bfloat16 *, bfloat16 *, BLASLONG);
int shgemm_tn(blas_arg_t *, BLASLONG *, BLASLONG *, bfloat16 *, bfloat16 *, BLASLONG);
int shgemm_tt(blas_arg_t *, BLASLONG *, BLASLONG *, bfloat16 *, bfloat16 *, BLASLONG);
int sgemm_nn(blas_arg_t *, BLASLONG *, BLASLONG *, float *, float *, BLASLONG);
int sgemm_nt(blas_arg_t *, BLASLONG *, BLASLONG *, float *, float *, BLASLONG);
int sgemm_tn(blas_arg_t *, BLASLONG *, BLASLONG *, float *, float *, BLASLONG);
@@ -619,6 +734,11 @@ int xgemm_cr(blas_arg_t *, BLASLONG *, BLASLONG *, xdouble *, xdouble *, BLASLON
int xgemm_cc(blas_arg_t *, BLASLONG *, BLASLONG *, xdouble *, xdouble *, BLASLONG);
#endif
int shgemm_thread_nn(blas_arg_t *, BLASLONG *, BLASLONG *, bfloat16 *, bfloat16 *, BLASLONG);
int shgemm_thread_nt(blas_arg_t *, BLASLONG *, BLASLONG *, bfloat16 *, bfloat16 *, BLASLONG);
int shgemm_thread_tn(blas_arg_t *, BLASLONG *, BLASLONG *, bfloat16 *, bfloat16 *, BLASLONG);
int shgemm_thread_tt(blas_arg_t *, BLASLONG *, BLASLONG *, bfloat16 *, bfloat16 *, BLASLONG);
int sgemm_thread_nn(blas_arg_t *, BLASLONG *, BLASLONG *, float *, float *, BLASLONG);
int sgemm_thread_nt(blas_arg_t *, BLASLONG *, BLASLONG *, float *, float *, BLASLONG);
int sgemm_thread_tn(blas_arg_t *, BLASLONG *, BLASLONG *, float *, float *, BLASLONG);
@@ -1799,6 +1919,10 @@ int dgeadd_k(BLASLONG, BLASLONG, double, double*, BLASLONG, double, double *, BL
int cgeadd_k(BLASLONG, BLASLONG, float, float, float*, BLASLONG, float, float, float *, BLASLONG);
int zgeadd_k(BLASLONG, BLASLONG, double,double, double*, BLASLONG, double, double, double *, BLASLONG);
int sgemm_batch_thread(blas_arg_t * queue, BLASLONG nums);
int dgemm_batch_thread(blas_arg_t * queue, BLASLONG nums);
int cgemm_batch_thread(blas_arg_t * queue, BLASLONG nums);
int zgemm_batch_thread(blas_arg_t * queue, BLASLONG nums);
#ifdef __CUDACC__
}

View File

@@ -39,6 +39,7 @@
#ifndef COMMON_MACRO
#define COMMON_MACRO
#include "common_sh.h"
#include "common_s.h"
#include "common_d.h"
#include "common_q.h"
@@ -642,6 +643,308 @@
#define IMATCOPY_K_RT DIMATCOPY_K_RT
#define GEADD_K DGEADD_K
#define GEMM_SMALL_KERNEL_NN DGEMM_SMALL_KERNEL_NN
#define GEMM_SMALL_KERNEL_NT DGEMM_SMALL_KERNEL_NT
#define GEMM_SMALL_KERNEL_TN DGEMM_SMALL_KERNEL_TN
#define GEMM_SMALL_KERNEL_TT DGEMM_SMALL_KERNEL_TT
#define GEMM_SMALL_KERNEL_B0_NN DGEMM_SMALL_KERNEL_B0_NN
#define GEMM_SMALL_KERNEL_B0_NT DGEMM_SMALL_KERNEL_B0_NT
#define GEMM_SMALL_KERNEL_B0_TN DGEMM_SMALL_KERNEL_B0_TN
#define GEMM_SMALL_KERNEL_B0_TT DGEMM_SMALL_KERNEL_B0_TT
#elif defined(HALF)
#define AMAX_K SAMAX_K
#define AMIN_K SAMIN_K
#define MAX_K SMAX_K
#define MIN_K SMIN_K
#define IAMAX_K ISAMAX_K
#define IAMIN_K ISAMIN_K
#define IMAX_K ISMAX_K
#define IMIN_K ISMIN_K
#define ASUM_K SASUM_K
#define DOTU_K SDOTU_K
#define DOTC_K SDOTC_K
#define AXPYU_K SAXPYU_K
#define AXPYC_K SAXPYC_K
#define AXPBY_K SAXPBY_K
#define SCAL_K SSCAL_K
#define GEMV_N SGEMV_N
#define GEMV_T SGEMV_T
#define SYMV_U SSYMV_U
#define SYMV_L SSYMV_L
#define GERU_K SGERU_K
#define GERC_K SGERC_K
#define GERV_K SGERV_K
#define GERD_K SGERD_K
#define SUM_K SSUM_K
#define SWAP_K SSWAP_K
#define ROT_K SROT_K
#define COPY_K SCOPY_K
#define NRM2_K SNRM2_K
#define SYMV_THREAD_U SSYMV_THREAD_U
#define SYMV_THREAD_L SSYMV_THREAD_L
#define GEMM_BETA SHGEMM_BETA
#define GEMM_KERNEL_N SHGEMM_KERNEL
#define GEMM_KERNEL_L SHGEMM_KERNEL
#define GEMM_KERNEL_R SHGEMM_KERNEL
#define GEMM_KERNEL_B SHGEMM_KERNEL
#define GEMM_NN SHGEMM_NN
#define GEMM_CN SHGEMM_TN
#define GEMM_TN SHGEMM_TN
#define GEMM_NC SHGEMM_NT
#define GEMM_NT SHGEMM_NT
#define GEMM_CC SHGEMM_TT
#define GEMM_CT SHGEMM_TT
#define GEMM_TC SHGEMM_TT
#define GEMM_TT SHGEMM_TT
#define GEMM_NR SHGEMM_NN
#define GEMM_TR SHGEMM_TN
#define GEMM_CR SHGEMM_TN
#define GEMM_RN SHGEMM_NN
#define GEMM_RT SHGEMM_NT
#define GEMM_RC SHGEMM_NT
#define GEMM_RR SHGEMM_NN
#define GEMM_ONCOPY SHGEMM_ONCOPY
#define GEMM_OTCOPY SHGEMM_OTCOPY
#define GEMM_INCOPY SHGEMM_INCOPY
#define GEMM_ITCOPY SHGEMM_ITCOPY
#define SYMM_THREAD_LU SSYMM_THREAD_LU
#define SYMM_THREAD_LL SSYMM_THREAD_LL
#define SYMM_THREAD_RU SSYMM_THREAD_RU
#define SYMM_THREAD_RL SSYMM_THREAD_RL
#define SYMM_LU SSYMM_LU
#define SYMM_LL SSYMM_LL
#define SYMM_RU SSYMM_RU
#define SYMM_RL SSYMM_RL
#define HEMM_THREAD_LU SHEMM_THREAD_LU
#define HEMM_THREAD_LL SHEMM_THREAD_LL
#define HEMM_THREAD_RU SHEMM_THREAD_RU
#define HEMM_THREAD_RL SHEMM_THREAD_RL
#define GEMM_THREAD_NN SHGEMM_THREAD_NN
#define GEMM_THREAD_CN SHGEMM_THREAD_TN
#define GEMM_THREAD_TN SHGEMM_THREAD_TN
#define GEMM_THREAD_NC SHGEMM_THREAD_NT
#define GEMM_THREAD_NT SHGEMM_THREAD_NT
#define GEMM_THREAD_CC SHGEMM_THREAD_TT
#define GEMM_THREAD_CT SHGEMM_THREAD_TT
#define GEMM_THREAD_TC SHGEMM_THREAD_TT
#define GEMM_THREAD_TT SHGEMM_THREAD_TT
#define GEMM_THREAD_NR SHGEMM_THREAD_NN
#define GEMM_THREAD_TR SHGEMM_THREAD_TN
#define GEMM_THREAD_CR SHGEMM_THREAD_TN
#define GEMM_THREAD_RN SHGEMM_THREAD_NN
#define GEMM_THREAD_RT SHGEMM_THREAD_NT
#define GEMM_THREAD_RC SHGEMM_THREAD_NT
#define GEMM_THREAD_RR SHGEMM_THREAD_NN
#ifdef UNIT
#define TRMM_OUNCOPY STRMM_OUNUCOPY
#define TRMM_OUTCOPY STRMM_OUTUCOPY
#define TRMM_OLNCOPY STRMM_OLNUCOPY
#define TRMM_OLTCOPY STRMM_OLTUCOPY
#define TRSM_OUNCOPY STRSM_OUNUCOPY
#define TRSM_OUTCOPY STRSM_OUTUCOPY
#define TRSM_OLNCOPY STRSM_OLNUCOPY
#define TRSM_OLTCOPY STRSM_OLTUCOPY
#define TRMM_IUNCOPY STRMM_IUNUCOPY
#define TRMM_IUTCOPY STRMM_IUTUCOPY
#define TRMM_ILNCOPY STRMM_ILNUCOPY
#define TRMM_ILTCOPY STRMM_ILTUCOPY
#define TRSM_IUNCOPY STRSM_IUNUCOPY
#define TRSM_IUTCOPY STRSM_IUTUCOPY
#define TRSM_ILNCOPY STRSM_ILNUCOPY
#define TRSM_ILTCOPY STRSM_ILTUCOPY
#else
#define TRMM_OUNCOPY STRMM_OUNNCOPY
#define TRMM_OUTCOPY STRMM_OUTNCOPY
#define TRMM_OLNCOPY STRMM_OLNNCOPY
#define TRMM_OLTCOPY STRMM_OLTNCOPY
#define TRSM_OUNCOPY STRSM_OUNNCOPY
#define TRSM_OUTCOPY STRSM_OUTNCOPY
#define TRSM_OLNCOPY STRSM_OLNNCOPY
#define TRSM_OLTCOPY STRSM_OLTNCOPY
#define TRMM_IUNCOPY STRMM_IUNNCOPY
#define TRMM_IUTCOPY STRMM_IUTNCOPY
#define TRMM_ILNCOPY STRMM_ILNNCOPY
#define TRMM_ILTCOPY STRMM_ILTNCOPY
#define TRSM_IUNCOPY STRSM_IUNNCOPY
#define TRSM_IUTCOPY STRSM_IUTNCOPY
#define TRSM_ILNCOPY STRSM_ILNNCOPY
#define TRSM_ILTCOPY STRSM_ILTNCOPY
#define TRMM_KERNEL_LN STRMM_KERNEL_LN
#define TRMM_KERNEL_LT STRMM_KERNEL_LT
#define TRMM_KERNEL_LR STRMM_KERNEL_LN
#define TRMM_KERNEL_LC STRMM_KERNEL_LT
#define TRMM_KERNEL_RN STRMM_KERNEL_RN
#define TRMM_KERNEL_RT STRMM_KERNEL_RT
#define TRMM_KERNEL_RR STRMM_KERNEL_RN
#define TRMM_KERNEL_RC STRMM_KERNEL_RT
#define TRSM_KERNEL_LN STRSM_KERNEL_LN
#define TRSM_KERNEL_LT STRSM_KERNEL_LT
#define TRSM_KERNEL_LR STRSM_KERNEL_LN
#define TRSM_KERNEL_LC STRSM_KERNEL_LT
#define TRSM_KERNEL_RN STRSM_KERNEL_RN
#define TRSM_KERNEL_RT STRSM_KERNEL_RT
#define TRSM_KERNEL_RR STRSM_KERNEL_RN
#define TRSM_KERNEL_RC STRSM_KERNEL_RT
#define SYMM_IUTCOPY SSYMM_IUTCOPY
#define SYMM_ILTCOPY SSYMM_ILTCOPY
#define SYMM_OUTCOPY SSYMM_OUTCOPY
#define SYMM_OLTCOPY SSYMM_OLTCOPY
#define TRMM_LNUU STRMM_LNUU
#define TRMM_LNUN STRMM_LNUN
#define TRMM_LNLU STRMM_LNLU
#define TRMM_LNLN STRMM_LNLN
#define TRMM_LTUU STRMM_LTUU
#define TRMM_LTUN STRMM_LTUN
#define TRMM_LTLU STRMM_LTLU
#define TRMM_LTLN STRMM_LTLN
#define TRMM_LRUU STRMM_LNUU
#define TRMM_LRUN STRMM_LNUN
#define TRMM_LRLU STRMM_LNLU
#define TRMM_LRLN STRMM_LNLN
#define TRMM_LCUU STRMM_LTUU
#define TRMM_LCUN STRMM_LTUN
#define TRMM_LCLU STRMM_LTLU
#define TRMM_LCLN STRMM_LTLN
#define TRMM_RNUU STRMM_RNUU
#define TRMM_RNUN STRMM_RNUN
#define TRMM_RNLU STRMM_RNLU
#define TRMM_RNLN STRMM_RNLN
#define TRMM_RTUU STRMM_RTUU
#define TRMM_RTUN STRMM_RTUN
#define TRMM_RTLU STRMM_RTLU
#define TRMM_RTLN STRMM_RTLN
#define TRMM_RRUU STRMM_RNUU
#define TRMM_RRUN STRMM_RNUN
#define TRMM_RRLU STRMM_RNLU
#define TRMM_RRLN STRMM_RNLN
#define TRMM_RCUU STRMM_RTUU
#define TRMM_RCUN STRMM_RTUN
#define TRMM_RCLU STRMM_RTLU
#define TRMM_RCLN STRMM_RTLN
#define TRSM_LNUU STRSM_LNUU
#define TRSM_LNUN STRSM_LNUN
#define TRSM_LNLU STRSM_LNLU
#define TRSM_LNLN STRSM_LNLN
#define TRSM_LTUU STRSM_LTUU
#define TRSM_LTUN STRSM_LTUN
#define TRSM_LTLU STRSM_LTLU
#define TRSM_LTLN STRSM_LTLN
#define TRSM_LRUU STRSM_LNUU
#define TRSM_LRUN STRSM_LNUN
#define TRSM_LRLU STRSM_LNLU
#define TRSM_LRLN STRSM_LNLN
#define TRSM_LCUU STRSM_LTUU
#define TRSM_LCUN STRSM_LTUN
#define TRSM_LCLU STRSM_LTLU
#define TRSM_LCLN STRSM_LTLN
#define TRSM_RNUU STRSM_RNUU
#define TRSM_RNUN STRSM_RNUN
#define TRSM_RNLU STRSM_RNLU
#define TRSM_RNLN STRSM_RNLN
#define TRSM_RTUU STRSM_RTUU
#define TRSM_RTUN STRSM_RTUN
#define TRSM_RTLU STRSM_RTLU
#define TRSM_RTLN STRSM_RTLN
#define TRSM_RRUU STRSM_RNUU
#define TRSM_RRUN STRSM_RNUN
#define TRSM_RRLU STRSM_RNLU
#define TRSM_RRLN STRSM_RNLN
#define TRSM_RCUU STRSM_RTUU
#define TRSM_RCUN STRSM_RTUN
#define TRSM_RCLU STRSM_RTLU
#define TRSM_RCLN STRSM_RTLN
#define SYRK_UN SSYRK_UN
#define SYRK_UT SSYRK_UT
#define SYRK_LN SSYRK_LN
#define SYRK_LT SSYRK_LT
#define SYRK_UR SSYRK_UN
#define SYRK_UC SSYRK_UT
#define SYRK_LR SSYRK_LN
#define SYRK_LC SSYRK_LT
#define SYRK_KERNEL_U SSYRK_KERNEL_U
#define SYRK_KERNEL_L SSYRK_KERNEL_L
#define HERK_UN SSYRK_UN
#define HERK_LN SSYRK_LN
#define HERK_UC SSYRK_UT
#define HERK_LC SSYRK_LT
#define HER2K_UN SSYR2K_UN
#define HER2K_LN SSYR2K_LN
#define HER2K_UC SSYR2K_UT
#define HER2K_LC SSYR2K_LT
#define SYR2K_UN SSYR2K_UN
#define SYR2K_UT SSYR2K_UT
#define SYR2K_LN SSYR2K_LN
#define SYR2K_LT SSYR2K_LT
#define SYR2K_UR SSYR2K_UN
#define SYR2K_UC SSYR2K_UT
#define SYR2K_LR SSYR2K_LN
#define SYR2K_LC SSYR2K_LT
#define SYR2K_KERNEL_U SSYR2K_KERNEL_U
#define SYR2K_KERNEL_L SSYR2K_KERNEL_L
#define SYRK_THREAD_UN SSYRK_THREAD_UN
#define SYRK_THREAD_UT SSYRK_THREAD_UT
#define SYRK_THREAD_LN SSYRK_THREAD_LN
#define SYRK_THREAD_LT SSYRK_THREAD_LT
#define SYRK_THREAD_UR SSYRK_THREAD_UR
#define SYRK_THREAD_UC SSYRK_THREAD_UC
#define SYRK_THREAD_LR SSYRK_THREAD_LN
#define SYRK_THREAD_LC SSYRK_THREAD_LT
#define HERK_THREAD_UN SSYRK_THREAD_UN
#define HERK_THREAD_UT SSYRK_THREAD_UT
#define HERK_THREAD_LN SSYRK_THREAD_LN
#define HERK_THREAD_LT SSYRK_THREAD_LT
#define HERK_THREAD_UR SSYRK_THREAD_UR
#define HERK_THREAD_UC SSYRK_THREAD_UC
#define HERK_THREAD_LR SSYRK_THREAD_LN
#define HERK_THREAD_LC SSYRK_THREAD_LT
#define OMATCOPY_K_CN SOMATCOPY_K_CN
#define OMATCOPY_K_RN SOMATCOPY_K_RN
#define OMATCOPY_K_CT SOMATCOPY_K_CT
#define OMATCOPY_K_RT SOMATCOPY_K_RT
#define IMATCOPY_K_CN SIMATCOPY_K_CN
#define IMATCOPY_K_RN SIMATCOPY_K_RN
#define IMATCOPY_K_CT SIMATCOPY_K_CT
#define IMATCOPY_K_RT SIMATCOPY_K_RT
#define GEADD_K SGEADD_K
#define GEMM_SMALL_KERNEL_NN SGEMM_SMALL_KERNEL_NN
#define GEMM_SMALL_KERNEL_NT SGEMM_SMALL_KERNEL_NT
#define GEMM_SMALL_KERNEL_TN SGEMM_SMALL_KERNEL_TN
#define GEMM_SMALL_KERNEL_TT SGEMM_SMALL_KERNEL_TT
#define GEMM_SMALL_KERNEL_B0_NN SGEMM_SMALL_KERNEL_B0_NN
#define GEMM_SMALL_KERNEL_B0_NT SGEMM_SMALL_KERNEL_B0_NT
#define GEMM_SMALL_KERNEL_B0_TN SGEMM_SMALL_KERNEL_B0_TN
#define GEMM_SMALL_KERNEL_B0_TT SGEMM_SMALL_KERNEL_B0_TT
#endif
#else
#define AMAX_K SAMAX_K
@@ -673,14 +976,14 @@
#define GEMV_S SGEMV_S
#define GEMV_D SGEMV_D
#define SYMV_U SSYMV_U
#define SYMV_L SSYMV_L
#define GERU_K SGERU_K
#define GERC_K SGERC_K
#define GERV_K SGERV_K
#define GERD_K SGERD_K
#define SYMV_U SSYMV_U
#define SYMV_L SSYMV_L
#define SYMV_THREAD_U SSYMV_THREAD_U
#define SYMV_THREAD_L SSYMV_THREAD_L
@@ -945,6 +1248,17 @@
#define IMATCOPY_K_RT SIMATCOPY_K_RT
#define GEADD_K SGEADD_K
#define GEMM_SMALL_KERNEL_NN SGEMM_SMALL_KERNEL_NN
#define GEMM_SMALL_KERNEL_NT SGEMM_SMALL_KERNEL_NT
#define GEMM_SMALL_KERNEL_TN SGEMM_SMALL_KERNEL_TN
#define GEMM_SMALL_KERNEL_TT SGEMM_SMALL_KERNEL_TT
#define GEMM_SMALL_KERNEL_B0_NN SGEMM_SMALL_KERNEL_B0_NN
#define GEMM_SMALL_KERNEL_B0_NT SGEMM_SMALL_KERNEL_B0_NT
#define GEMM_SMALL_KERNEL_B0_TN SGEMM_SMALL_KERNEL_B0_TN
#define GEMM_SMALL_KERNEL_B0_TT SGEMM_SMALL_KERNEL_B0_TT
#endif
#else
#ifdef XDOUBLE
@@ -1772,6 +2086,46 @@
#define GEADD_K ZGEADD_K
#define GEMM_SMALL_KERNEL_NN ZGEMM_SMALL_KERNEL_NN
#define GEMM_SMALL_KERNEL_NT ZGEMM_SMALL_KERNEL_NT
#define GEMM_SMALL_KERNEL_NR ZGEMM_SMALL_KERNEL_NR
#define GEMM_SMALL_KERNEL_NC ZGEMM_SMALL_KERNEL_NC
#define GEMM_SMALL_KERNEL_TN ZGEMM_SMALL_KERNEL_TN
#define GEMM_SMALL_KERNEL_TT ZGEMM_SMALL_KERNEL_TT
#define GEMM_SMALL_KERNEL_TR ZGEMM_SMALL_KERNEL_TR
#define GEMM_SMALL_KERNEL_TC ZGEMM_SMALL_KERNEL_TC
#define GEMM_SMALL_KERNEL_RN ZGEMM_SMALL_KERNEL_RN
#define GEMM_SMALL_KERNEL_RT ZGEMM_SMALL_KERNEL_RT
#define GEMM_SMALL_KERNEL_RR ZGEMM_SMALL_KERNEL_RR
#define GEMM_SMALL_KERNEL_RC ZGEMM_SMALL_KERNEL_RC
#define GEMM_SMALL_KERNEL_CN ZGEMM_SMALL_KERNEL_CN
#define GEMM_SMALL_KERNEL_CT ZGEMM_SMALL_KERNEL_CT
#define GEMM_SMALL_KERNEL_CR ZGEMM_SMALL_KERNEL_CR
#define GEMM_SMALL_KERNEL_CC ZGEMM_SMALL_KERNEL_CC
#define GEMM_SMALL_KERNEL_B0_NN ZGEMM_SMALL_KERNEL_B0_NN
#define GEMM_SMALL_KERNEL_B0_NT ZGEMM_SMALL_KERNEL_B0_NT
#define GEMM_SMALL_KERNEL_B0_NR ZGEMM_SMALL_KERNEL_B0_NR
#define GEMM_SMALL_KERNEL_B0_NC ZGEMM_SMALL_KERNEL_B0_NC
#define GEMM_SMALL_KERNEL_B0_TN ZGEMM_SMALL_KERNEL_B0_TN
#define GEMM_SMALL_KERNEL_B0_TT ZGEMM_SMALL_KERNEL_B0_TT
#define GEMM_SMALL_KERNEL_B0_TR ZGEMM_SMALL_KERNEL_B0_TR
#define GEMM_SMALL_KERNEL_B0_TC ZGEMM_SMALL_KERNEL_B0_TC
#define GEMM_SMALL_KERNEL_B0_RN ZGEMM_SMALL_KERNEL_B0_RN
#define GEMM_SMALL_KERNEL_B0_RT ZGEMM_SMALL_KERNEL_B0_RT
#define GEMM_SMALL_KERNEL_B0_RR ZGEMM_SMALL_KERNEL_B0_RR
#define GEMM_SMALL_KERNEL_B0_RC ZGEMM_SMALL_KERNEL_B0_RC
#define GEMM_SMALL_KERNEL_B0_CN ZGEMM_SMALL_KERNEL_B0_CN
#define GEMM_SMALL_KERNEL_B0_CT ZGEMM_SMALL_KERNEL_B0_CT
#define GEMM_SMALL_KERNEL_B0_CR ZGEMM_SMALL_KERNEL_B0_CR
#define GEMM_SMALL_KERNEL_B0_CC ZGEMM_SMALL_KERNEL_B0_CC
#else
#define AMAX_K CAMAX_K
@@ -2195,6 +2549,46 @@
#define GEADD_K CGEADD_K
#define GEMM_SMALL_KERNEL_NN CGEMM_SMALL_KERNEL_NN
#define GEMM_SMALL_KERNEL_NT CGEMM_SMALL_KERNEL_NT
#define GEMM_SMALL_KERNEL_NR CGEMM_SMALL_KERNEL_NR
#define GEMM_SMALL_KERNEL_NC CGEMM_SMALL_KERNEL_NC
#define GEMM_SMALL_KERNEL_TN CGEMM_SMALL_KERNEL_TN
#define GEMM_SMALL_KERNEL_TT CGEMM_SMALL_KERNEL_TT
#define GEMM_SMALL_KERNEL_TR CGEMM_SMALL_KERNEL_TR
#define GEMM_SMALL_KERNEL_TC CGEMM_SMALL_KERNEL_TC
#define GEMM_SMALL_KERNEL_RN CGEMM_SMALL_KERNEL_RN
#define GEMM_SMALL_KERNEL_RT CGEMM_SMALL_KERNEL_RT
#define GEMM_SMALL_KERNEL_RR CGEMM_SMALL_KERNEL_RR
#define GEMM_SMALL_KERNEL_RC CGEMM_SMALL_KERNEL_RC
#define GEMM_SMALL_KERNEL_CN CGEMM_SMALL_KERNEL_CN
#define GEMM_SMALL_KERNEL_CT CGEMM_SMALL_KERNEL_CT
#define GEMM_SMALL_KERNEL_CR CGEMM_SMALL_KERNEL_CR
#define GEMM_SMALL_KERNEL_CC CGEMM_SMALL_KERNEL_CC
#define GEMM_SMALL_KERNEL_B0_NN CGEMM_SMALL_KERNEL_B0_NN
#define GEMM_SMALL_KERNEL_B0_NT CGEMM_SMALL_KERNEL_B0_NT
#define GEMM_SMALL_KERNEL_B0_NR CGEMM_SMALL_KERNEL_B0_NR
#define GEMM_SMALL_KERNEL_B0_NC CGEMM_SMALL_KERNEL_B0_NC
#define GEMM_SMALL_KERNEL_B0_TN CGEMM_SMALL_KERNEL_B0_TN
#define GEMM_SMALL_KERNEL_B0_TT CGEMM_SMALL_KERNEL_B0_TT
#define GEMM_SMALL_KERNEL_B0_TR CGEMM_SMALL_KERNEL_B0_TR
#define GEMM_SMALL_KERNEL_B0_TC CGEMM_SMALL_KERNEL_B0_TC
#define GEMM_SMALL_KERNEL_B0_RN CGEMM_SMALL_KERNEL_B0_RN
#define GEMM_SMALL_KERNEL_B0_RT CGEMM_SMALL_KERNEL_B0_RT
#define GEMM_SMALL_KERNEL_B0_RR CGEMM_SMALL_KERNEL_B0_RR
#define GEMM_SMALL_KERNEL_B0_RC CGEMM_SMALL_KERNEL_B0_RC
#define GEMM_SMALL_KERNEL_B0_CN CGEMM_SMALL_KERNEL_B0_CN
#define GEMM_SMALL_KERNEL_B0_CT CGEMM_SMALL_KERNEL_B0_CT
#define GEMM_SMALL_KERNEL_B0_CR CGEMM_SMALL_KERNEL_B0_CR
#define GEMM_SMALL_KERNEL_B0_CC CGEMM_SMALL_KERNEL_B0_CC
#endif
#endif
@@ -2202,6 +2596,9 @@
#if defined(ARCH_X86) || defined(ARCH_X86_64) || defined(ARCH_IA64) || defined(ARCH_MIPS64) || defined(ARCH_ARM64)
extern BLASLONG gemm_offset_a;
extern BLASLONG gemm_offset_b;
extern BLASLONG shgemm_p;
extern BLASLONG shgemm_q;
extern BLASLONG shgemm_r;
extern BLASLONG sgemm_p;
extern BLASLONG sgemm_q;
extern BLASLONG sgemm_r;
@@ -2239,7 +2636,17 @@ typedef struct {
BLASLONG prea, preb, prec, pred;
#endif
//for gemm_batch
void * routine;
int routine_mode;
} blas_arg_t;
#ifdef SMALL_MATRIX_OPT
#define BLAS_SMALL_OPT 0x10000U
#define BLAS_SMALL_B0_OPT 0x30000U
#endif
#endif
#ifdef XDOUBLE

View File

@@ -35,6 +35,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#define MB __sync_synchronize()
#define WMB __sync_synchronize()
#define RMB __sync_synchronize()
#define INLINE inline
@@ -42,6 +43,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#ifndef ASSEMBLER
#if !defined(MIPS24K)
static inline unsigned int rpcc(void){
unsigned long ret;
@@ -52,6 +54,7 @@ static inline unsigned int rpcc(void){
return ret;
}
#define RPCC_DEFINED
#endif
static inline int blas_quickdivide(blasint x, blasint y){
return x / y;
@@ -91,7 +94,7 @@ REALNAME:
#endif
#define HUGE_PAGESIZE ( 4 << 20)
#define BUFFER_SIZE (16 << 20)
#define BUFFER_SIZE (16 << 21)
#define BASE_ADDRESS (START_ADDRESS - BUFFER_SIZE * MAX_CPU_NUMBER)

View File

@@ -73,6 +73,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#define MB __sync_synchronize()
#define WMB __sync_synchronize()
#define RMB __sync_synchronize()
#define INLINE inline
@@ -226,7 +227,7 @@ REALNAME: ;\
#define SEEK_ADDRESS
#define BUFFER_SIZE ( 32 << 20)
#define BUFFER_SIZE ( 32 << 21)
#if defined(LOONGSON3A)
#define PAGESIZE (16UL << 10)

View File

@@ -47,6 +47,100 @@ typedef struct {
int dtb_entries;
int offsetA, offsetB, align;
#ifdef BUILD_HALF
int shgemm_p, shgemm_q, shgemm_r;
int shgemm_unroll_m, shgemm_unroll_n, shgemm_unroll_mn;
float (*shamax_k) (BLASLONG, float *, BLASLONG);
float (*shamin_k) (BLASLONG, float *, BLASLONG);
float (*shmax_k) (BLASLONG, float *, BLASLONG);
float (*shmin_k) (BLASLONG, float *, BLASLONG);
BLASLONG (*ishamax_k)(BLASLONG, float *, BLASLONG);
BLASLONG (*ishamin_k)(BLASLONG, float *, BLASLONG);
BLASLONG (*ishmax_k) (BLASLONG, float *, BLASLONG);
BLASLONG (*ishmin_k) (BLASLONG, float *, BLASLONG);
float (*shnrm2_k) (BLASLONG, float *, BLASLONG);
float (*shasum_k) (BLASLONG, float *, BLASLONG);
float (*shsum_k) (BLASLONG, float *, BLASLONG);
int (*shcopy_k) (BLASLONG, float *, BLASLONG, float *, BLASLONG);
float (*shdot_k) (BLASLONG, float *, BLASLONG, float *, BLASLONG);
double (*dshdot_k) (BLASLONG, float *, BLASLONG, float *, BLASLONG);
int (*shrot_k) (BLASLONG, float *, BLASLONG, float *, BLASLONG, float, float);
int (*shaxpy_k) (BLASLONG, BLASLONG, BLASLONG, float, float *, BLASLONG, float *, BLASLONG, float *, BLASLONG);
int (*shscal_k) (BLASLONG, BLASLONG, BLASLONG, float, float *, BLASLONG, float *, BLASLONG, float *, BLASLONG);
int (*shswap_k) (BLASLONG, BLASLONG, BLASLONG, float, float *, BLASLONG, float *, BLASLONG, float *, BLASLONG);
int (*shgemv_n) (BLASLONG, BLASLONG, BLASLONG, float, float *, BLASLONG, float *, BLASLONG, float *, BLASLONG, float *);
int (*shgemv_t) (BLASLONG, BLASLONG, BLASLONG, float, float *, BLASLONG, float *, BLASLONG, float *, BLASLONG, float *);
int (*shger_k) (BLASLONG, BLASLONG, BLASLONG, float, float *, BLASLONG, float *, BLASLONG, float *, BLASLONG, float *);
int (*shsymv_L) (BLASLONG, BLASLONG, float, float *, BLASLONG, float *, BLASLONG, float *, BLASLONG, float *);
int (*shsymv_U) (BLASLONG, BLASLONG, float, float *, BLASLONG, float *, BLASLONG, float *, BLASLONG, float *);
int (*shgemm_kernel )(BLASLONG, BLASLONG, BLASLONG, float, bfloat16 *, bfloat16 *, float *, BLASLONG);
int (*shgemm_beta )(BLASLONG, BLASLONG, BLASLONG, float, bfloat16 *, BLASLONG, bfloat16 *, BLASLONG, float *, BLASLONG);
int (*shgemm_incopy )(BLASLONG, BLASLONG, bfloat16 *, BLASLONG, bfloat16 *);
int (*shgemm_itcopy )(BLASLONG, BLASLONG, bfloat16 *, BLASLONG, bfloat16 *);
int (*shgemm_oncopy )(BLASLONG, BLASLONG, bfloat16 *, BLASLONG, bfloat16 *);
int (*shgemm_otcopy )(BLASLONG, BLASLONG, bfloat16 *, BLASLONG, bfloat16 *);
int (*shtrsm_kernel_LN)(BLASLONG, BLASLONG, BLASLONG, float, float *, float *, float *, BLASLONG, BLASLONG);
int (*shtrsm_kernel_LT)(BLASLONG, BLASLONG, BLASLONG, float, float *, float *, float *, BLASLONG, BLASLONG);
int (*shtrsm_kernel_RN)(BLASLONG, BLASLONG, BLASLONG, float, float *, float *, float *, BLASLONG, BLASLONG);
int (*shtrsm_kernel_RT)(BLASLONG, BLASLONG, BLASLONG, float, float *, float *, float *, BLASLONG, BLASLONG);
int (*shtrsm_iunucopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, float *);
int (*shtrsm_iunncopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, float *);
int (*shtrsm_iutucopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, float *);
int (*shtrsm_iutncopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, float *);
int (*shtrsm_ilnucopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, float *);
int (*shtrsm_ilnncopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, float *);
int (*shtrsm_iltucopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, float *);
int (*shtrsm_iltncopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, float *);
int (*shtrsm_ounucopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, float *);
int (*shtrsm_ounncopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, float *);
int (*shtrsm_outucopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, float *);
int (*shtrsm_outncopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, float *);
int (*shtrsm_olnucopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, float *);
int (*shtrsm_olnncopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, float *);
int (*shtrsm_oltucopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, float *);
int (*shtrsm_oltncopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, float *);
int (*shtrmm_kernel_RN)(BLASLONG, BLASLONG, BLASLONG, float, float *, float *, float *, BLASLONG, BLASLONG);
int (*shtrmm_kernel_RT)(BLASLONG, BLASLONG, BLASLONG, float, float *, float *, float *, BLASLONG, BLASLONG);
int (*shtrmm_kernel_LN)(BLASLONG, BLASLONG, BLASLONG, float, float *, float *, float *, BLASLONG, BLASLONG);
int (*shtrmm_kernel_LT)(BLASLONG, BLASLONG, BLASLONG, float, float *, float *, float *, BLASLONG, BLASLONG);
int (*shtrmm_iunucopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, BLASLONG, float *);
int (*shtrmm_iunncopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, BLASLONG, float *);
int (*shtrmm_iutucopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, BLASLONG, float *);
int (*shtrmm_iutncopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, BLASLONG, float *);
int (*shtrmm_ilnucopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, BLASLONG, float *);
int (*shtrmm_ilnncopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, BLASLONG, float *);
int (*shtrmm_iltucopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, BLASLONG, float *);
int (*shtrmm_iltncopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, BLASLONG, float *);
int (*shtrmm_ounucopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, BLASLONG, float *);
int (*shtrmm_ounncopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, BLASLONG, float *);
int (*shtrmm_outucopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, BLASLONG, float *);
int (*shtrmm_outncopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, BLASLONG, float *);
int (*shtrmm_olnucopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, BLASLONG, float *);
int (*shtrmm_olnncopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, BLASLONG, float *);
int (*shtrmm_oltucopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, BLASLONG, float *);
int (*shtrmm_oltncopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, BLASLONG, float *);
int (*shsymm_iutcopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, BLASLONG, float *);
int (*shsymm_iltcopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, BLASLONG, float *);
int (*shsymm_outcopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, BLASLONG, float *);
int (*shsymm_oltcopy)(BLASLONG, BLASLONG, float *, BLASLONG, BLASLONG, BLASLONG, float *);
int (*shneg_tcopy) (BLASLONG, BLASLONG, float *, BLASLONG, float *);
int (*shlaswp_ncopy) (BLASLONG, BLASLONG, BLASLONG, float *, BLASLONG, blasint *, float *);
#endif
int sgemm_p, sgemm_q, sgemm_r;
int sgemm_unroll_m, sgemm_unroll_n, sgemm_unroll_mn;
@@ -81,9 +175,15 @@ BLASLONG (*ismin_k) (BLASLONG, float *, BLASLONG);
int (*ssymv_L) (BLASLONG, BLASLONG, float, float *, BLASLONG, float *, BLASLONG, float *, BLASLONG, float *);
int (*ssymv_U) (BLASLONG, BLASLONG, float, float *, BLASLONG, float *, BLASLONG, float *, BLASLONG, float *);
#ifdef ARCH_X86_64
void (*sgemm_direct) (BLASLONG, BLASLONG, BLASLONG, float *, BLASLONG , float *, BLASLONG , float * , BLASLONG);
int (*sgemm_direct_performant) (BLASLONG M, BLASLONG N, BLASLONG K);
#endif
int (*sgemm_kernel )(BLASLONG, BLASLONG, BLASLONG, float, float *, float *, float *, BLASLONG);
int (*sgemm_beta )(BLASLONG, BLASLONG, BLASLONG, float, float *, BLASLONG, float *, BLASLONG, float *, BLASLONG);
int (*sgemm_incopy )(BLASLONG, BLASLONG, float *, BLASLONG, float *);
int (*sgemm_itcopy )(BLASLONG, BLASLONG, float *, BLASLONG, float *);
int (*sgemm_oncopy )(BLASLONG, BLASLONG, float *, BLASLONG, float *);
@@ -907,6 +1007,15 @@ extern gotoblas_t *gotoblas;
#define HAVE_EX_L2 gotoblas -> exclusive_cache
#ifdef BUILD_HALF
#define SHGEMM_P gotoblas -> shgemm_p
#define SHGEMM_Q gotoblas -> shgemm_q
#define SHGEMM_R gotoblas -> shgemm_r
#define SHGEMM_UNROLL_M gotoblas -> shgemm_unroll_m
#define SHGEMM_UNROLL_N gotoblas -> shgemm_unroll_n
#define SHGEMM_UNROLL_MN gotoblas -> shgemm_unroll_mn
#endif
#define SGEMM_P gotoblas -> sgemm_p
#define SGEMM_Q gotoblas -> sgemm_q
#define SGEMM_R gotoblas -> sgemm_r
@@ -984,6 +1093,19 @@ extern gotoblas_t *gotoblas;
#define HAVE_EX_L2 0
#endif
#ifdef BUILD_HALF
#define SHGEMM_P SHGEMM_DEFAULT_P
#define SHGEMM_Q SHGEMM_DEFAULT_Q
#define SHGEMM_R SHGEMM_DEFAULT_R
#define SHGEMM_UNROLL_M SHGEMM_DEFAULT_UNROLL_M
#define SHGEMM_UNROLL_N SHGEMM_DEFAULT_UNROLL_N
#ifdef SHGEMM_DEFAULT_UNROLL_MN
#define SHGEMM_UNROLL_MN SHGEMM_DEFAULT_UNROLL_MN
#else
#define SHGEMM_UNROLL_MN MAX((SHGEMM_UNROLL_M), (SHGEMM_UNROLL_N))
#endif
#endif
#define SGEMM_P SGEMM_DEFAULT_P
#define SGEMM_Q SGEMM_DEFAULT_Q
#define SGEMM_R SGEMM_DEFAULT_R
@@ -1119,6 +1241,18 @@ extern gotoblas_t *gotoblas;
#define GEMM_DEFAULT_R DGEMM_DEFAULT_R
#define GEMM_DEFAULT_UNROLL_M DGEMM_DEFAULT_UNROLL_M
#define GEMM_DEFAULT_UNROLL_N DGEMM_DEFAULT_UNROLL_N
#elif defined(HALF)
#define GEMM_P SHGEMM_P
#define GEMM_Q SHGEMM_Q
#define GEMM_R SHGEMM_R
#define GEMM_UNROLL_M SHGEMM_UNROLL_M
#define GEMM_UNROLL_N SHGEMM_UNROLL_N
#define GEMM_UNROLL_MN SHGEMM_UNROLL_MN
#define GEMM_DEFAULT_P SHGEMM_DEFAULT_P
#define GEMM_DEFAULT_Q SHGEMM_DEFAULT_Q
#define GEMM_DEFAULT_R SHGEMM_DEFAULT_R
#define GEMM_DEFAULT_UNROLL_M SHGEMM_DEFAULT_UNROLL_M
#define GEMM_DEFAULT_UNROLL_N SHGEMM_DEFAULT_UNROLL_N
#else
#define GEMM_P SGEMM_P
#define GEMM_Q SGEMM_Q
@@ -1204,28 +1338,32 @@ extern gotoblas_t *gotoblas;
#define GEMM_THREAD gemm_thread_n
#endif
#ifndef SHGEMM_DEFAULT_R
#define SHGEMM_DEFAULT_R (((BUFFER_SIZE - ((SHGEMM_DEFAULT_P * SHGEMM_DEFAULT_Q * 4 + GEMM_DEFAULT_OFFSET_A + GEMM_DEFAULT_ALIGN) & ~GEMM_DEFAULT_ALIGN)) / (SHGEMM_DEFAULT_Q * 4) - 15) & ~15UL)
#endif
#ifndef SGEMM_DEFAULT_R
#define SGEMM_DEFAULT_R (((BUFFER_SIZE - ((SGEMM_DEFAULT_P * SGEMM_DEFAULT_Q * 4 + GEMM_DEFAULT_OFFSET_A + GEMM_DEFAULT_ALIGN) & ~GEMM_DEFAULT_ALIGN)) / (SGEMM_DEFAULT_Q * 4) - 15) & ~15)
#define SGEMM_DEFAULT_R (((BUFFER_SIZE - ((SGEMM_DEFAULT_P * SGEMM_DEFAULT_Q * 4 + GEMM_DEFAULT_OFFSET_A + GEMM_DEFAULT_ALIGN) & ~GEMM_DEFAULT_ALIGN)) / (SGEMM_DEFAULT_Q * 4) - 15) & ~15UL)
#endif
#ifndef DGEMM_DEFAULT_R
#define DGEMM_DEFAULT_R (((BUFFER_SIZE - ((DGEMM_DEFAULT_P * DGEMM_DEFAULT_Q * 8 + GEMM_DEFAULT_OFFSET_A + GEMM_DEFAULT_ALIGN) & ~GEMM_DEFAULT_ALIGN)) / (DGEMM_DEFAULT_Q * 8) - 15) & ~15)
#define DGEMM_DEFAULT_R (((BUFFER_SIZE - ((DGEMM_DEFAULT_P * DGEMM_DEFAULT_Q * 8 + GEMM_DEFAULT_OFFSET_A + GEMM_DEFAULT_ALIGN) & ~GEMM_DEFAULT_ALIGN)) / (DGEMM_DEFAULT_Q * 8) - 15) & ~15UL)
#endif
#ifndef QGEMM_DEFAULT_R
#define QGEMM_DEFAULT_R (((BUFFER_SIZE - ((QGEMM_DEFAULT_P * QGEMM_DEFAULT_Q * 16 + GEMM_DEFAULT_OFFSET_A + GEMM_DEFAULT_ALIGN) & ~GEMM_DEFAULT_ALIGN)) / (QGEMM_DEFAULT_Q * 16) - 15) & ~15)
#define QGEMM_DEFAULT_R (((BUFFER_SIZE - ((QGEMM_DEFAULT_P * QGEMM_DEFAULT_Q * 16 + GEMM_DEFAULT_OFFSET_A + GEMM_DEFAULT_ALIGN) & ~GEMM_DEFAULT_ALIGN)) / (QGEMM_DEFAULT_Q * 16) - 15) & ~15UL)
#endif
#ifndef CGEMM_DEFAULT_R
#define CGEMM_DEFAULT_R (((BUFFER_SIZE - ((CGEMM_DEFAULT_P * CGEMM_DEFAULT_Q * 8 + GEMM_DEFAULT_OFFSET_A + GEMM_DEFAULT_ALIGN) & ~GEMM_DEFAULT_ALIGN)) / (CGEMM_DEFAULT_Q * 8) - 15) & ~15)
#define CGEMM_DEFAULT_R (((BUFFER_SIZE - ((CGEMM_DEFAULT_P * CGEMM_DEFAULT_Q * 8 + GEMM_DEFAULT_OFFSET_A + GEMM_DEFAULT_ALIGN) & ~GEMM_DEFAULT_ALIGN)) / (CGEMM_DEFAULT_Q * 8) - 15) & ~15UL)
#endif
#ifndef ZGEMM_DEFAULT_R
#define ZGEMM_DEFAULT_R (((BUFFER_SIZE - ((ZGEMM_DEFAULT_P * ZGEMM_DEFAULT_Q * 16 + GEMM_DEFAULT_OFFSET_A + GEMM_DEFAULT_ALIGN) & ~GEMM_DEFAULT_ALIGN)) / (ZGEMM_DEFAULT_Q * 16) - 15) & ~15)
#define ZGEMM_DEFAULT_R (((BUFFER_SIZE - ((ZGEMM_DEFAULT_P * ZGEMM_DEFAULT_Q * 16 + GEMM_DEFAULT_OFFSET_A + GEMM_DEFAULT_ALIGN) & ~GEMM_DEFAULT_ALIGN)) / (ZGEMM_DEFAULT_Q * 16) - 15) & ~15UL)
#endif
#ifndef XGEMM_DEFAULT_R
#define XGEMM_DEFAULT_R (((BUFFER_SIZE - ((XGEMM_DEFAULT_P * XGEMM_DEFAULT_Q * 32 + GEMM_DEFAULT_OFFSET_A + GEMM_DEFAULT_ALIGN) & ~GEMM_DEFAULT_ALIGN)) / (XGEMM_DEFAULT_Q * 32) - 15) & ~15)
#define XGEMM_DEFAULT_R (((BUFFER_SIZE - ((XGEMM_DEFAULT_P * XGEMM_DEFAULT_Q * 32 + GEMM_DEFAULT_OFFSET_A + GEMM_DEFAULT_ALIGN) & ~GEMM_DEFAULT_ALIGN)) / (XGEMM_DEFAULT_Q * 32) - 15) & ~15UL)
#endif
#ifndef SNUMOPT

View File

@@ -68,12 +68,14 @@
#endif
#if defined(POWER8) || defined(POWER9)
#if defined(POWER8) || defined(POWER9) || defined(POWER10)
#define MB __asm__ __volatile__ ("eieio":::"memory")
#define WMB __asm__ __volatile__ ("eieio":::"memory")
#define RMB __asm__ __volatile__ ("eieio":::"memory")
#else
#define MB __asm__ __volatile__ ("sync")
#define WMB __asm__ __volatile__ ("sync")
#define RMB __asm__ __volatile__ ("sync")
#endif
#define INLINE inline
@@ -103,6 +105,7 @@ static void INLINE blas_lock(volatile unsigned long *address){
" bne- 1f\n"
" stwcx. %2,0, %1\n"
" bne- 0b\n"
" isync\n"
"1: "
: "=&r"(ret)
: "r"(address), "r" (val)
@@ -270,7 +273,7 @@ static inline int blas_quickdivide(blasint x, blasint y){
#define HAVE_PREFETCH
#endif
#if defined(POWER3) || defined(POWER6) || defined(PPCG4) || defined(CELL) || defined(POWER8) || defined(POWER9) || defined(PPC970)
#if defined(POWER3) || defined(POWER6) || defined(PPCG4) || defined(CELL) || defined(POWER8) || defined(POWER9) || defined(POWER10) || defined(PPC970)
#define DCBT_ARG 0
#else
#define DCBT_ARG 8
@@ -292,7 +295,7 @@ static inline int blas_quickdivide(blasint x, blasint y){
#define L1_PREFETCH dcbtst
#endif
#if defined(POWER8) || defined(POWER9)
#if defined(POWER8) || defined(POWER9) || defined(POWER10)
#define L1_DUALFETCH
#define L1_PREFETCHSIZE (16 + 128 * 100)
#define L1_PREFETCH dcbtst
@@ -841,7 +844,7 @@ Lmcount$lazy_ptr:
#define BUFFER_SIZE ( 2 << 20)
#elif defined(PPC440FP2)
#define BUFFER_SIZE ( 16 << 20)
#elif defined(POWER8) || defined(POWER9)
#elif defined(POWER8) || defined(POWER9) || defined(POWER10)
#define BUFFER_SIZE ( 64 << 20)
#else
#define BUFFER_SIZE ( 16 << 20)

View File

@@ -45,6 +45,10 @@
#define SSYMV_THREAD_U ssymv_thread_U
#define SSYMV_THREAD_L ssymv_thread_L
#define SGEMM_DIRECT_PERFORMANT sgemm_direct_performant
#define SGEMM_DIRECT sgemm_direct
#define SGEMM_ONCOPY sgemm_oncopy
#define SGEMM_OTCOPY sgemm_otcopy
@@ -160,6 +164,16 @@
#define SGEADD_K sgeadd_k
#define SGEMM_SMALL_KERNEL_NN sgemm_small_kernel_nn
#define SGEMM_SMALL_KERNEL_NT sgemm_small_kernel_nt
#define SGEMM_SMALL_KERNEL_TN sgemm_small_kernel_tn
#define SGEMM_SMALL_KERNEL_TT sgemm_small_kernel_tt
#define SGEMM_SMALL_KERNEL_B0_NN sgemm_small_kernel_b0_nn
#define SGEMM_SMALL_KERNEL_B0_NT sgemm_small_kernel_b0_nt
#define SGEMM_SMALL_KERNEL_B0_TN sgemm_small_kernel_b0_tn
#define SGEMM_SMALL_KERNEL_B0_TT sgemm_small_kernel_b0_tt
#else
#define SAMAX_K gotoblas -> samax_k
@@ -204,6 +218,14 @@
#define SSYMV_THREAD_U ssymv_thread_U
#define SSYMV_THREAD_L ssymv_thread_L
#ifdef ARCH_X86_64
#define SGEMM_DIRECT_PERFORMANT gotoblas -> sgemm_direct_performant
#define SGEMM_DIRECT gotoblas -> sgemm_direct
#else
#define SGEMM_DIRECT_PERFORMANT sgemm_direct_performant
#define SGEMM_DIRECT sgemm_direct
#endif
#define SGEMM_ONCOPY gotoblas -> sgemm_oncopy
#define SGEMM_OTCOPY gotoblas -> sgemm_otcopy
#define SGEMM_INCOPY gotoblas -> sgemm_incopy

65
common_sh.h Normal file
View File

@@ -0,0 +1,65 @@
#ifndef COMMON_SH_H
#define COMMON_SH_H
#ifndef DYNAMIC_ARCH
#define SHGEMM_ONCOPY shgemm_oncopy
#define SHGEMM_OTCOPY shgemm_otcopy
#if SHGEMM_DEFAULT_UNROLL_M == SHGEMM_DEFAULT_UNROLL_N
#define SHGEMM_INCOPY shgemm_oncopy
#define SHGEMM_ITCOPY shgemm_otcopy
#else
#define SHGEMM_INCOPY shgemm_incopy
#define SHGEMM_ITCOPY shgemm_itcopy
#endif
#define SHGEMM_BETA shgemm_beta
#define SHGEMM_KERNEL shgemm_kernel
#else
#define SHGEMM_ONCOPY gotoblas -> shgemm_oncopy
#define SHGEMM_OTCOPY gotoblas -> shgemm_otcopy
#define SHGEMM_INCOPY gotoblas -> shgemm_incopy
#define SHGEMM_ITCOPY gotoblas -> shgemm_itcopy
#define SHGEMM_BETA gotoblas -> shgemm_beta
#define SHGEMM_KERNEL gotoblas -> shgemm_kernel
#endif
#define SHGEMM_NN shgemm_nn
#define SHGEMM_CN shgemm_tn
#define SHGEMM_TN shgemm_tn
#define SHGEMM_NC shgemm_nt
#define SHGEMM_NT shgemm_nt
#define SHGEMM_CC shgemm_tt
#define SHGEMM_CT shgemm_tt
#define SHGEMM_TC shgemm_tt
#define SHGEMM_TT shgemm_tt
#define SHGEMM_NR shgemm_nn
#define SHGEMM_TR shgemm_tn
#define SHGEMM_CR shgemm_tn
#define SHGEMM_RN shgemm_nn
#define SHGEMM_RT shgemm_nt
#define SHGEMM_RC shgemm_nt
#define SHGEMM_RR shgemm_nn
#define SHGEMM_THREAD_NN shgemm_thread_nn
#define SHGEMM_THREAD_CN shgemm_thread_tn
#define SHGEMM_THREAD_TN shgemm_thread_tn
#define SHGEMM_THREAD_NC shgemm_thread_nt
#define SHGEMM_THREAD_NT shgemm_thread_nt
#define SHGEMM_THREAD_CC shgemm_thread_tt
#define SHGEMM_THREAD_CT shgemm_thread_tt
#define SHGEMM_THREAD_TC shgemm_thread_tt
#define SHGEMM_THREAD_TT shgemm_thread_tt
#define SHGEMM_THREAD_NR shgemm_thread_nn
#define SHGEMM_THREAD_TR shgemm_thread_tn
#define SHGEMM_THREAD_CR shgemm_thread_tn
#define SHGEMM_THREAD_RN shgemm_thread_nn
#define SHGEMM_THREAD_RT shgemm_thread_nt
#define SHGEMM_THREAD_RC shgemm_thread_nt
#define SHGEMM_THREAD_RR shgemm_thread_nn
#endif

Some files were not shown because too many files have changed in this diff Show More