Zhang Xianyi
77460ac255
Fix gemm_batch bug for SMALL_MATRIX_OPT=1.
2020-12-14 09:50:47 +08:00
Zhang Xianyi
88e6806e3f
Init cblas_?gemm_batch implementation.
2020-12-14 09:50:32 +08:00
Xianyi Zhang
4130d1732e
Refs #2587 fix small matrix c/zgemm bug.
2020-08-28 22:36:36 +08:00
Xianyi Zhang
255b6dd0fa
Merge branch 'develop' into small_matrices
2020-08-28 21:38:58 +08:00
Xianyi Zhang
741d6c5cb8
Refs #2587 Add small matrix optimization reference kernel for c/zgemm.
2020-08-28 21:00:54 +08:00
Martin Kroeker
514a3d7d63
Merge pull request #2798 from kadler/aix-cpuid
...
Fix compile error on AIX cpuid detection
2020-08-28 08:30:59 +02:00
Kevin Adler
085aae8bdb
Fix compile error on AIX cpuid detection
...
In 589c74a the cpuid detection was changed to use systemcfg, but a copy
and paste error was introduced during some refactoring that caused
POWER7 detection to reference CPUTYPE_POWER7 (which doesn't exist)
instead of CPUTYPE_POWER6.
2020-08-27 23:10:45 -05:00
Xianyi Zhang
712ca43069
Change a1b0 gemm to b0 gemm.
2020-08-28 07:55:27 +08:00
Martin Kroeker
5c6c2cd4f6
Merge pull request #2775 from Guobing-Chen/Fix_OMP_threads_specify
...
Fix OMP num specify issue
2020-08-24 20:18:09 +02:00
Martin Kroeker
e54be4ba1c
Merge pull request #2792 from pkubaj/patch-1
...
Add aliases for armv6, armv7
2020-08-24 08:03:39 +02:00
pkubaj
48a1364e10
Add aliases for armv6, armv7
...
FreeBSD uses those names for 32-bit ARM variants.
2020-08-23 18:50:19 +00:00
Chen, Guobing
0c1c903f1e
Fix OMP num specify issue
...
In the current code, all available CPUs are used when invoking OMP, no
matter what number of threads is specified. This leads to very bad
performance if the workload is small while the number of available CPUs
is big, as a lot of time is wasted on inter-thread sync. Fix this issue by
actually using the number specified by the variable 'num' from the calling API.
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-08-24 02:45:54 +08:00
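The behavior this commit message describes can be illustrated with a small sketch. It is not the OpenBLAS code: the function and both parameter names are hypothetical, with `requested` standing in for the 'num' argument and `available` for the detected CPU count.

```python
def threads_to_use(requested: int, available: int) -> int:
    """Pick the thread count for a parallel region.

    `requested` plays the role of the caller's 'num'; `available` is the
    number of CPUs the runtime could use. Both names are illustrative.
    """
    if available < 1:
        raise ValueError("available must be at least 1")
    if requested < 1:
        # Treating a non-positive request as "no preference" is an assumption.
        return available
    # Honor the request, but never exceed what the machine offers.
    return min(requested, available)
```

With this rule a small workload that asks for 4 threads on a 64-CPU machine runs on 4 threads rather than 64, which is the case the message singles out.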
Martin Kroeker
a073fa870e
Merge pull request #2791 from martin-frbg/issue2787
...
Fix crashes in parallelized x86_64 ZDOT particularly on Windows
2020-08-23 19:33:03 +02:00
Martin Kroeker
b2053239fc
Fix missing dummy parameter (imag part of alpha) of zdot_thread_function
2020-08-23 15:08:16 +02:00
Martin Kroeker
b11bb6e728
Merge pull request #2790 from martin-frbg/issue2789
...
Add OpenMP dependency to pkgconfig information if needed
2020-08-23 14:42:35 +02:00
Martin Kroeker
1840bc5b52
Add OpenMP dependency to pkgconfig file if needed
2020-08-22 13:55:18 +02:00
Martin Kroeker
7c0977c267
Add OpenMP dependency to pkgconfig file if needed
2020-08-22 13:53:44 +02:00
Martin Kroeker
fb3d80c42a
Merge pull request #78 from xianyi/develop
...
rebase
2020-08-22 13:52:29 +02:00
Martin Kroeker
9ee21a0a39
Merge pull request #2780 from Guobing-Chen/CPL_build_support
...
Enable COOPERLAKE build target
2020-08-20 19:54:29 +02:00
Martin Kroeker
bd3207b4b4
Update system.cmake
2020-08-19 22:51:10 +02:00
Martin Kroeker
b8ebfc9335
Update system.cmake
2020-08-19 22:30:19 +02:00
Martin Kroeker
7c1986640b
fallback from cooperlake to skylake if gcc<10
2020-08-19 20:48:39 +02:00
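The run of commits around this one makes the cooperlake compiler option depend on the gcc version. A rough sketch of that decision follows, assuming the fallback flag is `-march=skylake-avx512` (the log only says "skylake") and using a hypothetical helper name; the real logic lives in the build files.

```python
def march_flag(target: str, gcc_version: tuple) -> str:
    """Choose a -march flag for an x86_64 target (illustrative only).

    `gcc_version` is a (major, minor) tuple. The 10.1 threshold comes from
    the commit subjects; the fallback flag name is an assumption.
    """
    if target == "COOPERLAKE":
        if gcc_version >= (10, 1):
            return "-march=cooperlake"
        # Older gcc does not know cooperlake, so fall back.
        return "-march=skylake-avx512"
    if target == "SKYLAKEX":
        return "-march=skylake-avx512"
    raise ValueError(f"unhandled target: {target}")
```

Tuple comparison keeps the version check short: `(9, 3) >= (10, 1)` is false, `(10, 2) >= (10, 1)` is true.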
Martin Kroeker
71d33c952d
Typo fix
2020-08-19 17:44:23 +02:00
Martin Kroeker
6a3c074786
-march=cooperlake requires gcc10
2020-08-19 17:22:12 +02:00
Martin Kroeker
430f741b30
-march=cooperlake requires gcc10
2020-08-19 17:17:53 +02:00
Martin Kroeker
6f4dc7445d
Fix typo
2020-08-19 16:36:55 +02:00
Martin Kroeker
81fbe8d088
-march=cooperlake only available in gcc >= 10
2020-08-19 16:10:15 +02:00
Martin Kroeker
bb9cf766f5
make march=cooperlake option conditional on gcc >= 10.1
2020-08-19 15:06:30 +02:00
Martin Kroeker
75eeb265d7
[WIP] Refactor the driver code for direct SGEMM (#2782)
...
Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available
(on x86_64 targets only for now) in DYNAMIC_ARCH builds
* Add sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt
* Add direct_sgemm functions to the gotoblas struct in common_param.h
* Move sgemm_direct_performant helper to separate file
* Update gemm.c to use macros for sgemm_direct to support dynamic_arch naming via common_s.h
* (Conditionally) add sgemm_direct functions in setparam-ref.c
2020-08-19 14:51:09 +02:00
Martin Kroeker
2c72972570
Merge pull request #2785 from albertziegenhagel/always-generate-pkg-config
...
Do not require pkg-config to generate the *.pc file
2020-08-19 14:42:58 +02:00
Albert Ziegenhagel
6b731d917f
Do not require pkg-config to generate the *.pc file
...
Generating the pkg-config file does not actually depend on pkg-config being available.
2020-08-18 08:48:48 +02:00
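The point of this change is that a *.pc file is plain text, so producing one is template substitution and does not need the pkg-config tool. A minimal sketch of that idea, with a hypothetical template and helper name rather than the project's actual template:

```python
from string import Template

# "$$" is Template's escape for a literal "$", so "${prefix}" and
# "${libdir}" survive into the output for pkg-config to expand later.
PC_TEMPLATE = Template("""\
prefix=$prefix
libdir=$${prefix}/lib
includedir=$${prefix}/include

Name: $name
Description: $description
Version: $version
Libs: -L$${libdir} -l$name
Cflags: -I$${includedir}
""")


def render_pc(prefix: str, name: str, version: str, description: str) -> str:
    """Fill in the template; no external tool is invoked."""
    return PC_TEMPLATE.substitute(
        prefix=prefix, name=name, version=version, description=description
    )
```

pkg-config is only needed later, by whoever consumes the generated file.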
Martin Kroeker
5dcf47cd97
Merge pull request #2784 from martin-frbg/issue2783
...
Add fallback typedef for bfloat16 to openblas_config.h template
2020-08-17 19:06:13 +02:00
Martin Kroeker
aa286e301b
Add typedef for bfloat16 if needed
2020-08-17 15:32:14 +02:00
Martin Kroeker
9f0ef9cdfc
Merge pull request #77 from xianyi/develop
...
rebase
2020-08-17 15:28:15 +02:00
Martin Kroeker
6bfc66663c
revert
2020-08-17 15:20:41 +02:00
Martin Kroeker
a8c6fb9e1c
revert
2020-08-17 15:20:16 +02:00
Martin Kroeker
5ec8f716cf
revert
2020-08-17 15:19:40 +02:00
Martin Kroeker
82f8a0aeba
Update .drone.yml
2020-08-15 15:46:18 +02:00
Martin Kroeker
d57d503c15
Update Makefile
2020-08-15 14:46:26 +02:00
Martin Kroeker
37ac23e8a3
Add simple MT sgemm precision test and INTERFACE64 build
2020-08-15 13:38:05 +02:00
Martin Kroeker
6a93e3b2ba
Add simple sgemm precision test
2020-08-15 13:33:52 +02:00
Martin Kroeker
47ce1dd08f
Update gemm64.cpp
2020-08-15 13:31:28 +02:00
Martin Kroeker
f5fcc5baec
Add trivial gemm test for multithread consistency
2020-08-15 13:30:29 +02:00
Chen, Guobing
e740c4873d
Enable COOPERLAKE build target
...
Enable new build target platform -- COOPERLAKE. This target platform
supports all the SKYLAKEX supported ISAs + avx512bf16. So all the
SKYLAKEX specific kernels/drivers and related code are now extended
to be also active on COOPERLAKE. Besides, new BF16 related kernels
are active under this target.
2020-08-13 06:18:00 +08:00
Martin Kroeker
efdd237a91
Add a dedicated POWER9 build to the Travis CI (#2774)
...
* Add dedicated POWER9 build (using new syntax to ensure it runs as a P9-only containerized job rather than a VM that
might end up on P8 hardware half of the time)
* Bump gcc version for POWER9 build
2020-08-12 23:08:38 +02:00
Martin Kroeker
4573cb2f43
Merge pull request #2765 from martin-frbg/issue2760
...
Add memory barrier to the PPC blas_lock implementation for Linux
2020-08-11 22:40:17 +02:00
Martin Kroeker
2a4bb797db
Merge pull request #2773 from martin-frbg/issue2770
...
Fix Makefiles still mishandling NO_CBLAS=0 and NO_LAPACKE=0
2020-08-11 21:02:55 +02:00
Martin Kroeker
cbbe38bb88
Merge pull request #2772 from mhillenibm/s390x_gemm_tuning
...
s390x: GEMM tuning for z14
2020-08-11 18:14:09 +02:00
Martin Kroeker
619343278d
Fix mishandling of NO_CBLAS=0 and NO_LAPACKE=0
2020-08-11 13:40:40 +02:00
Martin Kroeker
fee361ae64
fix another source of NO_CBLAS=0 surprise
2020-08-11 13:27:19 +02:00
Martin Kroeker
62f4c84f27
Merge pull request #76 from xianyi/develop
...
rebase
2020-08-11 13:25:12 +02:00
Marius Hillenbrand
e115c97e05
s390x/SGEMM: adjust default P and Q to multiples of M
...
We recently changed the register blocking for SGEMM on s390x to 16x4.
However, we did not adjust Q to a multiple of 16 and thus fell back to
the 8x4 kernel at each block's margin, without need. Adjust P and Q to
multiples of 16 to employ the faster 16x4 kernel for complete full-sized
blocks.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-08-11 12:56:46 +02:00
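The arithmetic behind this adjustment can be shown with a short sketch. The helper names and the sample sizes are hypothetical; only the tile width of 16 comes from the commit message.

```python
def round_up(value: int, multiple: int) -> int:
    """Smallest multiple of `multiple` that is >= value."""
    return ((value + multiple - 1) // multiple) * multiple


def split_block(size: int, unroll: int = 16) -> tuple:
    """Return (full_tiles, leftover) for a block of `size` rows.

    A non-zero leftover is the margin that would have to be handled by a
    narrower kernel (8x4 in the commit message).
    """
    return divmod(size, unroll)
```

For example, a block size of 504 splits into 31 full tiles plus a leftover of 8, while 512 splits into 32 full tiles with nothing left over.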
Marius Hillenbrand
07c334e7be
s390x: Factor out small block sizes for SGEMM/DGEMM on z14
...
For small register blockings that are too small to fill up vector
registers with column vectors, we currently use a generic code block.
Replace that with instantiations of the generic code as individual
functions, so that the compiler can optimize each one separately.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-08-11 12:56:39 +02:00
Marius Hillenbrand
e2828e30aa
s390x: Optimize SGEMM/DGEMM blocks for z14 with explicit loop unrolling/interleaving
...
Improve performance of SGEMM and DGEMM on z14 and z15 by unrolling and
interleaving the inner loop of the SGEMM 16x4 and DGEMM 8x4 blocks.
Specifically, we explicitly interleave vector register loads and
computation of two iterations.
Note that this change only adds one C function, since SGEMM 16x4 and
DGEMM 8x4 actually map to the same C code: they both hold intermediate
results in a 4x4 grid of vector registers, and the C implementation is
built around that.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-08-11 12:55:42 +02:00
Martin Kroeker
7219c9cb87
Merge pull request #2764 from martin-frbg/lapacktests
...
Fix array overruns in the LIN part of the LAPACK testsuite
2020-08-10 13:27:51 +02:00
Martin Kroeker
c9d32674ea
Add memory barrier to the blas_lock implementation for Linux
...
as recommended by cparrott73 in #2760
2020-08-09 19:17:04 +02:00
Martin Kroeker
64259d521a
Fix use of unallocated array in workspace query and wrong type of argument to xSCAL
2020-08-09 13:02:27 +02:00
Martin Kroeker
6f5ca44c1a
Expand TAU array as SGEMQR/DGEMQR read elements 2 and 3
2020-08-09 12:59:20 +02:00
Martin Kroeker
d28b3f2776
Create Jenkinsfile for OSUOSL PowerCI
2020-08-08 18:05:20 +02:00
Martin Kroeker
ba3f7b3acf
Merge pull request #2761 from RajalakshmiSR/Makefile_err
...
Remove extra symbol in Makefile
2020-08-08 12:20:04 +02:00
Rajalakshmi Srinivasaraghavan
475b5c95b9
Remove extra symbol in Makefile
...
While trying out different unroll values, noted that
make failed due to this extra symbol.
2020-08-07 15:27:44 -05:00
Martin Kroeker
cd60080d4a
Merge pull request #2758 from martin-frbg/undef_shift
...
Fix GCC ubsan warnings in x86_64 complex dot and gemv_t kernels
2020-08-03 23:30:26 +02:00
Martin Kroeker
4847bfdddd
Merge pull request #2757 from martin-frbg/cmake64
...
Fix lapack-tests linking to a suffixed libopenblas in cmake builds
2020-08-02 23:05:21 +02:00
Martin Kroeker
81dcfdcf39
Multiply by 2 instead of left-shifting a potentially negative number
...
fixes GCC ubsan warning in the BLAS tests
2020-08-02 18:29:56 +02:00
Martin Kroeker
0ef4b3f1f2
Multiply instead of doing a left shift of a potentially negative number
...
fixes GCC ubsan report in the BLAS tests
2020-08-02 18:27:40 +02:00
Martin Kroeker
aa53a8a5cb
Multiply by two instead of left-shifting one place
...
fixes GCC ubsan report of "left shift of negative value -2" in the BLAS tests
2020-08-02 18:25:09 +02:00
Martin Kroeker
aa3a1e7d8c
Multiply by two rather than left shift by one place
...
fixes GCC ubsan report of "left shift of negative value -2" in the BLAS tests
2020-08-02 18:22:31 +02:00
Martin Kroeker
aaf1a17168
Apply current library name suffix
2020-08-02 17:58:33 +02:00
Martin Kroeker
53add6a80d
Apply library name suffix to openblas if any
2020-08-02 17:57:12 +02:00
Martin Kroeker
9eb897cc01
Merge pull request #75 from xianyi/develop
...
rebase
2020-08-02 17:50:06 +02:00
Martin Kroeker
7cead56258
Merge pull request #2753 from martin-frbg/issue2751
...
Add SYMBOLPREFIX and/or SYMBOLSUFFIX to cblas prototypes
2020-08-02 15:32:46 +02:00
Martin Kroeker
6794ac3415
Add SYMBOLPREFIX and/or -SUFFIX to cblas.h if needed
2020-08-02 11:20:08 +02:00
Martin Kroeker
ecf4b9e0fc
Improve substitution rules for SYMBOLPREFIX and -SUFFIX addition
2020-08-01 17:06:03 +02:00
Martin Kroeker
dfe5d09641
Merge pull request #2756 from martin-frbg/issue2755
...
Protect against inadvertent activation of USE_CUDA
2020-08-01 15:19:02 +02:00
Martin Kroeker
60cd5e55fc
Protect against inadvertent activation of USE_CUDA
2020-08-01 12:31:39 +02:00
Martin Kroeker
da9e2a7ada
Add SYMBOLPREFIX and/or SYMBOLSUFFIX to cblas prototypes
2020-07-31 16:03:33 +02:00
Martin Kroeker
c88cbc5e0d
Merge pull request #2752 from kadler/cpuid_aix
...
Use systemcfg APIs for CPU detection on AIX
2020-07-31 12:52:24 +02:00
Kevin Adler
589c74aed3
Use systemcfg APIs for CPU detection on AIX
...
AIX libc already provides ready access to an integer that contains a bit
identifying the CPU it's running on, so there's no need to call a
program and grep its output. Additionally, prtconf is not available in
the PASE runtime, which provides an AIX emulation layer on the IBM i
operating system.
The AIX systemcfg.h also provides macro definitions like POWER_8,
POWER_9, etc for all the bits defining the CPUs as well as macros like
__power_8(), __power_9_andup() that return booleans, but I did not use
them. Since these macros depend on the level of the OS in which it is
built, they may not be defined and instead the associated hex literals
are used directly.
2020-07-30 21:06:40 -05:00
Martin Kroeker
104aa678b0
Fix inadvertent version number reversal to 0.3.9.dev caused by #2710
2020-07-30 11:40:52 +02:00
Martin Kroeker
c6b48e0394
Merge pull request #2749 from martin-frbg/make_ppc
...
Reorganize OpenMP build options for POWER and allow compiling for POWER9 with old gcc
2020-07-30 11:35:53 +02:00
Martin Kroeker
4927251298
Merge pull request #2750 from RajalakshmiSR/dgemv_p10
...
dgemv optimization for POWER10
2020-07-30 10:13:19 +02:00
Rajalakshmi Srinivasaraghavan
f77b6a83f4
dgemv optimization for POWER10
...
Making use of new vector pair POWER10 instructions in dgemv_n and dgemv_t.
Also adding a new block 4x128 to make use of Matrix-Multiply Assist (MMA)
feature introduced in POWER ISA v3.1. Tested on simulator and there
are no new test failures.
2020-07-29 18:59:32 -05:00
Martin Kroeker
39724e8128
Separate OpenMP handling and allow compilation of Power9 code with older gcc
2020-07-30 01:14:08 +02:00
Martin Kroeker
525db5401c
Merge pull request #74 from xianyi/develop
...
rebase
2020-07-30 01:04:09 +02:00
Martin Kroeker
cb097beba2
Merge pull request #2741 from martin-frbg/issue2739
...
Adjust A53 SGEMM parameters to reflect recent switch to 8x8 kernel
2020-07-29 10:01:14 +02:00
Martin Kroeker
7c02f4b1f7
Merge pull request #2744 from martin-frbg/issue2738
...
Add AMD Renoir/Matisse cpu autodetection and preliminary support for Zen3
2020-07-28 19:32:04 +02:00
Martin Kroeker
383262035d
Merge pull request #2740 from RajalakshmiSR/clang-power
...
Fix compilation issues with clang on POWER
2020-07-28 18:15:25 +02:00
Martin Kroeker
5fa581c87e
Put hint to use git develop rather than master branch in README
2020-07-28 14:22:41 +00:00
Martin Kroeker
12918358aa
Add AMD Renoir/Matisse and preliminary support for Zen3 as Zen2
...
also support AMD family 22 Jaguar/Puma as Bobcat
2020-07-28 13:53:17 +00:00
Martin Kroeker
200f5c44cc
Add AMD Renoir models and preliminary support for ZEN3 as ZEN2
...
also remap erroneous family 16 entry to BOBCAT and reclaim erroneous family 25 "Barcelona" for Zen3
2020-07-28 13:45:23 +00:00
Martin Kroeker
64e2e4aaf3
missing braces
2020-07-27 20:19:22 +00:00
Martin Kroeker
921ec4e9e2
Adjust A53 SGEMM parameters to reflect move to 8x8 kernel
2020-07-27 19:54:46 +00:00
Rajalakshmi Srinivasaraghavan
d557584b71
Fix compilation issues with clang on POWER
...
As gcc defaults to -malign-power, removing that option. Also
adding -fno-integrated-as to use GNU assembler for powerpc
assembly optimization files. Fixed other compilation errors
reported in dgemv_t.c file.
2020-07-27 14:11:07 -05:00
Martin Kroeker
a4ceb1ade9
Merge pull request #2737 from ashwinyes/add_thunderx3_target
...
ARM64: Add THUNDERX3T110 Target
2020-07-27 15:19:47 +02:00
Ashwin Sekhar T K
4e1be0e481
ARM64: Add THUNDERX3T110 Target
2020-07-26 23:32:24 -07:00
Martin Kroeker
49b83e00b7
Merge pull request #2735 from martin-frbg/move_potrf
...
Move potrf_parallel.c from lapack/getrf to lapack/potrf where it belongs
2020-07-26 19:54:11 +02:00
Martin Kroeker
769ed9ffad
Merge pull request #2734 from RajalakshmiSR/p10_fix
...
Fix to store results in correct order for POWER10 GEMM kernels
2020-07-25 09:02:32 +02:00
Martin Kroeker
f194ad59e1
Use _Atomic instead of volatile where available (file moved from ../getrf)
...
must have misplaced this in ../getrf when I made that change in March 2018 (40160ff)
the only changes since then were
* "RFC: Add half precision gemm for bfloat16 in OpenBLAS" (Rajalakshmi Srinivasaraghavan, committed on 14 Apr 2020 as 7ebbb50)
* "Change _STDC_VERSION__ to __STDC_VERSION__" (Zhiyong Dang, committed on 11 May 2018 as 3716267)
2020-07-25 08:52:24 +02:00
Martin Kroeker
4fda217f99
Delete potrf_parallel.c (moving it to ../potrf)
2020-07-25 06:42:39 +00:00
Rajalakshmi Srinivasaraghavan
9be2688c78
Fix to store results in correct order for POWER10 GEMM kernels
...
There is a recent compiler change in __builtin_mma_disassemble_acc() which
affects the order of storing result in POWER10. Also removing new LDFLAG
-mno-power10-stub as it is handled by linker automatically.
2020-07-24 23:08:11 -05:00
Martin Kroeker
6a2a60038c
Merge pull request #2720 from martin-frbg/issue2694
...
WIP Further fixes for 32bit POWER8
2020-07-24 23:19:45 +02:00
Martin Kroeker
251a09ec90
Typo fix
2020-07-24 16:04:58 +00:00
Martin Kroeker
95d37e1575
Regroup the 32 and 64bit sections and restore 64bit CAXPY
2020-07-24 10:13:46 +00:00
Martin Kroeker
3523bb778e
Merge pull request #2721 from martin-frbg/p8align
...
Fix alignment errors in the power8 saxpy kernel
2020-07-24 11:06:20 +02:00
Martin Kroeker
a50d0e29c8
Merge pull request #2731 from martin-frbg/pgippc
...
Fixes for compilation on POWER with PGI compilers
2020-07-24 11:05:16 +02:00
Martin Kroeker
bf1f0734ff
Use OPENBLAS_MAKE_COMPLEX_FLOAT on PPC only
2020-07-23 20:40:13 +00:00
Martin Kroeker
ca3561cab9
Add ifdefs around call to altivec microkernel
2020-07-23 18:30:42 +00:00
Martin Kroeker
21072e502a
Typo fix
2020-07-23 17:34:56 +00:00
Martin Kroeker
7c6e56b5df
Rewrite assignment to complex for better portability
2020-07-23 17:10:59 +02:00
Martin Kroeker
661c6bfa5a
Exclude altivec code paths if the compiler does not support them
2020-07-23 17:08:20 +02:00
Martin Kroeker
9796e552ea
Avoid undefining NAME,CNAME etc for pgcc as it makes it ignore the new definitions
2020-07-23 17:03:28 +02:00
Martin Kroeker
d6b6e5ccd7
Merge pull request #73 from xianyi/develop
...
rebase
2020-07-23 16:59:06 +02:00
Martin Kroeker
349b722d8d
Merge pull request #2729 from martin-frbg/issue2728
...
Unify BUFFER_SIZE settings for x86_64 again to fix DYNAMIC_ARCH crashes
2020-07-22 22:45:57 +02:00
Martin Kroeker
6c33764ca4
Unify BUFFER_SIZE settings for x86_64 again to fix potentially fatal mismatch in DYNAMIC_ARCH builds
2020-07-22 17:30:55 +00:00
Martin Kroeker
d1b9613fd4
Merge pull request #2727 from wyphan/develop
...
Patch for building on POWERPC with PGI compilers (was Patch for building on Summit)
2020-07-21 17:06:53 +02:00
Martin Kroeker
3cfc74b1a0
Merge pull request #2726 from martin-frbg/2725-2
...
Add detection of stdatomic.h for cmake
2020-07-21 16:42:06 +02:00
Wileam Phan
9ae154ba89
Patch for building on Summit
2020-07-20 23:30:28 -04:00
Martin Kroeker
9e21a100e3
Add trivial check for stdatomic.h
2020-07-20 22:52:09 +00:00
Martin Kroeker
31d30312dc
Merge pull request #72 from xianyi/develop
...
rebase
2020-07-21 00:49:12 +02:00
Martin Kroeker
fcfb7ffafb
Merge pull request #2725 from martin-frbg/ccheck_c11
...
Have c_check probe availability of C11 atomics support and stdatomic.h
2020-07-18 23:08:08 +02:00
Martin Kroeker
bbe119ee3b
Update conditional for atomics to use HAVE_C11
2020-07-18 17:19:59 +00:00
Martin Kroeker
f4f74941bd
Update conditional for atomics to use HAVE_C11
2020-07-18 17:14:50 +00:00
Martin Kroeker
a36eb19ae0
Update conditional for C11 atomics to use HAVE_C11
2020-07-18 17:13:24 +00:00
Martin Kroeker
ce45af8151
Update conditional for atomics to use HAVE_C11
2020-07-18 17:09:56 +00:00
Martin Kroeker
6f38de06d2
Update conditional for atomics to use HAVE_C11
2020-07-18 17:09:01 +00:00
Martin Kroeker
09eb9d2584
Update conditional for atomics to HAVE_C11
2020-07-18 17:07:38 +00:00
Martin Kroeker
791e046744
Update conditional for atomics to use HAVE_C11
2020-07-18 17:05:59 +00:00
Martin Kroeker
94bab9d1f9
Update conditional for atomics to use HAVE_C11
2020-07-18 17:03:31 +00:00
Martin Kroeker
97d6eb97b1
Report availability of C11 support
2020-07-18 16:59:33 +00:00
Martin Kroeker
4afd11dae5
Add a check for C11 atomics and stdatomic.h
2020-07-18 16:57:41 +00:00
Martin Kroeker
72ec6280c7
Merge pull request #2724 from martin-frbg/loongsonreadme
...
Update cross-compiling example in README to reflect change in Loongson gcc
2020-07-18 18:08:40 +02:00
Martin Kroeker
26b7f24d16
Update cross-compiling example to reflect change in Loongson gcc
...
for #2723
2020-07-18 12:51:37 +00:00
Martin Kroeker
0db4218fed
Merge pull request #2722 from martin-frbg/cmakefcheck
...
Handle lack of fortran compiler more gracefully in cmake
2020-07-17 10:33:03 +02:00
Martin Kroeker
9d000ecaa2
include CheckLanguage module
2020-07-16 22:36:35 +00:00
Martin Kroeker
a847d00366
handle lack of fortran compiler more gracefully
2020-07-16 22:17:39 +00:00
Martin Kroeker
0033f8be0d
Use vec_vsx_ld/st to fix misaligned accesses flagged by asan
2020-07-16 23:32:54 +02:00
Martin Kroeker
f308e741b2
remove debug output and revert changes to cdot and crot
2020-07-15 10:00:07 +02:00
Martin Kroeker
4f5d26bb02
Merge pull request #2716 from RajalakshmiSR/p10_ldflag
...
Add new linker option for POWER10
2020-07-15 01:20:54 +02:00
Rajalakshmi Srinivasaraghavan
417c4e8af8
Add new linker option for POWER10
...
While building with DYNAMIC_ARCH on POWER9 with a POWER10-aware
toolchain, a new LDFLAG is needed to avoid POWER10
instructions on PLT calls.
2020-07-14 11:54:04 -05:00
Martin Kroeker
da17abec87
fix trailing whitespace
2020-07-14 18:20:03 +02:00
Martin Kroeker
f8c2697701
Use POWER6 GEMM, TRMM and DTRSM on 32bit POWER8
2020-07-14 18:11:19 +02:00
Martin Kroeker
b144423f0f
Do not define USE_TRMM for 32bit POWER8
2020-07-14 18:10:12 +02:00
Martin Kroeker
bd2498c886
Use POWER6 GEMM parameters on 32bit POWER8
2020-07-14 18:07:58 +02:00
Martin Kroeker
d8e2edfc20
Merge pull request #71 from xianyi/develop
...
rebase
2020-07-14 18:01:34 +02:00
Martin Kroeker
419b8686d1
Merge pull request #2682 from martin-frbg/aix
...
[WIP] fix compilation on AIX
2020-07-13 14:43:24 +02:00
Martin Kroeker
3ab15ff34c
Merge pull request #2651 from leezu/actionsflang
...
Add flang build to Github Actions
2020-07-13 13:00:39 +02:00
Martin Kroeker
8916c4ae2c
Merge branch 'develop' into actionsflang
2020-07-12 20:37:29 +02:00
Martin Kroeker
4fa283de66
Merge pull request #2706 from jussienko/use-always-omp-threads
...
Fix OpenMP builds defaulting to singlethreading with OMP_PLACES or OMP_PROC_BIND set
2020-07-12 20:17:11 +02:00
Martin Kroeker
5865c7d4d6
Make 32bit POWER8 use POWER6 kernels for now
2020-07-12 18:59:01 +02:00
Martin Kroeker
ae3a90f78f
merge overwritten part of power10 support
2020-07-12 18:51:58 +02:00
Martin Kroeker
009864edde
Merge pull request #2710 from martin-frbg/cmake-lapacktest
...
Add LAPACK-TESTING to the cmake build
2020-07-10 12:06:50 +02:00
Martin Kroeker
3de80b3f5a
Merge pull request #2713 from RajalakshmiSR/p10-gcc10
...
Change minimum gcc version for POWER10
2020-07-10 10:43:33 +02:00
Rajalakshmi Srinivasaraghavan
af1e140e35
Change minimum gcc version for POWER10
...
As the MMA patches for POWER10 are backported to gcc10.2, changing
the minimum gcc version needed to build OpenBLAS for POWER10.
2020-07-09 21:46:06 -05:00
Martin Kroeker
d4a0299e16
Do not build lapack-test on MSVC for now (same as with BLAS test)
2020-07-09 13:57:27 +02:00
Martin Kroeker
f766024749
enable fortran for cmake
2020-07-09 13:44:25 +02:00
Martin Kroeker
c502760bef
Modify for building with OpenBLAS
2020-07-09 13:13:16 +02:00
Martin Kroeker
29b5887d5f
Modify for building with OpenBLAS
2020-07-09 13:12:35 +02:00
Martin Kroeker
60188a8c82
Append crude hack for enabling lapack tests in the OpenBLAS build
2020-07-09 11:44:31 +02:00
Martin Kroeker
1d63631afe
Add lapack-test
2020-07-09 11:42:02 +02:00
Martin Kroeker
e82bb953a7
Merge pull request #2708 from RajalakshmiSR/p10_future
...
Changing mcpu option as power10
2020-07-08 12:26:44 +02:00
Martin Kroeker
ed7e155c35
Merge branch 'develop' into aix
2020-07-07 18:52:06 +02:00
Rajalakshmi Srinivasaraghavan
45d819ca82
Changing mcpu option as power10
...
As the compiler now accepts power10 as the mcpu option, changing it from future.
2020-07-07 11:25:20 -05:00
Martin Kroeker
8751a69271
Obtain actual cpu count on AIX and suppress spurious NO_AVX512 on non-x86
2020-07-07 15:46:32 +02:00
Jussi Enkovaara
10a2923f64
fixes #2238
...
Always obey omp_get_max_threads() when built with USE_OPENMP
2020-07-07 13:35:43 +03:00
Martin Kroeker
5ff83a4261
Merge pull request #2670 from mhillenibm/dumpfullversion_on_gcc7
...
RFC: Use -dumpfullversion to get minor version on gcc-7 and newer
2020-07-07 00:12:28 +02:00
Martin Kroeker
5bc9680a86
Merge pull request #2703 from martin-frbg/issue2702
...
Compatibility fix for gcc < 4.7
2020-07-02 22:32:51 +02:00
Martin Kroeker
4ab3651591
Option -mavx2 requires at least gcc 4.7
2020-07-02 17:00:15 +02:00
Martin Kroeker
a83680b40b
Merge pull request #69 from xianyi/develop
...
rebase
2020-07-02 16:56:00 +02:00
Martin Kroeker
c3aa036e99
Merge pull request #2693 from EGuesnet/AIX-build-on-POWER8-32bits
...
AIX build on POWER8 32bits
2020-07-01 08:29:52 +02:00
EGuesnet
634e1305f9
Update cgemm_kernel_8x4_power8.S
2020-06-30 15:16:39 +02:00
Martin Kroeker
c467516132
Merge pull request #2688 from martin-frbg/cometlake
...
Add autodetection of Intel Comet Lake H and S models
2020-06-27 17:47:24 +02:00
Martin Kroeker
83f4746825
Add support for Comet Lake H and S
2020-06-27 14:41:24 +02:00
Martin Kroeker
584ef8d4ae
Add support for Comet Lake H & S
2020-06-27 14:36:37 +02:00
Martin Kroeker
8dfda02e89
Merge pull request #68 from xianyi/develop
...
rebase
2020-06-27 14:29:29 +02:00
Martin Kroeker
28d69e0097
Merge pull request #2687 from martin-frbg/utfbom
...
Strip UTF8 byte order marker from source files
2020-06-26 22:53:09 +02:00
Martin Kroeker
c2467c9619
Merge pull request #2686 from RajalakshmiSR/p10_shgemm
...
powerpc: Optimized SHGEMM kernel for POWER10
2020-06-26 22:52:45 +02:00
Martin Kroeker
f86e749df4
Merge pull request #2683 from mtreinish/add-comet-lake-support
...
Add cpu detection support for comet lake U
2020-06-26 12:11:03 +02:00
Martin Kroeker
d199c2787d
Merge pull request #2680 from kavanabhat/aix_makefile_fix
...
Fix for #2671
2020-06-26 11:27:28 +02:00
Martin Kroeker
e30ad0e521
Strip UTF8 byte order marker from source
2020-06-26 09:00:43 +02:00
Rajalakshmi Srinivasaraghavan
d23419accc
powerpc: Optimized SHGEMM kernel for POWER10
...
This patch introduces a new optimized version of the SHGEMM kernel
using the POWER10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. This patch makes use of new POWER10 compute instructions
for the matrix multiplication operation.
Tested on simulator and there are no new test failures.
2020-06-25 22:19:08 -05:00
Matthew Treinish
2f9c10810c
Also set CPUTYPE in get_cpuname()
2020-06-25 15:53:56 -04:00
Matthew Treinish
f37e941d52
Add support to driver/others/dynamic.c too
2020-06-25 11:56:49 -04:00
Matthew Treinish
2a91452bdd
Add cpu detection support for comet lake U
...
Comet Lake U CPUs, which have family: 6, model: 6, extended family: 0, and
extended model: 10, were not being correctly detected by GETARCH during
openblas builds and would show CORE=UNKNOWN and LIBCORE=unknown. This
commit adds the necessary information to cpuid_x86 to detect extended
model 10, model 6 and return the proper core information. It's
essentially just a skylake cpu, not skylake x, so I just used the
same return fields as skylake.
2020-06-25 11:32:09 -04:00
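The family and model fields named in this message are packed into the EAX value returned by x86 CPUID leaf 1. A minimal decoding sketch follows; the function names are hypothetical, and the sample value in the usage note is constructed to carry the listed fields rather than read from hardware.

```python
def decode_cpuid_signature(eax: int) -> dict:
    """Split a CPUID leaf-1 EAX value into its identification fields."""
    return {
        "stepping": eax & 0xF,
        "model": (eax >> 4) & 0xF,
        "family": (eax >> 8) & 0xF,
        "exmodel": (eax >> 16) & 0xF,
        "exfamily": (eax >> 20) & 0xFF,
    }


def is_comet_lake_u(eax: int) -> bool:
    """Match the field values listed in the commit message."""
    f = decode_cpuid_signature(eax)
    return (
        f["family"] == 6
        and f["exfamily"] == 0
        and f["exmodel"] == 10
        and f["model"] == 6
    )
```

A value such as `0x000A0660` decodes to family 6, model 6, extended model 10 and extended family 0, so `is_comet_lake_u` accepts it.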
Martin Kroeker
c854ef5471
Fix variable names in conditional
2020-06-25 13:29:52 +02:00
Martin Kroeker
c0afc11742
Fix POWERPC builds on AIX (gcc/gfortran 7)
...
1. macro preprocessing for POWER8 and later kernels only
2. default buffer size used by AIX version of m4 is too small
2020-06-25 13:12:36 +02:00
Martin Kroeker
c592f0f80a
Fix utest build on AIX
2020-06-25 12:58:13 +02:00
Martin Kroeker
3f613b1301
Tentative changes for building on AIX
2020-06-25 12:57:00 +02:00
Martin Kroeker
72a0ec8e75
Fix reading of CPU name from prtconf output on AIX
2020-06-25 12:55:10 +02:00
Martin Kroeker
3446e58daf
Fix handling of uname output on AIX
2020-06-25 12:31:35 +02:00
Martin Kroeker
4ca8becc4b
Merge pull request #67 from xianyi/develop
...
rebase
2020-06-25 10:33:03 +02:00
Martin Kroeker
dfe819f3bd
Merge pull request #2679 from RajalakshmiSR/P10_GEMM
...
POWER10 Optimized GEMM kernels
2020-06-25 08:31:38 +02:00
Martin Kroeker
4369e52555
Merge pull request #2677 from brada4/develop
...
address vs2019 C4293 warning
2020-06-25 08:31:17 +02:00
Gordon Fossum
bb2f52844b
powerpc: Optimized ZGEMM kernel for POWER10
...
This patch introduces a new optimized version of the ZGEMM kernel
using the POWER10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. This patch makes use of new POWER10 compute instructions
for the matrix multiplication operation.
Tested on simulator and there are no new test failures.
Cycle count is reduced by 30-50% compared to the POWER9 version depending on
M/N/K sizes.
2020-06-24 14:50:12 -05:00
Rajalakshmi Srinivasaraghavan
571eadb880
powerpc: Optimized SGEMM/DGEMM/CGEMM for POWER10
...
This patch introduces new optimized versions of SGEMM, CGEMM and DGEMM
using the POWER10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. This patch makes use of new POWER10 compute instructions
for the matrix multiplication operation.
Tested on simulator and there are no new test failures.
Cycle count is reduced by 30-50% compared to the POWER9 version depending on
M/N/K sizes.
MMA GCC patch for reference:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=8ee2640bfdc62f835ec9740278f948034bc7d9f1
2020-06-24 14:48:15 -05:00
Kavana Bhat
df4ade070f
Fix for #2671
2020-06-24 04:25:47 -05:00
User User-User
e6b9275034
address vs2019 C4293
2020-06-24 09:12:23 +03:00
Martin Kroeker
53ea5bfece
Merge pull request #66 from xianyi/develop
...
rebase
2020-06-23 10:13:44 +02:00
Martin Kroeker
93592d1260
Merge pull request #2675 from wjc404/develop
...
AVX512 DGEMM TCOPY_16 Function
2020-06-23 09:29:02 +02:00
Martin Kroeker
6eaeb01263
Merge pull request #2658 from RajalakshmiSR/p10
...
powerpc: Add support for future processor
2020-06-23 00:02:37 +02:00
Martin Kroeker
45d542c9d1
Merge pull request #65 from xianyi/develop
...
rebase
2020-06-21 12:41:01 +02:00
wjc404
086d87a302
AVX512 dgemm tcopy_16 function
2020-06-20 00:07:43 +08:00
Martin Kroeker
af501eb753
Merge pull request #2669 from mhillenibm/zarch_fix_gcc_detection
...
Zarch fix gcc detection
2020-06-17 17:55:25 +02:00
Martin Kroeker
0eb6c4dded
Merge pull request #2672 from mhillenibm/test_num_threads
...
cpp_thread_test: Change adjustment of concurrency on systems with <52 hw threads
2020-06-17 17:54:31 +02:00
Marius Hillenbrand
de838c38ef
cpp_thread_test/dgemv: fail early if concurrency is zero
...
The two test cases dgemv_tester and dgemm_tester accept the degree of
concurrency as command line argument (amongst others). Fail early if
value 0 has been specified, instead of later with less-clear symptoms.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-06-17 16:15:44 +02:00
Marius Hillenbrand
478898b37a
cpp_thread_test/dgemv: cap concurrency to number of hw threads on small systems
...
... instead of (number of hw threads - 4) to avoid invalid numbers on
smaller systems. Currently, systems with 4 or fewer CPUs (e.g., small CI
VMs) would fail the test. Fixes one of the issues discussed in #2668.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-06-17 16:08:48 +02:00
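The difference between the two capping rules described in this message can be sketched as follows. Both functions are hypothetical stand-ins for the test code; only the "hw threads - 4" rule and the cap to the hardware thread count come from the message.

```python
def cap_old(hw_threads: int) -> int:
    """Previous rule: leave four hardware threads unused."""
    return hw_threads - 4


def cap_new(requested: int, hw_threads: int) -> int:
    """New rule (as sketched here): never exceed the hardware thread count."""
    return min(requested, hw_threads)
```

On a 4-CPU CI VM the old rule yields 0, an invalid degree of concurrency, while the new rule yields 4.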
Marius Hillenbrand
cde4690721
RFC: Use gcc -dumpfullversion to get minor version with gcc-7.x
...
In gcc-7.1, the behavior of -dumpversion changed to be configured
at compile-time. On some distributions it only dumps the major version
(e.g., Ubuntu), so the current checks for the gcc minor version report
false negatives. As a replacement, gcc-7.1 introduced -dumpfullversion
which always prints the full version.
Update the gcc version detection in Makefile.system to employ
-dumpfullversion with gcc-7 and newer.
Posting this patch for discussion, since it emerged from discussions
around issue #2668 and PR #2669. It is not solving a problem right now,
but may be useful in the future.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-06-16 15:45:59 +02:00
Marius Hillenbrand
2389291766
Makefile.system: remove duplicate variable GCCVERSIONGT5
...
... to bring unified gcc version detection with common variables to the
one remaining spot in Makefile.system.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-06-16 15:06:03 +02:00
Marius Hillenbrand
a2d13ea611
Fix gcc version detection for zarch
...
Employ common variables for gcc version detection and fix the broken
check for gcc >= 5.2.
Fixes #2668
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-06-16 15:06:03 +02:00
Martin Kroeker
1bd3cd66c2
Increment version to 0.3.10.dev
2020-06-14 22:05:19 +02:00
Martin Kroeker
1c53e1366d
Increment version to 0.3.10.dev
2020-06-14 22:04:37 +02:00
Martin Kroeker
95dbeff66d
Merge branch 'release-0.3.0' into develop
2020-06-14 22:02:45 +02:00
Martin Kroeker
3b673a24b7
Increment version to 0.3.10.dev
2020-06-14 21:57:52 +02:00
Martin Kroeker
1eb1979050
Increment version to 0.3.10.dev
2020-06-14 21:57:15 +02:00
Martin Kroeker
efc53b6e7e
Merge pull request #2665 from martin-frbg/flang-fixes-2a
...
Fix spelling of flang option -Mrecursive, add -Kieee and workaround for AOCC optimizer bug
2020-06-14 21:56:08 +02:00
Martin Kroeker
72888497e2
Update with 0.3.10 changes
2020-06-14 21:55:31 +02:00
Martin Kroeker
7e3e006af6
Merge pull request #2666 from martin-frbg/blastest
...
Update BLAS tests to what netlib 3.9.0 uses
2020-06-14 18:28:37 +02:00
Martin Kroeker
d906d14402
Merge pull request #2664 from ACSimon33/exported_symbols
...
Add missing exported symbols.
2020-06-14 18:27:03 +02:00
Martin Kroeker
3785c0e82b
Merge pull request #2663 from martin-frbg/issue2654
...
Respect predefined defaults for AR, AS, LD and RANLIB
2020-06-14 18:26:43 +02:00
Martin Kroeker
f2d8879af6
Merge pull request #2661 from martin-frbg/issue2660
...
Report selected DYNAMIC_ARCH kernel rather than one of its aliases in gotoblas_corename
2020-06-14 18:25:37 +02:00
Martin Kroeker
6876221cf3
Remove optimization level limit for flang again and add -fno-unroll-loops for AOCC flang 2.x instead
2020-06-14 17:40:24 +02:00
Martin Kroeker
79cdcde717
Re-enable higher optimization levels for flang while disabling loop unrolling for AOCC flang
2020-06-14 17:18:16 +02:00
Martin Kroeker
18a11137f1
Update BLAS tests to correspond to Reference-LAPACK 3.9.0
...
replaces calculation of machine precision with call to epsilon intrinsic and removes the requirement for previous output files to be removed before rerunning tests
2020-06-14 10:26:25 +02:00
Martin Kroeker
1dd712131e
Fix spelling of flang option -Mrecursive and add -Kieee
2020-06-14 00:09:31 +02:00
Martin Kroeker
0ed2adf0b2
Fix spelling of flang option -Mrecursive and add -Kieee
2020-06-14 00:01:20 +02:00
Martin Kroeker
abf670757b
Respect predefined defaults for AR, AS, LD and RANLIB
2020-06-13 23:21:13 +02:00
Simon Märtens
41fc6f3cd2
Added missing exported symbols.
2020-06-13 22:37:39 +02:00
Martin Kroeker
007d9f97d7
Make gotoblas_corename report the name of the selected TARGET rather than its aliases
2020-06-13 19:25:28 +02:00
Martin Kroeker
63d26090f5
Merge pull request #64 from xianyi/develop
...
rebase
2020-06-13 19:14:47 +02:00
Rajalakshmi Srinivasaraghavan
9fe930f205
powerpc: Add support for future processor
...
This is the initial patch to support build infrastructure
for POWER10 architecture.
2020-06-11 15:47:20 -05:00
Martin Kroeker
3a1b58d54a
Merge pull request #2653 from craft-zhang/cortex-a53
...
fix INIT8x4 of SGEMM on Arm Cortex-A53
2020-06-10 12:19:33 +02:00
Martin Kroeker
f7659be4a0
Merge pull request #2652 from martin-frbg/flang-fixes
...
Fixes for compilation with flang binary release 20190329
2020-06-09 20:31:06 +02:00
ZhangDanfeng
bc6fd20a40
fix INIT8x4
...
Signed-off-by: ZhangDanfeng <467688405@qq.com>
2020-06-10 01:01:16 +08:00
Martin Kroeker
3ce469a34f
Limit optimization level to O1 for flang and add -frecursive
2020-06-09 16:11:13 +02:00
Martin Kroeker
ba2c5b404d
When building with flang, use it also for the final link step to get dependencies right
2020-06-09 16:09:34 +02:00
Martin Kroeker
f07a80354b
Apply previously AOCC-specific workaround to all versions of flang
2020-06-09 16:07:03 +02:00
Martin Kroeker
fdd1b50263
Merge pull request #63 from xianyi/develop
...
rebase
2020-06-09 15:54:30 +02:00
Leonard Lausen
b98923f33a
Test enforce -O1 for flang
2020-06-09 06:54:47 +00:00
Leonard Lausen
4cb1db0e3b
Test flang build
2020-06-09 06:31:17 +00:00
Martin Kroeker
430e8b45fe
Merge pull request #2648 from martin-frbg/lapack411
...
Break out of potentially infinite rescaling loop in LAPACK xLARGV/xLARTG/xLARTGP
2020-06-07 19:45:52 +02:00
Martin Kroeker
88fe85f4e0
Merge pull request #2647 from martin-frbg/aocc-flang
...
Small fixes for flang in general and the AMD AOCC version of it in particular
2020-06-07 19:45:11 +02:00
Martin Kroeker
89091e6b64
Merge pull request #2645 from martin-frbg/misc_fixes
...
Miscellaneous fixes
2020-06-07 19:44:50 +02:00
Martin Kroeker
522aaf53bf
Break out of potentially infinite rescaling loop in LAPACK xLARGV/xLARTG/xLARTGP
...
Reference-LAPACK issue 411
2020-06-07 14:30:20 +02:00
Martin Kroeker
c3574ffe53
Merge pull request #2646 from wjc404/develop
...
Optimize AVX512 parallel DGEMM performance
2020-06-07 13:18:22 +02:00
Martin Kroeker
4e28dc6353
Use only -O1 with AMD AOCC version of flang
...
to prevent miscompilation of LAPACK codes and tests on Ryzen
2020-06-07 00:05:02 +02:00
Martin Kroeker
13c28889a2
Update "cosmetic fixes for non-C99 compilers"
2020-06-06 15:22:27 +02:00
wjc404
0e3ac4a06b
Add files via upload
2020-06-06 14:56:57 +08:00
Martin Kroeker
28915eed72
Cosmetic fixes for non-C99 compilers
2020-06-05 10:05:34 +02:00
Martin Kroeker
7f60fb6b91
Delete spurious copy of common_param.h
2020-06-05 10:04:16 +02:00
Martin Kroeker
0464e662ad
make blas_quickdivide unsigned and guard against miscompilation
2020-06-05 10:03:36 +02:00
Martin Kroeker
0f9a935a5a
Merge pull request #62 from xianyi/develop
...
rebase
2020-06-05 09:51:06 +02:00
Martin Kroeker
79cd69fea4
Merge pull request #2644 from martin-frbg/cmake-maxstack
...
Add CMAKE support for MAX_STACK_ALLOC setting
2020-06-05 08:33:48 +02:00
Martin Kroeker
bb12c2c854
Limit MAX_STACK_ALLOC availability to non-Windows
2020-06-04 19:07:27 +02:00
Martin Kroeker
32c1c1e125
Update azure-pipelines.yml
2020-06-04 19:03:46 +02:00
Martin Kroeker
f1953b8b81
Update azure-pipelines.yml
2020-06-04 17:58:13 +02:00
Martin Kroeker
6e97df7b47
Add CMAKE support for MAX_STACK_ALLOC setting
2020-06-04 14:45:31 +02:00
Martin Kroeker
729303e5ed
Merge pull request #2643 from craft-zhang/cortex-a53
...
Improve performance of SGEMM on Arm Cortex-A53
2020-06-04 07:58:45 +02:00
Martin Kroeker
547965530f
Merge pull request #2638 from leezu/actions
...
Add Github Actions test for DYNAMIC_ARCH builds on Linux and macOS
2020-06-04 00:02:37 +02:00
ZhangDanfeng
9b7877ccf1
sgemm copy source init
...
Signed-off-by: ZhangDanfeng <467688405@qq.com>
2020-06-04 02:10:45 +08:00
ZhangDanfeng
f82fa802d1
Insert prefetch
...
Signed-off-by: ZhangDanfeng <467688405@qq.com>
2020-06-04 02:08:48 +08:00
Martin Kroeker
3eda3d34c3
Merge pull request #2641 from martin-frbg/ppcg4
...
Work around PPC G4 test failures
2020-06-03 16:43:46 +02:00
Martin Kroeker
a8f42ae85c
set cmake build type to Release
2020-06-03 15:28:59 +02:00
Martin Kroeker
e6e2e531bc
revert clang pragma
2020-06-03 15:16:27 +02:00
Martin Kroeker
456dc04441
Update sgemm_kernel_16x4_skylakex_3.c
2020-06-03 15:15:41 +02:00
Martin Kroeker
89323458a9
preset optimization level for apple clang
2020-06-03 15:07:25 +02:00
Martin Kroeker
e153bdeb70
Update dynamic_arch.yml
2020-06-03 13:46:43 +02:00
Martin Kroeker
c2001f7756
Make cmake build verbose to see options in use
2020-06-03 12:18:15 +02:00
Martin Kroeker
c2b3f0b3f6
Revert "keep Apple Clang from optimizing this"
2020-06-03 10:22:15 +02:00
Martin Kroeker
f16e39554d
Change PPCG4 CGEMM_M to match kernel change
2020-06-03 09:15:29 +02:00
Martin Kroeker
b1ee81228a
Change complex DOT and ROT to generic kernels and switch CGEMM
...
in response to test failures seen in #2628 and BLAS-Tester
2020-06-03 09:13:29 +02:00
Martin Kroeker
9f7358d7dc
Keep Apple Clang from optimizing this
2020-06-03 08:52:53 +02:00
Martin Kroeker
54fa90fb25
Keep apple clang 11.0.3 from trying to optimize this (and running out of registers)
2020-06-02 17:31:45 +02:00
Leonard Lausen
5a709b8340
Print CPU info in output
2020-06-01 20:51:11 +00:00
Leonard Lausen
b31a68b835
Add Github Actions test for DYNAMIC_ARCH builds
2020-06-01 20:16:59 +00:00
Martin Kroeker
86552bf4c7
Update f_check
2020-05-31 15:22:12 +02:00
Martin Kroeker
a349d48d89
Merge pull request #2636 from martin-frbg/issue2634
...
Fix CMAKE build Issues on OS X
2020-05-31 15:16:09 +02:00
Martin Kroeker
4db00121dc
Disable EXPRECISION and add -lm on OSX (same as the BSDs and Linux)
2020-05-31 12:39:36 +02:00
Martin Kroeker
909897f13b
Document option USE_LOCKING
2020-05-31 12:37:57 +02:00
Martin Kroeker
e79245acd9
Merge pull request #2635 from ilayn/patch-1
...
BUG: Fix the loop range in ZHEEQUB.f
2020-05-30 14:37:12 +02:00
Ilhan Polat
76d2612e0c
BUG: Fix the loop range in ZHEEQUB.f
2020-05-30 14:11:11 +02:00
Martin Kroeker
ced49466f0
Use the fortran compiler to link LAPACK-related benchmarks
...
to fix linking problems with (at least) the AMD version of flang that creates dependencies on more than just the fortran runtime.
2020-05-29 13:35:51 +02:00
Martin Kroeker
6e270f91ec
add support for RETURN_BY_STACK semantics, e.g. clang
2020-05-29 13:29:10 +02:00
Martin Kroeker
200296b0f4
remove libomp from link list only for pgfortran
...
at least the AMD (aocc) flavor of flang wants to link to a (real or dummy) libomp by default
2020-05-29 13:23:51 +02:00
Martin Kroeker
dd7a650792
Merge pull request #59 from xianyi/develop
...
rebase
2020-05-29 13:06:25 +02:00
Martin Kroeker
4a4c50a7ce
Merge pull request #2627 from pkubaj/patch-1
...
Add powerpc (32-bit)
2020-05-26 08:36:24 +02:00
Martin Kroeker
d069780e63
Merge pull request #2626 from docularxu/working-gcc-version-detections
...
make GCC version detection OS-independent
2020-05-26 08:35:58 +02:00
pkubaj
33c8790603
Add powerpc (32-bit)
...
Only powerpc64 is present.
2020-05-25 13:14:09 +02:00
Guodong Xu
06387ac0e6
make GCC version detection OS-independent
...
Previous design put GCC version detection inside of OSNAME 'WINNT'.
However, such detections are required for 'Linux' and possibly other
OS'es as well. For example, there is usage of the GCC versions
in Makefile.arm64. When compiling on Linux machine, in the previous
design, Makefile.arm64 will not know the correct GCC version.
The fix is to move GCC version detection into the common part, not
wrapped by anything.
Signed-off-by: Guodong Xu <guodong.xu@linaro.com>
2020-05-25 10:57:03 +00:00
Martin Kroeker
f1a18d245b
Merge pull request #2618 from craft-zhang/cortex-A53
...
Improve performance of SGEMM and STRMM on Arm Cortex-A53
2020-05-25 12:14:46 +02:00
张丹枫
2a3aa91354
update CONTRIBUTORS.md, adding myself
2020-05-20 22:35:26 +08:00
张丹枫
ea5bdc3f72
split cortex-a53 param to match 8x8 kernel
2020-05-20 22:34:47 +08:00
张丹枫
9df79ae9a3
update sgemm and strmm kernel selecting strategy
2020-05-20 22:26:58 +08:00
张丹枫
a1fc6041cd
use general registers to speed up
2020-05-20 22:26:58 +08:00
张丹枫
edb423d772
align general register usage with strmm_kernel_8x8
2020-05-20 22:26:58 +08:00
zhangdanfeng
0e6eb8c247
sgemm kernel use sgemm_kernel_8x8_cortexa53
...
Signed-off-by: zhangdanfeng <zhangdanfeng@cloudwalk.cn>
2020-05-20 22:26:58 +08:00
zhangdanfeng
d475db29c6
optimized for cortex-a53
...
Signed-off-by: zhangdanfeng <zhangdanfeng@cloudwalk.cn>
2020-05-20 22:26:58 +08:00
Martin Kroeker
729ac6bd4a
Merge pull request #2623 from mhillenibm/zarch_dgemm_z14
...
s390x: Use new sgemm kernel also for DGEMM and DTRMM on Z14 (+ small cleanup)
2020-05-20 14:51:04 +02:00
Marius Hillenbrand
89fe17f20e
s390x: Use new sgemm kernel also for DGEMM and DTRMM on Z14
...
Apply our new GEMM kernel implementation, written in C with vector intrinsics,
also for DGEMM and DTRMM on Z14 and newer (i.e., architectures with FP32 SIMD
instructions). As a result, we gain around 10% in performance on z15, in
addition to improving maintainability.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-20 10:23:35 +02:00
Marius Hillenbrand
bdd795ed03
s390x/GEMM: replace 0-init with peeled first iteration
...
... since it gains another ~2% of SGEMM and DGEMM performance on z15;
also, the code just called for that cleanup.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-20 10:23:35 +02:00
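The idea of replacing zero-initialization with a peeled first iteration can be illustrated on a scalar dot product. The s390x kernel applies it to vector accumulators. The functions below are illustrative only, and their names are hypothetical.

```c
#include <stddef.h>

/* Baseline: clear the accumulator, then add every product. */
static float dot_zero_init(const float *a, const float *b, size_t k)
{
    float acc = 0.0f;
    for (size_t i = 0; i < k; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Peeled variant: the first product initializes the accumulator, so the
 * separate zeroing step and one addition disappear. Requires k >= 1. */
static float dot_peeled(const float *a, const float *b, size_t k)
{
    float acc = a[0] * b[0];
    for (size_t i = 1; i < k; i++)
        acc += a[i] * b[i];
    return acc;
}
```

The peeled form saves one zeroing step and one add per accumulator. A register-blocked kernel holds many accumulators, so the saving is repeated for each one, at the cost of requiring at least one iteration.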
Martin Kroeker
e1038ea836
Merge pull request #2622 from martin-frbg/issue2619
...
Improve declaration of LAPACKE_get_nancheck
2020-05-19 23:07:22 +02:00
Martin Kroeker
6baa9a778d
Improve declaration of LAPACKE_get_nancheck
2020-05-19 17:59:31 +02:00
Martin Kroeker
cf46c9f84e
Merge pull request #2617 from martin-frbg/issue2616
...
Add workaround for unhandled gmake jobserver flags in c_check/f_check
2020-05-18 13:23:58 +02:00
Martin Kroeker
55602fce56
Ignore spurious all-numeric library names derived from mishandled jobserver flags
2020-05-17 15:28:14 +02:00
Martin Kroeker
3d5e159e7a
Ignore spurious all-numeric library names derived from mishandled jobserver flags
2020-05-17 15:26:57 +02:00
Martin Kroeker
2931feb575
Merge pull request #58 from xianyi/develop
...
rebase
2020-05-17 15:23:32 +02:00
Martin Kroeker
20245ded5f
Merge pull request #2615 from mhillenibm/z14_alignment_hints
...
s390x: improvise vector alignment hints for older compilers
2020-05-14 21:06:34 +02:00
Marius Hillenbrand
2840432e49
s390x: improvise vector alignment hints for older compilers
...
Introduce inline assembly so that we can employ vector loads with
alignment hints on older compilers (pre gcc-9), since these are still
used in distributions such as RHEL 8 and Ubuntu 18.04 LTS.
Informing the hardware about alignment can speed up vector loads. For
that purpose, we can encode hints about 8-byte or 16-byte alignment of
the memory operand into the opcodes. gcc-9 and newer automatically emit
such hints, where applicable. Add a bit of inline assembly that achieves
the same for older compilers. Since an older binutils may not know about
the additional operand for the hints, we explicitly encode the opcode in
hex.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-14 15:36:03 +02:00
Martin Kroeker
ea78106c71
Merge pull request #2614 from mhillenibm/gemm_vec_z14
...
s390x: Improve performance of SGEMM and STRMM on z14 and newer
2020-05-13 15:09:23 +02:00
Marius Hillenbrand
cb9dc36dd5
Update CONTRIBUTORS.md
...
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 16:14:00 +02:00
Marius Hillenbrand
1b0b4349a1
s390x/Z14: Change register blocking for SGEMM to 16x4
...
Change register blocking for SGEMM (and STRMM) on z14 from 8x4 to 16x4
by adjusting SGEMM_DEFAULT_UNROLL_M and choosing the appropriate copy
implementations. Actually make KERNEL.Z14 more flexible, so that the
change in param.h suffices. As a result, performance for SGEMM improves
by around 30% on z15.
On z14, FP SIMD instructions can operate on float-sized scalars in
vector registers, while z13 could do that for double-sized scalars only.
Thus, we can double the amount of elements of C that are held in
registers in an SGEMM kernel.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 15:59:51 +02:00
Marius Hillenbrand
71b6eaf459
s390x: Use new sgemm kernel also for strmm on Z14 and newer
...
Employ the newly added GEMM kernel also for STRMM on Z14. The
implementation in C with vector intrinsics exploits FP32 SIMD operations
and thereby gains performance over the existing assembly code. Extend
the implementation for handling triangular matrix multiplication,
accordingly. As an added benefit, the more flexible C code enables us to
adjust register blocking in the subsequent commit.
Tested via make -C test / ctest / utest and by a couple of additional
unit tests that exercise blocking.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 15:59:51 +02:00
Marius Hillenbrand
43c0d4f312
s390x: Add vectorized sgemm kernel for Z14 and newer
...
Add a new GEMM kernel implementation to exploit the FP32 SIMD
operations introduced with z14 and employ it for SGEMM on z14 and newer
architectures.
The SIMD extensions introduced with z13 support operations on
double-sized scalars in vector registers. Thus, the existing SGEMM code
would extend floats to doubles before operating on them. z14 extended
SIMD support to operations on 32-bit floats. By employing these
instructions, we can operate on twice the number of scalars per
instruction (four floats in each vector register) and avoid the
conversion operations.
The code is written in C with explicit vectorization. In experiments,
this kernel improves performance on z14 and z15 by around 2x over the
current implementation in assembly. The flexibility of the C code paves
the way for adjustments in subsequent commits.
Tested via make -C test / ctest / utest and by a couple of additional
unit tests that exercise blocking (e.g., partial register blocks with
fewer than UNROLL_M rows and/or fewer than UNROLL_N columns).
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 15:59:51 +02:00
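Register blocking with partial blocks, as exercised by the unit tests mentioned above, can be sketched in plain C. This is not the z14 kernel: it uses scalar code on unpacked column-major matrices, the `UNROLL_M`/`UNROLL_N` values are arbitrary illustrative choices, and `gemm_blocked` is a hypothetical name.

```c
#include <stddef.h>

#define UNROLL_M 4   /* illustrative block height, not the z14 setting */
#define UNROLL_N 2   /* illustrative block width */

/* C += alpha * A * B for column-major A (m x k), B (k x n), C (m x n). */
static void gemm_blocked(size_t m, size_t n, size_t k, float alpha,
                         const float *a, size_t lda,
                         const float *b, size_t ldb,
                         float *c, size_t ldc)
{
    for (size_t j0 = 0; j0 < n; j0 += UNROLL_N) {
        /* Partial block: fewer than UNROLL_N columns at the right edge. */
        size_t nb = (n - j0 < UNROLL_N) ? n - j0 : UNROLL_N;
        for (size_t i0 = 0; i0 < m; i0 += UNROLL_M) {
            /* Partial block: fewer than UNROLL_M rows at the bottom edge. */
            size_t mb = (m - i0 < UNROLL_M) ? m - i0 : UNROLL_M;

            /* The block of C stays in local accumulators (registers in a
             * real kernel) for the whole loop over k. */
            float acc[UNROLL_M][UNROLL_N] = {{0}};
            for (size_t p = 0; p < k; p++)
                for (size_t j = 0; j < nb; j++)
                    for (size_t i = 0; i < mb; i++)
                        acc[i][j] += a[(i0 + i) + p * lda]
                                   * b[p + (j0 + j) * ldb];

            /* Write the finished block back once. */
            for (size_t j = 0; j < nb; j++)
                for (size_t i = 0; i < mb; i++)
                    c[(i0 + i) + (j0 + j) * ldc] += alpha * acc[i][j];
        }
    }
}
```

A 3x3 result with these block sizes hits both edge cases at once (3 rows against a block height of 4, and a last column block of width 1). That is the kind of input the additional unit tests are described as covering.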
Marius Hillenbrand
d7c1677c20
Update CONTRIBUTORS.md, adding myself
...
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 11:09:28 +02:00
Marius Hillenbrand
0dbe61a612
s390x: choose SIMD kernels at run-time based on OS and compiler support
...
Extend and simplify the run-time detection for dynamic architecture support for z
to check HW_CAP and only use SIMD features if advertised by the OS.
While at it, also honor the env variable LD_HWCAP_MASK and do not use
the CPU features masked there.
Note that we can only use the SIMD features on z13 or newer (i.e.,
Vector Facility or Vector-Enhancements Facilities) when the operating
system supports properly context-switching the vector registers. The OS
advertises that support as a bit in the HW_CAP value in the auxiliary
vector. While all recent Linux kernels have that support, we should
maintain compatibility with older versions that may still be in use.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 11:01:16 +02:00
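The run-time check described above, which uses SIMD only when the OS advertises it in HW_CAP and honors LD_HWCAP_MASK, might look roughly like the following. This is a sketch, not the OpenBLAS detection code. `EXAMPLE_HWCAP_VX` is a placeholder bit, the helper names are hypothetical, and parsing the mask as a plain number is an assumption. On Linux with glibc the inputs would come from `getauxval(AT_HWCAP)` in `<sys/auxv.h>` and `getenv("LD_HWCAP_MASK")`.

```c
#include <stdlib.h>

/* Placeholder capability bit for illustration; real s390x code would use
 * the platform's HWCAP constants instead. */
#define EXAMPLE_HWCAP_VX (1UL << 11)

/* Apply an optional user mask (text form, e.g. "0x800") to the
 * capabilities reported by the OS. Unparsable masks are ignored. */
static unsigned long effective_hwcap(unsigned long hwcap, const char *mask_env)
{
    if (mask_env != NULL && *mask_env != '\0') {
        char *end;
        unsigned long mask = strtoul(mask_env, &end, 0);
        if (*end == '\0')
            hwcap &= mask;
    }
    return hwcap;
}

/* Vector kernels are only safe when the OS advertises support, i.e. it
 * context-switches the vector registers, and the user has not masked it. */
static int can_use_vector(unsigned long hwcap, const char *mask_env)
{
    return (effective_hwcap(hwcap, mask_env) & EXAMPLE_HWCAP_VX) != 0;
}
```

Keeping the decision in a pure function of the two inputs makes the fallback path to generic code easy to exercise without the target hardware.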
Marius Hillenbrand
62cf391cbb
s390x: only build kernels supported by gcc with dynamic arch support
...
When building with dynamic arch support, only build kernels for
architectures that are supported by the gcc we are building with.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 11:01:16 +02:00
Marius Hillenbrand
8c338616f9
s390x: gate dynamic arch detection on gcc version and add generic
...
When building OpenBLAS with DYNAMIC_ARCH=1 on s390x (aka zarch), make
sure to include support for systems without the facilities introduced
with z13 (i.e., zarch_generic). Adjust runtime detection to fall back to
that generic code when running on an unknown platform other than Z13
through Z15.
When detecting a Z13 or newer system, add a check for gcc support for
the architecture-specific features before selecting the respective
kernel. Fall back to Z13 or generic code if that support is missing.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 11:01:16 +02:00
Martin Kroeker
f94c53ec0a
Merge pull request #2612 from RajalakshmiSR/testshgemm
...
Improve shgemm test
2020-05-12 08:34:02 +02:00
Rajalakshmi Srinivasaraghavan
8efba9b7c0
Improve shgemm test
...
This patch adds another check to test shgemm results.
2020-05-11 17:15:10 -05:00
Martin Kroeker
4fffa556d8
Merge pull request #2611 from RajalakshmiSR/bench_half
...
Include shgemm in benchtest
2020-05-11 21:08:41 +02:00
Rajalakshmi Srinivasaraghavan
ce90e2bd3f
Include shgemm in benchtest
...
This patch is to enable benchtest for half precision gemm
when BUILD_HALF is set during make.
2020-05-11 09:57:46 -05:00
Martin Kroeker
948b6712ba
Merge pull request #2610 from martin-frbg/issue2552-3
...
Temporary workaround for excessive LAPACK test failures with COMPLEX on Skylake-X
2020-05-10 13:10:31 +02:00
Martin Kroeker
2271c3506b
Work around excessive LAPACK test failures on Skylake-X
...
Something in the plain C parts of x86_64 cscal.c and zscal.c appears to be miscompiled by both gfortran9 and ifort when compiling for skylakex-avx512, even when the optimized Haswell microkernel is not in use.
2020-05-09 23:49:18 +02:00
Martin Kroeker
db00b21445
Merge pull request #2609 from martin-frbg/issue2552-2
...
Correct ifort options
2020-05-09 21:33:02 +02:00
Martin Kroeker
58d26b4448
Correct ifort options
...
to same as suggested by reference-lapack
2020-05-09 17:15:36 +02:00
Martin Kroeker
8e47d14053
Merge pull request #2608 from martin-frbg/issue2604
...
Handle trailing whitespace and empty variables in KERNEL files
2020-05-09 16:36:14 +02:00
Martin Kroeker
cd10b35fe9
Handle trailing spaces and empty condition variables
2020-05-09 13:42:33 +02:00
Martin Kroeker
9472dd99cd
Merge pull request #57 from xianyi/develop
...
rebase
2020-05-09 13:20:44 +02:00
Martin Kroeker
7181665452
Merge pull request #2605 from RajalakshmiSR/cmake-power
...
Fix cmake compilation issue - POWER9
2020-05-09 11:29:28 +02:00
Rajalakshmi Srinivasaraghavan
bd9ff820bc
Fix cmake compilation issue - POWER9
...
This patch removes extra space in the sgemmotcopy filename
thereby allowing it to create entry in kernel/Makefile
created by cmake.
2020-05-08 20:31:56 -05:00
Martin Kroeker
63e45def70
Merge pull request #2603 from martin-frbg/issue2552
...
Add FFLAGS_DRV entry to the generated make.inc to fix lapack-test failure with Intel compilers
2020-05-08 22:08:39 +02:00
Martin Kroeker
ec0f228632
Add FFLAGS_DRV to the generated make.inc to fix lapack-test on x86_64 with icc/ifort
...
fixes #2552
2020-05-08 18:06:12 +02:00
Martin Kroeker
90e2941c61
Merge pull request #56 from xianyi/develop
...
rebase
2020-05-07 22:43:48 +02:00
Martin Kroeker
10d5f3c87b
Merge pull request #2602 from ashwinyes/thunderx2_develop
...
DAXPY Optimizations for ThunderX2
2020-05-07 22:06:41 +02:00
Ashwin Sekhar T K
8353cb245a
ARM64: Improve DAXPY for ThunderX2
...
Improve performance of DAXPY for ThunderX2
when the vector fits in L1 Cache.
2020-05-07 09:22:50 -07:00
Martin Kroeker
ec2dd7b875
Merge pull request #2601 from martin-frbg/issue818
...
Undefine NAME/CNAME etc in Makefile.system before defining them
2020-05-07 10:12:33 +02:00
Martin Kroeker
4e82eb9f8a
Undefine ASMNAME/NAME/CNAME before defining them
...
to avoid redefinition warning when environment variables like CFLAGS are being used (fixes #818)
2020-05-07 00:31:32 +02:00
Martin Kroeker
61300bb735
Merge pull request #55 from xianyi/develop
...
rebase
2020-05-07 00:27:14 +02:00
Martin Kroeker
33e9b12464
Merge pull request #2597 from martin-frbg/appleclang
...
Use Clang 9.0.0 miscompilation fix for corresponding AppleClang version as well
2020-05-05 13:55:08 +02:00
Martin Kroeker
90dba9f716
Duplicate earlier Clang 9.0.0 workaround for corresponding Apple Clang version
...
As discussed on the original PR #2329, the "Apple Clang 11.0.3" that appears to be based on the same LLVM release produces the same miscompilation of this file.
2020-05-05 10:44:50 +02:00
Martin Kroeker
424d551e01
Merge pull request #53 from xianyi/develop
...
rebase
2020-05-01 15:18:46 +02:00
Martin Kroeker
596f5df9e8
Merge pull request #2591 from RajalakshmiSR/testhalf
...
Add test for shgemm
2020-05-01 09:59:39 +02:00
Martin Kroeker
5dd14e3d48
Make building the bfloat16 functions conditional on option BUILD_HALF (#2590)
...
* make building the bfloat16 BLAS functions conditional on BUILD_HALF
* pass the BUILD_HALF option to gensymbol
* Pass BUILD_HALF as a compiler define for dynamic_arch builds
2020-05-01 09:58:30 +02:00
Martin Kroeker
a54e35e780
Merge pull request #2586 from martin-frbg/miscfixes
...
Trivial fix for compiler warnings
2020-04-29 22:01:41 +02:00
Rajalakshmi Srinivasaraghavan
564b0d39ef
Add test for shgemm
...
This patch has Makefile changes to add test for shgemm which
compares sgemm and shgemm result.
2020-04-29 13:40:34 -05:00
Martin Kroeker
5d58b11101
Merge pull request #52 from xianyi/develop
...
rebase
2020-04-29 14:36:15 +02:00
Martin Kroeker
d394d4e677
Merge pull request #2585 from martin-frbg/mips64fix
...
Increase default BUFFER_SIZE on MIPS64
2020-04-28 19:47:55 +02:00
Xianyi Zhang
9d3a317abc
Refs #2587 Fix typos.
2020-04-29 00:19:19 +08:00
Xianyi Zhang
92372c70fc
Fix gemm interface bug for small matrix.
2020-04-28 23:15:20 +08:00
Xianyi Zhang
43bef4aaac
Add alpha=1.0 beta=0.0 for small gemm.
2020-04-28 22:35:36 +08:00
Xianyi Zhang
aae6af94bb
Add small matrix optimization kernel interface.
...
make SMALL_MATRIX_OPT=1
2020-04-28 19:02:41 +08:00
Martin Kroeker
f4248af26e
Fix compiler warnings
2020-04-28 10:43:12 +02:00
Martin Kroeker
2d89603e9d
Increase BUFFER_SIZE on mips64 to match SGEMM parameters
2020-04-28 10:40:40 +02:00
Martin Kroeker
26bc15258a
Merge pull request #51 from xianyi/develop
...
rebase
2020-04-28 10:38:50 +02:00
Martin Kroeker
141998dce2
Merge pull request #2584 from martin-frbg/issue2583
...
[WIP] Have CMAKE parse conditional lines in KERNEL files
2020-04-28 10:35:12 +02:00
Martin Kroeker
3bd56846bb
Silence a debug message
2020-04-27 16:27:09 +02:00
Martin Kroeker
e7bbdfdf84
Have CMAKE parse conditional lines in KERNEL files
...
Supports ifeq and ifneq, but requires both to have an else branch
2020-04-27 15:20:03 +02:00
Martin Kroeker
b6795db731
Merge pull request #2582 from martin-frbg/mips32fix
...
Increase BUFFER_SIZE on MIPS32 to accommodate SGEMM requirements
2020-04-27 09:18:34 +02:00
Martin Kroeker
5e0dbf8dfe
Increase default BUFFER_SIZE to accommodate SGEMM parameters
...
in response to compile-time warning from #2551
2020-04-26 22:21:05 +02:00
Martin Kroeker
955d73127f
Merge pull request #50 from xianyi/develop
...
rebase
2020-04-26 22:17:56 +02:00
Martin Kroeker
a8c1bea7ae
Merge pull request #2581 from martin-frbg/raji
...
Fix travis configuration and update CONTRIBUTORS.md
2020-04-25 19:57:10 +02:00
Martin Kroeker
e43b49e064
Drop the set -e from travis scripts
2020-04-25 16:18:54 +02:00
Martin Kroeker
3e28db7f38
Update CONTRIBUTORS.md
2020-04-25 13:51:44 +02:00
Martin Kroeker
4b69ee31af
Merge pull request #2580 from martin-frbg/issue2538-3
...
Increase POWER8 ZGEMM_R and use same R values for POWER9
2020-04-25 00:28:18 +02:00
Martin Kroeker
03ff213c51
Increase POWER8 ZGEMM_R and use same R values for POWER9
...
fixes lapack-test zger failures seen in #2299 after application of my PR #2551
2020-04-24 21:46:54 +02:00
Martin Kroeker
299d1c8de0
Merge pull request #2578 from martin-frbg/issue2576
...
Quote getarch include paths in prebuild.cmake
2020-04-24 14:32:46 +02:00
Martin Kroeker
70869d571f
Quote include paths for getarch to protect any embedded spaces
2020-04-24 10:30:44 +02:00
Martin Kroeker
cba87222b2
Merge pull request #49 from xianyi/develop
...
rebase
2020-04-24 10:21:48 +02:00
Martin Kroeker
f80dd2151e
xcode 11.4.1 for homebrew ?
2020-04-23 14:31:09 +02:00
Martin Kroeker
4412ee1754
Switch homebrew build env to new xcode 11.4
...
default 11.3.1 in the github image is causing brew to fail with "outdated xcode" message
2020-04-23 10:54:46 +02:00
Martin Kroeker
f6104b68c1
Merge pull request #2571 from martin-frbg/issue2299
...
Work around IDAMAX/IZAMAX bugs on POWER8BE with ELFv2 FreeBSD
2020-04-22 18:27:13 +02:00
Martin Kroeker
84f2c71e93
Merge pull request #2573 from martin-frbg/issue2572
...
Enable cblas interfaces to GEMM3M in CMAKE builds
2020-04-22 15:04:49 +02:00
Martin Kroeker
06208c8d01
Limit this fix to ELFv2 builds
2020-04-22 14:16:40 +02:00
Martin Kroeker
c90b28dee6
Export ELF_VERSION for use in powerpc kernel configurations
2020-04-22 14:14:20 +02:00
Martin Kroeker
6275b43918
Avoid duplicate printout of byte order and report ELF_VERSION
2020-04-22 14:12:27 +02:00
Martin Kroeker
2db5178e2d
enable cblas interfaces to GEMM3M in CMAKE builds
2020-04-22 11:01:28 +02:00
Martin Kroeker
57549f5c92
Merge pull request #2569 from martin-frbg/issue2472-2
...
Fix linker option passing for MSVS and ReLAPACK
2020-04-21 20:26:53 +02:00
Martin Kroeker
f5c4c28b98
Work around POWER8BE bugs on FreeBSD (ELFv2)
...
for #2299
2020-04-21 17:17:17 +02:00
Martin Kroeker
239282d5e2
Use CMAKE_SHARED_LINKER_FLAGS to pass MSVC linker option
...
target_link_libraries does not work here according to issue 2472
2020-04-20 22:30:51 +02:00
Martin Kroeker
568674477c
Merge pull request #48 from xianyi/develop
...
rebase
2020-04-20 21:51:59 +02:00
Martin Kroeker
fa42588e1f
Merge pull request #2565 from martin-frbg/mips24k
...
Support MIPS32 24K family as P5600
2020-04-20 17:13:53 +02:00
Martin Kroeker
8a6d26458b
Merge pull request #2559 from RajalakshmiSR/shgemm
...
Add half precision gemm for bfloat16 in OpenBLAS
2020-04-19 22:09:55 +02:00
Martin Kroeker
db86f516b9
Merge pull request #2568 from martin-frbg/azure-win
...
Add a Windows/CL build job to the Azure CI
2020-04-19 19:06:33 +02:00
Martin Kroeker
aec353b5a7
Add a Windows/CL build to the Azure CI configuration
2020-04-19 19:04:33 +02:00
Martin Kroeker
c62fbefad4
Merge pull request #2567 from xianyi/revert-2566-azurewin
...
Revert "Add Windows build job on Azure CI"
2020-04-19 19:01:58 +02:00
Martin Kroeker
04706e760d
Revert "Add Windows build job on Azure CI (#2566)"
...
This reverts commit e1e543b145.
2020-04-19 19:00:37 +02:00
Martin Kroeker
e1e543b145
Add Windows build job on Azure CI (#2566)
...
* Add Windows-CL build job on Azure
2020-04-19 16:16:15 +02:00
Martin Kroeker
e55ec82bb9
Delete KERNEL.1004K
2020-04-19 15:44:30 +02:00
Martin Kroeker
7353ea5afc
Delete KERNEL.24K
2020-04-19 15:44:19 +02:00
Martin Kroeker
6a04efb122
Rename KERNEL files to include MIPS prefix
2020-04-19 15:43:54 +02:00
Martin Kroeker
5afb66812f
Update getarch.c
2020-04-19 14:55:31 +02:00
Martin Kroeker
0d18f231fc
Update getarch.c
2020-04-19 13:52:58 +02:00
Martin Kroeker
2f4a8e5bc4
Rename the FORCE entries for 24K and 1004K to include the MIPS prefix
2020-04-19 13:22:19 +02:00
Martin Kroeker
4f70512b97
Update kernel.cmake
2020-04-19 08:10:26 +02:00
Martin Kroeker
8792fc4d5f
Disable RPCC macro on MIPS24K
2020-04-19 07:21:48 +02:00
Martin Kroeker
577c5d9f8f
Update README.md
2020-04-19 06:54:52 +02:00
Martin Kroeker
6721f2750e
Update TargetList.txt
2020-04-19 06:51:57 +02:00
Martin Kroeker
b0b02a080d
Add compiler options for MIPS32 24K/1004K
2020-04-19 06:50:51 +02:00
Martin Kroeker
a1fc98dc57
rename 1004K, 24K to MIPS1004K, MIPS24K to avoid identifier naming problem
2020-04-18 23:50:23 +02:00
Martin Kroeker
d0737b0142
Update kernel.cmake
2020-04-18 21:36:28 +02:00
Martin Kroeker
7dbb59b256
Update common_macro.h
2020-04-18 21:34:14 +02:00
Martin Kroeker
00172d440b
Typo fix in MIPS24K addition
2020-04-18 21:16:49 +02:00
Martin Kroeker
d712ea724c
Add MIPS24K support
2020-04-18 21:10:18 +02:00
Martin Kroeker
61bbae3ac1
Handle MIPS24K like P5600
...
and allow enforcing TARGET=1004K as well (omission from earlier 1004K merge and later introduction of TARGET check)
2020-04-18 21:09:32 +02:00
Martin Kroeker
1c1ca2bc0a
Merge pull request #47 from xianyi/develop
...
rebase
2020-04-18 21:07:14 +02:00
Martin Kroeker
c7d668c248
Update common_macro.h
2020-04-18 16:04:38 +02:00
Martin Kroeker
a83a59b038
Use generic kernels for ishama,shasum,shdot,shrot
2020-04-18 15:53:51 +02:00
Martin Kroeker
0a19bd813c
Use generic codes for shamax and shcopy
2020-04-18 12:52:51 +02:00
Martin Kroeker
e7afe8a969
Define AXPBY_K fallback for float16
2020-04-18 11:10:15 +02:00
Martin Kroeker
f361de30a3
Use generic axpy.c for SHAXPY as x86 lacks saxpy.c
2020-04-18 11:07:16 +02:00
Martin Kroeker
9f6d6f6cb6
use saxpy.c instead of axpy.S for SHAXPY
2020-04-17 22:27:58 +02:00
Rajalakshmi Srinivasaraghavan
22bb50fb81
cmake fixes
2020-04-17 13:35:17 -05:00
Martin Kroeker
236a3d8ce6
Merge pull request #2563 from zelong-1024/develop
...
[OpenBLAS]: benchmark error of potrf
2020-04-16 11:45:32 +02:00
l00536773
6b7ef6543a
[OpenBLAS]: benchmark error of potrf
...
[description]: when the matrix size goes higher than 5800 during the cpotrf test, error info, such as "Potrf info = 5679", will be returned on ARM64 and x86 machines. Uplo = L & F.
[solution]: changed the func for building the matrix so that the complex Hermitian matrix can stay positive definite during the computation.
[dts]:
2020-04-16 10:55:10 +08:00
Rajalakshmi Srinivasaraghavan
67cc4b9e16
Fix warnings in clang and export symbol
2020-04-15 19:15:23 -05:00
Martin Kroeker
250e6f8039
Merge pull request #2557 from martin-frbg/dronebadge
...
Update and reformat README
2020-04-15 20:23:43 +02:00
Martin Kroeker
7a6d0016b0
Merge pull request #2556 from martin-frbg/epicdrone
...
Add a drone.io multithread test for x86_64
2020-04-15 20:23:17 +02:00
Martin Kroeker
e8e8a6e608
Restore USE_OPENMP in the x86 thread test
2020-04-15 19:26:12 +02:00
Martin Kroeker
579811fb6a
Move all 19.04-based jobs back to ubuntu 18.04
2020-04-15 17:38:33 +02:00
Rajalakshmi Srinivasaraghavan
a87793e03c
Fix DYNAMIC_ARCH compilation errors
2020-04-15 09:09:50 -05:00
Rajalakshmi Srinivasaraghavan
ac6a22ae78
Update header
2020-04-14 22:58:39 -05:00
Rajalakshmi Srinivasaraghavan
ff010f496e
Build shgemm for all architectures
2020-04-14 20:38:53 -05:00
Rajalakshmi Srinivasaraghavan
7eb55504b1
RFC : Add half precision gemm for bfloat16 in OpenBLAS
...
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes). Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.
Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.
This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
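The RFC above states that bfloat16 is defined as an unsigned short (2 bytes) on architectures without native support. As a rough illustration of what such a storage-only type implies, the following sketch converts between float and a 16-bit bfloat16 representation by keeping the upper half of the IEEE-754 single-precision bit pattern. The names `bf16_t`, `float_to_bf16_trunc` and `bf16_to_float` are hypothetical and are not taken from OpenBLAS, and plain truncation (no rounding) is an assumption made to keep the sketch short.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical storage type: 16 bits holding the top half of a float. */
typedef unsigned short bf16_t;

static_assert(sizeof(bf16_t) == 2, "sketch assumes a 2-byte unsigned short");
static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

/* Keep sign, exponent and the top 7 mantissa bits; drop the low 16 bits. */
static bf16_t float_to_bf16_trunc(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* bit copy, avoids aliasing issues */
    return (bf16_t)(bits >> 16);
}

/* Widen back to float by padding the low 16 mantissa bits with zeros. */
static float bf16_to_float(bf16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Values whose mantissa fits in 7 bits (for example 1.0f or -2.5f) survive the round trip exactly; other values lose precision, which is why a comparison against sgemm output, as the message mentions, needs a tolerance.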
Martin Kroeker
84a9614345
try x86_64 test without openmp
2020-04-14 19:18:35 +02:00
Martin Kroeker
b969533703
Add drone.io badge, mention EMAG8180 support, reformat the DYNAMIC_ARCH paragraph
2020-04-14 10:53:28 +02:00
Martin Kroeker
0f08f3efa6
Add a multithread test for x86_64
2020-04-13 22:46:12 +02:00
Martin Kroeker
c861b2a7bd
Merge pull request #2553 from martin-frbg/issue2444
...
Add a read memory barrier to the traversal of the buffer slot list
2020-04-13 21:28:59 +02:00
Martin Kroeker
cf62adffbb
Merge pull request #2555 from martin-frbg/issue1137
...
Handle unaligned data in the SSE2 copy kernel
2020-04-13 18:29:56 +02:00
Martin Kroeker
3eec7d382c
ARMV7 does not support DMB ISHLD, use DMB ISH
2020-04-13 15:56:31 +02:00
Martin Kroeker
5b0093b5fe
Convert aligned moves to unaligned
...
should have no performance impact on reasonably modern cpus and fixes occasional crashes in actual user code.
2020-04-13 14:58:52 +02:00
Martin Kroeker
f41600e66f
Add a read barrier in the traversing of the buffer list
...
Needed on systems with weak memory ordering - the inferior, partially working fix from #2544 was already removed in #2551
2020-04-13 12:34:02 +02:00
Martin Kroeker
f5efecb7ca
Add (empty) read barrier definition
2020-04-13 12:24:10 +02:00
Martin Kroeker
a52bdd9d7b
Add (empty) read barrier definition
2020-04-13 12:22:35 +02:00
Martin Kroeker
db3226a646
Add (empty) read barrier definition
2020-04-13 12:18:48 +02:00
Martin Kroeker
69b6e258d8
Add (empty) read barrier definition
2020-04-13 12:17:41 +02:00
Martin Kroeker
3d4db4d002
Add read barrier definition
2020-04-13 12:16:44 +02:00
Martin Kroeker
99dde1d2c9
Add read barrier definition
2020-04-13 12:14:58 +02:00
Martin Kroeker
ee6b3df02c
Add read barrier definition
2020-04-13 12:14:06 +02:00
Martin Kroeker
25e879fe92
Add (empty) read barrier definition
2020-04-13 12:12:54 +02:00
Martin Kroeker
d237dc1360
Add read barrier definition
2020-04-13 12:11:58 +02:00
Martin Kroeker
8692456226
Add read barrier definition
2020-04-13 12:10:37 +02:00
Martin Kroeker
d1d69e1b9a
Add read barrier definition
2020-04-13 12:09:24 +02:00
Martin Kroeker
20d0cb2f65
Merge pull request #46 from xianyi/develop
...
rebase
2020-04-13 12:06:40 +02:00
Martin Kroeker
e7f0da9295
Merge pull request #2551 from martin-frbg/issue2538-2
...
Increase BUFFER_SIZEs and add a safeguard; supply GEMM_R for POWER8/9
2020-04-12 22:34:41 +02:00
Martin Kroeker
e9bfa2291a
Fix parameter overflow
2020-04-12 19:47:02 +02:00
Martin Kroeker
2a28448a96
Add safeguards for sufficient BUFFER_SIZE
2020-04-12 19:45:36 +02:00
Martin Kroeker
a33d177430
Increase default BUFFER_SIZE on ARM, ZARCH and newer x86_64, add GEMM_R for POWER8/9
...
As shown in #2538, default buffer sizes on some platforms were smaller than required in memory.c
and the requirement could never be fulfilled for a calculated GEMM_R on PPC given the formula used
2020-04-12 19:44:48 +02:00
Martin Kroeker
f73391c9c9
Merge pull request #45 from xianyi/develop
...
rebase
2020-04-12 19:39:05 +02:00
Martin Kroeker
7905383cb5
Merge pull request #2547 from sharvil/develop
...
Add API to set thread affinity on Linux.
2020-04-11 00:35:38 +02:00
Martin Kroeker
a8cbd451bf
Merge pull request #2541 from bapt/develop
...
libname: treat FreeBSD and DragonFly like linux and sunos
2020-04-11 00:35:07 +02:00
Martin Kroeker
eecd8c3204
Merge pull request #2548 from gxw-loongson/develop
...
Add a GENERIC target for 64bit MIPS
2020-04-11 00:34:04 +02:00
Martin Kroeker
ea85eb2e02
Merge pull request #2549 from martin-frbg/fixthreadtest
...
Match thread count in cpp_thread_test to host capability
2020-04-10 23:54:40 +02:00
Martin Kroeker
66f89c0aaf
Match thread count to machine capability
2020-04-10 22:06:44 +02:00
gxw
8d07cf9b67
Fix compilation problem on loongson platform
...
Using "make TARGET=GENERIC" on loongson platform will get the following
error messages:
"make[1]: *** No rule to make target 'sgemm_incopy.o', needed by 'libs'"
Add kernel/mips64/KERNEL.generic to solve the problem.
2020-04-09 19:28:15 +08:00
Sharvil Nanavati
7b4773b24d
Add API to set thread affinity on Linux.
...
Issue: #2545
2020-04-08 12:49:35 -07:00
Martin Kroeker
69f277f8ee
Add another memory barrier for ARM and a multicore test run on ThunderX to help detect such issues ( #2544 )
...
* Add another memory barrier in memory.c to prevent races in memory slot allocation
* Add an all-core test on Drone.io's ThunderX platform and modify dgemm_tester to use all 96 cores
2020-04-08 11:04:51 +02:00
Martin Kroeker
3a6d51c2fd
Merge pull request #44 from xianyi/develop
...
Add a Z13 build to the Travis configuration (#2542 )
2020-04-04 22:48:53 +02:00
Martin Kroeker
1c7771df96
Merge pull request #43 from martin-frbg/revert-42-z12ci
...
Revert 42 z12ci to keep forked develop clean
2020-04-04 22:46:58 +02:00
Martin Kroeker
a56c9ec52a
Revert "Add IBM Z to Travis configuration ( #42 )"
...
This reverts commit 7972beb375 .
2020-04-04 22:45:01 +02:00
Martin Kroeker
4ae6d1a01b
Add a Z13 build to the Travis configuration ( #2542 )
...
* Add IBM Z to Travis configuration
2020-04-03 16:02:11 +02:00
Martin Kroeker
7972beb375
Add IBM Z to Travis configuration ( #42 )
...
* Add IBM Z to Travis configuration
2020-04-03 15:59:18 +02:00
Baptiste Daroussin
41e802443a
libname: treat FreeBSD and DragonFly like linux and sunos
...
There is no difference in the way libnames are handled between FreeBSD
and linux or sunos. FreeBSD and DragonFly prefer having sonames as well
2020-04-03 06:20:42 +02:00
Martin Kroeker
7bd8624b79
Merge pull request #41 from xianyi/develop
...
rebase
2020-04-02 10:32:19 +02:00
Martin Kroeker
806f89166e
Make ARMV7 compile with xcode and add a CI job for it ( #2537 )
...
* Add an ARMV7 iOS build on Travis
* thread_local appears to be unavailable on ARMV7 iOS
* Add no-thumb option for ARMV7 IOS build to get it to accept DMB ISH
* Make local labels in macros of nrm2_vfpv3.S compatible with the xcode assembler
2020-04-02 10:30:37 +02:00
Martin Kroeker
f059e614eb
Merge pull request #2536 from martin-frbg/recurs
...
Add "recursive" option for LAPACK builds with ifort or pgfort as well
2020-04-01 20:00:13 +02:00
Martin Kroeker
e13b6773ee
ifort and pgfort need "recursive" for safe compilation of LAPACK as well
2020-04-01 15:39:16 +02:00
Martin Kroeker
a05243d0f2
ifort and pgfort need "recursive" for compiling LAPACK as well
...
as shown in Reference-LAPACK issue 401 (their PR 403)
2020-04-01 15:38:07 +02:00
Martin Kroeker
c6af9bbb32
Merge pull request #2534 from martin-frbg/issue2496
...
Fix zero initialization for beta=0 case
2020-03-31 20:53:13 +02:00
Martin Kroeker
144be81ca1
fix initialization to zero in the NEON SGEMM_BETA kernel as well
2020-03-31 16:53:56 +02:00
Martin Kroeker
07cdd5d05c
Fix zero initialization for beta=0 case
...
use immediate initialization instead of multiplication in case register content is a NaN
2020-03-31 00:21:02 +02:00
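The fix above replaces multiplication by beta with direct zero initialization because, under IEEE-754 arithmetic, NaN times zero is still NaN. The following sketch shows the idea in plain C rather than in the assembly kernels the commit touches; the function name `scale_by_beta` is hypothetical and the scalar loop is only an assumption about how to illustrate the behavior.

```c
#include <assert.h>
#include <math.h>    /* NAN and isnan, used when trying the sketch out */
#include <stddef.h>

/* Scale c[0..n-1] by beta. For beta == 0 the old contents are ignored
 * entirely, so NaN or Inf values already in c cannot leak into the result. */
static void scale_by_beta(float *c, size_t n, float beta) {
    assert(c != NULL || n == 0);
    if (beta == 0.0f) {
        for (size_t i = 0; i < n; i++)
            c[i] = 0.0f;          /* immediate initialization */
    } else {
        for (size_t i = 0; i < n; i++)
            c[i] *= beta;         /* ordinary scaling */
    }
}
```

With an input such as `{NAN, 1.0f, INFINITY}` and beta equal to zero, every element ends up as 0.0f, whereas a plain `c[i] *= 0.0f` would leave NaN in the first and last positions.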
Martin Kroeker
567d2760e6
Merge pull request #2520 from wjc404/develop
...
Fix avx512 sgemm performance bug when ldc is a multiple of 1024
2020-03-30 20:15:59 +02:00
Martin Kroeker
018bb3e433
Merge pull request #2533 from martin-frbg/gemmdirect2
...
Use runtime check for AVX512 capability in DYNAMIC_ARCH builds made on SKX
2020-03-30 20:15:37 +02:00
Martin Kroeker
79fd006c58
Expose the support_avx512 function provided in dynamic.c
2020-03-26 21:25:39 +01:00
Martin Kroeker
8229c163b7
Use runtime check for AVX512 (sgemm_direct) capability when using DYNAMIC_ARCH
2020-03-26 21:12:56 +01:00
Martin Kroeker
a986d42ea6
Merge pull request #39 from xianyi/develop
...
rebase
2020-03-26 21:06:51 +01:00
Martin Kroeker
b6a948fbee
Merge pull request #2530 from martin-frbg/dynmsg
...
Add message highlighting minimum target choice at end of DYNAMIC_ARCH…
2020-03-24 15:44:46 +01:00
Martin Kroeker
0cc352417e
Merge pull request #2529 from shengyang-3390/dev1
...
add ctest for drotm and modified ctest for drot.
2020-03-24 15:44:27 +01:00
Martin Kroeker
fe47dc8673
Add message highlighting minimum target choice at end of DYNAMIC_ARCH builds
...
related to #2526
2020-03-23 19:35:51 +01:00
Martin Kroeker
9f67d03d3b
Merge pull request #2527 from martin-frbg/gemmdirect
...
Avoid calling DIRECT codepath in DYNAMIC_ARCH on non-SKX
2020-03-23 12:47:19 +01:00
shengyang
50f4fb2fbd
add ctest for drotm and modified ctest for drot.
...
make sure that test cases cover all code paths when the kernel uses loop unrolling.
2020-03-23 10:47:05 +08:00
Martin Kroeker
6a14b34c20
Avoid calling DIRECT codepath in DYNAMIC_ARCH on non-SKX
2020-03-22 14:33:16 +01:00
Martin Kroeker
8c7c1395da
Merge pull request #2521 from martin-frbg/cm-avx512
...
Use proper extension on the avx512 testcase filename
2020-03-22 01:03:42 +01:00
Martin Kroeker
5f6f6a2c7d
Merge pull request #2525 from andreas-schwab/develop
...
Fix ARCHCONFIG for Neoverse-N1
2020-03-21 18:47:48 +01:00
Andreas Schwab
71cf2acdef
Fix ARCHCONFIG for Neoverse-N1
...
../config_kernel.h:24:9: warning: missing whitespace after the macro name
24 | #define ARMV8-march armv8.2-a
| ^~~~~
2020-03-21 17:35:42 +01:00
Martin Kroeker
1d9773b800
Use proper extension on the avx512 testcase filename
...
The need to call it .tmp existed only when it was generated by a tmpfile call, and the "-x c" option to tell the compiler it is actually a C source is not universally supported (this broke the test with clang-cl at least)
2020-03-20 23:05:53 +01:00
Martin Kroeker
a46a8c4956
Merge pull request #2518 from shengyang-3390/dev
...
add ctest for srotm and modified ctest for srot.
2020-03-20 23:00:06 +01:00
Martin Kroeker
7ae737e04c
Merge pull request #2519 from martin-frbg/issue2472
...
Fix cmake compilation with ifort on Windows
2020-03-20 22:57:44 +01:00
wjc404
64daad4365
Update param.h
2020-03-20 21:46:18 +00:00
wjc404
b8307768e2
Add files via upload
2020-03-21 05:42:10 +08:00
Martin Kroeker
6d54c94760
Make ifort on Windows create lowercase symbols with appended underscore
...
tentative fix for #2472
2020-03-20 01:08:10 +01:00
Martin Kroeker
c0da205412
Merge pull request #38 from xianyi/develop
...
rebase
2020-03-20 01:05:22 +01:00
shengyang
a06d78556d
add ctest for srotm and modified ctest for srot.
...
make sure that test cases cover all code paths when the kernel uses loop unrolling.
2020-03-18 14:17:32 +08:00
Martin Kroeker
af8a619e1f
Merge pull request #2517 from wjc404/develop
...
Temporary fix for SKX STRSM
2020-03-17 10:12:53 +01:00
wjc404
62b9608986
Update KERNEL.SKYLAKEX
2020-03-17 12:52:55 +08:00
Martin Kroeker
717c604aeb
Merge pull request #2515 from zelong-1024/develop
...
[OpenBLAS]: benchmark for her/her2 LEVEL2 functions
2020-03-16 21:59:55 +01:00
Martin Kroeker
ce33da4cab
Merge pull request #2513 from aaawuanjun/develop
...
[OpenBlas]: Add benchmark tpsv file and modify benchmark/Makefile
2020-03-16 21:58:55 +01:00
Martin Kroeker
a1b181cea2
Merge pull request #2516 from wjc404/develop
...
AVX2 STRSM kernels
2020-03-16 21:58:34 +01:00
wjc404
cdc0e9011e
Update KERNEL.ZEN
2020-03-16 16:39:37 +00:00
wjc404
fa049d49c2
AVX2 STRSM kernel
2020-03-17 00:34:08 +08:00
l00536773
d45c53ecf1
[OpenBLAS]: benchmark for her/her2 LEVEL2 functions
...
[description]: benchmark for her/her2
[solution]: added benchmark for her/her2, modified makefile in benchmark
[dts]:
2020-03-16 11:19:05 +08:00
Martin Kroeker
c2840997db
Merge pull request #2508 from liujingjue/develop
...
[OpenBLAS]:fix the iamax benchmark error
2020-03-14 14:21:30 +01:00
Martin Kroeker
c775458299
Merge pull request #2512 from martin-frbg/lapackh
...
Move declarations of lapack_complex_custom types outside the extern C
2020-03-14 13:27:40 +01:00
Martin Kroeker
c0649aa694
Merge pull request #2506 from xiaofengF/develop
...
Add benchmark for SPMV and fix segmentation fault when data size >= 50000
2020-03-14 13:08:36 +01:00
wuanjun 00447568
2428dc9fd3
[OpenBlas]: Add benchmark tpsv file and modify benchmark/Makefile
...
[Description]: Solve lack of tpsv benchmark.
2020-03-14 09:11:08 +08:00
Martin Kroeker
0fe0914437
Merge pull request #2505 from aaawuanjun/develop
...
[OpenBlas]:Add benchmark tpmv.c and modify benchmark/Makefile
2020-03-13 23:16:35 +01:00
Martin Kroeker
4cfd8a3f57
Merge pull request #2511 from martin-frbg/fixppctest
...
Prevent attempts to run ctest or test when fortran is not available
2020-03-13 23:04:01 +01:00
Martin Kroeker
ee2e758278
Move declarations of lapack_complex_custom types outside the extern C
...
fixes #2510
2020-03-13 20:34:13 +01:00
Martin Kroeker
2d8781b0dc
Do not attempt to run test without fortran
2020-03-13 20:11:19 +01:00
Martin Kroeker
c436e8af7b
Do not attempt to run ctest without fortran
...
The main Makefile takes care of this in the build process, but users or CI jobs may try to run this directly
2020-03-13 20:10:26 +01:00
l00546269
a0a3bf7c81
[OpenBLAS]:fix the iamax benchmark error
...
[Description]:the result for i?amax is not MFlops, it is MBytes
2020-03-13 10:58:39 +08:00
jayfely@qq.com
ae3f2c2e49
Remove cspmv and zspmv to remove the error that occurred in travis CI
2020-03-11 17:02:34 +08:00
jayfely@qq.com
83ecf9fea7
Modify Makefile in interface to remove the error that occurred in travis CI
2020-03-11 16:36:45 +08:00
jayfely@qq.com
649733ff15
Only keep spmv.goto and spmv.atlas
2020-03-11 15:48:58 +08:00
wuanjun 00447568
3e8f1c6cc5
[OpenBlas]:Add benchmark tpmv.c and modify Makefile
...
[Description]:Solve the problem of missing tpmv.c benchmark file
2020-03-11 12:31:48 +08:00
jayfely@qq.com
2f4c5bb3a9
Update spmv.c: solve segmentation fault when m and n are larger than 50000
2020-03-11 10:30:09 +08:00
Martin Kroeker
4e1c4e67d4
Merge pull request #2503 from martin-frbg/xerbl
...
Apply fix for LAPACK issue 394 (fixed-form code beyond column 72)
2020-03-10 23:38:07 +01:00
Martin Kroeker
b9a2a3c540
Merge pull request #2502 from martin-frbg/issue2497
...
Fix INTERFACE64 not propagating to the fortran codes on ARMV8
2020-03-10 20:01:23 +01:00
Martin Kroeker
047dfb216d
Merge pull request #2501 from jijiwawa/Fix_mistakes
...
Fix pr #2487 error
2020-03-10 16:44:40 +01:00
s00527847
cd8871f1a1
Use the correct unit of measure
2020-03-10 19:26:06 -04:00
Martin Kroeker
b25ae1fc60
Apply fix for Reference-LAPACK issue 394
...
reference to XERBLA extending beyond column 72, breaking builds with compilers that default to traditional punch card format
2020-03-10 13:37:41 +01:00
Martin Kroeker
3f7f7ab7e2
Restore INTERFACE64 for arm64
2020-03-10 12:51:07 +01:00
Martin Kroeker
9c22170f52
Merge pull request #37 from xianyi/develop
...
rebase
2020-03-10 12:49:21 +01:00
jayfely@qq.com
08e1d8cbae
Modify Makefile in Benchmark
2020-03-10 14:32:18 +08:00
jayfely@qq.com
ff40a4e726
Add benchmark for SPMV
2020-03-10 14:22:18 +08:00
Zhang Xianyi
51019feae1
Merge pull request #2498 from njutcz/develop
...
Add benchmark for ?amax, ?max, ?amin, ?min, i?max, i?amin and i?min.
2020-03-09 16:04:33 +08:00
s00548429
bec7923a0d
Fix the functional bugs for zamax.
2020-03-09 15:36:50 +08:00
s00548429
c5bdd21352
Add benchmark for ?amax, ?max, ?amin, ?min, i?max, i?amin and i?min.
2020-03-09 14:59:03 +08:00
njutcz
d2d16d091e
Merge pull request #1 from xianyi/develop
...
update
2020-03-09 10:39:40 +08:00
Martin Kroeker
b6a6ccbbea
Merge pull request #2495 from ZuoQ3/develop
...
add benchmark for axpby test
2020-03-08 08:09:58 +01:00
Martin Kroeker
8b720f7365
Merge pull request #2494 from shengyang-3390/develop
...
add benchmark for csrot and zdrot
2020-03-07 23:04:21 +01:00
Martin Kroeker
14df234edb
Merge pull request #2489 from jijiwawa/brightness
...
Remove redundant code
2020-03-07 22:26:00 +01:00
s00527847
bbeda55b7b
add trmm.c
2020-03-07 13:09:19 -05:00
s00527847
efcf89aec7
Remove redundant code
2020-03-07 12:03:05 -05:00
Martin Kroeker
37d456f7e0
Merge pull request #2493 from martin-frbg/plainmake
...
Fix use of make vs $(MAKE) in building lapack-testing
2020-03-07 16:55:53 +01:00
Martin Kroeker
0b9e96922b
Merge pull request #2488 from liujingjue/develop
...
Modify the main Makefile in OpenBLAS
2020-03-07 16:52:29 +01:00
zq
0c8162eba6
Add benchmark file axpby.c and modify benchmark/Makefile to test s/d/c/zaxpby
2020-03-07 17:48:55 +08:00
zq
9a94a30132
Merge pull request #1 from xianyi/develop
...
update
2020-03-07 17:04:59 +08:00
shengyang
09c7a191bd
add benchmark for csrot and zdrot
...
modified: benchmark/Makefile
modified: benchmark/rot.c
2020-03-07 15:17:49 +08:00
l00546269
8a8df530e2
[OpenBLAS]:modified the Makefile
...
[Description]: check the compiler version and show the detail info
2020-03-07 10:14:33 +08:00
Martin Kroeker
37f46f2fa0
Fix another spot where make was used instead of $(MAKE)
...
Broke lapack-testing on BSD as their default "make" does not support GNU Makefile syntax
2020-03-06 15:37:26 +01:00
Martin Kroeker
9afc561be4
Merge pull request #36 from xianyi/develop
...
rebase
2020-03-06 15:32:27 +01:00
Martin Kroeker
dca3e0cf20
Merge pull request #2491 from chenxuqiang/hbmv_benchmark
...
benchmark/hpmv&hbmv: add benchmark/hpmv.c and benchmark/hbmv.c
2020-03-06 15:06:42 +01:00
Martin Kroeker
c9f8db979b
Merge pull request #2490 from shengyang-3390/develop
...
Add benchmark file rotm.c and modify benchmark/Makefile to test s/drotm
2020-03-06 15:05:55 +01:00
Martin Kroeker
18099de976
Merge pull request #2487 from jijiwawa/develop
...
add benchmark for spr/spr2
2020-03-06 14:42:25 +01:00
Martin Kroeker
97c36ca58c
Merge branch 'develop' into develop
2020-03-06 14:41:40 +01:00
Martin Kroeker
9f5a74f3c7
Merge pull request #2486 from qqqil/develop
...
add benchmark for trsv
2020-03-06 14:30:09 +01:00
Martin Kroeker
2afb10975d
Merge pull request #2485 from Darkness303/develop
...
Add syr2 benchmark
2020-03-06 14:29:27 +01:00
Martin Kroeker
dbef479227
Merge pull request #2469 from AGSaidi/acq-rel-2
...
Use acq/rel semantics to pass flags/pointers in getrf_parallel.
2020-03-06 14:28:58 +01:00
Ali Saidi
208c7e7ca5
Use acq/rel semantics to pass flags/pointers in getrf_parallel.
...
The current implementation has locks, but the locks each only
have a critical section of one variable so atomic reads/writes
with barriers can be used to achieve the same behavior.
Like the previous patch, pthread_mutex_lock isn't fair, so in a
tight loop the previous thread that has the lock can keep it
starving another thread, even if that thread is about to write
the data that will stop the current thread from spinning.
On a 64c Arm system this improves performance by 20x on sgesv.goto.
2020-03-06 06:22:31 +00:00
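The message above argues that a lock protecting a single variable can be replaced by an atomic store with release ordering paired with an atomic load with acquire ordering. A minimal sketch of that pattern with C11 atomics follows. The type `slot_t` and the functions `slot_publish` and `slot_try_consume` are hypothetical names for illustration and do not reflect the actual getrf_parallel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical one-value mailbox: a payload plus a ready flag. */
typedef struct {
    int        payload;   /* plain data written before the flag is set */
    atomic_int ready;     /* 0 = empty, 1 = payload is valid */
} slot_t;

/* Producer: write the data, then publish it with a release store so the
 * payload write cannot be reordered after the flag becomes visible. */
static void slot_publish(slot_t *s, int value) {
    s->payload = value;
    atomic_store_explicit(&s->ready, 1, memory_order_release);
}

/* Consumer: an acquire load of the flag guarantees that, once it reads 1,
 * the payload written before the matching release store is visible too.
 * Returns 1 and fills *out on success, 0 if nothing is published yet. */
static int slot_try_consume(slot_t *s, int *out) {
    assert(out != NULL);
    if (atomic_load_explicit(&s->ready, memory_order_acquire) == 0)
        return 0;
    *out = s->payload;
    return 1;
}
```

A spinning consumer would call `slot_try_consume` in a loop; because no mutex is involved, the producer cannot be starved by a waiter that repeatedly reacquires an unfair lock, which is the problem the commit describes.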
chenxuqiang
32c847df45
benchmark/hpmv&hbmv: add benchmark/hpmv.c and benchmark/hbmv.c
...
Signed-off-by: Xuqiang Chen chenxuqiang3@hisilicon.com
2020-03-06 01:02:02 -05:00
shengyang
e0df9485d4
Add benchmark file rotm.c and modify benchmark/Makefile to test s/drotm
...
modified: benchmark/Makefile
new file: benchmark/rotm.c
2020-03-05 10:05:59 +08:00
s00527847
0f1a2b12f9
add benchmark for spr/spr2
2020-03-04 15:50:19 -05:00
q00437336
233838b4bc
change clock to CLOCK_PROCESS_CPUTIME_ID
2020-03-04 03:54:40 -05:00
l00546269
13f9afbd99
[OpenBLAS]:modified the Makefile
...
[Description]:add c/fortran compiler version information in final note
2020-03-04 16:47:23 +08:00
q00437336
de74e11641
add benchmark for trsv
2020-03-04 03:23:22 -05:00
Martin Kroeker
ad9e53154d
Merge pull request #2484 from RajalakshmiSR/power-dynamic
...
Fix DYNAMIC_ARCH build for POWER9
2020-03-04 08:06:06 +01:00
Martin Kroeker
6e70621b0d
Merge pull request #2483 from aaawuanjun/develop
...
Add benchmark file trmv.c and modify benchmark/Makefile to test s/d/c/ztrmv
2020-03-04 07:59:56 +01:00
Martin Kroeker
e6edb7431f
Merge pull request #2466 from AGSaidi/acq-rel-1
...
Switch blas_server to use acq/rel semantics
2020-03-04 07:59:31 +01:00
Darkness303
114dbec947
1.Add syr2 benchmark
...
2.Fixed some errors
2020-03-04 14:09:10 +08:00
Martin Kroeker
d68e4ba59b
Fix cut/paste glitch
2020-03-03 21:37:48 +01:00
Martin Kroeker
635c9e4e09
Restore initializers for mutex and conditional
2020-03-03 21:04:12 +01:00
Rajalakshmi Srinivasaraghavan
2afc074803
Fix DYNAMIC_ARCH build for POWER9
...
Setting DYNAMIC_ARCH=1 on POWER9 does not build POWER9 files due to some
compiler version checks. This patch fixes some of the macros that are used
to check compiler version. On fixing those checks, there are some new make
failures related to icamin, icamax, isamin, isamax and caxpy files on POWER9.
This patch fixes those failures as well.
2020-03-03 12:35:10 -06:00
wuanjun 00447568
5d6c688a7e
Merge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop
2020-03-03 19:03:57 +08:00
wuanjun 00447568
87baf9cfe6
Merge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop
2020-03-03 19:03:28 +08:00
wuanjun 00447568
c0ca7d6258
Merge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop
2020-03-03 17:39:26 +08:00
wuanjun 00447568
f682d19ed4
[OpenBlas]: add benchmark file trmv.c and modify benchmark/Makefile to test s/d/c/ztrmv
2020-03-03 17:37:33 +08:00
wuanjun 00447568
790d50fbba
[OpenBlas]: add benchmark file trmv.c and modify benchmark/Makefile to test s/d/c/ztrmv
2020-03-03 17:13:49 +08:00
Martin Kroeker
59243d49ab
Merge pull request #2479 from Darkness303/develop
...
Fix potential index overflows at large matrix sizes in the benchmark codes
2020-03-03 08:46:49 +01:00
Martin Kroeker
d41f83e128
Merge pull request #2436 from marxin/improve-utest-coverage
...
Improve test coverage for utests.
2020-03-03 08:43:00 +01:00
Martin Kroeker
ee4ca7ca6b
Merge pull request #2481 from ChinouneMehdi/fix2480
...
Fix #2480
2020-03-02 21:21:29 +01:00
Martin Kroeker
e326c89ae8
Merge pull request #2478 from MacChen02/develop
...
Update benchmark statistical time function
2020-03-02 21:20:51 +01:00
مهدي شينون (Mehdi Chinoune)
21f6c4b5a9
fixes #2480
2020-03-02 17:22:28 +01:00
Martin Liska
7ca4ffdbdd
Improve test coverage for utests.
2020-03-02 13:38:17 +01:00
jianghesong
0f65c05cd1
fix core dumped error
2020-03-02 19:13:45 +08:00
MacChen02
917d243580
Update benchmark statistical time function
...
The function gettimeofday fails to measure the elapsed time when testing the axpy use case with small data volumes.
Use the function clock_gettime instead of gettimeofday to measure the time.
2020-03-02 14:36:27 +08:00
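The change above swaps gettimeofday for clock_gettime so that very short benchmark runs still register elapsed time. A minimal sketch of timing with clock_gettime follows. The helper names `now_monotonic` and `elapsed_seconds` are hypothetical, and the use of CLOCK_MONOTONIC is an assumption, since the message does not say which clock id the benchmark uses.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Read the current time with nanosecond-granularity fields. */
static struct timespec now_monotonic(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;   /* silence unused warning when NDEBUG is defined */
    return ts;
}

/* Difference end - start in seconds; the nanosecond part may be negative
 * and is folded into the result by the floating-point sum. */
static double elapsed_seconds(const struct timespec *start,
                              const struct timespec *end) {
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) * 1e-9;
}
```

A benchmark loop would call `now_monotonic` before and after the kernel under test and pass both values to `elapsed_seconds`.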
Ali Saidi
43c2e845ab
Switch blas_server to use acq/rel semantics
...
Heavy-weight locking isn't required to pass the work queue
pointer between threads and simple atomic acquire/release
semantics can be used instead. This is especially important as
pthread_mutex_lock() isn't fair.
We've observed substantial variation in runtime because of the
the unfairness of these locks which complety goes away with
this implementation.
The locks themselves are left to provide a portable way for
idling threads to sleep/wakeup after many unsuccessful iterations
waiting.
2020-03-02 02:52:49 +00:00
Martin Kroeker
33f76a6c37
Version 0.3.9
2020-03-02 00:10:20 +01:00
Martin Kroeker
960dec234f
Version 0.3.9
2020-03-02 00:09:49 +01:00
Martin Kroeker
6b92979f35
Merge pull request #2476 from xianyi/develop
...
Update from develop in preparation for 0.3.9
2020-03-02 00:08:32 +01:00
Martin Kroeker
014fc13995
Merge pull request #2475 from martin-frbg/039changes
...
Update ChangeLog for 0.3.9
2020-03-02 00:04:26 +01:00
Martin Kroeker
f1e05676a0
Merge pull request #2474 from martin-frbg/p9be
...
Use POWER8 kernels on big-endian POWER9 for now
2020-03-02 00:04:08 +01:00
Martin Kroeker
d221c50f27
Add Ampere EMAG8180
2020-03-02 00:02:36 +01:00
Martin Kroeker
f14013da7f
Update with 0.3.9 changes
2020-03-02 00:01:22 +01:00
Martin Kroeker
4f371b0fbf
Use POWER8 kernels on big-endian POWER9 for now
2020-03-01 23:45:58 +01:00
Martin Kroeker
02d60c1563
Merge pull request #35 from xianyi/develop
...
rebase
2020-03-01 23:44:10 +01:00
Martin Kroeker
2e6963259b
Merge pull request #2471 from AGSaidi/l3-fix-2
...
Fix barriers in level3_thread
2020-03-01 19:41:07 +01:00
Martin Kroeker
e94590e400
Merge pull request #2468 from AGSaidi/wfe
...
Use wait-for-event to not spin in the blas_lock
2020-03-01 19:40:46 +01:00
Martin Kroeker
69d4687142
Merge pull request #2464 from Darkness303/develop
...
Add syr benchmark
2020-03-01 13:02:34 +01:00
Martin Kroeker
a731a9bbb9
Merge pull request #2467 from AGSaidi/rpcc
...
Make rpcc() on arm64 get closer to what x86 returns
2020-02-29 22:43:02 +01:00
Martin Kroeker
1aa5907a2c
Merge pull request #2463 from martin-frbg/mingwfix
...
Apply MinGW AVX512 compilation fix to fortran options as well
2020-02-29 19:08:03 +01:00
Martin Kroeker
ea8eec5d17
Merge pull request #2422 from wjc404/develop
...
Adjust SkylakeX GEMM3M parameters, add an AVX512 STRMM kernel and fix performance bugs in AVX2 s/c/z GEMM
2020-02-29 19:07:35 +01:00
Ali Saidi
97ce6bbce2
Fix barriers in level3_thread
2020-02-29 17:45:17 +00:00
Martin Kroeker
a9aeb6745c
Merge pull request #2465 from AGSaidi/neoverse-n1
...
Add Neoverse-N1 core
2020-02-29 13:24:44 +01:00
Ali Saidi
0af9991cc9
Use wait-for-event to not spin in the blas_lock
2020-02-29 04:23:48 +00:00
Ali Saidi
19f3a4091c
Make rpcc() on arm64 get closer to what x86 returns
...
The Arm implementation of rpcc() uses the architected timer
which is defined by the SBSA to be between 10-400MHz. These numbers
are much smaller than the cycle counter frequency used by x86. Make
the numbers closer by shifting the cycle counter up by the number of
leading zeros in the cntfrq_el0 register which gets us closer to a
normal cpu clock cycle range.
2020-02-29 04:23:22 +00:00
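The scaling described above can be sketched in portable C: the timer value is shifted left by the number of leading zeros in the timer frequency, so a slow architected timer yields numbers in a range closer to a CPU cycle counter. Reading the actual cntvct_el0 and cntfrq_el0 registers needs arm64 inline assembly, so this sketch takes both values as parameters. The function names and the treatment of the frequency as a 32-bit value are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zero bits of a non-zero 32-bit value. */
static unsigned leading_zeros32(uint32_t x) {
    assert(x != 0);
    unsigned n = 0;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Scale raw timer ticks by 2^(leading zeros of the timer frequency).
 * ticks and timer_hz stand in for the counter and frequency registers. */
static uint64_t scale_timer_ticks(uint64_t ticks, uint32_t timer_hz) {
    return ticks << leading_zeros32(timer_hz);
}
```

For a 100 MHz timer the frequency has 5 leading zeros in 32 bits, so ticks are multiplied by 32, which corresponds to an effective rate of 3.2 GHz.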
Ali Saidi
c623a965f9
Add Neoverse-N1 core
...
The implementation is a hybrid of the ARMV8 one with some of the
improved TX2 routines along with specifying -march=v8.2-a
2020-02-29 03:22:04 +00:00
j00520245
e1062400c4
Add new syr benchmark
2020-02-28 16:36:53 +08:00
Martin Kroeker
a66f4d80c8
Apply MinGW AVX512 compilation fix to fortran options as well
...
original issue was #1708 , I see now that the same problem affects gfortran compilation. The underlying issue is said to be fixed (but not yet released) on all branches of gcc as of a few days ago but it will certainly take time to reach mingw/msys.
2020-02-27 23:09:40 +01:00
wjc404
dd22eb7621
Update cgemm_kernel_8x2_haswell.c
2020-02-27 22:26:15 +08:00
wjc404
2352331e60
Update zgemm_kernel_4x2_haswell.c
2020-02-27 22:25:19 +08:00
Martin Kroeker
430ee31e66
Merge pull request #2447 from martin-frbg/issue2446
...
Always select ARMV8 parameters for big servers when cpu is TSV110 or EMAG8180
2020-02-27 15:07:02 +01:00
Martin Kroeker
8164fd1328
Always assume server-class cpu count for TSV110 and EMAG8180
2020-02-26 22:19:57 +01:00
Martin Kroeker
531c6b96d6
Merge pull request #34 from xianyi/develop
...
rebase
2020-02-26 22:16:28 +01:00
wjc404
1b980001dd
Update zgemm_kernel_4x2_haswell.c
2020-02-26 18:38:12 +08:00
wjc404
2515e1152f
Update cgemm_kernel_8x2_haswell.c
2020-02-26 18:36:54 +08:00
Martin Kroeker
ddcbed6690
Merge pull request #2437 from martin-frbg/issue2434
...
[WIP] Add support for Ampere EMAG8180 ARMV8 cpu
2020-02-25 18:42:52 +01:00
Martin Kroeker
f8ec538c82
Add Ampere EMAG8180
2020-02-25 14:30:00 +01:00
Martin Kroeker
ca4f7dceff
Add parameters for EMAG8180 DYNAMIC_ARCH support with cmake
2020-02-24 20:23:18 +01:00
Martin Kroeker
1ddf9f1067
Add EMAG8180 to arm64 DYNAMIC_ARCH list for cmake
2020-02-24 20:16:18 +01:00
Martin Kroeker
4c5fac5a2b
Typo fix
2020-02-24 20:15:04 +01:00
Martin Kroeker
320e2648cd
Add EMAG8180 to DYNAMIC_CORE list for ARM64
2020-02-24 19:23:46 +01:00
Martin Kroeker
9b732696c6
Add DYNAMIC_ARCH support for ARMV8 EMAG8180
2020-02-24 19:20:00 +01:00
Martin Kroeker
c9dcb3d4a4
Merge pull request #2443 from aaawuanjun/develop
...
[OpenBlas]:benchmark/copy.c has time,x,y data loop problems
2020-02-24 13:14:51 +01:00
Martin Kroeker
3bb7f0138e
Merge pull request #2442 from martin-frbg/lapackpr390
...
Apply fix from Reference-LAPACK PR 390
2020-02-24 12:27:01 +01:00
wuanjun 00447568
c93ae92579
[OpenBlas]:benchmark/copy.c has time,x,y data loop problems
2020-02-24 11:23:39 +08:00
Martin Kroeker
87ac1ceb0b
Apply fix from Reference-LAPACK PR390, NaN not propagating
2020-02-23 22:40:40 +01:00
Martin Kroeker
9e40c080f2
Apply fix from Reference-LAPACK PR390, NaN not propagating
2020-02-23 22:39:01 +01:00
wjc404
903854c168
Add files via upload
2020-02-22 23:40:02 +08:00
wjc404
a2ff577a30
Update KERNEL.ZEN
2020-02-22 23:39:43 +08:00
wjc404
97a32cb0a5
Update KERNEL.HASWELL
2020-02-22 23:39:20 +08:00
wjc404
f1746e7284
Delete sgemm_kernel_8x4_haswell_2.c
2020-02-22 23:38:48 +08:00
wjc404
f6fcbd7906
Fix performance bug when LDC is a multiple of 1024
2020-02-22 23:37:45 +08:00
Martin Kroeker
1e8410f18c
Merge pull request #2441 from martin-frbg/ismin2
...
Add proper defaults for the IxMIN/IxMAX kernels on mips64 and power
2020-02-22 11:21:03 +01:00
Martin Kroeker
07454bf4d5
Add proper defaults for IxMIN/IxMAX kernels
...
the fallbacks from Makefile.L1 assume a combined source for absolute value and non-absolute (with ifdef USE_ABS) but here we have separate implementations
2020-02-21 11:58:15 +01:00
Martin Kroeker
4046985913
Add proper defaults for IxMIN/IxMAX kernels
...
the fallbacks from Makefile.L1 assume a combined source for absolute value and non-absolute (with ifdef USE_ABS) but here we have separate implementations
2020-02-21 11:55:52 +01:00
Martin Kroeker
75577f95a7
Merge pull request #33 from xianyi/develop
...
rebase
2020-02-21 09:56:05 +01:00
Martin Kroeker
33d92c7a37
Merge pull request #2435 from martin-frbg/issue2433
...
Fix handling of ppc endianness
2020-02-21 00:01:58 +01:00
Martin Kroeker
e57b11acca
Add preliminary support for EMAG8180
2020-02-19 19:00:28 +01:00
Martin Kroeker
71e5669c3e
Add preliminary support for EMAG8180 ARMV8 processor
2020-02-19 18:57:26 +01:00
Martin Kroeker
e8d82c01d4
Recognize Ampere EMAG8180
2020-02-19 18:49:13 +01:00
Martin Kroeker
0b39cf95b0
Fix endianness conditionals
2020-02-19 18:09:54 +01:00
Martin Kroeker
76b2cec6ce
Get endianness into Makefile variable
2020-02-19 18:08:20 +01:00
Martin Kroeker
276c1791ea
Merge pull request #32 from xianyi/develop
...
rebase
2020-02-19 18:06:39 +01:00
Martin Kroeker
c5bbfd8fee
Merge pull request #2432 from isuruf/install_name
...
Fix install name on osx again
2020-02-19 08:14:28 +01:00
Isuru Fernando
130c1741e5
Fix install name on osx again
2020-02-18 10:22:49 -08:00
Martin Kroeker
8f782f0673
Merge pull request #2426 from zbeekman/nightly-homebrew-check
...
Nightly homebrew check
2020-02-18 12:09:15 +01:00
Martin Kroeker
6a517dcb6a
Merge pull request #2427 from martin-frbg/powermin
...
Fix ISMIN and ISMAX kernel choices for POWER8
2020-02-18 08:15:02 +01:00
Martin Kroeker
9f39f0a2c3
Specify ismin/ismax assembly kernels for POWER8 directly
...
to fix utest failure in new ismin test - Makefile.L1 defaults look wrong
2020-02-17 19:55:39 +01:00
Izaak Beekman
1a88c4ab26
Fix bottle upload problem & typo
2020-02-17 13:36:17 -05:00
Izaak Beekman
0b44802164
Test push & PRs only when workflow file changes
...
Also, add comments to clarify what the test is testing
2020-02-17 13:22:09 -05:00
Izaak Beekman
2c242b4cef
Add Github Action to build development branch nightly with Homebrew
2020-02-17 12:36:37 -05:00
Martin Kroeker
0bfb7336d2
Merge pull request #2424 from isuruf/osx
...
Fix building on osx
2020-02-17 17:00:08 +01:00
Martin Kroeker
403cde104e
Merge pull request #30 from xianyi/develop
...
rebase
2020-02-17 14:53:46 +01:00
Martin Kroeker
634f2bddda
Merge pull request #2414 from marxin/fix-iamax_sse-implementation
...
Fix iamax sse implementation and add utests
2020-02-17 14:50:18 +01:00
Martin Liska
aeea14ee40
Come up with LOAD_AND_COMPARE_TO_MXX macro in iamax_sse.S.
2020-02-17 09:01:53 +01:00
Martin Liska
18bcc36a69
Fix implementation of iamax_sse.S as reported in #2116.
...
There was a typo in iamax_sse.S where one of the comparisons
was cmpeqps instead of cmpeqss. That misdetected the index
for sequences where the minimum value was 0.
2020-02-17 09:01:53 +01:00
Martin Liska
0e7f43c898
Add missing USE_MIN in kernel/CMakeLists.txt.
2020-02-17 09:01:53 +01:00
Martin Kroeker
79e201fbba
Merge pull request #2423 from xianyi/issue2419
...
Restore -march flag for Android builds
2020-02-17 07:24:02 +01:00
Isuru Fernando
4326dcb460
Pass CFLAGS from env to Makefile.prebuild and remove iOS hack
2020-02-16 15:13:01 -06:00
Martin Kroeker
e32f3b1447
Restore -march flag for Android builds
...
fixes #2419 - renewed discussion in #2112 suggests removal of the option was primarily aimed at non-Android builds
2020-02-16 17:32:13 +01:00
Martin Kroeker
d483e9270a
Update KERNEL.POWER8
2020-02-16 17:29:35 +01:00
Martin Kroeker
01834aee33
Merge pull request #29 from xianyi/develop
...
rebase
2020-02-16 17:28:10 +01:00
wjc404
b0558c11b9
Update param.h
2020-02-16 23:01:31 +08:00
wjc404
f566787e6e
Update KERNEL.SKYLAKEX
2020-02-16 22:58:44 +08:00
wjc404
e3368cbf18
AVX512 STRMM kernel
2020-02-16 22:58:00 +08:00
Martin Kroeker
d92bd5be24
Update KERNEL.POWER8
2020-02-15 23:07:50 +01:00
Martin Kroeker
46e4b12946
Update KERNEL.POWER8
2020-02-15 23:06:51 +01:00
Martin Kroeker
5e94aa4877
Merge pull request #2417 from marxin/make-ctest-verbose-for-drone
...
Make ctest verbose for drone
2020-02-15 21:57:41 +01:00
Martin Kroeker
93f3e27574
Merge pull request #2415 from marxin/add-cmake-to-gitignore
...
Add CMake related files to .gitignore.
2020-02-15 21:57:03 +01:00
Martin Kroeker
785c389b0e
Merge pull request #2420 from martin-frbg/issue2396
...
Correct generation of GETRF files by the CMAKE build
2020-02-15 21:56:16 +01:00
Martin Kroeker
c222b25b81
Correct generation of GETRF files by the CMAKE build
...
fixes #2396
2020-02-15 19:29:14 +01:00
Martin Kroeker
221da8bf05
Merge pull request #2411 from martin-frbg/fix2254-038
...
Fix pre-processed POWER8 codes and wrong conditionals in the POWER8,PPC440 and PPC970 KERNEL files
2020-02-14 23:07:43 +01:00
Martin Liska
eb285b4d20
Make ctest verbose for drone builder.
2020-02-14 10:46:37 +01:00
Martin Kroeker
cafdd999b8
Update caxpy_power8.S
2020-02-13 22:44:09 +01:00
Martin Kroeker
92ca92a46c
Update caxpy_power8.S
2020-02-13 21:24:54 +01:00
Martin Kroeker
486c35c5dc
Update icamin_power8.S
2020-02-13 18:38:43 +01:00
Martin Liska
0e05ea9bac
Add CMake related files to .gitignore.
2020-02-13 14:51:55 +01:00
Martin Kroeker
5ba3699f41
Update isamin_power8.S
2020-02-13 00:00:32 +01:00
Martin Kroeker
8eefa530cd
Update isamax_power8.S
2020-02-12 23:59:50 +01:00
Martin Kroeker
de40d47edf
Update isamin_power8.S
2020-02-12 23:57:48 +01:00
Martin Kroeker
7c162b8a21
Update isamax_power8.S
2020-02-12 23:56:57 +01:00
Martin Kroeker
0544cbc806
Fix syntax of endianness conditional
2020-02-12 20:00:29 +01:00
Martin Kroeker
120d20731f
Fix syntax of endianness conditional
2020-02-12 19:58:42 +01:00
Martin Kroeker
dc345d84df
Fix syntax of endianness conditional and add gcc version check for workaround
2020-02-12 19:56:52 +01:00
Martin Kroeker
616921fd91
Merge pull request #27 from xianyi/develop
...
rebase
2020-02-12 19:16:14 +01:00
Martin Kroeker
8a9e9a82a1
Merge pull request #2410 from bartoldeman/fix-dscal-inline-asm
...
Fix inline asm in dscal: mark x, x1 as clobbered. Fixes #2408
2020-02-12 15:38:37 +01:00
Bart Oldeman
7ea5e07d1c
Fix inline asm in dscal: mark x, x1 as clobbered. Fixes #2408
...
The leaq instructions in dscal_kernel_inc_8 modify x and x1 so they
must be declared as input/output constraints, otherwise the compiler
may assume the corresponding registers are not modified.
2020-02-12 14:11:44 +00:00
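The constraint issue described in the commit above can be illustrated with a minimal sketch. This is not the dscal kernel itself: `advance_ptr` is a hypothetical helper, and the example assumes GCC or Clang extended asm on x86-64 (with a plain C fallback elsewhere). The point is only that a register modified by `leaq` is declared with a `"+r"` read-write constraint so the compiler knows its value changes.

```c
#include <stddef.h>

/* Hypothetical helper (not OpenBLAS code): advance a pointer by `bytes`.
 * The leaq instruction overwrites the register holding x, so x is declared
 * as an input/output operand ("+r"). Declaring it as input-only would let
 * the compiler assume the register still holds the old value. */
static double *advance_ptr(double *x, long bytes) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__("leaq (%0,%1,1), %0"
            : "+r"(x)      /* operand 0: read and written by the asm */
            : "r"(bytes)); /* operand 1: read only */
#else
    /* Portable fallback for other compilers or architectures. */
    x = (double *)((char *)x + bytes);
#endif
    return x;
}
```

With an input-only `"r"(x)` constraint the same code might appear to work at low optimization levels and then fail once the compiler reuses the register, which is why this kind of bug tends to show up only with certain compilers or flags.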
Martin Kroeker
cb6ef49857
Merge pull request #2407 from susilehtola/patch-2
...
Patch out instances of Z15 in dynamic_zarch.c
2020-02-11 13:04:44 +01:00
Martin Kroeker
63994e1cdb
Merge pull request #2405 from susilehtola/patch-1
...
Fix typo in dynamic_zarch.c
2020-02-11 13:03:35 +01:00
Martin Kroeker
496e3019bc
Merge pull request #2404 from martin-frbg/issue2395
...
Fix spurious application of USE_TRMM in cmake builds
2020-02-11 13:00:36 +01:00
Martin Kroeker
169be3f097
Merge pull request #2403 from martin-frbg/issue2400
...
Fix coretype identification of Intel Cannon Lake, Ice Lake and Goldmont
2020-02-11 13:00:16 +01:00
Martin Kroeker
6ccbb089c2
Merge pull request #2402 from gxw-loongson/develop
...
Avoid printing the following information on mips and mips64 when checking for MSA
2020-02-11 12:59:53 +01:00
Martin Kroeker
59ebe3636a
Merge pull request #2399 from martin-frbg/buffersize
...
Make BUFFER_SIZE configurable at build time
2020-02-11 12:56:56 +01:00
Susi Lehtola
5a6bba3061
Patch out instances of Z15 in dynamic_zarch.c
...
There does not appear to be a Z15 kernel yet, causing link errors from the code. This patch fixes the issue.
2020-02-11 15:07:33 +13:00
Susi Lehtola
dff173e50e
Fix typo in dynamic_zarch.c
2020-02-11 14:46:30 +13:00
Martin Kroeker
7e5cbb6f35
Fix bad conditional syntax that caused spurious application of USE_TRMM
2020-02-10 21:17:39 +01:00
Martin Kroeker
303bdb673b
Fix coretype detection for Intel extended models 6 and 7
...
affecting Goldmont, Cannon Lake, Ice Lake autodetection
2020-02-10 19:17:32 +01:00
gxw
754433f420
Avoid printing the following information on mips and mips64 when checking for MSA:
...
"unrecognized command line option ‘-mmsa’"
2020-02-10 19:11:45 +08:00
Martin Kroeker
7f0d523b42
Make BUFFER_SIZE configurable
2020-02-09 23:32:57 +01:00
Martin Kroeker
c353d8b106
Make BUFFER_SIZE configurable
2020-02-09 23:30:22 +01:00
Martin Kroeker
579be3aa9d
Add configuration option for BUFFER_SIZE
2020-02-09 23:28:04 +01:00
Martin Kroeker
449e8ea443
Merge pull request #26 from xianyi/develop
...
rebase
2020-02-09 23:23:55 +01:00
Martin Kroeker
3bec250cf9
Increment version to 0.3.9.dev
2020-02-09 23:18:44 +01:00
Martin Kroeker
f03dd23e90
Increment version to 0.3.9.dev
2020-02-09 23:18:07 +01:00
Martin Kroeker
fb5eb47558
Merge pull request #2398 from xianyi/develop
...
Update from develop in preparation of the 0.3.8 release
2020-02-09 23:16:28 +01:00
Martin Kroeker
fa93d63365
Merge branch 'release-0.3.0' into develop
2020-02-09 23:16:06 +01:00
Martin Kroeker
90e6c66a57
Merge pull request #2397 from martin-frbg/038changes
...
Update Changelog with changes from 0.3.8
2020-02-09 23:01:52 +01:00
Martin Kroeker
32d97330b3
Update with changes from 0.3.8
2020-02-09 23:00:36 +01:00
Martin Kroeker
29eaf4b6d7
Merge pull request #25 from xianyi/develop
...
rebase
2020-02-09 22:48:15 +01:00
Martin Kroeker
47c1bf7f4d
typo fixes
2020-02-09 01:06:40 +01:00
Martin Kroeker
2b55f0ad30
Merge pull request #2393 from martin-frbg/issue2388
...
Provide more documentation in README.md
2020-02-09 01:00:33 +01:00
Martin Kroeker
a5b32ab06c
Merge pull request #2390 from martin-frbg/pgi
...
Small corrections for compilation with PGI compilers
2020-02-09 00:13:40 +01:00
Martin Kroeker
50545b19d0
Update CPU and OS support and document DYNAMIC_ARCH option in README.md
...
prompted by #2388
2020-02-09 00:06:07 +01:00
Martin Kroeker
b3cbd60d7a
Remove PGI from list again as it is actually still not capable
2020-02-08 10:20:13 +01:00
Martin Kroeker
70199d1905
Merge pull request #2389 from Zeyiii/develop
...
Fix bugs in benchmark of gemv
2020-02-07 16:05:46 +01:00
Martin Kroeker
cfe63d8cc2
Remove OpenMP libraries from link list
2020-02-07 16:03:51 +01:00
Martin Kroeker
d55b10830f
Remove OpenMP libraries from link list
2020-02-07 16:02:17 +01:00
Martin Kroeker
c1c10cbb21
Merge pull request #2384 from wjc404/develop
...
Optimize AVX512 DGEMM (& DTRMM)
2020-02-07 13:47:12 +01:00
Martin Kroeker
5989841524
Add PGI to avx512-supporting compilers
2020-02-07 13:01:31 +01:00
Martin Kroeker
68a43db358
Fix utest compilation with PGI
2020-02-07 10:15:18 +01:00
Martin Kroeker
9694037b23
Set SUFFIX in tempfile commands, fix bad architecture option for PGI compiler in avx512 test
2020-02-07 10:09:25 +01:00
Martin Kroeker
71faa1c1a7
Merge pull request #24 from xianyi/develop
...
rebase
2020-02-07 10:03:02 +01:00
wjc404
3447d04eaf
Update dgemm_kernel_16x2_skylakex.c
2020-02-06 02:14:10 +00:00
wjc404
8b5cdcc64c
Update sgemm_kernel_8x4_haswell.c
2020-02-06 01:47:46 +00:00
wjc404
4e00d96a78
Update dgemm_kernel_16x2_skylakex.c
2020-02-06 01:46:36 +00:00
w00421467
ce9ea8f826
Fix another branch
2020-02-05 15:07:18 +08:00
w00421467
0b909203cb
Fix bugs in benchmark of gemv
2020-02-05 14:53:37 +08:00
wjc404
096da2f51a
Update dgemm_kernel_16x2_skylakex.c
2020-02-05 13:36:57 +08:00
wjc404
2f96a2c55b
Update trmm_R.c
2020-02-05 10:15:02 +08:00
wjc404
833bd0f8ff
Update trmm_L.c
2020-02-05 10:09:41 +08:00
wjc404
77b8f49556
Update level3_thread.c
2020-02-04 20:33:08 +08:00
wjc404
1c3e20ce48
Update level3.c
2020-02-04 20:30:23 +08:00
wjc404
83b6be7976
Update param.h
2020-02-04 19:55:26 +08:00
wjc404
081b188529
Update KERNEL.SKYLAKEX
2020-02-03 21:38:08 +08:00
wjc404
f3f969f681
Update param.h
2020-02-03 21:34:12 +08:00
wjc404
8019e70211
AVX512 16x2 DGEMM kernel
2020-02-03 21:32:56 +08:00
Martin Kroeker
8d2a796f49
Merge pull request #2378 from martin-frbg/issue2377
...
Add -march option for AVX512 in cmake as well
2020-01-30 17:07:19 +01:00
Martin Kroeker
8dc9fd4dfe
Add -march option for AVX512
2020-01-30 12:41:18 +01:00
Martin Kroeker
abc67bdd74
Merge pull request #2375 from ewanglong/master
...
Fix a few performance drops for some matrix sizes per data type
2020-01-30 10:27:29 +01:00
Martin Kroeker
1f62a82789
Merge pull request #2376 from wjc404/develop
...
Fix remaining bugs in parallel GEMM3M
2020-01-23 21:50:19 +01:00
wjc404
e9fb8f62b1
Update level3_gemm3m_thread.c
2020-01-22 17:40:03 +00:00
Wang,Long
fbf4f48f4a
Fix a few performance drops for some matrix sizes per data type
...
Signed-off-by: Wang,Long <long1.wang@intel.com>
2020-01-22 15:15:04 +00:00
Martin Kroeker
b9ad450295
Merge pull request #2373 from Qiyu8/optimize#gemmbeta
...
Optimize general Gemm Beta
2020-01-21 15:05:38 +01:00
Martin Kroeker
e011ad820a
Merge pull request #2372 from martin-frbg/winexit
...
Do not run any cleanup if the program is exiting anyway
2020-01-21 14:56:45 +01:00
Qiyu8
ff42e68652
Optimize general Gemm Beta
2020-01-20 11:49:42 +08:00
Martin Kroeker
23f322f997
Do not run any cleanup if the program is exiting anyway
...
From keno's PR #2350 - this avoids the potential hang in blas_thread_shutdown where we may wait for threads to exit while they are waiting on the loader lock from DllMain
2020-01-19 13:28:27 +01:00
Martin Kroeker
093d37de8d
Merge pull request #2371 from martin-frbg/issue2370
...
Free Windows thread memory with MEM_RELEASE rather than MEM_DECOMMIT
2020-01-18 20:39:34 +01:00
Martin Kroeker
d65e9a2bbd
Merge pull request #2253 from thrasibule/xerbla
...
fix error messages
2020-01-18 20:39:04 +01:00
Martin Kroeker
78100b8093
Free Windows thread memory with MEM_RELEASE rather than MEM_DECOMMIT
...
as suggested by hjmndv in #2370
2020-01-18 15:06:39 +01:00
Martin Kroeker
70f45749b9
Merge pull request #2367 from wjc404/develop
...
Improve paralleled SGEMM performance on SKYLAKEX CPUs
2020-01-15 21:13:43 +01:00
wjc404
e5dcdeb550
Update sgemm_direct_skylakex.c
2020-01-13 16:59:23 +08:00
wjc404
952cc2ba38
Update sgemm_kernel_16x4_skylakex_2.c
2020-01-13 16:58:54 +08:00
wjc404
feaafbedd3
make skylakex sgemm code more friendly for readers
...
BTW some kernels were adjusted to improve performance
2020-01-13 16:28:41 +08:00
wjc404
1c67567008
improve skylakex paralleled sgemm performance
2020-01-13 16:26:03 +08:00
Martin Kroeker
4e979bf75b
Merge pull request #2366 from martin-frbg/install390
...
Add new file lapack.h from LAPACK 3.9.0 to installable headers
2020-01-13 09:00:21 +01:00
Martin Kroeker
daa4310db5
Install new lapack.h
...
new file in LAPACK 3.9.0, split off from lapacke.h
2020-01-12 22:00:50 +01:00
Martin Kroeker
b8f3605132
Merge pull request #23 from xianyi/develop
...
rebase
2020-01-12 21:57:23 +01:00
Martin Kroeker
b36018be6d
Merge pull request #2365 from wjc404/develop
...
Fix SKYLAKEX STRMM issues
2020-01-09 23:23:09 +01:00
wjc404
3a100b2797
Update KERNEL.SKYLAKEX
2020-01-09 13:48:41 +08:00
Martin Kroeker
38742d5547
Merge pull request #2361 from wjc404/develop
...
Optimize AVX2 SGEMM & STRMM
2020-01-08 16:20:28 +01:00
wjc404
bd4c032f52
Update sgemm_kernel_8x4_haswell.c
2020-01-07 11:22:46 +08:00
wjc404
9dc9b7b95e
Update sgemm_kernel_8x4_haswell.c
2020-01-06 20:11:36 +08:00
wjc404
9f5cdc49d4
Update CONTRIBUTORS.md
2020-01-06 12:28:43 +08:00
wjc404
b7b408a120
optimize AVX2 SGEMM
2020-01-06 12:16:09 +08:00
wjc404
92b10212de
optimize AVX2 SGEMM
2020-01-06 12:11:21 +08:00
wjc404
b73bf01378
optimize AVX2 SGEMM
2020-01-06 12:09:14 +08:00
wjc404
eb3c9f1db9
optimize AVX2 SGEMM
2020-01-06 12:07:02 +08:00
Martin Kroeker
fd2ff2714f
Merge pull request #2359 from martin-frbg/lapack-pr330
...
Apply LAPACKE fix for eigenvector transposition in symmetric eigensolvers
2020-01-03 15:03:30 +01:00
Martin Kroeker
2ea2bd99c7
Apply LAPACKE fix for eigenvector transposition in symmetric eigensolvers
...
from Reference-LAPACK PR 330
2020-01-03 11:10:00 +01:00
Martin Kroeker
fbb894948c
Merge pull request #22 from xianyi/develop
...
rebase
2020-01-03 10:23:25 +01:00
Martin Kroeker
e711659c90
Merge pull request #2358 from shengyang-3390/develop
...
Test all 7 declared values of N in float and double cblas3 tests
2020-01-03 09:02:03 +01:00
shengyang
893e6e57c4
modified: ctest/din3 ctest/sin3
2020-01-03 10:03:33 +08:00
Martin Kroeker
456ee2e1f0
Merge pull request #2357 from chenxuqiang/dgemm_beta_zero
...
kernel/arm64/dgemm_beta.S: add beta == zero branch
2020-01-02 22:28:36 +01:00
Martin Kroeker
9998f8ed8b
Merge pull request #2356 from shengyang-3390/develop
...
Use arm neon instructions to optimize ncopy operation (and enable 7th column in float+complex cblas3 test drivers)
2020-01-02 22:27:44 +01:00
shengyang
80db5f11e1
update
2020-01-02 11:01:57 +08:00
chenxuqiang
52de4cc8fd
kernel/arm64/dgemm_beta.S: add beta == zero branch
...
Added a beta == zero branch, which does not need to load the C matrix.
Signed by: Xuqiang Chen <chenxuqiang3@hisilicon.com>
2020-01-01 21:50:45 -05:00
Martin Kroeker
44028581cc
Merge pull request #2355 from Zeyiii/dev-zeyi2
...
Use arm neon instructions to optimize sgemm_beta operation
2020-01-01 22:14:16 +01:00
Martin Kroeker
86ab939936
Merge pull request #2354 from ZuoQ3/develop
...
[WIP] Use arm neon instructions to optimize tcopy operation
2020-01-01 22:13:37 +01:00
Martin Kroeker
375b1875c8
[WIP] Update LAPACK to 3.9.0 (#2353)
...
* Update make.inc entries for LAPACK 3.9.0
Reference-LAPACK PR 347 changed some variable names and relative paths
* Update LAPACK to 3.9.0
* Add new functions from LAPACK 3.9.0
* Add new functions from LAPACK 3.9.0
* Restore LOADER command
as it makes it easier to specify pthread as needed
* Restore LOADER
* Restore EIG/LIN prefixes in cmdbase
* add binary path to lapack_testing.py call
* Restore OpenMP version check
* Restore OpenMP version check
* Restore fix for out-of-bounds array accesses
from #2096
2020-01-01 13:18:53 +01:00
Martin Kroeker
6c85cb1869
Merge pull request #2352 from wjc404/develop
...
AVX2 ZGEMM3M kernel
2019-12-31 18:08:10 +01:00
Martin Kroeker
995768bbc5
Merge pull request #2351 from Zeyiii/develop
...
prefetching for dgemm_beta
2019-12-31 18:07:37 +01:00
int_13h
96ad579428
add in runtime cpu detection for zarch (#2349)
...
add in runtime cpu detection for zarch
2019-12-31 18:03:27 +01:00
shengyang
8d84403205
Use arm neon instructions to optimize ncopy operation
...
modified: KERNEL.ARMV8
modified: KERNEL.TSV110
new file: sgemm_ncopy_4.S
2019-12-31 17:06:35 +08:00
shengyang
8729db117c
modified: ctest/din3
...
modified: ctest/sin3
2019-12-31 15:59:52 +08:00
w00421467
0833a4846a
Use arm neon instructions to optimize sgemm_beta operation
2019-12-31 10:42:03 +08:00
zq
50f7fc1401
[WIP] Use arm neon instructions to optimize tcopy operation
2019-12-31 10:21:23 +08:00
w00421467
d1b53806be
Merge remote-tracking branch 'pub/develop' into develop
2019-12-31 10:13:24 +08:00
wjc404
a0f0a802fc
Update zgemm3m_kernel_4x4_haswell.c
2019-12-30 17:33:42 +08:00
wjc404
700fe5b5ee
Add files via upload
2019-12-30 17:18:59 +08:00
wjc404
bb2729c855
Update CONTRIBUTORS.md
2019-12-30 16:11:37 +08:00
wjc404
aae44d040d
Update CONTRIBUTORS.md
2019-12-30 16:10:08 +08:00
wjc404
6362c34ee6
Update param.h
2019-12-30 16:08:19 +08:00
wjc404
f60840c420
Update KERNEL.ZEN
2019-12-30 16:04:23 +08:00
wjc404
109e18cd96
Update KERNEL.HASWELL
2019-12-30 16:03:24 +08:00
wjc404
ae1579be13
Create zgemm3m_kernel_4x4_haswell.c
2019-12-30 16:02:51 +08:00
w00421467
3ccf8885ac
prefetching for dgemm_beta
2019-12-30 11:45:49 +08:00
Martin Kroeker
454847588e
Update LAPACK to 3.9.0
2019-12-29 21:27:18 +01:00
Martin Kroeker
0257f26488
Merge pull request #21 from xianyi/develop
...
rebase
2019-12-29 18:08:55 +01:00
Martin Kroeker
c45b7aef14
Merge pull request #2348 from wjc404/develop
...
AVX2 CGEMM3M kernel
2019-12-28 20:07:56 +01:00
wjc404
312060d0d6
Update CONTRIBUTORS.md
2019-12-27 23:36:13 +08:00
wjc404
cd765f094b
Update cgemm3m_kernel_8x4_haswell.c
2019-12-27 18:23:29 +08:00
wjc404
64639f440f
Update param.h
2019-12-27 18:06:42 +08:00
wjc404
3a66c8cac1
Update KERNEL.ZEN
2019-12-27 18:04:08 +08:00
wjc404
4c35b8dbaa
Update gemm3m_level3.c
2019-12-27 18:03:01 +08:00
wjc404
ed9af2f7da
Update KERNEL.HASWELL
2019-12-27 18:01:38 +08:00
wjc404
5fd1edead9
Create cgemm3m_kernel_8x4_haswell.c
2019-12-27 18:00:55 +08:00
Martin Kroeker
26478eb0d0
Merge pull request #2345 from wjc404/develop
...
Optimize AVX2 CGEMM
2019-12-25 22:26:41 +01:00
wjc404
eeecd623d8
Update cgemm_kernel_8x2_haswell.c
2019-12-24 00:40:16 +08:00
wjc404
3ce6bcdb5f
Update CONTRIBUTORS.md
2019-12-24 00:30:16 +08:00
wjc404
6fbe51072b
Update CONTRIBUTORS.md
2019-12-24 00:24:40 +08:00
wjc404
611445c7f8
Update param.h
2019-12-23 23:44:55 +08:00
wjc404
2cd9306bb5
Update KERNEL.ZEN
2019-12-23 23:42:30 +08:00
wjc404
c418c81224
Update KERNEL.HASWELL
2019-12-23 23:41:44 +08:00
wjc404
025741f16a
Fast Haswell CGEMM kernel
2019-12-23 23:40:03 +08:00
Martin Kroeker
0ae49d2990
Merge pull request #2344 from wjc404/develop
...
Optimize AVX2 ZGEMM
2019-12-21 12:16:55 +01:00
wjc404
105e26e12a
Adjust Haswell ZGEMM blocking parameters
2019-12-21 14:38:51 +08:00
wjc404
f41d52665d
Fast Haswell ZGEMM kernel
2019-12-21 14:37:06 +08:00
wjc404
d573d24de7
Fast Haswell ZGEMM kernel
2019-12-21 14:35:15 +08:00
Martin Kroeker
31d6c2eb7d
Merge pull request #2340 from Zeyiii/develop
...
[WIP] Use arm neon instructions to optimize gemm beta operation
2019-12-20 08:38:57 +01:00
w00421467
b7cc69ee62
declare DGEMM_BETA in KERNEL.ARMV8 rather than the generic KERNEL
2019-12-20 10:11:50 +08:00
w00421467
aeef942c4f
use arm neon instructions to optimize gemm beta operation
2019-12-17 10:00:13 +08:00
Martin Kroeker
445ca2f418
Merge pull request #2339 from Jehan/wip/Jehan/fix-timeout
...
driver: more reasonable thread wait timeout on Windows.
2019-12-13 14:57:26 +01:00
Jehan
13226e3101
driver: more reasonable thread wait timeout on Windows.
...
It used to be 5ms, which might not be long enough in some cases for the
thread to exit well, but then when set to 5000 (5s), it would slow down
any program depending on OpenBlas.
Let's just set it to 50ms, which is at least 10 times longer than
originally, but still reasonable in case of failed thread termination.
2019-12-13 09:52:33 +01:00
Martin Kroeker
1a6ea8ee6d
Merge pull request #2338 from kavanabhat/aix_mod
...
Changes to build on AIX in POWER8 mode
2019-12-09 17:54:49 +01:00
Martin Kroeker
c6ecb195e6
Merge pull request #2337 from martin-frbg/issue2336
...
Support two-digit version numbers in gcc version check
2019-12-07 09:38:06 +01:00
Martin Kroeker
b28db31429
Support two-digit version numbers in gcc version check
...
fixes #2336 (non-recognition of gcc 10) with patch provided by JeffreyALaw.
2019-12-06 21:23:56 +01:00
Kavana Bhat
6baa9b07d7
AIX changes for Power8
2019-12-06 04:33:32 -06:00
Martin Kroeker
a4896b5538
Update DYNAMIC_ARCH support for ARM64 and PPC (#2332)
...
* Update DYNAMIC_ARCH list of ARM64 targets for gmake
* Update arm64 cpu list for runtime detection
* Update DYNAMIC_ARCH list of ARM64 targets for cmake and add POWERPC targets
2019-12-04 11:06:03 +01:00
Kavana Bhat
3938e59569
AIX changes for Power8
2019-12-04 00:23:46 -06:00
Martin Kroeker
9d5079008f
Merge pull request #2334 from martin-frbg/fix2228
...
Remove misplaced file
2019-12-03 22:23:52 +01:00
Martin Kroeker
3518617f5b
Add Intel Goldmont+ cpuid
...
was originally in #2228 but that PR had misplaced the file in the toplevel directory
2019-12-03 08:32:29 +01:00
Martin Kroeker
715f4650d9
Delete stray copy of dynamic.c from PR 2228
2019-12-03 08:24:10 +01:00
Martin Kroeker
10705183ce
Merge pull request #20 from xianyi/develop
...
Rebase
2019-12-03 08:22:40 +01:00
Martin Kroeker
235599f17a
Merge pull request #2329 from isuruf/patch-1
...
Workaround an ICE in clang 9.0.0
2019-12-02 08:30:43 +01:00
Isuru Fernando
b863b32ac5
Workaround an ICE in clang 9.0.0
...
This bug is not there in 8.x nor in the 9.0 daily snapshot.
2019-12-01 12:59:46 -06:00
Martin Kroeker
dd04143d4a
Merge pull request #2328 from martin-frbg/ppc9
...
Fix precompiled kernels on POWER9 and make their use conditional on (old) gcc version
2019-11-30 12:23:57 +01:00
Martin Kroeker
f3a6164bff
Merge pull request #2324 from antonblanchard/power9_segv
...
Fix SEGV in cdot_power9
2019-11-30 00:03:42 +01:00
Martin Kroeker
dedd822d1a
Fix caxpy/caxpyc naming in localentry
2019-11-29 23:56:57 +01:00
Martin Kroeker
2181fb7047
Fix caxpy/caxpyc naming in localentry
2019-11-29 23:54:15 +01:00
Martin Kroeker
a9b62c03f8
Substitute precompiled gcc7 codes only when gcc is older than 9.x
2019-11-29 23:49:50 +01:00
Martin Kroeker
97762234f9
Add variable for gcc >=9 test
...
used in KERNEL.POWER9
2019-11-29 23:47:23 +01:00
Martin Kroeker
948d11fc51
Merge pull request #19 from xianyi/develop
...
rebase
2019-11-29 23:44:09 +01:00
Martin Kroeker
c815b8fb85
Merge pull request #2323 from wjc404/develop
...
some optimizations of AVX512 DGEMM
2019-11-28 20:55:16 +01:00
wjc404
e20709e976
Update param.h
2019-11-28 19:57:50 +08:00
wjc404
934e601e93
Update dgemm_kernel_4x8_skylakex_2.c
2019-11-28 19:56:35 +08:00
Martin Kroeker
a4c3668f99
Merge pull request #2321 from martin-frbg/issue2319
...
Fix race conditions in multithreaded GEMM3M
2019-11-28 09:30:24 +01:00
Martin Kroeker
867232c6a4
Merge pull request #2327 from martin-frbg/travisosx
...
Cleanup IOS build and disable FORTRAN on 32bit and ios builds for now
2019-11-28 08:43:45 +01:00
Martin Kroeker
5aaf70ef95
Merge pull request #2326 from xianyi/revert-2325-travisosx
...
Revert "Cleanup Travis IOS xbuild and disable FORTRAN on 32bit and ios builds for now"
2019-11-28 00:17:19 +01:00
Martin Kroeker
ae2a0995cc
Cleanup IOS build and disable FORTRAN on 32bit and ios builds for now
...
Travis recently appears unable to find a matching homebrew package for 32bit gfortran,
and the IOS crossbuild suffered from excessive output due to the known problem with "ASMNAME redefined"
warnings when CFLAGS is set in the environment
2019-11-28 00:15:36 +01:00
Martin Kroeker
83dae28ae2
Revert "Cleanup Travis IOS xbuild and disable FORTRAN on 32bit and ios builds for now"
2019-11-28 00:09:06 +01:00
Martin Kroeker
da986d2e83
Merge pull request #2325 from martin-frbg/travisosx
...
Cleanup Travis IOS xbuild and disable FORTRAN on 32bit and ios builds for now
2019-11-27 21:59:36 +01:00
Martin Kroeker
6bc487de35
Cleanup IOS build and disable FORTRAN on 32bit and ios builds for now
...
Travis recently appears unable to find a matching homebrew package for 32bit gfortran,
and the IOS crossbuild suffered from excessive output due to the known problem with "ASMNAME redefined"
warnings when CFLAGS is set in the environment
2019-11-27 15:10:57 +01:00
Anton Blanchard
cf2a8e410c
Fix SEGV in cdot_power9
...
We were corrupting r2 because the local entry wasn't being
setup correctly.
2019-11-26 21:55:04 -07:00
wjc404
eb1e9c8c92
some optimizations
2019-11-26 14:12:20 +08:00
Martin Kroeker
f95989cbc1
Fix AVX512 capability test (always returning zero)
...
from #2322
2019-11-23 22:38:07 +01:00
Martin Kroeker
f3065a0eed
Fix race conditions in multithreaded GEMM3M
...
by adding barriers (and a mutex lock for the non-OpenMP case) like it was already done for GEMM in level3_thread.c some time ago
2019-11-23 19:54:56 +01:00
Martin Kroeker
04226f1e97
Add the cpuid of the business/rackmount version of z15 as well
2019-11-21 18:14:29 +01:00
Martin Kroeker
0925ef70db
Merge pull request #2316 from sharkcz/s390x
...
zarch: treat z15 as z14 instead of generic
2019-11-21 18:03:00 +01:00
Martin Kroeker
371e6f73d4
Merge pull request #2317 from aarnez/develop
...
Change bad usage of "asum" to "sum" in ZARCH versions of ?sum
2019-11-21 17:59:21 +01:00
Andreas Arnez
d117dfd505
Change bad usage of "asum" to "sum" in ZARCH versions of ?sum
...
The ZARCH implementations of ?sum contain a cut & paste-error: An inline
assembly argument is named "sum", but the assembly references "asum"
instead. The mismatch causes a build error. This is fixed.
2019-11-21 13:49:13 +01:00
Dan Horák
883c39773a
zarch: treat z15 as z14 instead of generic
...
Signed-off-by: Dan Horák <dan@danny.cz>
2019-11-21 12:53:23 +01:00
Martin Kroeker
b09b5be0a4
Merge pull request #2315 from ewanglong/develop
...
Revised Windows-compatible fix for #2313
2019-11-21 05:06:44 +01:00
Wang, Long
bfb5fbdb4d
Revised Windows-compatible fix for #2313
...
Signed-off-by: Wang, Long <long1.wang@intel.com>
2019-11-21 10:22:58 +08:00
Martin Kroeker
3da6d66da9
Merge pull request #2314 from Jehan/wip/Jehan/fix-openblas-crash
...
Fix usage of TerminateThread() causing critical section corruption.
2019-11-20 16:16:35 +01:00
Martin Kroeker
08fa83aba2
Merge pull request #2312 from martin-frbg/power8be
...
Further Power8 big-endian corrections
2019-11-20 15:12:06 +01:00
Martin Kroeker
63d3ee8dfc
Merge pull request #2313 from ewanglong/develop
...
Fix the integer overflow issue for large matrix size
2019-11-20 14:49:15 +01:00
Wang, Long
1191db1a49
For the sake of Windows compatibility, use "unsigned long long" to ensure 64-bit length
...
Signed-off-by: Wang, Long <long1.wang@intel.com>
2019-11-20 21:30:47 +08:00
Jehan
1f6071590d
Fix usage of TerminateThread() causing critical section corruption.
...
This patch was submitted to the GIMP project by a publisher wishing to
keep confidentiality (hence anonymously). I just pass along the patch.
Here is the patch explanation which came with:
First they remind us what Microsoft documentation says about
TerminateThread:
> TerminateThread is a dangerous function that should only be used in
> the most extreme cases. You should call TerminateThread only if you
> know exactly what the target thread is doing, and you control all of
> the code that the target thread could possibly be running at the time
> of the termination.
(https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-terminatethread)
Then they say that a 5 millisecond time-out might not be long enough for
the thread to exit gracefully. They propose to set it to a much higher
value (for instance here 5 seconds).
And finally you should always check the return value of
WaitForSingleObject(). In particular you want to run TerminateThread()
only if WaitForSingleObject() failed, not in the success case.
2019-11-20 13:00:49 +01:00
Wang, Long
0caf1434c9
Fix the integer overflow issue for large matrix size
...
For large matrices, e.g. M=N=K with M>1290, int mnk=M*N*K will overflow.
This leads to wrongly branching to the single-threaded path, and
performance degrades significantly.
Signed-off-by: Wang, Long <long1.wang@intel.com>
2019-11-20 14:11:17 +08:00
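The threshold quoted in the commit above can be checked with a short sketch. The helper names `mnk_product` and `mnk_fits_in_int` are hypothetical and are not taken from the OpenBLAS sources; the sketch assumes non-negative dimensions and a 32-bit `int`, and only illustrates why the product is computed in a 64-bit type before being compared against a threshold.

```c
#include <limits.h>

/* Hypothetical helper (not OpenBLAS code): compute M*N*K in 64-bit
 * unsigned arithmetic. Each operand is widened before multiplying, so
 * the intermediate product cannot wrap for int-sized dimensions.
 * Assumes m, n and k are non-negative. */
static unsigned long long mnk_product(int m, int n, int k) {
    return (unsigned long long)m * (unsigned long long)n *
           (unsigned long long)k;
}

/* Returns 1 if the product would still fit in a signed int, 0 if an
 * "int mnk = M*N*K" computation would have overflowed. */
static int mnk_fits_in_int(int m, int n, int k) {
    return mnk_product(m, n, k) <= (unsigned long long)INT_MAX;
}
```

For M=N=K=1290 the product is 2,146,689,000, which is just below INT_MAX (2,147,483,647) on platforms with a 32-bit int, while M=N=K=1291 gives 2,151,685,171, which no longer fits. This matches the M>1290 bound stated in the commit message.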
Martin Kroeker
73128f3883
Merge pull request #2310 from martin-frbg/ppc440
...
Fix PPC440 big-endian support and disable the QCDOC qalloc routine by default
2019-11-17 23:19:48 +01:00
Martin Kroeker
cad0d150db
Define alternate kernels for big-endian POWER8
2019-11-17 23:12:10 +01:00
Martin Kroeker
eba0aeb7cd
Fix compilation for big-endian POWER8
2019-11-17 22:58:32 +01:00
Martin Kroeker
0c07c356c1
Define alternate kernels for big-endian PPC440
2019-11-17 19:25:08 +01:00
Martin Kroeker
82b75f97e5
Disable the old QCDOC qalloc by default and copy utility functions from memory.c
...
1. qalloc() appears to have been a special routine written for the PPC440-based QCDOC supercomputer(s) from around 2005, its source does not seem to be readily available. So switch the #if 1 in the code to rely on standard malloc() by default.
2. Utility functions like get_num_procs, get_num_threads that were added to the "normally" used memory.c in the meantime were still missing here.
2019-11-17 19:22:04 +01:00
Martin Kroeker
7887c45077
Merge pull request #17 from xianyi/develop
...
rebase
2019-11-17 19:09:49 +01:00
Martin Kroeker
3e67017ac8
Merge pull request #2309 from martin-frbg/ppc970-be
...
Fix PPC970 big-endian support
2019-11-17 18:22:24 +01:00
Martin Kroeker
b3ac6ee222
Define alternate kernels for big-endian PPC970
...
The altivec versions of SGEMM and CGEMM fail most tests in LAPACK-TESTING when compiled for big endian, STRSM/CTRSM even cause segfaults. The rot kernels either fail the corresponding utest or lead to failures in LAPACK-TESTING.
2019-11-17 15:19:39 +01:00
Martin Kroeker
6082e556cd
Use "generic" S/CGEMM unroll M on big-endian PPC970
...
as the respective PPC970 "altivec" kernels give wrong results when compiled for big endian
2019-11-17 15:10:26 +01:00
Martin Kroeker
92315173d5
Merge pull request #2308 from martin-frbg/ctestfix
...
Fix potential issue in the c/z blas3 ctests
2019-11-15 08:33:17 +01:00
Martin Kroeker
351d12b94e
Fix potential spurious failure from uninitialized variable
2019-11-15 00:20:36 +01:00
Martin Kroeker
bf73aa141b
Fix potential spurious failure from uninitialized variable
2019-11-15 00:19:24 +01:00
Martin Kroeker
71e96163db
Merge pull request #2305 from wjc404/develop
...
AVX512 CGEMM & ZGEMM kernels
2019-11-12 07:38:37 +01:00
wjc404
819e852ae7
AVX512 CGEMM & ZGEMM kernels
...
96-99% 1-thread performance of MKL2018
2019-11-11 20:04:52 +08:00
Martin Kroeker
4e466d739c
Merge pull request #15 from xianyi/develop
...
rebase
2019-11-09 18:52:08 +01:00
Martin Kroeker
4c6a457358
Merge pull request #2300 from wjc404/develop
...
Optimize SGEMM on SKYLAKEX CPUs
2019-11-06 07:27:33 +01:00
wjc404
836c414e22
optimizations of software prefetching
2019-11-05 13:36:56 +08:00
Martin Kroeker
d403eb3c2f
Merge pull request #2302 from martin-frbg/ppc970
...
Disable three-operand DCBT on PPC970 regardless of operating system
2019-11-04 22:55:05 +01:00
Martin Kroeker
3cd97f1a80
Merge pull request #2301 from martin-frbg/ppc8be
...
Disable IDAMIN/MAX and IZAMIN/MAX optimizations on big-endian POWER8
2019-11-04 22:54:28 +01:00
Martin Kroeker
9955f0996f
Merge pull request #2294 from martin-frbg/ios-cleanup
...
Remove obsolete workarounds for IOS on ARMV8
2019-11-04 22:53:58 +01:00
wjc404
430c11e135
Add files via upload
2019-11-04 20:10:12 +08:00
wjc404
fbacd2605d
optimizations via software prefetches
2019-11-04 19:37:19 +08:00
Martin Kroeker
6fa89b06a1
Use the two-operand form of DCBT on all PPC970 regardless of OS
...
There seems to be no advantage to the three-operand form used in the earliest GotoBLAS kernels, and it causes compilation problems on other than the previously special-cased platforms as well
2019-11-03 22:55:31 +01:00
Martin Kroeker
68597002ea
The assembly microkernel is not safe to use on ELFv1
2019-11-03 22:42:46 +01:00
Martin Kroeker
d2a6285549
The assembly microkernel is not safe to use on ELFv1
2019-11-03 22:41:19 +01:00
Martin Kroeker
d999688d1a
The assembly microkernel is not safe to use on ELFv1
2019-11-03 22:39:06 +01:00
Martin Kroeker
928fe1b28e
The assembly microkernel is not safe to use on ELFv1
2019-11-03 22:37:27 +01:00
Martin Kroeker
ccc28c6d60
Merge pull request #13 from xianyi/develop
...
resync with upstream
2019-11-03 22:33:31 +01:00
wjc404
ae43b75a6a
Add files via upload
2019-11-02 10:09:19 +08:00
wjc404
54fc06fd70
Add files via upload
2019-11-02 10:06:13 +08:00
wjc404
1df9a2013d
new sgemm kernel for skylakex
2019-11-02 00:00:48 +08:00
wjc404
274ff5cdb8
update sgemm_q on skylakex cpus
2019-11-01 23:59:18 +08:00
Martin Kroeker
eb2eddf241
Merge pull request #2296 from kdunee/develop
...
Fixed a minor cmake problem, occurring when DYNAMIC_ARCH=ON and CMAKE_C_FLAGS was empty
2019-10-28 13:24:18 +01:00
k.dunikowski
8691825944
Fixed a minor cmake problem, occurring when DYNAMIC_CORE=ON and CMAKE_C_FLAGS was empty
2019-10-28 08:51:05 +01:00
Martin Kroeker
7dc8a76f60
Merge pull request #2293 from martin-frbg/pr2288
...
Add support for NetBSD by adding it to the existing xBSD conditionals
2019-10-25 23:46:39 +02:00
Martin Kroeker
df857551c0
Remove special parameter set for obsolete IOS/ARMV8 workaround
2019-10-25 23:07:00 +02:00
Martin Kroeker
85ccdce8c4
Remove the IOS fallbacks to generic C kernels
2019-10-25 23:02:37 +02:00
Martin Kroeker
aeabe0a83f
Fix regex to parse -R options with and without whitespace
...
Both forms are seen on NetBSD (#2288)
2019-10-25 22:52:30 +02:00
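The whitespace handling described in the commit above can be illustrated with a small sketch. It is written in Python for illustration only and is not the pattern used by f_check; the names `RPATH_RE` and `extract_rpaths` are hypothetical, and the sample link line is made up.

```python
import re

# Hypothetical pattern (not the one from the commit): match "-R" at the start
# of a token, followed by optional whitespace and then the path itself.
RPATH_RE = re.compile(r"(?:^|\s)-R\s*(\S+)")

def extract_rpaths(link_line):
    """Return every path given via -R, with or without a separating space."""
    return RPATH_RE.findall(link_line)

# Both "-R/path" and "-R /path" are accepted.
print(extract_rpaths("-L/usr/lib -R/usr/pkg/lib -R /usr/lib64 -lgfortran"))
```

Making the whitespace optional (`\s*`) is the whole point of the sketch: a pattern that requires either form alone would miss the other.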
Martin Kroeker
1b90989662
Add NetBSD to the xBSD conditionals
2019-10-25 12:52:49 +02:00
Martin Kroeker
e3e8b5cdca
Add NetBSD
2019-10-25 12:51:06 +02:00
Martin Kroeker
69b16a894d
Merge pull request #2292 from martin-frbg/g95fixes
...
Improve support for g95 and non-GNU ld
2019-10-25 10:35:17 +02:00
Martin Kroeker
6782e5767d
Merge pull request #2291 from martin-frbg/gensymbol
...
Fix netlib 3.7/3.8 function enumeration for linktest
2019-10-25 10:34:50 +02:00
Martin Kroeker
48f5a89f92
Merge pull request #2282 from martin-frbg/issue2281
...
Optimize RPCC function on ARM64
2019-10-25 09:56:30 +02:00
Martin Kroeker
4ae1610f37
Merge pull request #2290 from martin-frbg/cpuidfixes
...
Fixup x86 cpuid changes from #2283
2019-10-24 22:52:15 +02:00
Martin Kroeker
911c3e2f4b
Improve support for g95 and non-GNU ld
...
Auto-add "-fno-second-underscore" option to make LAPACKE compile (as it calls LAPACK functions that may have gotten a second underscore added otherwise). Also support -R for rpath when parsing compiler directives in f_check
2019-10-24 22:43:27 +02:00
Martin Kroeker
fab49e49e5
Move most lapack 3.7/3.8 additions to the embedded_underscores list
...
to allow linktest to pass with a compiler that adds a second underscore to such names
2019-10-24 21:26:20 +02:00
Martin Kroeker
b687fba5bc
Disable direct clock register access on IOS and Android
...
as I find conflicting information on accessibility from non-privileged processes
2019-10-24 21:18:17 +02:00
luzpaz
46a8c2519a
Remove prototype of unused, unimplemented function (#2274)
...
* Fix source typo
Found via `codespell -q 3 -L amin,als,ba,dum,mone,nd,nto,orign -S Changelog.txt,./lapack*`
* Remove beta-thread function per request
2019-10-24 18:56:53 +02:00
Martin Kroeker
e9437eebd2
Restore Goldmont ID and improve QEMU support
...
#2283 had inadvertently removed Goldmont+, and cpuid was reporting a mix of Core2 and Pentium2 for some QEMU configurations
2019-10-24 18:45:27 +02:00
Martin Kroeker
3a39062cfc
Merge pull request #12 from xianyi/develop
...
resync with upstream
2019-10-24 18:40:13 +02:00
Martin Kroeker
eaa0be1313
Merge pull request #2286 from wjc404/develop
...
AVX512 DGEMM kernel
2019-10-20 12:44:19 +02:00
wjc404
6ff013bae0
native support for icopy_4
...
90% MKL 1-thread performance.
2019-10-19 03:54:44 +08:00
wjc404
0d669e04bb
Update dgemm_kernel_8x8_skylakex.c
2019-10-18 15:00:17 +08:00
wjc404
17cdd9f9e1
some correction
2019-10-18 14:58:07 +08:00
wjc404
6bcb06fcb1
make further changes to icopy_8 easier
2019-10-18 10:47:31 +08:00
wjc404
b7315f8401
Add files via upload
2019-10-16 19:23:36 +08:00
wjc404
9b19e9e1b0
Update dgemm_kernel_8x8_skylakex.c
2019-10-16 10:14:51 +08:00
wjc404
6bd67ddbab
Update dgemm_kernel_8x8_skylakex.c
2019-10-16 03:20:08 +08:00
wjc404
5da9484d93
Add files via upload
2019-10-16 02:01:13 +08:00
wjc404
844629af57
Add files via upload
2019-10-16 02:00:34 +08:00
Martin Kroeker
2beaa82c05
Merge pull request #2283 from martin-frbg/issue2176
...
Support QEMU virtual cpu in 64bit mode as CORE2 or BARCELONA
2019-10-09 22:06:09 +02:00
Martin Kroeker
e8a2aed2b9
Support QEMU cpu calling itself 64bit AMD Athlon as well
...
Some QEMU instances pretend to be "AuthenticAMD" with the same family 6/model 6 even when running on an Intel host
(could be related to qemu or libvirt version and/or kvm availability). Also fix the define to depend on __x86_64__ set by the
compiler, the defines using __64BIT__ will only work for getarch_2nd.
2019-10-09 18:24:13 +02:00
Martin Kroeker
f262031685
Support QEMU virtual cpu as CORE2
...
qemu itself claims it is a 64bit P6, which does not exist in the wild.
2019-10-08 22:30:02 +02:00
Martin Kroeker
5f6206fa2d
Simplify OSX/IOS cross-compilation and add a CI test for it (#2279)
...
* Add automatic fixups for OSX/IOS cross-compilation
* Add OSX/IOS cross-compilation test to Travis CI
* Handle platforms that lack hwcap.h by falling back to ARMV8
* Fix PROLOGUE for OSX/IOS
2019-10-08 20:13:14 +02:00
Martin Kroeker
f2cde2ccfb
Update common_arm64.h
2019-10-08 20:12:08 +02:00
Martin Kroeker
ba7838d2e1
Merge pull request #2280 from martin-frbg/iosfix
...
Add overlooked part of IOS compilation fix
2019-10-08 10:25:25 +02:00
Martin Kroeker
a448884a63
Remove automatic label postfixes from macro included only once
2019-10-08 08:37:50 +02:00
Martin Kroeker
17609f88f1
Merge pull request #11 from xianyi/develop
...
sync with upstream
2019-10-08 08:32:52 +02:00
Martin Kroeker
3a2df19db6
Fix accidental duplication of jump instruction
2019-10-08 08:09:26 +02:00
Martin Kroeker
d2093a40d3
Merge pull request #2277 from martin-frbg/issue2275
...
Rewrite ARMV8 code to allow cross-compilation for IOS
2019-10-06 23:01:54 +02:00
Martin Kroeker
aa04b0925e
Merge pull request #2276 from xianyi/revert-2272-thread-sqrt-of-negative
...
Revert "Avoid taking root of negative number in symv_thread.c"
2019-10-06 11:12:44 +02:00
Martin Kroeker
258ac56e0a
Move 32bit OSX build back to xcode 8.3 but switch to gcc8
2019-10-05 10:52:47 +02:00
Martin Kroeker
56837e9d92
Make local labels in macro compatible with the xcode assembler
...
... which does not perform the automatic numbering on instantiation that the _@ suffix signifies
2019-10-04 14:53:23 +02:00
Martin Kroeker
bb5413863f
Rewrite ARM64 PROLOGUE to make it compatible with xcode/ios
2019-10-04 14:50:03 +02:00
Martin Kroeker
32f5907fef
Update 32bit macOS again to xcode 9.3
...
os version 10.13 "High Sierra" appears to be the oldest release now for which Homebrew provides a gcc package.
Anything older and the Travis job will run out of time building gcc from source
2019-10-03 01:09:02 +02:00
Martin Kroeker
ac10236cc8
Update the OSX BINARY=32 test to xcode9.2
...
in response to Homebrew updates
2019-10-02 22:35:34 +02:00
Martin Kroeker
8617d75548
Revert "Avoid taking root of negative number in symv_thread.c"
2019-10-01 23:50:41 +02:00
Martin Kroeker
c07d78b9e9
Merge pull request #2272 from seberg/thread-sqrt-of-negative
...
Avoid taking root of negative number in symv_thread.c
2019-09-30 11:27:29 +02:00
Sebastian Berg
6355c25dde
Avoid taking root of negative number in symv_thread.c
...
This is similar to fixes in gh-1929, but there was one remaining
occurrence of this type of pattern in the driver/level2/*_thread.c
files.
2019-09-29 22:03:12 -07:00
Martin Kroeker
5e244d80f2
Merge pull request #2271 from quickwritereader/strmm_fix
...
fixed bug power9 strmm. BLAS-TESTER passes
2019-09-29 13:53:45 +02:00
AbdelRauf
ede5efebab
trmm fix
2019-09-29 02:28:34 +00:00
Martin Kroeker
84908d60d2
Merge pull request #2269 from martin-frbg/ppc-fixes
...
Ppc fixes
2019-09-27 09:52:19 +02:00
Martin Kroeker
596a22325a
Fix prologue of power9 assembly cdot(c) kernel to provide cdotc
2019-09-27 00:47:18 +02:00
Martin Kroeker
7f58f3ad0e
Fix mis-edits in the gcc-derived power8 caxpy kernel
2019-09-27 00:44:26 +02:00
Martin Kroeker
c0d570a357
Merge pull request #7 from xianyi/develop
...
update
2019-09-27 00:42:32 +02:00
Martin Kroeker
6b83079368
Count cpu cores on ARMV8 and use that to pick the GEMM_PQ parameters (#2267)
...
There is currently no simple way to query cache sizes on ARMV8, so this takes the number of cores as a trivial indication if the target is a server-class device with a big cache, or just a single-board toy or smartphone.
2019-09-25 23:13:24 +02:00
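The heuristic described in the commit above (using the core count as a stand-in for cache size) might look roughly like the following. This is a sketch under stated assumptions, not the OpenBLAS code: the threshold value, the name `cache_class` and the "large"/"small" labels are all hypothetical, since the commit message does not state the actual cutoff or parameter values.

```python
import os

# Hypothetical cutoff: the commit message does not say where the line is drawn.
SERVER_CORE_THRESHOLD = 16

def cache_class(num_cores, threshold=SERVER_CORE_THRESHOLD):
    """Guess whether a target likely has a big cache, using only its core count."""
    if num_cores is None or num_cores < 1:
        # Unknown core count: fall back to the conservative choice.
        return "small"
    return "large" if num_cores >= threshold else "small"

# os.cpu_count() may return None, which cache_class treats as "small".
print(cache_class(os.cpu_count()))
```

A build would then select one of two blocking parameter sets depending on the returned class; the parameter values themselves are deliberately left out here.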
Martin Kroeker
673e5a0495
Replace several POWER8/9 C kernels with their gcc7-generated assembly versions (#2263)
...
* Add gcc7-generated assembly files for POWER8/9 isa/ica-min/max and POWER9 caxpy
To work around internal compiler errors encountered when compiling the original C source with gcc 4 and 5, and wrong code generated by gcc 8.3.0
* Use gcc-generated assembly instead of original C sources
to work around internal compiler errors encountered with gcc 4.8/5.4 and wrong code generation by gcc 8.3
* Use gcc-generated assembly instead of the original C source
to work around internal compiler errors encountered with gcc 4.8 and 5.4, and wrong code generation by gcc 8.3
* Add gcc7-generated assembler version of caxpy for power8
to work around wrong code generated by gcc 8.3
* Handle CONJ define for caxpyc
* Handle CONJ define for caxpyc
* Add gcc7-generated assembly cdot for POWER9
* Use prebuilt assembly for POWER9 cdot
created with gcc 7.3.1 to work around ICE in older gcc versions
* Exclude POWER9 from DYNAMIC_ARCH when gcc version is lower than 6
* Update Makefile.system
* Use PROLOGUE macro to ensure correct function name for DYNAMIC_ARCH
* Disable POWER9 with old gcc versions
2019-09-22 22:35:22 +02:00
Martin Kroeker
bfa2cc7d64
Restore ppc64 CI job and remove the travis_wait that caused the problem with it
2019-09-20 10:29:35 +02:00
Martin Kroeker
e7c4d6705a
Revert #2051 and replace with a better fix (#2261)
...
* Revert #2051 and add a better fix for TARGET=generic with DYNAMIC_ARCH
fixes #2257 without breaking #2048 again
2019-09-17 18:56:04 +02:00
Martin Kroeker
2a1911cc14
Merge pull request #6 from xianyi/develop
...
update to current develop
2019-09-13 14:00:23 +02:00
Martin Kroeker
9f7a9a32e3
Merge pull request #2252 from thrasibule/trtrs
...
Optimized ?trtrs
2019-09-12 21:45:47 +02:00
Guillaume Horel
2463938879
fix error message
2019-09-11 10:35:25 -04:00
Guillaume Horel
5d6525c87c
more bugfix
2019-09-10 17:30:57 -04:00
Guillaume Horel
6cb47ea3f0
fix Makefile
2019-09-10 17:11:01 -04:00
Guillaume Horel
459bb9291d
fix error codes
2019-09-10 17:10:33 -04:00
Martin Kroeker
3f1077ce6f
Merge pull request #2249 from brada4/gcc7minor
...
Address minor warnings popping up in gcc7+
2019-09-10 08:27:32 +02:00
Martin Kroeker
eb45eb6942
Fix C compiler handling and BINARY=32 mode in CMAKE builds (#2248)
...
* Fix compiler identification and option setting
* Handle BINARY=32 option on X86_64
* Add xGEMM3M unroll parameters for crossbuild-target CORE2
* Replace bogus mingw64/32bit CI job with actual 32bit build
mingw64 is not multilib-capable, so using an x86_64-mingw with BINARY=32 in the CI was not going to work anyway (but build passed while BINARY=32 was ignored).
2019-09-10 08:27:06 +02:00
Guillaume Horel
f2becb777a
fix Makefile
2019-09-09 11:36:50 -04:00
Guillaume Horel
5997b6b491
bugfix
2019-09-08 11:14:49 -04:00
Guillaume Horel
4b21b646ea
turn on optimized code
2019-09-08 11:14:49 -04:00
Guillaume Horel
7ec7b999a5
add missing file
2019-09-08 11:14:49 -04:00
Guillaume Horel
af9ac0898a
fix Makefile
2019-09-08 11:14:49 -04:00
Guillaume Horel
c7b5a459b6
add missing defines and headers
2019-09-08 11:14:49 -04:00
Guillaume Horel
9b2f0323d6
update Makefile
2019-09-08 11:14:49 -04:00
Guillaume Horel
9f6984fe4b
add missing files
2019-09-08 11:14:49 -04:00
Guillaume Horel
42203dafdc
add logic
2019-09-08 11:14:49 -04:00
Guillaume Horel
a4f17a9297
add missing objects
2019-09-08 11:14:49 -04:00
Guillaume Horel
733d97b2df
add files
2019-09-08 11:14:49 -04:00
Guillaume Horel
ea747cf933
start working on ?trtrs
2019-09-08 11:14:49 -04:00
Andrew
4de545aa7d
address minor warnings from gcc7
2019-09-07 10:21:08 +03:00
Andrew
6e9a93ec19
init
2019-09-07 10:18:46 +03:00
Martin Kroeker
fde8a8e6a0
Improve cmake build behaviour with non-host cpu targets (#2246)
...
1. Supply appropriate values for C/Z GEMM unroll when cross-compiling for CORE2 or ARMV7
2. Add the required xLOCAL_BUFFER_SIZE parameters for cross-compiling CORE2
3. Add -DFORCE_<target> option to getarch when building with -DTARGET=target
for #2245
2019-09-03 22:41:17 +02:00
Martin Kroeker
256fc15f5f
Merge pull request #2 from xianyi/develop
...
update
2019-09-03 15:12:14 +02:00
Martin Kroeker
ee498525e0
Merge pull request #2242 from martin-frbg/issue2235
...
Add arch data for cmake cross-compiling to CORE2
2019-09-02 22:06:29 +02:00
Martin Kroeker
1fec0570f6
Add cgemm and zgemm unroll factors for core2
2019-09-02 15:03:45 +02:00
Martin Kroeker
b5af7b9c78
Disable ppc64le test environment on Travis CI
...
as this semi-official beta option has suddenly reverted to a standard x86_64 environment causing spurious failures
2019-08-31 18:06:12 +02:00
Martin Kroeker
f3c314550c
Merge pull request #2243 from quickwritereader/develop
...
possible cgemv,caxpy,cdot fix
2019-08-30 23:06:23 +02:00
AbdelRauf
847c20c9b7
fix uninitialized variables i
2019-08-30 11:14:55 +00:00
AbdelRauf
4c22828812
caxpy and cdot are using vec_vsx_ld
2019-08-30 04:09:15 +00:00
AbdelRauf
e79712d969
cgemv using vec_vsx_ld instead of letting gcc decide
2019-08-30 02:52:04 +00:00
AbdelRauf
be09551cdf
aligned
2019-08-29 23:22:23 +00:00
Martin Kroeker
ec1ef6aa9e
Merge pull request #2241 from martin-frbg/zdotfix
...
Make x86_64 zdot compile with PGI and Sun C again
2019-08-29 07:12:54 +02:00
Martin Kroeker
11c59acfb1
Keep both PGI/SUN and default code paths to avoid breaking Clang/Windows
2019-08-28 18:07:44 +02:00
Martin Kroeker
bf0d92a310
Add arch data for cross-compiling to CORE2
...
for #2235
2019-08-28 17:35:56 +02:00
Martin Kroeker
db066151ee
Merge pull request #2240 from martin-frbg/issue2237
...
Fix PGI build options (again)
2019-08-28 15:30:53 +02:00
Martin Kroeker
3a55dca2dc
Make x86_64 zdot compile with PGI and Sun C again
...
broken by #2222 as CREAL,CIMAG do not expand to a valid lvalue with these compilers
2019-08-28 11:35:31 +02:00
Martin Kroeker
7d380f7d79
Fix PGI build options (again)
...
for #2237
2019-08-28 11:31:20 +02:00
Martin Kroeker
300f158d3b
Merge pull request #2239 from martin-frbg/issue2231
...
Fix 32bit armv8 compilation regression
2019-08-28 07:54:57 +02:00
Martin Kroeker
3635fdbf2b
Do not abuse the global ARCH variable as a local temporary
...
Setting it with a simple "uname -m" just to be able to decide whether to compile getarch.c with -march=native
may actually keep getarch from doing a proper probe. Fixes #2231, a regression caused by #2110
2019-08-27 22:52:17 +02:00
Martin Kroeker
b6552b11eb
Merge pull request #2 from xianyi/develop
...
merge develop
2019-08-27 22:41:31 +02:00
Kavana Bhat
3dc6b26eff
AIX changes for Power8
2019-08-20 06:51:35 -05:00
Martin Kroeker
5fdf9ad24f
Merge pull request #2228 from martin-frbg/issue2227
...
Add Intel Goldmont Plus CPUID
2019-08-19 18:26:51 +02:00
Martin Kroeker
2fe967c542
Merge branch 'develop' into issue2227
2019-08-19 14:20:39 +02:00
Martin Kroeker
6d8595351c
Add Intel Goldmont Plus CPUID
...
fixes #2227
2019-08-19 14:19:21 +02:00
Martin Kroeker
f40200f559
Merge pull request #2223 from martin-frbg/getarch-pgi
...
Make getarch compile with PGI
2019-08-16 12:21:30 +02:00
Martin Kroeker
a95a5e52b8
Fix PGI compiler detection for getarch
2019-08-16 09:00:11 +02:00
Martin Kroeker
e3d846ab57
Do not use -march=native with the PGI compiler
2019-08-16 08:58:10 +02:00
Martin Kroeker
8506386d82
Merge pull request #1 from xianyi/develop
...
rebase
2019-08-16 08:56:15 +02:00
Martin Kroeker
9ef96b32a6
Add multithreading support to the x86_64 zdot kernel (#2222)
...
* Add multithreading support
copied from the ThunderX2T99 kernel. For #2221
2019-08-15 22:09:12 +02:00
Martin Kroeker
b48c025974
Merge pull request #2218 from martin-frbg/issue2215
...
Make the new DGEMM regression test properly depend on CBLAS and LAPACKE
2019-08-14 07:32:31 +02:00
Martin Kroeker
a1fce67743
Make the new DGEMM regression test properly depend on CBLAS and LAPACKE
...
fixes #2215
2019-08-13 22:29:48 +02:00
Martin Kroeker
103b32fdb7
Merge pull request #2216 from martin-frbg/issue2214
...
Remove case-sensitivity in x86 LSAME on (AMD) cpus without CMOV
2019-08-13 13:59:33 +02:00
Martin Kroeker
aef9804089
Fix unwanted case-sensitivity in x86 LSAME for (AMD) processors without CMOV
...
Problem was already noticed some years ago in #238, but back then the problem was only corrected in one of the #ifdef branches.
Fixes #2214
2019-08-13 10:19:10 +02:00
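For context on the commit above: LSAME is the BLAS/LAPACK helper that reports whether two characters are the same letter regardless of case. A minimal Python sketch of that contract follows; it only illustrates the expected behaviour and says nothing about the x86 assembly that was fixed.

```python
def lsame(ca, cb):
    """Return True if two single characters are the same letter, ignoring case."""
    return ca.upper() == cb.upper()

# A case-sensitive comparison would wrongly reject the first two pairs.
print(lsame("n", "N"), lsame("T", "t"), lsame("n", "t"))
```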
Martin Kroeker
303869f572
Update with changes from 0.3.7
2019-08-11 23:31:36 +02:00
Martin Kroeker
02d9203981
Increment version to 0.3.8.dev
2019-08-11 23:28:47 +02:00
Martin Kroeker
7b6808b69c
Increment version to 0.3.8.dev
2019-08-11 23:28:13 +02:00
Martin Kroeker
5f36f18148
Update with 0.3.7 changes
2019-08-11 23:23:27 +02:00
Martin Kroeker
d47fe78b0e
Set version to 0.3.7
2019-08-11 23:16:45 +02:00
Martin Kroeker
ebe2f47a0f
Set version to 0.3.7
2019-08-11 23:16:11 +02:00
Martin Kroeker
20d417762f
Merge pull request #2213 from xianyi/develop
...
Update from develop in preparation of the 0.3.7 release
2019-08-11 23:14:49 +02:00
Martin Kroeker
321288597c
Merge pull request #2212 from martin-frbg/nofort-nolib
...
Avoid spurious dependency on the fortran runtime despite NOFORTRAN=1
2019-08-11 20:26:34 +02:00
Martin Kroeker
be147a9f28
Avoid adding a spurious dependency on the fortran runtime despite NOFORTRAN=1
...
for cases where a fortran compiler is present but not wanted (e.g. not fully functional)
2019-08-11 16:24:39 +02:00
Martin Kroeker
c275290ea6
Merge pull request #2211 from martin-frbg/arm64_gcc_trivial
...
Silence two nuisance warnings from gcc
2019-08-11 16:08:05 +02:00
Martin Kroeker
b7bbb02447
Silence two nuisance warnings from gcc
2019-08-11 12:46:05 +02:00
Martin Kroeker
bf1430f7d7
Merge pull request #2208 from martin-frbg/munmap-debug
...
Provide more information on mmap/munmap failure
2019-08-09 07:55:35 +02:00
Martin Kroeker
dccff2e785
Merge pull request #2206 from martin-frbg/zen-dtrmm
...
Replace vpermpd with vpermilpd in the Haswell DTRMM kernel
2019-08-09 07:55:20 +02:00
Martin Kroeker
5c3458a6e7
Merge pull request #2199 from martin-frbg/zen-dtrsm
...
Replace most vpermpd calls in the Haswell DTRSM_RN kernel
2019-08-09 07:55:02 +02:00
Martin Kroeker
1776ad82c0
Add files via upload
2019-08-09 00:08:11 +02:00
Martin Kroeker
4e2f81cfa1
Provide more information on mmap/munmap failure
...
for #2207
2019-08-08 23:15:35 +02:00
Martin Kroeker
acf6002ab2
Replace most vpermpd calls in the Haswell DTRSM_RN kernel
2019-08-03 12:40:13 +02:00
Martin Kroeker
96a794e9fd
Merge pull request #2198 from martin-frbg/icelake
...
Update CPUID recognition for Intel Ice Lake
2019-08-02 08:36:14 +02:00
Martin Kroeker
3d36c45116
Add CPUID identification of Intel Ice Lake
2019-08-01 22:52:35 +02:00
Martin Kroeker
648491e1aa
Autodetect Intel Ice Lake (as SKYLAKEX target)
2019-08-01 22:51:09 +02:00
Martin Kroeker
2dfb804cb9
Replace vpermpd with vpermilpd in the Haswell DTRMM kernel
...
to improve performance on AMD Zen (#2180) applying wjc404's improvement of the DGEMM kernel from #2186
2019-07-28 23:17:28 +02:00
Martin Kroeker
4c153ec9da
Merge pull request #2196 from wjc404/develop
...
Add vbroadcastsd kernel to dgemm_kernel_4x8_haswell.S
2019-07-28 23:11:40 +02:00
wjc404
7eecd8e39c
Add files via upload
2019-07-28 07:39:09 +08:00
Martin Kroeker
f0406a7708
Merge pull request #2112 from ffontaine/develop
...
Makefile.arm: remove -march flags
2019-07-27 13:00:13 +02:00
Martin Kroeker
561f3fd995
Merge pull request #2193 from martin-frbg/makeutest
...
Override special make variables
2019-07-24 20:19:21 +02:00
Martin Kroeker
30efed14d1
Unset special make variables in ctest Makefile as well
2019-07-24 15:26:09 +02:00
Martin Kroeker
af2e7f28fc
Override special make variables
...
as seen in https://github.com/xianyi/OpenBLAS/issues/1912#issuecomment-514183900, any external setting of TARGET_ARCH (which could result from building OpenBLAS as part of a larger project that actually uses this variable) would cause the utest build to fail.
(Other subtargets appear to be unaffected as they do not use implicit make rules)
2019-07-23 16:56:40 +02:00
Martin Kroeker
4250e6ed64
Merge pull request #2191 from tylerjereddy/conditional_updates
...
MAINT: remove legacy CMake endif()
2019-07-23 16:20:39 +02:00
Martin Kroeker
7b0b7c11d2
Merge pull request #2190 from martin-frbg/zdot-zen
...
Replace vpermpd with vpermilpd in the Haswell/Zen zdot microkernel
2019-07-23 16:15:08 +02:00
Martin Kroeker
d14cf1ccf4
Merge pull request #2189 from wjc404/develop
...
Update dgemm_kernel_4x8_haswell.S for reducing cache misses
2019-07-23 08:32:56 +02:00
Tyler Reddy
3f6ab1582a
MAINT: remove legacy CMake endif()
...
* clean up a case where CMake endif()
contained the conditional used in the
if(), which is no longer needed /
discouraged since our minimum required
CMake version supports the modern syntax
2019-07-22 21:24:57 -06:00
Martin Kroeker
28e96458e5
Replace vpermpd with vpermilpd
...
to improve performance on Zen/Zen2 (as demonstrated by wjc404 in #2180)
2019-07-22 08:28:16 +02:00
wjc404
95fb98f556
Update dgemm_kernel_4x8_haswell.S
2019-07-21 01:10:32 +08:00
wjc404
4801c6d36b
Update dgemm_kernel_4x8_haswell.S
2019-07-21 00:47:45 +08:00
wjc404
9440fa607d
Add files via upload
2019-07-20 22:08:22 +08:00
wjc404
94db259e5b
Add files via upload
2019-07-20 22:04:41 +08:00
wjc404
f49f8047ac
Add files via upload
2019-07-20 14:33:37 +08:00
wjc404
825777faab
Update dgemm_kernel_4x8_haswell.S
2019-07-19 23:58:24 +08:00
wjc404
9c89757562
Add files via upload
2019-07-19 23:47:58 +08:00
Martin Kroeker
b0b7600bef
Merge pull request #2186 from wjc404/develop
...
Update "dgemm_kernel_4x8_haswell.S" for improving performance on zen2 chips
2019-07-18 16:04:44 +02:00
wjc404
9b04baeaee
Update dgemm_kernel_4x8_haswell.S
2019-07-17 23:50:03 +08:00
wjc404
8a074b3965
Update dgemm_kernel_4x8_haswell.S
2019-07-17 23:47:30 +08:00
wjc404
211ab03b14
Update dgemm_kernel_4x8_haswell.S
2019-07-17 22:39:15 +08:00
wjc404
1733f927e6
Update dgemm_kernel_4x8_haswell.S
2019-07-17 21:27:41 +08:00
wjc404
182b06d6ad
Update dgemm_kernel_4x8_haswell.S
2019-07-17 17:02:35 +08:00
wjc404
7a9050d681
Update dgemm_kernel_4x8_haswell.S
2019-07-17 00:55:06 +08:00
wjc404
0ba29fd262
Update dgemm_kernel_4x8_haswell.S for zen2
...
replaced a bunch of vpermpd instructions with vpermilpd and vperm2f128
2019-07-17 00:46:51 +08:00
Martin Kroeker
bafa021ed6
Merge pull request #2181 from isuruf/install_name
...
Change install_name on osx to match linux
2019-07-09 20:08:52 +02:00
Isuru Fernando
b89d9762a2
Change install_name on osx to match linux
2019-07-08 17:14:35 -05:00
Martin Kroeker
08dedf4c5e
Merge pull request #2177 from martin-frbg/noaff
...
Fix surprising behaviour of NO_AFFINITY=0
2019-07-07 18:28:21 +02:00
Martin Kroeker
b89c781637
Fix surprising behaviour of NO_AFFINITY=0
2019-07-07 16:04:45 +02:00
Martin Kroeker
dd7ff77f4b
Merge pull request #2175 from martin-frbg/cmake-mingw-fixes
...
Fix CMAKE compilation with MinGW32 and add it to Appveyor
2019-07-06 18:07:19 +02:00
Martin Kroeker
8fb76134bc
Mingw32 needs leading underscore on object names
...
(also copy BUNDERSCORE settings for FORTRAN from the corresponding Makefile)
2019-07-06 15:07:15 +02:00
Martin Kroeker
04d671aae2
Make disabling DYNAMIC_ARCH on unsupported systems work
...
needs to be unset in the cache for the change to have any effect
2019-07-06 15:05:04 +02:00
Martin Kroeker
f69a0be712
Add getarch flags to disable AVX on x86
...
(and other small fixes to match Makefile behaviour)
2019-07-06 15:02:39 +02:00
Martin Kroeker
ae9e8b131e
Add mingw builds to Appveyor config
2019-07-06 14:30:33 +02:00
Martin Kroeker
9086543f50
Utest needs CBLAS but not necessarily FORTRAN
2019-07-06 14:29:47 +02:00
Martin Kroeker
abea977ded
Merge pull request #2162 from martin-frbg/pgi
...
Fixes for PGI compiler
2019-07-03 19:16:30 +02:00
Martin Kroeker
6b6c9b1441
Merge pull request #2172 from quickwritereader/develop
...
power9 cgemm/ctrmm. new sgemm 8x16
2019-07-01 21:06:02 +02:00
AbdelRauf
a97b301aaa
cgemm/ctrmm power9
2019-07-01 14:07:54 +00:00
Martin Kroeker
2f13f04224
Merge pull request #2170 from pkubaj/patch-1
...
Fix build on PPC970 for FreeBSD
2019-06-30 23:29:02 +02:00
pkubaj
7c7505a778
Fix build for PPC970 on FreeBSD pt.2
...
FreeBSD needs those macros too.
2019-06-28 10:31:45 +00:00
pkubaj
5a4f1a2118
Fix build for PPC970 on FreeBSD pt. 1
...
FreeBSD needs DCBT_ARG=0 as well.
2019-06-28 10:29:44 +00:00
Martin Kroeker
3b761892df
Merge pull request #2169 from pkubaj/develop
...
Fix build on FreeBSD/powerpc64.
2019-06-25 12:56:33 +02:00
Piotr Kubaj
eebfeba768
Fix build on FreeBSD/powerpc64.
...
Signed-off-by: Piotr Kubaj <pkubaj@anongoth.pl>
2019-06-25 10:58:56 +02:00
Martin Kroeker
7684c4f8f8
PGI compiler does not like -march=native
2019-06-20 19:56:01 +02:00
Martin Kroeker
7faf42b7bb
Merge pull request #2167 from kavanabhat/dtrmm_power8_segfault
...
Fix DTRMMKERNEL register save for power8 64-bit mode (Fix for #2166 )
2019-06-19 14:38:01 +02:00
kavanabhat
a575f1e4c7
Update dtrmm_kernel_16x4_power8.S
2019-06-19 15:27:14 +05:30
AbdelRauf
cdbfb891da
new sgemm 8x16
2019-06-17 15:33:38 +00:00
Martin Kroeker
280552b988
Fix mov syntax
2019-06-16 18:35:43 +02:00
Martin Kroeker
bbd4bb0154
Zero ecx with a mov instruction
...
PGI assembler does not like the initialization in the constraints.
2019-06-16 15:04:10 +02:00
Martin Kroeker
6d3efb2b58
Update Makefile.x86_64
2019-06-14 08:08:11 +02:00
Martin Kroeker
d9ff2cd90d
Do not force gcc options on non-gcc compilers
...
fixes compile failure with pgi 18.10 as reported on OpenBLAS-users
2019-06-13 23:01:35 +02:00
Martin Kroeker
2a43062de7
Merge pull request #2159 from martin-frbg/issue2149
...
Avoid unintentional activation of TLS codepath via USE_TLS=0
2019-06-10 19:12:45 +02:00
Martin Kroeker
4ea794a522
Avoid unintentional activation of TLS code via USE_TLS=0
...
fixes #2149
2019-06-10 17:24:15 +02:00
Martin Kroeker
ece0bfb881
Merge pull request #2158 from martin-frbg/issue2143
...
Remove any inadvertent use of -march=native from DYNAMIC_ARCH builds
2019-06-10 14:08:11 +02:00
Martin Kroeker
1f4b6a5d5d
Remove any inadvertent use of -march=native from DYNAMIC_ARCH builds
...
from #2143, -march=native precludes use of more specific options like -march=skylake-avx512 in individual kernels, and defeats the purpose of dynamic arch anyway.
2019-06-10 09:50:13 +02:00
Martin Kroeker
be8f70d269
Merge pull request #2157 from martin-frbg/2154-2
...
Add gfortran workaround for potential ABI violation
2019-06-09 12:19:08 +02:00
Martin Kroeker
e674e1c735
Update fc.cmake
2019-06-09 09:31:13 +02:00
Martin Kroeker
6ca898b63b
Add gfortran workaround for potential ABI violation
...
for #2154
2019-06-08 23:17:03 +02:00
Martin Kroeker
26411acd56
Merge pull request #2148 from TiborGY/cpp_thread_test_2
...
Thread safety tester using C++11 threading (cleaned history)
2019-06-07 13:23:07 +02:00
Martin Kroeker
0ab4076dd8
Merge pull request #2156 from martin-frbg/issue2154
...
Add gfortran workaround for C->FORTRAN ABI violation
2019-06-06 13:43:12 +02:00
Martin Kroeker
a0caa762b3
Add gfortran workaround for ABI violations
...
for #2154 (see gcc bug 90329)
2019-06-06 10:24:16 +02:00
Martin Kroeker
900d5a3205
Add gfortran workaround for ABI violations in LAPACKE
...
for #2154 (see gcc bug 90329)
2019-06-06 10:18:40 +02:00
Martin Kroeker
a17cf36225
Merge pull request #2153 from quickwritereader/develop
...
improved power9 zgemm,sgemm
2019-06-06 07:42:56 +02:00
AbdelRauf
148c4cc5fd
conflict resolve
2019-06-05 20:50:50 +00:00
AbdelRauf
d0c3543c3f
power9 zgemm ztrmm optimized
2019-06-05 20:07:16 +00:00
Martin Kroeker
909ad04aef
Merge pull request #2145 from martin-frbg/1912-3
...
Separate implementations of AMAX and IAMAX on arm
2019-06-05 20:27:45 +02:00
Martin Kroeker
417efd41c6
Merge pull request #2110 from pc2/cpu-detection
...
Fix detection of Skylake processors when using GCC
2019-06-05 20:27:05 +02:00
Michael Lass
9cdc828afa
c_check: Unlink correct file
2019-06-05 17:31:01 +02:00
Michael Lass
7a9a4dbc4f
Fix detection of AVX512 capable compilers in getarch
...
21eda8b5 introduced a check in getarch.c to test if the compiler is capable of
AVX512. This check currently fails, since the used __AVX2__ macro is only
defined if getarch itself was compiled with AVX2/AVX512 support. Make sure this
is the case by building getarch with -march=native on x86_64. It is only
supposed to run on the build host anyway.
2019-06-05 17:30:56 +02:00
AbdelRauf
a469b32cf4
sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52
2019-06-04 07:11:30 +00:00
Martin Kroeker
27649b9543
Document NO_AVX512
...
for #2151
2019-06-03 11:01:33 +02:00
TiborGY
16f3df5d35
add c++ thread test option to Makefile.rule
2019-06-01 21:36:41 +02:00
TiborGY
1aded69821
hook up c++ thread safety test (main Makefile)
2019-06-01 21:32:52 +02:00
TiborGY
c00289ba54
upload thread safety test folder
2019-06-01 21:30:06 +02:00
AbdelRauf
8fe794f059
improved zgemm power9 based on power8
2019-05-30 15:31:25 +00:00
Martin Kroeker
74c10b57c6
Use generic kernels for complex (I)AMAX to support softfp
2019-05-30 11:38:11 +02:00
Martin Kroeker
c5495d2056
Ensure correct output for DAMAX with softfp
2019-05-30 11:25:43 +02:00
Martin Kroeker
c70496b108
Separate implementations of AMAX and IAMAX on arm
...
As noted in #1912 and comment on #1942, the combined implementation happens to "do the right thing" on hardfp, but cannot return both value and index on softfp where they would have to share the return register
2019-05-29 15:02:51 +02:00
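The separation described in the commit above can be sketched as two independent routines, one returning the largest absolute value and one returning its position. This is an illustrative Python sketch for real-valued input, not the ARM kernels themselves; the 1-based index with 0 for empty input follows the usual reference BLAS convention for IxAMAX, and the function names are chosen here for illustration.

```python
def amax(x):
    """Return the largest absolute value in x (0.0 for an empty sequence)."""
    return max((abs(v) for v in x), default=0.0)

def iamax(x):
    """Return the 1-based index of the first element with the largest absolute value."""
    if not x:
        return 0
    best = 0
    for i, v in enumerate(x):
        # Strict comparison keeps the first occurrence when there are ties.
        if abs(v) > abs(x[best]):
            best = i
    return best + 1

print(amax([1.0, -3.0, 2.0]), iamax([1.0, -3.0, 2.0]))
```

Because each routine has a single return value of a single type, neither depends on the value and the index ending up in different registers.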
Martin Kroeker
ca8d8835f5
Merge pull request #2144 from xianyi/revert-2142-issue1912-2
...
Revert "Add softfp support in min/max kernels"
2019-05-29 14:09:10 +02:00
Martin Kroeker
d76b20b4d2
Revert "Add softfp support in min/max kernels"
2019-05-29 14:07:17 +02:00
Martin Kroeker
85af04da3c
Merge pull request #2142 from martin-frbg/issue1912-2
...
Add softfp support in min/max kernels
2019-05-28 22:56:08 +02:00
Martin Kroeker
11e0dcbffb
Merge pull request #2141 from martin-frbg/issue1912
...
Build and run utests independently of fortran
2019-05-28 20:50:40 +02:00
Martin Kroeker
79366ff7a9
Add softfp support in min/max kernels
...
fix for #1912
2019-05-28 20:34:22 +02:00
Martin Kroeker
21d05a4835
Merge pull request #2140 from martin-frbg/pgi19
...
Do not try ancient PGI hacks with recent versions of that compiler
2019-05-26 12:39:20 +02:00
Martin Kroeker
940f38f6dd
Build and run utests in any case, they do their own checks for fortran availability
2019-05-24 13:02:23 +02:00
Martin Kroeker
1778fd4219
Do not try ancient PGI hacks with recent versions of that compiler
...
should fix #2139
2019-05-22 13:48:27 +02:00
Martin Kroeker
969dd6175e
Merge pull request #2136 from martin-frbg/issue2126
...
Add option to allow combining USE_THREAD=0 with thread locking support
2019-05-16 12:08:16 +02:00
Martin Kroeker
d8d5682481
Merge pull request #2134 from tylerjereddy/skylake_regress_guard_may14
...
TST: add SkylakeX AVX512 CI test
2019-05-15 23:40:06 +02:00
Martin Kroeker
f66c11fc22
Remove unrelated change
2019-05-15 23:38:12 +02:00
Martin Kroeker
5ecffc28f2
Add option USE_LOCKING but keep default settings intact
2019-05-15 23:36:17 +02:00
Martin Kroeker
86dda5c2fa
Add option USE_LOCKING for SMP-like locking in USE_THREAD=0 builds
2019-05-15 23:21:20 +02:00
Martin Kroeker
1e52572be3
Add option USE_LOCKING for single-threaded build with locking support
2019-05-15 23:19:30 +02:00
Martin Kroeker
d2cb610272
Add option USE_LOCKING for single-threaded build with locking support
...
for calling from concurrent threads
2019-05-15 23:18:43 +02:00
Tyler Reddy
a211bc9b6a
TST: add SkylakeX AVX512 CI test
...
* adapt the C-level reproducer code for some
recent SkylakeX AVX512 kernel issues, provided
by Isuru Fernando and modified by Martin Kroeker,
for usage in the utest suite
* add an Intel SDE SkylakeX emulation utest run to
the Azure CI matrix; a custom Docker build was required
because Ubuntu image provided by Azure does not support
AVX512VL instructions
2019-05-14 11:32:23 -07:00
Martin Kroeker
9208ab8603
Merge pull request #2130 from isuruf/drone
...
Drone CI for arm64 native builds
2019-05-14 09:37:00 +02:00
Isuru Fernando
b43deb4ad6
Fix typo
2019-05-12 15:26:18 -05:00
Isuru Fernando
b911525c81
arm32 build
2019-05-12 15:21:43 -05:00
Isuru Fernando
7ff44e0016
Remove qemu armv8 builds
2019-05-12 15:09:53 -05:00
Isuru Fernando
e3cb8ad2d6
See if ubuntu 19.04 fixes the ICE
2019-05-12 14:28:48 -05:00
Isuru Fernando
7aa6faad5f
parallel build
2019-05-12 14:22:36 -05:00
Isuru Fernando
3d94ab660f
build without lapack on cmake
2019-05-12 14:17:12 -05:00
Isuru Fernando
cd99dfe034
Add cmake builds and print options
2019-05-12 14:10:10 -05:00
Isuru Fernando
dadafcdcd8
Add a cmake build as well
2019-05-12 14:10:10 -05:00
Isuru Fernando
d40c109eb0
no need of gcc in clang build
2019-05-12 14:10:10 -05:00
Isuru Fernando
608cd69b66
update yes
2019-05-12 14:10:10 -05:00
Isuru Fernando
231472c4c6
Fix typo
2019-05-12 14:10:10 -05:00
Isuru Fernando
612c2d78e0
apt update
2019-05-12 14:10:10 -05:00
Isuru Fernando
dc110e179d
Switch to ubuntu and parallel jobs
2019-05-12 14:10:09 -05:00
Isuru Fernando
9184590c33
gfortran->gcc-gfortran
2019-05-12 14:10:09 -05:00
Isuru Fernando
a0aaf308ed
Install gfortran and add a clang job
2019-05-12 14:10:09 -05:00
Isuru Fernando
15f925fe9a
Install perl
2019-05-12 14:10:09 -05:00
Isuru Fernando
21acf03e9a
Install gcc
2019-05-12 14:10:09 -05:00
Isuru Fernando
ff807473bb
remove sudo
2019-05-12 14:10:09 -05:00
Isuru Fernando
58829c0988
install make
2019-05-12 14:10:09 -05:00
Isuru Fernando
d86f0b9e74
Test drone CI
2019-05-12 14:10:09 -05:00
Martin Kroeker
63554d5dec
Merge pull request #2129 from martin-frbg/armv8azure
...
Move ARMv8/gcc CI job from Travis to Azure
2019-05-12 09:55:57 +02:00
Martin Kroeker
43068288e9
Update .travis.yml
2019-05-11 22:37:06 +02:00
Martin Kroeker
999a04f101
Move ARMv8 gcc build from Travis to Azure
2019-05-11 16:08:23 +02:00
Martin Kroeker
3cb1c8d210
Move ARMv8 gcc build from Travis to Azure
2019-05-11 16:07:30 +02:00
Martin Kroeker
ff1bfe7b16
Merge pull request #2127 from martin-frbg/issue2114_2
...
Add NO_AFFINITY to available CMAKE options on Linux, and set it to ON
2019-05-09 15:25:09 +02:00
Martin Kroeker
9ea30f3788
Replace ISMIN and ISAMIN kernels on all x86_64 platforms (#2125)
...
* Mark iamax_sse.S as unsuitable for MIN due to issue #2116
* Use iamax.S rather than iamax_sse.S for ISMIN/ISAMIN on all x86_64 as workaround for #2116
2019-05-09 14:42:36 +02:00
Martin Kroeker
a3d4c65d62
Add NO_AFFINITY to available options on Linux, and set it to ON
...
to match the gmake default. Fixes second part of #2114
2019-05-09 11:52:02 +02:00
Martin Kroeker
e1fc02095c
Merge pull request #2124 from tylerjereddy/manylinux1_azure
...
TST: Azure manylinux1 & clean-up
2019-05-09 08:57:37 +02:00
Martin Kroeker
0cd6d8508f
Merge pull request #2123 from tylerjereddy/azure_readme_badge
...
DOC: Add Azure CI status badge to README
2019-05-09 08:10:19 +02:00
Martin Kroeker
c2f152c470
Merge pull request #2120 from brada4/getrf-2113
...
Address redundant code concern #2113
2019-05-09 08:10:00 +02:00
Tyler Reddy
4efbac28ed
TST: Azure manylinux1 & clean-up
...
* remove some of the steps & comments
from the original Azure yml template
* modify the trigger section to use
develop since OpenBLAS primarily uses
this branch; use the same batching
behavior as downstream projects NumPy/
SciPy
* remove Travis emulated ARMv6 gcc build
because this now happens in Azure
* use documented Ubuntu vmImage name for Azure
and add in a manylinux1 test run to the matrix
[skip appveyor]
2019-05-08 21:58:49 -07:00
Martin Kroeker
406c7242f4
Add ARMV6 build to azure CI setup (#2122)
...
using aytekinar's Alpine image and docker script from the Travis setup
[skip ci]
2019-05-09 00:47:44 +02:00
Tyler Reddy
53703585aa
DOC: Add Azure CI status badge
2019-05-08 15:15:50 -07:00
Martin Kroeker
ad20ceaa68
Update azure-pipelines.yml
2019-05-08 19:07:58 +02:00
Martin Kroeker
dd77a3f0e2
Update azure-pipelines.yml
2019-05-08 15:25:43 +02:00
Martin Kroeker
a598ab1d32
Update azure-pipelines.yml
2019-05-08 15:23:54 +02:00
Martin Kroeker
16fd8e3dbe
Update azure-pipelines.yml
2019-05-08 14:14:22 +02:00
Martin Kroeker
aa4c41bad2
Update azure-pipelines.yml
...
take out offending lines (although stolen from https://github.com/conda-forge/opencv-feedstock azure-pipelines file)
2019-05-08 14:12:02 +02:00
Martin Kroeker
5cf434167a
fix tabbing in azure commands
2019-05-08 13:58:59 +02:00
Martin Kroeker
3a49e8c05a
first try migrating one of the arm builds from travis
2019-05-08 13:52:22 +02:00
Martin Kroeker
95e2cf32e1
Merge pull request #2121 from tylerjereddy/ppc64le-travis
...
TST: add native POWER8 to CI
2019-05-08 13:31:46 +02:00
Martin Kroeker
70cea0b96b
Update link to IBM MASS library, update cpu support status
2019-05-08 12:20:00 +02:00
Martin Kroeker
ae0dec77ec
Merge pull request #2118 from Diazonium/develop
...
Change two http links to https
2019-05-08 11:41:17 +02:00
Tyler Reddy
e47b63466b
TST: add native POWER8 to CI
...
* add native POWER8 testing to
Travis CI matrix with ppc64le
os entry
2019-05-07 19:11:08 -07:00
Zhang Xianyi
7d1b468d9d
Set up CI with Azure Pipelines
...
[skip ci]
2019-05-08 09:58:01 +08:00
Andrew
575a84398a
remove redundant code #2113
2019-05-07 23:46:54 +03:00
Martin Kroeker
5cabda79d0
Merge pull request #2117 from martin-frbg/issue2114
...
Fix errors in cpu affinity setup with glibc 2.6
2019-05-07 18:18:16 +02:00
Diazonium
c516209581
Change two http links to https
...
Closes #2109
2019-05-07 14:55:20 +02:00
Martin Kroeker
a6a8cc2b7f
Fix errors in cpu enumeration with glibc 2.6
...
for #2114
2019-05-07 13:34:52 +02:00
Andrew
3d7debbb28
init
2019-05-07 13:15:08 +03:00
Fabrice Fontaine
5a9cce2bf6
Makefile.arm: remove -march flags
...
The provided -march flags, especially for ARMv5 and ARMv6 may not
necessarily match the needed ones: for ARMv5, it might be armv5,
armv5te, armv5t, etc. If the wrong one is used, the incorrect toolchain
sysroot can be used in a multilib toolchain.
Therefore, let the user building OpenBLAS pass the appropriate -march
flag.
The other flags, such as -mfpu=vfp or -mfloat-abi=hard are kept, as they
are actually required for the build to proceed (OpenBLAS uses VFP
instructions, and assume an EABIhf ABI).
[Peter: update for v0.2.20]
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
[Retrieved from:
https://git.buildroot.net/buildroot/tree/package/openblas/0001-Makefile.arm-remove-march-flags.patch]
Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
2019-05-05 18:37:28 +02:00
Martin Kroeker
6a8b4269b5
Merge pull request #2111 from martin-frbg/issue1955
...
Disable the SkyLakeX DGEMMIxCOPY kernels as well
2019-05-05 18:08:49 +02:00
Martin Kroeker
b1561ecc68
Disable DGEMMINCOPY as well for now
...
#1955
2019-05-05 15:52:01 +02:00
Martin Kroeker
7ed8431527
Disable the SkyLakeX DGEMMITCOPY kernel as well
...
as a stopgap measure for https://github.com/numpy/numpy/issues/13401 as mentioned in #1955
2019-05-04 22:54:41 +02:00
Martin Kroeker
a387a23518
Merge pull request #2101 from luzpaz/misc-typos
...
Misc. typo fixes in comments and documentation
2019-05-04 22:28:29 +02:00
luz.paz
b46875b76b
Revert Changelog.txt typos
2019-05-04 15:43:17 -04:00
luz.paz
858e609e1f
Revert reference/ fixes
2019-05-04 15:01:29 -04:00
Martin Kroeker
3f427c0cf9
Merge pull request #2107 from quickwritereader/develop
...
sgemm/strmm kernel for power9
2019-05-02 07:56:57 +02:00
Martin Kroeker
c95317158f
Merge pull request #2105 from martin-frbg/issue2104
...
Correct argument of CPU_ISSET for glibc <2.5
2019-05-02 07:56:37 +02:00
AbdelRauf
47f892198c
conflict resolve
2019-05-01 19:36:22 +00:00
Martin Kroeker
b43c8382c8
Correct argument of CPU_ISSET for glibc <2.5
...
fixes #2104
2019-05-01 10:46:46 +02:00
luz.paz
daf2fec12d
Misc. typo fixes
...
Found via `codespell -q 3 -w -L ith,als,dum,nd,amin,nto,wis,ba -S ./relapack,./kernel,./lapack-netlib`
2019-04-29 17:03:56 -04:00
Martin Kroeker
4f8143b098
Increment version to 0.3.7.dev
2019-04-29 19:25:32 +02:00
Martin Kroeker
bfeb9c16b0
Increment version to 0.3.7.dev
2019-04-29 19:24:53 +02:00
Martin Kroeker
15cb124012
Merge pull request #2100 from xianyi/develop
...
Merge develop in preparation of 0.3.6 release
2019-04-29 19:22:19 +02:00
Martin Kroeker
97d5034ed3
Merge branch 'release-0.3.0' into develop
2019-04-29 19:21:54 +02:00
Martin Kroeker
9763f872fc
Update Changelog with changes from 0.3.6
2019-04-29 19:18:26 +02:00
AbdelRauf
628b335e83
Merge branch 'develop' of https://github.com/quickwritereader/OpenBLAS into develop
2019-04-29 08:57:44 +00:00
AbdelRauf
0f105dd8a5
sgemm/strmm
2019-04-29 08:49:50 +00:00
Martin Kroeker
9c4edd38f2
Merge pull request #2099 from martin-frbg/rela-gbtrf
...
Disable repeated recursion on Ab_BR in ReLAPACK xGBTRF
2019-04-29 09:25:19 +02:00
Martin Kroeker
1036299da0
Disable repeated recursion on Ab_BR in ReLAPACK xGBTRF
...
due to crashes in LAPACK tests
2019-04-29 00:12:37 +02:00
Martin Kroeker
5b0398186e
Merge pull request #2098 from martin-frbg/rela-malloc
...
Disable reallocation of work array in ReLAPACK xSYTRF
2019-04-28 19:31:01 +02:00
Martin Kroeker
452859f4e1
Merge pull request #2097 from martin-frbg/rela-getrf
...
Correct INFO=4 condition in ReLAPACK xGETRF
2019-04-28 19:28:57 +02:00
Martin Kroeker
2cd463eabd
Disable reallocation of work array in xSYTRF
...
as it appears to cause memory management problems (seen in the LAPACK tests)
2019-04-28 10:02:28 +02:00
Martin Kroeker
11530b76f7
Correct INFO=4 condition
2019-04-28 09:58:56 +02:00
Martin Kroeker
91943b7325
Merge pull request #2096 from martin-frbg/eig-testing
...
Avoid out-of-bounds accesses in LAPACK EIG tests
2019-04-28 09:55:42 +02:00
Martin Kroeker
268c28db7d
Merge pull request #2095 from martin-frbg/trsm
...
Correct length of name string in xerbla call
2019-04-28 09:55:25 +02:00
Martin Kroeker
2aad88d5b9
Avoid out-of-bounds accesses in LAPACK EIG tests
...
see https://github.com/Reference-LAPACK/lapack/issues/333
2019-04-27 23:01:49 +02:00
Martin Kroeker
0bd956fd21
Correct length of name string in xerbla call
2019-04-27 22:49:04 +02:00
Martin Kroeker
bbd9d98664
Merge pull request #2094 from martin-frbg/issue2066
...
Fix ReLAPACK integration problems
2019-04-27 22:45:47 +02:00
Martin Kroeker
798c448b0c
Add support for INTERFACE64 and fix XERBLA calls
...
1. Replaced all instances of "int" with "blasint"
2. Added string length as "hidden" third parameter in calls to fortran XERBLA
2019-04-27 19:06:00 +02:00
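Note on the "hidden" parameter mentioned above: many Fortran compilers pass the length of each CHARACTER argument as an extra by-value argument after the declared ones, so a C caller has to supply it explicitly. The sketch below illustrates the calling pattern only. demo_xerbla_ is a hypothetical C stand-in, not the real XERBLA, and the type of the length argument (int here) is an assumption that varies between compilers.

```c
#include <stdio.h>
#include <string.h>

/* Buffer where the stand-in handler records what it was told. */
static char last_report[64];

/* Hypothetical stand-in for a Fortran routine declared as
   SUBROUTINE XERBLA(SRNAME, INFO) with CHARACTER*(*) SRNAME.
   The third argument is the hidden string length; Fortran strings are
   not NUL-terminated, so the callee must rely on it. */
void demo_xerbla_(const char *name, const int *info, int name_len)
{
    snprintf(last_report, sizeof last_report, "%.*s:%d", name_len, name, *info);
}

/* C-side caller: passes INFO by reference and appends the name length. */
void report_bad_argument(int which)
{
    const char name[] = "DGETRF";
    demo_xerbla_(name, &which, (int)strlen(name));
}

/* Accessor so callers can inspect the recorded report. */
const char *last_error_report(void)
{
    return last_report;
}
```

Omitting the length argument leaves the callee reading an undefined value as the string length, which is the class of problem the commit addresses.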
Martin Kroeker
9a19616a28
Support INTERFACE64=1
2019-04-27 18:55:47 +02:00
Martin Kroeker
6b41eb9c0c
Merge pull request #2092 from jeffbaylor/snprintf_with_MSC_VER
...
snprintf define consolidated to common.h
2019-04-23 20:12:06 +02:00
Martin Kroeker
ccfb7ead15
Merge pull request #2072 from martin-frbg/sum
...
Add (C)BLAS extension ?sum
2019-04-23 20:11:36 +02:00
Jeff Baylor
40e53e52d6
snprintf define consolidated to common.h
2019-04-22 17:01:34 -07:00
Martin Kroeker
744779d335
Merge pull request #2084 from RashmicaG/develop
...
Add in runtime CPU detection for POWER.
2019-04-14 21:40:07 +02:00
Rashmica Gupta
bcdf1d4917
Add in runtime CPU detection for POWER.
2019-04-09 14:20:16 +10:00
Martin Kroeker
e06b8438b4
Merge pull request #2080 from martin-frbg/issue2075
...
Add -lm and disable EXPRECISION support on *BSD
2019-04-02 21:40:58 +02:00
Martin Kroeker
9229d6859b
Add -lm and disable EXPRECISION support on *BSD
...
fixes #2075
2019-04-02 09:38:18 +02:00
Martin Kroeker
21d146a8de
Add declarations for ?sum
2019-03-31 22:12:23 +02:00
Martin Kroeker
7f4e36d219
Merge pull request #2073 from martin-frbg/issue2056-2
...
Detect 32bit environment on 64bit ARM hardware
2019-03-31 13:56:08 +02:00
Martin Kroeker
c04a729081
Add ?sum definitions for generic kernel
2019-03-31 13:55:49 +02:00
Martin Kroeker
100d94f94e
Add ?sum
2019-03-31 13:55:05 +02:00
Martin Kroeker
d17da6c6a4
Add cmake defaults for ?sum kernels
2019-03-31 11:57:01 +02:00
Martin Kroeker
1679de5e59
Detect 32bit environment on 64bit ARM hardware
...
for #2056, using same approach as #2058
2019-03-31 10:50:43 +02:00
Martin Kroeker
246ca29679
Add ZARCH implementation of ?sum
...
as trivial copies of the respective ?asum kernels with the ABS and vflpsb calls removed
2019-03-30 22:49:05 +01:00
Martin Kroeker
9d717cb5ee
Add x86_64 implementation of ?sum
...
as trivial copy of ?asum with the fabs calls removed
2019-03-30 22:27:04 +01:00
Martin Kroeker
e3bc83f2a8
Add x86 implementation of ?sum
...
as trivial copy of ?asum with the fabs calls removed
2019-03-30 22:26:10 +01:00
Martin Kroeker
70f2a4e0d7
Add SPARC implementation of ?sum
...
as trivial copy of ?asum with the fabs replaced by fmov to preserve code structure
2019-03-30 22:25:06 +01:00
Martin Kroeker
706dfe263b
Add POWER implementation of ?sum
...
as trivial copy of ?asum with the fabs replaced by fmr to preserve code structure
2019-03-30 22:23:42 +01:00
Martin Kroeker
688fa9201c
Add MIPS64 implementation of ?sum
...
as trivial copy of ?asum with the fabs replaced by mov to preserve code structure
2019-03-30 22:22:15 +01:00
Martin Kroeker
cdbe0f0235
Add MIPS implementation of ?sum
...
as trivial copy of ?asum with the fabs calls removed
2019-03-30 22:20:14 +01:00
Martin Kroeker
f8b82bc6dc
Add ia64 implementation of ?sum
...
as trivial copy of asum with the fabs calls removed
2019-03-30 22:18:03 +01:00
Martin Kroeker
3e3ccb9011
Add ARM64 implementations of ?sum
...
as trivial copies of the respective ?asum kernels with the fabs calls removed
2019-03-30 22:13:36 +01:00
Martin Kroeker
94ab4e6fb2
Add ARM implementations of ?sum
...
(trivial copies of the respective ?asum with the fabs calls removed)
2019-03-30 22:11:38 +01:00
Martin Kroeker
c3cfc6986b
Add implementations of ssum/dsum and csum/zsum
...
as trivial copies of asum/zsasum with the fabs calls replaced by fmov to preserve code structure
2019-03-30 22:05:11 +01:00
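Note on the ?sum series: the commit messages describe each ?sum kernel as a copy of the matching ?asum kernel with the absolute-value step removed. A minimal reference-style sketch of that relationship in plain C follows. The names ref_dasum and ref_dsum are hypothetical and the loops are not the optimized kernels from the repository.

```c
#include <math.h>
#include <stddef.h>

/* Reference-style asum: sum of absolute values with stride incx. */
double ref_dasum(size_t n, const double *x, size_t incx)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += fabs(x[i * incx]);
    return acc;
}

/* Same loop with the fabs call removed: a plain (signed) sum. */
double ref_dsum(size_t n, const double *x, size_t incx)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i * incx];
    return acc;
}
```

For the vector {1, -2, 3, -4} with unit stride, the first function accumulates magnitudes and the second lets the signs cancel.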
Martin Kroeker
b9f4943a14
Add ?sum
2019-03-30 22:01:13 +01:00
Martin Kroeker
79cfc24a62
Add interface for ?sum (derived from ?asum)
2019-03-30 21:59:18 +01:00
Martin Kroeker
5c42287c4f
Add declarations for ?sum and cblas_?sum
2019-03-30 21:58:03 +01:00
Martin Kroeker
32c7063cb0
Merge pull request #2061 from martin-frbg/martin-frbg-patch-1
...
Disable the AVX512 DGEMM kernel (again)
2019-03-30 21:21:38 +01:00
Martin Kroeker
c19a449096
Merge pull request #2071 from martin-frbg/issue2068
...
Provide CBLAS interfaces to I?MIN and I?MAX
2019-03-30 14:54:28 +01:00
Martin Kroeker
3d1e36d4cb
Build CBLAS interfaces for I?MIN and I?MAX
2019-03-30 12:38:41 +01:00
Martin Kroeker
4f9d3e4b28
Expose CBLAS interfaces for I?MIN and I?MAX
2019-03-30 12:37:13 +01:00
Martin Kroeker
4dec151d0b
Merge pull request #2070 from quickwritereader/develop
...
power9 makefile. dgemm based on power8 kernel with following changes …
2019-03-29 21:46:21 +01:00
Martin Kroeker
7c51cc8527
Merge branch 'develop' into develop
2019-03-29 19:36:29 +01:00
AbdelRauf
853a18bc17
power9 makefile. dgemm based on power8 kernel with following changes : 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). improvement from 17 to 22-23gflops. dtrmm cases were added into dgemm itself
2019-03-29 15:49:40 +00:00
Martin Kroeker
3ae122e2c7
Merge pull request #2069 from aixoss/aix-asm-change
...
AIX asm syntax changes needed for shared object creation
2019-03-25 21:34:30 +01:00
Ayappan P
b043a5962e
AIX asm syntax changes needed for shared object creation
2019-03-25 18:53:25 +05:30
Martin Kroeker
8502030e5e
Merge pull request #2064 from embray/cygwin/use-tls-thread-memory-cleanup
...
Fix for #2063
2019-03-19 22:12:51 +01:00
Erik M. Bray
8ba9e2a61a
Also call CloseHandle on each thread, as well as on the event so as to not leak thread handles.
2019-03-19 11:21:44 +01:00
Erik M. Bray
4ad694eda1
Fix for #2063: The DllMain used in Cygwin did not run the thread memory
...
pool cleanup upon THREAD_DETACH which is needed when compiled with
USE_TLS=1.
2019-03-19 09:26:50 +01:00
Martin Kroeker
dff4a197a5
Merge pull request #2058 from xsacha/patch-3
...
Change 64-bit detection as explained in #2056
2019-03-16 11:57:23 +01:00
Martin Kroeker
a5425575b1
Merge pull request #2060 from embray/cygwin/readenv
...
Use POSIX getenv on Cygwin
2019-03-16 11:56:51 +01:00
Erik M. Bray
1006ff8a7b
Use POSIX getenv on Cygwin
...
The Windows-native GetEnvironmentVariable cannot be relied on, as
Cygwin does not always copy environment variables set through Cygwin
to the Windows environment block, particularly after fork().
2019-03-15 15:06:30 +01:00
Martin Kroeker
e608d4f7fe
Disable the AVX512 DGEMM kernel (again)
...
Due to as yet unresolved errors seen in #1955 and #2029
2019-03-13 22:10:28 +01:00
Martin Kroeker
4fc17d0d75
Trivial typo fix
...
as suggested in #2022
2019-03-13 19:20:23 +01:00
Sacha
c3e30b2bc2
Change 64-bit detection as explained in #2056
2019-03-13 23:21:54 +10:00
Martin Kroeker
03d7110900
Merge pull request #2042 from maomao194313/develop
...
add TARGET support for HiSilicon tsv110 CPUs
2019-03-12 22:57:39 +01:00
Martin Kroeker
3ce28fb81a
Merge pull request #2055 from martin-frbg/atomid
...
Add CPUID data for Intel Denverton (as Nehalem)
2019-03-12 22:57:07 +01:00
Martin Kroeker
04f2226ea6
Add Intel Denverton
2019-03-12 16:09:55 +01:00
Martin Kroeker
b1393c7a97
Add Intel Denverton
...
for #2048
2019-03-12 16:03:56 +01:00
maomao194313
7e3eb9b25d
make DYNAMIC_ARCH=1 package work on TSV110
2019-03-12 16:11:01 +08:00
maomao194313
f074d7d146
make DYNAMIC_ARCH=1 package work on TSV110.
2019-03-12 16:05:19 +08:00
Martin Kroeker
f18ab6c17b
Merge pull request #2051 from martin-frbg/issue2048
...
Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1
2019-03-09 16:39:35 +01:00
Martin Kroeker
946ec6c3b8
Merge pull request #2050 from kencu/PowerMacFix
...
PowerMac 970 fixes
2019-03-09 16:39:08 +01:00
Martin Kroeker
5b95534afc
Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1
...
for issue #2048
2019-03-09 11:21:16 +01:00
ken-cunningham-webuse
f7a06463d9
common_power.h: force DCBT_ARG 0 on PPC970 Darwin
...
without this, we see
../kernel/power/gemv_n.S:427:Parameter syntax error
and many more similar entries
that relates to this assembly command
dcbt 8, r24, r18
this change makes the DCBT_ARG = 0
and openblas builds through to completion on PowerMac 970
Tests pass
2019-03-07 12:03:45 -08:00
ken-cunningham-webuse
b0c714ef60
param.h : enable defines for PPC970 on DarwinOS
...
fixes:
gemm.c: In function 'sgemm_':
../common_param.h:981:18: error: 'SGEMM_DEFAULT_P' undeclared (first use in this function)
#define SGEMM_P SGEMM_DEFAULT_P
^
2019-03-07 12:03:25 -08:00
Martin Kroeker
8d3d29e4d7
Merge pull request #2049 from Celelibi/fix_crash_sgemm_sse_x64
...
Fix crash in sgemm SSE/nano kernel on x86_64
2019-03-07 19:28:06 +01:00
Celelibi
b7f59da42d
Fix crash in sgemm SSE/nano kernel on x86_64
...
Fix bug #2047.
Signed-off-by: Celelibi <celelibi@gmail.com>
2019-03-07 16:55:13 +01:00
Martin Kroeker
db3dc9e282
Merge pull request #2046 from kencu/powermac
...
ctest.c : add __POWERPC__ for PowerMac
2019-03-07 14:51:41 +01:00
ken-cunningham-webuse
4290afdae2
ctest.c : add __POWERPC__ for PowerMac
2019-03-06 20:55:06 -08:00
Martin Kroeker
4741ce803b
Merge pull request #2045 from martin-frbg/2033-3
...
Do not compile in AVX512 check if AVX support is disabled
2019-03-06 22:40:26 +01:00
Martin Kroeker
11cfd0bd75
Do not compile in AVX512 check if AVX support is disabled
...
the xgetbv function depends on NO_AVX being undefined - we could change that too, but that combination is unlikely to work anyway
2019-03-05 16:04:25 +01:00
Martin Kroeker
651ab01d2b
Merge pull request #2044 from martin-frbg/issue2043
...
Fix module definition conflicts between LAPACK and ReLAPACK
2019-03-05 12:11:32 +01:00
Martin Kroeker
d7b2c53c0b
Merge pull request #2039 from brada4/meminit
...
Address warning in memory.c
2019-03-05 12:11:15 +01:00
Martin Kroeker
e4864a8933
Fix module definition conflicts between LAPACK and ReLAPACK
...
for #2043
2019-03-04 21:17:08 +01:00
Martin Kroeker
10d841d8b9
Merge pull request #2026 from martin-frbg/trmv_threads
...
Correct range limiting in trmv_thread and re-enable TRMV multithreading
2019-03-04 15:08:31 +01:00
Martin Kroeker
12f2b76748
Merge pull request #2038 from martin-frbg/issue2035
...
Improve handling of NO_STATIC and NO_SHARED
2019-03-04 15:07:48 +01:00
Martin Kroeker
6c83b878f6
Merge pull request #2040 from martin-frbg/locks2002
...
Restore locking optimizations for OpenMP case
2019-03-04 15:07:14 +01:00
maomao194313
fb4dae7124
add TARGET support for HiSilicon tsv110 CPUs
2019-03-04 16:48:49 +08:00
maomao194313
760842dda1
add TARGET support for HiSilicon tsv110 CPUs
2019-03-04 16:45:22 +08:00
maomao194313
53f482ee72
add TARGET support for HiSilicon tsv110 CPUs
2019-03-04 16:41:21 +08:00
maomao194313
783ba8058f
HiSilicon tsv110 CPUs optimization branch
...
add HiSilicon tsv110 CPUs optimization branch
2019-03-04 16:30:50 +08:00
Martin Kroeker
af480b02a4
Restore locking optimizations for OpenMP case
...
restore another accidentally dropped part of #1468 that was missed in #2004 to address performance regression reported in #1461
2019-03-03 14:17:07 +01:00
Andrew
e4a79be6bb
address warning introduced with #1814 et al
2019-03-03 09:05:11 +02:00
Andrew
e5c316c6b9
init
2019-03-03 08:59:27 +02:00
Martin Kroeker
25427926bc
Improve handling of NO_STATIC and NO_SHARED
...
to avoid surprises from defining either as zero. Fixes #2035 by addressing some concerns from #1422
2019-03-02 23:36:36 +01:00
Martin Kroeker
edb8143141
Merge pull request #2037 from martin-frbg/issue2033-2
...
Make sure that AVX512 is disabled in 32bit builds
2019-03-01 11:45:02 +01:00
Martin Kroeker
c4868d11c0
Make sure that AVX512 is disabled in 32bit builds
...
for #2033
2019-03-01 09:23:03 +01:00
Martin Kroeker
4c321ae571
Merge pull request #2034 from martin-frbg/issue2033
...
Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX
2019-02-28 22:10:12 +01:00
Martin Kroeker
2ffb727187
Keep xcode8.3 for osx BINARY=32 build
...
as xcode10 deprecated i386
2019-02-28 10:51:54 +01:00
Martin Kroeker
d66214c946
Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX
...
fixes #2033
2019-02-28 09:58:25 +01:00
Martin Kroeker
fd34820b99
Fix AVX512 test always returning false due to missing compiler option
2019-02-25 17:58:31 +01:00
Martin Kroeker
918a0cc4d1
Fix missing -c option in AVX512 test
2019-02-25 17:55:36 +01:00
Martin Kroeker
0db9c03e7e
Merge pull request #2028 from brada4/mv
...
Move one of clobber fixes to right place
2019-02-24 19:50:23 +01:00
Andrew
6eee1beac5
move fix to right place
2019-02-24 20:41:02 +02:00
Andrew
e5df5958cc
init
2019-02-24 20:39:25 +02:00
Martin Kroeker
343b301d14
Reduce list of kernels in the dynamic arch build
...
to make compilation complete reliably within the 1h limit again
2019-02-20 10:27:48 +01:00
Martin Kroeker
45333d5793
Fix error introduced during cleanup
2019-02-19 22:16:33 +01:00
Martin Kroeker
e29b0cfcc4
Allow multithreading TRMV again
...
revert workaround introduced for issue #1332 as the actual cause appears to be my incorrect fix from #1262 (see #1388)
2019-02-19 21:03:30 +01:00
Martin Kroeker
78d9910236
Correct range_n limiting
...
same bug as seen in #1388, somehow missed in corresponding PR #1389
2019-02-19 20:59:48 +01:00
Martin Kroeker
e12cdf58ef
Merge pull request #2024 from martin-frbg/gcc9fixes4
...
Fix inline assembly constraints in Bulldozer TRSM kernels
2019-02-17 11:49:15 +01:00
Martin Kroeker
1860c9456d
Merge pull request #2023 from martin-frbg/gcc9fixes3
...
Fix inline assembly constraints in various x86_64 GEMVN kernels
2019-02-17 11:48:57 +01:00
Martin Kroeker
aec905498f
Merge pull request #1988 from TiborGY/patch-1
...
Reword/expand comments in Makefile.rule
2019-02-17 11:36:04 +01:00
TiborGY
56089991e2
fix the the
2019-02-16 23:26:13 +01:00
Martin Kroeker
f9bb76d29a
Fix inline assembly constraints in Bulldozer TRSM kernels
...
rework indices to allow marking i,as and bs as both input and output (marked operand n1 as well for simplicity). For #2009
2019-02-16 20:06:48 +01:00
Martin Kroeker
8242b1fe3f
Fix inline assembly constraints
2019-02-16 18:51:09 +01:00
Martin Kroeker
efb9038f72
Fix inline assembly constraints
2019-02-16 18:46:17 +01:00
Martin Kroeker
e976557d29
Fix inline assembly constraints
...
rework indices to allow marking argument lda as input and output.
2019-02-16 18:36:39 +01:00
Martin Kroeker
9d8be15789
Fix inline assembly constraints
...
rework indices to allow marking argument lda4 as input and output. For #2009
2019-02-16 18:24:11 +01:00
Martin Kroeker
d752799a0f
Merge pull request #2021 from martin-frbg/gcc9fixes2
...
Fix wrong constraints in inline assembly of Haswell DTRSM kernel
2019-02-16 18:05:40 +01:00
TiborGY
f209fc7fa9
Update Makefile.rule
...
add note about NUM_THREADS for package maintainers, add examples of programs that cause affinity troubles
2019-02-16 12:12:39 +01:00
Martin Kroeker
c26c0b77a7
Fix wrong constraints in inline assembly
...
for #2009
2019-02-15 15:08:16 +01:00
Martin Kroeker
1c6da2d03c
Merge pull request #2019 from martin-frbg/gcc9fixes
...
Fix unannounced modification of input operand 8 (lda4) in Haswell GEMVN microkernel
2019-02-15 15:02:54 +01:00
Martin Kroeker
4255a58cd2
Rename operands to put lda on the input/output constraint list
2019-02-15 10:10:04 +01:00
Martin Kroeker
d3e4725548
Merge pull request #2020 from martin-frbg/issue1956
...
With the Intel compiler on Linux, prefer ifort for the final link step
2019-02-15 09:57:59 +01:00
Martin Kroeker
adb419ed67
With the Intel compiler on Linux, prefer ifort for the final link step
...
icc has known problems with mixed-language builds that ifort can handle just fine. Fixes #1956
2019-02-14 22:57:30 +01:00
Martin Kroeker
46e415b140
Save and restore input argument 8 (lda4)
...
Fixes miscompilation with gcc9 -ftree-vectorize (related to issue #2009)
2019-02-14 22:43:18 +01:00
Martin Kroeker
cd5a59b9cf
Merge pull request #2018 from bartoldeman/fix-dgemv-znver1-tree-vectorize
...
dgemv_kernel_4x4(Haswell): add missing clobbers for xmm0,xmm1,xmm2,xmm3
2019-02-14 21:55:11 +01:00
Bart Oldeman
69a97ca7b9
dgemv_kernel_4x4(Haswell): add missing clobbers for xmm0,xmm1,xmm2,xmm3
...
This fixes a crash in dblat2 when OpenBLAS is compiled using
-march=znver1 -ftree-vectorize -O2
See also:
https://github.com/easybuilders/easybuild-easyconfigs/issues/7180
2019-02-14 16:27:58 +00:00
Martin Kroeker
b55c586fac
Fix missing clobber in x86/x86_64 blas_quickdivide inline assembly function (#2017)
...
* Fix missing clobber in blas_quickdivide assembly
2019-02-14 15:21:36 +01:00
Martin Kroeker
056917d616
Merge pull request #2013 from martin-frbg/issue2011
...
Fix invalid memory access in PPC gemm_beta
2019-02-14 09:29:34 +01:00
Martin Kroeker
718efcec6f
Fix out-of-bounds memory access in gemm_beta
...
Fixes #2011 (as suggested by davemq), assuming typo by K.Goto
2019-02-13 22:08:37 +01:00
Martin Kroeker
f9d67bb5e8
Fix out-of-bounds memory access in gemm_beta
...
Fixes #2011 (as suggested by davemq) presuming typo by K.Goto
2019-02-13 22:06:41 +01:00
Martin Kroeker
76bb74fcd4
Merge pull request #2012 from maamountki/z14
...
[ZARCH] Many improvements
2019-02-13 20:15:56 +01:00
maamountki
0a54c98b9d
[ZARCH] Modify constraints
2019-02-13 21:06:25 +02:00
maamountki
bec54ae366
[ZARCH] Fix caxpy
2019-02-13 12:54:35 +02:00
Martin Kroeker
63d7bad8a5
Merge pull request #2010 from martin-frbg/issue2009
...
Fix declaration of input arguments in x86_64 GEMV, SYMV and DSCAL
2019-02-12 23:24:02 +01:00
Martin Kroeker
ab1630f9fa
Fix declaration of arguments in inline assembly
...
Argument 0 is modified so should be input and output
2019-02-12 16:14:02 +01:00
Martin Kroeker
b824fa70eb
Fix declaration of assembly arguments in SSYMV and DSYMV microkernels
...
Arguments 0 and 1 are both input and output
2019-02-12 16:00:18 +01:00
Martin Kroeker
91481a3e4e
Fix declaration of input arguments in inline assembly
...
Argument 0 is modified as it doubles as a counter
2019-02-12 15:51:43 +01:00
Martin Kroeker
dc6ac9eab0
Fix declaration of input arguments in the x86_64 s/dGEMV_T and s/dGEMV_N kernels
...
Arguments 0 and 1 need to be tagged as both input and output
2019-02-12 15:33:48 +01:00
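Note on the inline assembly constraint fixes: when an asm block modifies an operand (a pointer it advances, a counter it decrements), GCC must be told so by listing it as read-write ("+r") rather than as a plain input, otherwise the optimizer may assume the register still holds the original value. The sketch below is a small illustration of that rule with a hypothetical function, sum_int64; it is not one of the OpenBLAS microkernels and falls back to plain C on non-x86_64 targets.

```c
#include <stdint.h>

/* Sums n 64-bit integers. On x86_64 with GCC-compatible compilers the loop
   is written in inline assembly to show the constraint declarations. */
int64_t sum_int64(const int64_t *x, int64_t n)
{
    int64_t acc = 0;
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ volatile(
        "testq %[n], %[n]\n\t"      /* skip the loop when n <= 0 */
        "jle 2f\n\t"
        "1:\n\t"
        "addq (%[p]), %[acc]\n\t"   /* acc += *p */
        "addq $8, %[p]\n\t"         /* p is advanced: it is modified */
        "decq %[n]\n\t"             /* n doubles as the loop counter */
        "jnz 1b\n\t"
        "2:\n\t"
        /* All three operands are changed by the asm, so each is "+r"
           (input and output), not a read-only input. */
        : [acc] "+r"(acc), [p] "+r"(x), [n] "+r"(n)
        :
        /* Flags are changed, and memory is read through a pointer. */
        : "cc", "memory");
#else
    for (int64_t i = 0; i < n; i++)
        acc += x[i];
#endif
    return acc;
}
```

Declaring p or n as input-only would compile, but the compiler would then be free to reuse those registers afterwards as if they were unchanged, which is the kind of miscompilation reported with gcc9 and -ftree-vectorize.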
maamountki
f583674109
[ZARCH] Fix cgemv_t_4
2019-02-12 13:12:28 +02:00
maamountki
77fe70019f
[ZARCH] Fix constraints and source code formatting
2019-02-11 16:01:13 +02:00
Martin Kroeker
03a2bf2602
Fix potential memory leak in cpu enumeration on Linux (#2008)
...
* Fix potential memory leak in cpu enumeration with glibc
An early return after a failed call to sched_getaffinity would leak the previously allocated cpu_set_t. Wrong calculation of the size argument in that call increased the likelihood of that failure. Fixes #2003
2019-02-10 23:24:45 +01:00
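Note on the leak fix above: the pattern is an allocated mask, a query that can fail, and an early return that must still release the mask. The sketch below reproduces that pattern portably with calloc/free. The callback type mask_query_fn and the two example queries are hypothetical stand-ins for sched_getaffinity, and mask_bytes only illustrates passing a size in bytes rather than a CPU count; none of this is the code from the repository.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for sched_getaffinity: fills `mask`, whose size is
   given in BYTES, and returns 0 on success or -1 on failure. */
typedef int (*mask_query_fn)(size_t size_bytes, unsigned char *mask);

/* Bytes needed for one bit per CPU, rounded up to whole 8-byte words. */
static size_t mask_bytes(size_t max_cpus)
{
    return ((max_cpus + 63) / 64) * 8;
}

/* Returns the number of set bits in the mask, or -1 on failure.
   The buffer is released on every path, including the early return. */
int count_cpus(size_t max_cpus, mask_query_fn query)
{
    size_t size = mask_bytes(max_cpus);
    unsigned char *mask = calloc(1, size);
    if (mask == NULL)
        return -1;

    if (query(size, mask) != 0) {
        free(mask);             /* without this, the failure path leaks */
        return -1;
    }

    int count = 0;
    for (size_t i = 0; i < size; i++)
        for (unsigned char b = mask[i]; b != 0; b &= (unsigned char)(b - 1))
            count++;            /* clear the lowest set bit each round */

    free(mask);
    return count;
}

/* Example query that marks the first four CPUs as available. */
int query_first_four(size_t size_bytes, unsigned char *mask)
{
    if (size_bytes < 1)
        return -1;
    mask[0] = 0x0F;
    return 0;
}

/* Example query that always fails, to exercise the early-return path. */
int query_always_fails(size_t size_bytes, unsigned char *mask)
{
    (void)size_bytes;
    (void)mask;
    return -1;
}
```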
Martin Kroeker
69edc5bbe7
Restore dropped patches in the non-TLS branch of memory.c (#2004)
...
* Restore dropped patches in the non-TLS branch of memory.c
As discovered in #2002, the reintroduction of the "original" non-TLS version of memory.c as an alternate branch had inadvertently used ba1f91f rather than a8002e2, thereby dropping the commits for #1450, #1468, #1501, #1504 and #1520.
2019-02-07 20:06:13 +01:00
maamountki
7039770165
[ZARCH] Undo the last commit
2019-02-06 20:11:44 +02:00
Martin Kroeker
641767f846
Merge pull request #2001 from martin-frbg/cmake-dynlist
...
Support DYNAMIC_LIST option in cmake
2019-02-06 08:39:24 +01:00
Martin Kroeker
af6e2253a2
Merge pull request #2000 from martin-frbg/issue1989
...
Make c_check robust against old or incomplete perl installations
2019-02-06 00:29:30 +01:00
Martin Kroeker
5952e586ce
Support DYNAMIC_LIST option in cmake
...
e.g. cmake -DDYNAMIC_ARCH=1 -DDYNAMIC_LIST="NEHALEM;HASWELL;ZEN" ..
original issue was #1639
2019-02-05 23:51:40 +01:00
Martin Kroeker
f10408aae8
Merge pull request #1999 from martin-frbg/issue1996-2
...
fix second instance of complex.h for c++ as well
2019-02-05 22:02:11 +01:00
Martin Kroeker
d70ae3ab43
Make c_check robust against old or incomplete perl installations
...
by catching and working around failures to load modules, and avoiding object-oriented syntax in tempfile creation.
Fixes #1989
2019-02-05 20:06:34 +01:00
Martin Kroeker
1391fc46d2
fix second instance of complex.h for c++ as well
2019-02-05 19:29:33 +01:00
maamountki
11a43e8116
[ZARCH] Set alignment hint for vl/vst
2019-02-05 19:17:08 +02:00
Martin Kroeker
817fe9865c
Merge pull request #1998 from martin-frbg/issue1992
...
Include complex rather than complex.h in C++ contexts
2019-02-05 17:39:59 +01:00
Martin Kroeker
f4b82d7bc4
Include complex rather than complex.h in C++ contexts
...
to avoid name clashes e.g. with boost headers that use I as a generic placeholder.
Fixes #1992 as suggested by aprokop in that issue ticket.
2019-02-05 13:30:13 +01:00
maamountki
61526480f9
[ZARCH] Fix copy constraint
2019-02-05 07:51:19 +02:00
maamountki
81daf6bc38
[ZARCH] Format source code, Fix constraints
2019-02-05 07:30:38 +02:00
maamountki
a38aa56e76
Merge pull request #1 from xianyi/develop
...
Update
2019-02-05 07:25:38 +02:00
Martin Kroeker
729e925174
Merge pull request #1996 from quickwritereader/develop
...
NBMAX=4096 for gemvn, added sgemvn 8x8 for future
2019-02-04 16:52:04 +01:00
Ubuntu
498ac98581
Note for unused kernels
2019-02-04 15:41:56 +00:00
Ubuntu
cd9ea45463
NBMAX=4096 for gemvn, added sgemvn 8x8 for future
2019-02-04 06:57:11 +00:00
Martin Kroeker
f9c5023e04
Merge pull request #1994 from quickwritereader/develop
...
sgemv cgemv pairs
2019-02-01 21:04:47 +01:00
Ubuntu
4abc375a91
sgemv cgemv pairs
2019-02-01 13:45:00 +00:00
Martin Kroeker
874df65491
Fix incorrect sgemv results for IBM z14
...
part of PR #1993 that was inadvertently misplaced into the toplevel directory
2019-02-01 12:58:59 +01:00
Martin Kroeker
1f4b61f572
Delete misplaced file sgemv_t_4.c
...
from #1993, file should have gone into kernel/zarch
2019-02-01 12:57:01 +01:00
Martin Kroeker
282230c303
Merge pull request #1993 from martin-frbg/aarnes-zarch
...
Various fixes for the new Z14 target
2019-01-31 21:27:00 +01:00
Martin Kroeker
cce574c3e0
Improve the z14 SGEMVT kernel
...
from patch provided by aarnez in #991
2019-01-31 21:24:55 +01:00
Martin Kroeker
877023e1e1
Fix precision of zarch DSDOT
...
from patch provided by aarnez in #991
2019-01-31 21:22:26 +01:00
Martin Kroeker
265142edd5
Fix typo in the zarch min/max kernels
...
from patch provided by aarnez in #991
2019-01-31 21:21:40 +01:00
Martin Kroeker
885a3c4350
USE_TRMM on Z14
...
from patch provided by aarnez in #991
2019-01-31 21:18:09 +01:00
Martin Kroeker
4b512f84dd
Add cache sizes for Z14
...
from patch provided by aarnez in #991
2019-01-31 21:16:44 +01:00
Martin Kroeker
72d3e7c9b4
Add FORCE Z14
...
from patch provided by aarnez in #991
2019-01-31 21:15:50 +01:00
Martin Kroeker
bdc73a49e0
Add parameters for Z14
...
from patch provided by aarnez in #991
2019-01-31 21:14:37 +01:00
Martin Kroeker
1249ee1fd0
Add Z14 target
...
from patch provided by aarnez in #991
2019-01-31 21:13:46 +01:00
Martin Kroeker
42df9efa0c
Merge pull request #1991 from maamountki/z14
...
[ZARCH] Z14 Support, BLAS 1/2 single precision implementations
2019-01-31 19:10:03 +01:00
maamountki
82124729af
Merge branch 'develop' into z14
2019-01-31 19:36:41 +02:00
maamountki
29416cb5a3
[ZARCH] Add Z13 version for max/min functions
2019-01-31 19:11:11 +02:00
maamountki
48b9b94f7f
[ZARCH] Improve loading performance for camax/icamax
2019-01-31 18:52:11 +02:00
Martin Kroeker
86a824c97f
Fix wrong comparison that made IMIN identical to IMAX
...
as reported by aarnez in #1990
2019-01-31 15:27:21 +01:00
Martin Kroeker
808410c2c7
Fix wrong comparison that made IMIN identical to IMAX
...
as suggested in #1990
2019-01-31 15:25:15 +01:00
maamountki
eaf20f0e7a
Remove ztest
2019-01-31 09:26:50 +02:00
maamountki
fcd814a8d2
[ZARCH] Fix bug in max/min functions
2019-01-29 17:59:38 +02:00
maamountki
dc4d3bccd5
[ZARCH] Fix icamax/icamin
2019-01-29 03:47:49 +02:00
maamountki
c7143c1019
[ZARCH] Fix iamax/imax single precision
2019-01-28 17:52:23 +02:00
maamountki
04873bb174
[ZARCH] Undo the last commit
2019-01-28 17:32:24 +02:00
maamountki
c8ef9fb220
[ZARCH] Fix bug in iamax/iamin/imax/imin
2019-01-28 17:16:18 +02:00
Martin Kroeker
5be61f4b47
Merge pull request #1985 from martin-frbg/issue1984
...
Correct naming of getrf_parallel object
2019-01-28 15:44:57 +01:00
Martin Kroeker
3d155cff83
Merge pull request #1981 from edisongustavo/develop
...
Fix include directory of exported targets
2019-01-28 15:44:42 +01:00
Martin Kroeker
7d47f0a82d
Merge pull request #1978 from danielgindi/feature/msvc_cmake
...
Better support for MSVC/Windows in CMake (v0.3.x)
2019-01-28 15:43:35 +01:00
Martin Kroeker
a529c71a74
Merge pull request #1962 from brada4/r
...
Modernize R benchmarks slightly
2019-01-28 15:42:57 +01:00
TiborGY
ea1716ce2a
Update Makefile.rule
...
Revert generate to install, explain the nature of the affinity conflict
2019-01-27 17:22:26 +01:00
TiborGY
0f24b39ebf
Reword/expand comments in Makefile.rule
...
Lots of small changes in the wording of the comments, plus an expansion of the NUM_THREADS and NO_AFFINITY sections.
2019-01-27 15:33:00 +01:00
Martin Kroeker
89b60dab8a
Merge pull request #1987 from martin-frbg/issue1961
...
Change ARMV8 target with BINARY=32 to ARMV7 automatically
2019-01-26 22:25:29 +01:00
Martin Kroeker
58dd7e4501
Change ARMV8 target to ARMV7 for BINARY=32
2019-01-26 17:52:33 +01:00
Martin Kroeker
36b844af88
Change ARMV8 target to ARMV7 when BINARY32 is set
...
fixes #1961
2019-01-26 17:47:22 +01:00
Martin Kroeker
e882b239aa
Correct naming of getrf_parallel object
...
fixes #1984
2019-01-26 00:45:45 +01:00
Martin Kroeker
3f7bb87a2a
Merge pull request #1971 from martin-frbg/trsm-threshold
...
Shift transition to multithreading towards larger matrix sizes
2019-01-24 09:17:48 +01:00
Edison Gustavo Muenz
e908ac2a51
Fix include directory of exported targets
2019-01-23 15:09:13 +01:00
Martin Kroeker
8533aca964
Avoid penalizing tall skinny matrices
2019-01-23 10:03:00 +01:00
Martin Kroeker
16494cb7c4
Merge pull request #1980 from martin-frbg/issue1979
...
Report SkylakeX as Haswell if compiler does not support AVX512
2019-01-22 21:10:38 +01:00
Martin Kroeker
b56b34a75c
Syntax fix
2019-01-22 18:55:43 +01:00
Martin Kroeker
21eda8b577
Report SkylakeX as Haswell if compiler does not support AVX512
...
... or make was invoked with NO_AVX512=1
2019-01-22 18:47:12 +01:00
Daniel Cohen Gindi
24288803b3
Adjust test script for correct deployment
2019-01-22 14:38:01 +02:00
Martin Kroeker
f0d834b824
Use VERSION_LESS for comparisons involving software version numbers
2019-01-22 12:32:24 +01:00
Daniel Cohen Gindi
63bbd7b0d7
Better support for MSVC/Windows in CMake
2019-01-21 17:47:47 +02:00
maamountki
b111829226
[ZARCH] Update max/min functions
2019-01-21 15:56:04 +02:00
Martin Kroeker
010d59bfee
Merge pull request #1973 from martin-frbg/issue1464
...
Increase Zen SWITCH_RATIO to 16
2019-01-20 20:30:11 +01:00
Martin Kroeker
83b5c6b92d
Fix compilation with NO_AVX=1 set
...
fixes #1974
2019-01-20 12:18:53 +01:00
Martin Kroeker
bbfdd6c0fe
Increase Zen SWITCH_RATIO to 16
...
following GEMM benchmarks on Ryzen2700X. For #1464
2019-01-19 23:01:31 +01:00
Martin Kroeker
cda81cfae0
Shift transition to multithreading towards larger matrix sizes
...
See #1886 and JuliaRobotics issue 500. trsm benchmarks on Haswell and Zen showed that with these values performance is roughly doubled for matrix sizes between 8x8 and 14x14, and still 10 to 20 percent better near the new cutoff at 32x32.
2019-01-19 00:10:01 +01:00
Martin Kroeker
32b0f1168e
Fix declaration of input arguments in the Sandybridge GER microkernels (#1967)
...
* Tag arguments 0 and 1 as both input and output
2019-01-18 08:11:39 +01:00
Martin Kroeker
b495e54310
Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966)
...
* Tag arguments 0 and 1 as both input and output (see #1964)
2019-01-18 08:11:07 +01:00
Martin Kroeker
d5e6940253
Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965)
...
* Tag operands 0 and 1 as both input and output
For #1964 (basically a continuation of coding problems first seen in #1292)
2019-01-17 23:20:32 +01:00
Martin Kroeker
24e697eadb
Merge pull request #1970 from quickwritereader/develop
...
crot fix
2019-01-17 16:42:11 +01:00
Martin Kroeker
3e9fd6359d
Bump xcode version to 10.1 to make sure it handles AVX512
2019-01-17 16:19:03 +01:00
Ubuntu
43a4572038
crot fix
2019-01-17 14:45:31 +00:00
Martin Kroeker
256eb588bb
Merge pull request #1963 from quickwritereader/develop
...
Blas1 single missing kernels implemented with vector builtins
2019-01-16 18:41:03 +01:00
Abdelrauf
a034e65512
Merge branch 'develop' into develop
2019-01-16 19:25:13 +04:00
Ubuntu
8c3386be87
Added missing Blas1 single fp {saxpy, caxpy, cdot, crot (refactored version of srot), isamax, isamin, icamax, icamin},
...
Fixed idamin, icamin choosing the first occurrence index of equal minima
2019-01-16 15:16:21 +00:00
Andrew
3e601bd419
disable NaN checks before BLAS calls dgemm.R
2019-01-16 11:54:22 +02:00
Andrew
478d3c4569
disable NaN checks before BLAS calls deig.R (shorten matrix def)
2019-01-16 11:41:46 +02:00
Andrew
3afceb6c2a
disable NaN checks before BLAS calls deig.R
2019-01-16 11:38:14 +02:00
Andrew
7af8b21dbb
disable NaN checks before BLAS calls dsolve.R (shorter formula)
2019-01-16 11:34:46 +02:00
Martin Kroeker
1e3ada6db4
Merge pull request #1960 from cnjsdfcy/Hygon
...
Add support for Hygon Dhyana
2019-01-16 10:27:14 +01:00
Andrew
2777a7f506
disable NaN checks before BLAS calls dsolve.R (shorter config part)
2019-01-16 11:23:51 +02:00
Andrew
b70fd23836
disable NaN checks before BLAS calls dsolve.R
2019-01-16 11:18:54 +02:00
Andrew
def0385caa
init
2019-01-16 09:51:29 +02:00
caiyu
29dc72889f
Add support for Hygon Dhyana
2019-01-16 14:25:19 +08:00
maamountki
b815a04c87
[ZARCH] fix a bug in max/min functions
2019-01-15 21:04:22 +02:00
Martin Kroeker
dbc9a060ef
Fix missing braces in support_av() call
2019-01-14 22:41:31 +01:00
Martin Kroeker
00401489c2
Fix missing braces in support_avx()
2019-01-14 22:38:32 +01:00
maamountki
1a7925b3a3
[ZARCH] Update dgemv_n_4.c
2019-01-11 17:43:11 +02:00
maamountki
406f835f00
[ZARCH] update cgemv_n_4.c
2019-01-11 17:39:17 +02:00
maamountki
621dedb37b
[ZARCH] Update cgemv_t_4.c
2019-01-11 17:37:11 +02:00
maamountki
b731e8246f
Update sgemv_t_4.c
2019-01-11 17:14:04 +02:00
maamountki
ecc31b743f
Update dgemv_t_4.c
2019-01-11 17:13:02 +02:00
maamountki
5d89d6b143
[ZARCH] fix sgemv_n_4.c
2019-01-11 17:08:24 +02:00
maamountki
67432b23c2
[ZARCH] fix cgemv_n_4.c
2019-01-11 16:44:46 +02:00
Martin Kroeker
21c0f2af7b
Merge pull request #1957 from martin-frbg/issue1954
...
Move TLS key deletion to openblas_quit
2019-01-10 12:04:08 +01:00
Martin Kroeker
ad2c386d6a
Move TLS key deletion to openblas_quit
...
fixes #1954 (as suggested by thrasibule in that issue)
2019-01-10 00:32:50 +01:00
maamountki
be66f5d5c2
[ZARCH] fix data prefetch type in sdot
2019-01-09 16:50:07 +02:00
maamountki
c2ffef8156
[ZARCH] fix data prefetch type in ddot
2019-01-09 16:49:44 +02:00
maamountki
e7455f500c
[ZARCH] fix dsdot.c
2019-01-09 16:33:54 +02:00
maamountki
3eafcfa650
[ZARCH] fix cgemv_n_4.c
2019-01-09 07:43:45 +02:00
Martin Kroeker
8d99dba86b
Merge pull request #1949 from martin-frbg/issue1947
...
Query AVX2 and AVX512VL support when selecting x86 kernels
2019-01-08 20:44:08 +01:00
Martin Kroeker
1650311246
Bump xcode to 8.3
2019-01-08 14:43:45 +01:00
Martin Kroeker
cf5d48e833
Update OSX environment to Sierra
...
as homebrew seems to have dropped support for El Capitan in their gcc packages
2019-01-08 14:41:48 +01:00
Martin Kroeker
191677b902
Add travis_wait to the OSX brew install phase
2019-01-08 10:46:47 +01:00
Martin Kroeker
31ed19e8b9
Add message for SkylakeX and KNL fallbacks to Haswell
2019-01-05 19:41:13 +01:00
Martin Kroeker
e1574fa2b4
Add xcr0 (os support) check
2019-01-05 18:08:02 +01:00
Martin Kroeker
68eb3146ce
Add xcr0 (os support) check
2019-01-05 18:07:14 +01:00
Martin Kroeker
0afaae4b23
Query AVX2 and AVX512VL capability in x86 cpu detection
2019-01-05 16:58:56 +01:00
Martin Kroeker
ae1d1f74f7
Query AVX2 and AVX512 capability for runtime cpu selection
2019-01-05 16:55:33 +01:00
maamountki
94cd946b96
[ZARCH] fix cgemv_n_4.c
2019-01-04 17:45:56 +02:00
Martin Kroeker
ed01f4932a
Merge pull request #1946 from martin-frbg/issue1908
...
More fixes for cross-compiling ARM64 targets
2019-01-04 01:37:37 +01:00
maamountki
1aa840a0a2
[ZARCH] fix sgemv_t_4.c
2019-01-04 01:38:18 +02:00
Martin Kroeker
802f0dbde1
More fixes for cross-compiling ARM64 targets
...
Fixed core naming for DYNAMIC_ARCH. Corrected GEMM_DEFAULT entries and added SYMV_P. Replaced outdated VULCAN define for ThunderX2T99 with ARMV8 to get basic definitions back. For issue #1908
2019-01-03 22:17:31 +01:00
Martin Kroeker
20d1aad13f
Fix missing quotes around thunderx targets
2019-01-02 20:15:35 +01:00
TiborGY
d11554c88f
Validate user supplied TARGET (#1941)
...
the build will now abort with an error message when an undefined build TARGET is named
Fixes #1938
2018-12-31 23:19:44 +01:00
Martin Kroeker
ed704185ab
Increment version to 0.3.6.dev
2018-12-31 23:11:37 +01:00
Martin Kroeker
2940798ea7
Increment version to 0.3.6.dev
2018-12-31 23:10:59 +01:00
Martin Kroeker
eebc189287
Version 0.3.5
2018-12-31 23:09:59 +01:00
Martin Kroeker
9185d419d3
Version 0.3.5
2018-12-31 23:09:20 +01:00
Martin Kroeker
4cf9d32694
Merge pull request #1945 from xianyi/develop
...
Merge changes from develop for 0.3.5 release
2018-12-31 23:08:25 +01:00
Martin Kroeker
1c75b65d53
Merge branch 'release-0.3.0' into develop
2018-12-31 23:07:53 +01:00
Martin Kroeker
13d006339b
Update ChangeLog.txt with changes from 0.3.5
2018-12-31 23:00:46 +01:00
Martin Kroeker
bf76162635
Merge pull request #1944 from hartzell/patch-1
...
Typo: Skyalke -> Skylake
2018-12-31 18:36:18 +01:00
George Hartzell
0d52aefc6b
Typo: Skyalke -> Skylake
...
Worth fixing, it gets in the way of searching....
2018-12-30 14:55:34 -08:00
Martin Kroeker
a6787b0f81
Merge pull request #1939 from TiborGY/patch-2
...
Fix typo in UNKNOWN core name
2018-12-30 20:10:05 +01:00
Martin Kroeker
8643521127
Merge pull request #1943 from martin-frbg/issue1748
...
Re-enable loop unrolling in trmv and remove the scary warning
2018-12-30 20:07:01 +01:00
Martin Kroeker
5a720cf9ca
Re-enable loop unrolling in trmv and remove the scary warning
...
fixes #1748 as that half of the fix for #1332 appears to have been an overreaction on my part.
2018-12-30 15:22:37 +01:00
Martin Kroeker
ccd5945d38
Merge pull request #1942 from martin-frbg/issue1720
...
Delete the pthread key on cleanup in TLS mode
2018-12-30 14:47:05 +01:00
Martin Kroeker
9f80e0f5fc
Remove stray include of complex.h
...
already provided conditionally by common.h via openblas_utest.h
Unconditional inclusion breaks older Android and similar platforms that use OPENBLAS_COMPLEX_STRUCT
2018-12-30 14:39:18 +01:00
Martin Kroeker
bba1e67269
Delete the pthread key on cleanup in TLS mode
...
to avoid a crash when OpenBLAS was loaded via dlopen and libc tries to clean up the leaked TLS after dlclose
Fixes #1720
2018-12-29 21:59:31 +01:00
Martin Kroeker
93240f489e
Fix wrong case in TARGET setting for Alpine
2018-12-29 18:12:54 +01:00
TiborGY
7cbc2c37d6
Update cpuid_mips64.c
2018-12-28 14:36:39 +01:00
TiborGY
c329de2931
Update Makefile
2018-12-28 14:35:41 +01:00
TiborGY
187233953c
Update cpuid_mips.c
2018-12-28 14:34:38 +01:00
TiborGY
09170268a3
Update cpuid_arm.c
2018-12-28 14:33:18 +01:00
TiborGY
211120c508
Fix typo in UNKNOWN core name
...
Should be of no consequence, right?
2018-12-27 23:09:21 +01:00
Martin Kroeker
9e4d190f4f
Merge pull request #1932 from martin-frbg/issue1915
...
Add -fPIC to provided CFLAGS/FFLAGS if required
2018-12-24 23:48:33 +01:00
Martin Kroeker
fe02ba86a4
Remove unnecessary change again
2018-12-24 20:46:04 +01:00
Martin Kroeker
284fb00971
Merge pull request #1934 from fenrus75/betagoof
...
Fix thinko in skylake beta handling
2018-12-24 19:53:50 +01:00
Arjan van de Ven
795285c587
Fix thinko in skylake beta handling
...
casting ints is cheaper, but it has a rounding effect, not a memory casting effect, resulting in
an invalid outcome
2018-12-24 18:49:50 +00:00
Martin Kroeker
d6818777d1
Make sure that -fPIC is present if needed
2018-12-23 23:47:37 +01:00
Martin Kroeker
5bd21ab6e1
Make sure that -fPIC is present when needed
...
override user-provided FFLAGS if necessary
2018-12-23 23:46:48 +01:00
Martin Kroeker
e1eab96502
Merge pull request #1931 from martin-frbg/pr1921
...
Add -mavx2 to TARGET=HASWELL builds
2018-12-23 23:15:54 +01:00
Martin Kroeker
76b4b8980f
Use -dumpversion with gcc only
2018-12-23 19:08:19 +01:00
Martin Kroeker
49e0f485da
Add -mavx2 for TARGET=HASWELL if compiler supports and requires it
2018-12-23 17:26:09 +01:00
Martin Kroeker
43c2b0eb55
Add -mavx2 to TARGET=HASWELL builds
...
to leverage improvements from PR#1921
2018-12-23 17:16:43 +01:00
Martin Kroeker
942e229ed5
Merge pull request #1930 from martin-frbg/issue1908
...
Reflect ARMV8 target definition changes from PR1876
2018-12-23 15:06:33 +01:00
Martin Kroeker
26a3402773
Reflect ARMV8 target definition changes from PR1876
...
and create config target directory for cross-compiles.
2018-12-23 12:26:01 +01:00
Martin Kroeker
20033f992a
Merge pull request #1929 from martin-frbg/issue1924
...
Avoid taking the root of a negative number in simple threaded syrk
2018-12-23 09:03:58 +01:00
Martin Kroeker
f343ed65b5
Avoid taking the root of a negative number
...
Fixes #1924 where numpy 1.17+ would report the (transient) FE_INVALID exception raised for the domain error.
2018-12-22 22:30:29 +01:00
Martin Kroeker
a5a1118527
Merge pull request #1 from xianyi/develop
...
rebase
2018-12-22 22:13:44 +01:00
Martin Kroeker
e23366e860
Merge pull request #1921 from fenrus75/haswelldgemm
...
Replicate some of the SKYLAKEX dgemm improvements also to HASWELL
2018-12-17 08:39:20 +01:00
Arjan van de Ven
b28f75cd7e
set GEMM_PREFERED_SIZE for HASWELL
...
Haswell likes a GEMM_PREFERED_SIZE of 16 to improve the split that the
threading code does to make it a nice multiple of the SIMD kernel size
2018-12-16 23:09:27 +00:00
Arjan van de Ven
d321448a63
dgemm: use dgemm_ncopy_8_skylakex.c also for Haswell
...
The dgemm_ncopy_8_skylakex.c code is not avx512 specific and gives
a nice performance boost for medium sized matrices
2018-12-16 23:09:22 +00:00
Arjan van de Ven
c43331ad0a
dgemm: Use the skylakex beta function also for haswell
...
it's more efficient for certain tall/skinny matrices
2018-12-16 23:09:17 +00:00
Martin Kroeker
e8ca5a59a9
Merge pull request #1919 from fenrus75/haswelltuning
...
(sgemm) Apply some of the SKYLAKEX optimizations also to HASWELL
2018-12-16 20:11:05 +01:00
Martin Kroeker
c4e23dd016
Update Makefile
2018-12-16 18:14:40 +01:00
Martin Kroeker
cfc4acc221
typo
2018-12-16 16:19:51 +01:00
Martin Kroeker
545c2b1bbb
Add -mavx2 on Haswell only if the compiler supports it
2018-12-16 13:09:19 +01:00
Arjan van de Ven
69d206440a
Make the skylakex/haswell sgemm code compile and run even with compilers without avx2 support
2018-12-16 00:19:41 +00:00
Martin Kroeker
3843e3e017
use -maxv2 on haswell
2018-12-15 23:30:31 +01:00
Martin Kroeker
fbcb14a74b
should be core-avx2
2018-12-15 20:18:59 +01:00
Martin Kroeker
2a3190dc76
fix elseifeq and use older option core2-avx for compatibility
2018-12-15 20:17:44 +01:00
Martin Kroeker
1ebe5c0f49
Add -march=haswell to HASWELL part of DYNAMIC_ARCH build
2018-12-15 19:35:35 +01:00
Arjan van de Ven
0586899a10
Use sgemm_ncopy_4_skylakex.c also for Haswell
...
sgemm_ncopy_4_skylakex.c uses SSE transpose operations where the
real perf win happens; this also works great for Haswell.
This gives double digit percentage gains on small and skinny matrices
2018-12-15 13:49:19 +00:00
Arjan van de Ven
00dc09ad19
Use the skylake sgemm beta code also for haswell
...
with a few small changes it's possible to use the skylake sgemm code
also for haswell, this gives a modest gain (10% range) for smallish
matrixes but does wonders for very skinny matrixes
2018-12-15 13:49:13 +00:00
Martin Kroeker
78d877b54b
Merge pull request #1914 from fenrus75/smallmatrix
...
Add a "sgemm direct" mode for small matrixes
2018-12-13 19:08:14 +01:00
Arjan van de Ven
cdc668d82b
Add a "sgemm direct" mode for small matrixes
...
OpenBLAS has a fancy algorithm for copying the input data while laying
it out in a more CPU friendly memory layout.
This is great for large matrixes; the cost of the copy is easily
amortized by the gains from the better memory layout.
But for small matrixes (on CPUs that can do efficient unaligned loads) this
copy can be a net loss.
This patch adds (for SKYLAKEX initially) a "sgemm direct" mode, that bypasses
the whole copy machinery for ALPHA=1/BETA=0/... standard arguments,
for small matrixes only.
What is small? For the non-threaded case this has been measured to be
in the M*N*K = 28 * 512 * 512 range, while in the threaded case it's
less, around M*N*K = 1 * 512 * 512
2018-12-13 13:47:31 +00:00
Martin Kroeker
87718807f0
Merge pull request #1910 from martin-frbg/issue1909
...
Fix for DYNAMIC_ARCH builds made on a AVX512-capable host
2018-12-12 14:56:25 +01:00
Martin Kroeker
51aec8e96b
make sure the added march=skylake-avx512 does not cause problems on Windows
2018-12-11 22:47:32 +01:00
Martin Kroeker
06f7d78d70
Add -march=skylake-avx512 to SkylakeX part of DYNAMIC_ARCH builds
2018-12-11 21:10:38 +01:00
Martin Kroeker
38cc638591
Avoid adding blanket march=skylake-avx512 to dynamic_arch builds
2018-12-11 21:09:26 +01:00
Martin Kroeker
0bf6d74e5f
Fix typo in previous commit for arm dynamic arch
2018-12-07 19:37:33 +01:00
Martin Kroeker
133c278ee5
Add DYNAMIC_CORE list for ARM64
...
cf #1908
2018-12-07 17:42:23 +01:00
Martin Kroeker
2b355592e3
Make sure to use the arm version of dynamic.c in ARM64 DYNAMIC_ARCH
...
cf. #1908
2018-12-07 16:25:55 +01:00
Martin Kroeker
ff3eb1d474
Merge pull request #1904 from martin-frbg/issue1870
...
Fix cmake parsing of GEMM kernels for ARMV8
2018-12-06 23:01:23 +01:00
Martin Kroeker
0b09516678
Fix missing parameter in popen call
2018-12-06 18:33:05 +01:00
Martin Kroeker
7639f2e1f0
Rewrite the conditional for OSX to fix cmake parsing on others
...
The Makefile variable parser in utils.cmake currently does not handle conditionals. Having the definitions for non-OSX last will at least make cmake builds work again on non-OSX platforms.
2018-12-06 14:04:27 +01:00
Martin Kroeker
2fc712469d
Avoid creating spurious non-suffixed c/zgemm_kernels
...
Plain cgemm_kernel and zgemm_kernel are not used anywhere, only cgemm_kernel_b etc.
Needlessly building them (without any define like NN, CN, etc.) just happened to work on most platforms, but not on arm64. See #1870
2018-12-06 13:56:06 +01:00
Martin Kroeker
6ba30e270d
Fix typo that broke CNRM2 on ARMV8 since 0.3.0
...
must have happened in my #1449
2018-12-06 13:42:25 +01:00
Martin Kroeker
bf23518e36
Merge pull request #1903 from rengolin/armv8
...
Fix two mistakes on Arm64 builds
2018-12-05 22:10:53 +01:00
Renato Golin
31a490ea88
Fix two mistakes on Arm64 builds
...
* Falkor is an ARMv8.0 with ARMv8.1 features, and choosing armv8.1-a for
march generates instructions it cannot cope with. Reverting it back
to armv8-a.
* ThunderX2's build was left with a #define VULCAN, which made it miss
the right compiler flags in Makefile.arm64, although it did create
the right library in the end.
2018-12-05 18:51:38 +00:00
Martin Kroeker
701ea88347
Use p2align instead of align for OSX compatibility
...
fixes #1902
2018-12-03 13:06:43 +01:00
Martin Kroeker
721c56c224
Merge pull request #1899 from brada4/fbsd12
...
Add mutually supported architecture mappings for FreeBSD12 ports
2018-12-03 12:50:27 +01:00
Martin Kroeker
c5f8aeff2d
Merge branch 'develop' into fbsd12
2018-12-03 12:50:14 +01:00
Martin Kroeker
8278cbe7f8
Merge pull request #1894 from pkubaj/patch-2
...
Use correct ARCH name on BSD powerpc64
2018-12-03 12:48:53 +01:00
Martin Kroeker
ea6d1b96bd
Update Makefile.system
2018-12-03 08:59:10 +01:00
Martin Kroeker
360374be62
Update with the changes from 0.3.4
2018-12-02 23:44:13 +01:00
Martin Kroeker
f5acaad8f0
Increment version to 0.3.5.dev
2018-12-02 23:43:15 +01:00
Martin Kroeker
93fa6b7b76
Increment version to 0.3.5.dev
2018-12-02 23:42:33 +01:00
Martin Kroeker
c0827a7164
Update with changes from 0.3.4
2018-12-02 23:41:17 +01:00
Martin Kroeker
86cff4effc
Merge pull request #1900 from xianyi/develop
...
Update from develop for 0.3.4
2018-12-02 23:40:21 +01:00
Martin Kroeker
b028960aba
Merge branch 'release-0.3.0' into develop
2018-12-02 23:38:49 +01:00
Martin Kroeker
3c9e3faedb
fixup BSD naming of powerpc arch
2018-12-02 23:24:53 +01:00
Andrew
44c81fd135
oops
2018-12-02 20:27:53 +01:00
Andrew
26b3710485
Add architecture mappings for FreeBSD12
2018-12-02 12:07:41 +01:00
Andrew
84e614d0fd
init
2018-12-02 12:05:15 +01:00
Martin Kroeker
dceff5542c
Handle Android environments that identify as Linux (#1898)
...
* Handle Android environments that identify as Linux
termux terminal emulator does this, causing build failures through missed defines in common.h
2018-12-01 20:56:11 +01:00
Martin Kroeker
6c7b691083
Really revert xDOT changes from 1832
...
neglected to rebase #1892 on merging
2018-11-30 21:32:01 +01:00
Martin Kroeker
5f4c550c27
Merge pull request #1892 from martin-frbg/mipsdot
...
revert MIPS64 xDOT kernel changes from #1832
2018-11-30 21:28:21 +01:00
pkubaj
731b2722ba
Fix build on POWER, remove DragonFly, add NetBSD
...
__asm is complete on its own
DBSD developers state they will only support amd64, but NetBSD supports POWER.
2018-11-30 21:12:05 +01:00
pkubaj
f85ce54d4a
Use correct Makefile on powerpc64
...
FreeBSD uses powerpc64 name for POWER architecture. Use correct Makefile for this platform.
2018-11-30 16:05:49 +00:00
Andrew
2601cd58ab
remove surplus locking code , only enabled w x86, disabled or never enabled on all others
2018-11-30 11:38:19 +01:00
Martin Kroeker
95a5542e3c
Revert DOT kernel changes from #1834
...
as the failures seen on Loongson3A appear to be limited to DSDOT/SDSDOT (i.e. my hackish "fix" from #1684)
2018-11-30 11:16:24 +01:00
Martin Kroeker
7a2e1bc804
Use generic kernel for DSDOT/SDSDOT
...
as discussed in #1834
2018-11-30 10:57:09 +01:00
Martin Kroeker
35653e38b3
Merge pull request #1834 from fengrl/develop
...
register push/pop command change
2018-11-30 10:48:46 +01:00
Martin Kroeker
71e25ae42f
Merge pull request #1890 from martin-frbg/issue1889
...
Include version number in openblas_get_config output
2018-11-29 15:47:35 +01:00
Martin Kroeker
97d7298973
call it OpenBLAS not just version
2018-11-29 11:52:08 +01:00
Martin Kroeker
de0d0ed52f
Improve formatting of config output
2018-11-29 11:28:19 +01:00
Martin Kroeker
081ceb3e02
Propagate version number for openblas_get_config
2018-11-29 00:12:04 +01:00
Martin Kroeker
a29ec458c2
propagate version number for openblas_config_version
2018-11-29 00:10:49 +01:00
Martin Kroeker
816775e309
Add version information to openblas_get_config output
2018-11-29 00:06:44 +01:00
Martin Kroeker
b6363f4539
Merge pull request #1885 from brada4/freebsd
...
Fix freebsd clang compilation of skylakex
2018-11-25 22:20:13 +01:00
Andrew
19c4bdd8b3
Add return value so that freebsd system clang does not err out
2018-11-25 21:35:01 +01:00
Andrew
f049a4c84f
init
2018-11-25 21:34:09 +01:00
Martin Kroeker
f72fdf525c
Merge pull request #1875 from martin-frbg/issue1851
...
Serialize accesses to parallelized level3 functions from multiple cal…
2018-11-25 20:53:46 +01:00
Martin Kroeker
5393759a98
Merge pull request #1869 from martin-frbg/axpy0
...
Handle special case INCX=0,INCY=0 in the axpy interface
2018-11-25 20:52:49 +01:00
Martin Kroeker
5cf18e2875
Merge pull request #1878 from kiwifb/PGI_f_check
...
Correct link flags for PGI compiler.
2018-11-25 20:51:50 +01:00
Martin Kroeker
910050985a
Merge pull request #1876 from rengolin/armv8-cleanup
...
Simplifying ARMv8 build parameters
2018-11-25 20:51:24 +01:00
François Bissey
0184713e1a
Correct link flags for PGI compiler.
2018-11-21 14:24:56 +13:00
Martin Kroeker
45c3c459e1
Merge pull request #1868 from martin-frbg/aix_cpuid
...
Use prtconf to determine CPU type on AIX
2018-11-20 17:25:57 +01:00
Martin Kroeker
113cb00b95
fix missing parenthesis
2018-11-19 21:01:36 +01:00
Martin Kroeker
5192651706
Add CriticalSection handling instead of mutexes for Windows
2018-11-19 17:58:22 +01:00
Renato Golin
310ea55f29
Simplifying ARMv8 build parameters
...
ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode
(which is not right because TX2 is ARMv8.1) as well as requiring a few
redundancies in the defines, making it harder to maintain and understand
what core has what. A few other minor issues were also fixed.
Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX,
ThunderX2, and XGene.
Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester.
A summary:
* Removed TX2 code from ARMv8 build, to make sure it is compatible with
all ARMv8 cores, not just v8.1. Also, the TX2 code has actually
harmed performance on big cores.
* Commoned up ARMv8 architectures' defines in params.h, to make sure
that all will benefit from ARMv8 settings, in addition to their own.
* Adding a few more cores, using ARMv8's include strategy, to benefit
from compiler optimisations using mtune. Also updated cache
information from the manuals, making sure we set good conservative
values by default. Removed Vulcan, as it's an alias to TX2.
* Auto-detecting most of those cores, but also updating the forced
compilation in getarch.c, to make sure the parameters are the same
whether compiled natively or forced arch.
Benefits:
* ARMv8 build is now guaranteed to work on all ARMv8 cores
* Improved performance for ARMv8 builds on some cores (A72, Falkor,
ThunderX1 and 2: up to 11%) over current develop
* Improved performance for *all* cores compared to the develop branch
before TX2's patch (9% ~ 36%)
* ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than
current develop's branch and 8% faster than develop before tx2 patches
Issues:
* Regression from current develop branch for A53 (-12%) and A57 (-3%)
with ARMv8 builds, but still faster than before TX2's commit (+15%
and +24% respectively). This can be improved with a simplification of
TX2's code, to be done in future patches. At least the code is
guaranteed to be ARMv8.0 now.
Comments:
* CortexA57 builds are unchanged on A57 hardware from develop's branch,
which makes sense, as it's untouched.
* CortexA72 builds improve over A57 on A72 hardware, even if they're
using the same includes due to new compiler tuning in the makefile.
2018-11-19 16:41:49 +00:00
Martin Kroeker
2e6fae2aad
Serialize accesses to parallelized level3 functions from multiple callers
...
for #1851
2018-11-19 14:02:50 +01:00
Martin Kroeker
368d14f8c8
Fix harmless typo
...
fixes #1872
2018-11-16 14:58:28 +01:00
Martin Kroeker
42bc2a9202
Fix copy-paste errors (POWER8/9 and extraneous return)
2018-11-16 12:10:44 +01:00
fengruilin
43bb386b10
fix dot problem on 64bit mips
2018-11-15 11:11:59 +08:00
Martin Kroeker
c171b8ad13
Handle special case INCX=0,INCY=0 in the axpy interface
2018-11-13 13:57:18 +01:00
Martin Kroeker
2f04cf22ac
Detect POWER9 as POWER8 on AIX and Linux
...
(already supported by the *BSD version)
2018-11-13 08:16:14 +01:00
Martin Kroeker
807f6e6922
Use prtconf to determine CPU type on AIX
...
for #1803
2018-11-12 18:52:29 +01:00
Martin Kroeker
ecbeb802a0
Merge pull request #1865 from martin-frbg/issue1844
...
Optimize gemv for small M, large N only if it can be done in a threadsafe manner
2018-11-12 17:30:44 +01:00
Martin Kroeker
2c5725cc39
Merge pull request #1864 from aytekinar/patch-1
...
Add ARM tests on Travis
2018-11-12 14:30:28 +01:00
Arda Aytekin
e3666931d8
Update .travis.yml
...
Updated `.travis.yml` file to add emulated tests for `ARMV6` and `ARMV8`
architectures with `gcc` and `clang`. Created prebuilt images with
required dependencies. Squashed layers into one.
2018-11-11 20:50:38 +01:00
Martin Kroeker
ae02a57261
Merge pull request #1866 from martin-frbg/issue1859
...
Fix argument in SLASET call to zero S
2018-11-10 19:23:31 +01:00
Martin Kroeker
a6a52a73f7
Fix argument in SLASET call to zero S
...
fixes #1859 in accordance with https://github.com/LAPACK-Reference/issue/296
2018-11-10 17:16:53 +01:00
Martin Kroeker
0427277cef
Allow optimization for small m, large n only if it can be made threadsafe
...
otherwise the introduction of a static array in 8e5a108 to improve #532 breaks concurrent calls from multiple threads as seen in #1844
2018-11-10 15:45:54 +01:00
Martin Kroeker
4f43668eec
Merge pull request #2 from xianyi/develop
...
merge develop
2018-11-10 15:37:25 +01:00
Martin Kroeker
b0c15bacc1
Merge pull request #1863 from martin-frbg/aix_install3
...
Set LIBSONAME suffix to .a for AIX
2018-11-09 13:12:06 +01:00
Martin Kroeker
cfb0f5b0f8
Set LIBSONAME suffix to .a for AIX
...
another fix for #1803
2018-11-08 22:39:10 +01:00
Martin Kroeker
667fed579d
Merge pull request #1856 from rengolin/armv8-a57
...
[Arm64] Revert A53 detection as A57
2018-11-07 21:01:29 +01:00
Martin Kroeker
96d2f2c9b2
Merge pull request #1831 from brada4/hemv
...
disable threading in C/ZSWAP copying from S/DSWAP
2018-11-07 08:49:21 +01:00
Martin Kroeker
653e657a58
Merge pull request #1857 from brada4/fc-1847
...
Add gfortran -frecursive option from upstream and #1847
2018-11-07 08:48:31 +01:00
Martin Kroeker
5f8f0583d4
Merge branch 'develop' into fc-1847
2018-11-07 08:47:52 +01:00
Martin Kroeker
974a6a30f2
Merge pull request #1858 from brada4/buff-1847
...
Add minimum threshold for number of buffers
2018-11-07 08:46:55 +01:00
Andrew
9531d0e175
lets fit it in one 4k page
2018-11-06 17:51:24 +00:00
Andrew
40cce0e353
handle cmake too
2018-11-06 09:45:49 +00:00
Andrew
3fd41313fc
add low bound for number of buffers
2018-11-06 09:40:13 +00:00
Andrew
a931afe269
init
2018-11-06 09:39:05 +00:00
Andrew
7d3502b500
Add -frecursive gfortran option by default
2018-11-06 08:20:55 +00:00
Andrew
066f8065d1
init
2018-11-06 08:19:08 +00:00
Renato Golin
fb5b2177ca
[Arm64] Revert A53 detection as A57
...
This patch reverts the decision of treating A53 like A57, which was
based on an analysis done on server class hardware and is not
representative of all A53s out there.
Fixes #1855.
2018-11-05 11:34:49 +00:00
Martin Kroeker
f1c02273cb
Merge pull request #1846 from fenrus75/threadsize
...
gemm/dgemm: add a way for an arch kernel to specify preferred sizes
2018-11-02 13:18:01 +01:00
Martin Kroeker
661035477c
Merge pull request #1850 from martin-frbg/issue1811
...
Restore Android/ARMv7 build fix from #778
2018-11-02 09:50:51 +01:00
Martin Kroeker
aa7e47aa0a
Merge pull request #1849 from martin-frbg/aix_install2
...
Use installbsd on AIX
2018-11-01 20:39:16 +01:00
Martin Kroeker
9c177d270b
Restore Android/ARMv7 build fix from #778
...
for #1811
2018-11-01 18:50:25 +01:00
Martin Kroeker
b025523197
Use installbsd on AIX
...
(and fix misplaced parenthesis from previous commit). See #1803
2018-11-01 18:26:08 +01:00
Martin Kroeker
5b50bd36f7
Merge pull request #1845 from martin-frbg/aix_install
...
Accommodate AIX install, which has different syntax
2018-11-01 09:53:10 +01:00
Arjan van de Ven
5b708e5eb1
sgemm/dgemm: add a way for an arch kernel to specify preferred sizes
...
The current gemm threading code can make very unfortunate choices. For
example, on my 10 core system a 1024x1024x1024 matrix multiply ends up
chunking into blocks of 102, which is not a vector friendly size,
and performance ends up horrible.
This patch adds a helper define where an architecture can specify
a preference for size multiples.
This is different from existing defines that are minimum sizes and such.
The performance increase with this patch for the 1024x1024x1024 sgemm
is 2.3x (!!)
2018-11-01 01:43:20 +00:00
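A minimal sketch of the idea in commit 5b708e5eb1 above, not the patch itself: the naive split reproduces the 102-wide chunk from the commit message, and a rounding step widens it to a vector friendly multiple. The helper names and the multiple of 16 are assumptions made for illustration; the commit message does not state the actual define or its value.

```c
#include <assert.h>
#include <stddef.h>

/* Naive split: each thread gets total / nthreads columns (at least 1). */
static size_t naive_chunk(size_t total, size_t nthreads)
{
    assert(nthreads > 0);
    size_t width = total / nthreads;
    return width > 0 ? width : 1;
}

/* Hypothetical variant: round the naive width up to a multiple of
   `preferred`, but never beyond `total`. */
static size_t preferred_chunk(size_t total, size_t nthreads, size_t preferred)
{
    assert(preferred > 0);
    size_t width = naive_chunk(total, nthreads);
    width = (width + preferred - 1) / preferred * preferred;
    return width < total ? width : total;
}

/* Usage example with the numbers from the commit message. */
static void chunk_selfcheck(void)
{
    assert(naive_chunk(1024, 10) == 102);            /* the unfortunate size */
    assert(preferred_chunk(1024, 10, 16) == 112);    /* assumed multiple: 16 */
    assert(preferred_chunk(1024, 10, 16) % 16 == 0);
}
```

With wider chunks the last thread simply receives fewer columns (here 1024 - 9 * 112 = 16), which is the trade-off this kind of rounding accepts.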
Arjan van de Ven
dcc5d6291e
skylakex: Make the sgemm/dgemm beta code robust for a N=0 or M=0 case
...
In the threading code there are cases where N or M can become 0,
and the optimized beta code did not handle this well, leading
to a crash.
During the audit for the crash, a few edge conditions in the if statements
were found and fixed as well.
2018-11-01 01:42:09 +00:00
Martin Kroeker
7b5aea52bb
Accommodate AIX install, which has different syntax
...
for #1803
2018-10-31 21:50:34 +01:00
Martin Kroeker
f5595d0262
Merge pull request #1843 from martin-frbg/aix_numprocs
...
Add get_num_procs implementation for AIX
2018-10-31 21:25:15 +01:00
Martin Kroeker
326d394a0f
Add get_num_procs implementation for AIX
...
(and copy HAIKU implementation to the non-TLS version of the code as well)
2018-10-31 18:38:22 +01:00
Martin Kroeker
6af8e35a24
Merge pull request #1837 from embray/set-num-thread-after-fork
...
Ensure that blas_thread_init has been called in openblas_set_num_threads
2018-10-30 12:41:24 +01:00
Erik M. Bray
38cf5d9364
ensure that threading has been initialized in the first place before calling openblas_set_num_threads
2018-10-28 21:16:52 +00:00
Martin Kroeker
8a43baacb2
Merge pull request #1836 from martin-frbg/zen2core
...
Fix detection of Ryzen2 (missing CORE_ZEN)
2018-10-28 20:00:01 +01:00
Martin Kroeker
64ca44873b
Fix detection of Ryzen2 (missing CORE_ZEN)
2018-10-28 18:36:55 +01:00
fengrl
2d8064174c
register push/pop command change
...
The 64-bit push/pop register commands should be used. Otherwise, data will be lost.
2018-10-26 17:55:15 +08:00
Martin Kroeker
76a66eaac8
Merge pull request #1829 from ashwinyes/develop_aarch64_dynamic_arch_support
...
Add DYNAMIC_ARCH support for ARM64
2018-10-23 18:14:28 +02:00
Andrew
2992e3886a
disable threading in C/ZSWAP copying from S/DSWAP
2018-10-22 23:21:49 +03:00
Ashwin Sekhar T K
d5aeff636f
ARM64: Enable DYNAMIC_ARCH
...
Enable DYNAMIC_ARCH feature on ARM64. This patch uses the cpuid
feature in linux kernel to detect the core type at runtime
(https://www.kernel.org/doc/Documentation/arm64/cpu-feature-registers.txt).
If this feature is missing in kernel, then the user should use the
OPENBLAS_CORETYPE env variable to select the desired core type.
2018-10-22 01:49:35 -07:00
Ashwin Sekhar T K
af2837c392
ARM64: Remove #define ARMV8 for THUNDERX
2018-10-22 01:49:35 -07:00
Ashwin Sekhar T K
e7b66cd36e
ARM64: Fix DYNAMIC_ARCH compilation for cores which don't use GEMM3M
2018-10-22 01:45:51 -07:00
Ashwin Sekhar T K
d50abc8903
ARM64: Move parameters from parameter.c to param.h
...
Remove the runtime setting of P, Q, R parameters for
targets ARMV8, THUNDERX2T99. Instead set them as constants
in param.h at compile time.
2018-10-22 01:45:51 -07:00
Ashwin Sekhar T K
351a0c777c
ARM64: Remove XGENE1 references
...
Remove XGENE1 target as the implementation for the
same is incomplete. Moreover whoever wishes to use
on XGENE1 can use the generic ARMV8 target as there
are no XGENE1 specific optimizations in OpenBLAS.
2018-10-22 01:45:51 -07:00
Martin Kroeker
e3c262e5cf
Merge pull request #1825 from brada4/hemv
...
Delay _hemv threading in attempt to address #1820
2018-10-21 20:34:05 +02:00
Andrew
a293bdcd5e
re-arrange new code for readability
2018-10-20 21:37:53 +03:00
Andrew
c7bbf9c987
Attempt to tame _hemv threading #1820
2018-10-20 11:13:29 +03:00
Andrew
898a8dcaba
init
2018-10-20 10:55:04 +03:00
Martin Kroeker
71c6deed60
Merge pull request #1821 from ashwinyes/develop_aarch64_armv8neonkernels
...
Use ThunderX2 Neon Kernels for ARMV8 Target
2018-10-18 08:13:05 +02:00
Ashwin Sekhar T K
21f46a1cf2
ARM64: Use THUNDERX2T99 Neon Kernels for ARMV8
...
Currently the generic ARMV8 target uses C implementations
for many routines. Replace these with the neon implementations
written for THUNDERX2T99 target which are up to 6x faster for
certain routines.
2018-10-17 10:44:37 -07:00
Ashwin Sekhar T K
caf339412f
ARM64: Remove dependency of THUNDERX2T99 Makefile on CORTEXA57 Makefile
2018-10-17 08:02:40 -07:00
Ashwin Sekhar T K
8001fdcd2a
ARM64: Remove dependency of THUNDERX Makefile on ARMV8 Makefile
2018-10-17 08:02:16 -07:00
Ashwin Sekhar T K
162e312832
ARM64: Remove dependency of CORTEXA57 Makefile on ARMV8 Makefile
2018-10-17 08:01:45 -07:00
Ashwin Sekhar T K
c3d93caa8d
ARM64: Remove dependency of XGENE1 Makefile on ARMV8 Makefile
2018-10-17 08:01:27 -07:00
Martin Kroeker
a71923514f
Merge pull request #1815 from fenrus75/sgemm_beta_fix
...
enable the SGEMM/SKX C based kernel
2018-10-14 19:57:34 +02:00
Arjan van de Ven
55b244ca0d
enable the SGEMM/SKX C based kernel
...
In QA the final bug was found, so now the skylakex sgemm C based kernel can
be activated.
2018-10-12 09:30:35 +00:00
Martin Kroeker
2263d3906c
Merge pull request #1812 from martin-frbg/issue1806-2
...
Use KERNEL_DEFINITIONS rather than COMMON_OPTS to pass -march=skylake…
2018-10-11 21:51:31 +02:00
Martin Kroeker
81c9985c3a
Use KERNEL_DEFINITIONS rather than COMMON_OPTS to pass -march=skylake-avx512
2018-10-11 11:03:27 +02:00
Martin Kroeker
56ebc7b53e
Merge pull request #1808 from martin-frbg/issue1806
...
Add -march=skylake-avx512 to CFLAGS when the target is Skylake
2018-10-11 07:48:08 +02:00
Martin Kroeker
c5f88f5a57
Merge pull request #1807 from xianyi/revert-1798-cmake-avx512
...
Revert "Add -march=skylake-avx512 when required"
2018-10-11 07:47:53 +02:00
Martin Kroeker
8a11ec19d1
Syntax fix
2018-10-10 23:47:35 +02:00
Martin Kroeker
fa53b903db
Add -march=skylake-avx512 to CFLAGS when the target is Skylake
...
Should fix #1806 and #1801
2018-10-10 19:22:01 +02:00
Martin Kroeker
84bcdf9c66
Revert "Add -march=skylake-avx512 when required"
2018-10-10 19:15:32 +02:00
Martin Kroeker
8f7e986184
Merge pull request #1802 from martin-frbg/issue1801
...
Use avx512 workaround with msys2/mingw64 as well
2018-10-10 08:52:53 +02:00
Martin Kroeker
d0e83666ad
Merge pull request #1804 from fenrus75/sgemm
...
Add a C+intrinsics version of the SGEMM/skylakex kernel
2018-10-10 08:50:44 +02:00
Arjan van de Ven
d4bad73834
Add a C+intrinsics version of the SGEMM/skylakex kernel
...
for most sizes this is 1.2x to 1.4x faster than the current code
2018-10-10 01:49:22 +00:00
Martin Kroeker
065763adde
Merge pull request #1800 from fengrl/patch-1
...
Update common_mips64.h for the 1st loop of blas_memory_alloc
2018-10-09 10:56:37 +02:00
Martin Kroeker
210b03b543
Merge pull request #1792 from martin-frbg/cmakesuffix
...
Improve CMake help output and add SYMBOLPREFIX and -SUFFIX options
2018-10-09 10:34:52 +02:00
Martin Kroeker
6234a32656
Use cygwin compilation workaround for avx512 on msys2/mingw64 as well
2018-10-09 10:31:59 +02:00
Martin Kroeker
c0d7cd3dac
Merge pull request #1799 from martin-frbg/issue1796
...
Handle conflicting usage of ARCH in at least some BSD environments
2018-10-09 08:20:52 +02:00
Martin Kroeker
667f0cc1cb
Merge pull request #1793 from fenrus75/ncopy
...
Add optimized *copy versions for skylakex
2018-10-09 08:19:14 +02:00
fengrl
d4c8853a02
Update common_mips64.h
2018-10-09 11:20:16 +08:00
Martin Kroeker
d3d58f8ee5
Catch conflicting usage of ARCH in at least some BSD environments
...
fixes #1796
2018-10-08 22:29:35 +02:00
Martin Kroeker
697dc1baf8
Use override for ARCH in make.inc
...
in case a conflicting setting of ARCH (for architecture) gets pulled in from the environment
(originally suggested by dloghin in #1753)
2018-10-08 22:26:59 +02:00
Martin Kroeker
a9b51b8448
Merge pull request #1798 from martin-frbg/cmake-avx512
...
Add -march=skylake-avx512 when required
2018-10-08 21:15:17 +02:00
Martin Kroeker
eba394c711
Add -march=skylake-avx512 when required
...
fixes #1797
2018-10-08 19:18:12 +02:00
Arjan van de Ven
582c589727
dgemm/skylakex: replace discrete mul/add with fma
...
very minor gains since it's not super hot code, but general principles
2018-10-06 23:13:26 +00:00
Arjan van de Ven
adbf6afa25
Add vector optimizations for ncopy as well for dgemm/skylakex
2018-10-06 21:18:12 +00:00
Arjan van de Ven
32bec8afbb
add a skylakex optimized dgemm beta function
2018-10-06 16:36:26 +00:00
Martin Kroeker
6e2c494556
Merge pull request #1791 from dev-zero/develop
...
fix parallel build issues with APFS/HFS+/ext2/3 in netlib-lapack
2018-10-06 16:29:29 +02:00
Arjan van de Ven
20c5d668fe
dgemm/avx512 simplify and speed up the 4x4 kernel
2018-10-06 14:12:32 +00:00
Arjan van de Ven
6d43c51ccf
undo slow dgemm/skylake microoptimization
...
the compare is more costly than the work
2018-10-06 14:00:37 +00:00
Arjan van de Ven
d74dc39b0f
Add optimized *copy versions for skylakex
...
Add optimized n/t copy versions for skylakex; in the patch the
tcopy is also rewritten using intrinsics; the ncopy file
will be worked on in a future commit
2018-10-06 13:51:44 +00:00
Martin Kroeker
41951da6d4
Merge pull request #6 from xianyi/develop
...
merge develop
2018-10-06 14:36:36 +02:00
Martin Kroeker
474f7e9583
Add SYMBOLPREFIX and -SUFFIX options and improve help output
2018-10-06 14:28:04 +02:00
Tiziano Müller
79ea839b63
fix parallel build issues with APFS/HFS+/ext2/3 in netlib-lapack
...
The problem is that OpenBLAS sets the LAPACKE_LIB and the TMGLIB to the
same object and uses the `ar` feature to update the archive file. If the
underlying filesystem does not have sub-second timestamp resolution and
the system is fast enough (or `ccache` is used), the timestamp of the
builds which should be added to the previously generated archive is the
same as the archive file itself and therefore `make` does not update the
archive.
Since OpenBLAS takes care to not run the different targets updating the
archive in parallel, the easiest solution is to declare the respective
targets `.PHONY`, forcing `make` to always update them.
fixes #1682
2018-10-06 14:10:05 +02:00
Martin Kroeker
f7f97c6148
Merge pull request #1789 from brada4/develop
...
update travis alpine chroot with avx512 intrinsics headers
2018-10-05 20:42:37 +02:00
Martin Kroeker
6f22e1cfb8
Merge pull request #1788 from fenrus75/avx512-8x16
...
skylake dgemm: Add a 16x8 kernel
2018-10-05 20:40:38 +02:00
Arjan van de Ven
66b43affbc
Add a 24x8 kernel to the skylakex dgemm implementation
...
Minor gains for small matrices, but at 512x512 and above the gain
gets more significant.
2018-10-05 13:22:21 +00:00
Arjan van de Ven
1938819c25
skylake dgemm: Add a 16x8 kernel
...
The next step for the avx512 dgemm code is adding a 16x8 kernel.
In the 8x8 kernel, each FMA has a matching load (the broadcast);
in the 16x8 kernel we can reuse this load for 2 FMAs, which
in turn reduces pressure on the load ports of the CPU and gives
a nice performance boost (in the 25% range).
2018-10-05 13:11:35 +00:00
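The load reuse described in commit 1938819c25 above can be illustrated in plain C, without intrinsics. This is only a sketch under assumptions: the inner loop over 8 elements stands in for one 512-bit vector operation, the function and macro names are made up for the example, and the packed layout of `a` is hypothetical rather than the one the real kernel uses.

```c
#include <assert.h>
#include <stddef.h>

#define VEC 8                 /* doubles per 512-bit vector */
#define ROWS (2 * VEC)        /* a 16-row tile */

/* One column of a 16-row tile: c[0..15] += a[k][0..15] * b[k].
   Returns how many scalar loads of b ("broadcasts") were needed. */
static size_t tile16_column(const double *a, const double *b,
                            size_t kdim, double *c)
{
    size_t broadcasts = 0;
    for (size_t k = 0; k < kdim; k++) {
        const double bk = b[k];          /* loaded once per k ... */
        broadcasts++;
        for (size_t i = 0; i < VEC; i++) {
            c[i]       += a[k * ROWS + i]       * bk;  /* FMA, rows 0..7  */
            c[VEC + i] += a[k * ROWS + VEC + i] * bk;  /* FMA, rows 8..15 */
        }
    }
    return broadcasts;
}

/* Usage example: two k steps, so two broadcasts feed four vector FMAs. */
static void tile16_selfcheck(void)
{
    double a[2 * ROWS], c[ROWS] = {0};
    const double b[2] = {2.0, 3.0};
    for (size_t i = 0; i < 2 * ROWS; i++) a[i] = 1.0;
    size_t n = tile16_column(a, b, 2, c);
    assert(n == 2);
    assert(c[0] == 5.0 && c[ROWS - 1] == 5.0);
}
```

An 8-row tile would need the same number of broadcasts for half the arithmetic, which is the load port pressure the commit message refers to.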
Andrew
bda3dbe2eb
update travis alpine chroot with avx512 intrinsics headers
2018-10-05 15:47:55 +03:00
Andrew
c3e0f0eb38
update travis alpine chroot with avx512 intrinsics headers
2018-10-05 15:41:52 +03:00
Martin Kroeker
a980953bd7
Merge pull request #1785 from brada4/develop
...
address #1782 2nd loop
2018-10-05 08:25:38 +02:00
Martin Kroeker
78c99d5231
Merge pull request #1784 from fenrus75/dgemm-avx512
...
Create an AVX512 enabled version of DGEMM
2018-10-05 08:03:27 +02:00
Martin Kroeker
b7496c3638
Function name needs to be CNAME, set from outside to allow suffixing for dynamic_arch
2018-10-04 19:14:59 +02:00
Martin Kroeker
95f4e87579
Merge pull request #1787 from jeromerobert/develop
...
Fix unknown type name __WAIT_STATUS on RHEL5
2018-10-04 18:41:47 +02:00
Jerome Robert
b095f2fad6
Fix unknown type name __WAIT_STATUS on RHEL5
...
With glibc 2.5 one must have #define _XOPEN_SOURCE >= 500 to use wait.
But reading the glibc code, this is actually needed only if stdlib.h was
included before sys/wait.h. This was the case here through
openblas_utest.h. So changing the include order fixes compilation on RHEL5 and
should not hurt with more recent distros.
* Problem found when using with gcc 5.5 and 4.7.2 on RHEL5/CENTOS5
* Fix #1519
2018-10-04 14:37:08 +02:00
Martin Kroeker
02ef20a1e4
Merge pull request #1786 from martin-frbg/immintrin
...
Check for Immintrin.h presence in the AVX512 compatibility test as well
2018-10-04 09:07:09 +02:00
Martin Kroeker
4c3643ed7f
Check availability of immintrin.h in the AVX512 compatibility test
2018-10-04 07:36:49 +02:00
Martin Kroeker
591cca7cb0
Check availability of immintrin.h in the AVX512 compatibility test
2018-10-04 07:35:30 +02:00
Andrew
3439158dea
address #1782 2nd loop
2018-10-03 21:20:50 +02:00
Arjan van de Ven
45fe8cb0c5
Create an AVX512 enabled version of DGEMM
...
This patch adds dgemm_kernel_4x8_skylakex.c which is
* dgemm_kernel_4x8_haswell.s converted to C + intrinsics
* 8x8 support added
* 8x8 kernel implemented using AVX512
Performance is a work in progress, but already shows a 10% - 20%
increase for a wide range of matrix sizes.
2018-10-03 14:45:25 +00:00
Martin Kroeker
544b069e85
Merge pull request #1780 from martin-frbg/issue1774-2
...
Convert fldmia/fstmia instructions to UAL syntax for clang7
2018-09-29 09:27:47 +02:00
Martin Kroeker
9b2a7ad40d
Convert fldmia/fstmia instructions to UAL syntax for clang7
...
second part of fix for #1774, containing files missed in #1775
2018-09-28 23:05:15 +02:00
Martin Kroeker
10ce70701a
Merge pull request #1778 from fengrl/develop
...
test_axpy work error on LOONGSON3A platform #1777
2018-09-26 11:14:58 +02:00
fengruilin
6fc85a6359
test_axpy work error on LOONGSON3A platform #1777
2018-09-26 15:14:04 +08:00
Martin Kroeker
831c661386
Merge pull request #1775 from martin-frbg/issue1774
...
Convert fldmia/fstmia instructions to UAL syntax for clang7
2018-09-25 18:58:39 +02:00
Martin Kroeker
7e5df34e6a
Convert fldmia/fstmia instructions to UAL syntax for clang7
...
fixes #1774
2018-09-25 09:41:58 +02:00
Martin Kroeker
4f45040b89
Merge pull request #1773 from martin-frbg/issue1767
...
Include thread numbers in failure message from blas_thread_init
2018-09-23 23:25:15 +02:00
Martin Kroeker
28aa94bf4b
Include thread numbers in failure message from blas_thread_init
...
to aid in debugging cases like #1767
2018-09-22 14:00:15 +02:00
Martin Kroeker
56e7c68810
Merge pull request #1771 from staticfloat/sf/ldflags
...
Add `$(LDFLAGS)` to `$(CC)` and `$(FC)` invocations within `exports/Makefile`
2018-09-22 13:11:39 +02:00
Martin Kroeker
cf6df9464c
Document the stub status of the QUAD_PRECISION code (#1772)
...
* Document the stub status of the QUAD_PRECISION code inherited from GotoBLAS2
in response to #1769
2018-09-22 12:31:37 +02:00
Elliot Saba
6f77af2eef
Add $(LDFLAGS) to $(CC) and $(FC) invocations within exports/Makefile
2018-09-21 09:19:51 +00:00
Martin Kroeker
4d183e5567
Merge pull request #1765 from martin-frbg/issue1761
...
Do not use the new TLS-enabled memory allocator for non-threaded builds, and disable TLS by default in gmake as well
2018-09-19 22:02:21 +02:00
Martin Kroeker
34d55fd165
Merge pull request #1764 from yurivict/64-suffix
...
Allow to install the 'interface64' version concurrently with the regular version
2018-09-19 18:16:38 +02:00
Martin Kroeker
b991570210
Merge pull request #1762 from martin-frbg/issue1710-2
...
Add explicit casts to silence compiler warnings
2018-09-19 18:16:21 +02:00
Martin Kroeker
288aeea8a2
Fix default settings - USE_TLS and USE_SIMPLE_THREADED_LEVEL3 should both be off
2018-09-19 18:08:31 +02:00
Martin Kroeker
1ad1e79062
Catch inadvertent USE_TLS=0 declaration
...
for #1766
2018-09-19 18:03:43 +02:00
Martin Kroeker
b402626509
Do not use the new TLS code for non-threaded builds even if USE_TLS is set
...
Workaround for #1761 as that exposed a problem in the new code (which was intended to speed up multithreaded code only anyway).
2018-09-16 12:43:36 +02:00
Martin Kroeker
ec0cac1669
Merge pull request #4 from xianyi/develop
...
Update branch
2018-09-16 12:36:49 +02:00
Yuri
2349e15149
Allow to install the 'interface64' version concurrently with the regular version
2018-09-15 21:00:03 -07:00
Martin Kroeker
f3c262156e
Add an explicit cast to silence a warning
...
for #1710
2018-09-13 14:24:29 +02:00
Martin Kroeker
30f5a69ab8
Add explicit cast to silence a warning
...
for #1710
2018-09-13 14:23:31 +02:00
Martin Kroeker
fd081a91e4
Merge pull request #1759 from martin-frbg/lapack283
...
Remove an unused variable from several LAPACKE 2stage_work functions
2018-09-11 13:52:09 +02:00
Martin Kroeker
094f8c3b57
remove unused variable ldb_t
...
Copied from Reference-LAPACK PR283
2018-09-11 10:53:47 +02:00
Martin Kroeker
5cf090f516
remove unused variable ldb_t
...
Copied from Reference-LAPACK PR283
2018-09-11 10:52:30 +02:00
Martin Kroeker
58363542e7
remove unused variable ldb_t
...
Copied from Reference-LAPACK PR283
2018-09-11 10:51:17 +02:00
Martin Kroeker
3abc22a5bf
Merge pull request #1757 from brada4/develop
...
fix small typo in strmm_ LN
2018-09-09 22:55:15 +02:00
Andrew
1e531701b7
fix small typo
2018-09-09 16:52:25 +02:00
Martin Kroeker
5d42b6ea04
Merge pull request #1756 from martin-frbg/issue1754
...
Follow netlib renaming/aliasing CBLAS_ORDER to CBLAS_LAYOUT
2018-09-07 11:02:18 +02:00
Martin Kroeker
ba4f433321
Merge pull request #1749 from martin-frbg/issue1531
...
Fix ARMV8 cross-compilation for IOS
2018-09-07 11:02:01 +02:00
Martin Kroeker
4cf7315a5d
Adjust ARMV8 SGEMM unrolling when using the C fallback kernel_2x2 for IOS
2018-09-06 21:41:54 +02:00
Martin Kroeker
b57af93792
just make CBLAS_LAYOUT an alias of the existing CBLAS_ORDER
...
to avoid having to change all instances of enum CBLAS_ORDER in this file
2018-09-06 16:54:31 +02:00
Martin Kroeker
8aeab0601e
Follow netlib renaming/aliasing CBLAS_ORDER to CBLAS_LAYOUT
...
fixes #1754
2018-09-06 16:39:52 +02:00
Martin Kroeker
1cb7b9015e
Conditional compilation of assembly files that IOS does not like
2018-09-04 11:06:51 +02:00
Martin Kroeker
a4bd41e9f2
Fix paths to C kernels for nrm2
2018-09-04 10:51:19 +02:00
Martin Kroeker
9e2bb0c641
Update with the changes from 0.3.3
2018-08-31 00:21:13 +02:00
Martin Kroeker
dbfd7524cd
Update version to 0.3.4.dev
2018-08-31 00:19:21 +02:00
Martin Kroeker
2982ce505d
Update version to 0.3.4.dev
2018-08-31 00:18:37 +02:00
Martin Kroeker
fd8d1868a1
Updates for 0.3.3
2018-08-31 00:07:48 +02:00
Martin Kroeker
f0563f14ba
Version 0.3.3
2018-08-30 23:43:57 +02:00
Martin Kroeker
3197f86762
Version 0.3.3
2018-08-30 23:43:14 +02:00
Martin Kroeker
422a8fa953
Merge pull request #1747 from xianyi/develop
...
Merge develop into 0.3.x for 0.3.3
2018-08-30 23:42:19 +02:00
Martin Kroeker
5bac15adbd
Merge pull request #1746 from martin-frbg/issue1674
...
Assume cross-compilation if host and target os differ
2018-08-30 17:48:07 +02:00
Martin Kroeker
e17f969fa0
Assume cross-compilation if host and target os differ
...
fixes #1674
2018-08-30 13:28:46 +02:00
Martin Kroeker
e11126b26a
Merge pull request #1745 from martin-frbg/issue1743
...
Set USE_TRMM for all ZARCH variants to fix TRMM faults with zarch-gen…
2018-08-29 07:43:58 +02:00
Martin Kroeker
74608e470d
Merge pull request #1744 from martin-frbg/lapack272
...
Fix missing replacements of ILAENV by ILAENV_2STAGE (lapack PR 272)
2018-08-28 22:58:58 +02:00
Martin Kroeker
f3fd44a731
Set USE_TRMM for all ZARCH variants to fix TRMM faults with zarch-generic
...
fixes #1743
2018-08-28 21:34:07 +02:00
Martin Kroeker
9e917b16db
Fix missing replacements of ILAENV by ILAENV_2STAGE (lapack PR 272)
...
This could cause spurious "parameter has an illegal value" errors in DSYEVR and related routines, see https://github.com/Reference-LAPACK/lapack/issues/262
2018-08-28 21:11:54 +02:00
Martin Kroeker
8440a4cb1a
Merge pull request #1742 from martin-frbg/interim033
...
Add combination of old and new thread memory code selectable by new option USE_TLS
2018-08-28 08:02:15 +02:00
Martin Kroeker
b55690a659
typo fix
2018-08-26 11:31:07 +02:00
Martin Kroeker
b902a40986
Rewrite glibc version check
2018-08-26 11:18:02 +02:00
Martin Kroeker
5991d1a6cd
Update memory.c
2018-08-25 22:12:40 +02:00
Martin Kroeker
b1b743f434
Merge branch 'develop' into interim033
2018-08-25 19:45:19 +02:00
Martin Kroeker
2caa2210bb
Add USE_TLS option to choose between old and new implementation of memory.c
2018-08-25 19:37:11 +02:00
Martin Kroeker
2a589c4b28
Add USE_TLS option to switch between old and new memory.c
2018-08-25 19:36:12 +02:00
Martin Kroeker
fd42ca462d
Combo of default pre-0.3.1 memory.c and band-aided version of PR1739
2018-08-25 19:35:16 +02:00
Martin Kroeker
52d3f7af50
Merge pull request #1738 from sharkcz/s390x
...
detect z14 arch on s390x
2018-08-16 09:46:34 +02:00
Dan Horák
5c6e020f49
detect z14 arch on s390x
2018-08-14 12:30:38 +02:00
maamountki
e6c0e39492
Optimize Zgemv
2018-08-13 12:23:40 +03:00
Martin Kroeker
d4d3113adc
Merge pull request #1731 from fenrus75/readme
...
add short blurb about avx512 and needed compiler to README
2018-08-13 00:01:37 +02:00
Martin Kroeker
375dff54fc
Merge pull request #1733 from fenrus75/dsymv
...
Add an AVX512 enabled DSYMV (L) function
2018-08-12 18:18:36 +02:00
Martin Kroeker
a5f165275a
Merge pull request #1732 from fenrus75/dgemv
...
Add an AVX512 enabled DGEMV (n) function
2018-08-12 18:17:42 +02:00
Martin Kroeker
8c13aa495a
Merge pull request #1730 from fenrus75/fix-sdot
...
Fix typo in sdot function
2018-08-12 18:17:01 +02:00
Martin Kroeker
1ee6d087c3
Merge pull request #1729 from fenrus75/dscal
...
Add an AVX512 enabled DSCAL function
2018-08-12 18:16:45 +02:00
Martin Kroeker
a95a784ab2
Merge pull request #1723 from maamountki/develop
...
Disable zgemv scale in gemv benchmark by default
2018-08-11 21:08:45 +02:00
Arjan van de Ven
9bec34cb67
Add an AVX512 enabled DSYMV (L) function
...
written in C intrinsics for best readability.
(the same C code works for Haswell as well)
For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-11 17:46:24 +00:00
Arjan van de Ven
87bebdbd8a
Add an AVX512 enabled DGEMV (n) function
...
written in C intrinsics for best readability.
(the same C code works for Haswell as well)
For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-11 17:38:12 +00:00
Arjan van de Ven
9493f26309
add short blurb about avx512 and needed compiler to README
2018-08-11 17:21:46 +00:00
Arjan van de Ven
36add7570a
Fix typo in sdot function
...
it looks like my previous pull request was short the final commit;
fix a typo in sdot
2018-08-11 17:16:45 +00:00
Arjan van de Ven
cacacc8007
Add an AVX512 enabled DSCAL function
...
written in C intrinsics for best readability.
(the same C code works for Haswell as well)
For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-11 17:14:57 +00:00
Martin Kroeker
1a00ef3d27
Merge pull request #1725 from fenrus75/axpy
...
Add AVX512 enabled SAXPY/DAXPY functions
2018-08-11 11:01:20 +02:00
Martin Kroeker
4c0d832ec3
Merge pull request #1724 from fenrus75/sdot
...
Add an AVX512 enabled SDOT function
2018-08-11 11:00:56 +02:00
Martin Kroeker
fc33cbc7bb
Merge pull request #1728 from martin-frbg/changelog
...
Add changes from the 0.3.x releases
2018-08-10 13:24:36 +02:00
Martin Kroeker
c52a831ae4
Add changes from the 0.3.x releases
...
fixes #1727
2018-08-10 13:23:47 +02:00
Arjan van de Ven
2e99873ff7
Add AVX512 enabled SAXPY/DAXPY functions
...
written in C intrinsics for best readability.
(the same C code works for Haswell as well)
For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-10 02:58:32 +00:00
Arjan van de Ven
00abaa865b
Add an AVX512 enabled SDOT function
...
written in C intrinsics for best readability.
(the same C code works for Haswell as well)
For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-10 02:33:43 +00:00
maamountki
33043f563f
Disable scal to benchmark zgemv separately by default
2018-08-10 01:54:18 +03:00
Martin Kroeker
66da7677bd
Merge pull request #1721 from fenrus75/ddot2
...
Add an AVX512 enabled DDOT function
2018-08-09 15:39:06 +02:00
Arjan van de Ven
7932ff3ea9
Add an AVX512 enabled DDOT function
...
written in C intrinsics for best readability.
(the same C code works for Haswell as well)
For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-09 03:55:52 +00:00
Martin Kroeker
62f4c69708
Merge pull request #1717 from martin-frbg/issue1708
...
Add workaround for avx512 compilations on Cygwin
2018-08-06 22:05:47 +02:00
maamountki
453bfa7e71
[ZARCH] Restore detect() function
2018-08-06 20:03:49 +03:00
maamountki
23229011db
[ZARCH] Z14 support, BLAS 1/2 single precision implementations, Some missing double precision implementations, Gemv optimization
2018-08-06 18:20:40 +03:00
Martin Kroeker
73478664d4
Add workaround for avx512 compilations on Cygwin
...
fixes #1708
2018-08-06 16:40:32 +02:00
Martin Kroeker
ee955757f9
Merge pull request #1715 from stevengj/patch-1
...
fix blasabs for windows
2018-08-05 22:48:44 +02:00
Steven G. Johnson
48610a4524
fix blasabs for windows
...
Bugfix in #1713 for Windows (LLP64), where `blasabs` needs to be `llabs` rather than `labs` for the 64-bit API.
2018-08-05 08:18:51 -04:00
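The LLP64 issue behind commit 48610a4524 above can be sketched as follows. The names `sketch_int64` and `sketch_blasabs` are hypothetical stand-ins, and selecting on `LONG_MAX` is an assumption chosen to keep the example portable; it is not necessarily how the real `blasabs` macro is defined.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

typedef long long sketch_int64;   /* stand-in for a 64-bit BLAS integer */

/* On LP64 systems `long` is 64 bits wide, so labs() is sufficient.
   On LLP64 (64-bit Windows) `long` is only 32 bits wide, so labs()
   would truncate the argument and llabs() is required. */
#if LONG_MAX > 2147483647L
#define sketch_blasabs(x) labs((long)(x))
#else
#define sketch_blasabs(x) llabs(x)
#endif

/* Usage example: a magnitude that does not fit in 32 bits. */
static void blasabs_selfcheck(void)
{
    sketch_int64 inc = -5000000000LL;
    assert(sketch_blasabs(inc) == 5000000000LL);
}
```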
Martin Kroeker
4a553e8678
Merge pull request #1713 from martin-frbg/issue1710
...
Introduce blasabs macro and use it to switch between abs and labs for INTERFACE64
2018-08-04 23:51:31 +02:00
Martin Kroeker
e788102c10
Merge pull request #1709 from stevengj/patch-1
...
fabs -> fabsl
2018-08-04 23:51:10 +02:00
Martin Kroeker
165f00c159
fabs -> fabsl
2018-08-04 20:14:51 +02:00
Martin Kroeker
40c068a875
Introduce blasabs() to switch between abs() and labs() for INTERFACE64
2018-08-04 20:07:59 +02:00
Martin Kroeker
933896a1d0
Use blasabs to switch between abs and labs as needed for INTERFACE64
2018-08-04 20:06:49 +02:00
Steven G. Johnson
a4e321400b
fabs -> fabsl
...
Fixes two calls that were using `fabs` on a `long double` argument rather than `fabsl`, which looks like it is doing an unintentional truncation to `double` precision.
2018-08-03 13:00:10 -04:00
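A small illustration of the truncation that commit a4e321400b above describes, assuming nothing about the affected OpenBLAS call sites: `fabs` takes a `double`, so a `long double` argument is converted before the absolute value is taken, while `fabsl` keeps the full precision. The function names are made up for the example.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* The argument is implicitly converted to double, so any extra
   long double precision is lost before fabs() runs. */
static long double abs_truncating(long double x) { return fabs(x); }

/* fabsl() operates on long double directly. */
static long double abs_full(long double x) { return fabsl(x); }

/* Usage example: 1 + LDBL_EPSILON is exactly representable as a
   long double, so fabsl() must return it unchanged. */
static void fabsl_selfcheck(void)
{
    const long double x = -(1.0L + LDBL_EPSILON);
    assert(abs_full(x) == 1.0L + LDBL_EPSILON);
    assert(abs_truncating(-2.5L) == 2.5L);  /* fine when the value fits a double */
}
```

On platforms where `long double` has the same format as `double`, both functions behave identically, so the difference only shows up where `long double` is wider.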
Martin Kroeker
9e65430504
Merge pull request #1703 from wsttiger/cmake_fix
...
Set EXPORT_NAME to match OpenBLASConfig.cmake
2018-08-02 23:48:42 +02:00
Martin Kroeker
2cfa86b406
Merge pull request #1707 from extrowerk/haiku_support
...
Haiku supporting patches
2018-08-02 22:27:00 +02:00
Scott Thornton
2a9a9389ef
Added target_include_directories()
2018-08-02 14:58:52 -05:00
Zoltán Mizsei
6463bffd59
Haiku supporting patches
2018-08-02 20:49:14 +02:00
Martin Kroeker
8ef7d4fb54
Merge pull request #1706 from oon3m0oo/develop
...
Fix #1705 where we incorrectly calculate page locations.
2018-08-02 18:53:34 +02:00
Craig Donner
6400868e55
Fix #1705 where we incorrectly calculate page locations.
...
Since we now use an allocation size that isn't a multiple of PAGESIZE, finding
the pages for run_bench wasn't terminating properly. Now we detect if we've
found enough pages for the allocation and terminate the loop.
2018-08-02 16:21:19 +01:00
Scott Thornton
8ebf541e97
Set EXPORT_NAME to match OpenBLASConfig.cmake
2018-07-30 15:18:29 -05:00
Martin Kroeker
b03ae3f4dc
Set version to 0.3.3.dev
2018-07-30 08:23:13 +02:00
Martin Kroeker
2cc8fb0ad2
Set version to 0.3.3.dev
2018-07-30 08:22:38 +02:00
Martin Kroeker
e8a68ef261
Merge pull request #1702 from xianyi/develop
...
Merge develop for 0.3.2
2018-07-30 07:25:01 +02:00
Martin Kroeker
64826a0d7d
Merge branch 'release-0.3.0' into develop
2018-07-29 22:37:09 +02:00
Martin Kroeker
25f2d25cfe
Merge pull request #1697 from martin-frbg/issue1696
...
Do not treat Windows UWP builds as cross-compiling
2018-07-25 19:55:29 +02:00
Martin Kroeker
73131fa30a
Do not treat Windows UWP builds as cross-compiling
2018-07-24 17:46:33 +02:00
Martin Kroeker
66fcdd5be8
Merge pull request #1695 from martin-frbg/issue1692
...
Unset memory table entry, not just the local pointer to it on shutdown
2018-07-22 16:34:09 +02:00
Martin Kroeker
43ac839c16
Unset memory table entry, not just the temporary pointer to it on shutdown
...
to fix crash with multiple instances of OpenBLAS, #1692
2018-07-22 09:19:19 +02:00
Martin Kroeker
7ba5936ecd
Merge pull request #1688 from martin-frbg/issue1673
...
Temporarily disable special handling of OPENMP thread memory allocation
2018-07-19 19:03:45 +02:00
Martin Kroeker
b14f44d2ad
Temporarily disable special handling of OPENMP thread memory allocation
...
for issue #1673
2018-07-19 08:57:56 +02:00
Martin Kroeker
e71d70ba87
Merge pull request #1681 from martin-frbg/issue1671
...
Add cpu identification via mfpvr call for the BSDs
2018-07-16 22:47:05 +02:00
Martin Kroeker
d671870f5f
Merge pull request #1684 from martin-frbg/issue1672
...
Work around utest failures in the MIPS64 SICORTEX target
2018-07-16 22:46:49 +02:00
Martin Kroeker
4e103c822c
typo fix
2018-07-16 12:56:39 +02:00
Martin Kroeker
d2142760e0
Fix precision problem in DSDOT
2018-07-15 17:11:40 +02:00
Martin Kroeker
2fbfc64da8
Use C kernels for default c/zAXPY, xROT, c/zSWAP
2018-07-15 17:09:55 +02:00
Martin Kroeker
8d5b33b6be
Add cpu identification via mfpvr call for the BSDs
...
fixes #1671
2018-07-12 23:39:00 +02:00
Martin Kroeker
36aea5ce2d
Merge pull request #1680 from martin-frbg/snprint
...
Fix wrong redefinitions of snprintf for older MSVC
2018-07-12 14:05:13 +02:00
Martin Kroeker
1309711e24
Fix declaration of snprintf for older MSVC
...
_snprintf_s takes an additional (size) argument, so it is not a direct replacement.
(Note that this code is currently unused: the two instances of snprintf here are within ifdef blocks that are not compiled for MSVC.)
2018-07-12 11:47:52 +02:00
Martin Kroeker
571e9de2ac
Fix definition of snprintf for MSVC
...
MS _snprintf_s takes an additional argument for the size of the buffer, so it is not a direct replacement (utest/ctest.h, from which I copied, was wrong)
2018-07-12 11:42:25 +02:00
Martin Kroeker
448ed15115
Merge pull request #1678 from martin-frbg/issue1677
...
Define snprintf for older versions of MSVC
2018-07-12 09:21:34 +02:00
Martin Kroeker
045fb5ea2c
Define snprintf for older versions of MSVC
...
for #1677
2018-07-12 07:30:58 +02:00
Martin Kroeker
4dd70d98d7
Merge pull request #1667 from xianyi/revert-1642-develop
...
Revert "Rewrite &= -> = and simplify the initial blocking phase."
2018-07-04 08:27:21 +02:00
Martin Kroeker
504310eeb9
Merge pull request #1665 from martin-frbg/cpuid-ryzen2
...
Add cpuid for AMD Ryzen 2
2018-07-04 08:19:40 +02:00
Martin Kroeker
ea1f39518f
Merge pull request #1663 from martin-frbg/issue1641
...
Double MAX_ALLOCATING_THREADS to fix segfaults with Go and Octave
2018-07-04 08:19:11 +02:00
Martin Kroeker
5f2a3c05cd
Revert "Rewrite &= -> = and simplify the initial blocking phase."
2018-07-03 21:42:28 +02:00
Martin Kroeker
d0ec4325cf
Add cpuid for AMD Ryzen 2
2018-07-03 21:03:24 +02:00
Martin Kroeker
3f73e8b8cf
Add cpuid for AMD Ryzen 2
...
for #1664
2018-07-03 21:01:35 +02:00
Martin Kroeker
a83f01e0ee
Merge pull request #1662 from martin-frbg/cmake-avx512
...
Add -march=skylake-avx512 to AVX512 compile check and suppress its ou…
2018-07-03 17:40:09 +02:00
Martin Kroeker
a49203b48c
Double MAX_ALLOCATING_THREADS to fix segfaults with Go and Octave
...
for #1641
2018-07-03 17:35:54 +02:00
Martin Kroeker
b74aef2816
Add -march=skylake-avx512 to AVX512 compile check and suppress its output
2018-07-03 14:41:44 +02:00
Martin Kroeker
a9fa805007
Merge pull request #1660 from martin-frbg/issue1659
...
Fix typo that broke compilation with DYNAMIC_ARCH and NO_AVX2
2018-07-02 17:48:19 +02:00
Martin Kroeker
9d15a3bd16
Fix typo that broke compilation with DYNAMIC_ARCH and NO_AVX2
...
fixes #1659
2018-07-02 14:40:41 +02:00
Martin Kroeker
bbf2124970
set version number to 0.3.2.dev
2018-07-01 12:01:51 +02:00
Martin Kroeker
1392eba488
set version number to 0.3.2.dev
2018-07-01 12:01:16 +02:00
Martin Kroeker
61659f8765
Merge pull request #1648 from martin-frbg/nofort
...
Handle NOFORTRAN=0
2018-07-01 11:56:40 +02:00
Martin Kroeker
cc92257ea6
Update Makefile
2018-06-27 00:09:21 +02:00
Martin Kroeker
2aba1b1658
Merge branch 'develop' into nofort
2018-06-27 00:07:32 +02:00
Martin Kroeker
8396e9e777
Handle NOFORTRAN=0
2018-06-27 00:00:27 +02:00