Commit Graph

5876 Commits

Author SHA1 Message Date
Martin Kroeker aaf1a17168
Apply current library name suffix 2020-08-02 17:58:33 +02:00
Martin Kroeker 53add6a80d
Apply library name suffix to openblas if any 2020-08-02 17:57:12 +02:00
Martin Kroeker 9eb897cc01
Merge pull request #75 from xianyi/develop
rebase
2020-08-02 17:50:06 +02:00
Martin Kroeker 7cead56258
Merge pull request #2753 from martin-frbg/issue2751
Add SYMBOLPREFIX and/or SYMBOLSUFFIX to cblas prototypes
2020-08-02 15:32:46 +02:00
Martin Kroeker 6794ac3415
Add SYMBOLPREFIX and/or -SUFFIX to cblas.h if needed 2020-08-02 11:20:08 +02:00
Martin Kroeker ecf4b9e0fc
Improve substitution rules for SYMBOLPREFIX and -SUFFIX addition 2020-08-01 17:06:03 +02:00
Martin Kroeker dfe5d09641
Merge pull request #2756 from martin-frbg/issue2755
Protect against inadvertent activation of USE_CUDA
2020-08-01 15:19:02 +02:00
Martin Kroeker 60cd5e55fc
Protect against inadvertent activation of USE_CUDA 2020-08-01 12:31:39 +02:00
Martin Kroeker da9e2a7ada
Add SYMBOLPREFIX and/or SYMBOLSUFFIX to cblas prototypes 2020-07-31 16:03:33 +02:00
Martin Kroeker c88cbc5e0d
Merge pull request #2752 from kadler/cpuid_aix
Use systemcfg APIs for CPU detection on AIX
2020-07-31 12:52:24 +02:00
Kevin Adler 589c74aed3
Use systemcfg APIs for CPU detection on AIX
AIX libc already provides ready access to an integer that contains a bit
identifying the CPU it's running on, so there's no need to call a
program and grep its output. Additionally, prtconf is not available in
the PASE runtime, which provides an AIX emulation layer on the IBM i
operating system.

The AIX systemcfg.h also provides macro definitions like POWER_8,
POWER_9, etc for all the bits defining the CPUs as well as macros like
__power_8(), __power_9_andup() that return booleans, but I did not use
them. Since these macros depend on the level of the OS in which it is
built, they may not be defined and instead the associated hex literals
are used directly.
2020-07-30 21:06:40 -05:00
Martin Kroeker 104aa678b0
Fix inadvertent version number reversal to 0.3.9.dev caused by #2710 2020-07-30 11:40:52 +02:00
Martin Kroeker c6b48e0394
Merge pull request #2749 from martin-frbg/make_ppc
Reorganize OpenMP build options for POWER and allow compiling for POWER9 with old gcc
2020-07-30 11:35:53 +02:00
Martin Kroeker 4927251298
Merge pull request #2750 from RajalakshmiSR/dgemv_p10
dgemv optimization for POWER10
2020-07-30 10:13:19 +02:00
Rajalakshmi Srinivasaraghavan f77b6a83f4 dgemv optimization for POWER10
Making use of new vector pair POWER10 instructions in dgemv_n and dgemv_t.
Also adding a new block 4x128 to make use of Matrix-Multiply Assist (MMA)
feature introduced in POWER ISA v3.1.  Tested on simulator and there
are no new test failures.
2020-07-29 18:59:32 -05:00
Martin Kroeker 39724e8128
Separate OpenMP handling and allow compilation of Power9 code with older gcc 2020-07-30 01:14:08 +02:00
Martin Kroeker 525db5401c
Merge pull request #74 from xianyi/develop
rebase
2020-07-30 01:04:09 +02:00
Martin Kroeker cb097beba2
Merge pull request #2741 from martin-frbg/issue2739
Adjust A53 SGEMM parameters to reflect recent switch to 8x8 kernel
2020-07-29 10:01:14 +02:00
Martin Kroeker 7c02f4b1f7
Merge pull request #2744 from martin-frbg/issue2738
Add AMD Renoir/Matisse cpu autodetection and preliminary support for Zen3
2020-07-28 19:32:04 +02:00
Martin Kroeker 383262035d
Merge pull request #2740 from RajalakshmiSR/clang-power
Fix compilation issues with clang on POWER
2020-07-28 18:15:25 +02:00
Martin Kroeker 5fa581c87e
Put hint to use git develop rather than master branch in README 2020-07-28 14:22:41 +00:00
Martin Kroeker 12918358aa
Add AMD Renoir/Matisse and preliminary support for Zen3 as Zen2
also support AMD family 22 Jaguar/Puma as Bobcat
2020-07-28 13:53:17 +00:00
Martin Kroeker 200f5c44cc
Add AMD Renoir models and preliminary support for ZEN3 as ZEN2
also remap erroneous family 16 entry to BOBCAT and reclaim erroneous family 25 "Barcelona" for Zen3
2020-07-28 13:45:23 +00:00
Martin Kroeker 64e2e4aaf3
missing braces 2020-07-27 20:19:22 +00:00
Martin Kroeker 921ec4e9e2
Adjust A53 SGEMM parameters to reflect move to 8x8 kernel 2020-07-27 19:54:46 +00:00
Rajalakshmi Srinivasaraghavan d557584b71 Fix compilation issues with clang on POWER
As gcc defaults to -malign-power, removing that option. Also
adding -fno-integrated-as to use GNU assembler for powerpc
assembly optimization files. Fixed other compilation errors
reported in dgemv_t.c file.
2020-07-27 14:11:07 -05:00
Martin Kroeker a4ceb1ade9
Merge pull request #2737 from ashwinyes/add_thunderx3_target
ARM64: Add THUNDERX3T110 Target
2020-07-27 15:19:47 +02:00
Ashwin Sekhar T K 4e1be0e481 ARM64: Add THUNDERX3T110 Target 2020-07-26 23:32:24 -07:00
Martin Kroeker 49b83e00b7
Merge pull request #2735 from martin-frbg/move_potrf
Move potrf_parallel.c from lapack/getrf to lapack/potrf where it belongs
2020-07-26 19:54:11 +02:00
Martin Kroeker 769ed9ffad
Merge pull request #2734 from RajalakshmiSR/p10_fix
Fix to store results in correct order for POWER10 GEMM kernels
2020-07-25 09:02:32 +02:00
Martin Kroeker f194ad59e1
Use _Atomic instead of volatile where available (file moved from ../getrf)
must have misplaced this in ../getrf when I made that change in March 2018 (40160ff)
the only changes since then were 
RFC : Add half precision gemm for bfloat16 in OpenBLAS Rajalakshmi Srinivasaraghavan
Rajalakshmi Srinivasaraghavan committed on 14 Apr 2020 as 7ebbb50

    Change _STDC_VERSION__ to __STDC_VERSION__ 
Zhiyong Dang committed on 11 May 2018 as 3716267
2020-07-25 08:52:24 +02:00
Martin Kroeker 4fda217f99
Delete potrf_parallel.c (moving it to ../potrf) 2020-07-25 06:42:39 +00:00
Rajalakshmi Srinivasaraghavan 9be2688c78 Fix to store results in correct order for POWER10 GEMM kernels
There is a recent compiler change in __builtin_mma_disassemble_acc() which
affects the order of storing result in POWER10. Also removing new LDFLAG
-mno-power10-stub as it is handled by linker automatically.
2020-07-24 23:08:11 -05:00
Martin Kroeker 6a2a60038c
Merge pull request #2720 from martin-frbg/issue2694
WIP Further fixes for 32bit POWER8
2020-07-24 23:19:45 +02:00
Martin Kroeker 251a09ec90
Typo fix 2020-07-24 16:04:58 +00:00
Martin Kroeker 95d37e1575
Regroup the 32 and 64bit sections and restore 64bit CAXPY 2020-07-24 10:13:46 +00:00
Martin Kroeker 3523bb778e
Merge pull request #2721 from martin-frbg/p8align
Fix alignment errors in the power8 saxpy kernel
2020-07-24 11:06:20 +02:00
Martin Kroeker a50d0e29c8
Merge pull request #2731 from martin-frbg/pgippc
Fixes for compilation on POWER with PGI compilers
2020-07-24 11:05:16 +02:00
Martin Kroeker bf1f0734ff
Use OPENBLAS_MAKE_COMPLEX_FLOAT on PPC only 2020-07-23 20:40:13 +00:00
Martin Kroeker ca3561cab9
Add ifdefs around call to altivec microkernel 2020-07-23 18:30:42 +00:00
Martin Kroeker 21072e502a
Typo fix 2020-07-23 17:34:56 +00:00
Martin Kroeker 7c6e56b5df
Rewrite assignment to complex for better portability 2020-07-23 17:10:59 +02:00
Martin Kroeker 661c6bfa5a
Exclude altivec code paths if the compiler does not support them 2020-07-23 17:08:20 +02:00
Martin Kroeker 9796e552ea
Avoid undefining NAME,CNAME etc for pgcc as it makes it ignore the new defininitions 2020-07-23 17:03:28 +02:00
Martin Kroeker d6b6e5ccd7
Merge pull request #73 from xianyi/develop
rebase
2020-07-23 16:59:06 +02:00
Martin Kroeker 349b722d8d
Merge pull request #2729 from martin-frbg/issue2728
Unify BUFFER_SIZE settings for x86_64 again to fix DYNAMIC_ARCH crashes
2020-07-22 22:45:57 +02:00
Martin Kroeker 6c33764ca4
Unify BUFFER_SIZE settings for x86_64 again to fix potentially fatal mismatch in DYNAMIC_ARCH builds 2020-07-22 17:30:55 +00:00
Martin Kroeker d1b9613fd4
Merge pull request #2727 from wyphan/develop
Patch for building on POWERPC with PGI compilers (was Patch for building on Summit)
2020-07-21 17:06:53 +02:00
Martin Kroeker 3cfc74b1a0
Merge pull request #2726 from martin-frbg/2725-2
Add detection of stdatomic.h for cmake
2020-07-21 16:42:06 +02:00
Wileam Phan 9ae154ba89 Patch for building on Summit 2020-07-20 23:30:28 -04:00