Martin Kroeker
d64f1ef26b
Fix incorrect argument to SLASET
...
Reference-LAPACK issue 425 (and 318)
2020-08-14 00:40:24 +02:00
Martin Kroeker
c62aad62e5
Fix incorrect calls to DLASET
...
Reference-LAPACK issue 429
2020-08-14 00:35:45 +02:00
Martin Kroeker
62f4c84f27
Merge pull request #76 from xianyi/develop
...
rebase
2020-08-11 13:25:12 +02:00
Martin Kroeker
7219c9cb87
Merge pull request #2764 from martin-frbg/lapacktests
...
Fix array overruns in the LIN part of the LAPACK testsuite
2020-08-10 13:27:51 +02:00
Martin Kroeker
64259d521a
Fix use of unallocated array in workspace query and wrong type of argument to xSCAL
2020-08-09 13:02:27 +02:00
Martin Kroeker
6f5ca44c1a
Expand TAU array as SGEMQR/DGEMQR read elements 2 and 3
2020-08-09 12:59:20 +02:00
Martin Kroeker
d28b3f2776
Create Jenkinsfile for OSUOSL PowerCI
2020-08-08 18:05:20 +02:00
Martin Kroeker
ba3f7b3acf
Merge pull request #2761 from RajalakshmiSR/Makefile_err
...
Remove extra symbol in Makefile
2020-08-08 12:20:04 +02:00
Rajalakshmi Srinivasaraghavan
475b5c95b9
Remove extra symbol in Makefile
...
While trying out different unroll values, I noticed that
make failed due to this extra symbol.
2020-08-07 15:27:44 -05:00
Martin Kroeker
cd60080d4a
Merge pull request #2758 from martin-frbg/undef_shift
...
Fix GCC ubsan warnings in x86_64 complex dot and gemv_t kernels
2020-08-03 23:30:26 +02:00
Martin Kroeker
4847bfdddd
Merge pull request #2757 from martin-frbg/cmake64
...
Fix lapack-tests linking to a suffixed libopenblas in cmake builds
2020-08-02 23:05:21 +02:00
Martin Kroeker
81dcfdcf39
Multiply by 2 instead of left-shifting a potentially negative number
...
fixes GCC ubsan warning in the BLAS tests
2020-08-02 18:29:56 +02:00
Martin Kroeker
0ef4b3f1f2
Multiply instead of doing a left shift of a potentially negative number
...
fixes GCC ubsan report in the BLAS tests
2020-08-02 18:27:40 +02:00
Martin Kroeker
aa53a8a5cb
Multiply by two instead of left-shifting one place
...
fixes GCC ubsan report of "left shift of negative value -2" in the BLAS tests
2020-08-02 18:25:09 +02:00
Martin Kroeker
aa3a1e7d8c
Multiply by two rather than left shift by one place
...
fixes GCC ubsan report of "left shift of negative value -2" in the BLAS tests
2020-08-02 18:22:31 +02:00
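The four commits above replace a left shift with a multiplication because, in C, left-shifting a negative signed value is undefined behaviour, which is what GCC's ubsan reports as "left shift of negative value -2". A minimal sketch of the pattern follows; the helper name `complex_stride` is hypothetical and is not taken from the BLAS tests themselves.

```c
/* Hypothetical helper: convert a stride counted in complex elements
 * into a stride counted in scalars (two scalars per complex value).
 * Strides may legitimately be negative in BLAS calls. */
static int complex_stride(int inc)
{
    /* "inc << 1" would be undefined behaviour for negative inc.
     * "inc * 2" is well defined as long as the product fits in an int. */
    return inc * 2;
}
```

For a stride of -2, as in the ubsan report, the multiplication yields -4 without relying on how a particular compiler treats shifts of negative values.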
Martin Kroeker
aaf1a17168
Apply current library name suffix
2020-08-02 17:58:33 +02:00
Martin Kroeker
53add6a80d
Apply library name suffix to openblas if any
2020-08-02 17:57:12 +02:00
Martin Kroeker
9eb897cc01
Merge pull request #75 from xianyi/develop
...
rebase
2020-08-02 17:50:06 +02:00
Martin Kroeker
7cead56258
Merge pull request #2753 from martin-frbg/issue2751
...
Add SYMBOLPREFIX and/or SYMBOLSUFFIX to cblas prototypes
2020-08-02 15:32:46 +02:00
Martin Kroeker
6794ac3415
Add SYMBOLPREFIX and/or -SUFFIX to cblas.h if needed
2020-08-02 11:20:08 +02:00
Martin Kroeker
ecf4b9e0fc
Improve substitution rules for SYMBOLPREFIX and -SUFFIX addition
2020-08-01 17:06:03 +02:00
Martin Kroeker
dfe5d09641
Merge pull request #2756 from martin-frbg/issue2755
...
Protect against inadvertent activation of USE_CUDA
2020-08-01 15:19:02 +02:00
Martin Kroeker
60cd5e55fc
Protect against inadvertent activation of USE_CUDA
2020-08-01 12:31:39 +02:00
Martin Kroeker
da9e2a7ada
Add SYMBOLPREFIX and/or SYMBOLSUFFIX to cblas prototypes
2020-07-31 16:03:33 +02:00
Martin Kroeker
c88cbc5e0d
Merge pull request #2752 from kadler/cpuid_aix
...
Use systemcfg APIs for CPU detection on AIX
2020-07-31 12:52:24 +02:00
Kevin Adler
589c74aed3
Use systemcfg APIs for CPU detection on AIX
...
AIX libc already provides ready access to an integer that contains a bit
identifying the CPU it's running on, so there's no need to call a
program and grep its output. Additionally, prtconf is not available in
the PASE runtime, which provides an AIX emulation layer on the IBM i
operating system.
The AIX systemcfg.h also provides macro definitions like POWER_8,
POWER_9, etc. for all the bits defining the CPUs, as well as macros like
__power_8(), __power_9_andup() that return booleans, but I did not use
them. Because these macros depend on the level of the OS on which the
library is built, they may not be defined, so the associated hex literals
are used directly instead.
2020-07-30 21:06:40 -05:00
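The approach described in the commit above can be sketched as a bit test on a single integer. This is an illustration only: the function and enum names are hypothetical, and the hex values are assumed to match the POWER_7, POWER_8 and POWER_9 definitions in the AIX systemcfg.h, so they should be checked against that header.

```c
/* Assumed bit values (check against AIX <sys/systemcfg.h>). */
#define IMPL_POWER7 0x08000u
#define IMPL_POWER8 0x10000u
#define IMPL_POWER9 0x20000u

/* Hypothetical result type for this sketch. */
enum cpu_kind { CPU_UNKNOWN, CPU_POWER7, CPU_POWER8, CPU_POWER9 };

/* Classify the CPU from the implementation bit mask.
 * The value is taken as unsigned so that every bit can be tested safely. */
static enum cpu_kind classify_implementation(unsigned impl)
{
    if (impl & IMPL_POWER9) return CPU_POWER9;
    if (impl & IMPL_POWER8) return CPU_POWER8;
    if (impl & IMPL_POWER7) return CPU_POWER7;
    return CPU_UNKNOWN;
}
```

On AIX the argument would come from `_system_configuration.implementation`. Taking it as a parameter keeps the sketch compilable on other systems and avoids depending on macros that an older OS level may not define.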
Martin Kroeker
104aa678b0
Fix inadvertent version number reversion to 0.3.9.dev caused by #2710
2020-07-30 11:40:52 +02:00
Martin Kroeker
c6b48e0394
Merge pull request #2749 from martin-frbg/make_ppc
...
Reorganize OpenMP build options for POWER and allow compiling for POWER9 with old gcc
2020-07-30 11:35:53 +02:00
Martin Kroeker
4927251298
Merge pull request #2750 from RajalakshmiSR/dgemv_p10
...
dgemv optimization for POWER10
2020-07-30 10:13:19 +02:00
Rajalakshmi Srinivasaraghavan
f77b6a83f4
dgemv optimization for POWER10
...
Making use of new vector pair POWER10 instructions in dgemv_n and dgemv_t.
Also adding a new block 4x128 to make use of Matrix-Multiply Assist (MMA)
feature introduced in POWER ISA v3.1. Tested on simulator and there
are no new test failures.
2020-07-29 18:59:32 -05:00
Martin Kroeker
39724e8128
Separate OpenMP handling and allow compilation of Power9 code with older gcc
2020-07-30 01:14:08 +02:00
Martin Kroeker
525db5401c
Merge pull request #74 from xianyi/develop
...
rebase
2020-07-30 01:04:09 +02:00
Martin Kroeker
cb097beba2
Merge pull request #2741 from martin-frbg/issue2739
...
Adjust A53 SGEMM parameters to reflect recent switch to 8x8 kernel
2020-07-29 10:01:14 +02:00
Martin Kroeker
7c02f4b1f7
Merge pull request #2744 from martin-frbg/issue2738
...
Add AMD Renoir/Matisse cpu autodetection and preliminary support for Zen3
2020-07-28 19:32:04 +02:00
Martin Kroeker
383262035d
Merge pull request #2740 from RajalakshmiSR/clang-power
...
Fix compilation issues with clang on POWER
2020-07-28 18:15:25 +02:00
Martin Kroeker
5fa581c87e
Put hint to use git develop rather than master branch in README
2020-07-28 14:22:41 +00:00
Martin Kroeker
12918358aa
Add AMD Renoir/Matisse and preliminary support for Zen3 as Zen2
...
also support AMD family 22 Jaguar/Puma as Bobcat
2020-07-28 13:53:17 +00:00
Martin Kroeker
200f5c44cc
Add AMD Renoir models and preliminary support for ZEN3 as ZEN2
...
also remap erroneous family 16 entry to BOBCAT and reclaim erroneous family 25 "Barcelona" for Zen3
2020-07-28 13:45:23 +00:00
Martin Kroeker
64e2e4aaf3
missing braces
2020-07-27 20:19:22 +00:00
Martin Kroeker
921ec4e9e2
Adjust A53 SGEMM parameters to reflect move to 8x8 kernel
2020-07-27 19:54:46 +00:00
Rajalakshmi Srinivasaraghavan
d557584b71
Fix compilation issues with clang on POWER
...
Remove the -malign-power option, since gcc defaults to it. Also
add -fno-integrated-as so that the GNU assembler is used for the powerpc
assembly optimization files. Fix other compilation errors
reported in the dgemv_t.c file.
2020-07-27 14:11:07 -05:00
Martin Kroeker
a4ceb1ade9
Merge pull request #2737 from ashwinyes/add_thunderx3_target
...
ARM64: Add THUNDERX3T110 Target
2020-07-27 15:19:47 +02:00
Ashwin Sekhar T K
4e1be0e481
ARM64: Add THUNDERX3T110 Target
2020-07-26 23:32:24 -07:00
Martin Kroeker
49b83e00b7
Merge pull request #2735 from martin-frbg/move_potrf
...
Move potrf_parallel.c from lapack/getrf to lapack/potrf where it belongs
2020-07-26 19:54:11 +02:00
Martin Kroeker
769ed9ffad
Merge pull request #2734 from RajalakshmiSR/p10_fix
...
Fix to store results in correct order for POWER10 GEMM kernels
2020-07-25 09:02:32 +02:00
Martin Kroeker
f194ad59e1
Use _Atomic instead of volatile where available (file moved from ../getrf)
...
must have misplaced this in ../getrf when I made that change in March 2018 (40160ff)
the only changes since then were
RFC : Add half precision gemm for bfloat16 in OpenBLAS
Rajalakshmi Srinivasaraghavan committed on 14 Apr 2020 as 7ebbb50
Change _STDC_VERSION__ to __STDC_VERSION__
Zhiyong Dang committed on 11 May 2018 as 3716267
2020-07-25 08:52:24 +02:00
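A minimal sketch of the "_Atomic instead of volatile where available" idea from the commit above, assuming the qualifier is selected through a C11 version check. The macro and type names (`SYNC_QUALIFIER`, `job_flag_t`) are hypothetical and not taken from potrf_parallel.c. The check spells `__STDC_VERSION__` with two leading underscores, which is the misspelling the earlier fix mentioned above corrected.

```c
/* Pick _Atomic on C11 compilers that provide atomics, otherwise
 * fall back to volatile as older code did. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_ATOMICS__)
#define SYNC_QUALIFIER _Atomic
#else
#define SYNC_QUALIFIER volatile
#endif

/* Hypothetical per-thread flag shared between worker threads. */
typedef struct {
    SYNC_QUALIFIER long working;
} job_flag_t;

/* With _Atomic, plain assignment and plain reads are sequentially
 * consistent atomic operations; with volatile they are ordinary
 * accesses that the compiler merely may not elide. */
static void job_publish(job_flag_t *job, long value)
{
    job->working = value;
}

static long job_peek(job_flag_t *job)
{
    return job->working;
}
```

The fallback keeps older compilers building, but only the `_Atomic` branch gives ordering guarantees between threads.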
Martin Kroeker
4fda217f99
Delete potrf_parallel.c (moving it to ../potrf)
2020-07-25 06:42:39 +00:00
Rajalakshmi Srinivasaraghavan
9be2688c78
Fix to store results in correct order for POWER10 GEMM kernels
...
There is a recent compiler change in __builtin_mma_disassemble_acc() which
affects the order of storing result in POWER10. Also removing new LDFLAG
-mno-power10-stub as it is handled by linker automatically.
2020-07-24 23:08:11 -05:00
Martin Kroeker
6a2a60038c
Merge pull request #2720 from martin-frbg/issue2694
...
WIP Further fixes for 32bit POWER8
2020-07-24 23:19:45 +02:00
Martin Kroeker
251a09ec90
Typo fix
2020-07-24 16:04:58 +00:00