Commit Graph

6519 Commits

Author SHA1 Message Date
Mayank Raj
a9939111d7 Update dgemv_thread_safety.cpp 2022-07-24 11:51:25 +05:30
Martin Kroeker
c43ec53bdd Merge pull request #3690 from RajalakshmiSR/cdotp10
POWER: Fix complex dot function failures
2022-07-19 13:59:16 +02:00
Martin Kroeker
b7c65d08cb Merge pull request #3689 from RajalakshmiSR/dgemvgcc10
POWER10: dgemv builtin rename
2022-07-19 10:25:01 +02:00
Martin Kroeker
fcbbd8c25c Merge pull request #3682 from XiWeiGu/develop
Fix dnrm2_tiny testcase failure
2022-07-19 10:24:28 +02:00
Rajalakshmi Srinivasaraghavan
a612e78a97 POWER: Fix complex dot function failures
There are some test failures in complex dot functions when compiling with gcc12.
The machine constraints used now do not update all the four elements in the
expected result array. Fixing this with a reduced level of optimization.
This is not changing any performance numbers but will be converted to C code in future.
2022-07-18 14:48:43 -05:00
Rajalakshmi Srinivasaraghavan
432fd99445 POWER10: dgemv builtin rename
Add check to use correct builtin name for older versions
of gcc10 compilers.
2022-07-18 09:48:01 -05:00
gxw
4dd05e526b LoongArch64: Fix dnrm2_tiny testcase failure 2022-07-15 11:18:59 +08:00
Martin Kroeker
7da799dc66 Merge pull request #3686 from martin-frbg/issue3685
Fix Fortran-less CTEST build option
2022-07-13 08:24:15 +02:00
Martin Kroeker
6e018b84c4 Fix function prototypes and INTERFACE64 support 2022-07-12 19:37:30 +02:00
Martin Kroeker
ccd87cc472 Fix switching between Fortran and C build 2022-07-12 19:35:31 +02:00
gxw
cce4b1d956 MIPS64: Fix dnrm2_tiny testcase failure 2022-07-11 19:18:38 +08:00
Martin Kroeker
7918ba11c2 Merge pull request #3680 from martin-frbg/issue3636-2
Guard against sysconf(_SC_NPROCESSORS_CONF) returning zero at runtime
2022-07-07 11:38:24 +02:00
Martin Kroeker
69148ae795 Guard against sysconf returning zero processors 2022-07-06 17:22:18 +02:00
Martin Kroeker
e9260f5451 Guard against system call returning zero processors 2022-07-06 17:21:10 +02:00
Martin Kroeker
4cfd6f110a Merge pull request #3678 from martin-frbg/issue3677
Eliminate uses of CREAL on left-hand side of assignments
2022-07-05 10:40:32 +02:00
Martin Kroeker
e12d474780 Eliminate uses of CREAL on left-hand side of assignments 2022-07-05 00:01:09 +02:00
Martin Kroeker
686e6d7c10 Merge pull request #3676 from martin-frbg/dnrm2-utest
Add DNRM2 regression test for issues 2998 and 3654
2022-07-04 08:37:18 +02:00
Martin Kroeker
c5041ae270 properly embed test_dnrm2 2022-07-03 23:48:30 +02:00
Martin Kroeker
8e6f719ad3 use huge_val not huge_valf for portability 2022-07-03 20:19:24 +02:00
Martin Kroeker
af88494f87 old systems may not have inf in math.h 2022-07-03 18:23:51 +02:00
Martin Kroeker
ee41b6eb24 Add DNRM2 regression test for issues 2998 and 3654 2022-07-03 17:56:49 +02:00
Martin Kroeker
bf8998a9f4 Merge pull request #3675 from martin-frbg/issue3654
workaround ThunderX2 DNRM2 fault with ssq=inf,scale=0
2022-07-03 08:45:45 +02:00
Martin Kroeker
9e29598575 workaround fault with ssq=inf,scale=0 2022-07-02 23:47:17 +02:00
Martin Kroeker
3df3d622eb Merge pull request #3672 from imzhuhl/neoversen2_bf16
sbgemm support for ARM Neoverse N2
2022-07-01 12:13:42 +02:00
Martin Kroeker
407a1a242c Merge pull request #3670 from martin-frbg/osxvermin
Increase MACOSX_DEPLOYMENT_TARGET to 11 on ARM macs
2022-06-29 08:31:04 +02:00
Honglin Zhu
ec0d5c7a2a Add gfortran parameters 2022-06-29 10:17:05 +08:00
Honglin Zhu
123e0dfb62 Neoverse N2 sbgemm:
1. Modify the algorithm to resolve multithreading failures
2. No memory allocation in sbgemm kernel
3. Optimize when alpha == 1.0f
2022-06-29 10:14:21 +08:00
Honglin Zhu
bc3728475f format code 2022-06-29 10:14:21 +08:00
Honglin Zhu
55d686d41e neoverse n2 sbgemm:
implement ncopy tcopy kernel_8x4
2022-06-29 10:14:21 +08:00
Honglin Zhu
04593bb27c neoverse n2 sbgemm: init file 2022-06-29 10:14:21 +08:00
Martin Kroeker
1fb4259077 Merge pull request #3673 from martin-frbg/azuredynmingw
AzureCI: drop cpus from the DYNAMIC_LIST for Windows/mingw to save time
2022-06-28 23:13:11 +02:00
Martin Kroeker
47a0e53196 mingw-dynamic arch: drop Haswell too 2022-06-28 21:40:04 +02:00
Martin Kroeker
c7b3ce010e drop NEHALEM from the DYNLIST for Windows/mingw to save time 2022-06-28 20:12:11 +02:00
Martin Kroeker
be5500e704 Merge pull request #3669 from VFerrari/fix_small_matrix_kernel
POWER: fix issues with the small matrix kernel
2022-06-28 16:09:36 +02:00
Martin Kroeker
92275a7902 Merge pull request #3642 from nursik/develop
Add ARM64 support for Windows
2022-06-28 16:05:11 +02:00
Martin Kroeker
914c4d0fe8 Add C versions of the CBLAS test sources (#3656)
* Add C conversions of the CBLAS tests for NOFORTRAN=1 builds

* Enable CTEST without Fortran and fix passing of BUILD_vartype options to exports/gensymbol
2022-06-28 11:52:48 +02:00
Martin Kroeker
2857987ff6 Increase MACOSX_DEPLOYMENT_TARGET to 11 on ARM macs 2022-06-28 11:46:25 +02:00
VFerrari
2062280c6f Power: Enable SMALL_MATRIX OPT as default for dynamic arch 2022-06-25 03:47:03 -03:00
VFerrari
cac634fce3 POWER10: Fix multithreading check when USE_THREAD=0
This patch fixes an issue when OpenBLAS is compiled for TARGET=POWER10
and the flag USE_THREAD is set to 0.

The function `num_cpu_avail` is only available when USE_THREAD=1,
so SMP is defined.
2022-06-25 03:46:46 -03:00
Martin Kroeker
9283c7c0b5 Merge pull request #3655 from RajalakshmiSR/zgemmasmp10
POWER10: Fix ZGEMM testcase failures
2022-06-18 20:52:26 +02:00
Martin Kroeker
9777c59d98 Merge pull request #3653 from RajalakshmiSR/dgemvp10
POWER10: convert dgemv inline assembly
2022-06-18 20:51:59 +02:00
Rajalakshmi Srinivasaraghavan
f191bc652b POWER10: Fix ZGEMM testcase failures
This patch fixes storing and restoring non volatile registers
in zgemm POWER10 kernel.
2022-06-17 08:18:08 -05:00
Martin Kroeker
7060ca5002 Merge pull request #3647 from martin-frbg/exports_3.10.0
Amend gensymbol with some LAPACK 3.10.0 additions
2022-06-10 08:58:00 +02:00
Martin Kroeker
72ea19d187 Amend some LAPACK 3.10.0 additions 2022-06-09 19:31:08 +02:00
Nursultan Zarlyk
1dfc4e6150 Replace with ARM64 intrinsics 2022-06-09 18:49:49 +02:00
Rajalakshmi Srinivasaraghavan
8419d538ff POWER10: convert dgemv inline assembly
This patch makes use of compiler builtins and matches with assembly
performance. Tested with clang14 and gcc12.
2022-06-09 10:42:57 -05:00
Martin Kroeker
bfd9c1b58c Merge pull request #3645 from martin-frbg/issue3644
Fix quotes around compiler args in C11 check
2022-06-08 19:29:07 +02:00
Martin Kroeker
79d98327e4 Fix quotes around compiler args in C11 check 2022-06-08 11:22:20 +02:00
Martin Kroeker
eb1faada19 Merge pull request #3643 from martin-frbg/fixgensymbol
Fix LAPACK path in new gensymbol script
2022-06-08 11:18:46 +02:00
Xianyi Zhang
5e9a912591 Merge branch 'develop' into risc-v 2022-06-06 14:12:09 +08:00