Commit Graph

6505 Commits

Author SHA1 Message Date
Martin Kroeker
4cfd6f110a Merge pull request #3678 from martin-frbg/issue3677
Eliminate uses of CREAL on left-hand side of assignments
2022-07-05 10:40:32 +02:00
Martin Kroeker
e12d474780 Eliminate uses of CREAL on left-hand side of assignments 2022-07-05 00:01:09 +02:00
Martin Kroeker
686e6d7c10 Merge pull request #3676 from martin-frbg/dnrm2-utest
Add DNRM2 regression test for issues 2998 and 3654
2022-07-04 08:37:18 +02:00
Martin Kroeker
c5041ae270 properly embed test_dnrm2 2022-07-03 23:48:30 +02:00
Martin Kroeker
8e6f719ad3 use huge_val not huge_valf for portability 2022-07-03 20:19:24 +02:00
Martin Kroeker
af88494f87 old systems may not have inf in math.h 2022-07-03 18:23:51 +02:00
Martin Kroeker
ee41b6eb24 Add DNRM2 regression test for issues 2998 and 3654 2022-07-03 17:56:49 +02:00
Martin Kroeker
bf8998a9f4 Merge pull request #3675 from martin-frbg/issue3654
workaround ThunderX2 DNRM2 fault with ssq=inf,scale=0
2022-07-03 08:45:45 +02:00
Martin Kroeker
9e29598575 workaround fault with ssq=inf,scale=0 2022-07-02 23:47:17 +02:00
Martin Kroeker
3df3d622eb Merge pull request #3672 from imzhuhl/neoversen2_bf16
sbgemm support for ARM Neoverse N2
2022-07-01 12:13:42 +02:00
Martin Kroeker
407a1a242c Merge pull request #3670 from martin-frbg/osxvermin
Increase MACOSX_DEPLOYMENT_TARGET to 11 on ARM macs
2022-06-29 08:31:04 +02:00
Honglin Zhu
ec0d5c7a2a Add gfortran parameters 2022-06-29 10:17:05 +08:00
Honglin Zhu
123e0dfb62 Neoverse N2 sbgemm:
1. Modify the algorithm to resolve multithreading failures
2. No memory allocation in sbgemm kernel
3. Optimize when alpha == 1.0f
2022-06-29 10:14:21 +08:00
Honglin Zhu
bc3728475f format code 2022-06-29 10:14:21 +08:00
Honglin Zhu
55d686d41e neoverse n2 sbgemm:
implement ncopy tcopy kernel_8x4
2022-06-29 10:14:21 +08:00
Honglin Zhu
04593bb27c neoverse n2 sbgemm: init file 2022-06-29 10:14:21 +08:00
Martin Kroeker
1fb4259077 Merge pull request #3673 from martin-frbg/azuredynmingw
AzureCI: drop cpus from the DYNAMIC_LIST for Windows/mingw to save time
2022-06-28 23:13:11 +02:00
Martin Kroeker
47a0e53196 mingw-dynamic arch: drop Haswell too 2022-06-28 21:40:04 +02:00
Martin Kroeker
c7b3ce010e drop NEHALEM from the DYNLIST for Windows/mingw to save time 2022-06-28 20:12:11 +02:00
Martin Kroeker
be5500e704 Merge pull request #3669 from VFerrari/fix_small_matrix_kernel
POWER: fix issues with the small matrix kernel
2022-06-28 16:09:36 +02:00
Martin Kroeker
92275a7902 Merge pull request #3642 from nursik/develop
Add ARM64 support for Windows
2022-06-28 16:05:11 +02:00
Martin Kroeker
914c4d0fe8 Add C versions of the CBLAS test sources (#3656)
* Add C conversions of the CBLAS tests for NOFORTRAN=1 builds

* Enable CTEST without Fortran and fix passing of BUILD_vartype options to exports/gensymbol
2022-06-28 11:52:48 +02:00
Martin Kroeker
2857987ff6 Increase MACOSX_DEPLOYMENT_TARGET to 11 on ARM macs 2022-06-28 11:46:25 +02:00
VFerrari
2062280c6f Power: Enable SMALL_MATRIX OPT as default for dynamic arch 2022-06-25 03:47:03 -03:00
VFerrari
cac634fce3 POWER10: Fix multithreading check when USE_THREAD=0
This patch fixes an issue when OpenBLAS is compiled for TARGET=POWER10
and the flag USE_THREAD is set to 0.

The function `num_cpu_avail` is only available when USE_THREAD=1,
that is, when SMP is defined.
2022-06-25 03:46:46 -03:00
Martin Kroeker
9283c7c0b5 Merge pull request #3655 from RajalakshmiSR/zgemmasmp10
POWER10: Fix ZGEMM testcase failures
2022-06-18 20:52:26 +02:00
Martin Kroeker
9777c59d98 Merge pull request #3653 from RajalakshmiSR/dgemvp10
POWER10: convert dgemv inline assembly
2022-06-18 20:51:59 +02:00
Rajalakshmi Srinivasaraghavan
f191bc652b POWER10: Fix ZGEMM testcase failures
This patch fixes the storing and restoring of non-volatile registers
in the zgemm POWER10 kernel.
2022-06-17 08:18:08 -05:00
Martin Kroeker
7060ca5002 Merge pull request #3647 from martin-frbg/exports_3.10.0
Amend gensymbol with some LAPACK 3.10.0 additions
2022-06-10 08:58:00 +02:00
Martin Kroeker
72ea19d187 Amend some LAPACK 3.10.0 additions 2022-06-09 19:31:08 +02:00
Nursultan Zarlyk
1dfc4e6150 Replace with ARM64 intrinsics 2022-06-09 18:49:49 +02:00
Rajalakshmi Srinivasaraghavan
8419d538ff POWER10: convert dgemv inline assembly
This patch makes use of compiler builtins and matches the performance
of the assembly version. Tested with clang14 and gcc12.
2022-06-09 10:42:57 -05:00
Martin Kroeker
bfd9c1b58c Merge pull request #3645 from martin-frbg/issue3644
Fix quotes around compiler args in C11 check
2022-06-08 19:29:07 +02:00
Martin Kroeker
79d98327e4 Fix quotes around compiler args in C11 check 2022-06-08 11:22:20 +02:00
Martin Kroeker
eb1faada19 Merge pull request #3643 from martin-frbg/fixgensymbol
Fix LAPACK path in new gensymbol script
2022-06-08 11:18:46 +02:00
Xianyi Zhang
5e9a912591 Merge branch 'develop' into risc-v 2022-06-06 14:12:09 +08:00
Xianyi Zhang
f9715605ac Add PLCT to contributors. 2022-06-06 14:11:28 +08:00
Xianyi Zhang
3f88429bcf Merge branch 'risc-v_fix_intrinsic' into risc-v 2022-06-06 13:56:05 +08:00
Xianyi Zhang
968e1f51d8 Update RISC-V Intrinsic API. 2022-06-06 13:52:21 +08:00
Martin Kroeker
e9c3535208 Fix LAPACK path in new gensymbol script 2022-06-05 23:28:12 +02:00
Martin Kroeker
f150c97ceb Merge pull request #3641 from RajalakshmiSR/ppc_build
power10: Fix build issues due to perl scripts conversion
2022-06-05 11:23:29 +02:00
Nursultan Zarlyk
1bb7993a97 Fix MSVC ARM64 build. Add generic kernel for ARM64 2022-06-02 16:53:54 +02:00
Rajalakshmi Srinivasaraghavan
c98d63b637 power10: Fix build issues due to perl scripts conversion
Due to the recent Perl script conversion, there are some build
errors when compiling OpenBLAS with Advance Toolchain compilers.
2022-06-02 08:11:10 -05:00
Martin Kroeker
28a24a4d4f Merge pull request #3637 from martin-frbg/issue3636
Add fallback value for bogus sc_nprocessors_conf in getarch
2022-05-27 10:23:02 +02:00
Martin Kroeker
14ae22bf7a Add fallback value for bogus sc_nprocessors_conf 2022-05-27 00:29:17 +02:00
Martin Kroeker
771dc6a8d8 Merge pull request #3635 from martin-frbg/issue3634
Support compilation with the Intel ifx compiler
2022-05-26 11:57:53 +02:00
Martin Kroeker
19413624d0 Add Intel ifx compiler 2022-05-26 09:31:49 +02:00
Martin Kroeker
f56e4b620f Merge pull request #3633 from martin-frbg/perl_fallback
Add back original PERL-based build scripts and add option USE_PERL
2022-05-22 21:18:44 +02:00
Martin Kroeker
5cb0d23027 Support USE_PERL fallback for gensymbol 2022-05-22 18:36:24 +02:00
Martin Kroeker
f5a379bf77 Add USE_PERL fallback option for gensymbol script 2022-05-22 18:35:23 +02:00