Commit Graph

1899 Commits

Author SHA1 Message Date
Jiaxun Yang
50c4eeb97d alpha: Remove include of version.h
It will be defined by preprocessor argument.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-08-11 15:02:58 +01:00
Ivan Pribec
802e71bf05 Add const attribute to lsame 2022-08-08 15:15:52 +02:00
gxw
fbfe1daf6e LoongArch64: Add DYNAMIC_ARCH support 2022-07-28 14:28:45 +08:00
Martin Kroeker
cd8e57040c Merge pull request #3691 from martin-frbg/issue3679-sparc
SPARC: fix DNRM2 returning INF instead of zero due to intermediate overflow
2022-07-25 15:41:15 +02:00
Martin Kroeker
6c118b7977 Fix DNRM2 returning INF instead of zero due to intermediate overflow 2022-07-24 17:42:31 +02:00
Martin Kroeker
c43ec53bdd Merge pull request #3690 from RajalakshmiSR/cdotp10
POWER: Fix complex dot function failures
2022-07-19 13:59:16 +02:00
Martin Kroeker
b7c65d08cb Merge pull request #3689 from RajalakshmiSR/dgemvgcc10
POWER10: dgemv builtin rename
2022-07-19 10:25:01 +02:00
Martin Kroeker
06ef015234 fix DNRM2 returning INF instead of zero due to intermediate overflow 2022-07-19 10:19:27 +02:00
Rajalakshmi Srinivasaraghavan
a612e78a97 POWER: Fix complex dot function failures
There are some test failures in complex dot functions when compiling with gcc12.
The machine constraints used now do not update all the four elements in the
expected result array. Fixing this with a reduced level of optimization.
This is not changing any performance numbers but will be converted to C code in future.
2022-07-18 14:48:43 -05:00
Rajalakshmi Srinivasaraghavan
432fd99445 POWER10: dgemv builtin rename
Add check to use correct builtin name for older versions
of gcc10 compilers.
2022-07-18 09:48:01 -05:00
gxw
4dd05e526b LoongArch64: Fix dnrm2_tiny testcase failure 2022-07-15 11:18:59 +08:00
gxw
cce4b1d956 MIPS64: Fix dnrm2_tiny testcase failure 2022-07-11 19:18:38 +08:00
Martin Kroeker
e12d474780 Eliminate uses of CREAL on left-hand side of assignments 2022-07-05 00:01:09 +02:00
Martin Kroeker
9e29598575 workaround fault with ssq=inf,scale=0 2022-07-02 23:47:17 +02:00
Honglin Zhu
123e0dfb62 Neoverse N2 sbgemm:
1. Modify the algorithm to resolve multithreading failures
    2. No memory allocation in sbgemm kernel
    3. Optimize when alpha == 1.0f
2022-06-29 10:14:21 +08:00
Honglin Zhu
bc3728475f format code 2022-06-29 10:14:21 +08:00
Honglin Zhu
55d686d41e neoverse n2 sbgemm:
implement ncopy tcopy kernel_8x4
2022-06-29 10:14:21 +08:00
Honglin Zhu
04593bb27c neoverse n2 sbgemm: init file 2022-06-29 10:14:21 +08:00
Martin Kroeker
be5500e704 Merge pull request #3669 from VFerrari/fix_small_matrix_kernel
POWER: fix issues with the small matrix kernel
2022-06-28 16:09:36 +02:00
Martin Kroeker
92275a7902 Merge pull request #3642 from nursik/develop
Add ARM64 support for Windows
2022-06-28 16:05:11 +02:00
VFerrari
cac634fce3 POWER10: Fix multithreading check when USE_THREAD=0
This patch fixes an issue when OpenBLAS is compiled for TARGET=POWER10
and the flag USE_THREAD is set to 0.

The function `num_cpu_avail` is only available when USE_THREAD=1,
so SMP is defined.
2022-06-25 03:46:46 -03:00
Martin Kroeker
9283c7c0b5 Merge pull request #3655 from RajalakshmiSR/zgemmasmp10
POWER10: Fix ZGEMM testcase failures
2022-06-18 20:52:26 +02:00
Rajalakshmi Srinivasaraghavan
f191bc652b POWER10: Fix ZGEMM testcase failures
This patch fixes storing and restoring non volatile registers
in zgemm POWER10 kernel.
2022-06-17 08:18:08 -05:00
Rajalakshmi Srinivasaraghavan
8419d538ff POWER10: convert dgemv inline assembly
This patch makes use of compiler builtins and matches with assembly
performance. Tested with clang14 and gcc12.
2022-06-09 10:42:57 -05:00
Xianyi Zhang
5e9a912591 Merge branch 'develop' into risc-v 2022-06-06 14:12:09 +08:00
Xianyi Zhang
968e1f51d8 Update RISC-V Intrinsic API. 2022-06-06 13:52:21 +08:00
Nursultan Zarlyk
1bb7993a97 Fix MSVC ARM64 build. Add generic kernel for ARM64 2022-06-02 16:53:54 +02:00
Martin Kroeker
dc49edd4e6 Revert "roll back DGEMM kernel ... for DYNAMIC_ARCH" 2022-05-20 11:23:30 +02:00
Rajalakshmi Srinivasaraghavan
b62173c5a0 POWER10: Changing store instructions for Level1 functions
This patch changes 32 bytes stores to two 16 bytes stores
to fix a recent degradation due to 32 bytes stores.
2022-05-12 11:17:33 -05:00
Martin Kroeker
84cb58b7fb Fix generator rules for ?laswp_ncopy and ?neg_tcopy 2022-04-30 15:28:38 +02:00
Martin Kroeker
05dcfa176e fix undefined prefetchsizes 2022-04-16 10:04:27 +02:00
Martin Kroeker
2bbb9f05c7 fix undefined prefetchsize 2022-04-16 10:00:10 +02:00
Martin Kroeker
115bc9b98f CortexX1 is ARMV8 like A7x 2022-03-28 17:28:29 +02:00
Martin Kroeker
b3b4672c30 Add initial support for Phytium FT2000 series and ARMV9 Cortex 510/710/X1/X2 2022-03-27 15:29:20 +02:00
Martin Kroeker
40302558ed Remove extraneous (and wrong) definition of sbgemm_r on x86_64 2022-03-23 20:05:32 +01:00
Caroline Newcombe
5cc1111383 fix unsafe read of Y in assembly kernel 2022-03-11 11:56:33 -06:00
Xianyi Zhang
45786b05da Merge branch 'develop' into risc-v 2022-02-28 11:48:02 +08:00
Wangyang Guo
225683218c Small Matrix: use proper inline asm input constraint for AVX512 mask 2022-02-28 03:22:31 +00:00
Martin Kroeker
9c626e466e really fix definition of SHUFFLE_MAGIC_NO 2022-02-25 15:36:02 +01:00
Martin Kroeker
0698212c8c Remove stray $ 2022-02-25 15:33:02 +01:00
Martin Kroeker
9d7429406f Declare SHUFFLE_MAGIC_NO as const to placate clang 2022-02-25 10:05:36 +01:00
Martin Kroeker
d9894f45d3 Define sbgemm_r to fix DYNAMIC_ARCH builds 2022-02-25 10:04:00 +01:00
Martin Kroeker
522f809825 Merge pull request #3542 from martin-frbg/issue3540
Fix compilation for CooperLake on Windows/clang
2022-02-24 00:00:00 +01:00
Mosè Giordano
abbc947edb Fix compilation of Skylake AVX512 kernels with GCC 6 2022-02-23 22:51:59 +00:00
Martin Kroeker
c62f8e2c01 Prevent compiler attempts to use k0 as mask register 2022-02-23 20:12:20 +01:00
Martin Kroeker
80eb581c83 Fix non-portable u_int64_t 2022-02-23 20:10:59 +01:00
Martin Kroeker
73ffabe6ba Guard uses of _mm512_reduce_add_p? 2022-02-23 20:06:14 +01:00
Martin Kroeker
7656aba00e Merge pull request #3493 from martin-frbg/casts+cleanup
WIP casts and cleanups
2022-02-06 23:55:06 +01:00
Martin Kroeker
addc2a7aaa Add proper defaults for IMIN/IMAX 2022-01-27 19:56:32 +01:00
Martin Kroeker
299d4d70a3 Add default KERNEL file for Elbrus E2K arch 2022-01-22 18:59:36 +01:00