Commit Graph

1957 Commits

Author SHA1 Message Date
Martin Kroeker b1d69fb3ac
Add MIPS64_GENERIC as a copy of GENERIC 2022-09-17 23:52:32 +02:00
gxw edea1bcfaf MIPS64: Fixed failed utest dsdot:dsdot_n_1 when TARGET=I6500 2022-09-17 16:43:22 +08:00
Martin Kroeker 101a2c77c3
Fix warnings 2022-09-15 09:19:19 +02:00
Martin Kroeker 23d59baaf1
Add -mfma to -mavx2 for Apple clang, and set AVX2 options for Zen as well 2022-09-13 22:39:27 +02:00
gxw 365936ae1b MIPS64: Using the macro MTC rather than MTC1 2022-09-13 16:39:40 +08:00
Martin Kroeker 739c3c44a7
Work around windows/osx gcc12 x86_64 tree-optimizer problem and add an osx/gcc12 build to Azure CI (#3745)
Add pragma to disable the gcc tree-optimizer for some x86_64 S and Z kernels with gcc12 on OSX or Windows
2022-09-03 15:01:22 +02:00
Martin Kroeker bd30120ba7
Merge pull request #3720 from FlyGoat/mips64
Make it work on general MIPS64 processors
2022-08-19 20:24:27 +02:00
Jiaxun Yang a50b29c540 Provide a fallback MIPS64_GENERIC target
It is really dangerous to fallback to Loongson core on other
MIPS64 processors.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-08-12 13:13:28 +01:00
Jiaxun Yang 50c4eeb97d alpha: Remove include of version.h
It will be defined by preprocessor argument.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-08-11 15:02:58 +01:00
Ivan Pribec 802e71bf05 Add const attribute to lsame 2022-08-08 15:15:52 +02:00
gxw fbfe1daf6e LoongArch64: Add DYNAMIC_ARCH support 2022-07-28 14:28:45 +08:00
Martin Kroeker cd8e57040c
Merge pull request #3691 from martin-frbg/issue3679-sparc
SPARC: fix DNRM2 returning INF instead of zero due to intermediate overflow
2022-07-25 15:41:15 +02:00
Martin Kroeker 6c118b7977
Fix DNRM2 returning INF instead of zero due to intermediate overflow 2022-07-24 17:42:31 +02:00
Martin Kroeker c43ec53bdd
Merge pull request #3690 from RajalakshmiSR/cdotp10
POWER: Fix complex dot function failures
2022-07-19 13:59:16 +02:00
Martin Kroeker b7c65d08cb
Merge pull request #3689 from RajalakshmiSR/dgemvgcc10
POWER10: dgemv builtin rename
2022-07-19 10:25:01 +02:00
Martin Kroeker 06ef015234
fix DNRM2 returning INF instead of zero due to intermediate overflow 2022-07-19 10:19:27 +02:00
Rajalakshmi Srinivasaraghavan a612e78a97 POWER: Fix complex dot function failures
There are some test failures in complex dot functions when compiling with gcc12.
The machine constraints used now do not update all the four elements in the
expected result array. Fixing this with a reduced level of optimization.
This is not changing any performance numbers but will be converted to C code in future.
2022-07-18 14:48:43 -05:00
Rajalakshmi Srinivasaraghavan 432fd99445 POWER10: dgemv builtin rename
Add check to use correct builtin name for older versions
of gcc10 compilers.
2022-07-18 09:48:01 -05:00
gxw 4dd05e526b LoongArch64: Fix dnrm2_tiny testcase failure 2022-07-15 11:18:59 +08:00
gxw cce4b1d956 MIPS64: Fix dnrm2_tiny testcase failure 2022-07-11 19:18:38 +08:00
Martin Kroeker e12d474780
Eliminate uses of CREAL on left-hand side of assignments 2022-07-05 00:01:09 +02:00
Martin Kroeker 9e29598575
workaround fault with ssq=inf,scale=0 2022-07-02 23:47:17 +02:00
Honglin Zhu 123e0dfb62 Neoverse N2 sbgemm:
1. Modify the algorithm to resolve multithreading failures
    2. No memory allocation in sbgemm kernel
    3. Optimize when alpha == 1.0f
2022-06-29 10:14:21 +08:00
Honglin Zhu bc3728475f format code 2022-06-29 10:14:21 +08:00
Honglin Zhu 55d686d41e neoverse n2 sbgemm:
implement ncopy tcopy kernel_8x4
2022-06-29 10:14:21 +08:00
Honglin Zhu 04593bb27c neoverse n2 sbgemm: init file 2022-06-29 10:14:21 +08:00
Martin Kroeker be5500e704
Merge pull request #3669 from VFerrari/fix_small_matrix_kernel
POWER: fix issues with the small matrix kernel
2022-06-28 16:09:36 +02:00
Martin Kroeker 92275a7902
Merge pull request #3642 from nursik/develop
Add ARM64 support for Windows
2022-06-28 16:05:11 +02:00
VFerrari cac634fce3
POWER10: Fix multithreading check when USE_THREAD=0
This patch fixes an issue when OpenBLAS is compiled for TARGET=POWER10
and the flag USE_THREAD is set to 0.

The function `num_cpu_avail` is only available when USE_THREAD=1,
so SMP is defined.
2022-06-25 03:46:46 -03:00
Martin Kroeker 9283c7c0b5
Merge pull request #3655 from RajalakshmiSR/zgemmasmp10
POWER10: Fix ZGEMM testcase failures
2022-06-18 20:52:26 +02:00
Rajalakshmi Srinivasaraghavan f191bc652b POWER10: Fix ZGEMM testcase failures
This patch fixes storing and restoring non volatile registers
in zgemm POWER10 kernel.
2022-06-17 08:18:08 -05:00
Rajalakshmi Srinivasaraghavan 8419d538ff POWER10: convert dgemv inline assembly
This patch makes use of compiler builtins and matches with assembly
performance. Tested with clang14 and gcc12.
2022-06-09 10:42:57 -05:00
Xianyi Zhang 5e9a912591 Merge branch 'develop' into risc-v 2022-06-06 14:12:09 +08:00
Xianyi Zhang 968e1f51d8 Update RISC-V Intrinsic API. 2022-06-06 13:52:21 +08:00
Nursultan Zarlyk 1bb7993a97 Fix MSVC ARM64 build. Add generic kernel for ARM64 2022-06-02 16:53:54 +02:00
Martin Kroeker dc49edd4e6
Revert "roll back DGEMM kernel ... for DYNAMIC_ARCH" 2022-05-20 11:23:30 +02:00
Rajalakshmi Srinivasaraghavan b62173c5a0 POWER10: Changing store instructions for Level1 functions
This patch changes 32 bytes stores to two 16 bytes stores
to fix a recent degradation due to 32 bytes stores.
2022-05-12 11:17:33 -05:00
Martin Kroeker 84cb58b7fb
Fix generator rules for ?laswp_ncopy and ?neg_tcopy 2022-04-30 15:28:38 +02:00
Martin Kroeker 05dcfa176e
fix undefined prefetchsizes 2022-04-16 10:04:27 +02:00
Martin Kroeker 2bbb9f05c7
fix undefined prefetchsize 2022-04-16 10:00:10 +02:00
Martin Kroeker 115bc9b98f
CortexX1 is ARMV8 like A7x 2022-03-28 17:28:29 +02:00
Martin Kroeker b3b4672c30
Add initial support for Phytium FT2000 series and ARMV9 Cortex 510/710/X1/X2 2022-03-27 15:29:20 +02:00
Martin Kroeker 40302558ed
Remove extraneous (and wrong) definition of sbgemm_r on x86_64 2022-03-23 20:05:32 +01:00
Caroline Newcombe 5cc1111383 fix unsafe read of Y in assembly kernel 2022-03-11 11:56:33 -06:00
Xianyi Zhang 45786b05da Merge branch 'develop' into risc-v 2022-02-28 11:48:02 +08:00
Wangyang Guo 225683218c Small Matrix: use proper inline asm input constraint for AVX512 mask 2022-02-28 03:22:31 +00:00
Martin Kroeker 9c626e466e
really fix definition of SHUFFLE_MAGIC_NO 2022-02-25 15:36:02 +01:00
Martin Kroeker 0698212c8c
Remove stray $ 2022-02-25 15:33:02 +01:00
Martin Kroeker 9d7429406f
Declare SHUFFLE_MAGIC_NO as const to placate clang 2022-02-25 10:05:36 +01:00
Martin Kroeker d9894f45d3
Define sbgemm_r to fix DYNAMIC_ARCH builds 2022-02-25 10:04:00 +01:00
Martin Kroeker 522f809825
Merge pull request #3542 from martin-frbg/issue3540
Fix compilation for CooperLake on Windows/clang
2022-02-24 00:00:00 +01:00
Mosè Giordano abbc947edb Fix compilation of Skylake AVX512 kernels with GCC 6 2022-02-23 22:51:59 +00:00
Martin Kroeker c62f8e2c01
Prevent compiler attempts to use k0 as mask register 2022-02-23 20:12:20 +01:00
Martin Kroeker 80eb581c83
Fix non-portable u_int64_t 2022-02-23 20:10:59 +01:00
Martin Kroeker 73ffabe6ba
Guard uses of _mm512_reduce_add_p? 2022-02-23 20:06:14 +01:00
Martin Kroeker 7656aba00e
Merge pull request #3493 from martin-frbg/casts+cleanup
WIP casts and cleanups
2022-02-06 23:55:06 +01:00
Martin Kroeker addc2a7aaa
Add proper defaults for IMIN/IMAX 2022-01-27 19:56:32 +01:00
Martin Kroeker 299d4d70a3
Add default KERNEL file for Elbrus E2K arch 2022-01-22 18:59:36 +01:00
Martin Kroeker 3492bea602
Create Makefile 2022-01-22 18:57:28 +01:00
Martin Kroeker 898cf5faf3
Add Elbrus e2k architecture support 2022-01-22 18:55:10 +01:00
Martin Kroeker c1c0d5ce1d
Merge pull request #3492 from binebrank/arm_sve_zgemm
SVE zgemm&cgemm (and other BLAS 3 complex)
2022-01-18 21:36:33 +01:00
Bine Brank 19d435b1b3 update armv8sve + contributors 2022-01-18 08:28:31 +01:00
Bine Brank f158d59087 adapt CMake 2022-01-17 22:36:48 +01:00
Bine Brank b6a445cfd8 adapt Makefile for SVE trsm 2022-01-16 21:40:56 +01:00
Bine Brank 0fb6cc07bf fix ztrsm lt/ut copy 2022-01-16 21:39:57 +01:00
Bine Brank f1315288a8 add sve ztrsm 2022-01-15 22:27:25 +01:00
Bine Brank aaa2b1a861 fix sve dtrsm kernels 2022-01-15 21:02:14 +01:00
Bine Brank 8071e179f1 add remaining sve trsm copy kernels 2022-01-11 21:16:38 +01:00
Bine Brank f87468ac91 trsm_lncopy_sve 2022-01-10 21:45:37 +01:00
Bine Brank e8939b3d30 sve trsmRN and trsmRT 2022-01-10 20:42:20 +01:00
Bine Brank 098672b51b add trsm_kernel_LT_sve 2022-01-09 20:11:47 +01:00
Bine Brank be7e55880c sve trsm_kernel_LN 2022-01-09 19:40:04 +01:00
Martin Kroeker b6b024232d
Merge pull request #3508 from snadampal/v1_n2
OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics
2022-01-09 14:50:26 +01:00
Sunita Nadampalli 19c8f615dc OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics 2022-01-07 00:28:17 +00:00
Bine Brank bb33446b40 fix makefile.L3 2022-01-06 10:26:11 +01:00
Bine Brank f33543d029 combine zchemm into single file 2022-01-05 14:42:37 +01:00
Bine Brank 0c91d043ae adapt CMake for SVE 2022-01-05 14:36:39 +01:00
Bine Brank 39ab219704 sve copy functions for cgemm chemm zsymm 2022-01-05 09:12:22 +01:00
Bine Brank 18102ae8c3 add cgemm ctrmm sve kernels 2022-01-05 09:09:18 +01:00
Bine Brank 87537b8c55 modify sve zgemmcopy kernels 2022-01-05 09:07:28 +01:00
Bine Brank d30157d891 update configuration of kernels for A64FX and ARMV8SVE 2022-01-05 09:00:54 +01:00
Bine Brank 07fa6fa3b1 configure Makefile for sve 2022-01-05 08:57:51 +01:00
Bine Brank 2e2c02b762 fix sve ztrmm kernel 2022-01-04 14:42:07 +01:00
Bine Brank 68c414d3a6 ztrmm sve copy functions 2022-01-04 14:40:59 +01:00
Bine Brank ce329ab686 add sve zhemm copy routines 2022-01-03 15:56:05 +01:00
Bine Brank 0140373802 add sve ztrmm 2022-01-02 19:15:33 +01:00
Bine Brank f7b6912868 ztrmm sve copy kernels 2021-12-30 21:00:16 +01:00
Bine Brank 40b14e4957 fix zgemm kernel 2021-12-29 11:42:04 +01:00
Bine Brank 6ec4aab875 zgemm sve copy routines 2021-12-26 17:05:46 +01:00
Bine Brank 878064f394 sve zgemm kernel 2021-12-26 08:44:05 +01:00
Bine Brank 683a7548bf added macros for sve zgemm kernels 2021-12-25 11:46:41 +01:00
Martin Kroeker 7b146e590c
fix function typecast 2021-12-24 20:01:52 +01:00
Martin Kroeker e9a0e52201
fix function typecast 2021-12-24 20:00:50 +01:00
Martin Kroeker d1ee6ff73f
fix function typecasts 2021-12-21 18:45:28 +01:00
Bine Brank e3c9947c0f prepare kernel for sve zgemm 2021-12-21 11:19:27 +01:00
gxw 8d9b9c6b2a loongarch64: Optimize dgemm_kernel 2021-12-21 09:33:06 +08:00
Wu Zhigang 92b7b949dd fix bug in zscal function
memset can not be used in zscal because of
the stride parameters.

Signed-off-by: Wu Zhigang <zhigang.wu@starfivetech.com>
2021-12-15 01:23:30 -08:00
Martin Kroeker b0a590f4fe
Merge pull request #3475 from wjc404/optimize-A53-dgemm
optimize cgemm on ARM cortex A53 & cortex A55
2021-12-12 19:09:08 +01:00
Martin Kroeker f4d1f0333b
Merge pull request #3474 from rafaelcfsousa/rafael/cmake_power
Add CMake support for Power
2021-12-12 19:08:27 +01:00
Jia-Chen b610d2de37 optimize cgemm on ARM cortex A53 & cortex A55 2021-12-12 17:22:52 +08:00