Chris Sidebottom
eea006a688
Wrap SVE header with __has_include check
2022-12-01 12:07:55 +00:00
Chris Sidebottom
fd4f52c797
Add SVE implementation for sdot/ddot
...
This adds an SVE implementation to sdot/ddot when available, falling back to the previous Advanced SIMD kernel where there's no SVE implementation for the kernel.
All the targets were essentially treating `dot_thunderx2t99.c` as the Advanced SIMD implementation so I've renamed it to better fit with the feature detection.
2022-12-01 12:07:50 +00:00
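A minimal sketch of the pattern the two commits above describe, not the actual OpenBLAS kernel: the SVE header is included only when the compiler advertises SVE and `__has_include` can find `<arm_sve.h>`; otherwise a plain C loop stands in for the Advanced SIMD fallback. The names `HAVE_SVE_HEADER` and `dot_example` are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Compilers without __has_include simply see 0 and take the fallback path. */
#ifndef __has_include
#define __has_include(x) 0
#endif

#if defined(__ARM_FEATURE_SVE) && __has_include(<arm_sve.h>)
#include <arm_sve.h>
#define HAVE_SVE_HEADER 1
#else
#define HAVE_SVE_HEADER 0
#endif

/* Dot product of two contiguous double vectors of length n. */
double dot_example(size_t n, const double *x, const double *y)
{
#if HAVE_SVE_HEADER
    /* Vector-length-agnostic loop: the predicate masks off the tail. */
    svfloat64_t acc = svdup_f64(0.0);
    for (uint64_t i = 0; i < n; i += svcntd()) {
        svbool_t pg = svwhilelt_b64(i, (uint64_t)n);
        svfloat64_t vx = svld1_f64(pg, x + i);
        svfloat64_t vy = svld1_f64(pg, y + i);
        acc = svmla_f64_m(pg, acc, vx, vy);
    }
    return svaddv_f64(svptrue_b64(), acc);
#else
    /* Stand-in for the non-SVE kernel. */
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
#endif
}
```

Checking both `__ARM_FEATURE_SVE` and the header means the file still compiles with toolchains that target SVE-capable cores but do not ship the intrinsics header.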
lilianhuang
fdac8a97c1
Add sbgemm_ncopy_8 and sbgemm_tcopy_4
2022-11-29 04:46:14 -05:00
lilianhuang
135718eafc
Improve the performance of sbgemm_tcopy on neoversen2
2022-11-28 04:17:54 -05:00
Chris Sidebottom
4f7b77e08a
Remove unnecessary instructions from Advanced SIMD dot
...
The existing kernel was issuing extra instructions to organise the arguments into the same registers they would usually be in and similarly to put the result into the appropriate register.
This has an impact on smaller sized dots and seemed like a quick fix
2022-11-25 16:19:03 +00:00
Martin Kroeker
f73cfb7e2c
change line endings from CRLF to LF
2022-11-17 09:39:56 +01:00
Martin Kroeker
1688c7da43
change line endings from CRLF to LF
2022-11-16 22:24:01 +01:00
Bart Oldeman
6c1043eb41
Add [cz]scal microkernels for SKYLAKEX
...
These are as similar to dscal_microk_skylakex-2.c as possible
for consistency.
Note that before this change SKYLAKEX+ used generic C functions for
cscal/zscal via commit 2271c350 from #2610 (which is masked by commit
086d87a30). However, #3799 now disables FMAs (in turn enabled by
`-march=skylake-avx512`) in the plain C code, which fixes the excessive
LAPACK test failures more nicely.
2022-11-09 08:57:03 -05:00
Martin Kroeker
c9d78dc3b2
Remove excess initializer (leftover from rework of PR 3793)
2022-10-31 16:57:03 +01:00
Martin Kroeker
65338a9493
Merge pull request #3799 from bartoldeman/cscal-zscal-no-fma
...
x86_64: prevent GCC and Clang from generating FMAs in cscal/zscal.
2022-10-30 18:56:10 +01:00
Honglin Zhu
79066b6bf3
Change file name to match the norm and delete useless code.
2022-10-28 17:09:39 +08:00
Bart Oldeman
e7e3aa2948
x86_64: prevent GCC and Clang from generating FMAs in cscal/zscal.
...
If e.g. -march=haswell is set in CFLAGS, GCC generates FMAs by default, which
is inconsistent with the microkernels, none of which use FMAs. These
inconsistencies cause a few failures in the LAPACK testcases, where
eigenvalue results with/without eigenvectors are compared.
Moreover using FMAs for multiplication of complex numbers can give surprising
results, see 22aa81f for more information.
This uses the same syntax as used in 22aa81f for zarch (s390x).
2022-10-27 18:16:43 -04:00
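As a rough illustration of the approach, the sketch below turns off floating-point contraction for one translation unit so that the compiler cannot fuse the multiplies and adds of a complex scaling loop. The pragma spellings are an assumption about what "the same syntax" refers to and are not copied from the commit; `zscal_example` is a hypothetical name. One known symptom of contraction is that the imaginary part of z * conj(z) can come out as a small nonzero value instead of exactly zero.

```c
#include <stddef.h>

/* Assumed pragma spelling: disable fused multiply-add contraction for the
   functions defined below, for Clang and for GCC respectively. */
#if defined(__clang__)
#pragma clang fp contract(off)
#elif defined(__GNUC__)
#pragma GCC optimize ("fp-contract=off")
#endif

/* Scale n interleaved complex doubles (re, im, re, im, ...) by
   alpha = alpha_r + i*alpha_i. Unit stride only, for brevity. */
void zscal_example(size_t n, double alpha_r, double alpha_i, double *x)
{
    for (size_t i = 0; i < n; i++) {
        double re = x[2 * i];
        double im = x[2 * i + 1];
        /* Each product is rounded on its own before the add/subtract. */
        x[2 * i]     = alpha_r * re - alpha_i * im;
        x[2 * i + 1] = alpha_r * im + alpha_i * re;
    }
}
```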
Honglin Zhu
4989e039a5
Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build
2022-10-27 14:10:26 +08:00
Honglin Zhu
843e9fd0b9
Fix typo error
2022-10-26 17:06:33 +08:00
Honglin Zhu
b00d5b9746
New sbgemm implementation for Neoverse N2
...
1. Use UZP instructions instead of gather-load and scatter-store instructions to get lower latency.
2. Pad k to a multiple of 4.
2022-10-26 15:09:41 +08:00
Martin Kroeker
f6f35a4288
fix copyobj declarations to work with DYNAMIC_ARCH
2022-09-29 08:47:14 +02:00
Martin Kroeker
b1d69fb3ac
Add MIPS64_GENERIC as a copy of GENERIC
2022-09-17 23:52:32 +02:00
gxw
edea1bcfaf
MIPS64: Fixed failed utest dsdot:dsdot_n_1 when TARGET=I6500
2022-09-17 16:43:22 +08:00
Martin Kroeker
101a2c77c3
Fix warnings
2022-09-15 09:19:19 +02:00
Martin Kroeker
23d59baaf1
Add -mfma to -mavx2 for Apple clang, and set AVX2 options for Zen as well
2022-09-13 22:39:27 +02:00
gxw
365936ae1b
MIPS64: Using the macro MTC rather than MTC1
2022-09-13 16:39:40 +08:00
Martin Kroeker
739c3c44a7
Work around windows/osx gcc12 x86_64 tree-optimizer problem and add an osx/gcc12 build to Azure CI ( #3745 )
...
Add pragma to disable the gcc tree-optimizer for some x86_64 S and Z kernels with gcc12 on OSX or Windows
2022-09-03 15:01:22 +02:00
Martin Kroeker
bd30120ba7
Merge pull request #3720 from FlyGoat/mips64
...
Make it work on general MIPS64 processors
2022-08-19 20:24:27 +02:00
Jiaxun Yang
a50b29c540
Provide a fallback MIPS64_GENERIC target
...
It is really dangerous to fall back to the Loongson core on other
MIPS64 processors.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-08-12 13:13:28 +01:00
Jiaxun Yang
50c4eeb97d
alpha: Remove include of version.h
...
It will be defined by preprocessor argument.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-08-11 15:02:58 +01:00
Ivan Pribec
802e71bf05
Add const attribute to lsame
2022-08-08 15:15:52 +02:00
gxw
fbfe1daf6e
LoongArch64: Add DYNAMIC_ARCH support
2022-07-28 14:28:45 +08:00
Martin Kroeker
cd8e57040c
Merge pull request #3691 from martin-frbg/issue3679-sparc
...
SPARC: fix DNRM2 returning INF instead of zero due to intermediate overflow
2022-07-25 15:41:15 +02:00
Martin Kroeker
6c118b7977
Fix DNRM2 returning INF instead of zero due to intermediate overflow
2022-07-24 17:42:31 +02:00
Martin Kroeker
c43ec53bdd
Merge pull request #3690 from RajalakshmiSR/cdotp10
...
POWER: Fix complex dot function failures
2022-07-19 13:59:16 +02:00
Martin Kroeker
b7c65d08cb
Merge pull request #3689 from RajalakshmiSR/dgemvgcc10
...
POWER10: dgemv builtin rename
2022-07-19 10:25:01 +02:00
Martin Kroeker
06ef015234
fix DNRM2 returning INF instead of zero due to intermediate overflow
2022-07-19 10:19:27 +02:00
Rajalakshmi Srinivasaraghavan
a612e78a97
POWER: Fix complex dot function failures
...
There are some test failures in the complex dot functions when compiling with gcc12.
The machine constraints currently used do not update all four elements in the
expected result array. This is fixed by reducing the optimization level.
This does not change any performance numbers, but the code will be converted to C in the future.
2022-07-18 14:48:43 -05:00
Rajalakshmi Srinivasaraghavan
432fd99445
POWER10: dgemv builtin rename
...
Add check to use correct builtin name for older versions
of gcc10 compilers.
2022-07-18 09:48:01 -05:00
gxw
4dd05e526b
LoongArch64: Fix dnrm2_tiny testcase failure
2022-07-15 11:18:59 +08:00
gxw
cce4b1d956
MIPS64: Fix dnrm2_tiny testcase failure
2022-07-11 19:18:38 +08:00
Martin Kroeker
e12d474780
Eliminate uses of CREAL on left-hand side of assignments
2022-07-05 00:01:09 +02:00
Martin Kroeker
9e29598575
workaround fault with ssq=inf,scale=0
2022-07-02 23:47:17 +02:00
Honglin Zhu
123e0dfb62
Neoverse N2 sbgemm:
...
1. Modify the algorithm to resolve multithreading failures
2. No memory allocation in sbgemm kernel
3. Optimize when alpha == 1.0f
2022-06-29 10:14:21 +08:00
Honglin Zhu
bc3728475f
format code
2022-06-29 10:14:21 +08:00
Honglin Zhu
55d686d41e
neoverse n2 sbgemm:
...
implement ncopy tcopy kernel_8x4
2022-06-29 10:14:21 +08:00
Honglin Zhu
04593bb27c
neoverse n2 sbgemm: init file
2022-06-29 10:14:21 +08:00
Martin Kroeker
be5500e704
Merge pull request #3669 from VFerrari/fix_small_matrix_kernel
...
POWER: fix issues with the small matrix kernel
2022-06-28 16:09:36 +02:00
Martin Kroeker
92275a7902
Merge pull request #3642 from nursik/develop
...
Add ARM64 support for Windows
2022-06-28 16:05:11 +02:00
VFerrari
cac634fce3
POWER10: Fix multithreading check when USE_THREAD=0
...
This patch fixes an issue when OpenBLAS is compiled for TARGET=POWER10
and the flag USE_THREAD is set to 0.
The function `num_cpu_avail` is only available when USE_THREAD=1,
that is, when SMP is defined.
2022-06-25 03:46:46 -03:00
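A hedged sketch of the kind of guard this implies, not the patch itself: the call to `num_cpu_avail` is compiled only when `SMP` is defined, and single-threaded builds always take the one-thread path. `threads_for_example` and `MT_THRESHOLD_EXAMPLE` are hypothetical, and the declaration of `num_cpu_avail` below is a stand-in for the one the OpenBLAS headers provide.

```c
/* Stand-in declaration; in OpenBLAS the helper comes from the threading
   headers and only exists in builds with USE_THREAD=1. */
#if defined(SMP)
int num_cpu_avail(int level);
#endif

/* Hypothetical problem size above which threading is worth considering. */
#define MT_THRESHOLD_EXAMPLE 10000L

/* Number of threads to use for a problem of size n. */
int threads_for_example(long n)
{
#if defined(SMP)
    if (n > MT_THRESHOLD_EXAMPLE)
        return num_cpu_avail(1);
#endif
    (void)n;   /* unused when SMP is not defined */
    return 1;  /* single-threaded build or small problem */
}
```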
Martin Kroeker
9283c7c0b5
Merge pull request #3655 from RajalakshmiSR/zgemmasmp10
...
POWER10: Fix ZGEMM testcase failures
2022-06-18 20:52:26 +02:00
Rajalakshmi Srinivasaraghavan
f191bc652b
POWER10: Fix ZGEMM testcase failures
...
This patch fixes the saving and restoring of non-volatile registers
in the zgemm POWER10 kernel.
2022-06-17 08:18:08 -05:00
Rajalakshmi Srinivasaraghavan
8419d538ff
POWER10: convert dgemv inline assembly
...
This patch makes use of compiler builtins and matches with assembly
performance. Tested with clang14 and gcc12.
2022-06-09 10:42:57 -05:00
Xianyi Zhang
5e9a912591
Merge branch 'develop' into risc-v
2022-06-06 14:12:09 +08:00
Xianyi Zhang
968e1f51d8
Update RISC-V Intrinsic API.
2022-06-06 13:52:21 +08:00
Nursultan Zarlyk
1bb7993a97
Fix MSVC ARM64 build. Add generic kernel for ARM64
2022-06-02 16:53:54 +02:00
Martin Kroeker
dc49edd4e6
Revert "roll back DGEMM kernel ... for DYNAMIC_ARCH"
2022-05-20 11:23:30 +02:00
Rajalakshmi Srinivasaraghavan
b62173c5a0
POWER10: Changing store instructions for Level1 functions
...
This patch changes each 32-byte store into two 16-byte stores
to fix a recent performance degradation caused by 32-byte stores.
2022-05-12 11:17:33 -05:00
Martin Kroeker
84cb58b7fb
Fix generator rules for ?laswp_ncopy and ?neg_tcopy
2022-04-30 15:28:38 +02:00
Martin Kroeker
05dcfa176e
fix undefined prefetchsizes
2022-04-16 10:04:27 +02:00
Martin Kroeker
2bbb9f05c7
fix undefined prefetchsize
2022-04-16 10:00:10 +02:00
Martin Kroeker
115bc9b98f
CortexX1 is ARMV8 like A7x
2022-03-28 17:28:29 +02:00
Martin Kroeker
b3b4672c30
Add initial support for Phytium FT2000 series and ARMV9 Cortex 510/710/X1/X2
2022-03-27 15:29:20 +02:00
Martin Kroeker
40302558ed
Remove extraneous (and wrong) definition of sbgemm_r on x86_64
2022-03-23 20:05:32 +01:00
Caroline Newcombe
5cc1111383
fix unsafe read of Y in assembly kernel
2022-03-11 11:56:33 -06:00
Xianyi Zhang
45786b05da
Merge branch 'develop' into risc-v
2022-02-28 11:48:02 +08:00
Wangyang Guo
225683218c
Small Matrix: use proper inline asm input constraint for AVX512 mask
2022-02-28 03:22:31 +00:00
Martin Kroeker
9c626e466e
really fix definition of SHUFFLE_MAGIC_NO
2022-02-25 15:36:02 +01:00
Martin Kroeker
0698212c8c
Remove stray $
2022-02-25 15:33:02 +01:00
Martin Kroeker
9d7429406f
Declare SHUFFLE_MAGIC_NO as const to placate clang
2022-02-25 10:05:36 +01:00
Martin Kroeker
d9894f45d3
Define sbgemm_r to fix DYNAMIC_ARCH builds
2022-02-25 10:04:00 +01:00
Martin Kroeker
522f809825
Merge pull request #3542 from martin-frbg/issue3540
...
Fix compilation for CooperLake on Windows/clang
2022-02-24 00:00:00 +01:00
Mosè Giordano
abbc947edb
Fix compilation of Skylake AVX512 kernels with GCC 6
2022-02-23 22:51:59 +00:00
Martin Kroeker
c62f8e2c01
Prevent compiler attempts to use k0 as mask register
2022-02-23 20:12:20 +01:00
Martin Kroeker
80eb581c83
Fix non-portable u_int64_t
2022-02-23 20:10:59 +01:00
Martin Kroeker
73ffabe6ba
Guard uses of _mm512_reduce_add_p?
2022-02-23 20:06:14 +01:00
Martin Kroeker
7656aba00e
Merge pull request #3493 from martin-frbg/casts+cleanup
...
WIP casts and cleanups
2022-02-06 23:55:06 +01:00
Martin Kroeker
addc2a7aaa
Add proper defaults for IMIN/IMAX
2022-01-27 19:56:32 +01:00
Martin Kroeker
299d4d70a3
Add default KERNEL file for Elbrus E2K arch
2022-01-22 18:59:36 +01:00
Martin Kroeker
3492bea602
Create Makefile
2022-01-22 18:57:28 +01:00
Martin Kroeker
898cf5faf3
Add Elbrus e2k architecture support
2022-01-22 18:55:10 +01:00
Martin Kroeker
c1c0d5ce1d
Merge pull request #3492 from binebrank/arm_sve_zgemm
...
SVE zgemm&cgemm (and other BLAS 3 complex)
2022-01-18 21:36:33 +01:00
Bine Brank
19d435b1b3
update armv8sve + contributors
2022-01-18 08:28:31 +01:00
Bine Brank
f158d59087
adapt CMake
2022-01-17 22:36:48 +01:00
Bine Brank
b6a445cfd8
adapt Makefile for SVE trsm
2022-01-16 21:40:56 +01:00
Bine Brank
0fb6cc07bf
fix ztrsm lt/ut copy
2022-01-16 21:39:57 +01:00
Bine Brank
f1315288a8
add sve ztrsm
2022-01-15 22:27:25 +01:00
Bine Brank
aaa2b1a861
fix sve dtrsm kernels
2022-01-15 21:02:14 +01:00
Bine Brank
8071e179f1
add remaining sve trsm copy kernels
2022-01-11 21:16:38 +01:00
Bine Brank
f87468ac91
trsm_lncopy_sve
2022-01-10 21:45:37 +01:00
Bine Brank
e8939b3d30
sve trsmRN and trsmRT
2022-01-10 20:42:20 +01:00
Bine Brank
098672b51b
add trsm_kernel_LT_sve
2022-01-09 20:11:47 +01:00
Bine Brank
be7e55880c
sve trsm_kernel_LN
2022-01-09 19:40:04 +01:00
Martin Kroeker
b6b024232d
Merge pull request #3508 from snadampal/v1_n2
...
OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics
2022-01-09 14:50:26 +01:00
Sunita Nadampalli
19c8f615dc
OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics
2022-01-07 00:28:17 +00:00
Bine Brank
bb33446b40
fix makefile.L3
2022-01-06 10:26:11 +01:00
Bine Brank
f33543d029
combine zchemm into single file
2022-01-05 14:42:37 +01:00
Bine Brank
0c91d043ae
adapt CMake for SVE
2022-01-05 14:36:39 +01:00
Bine Brank
39ab219704
sve copy functions for cgemm chemm zsymm
2022-01-05 09:12:22 +01:00
Bine Brank
18102ae8c3
add cgemm ctrmm sve kernels
2022-01-05 09:09:18 +01:00
Bine Brank
87537b8c55
modify sve zgemmcopy kernels
2022-01-05 09:07:28 +01:00
Bine Brank
d30157d891
update configuration of kernels for A64FX and ARMV8SVE
2022-01-05 09:00:54 +01:00
Bine Brank
07fa6fa3b1
configure Makefile for sve
2022-01-05 08:57:51 +01:00
Bine Brank
2e2c02b762
fix sve ztrmm kernel
2022-01-04 14:42:07 +01:00
Bine Brank
68c414d3a6
ztrmm sve copy functions
2022-01-04 14:40:59 +01:00
Bine Brank
ce329ab686
add sve zhemm copy routines
2022-01-03 15:56:05 +01:00
Bine Brank
0140373802
add sve ztrmm
2022-01-02 19:15:33 +01:00
Bine Brank
f7b6912868
ztrmm sve copy kernels
2021-12-30 21:00:16 +01:00
Bine Brank
40b14e4957
fix zgemm kernel
2021-12-29 11:42:04 +01:00
Bine Brank
6ec4aab875
zgemm sve copy routines
2021-12-26 17:05:46 +01:00
Bine Brank
878064f394
sve zgemm kernel
2021-12-26 08:44:05 +01:00
Bine Brank
683a7548bf
added macros for sve zgemm kernels
2021-12-25 11:46:41 +01:00
Martin Kroeker
7b146e590c
fix function typecast
2021-12-24 20:01:52 +01:00
Martin Kroeker
e9a0e52201
fix function typecast
2021-12-24 20:00:50 +01:00
Martin Kroeker
d1ee6ff73f
fix function typecasts
2021-12-21 18:45:28 +01:00
Bine Brank
e3c9947c0f
prepare kernel for sve zgemm
2021-12-21 11:19:27 +01:00
gxw
8d9b9c6b2a
loongarch64: Optimize dgemm_kernel
2021-12-21 09:33:06 +08:00
Wu Zhigang
92b7b949dd
fix bug in zscal function
...
memset cannot be used in zscal because of
the stride parameter.
Signed-off-by: Wu Zhigang <zhigang.wu@starfivetech.com>
2021-12-15 01:23:30 -08:00
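To illustrate the point: with an increment other than 1 the elements of x are not contiguous, so a single memset over n complex values would also clear memory that is not part of the vector. The sketch below zeroes a strided complex vector element by element instead. It is an illustration under assumptions (positive increment, interleaved real/imaginary doubles), and `zscal_zero_example` is a hypothetical name, not the function changed by the commit.

```c
#include <stddef.h>

/* Set n complex doubles to zero, stepping inc_x complex elements each time.
   Elements between the strided positions are left untouched. */
void zscal_zero_example(size_t n, double *x, size_t inc_x)
{
    for (size_t i = 0; i < n; i++) {
        double *elem = x + 2 * i * inc_x;  /* 2 doubles per complex value */
        elem[0] = 0.0;  /* real part */
        elem[1] = 0.0;  /* imaginary part */
    }
}
```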
Martin Kroeker
b0a590f4fe
Merge pull request #3475 from wjc404/optimize-A53-dgemm
...
optimize cgemm on ARM cortex A53 & cortex A55
2021-12-12 19:09:08 +01:00
Martin Kroeker
f4d1f0333b
Merge pull request #3474 from rafaelcfsousa/rafael/cmake_power
...
Add CMake support for Power
2021-12-12 19:08:27 +01:00
Jia-Chen
b610d2de37
optimize cgemm on ARM cortex A53 & cortex A55
2021-12-12 17:22:52 +08:00
Martin Kroeker
697e2752d7
Merge pull request #3464 from binebrank/arm_sve_sgemm
...
Add sgemm part for Arm SVE
2021-12-11 20:35:22 +01:00
Bine Brank
a8f62a347b
fix UNROLL_MN and add to targets for SVE
2021-12-11 16:37:23 +01:00
Bine Brank
774267fdac
adjust Makefile.L3 for SVE
2021-12-11 16:35:08 +01:00
Rafael Cardoso Fernandes Sousa
23a7561353
Fix error cmake (small kernels)
2021-12-09 09:57:39 -06:00
Martin Kroeker
5378046abd
roll back DGEMM kernels to 4x8 when compiling for DYNAMIC_ARCH
2021-12-06 19:43:54 +01:00
Bine Brank
a1fea1fe2a
sgemm v2x8 SVE kernel
2021-12-05 18:47:29 +01:00
Bine Brank
abe1ce3434
strmm sve v1x8 kernel
2021-12-05 14:03:08 +01:00
Martin Kroeker
54d321d742
Merge pull request #3466 from rafaelcfsousa/rafael/small_matrix_p10
...
[POWER] Add small matrix for sgemm/dgemm on Power10
2021-12-03 12:12:20 +01:00
Martin Kroeker
0882db30a2
Merge pull request #3455 from cenewcombe/develop
...
Fix unsafe read during final iteration of zsymv_L_sse2.S
2021-12-03 10:01:20 +01:00
Bine Brank
0de36f7b5c
trmm sve copy functions for single precision
2021-11-29 21:25:05 +01:00
Rafael Cardoso Fernandes Sousa
c78fdcc80d
[POWER] Add support for SMALL_MATRIX_OPT
2021-11-28 12:41:16 -06:00
Bine Brank
86ae89bf33
add sgemm kernel and copy functions for sgemm and ssymm
2021-11-28 18:12:47 +01:00
Martin Kroeker
454edd741c
Merge pull request #3425 from binebrank/arm_sve_dgemm
...
Add dgemm kernel for arm64 SVE
2021-11-26 16:14:55 +01:00
Martin Kroeker
bcfbdc81b2
Merge pull request #3459 from rafaelcfsousa/fix_cmake
...
Fix issues when building OpenBLAS with cmake
2021-11-26 15:19:24 +01:00
Bine Brank
1af73ce38e
Adapt CMake for SVE
2021-11-26 10:35:01 +01:00
Martin Kroeker
e7fca060db
Merge pull request #3457 from wjc404/optimize-A53-dgemm
...
MOD: optimize zgemm on cortex-A53/cortex-A55
2021-11-26 10:30:47 +01:00
Jia-Chen
5c1cd5e0c2
MOD: add comments to a53 zgemm kernel
2021-11-25 22:48:48 +08:00
Rafael Cardoso Fernandes Sousa
d5c9353f1b
Modify the order that cmake set the KERNEL variables (generic now is fallback)
2021-11-24 20:08:35 -06:00
Jia-Chen
9f59b19fcd
MOD: optimize zgemm on cortex-A53/cortex-A55
2021-11-24 21:51:45 +08:00
Bine Brank
531a28b6a0
removed unused code (compiler warnings)
2021-11-22 10:12:34 +01:00
Bine Brank
9b9cb90bb1
modify Makefile for SVE copy
2021-11-22 09:54:20 +01:00
Bine Brank
9388f05a3c
configure SVE Makefile
2021-11-21 18:33:43 +01:00
Bine Brank
b58d4f31ab
some clean-up & commentary
2021-11-21 14:56:27 +01:00
Martin Kroeker
b7df500106
Add generic mips32 target
2021-11-20 17:31:51 +01:00
Bine Brank
e6ed4be02e
symm SVE copy routines
2021-11-20 16:35:29 +01:00
Caroline Newcombe
feeb8283a5
Fix unsafe read during final iteration of zsymv_L_sse2.S
2021-11-19 14:29:32 -06:00
Jia-Chen
302f22693a
MOD: optimize normal DGEMM on ARMV8 cortex-A53 & cortex-A55
2021-11-18 21:14:43 +08:00
Bine Brank
3c7eed0e53
add remaining trmm copy routines for SVE
2021-11-14 16:00:10 +01:00
Bine Brank
7d996b1c36
dtrmm_utcopy sve function
2021-11-13 18:48:53 +01:00
Bine Brank
ab7917910d
add v2x8 kernel + fix sve dtrmm
2021-11-07 20:37:51 +01:00
Bine Brank
7093372e32
add ARMV8SVE target
2021-11-01 22:53:21 +01:00
Bine Brank
a8fbdbac34
fix sve dgemm kernel + sve dtrmm
2021-10-31 10:24:25 +01:00
Bine Brank
746b4f0f17
added SVE ncopy and tcopy
2021-10-30 12:11:44 +02:00
Bine Brank
1a10d3e09d
add sve dgemm prototype
2021-10-27 16:37:18 +02:00