Martin Kroeker
7d506984fa
fix assignment of default CSUM kernel
2024-02-25 17:57:11 +01:00
Martin Kroeker
12787775d9
add csum/zsum kernels (trivially derived from the asum ones)
2024-02-25 17:55:36 +01:00
Martin Kroeker
8f8ef3492a
Add CSUM and ZSUM kernels (trivially derived from their existing ASUM counterparts)
2024-02-24 23:57:50 +01:00
Martin Kroeker
be5e18c6f9
Add kernel definitions for CSUM and ZSUM
2024-02-24 23:55:43 +01:00
gxw
990507e3b8
LoongArch64: Opt zgemv with LASX
2024-02-22 11:58:02 +08:00
gxw
d51ffec3a2
LoongArch64: Opt cgemv with LASX
2024-02-22 11:56:04 +08:00
pengxu
4787a55c64
Optimized cgemm kernel 16x4 LASX for LoongArch
2024-02-21 15:28:47 +08:00
Sergei Lewis
ba17758c02
fix axpy implementations where y has a stride of 0
2024-02-16 16:00:38 +00:00
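Note: a minimal sketch of why a zero stride on y needs separate handling in axpy. With incy == 0 every update targets the same element, so the updates must be applied one after another; a vectorised load/modify/store over several lanes would lose all but one of them. The function name `ref_daxpy` and its signature are hypothetical and are not taken from the OpenBLAS sources; non-negative increments are assumed to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference daxpy: y[i*incy] += alpha * x[i*incx].
 * Assumption: increments are non-negative. */
static void ref_daxpy(size_t n, double alpha,
                      const double *x, size_t incx,
                      double *y, size_t incy)
{
    assert(x != NULL && y != NULL);
    if (n == 0)
        return;                     /* empty input: nothing to update */
    if (incy == 0) {
        /* All n updates land on y[0]; accumulate them sequentially. */
        double acc = y[0];
        for (size_t i = 0; i < n; i++)
            acc += alpha * x[i * incx];
        y[0] = acc;
        return;
    }
    for (size_t i = 0; i < n; i++)
        y[i * incy] += alpha * x[i * incx];
}
```

With x = {1, 2, 3}, alpha = 2 and y[0] = 1, a call with incy == 0 leaves y[0] equal to 1 + 2 + 4 + 6 = 13.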
Dmitry Mikushin
d0f5dc763b
Adding USE_GEMM3M macro to kernel targets, so that the *gemm3m functions and parameters can be included into the gotoblas structure. Fixes #4500
2024-02-12 02:29:58 +01:00
Sergei Lewis
ff1523163f
Fix axpy test hangs when n==0. Reenable zaxpy_vector kernel for C910V.
2024-02-09 12:59:14 +00:00
pengxu
fe3da43b7d
Optimized zgemm kernel 8*4 LASX, 4*4 LSX and cgemm kernel 8*4 LSX for LoongArch
2024-02-06 11:49:01 +08:00
Martin Kroeker
e5d2725e5a
Merge pull request #4185 from XiWeiGu/mips_enable_msa
...
MIPS: Enable MSA
2024-02-05 15:50:16 +01:00
Martin Kroeker
b537528feb
Merge pull request #4480 from XiWeiGu/loongarch64-fixed-{s/d}amin-lsx
...
LoongArch64: Fixed {s/d}amin LSX optimization
2024-02-05 06:24:50 +01:00
Martin Kroeker
6d8a273cca
Handle zero increment(s) in C910V ?AXPBY ( #4483 )
...
* Handle zero increment(s)
2024-02-04 22:07:51 +01:00
Martin Kroeker
dbcf4f8b7d
Merge pull request #4479 from XiWeiGu/loongarch-opt-axpby
...
Loongarch opt axpby
2024-02-04 19:50:28 +01:00
Martin Kroeker
dc802dd637
Merge pull request #4474 from ChipKerchner/sgemmIncopy_PR
...
Vectorize in-copy packing/copying for SGEMM - up to 4X faster.
2024-02-04 18:51:09 +01:00
gxw
adde725321
LoongArch64: Fixed {s/d}amin LSX optimization
2024-02-04 14:44:47 +08:00
gxw
7bc93d95a1
LoongArch64: Opt {c/z}axpby
2024-02-04 11:23:31 +08:00
gxw
1e1f487dc7
LoongArch64: Fixed {s/d}axpby
2024-02-04 09:41:37 +08:00
Martin Kroeker
4d8dee508c
temporarily disable the CAXPY/ZAXPY kernels
2024-02-04 01:05:03 +01:00
austinpagan
87ba528d8b
Changed C files to straighten out indentation. Removed commented lines from other file.
2024-02-01 18:46:07 -06:00
austinpagan
461cf9083c
Merge remote-tracking branch 'origin/develop' into cgemm_zgemm_c_code
2024-02-01 12:40:04 -06:00
austinpagan
ddac75e0ef
Adding .C versions of CGEMM and ZGEMM
2024-02-01 12:24:25 -06:00
Chip Kerchner
2bb7ea64a1
Only vectorize 64-bit version for Power8.
2024-02-01 08:11:43 -06:00
Sergei Lewis
3ffd6868d7
Merge branch 'develop' into dev/slewis/merge-from-riscv
2024-02-01 11:29:41 +00:00
Sergei Lewis
a3b0ef6596
Restore riscv64 fixes from develop branch: dot product double precision accumulation, zscal NaN handling
2024-02-01 10:32:00 +00:00
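Note: a minimal sketch of what double precision accumulation in a single-precision dot product can look like. It is an illustration only; the name `ref_sdot` is hypothetical and the actual riscv64 kernels are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical unit-stride sdot that accumulates in double and
 * rounds to float only once, at the end. */
static float ref_sdot(size_t n, const float *x, const float *y)
{
    assert((x != NULL && y != NULL) || n == 0);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (double)x[i] * (double)y[i];
    return (float)acc;
}
```

The wider accumulator keeps small terms that a float accumulator would round away when they are added to a much larger partial sum.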
Martin Kroeker
d1343302bd
Merge pull request #4465 from XiWeiGu/utest-zscal
...
utest: Add tests for zscal
2024-01-31 14:19:19 +01:00
gxw
969601a1dc
X86_64: Fixed bug in zscal
...
Fixed handling of NAN and INF arguments when
inc is greater than 1.
2024-01-31 11:23:59 +08:00
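Note: a minimal sketch of a strided complex scaling loop in which NaN and Inf inputs propagate, because the full complex multiply is carried out for every element even when alpha is zero. The name `ref_zscal` and its signature are hypothetical; the sketch does not claim to reproduce the exact semantics chosen by the x86_64 kernel.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical reference zscal over interleaved (re, im) pairs.
 * inc is the stride in complex elements and is assumed to be >= 1. */
static void ref_zscal(size_t n, double ar, double ai, double *x, size_t inc)
{
    assert(x != NULL && inc >= 1);
    for (size_t i = 0; i < n; i++) {
        double *p = x + 2 * i * inc;    /* i-th strided element */
        double re = p[0], im = p[1];
        /* No shortcut for alpha == 0: 0 * NaN stays NaN. */
        p[0] = ar * re - ai * im;
        p[1] = ar * im + ai * re;
    }
}
```

Elements skipped by the stride are left untouched, and a NaN in a visited element survives scaling by zero.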
Martin Kroeker
98c9ff3194
Merge pull request #4464 from XiWeiGu/loongarch64-zscal
...
LoongArch64: Handle NAN and INF
2024-01-30 22:53:29 +01:00
Chip Kerchner
09bb48d1b9
Vectorize in-copy packing/copying for SGEMM - 4X faster.
2024-01-30 09:13:16 -06:00
gxw
83ce97a4ca
LoongArch64: Handle NAN and INF
2024-01-30 17:17:30 +08:00
gxw
a79d117405
LoongArch64: Fixed bug for {s/d}amin
2024-01-30 11:32:57 +08:00
Sergei Lewis
1093def0d1
Merge branch 'risc-v' into develop
2024-01-29 11:11:39 +00:00
Martin Kroeker
889c5d026a
Merge pull request #4456 from kseniyazaytseva/riscv-rvv10
...
Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics
2024-01-26 13:31:09 +01:00
Martin Kroeker
4e2a32ff51
Merge pull request #4454 from kseniyazaytseva/riscv-rvv07
...
Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets
2024-01-26 11:40:46 +01:00
gxw
276e3ebf9e
LoongArch64: Add dzamax and dzamin opt
2024-01-26 10:03:50 +08:00
Martin Kroeker
a21b2fa5e4
Merge pull request #4452 from kseniyazaytseva/riscv-generic
...
Fix BLAS, BLAS-like functions and Generic RISC-V kernels
2024-01-24 17:52:25 +01:00
Andrey Sokolov
9c49a81d54
Resolve conflicts
2024-01-23 19:08:53 +03:00
kseniyazaytseva
e1afb23811
Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets
...
* Fixed bugs in dgemm, [a]min\max, asum kernels
* Added zero checks for BLAS kernels
* Added dsdot implementation for RVV 0.7.1
* Fixed bugs in _vector files for C910V and RISCV64_ZVL256B targets
* Added additional definitions for RISCV64_ZVL256B target
2024-01-23 19:01:31 +03:00
Octavian Maghiar
deecfb1a39
Merge branch 'risc-v' into img-riscv64-zvl128b
2024-01-19 12:26:38 +00:00
kseniyazaytseva
5222b5fc18
Added axpby kernels for GENERIC RISC-V target
2024-01-18 23:22:26 +03:00
kseniyazaytseva
ff41cf5c49
Fix BLAS, BLAS-like functions and Generic RISC-V kernels
...
* Fixed gemmt, imatcopy, zimatcopy_cnc functions
* Fixed cblas_cscal testing in ctest
* Removed rotmg unreachable code
* Added zero size checks
2024-01-18 23:19:52 +03:00
kseniyazaytseva
b193ea3d7b
Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics
...
* Update intrinsics API to 0.12.0 version (Stride Segment Loads/Stores)
* Fixed nrm2, axpby, ncopy, zgemv and scal kernels
* Added zero size checks
2024-01-18 22:14:32 +03:00
Martin Kroeker
88e994116c
Merge pull request #4354 from imaginationtech/img-rvv-kernel-generator
...
[RISC-V] Improve RVV kernel generator LMUL usage
2024-01-17 15:19:37 +01:00
Dirreke
ec89466e14
Add CSKY support
2024-01-16 23:45:06 +08:00
Sergei Lewis
9edb805e64
fix builds with t-head toolchains that use old versions of the intrinsics spec
2024-01-16 14:33:08 +00:00
Martin Kroeker
0d2e486edf
Handle NAN and INF
2024-01-15 11:18:59 +01:00
Martin Kroeker
5f5b7c4f45
Merge pull request #4423 from martin-frbg/issue4422
...
Check compiler support for AVX512BF16 and base COL/SPR kernel choice on that
2024-01-12 16:30:50 +01:00
Martin Kroeker
f31bea07dd
Merge pull request #4419 from martin-frbg/issue4413
...
[WIP] Add fixes and utests for ZSCAL with NaN or Inf arguments
2024-01-12 14:27:08 +01:00
Martin Kroeker
20413ee6ec
Update zscal.c
2024-01-12 13:11:13 +01:00
Martin Kroeker
b57627c27f
Handle NAN and INF
2024-01-12 12:03:08 +01:00
Martin Kroeker
995a990e24
Make AVX512 BFLOAT16 kernels conditional on compiler capability
2024-01-12 00:12:46 +01:00
Martin Kroeker
7df363e1e2
temporarily disable the MSA C/ZSCAL kernels
2024-01-12 00:08:52 +01:00
Chip-Kerchner
058dd2a4cb
Replace two vector loads with one vector pair load and fix endianness of stores - DGEMM versions.
2024-01-08 14:16:09 -06:00
Martin Kroeker
1c31f56e5a
Handle NAN
2024-01-08 16:11:25 +01:00
Martin Kroeker
7ee1ee38e2
Handle NaN in input
2024-01-08 14:20:07 +01:00
Martin Kroeker
f637e12713
Handle INF and NAN
2024-01-08 09:52:38 +01:00
Martin Kroeker
25b0c48082
Update zscal.c
2024-01-08 09:49:18 +01:00
Martin Kroeker
5e7f714e93
Update zscal.c
2024-01-08 08:17:40 +01:00
Martin Kroeker
cf8b03ae8b
Use NAN rather than SNAN for portability
2024-01-07 23:09:57 +01:00
Martin Kroeker
f0808d856b
Handle NAN in input
2024-01-07 20:27:29 +01:00
Martin Kroeker
acf17a825d
Handle NAN in input
2024-01-07 20:26:16 +01:00
Martin Kroeker
c9df62e883
Fix handling of NAN
2024-01-07 17:49:40 +01:00
Martin Kroeker
def4996170
Fix handling of NAN and INF arguments
2024-01-07 15:29:42 +01:00
Martin Kroeker
519b40fad9
Merge pull request #4398 from yinshiyou/la-dev
...
Add Optimizations for LoongArch.
2023-12-30 19:51:08 +01:00
pengxu
a5d0d21378
loongarch64: Add zgemm and cgemm optimization
2023-12-29 18:06:26 +08:00
gxw
546f13558c
loongarch64: Add {c/z}swap and {c/z}sum optimization
2023-12-29 17:30:57 +08:00
Hao Chen
edabb93668
loongarch64: Refine axpby optimization functions.
2023-12-29 17:30:57 +08:00
Hao Chen
1ec5dded43
loongarch64: Add c/zrot optimization functions.
...
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
3c53ded315
loongarch64: Add c/znrm2 optimization functions.
2023-12-29 17:30:57 +08:00
Hao Chen
fbd612f8c4
loongarch64: Add ic/zamin optimization functions.
2023-12-29 17:30:57 +08:00
Hao Chen
d97272cb35
loongarch64: Add c/zdot optimization functions.
2023-12-29 17:30:57 +08:00
Hao Chen
65a0aeb128
loongarch64: Add c/zcopy optimization functions.
...
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
2a34fb4b80
loongarch64: Add and refine scal optimization functions.
...
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
8785e948b5
loongarch64: Add camin optimization function.
2023-12-29 17:30:57 +08:00
Hao Chen
0753848e03
loongarch64: Refine and add axpy optimization functions.
...
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
06fd5b5995
loongarch64: Add and Refine asum optimization functions.
2023-12-29 17:30:57 +08:00
guxiwei
e771be185e
Optimize copy functions with lsx.
...
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
179ed51d3b
Add dgemm_kernel_8x4.S file.
2023-12-29 17:30:57 +08:00
Hao Chen
173a65d4e6
loongarch64: Add and refine iamax optimization functions.
2023-12-29 17:30:57 +08:00
zhoupeng
ea70e165c7
loongarch64: Refine rot optimization.
2023-12-29 17:30:57 +08:00
zhoupeng
116aee7527
loongarch64: Refine imin optimization.
2023-12-29 17:30:57 +08:00
zhoupeng
8be2654193
loongarch64: Refine imax optimization.
2023-12-29 17:30:57 +08:00
zhoupeng
154baad454
loongarch64: Refine iamin optimization.
2023-12-29 17:30:57 +08:00
Shiyou Yin
36c12c4971
loongarch64: Refine copy,swap,nrm2,sum optimization.
2023-12-29 17:30:57 +08:00
Shiyou Yin
c6996a80e9
loongarch64: Refine amax,amin,max,min optimization.
2023-12-29 17:30:57 +08:00
Chris Sidebottom
ecae1389df
Reduce duplication in kernel definitions
...
These files are exactly the same, so I believe we can reduce these files
down. Other files require a slightly more complex unpicking.
2023-12-23 12:39:53 +00:00
Chris Sidebottom
60e66725e4
Use numeric labels to allow repeated inlining
2023-12-19 13:11:06 +00:00
Chris Sidebottom
7a4fef4f60
Tweak SVE dot kernel
...
This changes the SVE dot kernel to only predicate when necessary as well
as streamlining the assembly a bit. The benchmarks seem to indicate this
can improve performance by ~33%.
2023-12-19 12:08:54 +00:00
Martin Kroeker
f06b535566
Use C kernel for dgemv_t due to limitations of the old assembly one
2023-12-15 09:58:44 +01:00
barracuda156
d9653af018
KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing
...
Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4366
2023-12-14 12:00:11 +08:00
Chip-Kerchner
93747fb377
Merge remote-tracking branch 'origin/develop' into power10Copies
2023-12-12 09:32:49 -06:00
Chip-Kerchner
4e738e561a
Replace two vector loads with one vector pair load and fix endianness of stores.
2023-12-08 12:36:08 -06:00
yancheng
d32f38fb37
loongarch64: Add optimizations for nrm2.
2023-12-07 14:36:26 +08:00
yancheng
f9b468990e
loongarch64: Add optimizations for rot.
2023-12-07 14:36:26 +08:00
yancheng
c80e7e27d1
loongarch64: Add optimizations for sum and asum.
2023-12-07 14:36:26 +08:00
yancheng
d4c96a35a8
loongarch64: Add optimizations for axpy and axpby.
2023-12-07 14:36:26 +08:00
yancheng
360acc0a41
loongarch64: Add optimizations for swap.
2023-12-07 14:36:26 +08:00
yancheng
174c25766b
loongarch64: Add optimizations for copy.
2023-12-07 14:36:26 +08:00
yancheng
49829b2b7d
loongarch64: Add optimizations for iamin.
2023-12-07 14:36:07 +08:00
yancheng
be83f5e4e0
loongarch64: Add optimizations for iamax.
2023-12-07 14:36:07 +08:00
yancheng
e3fb2b5afa
loongarch64: Add optimizations for imin.
2023-12-07 14:36:07 +08:00
yancheng
e46b48e372
loongarch64: Add optimizations for imax.
2023-12-07 14:36:07 +08:00
yancheng
702fc1d56d
loongarch64: Add optimization for min.
2023-12-07 14:36:07 +08:00
yancheng
346b384d1c
loongarch64: Add optimization for max.
2023-12-07 14:36:07 +08:00
yancheng
ff2ecc6cda
loongarch64: Add optimization for amin.
2023-12-07 14:36:07 +08:00
yancheng
265b5f2e80
loongarch64: Add optimizations for amax.
2023-12-07 14:36:07 +08:00
yancheng
993ede7c70
loongarch64: Add optimizations for scal.
2023-12-07 14:36:07 +08:00
Octavian Maghiar
4a12cf53ec
[RISC-V] Improve RVV kernel generator LMUL usage
...
The RVV kernel generation script uses the provided LMUL to increase the number of accumulator registers.
Since the effect of the LMUL is to group together the vector registers into larger ones, it actually should be used as a multiplier in the calculation of vlenmax.
At the moment, no matter what LMUL is provided, the generated kernels would only set the maximum number of vector elements equal to VLEN/SEW.
Commit changes the use of LMUL to properly adjust vlenmax. Note that an increase in LMUL results in a decrease in the number of effective vector registers.
2023-12-04 11:13:35 +00:00
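Note: the relationship described in this commit message can be sketched as follows. The helper names are hypothetical and only integer LMUL values (1, 2, 4, 8) are assumed; the kernel generator script itself is not reproduced here.

```c
#include <assert.h>

/* Maximum number of elements per vector register group:
 * (VLEN / SEW) * LMUL, for integer LMUL only. */
static unsigned vlenmax(unsigned vlen_bits, unsigned sew_bits, unsigned lmul)
{
    assert(sew_bits != 0 && lmul != 0);
    return vlen_bits / sew_bits * lmul;
}

/* The 32 architectural vector registers are grouped LMUL at a time,
 * so a larger LMUL leaves fewer usable register groups. */
static unsigned effective_vregs(unsigned lmul)
{
    assert(lmul != 0);
    return 32u / lmul;
}
```

For VLEN = 128 and SEW = 64, LMUL = 1 gives 2 elements per group with 32 groups, while LMUL = 4 gives 8 elements per group with only 8 groups.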
Octavian Maghiar
e4586e81b8
[RISC-V] Add RISC-V Vector 128-bit target
...
Current RVV x280 target depends on vlen=512-bits for Level 3 operations.
Commit adds generic target that supports vlen=128-bits.
New target uses the same scalable kernels as x280 for Level 1&2 operations, and autogenerated kernels for Level 3 operations.
Functional correctness of Level 3 operations tested on vlen=128-bits using QEMU v8.1.1 for ctests and BLAS-Tester.
2023-12-04 11:02:18 +00:00
Martin Kroeker
39bf8ece20
Merge pull request #4340 from yinshiyou/la-dev
...
Add some refines and optimizations for LoongArch.
2023-11-29 08:22:25 +01:00
Shiyou Yin
9fe07d82fd
loongarch: Add LSX optimization for dot.
2023-11-28 20:24:18 +08:00
Shiyou Yin
13b8c44b44
loongarch: Add optimization for dsdot kernel.
2023-11-28 20:24:16 +08:00
Shiyou Yin
3def6a8143
loongarch: Add LASX optimization for dot.
2023-11-28 20:24:14 +08:00
Bart Oldeman
c34e2cf380
Use _mm_set1_epi{32,64x} to init mask in x86-64 [cz]asum
...
for skylake kernels. This is the same method as used in [sd]asum.
_mm_set1_epi64x was commented out for zasum, but has the advantage
of avoiding possible undefined behaviour (using an uninitialized
variable), optimized out by NVHPC and icx. The new code works
fine with those compilers.
For GCC 12.3 the generated code is identical; no matter what method
you use, the compiler optimizes the code into a compile-time
constant, so there is no performance benefit to using _mm_cmpeq_epi8
since the corresponding instruction (VPCMPEQB) isn't actually
generated!
2023-11-19 21:28:35 +00:00
Martin Kroeker
22aa401656
Temporarily disable the AVX512 CASUM/ZASUM microkernels for any version of NVIDIA HPC ( #4327 )
...
* Temporarily disable the C/ZASUM microkernels for any version of NVHPC
2023-11-19 00:04:31 +01:00
Bart Oldeman
f8ad5344c2
Fix casum fallback kernel.
...
This kernel is only used on Skylake+ if the kernel with AVX512
intrinsics can't be used, but used the variable x1 incorrectly
in the tail end of the loop, as it is still at the initial
value instead of where x points to.
This caused 55 "other error"s in the LAPACK tests
(https://github.com/OpenMathLib/OpenBLAS/issues/4282 )
This change makes casum.c as similar as possible to zasum.c,
because zasum.c does this correctly.
2023-11-17 23:53:56 +00:00
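Note: a minimal sketch of the kind of tail handling this fix is about. An unrolled main loop advances a running pointer, and the tail loop must continue from that pointer rather than from the start of the array. The names `ref_casum` and `absf` are hypothetical, unit stride is assumed, and the code is not the actual casum.c.

```c
#include <assert.h>
#include <stddef.h>

static float absf(float v) { return v < 0.0f ? -v : v; }

/* Sum of |re| + |im| over n single-precision complex values. */
static float ref_casum(size_t n, const float *x)
{
    assert(x != NULL || n == 0);
    const float *p = x;             /* running pointer into x */
    size_t n4 = n - (n % 4);        /* elements covered by the unrolled loop */
    size_t i = 0;
    float sum = 0.0f;

    for (; i < n4; i += 4, p += 8)  /* 4 complex values = 8 floats per pass */
        sum += absf(p[0]) + absf(p[1]) + absf(p[2]) + absf(p[3])
             + absf(p[4]) + absf(p[5]) + absf(p[6]) + absf(p[7]);

    /* Tail: continue from p. Restarting from x here would re-read
     * the first elements instead of the remaining ones. */
    for (; i < n; i++, p += 2)
        sum += absf(p[0]) + absf(p[1]);

    return sum;
}
```

For five values (1,-1), (2,-2), (3,-3), (4,-4), (5,-5) the result is 30; a tail that restarted at the beginning of the array would produce 22 instead.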
Martin Kroeker
04bc801999
(Re)apply fixes for supporting only a subset of precision types from PR 3915
2023-11-04 23:48:59 +01:00
Martin Kroeker
9019bc4945
Use SkylakeX ?ASUM microkernel for Cooperlake/Sapphirerapids as well
2023-11-04 22:10:06 +01:00
Martin Kroeker
3bfa4d4dcc
Fix outdated SVE kernel definitions for Cortex cpus by aliasing to ARMV8SVE
2023-11-03 14:55:31 +01:00
Rajalakshmi Srinivasaraghavan
980f702f72
POWER: AIX: Make use of power10 optimization
...
POWER10 optimizations are disabled when using the default AIX assembler.
As many issues have been fixed recently, enable the optimization path
for the default assembler.
2023-10-19 18:48:19 -05:00
Rajalakshmi Srinivasaraghavan
9f42570e33
POWER: Increase macro size limit for AIX
...
This patch increases the macro size limit from 4096 to 16384 to
allow compiling larger assembly files in AIX.
Tested with GCC and IBM Open XL C.
2023-10-12 12:37:40 -05:00
Martin Kroeker
9f49aef91b
Merge pull request #4255 from RajalakshmiSR/AIX-P10
...
POWER10: Fix compilation issues with Open XL C
2023-10-12 18:59:17 +02:00
Martin Kroeker
e7d05402e0
Fix up S/D GEMM copy function definitions after #4009
2023-10-12 14:24:53 +02:00
Rajalakshmi Srinivasaraghavan
71d733e5f7
POWER: Avoid m4 conversions for C files
...
This patch removes intermediate m4 conversions used in sbgemm
compilation as it is not needed for .c files.
Tested on AIX with gcc and IBM Open XL C.
2023-10-11 17:18:42 -05:00
Rajalakshmi Srinivasaraghavan
82fc29a57a
POWER10: Fallback to POWER8 functions
...
As the cgemm and zgemm kernels are not optimized for big endian, fall
back to the POWER8 versions. Tested on AIX using gcc and Open XL C.
2023-10-11 17:04:42 -05:00
Rajalakshmi Srinivasaraghavan
db0805906b
powerpc: Fix build errors with Open XL C
...
This patch fixes errors when using Open XL C compiler on AIX.
Tested with gcc/xlf and ibm-clang/xlf compiler combinations.
2023-10-04 14:04:03 -05:00
Martin Kroeker
675cd551da
fix improper function prototypes (empty parentheses)
2023-09-30 12:56:38 +02:00
gxw
d15e0a055c
LoongArch64: Fixed compilation issues when enable DYNAMIC_ARCH
2023-09-27 10:05:27 +08:00
gxw
4670eb1462
LoongArch64: Add dtrsm kernel
2023-09-26 15:45:14 +08:00
gxw
f2cf929374
LoongArch64: Add sgemv kernel
2023-09-04 14:28:37 +08:00
Martin Kroeker
8e6d93359d
Merge pull request #4196 from TiborGY/obsolete_inlines
...
Modernize obsolete inline order
2023-09-03 14:12:42 +02:00
gxw
394a1fd1bf
LoongArch64: Compatible with early internal toolchain
...
__loongarch_grlen and __loongarch_frlen were introduced in gcc version 8.3.0
(Loongnix 8.3.0-6.lnd.vec.31) internally within Loongson to standardize the
general and floating-point register widths. However, previous versions did
not have them, requiring additional checks to be added.
2023-08-31 16:55:29 +08:00
Martin Kroeker
9c4ae4d4fb
Merge pull request #4206 from martin-frbg/issue4201-2
...
Work around miscompilation of zdot_thunderx2t99 by the current NVIDIA HPC compiler
2023-08-26 10:17:27 +02:00
Martin Kroeker
88435104c8
Merge pull request #4204 from martin-frbg/llvm17-2
...
Work around LLVM17 miscompiling the AVX512 microkernels for CASUM/ZASUM
2023-08-26 00:32:18 +02:00
Martin Kroeker
fc8894dd98
Workaround miscompilation by NVIDIA nvc
2023-08-26 00:30:17 +02:00
Martin Kroeker
7a6203ffa1
restore default Neoverse SVE build instructions for non-NVIDIA compilers
2023-08-25 18:25:51 +02:00
Martin Kroeker
2c3034ff7f
Disable the C/ZASUM AVX512 microkernels when compiling with LLVM17 as well
2023-08-25 17:22:51 +02:00
Martin Kroeker
8794544b43
Add support for compiling the Neoverse SVE kernels with the NVIDIA HPC compiler
2023-08-25 16:47:32 +02:00
gxw
553cc1372f
LoongArch64: Add sgemm_kernel
2023-08-23 16:08:43 +08:00
Martin Kroeker
12ede72ab7
Merge pull request #4192 from imciner2/im/clangfix
...
Fix cooperlake and sapphire rapids march flags on clang
2023-08-21 15:46:35 +02:00
Ian McInerney
79c15db348
Fix power10 gcc intrinsic check
...
__builtin_vsx_assemble_pair was only in GCC 10-11.2 and was replaced by
__builtin_vsx_build_pair thereafter.
2023-08-17 15:05:29 +01:00
TGY
b5ba95a6c0
Modernize obsolete inline order
2023-08-16 00:48:40 +02:00
Ian McInerney
8a8a8479be
Fix cooperlake and sapphire rapids march flags on clang
...
The march=cooperlake and march=sapphirerapids flags were never getting
added when building with Clang targeting those architectures. Instead
it was falling back to the skylake AVX512 implementation.
Clang added support for these two architectures in Clang 9 and Clang 12,
so introduce new checks for those versions to enable the appropriate
march flag, and fallback to skylake otherwise.
2023-08-14 16:12:35 +01:00
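Note: the selection logic described in this commit message can be sketched as a small function. It assumes that the two versions are meant respectively (Clang 9 for cooperlake, Clang 12 for sapphirerapids); the function name `clang_march` and the target strings are used for illustration only and do not reproduce the actual build system code.

```c
#include <assert.h>
#include <string.h>

/* Pick an -march flag for a given target and Clang major version,
 * falling back to the Skylake AVX512 flag when the compiler is too old. */
static const char *clang_march(const char *target, int clang_major)
{
    assert(target != NULL);
    if (strcmp(target, "COOPERLAKE") == 0 && clang_major >= 9)
        return "-march=cooperlake";
    if (strcmp(target, "SAPPHIRERAPIDS") == 0 && clang_major >= 12)
        return "-march=sapphirerapids";
    return "-march=skylake-avx512";
}
```

Under these assumptions a SAPPHIRERAPIDS build with Clang 11 still gets the Skylake flag, while Clang 12 or newer gets the dedicated one.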
Martin Kroeker
34da1a067d
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 17:01:50 +02:00
Martin Kroeker
07e32c4cb8
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 17:00:18 +02:00
Martin Kroeker
c211da0688
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 16:58:57 +02:00
Martin Kroeker
a34a0a7abc
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 16:56:52 +02:00
Martin Kroeker
54d3246fc6
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 16:55:17 +02:00
Martin Kroeker
7dd441d5db
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 16:53:33 +02:00