Martin Kroeker
7ca835a82c
address clang array overflow warning
2024-08-10 13:44:56 +02:00
Martin Kroeker
f1c9803f9a
add proper return statement
2024-08-04 00:14:31 +02:00
Martin Kroeker
60abcc3991
add proper return statement
2024-08-04 00:13:31 +02:00
Martin Kroeker
fb7c53c5e5
Merge pull request #4807 from martin-frbg/scalfixes
...
[WIP]Make NAN handling in the SCAL kernels depend on the dummy2 parameter
2024-07-25 23:42:50 +02:00
Martin Kroeker
dfbc2348a8
fix NAN handling
2024-07-20 18:27:15 +02:00
Martin Kroeker
c064319ecb
fix alpha=NAN case
2024-07-20 17:42:31 +02:00
Martin Kroeker
c2ffd90e8c
make NAN handling depend on dummy2 parameter
2024-07-20 17:31:00 +02:00
gxw
f3cebb3ca3
x86: Fixed numpy CI failure when the target is ZEN.
2024-07-12 16:09:30 +08:00
Martin Kroeker
68f2501958
temporarily(?) disable the alpha=0 branch to handle Inf/NaN in x
2024-06-22 21:08:57 +02:00
Martin Kroeker
0a744a939a
temporarily(?) disable the alpha=0 branch to handle NaN/Inf in x
2024-06-22 21:07:43 +02:00
Martin Kroeker
a2ee4b1966
Merge branch 'OpenMathLib:develop' into issue4728
2024-06-21 09:35:56 +02:00
Martin Kroeker
dd7efcf9ef
Avoid exceeding the configured thread count in x86_64 TOBF16 ( #4748 )
...
* avoid setting nthreads higher than available
2024-06-14 14:21:13 +02:00
Martin Kroeker
1abafcd9b2
handle corner cases involving NAN and/or INF
2024-06-06 23:59:43 +02:00
Martin Kroeker
020b3e1682
fix handling of INF arguments
2024-06-01 00:51:18 +02:00
Martin Kroeker
ce130f11d2
Update zscal.c
2024-05-31 15:09:03 +02:00
Martin Kroeker
ab13cfef93
more fixes for infinite x
2024-05-31 14:34:49 +02:00
Martin Kroeker
ad2b5c67c8
fix another corner case involving infinity
2024-05-31 01:06:58 +02:00
Bart Oldeman
62f7b244ff
Replace use of FLT_MAX in x86_64 zscal.c by isinf()
...
Commit def4996 fixed issues with inf and nan values in zscal,
but used FLT_MAX, where DBL_MAX or isinf() is more appropriate,
as FLT_MAX is for single precision only.
Using FLT_MAX caused test case failures in the LAPACK tests.
isinf() is consistent with the later fix 969601a1
2024-05-24 17:20:27 +00:00
Zoltán Böszörményi
ca64861ce8
Add forgotten conditional uses of PREFETCH
...
This fixes a (cross-)compilation/linker error for PRESCOTT
on Yocto.
Signed-off-by: Zoltán Böszörményi <zoltan.boszormenyi@xenial.com>
2024-04-19 10:52:28 +02:00
Martin Kroeker
8f8ef3492a
Add CSUM and ZSUM kernels (trivially derived from their existing ASUM counterparts)
2024-02-24 23:57:50 +01:00
Martin Kroeker
be5e18c6f9
Add kernel definitions for CSUM and ZSUM
2024-02-24 23:55:43 +01:00
gxw
969601a1dc
X86_64: Fixed bug in zscal
...
Fixed handling of NAN and INF arguments when
inc is greater than 1.
2024-01-31 11:23:59 +08:00
Martin Kroeker
5f5b7c4f45
Merge pull request #4423 from martin-frbg/issue4422
...
Check compiler support for AVX512BF16 and base COL/SPR kernel choice on that
2024-01-12 16:30:50 +01:00
Martin Kroeker
995a990e24
Make AVX512 BFLOAT16 kernels conditional on compiler capability
2024-01-12 00:12:46 +01:00
Martin Kroeker
cf8b03ae8b
Use NAN rather than SNAN for portability
2024-01-07 23:09:57 +01:00
Martin Kroeker
def4996170
Fix handling of NAN and INF arguments
2024-01-07 15:29:42 +01:00
Martin Kroeker
f06b535566
Use C kernel for dgemv_t due to limitations of the old assembly one
2023-12-15 09:58:44 +01:00
Bart Oldeman
c34e2cf380
Use _mm_set1_epi{32,64x} to init mask in x86-64 [cz]asum
...
for skylake kernels. This is the same method as used in [sd]asum.
_mm_set1_epi64x was commented out for zasum, but has the advantage
of avoiding possible undefined behaviour (using an uninitialized
variable), optimized out by NVHPC and icx. The new code works
fine with those compilers.
For GCC 12.3 the generated code is identical; no matter what method
you use, the compiler optimizes the code into a compile-time
constant, there is no performance benefit using mm_cmpeq_epi8
since the corresponding instruction (VPCMPEQB) isn't actually
generated!
2023-11-19 21:28:35 +00:00
Martin Kroeker
22aa401656
Temporarily disable the AVX512 CASUM/ZASUM microkernels for any version of NVIDIA HPC ( #4327 )
...
* Temporarily disable the C/ZASUM microkernels for any version of NVHPC
2023-11-19 00:04:31 +01:00
Bart Oldeman
f8ad5344c2
Fix casum fallback kernel.
...
This kernel is only used on Skylake+ if the kernel with AVX512
intrinsics can't be used, but used the variable x1 incorrectly
in the tail end of the loop, as it is still at the initial
value instead of where x points to.
This caused 55 "other error"s in the LAPACK tests
(https://github.com/OpenMathLib/OpenBLAS/issues/4282 )
This change makes casum.c as similar as possible as zasum.c,
because zasum.c does this correctly.
2023-11-17 23:53:56 +00:00
Martin Kroeker
9019bc4945
Use SkylakeX ?ASUM microkernel for Cooperlake/Sapphirerapids as well
2023-11-04 22:10:06 +01:00
Martin Kroeker
675cd551da
fix improper function prototypes (empty parentheses)
2023-09-30 12:56:38 +02:00
Martin Kroeker
2c3034ff7f
Disable the C/ZASUM AVX512 microkernels when compiling with LLVM17 as well
2023-08-25 17:22:51 +02:00
Martin Kroeker
34da1a067d
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 17:01:50 +02:00
Martin Kroeker
4664b57e6e
use shortcut only when both incx and incy are zero
2023-08-04 12:25:34 +02:00
Martin Kroeker
6a428b5629
Update casum_microk_skylakex-2.c
2023-07-29 12:24:30 +02:00
Martin Kroeker
ebb447e32e
Update zasum_microk_skylakex-2.c
2023-07-29 12:23:57 +02:00
Martin Kroeker
9f6847583a
nvc currently miscompiles this, hopefully fixed in release 23.09
2023-07-29 11:50:16 +02:00
Martin Kroeker
fe54ee3d15
nvc currently miscompiles this, hopefully fixed in release 23.09
2023-07-29 11:48:38 +02:00
Martin Kroeker
2a62d2df96
Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3
2023-07-26 19:39:11 +02:00
Honglin Zhu
a76afdc047
Compatible with older version of GNU make
2023-05-20 13:58:23 +08:00
Honglin Zhu
0b83088887
spr dynamic arch support
2023-05-19 10:48:18 +08:00
Honglin Zhu
f249ccb741
Fix spr sbgemm error
2023-05-19 10:48:18 +08:00
Martin Kroeker
84bcf6639f
Disable gcc's tree-vectorizer pass on all operating systems
2023-04-20 23:24:52 +02:00
Martin Kroeker
c9174ae8d7
Disable gcc's tree-vectorizer pass on all operating systems
2023-04-19 23:45:44 +02:00
Martin Kroeker
c2fe9cb91f
Disable gcc's tree-vectorizer pass on all operating systems
2023-04-19 23:45:14 +02:00
Martin Kroeker
66b39b835c
Disable gcc's tree-vectorizer pass on all operating systems
2023-04-19 23:44:45 +02:00
Martin Kroeker
bb6d6735bf
Disable gcc's tree-vectorizer pass on all operating systems
2023-04-19 23:44:15 +02:00
Martin Kroeker
d18efaed20
Disable gcc's tree-vectorizer pass on all operating systems
2023-04-19 23:43:43 +02:00
Martin Kroeker
99f6d31ed5
Disable gcc's tree-vectorizer pass on all operating systems
2023-04-19 23:42:55 +02:00