Commit Graph

6717 Commits

Author SHA1 Message Date
Martin Kroeker
eba1112e38 Add NaN check functions for trapezoidal matrices (Reference-LAPACK PR738+742) 2022-11-13 15:03:39 +01:00
Martin Kroeker
23cfe58ee3 Add NaN check functions for trapezoidal matrices (Reference-LAPACK PR738+742) 2022-11-13 14:55:45 +01:00
Martin Kroeker
6dcf737c5d Add NaN check functions for trapezoidal matrices (Reference-LAPACK PR738+742) 2022-11-13 14:51:39 +01:00
Martin Kroeker
8c99d5d1b6 Merge pull request #3796 from martin-frbg/gemmt
Add a trivial GEMMT implementation based on a looped GEMV
2022-11-12 19:06:05 +01:00
Martin Kroeker
b53b0f6bb6 Merge pull request #3802 from martin-frbg/relafix
Fix cmake compilation of ReLAPACK and expose its INCLUDE_ALL option
2022-11-12 15:11:31 +01:00
Martin Kroeker
9a31faf420 Merge pull request #3811 from martin-frbg/issue3805
Improve gcc arch option selecting for Neoverse cpus
2022-11-10 10:57:33 +01:00
Martin Kroeker
e326ef9f0f Merge pull request #3812 from bartoldeman/cscal-zscal-skylakex
Add [cz]scal microkernels for SKYLAKEX
2022-11-10 08:00:27 +01:00
Martin Kroeker
827a9c6079 Merge pull request #3814 from martin-frbg/traviswait-3
Travis Ci: Increase the wait time for ppc jobs again
2022-11-10 08:00:02 +01:00
Martin Kroeker
d141cf341f Increase the wait time for ppc jobs again 2022-11-09 20:31:30 +01:00
Martin Kroeker
aad79ab516 Merge pull request #3813 from martin-frbg/azuredynosx
AzureCi: Limit cpu models in OSX_dynarch_cmake to keep it from running out of time
2022-11-09 20:29:17 +01:00
Martin Kroeker
09dd90ca09 Limit cpu models in OSX_dynarch_cmake 2022-11-09 15:35:57 +01:00
Martin Kroeker
f14435cb4b Merge pull request #3810 from martin-frbg/fix3800
Add fallbacks to RaptorLake entry from PR3800
2022-11-09 15:28:12 +01:00
Bart Oldeman
6c1043eb41 Add [cz]scal microkernels for SKYLAKEX
These are as similar to dscal_microk_skylakex-2.c as possible
for consistency.

Note that before this change SKYLAKEX+ used generic C functions for
cscal/zscal via commit 2271c350 from #2610 (which is masked by
commit 086d87a30). However, #3799 now disables FMAs (in turn enabled
by `-march=skylake-avx512`) in the plain C code, which fixes excessive
LAPACK test failures more nicely.
2022-11-09 08:57:03 -05:00
Martin Kroeker
be546ec1ad Add gcc options for Neoverse cpus 2022-11-09 11:00:41 +01:00
Martin Kroeker
c957ad684e Bump gcc requirement for NeoverseN2 and V1 to 10.4 2022-11-09 10:46:43 +01:00
Martin Kroeker
1865b15240 Add fallbacks to RaptorLake entry 2022-11-09 10:31:30 +01:00
Martin Kroeker
e6204d254f Update CMakeLists.txt 2022-11-08 16:21:11 +01:00
Martin Kroeker
2e64722681 Update Makefile.rule 2022-11-08 16:20:17 +01:00
Martin Kroeker
aa2a2d9c01 Conditionally compile files that may get replaced by ReLAPACK 2022-11-08 12:04:46 +01:00
Martin Kroeker
1b77764182 Conditionally leave out bits of LAPACK to be overridden by ReLAPACK 2022-11-08 12:02:59 +01:00
Martin Kroeker
fcda11c1ae Revert special handling of GEMMT 2022-11-05 23:48:50 +01:00
Martin Kroeker
4743d80c22 Merge pull request #3800 from thrasibule/raptorlake
add raptor lake ids
2022-11-05 18:05:48 +01:00
Martin Kroeker
5d02f2e83e Merge pull request #3806 from martin-frbg/dyn_coop
Fix OPENBLAS_CORETYPE=COOPERLAKE not working in DYNAMIC_ARCH builds
2022-11-03 21:37:39 +01:00
Martin Kroeker
da6e426b13 fix Cooperlake not selectable via environment variable 2022-11-03 18:13:35 +01:00
Martin Kroeker
c970717157 fix missing t in xgemmt rule
Co-authored-by: Alexis <35051714+amontoison@users.noreply.github.com>
2022-11-01 13:51:20 +01:00
Martin Kroeker
62a44c9c5d Merge pull request #3804 from martin-frbg/issue3803
Remove excess initializer (leftover from rework of PR 3793)
2022-10-31 20:42:33 +01:00
Martin Kroeker
c9d78dc3b2 Remove excess initializer (leftover from rework of PR 3793) 2022-10-31 16:57:03 +01:00
Martin Kroeker
65338a9493 Merge pull request #3799 from bartoldeman/cscal-zscal-no-fma
x86_64: prevent GCC and Clang from generating FMAs in cscal/zscal.
2022-10-30 18:56:10 +01:00
Martin Kroeker
ea6c5f3cf5 Add option RELAPACK_REPLACE 2022-10-30 12:55:23 +01:00
Martin Kroeker
d39978cd7f Fix includes 2022-10-30 12:53:19 +01:00
Martin Kroeker
ce7ea72de1 Fix include paths 2022-10-30 12:50:51 +01:00
Martin Kroeker
3ebf5d219d handle INCLUDE_ALL and optional function prefixes 2022-10-30 12:49:07 +01:00
Martin Kroeker
a082d54035 Rename to avoid conflict with OpenBLAS' toplevel config.h 2022-10-30 12:47:01 +01:00
Martin Kroeker
eeebaf2294 move INCLUDE_ALL to (c)make options 2022-10-30 12:45:54 +01:00
Martin Kroeker
06b022b139 Fix ReLAPACK source selection 2022-10-30 12:42:36 +01:00
Martin Kroeker
03bd1157d8 Merge pull request #3793 from imzhuhl/new_sbgemm
New sbgemm implementation for Neoverse N2
2022-10-30 12:09:46 +01:00
Guillaume Horel
e27ad3a6cc add raptor lake ids 2022-10-28 11:45:43 -04:00
Honglin Zhu
79066b6bf3 Change file name to match the norm and delete useless code. 2022-10-28 17:09:39 +08:00
Bart Oldeman
e7e3aa2948 x86_64: prevent GCC and Clang from generating FMAs in cscal/zscal.
If e.g. -march=haswell is set in CFLAGS, GCC generates FMAs by default, which
is inconsistent with the microkernels, none of which use FMAs. These
inconsistencies cause a few failures in the LAPACK testcases, where
eigenvalue results with/without eigenvectors are compared.

Moreover using FMAs for multiplication of complex numbers can give surprising
results, see 22aa81f for more information.

This uses the same syntax as used in 22aa81f for zarch (s390x).
2022-10-27 18:16:43 -04:00
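The FMA issue described in e7e3aa2948 can be illustrated with a small sketch. The function name `zscal_sketch` is hypothetical, and the pragma pair shown is one way GCC and Clang allow floating-point contraction to be switched off per file; whether it matches the exact form used in the commit is an assumption.

```c
#include <stddef.h>

/* Ask the compiler not to fuse a*b + c into a single FMA in this file.
 * GCC and Clang use different pragmas for this. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC optimize ("fp-contract=off")
#endif
#if defined(__clang__)
#pragma clang fp contract(off)
#endif

/* Hypothetical helper: scale n interleaved complex doubles (re, im pairs)
 * in x by the complex scalar alpha_r + i*alpha_i. */
void zscal_sketch(size_t n, double alpha_r, double alpha_i, double *x)
{
    for (size_t i = 0; i < n; i++) {
        double re = x[2 * i];
        double im = x[2 * i + 1];
        /* Each product is rounded on its own before the add/subtract.
         * If one product were fused into an FMA, it would enter the sum
         * unrounded while the other was rounded, so terms that should
         * cancel exactly could leave a tiny nonzero residue. */
        x[2 * i]     = alpha_r * re - alpha_i * im;
        x[2 * i + 1] = alpha_r * im + alpha_i * re;
    }
}
```

For example, scaling x = a + bi by alpha = a - bi should give an imaginary part of exactly zero, because a*b and (-b)*a round to values of equal magnitude and opposite sign. With contraction enabled on FMA-capable hardware, the imaginary part may instead come out as the rounding error of one product.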
Honglin Zhu
4989e039a5 Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build 2022-10-27 14:10:26 +08:00
Martin Kroeker
e7fd8d21a6 Add GEMMT based on looped GEMV 2022-10-26 15:33:58 +02:00
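As a rough illustration of what "GEMMT based on looped GEMV" can mean: GEMMT computes C := alpha*A*B + beta*C but updates only the upper or lower triangle of C, so each column of C can be handled by one matrix-vector product over just the rows inside the triangle. The sketch below is not the OpenBLAS code; `gemv_n` and `gemmt_sketch` are hypothetical names, only the non-transposed, double-precision, column-major case is covered, and beta is applied by plain multiplication.

```c
#include <stddef.h>

/* y[0..m) = alpha * A[0..m, 0..k) * x + beta * y, A column-major (lda). */
static void gemv_n(size_t m, size_t k, double alpha,
                   const double *a, size_t lda,
                   const double *x, double beta, double *y)
{
    for (size_t i = 0; i < m; i++) {
        double acc = 0.0;
        for (size_t p = 0; p < k; p++)
            acc += a[i + p * lda] * x[p];
        y[i] = alpha * acc + beta * y[i];
    }
}

/* Update only one triangle of the n-by-n matrix C with alpha*A*B + beta*C.
 * A is n-by-k, B is k-by-n, all column-major. upper != 0 selects the
 * upper triangle, otherwise the lower triangle is updated. */
void gemmt_sketch(int upper, size_t n, size_t k, double alpha,
                  const double *a, size_t lda,
                  const double *b, size_t ldb,
                  double beta, double *c, size_t ldc)
{
    for (size_t j = 0; j < n; j++) {
        const double *bj = b + j * ldb;   /* column j of B */
        if (upper)   /* rows 0..j of column j */
            gemv_n(j + 1, k, alpha, a, lda, bj, beta, c + j * ldc);
        else         /* rows j..n-1 of column j */
            gemv_n(n - j, k, alpha, a + j, lda, bj, beta, c + j + j * ldc);
    }
}
```

Entries of C outside the selected triangle are never read or written, which is the point of GEMMT compared with a full GEMM.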
Honglin Zhu
843e9fd0b9 Fix typo error 2022-10-26 17:06:33 +08:00
Honglin Zhu
b00d5b9746 New sbgemm implementation for Neoverse N2
1. Use UZP instructions instead of gather-load and scatter-store instructions to get lower latency.
2. Pad k to a power of 4.
2022-10-26 15:09:41 +08:00
Martin Kroeker
8c10f0abba Merge pull request #3794 from bartoldeman/benchmark-align-malloc
Benchmarks: align malloc'ed buffers.
2022-10-21 16:13:58 +02:00
Bart Oldeman
9e6b060bf3 Fix comment.
It stores the pointer, not an offset (that would be an alternative approach).
2022-10-20 20:11:09 -04:00
Bart Oldeman
9959a60873 Benchmarks: align malloc'ed buffers.
Benchmarks should allocate with cacheline (often 64 bytes) alignment
to avoid unreliable timings. This technique, storing the original
pointer just before the aligned pointer, avoids C11's aligned_alloc
and so stays compatible with older compilers.

For example, Glibc's x86_64 malloc returns 16-byte aligned buffers, which is
not sufficient for AVX/AVX2 (32-byte preferred) or AVX512 (64-byte).
2022-10-20 13:28:20 -04:00
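The alignment technique from 9959a60873 (with the clarification in 9e6b060bf3 that the original pointer is stored, not an offset) might look roughly like the following. This is a minimal sketch, not the benchmark code itself: the names `bench_aligned_malloc` and `bench_aligned_free` are hypothetical, and the alignment is assumed to be a power of two no smaller than `sizeof(void *)`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: return a buffer of `size` bytes aligned to
 * `alignment`, using only plain malloc. The pointer returned by malloc
 * is saved in the slot immediately before the aligned address. */
void *bench_aligned_malloc(size_t size, size_t alignment)
{
    /* Assumption: alignment is a power of two and >= sizeof(void *). */
    assert(alignment >= sizeof(void *));
    assert((alignment & (alignment - 1)) == 0);

    /* Over-allocate: room for the saved pointer plus worst-case padding. */
    void *raw = malloc(size + alignment + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* Skip past the slot for the saved pointer, then round up. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);

    ((void **)aligned)[-1] = raw;   /* remember what to pass to free() */
    return (void *)aligned;
}

/* Release a buffer obtained from bench_aligned_malloc. */
void bench_aligned_free(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

A benchmark would call `bench_aligned_malloc(bytes, 64)` in place of `malloc(bytes)` and release the buffer with `bench_aligned_free`. Passing the aligned pointer directly to `free` would be an error, since it is not the address `malloc` returned.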
Martin Kroeker
ad424fce08 Merge pull request #3791 from martin-frbg/issue3790
Fix pkgconfig file generation for INTERFACE64 builds
2022-10-19 07:11:33 +02:00
Martin Kroeker
5f72415f10 Suffix the pkgconfig file itself in INTERFACE64 builds 2022-10-18 20:29:24 +02:00
Martin Kroeker
747ade5adf fix INTERFACE64/USE64BITINT reporting 2022-10-18 17:28:07 +02:00
Martin Kroeker
8bacea1254 Pass libsuffix to openblas.pc and fix passing of INTERFACE64/USE64BITINT flag 2022-10-18 16:18:29 +02:00