Martin Kroeker
0c2aa0bed7
Fix implicit conversions and unused variables (Reference-LAPACK PR 703)
2022-11-13 20:29:08 +01:00
Martin Kroeker
2226a82f2e
Fix leading dimension check of eigen-/Schur vectors (Reference-LAPACK PR 665)
2022-11-13 17:50:49 +01:00
Martin Kroeker
645633e321
Fix leading dimension check of eigen-/Schur vectors (Reference-LAPACK PR 665)
2022-11-13 17:48:02 +01:00
Martin Kroeker
ee6643bc6b
Merge pull request #3815 from martin-frbg/lapack690
...
Fix workspace calculation in the left-looking variant of GEQRF (Reference-LAPACK PR690)
2022-11-13 16:26:31 +01:00
Martin Kroeker
90d7451df5
Add NaN check functions for trapezoidal matrices (Reference-LAPACK PR738+742)
2022-11-13 15:10:00 +01:00
Martin Kroeker
eba1112e38
Add NaN check functions for trapezoidal matrices (Reference-LAPACK PR738+742)
2022-11-13 15:03:39 +01:00
Martin Kroeker
23cfe58ee3
Add NaN check functions for trapezoidal matrices (Reference-LAPACK PR738+742)
2022-11-13 14:55:45 +01:00
Martin Kroeker
6dcf737c5d
Add NaN check functions for trapezoidal matrices (Reference-LAPACK PR738+742)
2022-11-13 14:51:39 +01:00
Martin Kroeker
3e2d52c502
Fix workspace calculation in GEQRF/GERQF (Reference-LAPACK PR 638)
2022-11-13 13:00:52 +01:00
Martin Kroeker
cb48c29b6f
Fix workspace calculation (Reference-LAPACK PR690)
2022-11-13 12:49:59 +01:00
Martin Kroeker
8c99d5d1b6
Merge pull request #3796 from martin-frbg/gemmt
...
Add a trivial GEMMT implementation based on a looped GEMV
2022-11-12 19:06:05 +01:00
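
Note on the GEMMT entry above: GEMMT computes C := alpha*A*B + beta*C but only touches the upper or lower triangle of the square result C. A "looped GEMV" version can be sketched as one matrix-vector product per column of C, restricted to the rows inside the requested triangle. The sketch below is only an illustration under simplifying assumptions (no transpose options, row-major Python lists, hypothetical function names `gemv` and `gemmt`); it is not the OpenBLAS code.

```python
def gemv(alpha, rows, x, beta, y):
    """Return alpha * (rows @ x) + beta * y for a list of matrix rows."""
    return [alpha * sum(a * b for a, b in zip(row, x)) + beta * yi
            for row, yi in zip(rows, y)]


def gemmt(uplo, alpha, A, B, beta, C):
    """Update only the 'L'ower or 'U'pper triangle of the n x n matrix C
    with alpha * A @ B + beta * C, one GEMV call per column of C."""
    n = len(C)
    for j in range(n):
        # Rows of column j that belong to the requested triangle.
        lo, hi = (j, n) if uplo == "L" else (0, j + 1)
        b_col = [row[j] for row in B]            # column j of B
        c_part = [C[i][j] for i in range(lo, hi)]
        updated = gemv(alpha, A[lo:hi], b_col, beta, c_part)
        for i, value in zip(range(lo, hi), updated):
            C[i][j] = value
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
lower = gemmt("L", 1, A, B, 0, [[1, 1], [1, 1]])
upper = gemmt("U", 1, A, B, 0, [[1, 1], [1, 1]])
```

With beta = 0 the entries inside the chosen triangle are overwritten with the corresponding entries of A*B, while the entries outside it keep their old values.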
Martin Kroeker
b53b0f6bb6
Merge pull request #3802 from martin-frbg/relafix
...
Fix cmake compilation of ReLAPACK and expose its INCLUDE_ALL option
2022-11-12 15:11:31 +01:00
Martin Kroeker
9a31faf420
Merge pull request #3811 from martin-frbg/issue3805
...
Improve gcc arch option selecting for Neoverse cpus
2022-11-10 10:57:33 +01:00
Martin Kroeker
e326ef9f0f
Merge pull request #3812 from bartoldeman/cscal-zscal-skylakex
...
Add [cz]scal microkernels for SKYLAKEX
2022-11-10 08:00:27 +01:00
Martin Kroeker
827a9c6079
Merge pull request #3814 from martin-frbg/traviswait-3
...
Travis Ci: Increase the wait time for ppc jobs again
2022-11-10 08:00:02 +01:00
Martin Kroeker
d141cf341f
Increase the wait time for ppc jobs again
2022-11-09 20:31:30 +01:00
Martin Kroeker
aad79ab516
Merge pull request #3813 from martin-frbg/azuredynosx
...
AzureCi: Limit cpu models in OSX_dynarch_cmake to keep it from running out of time
2022-11-09 20:29:17 +01:00
Martin Kroeker
09dd90ca09
Limit cpu models in OSX_dynarch_cmake
2022-11-09 15:35:57 +01:00
Martin Kroeker
f14435cb4b
Merge pull request #3810 from martin-frbg/fix3800
...
Add fallbacks to RaptorLake entry from PR3800
2022-11-09 15:28:12 +01:00
Bart Oldeman
6c1043eb41
Add [cz]scal microkernels for SKYLAKEX
...
These are as similar to dscal_microk_skylakex-2.c as possible
for consistency.
Note that before this change SKYLAKEX+ uses generic C functions for
cscal/zscal via commit 2271c350 from #2610 (which is masked by
commit 086d87a30). However now #3799 disables FMAs (in turn enabled
by `-march=skylake-avx512`) in the plain C code which fixes excessive
LAPACK test failures more nicely.
2022-11-09 08:57:03 -05:00
Martin Kroeker
be546ec1ad
Add gcc options for Neoverse cpus
2022-11-09 11:00:41 +01:00
Martin Kroeker
c957ad684e
Bump gcc requirement for NeoverseN2 and V1 to 10.4
2022-11-09 10:46:43 +01:00
Martin Kroeker
1865b15240
Add fallbacks to RaptorLake entry
2022-11-09 10:31:30 +01:00
Martin Kroeker
e6204d254f
Update CMakeLists.txt
2022-11-08 16:21:11 +01:00
Martin Kroeker
2e64722681
Update Makefile.rule
2022-11-08 16:20:17 +01:00
Martin Kroeker
aa2a2d9c01
Conditionally compile files that may get replaced by ReLAPACK
2022-11-08 12:04:46 +01:00
Martin Kroeker
1b77764182
Conditionally leave out bits of LAPACK to be overridden by ReLAPACK
2022-11-08 12:02:59 +01:00
Martin Kroeker
fcda11c1ae
Revert special handling of GEMMT
2022-11-05 23:48:50 +01:00
Martin Kroeker
4743d80c22
Merge pull request #3800 from thrasibule/raptorlake
...
add raptor lake ids
2022-11-05 18:05:48 +01:00
Martin Kroeker
5d02f2e83e
Merge pull request #3806 from martin-frbg/dyn_coop
...
Fix OPENBLAS_CORETYPE=COOPERLAKE not working in DYNAMIC_ARCH builds
2022-11-03 21:37:39 +01:00
Martin Kroeker
da6e426b13
fix Cooperlake not selectable via environment variable
2022-11-03 18:13:35 +01:00
Martin Kroeker
c970717157
fix missing t in xgemmt rule
...
Co-authored-by: Alexis <35051714+amontoison@users.noreply.github.com>
2022-11-01 13:51:20 +01:00
Martin Kroeker
62a44c9c5d
Merge pull request #3804 from martin-frbg/issue3803
...
Remove excess initializer (leftover from rework of PR 3793)
2022-10-31 20:42:33 +01:00
Martin Kroeker
c9d78dc3b2
Remove excess initializer (leftover from rework of PR 3793)
2022-10-31 16:57:03 +01:00
Martin Kroeker
65338a9493
Merge pull request #3799 from bartoldeman/cscal-zscal-no-fma
...
x86_64: prevent GCC and Clang from generating FMAs in cscal/zscal.
2022-10-30 18:56:10 +01:00
Martin Kroeker
ea6c5f3cf5
Add option RELAPACK_REPLACE
2022-10-30 12:55:23 +01:00
Martin Kroeker
d39978cd7f
Fix includes
2022-10-30 12:53:19 +01:00
Martin Kroeker
ce7ea72de1
Fix include paths
2022-10-30 12:50:51 +01:00
Martin Kroeker
3ebf5d219d
handle INCLUDE_ALL and optional function prefixes
2022-10-30 12:49:07 +01:00
Martin Kroeker
a082d54035
Rename to avoid conflict with OpenBLAS' toplevel config.h
2022-10-30 12:47:01 +01:00
Martin Kroeker
eeebaf2294
move INCLUDE_ALL to (c)make options
2022-10-30 12:45:54 +01:00
Martin Kroeker
06b022b139
Fix ReLAPACK source selection
2022-10-30 12:42:36 +01:00
Martin Kroeker
03bd1157d8
Merge pull request #3793 from imzhuhl/new_sbgemm
...
New sbgemm implementation for Neoverse N2
2022-10-30 12:09:46 +01:00
Guillaume Horel
e27ad3a6cc
add raptor lake ids
2022-10-28 11:45:43 -04:00
Honglin Zhu
79066b6bf3
Change file name to match the norm and delete useless code.
2022-10-28 17:09:39 +08:00
Bart Oldeman
e7e3aa2948
x86_64: prevent GCC and Clang from generating FMAs in cscal/zscal.
...
If e.g. -march=haswell is set in CFLAGS, GCC generates FMAs by default, which
is inconsistent with the microkernels, none of which use FMAs. These
inconsistencies cause a few failures in the LAPACK testcases, where
eigenvalue results with/without eigenvectors are compared.
Moreover using FMAs for multiplication of complex numbers can give surprising
results, see 22aa81f for more information.
This uses the same syntax as used in 22aa81f for zarch (s390x).
2022-10-27 18:16:43 -04:00
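
Note on the commit message above (e7e3aa2948): one way to see why FMAs can give surprising results in complex multiplication is to multiply a number by its own conjugate. The imaginary part is ar*(-ai) + ai*ar, which cancels exactly when both products are rounded the same way, but not when one product is fused into the addition and therefore left unrounded. The sketch below emulates a fused multiply-add with exact rational arithmetic; the helper names are hypothetical and the example is not taken from OpenBLAS or from commit 22aa81f.

```python
from fractions import Fraction


def fma(a, b, c):
    """Emulate a fused multiply-add: a*b + c computed exactly, rounded once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def imag_plain(ar, ai, br, bi):
    """Imaginary part of (ar + ai*i) * (br + bi*i), each product rounded."""
    return ar * bi + ai * br


def imag_fused(ar, ai, br, bi):
    """Same expression, but the first product is fused into the addition."""
    return fma(ar, bi, ai * br)


# z * conj(z) is mathematically real, so the imaginary part should be zero.
ar, ai = 1.0 / 3.0, 1.0 / 7.0
plain = imag_plain(ar, ai, ar, -ai)   # the two rounded products cancel
fused = imag_fused(ar, ai, ar, -ai)   # leaves the rounding error of ai*ar
```

Here `plain` is exactly 0.0, while `fused` is a tiny nonzero value. A microkernel without FMAs and compiler-generated C code with FMAs can therefore disagree on such inputs, which is the kind of inconsistency the commit describes.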
Honglin Zhu
4989e039a5
Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build
2022-10-27 14:10:26 +08:00
Martin Kroeker
e7fd8d21a6
Add GEMMT based on looped GEMV
2022-10-26 15:33:58 +02:00
Honglin Zhu
843e9fd0b9
Fix typo error
2022-10-26 17:06:33 +08:00
Honglin Zhu
b00d5b9746
New sbgemm implementation for Neoverse N2
...
1. Use UZP instructions instead of gather-load and scatter-store instructions to get lower latency.
2. Pad k to a multiple of 4.
2022-10-26 15:09:41 +08:00
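
Note on item 2 of the commit message above: a minimal sketch of padding k, assuming it means rounding the inner dimension up to the next multiple of an alignment of 4 and zero-filling the extra elements so a kernel can always consume k in groups of four. The function names and the `align` parameter are hypothetical, not OpenBLAS identifiers.

```python
def pad_k(k, align=4):
    """Round k up to the next multiple of align."""
    return (k + align - 1) // align * align


def pad_row(row, align=4):
    """Append zeros so the row length becomes a multiple of align.
    Zero padding leaves any dot product taken over the row unchanged."""
    return row + [0.0] * (pad_k(len(row), align) - len(row))


padded = pad_row([1.0, 2.0, 3.0, 4.0, 5.0])
```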