5876 Commits

Author SHA1 Message Date
H.J. Lu 53ee0b76bb x86: Enable Intel CET
When Intel CET is enabled, we need to include <cet.h> in assembly code
to mark Intel CET support and place _CET_ENDBR at the function entry.
2021-04-30 19:45:39 -07:00
Martin Kroeker dc6b04c375
Merge pull request #3206 from martin-frbg/lapack480535
Import packing improvements to LAPACK xLAQR from Reference-LAPACK (PR 480+535)
2021-04-30 21:42:44 +02:00
pnp 3d4ccd2a13 fix for build error 2021-04-30 12:25:33 -04:00
pnp c59652f0ce optimize on sgemv_n for small n 2021-04-30 12:14:58 -04:00
Martin Kroeker 87d2e314db
Import packing improvements in LAPACK xLAQR from Reference-LAPACK PR 480+535 2021-04-30 13:50:55 +02:00
Martin Kroeker 3a30c12019
Merge pull request #25 from xianyi/develop
rebase
2021-04-30 13:47:17 +02:00
Martin Kroeker c9a82f54d1
Merge pull request #3204 from martin-frbg/lapack506
Correct INFO value returned by SLASQ2/DLASQ2 (Reference-LAPACK 506)
2021-04-30 13:25:48 +02:00
Martin Kroeker 444cb78be5
correct INFO value (Reference-LAPACK 506) 2021-04-30 09:26:54 +02:00
Martin Kroeker 171c20e3b6
Merge pull request #3202 from martin-frbg/issue3201
Fix division by zero in the non-x86 codepath of C/ZROTG
2021-04-29 18:58:27 +02:00
Martin Kroeker c5fb91f1bc
Fix division by zero in the non-x86 codepath 2021-04-29 09:47:18 +02:00
Martin Kroeker 9a36a283d3
Merge pull request #3199 from martin-frbg/lapack537
Add LAPACKE fixes from Reference-LAPACK PR 537
2021-04-29 05:39:50 +02:00
Martin Kroeker 7e35d25ea0
Merge pull request #3198 from martin-frbg/lapack539
Apply fixes from Reference-LAPACK PR468 and 539 for array declarations in ?ORGBR/?UNGBR
2021-04-29 05:39:35 +02:00
Martin Kroeker 3704f5e5b0
Add missing break statements in the ?lascl functions 2021-04-28 20:56:55 +02:00
Martin Kroeker 6b76066632
Add const qualifiers 2021-04-28 20:55:37 +02:00
Martin Kroeker 2b01132515
Clean up misdeclaration of the dummy stand-in for A in ?ORGBR/?UNGBR workspace queries (Reference-LAPACK PR 468 and 530) 2021-04-28 19:20:08 +02:00
Martin Kroeker 8e95a1e18d
Merge pull request #3195 from martin-frbg/lapack536
Apply lapack-testing fix from Reference-LAPACK PR536
2021-04-28 18:17:25 +02:00
Wangyang Guo aa7b3dc3db GEMM: skylake: improve the performance when m is small 2021-04-28 13:56:06 +00:00
Martin Kroeker 13a29d13fd
Apply lapack-testing fix from Reference-LAPACK PR536
Fixes the switch back from a single OMP thread (used for error-exit testing) to the originally requested number of threads for the computational tests
2021-04-27 15:48:22 +02:00
Martin Kroeker a6c2cb8417
Merge pull request #3193 from martin-frbg/lapack538
Apply lapack-testing fixes from Reference-LAPACK PR538
2021-04-27 15:40:51 +02:00
Martin Kroeker d511a7bb4f
Merge pull request #3191 from martin-frbg/issue3188
Delay creation of the (soft)link until after the library has been built
2021-04-27 13:35:16 +02:00
Martin Kroeker 3526ff2507
Apply fixes from Reference-LAPACK PR538 2021-04-27 12:52:49 +02:00
Martin Kroeker adcfe7b789
Merge pull request #3190 from martin-frbg/issue3128-2
Replace spurious AVX512 requirement in the Haswell drot microkernel with an AVX2/FMA3 guard
2021-04-27 06:36:28 +02:00
damonyu ceb44bef14 update the intrinsic API to the official name. 2021-04-27 11:12:29 +08:00
damonyu1989 ed473267df
Merge pull request #1 from xianyi/develop
update
2021-04-27 10:53:59 +08:00
Martin Kroeker 0608bc5d82
delay creation of the softlink until after the library has been created 2021-04-26 22:32:23 +02:00
Martin Kroeker 3d511f0e66
replace spurious avx512 requirement with fma check 2021-04-26 21:55:30 +02:00
Martin Kroeker 0b8a436af9
Add mixed clang/ifort build on OSX to Azure CI (#3185)
* Add mixed clang/ifort build on OSX to the Azure CI config based on https://github.com/oneapi-src/oneapi-ci
(and remove debugging tools from the clang+gfortran job)

* Remove extraneous libgfortran dependency of ifort builds

* Remove FEXTRALIB from the link line of the shared library, as ifort keeps track of dependencies (and they differ for a .dylib from what f_check got for an executable)
2021-04-22 02:11:20 +02:00
Martin Kroeker 352efdd13a
Merge pull request #24 from xianyi/develop
rebase
2021-04-20 21:30:28 +02:00
Martin Kroeker 4855af02a3
Merge pull request #3184 from martin-frbg/ctestfix
Fix obscure ctest crashes on OSX and add OSX builds to Azure CI
2021-04-20 07:31:07 +02:00
Martin Kroeker 94a5a1f0f1
Add OSX build variations to Azure CI 2021-04-19 22:27:08 +02:00
Martin Kroeker 751d127d7c
Include cblas_test.h to achieve int/long size change with INTERFACE64 2021-04-19 22:26:34 +02:00
Martin Kroeker fc101b67e5
Merge pull request #23 from xianyi/develop
rebase
2021-04-19 22:24:12 +02:00
Martin Kroeker b0239a05fd
Merge pull request #3183 from martin-frbg/2715-x
Restore __volatile__ keyword in ARM64 DYNAMIC_ARCH detection mechanism
2021-04-16 14:52:12 +02:00
Martin Kroeker 623d580b4c
Restore __volatile__ keyword 2021-04-16 10:27:32 +02:00
Martin Kroeker 974acb39ff
Merge pull request #3181 from RajalakshmiSR/dgemmp10vp
POWER10: Improve dgemm performance
2021-04-14 22:43:02 +02:00
Rajalakshmi Srinivasaraghavan 2379abaa5e POWER10: Improve dgemm performance
This patch uses vector pair pointer for input load operation
which helps to generate power10 lxvp instructions.
2021-04-13 22:30:06 -05:00
Martin Kroeker 3caf781d7c
Merge pull request #3179 from RajalakshmiSR/zgemvp10
POWER10: Optimized zgemv
2021-04-11 10:01:09 +02:00
Rajalakshmi Srinivasaraghavan 55bb9f639a POWER10: Optimized zgemv
This patch makes use of Matrix-Multiply Assist (MMA)
feature introduced in POWER ISA v3.1 for zgemv_n and zgemv_t.
2021-04-10 19:00:24 -05:00
Martin Kroeker 0dba04bb58
Merge pull request #3178 from martin-frbg/fix2864
Fix unwanted fallback to implicit typing in slanv2/dlanv2
2021-04-09 13:38:05 +02:00
Martin Kroeker e96f5e3c65
Fix implicit typing of new variable TWO 2021-04-09 10:04:15 +02:00
Martin Kroeker 558724e99f
Fix implicit typing of new variable TWO 2021-04-09 10:03:31 +02:00
Martin Kroeker 067c96a873
Merge pull request #3177 from martin-frbg/issue3176
Use "old" compute(24) function with clang due to register limitations
2021-04-07 08:22:42 +02:00
Martin Kroeker 4b380c0b40
Merge pull request #3175 from LYP951018/develop
Pass NO_AVX512 macro def when `DYNAMIC_ARCH` is enabled
2021-04-07 08:22:28 +02:00
Martin Kroeker 2dfb24730d
Use "old" compute(24) function with clang due to register limitations 2021-04-06 19:58:32 +02:00
刘雨培 725432efaa pass NO_AVX512 macro def 2021-04-07 00:10:41 +08:00
Martin Kroeker a2216ef19f
Merge pull request #3173 from martin-frbg/dyna-sse3
Fix spillover of host-specific build flags into the shared part of x86 DYNAMIC_ARCH builds
2021-04-05 13:39:17 +02:00
Martin Kroeker 5332cbae18
Avoid adding host-specific cpuflags to the common part of DYNAMIC_ARCH builds 2021-04-04 23:12:17 +02:00
Martin Kroeker 209b026e46
Merge pull request #3172 from martin-frbg/lapack477-final
Copy missing fixes from the final revision of Reference-LAPACK PR477
2021-04-04 20:19:09 +02:00
Martin Kroeker 1ae607beca
Update Makefile.x86_64 2021-04-04 12:31:22 +02:00
Martin Kroeker d393f1923f
Fix spillover of host-specific build flags into the shared part of DYNAMIC_ARCH builds with gmake
for #3139
2021-04-03 22:18:15 +02:00