7452 Commits

Author SHA1 Message Date
Martin Kroeker 3d511f0e66
replace spurious avx512 requirement with fma check 2021-04-26 21:55:30 +02:00
Martin Kroeker 0b8a436af9
Add mixed clang/ifort build on OSX to Azure CI (#3185)
* Add mixed clang/ifort build on OSX to the Azure CI config based on https://github.com/oneapi-src/oneapi-ci
(and remove debugging tools from the clang+gfortran job)

* Remove extraneous libgfortran dependency of ifort builds

* remove FEXTRALIB from link line of shared library as ifort keeps track of dependencies (and they are different for a .dylib than what f_check got for an executable)
2021-04-22 02:11:20 +02:00
Martin Kroeker 352efdd13a
Merge pull request #24 from xianyi/develop
rebase
2021-04-20 21:30:28 +02:00
Martin Kroeker 4855af02a3
Merge pull request #3184 from martin-frbg/ctestfix
Fix obscure ctest crashes on OSX and add OSX builds to Azure CI
2021-04-20 07:31:07 +02:00
Martin Kroeker 94a5a1f0f1
Add OSX build variations to Azure CI 2021-04-19 22:27:08 +02:00
Martin Kroeker 751d127d7c
Include cblas_test.h to achieve int/long size change with INTERFACE64 2021-04-19 22:26:34 +02:00
Martin Kroeker fc101b67e5
Merge pull request #23 from xianyi/develop
rebase
2021-04-19 22:24:12 +02:00
Martin Kroeker b0239a05fd
Merge pull request #3183 from martin-frbg/2715-x
Restore __volatile__ keyword in ARM64 DYNAMIC_ARCH detection mechanism
2021-04-16 14:52:12 +02:00
Martin Kroeker 623d580b4c
Restore __volatile__ keyword 2021-04-16 10:27:32 +02:00
Martin Kroeker 974acb39ff
Merge pull request #3181 from RajalakshmiSR/dgemmp10vp
POWER10: Improve dgemm performance
2021-04-14 22:43:02 +02:00
Rajalakshmi Srinivasaraghavan 2379abaa5e POWER10: Improve dgemm performance
This patch uses vector pair pointer for input load operation
which helps to generate power10 lxvp instructions.
2021-04-13 22:30:06 -05:00
Martin Kroeker 3caf781d7c
Merge pull request #3179 from RajalakshmiSR/zgemvp10
POWER10: Optimized zgemv
2021-04-11 10:01:09 +02:00
Rajalakshmi Srinivasaraghavan 55bb9f639a POWER10: Optimized zgemv
This patch makes use of Matrix-Multiply Assist (MMA)
feature introduced in POWER ISA v3.1 for zgemv_n and zgemv_t.
2021-04-10 19:00:24 -05:00
Martin Kroeker 0dba04bb58
Merge pull request #3178 from martin-frbg/fix2864
Fix unwanted fallback to implicit typing in slanv2/dlanv2
2021-04-09 13:38:05 +02:00
Martin Kroeker e96f5e3c65
Fix implicit typing of new variable TWO 2021-04-09 10:04:15 +02:00
Martin Kroeker 558724e99f
Fix implicit typing of new variable TWO 2021-04-09 10:03:31 +02:00
Martin Kroeker 067c96a873
Merge pull request #3177 from martin-frbg/issue3176
Use "old" compute(24) function with clang due to register limitations
2021-04-07 08:22:42 +02:00
Martin Kroeker 4b380c0b40
Merge pull request #3175 from LYP951018/develop
Pass NO_AVX512 macro def when `DYNAMIC_ARCH` is enabled
2021-04-07 08:22:28 +02:00
Martin Kroeker 2dfb24730d
Use "old" compute(24) function with clang due to register limitations 2021-04-06 19:58:32 +02:00
刘雨培 725432efaa pass NO_AVX512 macro def 2021-04-07 00:10:41 +08:00
Martin Kroeker a2216ef19f
Merge pull request #3173 from martin-frbg/dyna-sse3
Fix spillover of host-specific build flags into the shared part of x86 DYNAMIC_ARCH builds
2021-04-05 13:39:17 +02:00
Martin Kroeker 5332cbae18
Avoid adding host-specific cpuflags to the common part of DYNAMIC_ARCH builds 2021-04-04 23:12:17 +02:00
Martin Kroeker 209b026e46
Merge pull request #3172 from martin-frbg/lapack477-final
Copy missing fixes from the final revision of Reference-LAPACK PR477
2021-04-04 20:19:09 +02:00
Martin Kroeker 1ae607beca
Update Makefile.x86_64 2021-04-04 12:31:22 +02:00
Martin Kroeker d393f1923f
Fix spillover of host-specific build flags into the shared part of DYNAMIC_ARCH builds with gmake
for #3139
2021-04-03 22:18:15 +02:00
Martin Kroeker 081d5ae971
Fix typo and potentially undefined variables
(copies fixes made in Reference-LAPACK PR 477 after the initial cherrypick)
2021-04-03 22:11:14 +02:00
Martin Kroeker 0492f0f3f9
Merge pull request #22 from xianyi/develop
rebase
2021-04-03 21:58:36 +02:00
Martin Kroeker 147e0a75fd
Merge pull request #3170 from CodesWithWolves/sgemm_tcopy_16-invalid-read
Remove Unnecessary/Erroneous Adds/Reads In sgemm_tcopy_16.S COPY1x8 Macro
2021-04-03 19:49:47 +02:00
Martin Kroeker ee068af843
Merge pull request #3171 from RajalakshmiSR/BE_p10
POWER10: Adding check for little endian
2021-04-01 21:20:24 +02:00
Rajalakshmi Srinivasaraghavan 2dbcddd83d POWER10: Adding check for little endian
This patch makes sure that recent POWER10 patches are used
only for little endian.
2021-03-31 21:32:42 -05:00
CodesWithWolves d2bda3b56a Remove Unnecessary/Erroneous Reads In sgemm_tcopy_16.S COPY1x8 Macro
There appears to have been some code leak when copying from the COPY2x8
macro above where we're reading 8 bytes into d4-d7 directly after
reading 4 bytes into s4-s7. These 32 bytes in d4-7 are unused and can
possibly overrun the boundary of allocated memory -- Valgrind detected
this which is what dragged my attention to it for a 128,1 copy.

Additionally, there is no need to update the addresses stored in A0-A7
as the only possible paths after running this macro will overwrite A0-7
if looping to the next 8 rows, or overwrite A0-3 if moving to 4 rows --
in which case A4-7 are unused.
2021-03-31 15:44:25 -04:00
Martin Kroeker 903fd85c85
Merge pull request #3167 from xianyi/fix3126
Fix compilation of the benchmarks on older OSX versions
2021-03-27 12:40:42 +01:00
Martin Kroeker d57c681a6d
Fix compilation on older OSX versions 2021-03-26 22:29:29 +01:00
Martin Kroeker d7efe5857c
Merge pull request #3165 from martin-frbg/azure-osx
Add OSX build to Azure
2021-03-24 14:05:34 +01:00
Martin Kroeker 8fd694c18f
Update .travis.yml 2021-03-24 10:36:29 +01:00
Martin Kroeker e69b0b1771
Update azure-pipelines.yml 2021-03-24 10:34:24 +01:00
Martin Kroeker 9dc0bfd617
Update azure-pipelines.yml 2021-03-24 08:54:30 +01:00
Martin Kroeker e6664ec2c9
Update azure-pipelines.yml 2021-03-24 08:41:48 +01:00
Martin Kroeker dbb33f412f
Update azure-pipelines.yml 2021-03-24 08:30:48 +01:00
Martin Kroeker 70b89a6205
Add OSX build to Azure 2021-03-24 07:50:35 +01:00
Martin Kroeker 07b144855a
Merge pull request #3164 from martin-frbg/travisosxomp
Fix xcode12 build on Travis and add OSX/OpenMP job
2021-03-24 06:56:10 +01:00
Martin Kroeker 292a0aed66
Fix xcode12 build and add OSX/OpenMP 2021-03-24 06:55:14 +01:00
Martin Kroeker 42f0201e21
Merge pull request #20 from xianyi/develop
rebase
2021-03-22 17:53:43 +01:00
Martin Kroeker 22db876d48
Merge pull request #3158 from austinpagan/Gemm.CZPQ
Changed default P/Q values for CGEMM and ZGEMM (Power10 only)
2021-03-19 20:53:21 +01:00
Martin Kroeker bdd6e3a153
Merge pull request #3157 from martin-frbg/issue3020-final
Add workaround for LAPACK testsuite failures with the NVIDIA HPC compiler on PPC
2021-03-19 15:23:12 +01:00
Martin Kroeker 7b8f580941
Merge pull request #3156 from martin-frbg/omatcopy_d
Move x86_64 DOMATCOPY_RT back to the C implementation
2021-03-19 15:22:48 +01:00
Gordon Fossum 198adea961 Changed default P/Q values for CGEMM and ZGEMM (Power10 only) 2021-03-19 10:05:23 -04:00
Martin Kroeker 86c5a0013f
Add workaround for LAPACK testsuite failures with the NVIDIA HPC compiler 2021-03-19 11:47:58 +01:00
Martin Kroeker ef85c22474
Add workaround for LAPACK test failures with the NVIDIA HPC compiler 2021-03-19 11:46:25 +01:00
Martin Kroeker d3555d2e50
Add workaround for LAPACK test failures with the NVIDIA HPC compiler 2021-03-19 11:44:31 +01:00