Commit Graph

5592 Commits

Author SHA1 Message Date
Martin Kroeker 558cd543bf
Merge pull request #3093 from martin-frbg/fix3064
fix copy-paste error in build rules for cblas_crotg and cblas_zrotg
2021-01-30 22:21:28 +01:00
Martin Kroeker bd906e3410
fix copy-paste error in build rules for cblas_crotg and cblas_zrotg 2021-01-30 16:46:25 +01:00
Martin Kroeker 35086cb501
Merge pull request #3092 from RajalakshmiSR/cscal_p10
Optimize cscal function for POWER10
2021-01-30 16:23:37 +01:00
Rajalakshmi Srinivasaraghavan 2056ffc227 Optimize cscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-01-29 13:51:43 -06:00
Martin Kroeker 7745439312
Merge pull request #3091 from martin-frbg/lapack477-2
Fix calculation of the non-exceptional shift values in LAPACK complex QZ
2021-01-29 13:37:23 +01:00
Martin Kroeker c4b5abbe43
fix data type 2021-01-29 10:45:36 +01:00
Martin Kroeker f87842483e
fix calculation of non-exceptional shift (from Reference-LAPACK PR 477) 2021-01-29 09:56:12 +01:00
Martin Kroeker 3dbb32c734
Merge pull request #11 from xianyi/develop
rebase
2021-01-29 09:52:21 +01:00
xoviat 609ea80276 enable testing 2021-01-27 16:39:52 -06:00
xoviat 3dfecaaf7c require nofortran to be set on msvc 2021-01-27 16:39:15 -06:00
xoviat 3165c915b6 fix test helpers 2021-01-27 15:24:49 -06:00
xoviat 457ccc42c9
Merge branch 'develop' into msvc 2021-01-27 14:15:59 -06:00
Martin Kroeker 00880c720a
Merge pull request #3087 from martin-frbg/lapack477
Apply Reference-LAPACK PR 477 for convergence problems in CHGEQZ/ZHGEQZ
2021-01-27 19:11:55 +01:00
Martin Kroeker 856bc36533
Add exceptional shift to fix rare convergence problems 2021-01-27 13:41:45 +01:00
Martin Kroeker fe71887b68
Merge pull request #10 from xianyi/develop
rebase
2021-01-27 13:39:26 +01:00
Martin Kroeker 10094bd885
Merge pull request #3076 from martin-frbg/dyn-thunderx
Add CI job for ARM64/gcc10 DYNAMIC_ARCH
2021-01-27 13:25:45 +01:00
Martin Kroeker eea0c0f2ed
Merge pull request #3085 from alexhenrie/memory_alloc
Fix null pointer check in blas_memory_alloc
2021-01-26 20:11:42 +01:00
Martin Kroeker 85be43e0df
Merge pull request #3083 from martin-frbg/develop
Add DYNAMIC_LIST support for ARM64
2021-01-26 15:13:35 +01:00
Martin Kroeker 0cb9e9fc8d
Remove the VORTEX support bits again for now 2021-01-25 19:02:21 +01:00
Martin Kroeker cb61d3b46b
Add DYNAMIC_LIST support for ARM64 2021-01-25 13:13:20 +01:00
Alex Henrie 113840da12 Fix null pointer check in blas_memory_alloc 2021-01-24 22:20:44 -07:00
Martin Kroeker deb2e66bcc
Add DYNAMIC_LIST support for ARM64 2021-01-24 23:18:52 +01:00
Martin Kroeker 9b2d69aa80
Add DYNAMIC_LIST option for ARM64 2021-01-24 23:18:01 +01:00
Martin Kroeker e3ff4cdd23
Merge pull request #9 from xianyi/develop
rebase
2021-01-24 23:14:45 +01:00
Martin Kroeker 0745ba43a4
Merge pull request #3082 from RajalakshmiSR/scalp10
Optimize s/dscal function for POWER10
2021-01-24 19:03:40 +01:00
Rajalakshmi Srinivasaraghavan 3ede843d50 Optimize s/dscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-01-24 07:48:28 -06:00
xoviat 2e8d6e8690 add functions for embedded 2021-01-23 22:12:17 -06:00
Martin Kroeker 69a5558203
Merge pull request #3059 from Guobing-Chen/BF16_gemm
Initial code for Cooperlake BF16 GEMM kernel
2021-01-23 19:08:05 +01:00
Martin Kroeker d6905403e3
Merge pull request #3068 from alexhenrie/scan-build
scan-build fixes
2021-01-23 19:06:29 +01:00
Martin Kroeker 411926b572
Merge pull request #3079 from RajalakshmiSR/rotp10
Optimize s/drot function for POWER10
2021-01-22 08:26:00 +01:00
Rajalakshmi Srinivasaraghavan 439b93f6d2 Optimize s/drot function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-01-21 13:24:45 -06:00
Martin Kroeker d6cf67778c
Merge pull request #3075 from martin-frbg/issue3074
Fix DYNAMIC_ARCH compilation on POWER with gcc <11
2021-01-21 08:51:30 +01:00
Martin Kroeker b94dab5250
patch to support power10 in builtin_cpu_is was backported to gcc 10.2, so allow that as well 2021-01-20 21:34:36 +01:00
Martin Kroeker 6178974cd9
Update .drone.yml 2021-01-20 20:21:27 +01:00
Martin Kroeker 0b9e4d1278
Add gcc10/arm64 DYNAMIC_ARCH build 2021-01-20 18:30:05 +01:00
Martin Kroeker 63fa3c3f8f
Require gcc 11 for builtin_cpu_is(power10)
fixes #3074
2021-01-20 15:41:04 +01:00
Martin Kroeker 3612d9a57a
Merge pull request #8 from xianyi/develop
rebase
2021-01-20 15:38:30 +01:00
xoviat b60de4447a add cortex-m platform 2021-01-19 08:57:44 -06:00
Martin Kroeker 16dddb760e
Merge pull request #3070 from RajalakshmiSR/cdot
Optimize cdot function for POWER10
2021-01-16 15:47:34 +01:00
Rajalakshmi Srinivasaraghavan eff7c9166e Optimize cdot function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-01-15 13:40:34 -06:00
Alex Henrie f1bf2603e6 Remove dead assignment to dflag in rotmg functions 2021-01-14 19:40:32 -07:00
Alex Henrie 6f32991eae Don't define the mode variable when not needed in gemm functions 2021-01-14 19:40:31 -07:00
Alex Henrie 202fc9e8ed Fix uninitialized argument value in dasum_k 2021-01-14 19:40:31 -07:00
Martin Kroeker e378b24487
Merge pull request #3067 from albertziegenhagel/fix-generic-cmake
Fix building "generic" TRMM kernel with CMake
2021-01-14 21:35:19 +01:00
Martin Kroeker 3628b22d49
Merge pull request #3064 from martin-frbg/issue3063
Add cblas_crotg, cblas_zrotg, cblas_csrot and cblas_zdrot
2021-01-14 16:47:59 +01:00
Martin Kroeker af2b0d0205
Merge pull request #3066 from martin-frbg/buffsizefix
Fix compile-time setting of the GEMM buffer size for gmake builds
2021-01-14 16:00:38 +01:00
Martin Kroeker 4bf988959a
Merge pull request #3062 from austinpagan/GemmPreferedSize3
Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWE…
2021-01-14 15:59:53 +01:00
Martin Kroeker a0e4fb3a28
Merge pull request #3061 from martin-frbg/arm64-pgi
Support NVIDIA HPC SDK on ARM64
2021-01-14 15:59:21 +01:00
Martin Kroeker 2c445be8ba
Merge pull request #3051 from martin-frbg/rocketlake
Add CPUID information for Intel Rocket Lake
2021-01-14 15:56:25 +01:00
Albert Ziegenhagel e3f4063683 Fix building "generic" TRMM kernel with CMake
The CMake "TARGET_CORE" variable stores the "generic" target name in all lowercase letters, but it is compared to an all-uppercase string, which results in the wrong TRMM kernel being selected.
This commit converts TARGET_CORE to all uppercase before comparing its value, so that case mismatches can no longer cause this problem.
2021-01-14 10:00:49 +01:00