Commit Graph

5511 Commits

Author SHA1 Message Date
Martin Kroeker cb429d6b12
Merge pull request #3110 from martin-frbg/issue3108
Fix get_num_procs() in the USE_TLS branch for non-glibc systems
2021-02-18 15:45:25 +01:00
Martin Kroeker b0bded3f2f
Fix get_num_procs() in the USE_TLS branch for non-glibc systems 2021-02-18 11:14:05 +01:00
Martin Kroeker f9aaf22fc3
Merge pull request #3105 from martin-frbg/tigerlake
Recognize Intel Tiger Lake CPUID as SkylakeX
2021-02-12 13:29:53 +01:00
Martin Kroeker 35ff3c731d
Merge pull request #3106 from RajalakshmiSR/ppcbe
Fix build issue on POWER8 with DYNAMIC_ARCH
2021-02-12 13:29:23 +01:00
Rajalakshmi Srinivasaraghavan 63fa6c832e Fix build issue on POWER8 with DYNAMIC_ARCH
Running make DYNAMIC_ARCH=1 on POWER8 BE with gcc 10.2 gives
the following error due to the difference in UNROLL_M/N:
'No rule to make target 'dgemm_incopy_POWER10.o', needed by kernel'
2021-02-11 21:28:03 -06:00
Martin Kroeker e4e5042e38
Recognize Intel Tiger Lake as SkylakeX 2021-02-11 20:17:11 +01:00
Martin Kroeker ae53e3e233
Recognize Intel Tiger Lake as SkylakeX 2021-02-11 20:16:27 +01:00
Martin Kroeker 074d9bff7f
Merge pull request #3104 from martin-frbg/issue3103
Enable optimized Haswell/AVX2 kernels for sasum/dasum and srot/drot on Ryzen
2021-02-11 15:42:47 +01:00
Martin Kroeker f36862603a
Merge pull request #3101 from jake-arkinstall/issue-3100
Addressed issue #3100 - removing an unnecessary write to the include directory
2021-02-11 15:42:18 +01:00
Martin Kroeker 47691c031f
Use Haswell optimizations for Zen as well 2021-02-11 09:26:15 +01:00
Martin Kroeker ce7ddd8921
Use Haswell optimizations for Zen as well 2021-02-11 09:25:36 +01:00
Martin Kroeker 950c047b49
Use Haswell optimizations for Zen as well 2021-02-11 09:24:51 +01:00
Martin Kroeker 46509953a9
Use Haswell optimizations for Zen as well 2021-02-11 09:24:16 +01:00
Martin Kroeker db348dcff2
Enable optimized srot/drot kernels from Haswell 2021-02-11 09:23:05 +01:00
Martin Kroeker a33f471065
Merge pull request #3102 from martin-frbg/issue3099
Strip pkgversion info from compiler version string before comparing
2021-02-11 08:56:46 +01:00
Martin Kroeker ece3ce581e
Strip parenthesized (pkgversion) data from GCC version string to avoid misinterpretation 2021-02-10 14:22:59 +01:00
Martin Kroeker 8189a98d85
Merge pull request #12 from xianyi/develop
rebase
2021-02-10 14:17:24 +01:00
Jake Arkinstall d7a77091a3 Addressed issue #3100, removing an unnecessary write to the include directory 2021-02-10 12:11:17 +00:00
Martin Kroeker 3e1e74fca6
Merge pull request #3094 from xoviat/patch-1
build openmp on appveyor
2021-02-02 13:36:17 +01:00
Martin Kroeker 33b5670122
Merge pull request #3096 from martin-frbg/fixclangcmake
Fix Cooperlake/DYNAMIC_ARCH builds with clang on Windows
2021-02-02 13:33:15 +01:00
Martin Kroeker 95e19e2e23
fix case in compiler name check
Co-authored-by: xoviat <49173759+xoviat@users.noreply.github.com>
2021-02-02 10:53:46 +01:00
Martin Kroeker 99ac042702
remove spurious lines (probably editor malfunction) 2021-02-01 21:02:53 +01:00
Martin Kroeker 774b9f8653
handle AppleClang in Cooperlake support condition 2021-02-01 20:18:53 +01:00
Martin Kroeker eb1d2344f7
Fix compiler version check for Intel Cooperlake support (clang-cl does not accept -dumpversion) 2021-02-01 19:45:25 +01:00
xoviat 6fa9860dbe appveyor: cleanup and add openmp run 2021-01-31 16:32:04 -06:00
Martin Kroeker 0cc36770f1
Merge pull request #3073 from xoviat/embedded
add embedded option
2021-01-31 18:02:41 +01:00
Martin Kroeker 558cd543bf
Merge pull request #3093 from martin-frbg/fix3064
fix copy-paste error in build rules for cblas_crotg and cblas_zrotg
2021-01-30 22:21:28 +01:00
Martin Kroeker bd906e3410
fix copy-paste error in build rules for cblas_crotg and cblas_zrotg 2021-01-30 16:46:25 +01:00
Martin Kroeker 35086cb501
Merge pull request #3092 from RajalakshmiSR/cscal_p10
Optimize cscal function for POWER10
2021-01-30 16:23:37 +01:00
Rajalakshmi Srinivasaraghavan 2056ffc227 Optimize cscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-01-29 13:51:43 -06:00
Martin Kroeker 7745439312
Merge pull request #3091 from martin-frbg/lapack477-2
Fix calculation of the non-exceptional shift values in LAPACK complex QZ
2021-01-29 13:37:23 +01:00
Martin Kroeker c4b5abbe43
fix data type 2021-01-29 10:45:36 +01:00
Martin Kroeker f87842483e
fix calculation of non-exceptional shift (from Reference-LAPACK PR 477) 2021-01-29 09:56:12 +01:00
Martin Kroeker 3dbb32c734
Merge pull request #11 from xianyi/develop
rebase
2021-01-29 09:52:21 +01:00
Martin Kroeker 00880c720a
Merge pull request #3087 from martin-frbg/lapack477
Apply Reference-LAPACK PR 477 for convergence problems in CHGEQZ/ZHGEQZ
2021-01-27 19:11:55 +01:00
Martin Kroeker 856bc36533
Add exceptional shift to fix rare convergence problems 2021-01-27 13:41:45 +01:00
Martin Kroeker fe71887b68
Merge pull request #10 from xianyi/develop
rebase
2021-01-27 13:39:26 +01:00
Martin Kroeker 10094bd885
Merge pull request #3076 from martin-frbg/dyn-thunderx
Add CI job for ARM64/gcc10 DYNAMIC_ARCH
2021-01-27 13:25:45 +01:00
Martin Kroeker eea0c0f2ed
Merge pull request #3085 from alexhenrie/memory_alloc
Fix null pointer check in blas_memory_alloc
2021-01-26 20:11:42 +01:00
Martin Kroeker 85be43e0df
Merge pull request #3083 from martin-frbg/develop
Add DYNAMIC_LIST support for ARM64
2021-01-26 15:13:35 +01:00
Martin Kroeker 0cb9e9fc8d
Remove the VORTEX support bits again for now 2021-01-25 19:02:21 +01:00
Martin Kroeker cb61d3b46b
Add DYNAMIC_LIST support for ARM64 2021-01-25 13:13:20 +01:00
Alex Henrie 113840da12 Fix null pointer check in blas_memory_alloc 2021-01-24 22:20:44 -07:00
Martin Kroeker deb2e66bcc
Add DYNAMIC_LIST support for ARM64 2021-01-24 23:18:52 +01:00
Martin Kroeker 9b2d69aa80
Add DYNAMIC_LIST option for ARM64 2021-01-24 23:18:01 +01:00
Martin Kroeker e3ff4cdd23
Merge pull request #9 from xianyi/develop
rebase
2021-01-24 23:14:45 +01:00
Martin Kroeker 0745ba43a4
Merge pull request #3082 from RajalakshmiSR/scalp10
Optimize s/dscal function for POWER10
2021-01-24 19:03:40 +01:00
Rajalakshmi Srinivasaraghavan 3ede843d50 Optimize s/dscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-01-24 07:48:28 -06:00
xoviat 2e8d6e8690 add functions for embedded 2021-01-23 22:12:17 -06:00
Martin Kroeker 69a5558203
Merge pull request #3059 from Guobing-Chen/BF16_gemm
Initial code for Cooperlake BF16 GEMM kernel
2021-01-23 19:08:05 +01:00