Commit Graph

7452 Commits

Author SHA1 Message Date
Martin Kroeker 0a4c5c4c44
Merge pull request #2807 from martin-frbg/issue2804
Work around ARMV8 build-time cpu detection problems on non-Linux systems
2020-08-31 23:44:56 +02:00
Martin Kroeker 3210a42734
Report cpu as ARMV8 instead of just giving up on non-Linux hosts 2020-08-31 20:03:21 +02:00
Martin Kroeker 5feb087c05
Handle Apple labeling armv8 as arm64 rather than aarch64 2020-08-31 20:02:08 +02:00
Gengxin Xie 448152cdd8 define __AVX2__ to ensure the haswell code compiled with avx2 2020-08-31 14:39:08 +08:00
Gengxin Xie cb3c190a3a Implementaion of dasum, sasum with AVX2 & AVX512 intrinsic 2020-08-31 11:44:08 +08:00
Martin Kroeker 59e01b1aec
Merge pull request #2799 from RajalakshmiSR/p10_ger
POWER10: Avoid setting accumulators to zero in gemm kernels
2020-08-28 22:52:11 +02:00
Rajalakshmi Srinivasaraghavan 317ff27cda POWER10: Avoid setting accumulators to zero in gemm kernels
For the first iteration, it is better to use xvf*ger instead of xvf*gerpp
builtins which helps to avoid setting accumulators to zero. This helps
to reduce few instructions.
2020-08-28 10:42:54 -05:00
Martin Kroeker 514a3d7d63
Merge pull request #2798 from kadler/aix-cpuid
Fix compile error on AIX cpuid detection
2020-08-28 08:30:59 +02:00
Kevin Adler 085aae8bdb
Fix compile error on AIX cpuid detection
In 589c74a the cpuid detection was changed to use systemcfg, but a copy
and paste error was introduced during some refactoring that caused
POWER7 detection to reference CPUTYPE_POWER7 (which doesn't exist)
instead of CPUTYPE_POWER6.
2020-08-27 23:10:45 -05:00
Martin Kroeker de63675717
Add early returns and fix sign errors in workspace calculations 2020-08-27 11:25:18 +02:00
Martin Kroeker d64cc2be81
Add early returns 2020-08-27 11:22:50 +02:00
Martin Kroeker c9b67141f0
Add early returns 2020-08-27 11:20:31 +02:00
Martin Kroeker 6797a3a1e0
Add early returns 2020-08-27 11:15:12 +02:00
Martin Kroeker 936966a42c
Make ILAENV and xGETRF2 functions available 2020-08-27 10:59:08 +02:00
Martin Kroeker 5c6c2cd4f6
Merge pull request #2775 from Guobing-Chen/Fix_OMP_threads_specify
Fix OMP num specify issue
2020-08-24 20:18:09 +02:00
Martin Kroeker e54be4ba1c
Merge pull request #2792 from pkubaj/patch-1
Add aliases for armv6, armv7
2020-08-24 08:03:39 +02:00
pkubaj 48a1364e10
Add aliases for armv6, armv7
FreeBSD uses those names for 32-bit ARM variants.
2020-08-23 18:50:19 +00:00
Chen, Guobing 0c1c903f1e Fix OMP num specify issue
In current code, no matter what number of threads specified, all
available CPU count is used when invoking OMP, which leads to very bad
performance if the workload is small while all available CPUs are big.
Lots of time are wasted on inter-thread sync. Fix this issue by really
using the number specified by the variable 'num' from calling API.

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-08-24 02:45:54 +08:00
Martin Kroeker a073fa870e
Merge pull request #2791 from martin-frbg/issue2787
Fix crashes in parallelized x86_64 ZDOT particularly on Windows
2020-08-23 19:33:03 +02:00
Martin Kroeker b2053239fc
Fix mssing dummy parameter (imag part of alpha) of zdot_thread_function 2020-08-23 15:08:16 +02:00
Martin Kroeker b11bb6e728
Merge pull request #2790 from martin-frbg/issue2789
Add OpenMP dependency to pkgconfig information if needed
2020-08-23 14:42:35 +02:00
Martin Kroeker 1840bc5b52
Add OpenMP dependency to pkgconfig file if needed 2020-08-22 13:55:18 +02:00
Martin Kroeker 7c0977c267
Add OpenMP dependency to pkgconfig file if needed 2020-08-22 13:53:44 +02:00
Martin Kroeker fb3d80c42a
Merge pull request #78 from xianyi/develop
rebase
2020-08-22 13:52:29 +02:00
Martin Kroeker 9ee21a0a39
Merge pull request #2780 from Guobing-Chen/CPL_build_support
Enable COOPERLAKE build target
2020-08-20 19:54:29 +02:00
Martin Kroeker bd3207b4b4
Update system.cmake 2020-08-19 22:51:10 +02:00
Martin Kroeker b8ebfc9335
Update system.cmake 2020-08-19 22:30:19 +02:00
Martin Kroeker 7c1986640b
fallback from cooperlake to skylake if gcc<10 2020-08-19 20:48:39 +02:00
Martin Kroeker 71d33c952d
Typo fix 2020-08-19 17:44:23 +02:00
Martin Kroeker 6a3c074786
-march=cooperlake requires gcc10 2020-08-19 17:22:12 +02:00
Martin Kroeker 430f741b30
-march=cooperlake requires gcc10 2020-08-19 17:17:53 +02:00
Martin Kroeker 6f4dc7445d
Fix typo 2020-08-19 16:36:55 +02:00
Martin Kroeker 81fbe8d088
-march=cooperlake only available in gcc >= 10 2020-08-19 16:10:15 +02:00
Martin Kroeker bb9cf766f5
make march=cooperlake option conditional on gcc >= 10.1 2020-08-19 15:06:30 +02:00
Martin Kroeker 75eeb265d7
[WIP] Refactor the driver code for direct SGEMM (#2782)
Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available
(on x86_64 targets only for now) in DYNAMIC_ARCH builds
* Add  sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt
* Add direct_sgemm functions to the gotoblas struct in common_param.h
* Move sgemm_direct_performant helper to separate file
* Update gemm.c  to macros for sgemm_direct to support dynamic_arch naming via common_s,h
* (Conditionally) add sgemm_direct functions in setparam-ref.c
2020-08-19 14:51:09 +02:00
Martin Kroeker 2c72972570
Merge pull request #2785 from albertziegenhagel/always-generate-pkg-config
Do not require pkg-config to generate the *.pc file
2020-08-19 14:42:58 +02:00
Albert Ziegenhagel 6b731d917f Do not require pkg-config to generate the *.pc file
Generating the pkg-config file does not actually depend on pkg-config being available.
2020-08-18 08:48:48 +02:00
Martin Kroeker 5dcf47cd97
Merge pull request #2784 from martin-frbg/issue2783
Add fallback typedef for bfloat16 to openblas_config.h template
2020-08-17 19:06:13 +02:00
Martin Kroeker aa286e301b
Add typedef for bfloat16 if needed 2020-08-17 15:32:14 +02:00
Martin Kroeker 9f0ef9cdfc
Merge pull request #77 from xianyi/develop
rebase
2020-08-17 15:28:15 +02:00
Martin Kroeker 6bfc66663c
revert 2020-08-17 15:20:41 +02:00
Martin Kroeker a8c6fb9e1c
revert 2020-08-17 15:20:16 +02:00
Martin Kroeker 5ec8f716cf
revert 2020-08-17 15:19:40 +02:00
Martin Kroeker 82f8a0aeba
Update .drone.yml 2020-08-15 15:46:18 +02:00
Martin Kroeker d57d503c15
Update Makefile 2020-08-15 14:46:26 +02:00
Martin Kroeker 37ac23e8a3
Add simple MT sgemm precision test and INTERFACE64 build 2020-08-15 13:38:05 +02:00
Martin Kroeker 6a93e3b2ba
Add simple sgemm preicsion test 2020-08-15 13:33:52 +02:00
Martin Kroeker 47ce1dd08f
Update gemm64.cpp 2020-08-15 13:31:28 +02:00
Martin Kroeker f5fcc5baec
Add trivial gemm test for multithread consistency 2020-08-15 13:30:29 +02:00
Martin Kroeker 597010a968
Fix incorrect argument to SLASET
Reference-LAPACK issue 425 (and 318)
2020-08-14 00:41:56 +02:00