Commit Graph

4829 Commits

Author SHA1 Message Date
Martin Kroeker
f42e84d46c Fix misnaming of LAPACK_?ggsvp function prototypes as LAPACKE_ (#2808)
* Fix misnaming of LAPACK_?ggsvp and ?ggsvd function prototypes as LAPACKE_

* Drop the LAPACKE matrix_layout parameter from the argument lists, change ints to pointers and add missing work arguments.
2020-09-01 10:44:48 +02:00
Martin Kroeker
0a4c5c4c44 Merge pull request #2807 from martin-frbg/issue2804
Work around ARMV8 build-time cpu detection problems on non-Linux systems
2020-08-31 23:44:56 +02:00
Martin Kroeker
3210a42734 Report cpu as ARMV8 instead of just giving up on non-Linux hosts 2020-08-31 20:03:21 +02:00
Martin Kroeker
5feb087c05 Handle Apple labeling armv8 as arm64 rather than aarch64 2020-08-31 20:02:08 +02:00
Martin Kroeker
59e01b1aec Merge pull request #2799 from RajalakshmiSR/p10_ger
POWER10: Avoid setting accumulators to zero in gemm kernels
2020-08-28 22:52:11 +02:00
Rajalakshmi Srinivasaraghavan
317ff27cda POWER10: Avoid setting accumulators to zero in gemm kernels
For the first iteration, it is better to use xvf*ger instead of xvf*gerpp
builtins which helps to avoid setting accumulators to zero. This helps
to reduce few instructions.
2020-08-28 10:42:54 -05:00
Martin Kroeker
514a3d7d63 Merge pull request #2798 from kadler/aix-cpuid
Fix compile error on AIX cpuid detection
2020-08-28 08:30:59 +02:00
Kevin Adler
085aae8bdb Fix compile error on AIX cpuid detection
In 589c74a the cpuid detection was changed to use systemcfg, but a copy
and paste error was introduced during some refactoring that caused
POWER7 detection to reference CPUTYPE_POWER7 (which doesn't exist)
instead of CPUTYPE_POWER6.
2020-08-27 23:10:45 -05:00
Martin Kroeker
5c6c2cd4f6 Merge pull request #2775 from Guobing-Chen/Fix_OMP_threads_specify
Fix OMP num specify issue
2020-08-24 20:18:09 +02:00
Martin Kroeker
e54be4ba1c Merge pull request #2792 from pkubaj/patch-1
Add aliases for armv6, armv7
2020-08-24 08:03:39 +02:00
pkubaj
48a1364e10 Add aliases for armv6, armv7
FreeBSD uses those names for 32-bit ARM variants.
2020-08-23 18:50:19 +00:00
Chen, Guobing
0c1c903f1e Fix OMP num specify issue
In current code, no matter what number of threads specified, all
available CPU count is used when invoking OMP, which leads to very bad
performance if the workload is small while all available CPUs are big.
Lots of time are wasted on inter-thread sync. Fix this issue by really
using the number specified by the variable 'num' from calling API.

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-08-24 02:45:54 +08:00
Martin Kroeker
a073fa870e Merge pull request #2791 from martin-frbg/issue2787
Fix crashes in parallelized x86_64 ZDOT particularly on Windows
2020-08-23 19:33:03 +02:00
Martin Kroeker
b2053239fc Fix mssing dummy parameter (imag part of alpha) of zdot_thread_function 2020-08-23 15:08:16 +02:00
Martin Kroeker
b11bb6e728 Merge pull request #2790 from martin-frbg/issue2789
Add OpenMP dependency to pkgconfig information if needed
2020-08-23 14:42:35 +02:00
Martin Kroeker
1840bc5b52 Add OpenMP dependency to pkgconfig file if needed 2020-08-22 13:55:18 +02:00
Martin Kroeker
7c0977c267 Add OpenMP dependency to pkgconfig file if needed 2020-08-22 13:53:44 +02:00
Martin Kroeker
fb3d80c42a Merge pull request #78 from xianyi/develop
rebase
2020-08-22 13:52:29 +02:00
Martin Kroeker
9ee21a0a39 Merge pull request #2780 from Guobing-Chen/CPL_build_support
Enable COOPERLAKE build target
2020-08-20 19:54:29 +02:00
Martin Kroeker
bd3207b4b4 Update system.cmake 2020-08-19 22:51:10 +02:00
Martin Kroeker
b8ebfc9335 Update system.cmake 2020-08-19 22:30:19 +02:00
Martin Kroeker
7c1986640b fallback from cooperlake to skylake if gcc<10 2020-08-19 20:48:39 +02:00
Martin Kroeker
71d33c952d Typo fix 2020-08-19 17:44:23 +02:00
Martin Kroeker
6a3c074786 -march=cooperlake requires gcc10 2020-08-19 17:22:12 +02:00
Martin Kroeker
430f741b30 -march=cooperlake requires gcc10 2020-08-19 17:17:53 +02:00
Martin Kroeker
6f4dc7445d Fix typo 2020-08-19 16:36:55 +02:00
Martin Kroeker
81fbe8d088 -march=cooperlake only available in gcc >= 10 2020-08-19 16:10:15 +02:00
Martin Kroeker
bb9cf766f5 make march=cooperlake option conditional on gcc >= 10.1 2020-08-19 15:06:30 +02:00
Martin Kroeker
75eeb265d7 [WIP] Refactor the driver code for direct SGEMM (#2782)
Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available
(on x86_64 targets only for now) in DYNAMIC_ARCH builds
* Add  sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt
* Add direct_sgemm functions to the gotoblas struct in common_param.h
* Move sgemm_direct_performant helper to separate file
* Update gemm.c  to macros for sgemm_direct to support dynamic_arch naming via common_s,h
* (Conditionally) add sgemm_direct functions in setparam-ref.c
2020-08-19 14:51:09 +02:00
Martin Kroeker
2c72972570 Merge pull request #2785 from albertziegenhagel/always-generate-pkg-config
Do not require pkg-config to generate the *.pc file
2020-08-19 14:42:58 +02:00
Albert Ziegenhagel
6b731d917f Do not require pkg-config to generate the *.pc file
Generating the pkg-config file does not actually depend on pkg-config being available.
2020-08-18 08:48:48 +02:00
Martin Kroeker
5dcf47cd97 Merge pull request #2784 from martin-frbg/issue2783
Add fallback typedef for bfloat16 to openblas_config.h template
2020-08-17 19:06:13 +02:00
Martin Kroeker
aa286e301b Add typedef for bfloat16 if needed 2020-08-17 15:32:14 +02:00
Martin Kroeker
9f0ef9cdfc Merge pull request #77 from xianyi/develop
rebase
2020-08-17 15:28:15 +02:00
Martin Kroeker
6bfc66663c revert 2020-08-17 15:20:41 +02:00
Martin Kroeker
a8c6fb9e1c revert 2020-08-17 15:20:16 +02:00
Martin Kroeker
5ec8f716cf revert 2020-08-17 15:19:40 +02:00
Martin Kroeker
82f8a0aeba Update .drone.yml 2020-08-15 15:46:18 +02:00
Martin Kroeker
d57d503c15 Update Makefile 2020-08-15 14:46:26 +02:00
Martin Kroeker
37ac23e8a3 Add simple MT sgemm precision test and INTERFACE64 build 2020-08-15 13:38:05 +02:00
Martin Kroeker
6a93e3b2ba Add simple sgemm preicsion test 2020-08-15 13:33:52 +02:00
Martin Kroeker
47ce1dd08f Update gemm64.cpp 2020-08-15 13:31:28 +02:00
Martin Kroeker
f5fcc5baec Add trivial gemm test for multithread consistency 2020-08-15 13:30:29 +02:00
Chen, Guobing
e740c4873d Enable COOPERLAKE build target
Enable new build target platform -- COOPERLAKE. This target platform
supports all the SKYLAKEX supported ISAs + avx512bf16. So all the
SKYLAKEX specific kernels/drivers and related code are now extended
to be also active on COOPERLAKE. Besides, new BF16 related kernels
are active under this target.
2020-08-13 06:18:00 +08:00
Martin Kroeker
efdd237a91 Add a dedicated POWER9 build to the Travis CI (#2774)
* Add dedicated POWER9 build (using new syntax to ensure it runs as a P9-only containerized job rather than a VM that
might end up on P8 hardware half of the time)
* Bump gcc version for POWER9 build
2020-08-12 23:08:38 +02:00
Martin Kroeker
4573cb2f43 Merge pull request #2765 from martin-frbg/issue2760
Add memory barrier to the PPC blas_lock implementation for Linux
2020-08-11 22:40:17 +02:00
Martin Kroeker
2a4bb797db Merge pull request #2773 from martin-frbg/issue2770
Fix Makefiles still mishandling NO_CBLAS=0 and NO_LAPACKE=0
2020-08-11 21:02:55 +02:00
Martin Kroeker
cbbe38bb88 Merge pull request #2772 from mhillenibm/s390x_gemm_tuning
s390x: GEMM tuning for z14
2020-08-11 18:14:09 +02:00
Martin Kroeker
619343278d Fix mishandling of NO_CBLAS=0 and NO_LAPACKE=0 2020-08-11 13:40:40 +02:00
Martin Kroeker
fee361ae64 fix another source of NO_CBLAS=0 surprise 2020-08-11 13:27:19 +02:00