Commit Graph

5120 Commits

Author SHA1 Message Date
Martin Kroeker b2053239fc
Fix missing dummy parameter (imag part of alpha) of zdot_thread_function 2020-08-23 15:08:16 +02:00
Martin Kroeker b11bb6e728
Merge pull request #2790 from martin-frbg/issue2789
Add OpenMP dependency to pkgconfig information if needed
2020-08-23 14:42:35 +02:00
Martin Kroeker 1840bc5b52
Add OpenMP dependency to pkgconfig file if needed 2020-08-22 13:55:18 +02:00
Martin Kroeker 7c0977c267
Add OpenMP dependency to pkgconfig file if needed 2020-08-22 13:53:44 +02:00
Martin Kroeker fb3d80c42a
Merge pull request #78 from xianyi/develop
rebase
2020-08-22 13:52:29 +02:00
Martin Kroeker 9ee21a0a39
Merge pull request #2780 from Guobing-Chen/CPL_build_support
Enable COOPERLAKE build target
2020-08-20 19:54:29 +02:00
Martin Kroeker bd3207b4b4
Update system.cmake 2020-08-19 22:51:10 +02:00
Martin Kroeker b8ebfc9335
Update system.cmake 2020-08-19 22:30:19 +02:00
Martin Kroeker 7c1986640b
fallback from cooperlake to skylake if gcc<10 2020-08-19 20:48:39 +02:00
Martin Kroeker 71d33c952d
Typo fix 2020-08-19 17:44:23 +02:00
Martin Kroeker 6a3c074786
-march=cooperlake requires gcc10 2020-08-19 17:22:12 +02:00
Martin Kroeker 430f741b30
-march=cooperlake requires gcc10 2020-08-19 17:17:53 +02:00
Martin Kroeker 6f4dc7445d
Fix typo 2020-08-19 16:36:55 +02:00
Martin Kroeker 81fbe8d088
-march=cooperlake only available in gcc >= 10 2020-08-19 16:10:15 +02:00
Martin Kroeker bb9cf766f5
make march=cooperlake option conditional on gcc >= 10.1 2020-08-19 15:06:30 +02:00
Martin Kroeker 75eeb265d7
[WIP] Refactor the driver code for direct SGEMM (#2782)
Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available
(on x86_64 targets only for now) in DYNAMIC_ARCH builds
* Add sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt
* Add direct_sgemm functions to the gotoblas struct in common_param.h
* Move sgemm_direct_performant helper to separate file
* Update gemm.c to use macros for sgemm_direct to support dynamic_arch naming via common_s.h
* (Conditionally) add sgemm_direct functions in setparam-ref.c
2020-08-19 14:51:09 +02:00
Martin Kroeker 2c72972570
Merge pull request #2785 from albertziegenhagel/always-generate-pkg-config
Do not require pkg-config to generate the *.pc file
2020-08-19 14:42:58 +02:00
Albert Ziegenhagel 6b731d917f Do not require pkg-config to generate the *.pc file
Generating the pkg-config file does not actually depend on pkg-config being available.
2020-08-18 08:48:48 +02:00
Martin Kroeker 5dcf47cd97
Merge pull request #2784 from martin-frbg/issue2783
Add fallback typedef for bfloat16 to openblas_config.h template
2020-08-17 19:06:13 +02:00
Martin Kroeker aa286e301b
Add typedef for bfloat16 if needed 2020-08-17 15:32:14 +02:00
Martin Kroeker 9f0ef9cdfc
Merge pull request #77 from xianyi/develop
rebase
2020-08-17 15:28:15 +02:00
Martin Kroeker 6bfc66663c
revert 2020-08-17 15:20:41 +02:00
Martin Kroeker a8c6fb9e1c
revert 2020-08-17 15:20:16 +02:00
Martin Kroeker 5ec8f716cf
revert 2020-08-17 15:19:40 +02:00
Martin Kroeker 82f8a0aeba
Update .drone.yml 2020-08-15 15:46:18 +02:00
Martin Kroeker d57d503c15
Update Makefile 2020-08-15 14:46:26 +02:00
Martin Kroeker 37ac23e8a3
Add simple MT sgemm precision test and INTERFACE64 build 2020-08-15 13:38:05 +02:00
Martin Kroeker 6a93e3b2ba
Add simple sgemm precision test 2020-08-15 13:33:52 +02:00
Martin Kroeker 47ce1dd08f
Update gemm64.cpp 2020-08-15 13:31:28 +02:00
Martin Kroeker f5fcc5baec
Add trivial gemm test for multithread consistency 2020-08-15 13:30:29 +02:00
Martin Kroeker 597010a968
Fix incorrect argument to SLASET
Reference-LAPACK issue 425 (and 318)
2020-08-14 00:41:56 +02:00
Martin Kroeker d64f1ef26b
Fix incorrect argument to SLASET
Reference-LAPACK issue 425 (and 318)
2020-08-14 00:40:24 +02:00
Martin Kroeker c62aad62e5
Fix incorrect calls to DLASET
Reference-LAPACK issue 429
2020-08-14 00:35:45 +02:00
Chen, Guobing e740c4873d Enable COOPERLAKE build target
Enable new build target platform -- COOPERLAKE. This target platform
supports all the SKYLAKEX supported ISAs + avx512bf16. So all the
SKYLAKEX specific kernels/drivers and related code are now extended
to be also active on COOPERLAKE. Besides, new BF16 related kernels
are active under this target.
2020-08-13 06:18:00 +08:00
Martin Kroeker efdd237a91
Add a dedicated POWER9 build to the Travis CI (#2774)
* Add dedicated POWER9 build (using new syntax to ensure it runs as a P9-only containerized job rather than a VM that
might end up on P8 hardware half of the time)
* Bump gcc version for POWER9 build
2020-08-12 23:08:38 +02:00
Martin Kroeker 4573cb2f43
Merge pull request #2765 from martin-frbg/issue2760
Add memory barrier to the PPC blas_lock implementation for Linux
2020-08-11 22:40:17 +02:00
Martin Kroeker 2a4bb797db
Merge pull request #2773 from martin-frbg/issue2770
Fix Makefiles still mishandling NO_CBLAS=0 and NO_LAPACKE=0
2020-08-11 21:02:55 +02:00
Martin Kroeker cbbe38bb88
Merge pull request #2772 from mhillenibm/s390x_gemm_tuning
s390x: GEMM tuning for z14
2020-08-11 18:14:09 +02:00
Martin Kroeker 619343278d
Fix mishandling of NO_CBLAS=0 and NO_LAPACKE=0 2020-08-11 13:40:40 +02:00
Martin Kroeker fee361ae64
fix another source of NO_CBLAS=0 surprise 2020-08-11 13:27:19 +02:00
Martin Kroeker 62f4c84f27
Merge pull request #76 from xianyi/develop
rebase
2020-08-11 13:25:12 +02:00
Marius Hillenbrand e115c97e05 s390x/SGEMM: adjust default P and Q to multiples of M
We recently changed the register blocking for SGEMM on s390x to 16x4.
However, we did not adjust Q to a multiple of 16 and thus fell back to
the 8x4 kernel at each block's margin, without need. Adjust P and Q to
multiples of 16 to employ the faster 16x4 kernel for complete full-sized
blocks.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-08-11 12:56:46 +02:00
Marius Hillenbrand 07c334e7be s390x: Factor out small block sizes for SGEMM/DGEMM on z14
For small register blockings that are too small to fill up vector
registers with column vectors, we currently use a generic code block.
Replace that with instantiations of the generic code as individual
functions, so that the compiler can optimize each one separately.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-08-11 12:56:39 +02:00
Marius Hillenbrand e2828e30aa s390x: Optimize SGEMM/DGEMM blocks for z14 with explicit loop unrolling/interleaving
Improve performance of SGEMM and DGEMM on z14 and z15 by unrolling and
interleaving the inner loop of the SGEMM 16x4 and DGEMM 8x4 blocks.
Specifically, we explicitly interleave vector register loads and
computation of two iterations.

Note that this change only adds one C function, since SGEMM 16x4 and
DGEMM 8x4 actually map to the same C code: they both hold intermediate
results in a 4x4 grid of vector registers, and the C implementation is
built around that.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-08-11 12:55:42 +02:00
Martin Kroeker 7219c9cb87
Merge pull request #2764 from martin-frbg/lapacktests
Fix array overruns in the LIN part of the LAPACK testsuite
2020-08-10 13:27:51 +02:00
Martin Kroeker c9d32674ea
Add memory barrier to the blas_lock implementation for Linux
as recommended by cparrott73 in #2760
2020-08-09 19:17:04 +02:00
Martin Kroeker 64259d521a
Fix use of unallocated array in workspace query and wrong type of argument to xSCAL 2020-08-09 13:02:27 +02:00
Martin Kroeker 6f5ca44c1a
Expand TAU array as SGEMQR/DGEMQR read elements 2 and 3 2020-08-09 12:59:20 +02:00
Martin Kroeker d28b3f2776
Create Jenkinsfile for OSUOSL PowerCI 2020-08-08 18:05:20 +02:00
Martin Kroeker ba3f7b3acf
Merge pull request #2761 from RajalakshmiSR/Makefile_err
Remove extra symbol in Makefile
2020-08-08 12:20:04 +02:00