Commit Graph

7452 Commits

Author SHA1 Message Date
Martin Kroeker 4369e52555
Merge pull request #2677 from brada4/develop
address vs2019 C4293 warning
2020-06-25 08:31:17 +02:00
Gordon Fossum bb2f52844b powerpc: Optimized ZGEMM kernel for POWER10
This patch introduces new optimized version of ZGEMM kernel
using power10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. This patch makes use of new POWER10 compute instructions
for matrix multiplication operation.

Tested on simulator and there are no new test failures.
Cycles count reduced by 30-50%  compared to POWER9 version depending on
M/N/K sizes.
2020-06-24 14:50:12 -05:00
Rajalakshmi Srinivasaraghavan 571eadb880 powerpc: Optimized SGEMM/DGEMM/CGEMM for POWER10
This patch introduces new optimized version of SGEMM, CGEMM and DGEMM
using power10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. This patch makes use of new POWER10 compute instructions
for matrix multiplication operation.

Tested on simulator and there are no new test failures.
Cycles count reduced by 30-50%  compared to POWER9 version depending on
M/N/K sizes.
MMA GCC patch for reference:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=8ee2640bfdc62f835ec9740278f948034bc7d9f1
2020-06-24 14:48:15 -05:00
Kavana Bhat df4ade070f Fix for #2671 2020-06-24 04:25:47 -05:00
User User-User e6b9275034 address vs2019 C4293 2020-06-24 09:12:23 +03:00
Martin Kroeker 53ea5bfece
Merge pull request #66 from xianyi/develop
rebase
2020-06-23 10:13:44 +02:00
Martin Kroeker 93592d1260
Merge pull request #2675 from wjc404/develop
AVX512 DGEMM TCOPY_16 Function
2020-06-23 09:29:02 +02:00
Martin Kroeker 6eaeb01263
Merge pull request #2658 from RajalakshmiSR/p10
powerpc: Add support for future processor
2020-06-23 00:02:37 +02:00
Martin Kroeker 45d542c9d1
Merge pull request #65 from xianyi/develop
rebase
2020-06-21 12:41:01 +02:00
wjc404 086d87a302
AVX512 dgemm tcopy_16 function 2020-06-20 00:07:43 +08:00
Martin Kroeker af501eb753
Merge pull request #2669 from mhillenibm/zarch_fix_gcc_detection
Zarch fix gcc detection
2020-06-17 17:55:25 +02:00
Martin Kroeker 0eb6c4dded
Merge pull request #2672 from mhillenibm/test_num_threads
cpp_thread_test: Change adjustment of concurrency on systems with <52 hw threads
2020-06-17 17:54:31 +02:00
Marius Hillenbrand de838c38ef cpp_thread_test/dgemv: fail early if concurrency is zero
The two test cases dgemv_tester and dgemm_tester accept the degree of
concurrency as command line argument (amongst others). Fail early if
value 0 has been specified, instead of later with less-clear symptoms.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-06-17 16:15:44 +02:00
Marius Hillenbrand 478898b37a cpp_thread_test/dgemv: cap concurrency to number of hw threads on small systems
... instead of (number of hw threads - 4) to avoid invalid numbers on
smaller systems. Currently, systems with 4 or fewer CPUs (e.g., small CI
VMs) would fail the test. Fixes one of the issues discussed in #2668

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-06-17 16:08:48 +02:00
Marius Hillenbrand cde4690721 RFC: Use gcc -dumpfullversion to get minor version with gcc-7.x
In gcc-7.1, the behavior of -dumpversion changed to be configured
at compile-time. On some distributions it only dumps the major version
(e.g., Ubuntu), so the current checks for the gcc minor version report
false negatives. As a replacement, gcc-7.1 introduced -dumpfullversion
which always prints the full version.

Update the gcc version detection in Makefile.system to employ
-dumpfullversion with gcc-7 and newer.

Posting this patch for discussion, since it emerged from discussions
around issue #2668 and PR #2669. It is not solving a problem right now,
but may be useful in the future.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-06-16 15:45:59 +02:00
Marius Hillenbrand 2389291766 Makefile.system: remove duplicate variable GCCVERSIONGT5
... to bring unified gcc version detection with common variables to the
one remaining spot in Makefile.system.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-06-16 15:06:03 +02:00
Marius Hillenbrand a2d13ea611 Fix gcc version detection for zarch
Employ common variables for gcc version detection and fix the broken
check for gcc >= 5.2.
Fixes #2668

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-06-16 15:06:03 +02:00
Martin Kroeker 1bd3cd66c2
Increment version to 0.3.10.dev 2020-06-14 22:05:19 +02:00
Martin Kroeker 1c53e1366d
Increment version to 0.3.10.dev 2020-06-14 22:04:37 +02:00
Martin Kroeker 63b03efc2a
Merge pull request #2667 from xianyi/develop
Merge develop into 0.3.0 for 0.3.10 release
2020-06-14 22:03:04 +02:00
Martin Kroeker 95dbeff66d
Merge branch 'release-0.3.0' into develop 2020-06-14 22:02:45 +02:00
Martin Kroeker 3b673a24b7
Increment version to 0.3.10.dev 2020-06-14 21:57:52 +02:00
Martin Kroeker 1eb1979050
Increment version to 0.3.10.dev 2020-06-14 21:57:15 +02:00
Martin Kroeker efc53b6e7e
Merge pull request #2665 from martin-frbg/flang-fixes-2a
Fix spelling of flang option -Mrecursive, add -Kieee and workaround for AOCC optimizer bug
2020-06-14 21:56:08 +02:00
Martin Kroeker 72888497e2
Update with 0.3.10 changes 2020-06-14 21:55:31 +02:00
Martin Kroeker 7e3e006af6
Merge pull request #2666 from martin-frbg/blastest
Update BLAS tests to what netlib 3.9.0 uses
2020-06-14 18:28:37 +02:00
Martin Kroeker d906d14402
Merge pull request #2664 from ACSimon33/exported_symbols
Add missing exported symbols.
2020-06-14 18:27:03 +02:00
Martin Kroeker 3785c0e82b
Merge pull request #2663 from martin-frbg/issue2654
Respect predefined defaults for AR, AS, LD and RANLIB
2020-06-14 18:26:43 +02:00
Martin Kroeker f2d8879af6
Merge pull request #2661 from martin-frbg/issue2660
Report selected DYNAMIC_ARCH kernel rather than one of its aliases in gotoblas_corename
2020-06-14 18:25:37 +02:00
Martin Kroeker 6876221cf3
Remove optimization level limit for flang again and add -fno-unroll-loops for AOCC flang 2.x instead 2020-06-14 17:40:24 +02:00
Martin Kroeker 79cdcde717
Re-enable higher optimization levels for flang while disabling loop unrolling for AOCC flang 2020-06-14 17:18:16 +02:00
Martin Kroeker 18a11137f1
Update BLAS tests to correspond to Reference-LAPACK 3.9.0
replaces calculation of machine precision with call to epsilon intrinsic and removes the requirement for previous output files to be removed before rerunning tests
2020-06-14 10:26:25 +02:00
Martin Kroeker 1dd712131e
Fix spelling of flang option -Mrecursive and add -Kieee 2020-06-14 00:09:31 +02:00
Martin Kroeker 0ed2adf0b2
Fix spelling of flang option -Mrecursive and add -Kieee 2020-06-14 00:01:20 +02:00
Martin Kroeker abf670757b
Respect predefined defaults for AR, AS, LD and RANLIB 2020-06-13 23:21:13 +02:00
Simon Märtens 41fc6f3cd2 Added missing exported symbols. 2020-06-13 22:37:39 +02:00
Martin Kroeker 007d9f97d7
Make gotoblas_corename report the name of the selected TARGET rather than its aliases 2020-06-13 19:25:28 +02:00
Martin Kroeker 63d26090f5
Merge pull request #64 from xianyi/develop
rebase
2020-06-13 19:14:47 +02:00
Rajalakshmi Srinivasaraghavan 9fe930f205 powerpc: Add support for future processor
This is the initial patch to support build infrastructure
for POWER10 architecture.
2020-06-11 15:47:20 -05:00
Martin Kroeker 3a1b58d54a
Merge pull request #2653 from craft-zhang/cortex-a53
fix INIT8x4 of SGEMM on Arm Cortex-A53
2020-06-10 12:19:33 +02:00
Martin Kroeker f7659be4a0
Merge pull request #2652 from martin-frbg/flang-fixes
Fixes for compilation with flang binary release 20190329
2020-06-09 20:31:06 +02:00
ZhangDanfeng bc6fd20a40 fix INIT8x4
Signed-off-by: ZhangDanfeng <467688405@qq.com>
2020-06-10 01:01:16 +08:00
Martin Kroeker 3ce469a34f
Limit optimization level to O1 for flang and add -frecursive 2020-06-09 16:11:13 +02:00
Martin Kroeker ba2c5b404d
When building with flang, use it also for the final link step to get dependencies right 2020-06-09 16:09:34 +02:00
Martin Kroeker f07a80354b
Apply previously AOCC-specific workaround to all versions of flang 2020-06-09 16:07:03 +02:00
Martin Kroeker fdd1b50263
Merge pull request #63 from xianyi/develop
rebase
2020-06-09 15:54:30 +02:00
Leonard Lausen b98923f33a Test enforce -O1 for flang 2020-06-09 06:54:47 +00:00
Leonard Lausen 4cb1db0e3b Test flang build 2020-06-09 06:31:17 +00:00
Martin Kroeker 430e8b45fe
Merge pull request #2648 from martin-frbg/lapack411
Break out of potentially infinite rescaling loop in LAPACK xLARGV/xLARTG/xLARTGP
2020-06-07 19:45:52 +02:00
Martin Kroeker 88fe85f4e0
Merge pull request #2647 from martin-frbg/aocc-flang
Small fixes for flang in general and the AMD AOCC version of it in particular
2020-06-07 19:45:11 +02:00