6065 Commits

Author SHA1 Message Date
Martin Kroeker e30ad0e521
Strip UTF8 byte order marker from source 2020-06-26 09:00:43 +02:00
Rajalakshmi Srinivasaraghavan d23419accc powerpc: Optimized SHGEMM kernel for POWER10
This patch introduces a new optimized version of the SHGEMM kernel
using the POWER10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. It makes use of the new POWER10 compute instructions
for matrix multiplication.

Tested on simulator and there are no new test failures.
2020-06-25 22:19:08 -05:00
Matthew Treinish 2f9c10810c
Also set CPUTYPE in get_cpuname() 2020-06-25 15:53:56 -04:00
Matthew Treinish f37e941d52
Add support to driver/others/dynamic.c too 2020-06-25 11:56:49 -04:00
Matthew Treinish 2a91452bdd
Add CPU detection support for Comet Lake U
Comet Lake U CPUs, which have family 6, model 6, extended family 0, and
extended model 10, were not being correctly detected by GETARCH during
OpenBLAS builds and would show CORE=UNKNOWN and LIBCORE=unknown. This
commit adds the necessary information to cpuid_x86 to detect extended
model 10, model 6 and return the proper core information. It's
essentially just a Skylake CPU, not Skylake-X, so I just used the same
return fields as Skylake.
2020-06-25 11:32:09 -04:00
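The family/model fields named in the commit message above come from the x86 CPUID signature (leaf 1, register EAX). The following is a minimal sketch of how those fields can be unpacked and matched. It is not the code from cpuid_x86; the struct and both function names are hypothetical and only illustrate the bit layout.

```c
#include <stdint.h>

/* Fields of the CPUID leaf 1 EAX signature (standard x86 layout). */
struct cpu_signature {
    unsigned family;      /* bits 11..8  */
    unsigned model;       /* bits 7..4   */
    unsigned ext_family;  /* bits 27..20 */
    unsigned ext_model;   /* bits 19..16 */
};

/* Hypothetical helper: unpack the signature fields from a raw EAX value. */
static struct cpu_signature decode_signature(uint32_t eax)
{
    struct cpu_signature s;
    s.family     = (eax >> 8)  & 0x0f;
    s.model      = (eax >> 4)  & 0x0f;
    s.ext_family = (eax >> 20) & 0xff;
    s.ext_model  = (eax >> 16) & 0x0f;
    return s;
}

/* Hypothetical check using the values quoted in the commit message:
 * family 6, model 6, extended family 0, extended model 10. */
static int is_comet_lake_u(uint32_t eax)
{
    struct cpu_signature s = decode_signature(eax);
    return s.family == 6 && s.model == 6 &&
           s.ext_family == 0 && s.ext_model == 10;
}
```

A raw EAX value of 0x000A0660 carries exactly those fields, so it would match, while a Skylake client signature such as 0x000506E3 (extended model 5, model 14) would not.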
Martin Kroeker c854ef5471
Fix variable names in conditional 2020-06-25 13:29:52 +02:00
Martin Kroeker c0afc11742
Fix POWERPC builds on AIX (gcc/gfortran 7)
1. macro preprocessing for POWER8 and later kernels only
2. default buffer size used by AIX version of m4 is too small
2020-06-25 13:12:36 +02:00
Martin Kroeker c592f0f80a
Fix utest build on AIX 2020-06-25 12:58:13 +02:00
Martin Kroeker 3f613b1301
Tentative changes for building on AIX 2020-06-25 12:57:00 +02:00
Martin Kroeker 72a0ec8e75
Fix reading of CPU name from prtconf output on AIX 2020-06-25 12:55:10 +02:00
Martin Kroeker 3446e58daf
Fix handling of uname output on AIX 2020-06-25 12:31:35 +02:00
Martin Kroeker 4ca8becc4b
Merge pull request #67 from xianyi/develop
rebase
2020-06-25 10:33:03 +02:00
Martin Kroeker dfe819f3bd
Merge pull request #2679 from RajalakshmiSR/P10_GEMM
POWER10 Optimized GEMM kernels
2020-06-25 08:31:38 +02:00
Martin Kroeker 4369e52555
Merge pull request #2677 from brada4/develop
address vs2019 C4293 warning
2020-06-25 08:31:17 +02:00
Gordon Fossum bb2f52844b powerpc: Optimized ZGEMM kernel for POWER10
This patch introduces a new optimized version of the ZGEMM kernel
using the POWER10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. It makes use of the new POWER10 compute instructions
for matrix multiplication.

Tested on simulator and there are no new test failures.
Cycle count reduced by 30-50% compared to the POWER9 version, depending on
M/N/K sizes.
2020-06-24 14:50:12 -05:00
Rajalakshmi Srinivasaraghavan 571eadb880 powerpc: Optimized SGEMM/DGEMM/CGEMM for POWER10
This patch introduces new optimized versions of SGEMM, CGEMM and DGEMM
using the POWER10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. It makes use of the new POWER10 compute instructions
for matrix multiplication.

Tested on simulator and there are no new test failures.
Cycle count reduced by 30-50% compared to the POWER9 version, depending on
M/N/K sizes.
MMA GCC patch for reference:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=8ee2640bfdc62f835ec9740278f948034bc7d9f1
2020-06-24 14:48:15 -05:00
Kavana Bhat df4ade070f Fix for #2671 2020-06-24 04:25:47 -05:00
User User-User e6b9275034 address vs2019 C4293 2020-06-24 09:12:23 +03:00
Martin Kroeker 53ea5bfece
Merge pull request #66 from xianyi/develop
rebase
2020-06-23 10:13:44 +02:00
Martin Kroeker 93592d1260
Merge pull request #2675 from wjc404/develop
AVX512 DGEMM TCOPY_16 Function
2020-06-23 09:29:02 +02:00
Martin Kroeker 6eaeb01263
Merge pull request #2658 from RajalakshmiSR/p10
powerpc: Add support for future processor
2020-06-23 00:02:37 +02:00
Martin Kroeker 45d542c9d1
Merge pull request #65 from xianyi/develop
rebase
2020-06-21 12:41:01 +02:00
wjc404 086d87a302
AVX512 dgemm tcopy_16 function 2020-06-20 00:07:43 +08:00
Martin Kroeker af501eb753
Merge pull request #2669 from mhillenibm/zarch_fix_gcc_detection
Zarch fix gcc detection
2020-06-17 17:55:25 +02:00
Martin Kroeker 0eb6c4dded
Merge pull request #2672 from mhillenibm/test_num_threads
cpp_thread_test: Change adjustment of concurrency on systems with <52 hw threads
2020-06-17 17:54:31 +02:00
Marius Hillenbrand de838c38ef cpp_thread_test/dgemv: fail early if concurrency is zero
The two test cases dgemv_tester and dgemm_tester accept the degree of
concurrency as command line argument (amongst others). Fail early if
value 0 has been specified, instead of later with less-clear symptoms.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-06-17 16:15:44 +02:00
Marius Hillenbrand 478898b37a cpp_thread_test/dgemv: cap concurrency to number of hw threads on small systems
... instead of (number of hw threads - 4) to avoid invalid numbers on
smaller systems. Currently, systems with 4 or fewer CPUs (e.g., small CI
VMs) would fail the test. Fixes one of the issues discussed in #2668

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-06-17 16:08:48 +02:00
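The failure mode described in the commit above can be sketched as follows. This is one possible reading of the message, not the code from cpp_thread_test: the function name is hypothetical, and the threshold of 4 is taken only from the "(number of hw threads - 4)" wording.

```c
/* Hypothetical helper: upper bound for test concurrency.
 * Subtracting 4 unconditionally gives 0 (or wraps around for unsigned
 * values) on machines with 4 or fewer hardware threads, so small
 * systems fall back to the plain hardware thread count. */
static unsigned max_test_concurrency(unsigned hw_threads)
{
    if (hw_threads > 4)
        return hw_threads - 4;  /* leave headroom on larger systems */
    return hw_threads;          /* small systems: use what is there */
}
```

Under these assumptions a 2-thread CI VM gets a cap of 2 rather than an invalid value, while an 8-thread machine is still capped at 4.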
Marius Hillenbrand cde4690721 RFC: Use gcc -dumpfullversion to get minor version with gcc-7.x
In gcc-7.1, the behavior of -dumpversion changed to be configured
at compile-time. On some distributions it only dumps the major version
(e.g., Ubuntu), so the current checks for the gcc minor version report
false negatives. As a replacement, gcc-7.1 introduced -dumpfullversion
which always prints the full version.

Update the gcc version detection in Makefile.system to employ
-dumpfullversion with gcc-7 and newer.

Posting this patch for discussion, since it emerged from discussions
around issue #2668 and PR #2669. It is not solving a problem right now,
but may be useful in the future.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-06-16 15:45:59 +02:00
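To illustrate why a major-only version string leads to false negatives in minor-version checks, here is a minimal sketch of a version parser. It is not taken from Makefile.system, which does this in make and shell; the function name and the use of -1 for "not reported" are assumptions made for the example.

```c
#include <stdio.h>

/* Hypothetical helper: parse "MAJOR[.MINOR[.PATCH]]".
 * Returns the number of fields found; *major and *minor stay -1
 * when the corresponding field is absent from the text. */
static int parse_gcc_version(const char *text, int *major, int *minor)
{
    int patch;  /* parsed if present, otherwise left untouched and unused */
    *major = -1;
    *minor = -1;
    return sscanf(text, "%d.%d.%d", major, minor, &patch);
}
```

With output like "7", as -dumpversion may print on some distributions, only the major field is found and the minor version stays unknown. With output like "7.5.0", as -dumpfullversion prints, all three fields are found and a check such as "minor >= 2" has something to compare against.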
Marius Hillenbrand 2389291766 Makefile.system: remove duplicate variable GCCVERSIONGT5
... to bring unified gcc version detection with common variables to the
one remaining spot in Makefile.system.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-06-16 15:06:03 +02:00
Marius Hillenbrand a2d13ea611 Fix gcc version detection for zarch
Employ common variables for gcc version detection and fix the broken
check for gcc >= 5.2.
Fixes #2668

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-06-16 15:06:03 +02:00
Martin Kroeker 1bd3cd66c2
Increment version to 0.3.10.dev 2020-06-14 22:05:19 +02:00
Martin Kroeker 1c53e1366d
Increment version to 0.3.10.dev 2020-06-14 22:04:37 +02:00
Martin Kroeker 63b03efc2a
Merge pull request #2667 from xianyi/develop
Merge develop into 0.3.0 for 0.3.10 release
2020-06-14 22:03:04 +02:00
Martin Kroeker 95dbeff66d
Merge branch 'release-0.3.0' into develop 2020-06-14 22:02:45 +02:00
Martin Kroeker 3b673a24b7
Increment version to 0.3.10.dev 2020-06-14 21:57:52 +02:00
Martin Kroeker 1eb1979050
Increment version to 0.3.10.dev 2020-06-14 21:57:15 +02:00
Martin Kroeker efc53b6e7e
Merge pull request #2665 from martin-frbg/flang-fixes-2a
Fix spelling of flang option -Mrecursive, add -Kieee and workaround for AOCC optimizer bug
2020-06-14 21:56:08 +02:00
Martin Kroeker 72888497e2
Update with 0.3.10 changes 2020-06-14 21:55:31 +02:00
Martin Kroeker 7e3e006af6
Merge pull request #2666 from martin-frbg/blastest
Update BLAS tests to what netlib 3.9.0 uses
2020-06-14 18:28:37 +02:00
Martin Kroeker d906d14402
Merge pull request #2664 from ACSimon33/exported_symbols
Add missing exported symbols.
2020-06-14 18:27:03 +02:00
Martin Kroeker 3785c0e82b
Merge pull request #2663 from martin-frbg/issue2654
Respect predefined defaults for AR, AS, LD and RANLIB
2020-06-14 18:26:43 +02:00
Martin Kroeker f2d8879af6
Merge pull request #2661 from martin-frbg/issue2660
Report selected DYNAMIC_ARCH kernel rather than one of its aliases in gotoblas_corename
2020-06-14 18:25:37 +02:00
Martin Kroeker 6876221cf3
Remove optimization level limit for flang again and add -fno-unroll-loops for AOCC flang 2.x instead 2020-06-14 17:40:24 +02:00
Martin Kroeker 79cdcde717
Re-enable higher optimization levels for flang while disabling loop unrolling for AOCC flang 2020-06-14 17:18:16 +02:00
Martin Kroeker 18a11137f1
Update BLAS tests to correspond to Reference-LAPACK 3.9.0
replaces calculation of machine precision with call to epsilon intrinsic and removes the requirement for previous output files to be removed before rerunning tests
2020-06-14 10:26:25 +02:00
Martin Kroeker 1dd712131e
Fix spelling of flang option -Mrecursive and add -Kieee 2020-06-14 00:09:31 +02:00
Martin Kroeker 0ed2adf0b2
Fix spelling of flang option -Mrecursive and add -Kieee 2020-06-14 00:01:20 +02:00
Martin Kroeker abf670757b
Respect predefined defaults for AR, AS, LD and RANLIB 2020-06-13 23:21:13 +02:00
Simon Märtens 41fc6f3cd2 Added missing exported symbols. 2020-06-13 22:37:39 +02:00
Martin Kroeker 007d9f97d7
Make gotoblas_corename report the name of the selected TARGET rather than its aliases 2020-06-13 19:25:28 +02:00