Commit Graph

6983 Commits

Author SHA1 Message Date
Martin Kroeker 3eda3d34c3
Merge pull request #2641 from martin-frbg/ppcg4
Work around PPC G4 test failures
2020-06-03 16:43:46 +02:00
Martin Kroeker a8f42ae85c
set cmake build type to Release 2020-06-03 15:28:59 +02:00
Martin Kroeker e6e2e531bc
revert clang pragma 2020-06-03 15:16:27 +02:00
Martin Kroeker 456dc04441
Update sgemm_kernel_16x4_skylakex_3.c 2020-06-03 15:15:41 +02:00
Martin Kroeker 89323458a9
preset optimization level for apple clang 2020-06-03 15:07:25 +02:00
Martin Kroeker e153bdeb70
Update dynamic_arch.yml 2020-06-03 13:46:43 +02:00
Martin Kroeker c2001f7756
Make cmake build verbose to see options in use 2020-06-03 12:18:15 +02:00
Martin Kroeker c2b3f0b3f6
Revert "keep Apple Clang from optimizing this" 2020-06-03 10:22:15 +02:00
Martin Kroeker f16e39554d
Change PPCG4 CGEMM_M to match kernel change 2020-06-03 09:15:29 +02:00
Martin Kroeker b1ee81228a
Change complex DOT and ROT to generic kernels and switch CGEMM
in response to test failures seen in #2628 and BLAS-Tester
2020-06-03 09:13:29 +02:00
Martin Kroeker 9f7358d7dc
Keep Apple Clang from optimizing this 2020-06-03 08:52:53 +02:00
Martin Kroeker 54fa90fb25
Keep apple clang 11.0.3 from trying to optimize this (and running out of registers) 2020-06-02 17:31:45 +02:00
Leonard Lausen 5a709b8340 Print CPU info in output 2020-06-01 20:51:11 +00:00
Leonard Lausen b31a68b835 Add Github Actions test for DYNAMIC_ARCH builds 2020-06-01 20:16:59 +00:00
Martin Kroeker 86552bf4c7
Update f_check 2020-05-31 15:22:12 +02:00
Martin Kroeker a349d48d89
Merge pull request #2636 from martin-frbg/issue2634
Fix CMAKE build Issues on OS X
2020-05-31 15:16:09 +02:00
Martin Kroeker 4db00121dc
Disable EXPRECISION and add -lm on OSX (same as the BSDs and Linux) 2020-05-31 12:39:36 +02:00
Martin Kroeker 909897f13b
Document option USE_LOCKING 2020-05-31 12:37:57 +02:00
Martin Kroeker e79245acd9
Merge pull request #2635 from ilayn/patch-1
BUG: Fix the loop range in ZHEEQUB.f
2020-05-30 14:37:12 +02:00
Ilhan Polat 76d2612e0c
BUG: Fix the loop range in ZHEEQUB.f 2020-05-30 14:11:11 +02:00
Martin Kroeker ced49466f0
Use the fortran compiler to link LAPACK-related benchmarks
to fix linking problems with (at least) the AMD version of flang that creates dependencies on more than just the fortran runtime.
2020-05-29 13:35:51 +02:00
Martin Kroeker 6e270f91ec
add support for RETURN_BY_STACK semantics, e.g. clang 2020-05-29 13:29:10 +02:00
Martin Kroeker 200296b0f4
remove libomp from link list only for pgfortran
at least the AMD (aocc) flavor of flang wants to link to a (real or dummy) libomp by default
2020-05-29 13:23:51 +02:00
Martin Kroeker dd7a650792
Merge pull request #59 from xianyi/develop
rebase
2020-05-29 13:06:25 +02:00
Martin Kroeker 4a4c50a7ce
Merge pull request #2627 from pkubaj/patch-1
Add powerpc (32-bit)
2020-05-26 08:36:24 +02:00
Martin Kroeker d069780e63
Merge pull request #2626 from docularxu/working-gcc-version-detections
make GCC version detection OS-independent
2020-05-26 08:35:58 +02:00
pkubaj 33c8790603
Add powerpc (32-bit)
Only powerpc64 is present.
2020-05-25 13:14:09 +02:00
Guodong Xu 06387ac0e6 make GCC version detection OS-independent
Previous design put GCC version detection inside of OSNAME 'WINNT'.
However, such detection is required for 'Linux' and possibly other
OSes as well. For example, there is usage of the GCC version
in Makefile.arm64. When compiling on a Linux machine, in the previous
design, Makefile.arm64 will not know the correct GCC version.

The fix is to move GCC version detection into common part, not
wrapped by anything.

Signed-off-by: Guodong Xu <guodong.xu@linaro.com>
2020-05-25 10:57:03 +00:00
Martin Kroeker f1a18d245b
Merge pull request #2618 from craft-zhang/cortex-A53
Improve performance of SGEMM and STRMM on Arm Cortex-A53
2020-05-25 12:14:46 +02:00
张丹枫 2a3aa91354 update CONTRIBUTORS.md, adding myself 2020-05-20 22:35:26 +08:00
张丹枫 ea5bdc3f72 split cortex-a53 param to match 8x8 kernel 2020-05-20 22:34:47 +08:00
张丹枫 9df79ae9a3 update sgemm and strmm kernel selecting strategy 2020-05-20 22:26:58 +08:00
张丹枫 a1fc6041cd use general register to speedup 2020-05-20 22:26:58 +08:00
张丹枫 edb423d772 align general register usage with strmm_kernel_8x8 2020-05-20 22:26:58 +08:00
zhangdanfeng 0e6eb8c247 sgemm kernel use sgemm_kernel_8x8_cortexa53
Signed-off-by: zhangdanfeng <zhangdanfeng@cloudwalk.cn>
2020-05-20 22:26:58 +08:00
zhangdanfeng d475db29c6 optimized for cortex-a53
Signed-off-by: zhangdanfeng <zhangdanfeng@cloudwalk.cn>
2020-05-20 22:26:58 +08:00
Martin Kroeker 729ac6bd4a
Merge pull request #2623 from mhillenibm/zarch_dgemm_z14
s390x: Use new sgemm kernel also for DGEMM and DTRMM on Z14 (+ small cleanup)
2020-05-20 14:51:04 +02:00
Marius Hillenbrand 89fe17f20e s390x: Use new sgemm kernel also for DGEMM and DTRMM on Z14
Apply our new GEMM kernel implementation, written in C with vector intrinsics,
also for DGEMM and DTRMM on Z14 and newer (i.e., architectures with FP32 SIMD
instructions). As a result, we gain around 10% in performance on z15, in
addition to improving maintainability.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-20 10:23:35 +02:00
Marius Hillenbrand bdd795ed03 s390x/GEMM: replace 0-init with peeled first iteration
... since it gains another ~2% of SGEMM and DGEMM performance on z15;
also, the code just called for that cleanup.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-20 10:23:35 +02:00
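
The "peeled first iteration" idea in the commit above can be illustrated with a scalar dot product. This is a minimal sketch and not the s390x kernel itself, which is written with vector intrinsics; the function names `dot_zero_init` and `dot_peeled` are hypothetical. Instead of clearing the accumulator and then adding every product, the first product initialises the accumulator and the loop starts at the second element.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: zero-initialise the accumulator, then accumulate every step. */
float dot_zero_init(const float *a, const float *b, size_t k)
{
    float acc = 0.0f;
    for (size_t i = 0; i < k; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Peeled variant: the first product initialises the accumulator, so the
 * explicit zeroing and one addition are saved. Requires k >= 1. */
float dot_peeled(const float *a, const float *b, size_t k)
{
    assert(k >= 1);
    float acc = a[0] * b[0];
    for (size_t i = 1; i < k; i++)
        acc += a[i] * b[i];
    return acc;
}
```

In a GEMM kernel the same transformation would apply to every accumulator of the register tile, which is where a saving of the kind the commit reports could come from; the figure of about 2% is the commit's own measurement on z15, not something this sketch demonstrates.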
Martin Kroeker e1038ea836
Merge pull request #2622 from martin-frbg/issue2619
Improve declaration of LAPACKE_get_nancheck
2020-05-19 23:07:22 +02:00
Martin Kroeker 6baa9a778d
Improve declaration of LAPACKE_get_nancheck 2020-05-19 17:59:31 +02:00
Martin Kroeker cf46c9f84e
Merge pull request #2617 from martin-frbg/issue2616
Add workaround for unhandled gmake jobserver flags in c_check/f_check
2020-05-18 13:23:58 +02:00
Martin Kroeker 55602fce56
Ignore spurious all-numeric library names derived from mishandled jobserver flags 2020-05-17 15:28:14 +02:00
Martin Kroeker 3d5e159e7a
Ignore spurious all-numeric library names derived from mishandled jobserver flags 2020-05-17 15:26:57 +02:00
Martin Kroeker 2931feb575
Merge pull request #58 from xianyi/develop
rebase
2020-05-17 15:23:32 +02:00
Martin Kroeker 20245ded5f
Merge pull request #2615 from mhillenibm/z14_alignment_hints
s390x: improvise vector alignment hints for older compilers
2020-05-14 21:06:34 +02:00
Marius Hillenbrand 2840432e49 s390x: improvise vector alignment hints for older compilers
Introduce inline assembly so that we can employ vector loads with
alignment hints on older compilers (pre gcc-9), since these are still
used in distributions such as RHEL 8 and Ubuntu 18.04 LTS.

Informing the hardware about alignment can speed up vector loads. For
that purpose, we can encode hints about 8-byte or 16-byte alignment of
the memory operand into the opcodes. gcc-9 and newer automatically emit
such hints, where applicable. Add a bit of inline assembly that achieves
the same for older compilers. Since an older binutils may not know about
the additional operand for the hints, we explicitly encode the opcode in
hex.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-14 15:36:03 +02:00
Martin Kroeker ea78106c71
Merge pull request #2614 from mhillenibm/gemm_vec_z14
s390x: Improve performance of SGEMM and STRMM on z14 and newer
2020-05-13 15:09:23 +02:00
Marius Hillenbrand cb9dc36dd5 Update CONTRIBUTORS.md
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 16:14:00 +02:00
Marius Hillenbrand 1b0b4349a1 s390x/Z14: Change register blocking for SGEMM to 16x4
Change register blocking for SGEMM (and STRMM) on z14 from 8x4 to 16x4
by adjusting SGEMM_DEFAULT_UNROLL_M and choosing the appropriate copy
implementations. Actually make KERNEL.Z14 more flexible, so that the
change in param.h suffices. As a result, performance for SGEMM improves
by around 30% on z15.

On z14, FP SIMD instructions can operate on float-sized scalars in
vector registers, while z13 could do that for double-sized scalars only.
Thus, we can double the amount of elements of C that are held in
registers in an SGEMM kernel.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 15:59:51 +02:00
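
As a rough illustration of what 16x4 register blocking means, the following plain C sketch computes one 16x4 tile of C from packed panels of A and B. It is not the OpenBLAS kernel: the function name `sgemm_tile_16x4`, the packed panel layout, and the column-major tile of C are assumptions made for this example, and the real kernel uses vector intrinsics rather than a scalar array. The 64 float accumulators correspond to sixteen 128-bit vectors of four floats each.

```c
#include <assert.h>
#include <stddef.h>

enum { MR = 16, NR = 4 }; /* tile shape taken from the "16x4" in the commit */

/* Hypothetical micro-kernel: C_tile (MR x NR, column-major, leading
 * dimension ldc) += A_panel * B_panel.
 * Assumed packing: a holds MR values per k step, b holds NR values per k step. */
void sgemm_tile_16x4(size_t k, const float *a, const float *b,
                     float *c, size_t ldc)
{
    assert(ldc >= MR);

    /* Accumulators for the whole tile; a compiler may keep these in registers. */
    float acc[NR][MR] = {{0.0f}};

    for (size_t p = 0; p < k; p++)
        for (size_t j = 0; j < NR; j++)
            for (size_t i = 0; i < MR; i++)
                acc[j][i] += a[p * MR + i] * b[p * NR + j];

    /* Write the tile back only once, after the k loop has finished. */
    for (size_t j = 0; j < NR; j++)
        for (size_t i = 0; i < MR; i++)
            c[j * ldc + i] += acc[j][i];
}
```

Doubling the tile height from 8 to 16 means each value loaded from the B panel is reused across twice as many rows before C is touched again, which is the general motivation for larger register tiles.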