Commit Graph

4571 Commits

Author SHA1 Message Date
Martin Kroeker
79cd69fea4 Merge pull request #2644 from martin-frbg/cmake-maxstack
Add CMAKE support for MAX_STACK_ALLOC setting
2020-06-05 08:33:48 +02:00
Martin Kroeker
bb12c2c854 Limit MAX_STACK_ALLOC availability to non-Windows 2020-06-04 19:07:27 +02:00
Martin Kroeker
32c1c1e125 Update azure-pipelines.yml 2020-06-04 19:03:46 +02:00
Martin Kroeker
f1953b8b81 Update azure-pipelines.yml 2020-06-04 17:58:13 +02:00
Martin Kroeker
6e97df7b47 Add CMAKE support for MAX_STACK_ALLOC setting 2020-06-04 14:45:31 +02:00
Martin Kroeker
729303e5ed Merge pull request #2643 from craft-zhang/cortex-a53
Improve performance of SGEMM on Arm Cortex-A53
2020-06-04 07:58:45 +02:00
Martin Kroeker
547965530f Merge pull request #2638 from leezu/actions
Add Github Actions test for DYNAMIC_ARCH builds on Linux and macOS
2020-06-04 00:02:37 +02:00
ZhangDanfeng
9b7877ccf1 sgemm copy source init
Signed-off-by: ZhangDanfeng <467688405@qq.com>
2020-06-04 02:10:45 +08:00
ZhangDanfeng
f82fa802d1 Insert prefetch
Signed-off-by: ZhangDanfeng <467688405@qq.com>
2020-06-04 02:08:48 +08:00
Martin Kroeker
3eda3d34c3 Merge pull request #2641 from martin-frbg/ppcg4
Work around PPC G4 test failures
2020-06-03 16:43:46 +02:00
Martin Kroeker
a8f42ae85c set cmake build type to Release 2020-06-03 15:28:59 +02:00
Martin Kroeker
e6e2e531bc revert clang pragma 2020-06-03 15:16:27 +02:00
Martin Kroeker
456dc04441 Update sgemm_kernel_16x4_skylakex_3.c 2020-06-03 15:15:41 +02:00
Martin Kroeker
89323458a9 preset optimization level for apple clang 2020-06-03 15:07:25 +02:00
Martin Kroeker
e153bdeb70 Update dynamic_arch.yml 2020-06-03 13:46:43 +02:00
Martin Kroeker
c2001f7756 Make cmake build verbose to see options in use 2020-06-03 12:18:15 +02:00
Martin Kroeker
c2b3f0b3f6 Revert "keep Apple Clang from optimizing this" 2020-06-03 10:22:15 +02:00
Martin Kroeker
f16e39554d Change PPCG4 CGEMM_M to match kernel change 2020-06-03 09:15:29 +02:00
Martin Kroeker
b1ee81228a Change complex DOT and ROT to generic kernels and switch CGEMM
in response to test failures seen in #2628 and BLAS-Tester
2020-06-03 09:13:29 +02:00
Martin Kroeker
9f7358d7dc Keep Apple Clang from optimizing this 2020-06-03 08:52:53 +02:00
Martin Kroeker
54fa90fb25 Keep apple clang 11.0.3 from trying to optimize this (and running out of registers) 2020-06-02 17:31:45 +02:00
Leonard Lausen
5a709b8340 Print CPU info in output 2020-06-01 20:51:11 +00:00
Leonard Lausen
b31a68b835 Add Github Actions test for DYNAMIC_ARCH builds 2020-06-01 20:16:59 +00:00
Martin Kroeker
a349d48d89 Merge pull request #2636 from martin-frbg/issue2634
Fix CMAKE build Issues on OS X
2020-05-31 15:16:09 +02:00
Martin Kroeker
4db00121dc Disable EXPRECISION and add -lm on OSX (same as the BSDs and Linux) 2020-05-31 12:39:36 +02:00
Martin Kroeker
909897f13b Document option USE_LOCKING 2020-05-31 12:37:57 +02:00
Martin Kroeker
e79245acd9 Merge pull request #2635 from ilayn/patch-1
BUG: Fix the loop range in ZHEEQUB.f
2020-05-30 14:37:12 +02:00
Ilhan Polat
76d2612e0c BUG: Fix the loop range in ZHEEQUB.f 2020-05-30 14:11:11 +02:00
Martin Kroeker
dd7a650792 Merge pull request #59 from xianyi/develop
rebase
2020-05-29 13:06:25 +02:00
Martin Kroeker
4a4c50a7ce Merge pull request #2627 from pkubaj/patch-1
Add powerpc (32-bit)
2020-05-26 08:36:24 +02:00
Martin Kroeker
d069780e63 Merge pull request #2626 from docularxu/working-gcc-version-detections
make GCC version detection OS-independent
2020-05-26 08:35:58 +02:00
pkubaj
33c8790603 Add powerpc (32-bit)
Only powerpc64 is present.
2020-05-25 13:14:09 +02:00
Guodong Xu
06387ac0e6 make GCC version detection OS-independent
Previous design put GCC version detection inside of OSNAME 'WINNT'.
However, such detections are required for 'Linux' and possibly other
OSes as well. For example, there is usage of the GCC versions
in Makefile.arm64. When compiling on Linux machine, in the previous
design, Makefile.arm64 will not know the correct GCC version.

The fix is to move GCC version detection into common part, not
wrapped by anything.

Signed-off-by: Guodong Xu <guodong.xu@linaro.com>
2020-05-25 10:57:03 +00:00
Martin Kroeker
f1a18d245b Merge pull request #2618 from craft-zhang/cortex-A53
Improve performance of SGEMM and STRMM on Arm Cortex-A53
2020-05-25 12:14:46 +02:00
张丹枫
2a3aa91354 update CONTRIBUTORS.md, adding myself 2020-05-20 22:35:26 +08:00
张丹枫
ea5bdc3f72 split cortex-a53 param to match 8x8 kernel 2020-05-20 22:34:47 +08:00
张丹枫
9df79ae9a3 update sgemm and strmm kernel selecting strategy 2020-05-20 22:26:58 +08:00
张丹枫
a1fc6041cd use general register to speedup 2020-05-20 22:26:58 +08:00
张丹枫
edb423d772 align general register using to strmm_kernel_8x8 2020-05-20 22:26:58 +08:00
zhangdanfeng
0e6eb8c247 sgemm kernel use sgemm_kernel_8x8_cortexa53
Signed-off-by: zhangdanfeng <zhangdanfeng@cloudwalk.cn>
2020-05-20 22:26:58 +08:00
zhangdanfeng
d475db29c6 optimized for cortex-a53
Signed-off-by: zhangdanfeng <zhangdanfeng@cloudwalk.cn>
2020-05-20 22:26:58 +08:00
Martin Kroeker
729ac6bd4a Merge pull request #2623 from mhillenibm/zarch_dgemm_z14
s390x: Use new sgemm kernel also for DGEMM and DTRMM on Z14 (+ small cleanup)
2020-05-20 14:51:04 +02:00
Marius Hillenbrand
89fe17f20e s390x: Use new sgemm kernel also for DGEMM and DTRMM on Z14
Apply our new GEMM kernel implementation, written in C with vector intrinsics,
also for DGEMM and DTRMM on Z14 and newer (i.e., architectures with FP32 SIMD
instructions). As a result, we gain around 10% in performance on z15, in
addition to improving maintainability.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-20 10:23:35 +02:00
Marius Hillenbrand
bdd795ed03 s390x/GEMM: replace 0-init with peeled first iteration
... since it gains another ~2% of SGEMM and DGEMM performance on z15;
also, the code just called for that cleanup.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-20 10:23:35 +02:00
Martin Kroeker
e1038ea836 Merge pull request #2622 from martin-frbg/issue2619
Improve declaration of LAPACKE_get_nancheck
2020-05-19 23:07:22 +02:00
Martin Kroeker
6baa9a778d Improve declaration of LAPACKE_get_nancheck 2020-05-19 17:59:31 +02:00
Martin Kroeker
cf46c9f84e Merge pull request #2617 from martin-frbg/issue2616
Add workaround for unhandled gmake jobserver flags in c_check/f_check
2020-05-18 13:23:58 +02:00
Martin Kroeker
55602fce56 Ignore spurious all-numeric library names derived from mishandled jobserver flags 2020-05-17 15:28:14 +02:00
Martin Kroeker
3d5e159e7a Ignore spurious all-numeric library names derived from mishandled jobserver flags 2020-05-17 15:26:57 +02:00
Martin Kroeker
2931feb575 Merge pull request #58 from xianyi/develop
rebase
2020-05-17 15:23:32 +02:00