Martin Kroeker
89091e6b64
Merge pull request #2645 from martin-frbg/misc_fixes
...
Miscellaneous fixes
2020-06-07 19:44:50 +02:00
Martin Kroeker
522aaf53bf
Break out of potentially infinite rescaling loop in LAPACK xLARGV/xLARTG/xLARTGP
...
Reference-LAPACK issue 411
2020-06-07 14:30:20 +02:00
Martin Kroeker
c3574ffe53
Merge pull request #2646 from wjc404/develop
...
Optimize AVX512 parallel DGEMM performance
2020-06-07 13:18:22 +02:00
Martin Kroeker
4e28dc6353
Use only -O1 with AMD AOCC version of flang
...
to prevent miscompilation of LAPACK codes and tests on Ryzen
2020-06-07 00:05:02 +02:00
Martin Kroeker
13c28889a2
Update "cosmetic fixes for non-C99 compilers"
2020-06-06 15:22:27 +02:00
wjc404
0e3ac4a06b
Add files via upload
2020-06-06 14:56:57 +08:00
Martin Kroeker
28915eed72
Cosmetic fixes for non-C99 compilers
2020-06-05 10:05:34 +02:00
Martin Kroeker
7f60fb6b91
Delete spurious copy of common_param.h
2020-06-05 10:04:16 +02:00
Martin Kroeker
0464e662ad
make blas_quickdivide unsigned and guard against miscompilation
2020-06-05 10:03:36 +02:00
Martin Kroeker
0f9a935a5a
Merge pull request #62 from xianyi/develop
...
rebase
2020-06-05 09:51:06 +02:00
Martin Kroeker
79cd69fea4
Merge pull request #2644 from martin-frbg/cmake-maxstack
...
Add CMAKE support for MAX_STACK_ALLOC setting
2020-06-05 08:33:48 +02:00
Martin Kroeker
bb12c2c854
Limit MAX_STACK_ALLOC availability to non-Windows
2020-06-04 19:07:27 +02:00
Martin Kroeker
32c1c1e125
Update azure-pipelines.yml
2020-06-04 19:03:46 +02:00
Martin Kroeker
f1953b8b81
Update azure-pipelines.yml
2020-06-04 17:58:13 +02:00
Martin Kroeker
6e97df7b47
Add CMAKE support for MAX_STACK_ALLOC setting
2020-06-04 14:45:31 +02:00
Martin Kroeker
729303e5ed
Merge pull request #2643 from craft-zhang/cortex-a53
...
Improve performance of SGEMM on Arm Cortex-A53
2020-06-04 07:58:45 +02:00
Martin Kroeker
547965530f
Merge pull request #2638 from leezu/actions
...
Add GitHub Actions test for DYNAMIC_ARCH builds on Linux and macOS
2020-06-04 00:02:37 +02:00
ZhangDanfeng
9b7877ccf1
sgemm copy source init
...
Signed-off-by: ZhangDanfeng <467688405@qq.com>
2020-06-04 02:10:45 +08:00
ZhangDanfeng
f82fa802d1
Insert prefetch
...
Signed-off-by: ZhangDanfeng <467688405@qq.com>
2020-06-04 02:08:48 +08:00
Martin Kroeker
3eda3d34c3
Merge pull request #2641 from martin-frbg/ppcg4
...
Work around PPC G4 test failures
2020-06-03 16:43:46 +02:00
Martin Kroeker
a8f42ae85c
set cmake build type to Release
2020-06-03 15:28:59 +02:00
Martin Kroeker
e6e2e531bc
revert clang pragma
2020-06-03 15:16:27 +02:00
Martin Kroeker
456dc04441
Update sgemm_kernel_16x4_skylakex_3.c
2020-06-03 15:15:41 +02:00
Martin Kroeker
89323458a9
preset optimization level for apple clang
2020-06-03 15:07:25 +02:00
Martin Kroeker
e153bdeb70
Update dynamic_arch.yml
2020-06-03 13:46:43 +02:00
Martin Kroeker
c2001f7756
Make cmake build verbose to see options in use
2020-06-03 12:18:15 +02:00
Martin Kroeker
c2b3f0b3f6
Revert "keep Apple Clang from optimizing this"
2020-06-03 10:22:15 +02:00
Martin Kroeker
f16e39554d
Change PPCG4 CGEMM_M to match kernel change
2020-06-03 09:15:29 +02:00
Martin Kroeker
b1ee81228a
Change complex DOT and ROT to generic kernels and switch CGEMM
...
in response to test failures seen in #2628 and BLAS-Tester
2020-06-03 09:13:29 +02:00
Martin Kroeker
9f7358d7dc
Keep Apple Clang from optimizing this
2020-06-03 08:52:53 +02:00
Martin Kroeker
54fa90fb25
Keep apple clang 11.0.3 from trying to optimize this (and running out of registers)
2020-06-02 17:31:45 +02:00
Leonard Lausen
5a709b8340
Print CPU info in output
2020-06-01 20:51:11 +00:00
Leonard Lausen
b31a68b835
Add GitHub Actions test for DYNAMIC_ARCH builds
2020-06-01 20:16:59 +00:00
Martin Kroeker
86552bf4c7
Update f_check
2020-05-31 15:22:12 +02:00
Martin Kroeker
a349d48d89
Merge pull request #2636 from martin-frbg/issue2634
...
Fix CMAKE build Issues on OS X
2020-05-31 15:16:09 +02:00
Martin Kroeker
4db00121dc
Disable EXPRECISION and add -lm on OSX (same as the BSDs and Linux)
2020-05-31 12:39:36 +02:00
Martin Kroeker
909897f13b
Document option USE_LOCKING
2020-05-31 12:37:57 +02:00
Martin Kroeker
e79245acd9
Merge pull request #2635 from ilayn/patch-1
...
BUG: Fix the loop range in ZHEEQUB.f
2020-05-30 14:37:12 +02:00
Ilhan Polat
76d2612e0c
BUG: Fix the loop range in ZHEEQUB.f
2020-05-30 14:11:11 +02:00
Martin Kroeker
ced49466f0
Use the fortran compiler to link LAPACK-related benchmarks
...
to fix linking problems with (at least) the AMD version of flang that creates dependencies on more than just the fortran runtime.
2020-05-29 13:35:51 +02:00
Martin Kroeker
6e270f91ec
add support for RETURN_BY_STACK semantics, e.g. clang
2020-05-29 13:29:10 +02:00
Martin Kroeker
200296b0f4
remove libomp from link list only for pgfortran
...
at least the AMD (aocc) flavor of flang wants to link to a (real or dummy) libomp by default
2020-05-29 13:23:51 +02:00
Martin Kroeker
dd7a650792
Merge pull request #59 from xianyi/develop
...
rebase
2020-05-29 13:06:25 +02:00
Martin Kroeker
4a4c50a7ce
Merge pull request #2627 from pkubaj/patch-1
...
Add powerpc (32-bit)
2020-05-26 08:36:24 +02:00
Martin Kroeker
d069780e63
Merge pull request #2626 from docularxu/working-gcc-version-detections
...
make GCC version detection OS-independent
2020-05-26 08:35:58 +02:00
pkubaj
33c8790603
Add powerpc (32-bit)
...
Only powerpc64 is present.
2020-05-25 13:14:09 +02:00
Guodong Xu
06387ac0e6
make GCC version detection OS-independent
...
The previous design put GCC version detection inside the OSNAME 'WINNT' branch.
However, this detection is also required on 'Linux' and possibly other
OSes. For example, Makefile.arm64 makes use of the GCC version.
When compiling on a Linux machine with the previous
design, Makefile.arm64 does not know the correct GCC version.
The fix is to move GCC version detection into the common part,
outside any OS conditional.
Signed-off-by: Guodong Xu <guodong.xu@linaro.com>
2020-05-25 10:57:03 +00:00
Martin Kroeker
f1a18d245b
Merge pull request #2618 from craft-zhang/cortex-A53
...
Improve performance of SGEMM and STRMM on Arm Cortex-A53
2020-05-25 12:14:46 +02:00
张丹枫
2a3aa91354
update CONTRIBUTORS.md, adding myself
2020-05-20 22:35:26 +08:00
张丹枫
ea5bdc3f72
split cortex-a53 param to match 8x8 kernel
2020-05-20 22:34:47 +08:00