Commit Graph

6065 Commits

Author SHA1 Message Date
Martin Kroeker 63d26090f5
Merge pull request #64 from xianyi/develop
rebase
2020-06-13 19:14:47 +02:00
Rajalakshmi Srinivasaraghavan 9fe930f205
powerpc: Add support for future processor
This is the initial patch to support build infrastructure
for POWER10 architecture.
2020-06-11 15:47:20 -05:00
Martin Kroeker 3a1b58d54a
Merge pull request #2653 from craft-zhang/cortex-a53
fix INIT8x4 of SGEMM on Arm Cortex-A53
2020-06-10 12:19:33 +02:00
Martin Kroeker f7659be4a0
Merge pull request #2652 from martin-frbg/flang-fixes
Fixes for compilation with flang binary release 20190329
2020-06-09 20:31:06 +02:00
ZhangDanfeng bc6fd20a40
fix INIT8x4
Signed-off-by: ZhangDanfeng <467688405@qq.com>
2020-06-10 01:01:16 +08:00
Martin Kroeker 3ce469a34f
Limit optimization level to O1 for flang and add -frecursive
2020-06-09 16:11:13 +02:00
Martin Kroeker ba2c5b404d
When building with flang, use it also for the final link step to get dependencies right
2020-06-09 16:09:34 +02:00
Martin Kroeker f07a80354b
Apply previously AOCC-specific workaround to all versions of flang
2020-06-09 16:07:03 +02:00
Martin Kroeker fdd1b50263
Merge pull request #63 from xianyi/develop
rebase
2020-06-09 15:54:30 +02:00
Leonard Lausen b98923f33a
Test enforce -O1 for flang
2020-06-09 06:54:47 +00:00
Leonard Lausen 4cb1db0e3b
Test flang build
2020-06-09 06:31:17 +00:00
Martin Kroeker 430e8b45fe
Merge pull request #2648 from martin-frbg/lapack411
Break out of potentially infinite rescaling loop in LAPACK xLARGV/xLARTG/xLARTGP
2020-06-07 19:45:52 +02:00
Martin Kroeker 88fe85f4e0
Merge pull request #2647 from martin-frbg/aocc-flang
Small fixes for flang in general and the AMD AOCC version of it in particular
2020-06-07 19:45:11 +02:00
Martin Kroeker 89091e6b64
Merge pull request #2645 from martin-frbg/misc_fixes
Miscellaneous fixes
2020-06-07 19:44:50 +02:00
Martin Kroeker 522aaf53bf
Break out of potentially infinite rescaling loop in LAPACK xLARGV/xLARTG/xLARTGP
Reference-LAPACK issue 411
2020-06-07 14:30:20 +02:00
Martin Kroeker c3574ffe53
Merge pull request #2646 from wjc404/develop
Optimize AVX512 parallel DGEMM performance
2020-06-07 13:18:22 +02:00
Martin Kroeker 4e28dc6353
Use only -O1 with AMD AOCC version of flang
to prevent miscompilation of LAPACK codes and tests on Ryzen
2020-06-07 00:05:02 +02:00
Martin Kroeker 13c28889a2
Update "cosmetic fixes for non-C99 compilers"
2020-06-06 15:22:27 +02:00
wjc404 0e3ac4a06b
Add files via upload
2020-06-06 14:56:57 +08:00
Martin Kroeker 28915eed72
Cosmetic fixes for non-C99 compilers
2020-06-05 10:05:34 +02:00
Martin Kroeker 7f60fb6b91
Delete spurious copy of common_param.h
2020-06-05 10:04:16 +02:00
Martin Kroeker 0464e662ad
make blas_quickdivide unsigned and guard against miscompilation
2020-06-05 10:03:36 +02:00
Martin Kroeker 0f9a935a5a
Merge pull request #62 from xianyi/develop
rebase
2020-06-05 09:51:06 +02:00
Martin Kroeker 79cd69fea4
Merge pull request #2644 from martin-frbg/cmake-maxstack
Add CMAKE support for MAX_STACK_ALLOC setting
2020-06-05 08:33:48 +02:00
Martin Kroeker bb12c2c854
Limit MAX_STACK_ALLOC availability to non-Windows
2020-06-04 19:07:27 +02:00
Martin Kroeker 32c1c1e125
Update azure-pipelines.yml
2020-06-04 19:03:46 +02:00
Martin Kroeker f1953b8b81
Update azure-pipelines.yml
2020-06-04 17:58:13 +02:00
Martin Kroeker 6e97df7b47
Add CMAKE support for MAX_STACK_ALLOC setting
2020-06-04 14:45:31 +02:00
Martin Kroeker 729303e5ed
Merge pull request #2643 from craft-zhang/cortex-a53
Improve performance of SGEMM on Arm Cortex-A53
2020-06-04 07:58:45 +02:00
Martin Kroeker 547965530f
Merge pull request #2638 from leezu/actions
Add Github Actions test for DYNAMIC_ARCH builds on Linux and macOS
2020-06-04 00:02:37 +02:00
ZhangDanfeng 9b7877ccf1
sgemm copy source init
Signed-off-by: ZhangDanfeng <467688405@qq.com>
2020-06-04 02:10:45 +08:00
ZhangDanfeng f82fa802d1
Insert prefetch
Signed-off-by: ZhangDanfeng <467688405@qq.com>
2020-06-04 02:08:48 +08:00
Martin Kroeker 3eda3d34c3
Merge pull request #2641 from martin-frbg/ppcg4
Work around PPC G4 test failures
2020-06-03 16:43:46 +02:00
Martin Kroeker a8f42ae85c
set cmake build type to Release
2020-06-03 15:28:59 +02:00
Martin Kroeker e6e2e531bc
revert clang pragma
2020-06-03 15:16:27 +02:00
Martin Kroeker 456dc04441
Update sgemm_kernel_16x4_skylakex_3.c
2020-06-03 15:15:41 +02:00
Martin Kroeker 89323458a9
preset optimization level for apple clang
2020-06-03 15:07:25 +02:00
Martin Kroeker e153bdeb70
Update dynamic_arch.yml
2020-06-03 13:46:43 +02:00
Martin Kroeker c2001f7756
Make cmake build verbose to see options in use
2020-06-03 12:18:15 +02:00
Martin Kroeker c2b3f0b3f6
Revert "keep Apple Clang from optimizing this"
2020-06-03 10:22:15 +02:00
Martin Kroeker f16e39554d
Change PPCG4 CGEMM_M to match kernel change
2020-06-03 09:15:29 +02:00
Martin Kroeker b1ee81228a
Change complex DOT and ROT to generic kernels and switch CGEMM
in response to test failures seen in #2628 and BLAS-Tester
2020-06-03 09:13:29 +02:00
Martin Kroeker 9f7358d7dc
Keep Apple Clang from optimizing this
2020-06-03 08:52:53 +02:00
Martin Kroeker 54fa90fb25
Keep apple clang 11.0.3 from trying to optimize this (and running out of registers)
2020-06-02 17:31:45 +02:00
Leonard Lausen 5a709b8340
Print CPU info in output
2020-06-01 20:51:11 +00:00
Leonard Lausen b31a68b835
Add Github Actions test for DYNAMIC_ARCH builds
2020-06-01 20:16:59 +00:00
Martin Kroeker 86552bf4c7
Update f_check
2020-05-31 15:22:12 +02:00
Martin Kroeker a349d48d89
Merge pull request #2636 from martin-frbg/issue2634
Fix CMAKE build Issues on OS X
2020-05-31 15:16:09 +02:00
Martin Kroeker 4db00121dc
Disable EXPRECISION and add -lm on OSX (same as the BSDs and Linux)
2020-05-31 12:39:36 +02:00
Martin Kroeker 909897f13b
Document option USE_LOCKING
2020-05-31 12:37:57 +02:00