Commit Graph

4379 Commits

Author SHA1 Message Date
Martin Kroeker eecd8c3204
Merge pull request #2548 from gxw-loongson/develop
Add a GENERIC target for 64bit MIPS
2020-04-11 00:34:04 +02:00
Martin Kroeker ea85eb2e02
Merge pull request #2549 from martin-frbg/fixthreadtest
Match thread count in cpp_thread_test to host capability
2020-04-10 23:54:40 +02:00
Martin Kroeker 66f89c0aaf
Match thread count to machine capability 2020-04-10 22:06:44 +02:00
gxw 8d07cf9b67 Fix compilation problem on loongson platform
Using "make TARGET=GENERIC" on loongson platform will get the following
error messages:
"make[1]: *** No rule to make target 'sgemm_incopy.o', needed by 'libs'"
Add kernel/mips64/KERNEL.generic to slove the problem.
2020-04-09 19:28:15 +08:00
Martin Kroeker 69f277f8ee
Add another memory barrier for ARM and a multicore test run on ThunderX to help detect such issues (#2544)
* Add another memory barrier in memory.c to prevent races in memory slot allocation

* Add an all-core test on Drone.io's ThunderX platform and modify dgemm_tester to use all 96 cores
2020-04-08 11:04:51 +02:00
Martin Kroeker 3a6d51c2fd
Merge pull request #44 from xianyi/develop
Add a Z13 build to the Travis configuration (#2542)
2020-04-04 22:48:53 +02:00
Martin Kroeker 1c7771df96
Merge pull request #43 from martin-frbg/revert-42-z12ci
Revert 42 z12ci to keep forked develop clean
2020-04-04 22:46:58 +02:00
Martin Kroeker a56c9ec52a Revert "Add IBM Z to Travis configuration (#42)"
This reverts commit 7972beb375.
2020-04-04 22:45:01 +02:00
Martin Kroeker 4ae6d1a01b
Add a Z13 build to the Travis configuration (#2542)
* Add IBM Z to Travis configuration
2020-04-03 16:02:11 +02:00
Martin Kroeker 7972beb375
Add IBM Z to Travis configuration (#42)
* Add IBM Z to Travis configuration
2020-04-03 15:59:18 +02:00
Martin Kroeker 7bd8624b79
Merge pull request #41 from xianyi/develop
rebase
2020-04-02 10:32:19 +02:00
Martin Kroeker 806f89166e
Make ARMV7 compile with xcode and add a CI job for it (#2537)
* Add an ARMV7 iOS build on Travis

* thread_local appears to be unavailable on ARMV7 iOS

* Add no-thumb option for ARMV7 IOS build to get it to accept DMB ISH

* Make local labels in macros of nrm2_vfpv3.S compatible with the xcode assembler
2020-04-02 10:30:37 +02:00
Martin Kroeker f059e614eb
Merge pull request #2536 from martin-frbg/recurs
Add "recursive" option for LAPACK builds with ifort or pgfort as well
2020-04-01 20:00:13 +02:00
Martin Kroeker e13b6773ee
ifort and pgfort need "recursive" for safe compilation of LAPACK as well 2020-04-01 15:39:16 +02:00
Martin Kroeker a05243d0f2
ifort and pgfort need "recursive" for compiling LAPACK as well
as shown in Reference-LAPACK issue 401 (their PR 403)
2020-04-01 15:38:07 +02:00
Martin Kroeker c6af9bbb32
Merge pull request #2534 from martin-frbg/issue2496
Fix zero initialization for beta=0 case
2020-03-31 20:53:13 +02:00
Martin Kroeker 144be81ca1
fix initialization to zero in the NEON SGEMM_BETA kernel as well 2020-03-31 16:53:56 +02:00
Martin Kroeker 07cdd5d05c
Fix zero initialization for beta=0 case
use immediate initialization instead of multiplication in case register content is a NaN
2020-03-31 00:21:02 +02:00
Martin Kroeker 567d2760e6
Merge pull request #2520 from wjc404/develop
Fix avx512 sgemm performance bug when ldc is a multiple of 1024
2020-03-30 20:15:59 +02:00
Martin Kroeker 018bb3e433
Merge pull request #2533 from martin-frbg/gemmdirect2
Use runtime check for AVX512 capability in DYNAMIC_ARCH builds made on SKX
2020-03-30 20:15:37 +02:00
Martin Kroeker 79fd006c58
Expose the support_avx512 function provided in dynamic.c 2020-03-26 21:25:39 +01:00
Martin Kroeker 8229c163b7
Use runtime check for AVX512 (sgemm_direct) capability when using DYNAMIC_ARCH 2020-03-26 21:12:56 +01:00
Martin Kroeker a986d42ea6
Merge pull request #39 from xianyi/develop
rebase
2020-03-26 21:06:51 +01:00
Martin Kroeker b6a948fbee
Merge pull request #2530 from martin-frbg/dynmsg
Add message highlighting minimum target choice at end of DYNAMIC_ARCH…
2020-03-24 15:44:46 +01:00
Martin Kroeker 0cc352417e
Merge pull request #2529 from shengyang-3390/dev1
add ctest for drotm and modified ctest for drot.
2020-03-24 15:44:27 +01:00
Martin Kroeker fe47dc8673
Add message highlighting minimum target choice at end of DYNAMIC_ARCH builds
related to #2526
2020-03-23 19:35:51 +01:00
Martin Kroeker 9f67d03d3b
Merge pull request #2527 from martin-frbg/gemmdirect
Avoid calling DIRECT codepath in DYNAMIC_ARCH on non-SKX
2020-03-23 12:47:19 +01:00
shengyang 50f4fb2fbd add ctest for drotm and modified ctest for drot.
make sure that test cases cover all code path when kernel uses looping unrolling.
2020-03-23 10:47:05 +08:00
Martin Kroeker 6a14b34c20
Avoid calling DIRECT codepath in DYNAMIC_ARCH on non-SKX 2020-03-22 14:33:16 +01:00
Martin Kroeker 8c7c1395da
Merge pull request #2521 from martin-frbg/cm-avx512
Use proper extension on the avx512 testcase filename
2020-03-22 01:03:42 +01:00
Martin Kroeker 5f6f6a2c7d
Merge pull request #2525 from andreas-schwab/develop
Fix ARCHCONFIG for Neoverse-N1
2020-03-21 18:47:48 +01:00
Andreas Schwab 71cf2acdef Fix ARCHCONFIG for Neoverse-N1
../config_kernel.h:24:9: warning: missing whitespace after the macro name
   24 | #define ARMV8-march armv8.2-a
      |         ^~~~~
2020-03-21 17:35:42 +01:00
Martin Kroeker 1d9773b800
Use proper extension on the avx512 testcase filename
The need to call it .tmp existed only when it was generated by a tmpfile call, and the "-x c" option to tell the compiler it is actually a C source is not universally supported (this broke the test with clang-cl at least)
2020-03-20 23:05:53 +01:00
Martin Kroeker a46a8c4956
Merge pull request #2518 from shengyang-3390/dev
add ctest for srotm and modified ctest for srot.
2020-03-20 23:00:06 +01:00
Martin Kroeker 7ae737e04c
Merge pull request #2519 from martin-frbg/issue2472
Fix cmake compilation with ifort on Windows
2020-03-20 22:57:44 +01:00
wjc404 64daad4365
Update param.h 2020-03-20 21:46:18 +00:00
wjc404 b8307768e2
Add files via upload 2020-03-21 05:42:10 +08:00
Martin Kroeker 6d54c94760
Make ifort on Windows create lowercase symbols with appended underscore
tentative fix for #2472
2020-03-20 01:08:10 +01:00
Martin Kroeker c0da205412
Merge pull request #38 from xianyi/develop
rebase
2020-03-20 01:05:22 +01:00
shengyang a06d78556d add ctest for srotm and modified ctest for srot.
make sure that test cases cover all code path when kernel uses looping unrolling.
2020-03-18 14:17:32 +08:00
Martin Kroeker af8a619e1f
Merge pull request #2517 from wjc404/develop
Temporary fix for SKX STRSM
2020-03-17 10:12:53 +01:00
wjc404 62b9608986
Update KERNEL.SKYLAKEX 2020-03-17 12:52:55 +08:00
Martin Kroeker 717c604aeb
Merge pull request #2515 from zelong-1024/develop
[OpenBLAS]: benchmark for her/her2 LEVEL2 functions
2020-03-16 21:59:55 +01:00
Martin Kroeker ce33da4cab
Merge pull request #2513 from aaawuanjun/develop
[OpenBlas]: Add benchmark tpsv file and modify benchmark/Makefile
2020-03-16 21:58:55 +01:00
Martin Kroeker a1b181cea2
Merge pull request #2516 from wjc404/develop
AVX2 STRSM kernels
2020-03-16 21:58:34 +01:00
wjc404 cdc0e9011e
Update KERNEL.ZEN 2020-03-16 16:39:37 +00:00
wjc404 fa049d49c2
AVX2 STRSM kernel 2020-03-17 00:34:08 +08:00
l00536773 d45c53ecf1 [OpenBLAS]: benchmark for her/her2 LEVEL2 functions
[description]: benchmark for her/her2
[solution]: added benchmark for her/her2, modified makefile in benchmark
[dts]:
2020-03-16 11:19:05 +08:00
Martin Kroeker c2840997db
Merge pull request #2508 from liujingjue/develop
[OpenBLAS]:fix the iamax benchmark error
2020-03-14 14:21:30 +01:00
Martin Kroeker c775458299
Merge pull request #2512 from martin-frbg/lapackh
Move declarations of lapack_complex_custom types outside the extern C
2020-03-14 13:27:40 +01:00