Commit Graph

4420 Commits

Author SHA1 Message Date
Martin Kroeker b0b02a080d
Add compiler options for MIPS32 24K/1004K 2020-04-19 06:50:51 +02:00
Martin Kroeker a1fc98dc57
rename 1004K, 24K to MIPS1004K, MIPS24K to avoid identifier naming problem 2020-04-18 23:50:23 +02:00
Martin Kroeker 00172d440b
Typo fix in MIPS24K addition 2020-04-18 21:16:49 +02:00
Martin Kroeker d712ea724c
Add MIPS24K support 2020-04-18 21:10:18 +02:00
Martin Kroeker 61bbae3ac1
Handle MIPS24K like P5600
and allow enforcing TARGET=1004K as well (omission from earlier 1004K merge and later introduction of TARGET check)
2020-04-18 21:09:32 +02:00
Martin Kroeker 1c1ca2bc0a
Merge pull request #47 from xianyi/develop
rebase
2020-04-18 21:07:14 +02:00
Martin Kroeker 236a3d8ce6
Merge pull request #2563 from zelong-1024/develop
[OpenBLAS]: benchmark error of potrf
2020-04-16 11:45:32 +02:00
l00536773 6b7ef6543a [OpenBLAS]: benchmark error of potrf
[description]: when the matrix size goes higher than 5800 during the cpotrf test, error info, such as "Potrf info = 5679", will be returned on ARM64 and x86 machines. Uplo = L & F.
[solution]: changed the func for building the matrix so that the complex Hermitian matrix can stay positive definite during the computation.
[dts]:
2020-04-16 10:55:10 +08:00
Martin Kroeker 250e6f8039
Merge pull request #2557 from martin-frbg/dronebadge
Update and reformat README
2020-04-15 20:23:43 +02:00
Martin Kroeker 7a6d0016b0
Merge pull request #2556 from martin-frbg/epicdrone
Add a drone.io multithread test for x86_64
2020-04-15 20:23:17 +02:00
Martin Kroeker e8e8a6e608
Restore USE_OPENMP in the x86 thread test 2020-04-15 19:26:12 +02:00
Martin Kroeker 579811fb6a
Move all 19.04-based jobs back to ubuntu 18.04 2020-04-15 17:38:33 +02:00
Martin Kroeker 84a9614345
try x86_64 test without openmp 2020-04-14 19:18:35 +02:00
Martin Kroeker b969533703
Add drone.io badge, mention EMAG8180 support, reformat the DYNAMIC_ARCH paragraph 2020-04-14 10:53:28 +02:00
Martin Kroeker 0f08f3efa6
Add a multithread test for x86_64 2020-04-13 22:46:12 +02:00
Martin Kroeker c861b2a7bd
Merge pull request #2553 from martin-frbg/issue2444
Add a read memory barrier to the traversal of the buffer slot list
2020-04-13 21:28:59 +02:00
Martin Kroeker cf62adffbb
Merge pull request #2555 from martin-frbg/issue1137
Handle unaligned data in the SSE2 copy kernel
2020-04-13 18:29:56 +02:00
Martin Kroeker 3eec7d382c
ARMV7 does not support DMB ISHLD, use DMB ISH 2020-04-13 15:56:31 +02:00
Martin Kroeker 5b0093b5fe
Convert aligned moves to unaligned
should have no performance impact on reasonably modern cpus and fixes occasional crashes in actual user code.
2020-04-13 14:58:52 +02:00
Martin Kroeker f41600e66f
Add a read barrier in the traversing of the buffer list
Needed on systems with weak memory ordering - the inferior, partially working fix from #2544 was already removed in #2551
2020-04-13 12:34:02 +02:00
Martin Kroeker f5efecb7ca
Add (empty) read barrier definition 2020-04-13 12:24:10 +02:00
Martin Kroeker a52bdd9d7b
Add (empty) read barrier definition 2020-04-13 12:22:35 +02:00
Martin Kroeker db3226a646
Add (empty) read barrier definition 2020-04-13 12:18:48 +02:00
Martin Kroeker 69b6e258d8
Add (empty) read barrier definition 2020-04-13 12:17:41 +02:00
Martin Kroeker 3d4db4d002
Add read barrier definition 2020-04-13 12:16:44 +02:00
Martin Kroeker 99dde1d2c9
Add read barrier definition 2020-04-13 12:14:58 +02:00
Martin Kroeker ee6b3df02c
Add read barrier definition 2020-04-13 12:14:06 +02:00
Martin Kroeker 25e879fe92
Add (empty) read barrier definition 2020-04-13 12:12:54 +02:00
Martin Kroeker d237dc1360
Add read barrier definition 2020-04-13 12:11:58 +02:00
Martin Kroeker 8692456226
Add read barrier definition 2020-04-13 12:10:37 +02:00
Martin Kroeker d1d69e1b9a
Add read barrier definition 2020-04-13 12:09:24 +02:00
Martin Kroeker 20d0cb2f65
Merge pull request #46 from xianyi/develop
rebase
2020-04-13 12:06:40 +02:00
Martin Kroeker e7f0da9295
Merge pull request #2551 from martin-frbg/issue2538-2
Increase BUFFER_SIZEs and add a safeguard; supply GEMM_R for POWER8/9
2020-04-12 22:34:41 +02:00
Martin Kroeker e9bfa2291a
Fix parameter overflow 2020-04-12 19:47:02 +02:00
Martin Kroeker 2a28448a96
Add safeguards for sufficient BUFFER_SIZE 2020-04-12 19:45:36 +02:00
Martin Kroeker a33d177430
Increase default BUFFER_SIZE on ARM, ZARCH and newer x86_64, add GEMM_R for POWER8/9
As shown in #2538, default buffersizes on some platforms were smaller than required in memory.c
and the requirement could never be fulfilled for a calculated GEMM_R on PPC given the fomula used
2020-04-12 19:44:48 +02:00
Martin Kroeker f73391c9c9
Merge pull request #45 from xianyi/develop
rebase
2020-04-12 19:39:05 +02:00
Martin Kroeker 7905383cb5
Merge pull request #2547 from sharvil/develop
Add API to set thread affinity on Linux.
2020-04-11 00:35:38 +02:00
Martin Kroeker a8cbd451bf
Merge pull request #2541 from bapt/develop
libname: treat FreeBSD and DragonFly like linux and sunos
2020-04-11 00:35:07 +02:00
Martin Kroeker eecd8c3204
Merge pull request #2548 from gxw-loongson/develop
Add a GENERIC target for 64bit MIPS
2020-04-11 00:34:04 +02:00
Martin Kroeker ea85eb2e02
Merge pull request #2549 from martin-frbg/fixthreadtest
Match thread count in cpp_thread_test to host capability
2020-04-10 23:54:40 +02:00
Martin Kroeker 66f89c0aaf
Match thread count to machine capability 2020-04-10 22:06:44 +02:00
gxw 8d07cf9b67 Fix compilation problem on loongson platform
Using "make TARGET=GENERIC" on loongson platform will get the following
error messages:
"make[1]: *** No rule to make target 'sgemm_incopy.o', needed by 'libs'"
Add kernel/mips64/KERNEL.generic to slove the problem.
2020-04-09 19:28:15 +08:00
Sharvil Nanavati 7b4773b24d Add API to set thread affinity on Linux.
Issue: #2545
2020-04-08 12:49:35 -07:00
Martin Kroeker 69f277f8ee
Add another memory barrier for ARM and a multicore test run on ThunderX to help detect such issues (#2544)
* Add another memory barrier in memory.c to prevent races in memory slot allocation

* Add an all-core test on Drone.io's ThunderX platform and modify dgemm_tester to use all 96 cores
2020-04-08 11:04:51 +02:00
Martin Kroeker 3a6d51c2fd
Merge pull request #44 from xianyi/develop
Add a Z13 build to the Travis configuration (#2542)
2020-04-04 22:48:53 +02:00
Martin Kroeker 1c7771df96
Merge pull request #43 from martin-frbg/revert-42-z12ci
Revert 42 z12ci to keep forked develop clean
2020-04-04 22:46:58 +02:00
Martin Kroeker a56c9ec52a Revert "Add IBM Z to Travis configuration (#42)"
This reverts commit 7972beb375.
2020-04-04 22:45:01 +02:00
Martin Kroeker 4ae6d1a01b
Add a Z13 build to the Travis configuration (#2542)
* Add IBM Z to Travis configuration
2020-04-03 16:02:11 +02:00
Martin Kroeker 7972beb375
Add IBM Z to Travis configuration (#42)
* Add IBM Z to Travis configuration
2020-04-03 15:59:18 +02:00