Martin Kroeker
6e70621b0d
Merge pull request #2483 from aaawuanjun/develop
...
Add benchmark file trmv.c and modify benchmark/Makefile to test s/d/c/ztrmv
2020-03-04 07:59:56 +01:00
Martin Kroeker
e6edb7431f
Merge pull request #2466 from AGSaidi/acq-rel-1
...
Switch blas_server to use acq/rel semantics
2020-03-04 07:59:31 +01:00
Darkness303
114dbec947
1. Add syr2 benchmark
...
2. Fix some errors
2020-03-04 14:09:10 +08:00
Martin Kroeker
d68e4ba59b
Fix cut/paste glitch
2020-03-03 21:37:48 +01:00
Martin Kroeker
635c9e4e09
Restore initializers for mutex and conditional
2020-03-03 21:04:12 +01:00
Rajalakshmi Srinivasaraghavan
2afc074803
Fix DYNAMIC_ARCH build for POWER9
...
Setting DYNAMIC_ARCH=1 on POWER9 does not build POWER9 files due to some
compiler version checks. This patch fixes some of the macros that are used
to check compiler version. On fixing those checks, there are some new make
failures related to icamin, icamax, isamin, isamax and caxpy files on POWER9.
This patch fixes those failures as well.
2020-03-03 12:35:10 -06:00
wuanjun 00447568
5d6c688a7e
Merge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop
2020-03-03 19:03:57 +08:00
wuanjun 00447568
87baf9cfe6
Merge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop
2020-03-03 19:03:28 +08:00
wuanjun 00447568
c0ca7d6258
Merge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop
2020-03-03 17:39:26 +08:00
wuanjun 00447568
f682d19ed4
[OpenBlas]: add benchmark file trmv.c and modify benchmark/Makefile to test s/d/c/ztrmv
2020-03-03 17:37:33 +08:00
wuanjun 00447568
790d50fbba
[OpenBlas]: add benchmark file trmv.c and modify benchmark/Makefile to test s/d/c/ztrmv
2020-03-03 17:13:49 +08:00
Martin Kroeker
59243d49ab
Merge pull request #2479 from Darkness303/develop
...
Fix potential index overflows at large matrix sizes in the benchmark codes
2020-03-03 08:46:49 +01:00
Martin Kroeker
d41f83e128
Merge pull request #2436 from marxin/improve-utest-coverage
...
Improve test coverage for utests.
2020-03-03 08:43:00 +01:00
Martin Kroeker
ee4ca7ca6b
Merge pull request #2481 from ChinouneMehdi/fix2480
...
Fix #2480
2020-03-02 21:21:29 +01:00
Martin Kroeker
e326c89ae8
Merge pull request #2478 from MacChen02/develop
...
Update benchmark statistical time function
2020-03-02 21:20:51 +01:00
مهدي شينون (Mehdi Chinoune)
21f6c4b5a9
fixes #2480
2020-03-02 17:22:28 +01:00
Martin Liska
7ca4ffdbdd
Improve test coverage for utests.
2020-03-02 13:38:17 +01:00
jianghesong
0f65c05cd1
Fix core dump error
2020-03-02 19:13:45 +08:00
MacChen02
917d243580
Update benchmark statistical time function
...
The gettimeofday function fails to register the elapsed time when benchmarking axpy with small data volumes.
Use clock_gettime instead of gettimeofday to measure the time.
2020-03-02 14:36:27 +08:00
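The timing change described in the commit above can be sketched as follows. This is a minimal illustration, not the code from the OpenBLAS benchmark directory: the helper names `elapsed_seconds` and `time_call` are hypothetical, and the choice of `CLOCK_MONOTONIC` is an assumption (the commit message does not say which clock ID was used).

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: seconds elapsed between two timespec samples.
 * The nanosecond field gives far finer granularity than the
 * microsecond field of gettimeofday(). */
static double elapsed_seconds(const struct timespec *start,
                              const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) * 1e-9;
}

/* Hypothetical helper: time a single call of fn(arg).
 * CLOCK_MONOTONIC is an assumption made for this sketch. */
static double time_call(void (*fn)(void *), void *arg)
{
    struct timespec start, end;

    assert(fn != NULL);
    clock_gettime(CLOCK_MONOTONIC, &start);
    fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsed_seconds(&start, &end);
}
```

A benchmark loop would typically sum the values returned by `time_call` over many repetitions, since a single small axpy call may still finish within a few clock ticks.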
Ali Saidi
43c2e845ab
Switch blas_server to use acq/rel semantics
...
Heavy-weight locking isn't required to pass the work queue
pointer between threads and simple atomic acquire/release
semantics can be used instead. This is especially important as
pthread_mutex_lock() isn't fair.
We've observed substantial variation in runtime because of the
unfairness of these locks, which completely goes away with
this implementation.
The locks themselves are left to provide a portable way for
idling threads to sleep/wakeup after many unsuccessful iterations
waiting.
2020-03-02 02:52:49 +00:00
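The idea in the commit above, handing a work-queue pointer between threads with acquire/release atomics instead of a mutex, can be sketched with C11 `<stdatomic.h>`. This is an illustration under stated assumptions, not the blas_server code: the types `work_item` and `work_slot` and the functions `post_work` and `take_work` are hypothetical, and the sketch assumes one producer and one consumer per slot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical work descriptor; not OpenBLAS's own queue type. */
typedef struct work_item {
    int payload;
} work_item;

/* One slot per worker thread; NULL means no work has been posted. */
typedef struct work_slot {
    _Atomic(work_item *) queue;
} work_slot;

/* Producer side: the item must be fully initialized before this call.
 * The release store makes those earlier writes visible to any thread
 * that later observes the pointer with an acquire load. */
static void post_work(work_slot *slot, work_item *item)
{
    assert(item != NULL);
    atomic_store_explicit(&slot->queue, item, memory_order_release);
}

/* Consumer side: returns the posted item, or NULL if there is none.
 * Assumes a single consumer per slot, so the separate load and store
 * cannot race with another taker. */
static work_item *take_work(work_slot *slot)
{
    work_item *item =
        atomic_load_explicit(&slot->queue, memory_order_acquire);

    if (item != NULL)
        atomic_store_explicit(&slot->queue, NULL, memory_order_release);
    return item;
}
```

A worker would poll `take_work` for a bounded number of iterations and only then fall back to a mutex and condition variable to sleep, which matches the role the commit message leaves to the remaining locks.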
Martin Kroeker
33f76a6c37
Version 0.3.9
2020-03-02 00:10:20 +01:00
Martin Kroeker
960dec234f
Version 0.3.9
2020-03-02 00:09:49 +01:00
Martin Kroeker
6b92979f35
Merge pull request #2476 from xianyi/develop
...
Update from develop in preparation for 0.3.9
2020-03-02 00:08:32 +01:00
Martin Kroeker
014fc13995
Merge pull request #2475 from martin-frbg/039changes
...
Update ChangeLog for 0.3.9
2020-03-02 00:04:26 +01:00
Martin Kroeker
f1e05676a0
Merge pull request #2474 from martin-frbg/p9be
...
Use POWER8 kernels on big-endian POWER9 for now
2020-03-02 00:04:08 +01:00
Martin Kroeker
d221c50f27
Add Ampere EMAG8180
2020-03-02 00:02:36 +01:00
Martin Kroeker
f14013da7f
Update with 0.3.9 changes
2020-03-02 00:01:22 +01:00
Martin Kroeker
4f371b0fbf
Use POWER8 kernels on big-endian POWER9 for now
2020-03-01 23:45:58 +01:00
Martin Kroeker
02d60c1563
Merge pull request #35 from xianyi/develop
...
rebase
2020-03-01 23:44:10 +01:00
Martin Kroeker
2e6963259b
Merge pull request #2471 from AGSaidi/l3-fix-2
...
Fix barriers in level3_thread
2020-03-01 19:41:07 +01:00
Martin Kroeker
e94590e400
Merge pull request #2468 from AGSaidi/wfe
...
Use wait-for-event to not spin in the blas_lock
2020-03-01 19:40:46 +01:00
Martin Kroeker
69d4687142
Merge pull request #2464 from Darkness303/develop
...
Add syr benchmark
2020-03-01 13:02:34 +01:00
Martin Kroeker
a731a9bbb9
Merge pull request #2467 from AGSaidi/rpcc
...
Make rpcc() on arm64 get closer to what x86 returns
2020-02-29 22:43:02 +01:00
Martin Kroeker
1aa5907a2c
Merge pull request #2463 from martin-frbg/mingwfix
...
Apply MinGW AVX512 compilation fix to fortran options as well
2020-02-29 19:08:03 +01:00
Martin Kroeker
ea8eec5d17
Merge pull request #2422 from wjc404/develop
...
Adjust SkylakeX GEMM3M parameters, add an AVX512 STRMM kernel and fix performance bugs in AVX2 s/c/z GEMM
2020-02-29 19:07:35 +01:00
Ali Saidi
97ce6bbce2
Fix barriers in level3_thread
2020-02-29 17:45:17 +00:00
Martin Kroeker
a9aeb6745c
Merge pull request #2465 from AGSaidi/neoverse-n1
...
Add Neoverse-N1 core
2020-02-29 13:24:44 +01:00
Ali Saidi
0af9991cc9
Use wait-for-event to not spin in the blas_lock
2020-02-29 04:23:48 +00:00
Ali Saidi
19f3a4091c
Make rpcc() on arm64 get closer to what x86 returns
...
The Arm implementation of rpcc() uses the architected timer,
which is defined by the SBSA to run at between 10 and 400 MHz. These numbers
are much smaller than the cycle counter frequency used by x86. Make
the numbers closer by shifting the cycle counter up by the number of
leading zeros in the cntfrq_el0 register, which gets us closer to a
normal CPU clock cycle range.
2020-02-29 04:23:22 +00:00
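The scaling described in the commit above can be sketched in portable C. This is an illustration only: the real rpcc() reads the timer registers directly, whereas here the counter value and the frequency are passed in as plain arguments. The function names `leading_zeros32` and `scale_counter` are hypothetical, and treating the frequency as a 32-bit value is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Count the leading zero bits of a 32-bit value (32 for an input of 0). */
static unsigned leading_zeros32(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Scale a raw timer count so that its apparent tick rate lands in the
 * range of a typical CPU clock. freq_hz stands in for the value of
 * cntfrq_el0; the shift is the number of leading zeros in that value. */
static uint64_t scale_counter(uint64_t counter, uint32_t freq_hz)
{
    assert(freq_hz != 0); /* a zero frequency would mean an unprogrammed timer */
    return counter << leading_zeros32(freq_hz);
}
```

For a 100 MHz timer the frequency value has 5 leading zeros, so each tick is multiplied by 32 and the apparent rate becomes 3.2 GHz. A 25 MHz timer has 7 leading zeros and also ends up at an apparent 3.2 GHz.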
Ali Saidi
c623a965f9
Add Neoverse-N1 core
...
The implementation is a hybrid of the ARMV8 one with some of the
improved TX2 routines along with specifying -march=v8.2-a
2020-02-29 03:22:04 +00:00
j00520245
e1062400c4
Add new syr benchmark
2020-02-28 16:36:53 +08:00
Martin Kroeker
a66f4d80c8
Apply MinGW AVX512 compilation fix to fortran options as well
...
The original issue was #1708; I see now that the same problem affects gfortran compilation. The underlying issue is said to be fixed (but not yet released) on all branches of gcc as of a few days ago, but it will certainly take time to reach mingw/msys.
2020-02-27 23:09:40 +01:00
wjc404
dd22eb7621
Update cgemm_kernel_8x2_haswell.c
2020-02-27 22:26:15 +08:00
wjc404
2352331e60
Update zgemm_kernel_4x2_haswell.c
2020-02-27 22:25:19 +08:00
Martin Kroeker
430ee31e66
Merge pull request #2447 from martin-frbg/issue2446
...
Always select ARMV8 parameters for big servers when cpu is TSV110 or EMAG8180
2020-02-27 15:07:02 +01:00
Xianyi Zhang
265ab484c8
Change default RISC-V 64-bit corename to RISCV64_GENERIC
...
e.g. make CC=riscv64-unknown-linux-gnu-gcc FC=riscv64-unknown-linux-gnu-gfortran TARGET=RISCV64_GENERIC HOSTCC=gcc
2020-02-27 14:46:15 +08:00
Xianyi Zhang
44020a42a4
Fixed compile bug for RV64.
2020-02-27 14:29:42 +08:00
Xianyi Zhang
4aa2d89217
Merge branch 'develop' into risc-v
2020-02-27 13:53:49 +08:00
Martin Kroeker
8164fd1328
Always assume server-class cpu count for TSV110 and EMAG8180
2020-02-26 22:19:57 +01:00
Martin Kroeker
531c6b96d6
Merge pull request #34 from xianyi/develop
...
rebase
2020-02-26 22:16:28 +01:00