Commit Graph

7452 Commits

Author SHA1 Message Date
Martin Kroeker 2afb10975d
Merge pull request #2485 from Darkness303/develop
Add syr2 benchmark
2020-03-06 14:29:27 +01:00
Martin Kroeker dbef479227
Merge pull request #2469 from AGSaidi/acq-rel-2
Use acq/rel semantics to pass flags/pointers in getrf_parallel.
2020-03-06 14:28:58 +01:00
Ali Saidi 208c7e7ca5 Use acq/rel semantics to pass flags/pointers in getrf_parallel.
The current implementation has locks, but the locks each only
have a critical section of one variable so atomic reads/writes
with barriers can be used to achieve the same behavior.

Like the previous patch, pthread_mutex_lock isn't fair, so in a
tight loop the previous thread that has the lock can keep it
starving another thread, even if that thread is about to write
the data that will stop the current thread from spinning.

On a 64c Arm system this improves performance by 20x on sgesv.goto.
2020-03-06 06:22:31 +00:00
chenxuqiang 32c847df45 benchmark/hpmv&hbmv: add benchmark/hpmv.c and benchmark/hbmv.c
Signed-off-by: Xuqiang Chen chenxuqiang3@hisilicon.com
2020-03-06 01:02:02 -05:00
shengyang e0df9485d4 Add benchmark file rotm.c and modify benchmark/Makefile to test s/drotm
modified:   benchmark/Makefile
	new file:   benchmark/rotm.c
2020-03-05 10:05:59 +08:00
s00527847 0f1a2b12f9 add benchmark for spr/spr2 2020-03-04 15:50:19 -05:00
q00437336 233838b4bc change clock to CLOCK_PROCESS_CPUTIME_ID 2020-03-04 03:54:40 -05:00
l00546269 13f9afbd99 [OpenBLAS]:modifed the Makefile
[Description]:add c/fortran compiler version information in final note
2020-03-04 16:47:23 +08:00
q00437336 de74e11641 add benchmark for trsv 2020-03-04 03:23:22 -05:00
Martin Kroeker ad9e53154d
Merge pull request #2484 from RajalakshmiSR/power-dynamic
Fix DYNAMIC_ARCH build for POWER9
2020-03-04 08:06:06 +01:00
Martin Kroeker 6e70621b0d
Merge pull request #2483 from aaawuanjun/develop
Add benchmark file trmv.c and modify benchmark/Makefile to test s/d/c/ztrmv
2020-03-04 07:59:56 +01:00
Martin Kroeker e6edb7431f
Merge pull request #2466 from AGSaidi/acq-rel-1
Switch blas_server to use acq/rel semantics
2020-03-04 07:59:31 +01:00
Darkness303 114dbec947 1.Add syr2 benchmark
2.Fixed some errors
2020-03-04 14:09:10 +08:00
Martin Kroeker d68e4ba59b
Fix cut/paste glitch 2020-03-03 21:37:48 +01:00
Martin Kroeker 635c9e4e09
Restore initializers for mutex and conditional 2020-03-03 21:04:12 +01:00
Rajalakshmi Srinivasaraghavan 2afc074803 Fix DYNAMIC_ARCH build for POWER9
Setting DYNAMIC_ARCH=1 on POWER9 does not build POWER9 files due to some
compiler version checks.  This patch fixes some of the macros that are used
to check compiler version.  On fixing those checks, there are some new make
failures related to icamin, icamax, isamin, isamax and caxpy files on POWER9.
This patch fixes those failures as well.
2020-03-03 12:35:10 -06:00
wuanjun 00447568 5d6c688a7e Merge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop 2020-03-03 19:03:57 +08:00
wuanjun 00447568 87baf9cfe6 Merge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop 2020-03-03 19:03:28 +08:00
wuanjun 00447568 c0ca7d6258 Merge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop 2020-03-03 17:39:26 +08:00
wuanjun 00447568 f682d19ed4 [OpenBlas]: add benchmark file trmv.c and modify benchmark/Makefile to test s/d/c/ztrmv 2020-03-03 17:37:33 +08:00
wuanjun 00447568 790d50fbba [OpenBlas]: add benchmark file trmv.c and modify benchmark/Makefile to test s/d/c/ztrmv 2020-03-03 17:13:49 +08:00
Martin Kroeker 59243d49ab
Merge pull request #2479 from Darkness303/develop
Fix potential index overflows at large matrix sizes in the benchmark codes
2020-03-03 08:46:49 +01:00
Martin Kroeker d41f83e128
Merge pull request #2436 from marxin/improve-utest-coverage
Improve test coverage for utests.
2020-03-03 08:43:00 +01:00
Martin Kroeker ee4ca7ca6b
Merge pull request #2481 from ChinouneMehdi/fix2480
Fix #2480
2020-03-02 21:21:29 +01:00
Martin Kroeker e326c89ae8
Merge pull request #2478 from MacChen02/develop
Update benchmark statistical time function
2020-03-02 21:20:51 +01:00
مهدي شينون (Mehdi Chinoune) 21f6c4b5a9 fixes #2480 2020-03-02 17:22:28 +01:00
Martin Liska 7ca4ffdbdd
Improve test coverage for utests. 2020-03-02 13:38:17 +01:00
jianghesong 0f65c05cd1 fix core dumped error 2020-03-02 19:13:45 +08:00
MacChen02 917d243580
Update benchmark statistical time function
The function gettimeofday does not count the time,when testing the axpy small data volume use case.
Use the function clock_gettime to replace the gettimeofday function to count the time.
2020-03-02 14:36:27 +08:00
Ali Saidi 43c2e845ab Switch blas_server to use acq/rel semantics
Heavy-weight locking isn't required to pass the work queue
pointer between threads and simple atomic acquire/release
semantics can be used instead. This is especially important as
pthread_mutex_lock() isn't fair.

We've observed substantial variation in runtime because of the
the unfairness of these locks which complety goes away with
this implementation.

The locks themselves are left to provide a portable way for
idling threads to sleep/wakeup after many unsuccessful iterations
waiting.
2020-03-02 02:52:49 +00:00
Martin Kroeker 33f76a6c37
Version 0.3.9 2020-03-02 00:10:20 +01:00
Martin Kroeker 960dec234f
Version 0.3.9 2020-03-02 00:09:49 +01:00
Martin Kroeker 6b92979f35
Merge pull request #2476 from xianyi/develop
Update from develop in preparation for 0.3.9
2020-03-02 00:08:32 +01:00
Martin Kroeker 014fc13995
Merge pull request #2475 from martin-frbg/039changes
Update ChangeLog for 0.3.9
2020-03-02 00:04:26 +01:00
Martin Kroeker f1e05676a0
Merge pull request #2474 from martin-frbg/p9be
Use POWER8 kernels on big-endian POWER9 for now
2020-03-02 00:04:08 +01:00
Martin Kroeker d221c50f27
Add Ampere EMAG8180 2020-03-02 00:02:36 +01:00
Martin Kroeker f14013da7f
Update with 0.3.9 changes 2020-03-02 00:01:22 +01:00
Martin Kroeker 4f371b0fbf
Use POWER8 kernels on big-endian POWER9 for now 2020-03-01 23:45:58 +01:00
Martin Kroeker 02d60c1563
Merge pull request #35 from xianyi/develop
rebase
2020-03-01 23:44:10 +01:00
Martin Kroeker 2e6963259b
Merge pull request #2471 from AGSaidi/l3-fix-2
Fix barriers in level3_thread
2020-03-01 19:41:07 +01:00
Martin Kroeker e94590e400
Merge pull request #2468 from AGSaidi/wfe
Use wait-for-event to not spin in the blas_lock
2020-03-01 19:40:46 +01:00
Martin Kroeker 69d4687142
Merge pull request #2464 from Darkness303/develop
Add syr benchmark
2020-03-01 13:02:34 +01:00
Martin Kroeker a731a9bbb9
Merge pull request #2467 from AGSaidi/rpcc
Make rpcc() on arm64 get closer to what x86 returns
2020-02-29 22:43:02 +01:00
Martin Kroeker 1aa5907a2c
Merge pull request #2463 from martin-frbg/mingwfix
Apply MinGW AVX512 compilation fix to fortran options as well
2020-02-29 19:08:03 +01:00
Martin Kroeker ea8eec5d17
Merge pull request #2422 from wjc404/develop
Adjust SkylakeX GEMM3M parameters, add an AVX512 STRMM kernel and fix performance bugs in AVX2 s/c/z GEMM
2020-02-29 19:07:35 +01:00
Ali Saidi 97ce6bbce2 Fix barriers in level3_thread 2020-02-29 17:45:17 +00:00
Martin Kroeker a9aeb6745c
Merge pull request #2465 from AGSaidi/neoverse-n1
Add Neoverse-N1 core
2020-02-29 13:24:44 +01:00
Ali Saidi 0af9991cc9 Use wait-for-event to not spin in the blas_lock 2020-02-29 04:23:48 +00:00
Ali Saidi 19f3a4091c Make rpcc() on arm64 get closer to what x86 returns
The Arm implementation of rpcc() uses the architected timer
which is defined by the SBSA to be between 10-400MHz. These numbers
are much smaller than the cycle counter frequency used by x86. Make
the numbers closer by shifting the cycle counter up by the number of
leading zeros in the cntfrq_el0 register which gets us closer to a
noraml cpu clock cycle range.
2020-02-29 04:23:22 +00:00
Ali Saidi c623a965f9 Add Neoverse-N1 core
The implementation is a hybird of the ARMV8 one with some of the
improved TX2 rountines along with specifying -march=v8.2-a
2020-02-29 03:22:04 +00:00