Commit Graph

4256 Commits

Author SHA1 Message Date
Ali Saidi
208c7e7ca5 Use acq/rel semantics to pass flags/pointers in getrf_parallel.
The current implementation has locks, but the locks each only
have a critical section of one variable so atomic reads/writes
with barriers can be used to achieve the same behavior.

Like the previous patch, pthread_mutex_lock isn't fair, so in a
tight loop the previous thread that has the lock can keep it
starving another thread, even if that thread is about to write
the data that will stop the current thread from spinning.

On a 64c Arm system this improves performance by 20x on sgesv.goto.
2020-03-06 06:22:31 +00:00
Martin Kroeker
014fc13995 Merge pull request #2475 from martin-frbg/039changes
Update ChangeLog for 0.3.9
2020-03-02 00:04:26 +01:00
Martin Kroeker
f1e05676a0 Merge pull request #2474 from martin-frbg/p9be
Use POWER8 kernels on big-endian POWER9 for now
2020-03-02 00:04:08 +01:00
Martin Kroeker
d221c50f27 Add Ampere EMAG8180 2020-03-02 00:02:36 +01:00
Martin Kroeker
f14013da7f Update with 0.3.9 changes 2020-03-02 00:01:22 +01:00
Martin Kroeker
4f371b0fbf Use POWER8 kernels on big-endian POWER9 for now 2020-03-01 23:45:58 +01:00
Martin Kroeker
02d60c1563 Merge pull request #35 from xianyi/develop
rebase
2020-03-01 23:44:10 +01:00
Martin Kroeker
2e6963259b Merge pull request #2471 from AGSaidi/l3-fix-2
Fix barriers in level3_thread
2020-03-01 19:41:07 +01:00
Martin Kroeker
e94590e400 Merge pull request #2468 from AGSaidi/wfe
Use wait-for-event to not spin in the blas_lock
2020-03-01 19:40:46 +01:00
Martin Kroeker
69d4687142 Merge pull request #2464 from Darkness303/develop
Add syr benchmark
2020-03-01 13:02:34 +01:00
Martin Kroeker
a731a9bbb9 Merge pull request #2467 from AGSaidi/rpcc
Make rpcc() on arm64 get closer to what x86 returns
2020-02-29 22:43:02 +01:00
Martin Kroeker
1aa5907a2c Merge pull request #2463 from martin-frbg/mingwfix
Apply MinGW AVX512 compilation fix to fortran options as well
2020-02-29 19:08:03 +01:00
Martin Kroeker
ea8eec5d17 Merge pull request #2422 from wjc404/develop
Adjust SkylakeX GEMM3M parameters, add an AVX512 STRMM kernel and fix performance bugs in AVX2 s/c/z GEMM
2020-02-29 19:07:35 +01:00
Ali Saidi
97ce6bbce2 Fix barriers in level3_thread 2020-02-29 17:45:17 +00:00
Martin Kroeker
a9aeb6745c Merge pull request #2465 from AGSaidi/neoverse-n1
Add Neoverse-N1 core
2020-02-29 13:24:44 +01:00
Ali Saidi
0af9991cc9 Use wait-for-event to not spin in the blas_lock 2020-02-29 04:23:48 +00:00
Ali Saidi
19f3a4091c Make rpcc() on arm64 get closer to what x86 returns
The Arm implementation of rpcc() uses the architected timer
which is defined by the SBSA to be between 10-400MHz. These numbers
are much smaller than the cycle counter frequency used by x86. Make
the numbers closer by shifting the cycle counter up by the number of
leading zeros in the cntfrq_el0 register which gets us closer to a
noraml cpu clock cycle range.
2020-02-29 04:23:22 +00:00
Ali Saidi
c623a965f9 Add Neoverse-N1 core
The implementation is a hybird of the ARMV8 one with some of the
improved TX2 rountines along with specifying -march=v8.2-a
2020-02-29 03:22:04 +00:00
j00520245
e1062400c4 New add syr benchmark 2020-02-28 16:36:53 +08:00
Martin Kroeker
a66f4d80c8 Apply MinGW AVX512 compilation fix to fortran options as well
original issue was #1708, I see now that the same problem affects gfortran compilation. The underlying issue is said to be fixed (but not yet released) on all branches of gcc as of a few days ago but it will certainly take time to reach mingw/msys.
2020-02-27 23:09:40 +01:00
wjc404
dd22eb7621 Update cgemm_kernel_8x2_haswell.c 2020-02-27 22:26:15 +08:00
wjc404
2352331e60 Update zgemm_kernel_4x2_haswell.c 2020-02-27 22:25:19 +08:00
Martin Kroeker
430ee31e66 Merge pull request #2447 from martin-frbg/issue2446
Always select ARMV8 parameters for big servers when cpu is TSV110 or EMAG8180
2020-02-27 15:07:02 +01:00
Martin Kroeker
8164fd1328 Always assume server-class cpu count for TSV110 and EMAG8180 2020-02-26 22:19:57 +01:00
Martin Kroeker
531c6b96d6 Merge pull request #34 from xianyi/develop
rebase
2020-02-26 22:16:28 +01:00
wjc404
1b980001dd Update zgemm_kernel_4x2_haswell.c 2020-02-26 18:38:12 +08:00
wjc404
2515e1152f Update cgemm_kernel_8x2_haswell.c 2020-02-26 18:36:54 +08:00
Martin Kroeker
ddcbed6690 Merge pull request #2437 from martin-frbg/issue2434
[WIP] Add support for Ampere EMAG8180 ARMV8 cpu
2020-02-25 18:42:52 +01:00
Martin Kroeker
f8ec538c82 Add Ampere EMAG8180 2020-02-25 14:30:00 +01:00
Martin Kroeker
ca4f7dceff Add parameters for EMAG8180 DYNAMIC_ARCH support with cmake 2020-02-24 20:23:18 +01:00
Martin Kroeker
1ddf9f1067 Add EMAG8180 to arm64 DYNAMIC_ARCH list for cmake 2020-02-24 20:16:18 +01:00
Martin Kroeker
4c5fac5a2b Typo fix 2020-02-24 20:15:04 +01:00
Martin Kroeker
320e2648cd Add EMAG8180 to DYNAMIC_CORE list for ARM64 2020-02-24 19:23:46 +01:00
Martin Kroeker
9b732696c6 Add DYNAMIC_ARCH support for ARMV8 EMAG8180 2020-02-24 19:20:00 +01:00
Martin Kroeker
c9dcb3d4a4 Merge pull request #2443 from aaawuanjun/develop
[OpenBlas]:benchmark/copy.c has time,x,y data loop problems
2020-02-24 13:14:51 +01:00
Martin Kroeker
3bb7f0138e Merge pull request #2442 from martin-frbg/lapackpr390
Apply fix from Reference-LAPACK PR 390
2020-02-24 12:27:01 +01:00
wuanjun 00447568
c93ae92579 [OpenBlas]:benchmark/copy.c has time,x,y data loop problems 2020-02-24 11:23:39 +08:00
Martin Kroeker
87ac1ceb0b Apply fix from Reference-LAPACK PR390, NaN not propagating 2020-02-23 22:40:40 +01:00
Martin Kroeker
9e40c080f2 Apply fix from Reference-LAPACK PR390, NaN not propagating 2020-02-23 22:39:01 +01:00
wjc404
903854c168 Add files via upload 2020-02-22 23:40:02 +08:00
wjc404
a2ff577a30 Update KERNEL.ZEN 2020-02-22 23:39:43 +08:00
wjc404
97a32cb0a5 Update KERNEL.HASWELL 2020-02-22 23:39:20 +08:00
wjc404
f1746e7284 Delete sgemm_kernel_8x4_haswell_2.c 2020-02-22 23:38:48 +08:00
wjc404
f6fcbd7906 Fix performance bug when LDC is a multiple of 1024 2020-02-22 23:37:45 +08:00
Martin Kroeker
1e8410f18c Merge pull request #2441 from martin-frbg/ismin2
Add proper defaults for the IxMIN/IxMAX kernels on mips64 and power
2020-02-22 11:21:03 +01:00
Martin Kroeker
07454bf4d5 Add proper defaults for IxMIN/IxMAX kernels
the fallbacks from Makefile.L1 assume a combined source for absolute value and non-absolute (with ifdef USE_ABS) but here we have separate implementations
2020-02-21 11:58:15 +01:00
Martin Kroeker
4046985913 Add proper defaults for IxMIN/IxMAX kernels
the fallbacks from Makefile.L1 assume a combined source for absolute value and non-absolute (with ifdef USE_ABS) but here we have separate implementations
2020-02-21 11:55:52 +01:00
Martin Kroeker
75577f95a7 Merge pull request #33 from xianyi/develop
rebase
2020-02-21 09:56:05 +01:00
Martin Kroeker
33d92c7a37 Merge pull request #2435 from martin-frbg/issue2433
Fix handling of ppc endianness
2020-02-21 00:01:58 +01:00
Martin Kroeker
e57b11acca Add preliminary support for EMAG8180 2020-02-19 19:00:28 +01:00