Martin Kroeker
|
f20c4edc33
|
Merge pull request #3288 from martin-frbg/getrf-2
Add lower threshold for multithreading in ?GETRF
|
2021-07-07 20:45:57 +02:00 |
Martin Kroeker
|
3cfdb1770c
|
Remove code that disabled EXTRALIB on RISCV C910V
|
2021-07-06 20:21:07 +02:00 |
Martin Kroeker
|
8186963d8c
|
Add lower limit for multithreading
|
2021-07-04 17:00:26 +02:00 |
Martin Kroeker
|
a4543e4918
|
Handle OPENBLAS_LOOP
|
2021-07-04 16:59:43 +02:00 |
Martin Kroeker
|
2376aa1e8c
|
Merge pull request #3289 from martin-frbg/issue3283
Update README to mention availability of the Windows binaries in the Releases section
|
2021-07-02 00:19:06 +02:00 |
Martin Kroeker
|
4620f98812
|
Mention availability of the Windows binaries in the Releases section
|
2021-07-01 19:24:35 +02:00 |
Martin Kroeker
|
726c44242b
|
Add lower threshold for multithreading
|
2021-07-01 17:41:05 +02:00 |
Martin Kroeker
|
dcfc5cf714
|
Handle OPENBLAS_LOOPS for more stable results
|
2021-07-01 17:39:37 +02:00 |
Martin Kroeker
|
06e3b07ecb
|
Handle OPENBLAS_LOOPS and OPENBLAS_TEST options
|
2021-07-01 17:38:45 +02:00 |
Martin Kroeker
|
623be6600a
|
Merge pull request #3284 from martin-frbg/potrf_potri
Add lower thresholds for multithreading in POTRF/POTRI and improve the related benchmark
|
2021-06-30 07:42:45 +02:00 |
Martin Kroeker
|
7ddc9d384c
|
Merge pull request #3287 from martin-frbg/appveyor-conda
Work around current conda/tqdm auto-update problem on Appveyor
|
2021-06-29 20:09:26 +02:00 |
Martin Kroeker
|
6ebcce229f
|
Work around current conda/tqdm auto-update problem
|
2021-06-29 17:17:34 +02:00 |
Martin Kroeker
|
1b5620b66e
|
Add lower threshold for multithreading in ?potrf and ?potri
|
2021-06-26 23:47:41 +02:00 |
Martin Kroeker
|
1f8bda71b9
|
Add OPENBLAS_LOOPS support to potrf/potrs/potri benchmark
|
2021-06-26 23:46:00 +02:00 |
Martin Kroeker
|
3be660c000
|
Add interface declarations for ?potri
|
2021-06-26 23:44:56 +02:00 |
Martin Kroeker
|
1a8b6134c2
|
Merge pull request #3278 from brada4/A55
Add CORTEXA55 cpuid 0xd05 support
|
2021-06-23 13:05:17 +02:00 |
Martin Kroeker
|
f0b822a709
|
Update cpuid_arm64.c
|
2021-06-23 10:11:01 +02:00 |
User User-User
|
130327e9af
|
OK
|
2021-06-22 23:58:59 +02:00 |
User User-User
|
750719528a
|
bugz
|
2021-06-20 16:40:43 +02:00 |
User User-User
|
91e2b11d3c
|
add to cmake listings too
|
2021-06-20 15:32:42 +02:00 |
User User-User
|
548aa522e5
|
remove misplaced file
|
2021-06-20 15:29:25 +02:00 |
User User-User
|
6423b282a1
|
dynamic_arch
|
2021-06-20 14:19:41 +02:00 |
User User-User
|
9335d42740
|
add gcc8 version matching
|
2021-06-19 22:21:39 +02:00 |
User User-User
|
39ef0880ae
|
copy conf
|
2021-06-19 21:49:58 +02:00 |
User User-User
|
b7da75e4fd
|
WiP CORTEX A55 support
|
2021-06-19 21:37:51 +02:00 |
Martin Kroeker
|
a7627c5afd
|
Merge pull request #3276 from martin-frbg/issue3274
Add workaround for another macro name collision with Windows 10 SDK winnt.h
|
2021-06-16 16:37:30 +02:00 |
Martin Kroeker
|
9499ab0d45
|
Merge pull request #3275 from martin-frbg/lapack580
Fix missing EXTERNAL declarations in LAPACK TESTING (LAPACK PR 580)
|
2021-06-16 13:41:38 +02:00 |
Martin Kroeker
|
307c4c0786
|
Fix typo
|
2021-06-16 13:41:16 +02:00 |
Martin Kroeker
|
e83df93975
|
Work around another recent macro name collision with winnt.h
|
2021-06-16 12:32:34 +02:00 |
Martin Kroeker
|
13fa9f737d
|
Modify defines for CR and RC to work around name collision on Windows
|
2021-06-16 12:17:25 +02:00 |
Martin Kroeker
|
5958ffc9b6
|
Declare DZASUM as EXTERNAL
|
2021-06-16 09:43:39 +02:00 |
Martin Kroeker
|
cd0e4aadb1
|
Declare ZDROT as EXTERNAL
|
2021-06-16 09:41:18 +02:00 |
Martin Kroeker
|
e2621ef93a
|
Declare SROT as EXTERNAL
|
2021-06-16 09:40:15 +02:00 |
Martin Kroeker
|
9e1b43ea9b
|
Declare DROT as EXTERNAL
|
2021-06-16 09:39:28 +02:00 |
Martin Kroeker
|
5269348178
|
Declare CSROT as EXTERNAL
|
2021-06-16 09:35:12 +02:00 |
Martin Kroeker
|
92e024bbb3
|
Declare SCASUM as EXTERNAL
|
2021-06-16 09:33:23 +02:00 |
Martin Kroeker
|
c4b464cac6
|
Merge pull request #3273 from austinpagan/sbgemm_gcc10_fix
Power10: Fix for SBGEMM
|
2021-06-15 22:58:48 +02:00 |
Gordon Fossum
|
e6dd44d989
|
Power10: Fix for SBGEMM
While testing bfloat16 sbgemm kernel, there are some failures for odd value inputs due to updating result for additional bytes.
|
2021-06-15 13:07:47 -05:00 |
Martin Kroeker
|
baf03a0937
|
Merge pull request #3252 from martin-frbg/more_shortcuts
Further shortcuts for (small) cases that do not need buffer allocation
|
2021-06-15 16:14:20 +02:00 |
Martin Kroeker
|
7aab5e826c
|
Merge pull request #3250 from martin-frbg/gemv-shortcut
Add shortcut for small-size S/D GEMV_N with increments of one
|
2021-06-15 14:50:14 +02:00 |
Martin Kroeker
|
29417adf4c
|
Merge pull request #3270 from ggouaillardet/topic/dznrm2_tx2
arm64: add the missing d9 register to the clobber list
|
2021-06-14 13:00:33 +02:00 |
Gilles Gouaillardet
|
9d292d37b2
|
arm64: add the missing d9 register to the clobber list
Refs. numpy/numpy#18422
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
|
2021-06-14 17:01:28 +09:00 |
Martin Kroeker
|
2e8ff4a781
|
Merge pull request #3266 from martin-frbg/powerparam
Remove spurious casts from PPC parameters and fix compilation for older targets
|
2021-06-10 18:05:47 +02:00 |
Martin Kroeker
|
dbba381dc3
|
Merge pull request #3260 from intelmy/sgemv_t_opt
Optimized sgemv_t for small N based on AVX512
|
2021-06-10 16:08:24 +02:00 |
Martin Kroeker
|
f61991d439
|
Merge pull request #3264 from RajalakshmiSR/sbgemmp10
POWER10: Fixes for sbgemm kernel
|
2021-06-10 16:07:47 +02:00 |
Martin Kroeker
|
efdbdd8f82
|
Add prefetch values for power3
|
2021-06-10 11:20:29 +02:00 |
Martin Kroeker
|
3906ef3b0f
|
Add prefetch values for power3
|
2021-06-10 11:19:40 +02:00 |
Martin Kroeker
|
8adf0971d8
|
Add prefetch values for power3
|
2021-06-10 11:18:22 +02:00 |
Martin Kroeker
|
08e2e60762
|
Add prefetch values for power3
|
2021-06-10 11:17:33 +02:00 |
Martin Kroeker
|
fb9e678235
|
Fix caxpy/zaxpy for big-endian
|
2021-06-10 11:15:48 +02:00 |