Martin Kroeker
ae2a0995cc
Cleanup IOS build and disable FORTRAN on 32bit and ios builds for now
...
Travis recently appears unable to find a matching homebrew package for 32bit gfortran,
and the IOS crossbuild suffered from excessive output due to the known problem with "ASMNAME redefined"
warnings when CFLAGS is set in the environment
2019-11-28 00:15:36 +01:00
Martin Kroeker
7887c45077
Merge pull request #17 from xianyi/develop
...
rebase
2019-11-17 19:09:49 +01:00
Martin Kroeker
3e67017ac8
Merge pull request #2309 from martin-frbg/ppc970-be
...
Fix PPC970 big-endian support
2019-11-17 18:22:24 +01:00
Martin Kroeker
b3ac6ee222
Define alternate kernels for big-endian PPC970
...
The altivec versions of SGEMM and CGEMM fail most tests in LAPACK-TESTING when compiled for big endian, STRSM/CTRSM even cause segfaults. The rot kernels either fail the corresponding utest or lead to failures in LAPACK-TESTING.
2019-11-17 15:19:39 +01:00
Martin Kroeker
6082e556cd
Use "generic" S/CGEMM unroll M on big-endian PPC970
...
as the respective PPC970 "altivec" kernels give wrong results when compiled for big endian
2019-11-17 15:10:26 +01:00
Martin Kroeker
92315173d5
Merge pull request #2308 from martin-frbg/ctestfix
...
Fix potential issue in the c/z blas3 ctests
2019-11-15 08:33:17 +01:00
Martin Kroeker
351d12b94e
Fix potential spurious failure from uninitialized variable
2019-11-15 00:20:36 +01:00
Martin Kroeker
bf73aa141b
Fix potential spurious failure from uninitialized variable
2019-11-15 00:19:24 +01:00
Martin Kroeker
71e96163db
Merge pull request #2305 from wjc404/develop
...
AVX512 CGEMM & ZGEMM kernels
2019-11-12 07:38:37 +01:00
wjc404
819e852ae7
AVX512 CGEMM & ZGEMM kernels
...
96-99% 1-thread performance of MKL2018
2019-11-11 20:04:52 +08:00
Martin Kroeker
4e466d739c
Merge pull request #15 from xianyi/develop
...
rebase
2019-11-09 18:52:08 +01:00
Martin Kroeker
4c6a457358
Merge pull request #2300 from wjc404/develop
...
Optimize SGEMM on SKYLAKEX CPUs
2019-11-06 07:27:33 +01:00
wjc404
836c414e22
optimizations of software prefetching
2019-11-05 13:36:56 +08:00
Martin Kroeker
d403eb3c2f
Merge pull request #2302 from martin-frbg/ppc970
...
Disable three-operand DCBT on PPC970 regardless of operating system
2019-11-04 22:55:05 +01:00
Martin Kroeker
3cd97f1a80
Merge pull request #2301 from martin-frbg/ppc8be
...
Disable IDAMIN/MAX and IZAMIN/MAX optimizations on big-endian POWER8
2019-11-04 22:54:28 +01:00
Martin Kroeker
9955f0996f
Merge pull request #2294 from martin-frbg/ios-cleanup
...
Remove obsolete workarounds for IOS on ARMV8
2019-11-04 22:53:58 +01:00
wjc404
430c11e135
Add files via upload
2019-11-04 20:10:12 +08:00
wjc404
fbacd2605d
optimizations via software prefetches
2019-11-04 19:37:19 +08:00
Martin Kroeker
6fa89b06a1
Use the two-operand form of DCBT on all PPC970 regardless of OS
...
There seems to be no advantage to the three-operand form used in the earliest GotoBLAS kernels, and it also causes compilation problems on platforms other than those previously special-cased
2019-11-03 22:55:31 +01:00
Martin Kroeker
68597002ea
The assembly microkernel is not safe to use on ELFv1
2019-11-03 22:42:46 +01:00
Martin Kroeker
d2a6285549
The assembly microkernel is not safe to use on ELFv1
2019-11-03 22:41:19 +01:00
Martin Kroeker
d999688d1a
The assembly microkernel is not safe to use on ELFv1
2019-11-03 22:39:06 +01:00
Martin Kroeker
928fe1b28e
The assembly microkernel is not safe to use on ELFv1
2019-11-03 22:37:27 +01:00
Martin Kroeker
ccc28c6d60
Merge pull request #13 from xianyi/develop
...
resync with upstream
2019-11-03 22:33:31 +01:00
wjc404
ae43b75a6a
Add files via upload
2019-11-02 10:09:19 +08:00
wjc404
54fc06fd70
Add files via upload
2019-11-02 10:06:13 +08:00
wjc404
1df9a2013d
new sgemm kernel for skylakex
2019-11-02 00:00:48 +08:00
wjc404
274ff5cdb8
update sgemm_q on skylakex cpus
2019-11-01 23:59:18 +08:00
Martin Kroeker
eb2eddf241
Merge pull request #2296 from kdunee/develop
...
Fixed a minor cmake problem, occurring when DYNAMIC_ARCH=ON and CMAKE_C_FLAGS was empty
2019-10-28 13:24:18 +01:00
k.dunikowski
8691825944
Fixed a minor cmake problem, occurring when DYNAMIC_CORE=ON and CMAKE_C_FLAGS was empty
2019-10-28 08:51:05 +01:00
Martin Kroeker
7dc8a76f60
Merge pull request #2293 from martin-frbg/pr2288
...
Add support for NetBSD by adding it to the existing xBSD conditionals
2019-10-25 23:46:39 +02:00
Martin Kroeker
df857551c0
Remove special parameter set for obsolete IOS/ARMV8 workaround
2019-10-25 23:07:00 +02:00
Martin Kroeker
85ccdce8c4
Remove the IOS fallbacks to generic C kernels
2019-10-25 23:02:37 +02:00
Martin Kroeker
aeabe0a83f
Fix regex to parse -R options with and without whitespace
...
Both forms are seen on NetBSD (#2288)
2019-10-25 22:52:30 +02:00
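The kind of fix described in aeabe0a83f (accepting `-R` options with and without whitespace before the path) can be illustrated with a minimal sketch. This is not the pattern used in f_check; the regular expression and the function name `extract_rpaths` are hypothetical and only show the general idea of allowing optional whitespace between the option and its argument.

```python
import re

# Hypothetical pattern (an assumption, not taken from f_check):
# "-R" at the start of a token, optional whitespace, then the path.
RPATH_OPTION = re.compile(r"(?:^|\s)-R\s*(\S+)")


def extract_rpaths(linker_flags):
    """Return every path passed via -R, in both "-R/path" and "-R /path" form."""
    return RPATH_OPTION.findall(linker_flags)


if __name__ == "__main__":
    flags = "-L/usr/lib -R/usr/pkg/lib -R /usr/local/lib -lm"
    print(extract_rpaths(flags))
```

The `\s*` between the option and the capture group is what lets a single pattern cover both spellings.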
Martin Kroeker
1b90989662
Add NetBSD to the xBSD conditionals
2019-10-25 12:52:49 +02:00
Martin Kroeker
e3e8b5cdca
Add NetBSD
2019-10-25 12:51:06 +02:00
Martin Kroeker
69b16a894d
Merge pull request #2292 from martin-frbg/g95fixes
...
Improve support for g95 and non-GNU ld
2019-10-25 10:35:17 +02:00
Martin Kroeker
6782e5767d
Merge pull request #2291 from martin-frbg/gensymbol
...
Fix netlib 3.7/3.8 function enumeration for linktest
2019-10-25 10:34:50 +02:00
Martin Kroeker
48f5a89f92
Merge pull request #2282 from martin-frbg/issue2281
...
Optimize RPCC function on ARM64
2019-10-25 09:56:30 +02:00
Martin Kroeker
4ae1610f37
Merge pull request #2290 from martin-frbg/cpuidfixes
...
Fixup x86 cpuid changes from #2283
2019-10-24 22:52:15 +02:00
Martin Kroeker
911c3e2f4b
Improve support for g95 and non-GNU ld
...
Auto-add "-fno-second-underscore" option to make LAPACKE compile (as it calls LAPACK functions that may have gotten a second underscore added otherwise). Also support -R for rpath when parsing compiler directives in f_check
2019-10-24 22:43:27 +02:00
Martin Kroeker
fab49e49e5
Move most lapack 3.7/3.8 additions to the embedded_underscores list
...
to allow linktest to pass with a compiler that adds a second underscore to such names
2019-10-24 21:26:20 +02:00
Martin Kroeker
b687fba5bc
Disable direct clock register access on IOS and Android
...
as I find conflicting information on accessibility from non-privileged processes
2019-10-24 21:18:17 +02:00
luzpaz
46a8c2519a
Remove prototype of unused, unimplemented function (#2274)
...
* Fix source typo
Found via `codespell -q 3 -L amin,als,ba,dum,mone,nd,nto,orign -S Changelog.txt,./lapack*`
* Remove beta-thread function per request
2019-10-24 18:56:53 +02:00
Martin Kroeker
e9437eebd2
Restore Goldmont ID and improve QEMU support
...
#2283 had inadvertently removed Goldmont+, and cpuid was reporting a mix of Core2 and Pentium2 for some QEMU configurations
2019-10-24 18:45:27 +02:00
Martin Kroeker
3a39062cfc
Merge pull request #12 from xianyi/develop
...
resync with upstream
2019-10-24 18:40:13 +02:00
Martin Kroeker
eaa0be1313
Merge pull request #2286 from wjc404/develop
...
AVX512 DGEMM kernel
2019-10-20 12:44:19 +02:00
wjc404
6ff013bae0
native support for icopy_4
...
90% MKL 1-thread performance.
2019-10-19 03:54:44 +08:00
wjc404
0d669e04bb
Update dgemm_kernel_8x8_skylakex.c
2019-10-18 15:00:17 +08:00
wjc404
17cdd9f9e1
some correction
2019-10-18 14:58:07 +08:00