Commit Graph

1390 Commits

Author SHA1 Message Date
Martin Kroeker
2271c3506b Work around excessive LAPACK test failures on Skylake-X
Something in the plain C parts of x86_64 cscal.c and zscal.c appears to be miscompiled by both gfortran9 and ifort when compiling for skylakex-avx512, even when the optimized Haswell microkernel is not in use.
2020-05-09 23:49:18 +02:00
Rajalakshmi Srinivasaraghavan
bd9ff820bc Fix cmake compilation issue - POWER9
This patch removes extra space in the sgemmotcopy filename
thereby allowing it to create entry in kernel/Makefile
created by cmake.
2020-05-08 20:31:56 -05:00
Ashwin Sekhar T K
8353cb245a ARM64: Improve DAXPY for ThunderX2
Improve performance of DAXPY for ThunderX2
when the vector fits in L1 Cache.
2020-05-07 09:22:50 -07:00
Martin Kroeker
90dba9f716 Duplicate earlier Clang 9.0.0 workaround for corresponding Apple Clang version
As discussed on the original PR #2329, the "Apple Clang 11.0.3" that appears to be based the same LLVM release produces the same miscompilation of this file.
2020-05-05 10:44:50 +02:00
Martin Kroeker
5dd14e3d48 Make building the bfloat16 functions conditional on option BUILD_HALF (#2590)
* make building the bfloat16 BLAS functions conditional on BUILD_HALF

* pass the BUILD_HALF option to gensymbol

* Pass BUILD_HALF as a compiler define for dynamic_arch builds
2020-05-01 09:58:30 +02:00
Martin Kroeker
06208c8d01 Limit this fix to ELFv2 builds 2020-04-22 14:16:40 +02:00
Martin Kroeker
f5c4c28b98 Work around POWER8BE bugs on FreeBSD (ELFv2)
for #2299
2020-04-21 17:17:17 +02:00
Martin Kroeker
fa42588e1f Merge pull request #2565 from martin-frbg/mips24k
Support MIPS32 24K family as P5600
2020-04-20 17:13:53 +02:00
Martin Kroeker
e55ec82bb9 Delete KERNEL.1004K 2020-04-19 15:44:30 +02:00
Martin Kroeker
7353ea5afc Delete KERNEL.24K 2020-04-19 15:44:19 +02:00
Martin Kroeker
6a04efb122 Rename KERNEL files to include MIPS prefix 2020-04-19 15:43:54 +02:00
Martin Kroeker
d712ea724c Add MIPS24K support 2020-04-18 21:10:18 +02:00
Rajalakshmi Srinivasaraghavan
22bb50fb81 cmake fixes 2020-04-17 13:35:17 -05:00
Rajalakshmi Srinivasaraghavan
67cc4b9e16 Fix warnings in clang and export symbol 2020-04-15 19:15:23 -05:00
Rajalakshmi Srinivasaraghavan
a87793e03c Fix DYNAMIC_ARCH compilation errors 2020-04-15 09:09:50 -05:00
Rajalakshmi Srinivasaraghavan
ff010f496e Build shgemm for all architecture 2020-04-14 20:38:53 -05:00
Rajalakshmi Srinivasaraghavan
7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.

Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.

This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
Martin Kroeker
5b0093b5fe Convert aligned moves to unaligned
should have no performance impact on reasonably modern cpus and fixes occasional crashes in actual user code.
2020-04-13 14:58:52 +02:00
Martin Kroeker
e9bfa2291a Fix parameter overflow 2020-04-12 19:47:02 +02:00
gxw
8d07cf9b67 Fix compilation problem on loongson platform
Using "make TARGET=GENERIC" on loongson platform will get the following
error messages:
"make[1]: *** No rule to make target 'sgemm_incopy.o', needed by 'libs'"
Add kernel/mips64/KERNEL.generic to slove the problem.
2020-04-09 19:28:15 +08:00
Martin Kroeker
806f89166e Make ARMV7 compile with xcode and add a CI job for it (#2537)
* Add an ARMV7 iOS build on Travis

* thread_local appears to be unavailable on ARMV7 iOS

* Add no-thumb option for ARMV7 IOS build to get it to accept DMB ISH

* Make local labels in macros of nrm2_vfpv3.S compatible with the xcode assembler
2020-04-02 10:30:37 +02:00
Martin Kroeker
c6af9bbb32 Merge pull request #2534 from martin-frbg/issue2496
Fix zero initialization for beta=0 case
2020-03-31 20:53:13 +02:00
Martin Kroeker
144be81ca1 fix initialization to zero in the NEON SGEMM_BETA kernel as well 2020-03-31 16:53:56 +02:00
Martin Kroeker
07cdd5d05c Fix zero initialization for beta=0 case
use immediate initialization instead of multiplication in case register content is a NaN
2020-03-31 00:21:02 +02:00
Martin Kroeker
567d2760e6 Merge pull request #2520 from wjc404/develop
Fix avx512 sgemm performance bug when ldc is a multiple of 1024
2020-03-30 20:15:59 +02:00
wjc404
b8307768e2 Add files via upload 2020-03-21 05:42:10 +08:00
Martin Kroeker
af8a619e1f Merge pull request #2517 from wjc404/develop
Temporary fix for SKX STRSM
2020-03-17 10:12:53 +01:00
wjc404
62b9608986 Update KERNEL.SKYLAKEX 2020-03-17 12:52:55 +08:00
Martin Kroeker
a1b181cea2 Merge pull request #2516 from wjc404/develop
AVX2 STRSM kernels
2020-03-16 21:58:34 +01:00
wjc404
cdc0e9011e Update KERNEL.ZEN 2020-03-16 16:39:37 +00:00
wjc404
fa049d49c2 AVX2 STRSM kernel 2020-03-17 00:34:08 +08:00
s00548429
bec7923a0d Fix the functional bugs for zamax. 2020-03-09 15:36:50 +08:00
Rajalakshmi Srinivasaraghavan
2afc074803 Fix DYNAMIC_ARCH build for POWER9
Setting DYNAMIC_ARCH=1 on POWER9 does not build POWER9 files due to some
compiler version checks.  This patch fixes some of the macros that are used
to check compiler version.  On fixing those checks, there are some new make
failures related to icamin, icamax, isamin, isamax and caxpy files on POWER9.
This patch fixes those failures as well.
2020-03-03 12:35:10 -06:00
Martin Kroeker
4f371b0fbf Use POWER8 kernels on big-endian POWER9 for now 2020-03-01 23:45:58 +01:00
Martin Kroeker
ea8eec5d17 Merge pull request #2422 from wjc404/develop
Adjust SkylakeX GEMM3M parameters, add an AVX512 STRMM kernel and fix performance bugs in AVX2 s/c/z GEMM
2020-02-29 19:07:35 +01:00
Ali Saidi
c623a965f9 Add Neoverse-N1 core
The implementation is a hybird of the ARMV8 one with some of the
improved TX2 rountines along with specifying -march=v8.2-a
2020-02-29 03:22:04 +00:00
wjc404
dd22eb7621 Update cgemm_kernel_8x2_haswell.c 2020-02-27 22:26:15 +08:00
wjc404
2352331e60 Update zgemm_kernel_4x2_haswell.c 2020-02-27 22:25:19 +08:00
wjc404
1b980001dd Update zgemm_kernel_4x2_haswell.c 2020-02-26 18:38:12 +08:00
wjc404
2515e1152f Update cgemm_kernel_8x2_haswell.c 2020-02-26 18:36:54 +08:00
Martin Kroeker
ddcbed6690 Merge pull request #2437 from martin-frbg/issue2434
[WIP] Add support for Ampere EMAG8180 ARMV8 cpu
2020-02-25 18:42:52 +01:00
wjc404
903854c168 Add files via upload 2020-02-22 23:40:02 +08:00
wjc404
a2ff577a30 Update KERNEL.ZEN 2020-02-22 23:39:43 +08:00
wjc404
97a32cb0a5 Update KERNEL.HASWELL 2020-02-22 23:39:20 +08:00
Martin Kroeker
07454bf4d5 Add proper defaults for IxMIN/IxMAX kernels
the fallbacks from Makefile.L1 assume a combined source for absolute value and non-absolute (with ifdef USE_ABS) but here we have separate implementations
2020-02-21 11:58:15 +01:00
Martin Kroeker
4046985913 Add proper defaults for IxMIN/IxMAX kernels
the fallbacks from Makefile.L1 assume a combined source for absolute value and non-absolute (with ifdef USE_ABS) but here we have separate implementations
2020-02-21 11:55:52 +01:00
Martin Kroeker
e57b11acca Add preliminary support for EMAG8180 2020-02-19 19:00:28 +01:00
Martin Kroeker
0b39cf95b0 Fix endianness conditionals 2020-02-19 18:09:54 +01:00
Martin Kroeker
9f39f0a2c3 Specify ismin/ismax assembly kernels for POWER8 directly
to fix utest failure in new ismin test -  Makefile.L1 defaults look wrong
2020-02-17 19:55:39 +01:00
Martin Liska
aeea14ee40 Come up with LOAD_AND_COMPARE_TO_MXX macro in iamax_sse.S. 2020-02-17 09:01:53 +01:00