Rajalakshmi Srinivasaraghavan
8efba9b7c0
Improve shgemm test
...
This patch adds another check to test shgemm results.
2020-05-11 17:15:10 -05:00
Martin Kroeker
4fffa556d8
Merge pull request #2611 from RajalakshmiSR/bench_half
...
Include shgemm in benchtest
2020-05-11 21:08:41 +02:00
Rajalakshmi Srinivasaraghavan
ce90e2bd3f
Include shgemm in benchtest
...
This patch enables benchtest for half-precision gemm
when BUILD_HALF is set during make.
2020-05-11 09:57:46 -05:00
Martin Kroeker
948b6712ba
Merge pull request #2610 from martin-frbg/issue2552-3
...
Temporary workaround for excessive LAPACK test failures with COMPLEX on Skylake-X
2020-05-10 13:10:31 +02:00
Martin Kroeker
2271c3506b
Work around excessive LAPACK test failures on Skylake-X
...
Something in the plain C parts of x86_64 cscal.c and zscal.c appears to be miscompiled by both gfortran9 and ifort when compiling for skylakex-avx512, even when the optimized Haswell microkernel is not in use.
2020-05-09 23:49:18 +02:00
Martin Kroeker
db00b21445
Merge pull request #2609 from martin-frbg/issue2552-2
...
Correct ifort options
2020-05-09 21:33:02 +02:00
Martin Kroeker
58d26b4448
Correct ifort options
...
to the same as suggested by reference-lapack
2020-05-09 17:15:36 +02:00
Martin Kroeker
8e47d14053
Merge pull request #2608 from martin-frbg/issue2604
...
Handle trailing whitespace and empty variables in KERNEL files
2020-05-09 16:36:14 +02:00
Martin Kroeker
cd10b35fe9
Handle trailing spaces and empty condition variables
2020-05-09 13:42:33 +02:00
Martin Kroeker
9472dd99cd
Merge pull request #57 from xianyi/develop
...
rebase
2020-05-09 13:20:44 +02:00
Martin Kroeker
7181665452
Merge pull request #2605 from RajalakshmiSR/cmake-power
...
Fix cmake compilation issue - POWER9
2020-05-09 11:29:28 +02:00
Rajalakshmi Srinivasaraghavan
bd9ff820bc
Fix cmake compilation issue - POWER9
...
This patch removes an extra space in the sgemmotcopy filename,
thereby allowing an entry for it to be created in the kernel/Makefile
generated by cmake.
2020-05-08 20:31:56 -05:00
Martin Kroeker
63e45def70
Merge pull request #2603 from martin-frbg/issue2552
...
Add FFLAGS_DRV entry to the generated make.inc to fix lapack-test failure with Intel compilers
2020-05-08 22:08:39 +02:00
Martin Kroeker
ec0f228632
Add FFLAGS_DRV to the generated make.inc to fix lapack-test on x86_64 with icc/ifort
...
fixes #2552
2020-05-08 18:06:12 +02:00
Martin Kroeker
90e2941c61
Merge pull request #56 from xianyi/develop
...
rebase
2020-05-07 22:43:48 +02:00
Martin Kroeker
10d5f3c87b
Merge pull request #2602 from ashwinyes/thunderx2_develop
...
DAXPY Optimizations for ThunderX2
2020-05-07 22:06:41 +02:00
Ashwin Sekhar T K
8353cb245a
ARM64: Improve DAXPY for ThunderX2
...
Improve performance of DAXPY for ThunderX2
when the vector fits in L1 Cache.
2020-05-07 09:22:50 -07:00
Martin Kroeker
ec2dd7b875
Merge pull request #2601 from martin-frbg/issue818
...
Undefine NAME/CNAME etc in Makefile.system before defining them
2020-05-07 10:12:33 +02:00
Martin Kroeker
4e82eb9f8a
Undefine ASMNAME/NAME/CNAME before defining them
...
to avoid redefinition warnings when environment variables like CFLAGS are being used (fixes #818)
2020-05-07 00:31:32 +02:00
Martin Kroeker
61300bb735
Merge pull request #55 from xianyi/develop
...
rebase
2020-05-07 00:27:14 +02:00
Martin Kroeker
33e9b12464
Merge pull request #2597 from martin-frbg/appleclang
...
Use Clang 9.0.0 miscompilation fix for corresponding AppleClang version as well
2020-05-05 13:55:08 +02:00
Martin Kroeker
90dba9f716
Duplicate earlier Clang 9.0.0 workaround for corresponding Apple Clang version
...
As discussed on the original PR #2329, the "Apple Clang 11.0.3" that appears to be based on the same LLVM release produces the same miscompilation of this file.
2020-05-05 10:44:50 +02:00
Martin Kroeker
424d551e01
Merge pull request #53 from xianyi/develop
...
rebase
2020-05-01 15:18:46 +02:00
Martin Kroeker
596f5df9e8
Merge pull request #2591 from RajalakshmiSR/testhalf
...
Add test for shgemm
2020-05-01 09:59:39 +02:00
Martin Kroeker
5dd14e3d48
Make building the bfloat16 functions conditional on option BUILD_HALF (#2590)
...
* Make building the bfloat16 BLAS functions conditional on BUILD_HALF
* Pass the BUILD_HALF option to gensymbol
* Pass BUILD_HALF as a compiler define for dynamic_arch builds
2020-05-01 09:58:30 +02:00
Martin Kroeker
a54e35e780
Merge pull request #2586 from martin-frbg/miscfixes
...
Trivial fix for compiler warnings
2020-04-29 22:01:41 +02:00
Rajalakshmi Srinivasaraghavan
564b0d39ef
Add test for shgemm
...
This patch has Makefile changes to add a test for shgemm which
compares sgemm and shgemm results.
2020-04-29 13:40:34 -05:00
Martin Kroeker
5d58b11101
Merge pull request #52 from xianyi/develop
...
rebase
2020-04-29 14:36:15 +02:00
Martin Kroeker
d394d4e677
Merge pull request #2585 from martin-frbg/mips64fix
...
Increase default BUFFER_SIZE on MIPS64
2020-04-28 19:47:55 +02:00
Martin Kroeker
f4248af26e
Fix compiler warnings
2020-04-28 10:43:12 +02:00
Martin Kroeker
2d89603e9d
Increase BUFFER_SIZE on mips64 to match SGEMM parameters
2020-04-28 10:40:40 +02:00
Martin Kroeker
26bc15258a
Merge pull request #51 from xianyi/develop
...
rebase
2020-04-28 10:38:50 +02:00
Martin Kroeker
141998dce2
Merge pull request #2584 from martin-frbg/issue2583
...
[WIP] Have CMAKE parse conditional lines in KERNEL files
2020-04-28 10:35:12 +02:00
Martin Kroeker
3bd56846bb
Silence a debug message
2020-04-27 16:27:09 +02:00
Martin Kroeker
e7bbdfdf84
Have CMAKE parse conditional lines in KERNEL files
...
Supports ifeq and ifneq, but requires both to have an else branch
2020-04-27 15:20:03 +02:00
Martin Kroeker
b6795db731
Merge pull request #2582 from martin-frbg/mips32fix
...
Increase BUFFER_SIZE on MIPS32 to accommodate SGEMM requirements
2020-04-27 09:18:34 +02:00
Martin Kroeker
5e0dbf8dfe
Increase default BUFFER_SIZE to accommodate SGEMM parameters
...
in response to compile-time warning from #2551
2020-04-26 22:21:05 +02:00
Martin Kroeker
955d73127f
Merge pull request #50 from xianyi/develop
...
rebase
2020-04-26 22:17:56 +02:00
Martin Kroeker
a8c1bea7ae
Merge pull request #2581 from martin-frbg/raji
...
Fix travis configuration and update CONTRIBUTORS.md
2020-04-25 19:57:10 +02:00
Martin Kroeker
e43b49e064
Drop the set -e from travis scripts
2020-04-25 16:18:54 +02:00
Martin Kroeker
3e28db7f38
Update CONTRIBUTORS.md
2020-04-25 13:51:44 +02:00
Martin Kroeker
4b69ee31af
Merge pull request #2580 from martin-frbg/issue2538-3
...
Increase POWER8 ZGEMM_R and use same R values for POWER9
2020-04-25 00:28:18 +02:00
Martin Kroeker
03ff213c51
Increase POWER8 ZGEMM_R and use same R values for POWER9
...
fixes lapack-test zger failures seen in #2299 after application of my PR #2551
2020-04-24 21:46:54 +02:00
Martin Kroeker
299d1c8de0
Merge pull request #2578 from martin-frbg/issue2576
...
Quote getarch include paths in prebuild.cmake
2020-04-24 14:32:46 +02:00
Martin Kroeker
70869d571f
Quote include paths for getarch to protect any embedded spaces
2020-04-24 10:30:44 +02:00
Martin Kroeker
cba87222b2
Merge pull request #49 from xianyi/develop
...
rebase
2020-04-24 10:21:48 +02:00
Martin Kroeker
f80dd2151e
xcode 11.4.1 for homebrew ?
2020-04-23 14:31:09 +02:00
Martin Kroeker
4412ee1754
Switch homebrew build env to new xcode 11.4
...
the default 11.3.1 in the github image is causing brew to fail with an "outdated xcode" message
2020-04-23 10:54:46 +02:00
Martin Kroeker
f6104b68c1
Merge pull request #2571 from martin-frbg/issue2299
...
Work around IDAMAX/IZAMAX bugs on POWER8BE with ELFv2 FreeBSD
2020-04-22 18:27:13 +02:00
Martin Kroeker
84f2c71e93
Merge pull request #2573 from martin-frbg/issue2572
...
Enable cblas interfaces to GEMM3M in CMAKE builds
2020-04-22 15:04:49 +02:00