Commit Graph

5577 Commits

Author SHA1 Message Date
Martin Kroeker
efa72a631b Merge pull request #17 from xianyi/develop
rebase
2021-03-14 17:20:49 +01:00
Martin Kroeker
30d835168a Merge pull request #3088 from xoviat/msvc
add misc fixes.
2021-03-14 17:14:28 +01:00
Martin Kroeker
8f6a744807 Merge pull request #3141 from martin-frbg/nagfor-2
Leave out ARM64 march/mtune options when compiling with nagfor
2021-03-13 23:04:53 +01:00
Martin Kroeker
6726771645 Support compilation with NAG fortran 2021-03-13 20:16:18 +01:00
Martin Kroeker
a51cae6b2e Merge pull request #3140 from martin-frbg/issue3139
Fix compilation on older x86_64 targets with old compilers that lack intrinsics support
2021-03-12 15:35:58 +01:00
Martin Kroeker
d30b943251 Merge pull request #3138 from martin-frbg/nagfor
Add support for compilation with the NAG Fortran compiler
2021-03-12 12:46:19 +01:00
Martin Kroeker
0934568d9c Move includes under the ifdef for compilers w/o intrinsics support 2021-03-12 12:42:05 +01:00
Martin Kroeker
697e64bbb6 Fix syntax 2021-03-11 23:03:58 +01:00
Martin Kroeker
bffb9b0e95 Merge pull request #3136 from austinpagan/Gemm.PQ
Modifying a couple parameters in the "POWER10"-specific section of pa…
2021-03-11 15:17:48 +01:00
Martin Kroeker
6ae7af78a3 Support compilation with nagfor 2021-03-11 11:53:51 +01:00
Martin Kroeker
041a26fd79 Support compilation with nagfor 2021-03-11 11:52:29 +01:00
Martin Kroeker
3c356b1a1f Support compilation with the NAG Fortran compiler 2021-03-11 11:51:09 +01:00
Martin Kroeker
b1215f2f8c Merge pull request #16 from xianyi/develop
rebase
2021-03-11 11:48:37 +01:00
Martin Kroeker
0b73041b16 Merge pull request #3137 from RajalakshmiSR/zscal_p10
Optimize zscal function for POWER10
2021-03-11 07:18:05 +01:00
austinpagan
9579bd47e5 Modifying a couple parameters in the "POWER10"-specific section of param.h, for performance enhancements for SGEMM and DGEMM. 2021-03-10 18:19:12 -05:00
Rajalakshmi Srinivasaraghavan
09d47af2c0 Optimize zscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-03-10 17:15:33 -06:00
Martin Kroeker
ef0238ba2b Merge pull request #3130 from martin-frbg/issue3128
Replace spurious AVX512 requirement in the Haswell srot microkernel with an AVX2/FMA3 guard
2021-03-06 19:15:53 +01:00
Martin Kroeker
a9f6f7ad39 Remove spurious AVX512 requirement and add AVX2/FMA3 guard 2021-03-06 14:35:49 +01:00
Martin Kroeker
1d254d321b Merge pull request #3129 from RajalakshmiSR/asum_p10
Optimize s/dasum function for POWER10
2021-03-06 09:13:59 +01:00
Rajalakshmi Srinivasaraghavan
41646ed006 Optimize s/dasum function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-03-05 16:22:36 -06:00
Martin Kroeker
3679781872 Merge pull request #3126 from martin-frbg/m1bench
Support timing Apple M1 in the benchmarks
2021-03-02 21:27:21 +01:00
Martin Kroeker
38dcf3454b Support timing Apple M1 2021-03-02 17:50:55 +01:00
Martin Kroeker
e34d57ca90 Merge pull request #3125 from martin-frbg/issue3123
Fix AMD AOCC compiler detection
2021-03-02 09:58:40 +01:00
Martin Kroeker
20f492c298 Fix AMD AOCC compiler detection 2021-03-01 21:00:10 +01:00
Martin Kroeker
c7c82be1c3 Merge pull request #3122 from martin-frbg/xeigtstz
Fix unusual stack size requirements of the LAPACK EIG tests (from Reference-LAPACK PR 335)
2021-02-28 22:13:09 +01:00
Martin Kroeker
9564f688c4 Adjust build rules for ?chkee.F 2021-02-28 18:57:05 +01:00
Martin Kroeker
90c1776c86 Adjust build rules for ?chkee.F 2021-02-28 18:53:20 +01:00
Martin Kroeker
9cf861e8fa Add rewritten cchkee.F from Reference-LAPACK PR335 2021-02-28 18:51:03 +01:00
Martin Kroeker
9b7b1da133 Add rewritten dchkee.F from Reference-LAPACK PR335 2021-02-28 18:50:26 +01:00
Martin Kroeker
a5ab891292 Add rewritten schkee.F from Reference-LAPACK PR335 2021-02-28 18:49:50 +01:00
Martin Kroeker
90bb4ac821 Add rewritten zchkee.F from Reference-LAPACK PR335 2021-02-28 18:49:10 +01:00
Martin Kroeker
23a0d1bc1f Delete zchkee.f 2021-02-28 18:47:06 +01:00
Martin Kroeker
0e96c378fd Delete schkee.f 2021-02-28 18:46:52 +01:00
Martin Kroeker
ee16efff3c Delete dchkee.f 2021-02-28 18:46:38 +01:00
Martin Kroeker
0197519dd7 Delete cchkee.f 2021-02-28 18:46:08 +01:00
Martin Kroeker
865829cfac Merge pull request #3121 from RajalakshmiSR/mmarename
POWER10: Rename mma builtins
2021-02-27 19:15:49 +01:00
Rajalakshmi Srinivasaraghavan
0571c3187b POWER10: Rename mma builtins
The LLVM and GCC teams agreed to rename the __builtin_mma_assemble_pair and
__builtin_mma_disassemble_pair built-ins to __builtin_vsx_assemble_pair and
__builtin_vsx_disassemble_pair respectively. This patch is to make
corresponding changes in dgemm kernel. Also made changes in
inputs to those builtins to avoid some potential typecasting issues.

Reference gcc commit id:77ef995c1fbcab76a2a69b9f4700bcfd005d8e62
2021-02-26 20:56:34 -06:00
Martin Kroeker
d12a2d0d04 Merge pull request #3120 from martin-frbg/3118-x
Fix use of undefined CC variable in f_check
2021-02-26 11:50:47 +01:00
Martin Kroeker
2d369bd916 fix undefined CC variable 2021-02-26 09:09:43 +01:00
Martin Kroeker
93843c55b6 Merge pull request #15 from xianyi/develop
rebase
2021-02-26 09:06:25 +01:00
Martin Kroeker
e3a6132e12 Merge pull request #3119 from xianyi/revert-3118-issue3018-2
Revert "Fix undefined CC in f_check (again)"
2021-02-26 04:18:33 +01:00
Martin Kroeker
736f0146c3 Revert "Fix undefined CC in f_check (again)" 2021-02-26 04:18:04 +01:00
Martin Kroeker
897fc2b6ef Merge pull request #3118 from martin-frbg/issue3018-2
Fix undefined CC in f_check (again)
2021-02-25 13:48:41 +01:00
Martin Kroeker
441c116105 fix undefined CC again 2021-02-25 13:47:34 +01:00
Martin Kroeker
8ecd80a34a Merge pull request #14 from xianyi/develop
rebase
2021-02-25 13:45:27 +01:00
Martin Kroeker
4ba53db0da Merge pull request #3117 from haampie/fix-perl
use /usr/bin/env perl
2021-02-24 18:39:28 +01:00
Martin Kroeker
6c365ff648 Merge pull request #3114 from martin-frbg/issue3113
Fix dll_callback and p_process_term signatures for USE_TLS on Windows x64
2021-02-24 18:38:25 +01:00
Martin Kroeker
e33bcdbb7b Merge pull request #3115 from martin-frbg/issue2532
Replace unoptimized OMATCOPY_RT with 4x4 blocked version
2021-02-24 18:37:36 +01:00
Harmen Stoppels
ec6b354c32 use /usr/bin/env perl 2021-02-24 14:07:20 +01:00
Martin Kroeker
292d1af1a0 Update omatcopy_rt.c 2021-02-24 09:34:14 +01:00