Martin Kroeker
4250e6ed64
Merge pull request #2191 from tylerjereddy/conditional_updates
...
MAINT: remove legacy CMake endif()
2019-07-23 16:20:39 +02:00
Martin Kroeker
7b0b7c11d2
Merge pull request #2190 from martin-frbg/zdot-zen
...
Replace vpermpd with vpermilpd in the Haswell/Zen zdot microkernel
2019-07-23 16:15:08 +02:00
Martin Kroeker
d14cf1ccf4
Merge pull request #2189 from wjc404/develop
...
Update dgemm_kernel_4x8_haswell.S for reducing cache misses
2019-07-23 08:32:56 +02:00
Tyler Reddy
3f6ab1582a
MAINT: remove legacy CMake endif()
...
* clean up a case where CMake endif()
contained the conditional used in the
if(), which is no longer needed /
discouraged since our minimum required
CMake version supports the modern syntax
2019-07-22 21:24:57 -06:00
Martin Kroeker
28e96458e5
Replace vpermpd with vpermilpd
...
to improve performance on Zen/Zen2 (as demonstrated by wjc404 in #2180)
2019-07-22 08:28:16 +02:00
wjc404
95fb98f556
Update dgemm_kernel_4x8_haswell.S
2019-07-21 01:10:32 +08:00
wjc404
4801c6d36b
Update dgemm_kernel_4x8_haswell.S
2019-07-21 00:47:45 +08:00
wjc404
9440fa607d
Add files via upload
2019-07-20 22:08:22 +08:00
wjc404
94db259e5b
Add files via upload
2019-07-20 22:04:41 +08:00
wjc404
f49f8047ac
Add files via upload
2019-07-20 14:33:37 +08:00
wjc404
825777faab
Update dgemm_kernel_4x8_haswell.S
2019-07-19 23:58:24 +08:00
wjc404
9c89757562
Add files via upload
2019-07-19 23:47:58 +08:00
Martin Kroeker
b0b7600bef
Merge pull request #2186 from wjc404/develop
...
Update "dgemm_kernel_4x8_haswell.S" for improving performance on zen2 chips
2019-07-18 16:04:44 +02:00
wjc404
9b04baeaee
Update dgemm_kernel_4x8_haswell.S
2019-07-17 23:50:03 +08:00
wjc404
8a074b3965
Update dgemm_kernel_4x8_haswell.S
2019-07-17 23:47:30 +08:00
wjc404
211ab03b14
Update dgemm_kernel_4x8_haswell.S
2019-07-17 22:39:15 +08:00
wjc404
1733f927e6
Update dgemm_kernel_4x8_haswell.S
2019-07-17 21:27:41 +08:00
wjc404
182b06d6ad
Update dgemm_kernel_4x8_haswell.S
2019-07-17 17:02:35 +08:00
wjc404
7a9050d681
Update dgemm_kernel_4x8_haswell.S
2019-07-17 00:55:06 +08:00
wjc404
0ba29fd262
Update dgemm_kernel_4x8_haswell.S for zen2
...
replaced a bunch of vpermpd instructions with vpermilpd and vperm2f128
2019-07-17 00:46:51 +08:00
Martin Kroeker
bafa021ed6
Merge pull request #2181 from isuruf/install_name
...
Change install_name on osx to match linux
2019-07-09 20:08:52 +02:00
Isuru Fernando
b89d9762a2
Change install_name on osx to match linux
2019-07-08 17:14:35 -05:00
Martin Kroeker
08dedf4c5e
Merge pull request #2177 from martin-frbg/noaff
...
Fix surprising behaviour of NO_AFFINITY=0
2019-07-07 18:28:21 +02:00
Martin Kroeker
b89c781637
Fix surprising behaviour of NO_AFFINITY=0
2019-07-07 16:04:45 +02:00
Martin Kroeker
dd7ff77f4b
Merge pull request #2175 from martin-frbg/cmake-mingw-fixes
...
Fix CMake compilation with MinGW32 and add it to Appveyor
2019-07-06 18:07:19 +02:00
Martin Kroeker
8fb76134bc
Mingw32 needs leading underscore on object names
...
(also copy BUNDERSCORE settings for FORTRAN from the corresponding Makefile)
2019-07-06 15:07:15 +02:00
Martin Kroeker
04d671aae2
Make disabling DYNAMIC_ARCH on unsupported systems work
...
needs to be unset in the cache for the change to have any effect
2019-07-06 15:05:04 +02:00
Martin Kroeker
f69a0be712
Add getarch flags to disable AVX on x86
...
(and other small fixes to match Makefile behaviour)
2019-07-06 15:02:39 +02:00
Martin Kroeker
ae9e8b131e
Add mingw builds to Appveyor config
2019-07-06 14:30:33 +02:00
Martin Kroeker
9086543f50
Utest needs CBLAS but not necessarily FORTRAN
2019-07-06 14:29:47 +02:00
Martin Kroeker
abea977ded
Merge pull request #2162 from martin-frbg/pgi
...
Fixes for PGI compiler
2019-07-03 19:16:30 +02:00
Martin Kroeker
6b6c9b1441
Merge pull request #2172 from quickwritereader/develop
...
power9 cgemm/ctrmm. new sgemm 8x16
2019-07-01 21:06:02 +02:00
AbdelRauf
a97b301aaa
cgemm/ctrmm power9
2019-07-01 14:07:54 +00:00
Martin Kroeker
2f13f04224
Merge pull request #2170 from pkubaj/patch-1
...
Fix build on PPC970 for FreeBSD
2019-06-30 23:29:02 +02:00
pkubaj
7c7505a778
Fix build for PPC970 on FreeBSD pt.2
...
FreeBSD needs those macros too.
2019-06-28 10:31:45 +00:00
pkubaj
5a4f1a2118
Fix build for PPC970 on FreeBSD pt. 1
...
FreeBSD needs DCBT_ARG=0 as well.
2019-06-28 10:29:44 +00:00
Martin Kroeker
3b761892df
Merge pull request #2169 from pkubaj/develop
...
Fix build on FreeBSD/powerpc64.
2019-06-25 12:56:33 +02:00
Piotr Kubaj
eebfeba768
Fix build on FreeBSD/powerpc64.
...
Signed-off-by: Piotr Kubaj <pkubaj@anongoth.pl>
2019-06-25 10:58:56 +02:00
Martin Kroeker
7684c4f8f8
PGI compiler does not like -march=native
2019-06-20 19:56:01 +02:00
Martin Kroeker
7faf42b7bb
Merge pull request #2167 from kavanabhat/dtrmm_power8_segfault
...
Fix DTRMMKERNEL register save for power8 64-bit mode (Fix for #2166)
2019-06-19 14:38:01 +02:00
kavanabhat
a575f1e4c7
Update dtrmm_kernel_16x4_power8.S
2019-06-19 15:27:14 +05:30
AbdelRauf
cdbfb891da
new sgemm 8x16
2019-06-17 15:33:38 +00:00
Martin Kroeker
280552b988
Fix mov syntax
2019-06-16 18:35:43 +02:00
Martin Kroeker
bbd4bb0154
Zero ecx with a mov instruction
...
PGI assembler does not like the initialization in the constraints.
2019-06-16 15:04:10 +02:00
Martin Kroeker
6d3efb2b58
Update Makefile.x86_64
2019-06-14 08:08:11 +02:00
Martin Kroeker
d9ff2cd90d
Do not force gcc options on non-gcc compilers
...
fixes compile failure with pgi 18.10 as reported on OpenBLAS-users
2019-06-13 23:01:35 +02:00
Martin Kroeker
2a43062de7
Merge pull request #2159 from martin-frbg/issue2149
...
Avoid unintentional activation of TLS codepath via USE_TLS=0
2019-06-10 19:12:45 +02:00
Martin Kroeker
4ea794a522
Avoid unintentional activation of TLS code via USE_TLS=0
...
fixes #2149
2019-06-10 17:24:15 +02:00
Martin Kroeker
ece0bfb881
Merge pull request #2158 from martin-frbg/issue2143
...
Remove any inadvertent use of -march=native from DYNAMIC_ARCH builds
2019-06-10 14:08:11 +02:00
Martin Kroeker
1f4b6a5d5d
Remove any inadvertent use of -march=native from DYNAMIC_ARCH builds
...
From #2143: -march=native precludes use of more specific options like -march=skylake-avx512 in individual kernels, and defeats the purpose of DYNAMIC_ARCH anyway.
2019-06-10 09:50:13 +02:00