Commit Graph

3887 Commits

Author SHA1 Message Date
Martin Kroeker 56837e9d92 Make local labels in macro compatible with the xcode assembler
... which does not perform the automatic numbering on instantiation that the _@ suffix signifies
2019-10-04 14:53:23 +02:00
Martin Kroeker bb5413863f Rewrite ARM64 PROLOGUE to make it compatible with xcode/ios 2019-10-04 14:50:03 +02:00
Martin Kroeker c0d570a357 Merge pull request #7 from xianyi/develop
update
2019-09-27 00:42:32 +02:00
Martin Kroeker 6b83079368 Count cpu cores on ARMV8 and use that to pick the GEMM_PQ parameters (#2267)
There is currently no simple way to query cache sizes on ARMV8, so this takes the number of cores as a trivial indication if the target is a server-class device with a big cache, or just a single-board toy or smartphone.
2019-09-25 23:13:24 +02:00
Martin Kroeker 673e5a0495 Replace several POWER8/9 C kernels with their gcc7-generated assembly versions (#2263)
* Add gcc7-generated assembly files for POWER8/9 isa/ica-min/max and POWER9 caxpy

To work around internal compiler errors encountered when compiling the original C source with gcc 4 and 5, and wrong code generated by gcc 8.3.0

* Use gcc-generated assembly instead of original C sources

to work around internal compiler errors encountered with gcc 4.8/5.4 and wrong code generation by gcc 8.3

* Use gcc-generated assembly instead of the original C source

to work around internal compiler errors encountered with gcc 4.8 and 5.4, and wrong code generation by gcc 8.3

* Add gcc7-generated assembler version of caxpy for power8

to work around wrong code generated by gcc 8.3

* Handle CONJ define for caxpyc

* Handle CONJ define for caxpyc

* Add gcc7-generated assembly cdot for POWER9

* Use prebuilt assembly for POWER9 cdot

created with gcc 7.3.1 to work around ICE in older gcc versions

* Exclude POWER9 from DYNAMIC_ARCH when gcc versions is lower than 6

* Update Makefile.system

* Use PROLOGUE macro to ensure correct function name for DYNAMIC_ARCH

* Disable POWER9 with old gcc versions
2019-09-22 22:35:22 +02:00
Martin Kroeker bfa2cc7d64 Restore ppc64 CI job and remove the travis_wait that caused the problem with it 2019-09-20 10:29:35 +02:00
Martin Kroeker e7c4d6705a Revert #2051 and replace with a better fix (#2261)
* Revert #2051 and add a better fix for TARGET=generic with DYNAMIC_ARCH
fixes #2257 without breaking #2048 again
2019-09-17 18:56:04 +02:00
Martin Kroeker 2a1911cc14 Merge pull request #6 from xianyi/develop
update to current develop
2019-09-13 14:00:23 +02:00
Martin Kroeker 9f7a9a32e3 Merge pull request #2252 from thrasibule/trtrs
Optimized ?trtrs
2019-09-12 21:45:47 +02:00
Guillaume Horel 5d6525c87c more bugfix 2019-09-10 17:30:57 -04:00
Guillaume Horel 6cb47ea3f0 fix Makefile 2019-09-10 17:11:01 -04:00
Guillaume Horel 459bb9291d fix error codes 2019-09-10 17:10:33 -04:00
Martin Kroeker 3f1077ce6f Merge pull request #2249 from brada4/gcc7minor
Address minor warnings popping up in gcc7+
2019-09-10 08:27:32 +02:00
Martin Kroeker eb45eb6942 Fix C compiler handling and BINARY=32 mode in CMAKE builds (#2248)
* Fix compiler identification and option setting

* Handle BINARY=32 option on X86_64

* Add xGEMM3M unroll parameters for crossbuild-target CORE2

* Replace bogus mingw64/32bit CI job with actual 32bit build

mingw64 is not multilib-capable, so using an x86_64-mingw with BINARY=32 in the CI was not going to work anyway (but build passed while BINARY=32 was ignored).
2019-09-10 08:27:06 +02:00
Guillaume Horel f2becb777a fix Makefile 2019-09-09 11:36:50 -04:00
Guillaume Horel 5997b6b491 bugfix 2019-09-08 11:14:49 -04:00
Guillaume Horel 4b21b646ea turn on optimized code 2019-09-08 11:14:49 -04:00
Guillaume Horel 7ec7b999a5 add missing file 2019-09-08 11:14:49 -04:00
Guillaume Horel af9ac0898a fix Makefile 2019-09-08 11:14:49 -04:00
Guillaume Horel c7b5a459b6 add missing defines and headers 2019-09-08 11:14:49 -04:00
Guillaume Horel 9b2f0323d6 update Makefile 2019-09-08 11:14:49 -04:00
Guillaume Horel 9f6984fe4b add missing files 2019-09-08 11:14:49 -04:00
Guillaume Horel 42203dafdc add logic 2019-09-08 11:14:49 -04:00
Guillaume Horel a4f17a9297 add missing objects 2019-09-08 11:14:49 -04:00
Guillaume Horel 733d97b2df add files 2019-09-08 11:14:49 -04:00
Guillaume Horel ea747cf933 start working on ?trtrs 2019-09-08 11:14:49 -04:00
Andrew 4de545aa7d address minor warnings from gcc7 2019-09-07 10:21:08 +03:00
Andrew 6e9a93ec19 init 2019-09-07 10:18:46 +03:00
Martin Kroeker fde8a8e6a0 Improve cmake build behaviour with non-host cpu targets (#2246)
1. Supply appropriate values for C/Z GEMM unroll when cross-compiling for CORE2 or ARMV7
2. Add the required xLOCAL_BUFFER_SIZE parameters for cross-compiling CORE2
3. Add -DFORCE_<target> option to getarch when building with -DTARGET=target
for #2245
2019-09-03 22:41:17 +02:00
Martin Kroeker 256fc15f5f Merge pull request #2 from xianyi/develop
update
2019-09-03 15:12:14 +02:00
Martin Kroeker ee498525e0 Merge pull request #2242 from martin-frbg/issue2235
Add arch data for cmake cross-compiling to CORE2
2019-09-02 22:06:29 +02:00
Martin Kroeker 1fec0570f6 Add cgemm and zgemm unroll factors for core2 2019-09-02 15:03:45 +02:00
Martin Kroeker b5af7b9c78 Disable ppc64le test environment on Travis CI
as this semi-official beta option has suddenly reverted to a standard x86_64 environment causing spurious failures
2019-08-31 18:06:12 +02:00
Martin Kroeker f3c314550c Merge pull request #2243 from quickwritereader/develop
possible cgemv,caxpy,cdot fix
2019-08-30 23:06:23 +02:00
AbdelRauf 847c20c9b7 fix uninitialized variables i 2019-08-30 11:14:55 +00:00
AbdelRauf 4c22828812 caxpy and cdot are using vec_vsx_ld 2019-08-30 04:09:15 +00:00
AbdelRauf e79712d969 cgemv using vec_vsx_ld instead of letting gcc to decide 2019-08-30 02:52:04 +00:00
AbdelRauf be09551cdf aligned 2019-08-29 23:22:23 +00:00
Martin Kroeker ec1ef6aa9e Merge pull request #2241 from martin-frbg/zdotfix
Make x86_64 zdot compile with PGI and Sun C again
2019-08-29 07:12:54 +02:00
Martin Kroeker 11c59acfb1 Keep both PGI/SUN and default code paths to avoid breaking Clang/WIndows 2019-08-28 18:07:44 +02:00
Martin Kroeker bf0d92a310 Add arch data for cross-compiling to CORE2
for #2235
2019-08-28 17:35:56 +02:00
Martin Kroeker db066151ee Merge pull request #2240 from martin-frbg/issue2237
Fix PGI build options (again)
2019-08-28 15:30:53 +02:00
Martin Kroeker 3a55dca2dc Make x86_64 zdot compile with PGI and Sun C again
broken by #2222 as CREAL,CIMAG do not expand to a valid lvalue with these compilers
2019-08-28 11:35:31 +02:00
Martin Kroeker 7d380f7d79 Fix PGI build options (again)
for #2237
2019-08-28 11:31:20 +02:00
Martin Kroeker 300f158d3b Merge pull request #2239 from martin-frbg/issue2231
Fix 32bit armv8 compilation regression
2019-08-28 07:54:57 +02:00
Martin Kroeker 3635fdbf2b Do not abuse the global ARCH variable as a local temporary
Setting it with a simple "uname -m" just to be able to decide whether to compile getarch.c with -march=native
may actually keep getarch from doing a proper probe. Fixes #2231, a regression caused by #2110
2019-08-27 22:52:17 +02:00
Martin Kroeker b6552b11eb Merge pull request #2 from xianyi/develop
merge develop
2019-08-27 22:41:31 +02:00
Martin Kroeker 5fdf9ad24f Merge pull request #2228 from martin-frbg/issue2227
Add Intel Goldmont Plus CPUID
2019-08-19 18:26:51 +02:00
Martin Kroeker 2fe967c542 Merge branch 'develop' into issue2227 2019-08-19 14:20:39 +02:00
Martin Kroeker 6d8595351c Add Intel Goldmont Plus CPUID
fixes #2227
2019-08-19 14:19:21 +02:00