4056 Commits

Author SHA1 Message Date
Martin Kroeker f2cde2ccfb
Update common_arm64.h 2019-10-08 20:12:08 +02:00
Martin Kroeker ba7838d2e1
Merge pull request #2280 from martin-frbg/iosfix
Add overlooked part of IOS compilation fix
2019-10-08 10:25:25 +02:00
Martin Kroeker a448884a63
Remove automatic label postfixes from macro included only once 2019-10-08 08:37:50 +02:00
Martin Kroeker 17609f88f1
Merge pull request #11 from xianyi/develop
sync with upstream
2019-10-08 08:32:52 +02:00
Martin Kroeker 3a2df19db6
Fix accidental duplication of jump instruction 2019-10-08 08:09:26 +02:00
Martin Kroeker d2093a40d3
Merge pull request #2277 from martin-frbg/issue2275
Rewrite ARMV8 code to allow cross-compilation for IOS
2019-10-06 23:01:54 +02:00
Martin Kroeker aa04b0925e
Merge pull request #2276 from xianyi/revert-2272-thread-sqrt-of-negative
Revert "Avoid taking root of negative number in symv_thread.c"
2019-10-06 11:12:44 +02:00
Martin Kroeker 258ac56e0a
Move 32bit OSX build back to xcode 8.3 but switch to gcc8 2019-10-05 10:52:47 +02:00
Martin Kroeker 56837e9d92
Make local labels in macro compatible with the xcode assembler
... which does not perform the automatic numbering on instantiation that the _@ suffix signifies
2019-10-04 14:53:23 +02:00
Martin Kroeker bb5413863f
Rewrite ARM64 PROLOGUE to make it compatible with xcode/ios 2019-10-04 14:50:03 +02:00
Martin Kroeker 32f5907fef
Update 32bit macOS again to xcode 9.3
os version 10.13 "High Sierra" appears to be the oldest release now for which Homebrew provides a gcc package.
Anything older and the Travis job will run out of time building gcc from source
2019-10-03 01:09:02 +02:00
Martin Kroeker ac10236cc8
Update the OSX BINARY=32 test to xcode9.2
in response to Homebrew updates
2019-10-02 22:35:34 +02:00
Martin Kroeker 8617d75548
Revert "Avoid taking root of negative number in symv_thread.c" 2019-10-01 23:50:41 +02:00
Martin Kroeker c07d78b9e9
Merge pull request #2272 from seberg/thread-sqrt-of-negative
Avoid taking root of negative number in symv_thread.c
2019-09-30 11:27:29 +02:00
Sebastian Berg 6355c25dde Avoid taking root of negative number in symv_thread.c
This is similar to fixes in gh-1929, but there was one remaining
occurrence of this type of pattern in the driver/level2/*_thread.c
files.
2019-09-29 22:03:12 -07:00
Martin Kroeker 5e244d80f2
Merge pull request #2271 from quickwritereader/strmm_fix
fixed power9 strmm bug. BLAS-TESTER passes
2019-09-29 13:53:45 +02:00
AbdelRauf ede5efebab trmm fix 2019-09-29 02:28:34 +00:00
Martin Kroeker 84908d60d2
Merge pull request #2269 from martin-frbg/ppc-fixes
Ppc fixes
2019-09-27 09:52:19 +02:00
Martin Kroeker 596a22325a
Fix prologue of power9 assembly cdot(c) kernel to provide cdotc 2019-09-27 00:47:18 +02:00
Martin Kroeker 7f58f3ad0e
Fix mis-edits in the gcc-derived power8 caxpy kernel 2019-09-27 00:44:26 +02:00
Martin Kroeker c0d570a357
Merge pull request #7 from xianyi/develop
update
2019-09-27 00:42:32 +02:00
Martin Kroeker 6b83079368
Count cpu cores on ARMV8 and use that to pick the GEMM_PQ parameters (#2267)
There is currently no simple way to query cache sizes on ARMV8, so this takes the number of cores as a rough indication of whether the target is a server-class device with a big cache, or just a single-board toy or smartphone.
2019-09-25 23:13:24 +02:00
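The heuristic described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the `gemm_pq_t` struct, the helper names, the 16-core threshold and the blocking values are all hypothetical, and `_SC_NPROCESSORS_ONLN` is assumed to be available (it is on Linux, macOS and the BSDs, but it is not strictly required by POSIX).

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical parameter pair; the actual GEMM_P/GEMM_Q values differ. */
typedef struct {
    int gemm_p;
    int gemm_q;
} gemm_pq_t;

/* Number of online cores, falling back to 1 if the query fails. */
static int count_cores(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int)n : 1;
}

/* Treat a high core count as a hint that the target is a server-class
 * part with a large cache and pick bigger blocking factors for it.
 * The threshold and the values are illustrative assumptions only. */
static gemm_pq_t pick_gemm_pq(int ncores) {
    assert(ncores > 0);
    gemm_pq_t small_cache = {128, 256};
    gemm_pq_t large_cache = {512, 1024};
    return ncores >= 16 ? large_cache : small_cache;
}
```

A caller would use `pick_gemm_pq(count_cores())` once at initialisation. The core count is only a proxy for cache size, so a many-core device with small caches would be misclassified by a scheme like this.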
Martin Kroeker 673e5a0495
Replace several POWER8/9 C kernels with their gcc7-generated assembly versions (#2263)
* Add gcc7-generated assembly files for POWER8/9 isa/ica-min/max and POWER9 caxpy

To work around internal compiler errors encountered when compiling the original C source with gcc 4 and 5, and wrong code generated by gcc 8.3.0

* Use gcc-generated assembly instead of original C sources

to work around internal compiler errors encountered with gcc 4.8/5.4 and wrong code generation by gcc 8.3

* Use gcc-generated assembly instead of the original C source

to work around internal compiler errors encountered with gcc 4.8 and 5.4, and wrong code generation by gcc 8.3

* Add gcc7-generated assembler version of caxpy for power8

to work around wrong code generated by gcc 8.3

* Handle CONJ define for caxpyc

* Add gcc7-generated assembly cdot for POWER9

* Use prebuilt assembly for POWER9 cdot

created with gcc 7.3.1 to work around ICE in older gcc versions

* Exclude POWER9 from DYNAMIC_ARCH when gcc version is lower than 6

* Update Makefile.system

* Use PROLOGUE macro to ensure correct function name for DYNAMIC_ARCH

* Disable POWER9 with old gcc versions
2019-09-22 22:35:22 +02:00
Martin Kroeker bfa2cc7d64
Restore ppc64 CI job and remove the travis_wait that caused the problem with it 2019-09-20 10:29:35 +02:00
Martin Kroeker e7c4d6705a
Revert #2051 and replace with a better fix (#2261)
* Revert #2051 and add a better fix for TARGET=generic with DYNAMIC_ARCH
fixes #2257 without breaking #2048 again
2019-09-17 18:56:04 +02:00
Martin Kroeker 2a1911cc14
Merge pull request #6 from xianyi/develop
update to current develop
2019-09-13 14:00:23 +02:00
Martin Kroeker 9f7a9a32e3
Merge pull request #2252 from thrasibule/trtrs
Optimized ?trtrs
2019-09-12 21:45:47 +02:00
Guillaume Horel 5d6525c87c more bugfix 2019-09-10 17:30:57 -04:00
Guillaume Horel 6cb47ea3f0 fix Makefile 2019-09-10 17:11:01 -04:00
Guillaume Horel 459bb9291d fix error codes 2019-09-10 17:10:33 -04:00
Martin Kroeker 3f1077ce6f
Merge pull request #2249 from brada4/gcc7minor
Address minor warnings popping up in gcc7+
2019-09-10 08:27:32 +02:00
Martin Kroeker eb45eb6942
Fix C compiler handling and BINARY=32 mode in CMAKE builds (#2248)
* Fix compiler identification and option setting

* Handle BINARY=32 option on X86_64

* Add xGEMM3M unroll parameters for crossbuild-target CORE2

* Replace bogus mingw64/32bit CI job with actual 32bit build

mingw64 is not multilib-capable, so using an x86_64-mingw with BINARY=32 in the CI was not going to work anyway (but the build passed while BINARY=32 was ignored).
2019-09-10 08:27:06 +02:00
Guillaume Horel f2becb777a fix Makefile 2019-09-09 11:36:50 -04:00
Guillaume Horel 5997b6b491 bugfix 2019-09-08 11:14:49 -04:00
Guillaume Horel 4b21b646ea turn on optimized code 2019-09-08 11:14:49 -04:00
Guillaume Horel 7ec7b999a5 add missing file 2019-09-08 11:14:49 -04:00
Guillaume Horel af9ac0898a fix Makefile 2019-09-08 11:14:49 -04:00
Guillaume Horel c7b5a459b6 add missing defines and headers 2019-09-08 11:14:49 -04:00
Guillaume Horel 9b2f0323d6 update Makefile 2019-09-08 11:14:49 -04:00
Guillaume Horel 9f6984fe4b add missing files 2019-09-08 11:14:49 -04:00
Guillaume Horel 42203dafdc add logic 2019-09-08 11:14:49 -04:00
Guillaume Horel a4f17a9297 add missing objects 2019-09-08 11:14:49 -04:00
Guillaume Horel 733d97b2df add files 2019-09-08 11:14:49 -04:00
Guillaume Horel ea747cf933 start working on ?trtrs 2019-09-08 11:14:49 -04:00
Andrew 4de545aa7d address minor warnings from gcc7 2019-09-07 10:21:08 +03:00
Andrew 6e9a93ec19 init 2019-09-07 10:18:46 +03:00
Martin Kroeker fde8a8e6a0
Improve cmake build behaviour with non-host cpu targets (#2246)
1. Supply appropriate values for C/Z GEMM unroll when cross-compiling for CORE2 or ARMV7
2. Add the required xLOCAL_BUFFER_SIZE parameters for cross-compiling CORE2
3. Add -DFORCE_<target> option to getarch when building with -DTARGET=target
for #2245
2019-09-03 22:41:17 +02:00
Martin Kroeker 256fc15f5f
Merge pull request #2 from xianyi/develop
update
2019-09-03 15:12:14 +02:00
Martin Kroeker ee498525e0
Merge pull request #2242 from martin-frbg/issue2235
Add arch data for cmake cross-compiling to CORE2
2019-09-02 22:06:29 +02:00
Martin Kroeker 1fec0570f6
Add cgemm and zgemm unroll factors for core2 2019-09-02 15:03:45 +02:00