Commit Graph

4415 Commits

Author SHA1 Message Date
Martin Kroeker e8a2aed2b9
Support QEMU cpu calling itself 64bit AMD Athlon as well
Some QEMU instances pretend to be "AuthenticAMD" with the same family 6/model 6 even when running on an Intel host
(could be related to qemu or libvirt version and/or kvm availability). Also fix the define to depend on __x86_64__ set by the
compiler, the defines using __64BIT__ will only work for getarch_2nd.
2019-10-09 18:24:13 +02:00
Martin Kroeker f262031685
Support QEMU virtual cpu as CORE2
qemu itself claims it is a 64bit P6, which does not exist in the wild.
2019-10-08 22:30:02 +02:00
Martin Kroeker 5f6206fa2d
Simplify OSX/IOS cross-compilation and add a CI test for it (#2279)
* Add automatic fixups for OSX/IOS cross-compilation

* Add OSX/IOS cross-compilation test to Travis CI

* Handle platforms that lack hwcap.h by falling back to ARMV8

* Fix PROLOGUE for OSX/IOS
2019-10-08 20:13:14 +02:00
Martin Kroeker f2cde2ccfb
Update common_arm64.h 2019-10-08 20:12:08 +02:00
Martin Kroeker ba7838d2e1
Merge pull request #2280 from martin-frbg/iosfix
Add overlooked part of IOS compilation fix
2019-10-08 10:25:25 +02:00
Martin Kroeker a448884a63
Remove automatic label postfixes from macro included only once 2019-10-08 08:37:50 +02:00
Martin Kroeker 17609f88f1
Merge pull request #11 from xianyi/develop
sync with upstream
2019-10-08 08:32:52 +02:00
Martin Kroeker 3a2df19db6
Fix accidental duplication of jump instruction 2019-10-08 08:09:26 +02:00
Martin Kroeker d2093a40d3
Merge pull request #2277 from martin-frbg/issue2275
Rewrite ARMV8 code to allow cross-compilation for IOS
2019-10-06 23:01:54 +02:00
Martin Kroeker aa04b0925e
Merge pull request #2276 from xianyi/revert-2272-thread-sqrt-of-negative
Revert "Avoid taking root of negative number in symv_thread.c"
2019-10-06 11:12:44 +02:00
Martin Kroeker 258ac56e0a
Move 32bit OSX build back to xcode 8.3 but switch to gcc8 2019-10-05 10:52:47 +02:00
Martin Kroeker 56837e9d92
Make local labels in macro compatible with the xcode assembler
... which does not perform the automatic numbering on instantiation that the _@ suffix signifies
2019-10-04 14:53:23 +02:00
Martin Kroeker bb5413863f
Rewrite ARM64 PROLOGUE to make it compatible with xcode/ios 2019-10-04 14:50:03 +02:00
Martin Kroeker 32f5907fef
Update 32bit macOS again to xcode 9.3
os version 10.13 "High Sierra" appears to be the oldest release now for which Homebrew provides a gcc package.
Anything older and the Travis job will run out of time building gcc from source
2019-10-03 01:09:02 +02:00
Martin Kroeker ac10236cc8
Update the OSX BINARY=32 test to xcode9.2
in response to Homebrew updates
2019-10-02 22:35:34 +02:00
Martin Kroeker 8617d75548
Revert "Avoid taking root of negative number in symv_thread.c" 2019-10-01 23:50:41 +02:00
Martin Kroeker c07d78b9e9
Merge pull request #2272 from seberg/thread-sqrt-of-negative
Avoid taking root of negative number in symv_thread.c
2019-09-30 11:27:29 +02:00
Sebastian Berg 6355c25dde Avoid taking root of negative number in symv_thread.c
This is similar to fixes in gh-1929, but there was one remaining
occurance of this type of pattern in the driver/level2/*_thread.c
files.
2019-09-29 22:03:12 -07:00
Martin Kroeker 5e244d80f2
Merge pull request #2271 from quickwritereader/strmm_fix
fixed bug power9 strmm . BLAS-TESTER passes
2019-09-29 13:53:45 +02:00
AbdelRauf ede5efebab trmm fix 2019-09-29 02:28:34 +00:00
Martin Kroeker 84908d60d2
Merge pull request #2269 from martin-frbg/ppc-fixes
Ppc fixes
2019-09-27 09:52:19 +02:00
Martin Kroeker 596a22325a
Fix prologue of power9 assembly cdot(c) kernel to provide cdotc 2019-09-27 00:47:18 +02:00
Martin Kroeker 7f58f3ad0e
Fix mis-edits in the gcc-derived power8 caxpy kernel 2019-09-27 00:44:26 +02:00
Martin Kroeker c0d570a357
Merge pull request #7 from xianyi/develop
update
2019-09-27 00:42:32 +02:00
Martin Kroeker 6b83079368
Count cpu cores on ARMV8 and use that to pick the GEMM_PQ parameters (#2267)
There is currently no simple way to query cache sizes on ARMV8, so this takes the number of cores as a trivial indication if the target is a server-class device with a big cache, or just a single-board toy or smartphone.
2019-09-25 23:13:24 +02:00
Martin Kroeker 673e5a0495
Replace several POWER8/9 C kernels with their gcc7-generated assembly versions (#2263)
* Add gcc7-generated assembly files for POWER8/9 isa/ica-min/max and POWER9 caxpy

To work around internal compiler errors encountered when compiling the original C source with gcc 4 and 5, and wrong code generated by gcc 8.3.0

* Use gcc-generated assembly instead of original C sources

to work around internal compiler errors encountered with gcc 4.8/5.4 and wrong code generation by gcc 8.3

* Use gcc-generated assembly instead of the original C source

to work around internal compiler errors encountered with gcc 4.8 and 5.4, and wrong code generation by gcc 8.3

* Add gcc7-generated assembler version of caxpy for power8

to work around wrong code generated by gcc 8.3

* Handle CONJ define for caxpyc

* Handle CONJ define for caxpyc

* Add gcc7-generated assembly cdot for POWER9

* Use prebuilt assembly for POWER9 cdot

created with gcc 7.3.1 to work around ICE in older gcc versions

* Exclude POWER9 from DYNAMIC_ARCH when gcc versions is lower than 6

* Update Makefile.system

* Use PROLOGUE macro to ensure correct function name for DYNAMIC_ARCH

* Disable POWER9 with old gcc versions
2019-09-22 22:35:22 +02:00
Martin Kroeker bfa2cc7d64
Restore ppc64 CI job and remove the travis_wait that caused the problem with it 2019-09-20 10:29:35 +02:00
Martin Kroeker e7c4d6705a
Revert #2051 and replace with a better fix (#2261)
* Revert #2051 and add a better fix for TARGET=generic with DYNAMIC_ARCH
fixes #2257 without breaking #2048 again
2019-09-17 18:56:04 +02:00
Martin Kroeker 2a1911cc14
Merge pull request #6 from xianyi/develop
update to current develop
2019-09-13 14:00:23 +02:00
Martin Kroeker 9f7a9a32e3
Merge pull request #2252 from thrasibule/trtrs
Optimized ?trtrs
2019-09-12 21:45:47 +02:00
Guillaume Horel 2463938879 fix error message 2019-09-11 10:35:25 -04:00
Guillaume Horel 5d6525c87c more bugfix 2019-09-10 17:30:57 -04:00
Guillaume Horel 6cb47ea3f0 fix Makefile 2019-09-10 17:11:01 -04:00
Guillaume Horel 459bb9291d fix error codes 2019-09-10 17:10:33 -04:00
Martin Kroeker 3f1077ce6f
Merge pull request #2249 from brada4/gcc7minor
Address minor warnings popping up in gcc7+
2019-09-10 08:27:32 +02:00
Martin Kroeker eb45eb6942
Fix C compiler handling and BINARY=32 mode in CMAKE builds (#2248)
* Fix compiler identification and option setting

* Handle BINARY=32 option on X86_64

* Add xGEMM3M unroll parameters for crossbuild-target CORE2

* Replace bogus mingw64/32bit CI job with actual 32bit build

mingw64 is not multilib-capable, so using an x86_64-mingw with BINARY=32 in the CI was not going to work anyway (but build passed while BINARY=32 was ignored).
2019-09-10 08:27:06 +02:00
Guillaume Horel f2becb777a fix Makefile 2019-09-09 11:36:50 -04:00
Guillaume Horel 5997b6b491 bugfix 2019-09-08 11:14:49 -04:00
Guillaume Horel 4b21b646ea turn on optimized code 2019-09-08 11:14:49 -04:00
Guillaume Horel 7ec7b999a5 add missing file 2019-09-08 11:14:49 -04:00
Guillaume Horel af9ac0898a fix Makefile 2019-09-08 11:14:49 -04:00
Guillaume Horel c7b5a459b6 add missing defines and headers 2019-09-08 11:14:49 -04:00
Guillaume Horel 9b2f0323d6 update Makefile 2019-09-08 11:14:49 -04:00
Guillaume Horel 9f6984fe4b add missing files 2019-09-08 11:14:49 -04:00
Guillaume Horel 42203dafdc add logic 2019-09-08 11:14:49 -04:00
Guillaume Horel a4f17a9297 add missing objects 2019-09-08 11:14:49 -04:00
Guillaume Horel 733d97b2df add files 2019-09-08 11:14:49 -04:00
Guillaume Horel ea747cf933 start working on ?trtrs 2019-09-08 11:14:49 -04:00
Andrew 4de545aa7d address minor warnings from gcc7 2019-09-07 10:21:08 +03:00
Andrew 6e9a93ec19 init 2019-09-07 10:18:46 +03:00