OpenBLAS/kernel
Renato Golin 310ea55f29 Simplifying ARMv8 build parameters
ARMv8 builds were a bit mixed up: ThunderX2 code was used in ARMv8 mode
(which is not right, because TX2 is ARMv8.1), and the defines contained
several redundancies, making it harder to maintain and to understand
which core has which settings. A few other minor issues were also fixed.

Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX,
ThunderX2, and XGene.

Test suites run: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester.

A summary:
 * Removed TX2 code from ARMv8 build, to make sure it is compatible with
   all ARMv8 cores, not just v8.1. Also, the TX2 code has actually
   harmed performance on big cores.
 * Commoned up ARMv8 architectures' defines in params.h, to make sure
   that all will benefit from ARMv8 settings, in addition to their own.
 * Added a few more cores, using ARMv8's include strategy, to benefit
   from compiler optimisations via -mtune. Also updated the cache
   information from the manuals, making sure we set good, conservative
   values by default. Removed Vulcan, as it's an alias of TX2.
 * Auto-detected most of those cores, and also updated the forced
   compilation in getarch.c, to make sure the parameters are the same
   whether compiled natively or with a forced architecture.

Benefits:
 * ARMv8 build is now guaranteed to work on all ARMv8 cores
 * Improved performance for ARMv8 builds on some cores (A72, Falkor,
   ThunderX1 and 2: up to 11%) over the current develop branch
 * Improved performance for *all* cores compared to the develop branch
   before TX2's patch (9% to 36%)
 * ThunderX1 builds are 14% faster than ARMv8 builds on TX1, 9% faster
   than the current develop branch and 8% faster than develop before the
   TX2 patches

Issues:
 * Regression relative to the current develop branch for A53 (-12%) and
   A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit
   (+15% and +24% respectively). This can be improved by simplifying
   TX2's code, to be done in future patches. At least the code is now
   guaranteed to be ARMv8.0.

Comments:
 * CortexA57 builds on A57 hardware perform the same as on the develop
   branch, which makes sense, as that code is untouched.
 * CortexA72 builds improve over CortexA57 builds on A72 hardware, even
   though they use the same includes, due to the new compiler tuning in
   the Makefile.
2018-11-19 16:41:49 +00:00
alpha Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
arm Convert fldmia/fstmia instructions to UAL syntax for clang7 2018-09-28 23:05:15 +02:00
arm64 Simplifying ARMv8 build parameters 2018-11-19 16:41:49 +00:00
generic fix small typo 2018-09-09 16:52:25 +02:00
ia64 Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
mips Merge pull request #1565 from martin-frbg/mipstypo 2018-05-17 20:22:58 +02:00
mips64 test_axpy work error on LOONGSON3A platform #1777 2018-09-26 15:14:04 +08:00
power Use the new zrot.c on POWER8 for crot as well 2018-05-23 22:54:39 +02:00
sparc Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
x86 Typo fix (misplaced parenthesis) 2018-06-03 13:22:59 +02:00
x86_64 skylakex: Make the sgemm/dgemm beta code robust for a N=0 or M=0 case 2018-11-01 01:42:09 +00:00
zarch Merge pull request #1499 from quickwritereader/develop 2018-03-27 21:43:23 +02:00
CMakeLists.txt Initial support for SkylakeX / AVX512 2018-06-03 07:58:52 +00:00
Makefile ARM64: Fix DYNAMIC_ARCH compilation for cores which dont use GEMM3M 2018-10-22 01:45:51 -07:00
Makefile.L1 Remove duplicate -D args in kernel/Makefile.L1 2015-11-09 14:15:48 +05:30
Makefile.L2 Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Makefile.L3 Set USE_TRMM for all ZARCH variants to fix TRMM faults with zarch-generic 2018-08-28 21:34:07 +02:00
Makefile.LA Support NO_LAPACK=1 to build the lib without LAPACK functions. 2011-03-04 11:51:32 +08:00
setparam-ref.c ARM64: Enable DYNAMIC_ARCH 2018-10-22 01:49:35 -07:00