Ashwin Sekhar T K
d5aeff636f
ARM64: Enable DYNAMIC_ARCH
...
Enable DYNAMIC_ARCH feature on ARM64. This patch uses the cpuid
feature in linux kernel to detect the core type at runtime
(https://www.kernel.org/doc/Documentation/arm64/cpu-feature-registers.txt ).
If this feature is missing in kernel, then the user should use the
OPENBLAS_CORETYPE env variable to select the desired core type.
2018-10-22 01:49:35 -07:00
Ashwin Sekhar T K
d50abc8903
ARM64: Move parameters from parameter.c to param.h
...
Remove the runtime setting of P, Q, R parameters for
targets ARMV8, THUNDERX2T99. Instead set them as constants
in param.h at compile time.
2018-10-22 01:45:51 -07:00
Ashwin Sekhar T K
351a0c777c
ARM64: Remove XGENE1 references
...
Remove XGENE1 target as the implementation for the
same is incomplete. Moreover whoever wishes to use
on XGENE1 can use the generic ARMV8 target as there
are no XGENE1 specific optimizations in OpenBLAS.
2018-10-22 01:45:51 -07:00
Ashwin Sekhar T K
21f46a1cf2
ARM64: Use THUNDERX2T99 Neon Kernels for ARMV8
...
Currently the generic ARMV8 target uses C implementations
for many routines. Replace these with the neon implementations
written for THUNDERX2T99 target which are upto 6x faster for
certain routines.
2018-10-17 10:44:37 -07:00
Ashwin Sekhar T K
caf339412f
ARM64: Remove dependency of THUNDERX2T99 Makefile on CORTEXA57 Makefile
2018-10-17 08:02:40 -07:00
Ashwin Sekhar T K
8001fdcd2a
ARM64: Remove dependency of THUNDERX Makefile on ARMV8 Makefile
2018-10-17 08:02:16 -07:00
Ashwin Sekhar T K
162e312832
ARM64: Remove dependency of CORTEXA57 Makefile on ARMV8 Makefile
2018-10-17 08:01:45 -07:00
Ashwin Sekhar T K
c3d93caa8d
ARM64: Remove dependency of XGENE1 Makefile on ARMV8 Makefile
2018-10-17 08:01:27 -07:00
Martin Kroeker
1cb7b9015e
Conditional compilation of assembly files that IOS does not like
2018-09-04 11:06:51 +02:00
Martin Kroeker
a4bd41e9f2
Fix paths to C kernels for nrm2
2018-09-04 10:51:19 +02:00
Craig Donner
c2545b0fd6
Fixed a few more unnecessary calls to num_cpu_avail.
...
I don't have as many benchmarks for these as for gemm, but it should still
make a difference for small matrices.
2018-06-11 10:17:16 +01:00
Ashwin Sekhar T K
fa9ca65c0e
ARM64: Fix utest dsdot errors
2018-02-27 10:47:55 +00:00
Martin Kroeker
c9d408064a
Use dot.S also for DSDOT on CORTEXA57
2018-02-25 19:48:09 +01:00
Martin Kroeker
288d1a3f6e
Use dot.S also for DSDOT on ARMV8
2018-02-25 19:45:16 +01:00
Martin Kroeker
b47e6822aa
Enable most assembly kernels in the generic ARMV8 target
...
ref #1439
2018-02-06 11:42:58 +01:00
Ashwin Sekhar T K
a0128aa489
ARM64: Convert all labels to local labels
...
While debugging/profiling applications using perf or other tools, the
kernels appear scattered in the profile reports. This is because the labels
within the kernels are not local and each label is shown as a separate
function.
To avoid this, all the labels within the kernels are changed to local
labels.
2017-10-24 11:40:05 +00:00
Ashwin Sekhar T K
4899d67f7d
THUDNERX2T99: Fix clang compilation
2017-08-02 11:28:45 -07:00
Ashwin Sekhar T K
67473d09dd
THUNDERX2T99: Bug Fixes in D/Z NRM2 and ZGEMM
2017-02-28 01:11:38 -08:00
Ashwin Sekhar T K
19ba133383
THUNDERX2T99: Add Optimized ZGEMM Implementation
2017-02-28 05:31:41 +00:00
Ashwin Sekhar T K
a3935f0dfb
THUNDERX2T99: Add Optimized D/Z NRM2 Implementation
2017-02-23 10:02:15 -08:00
Ashwin Sekhar T K
738628e9a8
ARM64: Remove unused code
2017-02-21 21:42:32 -08:00
Ashwin Sekhar T K
ab3ffab96a
THUNDERX2T99: Add Optimized C/Z DOT Implementation
2017-02-21 03:40:59 -08:00
Ashwin Sekhar T K
f036be9ce2
THUNDERX2T99: Add Optimized SDOT Implementation
2017-02-21 03:24:32 -08:00
Ashwin Sekhar T K
faba876fda
THUNDERX2T99: Bug fix in C/Z IAMAX
2017-02-19 23:11:50 -08:00
Ashwin Sekhar T K
172a62d73e
THUNDERX2T99: Add Optimized C/Z IAMAX Implementation
2017-02-17 03:06:32 -08:00
Ashwin Sekhar T K
228c75a69c
THUNDERX2T99: Add parallel SCNRM2 Implementation
2017-02-14 04:10:06 -08:00
Ashwin Sekhar T K
8e89668f62
THUNDERX2T99: Fix bug in SNRM2
2017-02-07 02:14:33 -08:00
Ashwin Sekhar T K
f63deae9de
THUNDERX2T99: Add Optimized S/D IAMAX Implementation
2017-02-07 01:35:55 -08:00
Ashwin Sekhar T K
071a830e8b
THUNDERX2T99: Add optimized S/D/C/Z SWAP Implementations
2017-02-03 03:55:06 -08:00
Ashwin Sekhar T K
d09f88192c
THUNDERX2T99: Add optimized S/D/C/Z COPY Implementations
2017-02-02 15:26:38 +05:30
Ashwin Sekhar T K
e58233460a
THUDNERX2T99: Add optimized D/C/Z ASUM Implementations
2017-02-02 15:26:22 +05:30
Ashwin Sekhar T K
99bd2892bf
THUNDERX2T99: Add optimized CASUM Implementation
2017-01-30 17:44:32 +05:30
Ashwin Sekhar T K
ff6f572f2e
THUNDERX2T99: Rename labels in for DDOT and SNRM2
2017-01-30 17:44:32 +05:30
Ashwin Sekhar T K
e0dc5f58c5
THUNDERX2T99: Remove Duplicate Code
2017-01-30 17:44:32 +05:30
Ashwin Sekhar T K
2757b49767
THUNDERX2T99: Add Optimized CGEMM Implementation
2017-01-30 17:44:26 +05:30
Ashwin Sekhar T K
907e286eb6
THUNDERX2T99: Add threaded SNRM2 Implementation
2017-01-24 21:39:29 +05:30
Ashwin Sekhar T K
cde3aee08b
ARM64: Rename kernel files to have consistent naming
2017-01-24 14:53:34 +05:30
Ashwin Sekhar T K
ee6ea7e988
THUNDERX2T99: Add Optimized CNRM2 Implementation
2017-01-24 10:23:32 +05:30
Ashwin Sekhar T K
ca0b36b012
THUNDERX2T99: Add Optimized SNRM2 Implementation
2017-01-24 10:23:21 +05:30
Ashwin Sekhar T K
d0a79ca6e0
THUNDERX2T99: Add threaded DDOT Implementation
2017-01-19 11:11:42 +05:30
Ashwin Sekhar T K
0c07003ccf
THUNDERX2T99: Add Optimized DDOT Implementation
2017-01-19 11:11:07 +05:30
Ashwin Sekhar T K
f33fcedb30
THUNDERX2T99: Improve SGEMM
2017-01-19 11:11:07 +05:30
Ashwin Sekhar T K
0f1d6e8b39
THUNDERX2T99: Improve DGEMM
2017-01-19 11:11:07 +05:30
Ashwin Sekhar T K
981064acc6
THUNDERX2T99: Add Optimized DAXPY Implementation
2017-01-19 11:10:57 +05:30
Ashwin Sekhar T K
f279ff4789
THUNDERX2T99: Add Optimized SGEMM Implementation
2017-01-16 21:44:33 +05:30
Ashwin Sekhar T K
759f37feba
ARM64: Let target VULCAN inherit THUNDERX2T99 properties
2017-01-16 21:44:19 +05:30
Ashwin Sekhar T K
4b55fae337
ARM64: Add Cavium THUNDERX2T99 Target
2017-01-11 11:18:40 +05:30
Andrew Pinski
95649dee28
THUNDERX: Add optimized version of daxpy
...
This is better for single core but does not change anything for multiple cores
2017-01-11 11:18:36 +05:30
Andrew Pinski
8fdb0655e9
THUNDERX: Add an optimized version of ddot
2017-01-10 15:01:37 +05:30
Andrew Pinski
fb200c7245
ARM64: Add Cavium THUNDERX Target
2017-01-10 15:01:37 +05:30