TGY
b5ba95a6c0
Modernize obsolete inline order
2023-08-16 00:48:40 +02:00
Nursultan Zarlyk
1dfc4e6150
Replace with ARM64 intrinsics
2022-06-09 18:49:49 +02:00
Nursultan Zarlyk
1bb7993a97
Fix MSVC ARM64 build. Add generic kernel for ARM64
2022-06-02 16:53:54 +02:00
Niyas Sait
cdb5d2737e
add support for building on windows/arm64 target
2021-08-16 11:22:51 +01:00
Martin Kroeker
2d45a262d9
Support compilation with nvfortran
2021-01-12 16:32:29 +01:00
Martin Kroeker
7f26be4802
Reunify BUFFERSIZE across arm64 platforms to avoid segfaults in DYNAMIC_ARCH
2020-11-01 00:00:43 +01:00
Martin Kroeker
d237dc1360
Add read barrier definition
2020-04-13 12:11:58 +02:00
Martin Kroeker
a33d177430
Increase default BUFFER_SIZE on ARM, ZARCH and newer x86_64, add GEMM_R for POWER8/9
...
As shown in #2538, default buffer sizes on some platforms were smaller than required in memory.c
and the requirement could never be fulfilled for a calculated GEMM_R on PPC given the formula used
2020-04-12 19:44:48 +02:00
Martin Kroeker
e94590e400
Merge pull request #2468 from AGSaidi/wfe
...
Use wait-for-event to not spin in the blas_lock
2020-03-01 19:40:46 +01:00
Ali Saidi
0af9991cc9
Use wait-for-event to not spin in the blas_lock
2020-02-29 04:23:48 +00:00
Ali Saidi
19f3a4091c
Make rpcc() on arm64 get closer to what x86 returns
...
The Arm implementation of rpcc() uses the architected timer
which is defined by the SBSA to be between 10-400MHz. These numbers
are much smaller than the cycle counter frequency used by x86. Make
the numbers closer by shifting the cycle counter up by the number of
leading zeros in the cntfrq_el0 register which gets us closer to a
normal CPU clock cycle range.
2020-02-29 04:23:22 +00:00
Martin Kroeker
48f5a89f92
Merge pull request #2282 from martin-frbg/issue2281
...
Optimize RPCC function on ARM64
2019-10-25 09:56:30 +02:00
Martin Kroeker
b687fba5bc
Disable direct clock register access on IOS and Android
...
as I find conflicting information on accessibility from unprivileged processes
2019-10-24 21:18:17 +02:00
Martin Kroeker
5f6206fa2d
Simplify OSX/IOS cross-compilation and add a CI test for it (#2279)
...
* Add automatic fixups for OSX/IOS cross-compilation
* Add OSX/IOS cross-compilation test to Travis CI
* Handle platforms that lack hwcap.h by falling back to ARMV8
* Fix PROLOGUE for OSX/IOS
2019-10-08 20:13:14 +02:00
Martin Kroeker
f2cde2ccfb
Update common_arm64.h
2019-10-08 20:12:08 +02:00
Martin Kroeker
bb5413863f
Rewrite ARM64 PROLOGUE to make it compatible with xcode/ios
2019-10-04 14:50:03 +02:00
Paul Osmialowski
42bbe74791
build: LLVM: Add Flang compiler support and enable OpenMP for Clang
...
Signed-off-by: Paul Osmialowski <pawel.osmialowski@arm.com>
2017-05-25 17:03:20 +01:00
Ashwin Sekhar T K
1d121852c1
Fix blas_lock for arm64
2015-11-20 01:45:35 +05:30
Ashwin Sekhar T K
39937d15cd
Change BUFFER_SIZE for Cortex A57 to 20 MB
...
Change the GEMM_P, GEMM_Q, GEMM_R values for Cortex A57
2015-11-20 01:12:04 +05:30
Zhang Xianyi
233ec2a1cc
Use 40 MB buffer for ARM Cortex A57.
2015-11-11 04:22:34 +08:00
Ashwin Sekhar T K
f2f8a0fe8b
Adding arm64 target CORTEXA57
...
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
2015-11-09 14:15:50 +05:30
Grazvydas Ignotas
abade3f896
really fix ARM64 locking
2015-08-17 01:27:45 +02:00
Grazvydas Ignotas
6b92204a7c
add fallback blas_lock implementation
...
to be used on armv5 and new platforms
2015-08-16 18:59:17 +02:00
Grazvydas Ignotas
e12cf1123e
add fallback rpcc implementation
...
- use on arm, arm64 and any new platform
- use faster integer math instead of double
- use similar scale as rdtsc so that timeouts work
2015-08-16 18:59:16 +02:00
Zhang Xianyi
3f1b57668e
Fix blas lock bug on AArch64.
2015-06-26 11:54:41 +08:00
Werner Saar
19b8fd2aed
smp lock bugfix
2015-05-23 10:58:38 +02:00
Zhang Xianyi
2fb02626da
Update organization info.
2014-11-25 15:28:58 +08:00
Benedikt Huber
58c90d5937
Optimizations for APM's xgene-1 (aarch64).
...
1) General system updates to support armv8 better. Make all did not work; one needed to supply TARGET=ARMV8.
2) sgemm 4x4 kernel in assembler using SIMD, and configuration changes to use it.
3) strmm 4x4 kernel in C. Since the sgemm kernel does 4x4, the trmm kernel must also do 4xN.
Added Dave Nuechterlein to the contributors list.
2014-11-11 22:19:23 +08:00
Timothy Gu
6c2ead30f0
Remove all trailing whitespace except lapack-netlib
...
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
wernsaar
fe5f46c330
added experimental support for ARMV8
2013-11-24 15:47:00 +01:00