Commit Graph

30 Commits

Author SHA1 Message Date
TGY b5ba95a6c0 Modernize obsolete inline order 2023-08-16 00:48:40 +02:00
Nursultan Zarlyk 1dfc4e6150 Replace with ARM64 intrinsics 2022-06-09 18:49:49 +02:00
Nursultan Zarlyk 1bb7993a97 Fix MSVC ARM64 build. Add generic kernel for ARM64 2022-06-02 16:53:54 +02:00
Niyas Sait cdb5d2737e add support for building on windows/arm64 target 2021-08-16 11:22:51 +01:00
Martin Kroeker 2d45a262d9
Support compilation with nvfortran 2021-01-12 16:32:29 +01:00
Martin Kroeker 7f26be4802
Reunify BUFFERSIZE across arm64 platforms to avoid segfaults in DYNAMIC_ARCH 2020-11-01 00:00:43 +01:00
Martin Kroeker d237dc1360
Add read barrier definition 2020-04-13 12:11:58 +02:00
Martin Kroeker a33d177430
Increase default BUFFER_SIZE on ARM, ZARCH and newer x86_64, add GEMM_R for POWER8/9
As shown in #2538, default buffersizes on some platforms were smaller than required in memory.c
and the requirement could never be fulfilled for a calculated GEMM_R on PPC given the fomula used
2020-04-12 19:44:48 +02:00
Martin Kroeker e94590e400
Merge pull request #2468 from AGSaidi/wfe
Use wait-for-event to not spin in the blas_lock
2020-03-01 19:40:46 +01:00
Ali Saidi 0af9991cc9 Use wait-for-event to not spin in the blas_lock 2020-02-29 04:23:48 +00:00
Ali Saidi 19f3a4091c Make rpcc() on arm64 get closer to what x86 returns
The Arm implementation of rpcc() uses the architected timer
which is defined by the SBSA to be between 10-400MHz. These numbers
are much smaller than the cycle counter frequency used by x86. Make
the numbers closer by shifting the cycle counter up by the number of
leading zeros in the cntfrq_el0 register which gets us closer to a
noraml cpu clock cycle range.
2020-02-29 04:23:22 +00:00
Martin Kroeker 48f5a89f92
Merge pull request #2282 from martin-frbg/issue2281
Optimize RPCC function on ARM64
2019-10-25 09:56:30 +02:00
Martin Kroeker b687fba5bc
Disable direct clock register access on IOS and Android
as I find conflicting information on accessibility from non-priviledged processes
2019-10-24 21:18:17 +02:00
Martin Kroeker 5f6206fa2d
Simplify OSX/IOS cross-compilation and add a CI test for it (#2279)
* Add automatic fixups for OSX/IOS cross-compilation

* Add OSX/IOS cross-compilation test to Travis CI

* Handle platforms that lack hwcap.h by falling back to ARMV8

* Fix PROLOGUE for OSX/IOS
2019-10-08 20:13:14 +02:00
Martin Kroeker f2cde2ccfb
Update common_arm64.h 2019-10-08 20:12:08 +02:00
Martin Kroeker bb5413863f
Rewrite ARM64 PROLOGUE to make it compatible with xcode/ios 2019-10-04 14:50:03 +02:00
Paul Osmialowski 42bbe74791 build: LLVM: Add Flang compiler support and enable OpenMP for Clang
Signed-off-by: Paul Osmialowski <pawel.osmialowski@arm.com>
2017-05-25 17:03:20 +01:00
Ashwin Sekhar T K 1d121852c1 Fix blas_lock for arm64 2015-11-20 01:45:35 +05:30
Ashwin Sekhar T K 39937d15cd Change BUFFER_SIZE for Cortex A57 to 20 MB
Change the GEMM_P, GEMM_Q, GEMM_R values for Cortex A57
2015-11-20 01:12:04 +05:30
Zhang Xianyi 233ec2a1cc Use 40 MB buffer for ARM Cortex A57. 2015-11-11 04:22:34 +08:00
Ashwin Sekhar T K f2f8a0fe8b Adding arm64 target CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
2015-11-09 14:15:50 +05:30
Grazvydas Ignotas abade3f896 really fix ARM64 locking 2015-08-17 01:27:45 +02:00
Grazvydas Ignotas 6b92204a7c add fallback blas_lock implementation
to be used on armv5 and new platforms
2015-08-16 18:59:17 +02:00
Grazvydas Ignotas e12cf1123e add fallback rpcc implementation
- use on arm, arm64 and any new platform
- use faster integer math instead of double
- use similar scale as rdtsc so that timeouts work
2015-08-16 18:59:16 +02:00
Zhang Xianyi 3f1b57668e Fix blas lock bug on AArch64. 2015-06-26 11:54:41 +08:00
Werner Saar 19b8fd2aed smp lock bugfix 2015-05-23 10:58:38 +02:00
Zhang Xianyi 2fb02626da Update organization info. 2014-11-25 15:28:58 +08:00
Benedikt Huber 58c90d5937 # The first commit's message is:
Optimizations for APM's xgene-1 (aarch64).

1) general system updates to support armv8 better.  Make all did not work, one needed to supply TARGET=ARMV8.
2) sgem 4x4 kernel in assembler using SIMD, and configuration changes to use it.
3) strmm 4x4 kernel in C.  Since the sgem kernel does 4x4, the trmm kernel must also do 4xN.

Added Dave Nuechterlein to the contributors list.
2014-11-11 22:19:23 +08:00
Timothy Gu 6c2ead30f0 Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
wernsaar fe5f46c330 added experimental support for ARMV8 2013-11-24 15:47:00 +01:00