Martin Kroeker
e7fca060db
Merge pull request #3457 from wjc404/optimize-A53-dgemm
MOD: optimize zgemm on cortex-A53/cortex-A55
2021-11-26 10:30:47 +01:00
Martin Kroeker
bc4c98de26
Merge pull request #3456 from martin-frbg/issue3444
Add/restore a GENERIC target for MIPS32 and support MIPS32 cross-compilation using CMAKE
2021-11-26 10:29:28 +01:00
Martin Kroeker
c3b1e55bdc
AzureCI: Fetch alpine-chroot-install from master to get key updates (#3460)
* Fetch alpine-chroot-install from master to get key updates
2021-11-26 09:38:41 +01:00
Jia-Chen
5c1cd5e0c2
MOD: add comments to a53 zgemm kernel
2021-11-25 22:48:48 +08:00
Rafael Cardoso Fernandes Sousa
d5c9353f1b
Modify the order in which cmake sets the KERNEL variables (generic is now the fallback)
2021-11-24 20:08:35 -06:00
Rafael Cardoso Fernandes Sousa
fb891f33da
Fix the cmake parser to identify more patterns
2021-11-24 14:07:28 -06:00
Jia-Chen
9f59b19fcd
MOD: optimize zgemm on cortex-A53/cortex-A55
2021-11-24 21:51:45 +08:00
Bine Brank
f4da23dcb6
reduced dgemm_unroll_m to work with 128-bit sve
2021-11-23 21:18:08 +01:00
Bine Brank
531a28b6a0
removed unused code (compiler warnings)
2021-11-22 10:12:34 +01:00
Bine Brank
9b9cb90bb1
modify Makefile for SVE copy
2021-11-22 09:54:20 +01:00
Bine Brank
9388f05a3c
configure SVE Makefile
2021-11-21 18:33:43 +01:00
Bine Brank
b58d4f31ab
some clean-up & commentary
2021-11-21 14:56:27 +01:00
Martin Kroeker
52a3f004a0
Fix unintended reversion of recent CortexA53 changes
2021-11-20 23:54:48 +01:00
Martin Kroeker
a3cd36acff
Add CMAKE support for cross-compiling to MIPS32
2021-11-20 17:34:28 +01:00
Martin Kroeker
b7df500106
Add generic mips32 target
2021-11-20 17:31:51 +01:00
Martin Kroeker
19ccef5fb1
Add generic MIPS32 target
2021-11-20 17:31:11 +01:00
Bine Brank
e6ed4be02e
symm SVE copy routines
2021-11-20 16:35:29 +01:00
Caroline Newcombe
feeb8283a5
Fix unsafe read during final iteration of zsymv_L_sse2.S
2021-11-19 14:29:32 -06:00
Martin Kroeker
ec4daf420f
Merge pull request #3451 from wjc404/optimize-A53-dgemm
MOD: optimize DGEMM of large matrices on cortex A53 & A55
2021-11-18 18:17:27 +01:00
Jia-Chen
302f22693a
MOD: optimize normal DGEMM on ARMV8 cortex-A53 & cortex-A55
2021-11-18 21:14:43 +08:00
Martin Kroeker
7b825531a6
Merge pull request #3450 from mmuetzel/suffix-nofortran
cmake: Set SUFFIX64 also for NOFORTRAN
2021-11-16 23:58:09 +01:00
Markus Mützel
de2ed66596
cmake: Set SUFFIX64 also for NOFORTRAN
2021-11-15 08:53:52 +01:00
Bine Brank
3c7eed0e53
add remaining trmm copy routines for SVE
2021-11-14 16:00:10 +01:00
Martin Kroeker
8f6c8d1a9e
Merge pull request #3449 from martin-frbg/mips_msa
Fix MIPS/MIPS64 compilation querying compiler rather than cpu for MSA capability
2021-11-14 12:01:53 +01:00
Martin Kroeker
46947efb83
Ignore compiler support for MIPS MSA if the cpu lacks this capability
2021-11-13 23:32:26 +01:00
Martin Kroeker
a569fa1540
MIPS P5600 and 24KC, 1004K cpus do not support MSA
2021-11-13 23:26:48 +01:00
Martin Kroeker
d6194d6a0c
get MSA capability from feature flags
2021-11-13 23:25:34 +01:00
Bine Brank
7d996b1c36
dtrmm_utcopy sve function
2021-11-13 18:48:53 +01:00
Martin Kroeker
5b7a3c0e1b
Merge pull request #3447 from martin-frbg/issue3446
Fix potentially wrong HOSTARCH definition in cross-compilation
2021-11-11 09:29:36 +01:00
Martin Kroeker
9cc0098ce2
Fix potentially wrong HOSTARCH definition in cross-compilation
2021-11-10 22:27:14 +01:00
Bine Brank
ab7917910d
add v2x8 kernel + fix sve dtrmm
2021-11-07 20:37:51 +01:00
Martin Kroeker
2d7ca63e21
Merge pull request #3443 from martin-frbg/issue3441
Fix NULL pointer checks in blas_memory_alloc
2021-11-05 12:23:47 +01:00
Martin Kroeker
4f057bffd6
Fix NULL pointer checks in blas_memory_alloc
2021-11-05 10:43:17 +01:00
Martin Kroeker
2c32e462ac
Merge pull request #3431 from MehdiChinoune/export-shared-only
Fix exported OpenBLASTargets.cmake
2021-11-04 23:48:02 +01:00
Martin Kroeker
7bcd64357d
Merge pull request #3442 from martin-frbg/cpuid_x86
Add CPUID recognition of Intel Alder Lake
2021-11-04 23:47:11 +01:00
Martin Kroeker
08f8bb66c0
Add CPUIDs for Alder Lake and other recent Intel cpus
2021-11-04 20:36:39 +01:00
Martin Kroeker
faae86fba2
Add CPUIDs for Alder Lake and some other recent Intel cpus
2021-11-04 20:35:41 +01:00
Martin Kroeker
4ea9a14567
Merge pull request #3429 from martin-frbg/issue3428
Adjust compiler options for nvc after 21.9 (and fix typo in DYNAMIC_ARCH settings)
2021-11-04 12:13:22 +01:00
Martin Kroeker
3737766bdd
Merge pull request #3440 from mhillenbrand/fix_gemv_indices
Fix flipped indices in benchmark for gemv
2021-11-04 12:11:50 +01:00
Martin Kroeker
efb16fafb0
Fix miscounting of threadpool size on Linux with OMP_PROC_BIND=TRUE (#3437)
* return OMP places (if available, or SC_NPROCESSORS_CONF) for maximum thread count when built with OpenMP
2021-11-04 12:11:16 +01:00
Marius Hillenbrand
f119e26354
Fix flipped indices in benchmark for gemv
Fixes #3439
2021-11-03 12:45:09 +01:00
Bine Brank
7093372e32
add ARMV8SVE target
2021-11-01 22:53:21 +01:00
Martin Kroeker
abf45f7235
Merge pull request #3427 from mhillenbrand/zarch-detection-notes
cpuid_zarch/hwcaps: add documentation and dump hwcaps in init
2021-11-01 21:45:33 +01:00
Martin Kroeker
2847a24498
Merge pull request #3434 from gxw-loongson/develop
Add cblas_{c/z}srot cblas_{c/z}rotg support
2021-11-01 21:44:49 +01:00
gxw
25f99fa9f8
Add cblas_{c/z}srot cblas_{c/z}rotg support
2021-11-01 20:19:13 +08:00
Bine Brank
a8fbdbac34
fix sve dgemm kernel + sve dtrmm
2021-10-31 10:24:25 +01:00
Martin Kroeker
a6fd497820
Fix nvidia HPC version checks
2021-10-30 17:31:19 +02:00
Bine Brank
746b4f0f17
added SVE ncopy and tcopy
2021-10-30 12:11:44 +02:00
Mehdi Chinoune
9874cd11cb
Fix exported OpenBLASTargets.cmake
When both BUILD_SHARED_LIBS and BUILD_STATIC_LIBS are enabled,
cmake exports both of them to OpenBLASTargets under the same name `OpenBLAS::OpenBLAS`,
which leads to a fatal error about OpenBLAS::OpenBLAS being both a static and a shared target.
This change makes cmake export only the shared library in that case.
Another solution would be to treat them as components,
but I am afraid that would break backward compatibility.
2021-10-30 04:37:41 +01:00
Martin Kroeker
bb01e26cfe
Adjust compiler options for nvidia hpc 21.9 (and fix a long-standing typo in dynamic_arch settings)
2021-10-29 16:39:03 +02:00