Martin Kroeker
eea0c0f2ed
Merge pull request #3085 from alexhenrie/memory_alloc
Fix null pointer check in blas_memory_alloc
2021-01-26 20:11:42 +01:00
Martin Kroeker
85be43e0df
Merge pull request #3083 from martin-frbg/develop
Add DYNAMIC_LIST support for ARM64
2021-01-26 15:13:35 +01:00
Martin Kroeker
0cb9e9fc8d
Remove the VORTEX support bits again for now
2021-01-25 19:02:21 +01:00
Martin Kroeker
cb61d3b46b
Add DYNAMIC_LIST support for ARM64
2021-01-25 13:13:20 +01:00
Alex Henrie
113840da12
Fix null pointer check in blas_memory_alloc
2021-01-24 22:20:44 -07:00
Martin Kroeker
deb2e66bcc
Add DYNAMIC_LIST support for ARM64
2021-01-24 23:18:52 +01:00
Martin Kroeker
9b2d69aa80
Add DYNAMIC_LIST option for ARM64
2021-01-24 23:18:01 +01:00
Martin Kroeker
e3ff4cdd23
Merge pull request #9 from xianyi/develop
rebase
2021-01-24 23:14:45 +01:00
Martin Kroeker
0745ba43a4
Merge pull request #3082 from RajalakshmiSR/scalp10
Optimize s/dscal function for POWER10
2021-01-24 19:03:40 +01:00
Rajalakshmi Srinivasaraghavan
3ede843d50
Optimize s/dscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-01-24 07:48:28 -06:00
xoviat
2e8d6e8690
add functions for embedded
2021-01-23 22:12:17 -06:00
Martin Kroeker
69a5558203
Merge pull request #3059 from Guobing-Chen/BF16_gemm
Initial code for Cooperlake BF16 GEMM kernel
2021-01-23 19:08:05 +01:00
Martin Kroeker
d6905403e3
Merge pull request #3068 from alexhenrie/scan-build
scan-build fixes
2021-01-23 19:06:29 +01:00
Martin Kroeker
411926b572
Merge pull request #3079 from RajalakshmiSR/rotp10
Optimize s/drot function for POWER10
2021-01-22 08:26:00 +01:00
Rajalakshmi Srinivasaraghavan
439b93f6d2
Optimize s/drot function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-01-21 13:24:45 -06:00
Martin Kroeker
d6cf67778c
Merge pull request #3075 from martin-frbg/issue3074
Fix DYNAMIC_ARCH compilation on POWER with gcc <11
2021-01-21 08:51:30 +01:00
Martin Kroeker
b94dab5250
patch to support power10 in builtin_cpu_is was backported to gcc 10.2, so allow that as well
2021-01-20 21:34:36 +01:00
Martin Kroeker
6178974cd9
Update .drone.yml
2021-01-20 20:21:27 +01:00
Martin Kroeker
0b9e4d1278
Add gcc10/arm64 DYNAMIC_ARCH build
2021-01-20 18:30:05 +01:00
Martin Kroeker
63fa3c3f8f
Require gcc 11 for builtin_cpu_is(power10)
fixes #3074
2021-01-20 15:41:04 +01:00
Martin Kroeker
3612d9a57a
Merge pull request #8 from xianyi/develop
rebase
2021-01-20 15:38:30 +01:00
xoviat
b60de4447a
add cortex-m platform
2021-01-19 08:57:44 -06:00
Martin Kroeker
16dddb760e
Merge pull request #3070 from RajalakshmiSR/cdot
Optimize cdot function for POWER10
2021-01-16 15:47:34 +01:00
Rajalakshmi Srinivasaraghavan
eff7c9166e
Optimize cdot function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-01-15 13:40:34 -06:00
Alex Henrie
f1bf2603e6
Remove dead assignment to dflag in rotmg functions
2021-01-14 19:40:32 -07:00
Alex Henrie
6f32991eae
Don't define the mode variable when not needed in gemm functions
2021-01-14 19:40:31 -07:00
Alex Henrie
202fc9e8ed
Fix uninitialized argument value in dasum_k
2021-01-14 19:40:31 -07:00
Martin Kroeker
e378b24487
Merge pull request #3067 from albertziegenhagel/fix-generic-cmake
Fix building "generic" TRMM kernel with CMake
2021-01-14 21:35:19 +01:00
Martin Kroeker
3628b22d49
Merge pull request #3064 from martin-frbg/issue3063
Add cblas_crotg, cblas_zrotg, cblas_csrot and cblas_zdrot
2021-01-14 16:47:59 +01:00
Martin Kroeker
af2b0d0205
Merge pull request #3066 from martin-frbg/buffsizefix
Fix compile-time setting of the GEMM buffer size for gmake builds
2021-01-14 16:00:38 +01:00
Martin Kroeker
4bf988959a
Merge pull request #3062 from austinpagan/GemmPreferedSize3
Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWE…
2021-01-14 15:59:53 +01:00
Martin Kroeker
a0e4fb3a28
Merge pull request #3061 from martin-frbg/arm64-pgi
Support NVIDIA HPC SDK on ARM64
2021-01-14 15:59:21 +01:00
Martin Kroeker
2c445be8ba
Merge pull request #3051 from martin-frbg/rocketlake
Add CPUID information for Intel Rocket Lake
2021-01-14 15:56:25 +01:00
Albert Ziegenhagel
e3f4063683
Fix building "generic" TRMM kernel with CMake
The CMake "TARGET_CORE" variable stores the "generic" target name in all lowercase letters, but it gets compared to an all-uppercase string, which results in the wrong TRMM kernel being selected.
This commit converts TARGET_CORE to all uppercase before comparing its value, so that case mismatches can no longer cause this problem.
2021-01-14 10:00:49 +01:00
Martin Kroeker
6bbe6d5b92
Make compile-time BUFFERSIZE setting actually reach the compiler/preprocessor
2021-01-13 22:36:04 +01:00
Martin Kroeker
89ae305e11
Workaround for cmake having its own C_COMPILER variable
2021-01-13 12:30:26 +01:00
Martin Kroeker
da8d7f09f1
try to work around gcc update problems
2021-01-13 09:46:53 +01:00
Martin Kroeker
25c986db5a
Add prototypes for CBLAS_CROTG and CBLAS_ZROTG
2021-01-13 00:30:27 +01:00
Martin Kroeker
a8f249458d
Build CBLAS interfaces for CROTG and ZROTG as well
2021-01-13 00:29:38 +01:00
Martin Kroeker
bc5b35367f
restore Makefile after accidental overwrite
2021-01-13 00:28:43 +01:00
Martin Kroeker
930aff2c2e
Build CBLAS interfaces for CROTG and ZROTG as well
2021-01-13 00:27:42 +01:00
Martin Kroeker
ac3e2a3fdd
Add CBLAS interfaces for csrot and zdrot
2021-01-12 23:22:00 +01:00
Martin Kroeker
9ccb12b031
Add prototypes for cblas_csrot and cblas_zdrot
2021-01-12 23:20:07 +01:00
Martin Kroeker
e18a2c22db
Merge pull request #3060 from martin-frbg/dyn_arm64
Label the assembly part of the ARMV8 dynamic arch detection as volatile
2021-01-12 23:02:05 +01:00
Martin Kroeker
b716c0ef01
Add workaround for NVIDIA HPC
2021-01-12 16:51:35 +01:00
Martin Kroeker
2efa3b70dc
Add workaround for NVIDIA HPC
2021-01-12 16:49:39 +01:00
Martin Kroeker
49959d4f1c
Add workaround for NVIDIA HPC
2021-01-12 16:47:15 +01:00
Martin Kroeker
0f27a03607
Add workaround for NVIDIA HPC mishandling of the asm DOT kernels
2021-01-12 16:39:35 +01:00
Martin Kroeker
c2a8ebfe69
Add workaround for NVIDIA HPC mishandling of the asm DOT kernels
2021-01-12 16:38:51 +01:00
Martin Kroeker
43aac5bacc
Support NVIDIA HPC compiler
2021-01-12 16:36:12 +01:00