244 Commits

Author SHA1 Message Date
Martin Kroeker 8e6d93359d
Merge pull request #4196 from TiborGY/obsolete_inlines
Modernize obsolete inline order
2023-09-03 14:12:42 +02:00
Ian McInerney 79c15db348 Fix power10 gcc intrinsic check
__builtin_vsx_assemble_pair was only in GCC 10-11.2 and was replaced by
__builtin_vsx_build_pair thereafter.
2023-08-17 15:05:29 +01:00
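The version check described in this commit can be pictured as a small lookup. The following is only a sketch under stated assumptions: the helper name vsx_pair_builtin_name is hypothetical and not part of OpenBLAS, the version boundaries are taken directly from the commit message above, and early GCC 10 releases may still use the older __builtin_mma_ names mentioned in the rename commit further down this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of OpenBLAS): returns the vector-pair
   builtin name that the commit message associates with a GCC version,
   or NULL when the version predates both names. */
static const char *vsx_pair_builtin_name(int major, int minor)
{
    assert(major >= 0 && minor >= 0);
    if (major < 10)
        return NULL;                           /* assumed: no pair builtin */
    if (major == 10 || (major == 11 && minor <= 2))
        return "__builtin_vsx_assemble_pair";  /* GCC 10 through 11.2 */
    return "__builtin_vsx_build_pair";         /* later releases */
}
```

In a real kernel the same decision would more likely be made at compile time with the predefined __GNUC__ and __GNUC_MINOR__ macros; the runtime form is used here only so the boundaries are easy to read and check.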
TGY b5ba95a6c0 Modernize obsolete inline order 2023-08-16 00:48:40 +02:00
Martin Kroeker 54d3246fc6
Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:55:17 +02:00
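As an illustration of what accepting a negative increment means, here is a minimal sketch. It assumes the reference BLAS convention that a negative increment walks the vector from its far end, starting at offset (1-n)*incx. The function strided_abs_sum is a hypothetical stand-in; the routine actually touched by this commit is not named in the log.

```c
#include <assert.h>

/* Hypothetical strided sum of absolute values that accepts negative incx.
   Assumption: with incx < 0 the walk starts at offset (1 - n) * incx,
   following the reference BLAS indexing convention. */
static double strided_abs_sum(long n, const double *x, long incx)
{
    if (n <= 0 || incx == 0)
        return 0.0;
    assert(x != 0);
    long ix = (incx > 0) ? 0 : (1 - n) * incx;  /* far end when incx < 0 */
    double sum = 0.0;
    for (long i = 0; i < n; i++) {
        double v = x[ix];
        sum += (v < 0.0) ? -v : v;
        ix += incx;
    }
    return sum;
}
```

Because a sum does not depend on traversal order, the positive and negative increments give the same result here; what changes is that a negative increment is no longer rejected.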
Manjul Mohan 58b88aa5f0 POWER10: Fix compiler warnings
This patch removes the warning messages related to unused variables in
sbgemm_kernel_power10.c.

Signed-off-by: Manjul Mohan <manjul@linux.vnet.ibm.com>
2023-06-12 01:08:59 -04:00
Martin Kroeker 1688c7da43
change line endings from CRLF to LF 2022-11-16 22:24:01 +01:00
Martin Kroeker 6c118b7977
Fix DNRM2 returning INF instead of zero due to intermediate overflow 2022-07-24 17:42:31 +02:00
Martin Kroeker c43ec53bdd
Merge pull request #3690 from RajalakshmiSR/cdotp10
POWER: Fix complex dot function failures
2022-07-19 13:59:16 +02:00
Rajalakshmi Srinivasaraghavan a612e78a97 POWER: Fix complex dot function failures
The complex dot functions fail some tests when compiled with gcc12: the
machine constraints currently used do not update all four elements of the
expected result array. This patch works around the problem by reducing the
optimization level. Performance numbers are unchanged; the code will be converted to C in the future.
2022-07-18 14:48:43 -05:00
Rajalakshmi Srinivasaraghavan 432fd99445 POWER10: dgemv builtin rename
Add check to use correct builtin name for older versions
of gcc10 compilers.
2022-07-18 09:48:01 -05:00
VFerrari cac634fce3
POWER10: Fix multithreading check when USE_THREAD=0
This patch fixes an issue when OpenBLAS is compiled for TARGET=POWER10
and the flag USE_THREAD is set to 0.

The function `num_cpu_avail` is only available when USE_THREAD=1,
that is, when SMP is defined.
2022-06-25 03:46:46 -03:00
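The kind of guard this fix implies might look like the sketch below. The wrapper kernel_thread_count is hypothetical, the extern declaration is only a stand-in for what the OpenBLAS headers provide, and the argument value 1 is an assumption; the actual patch may be structured differently.

```c
#include <assert.h>

#if defined(SMP)
/* Stand-in declaration; in OpenBLAS the function comes from its own headers. */
extern int num_cpu_avail(int level);
#endif

/* Hypothetical wrapper: number of threads a kernel may split work across. */
static int kernel_thread_count(void)
{
#if defined(SMP)
    int n = num_cpu_avail(1);   /* argument value is an assumption */
    assert(n >= 1);
    return n;
#else
    return 1;   /* USE_THREAD=0: never reference num_cpu_avail */
#endif
}
```

Keeping the call inside the SMP branch means a single-threaded build never references the symbol, so it cannot fail to compile or link.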
Martin Kroeker 9283c7c0b5
Merge pull request #3655 from RajalakshmiSR/zgemmasmp10
POWER10: Fix ZGEMM testcase failures
2022-06-18 20:52:26 +02:00
Rajalakshmi Srinivasaraghavan f191bc652b POWER10: Fix ZGEMM testcase failures
This patch fixes storing and restoring non volatile registers
in zgemm POWER10 kernel.
2022-06-17 08:18:08 -05:00
Rajalakshmi Srinivasaraghavan 8419d538ff POWER10: convert dgemv inline assembly
This patch makes use of compiler builtins and matches with assembly
performance. Tested with clang14 and gcc12.
2022-06-09 10:42:57 -05:00
Rajalakshmi Srinivasaraghavan b62173c5a0 POWER10: Changing store instructions for Level1 functions
This patch replaces each 32-byte store with two 16-byte stores
to fix a recent performance degradation caused by 32-byte stores.
2022-05-12 11:17:33 -05:00
Martin Kroeker 05dcfa176e
fix undefined prefetchsizes 2022-04-16 10:04:27 +02:00
Martin Kroeker 2bbb9f05c7
fix undefined prefetchsize 2022-04-16 10:00:10 +02:00
Rafael Cardoso Fernandes Sousa c78fdcc80d [POWER] Add support for SMALL_MATRIX_OPT 2021-11-28 12:41:16 -06:00
kavanabhat 9cc95e5657 AIX changes for P10 with GNU Compiler 2021-10-01 05:18:35 -05:00
kavanabhat fe3c778c51 AIX changes for P10 with GNU Compiler 2021-09-30 06:06:27 -05:00
Rafael Cardoso Fernandes Sousa b751edf624 Fix unused variable warnings on Power 2021-09-15 13:36:07 -05:00
Rajalakshmi Srinivasaraghavan b06880c2cd POWER10: Improving dasum performance
Unrolling a loop in dasum micro code to help in improving
POWER10 performance.
2021-08-10 22:06:04 -05:00
Martin Kroeker c4b464cac6
Merge pull request #3273 from austinpagan/sbgemm_gcc10_fix
Power10: Fix for SBGEMM
2021-06-15 22:58:48 +02:00
Gordon Fossum e6dd44d989 Power10: Fix for SBGEMM
While testing the bfloat16 sbgemm kernel, some odd-sized inputs failed because the result was
updated for additional bytes.
2021-06-15 13:07:47 -05:00
Martin Kroeker 2e8ff4a781
Merge pull request #3266 from martin-frbg/powerparam
Remove spurious casts from PPC parameters and fix compilation for older targets
2021-06-10 18:05:47 +02:00
Martin Kroeker efdbdd8f82
Add prefetch values for power3 2021-06-10 11:20:29 +02:00
Martin Kroeker 3906ef3b0f
Add prefetch values for power3 2021-06-10 11:19:40 +02:00
Martin Kroeker 8adf0971d8
Add prefetch values for power3 2021-06-10 11:18:22 +02:00
Martin Kroeker 08e2e60762
Add prefetch values for power3 2021-06-10 11:17:33 +02:00
Martin Kroeker fb9e678235
Fix caxpy/zaxpy for big-endian 2021-06-10 11:15:48 +02:00
Martin Kroeker dc4fcb48df
Fix inverted conditional for caxpy/zaxpy 2021-06-10 11:14:03 +02:00
Martin Kroeker 7a48247761
fix c/zrot and sgemv for POWER5 2021-06-10 11:11:56 +02:00
Rajalakshmi Srinivasaraghavan cbb70438df POWER10: Fixes for sbgemm kernel
While testing bfloat16 sbgemm kernel, there are some failures
for odd value inputs due to array access beyond the boundary.
2021-06-09 12:20:09 -05:00
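One common way to avoid reading past the end of an array when elements are consumed two at a time is to zero-pad the final pair. The sketch below shows that idea only; load_pair and the bf16_bits typedef are hypothetical names, and the log does not say how the actual kernel fix is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t bf16_bits;   /* raw bfloat16 bit pattern (illustrative) */

/* Hypothetical helper: fetch elements i and i+1 of a length-k array.
   When k is odd and i is the last index, the second slot is filled with
   zero bits (bfloat16 +0.0) instead of reading a[k]. */
static void load_pair(const bf16_bits *a, size_t k, size_t i, bf16_bits out[2])
{
    assert(i < k);
    out[0] = a[i];
    out[1] = (i + 1 < k) ? a[i + 1] : 0;
}
```

A zero in the padded slot contributes nothing to a multiply-accumulate, so the result for odd lengths matches the unpadded computation.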
Rajalakshmi Srinivasaraghavan 2379abaa5e POWER10: Improve dgemm performance
This patch uses vector pair pointer for input load operation
which helps to generate power10 lxvp instructions.
2021-04-13 22:30:06 -05:00
Rajalakshmi Srinivasaraghavan 55bb9f639a POWER10: Optimized zgemv
This patch makes use of Matrix-Multiply Assist (MMA)
feature introduced in POWER ISA v3.1 for zgemv_n and zgemv_t.
2021-04-10 19:00:24 -05:00
Rajalakshmi Srinivasaraghavan 2dbcddd83d POWER10: Adding check for little endian
This patch makes sure that recent POWER10 patches are used
only for little endian.
2021-03-31 21:32:42 -05:00
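A little-endian guard of this kind is usually written with the compiler's predefined byte-order macros. The sketch below assumes GCC or Clang, which define __BYTE_ORDER__ and __ORDER_LITTLE_ENDIAN__; the macro SKETCH_LITTLE_ENDIAN and both functions are hypothetical names, not the ones used in OpenBLAS.

```c
#include <assert.h>
#include <string.h>

/* Compile-time guard: 1 only when the target is little endian. */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#define SKETCH_LITTLE_ENDIAN 1
#else
#define SKETCH_LITTLE_ENDIAN 0
#endif

/* Runtime cross-check: inspect the first byte of the integer 1. */
static int is_little_endian_runtime(void)
{
    unsigned int one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}

/* Debug sanity check that the compile-time guard matches the machine. */
static void check_endianness_guard(void)
{
    assert(is_little_endian_runtime() == SKETCH_LITTLE_ENDIAN);
}
```

An optimized code path would then be wrapped in #if SKETCH_LITTLE_ENDIAN so that big-endian builds fall back to the generic implementation.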
Martin Kroeker 86c5a0013f
Add workaround for LAPACK testsuite failures with the NVIDIA HPC compiler 2021-03-19 11:47:58 +01:00
Martin Kroeker ef85c22474
Add workaround for LAPACK test failures with the NVIDIA HPC compiler 2021-03-19 11:46:25 +01:00
Martin Kroeker d3555d2e50
Add workaround for LAPACK test failures with the NVIDIA HPC compiler 2021-03-19 11:44:31 +01:00
Rajalakshmi Srinivasaraghavan 09d47af2c0 Optimize zscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-03-10 17:15:33 -06:00
Rajalakshmi Srinivasaraghavan 41646ed006 Optimize s/dasum function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-03-05 16:22:36 -06:00
Rajalakshmi Srinivasaraghavan 0571c3187b POWER10: Rename mma builtins
The LLVM and GCC teams agreed to rename the __builtin_mma_assemble_pair and
__builtin_mma_disassemble_pair built-ins to __builtin_vsx_assemble_pair and
__builtin_vsx_disassemble_pair respectively. This patch is to make
corresponding changes in dgemm kernel. Also made changes in
inputs to those builtins to avoid some potential typecasting issues.

Reference gcc commit id:77ef995c1fbcab76a2a69b9f4700bcfd005d8e62
2021-02-26 20:56:34 -06:00
Rajalakshmi Srinivasaraghavan 2056ffc227 Optimize cscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-01-29 13:51:43 -06:00
Rajalakshmi Srinivasaraghavan 3ede843d50 Optimize s/dscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-01-24 07:48:28 -06:00
Rajalakshmi Srinivasaraghavan 439b93f6d2 Optimize s/drot function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-01-21 13:24:45 -06:00
Rajalakshmi Srinivasaraghavan eff7c9166e Optimize cdot function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-01-15 13:40:34 -06:00
Rajalakshmi Srinivasaraghavan 601b711c78 Optimize swap function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-01-08 08:01:36 -06:00
Rajalakshmi Srinivasaraghavan 2fb11f873b POWER10: Improve copy performance
This patch aligns the stores to 32 byte boundary for scopy and dcopy
before entering into vector pair loop. For ccopy, changed the store
instructions to stxv to improve performance of unaligned cases.
2020-12-13 10:41:45 -06:00
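The alignment step described here follows a familiar pattern: handle a few leading elements one at a time until the destination reaches a 32-byte boundary, then move 32 bytes per iteration, then finish the tail. The portable sketch below illustrates only that pattern. The names head_count and dcopy_aligned_sketch are hypothetical, the buffers are assumed not to overlap, and memcpy stands in for the POWER10 vector pair stores used by the real kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: how many leading doubles to copy singly so that
   dst + result lies on a 32-byte boundary (capped at n). */
static size_t head_count(const double *dst, size_t n)
{
    assert(((uintptr_t)dst % sizeof(double)) == 0);
    uintptr_t mis = (uintptr_t)dst & 31u;       /* bytes past the boundary */
    size_t head = mis ? (size_t)(32u - mis) / sizeof(double) : 0;
    return head < n ? head : n;
}

/* Sketch of an aligned copy; src and dst must not overlap. */
static void dcopy_aligned_sketch(size_t n, const double *src, double *dst)
{
    size_t i = 0;
    size_t head = head_count(dst, n);

    for (; i < head; i++)            /* scalar head: reach the boundary */
        dst[i] = src[i];
    for (; i + 4 <= n; i += 4)       /* main loop: 4 doubles = 32 bytes */
        memcpy(dst + i, src + i, 32);
    for (; i < n; i++)               /* scalar tail */
        dst[i] = src[i];
}
```

Only the destination is aligned in this sketch, matching the commit's focus on stores; loads from src may still be unaligned.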
Martin Kroeker 043128cbe5
Merge pull request #3029 from RajalakshmiSR/axpyp10
POWER10: Improve axpy performance
2020-12-10 22:49:28 +01:00
Rajalakshmi Srinivasaraghavan 346e30a46a POWER10: Improve axpy performance
This patch aligns the stores to 32 byte boundary for saxpy and daxpy
before entering into vector pair loop. For caxpy, changed the store
instructions to stxv to improve performance of unaligned cases.
2020-12-10 11:51:42 -06:00