Commit Graph

4864 Commits

Author SHA1 Message Date
Martin Kroeker
8a2a137a9e Correct argument to SLASET (Improves fix from PR2778)
As explained by serguei-patchkovskii in Reference-LAPACK/lapack#438 (comment), passing in an index of 1 instead of N leads to a standards violation when accessing matrix A in SLASET, i.e. undefined behavior.
2020-09-05 13:06:31 +02:00
Martin Kroeker
0d1f30a297 Merge pull request #81 from xianyi/develop
rebase
2020-09-05 12:47:03 +02:00
Martin Kroeker
70a254d507 Merge pull request #2822 from martin-frbg/issue2821
Fix potential domain error in sqrt
2020-09-05 12:39:32 +02:00
Martin Kroeker
330044d821 Fix potential domain error in sqrt 2020-09-05 09:44:33 +02:00
Martin Kroeker
97636b2c8a Merge pull request #2819 from h-vetinari/carry_lapack_437
Carry lapack#437
2020-09-04 23:50:43 +02:00
Martin Kroeker
4d36711547 Merge pull request #2820 from RajalakshmiSR/clang
POWER9: Fix mcpu option with clang
2020-09-04 23:09:31 +02:00
Rajalakshmi Srinivasaraghavan
718f67421a POWER9: Fix mcpu option with clang
Adding check for compiler type before checking GCC version in Makefile.
This allows clang to use power9 instead of power8 when CORE is POWER9.
2020-09-04 10:36:19 -05:00
H. Vetinari
3426519ae2 adapt ?ggsv?-functions to ambient code style in LAPACKE/include/lapack.h 2020-09-04 17:33:24 +02:00
H. Vetinari
1c6c71fa85 Follow-up to lapack#434 & lapack#409: add missing 'const' in signatures
Based on how the surrounding functions in lapack.h are handling the
parameters, particularly the ?ggsv?3-variants of the affected functions
2020-09-04 17:33:11 +02:00
H. Vetinari
860247b5da Follow-up to lapack#434 & lapack#409: fix signature mismatches 2020-09-04 17:32:53 +02:00
Martin Kroeker
c61771e335 Merge pull request #2778 from martin-frbg/lapackeig
Fix various wrong calls to SLASET/DLASET in the EIG part of the LAPACK testsuite
2020-09-04 10:06:02 +02:00
Martin Kroeker
c7ef7174e4 Merge pull request #2817 from martin-frbg/lapack436
LAPACKE: fix declaration of work arrays in [cz]gesvdq
2020-09-03 17:10:23 +02:00
Martin Kroeker
c31b72965e Fix data type of work array in zgesvdq prototype 2020-09-02 23:44:44 +02:00
Martin Kroeker
0ce2aa3163 Fix data type of rwork array 2020-09-02 23:41:51 +02:00
Martin Kroeker
9d1ea75aa0 Merge pull request #80 from xianyi/develop
rebase
2020-09-02 22:16:41 +02:00
Martin Kroeker
776d005f4c Merge pull request #2815 from mhillenibm/clang_s390x
Fix build with clang on s390x
2020-09-02 16:56:01 +02:00
Marius Hillenbrand
2ee5b899ce s390x: enable S/DGEMM block with explicit loop unrolling + interleaving with clang
The code for SGEMM 16x4 and DGEMM 8x4 blocks on z14 and z15 uses
explicit unrolling and interleaving to improve performance. The code
employs an empty inline asm statement with operands that constrain the
compiler's instruction scheduling and thereby enforce proper overlapping
of load and compute phases. Fix an ifdef to apply that for clang builds,
as well.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-02 13:49:31 +02:00
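The scheduling trick described in this commit message can be illustrated with a minimal sketch. This is not the OpenBLAS kernel code: the macro name `SCHED_FENCE2` and the function `sum_pairs` are hypothetical, and the example assumes a compiler that supports GCC-style extended inline asm (gcc or clang).

```c
#include <stddef.h>

/* Empty asm statement (GCC/Clang extension): it emits no instructions, but
   the "+r" operands tell the compiler that a and b are read and rewritten
   here. Both values must therefore be loaded before this point, and any
   use of them has to come after it. Hypothetical helper for illustration. */
#define SCHED_FENCE2(a, b) __asm__("" : "+r"(a), "+r"(b))

/* Sum an array two elements at a time, keeping the load phase of each
   iteration ahead of its compute phase. */
long sum_pairs(const long *x, size_t n)
{
    long acc0 = 0, acc1 = 0;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        long a = x[i];       /* load phase */
        long b = x[i + 1];
        SCHED_FENCE2(a, b);  /* loads above, arithmetic below */
        acc0 += a;           /* compute phase */
        acc1 += b;
    }
    if (i < n)
        acc0 += x[i];        /* odd element left over */
    return acc0 + acc1;
}
```

The fence only constrains ordering around that one point; whether it improves performance depends on the target and has to be measured.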
Marius Hillenbrand
095f4e6964 s390x: allow clang to emit fused multiply-adds (replicates gcc's default behavior)
gcc's default setting for floating-point expression contraction is
"fast", which allows the compiler to emit fused multiply adds instead of
separate multiplies and adds (amongst others). Fused multiply-adds,
which assembly kernels typically apply, also bring a significant
performance advantage to the C implementation for matrix-matrix
multiplication on s390x. To enable that performance advantage for builds
with clang, add -ffp-contract=fast to the compiler options.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-02 13:49:31 +02:00
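As a rough illustration of what this option affects, consider a plain C inner product. The function name `dot` is made up for this sketch and is not taken from the OpenBLAS sources. When built with -ffp-contract=fast, the compiler is allowed (but not required) to turn the multiply and the add in the loop body into one fused multiply-add on targets that have such an instruction.

```c
#include <stddef.h>

/* Plain C inner product. The expression acc + x[i] * y[i] is a candidate
   for contraction into a single fused multiply-add when the compiler is
   invoked with -ffp-contract=fast. */
double dot(const double *x, const double *y, size_t n)
{
    double acc = 0.0;

    for (size_t i = 0; i < n; i++)
        acc += x[i] * y[i];
    return acc;
}
```

A fused multiply-add rounds once instead of twice, so results can differ in the last bits between a contracted and a non-contracted build.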
Marius Hillenbrand
87e5bbd887 s390x: avoid variable-length arrays in struct for asm operands
... since it is not required and clang does not support that gcc
extension. Instead, use a variable-length array directly for these
operands.

Note that, while the actual inline assembly code does not directly use
these memory operands, they serve to inform the compiler that it cannot
reorder reads or writes to/from the input and output data across the
inline asm statements.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-02 13:49:31 +02:00
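A minimal sketch of the operand style this commit message describes, assuming gcc or clang with C99 variable-length array support. The function `scale_in_place` is hypothetical and the asm body is left empty on purpose; only the memory operand matters here.

```c
#include <stddef.h>

/* Scale y[0..n-1] in place, then use an empty asm statement to tell the
   compiler that the whole array may have been read and written. */
void scale_in_place(double *y, size_t n, double alpha)
{
    if (n == 0)
        return; /* a zero-length array type would be invalid */

    for (size_t i = 0; i < n; i++)
        y[i] *= alpha;

    /* "+m" operand covering y[0..n-1]. The cast goes through a pointer to
       a variable-length array rather than through a struct that contains
       one, so it does not rely on the gcc-only struct extension. */
    __asm__("" : "+m"(*(double (*)[n])y));
}
```

Because the operand names the full extent of the array, the compiler may not move reads or writes of `y` across the asm statement, even though the asm text itself never touches memory.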
Marius Hillenbrand
b9b3265ec8 s390x: avoid inline assembly for vector loads for clang
... since clang does not support the instruction format for inline
assembly and also it is not required for current versions of clang.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-02 13:49:30 +02:00
Marius Hillenbrand
a1616a0b86 s390x: replace nop with "nop 0" in inline assembly
... as a bandaid for building with clang until LLVM's internal assembler
supports nops without operand.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-02 13:49:30 +02:00
Marius Hillenbrand
60ef193258 s390x: use "lghi" for immediate values to fix build with clang
Some of the kernels written in assembly utilize a "load address"
instruction for loading an immediate value into a register. That is
unnecessarily complex, and LLVM's assembler does not understand that
specific syntax. Thus, replace it with the appropriate "load immediate"
instruction, which is also clearer to read.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-02 13:49:30 +02:00
Martin Kroeker
18bfb6d6f7 Merge pull request #2813 from martin-frbg/issue2804-2
Fix for c_check misinterpreting arm64 in uname -m output as armv7
2020-09-01 23:39:46 +02:00
Martin Kroeker
e4900caa11 Fix c_check misinterpreting arm64 in uname output to mean armv7
Additional fix for upcoming OSX on ARM64 related to #2804, as suggested by fxcoudert in #2805
2020-09-01 19:54:08 +02:00
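The kind of misdetection named in this commit can be sketched in C as follows. This is not the c_check logic itself; `classify_arch` and its return strings are hypothetical. The point is only that the more specific 64-bit names have to be tested before the generic "arm" substring.

```c
#include <string.h>

/* Classify a machine string such as the output of `uname -m`.
   Hypothetical helper for illustration only. */
const char *classify_arch(const char *machine)
{
    /* Check the 64-bit names first: "arm64" also contains "arm". */
    if (strstr(machine, "arm64") != NULL || strstr(machine, "aarch64") != NULL)
        return "arm64";
    if (strstr(machine, "arm") != NULL)
        return "arm";
    return "unknown";
}
```

With the two checks in the opposite order, a host reporting "arm64" would fall into the 32-bit branch.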
Martin Kroeker
68b1713c30 Merge pull request #2811 from martin-frbg/issue2806
Make NO_AVX512 option override the AVX512 compile test in CMAKE builds as well
2020-09-01 17:19:14 +02:00
Martin Kroeker
4074770d00 Merge pull request #2797 from martin-frbg/relafixes1
ReLAPACK fixes
2020-09-01 16:04:03 +02:00
Martin Kroeker
b87a77da02 Merge pull request #79 from xianyi/develop
rebase
2020-09-01 12:03:53 +02:00
Martin Kroeker
f42e84d46c Fix misnaming of LAPACK_?ggsvp function prototypes as LAPACKE_ (#2808)
* Fix misnaming of LAPACK_?ggsvp and ?ggsvd function prototypes as LAPACKE_

* Drop the LAPACKE matrix_layout parameter from the argument lists, change ints to pointers and add missing work arguments.
2020-09-01 10:44:48 +02:00
Martin Kroeker
0a4c5c4c44 Merge pull request #2807 from martin-frbg/issue2804
Work around ARMV8 build-time cpu detection problems on non-Linux systems
2020-08-31 23:44:56 +02:00
Martin Kroeker
3210a42734 Report cpu as ARMV8 instead of just giving up on non-Linux hosts 2020-08-31 20:03:21 +02:00
Martin Kroeker
5feb087c05 Handle Apple labeling armv8 as arm64 rather than aarch64 2020-08-31 20:02:08 +02:00
Martin Kroeker
59e01b1aec Merge pull request #2799 from RajalakshmiSR/p10_ger
POWER10: Avoid setting accumulators to zero in gemm kernels
2020-08-28 22:52:11 +02:00
Rajalakshmi Srinivasaraghavan
317ff27cda POWER10: Avoid setting accumulators to zero in gemm kernels
For the first iteration, it is better to use the xvf*ger builtins instead of
xvf*gerpp, which avoids having to set the accumulators to zero beforehand.
This saves a few instructions.
2020-08-28 10:42:54 -05:00
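The idea carries over to portable C, shown here as a sketch rather than the POWER10 kernel: the function `tile2x2` and its data layout are assumptions made for this example. The first step writes the products straight into the accumulators (the role the commit gives to xvf*ger), and only the later steps accumulate (the role of xvf*gerpp), so no separate zeroing pass is needed.

```c
#include <stddef.h>

/* Accumulate a 2x2 tile: acc[2*i + j] = sum over k of a[2*k + i] * b[2*k + j].
   Assumes K >= 1. acc does not need to be zeroed by the caller. */
void tile2x2(const double *a, const double *b, size_t K, double acc[4])
{
    /* First iteration: plain assignment, no prior clearing required. */
    acc[0] = a[0] * b[0];
    acc[1] = a[0] * b[1];
    acc[2] = a[1] * b[0];
    acc[3] = a[1] * b[1];

    /* Remaining iterations: multiply and add into the accumulators. */
    for (size_t k = 1; k < K; k++) {
        acc[0] += a[2 * k] * b[2 * k];
        acc[1] += a[2 * k] * b[2 * k + 1];
        acc[2] += a[2 * k + 1] * b[2 * k];
        acc[3] += a[2 * k + 1] * b[2 * k + 1];
    }
}
```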
Martin Kroeker
514a3d7d63 Merge pull request #2798 from kadler/aix-cpuid
Fix compile error on AIX cpuid detection
2020-08-28 08:30:59 +02:00
Kevin Adler
085aae8bdb Fix compile error on AIX cpuid detection
In 589c74a the cpuid detection was changed to use systemcfg, but a copy
and paste error was introduced during some refactoring that caused
POWER7 detection to reference CPUTYPE_POWER7 (which doesn't exist)
instead of CPUTYPE_POWER6.
2020-08-27 23:10:45 -05:00
Martin Kroeker
de63675717 Add early returns and fix sign errors in workspace calculations 2020-08-27 11:25:18 +02:00
Martin Kroeker
d64cc2be81 Add early returns 2020-08-27 11:22:50 +02:00
Martin Kroeker
c9b67141f0 Add early returns 2020-08-27 11:20:31 +02:00
Martin Kroeker
6797a3a1e0 Add early returns 2020-08-27 11:15:12 +02:00
Martin Kroeker
936966a42c Make ILAENV and xGETRF2 functions available 2020-08-27 10:59:08 +02:00
Martin Kroeker
5c6c2cd4f6 Merge pull request #2775 from Guobing-Chen/Fix_OMP_threads_specify
Fix OMP num specify issue
2020-08-24 20:18:09 +02:00
Martin Kroeker
e54be4ba1c Merge pull request #2792 from pkubaj/patch-1
Add aliases for armv6, armv7
2020-08-24 08:03:39 +02:00
pkubaj
48a1364e10 Add aliases for armv6, armv7
FreeBSD uses those names for 32-bit ARM variants.
2020-08-23 18:50:19 +00:00
Chen, Guobing
0c1c903f1e Fix OMP num specify issue
In the current code, no matter how many threads are specified, all
available CPUs are used when invoking OMP, which leads to very bad
performance if the workload is small while the number of available CPUs
is large. A lot of time is wasted on inter-thread sync. Fix this issue by
actually using the number specified by the variable 'num' from the calling API.

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-08-24 02:45:54 +08:00
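A minimal sketch of honoring a caller-supplied thread count with OpenMP, assuming a compiler invoked with OpenMP enabled (for example -fopenmp). The function `scaled_sum` is hypothetical and is not the code touched by this commit; only the `num_threads` clause is the point.

```c
#include <stddef.h>

/* Sum alpha * x[i] over the array using at most `num` threads.
   Assumes num >= 1. Without OpenMP support the pragma is ignored and the
   loop simply runs serially. */
double scaled_sum(const double *x, size_t n, double alpha, int num)
{
    double total = 0.0;
    long i;

    /* num_threads(num) requests a team of that size for this region,
       instead of defaulting to one thread per available CPU. */
    #pragma omp parallel for num_threads(num) reduction(+:total)
    for (i = 0; i < (long)n; i++)
        total += alpha * x[i];

    return total;
}
```

For small inputs a low `num` keeps synchronization overhead from dominating the actual work.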
Martin Kroeker
a073fa870e Merge pull request #2791 from martin-frbg/issue2787
Fix crashes in parallelized x86_64 ZDOT particularly on Windows
2020-08-23 19:33:03 +02:00
Martin Kroeker
b2053239fc Fix missing dummy parameter (imag part of alpha) of zdot_thread_function 2020-08-23 15:08:16 +02:00
Martin Kroeker
b11bb6e728 Merge pull request #2790 from martin-frbg/issue2789
Add OpenMP dependency to pkgconfig information if needed
2020-08-23 14:42:35 +02:00
Martin Kroeker
1840bc5b52 Add OpenMP dependency to pkgconfig file if needed 2020-08-22 13:55:18 +02:00
Martin Kroeker
7c0977c267 Add OpenMP dependency to pkgconfig file if needed 2020-08-22 13:53:44 +02:00
Martin Kroeker
fb3d80c42a Merge pull request #78 from xianyi/develop
rebase
2020-08-22 13:52:29 +02:00