Albert Ziegenhagel
6b731d917f
Do not require pkg-config to generate the *.pc file
...
Generating the pkg-config file does not actually depend on pkg-config being available.
2020-08-18 08:48:48 +02:00
Martin Kroeker
5dcf47cd97
Merge pull request #2784 from martin-frbg/issue2783
...
Add fallback typedef for bfloat16 to openblas_config.h template
2020-08-17 19:06:13 +02:00
Martin Kroeker
aa286e301b
Add typedef for bfloat16 if needed
2020-08-17 15:32:14 +02:00
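Note on the bfloat16 fallback typedef above: the following is a minimal sketch of what such a fallback could look like, not the contents of the openblas_config.h template. It assumes the fallback is a 16-bit unsigned integer type and that `float` is IEEE-754 binary32; the guard macro `SKETCH_HAVE_BFLOAT16` and both helper functions are hypothetical names introduced only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical guard: only define the type if nothing else has provided it. */
#ifndef SKETCH_HAVE_BFLOAT16
typedef uint16_t bfloat16;   /* assumed storage type: raw 16-bit pattern */
#endif

/* Keep the upper 16 bits of a binary32 value (sign, exponent, top 7
 * mantissa bits). This truncates rather than rounds. */
static bfloat16 float_to_bf16_truncate(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* type-pun without aliasing issues */
    return (bfloat16)(bits >> 16);
}

/* Widen a bfloat16 bit pattern back to binary32 by zero-filling the
 * low 16 mantissa bits. */
static float bf16_to_float(bfloat16 h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

With a typedef of this kind, code that includes the header can declare bfloat16 buffers even when the compiler has no native 16-bit floating-point type.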
Martin Kroeker
9f0ef9cdfc
Merge pull request #77 from xianyi/develop
...
rebase
2020-08-17 15:28:15 +02:00
Martin Kroeker
6bfc66663c
revert
2020-08-17 15:20:41 +02:00
Martin Kroeker
a8c6fb9e1c
revert
2020-08-17 15:20:16 +02:00
Martin Kroeker
5ec8f716cf
revert
2020-08-17 15:19:40 +02:00
Martin Kroeker
82f8a0aeba
Update .drone.yml
2020-08-15 15:46:18 +02:00
Martin Kroeker
d57d503c15
Update Makefile
2020-08-15 14:46:26 +02:00
Martin Kroeker
37ac23e8a3
Add simple MT sgemm precision test and INTERFACE64 build
2020-08-15 13:38:05 +02:00
Martin Kroeker
6a93e3b2ba
Add simple sgemm precision test
2020-08-15 13:33:52 +02:00
Martin Kroeker
47ce1dd08f
Update gemm64.cpp
2020-08-15 13:31:28 +02:00
Martin Kroeker
f5fcc5baec
Add trivial gemm test for multithread consistency
2020-08-15 13:30:29 +02:00
Martin Kroeker
597010a968
Fix incorrect argument to SLASET
...
Reference-LAPACK issue 425 (and 318)
2020-08-14 00:41:56 +02:00
Martin Kroeker
d64f1ef26b
Fix incorrect argument to SLASET
...
Reference-LAPACK issue 425 (and 318)
2020-08-14 00:40:24 +02:00
Martin Kroeker
c62aad62e5
Fix incorrect calls to DLASET
...
Reference-LAPACK issue 429
2020-08-14 00:35:45 +02:00
Chen, Guobing
e740c4873d
Enable COOPERLAKE build target
...
Enable new build target platform -- COOPERLAKE. This target platform
supports all the SKYLAKEX supported ISAs + avx512bf16. So all the
SKYLAKEX specific kernels/drivers and related code are now extended
to be also active on COOPERLAKE. Besides, new BF16 related kernels
are active under this target.
2020-08-13 06:18:00 +08:00
Martin Kroeker
efdd237a91
Add a dedicated POWER9 build to the Travis CI ( #2774 )
...
* Add dedicated POWER9 build (using new syntax to ensure it runs as a P9-only containerized job rather than a VM that
might end up on P8 hardware half of the time)
* Bump gcc version for POWER9 build
2020-08-12 23:08:38 +02:00
Martin Kroeker
4573cb2f43
Merge pull request #2765 from martin-frbg/issue2760
...
Add memory barrier to the PPC blas_lock implementation for Linux
2020-08-11 22:40:17 +02:00
Martin Kroeker
2a4bb797db
Merge pull request #2773 from martin-frbg/issue2770
...
Fix Makefiles still mishandling NO_CBLAS=0 and NO_LAPACKE=0
2020-08-11 21:02:55 +02:00
Martin Kroeker
cbbe38bb88
Merge pull request #2772 from mhillenibm/s390x_gemm_tuning
...
s390x: GEMM tuning for z14
2020-08-11 18:14:09 +02:00
Martin Kroeker
619343278d
Fix mishandling of NO_CBLAS=0 and NO_LAPACKE=0
2020-08-11 13:40:40 +02:00
Martin Kroeker
fee361ae64
fix another source of NO_CBLAS=0 surprise
2020-08-11 13:27:19 +02:00
Martin Kroeker
62f4c84f27
Merge pull request #76 from xianyi/develop
...
rebase
2020-08-11 13:25:12 +02:00
Marius Hillenbrand
e115c97e05
s390x/SGEMM: adjust default P and Q to multiples of M
...
We recently changed the register blocking for SGEMM on s390x to 16x4.
However, we did not adjust Q to a multiple of 16 and thus fell back to
the 8x4 kernel at each block's margin, without need. Adjust P and Q to
multiples of 16 to employ the faster 16x4 kernel for complete full-sized
blocks.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-08-11 12:56:46 +02:00
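Note on the block-size adjustment above: the arithmetic behind the commit message can be sketched as follows. The helper names and the sample block size are hypothetical and are not the actual s390x parameters; only the 16-row register blocking comes from the commit message.

```c
/* Rows left over at a block's margin when a block of `block` rows is
 * tiled with kernels that each cover `unroll` rows. A nonzero result
 * means a smaller fallback kernel has to handle the margin. */
static int margin_rows(int block, int unroll)
{
    return block % unroll;
}

/* Largest multiple of `unroll` that does not exceed `block`. Choosing
 * block sizes like this leaves no margin for full-sized blocks. */
static int round_down_to_multiple(int block, int unroll)
{
    return block - (block % unroll);
}
```

For example, with a hypothetical block size of 456 and a 16-row kernel, `margin_rows(456, 16)` is 8, so every full block would still need the narrower kernel for its last 8 rows; rounding down to 448 removes that margin.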
Marius Hillenbrand
07c334e7be
s390x: Factor out small block sizes for SGEMM/DGEMM on z14
...
For small register blockings that are too small to fill up vector
registers with column vectors, we currently use a generic code block.
Replace that with instantiations of the generic code as individual
functions, so that the compiler can optimize each one separately.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-08-11 12:56:39 +02:00
Marius Hillenbrand
e2828e30aa
s390x: Optimize SGEMM/DGEMM blocks for z14 with explicit loop unrolling/interleaving
...
Improve performance of SGEMM and DGEMM on z14 and z15 by unrolling and
interleaving the inner loop of the SGEMM 16x4 and DGEMM 8x4 blocks.
Specifically, we explicitly interleave vector register loads and
computation of two iterations.
Note that this change only adds one C function, since SGEMM 16x4 and
DGEMM 8x4 actually map to the same C code: they both hold intermediate
results in a 4x4 grid of vector registers, and the C implementation is
built around that.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-08-11 12:55:42 +02:00
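Note on the unrolling and interleaving described above: the idea can be illustrated with a scalar stand-in. This is a simplified sketch on a dot product, not the z14 vector kernel; the function name is hypothetical, and the real code operates on a 4x4 grid of vector registers rather than two scalar accumulators.

```c
/* Dot product with the inner loop unrolled by two. The loads for both
 * iterations are written before the arithmetic, so the two multiply-adds
 * are independent of each other and of the pending loads. */
static float dot_unrolled2(const float *a, const float *b, int n)
{
    float acc0 = 0.0f, acc1 = 0.0f;
    int i = 0;

    for (; i + 1 < n; i += 2) {
        /* Loads for iteration i and iteration i + 1. */
        float a0 = a[i],     b0 = b[i];
        float a1 = a[i + 1], b1 = b[i + 1];

        /* Computation for both iterations, using separate accumulators. */
        acc0 += a0 * b0;
        acc1 += a1 * b1;
    }

    /* Remainder when n is odd. */
    if (i < n)
        acc0 += a[i] * b[i];

    return acc0 + acc1;
}
```

Whether the compiler keeps this ordering depends on its scheduler; the sketch only shows the source-level shape of the transformation.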
Martin Kroeker
7219c9cb87
Merge pull request #2764 from martin-frbg/lapacktests
...
Fix array overruns in the LIN part of the LAPACK testsuite
2020-08-10 13:27:51 +02:00
Martin Kroeker
c9d32674ea
Add memory barrier to the blas_lock implementation for Linux
...
as recommended by cparrott73 in #2760
2020-08-09 19:17:04 +02:00
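Note on the memory barrier above: a spinlock needs ordering guarantees so that accesses inside the critical section cannot be reordered across the lock or unlock. The following is a minimal sketch using C11 atomics; it is not the OpenBLAS blas_lock implementation, and the type and function names are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical lock type: 0 means free, 1 means held. */
typedef struct {
    atomic_int held;
} sketch_lock_t;

static void sketch_lock(sketch_lock_t *l)
{
    /* Spin until the previous value was 0, i.e. this thread took the lock. */
    while (atomic_exchange_explicit(&l->held, 1, memory_order_relaxed) != 0) {
        /* busy-wait */
    }
    /* Barrier: later reads and writes may not move above the acquisition. */
    atomic_thread_fence(memory_order_acquire);
}

static void sketch_unlock(sketch_lock_t *l)
{
    /* Barrier: earlier reads and writes may not move below the release. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&l->held, 0, memory_order_relaxed);
}
```

Without the two fences, the relaxed exchange and store would still make the lock mutually exclusive, but data written under the lock would not be guaranteed visible to the next holder.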
Martin Kroeker
64259d521a
Fix use of unallocated array in workspace query and wrong type of argument to xSCAL
2020-08-09 13:02:27 +02:00
Martin Kroeker
6f5ca44c1a
Expand TAU array as SGEMQR/DGEMQR read elements 2 and 3
2020-08-09 12:59:20 +02:00
Martin Kroeker
d28b3f2776
Create Jenkinsfile for OSUOSL PowerCI
2020-08-08 18:05:20 +02:00
Martin Kroeker
ba3f7b3acf
Merge pull request #2761 from RajalakshmiSR/Makefile_err
...
Remove extra symbol in Makefile
2020-08-08 12:20:04 +02:00
Rajalakshmi Srinivasaraghavan
475b5c95b9
Remove extra symbol in Makefile
...
While trying out different unroll values, noted that
make failed due to this extra symbol.
2020-08-07 15:27:44 -05:00
Martin Kroeker
cd60080d4a
Merge pull request #2758 from martin-frbg/undef_shift
...
Fix GCC ubsan warnings in x86_64 complex dot and gemv_t kernels
2020-08-03 23:30:26 +02:00
Martin Kroeker
4847bfdddd
Merge pull request #2757 from martin-frbg/cmake64
...
Fix lapack-tests linking to a suffixed libopenblas in cmake builds
2020-08-02 23:05:21 +02:00
Martin Kroeker
81dcfdcf39
Multiply by 2 instead of left-shifting a potentially negative number
...
fixes GCC ubsan warning in the BLAS tests
2020-08-02 18:29:56 +02:00
Martin Kroeker
0ef4b3f1f2
Multiply instead of doing a left shift of a potentially negative number
...
fixes GCC ubsan report in the BLAS tests
2020-08-02 18:27:40 +02:00
Martin Kroeker
aa53a8a5cb
Multiply by two instead of left-shifting one place
...
fixes GCC ubsan report of "left shift of negative value -2" in the BLAS tests
2020-08-02 18:25:09 +02:00
Martin Kroeker
aa3a1e7d8c
Multiply by two rather than left shift by one place
...
fixes GCC ubsan report of "left shift of negative value -2" in the BLAS tests
2020-08-02 18:22:31 +02:00
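Note on the four shift fixes above: in C, left-shifting a negative signed value is undefined behaviour, which is what the GCC undefined-behaviour sanitizer reports as "left shift of negative value". Multiplying by two gives the intended result and is well defined as long as it does not overflow. The sketch below uses a hypothetical helper and is not taken from the kernels themselves.

```c
/* Hypothetical helper: doubles a (possibly negative) stride, as one might
 * when stepping through interleaved real/imaginary pairs. */
static long doubled_stride(long inc)
{
    /* Writing `inc << 1` here would be undefined for negative inc.
     * `inc * 2` is defined for every value that does not overflow. */
    return inc * 2;
}
```

For a stride of -2, the value from the sanitizer report, `doubled_stride(-2)` returns -4 without triggering the warning.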
Martin Kroeker
aaf1a17168
Apply current library name suffix
2020-08-02 17:58:33 +02:00
Martin Kroeker
53add6a80d
Apply library name suffix to openblas if any
2020-08-02 17:57:12 +02:00
Martin Kroeker
9eb897cc01
Merge pull request #75 from xianyi/develop
...
rebase
2020-08-02 17:50:06 +02:00
Martin Kroeker
7cead56258
Merge pull request #2753 from martin-frbg/issue2751
...
Add SYMBOLPREFIX and/or SYMBOLSUFFIX to cblas prototypes
2020-08-02 15:32:46 +02:00
Martin Kroeker
6794ac3415
Add SYMBOLPREFIX and/or -SUFFIX to cblas.h if needed
2020-08-02 11:20:08 +02:00
Martin Kroeker
ecf4b9e0fc
Improve substitution rules for SYMBOLPREFIX and -SUFFIX addition
2020-08-01 17:06:03 +02:00
Martin Kroeker
dfe5d09641
Merge pull request #2756 from martin-frbg/issue2755
...
Protect against inadvertent activation of USE_CUDA
2020-08-01 15:19:02 +02:00
Martin Kroeker
60cd5e55fc
Protect against inadvertent activation of USE_CUDA
2020-08-01 12:31:39 +02:00
Martin Kroeker
da9e2a7ada
Add SYMBOLPREFIX and/or SYMBOLSUFFIX to cblas prototypes
2020-07-31 16:03:33 +02:00
Martin Kroeker
c88cbc5e0d
Merge pull request #2752 from kadler/cpuid_aix
...
Use systemcfg APIs for CPU detection on AIX
2020-07-31 12:52:24 +02:00