Commit Graph

220 Commits

Author SHA1 Message Date
Martin Kroeker
8533aca964 Avoid penalizing tall skinny matrices 2019-01-23 10:03:00 +01:00
Martin Kroeker
cda81cfae0 Shift transition to multithreading towards larger matrix sizes
See #1886 and JuliaRobotics issue 500. trsm benchmarks on Haswell and Zen showed that with these values performance is roughly doubled for matrix sizes between 8x8 and 14x14, and still 10 to 20 percent better near the new cutoff at 32x32.
2019-01-19 00:10:01 +01:00
Arjan van de Ven
cdc668d82b Add a "sgemm direct" mode for small matrices
OpenBLAS has a fancy algorithm for copying the input data while laying
it out in a more CPU-friendly memory layout.

This is great for large matrices; the cost of the copy is easily
amortized by the gains from the better memory layout.

But for small matrices (on CPUs that can do efficient unaligned loads) this
copy can be a net loss.

This patch adds (for SKYLAKEX initially) a "sgemm direct" mode that bypasses
the whole copy machinery for ALPHA=1/BETA=0/... standard arguments,
for small matrices only.

What is small? For the non-threaded case this has been measured to be
in the M*N*K = 28 * 512 * 512 range, while in the threaded case it's
less, around M*N*K = 1 * 512 * 512
2018-12-13 13:47:31 +00:00
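The size cutoff described in this commit message can be pictured as a simple predicate. The following is a minimal sketch, not the code from the patch: the function and macro names are hypothetical, only the two M*N*K figures come from the message, and only the alpha and beta conditions are checked because the message elides the rest.

```c
#include <stdbool.h>
#include <stddef.h>

/* Cutoffs quoted in the commit message; the macro names are hypothetical. */
#define DIRECT_MNK_LIMIT_SINGLE   ((size_t)28 * 512 * 512)
#define DIRECT_MNK_LIMIT_THREADED ((size_t)1 * 512 * 512)

/* Returns true when a hypothetical caller would skip the packing copy.
 * Only alpha and beta are checked; the commit message elides the other
 * "standard argument" conditions, so they are deliberately left out. */
static bool use_sgemm_direct(size_t m, size_t n, size_t k,
                             float alpha, float beta, bool threaded)
{
    if (alpha != 1.0f || beta != 0.0f)
        return false;

    size_t limit = threaded ? DIRECT_MNK_LIMIT_THREADED
                            : DIRECT_MNK_LIMIT_SINGLE;

    /* No overflow guard: the sketch assumes m * n * k fits in size_t. */
    return m * n * k <= limit;
}
```

With these assumed limits, a 28 x 512 x 512 problem would take the direct path in the non-threaded case, while the same problem in a threaded build would fall back to the copying path.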
Martin Kroeker
5393759a98 Merge pull request #1869 from martin-frbg/axpy0
Handle special case INCX=0,INCY=0 in the axpy interface
2018-11-25 20:52:49 +01:00
Martin Kroeker
c171b8ad13 Handle special case INCX=0,INCY=0 in the axpy interface 2018-11-13 13:57:18 +01:00
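To illustrate why INCX=0,INCY=0 is a special case: with both strides zero, every one of the n iterations of the reference loop updates the same element, so the result collapses to a single closed-form update. The sketch below is an assumption about how such a case can be handled, not a copy of the interface code; `axpy_sketch` is a hypothetical name and negative strides are not supported.

```c
/* Hypothetical double-precision axpy: y := alpha * x + y.
 * Assumes non-negative strides to keep the sketch short. */
static void axpy_sketch(int n, double alpha,
                        const double *x, int incx,
                        double *y, int incy)
{
    if (n <= 0)
        return;

    if (incx == 0 && incy == 0) {
        /* All n iterations would hit y[0] with the same x[0],
         * so apply the accumulated update once. */
        y[0] += (double)n * alpha * x[0];
        return;
    }

    /* General strided loop. */
    for (int i = 0; i < n; i++)
        y[i * incy] += alpha * x[i * incx];
}
```

Handling the case up front also keeps it away from any optimized or threaded kernel that may assume distinct elements per iteration.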
Martin Kroeker
96d2f2c9b2 Merge pull request #1831 from brada4/hemv
disable threading in C/ZSWAP copying from S/DSWAP
2018-11-07 08:49:21 +01:00
Andrew
2992e3886a disable threading in C/ZSWAP copying from S/DSWAP 2018-10-22 23:21:49 +03:00
Martin Kroeker
e3c262e5cf Merge pull request #1825 from brada4/hemv
Delay _hemv threading in attempt to address #1820
2018-10-21 20:34:05 +02:00
Andrew
a293bdcd5e re-arrange new code for readability 2018-10-20 21:37:53 +03:00
Andrew
c7bbf9c987 Attempt to tame _hemv threading #1820 2018-10-20 11:13:29 +03:00
Ashwin Sekhar T K
21f46a1cf2 ARM64: Use THUNDERX2T99 Neon Kernels for ARMV8
Currently the generic ARMV8 target uses C implementations
for many routines. Replace these with the NEON implementations
written for the THUNDERX2T99 target, which are up to 6x faster for
certain routines.
2018-10-17 10:44:37 -07:00
Martin Kroeker
b991570210 Merge pull request #1762 from martin-frbg/issue1710-2
Add explicit casts to silence compiler warnings
2018-09-19 18:16:21 +02:00
Martin Kroeker
f3c262156e Add an explicit cast to silence a warning
for #1710
2018-09-13 14:24:29 +02:00
Martin Kroeker
30f5a69ab8 Add explicit cast to silence a warning
for #1710
2018-09-13 14:23:31 +02:00
Martin Kroeker
4a553e8678 Merge pull request #1713 from martin-frbg/issue1710
Introduce blasabs macro and use it to switch between abs and labs for INTERFACE64
2018-08-04 23:51:31 +02:00
Martin Kroeker
165f00c159 fabs -> fabsl 2018-08-04 20:14:51 +02:00
Martin Kroeker
933896a1d0 Use blasabs to switch between abs and labs as needed for INTERFACE64 2018-08-04 20:06:49 +02:00
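The idea behind a macro like blasabs can be sketched as follows: the absolute-value function has to match the width of the integer type, because `abs` takes an `int` and would truncate a wider argument. This is a minimal sketch under assumptions: `SKETCH_INTERFACE64` is a hypothetical stand-in for the build option, `stride_magnitude` is an illustrative helper, and `long` is assumed to be 64 bits wide (LP64).

```c
#include <stdlib.h>

/* SKETCH_INTERFACE64 is a hypothetical stand-in for the 64-bit
 * integer interface option; it is not the real build macro. */
#ifdef SKETCH_INTERFACE64
typedef long blasint;          /* assumes an LP64 platform */
#define blasabs(x) labs(x)
#else
typedef int blasint;
#define blasabs(x) abs(x)
#endif

/* Illustrative helper: magnitude of a possibly negative stride. */
static blasint stride_magnitude(blasint inc)
{
    return blasabs(inc);
}
```

On platforms where `long` is only 32 bits, a 64-bit integer type would need `llabs` instead; the sketch does not cover that case.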
Steven G. Johnson
a4e321400b fabs -> fabsl
Fixes two calls that were using `fabs` on a `long double` argument rather than `fabsl`, which looks like it is doing an unintentional truncation to `double` precision.
2018-08-03 13:00:10 -04:00
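The truncation described here can be shown in a few lines of C. In plain C (without `<tgmath.h>`), `fabs` takes a `double`, so a `long double` argument is converted first and any extra precision is lost. The two wrapper names below are hypothetical and exist only for the comparison.

```c
#include <math.h>

/* fabs() takes a double: the long double argument is implicitly
 * converted, so the result carries at most double precision. */
static long double abs_via_fabs(long double v)
{
    return fabs(v);
}

/* fabsl() keeps the full long double precision. */
static long double abs_via_fabsl(long double v)
{
    return fabsl(v);
}
```

On platforms where `long double` has the same representation as `double`, both wrappers return identical results and the difference disappears.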
Martin Kroeker
9cf22b7d91 Build cblas_iXamin interfaces 2018-06-23 13:27:30 +02:00
Craig Donner
c2545b0fd6 Fixed a few more unnecessary calls to num_cpu_avail.
I don't have as many benchmarks for these as for gemm, but it should still
make a difference for small matrices.
2018-06-11 10:17:16 +01:00
Craig Donner
66316b9f4c Improve performance of GEMM for small matrices when SMP is defined.
Always checking num_cpu_avail() regardless of whether threading will actually
be used adds noticeable overhead for small matrices.  Most other uses of
num_cpu_avail() do so only if threading will be used, so do the same here.
2018-06-07 15:29:13 +01:00
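The change described above amounts to deciding whether threading is worthwhile before asking how many CPUs are available. A minimal sketch of that ordering follows; the threshold value, the stub `query_cpu_count` (standing in for the real CPU query), and the counter used to observe calls are all assumptions made for illustration.

```c
/* Hypothetical work threshold below which one thread is used. */
#define SKETCH_SMALL_WORK 65536.0

/* Counts how often the (stubbed) CPU query is made. */
static int cpu_queries = 0;

/* Stub standing in for the real CPU-count query; returns a fixed 8. */
static int query_cpu_count(void)
{
    cpu_queries++;
    return 8;
}

/* Pick a thread count, querying the CPU count only when the
 * problem is large enough for threading to be considered. */
static int choose_gemm_threads(long m, long n, long k)
{
    double work = (double)m * (double)n * (double)k;

    if (work <= SKETCH_SMALL_WORK)
        return 1;               /* small problem: skip the query */

    return query_cpu_count();
}
```

In this sketch an 8x8x8 problem never touches the query, which is the overhead the commit message says was noticeable for small matrices.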
Martin Kroeker
e8880c1699 Use a single thread for small input size
copies daxpy improvement from #27, see #1560
2018-06-07 10:26:55 +02:00
Martin Kroeker
1d27fa8507 Merge pull request #1539 from martin-frbg/ztrmv-1332
Disable multithreading in ztrmv
2018-04-27 23:10:21 +02:00
Martin Kroeker
a8ed428bab Disable multithreading in ztrmv
BLAS-Tester shows that the same problem exists as with DTRMV (issue #1332)
2018-04-25 22:35:46 +02:00
Martin Kroeker
809fd0d451 Rewrite ROTMG to address cases not covered by the netlib algorithm (#1480)
* Rewrite ROTMG following the new implementation in GONUM, which uses the algorithm proposed by Tim Hopkins; see issue 1452 for the reference
* Correct ROTMG utest for issue1452 and add another from gonum; also correct the transposition of expected and observed values in error messages
2018-03-04 17:39:56 +01:00
Martin Kroeker
72f14a0363 Fix conditionals in the rescaling against GAMSQ 2018-02-18 12:54:52 +01:00
Martin Kroeker
798f1595d5 Fix condition in both second scaling loops 2018-02-18 12:37:09 +01:00
Martin Kroeker
0464aa6784 Remove debug printfs 2018-02-09 23:06:50 +01:00
Martin Kroeker
55840f0bc9 Keep the flag handling separate from the scaling loops
Fixes #1452 and is more in line with how ATLAS does it. The earlier fix from #356 only moved the bug elsewhere, but we will never want the iterative rescaling to change the dflag setting and variable associations with each cycle.
2018-02-09 23:00:03 +01:00
Andrew
47deec2c1a fix couple of dead assignment warnings 2017-12-22 00:56:35 +01:00
Martin Kroeker
38763ec4f3 Disable multithreading for trmv
as a (hopefully temporary) workaround for #1332
2017-12-03 22:40:54 +01:00
Martin Kroeker
9251a2efde Merge pull request #1359 from brada4/develop
Eliminate mode variable where not needed in syrk interface
2017-11-18 23:47:17 +01:00
Martin Kroeker
b46e2b57cc Make return parameter of cblas_Xdotc_sub, cblas_Xdotu_sub a void pointer as well 2017-11-18 20:28:02 +01:00
Martin Kroeker
3ce401f51b Make last parameter of cblas_Xdotc_sub/cblas_Xdotu_sub a void pointer as well 2017-11-18 18:58:40 +01:00
Andrew
27575d200a Eliminate mode variable where not needed 2017-11-15 15:32:38 +01:00
Martin Kroeker
2c222f1faa Modify complex CBLAS functions to take void pointers
Modify complex CBLAS functions to take void pointers instead of float or double arguments (to bring the prototypes in line with netlib and other implementations' cblas.h)
2017-11-05 15:53:14 +01:00
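A void-pointer signature of the kind this commit describes can be sketched as below. The function name `sketch_zdotu_sub` is hypothetical and the body is only an illustration of the calling convention: complex values are passed as interleaved (real, imaginary) pairs of doubles behind `void` pointers, and the result is written through a `void` pointer as well. Positive strides are assumed.

```c
/* Hypothetical unconjugated complex dot product with a
 * void-pointer interface. x, y and result point to interleaved
 * (re, im) doubles; strides are counted in complex elements. */
static void sketch_zdotu_sub(int n, const void *x, int incx,
                             const void *y, int incy, void *result)
{
    const double *xd = (const double *)x;
    const double *yd = (const double *)y;
    double *out = (double *)result;
    double re = 0.0, im = 0.0;

    for (int i = 0; i < n; i++) {
        const double *xi = xd + 2 * i * incx;
        const double *yi = yd + 2 * i * incy;
        /* (a + bi)(c + di) = (ac - bd) + (ad + bc)i */
        re += xi[0] * yi[0] - xi[1] * yi[1];
        im += xi[0] * yi[1] + xi[1] * yi[0];
    }

    out[0] = re;
    out[1] = im;
}
```

Callers can then pass arrays of `double`, C99 `double complex`, or a struct of two doubles without casting to a library-specific pointer type.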
Martin Kroeker
742f54c235 Merge pull request #1303 from martin-frbg/imatcopy-rowscols
Fix cols/rows mixup in omatcopy 2nd step for BlasTrans cases
2017-09-14 21:46:26 +02:00
Martin Kroeker
d674fbb4c7 Fix cols/rows mixup in omatcopy 2nd step for BlasTrans cases
Equivalent of #1244 (issue #899) for the non-complex cases. Fixes #1289
2017-09-14 19:59:05 +02:00
Martin Kroeker
46c9357c72 Merge pull request #1288 from quickwritereader/develop
Optimized standard Blas Level-1,2 (excluding nrm2 functions) for z13 (double precision). Issue 884
2017-09-09 23:47:17 +02:00
Abdurrauf
1cfdb2295d Optimized standard Blas Level-1,2 (excluding nrm2 functions) for z13 (double precision) 2017-09-06 16:41:08 +04:00
Martin Kroeker
00740c0e34 Merge pull request #1290 from martin-frbg/imatcopy
Use in-place transform shortcut only if matrix is square
2017-09-03 13:02:10 +02:00
Martin Kroeker
254db9bd7c Use in-place transform shortcut only if matrix is square 2017-09-03 09:52:55 +02:00
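The reason an in-place shortcut only applies to square matrices can be seen in a small sketch: swapping element (i, j) with element (j, i) in the same buffer is only valid when rows equal cols, since otherwise the transposed matrix has a different layout. The function below is a hypothetical illustration for a row-major matrix, not the imatcopy code.

```c
#include <stddef.h>

/* Transpose a row-major rows x cols matrix in place by swapping
 * across the diagonal. Returns 1 on success, or 0 (buffer left
 * untouched) when the matrix is not square and the caller has to
 * fall back to a copy through a temporary buffer. */
static int transpose_inplace_if_square(double *a, size_t rows, size_t cols)
{
    if (rows != cols)
        return 0;

    for (size_t i = 0; i < rows; i++) {
        for (size_t j = i + 1; j < cols; j++) {
            double tmp = a[i * cols + j];
            a[i * cols + j] = a[j * cols + i];
            a[j * cols + i] = tmp;
        }
    }
    return 1;
}
```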
Isuru Fernando
d245caa49a Support out-of-source build 2017-08-01 15:16:14 +05:30
Martin Kroeker
376048156b Use in-place transform shortcut only if matrix is square 2017-07-21 11:20:15 +02:00
Martin Kroeker
d1c5b8f913 Add files via upload 2017-07-20 20:51:06 +02:00
Martin Kroeker
91bde7d315 Exchange rows and cols in final omatcopy with BlasTrans
This is MicMuc's patch from #899
2017-07-15 22:02:53 +02:00
Martin Kroeker
1e06b49854 Update xerbla.c 2017-04-26 20:29:30 +02:00
Martin Kroeker
7f546f54fa Add cblas_xerbla 2017-04-26 20:01:34 +02:00
Martin Kroeker
a809431e34 Add cblas_xerbla() 2017-04-26 19:58:59 +02:00
Andrew
99880f7906 Address unlikely memleak in zimatcopy interface (#1129)
* fix unlikely memleak in zimatcopy interface

* fix only unlikely memleak in zimatcopy interface
2017-03-16 13:13:31 +01:00