Commit Graph

257 Commits

Author SHA1 Message Date
Alex Henrie
f1bf2603e6 Remove dead assignment to dflag in rotmg functions 2021-01-14 19:40:32 -07:00
Alex Henrie
6f32991eae Don't define the mode variable when not needed in gemm functions 2021-01-14 19:40:31 -07:00
Martin Kroeker
a8f249458d Build CBLAS interfaces for CROTG and ZROTG as well 2021-01-13 00:29:38 +01:00
Martin Kroeker
ac3e2a3fdd Add CBLAS interfaces for csrot and zdrot 2021-01-12 23:22:00 +01:00
Martin Kroeker
857afcc41d Use ifeq instead of ifdef for user-definable build options 2020-11-22 16:31:44 +01:00
Chen, Guobing
a7b1f9b1bb Implementation of BF16 based gemv
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-10-29 02:08:23 +08:00
Martin Kroeker
6a1f3e40af Remove debug printout of object list 2020-10-26 21:37:04 +01:00
Rajalakshmi Srinivasaraghavan
b5d30b390d Fix build issues with bfloat16
This patch fixes compilation errors due to recent renaming from SH to SB
with BUILD_BFLOAT16.
2020-10-13 11:00:22 -05:00
Martin Kroeker
1e7eb7b7a9 Fix typos in currently unused sections 2020-10-13 09:17:15 +02:00
Martin Kroeker
052f31bc3c Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:02:16 +02:00
Martin Kroeker
0f7d73ff6d Allow supporting only a subset of variable types 2020-10-11 14:53:26 +02:00
Martin Kroeker
b475b4bd0d Support building only a subset of types 2020-09-22 23:25:04 +02:00
Chen, Guobing
deaeb6c5b8 Add bfloat16 based dot and conversion with single/double
1. Added bfloat16 based dot as new API: shdot
2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot
3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod
     shstobf16 -- convert single float array to bfloat16 array
     shdtobf16 -- convert double float array to bfloat16 array
     sbf16tos  -- convert bfloat16 array to single float array
     dbf16tod  -- convert bfloat16 array to double float array
4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16
5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs
6. Fix Cooperlake platform detection/specify issue when under dynamic-arch building
7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-09-04 02:31:25 +08:00
Martin Kroeker
75eeb265d7 [WIP] Refactor the driver code for direct SGEMM (#2782)
Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available
(on x86_64 targets only for now) in DYNAMIC_ARCH builds
* Add  sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt
* Add direct_sgemm functions to the gotoblas struct in common_param.h
* Move sgemm_direct_performant helper to separate file
* Update gemm.c  to macros for sgemm_direct to support dynamic_arch naming via common_s,h
* (Conditionally) add sgemm_direct functions in setparam-ref.c
2020-08-19 14:51:09 +02:00
Martin Kroeker
fee361ae64 fix another source of NO_CBLAS=0 surprise 2020-08-11 13:27:19 +02:00
Ashwin Sekhar T K
4e1be0e481 ARM64: Add THUNDERX3T110 Target 2020-07-26 23:32:24 -07:00
Martin Kroeker
5dd14e3d48 Make building the bfloat16 functions conditional on option BUILD_HALF (#2590)
* make building the bfloat16 BLAS functions conditional on BUILD_HALF

* pass the BUILD_HALF option to gensymbol

* Pass BUILD_HALF as a compiler define for dynamic_arch builds
2020-05-01 09:58:30 +02:00
Martin Kroeker
2db5178e2d enable cblas interfaces to GEMM3M in CMAKE builds 2020-04-22 11:01:28 +02:00
Rajalakshmi Srinivasaraghavan
7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.

Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.

This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
Martin Kroeker
8229c163b7 Use runtime check for AVX512 (sgemm_direct) capability when using DYNAMIC_ARCH 2020-03-26 21:12:56 +01:00
Martin Kroeker
6a14b34c20 Avoid calling DIRECT codepath in DYNAMIC_ARCH on non-SKX 2020-03-22 14:33:16 +01:00
Martin Kroeker
d65e9a2bbd Merge pull request #2253 from thrasibule/xerbla
fix error messages
2020-01-18 20:39:04 +01:00
Guillaume Horel
2463938879 fix error message 2019-09-11 10:35:25 -04:00
Guillaume Horel
5d6525c87c more bugfix 2019-09-10 17:30:57 -04:00
Guillaume Horel
459bb9291d fix error codes 2019-09-10 17:10:33 -04:00
Guillaume Horel
5997b6b491 bugfix 2019-09-08 11:14:49 -04:00
Guillaume Horel
7ec7b999a5 add missing file 2019-09-08 11:14:49 -04:00
Guillaume Horel
af9ac0898a fix Makefile 2019-09-08 11:14:49 -04:00
Guillaume Horel
9b2f0323d6 update Makefile 2019-09-08 11:14:49 -04:00
Guillaume Horel
ea747cf933 start working on ?trtrs 2019-09-08 11:14:49 -04:00
luz.paz
daf2fec12d Misc. typo fixes
Found via `codespell -q 3 -w -L ith,als,dum,nd,amin,nto,wis,ba -S ./relapack,./kernel,./lapack-netlib`
2019-04-29 17:03:56 -04:00
Martin Kroeker
268c28db7d Merge pull request #2095 from martin-frbg/trsm
Correct length of name string in xerbla call
2019-04-28 09:55:25 +02:00
Martin Kroeker
0bd956fd21 Correct length of name string in xerbla call 2019-04-27 22:49:04 +02:00
Martin Kroeker
79cfc24a62 Add interface for ?sum (derived from ?asum) 2019-03-30 21:59:18 +01:00
Martin Kroeker
c19a449096 Merge pull request #2071 from martin-frbg/issue2068
Provide CBLAS interfaces to I?MIN and I?MAX
2019-03-30 14:54:28 +01:00
Martin Kroeker
3d1e36d4cb Build CBLAS interfaces for I?MIN and I?MAX 2019-03-30 12:38:41 +01:00
Martin Kroeker
e29b0cfcc4 Allow multithreading TRMV again
revert workaround introduced for issue #1332 as the actual cause appears to be my incorrect fix from #1262 (see #1388)
2019-02-19 21:03:30 +01:00
Martin Kroeker
8533aca964 Avoid penalizing tall skinny matrices 2019-01-23 10:03:00 +01:00
Martin Kroeker
cda81cfae0 Shift transition to multithreading towards larger matrix sizes
See #1886 and JuliaRobotics issue 500. trsm benchmarks on Haswell and Zen showed that with these values performance is roughly doubled for matrix sizes between 8x8 and 14x14, and still 10 to 20 percent better near the new cutoff at 32x32.
2019-01-19 00:10:01 +01:00
Arjan van de Ven
cdc668d82b Add a "sgemm direct" mode for small matrixes
OpenBLAS has a fancy algorithm for copying the input data while laying
it out in a more CPU friendly memory layout.

This is great for large matrixes; the cost of the copy is easily
ammortized by the gains from the better memory layout.

But for small matrixes (on CPUs that can do efficient unaligned loads) this
copy can be a net loss.

This patch adds (for SKYLAKEX initially) a "sgemm direct" mode, that bypasses
the whole copy machinary for ALPHA=1/BETA=0/... standard arguments,
for small matrixes only.

What is small? For the non-threaded case this has been measured to be
in the M*N*K = 28 * 512 * 512 range, while in the threaded case it's
less, around M*N*K = 1 * 512 * 512
2018-12-13 13:47:31 +00:00
Martin Kroeker
5393759a98 Merge pull request #1869 from martin-frbg/axpy0
Handle special case INCX=0,INCY=0 in the axpy interface
2018-11-25 20:52:49 +01:00
Martin Kroeker
c171b8ad13 Handle special case INCX=0,INCY=0 in the axpy interface 2018-11-13 13:57:18 +01:00
Martin Kroeker
96d2f2c9b2 Merge pull request #1831 from brada4/hemv
disable threading in C/ZSWAP copying from S/DSWAP
2018-11-07 08:49:21 +01:00
Andrew
2992e3886a disable threading in C/ZSWAP copying from S/DSWAP 2018-10-22 23:21:49 +03:00
Martin Kroeker
e3c262e5cf Merge pull request #1825 from brada4/hemv
Delay _hemv threading in attempt to address #1820
2018-10-21 20:34:05 +02:00
Andrew
a293bdcd5e re-arrange new code for readability 2018-10-20 21:37:53 +03:00
Andrew
c7bbf9c987 Attempt to tame _hemv threading #1820 2018-10-20 11:13:29 +03:00
Ashwin Sekhar T K
21f46a1cf2 ARM64: Use THUNDERX2T99 Neon Kernels for ARMV8
Currently the generic ARMV8 target uses C implementations
for many routines. Replace these with the neon implementations
written for THUNDERX2T99 target which are upto 6x faster for
certain routines.
2018-10-17 10:44:37 -07:00
Martin Kroeker
b991570210 Merge pull request #1762 from martin-frbg/issue1710-2
Add explicit casts to silence compiler warnings
2018-09-19 18:16:21 +02:00
Martin Kroeker
f3c262156e Add an explicit cast to silence a warning
for #1710
2018-09-13 14:24:29 +02:00