Martin Kroeker
8c99d5d1b6
Merge pull request #3796 from martin-frbg/gemmt
...
Add a trivial GEMMT implementation based on a looped GEMV
2022-11-12 19:06:05 +01:00
Martin Kroeker
e6204d254f
Update CMakeLists.txt
2022-11-08 16:21:11 +01:00
Martin Kroeker
1b77764182
Conditionally leave out bits of LAPACK to be overridden by ReLAPACK
2022-11-08 12:02:59 +01:00
Martin Kroeker
c970717157
fix missing t in xgemmt rule
...
Co-authored-by: Alexis <35051714+amontoison@users.noreply.github.com>
2022-11-01 13:51:20 +01:00
Martin Kroeker
e7fd8d21a6
Add GEMMT based on looped GEMV
2022-10-26 15:33:58 +02:00
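A "GEMMT based on looped GEMV" can be pictured as one matrix-vector product per column of the result, restricted to the triangle being updated. The following is only a minimal sketch under stated assumptions, not the code from this commit: the name `gemmt_upper_sketch` is hypothetical, storage is column-major, neither operand is transposed, and only the upper triangle is handled.

```c
#include <stddef.h>

/* Hypothetical reference sketch, not the OpenBLAS source.
 * Updates the upper triangle of C := alpha*A*B + beta*C for
 * column-major A (n x k) and B (k x n), both untransposed.
 * Entries below the diagonal of C are left untouched. */
static void gemmt_upper_sketch(size_t n, size_t k, double alpha,
                               const double *a, size_t lda,
                               const double *b, size_t ldb,
                               double beta, double *c, size_t ldc)
{
    for (size_t j = 0; j < n; j++) {
        /* One GEMV per column: rows 0..j of A times column j of B. */
        for (size_t i = 0; i <= j; i++) {
            double acc = 0.0;
            for (size_t p = 0; p < k; p++)
                acc += a[i + p * lda] * b[p + j * ldb];
            c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc];
        }
    }
}
```

Skipping the unreferenced triangle is what separates GEMMT from a plain GEMM call; the inner two loops are what a real implementation would hand to a GEMV routine.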
Martin Kroeker
a3e02742f2
Add USE_PERL fallback option for create script used with FUNCTION_PROFILE
2022-05-22 18:32:19 +02:00
Martin Kroeker
f1c570a5f1
Add back original PERL-based script under new name
2022-05-22 18:29:01 +02:00
Owen Rafferty
42c7a27e6b
rewrite perl scripts in universal shell
2022-05-18 19:00:15 -05:00
Martin Kroeker
7656aba00e
Merge pull request #3493 from martin-frbg/casts+cleanup
...
WIP casts and cleanups
2022-02-06 23:55:06 +01:00
Martin Kroeker
d2b5fbf80f
Exclude some complex (LAPACK) functions when NO_LAPACK is set
2022-01-27 22:02:08 +01:00
Martin Kroeker
64365c919e
fix function typecasts
2021-12-21 18:47:35 +01:00
gxw
25f99fa9f8
Add cblas_{c/z}srot cblas_{c/z}rotg support
2021-11-01 20:19:13 +08:00
Martin Kroeker
4b3769823a
Revert #3252
2021-10-24 23:57:06 +02:00
Martin Kroeker
2845f54eb8
Remove dangerous optimization from previous #3252 - buffer is never unused here
2021-10-20 10:50:02 +02:00
Martin Kroeker
c35739db5e
Add separate entries for BFLOAT16 functions and fix missing cblas_xerbla
2021-09-14 16:15:57 +02:00
Martin Kroeker
1085775bc6
really remove the unused variable
2021-09-11 15:05:55 +02:00
Martin Kroeker
20581bf303
Remove unused variable
2021-09-11 14:36:27 +02:00
Wangyang Guo
4289cf048d
sbgemm: avoid falling into SGEMM_KERNEL_DIRECT
2021-09-07 21:30:46 +08:00
Wangyang Guo
2e44ca0136
sbgemm: add missing cblas_sbgemm definition
2021-08-30 17:40:30 +08:00
Wangyang Guo
1d83ca4bca
Small Matrix: support BFLOAT16 data type
2021-08-30 17:40:20 +08:00
Wangyang Guo
c17d6dacb2
Small Matrix: skip compile in unimplemented data type
2021-08-05 05:46:13 +00:00
Wangyang Guo
aa50185647
Small Matrix: better handling of GEMM3M macro
2021-08-05 02:45:53 +00:00
Wangyang Guo
478d1086c1
Small Matrix: support DYNAMIC_ARCH build
2021-08-04 03:12:41 +00:00
Wangyang Guo
5dc7c3c8e5
Small Matrix: add GEMM_SMALL_MATRIX_PERMIT to tune the small-matrix case
2021-08-02 07:06:54 +00:00
Xianyi Zhang
6022e5629c
Refs #2587 fix small matrix c/zgemm bug.
2021-08-02 07:06:54 +00:00
Xianyi Zhang
57ed58cefe
Refs #2587 Add small matrix optimization reference kernel for c/zgemm.
2021-08-02 07:06:54 +00:00
Xianyi Zhang
17d32a4a82
Change a1b0 gemm to b0 gemm.
2021-08-02 07:06:54 +00:00
Xianyi Zhang
4271cfcc6f
Fix gemm interface bug for small matrix.
2021-08-02 07:06:51 +00:00
Xianyi Zhang
be3349405d
Add alpha=1.0 beta=0.0 for small gemm.
2021-08-02 07:01:47 +00:00
Xianyi Zhang
0a2077901c
Add small matrix optimization kernel interface.
...
make SMALL_MATRIX_OPT=1
2021-08-02 07:01:47 +00:00
Martin Kroeker
1dea57ab25
Revert PR #3250 (shortcut without buffer allocation) as it is unsafe on some x86_64
2021-07-14 20:32:57 +02:00
Martin Kroeker
7bb59fceb7
Clean up some warnings
2021-07-11 16:00:29 +02:00
Martin Kroeker
4ed99c2ce3
Merge pull request #3292 from martin-frbg/syrk_limit
...
Add lower limit for multithreading in xSYRK
2021-07-07 20:46:28 +02:00
Martin Kroeker
8186963d8c
Add lower limit for multithreading
2021-07-04 17:00:26 +02:00
Martin Kroeker
726c44242b
Add lower threshold for multithreading
2021-07-01 17:41:05 +02:00
Martin Kroeker
1b5620b66e
Add lower threshold for multithreading in ?potrf and ?potri
2021-06-26 23:47:41 +02:00
Martin Kroeker
baf03a0937
Merge pull request #3252 from martin-frbg/more_shortcuts
...
Further shortcuts for (small) cases that do not need buffer allocation
2021-06-15 16:14:20 +02:00
Martin Kroeker
7aab5e826c
Merge pull request #3250 from martin-frbg/gemv-shortcut
...
Add shortcut for small-size S/D GEMV_N with increments of one
2021-06-15 14:50:14 +02:00
Martin Kroeker
f84197c1a7
Add shortcuts for (small) cases that do not need expensive buffer allocation
2021-05-29 22:28:00 +02:00
Martin Kroeker
734bd265a8
revert symv changes for now
2021-05-29 15:40:03 +02:00
Martin Kroeker
1217eb910d
Fix copy-paste errors in variables used
2021-05-28 09:38:48 +02:00
Martin Kroeker
d6d7a6685d
Add shortcuts for (small) cases that do not need expensive buffer allocation
2021-05-27 22:39:18 +02:00
Martin Kroeker
f0e7345fb8
Add shortcut for small-size gemv_n with increments of one
2021-05-26 22:02:34 +02:00
Martin Kroeker
03297ff9f0
Add fast path for small xSYR with INCX==1
2021-05-22 20:41:18 +02:00
Gordon Fossum
8b599836db
Add error message token for SBGEMM in gemm.c
2021-05-04 13:55:02 -05:00
Martin Kroeker
904b221f03
Add cast to prevent overflow of intermediate result
2021-05-01 14:47:22 +02:00
Martin Kroeker
c5fb91f1bc
Fix division by zero in the non-x86 codepath
2021-04-29 09:47:18 +02:00
Harmen Stoppels
ec6b354c32
use /usr/bin/env perl
2021-02-24 14:07:20 +01:00
Martin Kroeker
bd906e3410
fix copy-paste error in build rules for cblas_crotg and cblas_zrotg
2021-01-30 16:46:25 +01:00
Alex Henrie
f1bf2603e6
Remove dead assignment to dflag in rotmg functions
2021-01-14 19:40:32 -07:00
Alex Henrie
6f32991eae
Don't define the mode variable when not needed in gemm functions
2021-01-14 19:40:31 -07:00
Martin Kroeker
a8f249458d
Build CBLAS interfaces for CROTG and ZROTG as well
2021-01-13 00:29:38 +01:00
Martin Kroeker
ac3e2a3fdd
Add CBLAS interfaces for csrot and zdrot
2021-01-12 23:22:00 +01:00
Martin Kroeker
857afcc41d
Use ifeq instead of ifdef for user-definable build options
2020-11-22 16:31:44 +01:00
Chen, Guobing
a7b1f9b1bb
Implementation of BF16 based gemv
...
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-10-29 02:08:23 +08:00
Martin Kroeker
6a1f3e40af
Remove debug printout of object list
2020-10-26 21:37:04 +01:00
Rajalakshmi Srinivasaraghavan
b5d30b390d
Fix build issues with bfloat16
...
This patch fixes compilation errors due to recent renaming from SH to SB
with BUILD_BFLOAT16.
2020-10-13 11:00:22 -05:00
Martin Kroeker
1e7eb7b7a9
Fix typos in currently unused sections
2020-10-13 09:17:15 +02:00
Martin Kroeker
052f31bc3c
Change "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-12 00:02:16 +02:00
Martin Kroeker
0f7d73ff6d
Allow supporting only a subset of variable types
2020-10-11 14:53:26 +02:00
Martin Kroeker
b475b4bd0d
Support building only a subset of types
2020-09-22 23:25:04 +02:00
Chen, Guobing
deaeb6c5b8
Add bfloat16 based dot and conversion with single/double
...
1. Added bfloat16 based dot as new API: shdot
2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot
3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod
shstobf16 -- convert single float array to bfloat16 array
shdtobf16 -- convert double float array to bfloat16 array
sbf16tos -- convert bfloat16 array to single float array
dbf16tod -- convert bfloat16 array to double float array
4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16
5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs
6. Fix Cooperlake platform detection/specify issue when under dynamic-arch building
7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-09-04 02:31:25 +08:00
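The conversion routines listed in this commit move data between bfloat16 and single or double precision. A bfloat16 value has the same bit layout as the upper 16 bits of an IEEE single, so a conversion can be sketched as below. This is an illustration only: the function names and the `bf16_t` typedef are hypothetical, and round-to-nearest-even is an assumption, so the generic and Cooperlake kernels in OpenBLAS may round differently.

```c
#include <stdint.h>
#include <string.h>

typedef uint16_t bf16_t;   /* hypothetical name; the commit uses uint16_t storage */

/* Single precision -> bfloat16, assuming round-to-nearest-even. */
static bf16_t float_to_bf16_sketch(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);            /* well-defined type punning */
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)    /* NaN: keep it a quiet NaN */
        return (bf16_t)((bits >> 16) | 0x0040u);
    bits += 0x7FFFu + ((bits >> 16) & 1u);     /* round half to even */
    return (bf16_t)(bits >> 16);
}

/* bfloat16 -> single precision is exact: pad the low 16 bits with zeros. */
static float bf16_to_float_sketch(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Converting an array is then a loop over these per-element helpers, which is also the natural unit for the multi-threaded level-1 dispatch mentioned in item 5.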
Martin Kroeker
75eeb265d7
[WIP] Refactor the driver code for direct SGEMM ( #2782 )
...
Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available
(on x86_64 targets only for now) in DYNAMIC_ARCH builds
* Add sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt
* Add direct_sgemm functions to the gotoblas struct in common_param.h
* Move sgemm_direct_performant helper to separate file
* Update gemm.c to use macros for sgemm_direct to support dynamic_arch naming via common_s.h
* (Conditionally) add sgemm_direct functions in setparam-ref.c
2020-08-19 14:51:09 +02:00
Martin Kroeker
fee361ae64
fix another source of NO_CBLAS=0 surprise
2020-08-11 13:27:19 +02:00
Ashwin Sekhar T K
4e1be0e481
ARM64: Add THUNDERX3T110 Target
2020-07-26 23:32:24 -07:00
Martin Kroeker
5dd14e3d48
Make building the bfloat16 functions conditional on option BUILD_HALF ( #2590 )
...
* make building the bfloat16 BLAS functions conditional on BUILD_HALF
* pass the BUILD_HALF option to gensymbol
* Pass BUILD_HALF as a compiler define for dynamic_arch builds
2020-05-01 09:58:30 +02:00
Martin Kroeker
2db5178e2d
enable cblas interfaces to GEMM3M in CMAKE builds
2020-04-22 11:01:28 +02:00
Rajalakshmi Srinivasaraghavan
7eb55504b1
RFC : Add half precision gemm for bfloat16 in OpenBLAS
...
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes). Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.
Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.
This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
Martin Kroeker
8229c163b7
Use runtime check for AVX512 (sgemm_direct) capability when using DYNAMIC_ARCH
2020-03-26 21:12:56 +01:00
Martin Kroeker
6a14b34c20
Avoid calling DIRECT codepath in DYNAMIC_ARCH on non-SKX
2020-03-22 14:33:16 +01:00
Martin Kroeker
d65e9a2bbd
Merge pull request #2253 from thrasibule/xerbla
...
fix error messages
2020-01-18 20:39:04 +01:00
Guillaume Horel
2463938879
fix error message
2019-09-11 10:35:25 -04:00
Guillaume Horel
5d6525c87c
more bugfix
2019-09-10 17:30:57 -04:00
Guillaume Horel
459bb9291d
fix error codes
2019-09-10 17:10:33 -04:00
Guillaume Horel
5997b6b491
bugfix
2019-09-08 11:14:49 -04:00
Guillaume Horel
7ec7b999a5
add missing file
2019-09-08 11:14:49 -04:00
Guillaume Horel
af9ac0898a
fix Makefile
2019-09-08 11:14:49 -04:00
Guillaume Horel
9b2f0323d6
update Makefile
2019-09-08 11:14:49 -04:00
Guillaume Horel
ea747cf933
start working on ?trtrs
2019-09-08 11:14:49 -04:00
luz.paz
daf2fec12d
Misc. typo fixes
...
Found via `codespell -q 3 -w -L ith,als,dum,nd,amin,nto,wis,ba -S ./relapack,./kernel,./lapack-netlib`
2019-04-29 17:03:56 -04:00
Martin Kroeker
268c28db7d
Merge pull request #2095 from martin-frbg/trsm
...
Correct length of name string in xerbla call
2019-04-28 09:55:25 +02:00
Martin Kroeker
0bd956fd21
Correct length of name string in xerbla call
2019-04-27 22:49:04 +02:00
Martin Kroeker
79cfc24a62
Add interface for ?sum (derived from ?asum)
2019-03-30 21:59:18 +01:00
Martin Kroeker
c19a449096
Merge pull request #2071 from martin-frbg/issue2068
...
Provide CBLAS interfaces to I?MIN and I?MAX
2019-03-30 14:54:28 +01:00
Martin Kroeker
3d1e36d4cb
Build CBLAS interfaces for I?MIN and I?MAX
2019-03-30 12:38:41 +01:00
Martin Kroeker
e29b0cfcc4
Allow multithreading TRMV again
...
revert workaround introduced for issue #1332 as the actual cause appears to be my incorrect fix from #1262 (see #1388 )
2019-02-19 21:03:30 +01:00
Martin Kroeker
8533aca964
Avoid penalizing tall skinny matrices
2019-01-23 10:03:00 +01:00
Martin Kroeker
cda81cfae0
Shift transition to multithreading towards larger matrix sizes
...
See #1886 and JuliaRobotics issue 500. trsm benchmarks on Haswell and Zen showed that with these values performance is roughly doubled for matrix sizes between 8x8 and 14x14, and still 10 to 20 percent better near the new cutoff at 32x32.
2019-01-19 00:10:01 +01:00
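The cutoff described in this commit amounts to running small problems on a single thread. A minimal sketch of such a guard follows, assuming the threshold is applied to the product of the two dimensions; the function name and the exact form of the test are hypothetical, and only the 32x32 figure comes from the commit message.

```c
/* Hypothetical sketch of a lower threshold for multithreading.
 * Problems smaller than 32x32 elements stay on one thread. */
static int pick_num_threads_sketch(long long m, long long n, int max_threads)
{
    const long long cutoff = 32LL * 32LL;   /* figure quoted in the commit message */
    if (max_threads < 1)
        return 1;
    return (m * n < cutoff) ? 1 : max_threads;
}
```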
Arjan van de Ven
cdc668d82b
Add a "sgemm direct" mode for small matrixes
...
OpenBLAS has a fancy algorithm for copying the input data while laying
it out in a more CPU friendly memory layout.
This is great for large matrixes; the cost of the copy is easily
amortized by the gains from the better memory layout.
But for small matrixes (on CPUs that can do efficient unaligned loads) this
copy can be a net loss.
This patch adds (for SKYLAKEX initially) a "sgemm direct" mode, that bypasses
the whole copy machinery for ALPHA=1/BETA=0/... standard arguments,
for small matrixes only.
What is small? For the non-threaded case this has been measured to be
in the M*N*K = 28 * 512 * 512 range, while in the threaded case it's
less, around M*N*K = 1 * 512 * 512
2018-12-13 13:47:31 +00:00
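The size limits quoted in this commit message can be turned into a dispatch test. The sketch below only illustrates those numbers: the function name is hypothetical, the comparison against the product M*N*K is an assumption, and the check actually used in OpenBLAS may look quite different.

```c
#include <stdint.h>

/* Hypothetical sketch: should a call take the "sgemm direct" path?
 * Limits are the ones quoted in the commit message:
 *   non-threaded: M*N*K up to 28 * 512 * 512
 *   threaded:     M*N*K up to  1 * 512 * 512
 * The direct path is only considered for alpha == 1 and beta == 0. */
static int use_direct_sgemm_sketch(int64_t m, int64_t n, int64_t k,
                                   float alpha, float beta, int threaded)
{
    const int64_t limit = (threaded ? 1 : 28) * (int64_t)512 * 512;

    if (alpha != 1.0f || beta != 0.0f)
        return 0;
    if (m <= 0 || n <= 0 || k <= 0)
        return 0;
    /* Staged comparison keeps every product within int64_t range. */
    if (m > limit || n > limit || k > limit)
        return 0;
    if (m * n > limit)
        return 0;
    return m * n * k <= limit;
}
```

Calls that fail the test would fall through to the regular path that copies and repacks the operands.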
Martin Kroeker
5393759a98
Merge pull request #1869 from martin-frbg/axpy0
...
Handle special case INCX=0,INCY=0 in the axpy interface
2018-11-25 20:52:49 +01:00
Martin Kroeker
c171b8ad13
Handle special case INCX=0,INCY=0 in the axpy interface
2018-11-13 13:57:18 +01:00
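With INCX=0 and INCY=0 every iteration of AXPY reads the same element of x and updates the same element of y, so the n updates can be folded into a single one. A minimal sketch of that special case, assuming non-negative increments and double precision; `daxpy_sketch` is a hypothetical name and this is not the interface code from the commit.

```c
#include <stddef.h>

/* Hypothetical sketch of y := alpha*x + y with the zero-increment case
 * handled up front. Negative increments are not covered here. */
static void daxpy_sketch(size_t n, double alpha,
                         const double *x, size_t incx,
                         double *y, size_t incy)
{
    if (n == 0 || alpha == 0.0)
        return;
    if (incx == 0 && incy == 0) {
        /* n updates of y[0] by alpha*x[0], folded into one step. */
        y[0] += (double)n * alpha * x[0];
        return;
    }
    for (size_t i = 0; i < n; i++)
        y[i * incy] += alpha * x[i * incx];
}
```

Handling the case in the interface layer means optimized or threaded kernels never see two operands that alias a single element.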
Martin Kroeker
96d2f2c9b2
Merge pull request #1831 from brada4/hemv
...
disable threading in C/ZSWAP copying from S/DSWAP
2018-11-07 08:49:21 +01:00
Andrew
2992e3886a
disable threading in C/ZSWAP copying from S/DSWAP
2018-10-22 23:21:49 +03:00
Martin Kroeker
e3c262e5cf
Merge pull request #1825 from brada4/hemv
...
Delay _hemv threading in attempt to address #1820
2018-10-21 20:34:05 +02:00
Andrew
a293bdcd5e
re-arrange new code for readability
2018-10-20 21:37:53 +03:00
Andrew
c7bbf9c987
Attempt to tame _hemv threading #1820
2018-10-20 11:13:29 +03:00
Ashwin Sekhar T K
21f46a1cf2
ARM64: Use THUNDERX2T99 Neon Kernels for ARMV8
...
Currently the generic ARMV8 target uses C implementations
for many routines. Replace these with the neon implementations
written for THUNDERX2T99 target which are up to 6x faster for
certain routines.
2018-10-17 10:44:37 -07:00
Martin Kroeker
b991570210
Merge pull request #1762 from martin-frbg/issue1710-2
...
Add explicit casts to silence compiler warnings
2018-09-19 18:16:21 +02:00
Martin Kroeker
f3c262156e
Add an explicit cast to silence a warning
...
for #1710
2018-09-13 14:24:29 +02:00
Martin Kroeker
30f5a69ab8
Add explicit cast to silence a warning
...
for #1710
2018-09-13 14:23:31 +02:00