OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Rajalakshmi Srinivasaraghavan	b5d30b390d	Fix build issues with bfloat16 This patch fixes compilation errors due to recent renaming from SH to SB with BUILD_BFLOAT16.	2020-10-13 11:00:22 -05:00
Martin Kroeker	1e7eb7b7a9	Fix typos in currently unused sections	2020-10-13 09:17:15 +02:00
Martin Kroeker	052f31bc3c	Change "HALF" and "sh" to "BFLOAT16" and "sb"	2020-10-12 00:02:16 +02:00
Martin Kroeker	0f7d73ff6d	Allow supporting only a subset of variable types	2020-10-11 14:53:26 +02:00
Martin Kroeker	b475b4bd0d	Support building only a subset of types	2020-09-22 23:25:04 +02:00
Chen, Guobing	deaeb6c5b8	Add bfloat16 based dot and conversion with single/double 1. Added bfloat16 based dot as new API: shdot 2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot 3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod shstobf16 -- convert single float array to bfloat16 array shdtobf16 -- convert double float array to bfloat16 array sbf16tos -- convert bfloat16 array to single float array dbf16tod -- convert bfloat16 array to double float array 4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16 5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs 6. Fix Cooperlake platform detection/specify issue when under dynamic-arch building 7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-09-04 02:31:25 +08:00
Martin Kroeker	75eeb265d7	[WIP] Refactor the driver code for direct SGEMM (#2782) Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available (on x86_64 targets only for now) in DYNAMIC_ARCH builds * Add sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt * Add direct_sgemm functions to the gotoblas struct in common_param.h * Move sgemm_direct_performant helper to separate file * Update gemm.c to macros for sgemm_direct to support dynamic_arch naming via common_s.h * (Conditionally) add sgemm_direct functions in setparam-ref.c	2020-08-19 14:51:09 +02:00
Martin Kroeker	fee361ae64	fix another source of NO_CBLAS=0 surprise	2020-08-11 13:27:19 +02:00
Ashwin Sekhar T K	4e1be0e481	ARM64: Add THUNDERX3T110 Target	2020-07-26 23:32:24 -07:00
Martin Kroeker	5dd14e3d48	Make building the bfloat16 functions conditional on option BUILD_HALF (#2590) * make building the bfloat16 BLAS functions conditional on BUILD_HALF * pass the BUILD_HALF option to gensymbol * Pass BUILD_HALF as a compiler define for dynamic_arch builds	2020-05-01 09:58:30 +02:00
Martin Kroeker	2db5178e2d	enable cblas interfaces to GEMM3M in CMAKE builds	2020-04-22 11:01:28 +02:00
Rajalakshmi Srinivasaraghavan	7eb55504b1	RFC : Add half precision gemm for bfloat16 in OpenBLAS This patch adds support for bfloat16 data type matrix multiplication kernel. For architectures that don't support bfloat16, it is defined as unsigned short (2 bytes). Default unroll sizes can be changed as per architecture as done for SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be changed as per architecture requirement and for now, size 2 is used. Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare sgemm and shgemm output. This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm. Complex type implementation can be discussed and added once this is approved.	2020-04-14 14:55:08 -05:00
Martin Kroeker	8229c163b7	Use runtime check for AVX512 (sgemm_direct) capability when using DYNAMIC_ARCH	2020-03-26 21:12:56 +01:00
Martin Kroeker	6a14b34c20	Avoid calling DIRECT codepath in DYNAMIC_ARCH on non-SKX	2020-03-22 14:33:16 +01:00
Martin Kroeker	d65e9a2bbd	Merge pull request #2253 from thrasibule/xerbla fix error messages	2020-01-18 20:39:04 +01:00
Guillaume Horel	2463938879	fix error message	2019-09-11 10:35:25 -04:00
Guillaume Horel	5d6525c87c	more bugfix	2019-09-10 17:30:57 -04:00
Guillaume Horel	459bb9291d	fix error codes	2019-09-10 17:10:33 -04:00
Guillaume Horel	5997b6b491	bugfix	2019-09-08 11:14:49 -04:00
Guillaume Horel	7ec7b999a5	add missing file	2019-09-08 11:14:49 -04:00
Guillaume Horel	af9ac0898a	fix Makefile	2019-09-08 11:14:49 -04:00
Guillaume Horel	9b2f0323d6	update Makefile	2019-09-08 11:14:49 -04:00
Guillaume Horel	ea747cf933	start working on ?trtrs	2019-09-08 11:14:49 -04:00
luz.paz	daf2fec12d	Misc. typo fixes Found via `codespell -q 3 -w -L ith,als,dum,nd,amin,nto,wis,ba -S ./relapack,./kernel,./lapack-netlib`	2019-04-29 17:03:56 -04:00
Martin Kroeker	268c28db7d	Merge pull request #2095 from martin-frbg/trsm Correct length of name string in xerbla call	2019-04-28 09:55:25 +02:00
Martin Kroeker	0bd956fd21	Correct length of name string in xerbla call	2019-04-27 22:49:04 +02:00
Martin Kroeker	79cfc24a62	Add interface for ?sum (derived from ?asum)	2019-03-30 21:59:18 +01:00
Martin Kroeker	c19a449096	Merge pull request #2071 from martin-frbg/issue2068 Provide CBLAS interfaces to I?MIN and I?MAX	2019-03-30 14:54:28 +01:00
Martin Kroeker	3d1e36d4cb	Build CBLAS interfaces for I?MIN and I?MAX	2019-03-30 12:38:41 +01:00
Martin Kroeker	e29b0cfcc4	Allow multithreading TRMV again revert workaround introduced for issue #1332 as the actual cause appears to be my incorrect fix from #1262 (see #1388)	2019-02-19 21:03:30 +01:00
Martin Kroeker	8533aca964	Avoid penalizing tall skinny matrices	2019-01-23 10:03:00 +01:00
Martin Kroeker	cda81cfae0	Shift transition to multithreading towards larger matrix sizes See #1886 and JuliaRobotics issue 500. trsm benchmarks on Haswell and Zen showed that with these values performance is roughly doubled for matrix sizes between 8x8 and 14x14, and still 10 to 20 percent better near the new cutoff at 32x32.	2019-01-19 00:10:01 +01:00
Arjan van de Ven	cdc668d82b	Add a "sgemm direct" mode for small matrixes OpenBLAS has a fancy algorithm for copying the input data while laying it out in a more CPU friendly memory layout. This is great for large matrixes; the cost of the copy is easily amortized by the gains from the better memory layout. But for small matrixes (on CPUs that can do efficient unaligned loads) this copy can be a net loss. This patch adds (for SKYLAKEX initially) a "sgemm direct" mode, that bypasses the whole copy machinery for ALPHA=1/BETA=0/... standard arguments, for small matrixes only. What is small? For the non-threaded case this has been measured to be in the MNK = 28 * 512 * 512 range, while in the threaded case it's less, around MNK = 1 * 512 * 512	2018-12-13 13:47:31 +00:00
Martin Kroeker	5393759a98	Merge pull request #1869 from martin-frbg/axpy0 Handle special case INCX=0,INCY=0 in the axpy interface	2018-11-25 20:52:49 +01:00
Martin Kroeker	c171b8ad13	Handle special case INCX=0,INCY=0 in the axpy interface	2018-11-13 13:57:18 +01:00
Martin Kroeker	96d2f2c9b2	Merge pull request #1831 from brada4/hemv disable threading in C/ZSWAP copying from S/DSWAP	2018-11-07 08:49:21 +01:00
Andrew	2992e3886a	disable threading in C/ZSWAP copying from S/DSWAP	2018-10-22 23:21:49 +03:00
Martin Kroeker	e3c262e5cf	Merge pull request #1825 from brada4/hemv Delay _hemv threading in attempt to address #1820	2018-10-21 20:34:05 +02:00
Andrew	a293bdcd5e	re-arrange new code for readability	2018-10-20 21:37:53 +03:00
Andrew	c7bbf9c987	Attempt to tame _hemv threading #1820	2018-10-20 11:13:29 +03:00
Ashwin Sekhar T K	21f46a1cf2	ARM64: Use THUNDERX2T99 Neon Kernels for ARMV8 Currently the generic ARMV8 target uses C implementations for many routines. Replace these with the neon implementations written for THUNDERX2T99 target which are up to 6x faster for certain routines.	2018-10-17 10:44:37 -07:00
Martin Kroeker	b991570210	Merge pull request #1762 from martin-frbg/issue1710-2 Add explicit casts to silence compiler warnings	2018-09-19 18:16:21 +02:00
Martin Kroeker	f3c262156e	Add an explicit cast to silence a warning for #1710	2018-09-13 14:24:29 +02:00
Martin Kroeker	30f5a69ab8	Add explicit cast to silence a warning for #1710	2018-09-13 14:23:31 +02:00
Martin Kroeker	4a553e8678	Merge pull request #1713 from martin-frbg/issue1710 Introduce blasabs macro and use it to switch between abs and labs for INTERFACE64	2018-08-04 23:51:31 +02:00
Martin Kroeker	165f00c159	fabs -> fabsl	2018-08-04 20:14:51 +02:00
Martin Kroeker	933896a1d0	Use blasabs to switch between abs and labs as needed for INTERFACE64	2018-08-04 20:06:49 +02:00
Steven G. Johnson	a4e321400b	fabs -> fabsl Fixes two calls that were using `fabs` on a `long double` argument rather than `fabsl`, which looks like it is doing an unintentional truncation to `double` precision.	2018-08-03 13:00:10 -04:00
Martin Kroeker	9cf22b7d91	Build cblas_iXamin interfaces	2018-06-23 13:27:30 +02:00
Craig Donner	c2545b0fd6	Fixed a few more unnecessary calls to num_cpu_avail. I don't have as many benchmarks for these as for gemm, but it should still make a difference for small matrices.	2018-06-11 10:17:16 +01:00

250 Commits