OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	b2053239fc	Fix missing dummy parameter (imag part of alpha) of zdot_thread_function	2020-08-23 15:08:16 +02:00
Martin Kroeker	b11bb6e728	Merge pull request #2790 from martin-frbg/issue2789 Add OpenMP dependency to pkgconfig information if needed	2020-08-23 14:42:35 +02:00
Martin Kroeker	1840bc5b52	Add OpenMP dependency to pkgconfig file if needed	2020-08-22 13:55:18 +02:00
Martin Kroeker	7c0977c267	Add OpenMP dependency to pkgconfig file if needed	2020-08-22 13:53:44 +02:00
Martin Kroeker	fb3d80c42a	Merge pull request #78 from xianyi/develop rebase	2020-08-22 13:52:29 +02:00
Martin Kroeker	9ee21a0a39	Merge pull request #2780 from Guobing-Chen/CPL_build_support Enable COOPERLAKE build target	2020-08-20 19:54:29 +02:00
Martin Kroeker	bd3207b4b4	Update system.cmake	2020-08-19 22:51:10 +02:00
Martin Kroeker	b8ebfc9335	Update system.cmake	2020-08-19 22:30:19 +02:00
Martin Kroeker	7c1986640b	fallback from cooperlake to skylake if gcc<10	2020-08-19 20:48:39 +02:00
Martin Kroeker	71d33c952d	Typo fix	2020-08-19 17:44:23 +02:00
Martin Kroeker	6a3c074786	-march=cooperlake requires gcc10	2020-08-19 17:22:12 +02:00
Martin Kroeker	430f741b30	-march=cooperlake requires gcc10	2020-08-19 17:17:53 +02:00
Martin Kroeker	6f4dc7445d	Fix typo	2020-08-19 16:36:55 +02:00
Martin Kroeker	81fbe8d088	-march=cooperlake only available in gcc >= 10	2020-08-19 16:10:15 +02:00
Martin Kroeker	bb9cf766f5	make march=cooperlake option conditional on gcc >= 10.1	2020-08-19 15:06:30 +02:00
Martin Kroeker	75eeb265d7	[WIP] Refactor the driver code for direct SGEMM (#2782) Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available (on x86_64 targets only for now) in DYNAMIC_ARCH builds * Add sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt * Add direct_sgemm functions to the gotoblas struct in common_param.h * Move sgemm_direct_performant helper to separate file * Update gemm.c to use macros for sgemm_direct to support dynamic_arch naming via common_s.h * (Conditionally) add sgemm_direct functions in setparam-ref.c	2020-08-19 14:51:09 +02:00
Martin Kroeker	2c72972570	Merge pull request #2785 from albertziegenhagel/always-generate-pkg-config Do not require pkg-config to generate the *.pc file	2020-08-19 14:42:58 +02:00
Albert Ziegenhagel	6b731d917f	Do not require pkg-config to generate the *.pc file Generating the pkg-config file does not actually depend on pkg-config being available.	2020-08-18 08:48:48 +02:00
Martin Kroeker	5dcf47cd97	Merge pull request #2784 from martin-frbg/issue2783 Add fallback typedef for bfloat16 to openblas_config.h template	2020-08-17 19:06:13 +02:00
Martin Kroeker	aa286e301b	Add typedef for bfloat16 if needed	2020-08-17 15:32:14 +02:00
Martin Kroeker	9f0ef9cdfc	Merge pull request #77 from xianyi/develop rebase	2020-08-17 15:28:15 +02:00
Martin Kroeker	6bfc66663c	revert	2020-08-17 15:20:41 +02:00
Martin Kroeker	a8c6fb9e1c	revert	2020-08-17 15:20:16 +02:00
Martin Kroeker	5ec8f716cf	revert	2020-08-17 15:19:40 +02:00
Martin Kroeker	82f8a0aeba	Update .drone.yml	2020-08-15 15:46:18 +02:00
Martin Kroeker	d57d503c15	Update Makefile	2020-08-15 14:46:26 +02:00
Martin Kroeker	37ac23e8a3	Add simple MT sgemm precision test and INTERFACE64 build	2020-08-15 13:38:05 +02:00
Martin Kroeker	6a93e3b2ba	Add simple sgemm precision test	2020-08-15 13:33:52 +02:00
Martin Kroeker	47ce1dd08f	Update gemm64.cpp	2020-08-15 13:31:28 +02:00
Martin Kroeker	f5fcc5baec	Add trivial gemm test for multithread consistency	2020-08-15 13:30:29 +02:00
Martin Kroeker	597010a968	Fix incorrect argument to SLASET Reference-LAPACK issue 425 (and 318)	2020-08-14 00:41:56 +02:00
Martin Kroeker	d64f1ef26b	Fix incorrect argument to SLASET Reference-LAPACK issue 425 (and 318)	2020-08-14 00:40:24 +02:00
Martin Kroeker	c62aad62e5	Fix incorrect calls to DLASET Reference-LAPACK issue 429	2020-08-14 00:35:45 +02:00
Chen, Guobing	e740c4873d	Enable COOPERLAKE build target Enable new build target platform -- COOPERLAKE. This target platform supports all the SKYLAKEX supported ISAs + avx512bf16. So all the SKYLAKEX specific kernels/drivers and related code are now extended to be also active on COOPERLAKE. Besides, new BF16 related kernels are active under this target.	2020-08-13 06:18:00 +08:00
Martin Kroeker	efdd237a91	Add a dedicated POWER9 build to the Travis CI (#2774) * Add dedicated POWER9 build (using new syntax to ensure it runs as a P9-only containerized job rather than a VM that might end up on P8 hardware half of the time) * Bump gcc version for POWER9 build	2020-08-12 23:08:38 +02:00
Martin Kroeker	4573cb2f43	Merge pull request #2765 from martin-frbg/issue2760 Add memory barrier to the PPC blas_lock implementation for Linux	2020-08-11 22:40:17 +02:00
Martin Kroeker	2a4bb797db	Merge pull request #2773 from martin-frbg/issue2770 Fix Makefiles still mishandling NO_CBLAS=0 and NO_LAPACKE=0	2020-08-11 21:02:55 +02:00
Martin Kroeker	cbbe38bb88	Merge pull request #2772 from mhillenibm/s390x_gemm_tuning s390x: GEMM tuning for z14	2020-08-11 18:14:09 +02:00
Martin Kroeker	619343278d	Fix mishandling of NO_CBLAS=0 and NO_LAPACKE=0	2020-08-11 13:40:40 +02:00
Martin Kroeker	fee361ae64	fix another source of NO_CBLAS=0 surprise	2020-08-11 13:27:19 +02:00
Martin Kroeker	62f4c84f27	Merge pull request #76 from xianyi/develop rebase	2020-08-11 13:25:12 +02:00
Marius Hillenbrand	e115c97e05	s390x/SGEMM: adjust default P and Q to multiples of M We recently changed the register blocking for SGEMM on s390x to 16x4. However, we did not adjust Q to a multiple of 16 and thus fell back to the 8x4 kernel at each block's margin, without need. Adjust P and Q to multiples of 16 to employ the faster 16x4 kernel for complete full-sized blocks. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-08-11 12:56:46 +02:00
Marius Hillenbrand	07c334e7be	s390x: Factor out small block sizes for SGEMM/DGEMM on z14 For small register blockings that are too small to fill up vector registers with column vectors, we currently use a generic code block. Replace that with instantiations of the generic code as individual functions, so that the compiler can optimize each one separately. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-08-11 12:56:39 +02:00
Marius Hillenbrand	e2828e30aa	s390x: Optimize SGEMM/DGEMM blocks for z14 with explicit loop unrolling/interleaving Improve performance of SGEMM and DGEMM on z14 and z15 by unrolling and interleaving the inner loop of the SGEMM 16x4 and DGEMM 8x4 blocks. Specifically, we explicitly interleave vector register loads and computation of two iterations. Note that this change only adds one C function, since SGEMM 16x4 and DGEMM 8x4 actually map to the same C code: they both hold intermediate results in a 4x4 grid of vector registers, and the C implementation is built around that. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-08-11 12:55:42 +02:00
Martin Kroeker	7219c9cb87	Merge pull request #2764 from martin-frbg/lapacktests Fix array overruns in the LIN part of the LAPACK testsuite	2020-08-10 13:27:51 +02:00
Martin Kroeker	c9d32674ea	Add memory barrier to the blas_lock implementation for Linux as recommended by cparrott73 in #2760	2020-08-09 19:17:04 +02:00
Martin Kroeker	64259d521a	Fix use of unallocated array in workspace query and wrong type of argument to xSCAL	2020-08-09 13:02:27 +02:00
Martin Kroeker	6f5ca44c1a	Expand TAU array as SGEMQR/DGEMQR read elements 2 and 3	2020-08-09 12:59:20 +02:00
Martin Kroeker	d28b3f2776	Create Jenkinsfile for OSUOSL PowerCI	2020-08-08 18:05:20 +02:00
Martin Kroeker	ba3f7b3acf	Merge pull request #2761 from RajalakshmiSR/Makefile_err Remove extra symbol in Makefile	2020-08-08 12:20:04 +02:00



5120 Commits

All Branches