OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	3eda3d34c3	Merge pull request #2641 from martin-frbg/ppcg4 Work around PPC G4 test failures	2020-06-03 16:43:46 +02:00
Martin Kroeker	a8f42ae85c	set cmake build type to Release	2020-06-03 15:28:59 +02:00
Martin Kroeker	e6e2e531bc	revert clang pragma	2020-06-03 15:16:27 +02:00
Martin Kroeker	456dc04441	Update sgemm_kernel_16x4_skylakex_3.c	2020-06-03 15:15:41 +02:00
Martin Kroeker	89323458a9	preset optimization level for apple clang	2020-06-03 15:07:25 +02:00
Martin Kroeker	e153bdeb70	Update dynamic_arch.yml	2020-06-03 13:46:43 +02:00
Martin Kroeker	c2001f7756	Make cmake build verbose to see options in use	2020-06-03 12:18:15 +02:00
Martin Kroeker	c2b3f0b3f6	Revert "keep Apple Clang from optimizing this"	2020-06-03 10:22:15 +02:00
Martin Kroeker	f16e39554d	Change PPCG4 CGEMM_M to match kernel change	2020-06-03 09:15:29 +02:00
Martin Kroeker	b1ee81228a	Change complex DOT and ROT to generic kernels and switch CGEMM in response to test failures seen in #2628 and BLAS-Tester	2020-06-03 09:13:29 +02:00
Martin Kroeker	9f7358d7dc	Keep Apple Clang from optimizing this	2020-06-03 08:52:53 +02:00
Martin Kroeker	54fa90fb25	Keep apple clang 11.0.3 from trying to optimize this (and running out of registers)	2020-06-02 17:31:45 +02:00
Leonard Lausen	5a709b8340	Print CPU info in output	2020-06-01 20:51:11 +00:00
Leonard Lausen	b31a68b835	Add Github Actions test for DYNAMIC_ARCH builds	2020-06-01 20:16:59 +00:00
Martin Kroeker	86552bf4c7	Update f_check	2020-05-31 15:22:12 +02:00
Martin Kroeker	a349d48d89	Merge pull request #2636 from martin-frbg/issue2634 Fix CMAKE build Issues on OS X	2020-05-31 15:16:09 +02:00
Martin Kroeker	4db00121dc	Disable EXPRECISION and add -lm on OSX (same as the BSDs and Linux)	2020-05-31 12:39:36 +02:00
Martin Kroeker	909897f13b	Document option USE_LOCKING	2020-05-31 12:37:57 +02:00
Martin Kroeker	e79245acd9	Merge pull request #2635 from ilayn/patch-1 BUG: Fix the loop range in ZHEEQUB.f	2020-05-30 14:37:12 +02:00
Ilhan Polat	76d2612e0c	BUG: Fix the loop range in ZHEEQUB.f	2020-05-30 14:11:11 +02:00
Martin Kroeker	ced49466f0	Use the fortran compiler to link LAPACK-related benchmarks to fix linking problems with (at least) the AMD version of flang that creates dependencies on more than just the fortran runtime.	2020-05-29 13:35:51 +02:00
Martin Kroeker	6e270f91ec	add support for RETURN_BY_STACK semantics, e.g. clang	2020-05-29 13:29:10 +02:00
Martin Kroeker	200296b0f4	remove libomp from link list only for pgfortran at least the AMD (aocc) flavor of flang wants to link to a (real or dummy) libomp by default	2020-05-29 13:23:51 +02:00
Martin Kroeker	dd7a650792	Merge pull request #59 from xianyi/develop rebase	2020-05-29 13:06:25 +02:00
Martin Kroeker	4a4c50a7ce	Merge pull request #2627 from pkubaj/patch-1 Add powerpc (32-bit)	2020-05-26 08:36:24 +02:00
Martin Kroeker	d069780e63	Merge pull request #2626 from docularxu/working-gcc-version-detections make GCC version detection OS-independent	2020-05-26 08:35:58 +02:00
pkubaj	33c8790603	Add powerpc (32-bit) Only powerpc64 is present.	2020-05-25 13:14:09 +02:00
Guodong Xu	06387ac0e6	make GCC version detection OS-independent Previous design put GCC version detection inside of OSNAME 'WINNT'. However, such detections are required for 'Linux' and possibly other OS'es as well. For example, there is usage of the GCC versions in Makefile.arm64. When compiling on Linux machine, in the previous design, Makefile.arm64 will not know the correct GCC version. The fix is to move GCC version detection into common part, not wrapped by anything. Signed-off-by: Guodong Xu <guodong.xu@linaro.com>	2020-05-25 10:57:03 +00:00
Martin Kroeker	f1a18d245b	Merge pull request #2618 from craft-zhang/cortex-A53 Improve performance of SGEMM and STRMM on Arm Cortex-A53	2020-05-25 12:14:46 +02:00
张丹枫	2a3aa91354	update CONTRIBUTORS.md, adding myself	2020-05-20 22:35:26 +08:00
张丹枫	ea5bdc3f72	split cortex-a53 param to match 8x8 kernel	2020-05-20 22:34:47 +08:00
张丹枫	9df79ae9a3	update sgemm and strmm kernel selecting strategy	2020-05-20 22:26:58 +08:00
张丹枫	a1fc6041cd	use general register to speedup	2020-05-20 22:26:58 +08:00
张丹枫	edb423d772	align general register using to strmm_kernel_8x8	2020-05-20 22:26:58 +08:00
zhangdanfeng	0e6eb8c247	sgemm kernel use sgemm_kernel_8x8_cortexa53 Signed-off-by: zhangdanfeng <zhangdanfeng@cloudwalk.cn>	2020-05-20 22:26:58 +08:00
zhangdanfeng	d475db29c6	optimized for cortex-a53 Signed-off-by: zhangdanfeng <zhangdanfeng@cloudwalk.cn>	2020-05-20 22:26:58 +08:00
Martin Kroeker	729ac6bd4a	Merge pull request #2623 from mhillenibm/zarch_dgemm_z14 s390x: Use new sgemm kernel also for DGEMM and DTRMM on Z14 (+ small cleanup)	2020-05-20 14:51:04 +02:00
Marius Hillenbrand	89fe17f20e	s390x: Use new sgemm kernel also for DGEMM and DTRMM on Z14 Apply our new GEMM kernel implementation, written in C with vector intrinsics, also for DGEMM and DTRMM on Z14 and newer (i.e., architectures with FP32 SIMD instructions). As a result, we gain around 10% in performance on z15, in addition to improving maintainability. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-05-20 10:23:35 +02:00
Marius Hillenbrand	bdd795ed03	s390x/GEMM: replace 0-init with peeled first iteration ... since it gains another ~2% of SGEMM and DGEMM performance on z15; also, the code just called for that cleanup. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-05-20 10:23:35 +02:00
Martin Kroeker	e1038ea836	Merge pull request #2622 from martin-frbg/issue2619 Improve declaration of LAPACKE_get_nancheck	2020-05-19 23:07:22 +02:00
Martin Kroeker	6baa9a778d	Improve declaration of LAPACKE_get_nancheck	2020-05-19 17:59:31 +02:00
Martin Kroeker	cf46c9f84e	Merge pull request #2617 from martin-frbg/issue2616 Add workaround for unhandled gmake jobserver flags in c_check/f_check	2020-05-18 13:23:58 +02:00
Martin Kroeker	55602fce56	Ignore spurious all-numeric library names derived from mishandled jobserver flags	2020-05-17 15:28:14 +02:00
Martin Kroeker	3d5e159e7a	Ignore spurious all-numeric library names derived from mishandled jobserver flags	2020-05-17 15:26:57 +02:00
Martin Kroeker	2931feb575	Merge pull request #58 from xianyi/develop rebase	2020-05-17 15:23:32 +02:00
Martin Kroeker	20245ded5f	Merge pull request #2615 from mhillenibm/z14_alignment_hints s390x: improvise vector alignment hints for older compilers	2020-05-14 21:06:34 +02:00
Marius Hillenbrand	2840432e49	s390x: improvise vector alignment hints for older compilers Introduce inline assembly so that we can employ vector loads with alignment hints on older compilers (pre gcc-9), since these are still used in distributions such as RHEL 8 and Ubuntu 18.04 LTS. Informing the hardware about alignment can speed up vector loads. For that purpose, we can encode hints about 8-byte or 16-byte alignment of the memory operand into the opcodes. gcc-9 and newer automatically emit such hints, where applicable. Add a bit of inline assembly that achieves the same for older compilers. Since an older binutils may not know about the additional operand for the hints, we explicitly encode the opcode in hex. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-05-14 15:36:03 +02:00
Martin Kroeker	ea78106c71	Merge pull request #2614 from mhillenibm/gemm_vec_z14 s390x: Improve performance of SGEMM and STRMM on z14 and newer	2020-05-13 15:09:23 +02:00
Marius Hillenbrand	cb9dc36dd5	Update CONTRIBUTORS.md Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-05-12 16:14:00 +02:00
Marius Hillenbrand	1b0b4349a1	s390x/Z14: Change register blocking for SGEMM to 16x4 Change register blocking for SGEMM (and STRMM) on z14 from 8x4 to 16x4 by adjusting SGEMM_DEFAULT_UNROLL_M and choosing the appropriate copy implementations. Actually make KERNEL.Z14 more flexible, so that the change in param.h suffices. As a result, performance for SGEMM improves by around 30% on z15. On z14, FP SIMD instructions can operate on float-sized scalars in vector registers, while z13 could do that for double-sized scalars only. Thus, we can double the amount of elements of C that are held in registers in an SGEMM kernel. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-05-12 15:59:51 +02:00

6983 Commits

All Branches