OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	776d005f4c	Merge pull request #2815 from mhillenibm/clang_s390x Fix build with clang on s390x	2020-09-02 16:56:01 +02:00
Marius Hillenbrand	2ee5b899ce	s390x: enable S/DGEMM block with explicit loop unrolling + interleaving with clang The code for SGEMM 16x4 and DGEMM 8x4 blocks on z14 and z15 uses explicit unrolling and interleaving to improve performance. The code employs an empty inline asm statement with operands that constrain the compiler's instruction scheduling and thereby enforce proper overlapping of load and compute phases. Fix an ifdef to apply that for clang builds, as well. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-02 13:49:31 +02:00
Marius Hillenbrand	095f4e6964	s390x: allow clang to emit fused multiply-adds (replicates gcc's default behavior) gcc's default setting for floating-point expression contraction is "fast", which allows the compiler to emit fused multiply adds instead of separate multiplies and adds (amongst others). Fused multiply-adds, which assembly kernels typically apply, also bring a significant performance advantage to the C implementation for matrix-matrix multiplication on s390x. To enable that performance advantage for builds with clang, add -ffp-contract=fast to the compiler options. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-02 13:49:31 +02:00
Marius Hillenbrand	87e5bbd887	s390x: avoid variable-length arrays in struct for asm operands ... since it is not required and clang does not support that gcc extension. Instead, use a variable-length array directly for these operands. Note that, while the actual inline assembly code does not directly use these memory operands, they serve to inform the compiler that it cannot reorder reads or writes to/from the input and output data across the inline asm statements. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-02 13:49:31 +02:00
Marius Hillenbrand	b9b3265ec8	s390x: avoid inline assembly for vector loads for clang ... since clang does not support the instruction format for inline assembly and also it is not required for current versions of clang. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-02 13:49:30 +02:00
Marius Hillenbrand	a1616a0b86	s390x: replace nop with "nop 0" in inline assembly ... as a bandaid for building with clang until LLVM's internal assembler supports nops without operand. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-02 13:49:30 +02:00
Marius Hillenbrand	60ef193258	s390x: use "lghi" for immediate values to fix build with clang Some of the kernels written in assembly utilize a "load address" instruction for loading an immediate value into a register. That is both unnecessarily complex and LLVM's assembler does not understand that specific syntax. Thus, replace with the appropriate "load immediate" instruction, which is also clearer to read. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-02 13:49:30 +02:00
Martin Kroeker	18bfb6d6f7	Merge pull request #2813 from martin-frbg/issue2804-2 Fix for c_check misinterpreting arm64 in uname -m output as armv7	2020-09-01 23:39:46 +02:00
Martin Kroeker	e4900caa11	Fix c_check misinterpreting arm64 in uname output to mean armv7 Additional fix for upcoming OSX on ARM64 related to #2804, as suggested by fxcoudert in #2805	2020-09-01 19:54:08 +02:00
Martin Kroeker	68b1713c30	Merge pull request #2811 from martin-frbg/issue2806 Make NO_AVX512 option override the AVX512 compile test in CMAKE builds as well	2020-09-01 17:19:14 +02:00
Martin Kroeker	4074770d00	Merge pull request #2797 from martin-frbg/relafixes1 ReLAPACK fixes	2020-09-01 16:04:03 +02:00
Martin Kroeker	b87a77da02	Merge pull request #79 from xianyi/develop rebase	2020-09-01 12:03:53 +02:00
Martin Kroeker	f42e84d46c	Fix misnaming of LAPACK_?ggsvp function prototypes as LAPACKE_ (#2808) * Fix misnaming of LAPACK_?ggsvp and ?ggsvd function prototypes as LAPACKE_ * Drop the LAPACKE matrix_layout parameter from the argument lists, change ints to pointers and add missing work arguments.	2020-09-01 10:44:48 +02:00
Martin Kroeker	0a4c5c4c44	Merge pull request #2807 from martin-frbg/issue2804 Work around ARMV8 build-time cpu detection problems on non-Linux systems	2020-08-31 23:44:56 +02:00
Martin Kroeker	3210a42734	Report cpu as ARMV8 instead of just giving up on non-Linux hosts	2020-08-31 20:03:21 +02:00
Martin Kroeker	5feb087c05	Handle Apple labeling armv8 as arm64 rather than aarch64	2020-08-31 20:02:08 +02:00
Gengxin Xie	448152cdd8	define __AVX2__ to ensure the haswell code compiled with avx2	2020-08-31 14:39:08 +08:00
Gengxin Xie	cb3c190a3a	Implementation of dasum, sasum with AVX2 & AVX512 intrinsics	2020-08-31 11:44:08 +08:00
Martin Kroeker	59e01b1aec	Merge pull request #2799 from RajalakshmiSR/p10_ger POWER10: Avoid setting accumulators to zero in gemm kernels	2020-08-28 22:52:11 +02:00
Rajalakshmi Srinivasaraghavan	317ff27cda	POWER10: Avoid setting accumulators to zero in gemm kernels For the first iteration, it is better to use xvfger instead of xvfgerpp builtins, which avoids setting the accumulators to zero. This saves a few instructions.	2020-08-28 10:42:54 -05:00
Martin Kroeker	514a3d7d63	Merge pull request #2798 from kadler/aix-cpuid Fix compile error on AIX cpuid detection	2020-08-28 08:30:59 +02:00
Kevin Adler	085aae8bdb	Fix compile error on AIX cpuid detection In `589c74a` the cpuid detection was changed to use systemcfg, but a copy and paste error was introduced during some refactoring that caused POWER7 detection to reference CPUTYPE_POWER7 (which doesn't exist) instead of CPUTYPE_POWER6.	2020-08-27 23:10:45 -05:00
Martin Kroeker	de63675717	Add early returns and fix sign errors in workspace calculations	2020-08-27 11:25:18 +02:00
Martin Kroeker	d64cc2be81	Add early returns	2020-08-27 11:22:50 +02:00
Martin Kroeker	c9b67141f0	Add early returns	2020-08-27 11:20:31 +02:00
Martin Kroeker	6797a3a1e0	Add early returns	2020-08-27 11:15:12 +02:00
Martin Kroeker	936966a42c	Make ILAENV and xGETRF2 functions available	2020-08-27 10:59:08 +02:00
Martin Kroeker	5c6c2cd4f6	Merge pull request #2775 from Guobing-Chen/Fix_OMP_threads_specify Fix OMP num specify issue	2020-08-24 20:18:09 +02:00
Martin Kroeker	e54be4ba1c	Merge pull request #2792 from pkubaj/patch-1 Add aliases for armv6, armv7	2020-08-24 08:03:39 +02:00
pkubaj	48a1364e10	Add aliases for armv6, armv7 FreeBSD uses those names for 32-bit ARM variants.	2020-08-23 18:50:19 +00:00
Chen, Guobing	0c1c903f1e	Fix OMP num specify issue In the current code, no matter what number of threads is specified, all available CPUs are used when invoking OMP, which leads to very bad performance if the workload is small while the number of available CPUs is large. A lot of time is wasted on inter-thread sync. Fix this issue by actually using the number specified by the variable 'num' from the calling API. Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-08-24 02:45:54 +08:00
Martin Kroeker	a073fa870e	Merge pull request #2791 from martin-frbg/issue2787 Fix crashes in parallelized x86_64 ZDOT particularly on Windows	2020-08-23 19:33:03 +02:00
Martin Kroeker	b2053239fc	Fix missing dummy parameter (imag part of alpha) of zdot_thread_function	2020-08-23 15:08:16 +02:00
Martin Kroeker	b11bb6e728	Merge pull request #2790 from martin-frbg/issue2789 Add OpenMP dependency to pkgconfig information if needed	2020-08-23 14:42:35 +02:00
Martin Kroeker	1840bc5b52	Add OpenMP dependency to pkgconfig file if needed	2020-08-22 13:55:18 +02:00
Martin Kroeker	7c0977c267	Add OpenMP dependency to pkgconfig file if needed	2020-08-22 13:53:44 +02:00
Martin Kroeker	fb3d80c42a	Merge pull request #78 from xianyi/develop rebase	2020-08-22 13:52:29 +02:00
Martin Kroeker	9ee21a0a39	Merge pull request #2780 from Guobing-Chen/CPL_build_support Enable COOPERLAKE build target	2020-08-20 19:54:29 +02:00
Martin Kroeker	bd3207b4b4	Update system.cmake	2020-08-19 22:51:10 +02:00
Martin Kroeker	b8ebfc9335	Update system.cmake	2020-08-19 22:30:19 +02:00
Martin Kroeker	7c1986640b	fallback from cooperlake to skylake if gcc<10	2020-08-19 20:48:39 +02:00
Martin Kroeker	71d33c952d	Typo fix	2020-08-19 17:44:23 +02:00
Martin Kroeker	6a3c074786	-march=cooperlake requires gcc10	2020-08-19 17:22:12 +02:00
Martin Kroeker	430f741b30	-march=cooperlake requires gcc10	2020-08-19 17:17:53 +02:00
Martin Kroeker	6f4dc7445d	Fix typo	2020-08-19 16:36:55 +02:00
Martin Kroeker	81fbe8d088	-march=cooperlake only available in gcc >= 10	2020-08-19 16:10:15 +02:00
Martin Kroeker	bb9cf766f5	make march=cooperlake option conditional on gcc >= 10.1	2020-08-19 15:06:30 +02:00
Martin Kroeker	75eeb265d7	[WIP] Refactor the driver code for direct SGEMM (#2782) Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available (on x86_64 targets only for now) in DYNAMIC_ARCH builds * Add sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt * Add direct_sgemm functions to the gotoblas struct in common_param.h * Move sgemm_direct_performant helper to separate file * Update gemm.c to use macros for sgemm_direct to support dynamic_arch naming via common_s.h * (Conditionally) add sgemm_direct functions in setparam-ref.c	2020-08-19 14:51:09 +02:00
Martin Kroeker	2c72972570	Merge pull request #2785 from albertziegenhagel/always-generate-pkg-config Do not require pkg-config to generate the *.pc file	2020-08-19 14:42:58 +02:00
Albert Ziegenhagel	6b731d917f	Do not require pkg-config to generate the *.pc file Generating the pkg-config file does not actually depend on pkg-config being available.	2020-08-18 08:48:48 +02:00
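
Commit 095f4e6964 in the list above adds -ffp-contract=fast for clang builds on s390x so that the C kernels may use fused multiply-adds. As a rough illustration only (this is not code from OpenBLAS, and the function name `dot_product_sketch` is hypothetical), the accumulation below is the kind of multiply-then-add expression a compiler is permitted to contract when that option is in effect:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of OpenBLAS.
 * Each iteration computes acc + x[i] * y[i]. Under -ffp-contract=fast the
 * compiler may turn the multiply and the add into one fused multiply-add;
 * whether it does depends on the target and the optimization level. */
double dot_product_sketch(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += x[i] * y[i];
    }
    return acc;
}
```

Building with, for example, `clang -O2 -ffp-contract=fast` allows the contraction. A fused multiply-add rounds once instead of twice, so results may differ in the last bits from a build without it.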

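Commit 87e5bbd887 in the list above describes memory operands that the inline assembly never uses directly but that keep the compiler from reordering reads and writes across the asm statement. A minimal sketch of that idea follows, assuming GNU-style extended asm (gcc or clang). The helper names are hypothetical and the code is not taken from the s390x kernels:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from OpenBLAS.
 * The asm template is empty, so no instruction is emitted. The operands
 * tell the compiler that the statement may read in[0..n-1] and may read and
 * write out[0..n-1], so accesses to those arrays are not moved across it.
 * The operands are variable-length arrays used directly; n must be > 0. */
static inline void memory_operand_fence(const double *in, double *out, size_t n)
{
    assert(n > 0);
    __asm__ volatile(""
                     : "+m"(*(double (*)[n])out)
                     : "m"(*(const double (*)[n])in));
}

/* Copies, then scales, with the two phases separated by the fence. */
void scale_copy_sketch(const double *in, double *out, size_t n, double alpha)
{
    if (n == 0) {
        return;
    }
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i];
    }
    memory_operand_fence(in, out, n);
    for (size_t i = 0; i < n; i++) {
        out[i] *= alpha;
    }
}
```

Casting the pointer to a pointer-to-array of length `n` makes the operand cover the whole buffer rather than a single element, which is why the array length appears in the operand type.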

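Commit 0c1c903f1e in the list above fixes the OpenMP path so that the thread count requested by the caller is honored instead of always using every available CPU. The sketch below shows the general technique with the standard `num_threads` clause. It is an assumption about the shape of such a fix, not the actual OpenBLAS change, and `scale_with_threads_sketch` is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of OpenBLAS.
 * Scales x[0..n-1] by alpha using at most `num` OpenMP threads.
 * When compiled without OpenMP support the pragma is ignored and the loop
 * runs serially, producing the same values. */
void scale_with_threads_sketch(double *x, size_t n, double alpha, int num)
{
    assert(num > 0);
    long count = (long)n;
    long i;
#pragma omp parallel for num_threads(num)
    for (i = 0; i < count; i++) {
        x[i] *= alpha;
    }
}
```

For small workloads, passing a small `num` avoids paying synchronization cost across threads that have almost no work to do, which is the problem the commit message describes.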