OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Gengxin Xie	cb3c190a3a	Implementation of dasum, sasum with AVX2 & AVX512 intrinsics	2020-08-31 11:44:08 +08:00
Martin Kroeker	59e01b1aec	Merge pull request #2799 from RajalakshmiSR/p10_ger POWER10: Avoid setting accumulators to zero in gemm kernels	2020-08-28 22:52:11 +02:00
Rajalakshmi Srinivasaraghavan	317ff27cda	POWER10: Avoid setting accumulators to zero in gemm kernels For the first iteration, it is better to use the xvfger builtins instead of xvfgerpp, which avoids having to set the accumulators to zero. This saves a few instructions.	2020-08-28 10:42:54 -05:00
Martin Kroeker	514a3d7d63	Merge pull request #2798 from kadler/aix-cpuid Fix compile error on AIX cpuid detection	2020-08-28 08:30:59 +02:00
Kevin Adler	085aae8bdb	Fix compile error on AIX cpuid detection In `589c74a` the cpuid detection was changed to use systemcfg, but a copy and paste error was introduced during some refactoring that caused POWER7 detection to reference CPUTYPE_POWER7 (which doesn't exist) instead of CPUTYPE_POWER6.	2020-08-27 23:10:45 -05:00
Martin Kroeker	de63675717	Add early returns and fix sign errors in workspace calculations	2020-08-27 11:25:18 +02:00
Martin Kroeker	d64cc2be81	Add early returns	2020-08-27 11:22:50 +02:00
Martin Kroeker	c9b67141f0	Add early returns	2020-08-27 11:20:31 +02:00
Martin Kroeker	6797a3a1e0	Add early returns	2020-08-27 11:15:12 +02:00
Martin Kroeker	936966a42c	Make ILAENV and xGETRF2 functions available	2020-08-27 10:59:08 +02:00
Martin Kroeker	5c6c2cd4f6	Merge pull request #2775 from Guobing-Chen/Fix_OMP_threads_specify Fix OMP num specify issue	2020-08-24 20:18:09 +02:00
Martin Kroeker	e54be4ba1c	Merge pull request #2792 from pkubaj/patch-1 Add aliases for armv6, armv7	2020-08-24 08:03:39 +02:00
pkubaj	48a1364e10	Add aliases for armv6, armv7 FreeBSD uses those names for 32-bit ARM variants.	2020-08-23 18:50:19 +00:00
Chen, Guobing	0c1c903f1e	Fix OMP num specify issue In the current code, no matter what number of threads is specified, the full available CPU count is used when invoking OMP, which leads to very bad performance if the workload is small while the number of available CPUs is large. A lot of time is wasted on inter-thread sync. Fix this issue by actually using the number specified by the variable 'num' from the calling API. Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-08-24 02:45:54 +08:00
Martin Kroeker	a073fa870e	Merge pull request #2791 from martin-frbg/issue2787 Fix crashes in parallelized x86_64 ZDOT particularly on Windows	2020-08-23 19:33:03 +02:00
Martin Kroeker	b2053239fc	Fix missing dummy parameter (imag part of alpha) of zdot_thread_function	2020-08-23 15:08:16 +02:00
Martin Kroeker	b11bb6e728	Merge pull request #2790 from martin-frbg/issue2789 Add OpenMP dependency to pkgconfig information if needed	2020-08-23 14:42:35 +02:00
Martin Kroeker	1840bc5b52	Add OpenMP dependency to pkgconfig file if needed	2020-08-22 13:55:18 +02:00
Martin Kroeker	7c0977c267	Add OpenMP dependency to pkgconfig file if needed	2020-08-22 13:53:44 +02:00
Martin Kroeker	fb3d80c42a	Merge pull request #78 from xianyi/develop rebase	2020-08-22 13:52:29 +02:00
Martin Kroeker	9ee21a0a39	Merge pull request #2780 from Guobing-Chen/CPL_build_support Enable COOPERLAKE build target	2020-08-20 19:54:29 +02:00
Martin Kroeker	bd3207b4b4	Update system.cmake	2020-08-19 22:51:10 +02:00
Martin Kroeker	b8ebfc9335	Update system.cmake	2020-08-19 22:30:19 +02:00
Martin Kroeker	7c1986640b	fallback from cooperlake to skylake if gcc<10	2020-08-19 20:48:39 +02:00
Martin Kroeker	71d33c952d	Typo fix	2020-08-19 17:44:23 +02:00
Martin Kroeker	6a3c074786	-march=cooperlake requires gcc10	2020-08-19 17:22:12 +02:00
Martin Kroeker	430f741b30	-march=cooperlake requires gcc10	2020-08-19 17:17:53 +02:00
Martin Kroeker	6f4dc7445d	Fix typo	2020-08-19 16:36:55 +02:00
Martin Kroeker	81fbe8d088	-march=cooperlake only available in gcc >= 10	2020-08-19 16:10:15 +02:00
Martin Kroeker	bb9cf766f5	make march=cooperlake option conditional on gcc >= 10.1	2020-08-19 15:06:30 +02:00
Martin Kroeker	75eeb265d7	[WIP] Refactor the driver code for direct SGEMM (#2782) Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available (on x86_64 targets only for now) in DYNAMIC_ARCH builds * Add sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt * Add direct_sgemm functions to the gotoblas struct in common_param.h * Move sgemm_direct_performant helper to separate file * Update gemm.c to use macros for sgemm_direct to support dynamic_arch naming via common_s.h * (Conditionally) add sgemm_direct functions in setparam-ref.c	2020-08-19 14:51:09 +02:00
Martin Kroeker	2c72972570	Merge pull request #2785 from albertziegenhagel/always-generate-pkg-config Do not require pkg-config to generate the *.pc file	2020-08-19 14:42:58 +02:00
Albert Ziegenhagel	6b731d917f	Do not require pkg-config to generate the *.pc file Generating the pkg-config file does not actually depend on pkg-config being available.	2020-08-18 08:48:48 +02:00
Martin Kroeker	5dcf47cd97	Merge pull request #2784 from martin-frbg/issue2783 Add fallback typedef for bfloat16 to openblas_config.h template	2020-08-17 19:06:13 +02:00
Martin Kroeker	aa286e301b	Add typedef for bfloat16 if needed	2020-08-17 15:32:14 +02:00
Martin Kroeker	9f0ef9cdfc	Merge pull request #77 from xianyi/develop rebase	2020-08-17 15:28:15 +02:00
Martin Kroeker	6bfc66663c	revert	2020-08-17 15:20:41 +02:00
Martin Kroeker	a8c6fb9e1c	revert	2020-08-17 15:20:16 +02:00
Martin Kroeker	5ec8f716cf	revert	2020-08-17 15:19:40 +02:00
Martin Kroeker	82f8a0aeba	Update .drone.yml	2020-08-15 15:46:18 +02:00
Martin Kroeker	d57d503c15	Update Makefile	2020-08-15 14:46:26 +02:00
Martin Kroeker	37ac23e8a3	Add simple MT sgemm precision test and INTERFACE64 build	2020-08-15 13:38:05 +02:00
Martin Kroeker	6a93e3b2ba	Add simple sgemm precision test	2020-08-15 13:33:52 +02:00
Martin Kroeker	47ce1dd08f	Update gemm64.cpp	2020-08-15 13:31:28 +02:00
Martin Kroeker	f5fcc5baec	Add trivial gemm test for multithread consistency	2020-08-15 13:30:29 +02:00
Martin Kroeker	597010a968	Fix incorrect argument to SLASET Reference-LAPACK issue 425 (and 318)	2020-08-14 00:41:56 +02:00
Martin Kroeker	d64f1ef26b	Fix incorrect argument to SLASET Reference-LAPACK issue 425 (and 318)	2020-08-14 00:40:24 +02:00
Martin Kroeker	c62aad62e5	Fix incorrect calls to DLASET Reference-LAPACK issue 429	2020-08-14 00:35:45 +02:00
Chen, Guobing	e740c4873d	Enable COOPERLAKE build target Enable new build target platform -- COOPERLAKE. This target platform supports all the SKYLAKEX supported ISAs + avx512bf16. So all the SKYLAKEX specific kernels/drivers and related code are now extended to be also active on COOPERLAKE. Besides, new BF16 related kernels are active under this target.	2020-08-13 06:18:00 +08:00
Martin Kroeker	efdd237a91	Add a dedicated POWER9 build to the Travis CI (#2774) * Add dedicated POWER9 build (using new syntax to ensure it runs as a P9-only containerized job rather than a VM that might end up on P8 hardware half of the time) * Bump gcc version for POWER9 build	2020-08-12 23:08:38 +02:00

5235 Commits
