OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	47691c031f	Use Haswell optimizations for Zen as well	2021-02-11 09:26:15 +01:00
Martin Kroeker	ce7ddd8921	Use Haswell optimizations for Zen as well	2021-02-11 09:25:36 +01:00
Martin Kroeker	950c047b49	Use Haswell optimizations for Zen as well	2021-02-11 09:24:51 +01:00
Martin Kroeker	46509953a9	Use Haswell optimizations for Zen as well	2021-02-11 09:24:16 +01:00
Martin Kroeker	db348dcff2	Enable optimized srot/drot kernels from Haswell	2021-02-11 09:23:05 +01:00
Rajalakshmi Srinivasaraghavan	2056ffc227	Optimize cscal function for POWER10 This patch makes use of new POWER10 vector pair instructions for loads and stores.	2021-01-29 13:51:43 -06:00
Rajalakshmi Srinivasaraghavan	3ede843d50	Optimize s/dscal function for POWER10 This patch makes use of new POWER10 vector pair instructions for loads and stores.	2021-01-24 07:48:28 -06:00
Martin Kroeker	69a5558203	Merge pull request #3059 from Guobing-Chen/BF16_gemm Initial code for Cooperlake BF16 GEMM kernel	2021-01-23 19:08:05 +01:00
Martin Kroeker	d6905403e3	Merge pull request #3068 from alexhenrie/scan-build scan-build fixes	2021-01-23 19:06:29 +01:00
Rajalakshmi Srinivasaraghavan	439b93f6d2	Optimize s/drot function for POWER10 This patch makes use of new POWER10 vector pair instructions for loads and stores.	2021-01-21 13:24:45 -06:00
Rajalakshmi Srinivasaraghavan	eff7c9166e	Optimize cdot function for POWER10 This patch makes use of new POWER10 vector pair instructions for loads and stores.	2021-01-15 13:40:34 -06:00
Alex Henrie	202fc9e8ed	Fix uninitialized argument value in dasum_k	2021-01-14 19:40:31 -07:00
Martin Kroeker	e378b24487	Merge pull request #3067 from albertziegenhagel/fix-generic-cmake Fix building "generic" TRMM kernel with CMake	2021-01-14 21:35:19 +01:00
Albert Ziegenhagel	e3f4063683	Fix building "generic" TRMM kernel with CMake The CMake "TARGET_CORE" variable stores the "generic" target name in all lowercase letters, but gets compared to an all uppercase string, which results in the wrong TRMM kernel being selected. This commit converts the TARGET_CORE to all uppercase before comparing its value to make sure case mismatches are not an issue in the future anymore.	2021-01-14 10:00:49 +01:00
Martin Kroeker	b716c0ef01	Add workaround for NVIDIA HPC	2021-01-12 16:51:35 +01:00
Martin Kroeker	2efa3b70dc	Add workaround for NVIDIA HPC	2021-01-12 16:49:39 +01:00
Martin Kroeker	49959d4f1c	Add workaround for NVIDIA HPC	2021-01-12 16:47:15 +01:00
Martin Kroeker	0f27a03607	Add workaround for NVIDIA HPC mishandling of the asm DOT kernels	2021-01-12 16:39:35 +01:00
Martin Kroeker	c2a8ebfe69	Add workaround for NVIDIA HPC mishandling of the asm DOT kernels	2021-01-12 16:38:51 +01:00
Martin Kroeker	43aac5bacc	Support NVIDIA HPC compiler	2021-01-12 16:36:12 +01:00
Chen, Guobing	b0beb0b1ca	Initial code for Cooperlake BF16 GEMM kernel	2021-01-11 02:15:21 +08:00
Rajalakshmi Srinivasaraghavan	601b711c78	Optimize swap function for POWER10 This patch makes use of new POWER10 vector pair instructions for loads and stores.	2021-01-08 08:01:36 -06:00
Ashwin Sekhar T K	1b2508362b	arm64: Fix nrm2 for input vectors with Inf Fix double precision nrm2 kernels returning NaN when the input vectors contain Inf/-Inf.	2021-01-01 02:49:37 -08:00
Martin Kroeker	3559c5d7a2	Merge pull request #3048 from martin-frbg/issue2998 Temporarily revert to the old NRM2 kernels for ThunderX2/3 and NeoverseN1	2020-12-21 13:30:08 +01:00
Martin Kroeker	8631e2976a	Temporarily revert to the old nrm2 kernels	2020-12-21 07:45:13 +01:00
Martin Kroeker	2768bc1764	Temporarily revert to the old nrm2 kernels	2020-12-21 07:42:51 +01:00
Martin Kroeker	6f4698ee1f	Temporarily revert to the old nrm2 kernel	2020-12-21 07:41:18 +01:00
Martin Kroeker	114eb159a4	Disable FMA intrinsics in the srot kernel when the compiler is PGI/NVIDIA	2020-12-19 22:15:58 +01:00
Martin Kroeker	005cce5507	Amend SkylakeX options to support the NVIDIA compiler	2020-12-19 22:11:49 +01:00
Martin Kroeker	c73d8ee40d	Conditionally add -mfma to compiler options where needed	2020-12-17 11:34:05 +01:00
Rajalakshmi Srinivasaraghavan	2fb11f873b	POWER10: Improve copy performance This patch aligns the stores to 32 byte boundary for scopy and dcopy before entering into vector pair loop. For ccopy, changed the store instructions to stxv to improve performance of unaligned cases.	2020-12-13 10:41:45 -06:00
Martin Kroeker	043128cbe5	Merge pull request #3029 from RajalakshmiSR/axpyp10 POWER10: Improve axpy performance	2020-12-10 22:49:28 +01:00
Martin Kroeker	3331ca492d	Merge pull request #3021 from austinpagan/trsm_p10 POWER: Added special unrolled vectorized versions of "Solve" for specific si…	2020-12-10 19:42:54 +01:00
Rajalakshmi Srinivasaraghavan	346e30a46a	POWER10: Improve axpy performance This patch aligns the stores to 32 byte boundary for saxpy and daxpy before entering into vector pair loop. For caxpy, changed the store instructions to stxv to improve performance of unaligned cases.	2020-12-10 11:51:42 -06:00
gxw	4b548857d6	Add msa support for loongson: 1. Using core loongson3r3 and loongson3r4 for loongson; 2. Add DYNAMIC_ARCH for loongson. Change-Id: I1c6b54dbeca3a0cc31d1222af36a7e9bd6ab54c1	2020-12-09 10:28:46 +08:00
Martin Kroeker	7f11e33e8d	Merge pull request #3025 from TiredNotTear/develop MIPS: Fix two bugs	2020-12-08 09:39:27 +01:00
Martin Kroeker	53e0837809	Merge pull request #3022 from jinboson/develop Fix test errors reported by cblas_cgemm & cblas_ctrmm	2020-12-07 08:09:11 +01:00
Hao Chen	ad38bd0e89	Fix failed cgemv and zgemv test case after using msa optimization The cgemv and zgemv test case will call cgemv_n/t_msa.c zgemv_n/t_msa.c files in MIPS environment. When the macro CONJ is defined, the calculation result will be wrong due to the wrong definition of OP2. This patch updates the value of OP2 and passes the corresponding test.	2020-12-07 10:25:01 +08:00
Hao Chen	47b639cc9b	Fix failed sswap and dswap case by using msa optimization The swap test case will call sswap_msa.c and dswap_msa.c files in MIPS environment. When inc_x or inc_y is equal to zero, the calculation result of the two functions will be wrong. This patch adds the processing of inc_x or inc_y equal to zero, and the swap test case has passed.	2020-12-07 10:24:49 +08:00
Martin Kroeker	b660008c7e	Work around DOT and SWAP test failures	2020-12-06 19:15:37 +01:00
Martin Kroeker	f8346603cf	Fix compilation with SolarisStudio	2020-12-06 19:14:16 +01:00
Jin Bo	65de6f5957	Fix test errors reported by cblas_cgemm & cblas_ctrmm The file cgemm_kernel_8x4_msa.c holds the MSA optimization codes of cblas_cgemm and cblas_ctrmm. It defines two macros: CGEMM_SCALE_1X2 and CGEMM_TRMM_SCALE_1X2. The pc1 array index in the two macros should be 0 and 1.	2020-12-05 15:08:17 +08:00
Gordon Fossum	213c0e7abb	Added special unrolled vectorized versions of "Solve" for specific sizes, in DTRSM and STRSM, to improve performance in Power9 and Power10.	2020-12-04 17:07:06 -06:00
Martin Kroeker	441c08c9ff	Merge pull request #3016 from xiegengxin/complex-asum Improve the performance of zasum and casum with AVX512 intrinsic	2020-12-04 22:07:16 +01:00
Gengxin Xie	0cb7a403b2	Fix erroneous declaration of function blas_level1_thread_with_return_value	2020-12-02 09:51:52 +08:00
Gengxin Xie	b766c1e9bb	Improve the performance of zasum and casum with AVX512 intrinsic	2020-12-01 16:49:26 +08:00
Rajalakshmi Srinivasaraghavan	7d46e31de1	POWER10: Optimize dgemv_n Handling as 4x8 with vector pairs gives better performance than existing code in POWER10.	2020-11-29 15:28:28 -06:00
Martin Kroeker	f1bf040b25	Merge pull request #2988 from xiegengxin/smp-asum Improve the performance of dasum and sasum when SMP is defined	2020-11-22 12:24:13 +01:00
Xianyi Zhang	7037849498	Merge branch 'develop' into risc-v	2020-11-22 16:04:50 +08:00
Martin Kroeker	7e9cb39a25	Merge pull request #2981 from Qiyu8/fix-sum Fix sum optimize issues	2020-11-16 08:40:46 +01:00

1615 Commits