OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	de4d5646eb	Merge pull request #3857 from martin-frbg/issue3856 Fix stride in shortcut path of C/ZSYR for small N	2022-12-09 08:25:54 +01:00
Martin Kroeker	f10c266b4d	Fix stride in shortcut path for small N	2022-12-08 21:02:01 +01:00
Martin Kroeker	c17b5ce75c	Merge pull request #3854 from martin-frbg/travis-gcc8arm Travis CI: Add a DYNAMIC_ARCH build on Neoverse using older gcc8	2022-12-07 18:01:07 +01:00
Martin Kroeker	8531dbaa25	Update .travis.yml	2022-12-07 15:04:13 +01:00
Martin Kroeker	ce1a9ae8bd	Add a DYNAMIC_ARCH build on Neoverse using older gcc8	2022-12-07 14:28:55 +01:00
Martin Kroeker	aab9c410ef	Merge pull request #3853 from Mousius/fix-sve Remove SVE from Arm(R) Neoverse(TM) N1 CPU in Makefile	2022-12-06 23:55:33 +01:00
Bart Oldeman	60e49b851c	Fix typo in clobber list, should be xmm14 instead of ymm14.	2022-12-06 16:30:46 -05:00
Chris Sidebottom	f76e3de3a5	Remove SVE from Arm(R) Neoverse(TM) N1 CPU in Makefile I incorrectly added `+sve` to the Neoverse(TM) N1 CPUs GCC parameters, which doesn't support SVE - this results in failed builds when using a compiler that doesn't support `-mtune=neoverse-n1` which appears to hide the mistake.	2022-12-06 21:23:07 +00:00
Bart Oldeman	4afe1439a1	Fix skylake fallback kernel name for old compilers.	2022-12-06 16:09:54 -05:00
Bart Oldeman	5ceca1a4d8	Add sscal.c + microkernels for Haswell, Zen, Skylake and newer. Unlike [dcz]scal, sscal still used the original GotoBLAS SSE code from scal_sse.S. This code follows dscal as closely as possible, except for the inc_x > 1 code for which a plain C loop is used much like the one in cscal.c, instead of an adaptation of the SSE2 asm code of dscal.c (I tried but the performance wasn't better than the plain C loop).	2022-12-06 14:05:49 -05:00
lilianhuang	729af6406f	bugfix for sbgemm_ncopy_8_neoversen2	2022-12-05 05:10:18 -05:00
Martin Kroeker	042e3c0e7c	Merge pull request #3848 from bartoldeman/dscal-haswell-ymm dscal: use ymm registers in Haswell microkernel	2022-12-05 08:56:08 +01:00
Martin Kroeker	02763077d6	Merge pull request #3851 from martin-frbg/lapack773 Allocate work array in LAPACKE ?TGSEN when ijob is zero (Reference-LAPACK PR 733)	2022-12-04 14:52:36 +01:00
Martin Kroeker	d59dcd7b16	Allocate work array when ijob is zero (Reference-LAPACK PR 733)	2022-12-04 11:43:24 +01:00
Martin Kroeker	14aef9400d	Merge pull request #3850 from martin-frbg/lapack765 Check for NaN in ?GECON (Reference-LAPACK PR765)	2022-12-04 11:33:57 +01:00
Martin Kroeker	9b96990e5d	Check for NaN in ?GECON (Reference-LAPACK PR765)	2022-12-03 20:33:27 +01:00
Martin Kroeker	1c1e0682a0	Merge pull request #3849 from martin-frbg/lapack769 Fix uninitialized M in quick return from D/SLARRD (Reference-LAPACK PR769)	2022-12-03 20:08:20 +01:00
Martin Kroeker	00cc78cfba	Fix uninitialized M in quick return (Reference-LAPACK 769)	2022-12-03 16:19:20 +01:00
Martin Kroeker	9307b0fabc	Fix uninitialized M in quick return (Reference-LAPACK 769)	2022-12-03 16:17:54 +01:00
Martin Kroeker	0a24f631e9	Merge pull request #3844 from Mousius/switch-ratio-16 Set SWITCH_RATIO for Arm(R) Neoverse(TM) V1 CPUs	2022-12-02 12:48:43 +01:00
Martin Kroeker	65984fbe68	Merge pull request #3847 from bartoldeman/scal-benchmark scal benchmark: eliminate y, move init/timing out of loop	2022-12-02 11:51:50 +01:00
Martin Kroeker	f6f0d13b9f	Merge pull request #3842 from Mousius/sve-dot Add SVE implementation for sdot/ddot	2022-12-02 08:30:51 +01:00
Bart Oldeman	5c3169ecd8	dscal: use ymm registers in Haswell microkernel Using 256-bit registers in dscal makes this microkernel consistent with cscal and zscal, and generally doubles performance if the vector fits in L1 cache.	2022-12-01 07:48:05 -05:00
Chris Sidebottom	eea006a688	Wrap SVE header with __has_include check	2022-12-01 12:07:55 +00:00
Chris Sidebottom	fd4f52c797	Add SVE implementation for sdot/ddot This adds an SVE implementation to sdot/ddot when available, falling back to the previous Advanced SIMD kernel where there's no SVE implementation for the kernel. All the targets were essentially treating `dot_thunderx2t99.c` as the Advanced SIMD implementation so I've renamed it to better fit with the feature detection.	2022-12-01 12:07:50 +00:00
Martin Kroeker	b6a4ef98b9	Merge pull request #3845 from Mousius/asimd-dot-opt Remove unnecessary instructions from Advanced SIMD dot	2022-11-30 21:07:30 +01:00
Chris Sidebottom	2fb096315e	Set SWITCH_RATIO for Arm(R) Neoverse(TM) V1 CPUs From testing this yields better results than the default of `2`.	2022-11-30 09:35:38 +00:00
Bart Oldeman	bae45d94d1	scal benchmark: eliminate y, move init/timing out of loop Removing y avoids cache effects (if y is the size of the L1 cache, the main array x is removed from it). Moving init and timing out of the loop makes the scal benchmark behave like the gemm benchmark, and allows higher accuracy for smaller test cases since the loop overhead is much smaller than the timing overhead. Example: OPENBLAS_LOOPS=10000 ./dscal.goto 1024 8192 1024 on AMD Zen2 (7532) with 32k (4k doubles) L1 cache per core. Before From : 1024 To : 8192 Step = 1024 Inc_x = 1 Inc_y = 1 Loops = 10000 SIZE Flops 1024 : 5627.08 MFlops 0.000000 sec 2048 : 5907.34 MFlops 0.000000 sec 3072 : 5553.30 MFlops 0.000001 sec 4096 : 5446.38 MFlops 0.000001 sec 5120 : 5504.61 MFlops 0.000001 sec 6144 : 5501.80 MFlops 0.000001 sec 7168 : 5547.43 MFlops 0.000001 sec 8192 : 5548.46 MFlops 0.000001 sec After From : 1024 To : 8192 Step = 1024 Inc_x = 1 Inc_y = 1 Loops = 10000 SIZE Flops 1024 : 6310.28 MFlops 0.000000 sec 2048 : 6396.29 MFlops 0.000000 sec 3072 : 6439.14 MFlops 0.000000 sec 4096 : 6327.14 MFlops 0.000001 sec 5120 : 5628.24 MFlops 0.000001 sec 6144 : 5616.41 MFlops 0.000001 sec 7168 : 5553.13 MFlops 0.000001 sec 8192 : 5600.88 MFlops 0.000001 sec We can see the L1->L2 switchover point is now where it should be, and the number of flops for L1 is more accurate.	2022-11-29 08:02:45 -05:00
lilianhuang	fdac8a97c1	Add sbgemm_ncopy_8 and sbgemm_tcopy_4	2022-11-29 04:46:14 -05:00
lilianhuang	135718eafc	Improve the performance of sbgemm_tcopy on neoversen2	2022-11-28 04:17:54 -05:00
Chris Sidebottom	4f7b77e08a	Remove unnecessary instructions from Advanced SIMD dot The existing kernel was issuing extra instructions to organise the arguments into the same registers they would usually be in and similarly to put the result into the appropriate register. This has an impact on smaller sized dots and seemed like a quick fix	2022-11-25 16:19:03 +00:00
Martin Kroeker	e9a911fb9f	Merge pull request #3841 from martin-frbg/lapack755+764 Fix SLATRS3 and CLATRS3 tests in TESTING/LIN (Reference-LAPACK PRs 755+764)	2022-11-23 22:38:06 +01:00
Martin Kroeker	bf0e8d67b5	Merge pull request #3840 from martin-frbg/lapack760 Fix typo in EIG tests and spurious return in lapacke_?tz_trans utility (Reference-LAPACK PR760)	2022-11-23 19:16:25 +01:00
Martin Kroeker	a5470521ee	Fix array indexation in copy, and fix test (Reference-LAPACK PR764)	2022-11-23 15:31:25 +01:00
Martin Kroeker	b0393ea4e1	Fix test (Reference-LAPACK PR764)	2022-11-23 15:27:46 +01:00
Martin Kroeker	0d26f1a4c7	Fix wrong indexation in test (Reference-LAPACK PR755)	2022-11-23 15:22:27 +01:00
Martin Kroeker	19fd2d7f00	Use LSAME for character comparison (Reference-LAPACK PR755)	2022-11-23 15:19:07 +01:00
Martin Kroeker	663bf68dbd	Merge pull request #3839 from martin-frbg/lapack758 Fix array dimension in complex SYL01 test (Reference-LAPACK PR758)	2022-11-23 14:57:56 +01:00
Martin Kroeker	c2ba4e6249	Remove unnecessary return in void function call (Reference-LAPACK PR760)	2022-11-23 10:43:34 +01:00
Martin Kroeker	74962c7f53	Remove unnecessary return in void function call (Reference-LAPACK PR760)	2022-11-23 10:42:29 +01:00
Martin Kroeker	d952cbf7bc	Remove unnecessary return in void function call (Reference-LAPACK PR760)	2022-11-23 10:41:50 +01:00
Martin Kroeker	7694ff495f	Remove unnecessary return in void function call (Reference-LAPACK PR760)	2022-11-23 10:40:59 +01:00
Martin Kroeker	825ae316e2	Fix typo in EXTERNAL (Reference-LAPACK PR760)	2022-11-23 10:36:10 +01:00
Martin Kroeker	730ed549e6	Fix typo in EXTERNAL (Reference-LAPACK PR760)	2022-11-23 10:35:23 +01:00
Martin Kroeker	bc3393f703	Fix array dimension (Reference-LAPACK 758)	2022-11-23 10:31:18 +01:00
Martin Kroeker	0b2f8dabbf	Fix array dimension (Reference-LAPACK 758)	2022-11-23 10:30:35 +01:00
Martin Kroeker	b4c9228441	Merge pull request #3838 from martin-frbg/lapa311 Update the version number of the included LAPACK to 3.11.0	2022-11-22 17:39:51 +01:00
Martin Kroeker	e6e2a63650	Update LAPACK version number to 3.11.0	2022-11-22 14:02:21 +01:00
Martin Kroeker	8408357bab	Update LAPACK version number to 3.11.0	2022-11-22 14:01:48 +01:00
Martin Kroeker	ba8fb8b4b2	Merge pull request #3837 from martin-frbg/lapack655+697 Improve convergence of LAPACK ?LAED4 and fix a bug in DORCSD2BY1 (Reference-LAPACK PRs 655+697)	2022-11-22 13:51:57 +01:00

7452 Commits