OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Chris Sidebottom	ec334e69dc	Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1. This re-spins #3869 with some additional copy unrolling which helps maintain SYRK performance. After #3868, the SVE kernels represent a pretty good boost. This re-uses ARMV8SVE as a base and I'm going to incrementally move everything to use ARMV8SVE in additional patches (as well as fix up anything that's not already in ARMV8SVE).	2023-04-17 17:38:42 +01:00
Bart Oldeman	bae45d94d1	scal benchmark: eliminate y, move init/timing out of loop. Removing y avoids cache effects (if y is the size of the L1 cache, the main array x is removed from it). Moving init and timing out of the loop makes the scal benchmark behave like the gemm benchmark, and allows higher accuracy for smaller test cases since the loop overhead is much smaller than the timing overhead. Example: OPENBLAS_LOOPS=10000 ./dscal.goto 1024 8192 1024 on AMD Zen2 (7532) with 32k (4k doubles) L1 cache per core. Before: From : 1024 To : 8192 Step = 1024 Inc_x = 1 Inc_y = 1 Loops = 10000; SIZE Flops: 1024 : 5627.08 MFlops 0.000000 sec; 2048 : 5907.34 MFlops 0.000000 sec; 3072 : 5553.30 MFlops 0.000001 sec; 4096 : 5446.38 MFlops 0.000001 sec; 5120 : 5504.61 MFlops 0.000001 sec; 6144 : 5501.80 MFlops 0.000001 sec; 7168 : 5547.43 MFlops 0.000001 sec; 8192 : 5548.46 MFlops 0.000001 sec. After: From : 1024 To : 8192 Step = 1024 Inc_x = 1 Inc_y = 1 Loops = 10000; SIZE Flops: 1024 : 6310.28 MFlops 0.000000 sec; 2048 : 6396.29 MFlops 0.000000 sec; 3072 : 6439.14 MFlops 0.000000 sec; 4096 : 6327.14 MFlops 0.000001 sec; 5120 : 5628.24 MFlops 0.000001 sec; 6144 : 5616.41 MFlops 0.000001 sec; 7168 : 5553.13 MFlops 0.000001 sec; 8192 : 5600.88 MFlops 0.000001 sec. We can see the L1->L2 switchover point is now where it should be, and the number of flops for L1 is more accurate.	2022-11-29 08:02:45 -05:00
Martin Kroeker	f92dd6e303	change line endings from CRLF to LF	2022-11-17 10:18:36 +01:00
Bart Oldeman	9e6b060bf3	Fix comment. It stores the pointer, not an offset (that would be an alternative approach).	2022-10-20 20:11:09 -04:00
Bart Oldeman	9959a60873	Benchmarks: align malloc'ed buffers. Benchmarks should allocate with cacheline (often 64 bytes) alignment to avoid unreliable timings. This technique, storing the offset in the byte before the pointer, doesn't require C11's aligned_alloc for compatibility with older compilers. For example, Glibc's x86_64 malloc returns 16-byte aligned buffers, which is not sufficient for AVX/AVX2 (32-byte preferred) or AVX512 (64-byte).	2022-10-20 13:28:20 -04:00
Marius Hillenbrand	f119e26354	Fix flipped indices in benchmark for gemv. Fixes #3439	2021-11-03 12:45:09 +01:00
Martin Kroeker	14e33e0f7e	Handle OPENBLAS_LOOPS in SYR2 benchmark	2021-07-10 21:27:53 +02:00
Martin Kroeker	4ed99c2ce3	Merge pull request #3292 from martin-frbg/syrk_limit Add lower limit for multithreading in xSYRK	2021-07-07 20:46:28 +02:00
Martin Kroeker	a4543e4918	Handle OPENBLAS_LOOP	2021-07-04 16:59:43 +02:00
Martin Kroeker	dcfc5cf714	Handle OPENBLAS_LOOPS for more stable results	2021-07-01 17:39:37 +02:00
Martin Kroeker	06e3b07ecb	Handle OPENBLAS_LOOPS and OPENBLAS_TEST options	2021-07-01 17:38:45 +02:00
Martin Kroeker	1f8bda71b9	Add OPENBLAS_LOOPS support to potrf/potrs/potri benchmark	2021-06-26 23:46:00 +02:00
Martin Kroeker	d57c681a6d	Fix compilation on older OSX versions	2021-03-26 22:29:29 +01:00
Martin Kroeker	38dcf3454b	Support timing Apple M1	2021-03-02 17:50:55 +01:00
Qiyu8	f917c26e83	Refactoring remaining benchmark cases.	2020-10-26 10:25:05 +08:00
Qiyu8	dd6ebdfdab	Refactor the performance measurement system	2020-10-23 10:32:03 +08:00
Martin Kroeker	7ae9e8960e	Change "HALF" and "sh" to "BFLOAT16" and "sb"	2020-10-12 00:08:29 +02:00
Martin Kroeker	5464eb13ea	Change ifdef linux to __linux for C11 compatibility	2020-09-30 22:59:41 +02:00
Martin Kroeker	6f8fad87c5	Use POSIX2001 clock_gettime for higher resolution	2020-09-05 19:44:01 +02:00
Martin Kroeker	ced49466f0	Use the fortran compiler to link LAPACK-related benchmarks to fix linking problems with (at least) the AMD version of flang that creates dependencies on more than just the fortran runtime.	2020-05-29 13:35:51 +02:00
Martin Kroeker	6e270f91ec	add support for RETURN_BY_STACK semantics, e.g. clang	2020-05-29 13:29:10 +02:00
Rajalakshmi Srinivasaraghavan	ce90e2bd3f	Include shgemm in benchtest This patch is to enable benchtest for half precision gemm when BUILD_HALF is set during make.	2020-05-11 09:57:46 -05:00
l00536773	6b7ef6543a	[OpenBLAS]: benchmark error of potrf [description]: when the matrix size goes higher than 5800 during the cpotrf test, error info, such as "Potrf info = 5679", will be returned on ARM64 and x86 machines. Uplo = L & F. [solution]: changed the func for building the matrix so that the complex Hermitian matrix can stay positive definite during the computation. [dts]:	2020-04-16 10:55:10 +08:00
Martin Kroeker	717c604aeb	Merge pull request #2515 from zelong-1024/develop [OpenBLAS]: benchmark for her/her2 LEVEL2 functions	2020-03-16 21:59:55 +01:00
Martin Kroeker	ce33da4cab	Merge pull request #2513 from aaawuanjun/develop [OpenBlas]: Add benchmark tpsv file and modify benchmark/Makefile	2020-03-16 21:58:55 +01:00
l00536773	d45c53ecf1	[OpenBLAS]: benchmark for her/her2 LEVEL2 functions [description]: benchmark for her/her2 [solution]: added benchmark for her/her2, modified makefile in benchmark [dts]:	2020-03-16 11:19:05 +08:00
Martin Kroeker	c2840997db	Merge pull request #2508 from liujingjue/develop [OpenBLAS]:fix the iamax benchmark error	2020-03-14 14:21:30 +01:00
Martin Kroeker	c0649aa694	Merge pull request #2506 from xiaofengF/develop Add benchmark for SPMV and fix segmentation fault when data size >= 50000	2020-03-14 13:08:36 +01:00
wuanjun 00447568	2428dc9fd3	[OpenBlas]: Add benchmark tpsv file and modify benchmark/Makefile [Description]: Solve lack of tpsv benchmark.	2020-03-14 09:11:08 +08:00
l00546269	a0a3bf7c81	[OpenBLAS]:fix the iamax benchmark error [Description]:the result for i?amax is not MFlops, it is MBytes	2020-03-13 10:58:39 +08:00
jayfely@qq.com	ae3f2c2e49	Remove cspmv and zspmv to fix the error that occurred in Travis CI	2020-03-11 17:02:34 +08:00
jayfely@qq.com	649733ff15	Only keep spmv.goto and spmv.atlas	2020-03-11 15:48:58 +08:00
wuanjun 00447568	3e8f1c6cc5	[OpenBlas]:Add benchmark tpmv.c and modify Makefile [Description]:Solve the problem of missing tpmv.c benchmark file	2020-03-11 12:31:48 +08:00
jayfely@qq.com	2f4c5bb3a9	Update spmv.c: solve segmentation fault when m and n are larger than 50000	2020-03-11 10:30:09 +08:00
Martin Kroeker	047dfb216d	Merge pull request #2501 from jijiwawa/Fix_mistakes Fix pr #2487 error	2020-03-10 16:44:40 +01:00
s00527847	cd8871f1a1	Use the correct unit of measure	2020-03-10 19:26:06 -04:00
jayfely@qq.com	08e1d8cbae	Modify Makefile in Benchmark	2020-03-10 14:32:18 +08:00
jayfely@qq.com	ff40a4e726	Add benchmark for SPMV	2020-03-10 14:22:18 +08:00
s00548429	c5bdd21352	Add benchmark for ?amax, ?max, ?amin, ?min, i?max, i?amin and i?min.	2020-03-09 14:59:03 +08:00
Martin Kroeker	b6a6ccbbea	Merge pull request #2495 from ZuoQ3/develop add benchmark for axpby test	2020-03-08 08:09:58 +01:00
Martin Kroeker	8b720f7365	Merge pull request #2494 from shengyang-3390/develop add benchmark for csrot and zdrot	2020-03-07 23:04:21 +01:00
Martin Kroeker	14df234edb	Merge pull request #2489 from jijiwawa/brightness Remove redundant code	2020-03-07 22:26:00 +01:00
s00527847	bbeda55b7b	add trmm.c	2020-03-07 13:09:19 -05:00
s00527847	efcf89aec7	Remove redundant code	2020-03-07 12:03:05 -05:00
zq	0c8162eba6	Add benchmark file axpby.c and modify benchmark/Makefile to test s/d/c/zaxpby	2020-03-07 17:48:55 +08:00
shengyang	09c7a191bd	add benchmark for csrot and zdrot modified: benchmark/Makefile modified: benchmark/rot.c	2020-03-07 15:17:49 +08:00
Martin Kroeker	dca3e0cf20	Merge pull request #2491 from chenxuqiang/hbmv_benchmark benchmark/hpmv&hbmv: add benchmark/hpmv.c and benchmark/hbmv.c	2020-03-06 15:06:42 +01:00
Martin Kroeker	c9f8db979b	Merge pull request #2490 from shengyang-3390/develop Add benchmark file rotm.c and modify benchmark/Makefile to test s/drotm	2020-03-06 15:05:55 +01:00
Martin Kroeker	97c36ca58c	Merge branch 'develop' into develop	2020-03-06 14:41:40 +01:00
Martin Kroeker	9f5a74f3c7	Merge pull request #2486 from qqqil/develop add benchmark for trsv	2020-03-06 14:30:09 +01:00


162 Commits
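
Commits 9959a60873 and 9e6b060bf3 above describe aligning malloc'ed benchmark buffers to a cache line without relying on C11's aligned_alloc, by keeping the original pointer just in front of the aligned address. The following is only a minimal sketch of that general technique, not the code from the OpenBLAS benchmark sources: the names CACHE_LINE, aligned_buf_alloc and aligned_buf_free are hypothetical, and the 64-byte line size is an assumption taken from the "often 64 bytes" remark in the commit message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64 /* assumed cache-line size in bytes (hypothetical name) */

/* Hypothetical helper: return a CACHE_LINE-aligned buffer of `size` bytes.
 * The pointer returned by malloc is saved in the slot directly before the
 * aligned address so that it can be recovered when freeing. */
static void *aligned_buf_alloc(size_t size)
{
    /* The mask trick below requires a power-of-two alignment. */
    assert((CACHE_LINE & (CACHE_LINE - 1)) == 0);

    /* Over-allocate: room for the saved pointer plus worst-case padding. */
    void *raw = malloc(size + sizeof(void *) + CACHE_LINE - 1);
    if (raw == NULL)
        return NULL;

    /* Skip the pointer slot, then round up to the next aligned address. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + CACHE_LINE - 1) & ~(uintptr_t)(CACHE_LINE - 1);

    /* Store the original pointer immediately before the aligned buffer. */
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Hypothetical helper: free a buffer obtained from aligned_buf_alloc. */
static void aligned_buf_free(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]); /* recover and free the original pointer */
}
```

A benchmark would call aligned_buf_alloc in place of malloc for its input arrays and release them with aligned_buf_free. Saving the full pointer, as in the follow-up commit's corrected comment, avoids the range limit of an offset kept in a single byte.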