OpenBLAS

Author	SHA1	Message	Date
Arjan van de Ven	00abaa865b	Add an AVX512 enabled SDOT function written in C intrinsics for best readability. (the same C code works for Haswell as well) For logistical reasons the code falls back to the existing haswell AVX2 implementation if the GCC or LLVM compiler is not new enough	2018-08-10 02:33:43 +00:00
Arjan van de Ven	7932ff3ea9	Add an AVX512 enabled DDOT function written in C intrinsics for best readability. (the same C code works for Haswell as well) For logistical reasons the code falls back to the existing haswell AVX2 implementation if the GCC or LLVM compiler is not new enough	2018-08-09 03:55:52 +00:00
Martin Kroeker	4e103c822c	typo fix	2018-07-16 12:56:39 +02:00
Martin Kroeker	d2142760e0	Fix precision problem in DSDOT	2018-07-15 17:11:40 +02:00
Martin Kroeker	2fbfc64da8	Use C kernels for default c/zAXPY, xROT, c/zSWAP	2018-07-15 17:09:55 +02:00
Martin Kroeker	ba8388cee0	Merge pull request #1651 from martin-frbg/avx512-nodgemm Disable the 16x2 DTRMM kernel on SkylakeX as well	2018-06-30 17:48:03 +02:00
Martin Kroeker	6e54b0a027	Disable the 16x2 DTRMM kernel on SkylakeX as well	2018-06-30 17:31:06 +02:00
Martin Kroeker	40c8cbc3bf	Merge pull request #1650 from martin-frbg/avx512-nodgemm Disable the AVX512 DGEMM kernel for now	2018-06-30 13:05:46 +02:00
Martin Kroeker	f0a8dc2eec	Disable the AVX512 DGEMM kernel for now due to #1643	2018-06-30 11:34:48 +02:00
Martin Kroeker	b83e4c60c7	Remove premature exit for INC_X or INC_Y zero	2018-06-26 20:46:42 +02:00
Martin Kroeker	e344db269b	Remove premature exit for INC_X or INC_Y zero	2018-06-26 20:45:57 +02:00
Martin Kroeker	545b82efd3	Remove premature exit for INC_X or INC_Y zero	2018-06-26 20:45:00 +02:00
Martin Kroeker	e322a951fe	Remove premature exit for INC_X or INC_Y zero	2018-06-26 20:44:13 +02:00
Martin Kroeker	c628c6fa59	Merge pull request #1612 from oon3m0oo/cpus Fixed a few more unnecessary calls to num_cpu_avail.	2018-06-14 16:51:31 +02:00
Martin Kroeker	6f71c0fce4	Return a somewhat sane default value for L2 cache size if cpuid retur… (#1611 ) * Return a somewhat sane default value for L2 cache size if cpuid returned something unexpected Fixes #1610, the KVM hypervisor on Google Chromebooks returning zero for CPUID 0x80000006, causing DYNAMIC_ARCH builds of OpenBLAS to hang	2018-06-11 13:26:19 +02:00
Craig Donner	c2545b0fd6	Fixed a few more unnecessary calls to num_cpu_avail. I don't have as many benchmarks for these as for gemm, but it should still make a difference for small matrices.	2018-06-11 10:17:16 +01:00
Arjan van de Ven	89372e0993	Use AVX512 also for DGEMM this required switching to the generic gemm_beta code (which is faster anyway on SKX) for both DGEMM and SGEMM Performance for the not-retuned version is in the 30% range	2018-06-03 22:17:27 +00:00
Martin Kroeker	0023515733	Typo fix (misplaced parenthesis)	2018-06-03 13:22:59 +02:00
Arjan van de Ven	99c7bba8e4	Initial support for SkylakeX / AVX512 This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server) target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set, which brings 2 basic things: 1) 512 bit wide SIMD (2x width of AVX2) 2) 32 SIMD registers (2x the number on AVX2) This initial patch only contains a trivial transformation of the Haswell SGEMM kernel to AVX512VL; more will follow later but this patch aims to get the infrastructure in place for this "later". Full performance tuning has not been done yet; with more registers and wider SIMD it's in theory possible to retune the kernels but even without that there's an interesting enough performance increase (30-40% range) with just this change.	2018-06-03 07:58:52 +00:00
Martin Kroeker	8562d5787a	Merge pull request #1583 from martin-frbg/issue1575 Handle INCX=0,INCY=0 case	2018-05-31 21:55:26 +02:00
Martin Kroeker	7df8c4f76f	typo fix	2018-05-31 17:23:08 +02:00
Martin Kroeker	2fc748bf72	Restore optimized swap kernel now that we have a proper fix	2018-05-31 13:41:12 +02:00
Martin Kroeker	d1b7be14aa	Handle INCX=0,INCY=0 case Fixes #1575 (sswap/dswap failing the swap utest on x86) as suggested by atsampson.	2018-05-31 12:52:04 +02:00
Martin Kroeker	961d25e9c7	Use the new zrot.c on POWER8 for crot as well fixes #1571 (the old zrot.S assembly does not handle incx=0 correctly)	2018-05-23 22:54:39 +02:00
Martin Kroeker	f5959f2543	Merge pull request #1567 from martin-frbg/mipstrmm Revert " Switch mips32 target to USE_TRMM to fix complex TRMM"	2018-05-17 20:50:23 +02:00
Martin Kroeker	82012b960b	Revert " Switch mips32 target to USE_TRMM to fix complex TRMM" ... as it was just a silly workaround for the issue seen in #1563, caused by #1419	2018-05-17 20:30:03 +02:00
Martin Kroeker	8dd3515fa2	Merge pull request #1565 from martin-frbg/mipstypo Remove extraneous brace from previous commit of mips dsdot fix	2018-05-17 20:22:58 +02:00
Martin Kroeker	95f7f0229c	Remove extraneous brace from previous commit	2018-05-17 18:43:59 +02:00
Martin Kroeker	5082fe4306	Merge pull request #1564 from martin-frbg/issue1563 Revert changes from PR#1419	2018-05-17 14:04:13 +02:00
Martin Kroeker	7a7619af6d	Revert changes from PR#1419 at least one of these changes apparently is an oversimplification, leading to TRMM breakage on some platforms as observed in #1563	2018-05-17 11:40:08 +02:00
Martin Kroeker	893b535540	Use correct data type for initializers of v2f64, v4f32 Fixes #1561	2018-05-15 14:42:12 +02:00
Martin Kroeker	018f2dad27	Switch mips32 target to USE_TRMM to fix complex TRMM	2018-05-02 20:25:32 +02:00
Martin Kroeker	9d5098dbc9	Add MIPS 1004K target (Mediatek MT7621 SOC)	2018-05-02 20:20:44 +02:00
Martin Kroeker	954f1832de	Merge pull request #1540 from martin-frbg/mips32-zasum Fix typo in MIPS P5600 complex ASUM code selection	2018-04-25 23:23:00 +02:00
Martin Kroeker	941ad280a8	Fix typo in MIPS P5600 complex ASUM code selection	2018-04-25 22:50:10 +02:00
Martin Kroeker	1da365312a	Merge pull request #1538 from martin-frbg/arm7utest Fix handling of zero INCX, INCY in ArmV7 AXPY and ROT	2018-04-25 08:38:58 +02:00
Martin Kroeker	2d0929fa7c	Move the test for zero incx,incy in ARMV7 ROT to pass the related utest (see #1469)	2018-04-24 22:43:00 +02:00
Martin Kroeker	125343cc88	Drop test for zero incx,incy in armv7 AXPY ...to pass the related utest (see #1469)	2018-04-24 22:39:50 +02:00
Martin Kroeker	8a3b6fa108	Use generic zrot.c on ppc64/POWER6 to work around utest failure from … (#1535 ) * Use generic C implementation of zrot on ppc64/POWER6 to work around utest failure from #1469	2018-04-23 19:05:49 +02:00
Martin Kroeker	9c5518319a	Revert "Fix 32bit HASWELL builds"	2018-04-22 20:20:04 +02:00
Martin Kroeker	2ca0faf495	Merge pull request #1515 from martin-frbg/mipsdot Correct precision of mips dsdot	2018-04-11 08:21:25 +02:00
Martin Kroeker	0fe434598b	Fix precision of mips dsdot	2018-04-10 23:30:59 +02:00
Martin Kroeker	c7b55b6082	Merge pull request #1499 from quickwritereader/develop Implemented missing vsx simd kernels for power8 blas1/2 double. z13 modifications	2018-03-27 21:43:23 +02:00
Martin Kroeker	840e01061f	Merge pull request #1491 from martin-frbg/ddot_mt Add multithreading support for Haswell DDOT	2018-03-27 21:43:05 +02:00
QWR QWR	28ca97015d	power8:Added initial zgemv_(t\|n) ,i(d\|z)amax,i(d\|z)amin,dgemv_t(transposed),zrot z13: improved zgemv_(t\|n)_4,zscal,zaxpy	2018-03-27 14:54:41 +00:00
Martin Kroeker	6a6ffaff1e	Merge pull request #1494 from martin-frbg/x86_dsdot Use generic/dot.c instead of the inferior arm/dot.c for x86 DSDOT	2018-03-17 15:26:47 +01:00
Martin Kroeker	28ac9ea5a6	Use generic/dot.c instead of the inferior arm/dot.c for x86 DSDOT to resolve dsdot utest failure seen in #1492	2018-03-17 13:49:15 +01:00
Martin Kroeker	a55694dd5b	Declare dot_compute static to avoid conflicts in multiarch builds	2018-03-16 22:23:36 +01:00
Martin Kroeker	85a41e9cdb	Add multithreading support for Haswell DDOT copied from ashwinyes' implementation in dot_thunderx2t99.c	2018-03-16 16:58:47 +01:00
Martin Kroeker	81215711a2	Re-enable DAXPY microkernels for x86_64 as the inaccuracies seen in the original testcase for #1332 appear to be due to an artefact that amplifies the very small rounding differences between FMA and discrete multiply+add	2018-03-04 19:37:03 +01:00
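
Several entries above concern DSDOT precision and the handling of zero INCX/INCY. The following is a minimal sketch of what those fixes are about, not the OpenBLAS kernel code: the function names `dsdot_sketch` and `sdot_float_acc` are hypothetical, and the indexing follows the reference BLAS convention for negative increments. It shows a dot product of single-precision inputs accumulated in double precision, with no early exit when an increment is zero.

```c
#include <stddef.h>

/* Hypothetical reference-style DSDOT: float inputs, double accumulator.
 * A zero increment is not special-cased; the same element is simply
 * reused n times, as the reference BLAS loop would do. */
double dsdot_sketch(int n, const float *x, int incx,
                    const float *y, int incy)
{
    double acc = 0.0;
    if (n <= 0) return 0.0;

    /* Reference BLAS convention: with a negative increment, start at
     * the far end of the vector and walk backwards. */
    ptrdiff_t ix = (incx < 0) ? (ptrdiff_t)(1 - n) * incx : 0;
    ptrdiff_t iy = (incy < 0) ? (ptrdiff_t)(1 - n) * incy : 0;

    for (int i = 0; i < n; i++) {
        /* Widen before multiplying so product and sum are both double. */
        acc += (double)x[ix] * (double)y[iy];
        ix += incx;
        iy += incy;
    }
    return acc;
}

/* For contrast only: the same loop with a float accumulator, which can
 * lose low-order contributions. volatile forces rounding to float at
 * every step regardless of the compiler's evaluation mode. */
float sdot_float_acc(int n, const float *x, int incx,
                     const float *y, int incy)
{
    volatile float acc = 0.0f;
    if (n <= 0) return 0.0f;

    ptrdiff_t ix = (incx < 0) ? (ptrdiff_t)(1 - n) * incx : 0;
    ptrdiff_t iy = (incy < 0) ? (ptrdiff_t)(1 - n) * incy : 0;

    for (int i = 0; i < n; i++) {
        acc = acc + x[ix] * y[iy];
        ix += incx;
        iy += incy;
    }
    return acc;
}
```

With x = {1e8f, 1.0f, -1e8f} and y all ones, the double accumulator keeps the contribution of the middle element, while a float accumulator cannot represent 1e8 + 1 and drops it. With incx = 0, the single element x[0] is paired with every element of y in turn.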

999 Commits