OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	83de62c20d	Merge pull request #3026 from martin-frbg/revert747 Revert PR747 - SYRK parameter changes for Haswell and related targets	2020-12-10 16:29:41 +01:00
gxw	4b548857d6	Add msa support for loongson 1. Using core loongson3r3 and loongson3r4 for loongson 2. Add DYNAMIC_ARCH for loongson Change-Id: I1c6b54dbeca3a0cc31d1222af36a7e9bd6ab54c1	2020-12-09 10:28:46 +08:00
Martin Kroeker	d71fe4ed4e	Remove GEMM_DEFAULT_UNROLL_MN parameters for Haswell and ZEN (introduced in PR747)	2020-12-08 21:07:57 +01:00
Martin Kroeker	b0b14f4e9b	Change comments to C style for compatibility	2020-12-06 19:12:02 +01:00
Rajalakshmi Srinivasaraghavan	41fe6e864e	POWER10: Update param.h Increasing the values of DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q helps in improving performance ~10% for DGEMM.	2020-12-03 14:40:11 -06:00
Xianyi Zhang	fc35b72ae1	Refs #2899 Merge branch 'openblas-open-910' of git://github.com/damonyu1989/OpenBLAS into damonyu1989-openblas-open-910	2020-11-10 09:38:04 +08:00
Xianyi Zhang	913cc9a4ca	Merge branch 'develop' into risc-v	2020-11-10 09:18:25 +08:00
Rajalakshmi Srinivasaraghavan	dd7a9cc5bf	POWER10: Change dgemm unroll factors Changing the unroll factors for dgemm to 8 shows improved performance with POWER10 MMA feature. Also made some minor changes in sgemm for edge cases.	2020-10-31 18:28:57 -05:00
Zhang Xianyi	d7ba7679b6	Merge branch 'develop' into risc-v	2020-10-16 23:27:38 +08:00
damonyu	ef8e7d0279	Add the support for RISC-V Vector. Change-Id: Iae7800a32f5af3903c330882cdf6f292d885f266	2020-10-15 16:09:02 +08:00
Martin Kroeker	ca31c32693	Rename "HALF" and "sh" to "BFLOAT16" and "sb"	2020-10-11 23:49:22 +02:00
Chen, Guobing	e740c4873d	Enable COOPERLAKE build target Enable new build target platform -- COOPERLAKE. This target platform supports all the SKYLAKEX supported ISAs + avx512bf16. So all the SKYLAKEX specific kernels/drivers and related code are now extended to be also active on COOPERLAKE. Besides, new BF16 related kernels are active under this target.	2020-08-13 06:18:00 +08:00
Marius Hillenbrand	e115c97e05	s390x/SGEMM: adjust default P and Q to multiples of M We recently changed the register blocking for SGEMM on s390x to 16x4. However, we did not adjust Q to a multiple of 16 and thus fell back to the 8x4 kernel at each block's margin, without need. Adjust P and Q to multiples of 16 to employ the faster 16x4 kernel for complete full-sized blocks. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-08-11 12:56:46 +02:00
Ashwin Sekhar T K	4e1be0e481	ARM64: Add THUNDERX3T110 Target	2020-07-26 23:32:24 -07:00
Martin Kroeker	bd2498c886	Use POWER6 GEMM parameters on 32bit POWER8	2020-07-14 18:07:58 +02:00
Rajalakshmi Srinivasaraghavan	d23419accc	powerpc: Optimized SHGEMM kernel for POWER10 This patch introduces new optimized version of SHGEMM kernel using power10 Matrix-Multiply Assist (MMA) feature introduced in POWER ISA v3.1. This patch makes use of new POWER10 compute instructions for matrix multiplication operation. Tested on simulator and there are no new test failures.	2020-06-25 22:19:08 -05:00
Rajalakshmi Srinivasaraghavan	9fe930f205	powerpc: Add support for future processor This is the initial patch to support build infrastructure for POWER10 architecture.	2020-06-11 15:47:20 -05:00
Martin Kroeker	f16e39554d	Change PPCG4 CGEMM_M to match kernel change	2020-06-03 09:15:29 +02:00
张丹枫	ea5bdc3f72	split cortex-a53 param to match 8x8 kernel	2020-05-20 22:34:47 +08:00
Marius Hillenbrand	1b0b4349a1	s390x/Z14: Change register blocking for SGEMM to 16x4 Change register blocking for SGEMM (and STRMM) on z14 from 8x4 to 16x4 by adjusting SGEMM_DEFAULT_UNROLL_M and choosing the appropriate copy implementations. Actually make KERNEL.Z14 more flexible, so that the change in param.h suffices. As a result, performance for SGEMM improves by around 30% on z15. On z14, FP SIMD instructions can operate on float-sized scalars in vector registers, while z13 could do that for double-sized scalars only. Thus, we can double the amount of elements of C that are held in registers in an SGEMM kernel. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-05-12 15:59:51 +02:00
Martin Kroeker	03ff213c51	Increase POWER8 ZGEMM_R and use same R values for POWER9 fixes lapack-test zger failures seen in #2299 after application of my PR #2551	2020-04-24 21:46:54 +02:00
Martin Kroeker	00172d440b	Typo fix in MIPS24K addition	2020-04-18 21:16:49 +02:00
Martin Kroeker	61bbae3ac1	Handle MIPS24K like P5600 and allow enforcing TARGET=1004K as well (omission from earlier 1004K merge and later introduction of TARGET check)	2020-04-18 21:09:32 +02:00
Martin Kroeker	a33d177430	Increase default BUFFER_SIZE on ARM, ZARCH and newer x86_64, add GEMM_R for POWER8/9 As shown in #2538, default buffer sizes on some platforms were smaller than required in memory.c and the requirement could never be fulfilled for a calculated GEMM_R on PPC given the formula used	2020-04-12 19:44:48 +02:00
Martin Kroeker	567d2760e6	Merge pull request #2520 from wjc404/develop Fix avx512 sgemm performance bug when ldc is a multiple of 1024	2020-03-30 20:15:59 +02:00
wjc404	64daad4365	Update param.h	2020-03-20 21:46:18 +00:00
Martin Kroeker	ea8eec5d17	Merge pull request #2422 from wjc404/develop Adjust SkylakeX GEMM3M parameters, add an AVX512 STRMM kernel and fix performance bugs in AVX2 s/c/z GEMM	2020-02-29 19:07:35 +01:00
Ali Saidi	c623a965f9	Add Neoverse-N1 core The implementation is a hybrid of the ARMV8 one with some of the improved TX2 routines along with specifying -march=v8.2-a	2020-02-29 03:22:04 +00:00
Xianyi Zhang	265ab484c8	Change default RISC-V 64-bit corename to RISCV64_GENERIC e.g. make CC=riscv64-unknown-linux-gnu-gcc FC=riscv64-unknown-linux-gnu-gfortran TARGET=RISCV64_GENERIC HOSTCC=gcc	2020-02-27 14:46:15 +08:00
Xianyi Zhang	4aa2d89217	Merge branch 'develop' into risc-v	2020-02-27 13:53:49 +08:00
Martin Kroeker	8164fd1328	Always assume server-class cpu count for TSV110 and EMAG8180	2020-02-26 22:19:57 +01:00
Martin Kroeker	71e5669c3e	Add preliminary support for EMAG8180 ARMV8 processor	2020-02-19 18:57:26 +01:00
wjc404	b0558c11b9	Update param.h	2020-02-16 23:01:31 +08:00
wjc404	83b6be7976	Update param.h	2020-02-04 19:55:26 +08:00
wjc404	f3f969f681	Update param.h	2020-02-03 21:34:12 +08:00
Wang,Long	fbf4f48f4a	fix a few performance drop in some matrix size per data type Signed-off-by: Wang,Long <long1.wang@intel.com>	2020-01-22 15:15:04 +00:00
wjc404	1c67567008	improve skylakex paralleled sgemm performance	2020-01-13 16:26:03 +08:00
wjc404	b7b408a120	optimize AVX2 SGEMM	2020-01-06 12:16:09 +08:00
wjc404	6362c34ee6	Update param.h	2019-12-30 16:08:19 +08:00
wjc404	64639f440f	Update param.h	2019-12-27 18:06:42 +08:00
wjc404	611445c7f8	Update param.h	2019-12-23 23:44:55 +08:00
wjc404	105e26e12a	Adjust Haswell ZGEMM blocking parameters	2019-12-21 14:38:51 +08:00
wjc404	e20709e976	Update param.h	2019-11-28 19:57:50 +08:00
Martin Kroeker	6082e556cd	Use "generic" S/CGEMM unroll M on big-endian PPC970 as the respective PPC970 "altivec" kernels give wrong results when compiled for big endian	2019-11-17 15:10:26 +01:00
Martin Kroeker	4c6a457358	Merge pull request #2300 from wjc404/develop Optimize SGEMM on SKYLAKEX CPUs	2019-11-06 07:27:33 +01:00
wjc404	ae43b75a6a	Add files via upload	2019-11-02 10:09:19 +08:00
wjc404	274ff5cdb8	update sgemm_q on skylakex cpus	2019-11-01 23:59:18 +08:00
Martin Kroeker	df857551c0	Remove special parameter set for obsolete IOS/ARMV8 workaround	2019-10-25 23:07:00 +02:00
wjc404	5da9484d93	Add files via upload	2019-10-16 02:01:13 +08:00
Martin Kroeker	6b83079368	Count cpu cores on ARMV8 and use that to pick the GEMM_PQ parameters (#2267 ) There is currently no simple way to query cache sizes on ARMV8, so this takes the number of cores as a trivial indication if the target is a server-class device with a big cache, or just a single-board toy or smartphone.	2019-09-25 23:13:24 +02:00
Martin Kroeker	6b6c9b1441	Merge pull request #2172 from quickwritereader/develop power9 cgemm/ctrmm. new sgemm 8x16	2019-07-01 21:06:02 +02:00
AbdelRauf	a97b301aaa	cgemm/ctrmm power9	2019-07-01 14:07:54 +00:00
pkubaj	7c7505a778	Fix build for PPC970 on FreeBSD pt.2 FreeBSD needs those macros too.	2019-06-28 10:31:45 +00:00
AbdelRauf	cdbfb891da	new sgemm 8x16	2019-06-17 15:33:38 +00:00
AbdelRauf	d0c3543c3f	power9 zgemm ztrmm optimized	2019-06-05 20:07:16 +00:00
AbdelRauf	a469b32cf4	sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52	2019-06-04 07:11:30 +00:00
AbdelRauf	8fe794f059	improved zgemm power9 based on power8	2019-05-30 15:31:25 +00:00
xoviat	6cfd6195c5	param: define constant as blaslong to prevent overflow	2019-05-05 13:10:36 -05:00
AbdelRauf	628b335e83	Merge branch 'develop' of https://github.com/quickwritereader/OpenBLAS into develop	2019-04-29 08:57:44 +00:00
AbdelRauf	0f105dd8a5	sgemm/strmm	2019-04-29 08:49:50 +00:00
Martin Kroeker	7c51cc8527	Merge branch 'develop' into develop	2019-03-29 19:36:29 +01:00
AbdelRauf	853a18bc17	power9 makefile. dgemm based on power8 kernel with following changes : 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). improvement from 17 to 22-23gflops. dtrmm cases were added into dgemm itself	2019-03-29 15:49:40 +00:00
Martin Kroeker	03d7110900	Merge pull request #2042 from maomao194313/develop add TARGET support for HiSilicon tsv110 CPUs	2019-03-12 22:57:39 +01:00
maomao194313	7e3eb9b25d	make DYNAMIC_ARCH=1 package work on TSV110	2019-03-12 16:11:01 +08:00
ken-cunningham-webuse	b0c714ef60	param.h : enable defines for PPC970 on DarwinOS fixes: gemm.c: In function 'sgemm_': ../common_param.h:981:18: error: 'SGEMM_DEFAULT_P' undeclared (first use in this function) #define SGEMM_P SGEMM_DEFAULT_P ^	2019-03-07 12:03:25 -08:00
Martin Kroeker	bdc73a49e0	Add parameters for Z14 from patch provided by aarnez in #991	2019-01-31 21:14:37 +01:00
Martin Kroeker	bbfdd6c0fe	Increase Zen SWITCH_RATIO to 16 following GEMM benchmarks on Ryzen2700X. For #1464	2019-01-19 23:01:31 +01:00
Arjan van de Ven	b28f75cd7e	set GEMM_PREFERED_SIZE for HASWELL Haswell likes a GEMM_PREFERED_SIZE of 16 to improve the split that the threading code does to make it a nice multiple of the SIMD kernel size	2018-12-16 23:09:27 +00:00
Arjan van de Ven	cdc668d82b	Add a "sgemm direct" mode for small matrixes OpenBLAS has a fancy algorithm for copying the input data while laying it out in a more CPU friendly memory layout. This is great for large matrixes; the cost of the copy is easily amortized by the gains from the better memory layout. But for small matrixes (on CPUs that can do efficient unaligned loads) this copy can be a net loss. This patch adds (for SKYLAKEX initially) a "sgemm direct" mode, that bypasses the whole copy machinery for ALPHA=1/BETA=0/... standard arguments, for small matrixes only. What is small? For the non-threaded case this has been measured to be in the MNK = 28 * 512 * 512 range, while in the threaded case it's less, around MNK = 1 * 512 * 512	2018-12-13 13:47:31 +00:00
Renato Golin	310ea55f29	Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for all cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than develop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tuning in the makefile.	2018-11-19 16:41:49 +00:00
Arjan van de Ven	5b708e5eb1	sgemm/dgemm: add a way for an arch kernel to specify prefered sizes The current gemm threading code can make very unfortunate choices, for example on my 10 core system a 1024x1024x1024 matrix multiply ends up chunking into blocks of 102... which is not a vector friendly size and performance ends up horrible. this patch adds a helper define where an architecture can specify a preference for size multiples. This is different from existing defines that are minimum sizes and such. The performance increase with this patch for the 1024x1024x1024 sgemm is 2.3x (!!)	2018-11-01 01:43:20 +00:00
Ashwin Sekhar T K	d50abc8903	ARM64: Move parameters from parameter.c to param.h Remove the runtime setting of P, Q, R parameters for targets ARMV8, THUNDERX2T99. Instead set them as constants in param.h at compile time.	2018-10-22 01:45:51 -07:00
Ashwin Sekhar T K	21f46a1cf2	ARM64: Use THUNDERX2T99 Neon Kernels for ARMV8 Currently the generic ARMV8 target uses C implementations for many routines. Replace these with the neon implementations written for THUNDERX2T99 target which are upto 6x faster for certain routines.	2018-10-17 10:44:37 -07:00
Martin Kroeker	4cf7315a5d	Adjust ARMV8 SGEMM unrolling when using the C fallback kernel_2x2 for IOS	2018-09-06 21:41:54 +02:00
Arjan van de Ven	6eb4b9ae7c	Tune HASWELL SWITCH_RATIO as well Similar to the SKYLAKEX patch, 32 seems to work best (much better than 4 or 16) Before (4) Matrix SGEMM cycles MPC DGEMM cycles MPC 48 x 48 15554.3 7.2 0.2% 30353.8 3.7 0.3% 64 x 64 30346.8 8.7 1.6% 63495.0 4.1 -0.1% 65 x 65 81668.1 3.4 -123.3% 82705.2 3.3 -21.2% 80 x 80 105045.9 4.9 -95.5% 115226.0 4.5 -2.2% 96 x 96 152461.2 5.8 -74.3% 148156.3 6.0 16.4% 112 x 112 188505.2 7.5 -42.2% 171187.3 8.2 36.4% 128 x 128 257884.0 8.1 -39.5% 224764.8 9.3 46.0% Intermediate (16) Matrix SGEMM cycles MPC DGEMM cycles MPC 48 x 48 15565.7 7.2 0.2% 30378.9 3.7 0.2% 64 x 64 30430.2 8.7 1.3% 63046.4 4.2 0.6% 65 x 65 27306.0 10.1 25.3% 38879.2 7.1 43.0% 80 x 80 51008.7 10.1 5.1% 61007.6 8.4 45.9% 96 x 96 70856.7 12.5 19.0% 83403.1 10.6 53.0% 112 x 112 84769.9 16.6 36.0% 99920.1 14.1 62.9% 128 x 128 84213.2 25.0 54.5% 113024.2 18.6 72.8% After (32) Matrix SGEMM cycles MPC DGEMM cycles MPC 48 x 48 15537.3 7.2 0.3% 30537.0 3.6 -0.3% 64 x 64 30352.7 8.7 1.6% 62597.8 4.2 1.3% 65 x 65 36857.0 7.5 -0.8% 56167.6 4.9 17.7% 80 x 80 42552.6 12.1 20.8% 69536.7 7.4 38.3% 96 x 96 52101.5 17.1 40.5% 91016.1 9.7 48.7% 112 x 112 63853.7 22.1 51.8% 110507.4 12.7 58.9% 128 x 128 73966.1 28.4 60.0% 163146.4 12.9 60.8%	2018-06-17 17:08:36 +00:00
Arjan van de Ven	5c6f008365	Tune param.h for SkylakeX param.h defines a per-platform SWITCH_RATIO, which is used as a measure for how fine grained the blocks for gemm need to be split up. Many platforms define this to 4. The reality is that the gemm low level implementation for SkylakeX likes bigger blocks due to the nature of SIMD... by tuning the SWITCH_RATIO to 32 the threading performance improves significantly: Before Matrix SGEMM cycles MPC DGEMM cycles MPC 48 x 48 10756.0 10.5 -0.5% 18296.7 6.1 -1.7% 64 x 64 20490.0 12.9 1.4% 40615.0 6.5 0.0% 65 x 65 83528.3 3.3 -210.9% 96319.0 2.9 -83.3% 80 x 80 101453.5 5.1 -166.3% 128021.7 4.0 -76.6% 96 x 96 149795.1 5.9 -143.1% 168059.4 5.3 -47.4% 112 x 112 191481.2 7.3 -105.8% 204165.0 6.9 -14.6% 128 x 128 265019.2 7.9 -99.0% 272006.4 7.7 -5.3% After Matrix SGEMM cycles MPC DGEMM cycles MPC 48 x 48 10666.3 10.6 0.4% 18236.9 6.2 -1.4% 64 x 64 20410.1 13.0 1.8% 39925.8 6.6 1.7% 65 x 65 34983.0 7.9 -30.2% 51494.6 5.4 2.0% 80 x 80 39769.1 13.0 -4.4% 63805.2 8.1 12.0% 96 x 96 45169.6 19.7 26.7% 80065.8 11.1 29.8% 112 x 112 57026.1 24.7 38.7% 99535.5 14.2 44.1% 128 x 128 64789.8 32.5 51.3% 117407.2 17.9 54.6% With this change, threading starts to be a win already at 96x96	2018-06-17 15:47:50 +00:00
Arjan van de Ven	99c7bba8e4	Initial support for SkylakeX / AVX512 This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server) target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set, which brings 2 basic things: 1) 512 bit wide SIMD (2x width of AVX2) 2) 32 SIMD registers (2x the number on AVX2) This initial patch only contains a trivial transformation of the Haswell SGEMM kernel to AVX512VL; more will follow later but this patch aims to get the infrastructure in place for this "later". Full performance tuning has not been done yet; with more registers and wider SIMD it's in theory possible to retune the kernels but even without that there's an interesting enough performance increase (30-40% range) with just this change.	2018-06-03 07:58:52 +00:00
Martin Kroeker	d94d7baf7e	Add mips32r2 api target	2018-05-02 20:17:26 +02:00
Jerry Zhao	0ee395db35	Fixed TRMM and SYMM for RISCV	2018-04-18 18:03:32 -07:00
Jerry Zhao	c167a3d6f4	Added RISCV build	2018-04-16 14:08:31 -07:00
Shivraj Patil	e3d844b062	Added mips I6500 core Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>	2017-09-22 11:57:43 +05:30
Gian-Carlo Pascutto	832a272784	Revert Zen param.h to Haswell values (instead of Excavator).	2017-04-18 12:40:25 +02:00
Denis Steckelmacher	c9ff735da6	Add ZEN support (tested for auto-detected static backend)	2017-03-19 15:32:50 +01:00
Martin Kroeker	cd135e2b59	Merge pull request #1130 from quickwritereader/develop Blas 3 for single precision	2017-03-15 10:00:52 +01:00
Abdurrauf	08786c4b95	strmm and ctrmm	2017-03-13 01:23:16 +04:00
Abdurrauf	82e80fa82b	initial strmm(sgemm). not tuned yet	2017-03-06 04:27:40 +04:00
Martin Kroeker	ffc1d6c468	Merge pull request #1108 from ashwinyes/develop_20170203_thunderx2t99 Optimized Implementations for ThunderX2T99	2017-02-28 16:02:19 +01:00
Ashwin Sekhar T K	19ba133383	THUNDERX2T99: Add Optimized ZGEMM Implementation	2017-02-28 05:31:41 +00:00
Abdurrauf	0d96b0e2a7	Merge branch 'z13' into develop	2017-02-26 06:17:33 +04:00
Abdurrauf	848cb27b1e	ztrmm kernel.	2017-02-26 06:14:12 +04:00
Ashwin Sekhar T K	2757b49767	THUNDERX2T99: Add Optimized CGEMM Implementation	2017-01-30 17:44:26 +05:30
Ashwin Sekhar T K	f279ff4789	THUNDERX2T99: Add Optimized SGEMM Implementation	2017-01-16 21:44:33 +05:30
Ashwin Sekhar T K	4b55fae337	ARM64: Add Cavium THUNDERX2T99 Target	2017-01-11 11:18:40 +05:30
Andrew Pinski	fb200c7245	ARM64: Add Cavium THUNDERX Target	2017-01-10 15:01:37 +05:30
Ashwin Sekhar T K	4713e7c47f	ARM64: Add the VULCAN Target	2017-01-10 15:01:17 +05:30
Zhang Xianyi	b678471d65	Merge branch 'z13' into develop Conflicts: CONTRIBUTORS.md	2017-01-09 05:52:42 -05:00
Abdurrauf	6418667818	dtrmm and dgemm for z13	2017-01-04 19:32:33 +04:00
Shivraj Patil	9687437928	MIPS n32 ABI and build time mips simd support check Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>	2016-08-10 17:44:22 +05:30
Shivraj Patil	d1c6469283	MIPS n32 ABI support, MSA support detection and rename ARCH, ARCHFLAGS Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>	2016-08-08 11:58:01 +05:30
Shivraj Patil	beb1d076a4	Added MSA optimization for GEMV_N, GEMV_T, ASUM, DOT functions Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>	2016-07-15 18:38:25 +05:30
Zhang Xianyi	8a592ee386	Merge pull request #924 from ashwinyes/develop_aarch64_improvements_20160714 Improvements to Aarch64 kernels	2016-07-14 15:47:55 -04:00
Ashwin Sekhar T K	0a5ff9f9f9	Improvements to TRMM and GEMM kernels	2016-07-14 13:56:04 +05:30
Shivraj Patil	57df7956ee	Added CGEMM, ZGEMM, STRMM, DTRMM, CTRMM, ZTRMM. Updated macros in SGEMM, DGEMM, STRMM. Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>	2016-06-28 17:51:10 +05:30
Shivraj Patil	c4ba40e308	SGEMM optimization for MIPS P5600 and I6400 using MSA. Unrolled k loop in DGEMM kernel function Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>	2016-05-19 11:04:42 +05:30
Werner Saar	88011f625d	Merge pull request #876 from wernsaar/develop optimized dgemm on power8 for 20 threads	2016-05-16 14:52:40 +02:00
Werner Saar	8310d4d3f7	optimized dgemm for 20 threads	2016-05-16 14:14:25 +02:00
Shivraj Patil	085cf236c2	conflict resolved by syncing with 'xianyi:develop' Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>	2016-05-04 11:07:14 +05:30
Shivraj Patil	b7b3d8ec8e	DGEMM optimization for MIPS P5600 and I6400 using MSA Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>	2016-05-03 14:42:26 +05:30
Zhang Xianyi	cd7af5260a	Merge pull request #847 from sva-img/develop MIPS P5600(32 bit) and I6400(64 bit) cores support added.	2016-04-29 11:44:36 -04:00
Werner Saar	782f75ba94	optimized param.h for POWER8	2016-04-27 15:48:09 +02:00
Werner Saar	0d0c6f7d7d	optimized dgemm for POWER8	2016-04-27 14:01:08 +02:00
Werner Saar	40ac64ae4f	updated param.h for EXCAVATOR	2016-04-25 10:40:04 +02:00
Werner Saar	089aad57f7	updated param.h for POWER8	2016-04-23 14:26:24 +02:00
Werner Saar	879a51165f	Optimized zgemm and tested zgemm again	2016-04-22 13:07:12 +02:00
Shivraj Patil	2c3dfe2bf3	MIPS P5600(32 bit) and I6400(64 bit) cores support added. Separated mips and mips64 files. Configurations support for mips 32 bit. Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>	2016-04-22 14:03:18 +05:30
Werner Saar	3c6294ca3d	added optimized sgemm_tcopy for power8	2016-04-19 16:08:54 +02:00
Zhang Xianyi	dd43661cfd	Init IBM z system (s390x) porting.	2016-04-15 18:02:24 -04:00
Werner Saar	e173c51c04	updated zgemm- and ztrmm-kernel for POWER8	2016-04-08 09:05:37 +02:00
Werner Saar	9c42f0374a	Updated cgemm- and sgemm-kernel for POWER8 SMP	2016-04-07 15:08:15 +02:00
Werner Saar	a51102e9b7	bugfixes for sgemm- and cgemm-kernel	2016-04-06 11:15:21 +02:00
Werner Saar	c5b1fbcb2e	updated optimized cgemm- and ctrmm-kernel for POWER8	2016-04-04 09:12:08 +02:00
Werner Saar	6a9bbfc227	updated sgemm- and strmm-kernel for POWER8	2016-04-02 17:16:36 +02:00
Werner Saar	e1df5a6e23	fixed sgemm- and strmm-kernel	2016-03-18 12:12:03 +01:00
Werner Saar	5c658f8746	add optimized cgemm- and ctrmm-kernel for POWER8	2016-03-18 08:17:25 +01:00
Werner Saar	96284ab295	added sgemm- and strmm-kernel for POWER8	2016-03-14 13:52:44 +01:00
Werner Saar	91e1c5080c	modified configuration, to use power6 sgemm kernel for power8	2016-03-04 13:38:57 +01:00
Werner Saar	b752858d6c	added dgemm-, dtrmm-, zgemm- and ztrmm-kernel for power8	2016-03-01 07:33:56 +01:00
Zhang Xianyi	3e8d6ea74f	Init POWER8 kernels by POWER6.	2015-11-03 12:34:23 +08:00
Werner Saar	b07d733a71	added updates for syrk and syr2k	2016-01-21 13:16:44 +01:00
Ashwin Sekhar T K	39937d15cd	Change BUFFER_SIZE for Cortex A57 to 20 MB Change the GEMM_P, GEMM_Q, GEMM_R values for Cortex A57	2015-11-20 01:12:04 +05:30
Ashwin Sekhar T K	1397b47197	Optimized zgemm kernel for CORTEXA57	2015-11-09 14:15:53 +05:30
Ashwin Sekhar T K	45f78963ac	Optimized cgemm kernel for CORTEXA57 Also, add a generic ztrmm 4x4 kernel	2015-11-09 14:15:53 +05:30
Ashwin Sekhar T K	402443bf9c	Optimized dgemm kernel for CORTEXA57	2015-11-09 14:15:53 +05:30
Ashwin Sekhar T K	f2f8a0fe8b	Adding arm64 target CORTEXA57 Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>	2015-11-09 14:15:50 +05:30
Werner Saar	9bd962f655	modified haswell parameter dgemm_unroll_n	2015-06-13 10:28:27 +02:00
Zhang Xianyi	51ff17d46e	Add AMD Excavator target.	2015-05-13 16:16:30 -05:00
Zhang Xianyi	229ce2ccd1	Add cortex-a9 and cortex-a15 targets.	2015-01-12 08:55:29 +00:00
Werner Saar	ddf983d643	added optimizations for steamroller	2014-12-30 20:14:45 +08:00
Werner Saar	4319769b79	added target processor STEAMROLLER	2014-12-28 20:16:46 +08:00
Werner Saar	587e16fba3	Ref #458 : Backport, sandybridge uses nehalem zgemm kernel	2014-12-22 17:01:18 +01:00
Zhang Xianyi	2fb02626da	Update organization info.	2014-11-25 15:28:58 +08:00
Zhang Xianyi	a85c2785ae	Refs #467 . Added generic kernel file for x86_64.	2014-11-24 15:34:48 +08:00
Benedikt Huber	58c90d5937	# The first commit's message is: Optimizations for APM's xgene-1 (aarch64). 1) general system updates to support armv8 better. Make all did not work, one needed to supply TARGET=ARMV8. 2) sgem 4x4 kernel in assembler using SIMD, and configuration changes to use it. 3) strmm 4x4 kernel in C. Since the sgem kernel does 4x4, the trmm kernel must also do 4xN. Added Dave Nuechterlein to the contributors list.	2014-11-11 22:19:23 +08:00
wernsaar	9d7057366d	bugfix for GEMM3M functions	2014-09-21 11:41:43 +02:00
wernsaar	7aae4a62e7	enabled use of GEMM3M functions	2014-09-20 14:27:10 +02:00
wernsaar	5087096711	optimization of sandybridge cgemm-kernel	2014-07-29 19:07:21 +02:00
wernsaar	1cc02b4337	optimized sgemm kernel for haswell	2014-07-28 11:50:01 +02:00
wernsaar	125610d23b	allow to set custom value for ?GEMM_DEFAULT_UNROLL_MN, optimizations for syrk	2014-07-24 18:43:31 +02:00
Zhang Xianyi	99efbbbad5	Fixed #395 . Enable optimized cgemm for Sandybridge. Added optimized sdot kernel. Fixed c/zgemm, zgemv computational error of haswell, piledriver, bulldozer, and barcelona on Windows. Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop Conflicts: kernel/Makefile.L1 kernel/x86_64/KERNEL param.h	2014-06-29 10:34:51 +08:00
Timothy Gu	6c2ead30f0	Remove all trailing whitespace except lapack-netlib Signed-off-by: Timothy Gu <timothygu99@gmail.com>	2014-06-27 12:05:18 -07:00
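
Commit 5b708e5eb1 above reports that a 1024x1024x1024 multiply on a 10-core system was chunked into blocks of 102, and introduces a define that lets an architecture state a preferred size multiple; commit b28f75cd7e then sets that multiple to 16 for HASWELL. The arithmetic can be sketched as below. This is only an illustration under stated assumptions: the function names are hypothetical, the rounding strategy (round the per-thread width up to the next multiple) is assumed for the example, and none of it is taken from the OpenBLAS threading code.

```python
def naive_chunk(n, threads):
    """Per-thread width when the dimension is simply divided by the thread count."""
    return n // threads


def preferred_chunk(n, threads, multiple):
    """Per-thread width rounded up to a multiple of `multiple` (hypothetical helper)."""
    width = -(-n // threads)                 # ceiling division
    return -(-width // multiple) * multiple  # round up to the preferred multiple


def split(n, width):
    """Cut a dimension of size n into consecutive chunks of at most `width`."""
    chunks = []
    start = 0
    while start < n:
        end = min(start + width, n)
        chunks.append(end - start)
        start = end
    return chunks


if __name__ == "__main__":
    n, threads, multiple = 1024, 10, 16
    print("naive width:", naive_chunk(n, threads))
    width = preferred_chunk(n, threads, multiple)
    print("preferred width:", width)
    print("chunks:", split(n, width))
```

With these inputs the naive width is 102, which is not a multiple of 16, while the rounded width keeps every chunk a multiple of 16 at the cost of a smaller final chunk.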


298 Commits