OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Ali Saidi	c623a965f9	Add Neoverse-N1 core The implementation is a hybrid of the ARMV8 one with some of the improved TX2 routines along with specifying -march=v8.2-a	2020-02-29 03:22:04 +00:00
Xianyi Zhang	265ab484c8	Change default RISC-V 64-bit corename to RISCV64_GENERIC e.g. make CC=riscv64-unknown-linux-gnu-gcc FC=riscv64-unknown-linux-gnu-gfortran TARGET=RISCV64_GENERIC HOSTCC=gcc	2020-02-27 14:46:15 +08:00
Xianyi Zhang	4aa2d89217	Merge branch 'develop' into risc-v	2020-02-27 13:53:49 +08:00
Martin Kroeker	ddcbed6690	Merge pull request #2437 from martin-frbg/issue2434 [WIP] Add support for Ampere EMAG8180 ARMV8 cpu	2020-02-25 18:42:52 +01:00
Martin Kroeker	f8ec538c82	Add Ampere EMAG8180	2020-02-25 14:30:00 +01:00
Martin Kroeker	76b2cec6ce	Get endianness into Makefile variable	2020-02-19 18:08:20 +01:00
Martin Kroeker	e3e8b5cdca	Add NetBSD	2019-10-25 12:51:06 +02:00
Martin Kroeker	7c51cc8527	Merge branch 'develop' into develop	2019-03-29 19:36:29 +01:00
AbdelRauf	853a18bc17	power9 makefile. dgemm based on power8 kernel with following changes : 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). improvement from 17 to 22-23gflops. dtrmm cases were added into dgemm itself	2019-03-29 15:49:40 +00:00
maomao194313	760842dda1	add TARGET support for HiSilicon tsv110 CPUs	2019-03-04 16:45:22 +08:00
Martin Kroeker	72d3e7c9b4	Add FORCE Z14 from patch provided by aarnez in #991	2019-01-31 21:15:50 +01:00
Martin Kroeker	21eda8b577	Report SkylakeX as Haswell if compiler does not support AVX512 ... or make was invoked with NO_AVX512=1	2019-01-22 18:47:12 +01:00
TiborGY	d11554c88f	Validate user supplied TARGET (#1941 ) the build will now abort with an error message when an undefined build TARGET is named Fixes #1938	2018-12-31 23:19:44 +01:00
Renato Golin	310ea55f29	Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for all cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than develop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tuning in the makefile.	2018-11-19 16:41:49 +00:00
Martin Kroeker	e9cd11768c	Enable parallel make on MS Windows by default fixes #874	2018-06-09 17:54:36 +02:00
Arjan van de Ven	99c7bba8e4	Initial support for SkylakeX / AVX512 This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server) target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set, which brings 2 basic things: 1) 512 bit wide SIMD (2x width of AVX2) 2) 32 SIMD registers (2x the number on AVX2) This initial patch only contains a trivial transformation of the Haswell SGEMM kernel to AVX512VL; more will follow later but this patch aims to get the infrastructure in place for this "later". Full performance tuning has not been done yet; with more registers and wider SIMD it's in theory possible to retune the kernels but even without that there's an interesting enough performance increase (30-40% range) with just this change.	2018-06-03 07:58:52 +00:00
Jerry Zhao	c167a3d6f4	Added RISCV build	2018-04-16 14:08:31 -07:00
Alex Arslan	a41d241a0e	Add support for DragonFly BSD	2018-04-03 16:39:29 -07:00
Alex Arslan	8da6b6ae52	Allow building on OpenBSD With this change, OpenBLAS builds and all tests pass on OpenBSD 6.2 using Clang. Tested on x86-64 only, with and without DYNAMIC_ARCH=1.	2018-04-02 10:48:22 -07:00
Martin Kroeker	efa84afd00	Use get_corename for SPARC as well	2018-02-01 18:20:38 +01:00
Shivraj Patil	e3d844b062	Added mips I6500 core Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>	2017-09-22 11:57:43 +05:30
Sébastien Villemot	7543e578a4	Add support for TARGET=ZARCH_GENERIC and TARGET=Z13	2017-09-19 12:16:42 +02:00
Martin Kroeker	166d64eb7c	Fix FORCE_ZEN option in getarch.c	2017-04-19 14:20:42 +02:00
Denis Steckelmacher	c9ff735da6	Add ZEN support (tested for auto-detected static backend)	2017-03-19 15:32:50 +01:00
Ashwin Sekhar T K	4b55fae337	ARM64: Add Cavium THUNDERX2T99 Target	2017-01-11 11:18:40 +05:30
Andrew Pinski	fb200c7245	ARM64: Add Cavium THUNDERX Target	2017-01-10 15:01:37 +05:30
Ashwin Sekhar T K	4713e7c47f	ARM64: Add the VULCAN Target	2017-01-10 15:01:17 +05:30
Zhang Xianyi	b678471d65	Merge branch 'z13' into develop Conflicts: CONTRIBUTORS.md	2017-01-09 05:52:42 -05:00
Martin Kroeker	570bc9afbd	Fix spurious define in openblas_config.h TARGET as specified with make is already return-terminated when getarch reads it. This led to an empty line written to config_last.h that awk in Makefile.install then expanded to a spurious "#define OPENBLAS_" in openblas_config.h (as noted by "kmb" on the mailing list)	2016-11-06 17:29:33 +01:00
Howard Su	ff1da01476	Use NPROCESSOR_CONF instead of NPROCESSOR_ONLN to determine the number of CPUs. On ARM platforms, the number of online CPUs increases when there is more workload, while the number of configured CPUs is the maximum number of CPUs.	2016-10-13 12:37:50 +00:00
Shivraj Patil	beb1d076a4	Added MSA optimization for GEMV_N, GEMV_T, ASUM, DOT functions Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>	2016-07-15 18:38:25 +05:30
Shivraj Patil	2c3dfe2bf3	MIPS P5600(32 bit) and I6400(64 bit) cores support added. Separated mips and mips64 files. Configurations support for mips 32 bit. Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>	2016-04-22 14:03:18 +05:30
Zhang Xianyi	dd43661cfd	Init IBM z system (s390x) porting.	2016-04-15 18:02:24 -04:00
Jerome Robert	7aac0aff8e	Allow to force to do not use -j as make argument Close #828 (hopefully)	2016-03-31 23:03:52 +02:00
Werner Saar	b752858d6c	added dgemm-, dtrmm-, zgemm- and ztrmm-kernel for power8	2016-03-01 07:33:56 +01:00
Zhang Xianyi	3e8d6ea74f	Init POWER8 kernels by POWER6.	2015-11-03 12:34:23 +08:00
Zhang Xianyi	aaa8551c57	Merge pull request #749 from lotheac/illumos_fixes illumos fixes	2016-01-26 08:42:20 -06:00
Lauri Tirkkonen	8635d425c1	make parallel make work on illumos	2016-01-22 18:55:48 +02:00
Jerome Robert	ba024fcfc0	Allow to force the number of parallel make job This is particularly useful when using distcc	2015-12-28 19:45:29 +01:00
Ashwin Sekhar T K	f2f8a0fe8b	Adding arm64 target CORTEXA57 Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>	2015-11-09 14:15:50 +05:30
Zhang Xianyi	94b125255f	Merge branch 'develop' into cmake Conflicts: driver/others/memory.c	2015-10-13 04:46:08 +08:00
Zhang Xianyi	f27942a68a	Fixed make TARGET=CORTEXA9 and CORTEXA15 bug.	2015-09-26 14:42:44 +00:00
Grazvydas Ignotas	d38a1ddc7a	use real armv5 support there is no more requirement for ARMv6 instructions, and VFP on ARMv5 is uncommon	2015-08-16 18:59:18 +02:00
Fábio Perez	b8d64a856a	Add POWER7/POWER8 as targets	2015-08-05 11:02:39 -03:00
Zhang Xianyi	dcd5ba4443	Merge branch 'cmake' of https://github.com/hpanderson/OpenBLAS into hpanderson_cmake	2015-07-22 04:06:39 +08:00
Zhang Xianyi	51ff17d46e	Add AMD Excavator target.	2015-05-13 16:16:30 -05:00
Zhang Xianyi	c674fa32be	Add ARM targets.	2015-03-24 12:17:04 -05:00
Zhang Xianyi	229ce2ccd1	Add cortex-a9 and cortex-a15 targets.	2015-01-12 08:55:29 +00:00
Hank Anderson	1a41022e3e	Added MSVC defines to cpuid.h and getarch.c.	2015-01-01 21:01:28 -06:00
Werner Saar	4319769b79	added target processor STEAMROLLER	2014-12-28 20:16:46 +08:00
Zhang Xianyi	2fb02626da	Update organization info.	2014-11-25 15:28:58 +08:00
Benedikt Huber	58c90d5937	# The first commit's message is: Optimizations for APM's xgene-1 (aarch64). 1) general system updates to support armv8 better. Make all did not work, one needed to supply TARGET=ARMV8. 2) sgem 4x4 kernel in assembler using SIMD, and configuration changes to use it. 3) strmm 4x4 kernel in C. Since the sgem kernel does 4x4, the trmm kernel must also do 4xN. Added Dave Nuechterlein to the contributors list.	2014-11-11 22:19:23 +08:00
Zhang Xianyi	552119c484	Fixed #407 . Support outputting the CPU corename at runtime. The user can use char * openblas_get_config() or char * openblas_get_corename().	2014-07-08 12:48:08 +08:00
Timothy Gu	6c2ead30f0	Remove all trailing whitespace except lapack-netlib Signed-off-by: Timothy Gu <timothygu99@gmail.com>	2014-06-27 12:05:18 -07:00
wernsaar	43fbdb7a5a	added ARMV5 as reference platform	2014-05-13 17:25:19 +02:00
wernsaar	b3254eecaf	Merge remote branch 'origin/haswell' into develop	2013-12-01 18:09:12 +01:00
wernsaar	d13aa79d26	modified getarch.c	2013-12-01 17:31:22 +01:00
Zhang Xianyi	2638370844	Init code base for Intel Haswell.	2013-08-13 00:54:59 +08:00
Zhang Xianyi	673e453b3f	Enable bulldozer kernels.	2013-08-05 16:07:54 +08:00
Zhang Xianyi	5b504d6c23	Refs #263 . Rollback bulldozer and piledriver kernels to barcelona kernels.	2013-07-28 17:39:24 +08:00
Zhang Xianyi	fbb75e58b1	Fixed the typo in getarch.c	2013-07-09 16:26:59 +08:00
Zhang Xianyi	f54f5bac9e	Refs #248 . Fixed the LSB compatibility issue for BLAS only. For example, make CC=lsbcc NO_LAPACK=1.	2013-07-09 15:38:03 +08:00
Zhang Xianyi	886cbaf4e4	Support AMD Piledriver by bulldozer kernels.	2013-07-06 12:06:43 -03:00
Zhang Xianyi	48bdc1ad3b	Added NO_PARALLEL_MAKE flag to disable parallel make.	2013-04-15 21:37:30 +08:00
Explorer09	53588bc786	getarch.c: Minor re-ordering of architecture list	2013-03-17 23:09:23 +08:00
Explorer09	b47f13ee4c	getarch.c: Minor re-ordering of architecture list	2013-03-17 23:07:48 +08:00
Zhang Xianyi	bfaaa975e6	Added BULLDOZER target. So far it uses barcelona kernels.	2012-12-07 00:53:31 +08:00
Zhang Xianyi	b7c0fa6bd2	Init AMD Bulldozer codebase.	2012-12-06 07:29:54 -05:00
Xianyi Zhang	0a958b6a02	Refs #118 . Detect AMD Bulldozer as Barcelona.	2012-06-25 17:28:49 +08:00
Zhang Xianyi	d3b67d0bd8	Refs #113 . Fixed the typo BOBCATE -> BOBCAT	2012-05-31 22:40:15 +08:00
Zhang Xianyi	d6cab3f37e	Refs #113 . Support AMD Bobcate using Barcelona kernel codes. Replace 3DNow! with MMX.	2012-05-31 18:17:45 +08:00
Xianyi Zhang	19a48b82cf	Init Sandybridge codes based on Nehalem.	2012-03-30 20:01:03 +08:00
Xianyi Zhang	b95ad4cfaf	Support detecting ICT Loongson-3B CPU.	2011-11-09 19:29:50 +00:00
traits	9fc6764fa7	refs #55 . Added DTB_ENTRIES into dynamic arch setting parameters. Now, it can read DTB_ENTRIES on runtime.	2011-09-05 17:37:07 +08:00
Xianyi Zhang	ff6ae89d3e	Fixed #19 . Provided an error msg when the arch is not supported.	2011-04-22 20:21:42 +08:00
Xianyi Zhang	46ce7270d9	fixed a typo.	2011-01-25 15:55:56 +08:00
Xianyi Zhang	0597c1076f	Added the configures of loongson 3a. refs #1	2011-01-24 22:45:35 +00:00
Xianyi Zhang	342bbc3871	Import GotoBLAS2 1.13 BSD version codes.	2011-01-24 14:54:24 +00:00

128 Commits