OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	f72fdf525c	Merge pull request #1875 from martin-frbg/issue1851 Serialize accesses to parallelized level3 functions from multiple cal…	2018-11-25 20:53:46 +01:00
Martin Kroeker	5393759a98	Merge pull request #1869 from martin-frbg/axpy0 Handle special case INCX=0,INCY=0 in the axpy interface	2018-11-25 20:52:49 +01:00
Martin Kroeker	5cf18e2875	Merge pull request #1878 from kiwifb/PGI_f_check Correct link flags for PGI compiler.	2018-11-25 20:51:50 +01:00
Martin Kroeker	910050985a	Merge pull request #1876 from rengolin/armv8-cleanup Simplifying ARMv8 build parameters	2018-11-25 20:51:24 +01:00
François Bissey	0184713e1a	Correct link flags for PGI compiler.	2018-11-21 14:24:56 +13:00
Martin Kroeker	45c3c459e1	Merge pull request #1868 from martin-frbg/aix_cpuid Use prtconf to determine CPU type on AIX	2018-11-20 17:25:57 +01:00
Martin Kroeker	113cb00b95	fix missing parenthesis	2018-11-19 21:01:36 +01:00
Martin Kroeker	5192651706	Add CriticalSection handling instead of mutexes for Windows	2018-11-19 17:58:22 +01:00
Renato Golin	310ea55f29	Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for all cores compared to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than develop before TX2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes, due to new compiler tuning in the makefile.	2018-11-19 16:41:49 +00:00
Martin Kroeker	2e6fae2aad	Serialize accesses to parallelized level3 functions from multiple callers for #1851	2018-11-19 14:02:50 +01:00
Martin Kroeker	368d14f8c8	Fix harmless typo fixes #1872	2018-11-16 14:58:28 +01:00
Martin Kroeker	42bc2a9202	Fix copy-paste errors (POWER8/9 and extraneous return)	2018-11-16 12:10:44 +01:00
fengruilin	43bb386b10	fix dot problem on 64bit mips	2018-11-15 11:11:59 +08:00
Martin Kroeker	c171b8ad13	Handle special case INCX=0,INCY=0 in the axpy interface	2018-11-13 13:57:18 +01:00
Martin Kroeker	2f04cf22ac	Detect POWER9 as POWER8 on AIX and Linux (already supported by the *BSD version)	2018-11-13 08:16:14 +01:00
Martin Kroeker	807f6e6922	Use prtconf to determine CPU type on AIX for #1803	2018-11-12 18:52:29 +01:00
Martin Kroeker	ecbeb802a0	Merge pull request #1865 from martin-frbg/issue1844 Optimize gemv for small M, large N only if it can be done in a threadsafe manner	2018-11-12 17:30:44 +01:00
Martin Kroeker	2c5725cc39	Merge pull request #1864 from aytekinar/patch-1 Add ARM tests on Travis	2018-11-12 14:30:28 +01:00
Arda Aytekin	e3666931d8	Update .travis.yml Updated `.travis.yml` file to add emulated tests for `ARMV6` and `ARMV8` architectures with `gcc` and `clang`. Created prebuilt images with required dependencies. Squashed layers into one.	2018-11-11 20:50:38 +01:00
Martin Kroeker	ae02a57261	Merge pull request #1866 from martin-frbg/issue1859 Fix argument in SLASET call to zero S	2018-11-10 19:23:31 +01:00
Martin Kroeker	a6a52a73f7	Fix argument in SLASET call to zero S fixes #1859 in accordance with https://github.com/LAPACK-Reference/issue/296	2018-11-10 17:16:53 +01:00
Martin Kroeker	0427277cef	Allow optimization for small m, large n only if it can be made threadsafe; otherwise the introduction of a static array in `8e5a108` to improve #532 breaks concurrent calls from multiple threads, as seen in #1844	2018-11-10 15:45:54 +01:00
Martin Kroeker	4f43668eec	Merge pull request #2 from xianyi/develop merge develop	2018-11-10 15:37:25 +01:00
Martin Kroeker	b0c15bacc1	Merge pull request #1863 from martin-frbg/aix_install3 Set LIBSONAME suffix to .a for AIX	2018-11-09 13:12:06 +01:00
Martin Kroeker	cfb0f5b0f8	Set LIBSONAME suffix to .a for AIX another fix for #1803	2018-11-08 22:39:10 +01:00
Martin Kroeker	667fed579d	Merge pull request #1856 from rengolin/armv8-a57 [Arm64) Revert A53 detection as A57	2018-11-07 21:01:29 +01:00
Martin Kroeker	96d2f2c9b2	Merge pull request #1831 from brada4/hemv disable threading in C/ZSWAP copying from S/DSWAP	2018-11-07 08:49:21 +01:00
Martin Kroeker	653e657a58	Merge pull request #1857 from brada4/fc-1847 Add gfortran -frecursive option from upstream and #1847	2018-11-07 08:48:31 +01:00
Martin Kroeker	5f8f0583d4	Merge branch 'develop' into fc-1847	2018-11-07 08:47:52 +01:00
Martin Kroeker	974a6a30f2	Merge pull request #1858 from brada4/buff-1847 Add minimum threshold for number of buffers	2018-11-07 08:46:55 +01:00
Andrew	9531d0e175	lets fit it in one 4k page	2018-11-06 17:51:24 +00:00
Andrew	40cce0e353	handle cmake too	2018-11-06 09:45:49 +00:00
Andrew	3fd41313fc	add low bound for number of buffers	2018-11-06 09:40:13 +00:00
Andrew	a931afe269	init	2018-11-06 09:39:05 +00:00
Andrew	7d3502b500	Add -frecursive gfortran option by default	2018-11-06 08:20:55 +00:00
Andrew	066f8065d1	init	2018-11-06 08:19:08 +00:00
Renato Golin	fb5b2177ca	[Arm64) Revert A53 detection as A57 This patch reverts the decision of treating A53 like A57, which was based on an analysis done on server class hardware and is not representative of all A53s out there. Fixes #1855.	2018-11-05 11:34:49 +00:00
Martin Kroeker	f1c02273cb	Merge pull request #1846 from fenrus75/threadsize gemm/dgemm: add a way for an arch kernel to specify preferred sizes	2018-11-02 13:18:01 +01:00
Martin Kroeker	661035477c	Merge pull request #1850 from martin-frbg/issue1811 Restore Android/ARMv7 build fix from #778	2018-11-02 09:50:51 +01:00
Martin Kroeker	aa7e47aa0a	Merge pull request #1849 from martin-frbg/aix_install2 Use installbsd on AIX	2018-11-01 20:39:16 +01:00
Martin Kroeker	9c177d270b	Restore Android/ARMv7 build fix from #778 for #1811	2018-11-01 18:50:25 +01:00
Martin Kroeker	b025523197	Use installbsd on AIX (and fix misplaced parenthesis from previous commit). See #1803	2018-11-01 18:26:08 +01:00
Martin Kroeker	5b50bd36f7	Merge pull request #1845 from martin-frbg/aix_install Accommodate AIX install, which has different syntax	2018-11-01 09:53:10 +01:00
Arjan van de Ven	5b708e5eb1	sgemm/dgemm: add a way for an arch kernel to specify preferred sizes. The current gemm threading code can make very unfortunate choices; for example, on my 10 core system a 1024x1024x1024 matrix multiply ends up chunking into blocks of 102, which is not a vector friendly size, and performance ends up horrible. This patch adds a helper define where an architecture can specify a preference for size multiples. This is different from existing defines that are minimum sizes and such. The performance increase with this patch for the 1024x1024x1024 sgemm is 2.3x (!!)	2018-11-01 01:43:20 +00:00
Arjan van de Ven	dcc5d6291e	skylakex: Make the sgemm/dgemm beta code robust for a N=0 or M=0 case. In the threading code there are cases where N or M can become 0, and the optimized beta code did not handle this well, leading to a crash. During the audit for the crash, a few edge conditions on the if statements were found and fixed as well.	2018-11-01 01:42:09 +00:00
Martin Kroeker	7b5aea52bb	Accommodate AIX install, which has different syntax for #1803	2018-10-31 21:50:34 +01:00
Martin Kroeker	f5595d0262	Merge pull request #1843 from martin-frbg/aix_numprocs Add get_num_procs implementation for AIX	2018-10-31 21:25:15 +01:00
Martin Kroeker	326d394a0f	Add get_num_procs implementation for AIX (and copy HAIKU implementation to the non-TLS version of the code as well)	2018-10-31 18:38:22 +01:00
Martin Kroeker	6af8e35a24	Merge pull request #1837 from embray/set-num-thread-after-fork Ensure that blas_thread_init has been called in openblas_set_num_threads	2018-10-30 12:41:24 +01:00
Erik M. Bray	38cf5d9364	ensure that threading has been initialized in the first place before calling openblas_set_num_threads	2018-10-28 21:16:52 +00:00
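
The chunking problem described in commit 5b708e5eb1 above can be illustrated with a little arithmetic. The following is a minimal sketch, not OpenBLAS code: the function name `chunk_widths` and the multiple of 16 are assumptions chosen for illustration, and the real patch uses a per-architecture define whose name and value are not given in the log. It only shows why splitting 1024 columns evenly across 10 threads yields awkward widths near 102, and how rounding each chunk up to a preferred multiple avoids that.

```python
def chunk_widths(total, workers, multiple=1):
    """Split `total` columns into at most `workers` chunks.

    Hypothetical helper for illustration only. Each chunk width is the
    even share rounded up to a multiple of `multiple`; the final chunk
    takes whatever remains.
    """
    width = -(-total // workers)               # ceiling of the even share
    width = -(-width // multiple) * multiple   # round up to the preferred multiple
    widths = []
    remaining = total
    while remaining > 0:
        step = min(width, remaining)
        widths.append(step)
        remaining -= step
    return widths


# Even split: widths that are not multiples of a typical vector length.
print(chunk_widths(1024, 10))
# Assumed preferred multiple of 16: every chunk is vector friendly.
print(chunk_widths(1024, 10, multiple=16))
```

The trade-off in this sketch is that the last chunk becomes much smaller than the others, so the work is less evenly balanced; the commit message reports that the vector-friendly sizes were nevertheless a large net win for the 1024x1024x1024 sgemm case.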

7452 Commits