
2812 Commits

Author SHA1 Message Date
Alex Arslan
8da6b6ae52 Allow building on OpenBSD
With this change, OpenBLAS builds and all tests pass on OpenBSD 6.2
using Clang. Tested on x86-64 only, with and without DYNAMIC_ARCH=1.
2018-04-02 10:48:22 -07:00
Martin Kroeker
07ed01e97f Merge pull request #1500 from martin-frbg/issue1474
Correct index variables used in MFlops calculation
2018-03-28 09:15:34 +02:00
Martin Kroeker
35c5a32309 Correct index variables used in MFlops calculation
Fixes #1474
2018-03-27 21:52:29 +02:00
Martin Kroeker
c7b55b6082 Merge pull request #1499 from quickwritereader/develop
Implemented missing vsx simd kernels for power8 blas1/2 double. z13 modifications
2018-03-27 21:43:23 +02:00
Martin Kroeker
840e01061f Merge pull request #1491 from martin-frbg/ddot_mt
Add multithreading support for Haswell DDOT
2018-03-27 21:43:05 +02:00
QWR QWR
28ca97015d power8: Added initial zgemv_(t|n), i(d|z)amax, i(d|z)amin, dgemv_t (transposed), zrot
z13: improved zgemv_(t|n)_4, zscal, zaxpy
2018-03-27 14:54:41 +00:00
Martin Kroeker
73c5ca74fa Merge pull request #1495 from martin-frbg/aff
Disable CPU affinity by default again
2018-03-19 18:03:25 +01:00
Martin Kroeker
e453555d97 Disable CPU affinity by default again
This setting must have been changed unintentionally by my PR #1214 (probably leftover from unrelated tests)
2018-03-19 18:02:23 +01:00
Martin Kroeker
6a6ffaff1e Merge pull request #1494 from martin-frbg/x86_dsdot
Use generic/dot.c instead of the inferior arm/dot.c for x86 DSDOT
2018-03-17 15:26:47 +01:00
Martin Kroeker
28ac9ea5a6 Use generic/dot.c instead of the inferior arm/dot.c for x86 DSDOT
to resolve dsdot utest failure seen in #1492
2018-03-17 13:49:15 +01:00
Martin Kroeker
a55694dd5b Declare dot_compute static to avoid conflicts in multiarch builds
2018-03-16 22:23:36 +01:00
Martin Kroeker
85a41e9cdb Add multithreading support for Haswell DDOT
copied from ashwinyes' implementation in dot_thunderx2t99.c
2018-03-16 16:58:47 +01:00
Martin Kroeker
2c7392f07b Merge pull request #1482 from martin-frbg/haswell_axpy
Re-enable DAXPY AVX microkernels for x86_64
2018-03-04 22:21:18 +01:00
Martin Kroeker
81215711a2 Re-enable DAXPY microkernels for x86_64
as the inaccuracies seen in the original testcase for #1332 appear to be due to an artefact that amplifies the very small rounding differences between FMA and discrete multiply+add
2018-03-04 19:37:03 +01:00
Martin Kroeker
809fd0d451 Rewrite ROTMG to address cases not covered by the netlib algorithm (#1480)
* Rewrite ROTMG based on the new implementation in GONUM based on the algorithm proposed by Tim Hopkins, see issue 1452 for the reference
* Correct ROTMG utest for issue1452 and add another from gonum, also correct transposition of expected and observed values in error messages
2018-03-04 17:39:56 +01:00
Martin Kroeker
72e65157df Merge pull request #1481 from martin-frbg/utest-fixup
Fix transposition of expected and computed values in error message
2018-03-03 22:43:56 +01:00
Martin Kroeker
69a8aa6de2 Fix transposition of expected and computed values in error message
2018-03-03 18:01:51 +01:00
Martin Kroeker
0ab5bf1746 Merge pull request #1476 from xsacha/patch-1
Fix CMake cross-compiling
2018-02-28 18:47:57 +01:00
Martin Kroeker
22167170b3 Merge pull request #1477 from quickwritereader/develop
Power8 blas3 copy-pack routines
2018-02-28 18:46:54 +01:00
Martin Kroeker
69d9f36ff4 Merge pull request #1468 from martin-frbg/martin-frbg-patch-1
Limit the additional locking from PRs 1052,1299 to non-OpenMP cases
2018-02-28 18:40:31 +01:00
Sacha
f81815e48a Fix CMake cross-compiling
Without specifying thread count, NUM_THREADS would not be defined and CMake would fail.
This is because core count cannot be determined when cross-compiling.
2018-02-28 10:25:25 +10:00
Martin Kroeker
5f855d965d Merge pull request #1475 from ashwinyes/develop_20180227_utest_dsdot_fixes
ARM64: Fix utest dsdot errors
2018-02-27 14:04:16 +01:00
Ashwin Sekhar T K
fa9ca65c0e ARM64: Fix utest dsdot errors
2018-02-27 10:47:55 +00:00
Martin Kroeker
719b68f077 Merge pull request #1473 from martin-frbg/p2align
Replace .align with .p2align in dscal.c and the Nehalem microkernels as well
2018-02-27 08:28:20 +01:00
Martin Kroeker
fe9f15f2d8 Merge pull request #1472 from martin-frbg/utest-fixes
Fix limited DSDOT precision on arm,aarch64 and zarch
2018-02-26 22:48:07 +01:00
Martin Kroeker
497f0c3d8a Replace .align with .p2align in the Nehalem microkernels
2018-02-26 20:58:33 +01:00
Martin Kroeker
ea37db828e Convert .align to .p2align for OSX compatibility
2018-02-26 20:48:03 +01:00
Martin Kroeker
e6a0a3de73 Merge pull request #1471 from martin-frbg/p2align
Use .p2align instead of .align for portability on Haswell and Sandybridge
2018-02-26 12:28:01 +01:00
Martin Kroeker
6e70287776 Use generic/dot.c for DSDOT on ARMV5 and above
The default arm/dot.c is less precise when used for DSDOT, as shown by utest
2018-02-25 19:57:23 +01:00
Martin Kroeker
58f236ad73 Use generic/dot.c for DSDOT on zarch
2018-02-25 19:52:14 +01:00
Martin Kroeker
e207107150 Use generic/dot.c for DSDOT on z13
The implementation in arm/dot.c has lower precision, as shown by the utest for dsdot.
2018-02-25 19:51:25 +01:00
Martin Kroeker
c9d408064a Use dot.S also for DSDOT on CORTEXA57
2018-02-25 19:48:09 +01:00
Martin Kroeker
288d1a3f6e Use dot.S also for DSDOT on ARMV8
2018-02-25 19:45:16 +01:00
Martin Kroeker
7c1925acec Use .p2align instead of .align for compatibility on Sandybridge as well
2018-02-24 19:43:15 +01:00
Martin Kroeker
2359c7c1a9 Use .p2align instead of .align for portability
The OSX assembler apparently mishandles the argument to decimal .align, leading to a significant loss of performance as observed in #730, #901 and most recently #1470
2018-02-24 17:50:13 +01:00
Martin Kroeker
7646974227 Limit the additional locking from PRs 1052, 1299 to non-OpenMP multithreading
2018-02-21 11:45:33 +01:00
Martin Kroeker
e3a80e6aa8 Merge pull request #1466 from xianyi/revert-1464-issue1461
Revert "Add locks only for non-OPENMP multithreading"
2018-02-20 17:17:38 +01:00
Martin Kroeker
8866e393a2 Revert "Add locks only for non-OPENMP multithreading"
2018-02-20 17:17:12 +01:00
Martin Kroeker
9e87e6f3d8 Merge pull request #1464 from martin-frbg/issue1461
Add locks only for non-OPENMP multithreading
2018-02-20 14:08:24 +01:00
Martin Kroeker
3119b2ab4c Add locks only for non-OPENMP multithreading
to mitigate performance problems caused by #1052 and #1299 as seen in #1461
2018-02-20 12:17:18 +01:00
Martin Kroeker
e7366a4161 Restore the remaining utests (#1462)
* Restore the remaining utests

* Try fork test on Cygwin and Linux only, it hangs on at least ARMv8/Android as well

* Use generic sswap/dswap kernels for NEHALEM 32bit to fix fault found by the restored swap utest

* Disable zdotu test for MS cl to work around runtime error -1073741819 on AppVeyor for now
(probably coding error in the initialization of the complex numbers or wrong choice of zdotu API)
2018-02-20 10:07:17 +01:00
Martin Kroeker
6c0f79b787 Merge pull request #1463 from martin-frbg/rotmg2
Fix wrong conditionals in scaling loops of rotmg and update BLAS1 tests from netlib
2018-02-18 15:12:42 +01:00
Martin Kroeker
7cd7acf71e Merge pull request #1460 from martin-frbg/issue1425
Revert insidious suppression of the -fopenmp flag in the LAPACK subtree
2018-02-18 15:12:20 +01:00
Martin Kroeker
72f14a0363 Fix conditionals in the rescaling against GAMSQ
2018-02-18 12:54:52 +01:00
Martin Kroeker
53026dc63a Update single and double precision BLAS1 tests from LAPACK 3.8.0
adding tests for SROTMG, SROTM, SDSDOT, DROTMG, DROTM, DSDOT
2018-02-18 12:44:14 +01:00
Martin Kroeker
798f1595d5 Fix condition in both second scaling loops
2018-02-18 12:37:09 +01:00
the mslm
2c0a008281 dgemm_ncopy_4_ save/restore
2018-02-18 01:30:17 +00:00
the mslm
c5425daa6b power8 ?gemm_tcopy save/restore
2018-02-16 23:36:46 +00:00
Martin Kroeker
eaab622f03 Make "OMP task depend" sections conditional on OpenMP4, not just OpenMP
To allow compiling with gcc versions older than 4.9
2018-02-14 22:58:14 +01:00
Martin Kroeker
3cda1ce50a Revert insidious suppression of the -fopenmp flag in the LAPACK subtree
This was added in #1046 citing a problem with mingw, but in effect it quietly reduces thread safety on all non-Windows platforms (while -fopenmp is already disabled for Windows builds through the toplevel Makefile.system). Removing the filter fixes #1425
2018-02-13 22:44:45 +01:00