Commit Graph

4617 Commits

Author SHA1 Message Date
Alex Arslan 8da6b6ae52
Allow building on OpenBSD
With this change, OpenBLAS builds and all tests pass on OpenBSD 6.2
using Clang. Tested on x86-64 only, with and without DYNAMIC_ARCH=1.
2018-04-02 10:48:22 -07:00
Martin Kroeker 01c4b82f04
Update memory.c 2018-03-31 22:32:06 +02:00
Martin Kroeker 93db123f7e
Update memory.c 2018-03-29 13:13:49 +02:00
Martin Kroeker 752fdb5dd8
Add workaround for old gcc and clang versions
Old gcc and clang do not handle constructor arguments, finally fix #875 as discussed there, using the fedora patch
2018-03-29 11:56:56 +02:00
Martin Kroeker 07ed01e97f
Merge pull request #1500 from martin-frbg/issue1474
Correct index variables used in MFlops calculation
2018-03-28 09:15:34 +02:00
Martin Kroeker 35c5a32309
Correct index variables used in MFlops calculation
Fixes #1474
2018-03-27 21:52:29 +02:00
Martin Kroeker c7b55b6082
Merge pull request #1499 from quickwritereader/develop
Implemented missing vsx simd  kernels for power8 blas1/2 double. z13 modifications
2018-03-27 21:43:23 +02:00
Martin Kroeker 840e01061f
Merge pull request #1491 from martin-frbg/ddot_mt
Add multithreading support for Haswell DDOT
2018-03-27 21:43:05 +02:00
QWR QWR 28ca97015d power8:Added initial zgemv_(t|n) ,i(d|z)amax,i(d|z)amin,dgemv_t(transposed),zrot
z13: improved zgemv_(t|n)_4,zscal,zaxpy
2018-03-27 14:54:41 +00:00
Martin Kroeker 73c5ca74fa
Merge pull request #1495 from martin-frbg/aff
Disable CPU affinity by default again
2018-03-19 18:03:25 +01:00
Martin Kroeker e453555d97
Disable CPU affinity by default again
This setting must have been changed unintentionally by my PR #1214 (probably leftover from unrelated tests)
2018-03-19 18:02:23 +01:00
Martin Kroeker 6a6ffaff1e
Merge pull request #1494 from martin-frbg/x86_dsdot
Use generic/dot.c instead of the inferior arm/dot.c for x86 DSDOT
2018-03-17 15:26:47 +01:00
Martin Kroeker 28ac9ea5a6
Use generic/dot.c instead of the inferior arm/dot.c for x86 DSDOT
to resolve dsdot utest failure seen in #1492
2018-03-17 13:49:15 +01:00
Martin Kroeker a55694dd5b
Declare dot_compute static to avoid conflicts in multiarch builds 2018-03-16 22:23:36 +01:00
Martin Kroeker 85a41e9cdb
Add multithreading support for Haswell DDOT
copied from ashwinyes' implementation in dot_thunderx2t99.c
2018-03-16 16:58:47 +01:00
Martin Kroeker 40160ff3c1
Use _Atomic instead of volatile for thread safety where C11 is supported 2018-03-10 00:15:44 +01:00
Martin Kroeker 6a99fcce94
Use _Atomic instead of volatile for thread safety where C11 is supported
Suggested by dodomorandi in #660
2018-03-10 00:03:49 +01:00
Martin Kroeker 2c7392f07b
Merge pull request #1482 from martin-frbg/haswell_axpy
Re-enable DAXPY AVX microkernels  for x86_64
2018-03-04 22:21:18 +01:00
Martin Kroeker 81215711a2
Re-enable DAXPY microkernels for x86_64
as the inaccuracies seen in the original testcase for #1332 appear to be due to an artefact that amplifies the very small rounding differences between FMA and discrete multiply+add
2018-03-04 19:37:03 +01:00
Martin Kroeker 809fd0d451
Rewrite ROTMG to address cases not covered by the netlib algorithm (#1480)
* Rewrite ROTMG based on the new implementation in GONUM based on the algorithm proposed by Tim Hopkins, see issue 1452 for the reference
* Correct ROTMG utest for issue1452 and add another from gonum, also correct transposition of expected and observed values in error messages
2018-03-04 17:39:56 +01:00
Martin Kroeker 72e65157df
Merge pull request #1481 from martin-frbg/utest-fixup
Fix transposition of expected and computed values in error message
2018-03-03 22:43:56 +01:00
Martin Kroeker 69a8aa6de2
Fix transposition of expected and computed values in error message 2018-03-03 18:01:51 +01:00
Martin Kroeker 0ab5bf1746
Merge pull request #1476 from xsacha/patch-1
Fix CMake cross-compiling
2018-02-28 18:47:57 +01:00
Martin Kroeker 22167170b3
Merge pull request #1477 from quickwritereader/develop
Power8 blas3 copy-pack routines
2018-02-28 18:46:54 +01:00
Martin Kroeker 69d9f36ff4
Merge pull request #1468 from martin-frbg/martin-frbg-patch-1
Limit the additional locking from PRs 1052,1299 to non-OpenMP cases
2018-02-28 18:40:31 +01:00
Sacha f81815e48a
Fix CMake cross-compiling
Without specifying thread count, NUM_THREADS would not be defined and CMake would fail.
This is because core count cannot be determined when cross-compiling.
2018-02-28 10:25:25 +10:00
Martin Kroeker 5f855d965d
Merge pull request #1475 from ashwinyes/develop_20180227_utest_dsdot_fixes
ARM64: Fix utest dsdot errors
2018-02-27 14:04:16 +01:00
Ashwin Sekhar T K fa9ca65c0e ARM64: Fix utest dsdot errors 2018-02-27 10:47:55 +00:00
Martin Kroeker 719b68f077
Merge pull request #1473 from martin-frbg/p2align
Replace .align with .p2aligns in dscal.c and the Nehalem microkernels as well
2018-02-27 08:28:20 +01:00
Martin Kroeker fe9f15f2d8
Merge pull request #1472 from martin-frbg/utest-fixes
Fix limited DSDOT precision on arm,aarch64 and zarch
2018-02-26 22:48:07 +01:00
Martin Kroeker 497f0c3d8a
Replace .align with .p2align in the Nehalem microkernels 2018-02-26 20:58:33 +01:00
Martin Kroeker ea37db828e
Convert .align to .p2align for OSX compatibility 2018-02-26 20:48:03 +01:00
Martin Kroeker e6a0a3de73
Merge pull request #1471 from martin-frbg/p2align
Use .p2align instead of .align for portability on Haswell and Sandybridge
2018-02-26 12:28:01 +01:00
Martin Kroeker 6e70287776
Use generic/dot.c for DSDOT on ARMV5 and above
The default arm/dot.c is less precise when used for DSDOT, as shown by utest
2018-02-25 19:57:23 +01:00
Martin Kroeker 58f236ad73
Use generic/dot.c for DSDOT on zarch 2018-02-25 19:52:14 +01:00
Martin Kroeker e207107150
Use generic/dot.c for DSDOT on z13
The implementation in arm/dot.c has lower precision, as shown by the utest for dsdot.
2018-02-25 19:51:25 +01:00
Martin Kroeker c9d408064a
Use dot.S also for DSDOT on CORTEXA57 2018-02-25 19:48:09 +01:00
Martin Kroeker 288d1a3f6e
Use dot.S also for DSDOT on ARMV8 2018-02-25 19:45:16 +01:00
Martin Kroeker 7c1925acec
Use .p2align instead of .align for compatibility on Sandybridge as well 2018-02-24 19:43:15 +01:00
Martin Kroeker 2359c7c1a9
Use .p2align instead of .align for portability
The OSX assembler apparently mishandles the argument to decimal .align, leading to a significant loss of performance 
as observed in #730, #901 and most recently #1470
2018-02-24 17:50:13 +01:00
Martin Kroeker 7646974227
Limit the additional locking from PRs 1052,1299 to non-OpenMP multithreading 2018-02-21 11:45:33 +01:00
Martin Kroeker e3a80e6aa8
Merge pull request #1466 from xianyi/revert-1464-issue1461
Revert "Add locks only for non-OPENMP multithreading"
2018-02-20 17:17:38 +01:00
Martin Kroeker 8866e393a2
Revert "Add locks only for non-OPENMP multithreading" 2018-02-20 17:17:12 +01:00
Martin Kroeker 9e87e6f3d8
Merge pull request #1464 from martin-frbg/issue1461
Add locks only for non-OPENMP multithreading
2018-02-20 14:08:24 +01:00
Martin Kroeker 3119b2ab4c
Add locks only for non-OPENMP multithreading
to migitate performance problems caused by #1052 and #1299 as seen in #1461
2018-02-20 12:17:18 +01:00
Martin Kroeker e7366a4161
Restore the remaining utests (#1462)
* Restore the remaining utests

* Try fork test on Cygwin and Linux only, it hangs on at least ARMv8/Android as well

* Use generic sswap/dswap kernels for NEHALEM 32bit to fix fault found by the restored swap utest

* Disable zdotu test for MS cl to work around runtime error -1073741819 on AppVeyor for now
(probably coding error in the initialization of the complex numbers or wrong choice of zdotu API)
2018-02-20 10:07:17 +01:00
Martin Kroeker 6c0f79b787
Merge pull request #1463 from martin-frbg/rotmg2
Fix wrong conditionals in scaling loops of rotmg and update BLAS1 tests from netlib
2018-02-18 15:12:42 +01:00
Martin Kroeker 7cd7acf71e
Merge pull request #1460 from martin-frbg/issue1425
Revert insidious suppression of the -fopenmp flag in the LAPACK subtree
2018-02-18 15:12:20 +01:00
Martin Kroeker 72f14a0363
Fix conditionals in the rescaling against GAMSQ 2018-02-18 12:54:52 +01:00
Martin Kroeker 53026dc63a
Update single and double precision BLAS1 tests from LAPACK 3.8.0
adding tests for SROTMG, SROTM, SDSDOT, DROTMG, DROTM, DSDOT
2018-02-18 12:44:14 +01:00