Commit Graph

3414 Commits

Author SHA1 Message Date
Martin Kroeker 8c13aa495a
Merge pull request #1730 from fenrus75/fix-sdot
Fix typo in sdot function
2018-08-12 18:17:01 +02:00
Martin Kroeker 1ee6d087c3
Merge pull request #1729 from fenrus75/dscal
Add an AVX512 enabled DSCAL function
2018-08-12 18:16:45 +02:00
Martin Kroeker a95a784ab2
Merge pull request #1723 from maamountki/develop
Disable zgemv scale in gemv benchmark by default
2018-08-11 21:08:45 +02:00
Arjan van de Ven 9bec34cb67 Add an AVX512 enabled DSYMV (L) function
written in C intrinsics for best readability.
(the same C code works for Haswell as well)

For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-11 17:46:24 +00:00
Arjan van de Ven 87bebdbd8a Add an AVX512 enabled DGEMV (n) function
written in C intrinsics for best readability.
(the same C code works for Haswell as well)

For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-11 17:38:12 +00:00
Arjan van de Ven 9493f26309 add short blurb about avx512 and needed compiler to README 2018-08-11 17:21:46 +00:00
Arjan van de Ven 36add7570a Fix typo in sdot function
it looks like my previous pull request was short the final commit;
fix a typo in sdot
2018-08-11 17:16:45 +00:00
Arjan van de Ven cacacc8007 Add an AVX512 enabled DSCAL function
written in C intrinsics for best readability.
(the same C code works for Haswell as well)

For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-11 17:14:57 +00:00
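The commit message above describes a kernel written in C intrinsics with a compile-time fallback. A minimal sketch of that pattern follows; it is not the OpenBLAS kernel. The name dscal_sketch is hypothetical, the sketch assumes the compiler defines __AVX512F__ when AVX512 code generation is enabled, and it falls back to a plain scalar loop rather than to the Haswell AVX2 implementation the commit mentions.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper (not the OpenBLAS kernel): x[i] *= alpha for i in [0, n). */
static void dscal_sketch(size_t n, double alpha, double *x)
{
    size_t i = 0;
    assert(x != NULL || n == 0);
#if defined(__AVX512F__)
    /* One 512-bit register holds 8 doubles. */
    const __m512d valpha = _mm512_set1_pd(alpha);
    for (; i + 8 <= n; i += 8) {
        __m512d v = _mm512_loadu_pd(&x[i]);
        _mm512_storeu_pd(&x[i], _mm512_mul_pd(v, valpha));
    }
#endif
    /* Scalar loop: handles the tail, or everything when AVX512 is not enabled. */
    for (; i < n; i++)
        x[i] *= alpha;
}
```

Built without AVX512 support, the same source compiles to the scalar loop only, which mirrors the "fall back if the compiler is not new enough" idea in a simplified form.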
Martin Kroeker 1a00ef3d27
Merge pull request #1725 from fenrus75/axpy
Add AVX512 enabled SAXPY/DAXPY functions
2018-08-11 11:01:20 +02:00
Martin Kroeker 4c0d832ec3
Merge pull request #1724 from fenrus75/sdot
Add an AVX512 enabled SDOT function
2018-08-11 11:00:56 +02:00
Martin Kroeker fc33cbc7bb
Merge pull request #1728 from martin-frbg/changelog
Add changes from the 0.3.x releases
2018-08-10 13:24:36 +02:00
Martin Kroeker c52a831ae4
Add changes from the 0.3.x releases
fixes #1727
2018-08-10 13:23:47 +02:00
Arjan van de Ven 2e99873ff7 Add AVX512 enabled SAXPY/DAXPY functions
written in C intrinsics for best readability.
(the same C code works for Haswell as well)

For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-10 02:58:32 +00:00
Arjan van de Ven 00abaa865b Add an AVX512 enabled SDOT function
written in C intrinsics for best readability.
(the same C code works for Haswell as well)

For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-10 02:33:43 +00:00
maamountki 33043f563f
Disable scal to benchmark zgemv separately by default 2018-08-10 01:54:18 +03:00
Martin Kroeker 66da7677bd
Merge pull request #1721 from fenrus75/ddot2
Add an AVX512 enabled DDOT function
2018-08-09 15:39:06 +02:00
Arjan van de Ven 7932ff3ea9 Add an AVX512 enabled DDOT function
written in C intrinsics for best readability.
(the same C code works for Haswell as well)

For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-09 03:55:52 +00:00
Martin Kroeker 62f4c69708
Merge pull request #1717 from martin-frbg/issue1708
Add workaround for avx512 compilations on Cygwin
2018-08-06 22:05:47 +02:00
Martin Kroeker 73478664d4
Add workaround for avx512 compilations on Cygwin
fixes #1708
2018-08-06 16:40:32 +02:00
Martin Kroeker ee955757f9
Merge pull request #1715 from stevengj/patch-1
fix blasabs for windows
2018-08-05 22:48:44 +02:00
Steven G. Johnson 48610a4524
fix blasabs for windows
Bugfix in #1713 for Windows (LLP64), where `blasabs` needs to be `llabs` rather than `labs` for the 64-bit API.
2018-08-05 08:18:51 -04:00
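To illustrate the LLP64 issue described above: on 64-bit Windows, long is 32 bits wide, so labs would truncate a 64-bit integer argument. The sketch below is an assumption-laden illustration, not the header from the repository. USE64BITINT stands in for the INTERFACE64 build switch and is defined here only so the 64-bit path is exercised, and the sketch checks the width of long rather than testing for a specific platform.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: stand-in for the INTERFACE64 build option. */
#define USE64BITINT 1

#ifdef USE64BITINT
typedef int64_t blasint;
#  if LONG_MAX < INT64_MAX
     /* long is narrower than 64 bits (e.g. LLP64 Windows): labs would truncate. */
#    define blasabs(x) llabs(x)
#  else
     /* long is already 64 bits wide (e.g. LP64 Linux). */
#    define blasabs(x) labs(x)
#  endif
#else
typedef int blasint;
#  define blasabs(x) abs(x)
#endif
```

With this selection, blasabs applied to a value such as -5000000000 yields 5000000000 on both LP64 and LLP64 targets.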
Martin Kroeker 4a553e8678
Merge pull request #1713 from martin-frbg/issue1710
Introduce blasabs macro and use it to switch between abs and labs for INTERFACE64
2018-08-04 23:51:31 +02:00
Martin Kroeker e788102c10
Merge pull request #1709 from stevengj/patch-1
fabs -> fabsl
2018-08-04 23:51:10 +02:00
Martin Kroeker 165f00c159
fabs -> fabsl 2018-08-04 20:14:51 +02:00
Martin Kroeker 40c068a875
Introduce blasabs() to switch between abs() and labs() for INTERFACE64 2018-08-04 20:07:59 +02:00
Martin Kroeker 933896a1d0
Use blasabs to switch between abs and labs as needed for INTERFACE64 2018-08-04 20:06:49 +02:00
Steven G. Johnson a4e321400b
fabs -> fabsl
Fixes two calls that were using `fabs` on a `long double` argument rather than `fabsl`, which looks like it is doing an unintentional truncation to `double` precision.
2018-08-03 13:00:10 -04:00
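The truncation mentioned above can be reproduced in a few lines. In C (without tgmath.h), fabs takes a double, so a long double argument is converted to double before the call. The helper names below are hypothetical and the sketch is only an illustration of the effect, not the code that was patched.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Hypothetical helper: the argument is converted to double before fabs runs,
   so any extra long double precision is lost. */
static long double abs_via_fabs(long double x)
{
    return fabs(x);
}

/* Hypothetical helper: fabsl keeps the full long double precision. */
static long double abs_via_fabsl(long double x)
{
    return fabsl(x);
}
```

On targets where long double is wider than double, a value such as 1 + LDBL_EPSILON survives fabsl unchanged but comes back as exactly 1 from fabs. On targets where the two types have the same precision, both helpers agree.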
Martin Kroeker 9e65430504
Merge pull request #1703 from wsttiger/cmake_fix
Set EXPORT_NAME to match OpenBLASConfig.cmake
2018-08-02 23:48:42 +02:00
Martin Kroeker 2cfa86b406
Merge pull request #1707 from extrowerk/haiku_support
Haiku supporting patches
2018-08-02 22:27:00 +02:00
Scott Thornton 2a9a9389ef Added target_include_directories() 2018-08-02 14:58:52 -05:00
Zoltán Mizsei 6463bffd59 Haiku supporting patches 2018-08-02 20:49:14 +02:00
Martin Kroeker 8ef7d4fb54
Merge pull request #1706 from oon3m0oo/develop
Fix #1705 where we incorrectly calculate page locations.
2018-08-02 18:53:34 +02:00
Craig Donner 6400868e55 Fix #1705 where we incorrectly calculate page locations.
Since we now use an allocation size that isn't a multiple of PAGESIZE, finding
the pages for run_bench wasn't terminating properly.  Now we detect if we've
found enough pages for the allocation and terminate the loop.
2018-08-02 16:21:19 +01:00
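One way a page-walking loop can fail to terminate when the allocation size is not a multiple of the page size is an exact-match exit condition. The sketch below is a guess at the general shape of the problem, not the code changed in this commit; PAGESIZE_SKETCH and count_pages are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define PAGESIZE_SKETCH 4096u  /* hypothetical page size for illustration */

/* Count the pages needed to cover an allocation of `size` bytes.
   An exit test like `covered != size` steps past `size` without ever
   equalling it when size is not a multiple of the page size; testing
   whether enough bytes are covered terminates in every case. */
static size_t count_pages(size_t size)
{
    size_t pages = 0;
    size_t covered = 0;
    while (covered < size) {
        pages++;
        covered += PAGESIZE_SKETCH;
    }
    return pages;
}
```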
Scott Thornton 8ebf541e97 Set EXPORT_NAME to match OpenBLASConfig.cmake 2018-07-30 15:18:29 -05:00
Martin Kroeker b03ae3f4dc
Set version to 0.3.3.dev 2018-07-30 08:23:13 +02:00
Martin Kroeker 2cc8fb0ad2
Set version to 0.3.3.dev 2018-07-30 08:22:38 +02:00
Martin Kroeker e8a68ef261
Merge pull request #1702 from xianyi/develop
Merge develop for 0.3.2
2018-07-30 07:25:01 +02:00
Martin Kroeker 64826a0d7d
Merge branch 'release-0.3.0' into develop 2018-07-29 22:37:09 +02:00
Martin Kroeker 25f2d25cfe
Merge pull request #1697 from martin-frbg/issue1696
Do not treat Windows UWB builds as cross-compiling
2018-07-25 19:55:29 +02:00
Martin Kroeker 73131fa30a
Do not treat Windows UWB builds as cross-compiling 2018-07-24 17:46:33 +02:00
Martin Kroeker 66fcdd5be8
Merge pull request #1695 from martin-frbg/issue1692
Unset memory table entry, not just the local pointer to it on shutdown
2018-07-22 16:34:09 +02:00
Martin Kroeker 43ac839c16
Unset memory table entry, not just the temporary pointer to it on shutdown
to fix crash with multiple instances of OpenBLAS, #1692
2018-07-22 09:19:19 +02:00
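The distinction between clearing a table entry and clearing a local pointer to it can be sketched as follows. The table, its size, and the function names are hypothetical and do not reproduce the OpenBLAS allocator; the sketch only shows why resetting a local copy leaves a dangling pointer behind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_SLOTS 4  /* hypothetical table size */

/* Hypothetical stand-in for a table of per-thread buffers. */
static void *memory_table[NUM_SLOTS];

static void fill_table(void)
{
    for (size_t i = 0; i < NUM_SLOTS; i++) {
        memory_table[i] = malloc(64);
        assert(memory_table[i] != NULL);
    }
}

/* The problematic pattern would be:
 *     void *entry = memory_table[i];
 *     free(entry);
 *     entry = NULL;        // only the local copy is cleared
 * which leaves memory_table[i] pointing at freed memory.
 * Clearing the slot itself makes a later shutdown or reuse safe. */
static void shutdown_fixed(void)
{
    for (size_t i = 0; i < NUM_SLOTS; i++) {
        free(memory_table[i]);
        memory_table[i] = NULL;
    }
}
```

Because free(NULL) is a no-op, calling shutdown_fixed a second time is harmless, which is the property a second library instance would rely on.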
Martin Kroeker 7ba5936ecd
Merge pull request #1688 from martin-frbg/issue1673
Temporarily disable special handling of OPENMP thread memory allocation
2018-07-19 19:03:45 +02:00
Martin Kroeker b14f44d2ad
Temporarily disable special handling of OPENMP thread memory allocation
for issue #1673
2018-07-19 08:57:56 +02:00
Martin Kroeker e71d70ba87
Merge pull request #1681 from martin-frbg/issue1671
Add cpu identification via mfpvr call for the BSDs
2018-07-16 22:47:05 +02:00
Martin Kroeker d671870f5f
Merge pull request #1684 from martin-frbg/issue1672
Work around utest failures in the MIPS64 SICORTEX target
2018-07-16 22:46:49 +02:00
Martin Kroeker 4e103c822c
typo fix 2018-07-16 12:56:39 +02:00
Martin Kroeker d2142760e0
Fix precision problem in DSDOT 2018-07-15 17:11:40 +02:00
Martin Kroeker 2fbfc64da8
Use C kernels for default c/zAXPY, xROT, c/zSWAP 2018-07-15 17:09:55 +02:00
Martin Kroeker 8d5b33b6be
Add cpu identification via mfpvr call for the BSDs
fixes #1671
2018-07-12 23:39:00 +02:00