Martin Kroeker
8aeab0601e
Follow netlib renaming/aliasing CBLAS_ORDER to CBLAS_LAYOUT
...
fixes #1754
2018-09-06 16:39:52 +02:00
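Note on the CBLAS_ORDER / CBLAS_LAYOUT aliasing above: a minimal sketch of how one enum type can be exposed under both names with a typedef, so that code written against either spelling keeps compiling. This is an illustration only, not the header changed by the commit; which of the two names is the primary one in the real header is not shown in this log, and sketch_index is a hypothetical helper. The values 101 and 102 are the conventional CBLAS constants.

```c
#include <assert.h>
#include <stddef.h>

/* One enum type, two names: the older spelling and the newer alias.
 * (Assumption: the direction of the alias here is chosen for illustration.) */
typedef enum CBLAS_ORDER { CblasRowMajor = 101, CblasColMajor = 102 } CBLAS_ORDER;
typedef CBLAS_ORDER CBLAS_LAYOUT;

/* Hypothetical helper: linear index of element (row, col) in a matrix
 * with leading dimension ld, for either storage layout. */
static size_t sketch_index(CBLAS_LAYOUT layout, size_t row, size_t col, size_t ld)
{
    assert(layout == CblasRowMajor || layout == CblasColMajor);
    return layout == CblasRowMajor ? row * ld + col : col * ld + row;
}
```

Because both names refer to the same type, a variable declared as CBLAS_ORDER can be passed where CBLAS_LAYOUT is expected without a cast.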
Dumi Loghin
a1bdc308b8
override ARCH (archiver) in lapack-netlib/make.inc
2018-09-06 13:13:36 +08:00
Dumi Loghin
0b7ccb9e38
Revert "replace ARCH with AR in lapack-netlib"
...
This reverts commit db17ce896f.
2018-09-06 13:08:30 +08:00
Dumi Loghin
db17ce896f
replace ARCH with AR in lapack-netlib
2018-09-05 12:49:37 +08:00
Martin Kroeker
1cb7b9015e
Conditional compilation of assembly files that IOS does not like
2018-09-04 11:06:51 +02:00
Martin Kroeker
a4bd41e9f2
Fix paths to C kernels for nrm2
2018-09-04 10:51:19 +02:00
Martin Kroeker
9e2bb0c641
Update with the changes from 0.3.3
2018-08-31 00:21:13 +02:00
Martin Kroeker
dbfd7524cd
Update version to 0.3.4.dev
2018-08-31 00:19:21 +02:00
Martin Kroeker
2982ce505d
Update version to 0.3.4.dev
2018-08-31 00:18:37 +02:00
Martin Kroeker
fd8d1868a1
Updates for 0.3.3
2018-08-31 00:07:48 +02:00
Martin Kroeker
f0563f14ba
Version 0.3.3
2018-08-30 23:43:57 +02:00
Martin Kroeker
3197f86762
Version 0.3.3
2018-08-30 23:43:14 +02:00
Martin Kroeker
422a8fa953
Merge pull request #1747 from xianyi/develop
...
Merge develop into 0.3.x for 0.3.3
2018-08-30 23:42:19 +02:00
Martin Kroeker
5bac15adbd
Merge pull request #1746 from martin-frbg/issue1674
...
Assume cross-compilation if host and target os differ
2018-08-30 17:48:07 +02:00
Martin Kroeker
e17f969fa0
Assume cross-compilation if host and target os differ
...
fixes #1674
2018-08-30 13:28:46 +02:00
Martin Kroeker
e11126b26a
Merge pull request #1745 from martin-frbg/issue1743
...
Set USE_TRMM for all ZARCH variants to fix TRMM faults with zarch-gen…
2018-08-29 07:43:58 +02:00
Martin Kroeker
74608e470d
Merge pull request #1744 from martin-frbg/lapack272
...
Fix missing replacements of ILAENV by ILAENV_2STAGE (lapack PR 272)
2018-08-28 22:58:58 +02:00
Martin Kroeker
f3fd44a731
Set USE_TRMM for all ZARCH variants to fix TRMM faults with zarch-generic
...
fixes #1743
2018-08-28 21:34:07 +02:00
Martin Kroeker
9e917b16db
Fix missing replacements of ILAENV by ILAENV_2STAGE (lapack PR 272)
...
This could cause spurious "parameter has an illegal value" errors in DSYEVR and related routines, see https://github.com/Reference-LAPACK/lapack/issues/262
2018-08-28 21:11:54 +02:00
Martin Kroeker
8440a4cb1a
Merge pull request #1742 from martin-frbg/interim033
...
Add combination of old and new thread memory code selectable by new option USE_TLS
2018-08-28 08:02:15 +02:00
Martin Kroeker
b55690a659
typo fix
2018-08-26 11:31:07 +02:00
Martin Kroeker
b902a40986
Rewrite glibc version check
2018-08-26 11:18:02 +02:00
Martin Kroeker
5991d1a6cd
Update memory.c
2018-08-25 22:12:40 +02:00
Martin Kroeker
b1b743f434
Merge branch 'develop' into interim033
2018-08-25 19:45:19 +02:00
Martin Kroeker
2caa2210bb
Add USE_TLS option to choose between old and new implementation of memory.c
2018-08-25 19:37:11 +02:00
Martin Kroeker
2a589c4b28
Add USE_TLS option to switch between old and new memory.c
2018-08-25 19:36:12 +02:00
Martin Kroeker
fd42ca462d
Combo of default pre-0.3.1 memory.c and band-aided version of PR1739
2018-08-25 19:35:16 +02:00
Martin Kroeker
52d3f7af50
Merge pull request #1738 from sharkcz/s390x
...
detect z14 arch on s390x
2018-08-16 09:46:34 +02:00
Dan Horák
5c6e020f49
detect z14 arch on s390x
2018-08-14 12:30:38 +02:00
maamountki
e6c0e39492
Optimize Zgemv
2018-08-13 12:23:40 +03:00
Martin Kroeker
d4d3113adc
Merge pull request #1731 from fenrus75/readme
...
add short blurb about avx512 and needed compiler to README
2018-08-13 00:01:37 +02:00
Martin Kroeker
375dff54fc
Merge pull request #1733 from fenrus75/dsymv
...
Add an AVX512 enabled DSYMV (L) function
2018-08-12 18:18:36 +02:00
Martin Kroeker
a5f165275a
Merge pull request #1732 from fenrus75/dgemv
...
Add an AVX512 enabled DGEMV (n) function
2018-08-12 18:17:42 +02:00
Martin Kroeker
8c13aa495a
Merge pull request #1730 from fenrus75/fix-sdot
...
Fix typo in sdot function
2018-08-12 18:17:01 +02:00
Martin Kroeker
1ee6d087c3
Merge pull request #1729 from fenrus75/dscal
...
Add an AVX512 enabled DSCAL function
2018-08-12 18:16:45 +02:00
Martin Kroeker
a95a784ab2
Merge pull request #1723 from maamountki/develop
...
Disable zgemv scale in gemv benchmark by default
2018-08-11 21:08:45 +02:00
Arjan van de Ven
9bec34cb67
Add an AVX512 enabled DSYMV (L) function
...
written in C intrinsics for best readability.
(the same C code works for Haswell as well)
For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-11 17:46:24 +00:00
Arjan van de Ven
87bebdbd8a
Add an AVX512 enabled DGEMV (n) function
...
written in C intrinsics for best readability.
(the same C code works for Haswell as well)
For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-11 17:38:12 +00:00
Arjan van de Ven
9493f26309
add short blurb about avx512 and needed compiler to README
2018-08-11 17:21:46 +00:00
Arjan van de Ven
36add7570a
Fix typo in sdot function
...
it looks like my previous pull request was short the final commit;
fix a typo in sdot
2018-08-11 17:16:45 +00:00
Arjan van de Ven
cacacc8007
Add an AVX512 enabled DSCAL function
...
written in C intrinsics for best readability.
(the same C code works for Haswell as well)
For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-11 17:14:57 +00:00
Martin Kroeker
1a00ef3d27
Merge pull request #1725 from fenrus75/axpy
...
Add AVX512 enabled SAXPY/DAXPY functions
2018-08-11 11:01:20 +02:00
Martin Kroeker
4c0d832ec3
Merge pull request #1724 from fenrus75/sdot
...
Add an AVX512 enabled SDOT function
2018-08-11 11:00:56 +02:00
Martin Kroeker
fc33cbc7bb
Merge pull request #1728 from martin-frbg/changelog
...
Add changes from the 0.3.x releases
2018-08-10 13:24:36 +02:00
Martin Kroeker
c52a831ae4
Add changes from the 0.3.x releases
...
fixes #1727
2018-08-10 13:23:47 +02:00
Arjan van de Ven
2e99873ff7
Add AVX512 enabled SAXPY/DAXPY functions
...
written in C intrinsics for best readability.
(the same C code works for Haswell as well)
For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-10 02:58:32 +00:00
Arjan van de Ven
00abaa865b
Add an AVX512 enabled SDOT function
...
written in C intrinsics for best readability.
(the same C code works for Haswell as well)
For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-10 02:33:43 +00:00
maamountki
33043f563f
Disable scal to benchmark zgemv separately by default
2018-08-10 01:54:18 +03:00
Martin Kroeker
66da7677bd
Merge pull request #1721 from fenrus75/ddot2
...
Add an AVX512 enabled DDOT function
2018-08-09 15:39:06 +02:00
Arjan van de Ven
7932ff3ea9
Add an AVX512 enabled DDOT function
...
written in C intrinsics for best readability.
(the same C code works for Haswell as well)
For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-09 03:55:52 +00:00
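Note on the AVX512 commits above: they describe kernels written in C intrinsics that fall back to an existing implementation when the compiler is not new enough. The pattern can be sketched roughly as follows. This is not the code from those commits: sketch_ddot and SKETCH_USE_AVX512 are hypothetical names, the compiler version threshold is a placeholder, and a plain scalar loop stands in for the existing Haswell AVX2 kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Enable the intrinsics path only when the target supports AVX512F and the
 * compiler looks recent enough. The version number below is a placeholder,
 * not the threshold used in the actual commits. */
#if defined(__AVX512F__) && (defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 7))
#include <immintrin.h>
#define SKETCH_USE_AVX512 1
#else
#define SKETCH_USE_AVX512 0
#endif

/* Hypothetical dot product of two contiguous double vectors of length n. */
static double sketch_ddot(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    size_t i = 0;

    assert(n == 0 || (x != NULL && y != NULL));

#if SKETCH_USE_AVX512
    /* Process 8 doubles per iteration with fused multiply-add. */
    __m512d acc = _mm512_setzero_pd();
    for (; i + 8 <= n; i += 8) {
        acc = _mm512_fmadd_pd(_mm512_loadu_pd(x + i), _mm512_loadu_pd(y + i), acc);
    }
    /* Horizontal sum of the 8 accumulator lanes. */
    double lanes[8];
    _mm512_storeu_pd(lanes, acc);
    for (int k = 0; k < 8; k++) {
        sum += lanes[k];
    }
#endif

    /* Fallback path, also used for the remainder of the vector path. */
    for (; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

The compile-time guard means the same source file builds on older compilers or non-AVX512 targets, where only the scalar loop is compiled.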