Ashwin Sekhar T K
21f46a1cf2
ARM64: Use THUNDERX2T99 Neon Kernels for ARMV8
...
Currently the generic ARMV8 target uses C implementations
for many routines. Replace these with the NEON implementations
written for the THUNDERX2T99 target, which are up to 6x faster for
certain routines.
2018-10-17 10:44:37 -07:00
Ashwin Sekhar T K
caf339412f
ARM64: Remove dependency of THUNDERX2T99 Makefile on CORTEXA57 Makefile
2018-10-17 08:02:40 -07:00
Ashwin Sekhar T K
8001fdcd2a
ARM64: Remove dependency of THUNDERX Makefile on ARMV8 Makefile
2018-10-17 08:02:16 -07:00
Ashwin Sekhar T K
162e312832
ARM64: Remove dependency of CORTEXA57 Makefile on ARMV8 Makefile
2018-10-17 08:01:45 -07:00
Ashwin Sekhar T K
c3d93caa8d
ARM64: Remove dependency of XGENE1 Makefile on ARMV8 Makefile
2018-10-17 08:01:27 -07:00
Martin Kroeker
a71923514f
Merge pull request #1815 from fenrus75/sgemm_beta_fix
...
enable the SGEMM/SKX C based kernel
2018-10-14 19:57:34 +02:00
Arjan van de Ven
55b244ca0d
enable the SGEMM/SKX C based kernel
...
The final bug was found in QA, so the skylakex SGEMM C-based kernel can
now be activated.
2018-10-12 09:30:35 +00:00
Martin Kroeker
2263d3906c
Merge pull request #1812 from martin-frbg/issue1806-2
...
Use KERNEL_DEFINITIONS rather than COMMON_OPTS to pass -march=skylake…
2018-10-11 21:51:31 +02:00
Martin Kroeker
81c9985c3a
Use KERNEL_DEFINITIONS rather than COMMON_OPTS to pass -march=skylake-avx512
2018-10-11 11:03:27 +02:00
Martin Kroeker
56ebc7b53e
Merge pull request #1808 from martin-frbg/issue1806
...
Add -march=skylake-avx512 to CFLAGS when the target is Skylake
2018-10-11 07:48:08 +02:00
Martin Kroeker
c5f88f5a57
Merge pull request #1807 from xianyi/revert-1798-cmake-avx512
...
Revert "Add -march=skylake-avx512 when required"
2018-10-11 07:47:53 +02:00
Martin Kroeker
8a11ec19d1
Syntax fix
2018-10-10 23:47:35 +02:00
Martin Kroeker
fa53b903db
Add -march=skylake-avx512 to CFLAGS when the target is Skylake
...
Should fix #1806 and #1801
2018-10-10 19:22:01 +02:00
Martin Kroeker
84bcdf9c66
Revert "Add -march=skylake-avx512 when required"
2018-10-10 19:15:32 +02:00
Martin Kroeker
8f7e986184
Merge pull request #1802 from martin-frbg/issue1801
...
Use avx512 workaround with msys2/mingw64 as well
2018-10-10 08:52:53 +02:00
Martin Kroeker
d0e83666ad
Merge pull request #1804 from fenrus75/sgemm
...
Add a C+intrinsics version of the SGEMM/skylakex kernel
2018-10-10 08:50:44 +02:00
Arjan van de Ven
d4bad73834
Add a C+intrinsics version of the SGEMM/skylakex kernel
...
for most sizes this is 1.2x to 1.4x faster than the current code
2018-10-10 01:49:22 +00:00
Martin Kroeker
065763adde
Merge pull request #1800 from fengrl/patch-1
...
Update common_mips64.h for the 1st loop of blas_memory_alloc
2018-10-09 10:56:37 +02:00
Martin Kroeker
210b03b543
Merge pull request #1792 from martin-frbg/cmakesuffix
...
Improve CMake help output and add SYMBOLPREFIX and -SUFFIX options
2018-10-09 10:34:52 +02:00
Martin Kroeker
6234a32656
Use cygwin compilation workaround for avx512 on msys2/mingw64 as well
2018-10-09 10:31:59 +02:00
Martin Kroeker
c0d7cd3dac
Merge pull request #1799 from martin-frbg/issue1796
...
Handle conflicting usage of ARCH in at least some BSD environments
2018-10-09 08:20:52 +02:00
Martin Kroeker
667f0cc1cb
Merge pull request #1793 from fenrus75/ncopy
...
Add optimized *copy versions for skylakex
2018-10-09 08:19:14 +02:00
fengrl
d4c8853a02
Update common_mips64.h
2018-10-09 11:20:16 +08:00
Martin Kroeker
d3d58f8ee5
Catch conflicting usage of ARCH in at least some BSD environments
...
fixes #1796
2018-10-08 22:29:35 +02:00
Martin Kroeker
697dc1baf8
Use override for ARCH in make.inc
...
in case a conflicting setting of ARCH (for architecture) gets pulled in from the environment
(originally suggested by dloghin in #1753)
2018-10-08 22:26:59 +02:00
Martin Kroeker
a9b51b8448
Merge pull request #1798 from martin-frbg/cmake-avx512
...
Add -march=skylake-avx512 when required
2018-10-08 21:15:17 +02:00
Martin Kroeker
eba394c711
Add -march=skylake-avx512 when required
...
fixes #1797
2018-10-08 19:18:12 +02:00
Arjan van de Ven
582c589727
dgemm/skylakex: replace discrete mul/add with fma
...
Very minor gains since this is not particularly hot code, but it follows general principles.
2018-10-06 23:13:26 +00:00
Arjan van de Ven
adbf6afa25
Add vector optimizations for ncopy as well for dgemm/skylakex
2018-10-06 21:18:12 +00:00
Arjan van de Ven
32bec8afbb
add a skylakex optimized dgemm beta function
2018-10-06 16:36:26 +00:00
Martin Kroeker
6e2c494556
Merge pull request #1791 from dev-zero/develop
...
fix parallel build issues with APFS/HFS+/ext2/3 in netlib-lapack
2018-10-06 16:29:29 +02:00
Arjan van de Ven
20c5d668fe
dgemm/avx512 simplify and speed up the 4x4 kernel
2018-10-06 14:12:32 +00:00
Arjan van de Ven
6d43c51ccf
undo slow dgemm/skylake microoptimization
...
the compare is more costly than the work
2018-10-06 14:00:37 +00:00
Arjan van de Ven
d74dc39b0f
Add optimized *copy versions for skylakex
...
Add optimized n/t copy versions for skylakex. In this patch the
tcopy is also rewritten using intrinsics. The ncopy file
will be worked on in a future commit.
2018-10-06 13:51:44 +00:00
Martin Kroeker
41951da6d4
Merge pull request #6 from xianyi/develop
...
merge develop
2018-10-06 14:36:36 +02:00
Martin Kroeker
474f7e9583
Add SYMBOLPREFIX and -SUFFIX options and improve help output
2018-10-06 14:28:04 +02:00
Tiziano Müller
79ea839b63
fix parallel build issues with APFS/HFS+/ext2/3 in netlib-lapack
...
The problem is that OpenBLAS sets the LAPACKE_LIB and the TMGLIB to the
same object and uses the `ar` feature to update the archive file. If the
underlying filesystem does not have sub-second timestamp resolution and
the system is fast enough (or `ccache` is used), the timestamp of the
builds that should be added to the previously generated archive is the
same as that of the archive file itself, and therefore `make` does not
update the archive.
Since OpenBLAS takes care not to run the different targets updating the
archive in parallel, the easiest solution is to declare the respective
targets `.PHONY`, forcing `make` to always update them.
fixes #1682
2018-10-06 14:10:05 +02:00
Martin Kroeker
f7f97c6148
Merge pull request #1789 from brada4/develop
...
update travis alpine chroot with avx512 intrinsics headers
2018-10-05 20:42:37 +02:00
Martin Kroeker
6f22e1cfb8
Merge pull request #1788 from fenrus75/avx512-8x16
...
skylake dgemm: Add a 16x8 kernel
2018-10-05 20:40:38 +02:00
Arjan van de Ven
66b43affbc
Add a 24x8 kernel to the skylakex dgemm implementation
...
Minor gains for small matrices, but at 512x512 and above the gain
becomes more significant.
2018-10-05 13:22:21 +00:00
Arjan van de Ven
1938819c25
skylake dgemm: Add a 16x8 kernel
...
The next step for the avx512 dgemm code is adding a 16x8 kernel.
In the 8x8 kernel, each FMA has a matching load (the broadcast);
in the 16x8 kernel we can reuse this load for 2 FMAs, which
in turn reduces pressure on the load ports of the CPU and gives
a nice performance boost (in the 25% range).
2018-10-05 13:11:35 +00:00
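The load reuse described in commit 1938819c25 can be modeled in plain scalar C. This is only a rough sketch of the idea, not the kernel from the commit: the real code uses AVX512 intrinsics, and the function name `step_16x8`, the array layout, and the `VLEN` constant here are assumptions made for illustration. Each inner-loop pair stands in for two vector FMAs that share one broadcast value of B.

```c
#define VLEN 8  /* doubles per 512-bit vector (assumed register width) */

/* Hypothetical scalar model of one k-step of a 16x8 micro-kernel:
   c[j][0..15] += a[0..15] * b[j] for each of the 8 columns j.
   The value bj plays the role of the broadcast load from B. */
static void step_16x8(const double a[16], const double b[8], double c[8][16]) {
    for (int j = 0; j < 8; j++) {
        double bj = b[j];  /* one "broadcast" load per column */
        for (int i = 0; i < VLEN; i++) {
            c[j][i]        += a[i]        * bj;  /* first FMA: lower 8 rows */
            c[j][i + VLEN] += a[i + VLEN] * bj;  /* second FMA reuses bj */
        }
    }
}
```

In an 8x8 variant of this model only the first of the two updates would exist, so every FMA would need its own load of `bj`. Sharing the load across two FMAs halves the number of B loads per FMA.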
Andrew
bda3dbe2eb
update travis alpine chroot with avx512 intrinsics headers
2018-10-05 15:47:55 +03:00
Andrew
c3e0f0eb38
update travis alpine chroot with avx512 intrinsics headers
2018-10-05 15:41:52 +03:00
Martin Kroeker
a980953bd7
Merge pull request #1785 from brada4/develop
...
address #1782 2nd loop
2018-10-05 08:25:38 +02:00
Martin Kroeker
78c99d5231
Merge pull request #1784 from fenrus75/dgemm-avx512
...
Create an AVX512-enabled version of DGEMM
2018-10-05 08:03:27 +02:00
Martin Kroeker
b7496c3638
Function name needs to be CNAME, set from outside to allow suffixing for dynamic_arch
2018-10-04 19:14:59 +02:00
Martin Kroeker
95f4e87579
Merge pull request #1787 from jeromerobert/develop
...
Fix unknown type name __WAIT_STATUS on RHEL5
2018-10-04 18:41:47 +02:00
Jerome Robert
b095f2fad6
Fix unknown type name __WAIT_STATUS on RHEL5
...
With glibc 2.5, _XOPEN_SOURCE must be defined as >= 500 to use wait.
But judging from the glibc code, this is actually needed only if stdlib.h was
included before sys/wait.h. This was the case here through
openblas_utest.h. So changing the includes fixes compilation on RHEL5 and
should not hurt on more recent distros.
* Problem found when building with gcc 5.5 and 4.7.2 on RHEL5/CentOS5
* Fixes #1519
2018-10-04 14:37:08 +02:00
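The include-order observation in commit b095f2fad6 can be illustrated with a small POSIX sketch in which sys/wait.h comes before stdlib.h. This is an assumption-laden example, not the change made in the commit: the helper `child_exit_status` is hypothetical, and whether this ordering reproduces or avoids the RHEL5 error has not been verified here.

```c
/* sys/wait.h is included first, before anything that pulls in stdlib.h. */
#include <sys/wait.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, then return
   the exit status seen by the parent, or -1 on any failure. */
static int child_exit_status(int code) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* fork failed */
    if (pid == 0)
        _exit(code);            /* child: exit immediately */

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;              /* wait failed */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `child_exit_status(3)` should yield 3 on a POSIX system where `fork` succeeds.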
Martin Kroeker
02ef20a1e4
Merge pull request #1786 from martin-frbg/immintrin
...
Check for immintrin.h presence in the AVX512 compatibility test as well
2018-10-04 09:07:09 +02:00
Martin Kroeker
4c3643ed7f
Check availability of immintrin.h in the AVX512 compatibility test
2018-10-04 07:36:49 +02:00