Compare commits

..

237 Commits

Author SHA1 Message Date
Martin Kroeker
84bcdf9c66 Revert "Add -march=skylake-avx512 when required" 2018-10-10 19:15:32 +02:00
Martin Kroeker
8f7e986184 Merge pull request #1802 from martin-frbg/issue1801
Use avx512 workaround with msys2/mingw64 as well
2018-10-10 08:52:53 +02:00
Martin Kroeker
d0e83666ad Merge pull request #1804 from fenrus75/sgemm
Add a C+intrinsics version of the SGEMM/skylakex kernel
2018-10-10 08:50:44 +02:00
Arjan van de Ven
d4bad73834 Add a C+intrinsics version of the SGEMM/skylakex kernel
for most sizes this is 1.2x to 1.4x faster than the current code
2018-10-10 01:49:22 +00:00
Martin Kroeker
065763adde Merge pull request #1800 from fengrl/patch-1
Update common_mips64.h for the 1st loop of blas_memory_alloc
2018-10-09 10:56:37 +02:00
Martin Kroeker
210b03b543 Merge pull request #1792 from martin-frbg/cmakesuffix
Improve CMake help output and add SYMBOLPREFIX and -SUFFIX options
2018-10-09 10:34:52 +02:00
Martin Kroeker
6234a32656 Use cygwin compilation workaround for avx512 on msys2/mingw64 as well 2018-10-09 10:31:59 +02:00
Martin Kroeker
c0d7cd3dac Merge pull request #1799 from martin-frbg/issue1796
Handle conflicting usage of ARCH in at least some BSD environments
2018-10-09 08:20:52 +02:00
Martin Kroeker
667f0cc1cb Merge pull request #1793 from fenrus75/ncopy
Add optimized *copy versions for skylakex
2018-10-09 08:19:14 +02:00
fengrl
d4c8853a02 Update common_mips64.h 2018-10-09 11:20:16 +08:00
Martin Kroeker
d3d58f8ee5 Catch conflicting usage of ARCH in at least some BSD environments
fixes #1796
2018-10-08 22:29:35 +02:00
Martin Kroeker
697dc1baf8 Use override for ARCH in make.inc
in case a conflicting setting of ARCH (for architecture) gets pulled in from the environment
(originally suggested by dloghin in #1753)
2018-10-08 22:26:59 +02:00
Martin Kroeker
a9b51b8448 Merge pull request #1798 from martin-frbg/cmake-avx512
Add -march=skylake-avx512 when required
2018-10-08 21:15:17 +02:00
Martin Kroeker
eba394c711 Add -march=skylake-avx512 when required
fixes #1797
2018-10-08 19:18:12 +02:00
Arjan van de Ven
582c589727 dgemm/skylakex: replace discrete mul/add with fma
very minor gains since it's not super hot code, but general principles
2018-10-06 23:13:26 +00:00
Arjan van de Ven
adbf6afa25 Add vector optimizations for ncopy as well for dgemm/skylakex 2018-10-06 21:18:12 +00:00
Arjan van de Ven
32bec8afbb add a skylakex optimized dgemm beta function 2018-10-06 16:36:26 +00:00
Martin Kroeker
6e2c494556 Merge pull request #1791 from dev-zero/develop
fix parallel build issues with APFS/HFS+/ext2/3 in netlib-lapack
2018-10-06 16:29:29 +02:00
Arjan van de Ven
20c5d668fe dgemm/avx512 simplify and speed up the 4x4 kernel 2018-10-06 14:12:32 +00:00
Arjan van de Ven
6d43c51ccf undo slow dgemm/skylake microoptimization
the compare is more costly than the work
2018-10-06 14:00:37 +00:00
Arjan van de Ven
d74dc39b0f Add optimized *copy versions for skylakex
Add optimized n/t copy versions for skylakex; in the patch the
tcopy is also rewritten using intrinsics; the ncopy file
will be worked on in a future commit
2018-10-06 13:51:44 +00:00
Martin Kroeker
41951da6d4 Merge pull request #6 from xianyi/develop
merge develop
2018-10-06 14:36:36 +02:00
Martin Kroeker
474f7e9583 Add SYMBOLPREFIX and -SUFFIX options and improve help output 2018-10-06 14:28:04 +02:00
Tiziano Müller
79ea839b63 fix parallel build issues with APFS/HFS+/ext2/3 in netlib-lapack
The problem is that OpenBLAS sets the LAPACKE_LIB and the TMGLIB to the
same object and uses the `ar` feature to update the archive file. If the
underlying filesystem does not have sub-second timestamp resolution and
the system is fast enough (or `ccache` is used), the timestamp of the
builds which should be added to the previously generated archive is the
same as the archive file itself and therefore `make` does not update the
archive.

Since OpenBLAS takes care to not run the different targets updating the
archive in parallel, the easiest solution is to declare the respective
targets `.PHONY`, forcing `make` to always update them.

fixes #1682
2018-10-06 14:10:05 +02:00
Martin Kroeker
f7f97c6148 Merge pull request #1789 from brada4/develop
update travis alpine chroot with avx512 intrinsics headers
2018-10-05 20:42:37 +02:00
Martin Kroeker
6f22e1cfb8 Merge pull request #1788 from fenrus75/avx512-8x16
skylake dgemm: Add a 16x8 kernel
2018-10-05 20:40:38 +02:00
Arjan van de Ven
66b43affbc Add a 24x8 kernel to the skylakex dgemm implementation
Minor gains for small matrixes, but at 512x512 and above the gain
gets more significant.
2018-10-05 13:22:21 +00:00
Arjan van de Ven
1938819c25 skylake dgemm: Add a 16x8 kernel
The next step for the avx512 dgemm code is adding a 16x8 kernel.
In the 8x8 kernel, each FMA has a matching load (the broadcast);
in the 16x8 kernel we can reuse this load for 2 FMAs, which
in turn reduces pressure on the load ports of the CPU and gives
a nice performance boost (in the 25% range).
2018-10-05 13:11:35 +00:00
Andrew
bda3dbe2eb update travis alpine chroot with avx512 intrinsics headers 2018-10-05 15:47:55 +03:00
Andrew
c3e0f0eb38 update travis alpine chroot with avx512 intrinsics headers 2018-10-05 15:41:52 +03:00
Martin Kroeker
a980953bd7 Merge pull request #1785 from brada4/develop
address #1782 2nd loop
2018-10-05 08:25:38 +02:00
Martin Kroeker
78c99d5231 Merge pull request #1784 from fenrus75/dgemm-avx512
Create a AVX512 enabled version of DGEMM
2018-10-05 08:03:27 +02:00
Martin Kroeker
b7496c3638 Function name needs to be CNAME, set from outside to allow suffixing for dynamic_arch 2018-10-04 19:14:59 +02:00
Martin Kroeker
95f4e87579 Merge pull request #1787 from jeromerobert/develop
Fix unknown type name __WAIT_STATUS on RHEL5
2018-10-04 18:41:47 +02:00
Jerome Robert
b095f2fad6 Fix unknown type name __WAIT_STATUS on RHEL5
With glibc 2.5 one must have #define _XOPEN_SOURCE >= 500 to use wait.
But reading glibc code this is actually needed only if stdlib.h was
included before sys/wait.h. This was the case here through
openblas_utest.h. So changing include fix compilation on RHEL5 and
should ne hurt with more recent distro.

* Problem found when using with gcc 5.5 and 4.7.2 on RHEL5/CENTOS5
* Fix #1519
2018-10-04 14:37:08 +02:00
Martin Kroeker
02ef20a1e4 Merge pull request #1786 from martin-frbg/immintrin
Check for Immintrin.h presence in the AVX512 compatibility test as well
2018-10-04 09:07:09 +02:00
Martin Kroeker
4c3643ed7f Check availability of immintrin.h in the AVX512 compatibility test 2018-10-04 07:36:49 +02:00
Martin Kroeker
591cca7cb0 Check availability of immintrin.h in the AVX512 compatibility test 2018-10-04 07:35:30 +02:00
Andrew
3439158dea address #1782 2nd loop 2018-10-03 21:20:50 +02:00
Arjan van de Ven
45fe8cb0c5 Create a AVX512 enabled version of DGEMM
This patch adds dgemm_kernel_4x8_skylakex.c which is
* dgemm_kernel_4x8_haswell.s converted to C + intrinsics
* 8x8 support added
* 8x8 kernel implemented using AVX512

Performance is a work in progress, but already shows a 10% - 20%
increase for a wide range of matrix sizes.
2018-10-03 14:45:25 +00:00
Martin Kroeker
544b069e85 Merge pull request #1780 from martin-frbg/issue1774-2
Convert fldmia/fstmia instructions to UAL syntax for clang7
2018-09-29 09:27:47 +02:00
Martin Kroeker
9b2a7ad40d Convert fldmia/fstmia instructions to UAL syntax for clang7
second part of fix for #1774, containing files missed in #1775
2018-09-28 23:05:15 +02:00
Martin Kroeker
10ce70701a Merge pull request #1778 from fengrl/develop
test_axpy work error on LOONGSON3A platform #1777
2018-09-26 11:14:58 +02:00
fengruilin
6fc85a6359 test_axpy work error on LOONGSON3A platform #1777 2018-09-26 15:14:04 +08:00
Martin Kroeker
831c661386 Merge pull request #1775 from martin-frbg/issue1774
Convert fldmia/fstmia instructions to UAL syntax for clang7
2018-09-25 18:58:39 +02:00
Martin Kroeker
7e5df34e6a Convert fldmia/fstmia instructions to UAL syntax for clang7
fixes #1774
2018-09-25 09:41:58 +02:00
Martin Kroeker
4f45040b89 Merge pull request #1773 from martin-frbg/issue1767
Include thread numbers in failure message from blas_thread_init
2018-09-23 23:25:15 +02:00
Martin Kroeker
28aa94bf4b Include thread numbers in failure message from blas_thread_init
to aid in debugging cases like #1767
2018-09-22 14:00:15 +02:00
Martin Kroeker
56e7c68810 Merge pull request #1771 from staticfloat/sf/ldflags
Add `$(LDFLAGS)` to `$(CC)` and `$(FC)` invocations within `exports/Makefile`
2018-09-22 13:11:39 +02:00
Martin Kroeker
cf6df9464c Document the stub status of the QUAD_PRECiSION code (#1772)
* Document the stub status of the QUAD_PRECiSION code inherited from GotoBLAS2

in response to #1769
2018-09-22 12:31:37 +02:00
Elliot Saba
6f77af2eef Add $(LDFLAGS) to $(CC) and $(FC) invocations within exports/Makefile 2018-09-21 09:19:51 +00:00
Martin Kroeker
4d183e5567 Merge pull request #1765 from martin-frbg/issue1761
Do not use the new TLS-enabled memory allocator for non-threaded builds, and disable TLS by default in gmake as well
2018-09-19 22:02:21 +02:00
Martin Kroeker
34d55fd165 Merge pull request #1764 from yurivict/64-suffix
Allow to install the 'interface64' version concurrently with the regular version
2018-09-19 18:16:38 +02:00
Martin Kroeker
b991570210 Merge pull request #1762 from martin-frbg/issue1710-2
Add explicit casts to silence compiler warnings
2018-09-19 18:16:21 +02:00
Martin Kroeker
288aeea8a2 Fix default settings - USE_TLS and USE_SIMPLE_THREADED_LEVEL3 should both be off 2018-09-19 18:08:31 +02:00
Martin Kroeker
1ad1e79062 Catch inadvertent USE_TLS=0 declaration
for #1766
2018-09-19 18:03:43 +02:00
Martin Kroeker
b402626509 Do not use the new TLS code for non-threaded builds even if USE_TLS is set
Workaround for #1761 as that exposed a problem in the new code (which was intended to speed up multithreaded code only anyway).
2018-09-16 12:43:36 +02:00
Martin Kroeker
ec0cac1669 Merge pull request #4 from xianyi/develop
Update branch
2018-09-16 12:36:49 +02:00
Yuri
2349e15149 Allow to install the 'interfare64' version concurrently with the regular version 2018-09-15 21:00:03 -07:00
Martin Kroeker
f3c262156e Add an explicit cast to silence a warning
for #1710
2018-09-13 14:24:29 +02:00
Martin Kroeker
30f5a69ab8 Add explicit cast to silence a warning
for #1710
2018-09-13 14:23:31 +02:00
Martin Kroeker
fd081a91e4 Merge pull request #1759 from martin-frbg/lapack283
Remove an unused variable from several LAPACKE 2stage_work functions
2018-09-11 13:52:09 +02:00
Martin Kroeker
094f8c3b57 remove unused variable ldb_t
Copied from Reference-LAPACK PR283
2018-09-11 10:53:47 +02:00
Martin Kroeker
5cf090f516 remove unused variable ldb_t
Copied from Reference-LAPACK PR283
2018-09-11 10:52:30 +02:00
Martin Kroeker
58363542e7 remove unused variable ldb_t
Copied from Reference-LAPACK PR283
2018-09-11 10:51:17 +02:00
Martin Kroeker
3abc22a5bf Merge pull request #1757 from brada4/develop
fix small typo in strmm_ LN
2018-09-09 22:55:15 +02:00
Andrew
1e531701b7 fix small typo 2018-09-09 16:52:25 +02:00
Martin Kroeker
5d42b6ea04 Merge pull request #1756 from martin-frbg/issue1754
Follow netlib renaming/aliasing CBLAS_ORDER to CBLAS_LAYOUT
2018-09-07 11:02:18 +02:00
Martin Kroeker
ba4f433321 Merge pull request #1749 from martin-frbg/issue1531
Fix ARMV8 cross-compilation for IOS
2018-09-07 11:02:01 +02:00
Martin Kroeker
4cf7315a5d Adjust ARMV8 SGEMM unrolling when using the C fallback kernel_2x2 for IOS 2018-09-06 21:41:54 +02:00
Martin Kroeker
b57af93792 just make CBLAS_LAYOUT an alias of the existing CBLAS_ORDER
to avoid having to change all instances of enum CBLAS_ORDER in this file
2018-09-06 16:54:31 +02:00
Martin Kroeker
8aeab0601e Follow netlib renaming/aliasing CBLAS_ORDER to CBLAS_LAYOUT
fixes #1754
2018-09-06 16:39:52 +02:00
Martin Kroeker
1cb7b9015e Conditional compilation of assembly files that IOS does not like 2018-09-04 11:06:51 +02:00
Martin Kroeker
a4bd41e9f2 Fix paths to C kernels for nrm2 2018-09-04 10:51:19 +02:00
Martin Kroeker
9e2bb0c641 Update with the changes from 0.3.3 2018-08-31 00:21:13 +02:00
Martin Kroeker
dbfd7524cd Update version to 0.3.4.dev 2018-08-31 00:19:21 +02:00
Martin Kroeker
2982ce505d Update version to 0.3.4.dev 2018-08-31 00:18:37 +02:00
Martin Kroeker
5bac15adbd Merge pull request #1746 from martin-frbg/issue1674
Assume cross-compilation if host and target os differ
2018-08-30 17:48:07 +02:00
Martin Kroeker
e17f969fa0 Assume cross-compilation if host and target os differ
fixes 1674
2018-08-30 13:28:46 +02:00
Martin Kroeker
e11126b26a Merge pull request #1745 from martin-frbg/issue1743
Set USE_TRMM for all ZARCH variants to fix TRMM faults with zarch-gen…
2018-08-29 07:43:58 +02:00
Martin Kroeker
74608e470d Merge pull request #1744 from martin-frbg/lapack272
Fix missing replacements of ILAENV by ILAENV_2STAGE (lapack PR 272)
2018-08-28 22:58:58 +02:00
Martin Kroeker
f3fd44a731 Set USE_TRMM for all ZARCH variants to fix TRMM faults with zarch-generic
fixes #1743
2018-08-28 21:34:07 +02:00
Martin Kroeker
9e917b16db Fix missing replacements of ILAENV by ILAENV_2STAGE (lapack PR 272)
This could cause spurious "parameter has an illegal value" errors in DSYEVR and related routines, see https://github.com/Reference-LAPACK/lapack/issues/262
2018-08-28 21:11:54 +02:00
Martin Kroeker
8440a4cb1a Merge pull request #1742 from martin-frbg/interim033
Add combination of old and new thread memory code selectable by new option USE_TLS
2018-08-28 08:02:15 +02:00
Martin Kroeker
b55690a659 typo fix 2018-08-26 11:31:07 +02:00
Martin Kroeker
b902a40986 Rewrite glibc version check 2018-08-26 11:18:02 +02:00
Martin Kroeker
5991d1a6cd Update memory.c 2018-08-25 22:12:40 +02:00
Martin Kroeker
b1b743f434 Merge branch 'develop' into interim033 2018-08-25 19:45:19 +02:00
Martin Kroeker
2caa2210bb Add USE_TLS option to choose between old and new implementation of memory.c 2018-08-25 19:37:11 +02:00
Martin Kroeker
2a589c4b28 Add USE_TLS option to switch between old and new memory.c 2018-08-25 19:36:12 +02:00
Martin Kroeker
fd42ca462d Combo of default pre-0.3.1 memory.c and band-aided version of PR1739 2018-08-25 19:35:16 +02:00
Martin Kroeker
52d3f7af50 Merge pull request #1738 from sharkcz/s390x
detect z14 arch on s390x
2018-08-16 09:46:34 +02:00
Dan Horák
5c6e020f49 detect z14 arch on s390x 2018-08-14 12:30:38 +02:00
Martin Kroeker
d4d3113adc Merge pull request #1731 from fenrus75/readme
add short blurb about avx512 and needed compiler to README
2018-08-13 00:01:37 +02:00
Martin Kroeker
375dff54fc Merge pull request #1733 from fenrus75/dsymv
Add an AVX512 enabled DSYMV (L) function
2018-08-12 18:18:36 +02:00
Martin Kroeker
a5f165275a Merge pull request #1732 from fenrus75/dgemv
Add an AVX512 enabled DGEMV (n)  function
2018-08-12 18:17:42 +02:00
Martin Kroeker
8c13aa495a Merge pull request #1730 from fenrus75/fix-sdot
Fix typo in sdot function
2018-08-12 18:17:01 +02:00
Martin Kroeker
1ee6d087c3 Merge pull request #1729 from fenrus75/dscal
Add an AVX512 enabled DSCAL function
2018-08-12 18:16:45 +02:00
Martin Kroeker
a95a784ab2 Merge pull request #1723 from maamountki/develop
Disable zgemv scale in gemv benchmark by default
2018-08-11 21:08:45 +02:00
Arjan van de Ven
9bec34cb67 Add an AVX512 enabled DSYMV (L) function
written in C intrinsics for best readability.
(the same C code works for Haswell as well)

For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-11 17:46:24 +00:00
Arjan van de Ven
87bebdbd8a Add an AVX512 enabled DGEMV (n) function
written in C intrinsics for best readability.
(the same C code works for Haswell as well)

For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-11 17:38:12 +00:00
Arjan van de Ven
9493f26309 add short blurb about avx512 and needed compiler to README 2018-08-11 17:21:46 +00:00
Arjan van de Ven
36add7570a Fix typo in sdot function
it looks like my previous pull request was short the final commit;
fix a typo in sdot
2018-08-11 17:16:45 +00:00
Arjan van de Ven
cacacc8007 Add an AVX512 enabled DSCAL function
written in C intrinsics for best readability.
(the same C code works for Haswell as well)

For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-11 17:14:57 +00:00
Martin Kroeker
1a00ef3d27 Merge pull request #1725 from fenrus75/axpy
Add a AVX512 enabled SAXPY/DAXPY functions
2018-08-11 11:01:20 +02:00
Martin Kroeker
4c0d832ec3 Merge pull request #1724 from fenrus75/sdot
Add an AVX512 enabled SDOT function
2018-08-11 11:00:56 +02:00
Martin Kroeker
fc33cbc7bb Merge pull request #1728 from martin-frbg/changelog
Add changes from the 0.3.x releases
2018-08-10 13:24:36 +02:00
Martin Kroeker
c52a831ae4 Add changes from the 0.3.x releases
fixes #1727
2018-08-10 13:23:47 +02:00
Arjan van de Ven
2e99873ff7 Add a AVX512 enabled SAXPY/DAXPY functions
written in C intrinsics for best readability.
(the same C code works for Haswell as well)

For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-10 02:58:32 +00:00
Arjan van de Ven
00abaa865b Add an AVX512 enabled SDOT function
written in C intrinsics for best readability.
(the same C code works for Haswell as well)

For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-10 02:33:43 +00:00
maamountki
33043f563f Disable scal to benchmark zgemv separately by default 2018-08-10 01:54:18 +03:00
Martin Kroeker
66da7677bd Merge pull request #1721 from fenrus75/ddot2
Add an AVX512 enabled DDOT function
2018-08-09 15:39:06 +02:00
Arjan van de Ven
7932ff3ea9 Add an AVX512 enabled DDOT function
written in C intrinsics for best readability.
(the same C code works for Haswell as well)

For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-09 03:55:52 +00:00
Martin Kroeker
62f4c69708 Merge pull request #1717 from martin-frbg/issue1708
Add workaround for avx512 compilations on Cygwin
2018-08-06 22:05:47 +02:00
Martin Kroeker
73478664d4 Add workaround for avx512 compilations on Cygwin
fixes #1708
2018-08-06 16:40:32 +02:00
Martin Kroeker
ee955757f9 Merge pull request #1715 from stevengj/patch-1
fix blasabs for windows
2018-08-05 22:48:44 +02:00
Steven G. Johnson
48610a4524 fix blasabs for windows
Bugfix in #1713 for Windows (LLP64), where `blasabs` needs to be `llabs` rather than `labs` for the 64-bit API.
2018-08-05 08:18:51 -04:00
Martin Kroeker
4a553e8678 Merge pull request #1713 from martin-frbg/issue1710
Introduce blasabs macro and use it to switch between abs and labs for INTERFACE64
2018-08-04 23:51:31 +02:00
Martin Kroeker
e788102c10 Merge pull request #1709 from stevengj/patch-1
fabs -> fabsl
2018-08-04 23:51:10 +02:00
Martin Kroeker
165f00c159 fabs -> fabsl 2018-08-04 20:14:51 +02:00
Martin Kroeker
40c068a875 Introduce blasabs() to switch between abs() and labs() for INTERFACE64 2018-08-04 20:07:59 +02:00
Martin Kroeker
933896a1d0 Use blasabs to switch between abs and labs as needed for INTERFACE64 2018-08-04 20:06:49 +02:00
Steven G. Johnson
a4e321400b fabs -> fabsl
Fixes two calls that were using `fabs` on a `long double` argument rather than `fabsl`, which looks like it is doing an unintentional truncation to `double` precision.
2018-08-03 13:00:10 -04:00
Martin Kroeker
9e65430504 Merge pull request #1703 from wsttiger/cmake_fix
Set EXPORT_NAME to match OpenBLASConfig.cmake
2018-08-02 23:48:42 +02:00
Martin Kroeker
2cfa86b406 Merge pull request #1707 from extrowerk/haiku_support
Haiku supporting patches
2018-08-02 22:27:00 +02:00
Scott Thornton
2a9a9389ef Added target_include_directories() 2018-08-02 14:58:52 -05:00
Zoltán Mizsei
6463bffd59 Haiku supporting patches 2018-08-02 20:49:14 +02:00
Martin Kroeker
8ef7d4fb54 Merge pull request #1706 from oon3m0oo/develop
Fix #1705 where we incorrectly calculate page locations.
2018-08-02 18:53:34 +02:00
Craig Donner
6400868e55 Fix #1705 where we incorrectly calculate page locations.
Since we now use an allocation size that isn't a multiple of PAGESIZE, finding
the pages for run_bench wasn't terminating properly.  Now we detect if we've
found enough pages for the allocation and terminate the loop.
2018-08-02 16:21:19 +01:00
Scott Thornton
8ebf541e97 Set EXPORT_NAME to match OpenBLASConfig.cmake 2018-07-30 15:18:29 -05:00
Martin Kroeker
b03ae3f4dc Set version to 0.3.3.dev 2018-07-30 08:23:13 +02:00
Martin Kroeker
2cc8fb0ad2 Set version to 0.3.3.dev 2018-07-30 08:22:38 +02:00
Martin Kroeker
64826a0d7d Merge branch 'release-0.3.0' into develop 2018-07-29 22:37:09 +02:00
Martin Kroeker
25f2d25cfe Merge pull request #1697 from martin-frbg/issue1696
Do not treat WIndows UWB builds as cross-compiling
2018-07-25 19:55:29 +02:00
Martin Kroeker
73131fa30a Do not treat WIndows UWB builds as cross-compiling 2018-07-24 17:46:33 +02:00
Martin Kroeker
66fcdd5be8 Merge pull request #1695 from martin-frbg/issue1692
Unset memory table entry, not just the local pointer to it on shutdown
2018-07-22 16:34:09 +02:00
Martin Kroeker
43ac839c16 Unset memory table entry, not just the temporary pointer to it on shutdown
to fix crash with multiple instances of OpenBLAS, #1692
2018-07-22 09:19:19 +02:00
Martin Kroeker
7ba5936ecd Merge pull request #1688 from martin-frbg/issue1673
Temporarily disable special handling of OPENMP thread memory allocation
2018-07-19 19:03:45 +02:00
Martin Kroeker
b14f44d2ad Temporarily disable special handling of OPENMP thread memory allocation
for issue #1673
2018-07-19 08:57:56 +02:00
Martin Kroeker
e71d70ba87 Merge pull request #1681 from martin-frbg/issue1671
Add cpu identification via mfpvr call for the BSDs
2018-07-16 22:47:05 +02:00
Martin Kroeker
d671870f5f Merge pull request #1684 from martin-frbg/issue1672
Work around utest failures in the MIPS64 SICORTEX target
2018-07-16 22:46:49 +02:00
Martin Kroeker
4e103c822c typo fix 2018-07-16 12:56:39 +02:00
Martin Kroeker
d2142760e0 Fix precision problem in DSDOT 2018-07-15 17:11:40 +02:00
Martin Kroeker
2fbfc64da8 Use C kernels for default c/zAXPY, xROT, c/zSWAP 2018-07-15 17:09:55 +02:00
Martin Kroeker
8d5b33b6be Add cpu identification via mfpvr call for the BSDs
fixes #1671
2018-07-12 23:39:00 +02:00
Martin Kroeker
36aea5ce2d Merge pull request #1680 from martin-frbg/snprint
Fix wrong redefinitions of snprintf for older MSVC
2018-07-12 14:05:13 +02:00
Martin Kroeker
1309711e24 Fix declaration of snprintf for older MSVC
_snprintf_s takes an additional (size) argument, so is no direct replacement.
(Note that this code is currently unused - the two instances of snprintf here are within ifdef blocks that are not compiled for MSVC)
2018-07-12 11:47:52 +02:00
Martin Kroeker
571e9de2ac Fix definition of snprintf for MSVC
MS _snprintf_s takes an additional argument for the size of the buffer, so is not a direct replacement (utest/ctest.h from which I copied was wrong)
2018-07-12 11:42:25 +02:00
Martin Kroeker
448ed15115 Merge pull request #1678 from martin-frbg/issue1677
Define snprintf for older versions of MSVC
2018-07-12 09:21:34 +02:00
Martin Kroeker
045fb5ea2c Define snprintf for older versions of MSVC
for #1677
2018-07-12 07:30:58 +02:00
Martin Kroeker
4dd70d98d7 Merge pull request #1667 from xianyi/revert-1642-develop
Revert "Rewrite &= -> = and simplify the initial blocking phase."
2018-07-04 08:27:21 +02:00
Martin Kroeker
504310eeb9 Merge pull request #1665 from martin-frbg/cpuid-ryzen2
Add cpuid for AMD Ryzen 2
2018-07-04 08:19:40 +02:00
Martin Kroeker
ea1f39518f Merge pull request #1663 from martin-frbg/issue1641
Double MAX_ALLOCATING_THREADS to fix segfaults with Go and Octave
2018-07-04 08:19:11 +02:00
Martin Kroeker
5f2a3c05cd Revert "Rewrite &= -> = and simplify the initial blocking phase." 2018-07-03 21:42:28 +02:00
Martin Kroeker
d0ec4325cf Add cpuid for AMD Ryzen 2 2018-07-03 21:03:24 +02:00
Martin Kroeker
3f73e8b8cf Add cpuid for AMD Ryzen 2
for #1664
2018-07-03 21:01:35 +02:00
Martin Kroeker
a83f01e0ee Merge pull request #1662 from martin-frbg/cmake-avx512
Add -march=skylake-avx512 to AVX512 compile check and suppress its ou…
2018-07-03 17:40:09 +02:00
Martin Kroeker
a49203b48c Double MAX_ALLOCATING_THREADS to fix segfaults with Go and Octave
for #1641
2018-07-03 17:35:54 +02:00
Martin Kroeker
b74aef2816 Add -march=skylake-avx512 to AVX512 compile check and suppress its output 2018-07-03 14:41:44 +02:00
Martin Kroeker
a9fa805007 Merge pull request #1660 from martin-frbg/issue1659
Fix typo that broke compilation with DYNAMIC_ARCH and NO_AVX2
2018-07-02 17:48:19 +02:00
Martin Kroeker
9d15a3bd16 Fix typo that broke compilation with DYNAMIC_ARCH and NO_AVX2
fixes 1659
2018-07-02 14:40:41 +02:00
Martin Kroeker
c6aec89d10 Merge pull request #1657 from martin-frbg/release-0.3.0
Release 0.3.1
2018-07-01 12:03:07 +02:00
Martin Kroeker
bbf2124970 set version number to 0.3.2.dev 2018-07-01 12:01:51 +02:00
Martin Kroeker
1392eba488 set version number to 0.3.2.dev 2018-07-01 12:01:16 +02:00
Martin Kroeker
e6d7711199 remove dev suffix from version number 2018-07-01 11:59:47 +02:00
Martin Kroeker
7a914347c5 remove dev suffix from version number 2018-07-01 11:58:57 +02:00
Martin Kroeker
61659f8765 Merge pull request #1648 from martin-frbg/nofort
Handle NOFORTRAN=0
2018-07-01 11:56:40 +02:00
Martin Kroeker
3a8f0a6a1f Merge pull request #1656 from xianyi/develop
Update the 0.3 branch from develop
2018-07-01 11:55:21 +02:00
Martin Kroeker
3d3c19717c Merge pull request #1655 from martin-frbg/issue1641
Fix apparent off-by-one error in calculation of MAX_ALLOCATING_THREADS
2018-07-01 08:41:22 +02:00
Martin Kroeker
24e344038d Merge pull request #1654 from martin-frbg/avx512check
Add compiler option to avx512 test and hide test output
2018-07-01 01:17:03 +02:00
Martin Kroeker
4e9c34018e Fix apparent off-by-one error in calculation of MAX_ALLOCATING_THREADS
fixes #1641
2018-06-30 23:57:50 +02:00
Martin Kroeker
f5243e8e1f Add compiler option to avx512 test and hide test output 2018-06-30 23:47:44 +02:00
Martin Kroeker
ba8388cee0 Merge pull request #1651 from martin-frbg/avx512-nodgemm
Disable the 16x2 DTRMM kernel on SkylakeX as well
2018-06-30 17:48:03 +02:00
Martin Kroeker
6e54b0a027 Disable the 16x2 DTRMM kernel on SkylakeX as well 2018-06-30 17:31:06 +02:00
Martin Kroeker
40c8cbc3bf Merge pull request #1650 from martin-frbg/avx512-nodgemm
Disable the AVX512 DGEMM kernel for now
2018-06-30 13:05:46 +02:00
Martin Kroeker
d3c9eb4c7d Merge pull request #1639 from martin-frbg/dyn_list
Add DYNAMIC_LIST option for user-defined list of dynamic targets
2018-06-30 13:05:30 +02:00
Martin Kroeker
f0a8dc2eec Disable the AVX512 DGEMM kernel for now
due to #1643
2018-06-30 11:34:48 +02:00
Martin Kroeker
cc92257ea6 Update Makefile 2018-06-27 00:09:21 +02:00
Martin Kroeker
2aba1b1658 Merge branch 'develop' into nofort 2018-06-27 00:07:32 +02:00
Martin Kroeker
8396e9e777 Handle NOFORTRAN=0 2018-06-27 00:00:27 +02:00
Martin Kroeker
bfad307ed7 Merge pull request #1647 from martin-frbg/armv7-dot
Remove premature exits from ARMV7 xdot codes
2018-06-26 22:27:30 +02:00
Martin Kroeker
b83e4c60c7 Remove premature exit for INC_X or INC_Y zero 2018-06-26 20:46:42 +02:00
Martin Kroeker
e344db269b Remove premature exit for INC_X or INC_Y zero 2018-06-26 20:45:57 +02:00
Martin Kroeker
545b82efd3 Remove premature exit for INC_X or INC_Y zero 2018-06-26 20:45:00 +02:00
Martin Kroeker
e322a951fe Remove premature exit for INC_X or INC_Y zero 2018-06-26 20:44:13 +02:00
Martin Kroeker
ff2f171036 Merge pull request #1644 from martin-frbg/revert-filterout
Revert changes to NOFORTRAN handling in Makefile
2018-06-26 10:15:15 +02:00
Martin Kroeker
092175cfec Revert changes to NOFORTRAN handling from 952541e 2018-06-26 08:09:52 +02:00
Martin Kroeker
750162a05f Try gradual fallback for cores not in the dynamic core list 2018-06-25 21:02:31 +02:00
Martin Kroeker
e6d93f20f1 Merge pull request #2 from martin-frbg/develop
merge develop
2018-06-25 20:48:10 +02:00
Martin Kroeker
c38c65eb65 Merge pull request #1 from xianyi/develop
Merge xianyi:develop into develop
2018-06-25 20:45:56 +02:00
Martin Kroeker
ce3651516f Merge pull request #1642 from oon3m0oo/develop
Rewrite &= -> = and simplify the initial blocking phase.
2018-06-25 19:23:40 +02:00
Craig Donner
0144068537 Rewrite &= -> = and simplify the initial blocking phase. 2018-06-25 15:08:55 +01:00
Martin Kroeker
1833a67071 Add support for a user-defined list of dynamic targets 2018-06-23 19:42:15 +02:00
Martin Kroeker
0b2b83d9ed Add support for a user-defined list of dynamic targets 2018-06-23 19:41:32 +02:00
Martin Kroeker
62cf769aa6 Merge pull request #1638 from martin-frbg/issue1637
Expose the CBLAS interface to the IxAMIN functions and have make build it
2018-06-23 15:01:02 +02:00
Martin Kroeker
eb71d61c7c Expose CBLAS interface to BLAS extensions iXamin 2018-06-23 13:31:09 +02:00
Martin Kroeker
9cf22b7d91 Build cblas_iXamin interfaces 2018-06-23 13:27:30 +02:00
Martin Kroeker
cc66743b66 Merge pull request #1634 from oon3m0oo/develop
Fix data races reported by TSAN.
2018-06-21 21:01:03 +02:00
oon3m0oo
2aa0a5804e Use BLAS rather than CBLAS in test_fork.c (#1626)
This is handy for people not using lapack.
2018-06-21 18:47:45 +02:00
Craig Donner
28c28ed275 Fix data races reported by TSAN. 2018-06-21 16:41:02 +01:00
oon3m0oo
a399d00425 Further improvements to memory.c. (#1625)
- Compiler TLS is now used only used when the compiler supports it
- If compiler TLS is unsupported, we use platform-specific TLS
- Only one variable (an index) is now in TLS
- We only access TLS once per alloc, and never when freeing
- Allocation / release info is now stored within the allocation itself, by
  over-allocating; this saves having external structures do the bookkeeping, and
  reduces some of the redundant data that was being stored (such as addresses)
- We never hit the alloc lock when not using SMP or when using OpenMP (that was
  my fault)
- Now that there are fewer tracking structures I think this is a bit easier to
  read than before
2018-06-20 22:04:03 +02:00
Martin Kroeker
f66b9c8826 Merge pull request #1630 from martin-frbg/x86-march
Add -march=skylake-avx512 to flags if target is skylake x
2018-06-20 21:51:57 +02:00
Martin Kroeker
2946c46024 Merge pull request #1631 from oon3m0oo/stack
Avoid declaring arrays of size 0 when making large stack allocations.
2018-06-20 21:51:38 +02:00
Craig Donner
05978528c3 Avoid declaring arrays of size 0 when making large stack allocations. 2018-06-20 17:03:18 +01:00
Martin Kroeker
ef6f0b645e Merge pull request #1629 from martin-frbg/issue1628
Make gfortran link libomp for clang in the tests; avoid two typical gotchas with NOFORTRAN
2018-06-20 16:41:13 +02:00
Martin Kroeker
0c5b7b400b Add -march=skylake-avx512 to flags if target is skylake x 2018-06-20 15:16:19 +02:00
Martin Kroeker
952541e840 Need to use filter-out to handle NOFORTRAN not set 2018-06-20 13:20:30 +02:00
Martin Kroeker
9369d3e6e5 Modify NOFORTRAN tests to always check the value; fix rewriting of NO_FORTRAN 2018-06-19 23:28:06 +02:00
Martin Kroeker
10b70c904d Handle erroneous user settings NOFORTRAN=0 and NO_FORTRAN 2018-06-19 20:53:19 +02:00
Martin Kroeker
6a5ab083b7 Handle special case of gfortran+clang+OpenMP 2018-06-19 20:47:33 +02:00
Martin Kroeker
1f9e4f3193 Handle special case of gfortran+clang+OpenMP 2018-06-19 20:46:36 +02:00
Martin Kroeker
5a6a2bed9a Merge pull request #1623 from fenrus75/fast-thread
Initialize only the required subset of the jobs array, fix barriers and improve switch ratio on SkylakeX and Haswell. For issue #1622
2018-06-18 09:02:40 +02:00
Martin Kroeker
2d8cc7193a Support upcoming Intel Cannon Lake CPUs as Skylake X (#1621)
* Support  upcoming Cannon Lake as Skylake X
2018-06-17 23:38:14 +02:00
Arjan van de Ven
2ddc96c9e5 make WMB / MB safer on x86-64
make it so that

if (foo)
	RMB;
else
	MB;

is always done correctly and without syntax surprises
2018-06-17 18:06:24 +00:00
Arjan van de Ven
7e39ffe113 On x86-64, make MB/WMB compiler barriers
Whie on x86(64) one does not normally need full memory barriers, it's
good practice to at least use compiler barriers for places where on other
architectures memory barriers are used; this prevents the compiler
from over-optimizing.
2018-06-17 17:53:15 +00:00
Arjan van de Ven
73de17664d Add missing barriers in gemm scheduler
a few places in the gemm scheduler code were missing barriers;
the code likely worked OK due to heavy use of volatile / _Atomic
but there's no reason to get this incorrect
2018-06-17 17:50:43 +00:00
Arjan van de Ven
6eb4b9ae7c Tune HASWELL SWITCH_RATIO as well
Similar to the SKYLAKEX patch, 32 seems to work best
(much better than 4 or 16)

Before (4)

   Matrix          SGEMM cycles    MPC                                   DGEMM cycles      MPC
  48 x 48               15554.3    7.2       0.2%                             30353.8      3.7       0.3%
  64 x 64               30346.8    8.7       1.6%                             63495.0      4.1      -0.1%
  65 x 65               81668.1    3.4    -123.3%                             82705.2      3.3     -21.2%
  80 x 80              105045.9    4.9     -95.5%                            115226.0      4.5      -2.2%
  96 x 96              152461.2    5.8     -74.3%                            148156.3      6.0      16.4%
 112 x 112             188505.2    7.5     -42.2%                            171187.3      8.2      36.4%
 128 x 128             257884.0    8.1     -39.5%                            224764.8      9.3      46.0%

Intermediate (16)

   Matrix          SGEMM cycles    MPC                                   DGEMM cycles      MPC
  48 x 48               15565.7    7.2       0.2%                             30378.9      3.7       0.2%
  64 x 64               30430.2    8.7       1.3%                             63046.4      4.2       0.6%
  65 x 65               27306.0   10.1      25.3%                             38879.2      7.1      43.0%
  80 x 80               51008.7   10.1       5.1%                             61007.6      8.4      45.9%
  96 x 96               70856.7   12.5      19.0%                             83403.1     10.6      53.0%
 112 x 112              84769.9   16.6      36.0%                             99920.1     14.1      62.9%
 128 x 128              84213.2   25.0      54.5%                            113024.2     18.6      72.8%

After (32)

   Matrix          SGEMM cycles    MPC                                   DGEMM cycles      MPC
  48 x 48               15537.3    7.2       0.3%                             30537.0      3.6      -0.3%
  64 x 64               30352.7    8.7       1.6%                             62597.8      4.2       1.3%
  65 x 65               36857.0    7.5      -0.8%                             56167.6      4.9      17.7%
  80 x 80               42552.6   12.1      20.8%                             69536.7      7.4      38.3%
  96 x 96               52101.5   17.1      40.5%                             91016.1      9.7      48.7%
 112 x 112              63853.7   22.1      51.8%                            110507.4     12.7      58.9%
 128 x 128              73966.1   28.4      60.0%                            163146.4     12.9      60.8%
2018-06-17 17:08:36 +00:00
Arjan van de Ven
5c6f008365 Tune param.h for SkylakeX
param.h defines a per-platform SWITCH_RATIO, which is used as a measure for how fine
grained the blocks for gemm need to be split up. Many platforms define this to 4.

The reality is that the gemm low level implementation for SkylakeX likes bigger blocks
due to the nature of SIMD... by tuning the SWITCH_RATIO to 32 the threading performance
improves significantly:

Before
   Matrix          SGEMM cycles    MPC                                   DGEMM cycles      MPC
  48 x 48               10756.0   10.5      -0.5%                             18296.7      6.1      -1.7%
  64 x 64               20490.0   12.9       1.4%                             40615.0      6.5       0.0%
  65 x 65               83528.3    3.3    -210.9%                             96319.0      2.9     -83.3%
  80 x 80              101453.5    5.1    -166.3%                            128021.7      4.0     -76.6%
  96 x 96              149795.1    5.9    -143.1%                            168059.4      5.3     -47.4%
 112 x 112             191481.2    7.3    -105.8%                            204165.0      6.9     -14.6%
 128 x 128             265019.2    7.9     -99.0%                            272006.4      7.7      -5.3%

After
   Matrix          SGEMM cycles    MPC                                   DGEMM cycles      MPC
  48 x 48               10666.3   10.6       0.4%                             18236.9      6.2      -1.4%
  64 x 64               20410.1   13.0       1.8%                             39925.8      6.6       1.7%
  65 x 65               34983.0    7.9     -30.2%                             51494.6      5.4       2.0%
  80 x 80               39769.1   13.0      -4.4%                             63805.2      8.1      12.0%
  96 x 96               45169.6   19.7      26.7%                             80065.8     11.1      29.8%
 112 x 112              57026.1   24.7      38.7%                             99535.5     14.2      44.1%
 128 x 128              64789.8   32.5      51.3%                            117407.2     17.9      54.6%

With this change, threading starts to be a win already at 96x96
2018-06-17 15:47:50 +00:00
Arjan van de Ven
d148ec4ea1 Don't use _Atomic for jobs sometimes...
The use of _Atomic leads to really bad code generation in the compiler
(on x86, you get 2 "mfence" memory barriers around each access with gcc8, despite
x86 being ordered and cache coherent). But there's a fallback in the code that
just uses volatile which is more than plenty in practice.

If we're nervous about cross thread synchronization for these variables, we should
make the YIELD function be a compiler/memory barrier instead.

performance before (after last commit)

   Matrix          SGEMM cycles    MPC                                   DGEMM cycles      MPC
  48 x 48               10630.0   10.6       0.7%                             18112.8      6.2      -0.7%
  64 x 64               20374.8   13.0       1.9%                             40487.0      6.5       0.4%
  65 x 65              141955.2    1.9    -428.3%                            146708.8      1.9    -179.2%
  80 x 80              178921.1    2.9    -369.6%                            186032.7      2.8    -156.6%
  96 x 96              205436.2    4.3    -233.4%                            224513.1      3.9     -97.0%
 112 x 112             244408.2    5.8    -162.7%                            262158.7      5.4     -47.1%
 128 x 128             321334.5    6.5    -141.3%                            333829.0      6.3     -29.2%

Performance with this patch (roughly a 2x improvement):

   Matrix          SGEMM cycles    MPC                                   DGEMM cycles      MPC
  48 x 48               10756.0   10.5      -0.5%                             18296.7      6.1      -1.7%
  64 x 64               20490.0   12.9       1.4%                             40615.0      6.5       0.0%
  65 x 65               83528.3    3.3    -210.9%                             96319.0      2.9     -83.3%
  80 x 80              101453.5    5.1    -166.3%                            128021.7      4.0     -76.6%
  96 x 96              149795.1    5.9    -143.1%                            168059.4      5.3     -47.4%
 112 x 112             191481.2    7.3    -105.8%                            204165.0      6.9     -14.6%
 128 x 128             265019.2    7.9     -99.0%                            272006.4      7.7      -5.3%
2018-06-17 15:39:15 +00:00
Arjan van de Ven
9e162146a9 Only initialize the part of the jobs array that will get used
The jobs array is getting initialized in O(compiled cpus^2) complexity.
Distros and people with bigger systems will use pretty high values
(128 or 256 or more) for this value, leading to interesting bubbles
in performance.

Baseline (single threaded performance) gets roughly 13 - 15 multiplications per cycle
in the interesting range (threading kicks in at 65x65 mult by 65x65).
The hardware is capable of 32 multiplications per cycle theoretically.

   Matrix          SGEMM cycles    MPC                                   DGEMM cycles      MPC
  48 x 48               10703.9   10.6       0.0%                             17990.6      6.3       0.0%
  64 x 64               20778.4   12.8       0.0%                             40629.2      6.5       0.0%
  65 x 65               26869.9   10.3       0.0%                             52545.7      5.3       0.0%
  80 x 80               38104.5   13.5       0.0%                             72492.7      7.1       0.0%
  96 x 96               61626.4   14.4       0.0%                            113983.8      7.8       0.0%
 112 x 112              91803.8   15.3       0.0%                            180987.3      7.8       0.0%
 128 x 128             133161.4   15.8       0.0%                            258374.3      8.1       0.0%

When threading is turned on
TARGET=SKYLAKEX F_COMPILER=GFORTRAN  SHARED=1 DYNAMIC_THREADS=1 USE_OPENMP=0  NUM_THREADS=128

  Matrix          SGEMM cycles    MPC                                   DGEMM cycles      MPC
  48 x 48               10725.9   10.5      -0.2%                             18134.9      6.2      -0.8%
  64 x 64               20500.6   12.9       1.3%                             40929.1      6.5      -0.7%
  65 x 65             2040832.1    0.1   -7495.2%                           2097633.6      0.1   -3892.0%
  80 x 80             2063129.1    0.2   -5314.4%                           2119925.2      0.2   -2824.3%
  96 x 96             2070374.5    0.4   -3259.6%                           2173604.4      0.4   -1806.9%
 112 x 112            2111721.5    0.7   -2169.6%                           2263330.8      0.6   -1170.0%
 128 x 128            2276181.5    0.9   -1609.3%                           2377228.9      0.9    -820.1%

There is a deep deep cliff once you hit 65x65

With this patch

   Matrix          SGEMM cycles    MPC                                   DGEMM cycles      MPC
  48 x 48               10630.0   10.6       0.7%                             18112.8      6.2      -0.7%
  64 x 64               20374.8   13.0       1.9%                             40487.0      6.5       0.4%
  65 x 65              141955.2    1.9    -428.3%                            146708.8      1.9    -179.2%
  80 x 80              178921.1    2.9    -369.6%                            186032.7      2.8    -156.6%
  96 x 96              205436.2    4.3    -233.4%                            224513.1      3.9     -97.0%
 112 x 112             244408.2    5.8    -162.7%                            262158.7      5.4     -47.1%
 128 x 128             321334.5    6.5    -141.3%                            333829.0      6.3     -29.2%

The cliff is very significantly reduced.
(more to follow)
2018-06-17 15:32:03 +00:00
Martin Kroeker
47bf0dba8f Add build-time option for OMP scheduler; document MULTITHREAD_THRESHOLD range (#1620)
* Allow choosing the OpenMP scheduler and add range hint for GEMM_MULTITHREAD_THRESHOLD
* Amended description of GEMM_MULTITHREAD_THRESHOLD
to reflect #742 making it track floating point operations rather than matrix size
2018-06-15 11:25:05 +02:00
Martin Kroeker
12603b7dbb Merge pull request #1618 from oon3m0oo/less_locking
Remove the need for most locking in memory.c.
2018-06-15 00:10:29 +02:00
Craig Donner
bf40f806ef Remove the need for most locking in memory.c.
Using thread local storage for tracking memory allocations means that threads
no longer have to lock at all when doing memory allocations / frees. This
particularly helps the gemm driver since it does an allocation per invocation.
Even without threading at all, this helps, since even calling a lock with
no contention has a cost:

Before this change, no threading:
```
----------------------------------------------------
Benchmark             Time           CPU Iterations
----------------------------------------------------
BM_SGEMM/4          102 ns        102 ns   13504412
BM_SGEMM/6          175 ns        175 ns    7997580
BM_SGEMM/8          205 ns        205 ns    6842073
BM_SGEMM/10         266 ns        266 ns    5294919
BM_SGEMM/16         478 ns        478 ns    2963441
BM_SGEMM/20         690 ns        690 ns    2144755
BM_SGEMM/32        1906 ns       1906 ns     716981
BM_SGEMM/40        2983 ns       2983 ns     473218
BM_SGEMM/64        9421 ns       9422 ns     148450
BM_SGEMM/72       12630 ns      12631 ns     112105
BM_SGEMM/80       15845 ns      15846 ns      89118
BM_SGEMM/90       25675 ns      25676 ns      54332
BM_SGEMM/100      29864 ns      29865 ns      47120
BM_SGEMM/112      37841 ns      37842 ns      36717
BM_SGEMM/128      56531 ns      56532 ns      25361
BM_SGEMM/140      75886 ns      75888 ns      18143
BM_SGEMM/150      98493 ns      98496 ns      14299
BM_SGEMM/160     102620 ns     102622 ns      13381
BM_SGEMM/170     135169 ns     135173 ns      10231
BM_SGEMM/180     146170 ns     146172 ns       9535
BM_SGEMM/189     190226 ns     190231 ns       7397
BM_SGEMM/200     194513 ns     194519 ns       7210
BM_SGEMM/256     396561 ns     396573 ns       3531
```
with this change:
```
----------------------------------------------------
Benchmark             Time           CPU Iterations
----------------------------------------------------
BM_SGEMM/4           95 ns         95 ns   14500387
BM_SGEMM/6          166 ns        166 ns    8381763
BM_SGEMM/8          196 ns        196 ns    7277044
BM_SGEMM/10         256 ns        256 ns    5515721
BM_SGEMM/16         463 ns        463 ns    3025197
BM_SGEMM/20         636 ns        636 ns    2070213
BM_SGEMM/32        1885 ns       1885 ns     739444
BM_SGEMM/40        2969 ns       2969 ns     472152
BM_SGEMM/64        9371 ns       9372 ns     148932
BM_SGEMM/72       12431 ns      12431 ns     112919
BM_SGEMM/80       15615 ns      15616 ns      89978
BM_SGEMM/90       25397 ns      25398 ns      55041
BM_SGEMM/100      29445 ns      29446 ns      47540
BM_SGEMM/112      37530 ns      37531 ns      37286
BM_SGEMM/128      55373 ns      55375 ns      25277
BM_SGEMM/140      76241 ns      76241 ns      18259
BM_SGEMM/150     102196 ns     102200 ns      13736
BM_SGEMM/160     101521 ns     101525 ns      13556
BM_SGEMM/170     136182 ns     136184 ns      10567
BM_SGEMM/180     146861 ns     146864 ns       9035
BM_SGEMM/189     192632 ns     192632 ns       7231
BM_SGEMM/200     198547 ns     198555 ns       6995
BM_SGEMM/256     392316 ns     392330 ns       3539
```

Before, when built with USE_THREAD=1, GEMM_MULTITHREAD_THRESHOLD = 4, the cost
of small matrix operations was overshadowed by thread locking (look smaller than
32) even when not explicitly spawning threads:
```
----------------------------------------------------
Benchmark             Time           CPU Iterations
----------------------------------------------------
BM_SGEMM/4          328 ns        328 ns    4170562
BM_SGEMM/6          396 ns        396 ns    3536400
BM_SGEMM/8          418 ns        418 ns    3330102
BM_SGEMM/10         491 ns        491 ns    2863047
BM_SGEMM/16         710 ns        710 ns    2028314
BM_SGEMM/20         871 ns        871 ns    1581546
BM_SGEMM/32        2132 ns       2132 ns     657089
BM_SGEMM/40        3197 ns       3196 ns     437969
BM_SGEMM/64        9645 ns       9645 ns     144987
BM_SGEMM/72       35064 ns      32881 ns      50264
BM_SGEMM/80       37661 ns      35787 ns      42080
BM_SGEMM/90       36507 ns      36077 ns      40091
BM_SGEMM/100      32513 ns      31850 ns      48607
BM_SGEMM/112      41742 ns      41207 ns      37273
BM_SGEMM/128      67211 ns      65095 ns      21933
BM_SGEMM/140      68263 ns      67943 ns      19245
BM_SGEMM/150     121854 ns     115439 ns      10660
BM_SGEMM/160     116826 ns     115539 ns      10000
BM_SGEMM/170     126566 ns     122798 ns      11960
BM_SGEMM/180     130088 ns     127292 ns      11503
BM_SGEMM/189     120309 ns     116634 ns      13162
BM_SGEMM/200     114559 ns     110993 ns      10000
BM_SGEMM/256     217063 ns     207806 ns       6417
```
and after, it's gone (note this includes my other change which reduces calls
to num_cpu_avail):
```
----------------------------------------------------
Benchmark             Time           CPU Iterations
----------------------------------------------------
BM_SGEMM/4           95 ns         95 ns   12347650
BM_SGEMM/6          166 ns        166 ns    8259683
BM_SGEMM/8          193 ns        193 ns    7162210
BM_SGEMM/10         258 ns        258 ns    5415657
BM_SGEMM/16         471 ns        471 ns    2981009
BM_SGEMM/20         666 ns        666 ns    2148002
BM_SGEMM/32        1903 ns       1903 ns     738245
BM_SGEMM/40        2969 ns       2969 ns     473239
BM_SGEMM/64        9440 ns       9440 ns     148442
BM_SGEMM/72       37239 ns      33330 ns      46813
BM_SGEMM/80       57350 ns      55949 ns      32251
BM_SGEMM/90       36275 ns      36249 ns      42259
BM_SGEMM/100      31111 ns      31008 ns      45270
BM_SGEMM/112      43782 ns      40912 ns      34749
BM_SGEMM/128      67375 ns      64406 ns      22443
BM_SGEMM/140      76389 ns      67003 ns      21430
BM_SGEMM/150      72952 ns      71830 ns      19793
BM_SGEMM/160      97039 ns      96858 ns      11498
BM_SGEMM/170     123272 ns     122007 ns      11855
BM_SGEMM/180     126828 ns     126505 ns      11567
BM_SGEMM/189     115179 ns     114665 ns      11044
BM_SGEMM/200      89289 ns      87259 ns      16147
BM_SGEMM/256     226252 ns     222677 ns       7375
```

I've also tested this with ThreadSanitizer and found no data races during
execution.  I'm not sure why 200 is always faster than it's neighbors, we must
be hitting some optimal cache size or something.
2018-06-14 16:54:58 +01:00
Martin Kroeker
ed682a4a0c Merge pull request #1619 from martin-frbg/issue1580
Update OSX deployment target to 10.8
2018-06-14 17:48:51 +02:00
Martin Kroeker
fcb77ab129 Update OSX deployment target to 10.8
fixes #1580
2018-06-14 16:57:58 +02:00
Martin Kroeker
26e1cfb653 Merge pull request #1607 from martin-frbg/dynarch
Move some x86_64 DYNAMIC_ARCH targets to new DYNAMIC_OLDER option
2018-06-14 16:52:55 +02:00
Martin Kroeker
c628c6fa59 Merge pull request #1612 from oon3m0oo/cpus
Fixed a few more unnecessary calls to num_cpu_avail.
2018-06-14 16:51:31 +02:00
Martin Kroeker
67d81ab49d Merge pull request #1609 from martin-frbg/issue1529
Create OpenBLASConfig.cmake in cmake builds as well
2018-06-12 23:00:24 +02:00
Martin Kroeker
2f957947a6 Merge pull request #1613 from xianyi/revert-1600-noyield
Revert "Use usleep instead of sched_yield by default"
2018-06-11 17:14:49 +02:00
Craig Donner
c2545b0fd6 Fixed a few more unnecessary calls to num_cpu_avail.
I don't have as many benchmarks for these as for gemm, but it should still
make a difference for small matrices.
2018-06-11 10:17:16 +01:00
Martin Kroeker
e65f451409 include CMakePackageConfigHelpers 2018-06-10 15:09:43 +02:00
Martin Kroeker
02634b549b Add template for OpenBLASConfig.cmake 2018-06-10 09:25:46 +02:00
Martin Kroeker
0bea6bb9e7 Create OpenBLASConfig.cmake from cmake as well 2018-06-10 09:24:37 +02:00
Martin Kroeker
63f7395fb4 Move some DYNAMIC_ARCH targets to new DYNAMIC_OLDER option 2018-06-09 16:31:38 +02:00
Martin Kroeker
1cbd8f3ae4 Move some DYNAMIC_ARCH targets to new DYNAMIC_OLDER option 2018-06-09 16:30:46 +02:00
Martin Kroeker
6c2d90ba77 Move some DYNAMIC_ARCH targets to new DYNAMIC_OLDER option 2018-06-09 16:29:17 +02:00
Martin Kroeker
939452ea9d Merge pull request #1570 from xianyi/develop
Update release-0.3.0 branch to match develop
2018-05-23 15:12:20 +02:00
167 changed files with 9628 additions and 1497 deletions

View File

@@ -85,8 +85,8 @@ jobs:
sudo: true
language: minimal
before_install:
- "wget 'https://raw.githubusercontent.com/alpinelinux/alpine-chroot-install/v0.6.0/alpine-chroot-install' \
&& echo 'a827a4ba3d0817e7c88bae17fe34e50204983d1e alpine-chroot-install' | sha1sum -c || exit 1"
- "wget 'https://raw.githubusercontent.com/alpinelinux/alpine-chroot-install/v0.9.0/alpine-chroot-install' \
&& echo 'e5dfbbdc0c4b3363b99334510976c86bfa6cb251 alpine-chroot-install' | sha1sum -c || exit 1"
- alpine() { /alpine/enter-chroot -u "$USER" "$@"; }
install:
- sudo sh alpine-chroot-install -p 'build-base gfortran perl linux-headers'

View File

@@ -6,21 +6,30 @@ cmake_minimum_required(VERSION 2.8.5)
project(OpenBLAS C ASM)
set(OpenBLAS_MAJOR_VERSION 0)
set(OpenBLAS_MINOR_VERSION 3)
set(OpenBLAS_PATCH_VERSION 1.dev)
set(OpenBLAS_PATCH_VERSION 4.dev)
set(OpenBLAS_VERSION "${OpenBLAS_MAJOR_VERSION}.${OpenBLAS_MINOR_VERSION}.${OpenBLAS_PATCH_VERSION}")
# Adhere to GNU filesystem layout conventions
include(GNUInstallDirs)
set(OpenBLAS_LIBNAME openblas)
include(CMakePackageConfigHelpers)
#######
if(MSVC)
option(BUILD_WITHOUT_LAPACK "Without LAPACK and LAPACKE (Only BLAS or CBLAS)" ON)
option(BUILD_WITHOUT_LAPACK "Do not build LAPACK and LAPACKE (Only BLAS or CBLAS)" ON)
endif()
option(BUILD_WITHOUT_CBLAS "Without CBLAS" OFF)
option(DYNAMIC_ARCH "Build with DYNAMIC_ARCH" OFF)
option(BUILD_RELAPACK "Build with ReLAPACK (recursive LAPACK" OFF)
option(BUILD_WITHOUT_CBLAS "Do not build the C interface (CBLAS) to the BLAS functions" OFF)
option(DYNAMIC_ARCH "Include support for multiple CPU targets, with automatic selection at runtime (x86/x86_64 only)" OFF)
option(DYNAMIC_OLDER "Include specific support for older cpu models (Penryn,Dunnington,Atom,Nano,Opteron) with DYNAMIC_ARCH" OFF)
option(BUILD_RELAPACK "Build with ReLAPACK (recursive implementation of several LAPACK functions on top of standard LAPACK)" OFF)
# Add a prefix or suffix to all exported symbol names in the shared library.
# Avoids conflicts with other BLAS libraries, especially when using
# 64 bit integer interfaces in OpenBLAS.
set(SYMBOLPREFIX "" CACHE STRING "Add a prefix to all exported symbol names in the shared library to avoid conflicts with other BLAS libraries" )
set(SYMBOLSUFFIX "" CACHE STRING "Add a suffix to all exported symbol names in the shared library, e.g. _64 for INTERFACE64 builds" )
#######
if(BUILD_WITHOUT_LAPACK)
set(NO_LAPACK 1)
@@ -34,11 +43,13 @@ endif()
#######
message(WARNING "CMake support is experimental. This will not produce the same Makefiles that OpenBLAS ships with. Only x86 support is currently available.")
message(WARNING "CMake support is experimental. It does not yet support all build options and may not produce the same Makefiles that OpenBLAS ships with.")
include("${PROJECT_SOURCE_DIR}/cmake/utils.cmake")
include("${PROJECT_SOURCE_DIR}/cmake/system.cmake")
set(OpenBLAS_LIBNAME openblas${SUFFIX64_UNDERSCORE})
set(BLASDIRS interface driver/level2 driver/level3 driver/others)
if (NOT DYNAMIC_ARCH)
@@ -146,6 +157,7 @@ endif()
# add objects to the openblas lib
add_library(${OpenBLAS_LIBNAME} ${LA_SOURCES} ${LAPACKE_SOURCES} ${RELA_SOURCES} ${TARGET_OBJS} ${OpenBLAS_DEF_FILE})
target_include_directories(${OpenBLAS_LIBNAME} INTERFACE $<INSTALL_INTERFACE:include>)
# Android needs to explicitly link against libm
if(ANDROID)
@@ -165,6 +177,7 @@ endif()
# Set output for libopenblas
set_target_properties( ${OpenBLAS_LIBNAME} PROPERTIES RUNTIME_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib)
set_target_properties( ${OpenBLAS_LIBNAME} PROPERTIES LIBRARY_OUTPUT_NAME_DEBUG "${OpenBLAS_LIBNAME}_d")
set_target_properties( ${OpenBLAS_LIBNAME} PROPERTIES EXPORT_NAME "OpenBLAS")
foreach (OUTPUTCONFIG ${CMAKE_CONFIGURATION_TYPES})
string( TOUPPER ${OUTPUTCONFIG} OUTPUTCONFIG )
@@ -204,14 +217,84 @@ set_target_properties(${OpenBLAS_LIBNAME} PROPERTIES
SOVERSION ${OpenBLAS_MAJOR_VERSION}
)
if (BUILD_SHARED_LIBS AND NOT ${SYMBOLPREFIX}${SYMBOLSUFIX} STREQUAL "")
if (NOT DEFINED ARCH)
set(ARCH_IN "x86_64")
else()
set(ARCH_IN ${ARCH})
endif()
if (${CORE} STREQUAL "generic")
set(ARCH_IN "GENERIC")
endif ()
if (NOT DEFINED EXPRECISION)
set(EXPRECISION_IN 0)
else()
set(EXPRECISION_IN ${EXPRECISION})
endif()
if (NOT DEFINED NO_CBLAS)
set(NO_CBLAS_IN 0)
else()
set(NO_CBLAS_IN ${NO_CBLAS})
endif()
if (NOT DEFINED NO_LAPACK)
set(NO_LAPACK_IN 0)
else()
set(NO_LAPACK_IN ${NO_LAPACK})
endif()
if (NOT DEFINED NO_LAPACKE)
set(NO_LAPACKE_IN 0)
else()
set(NO_LAPACKE_IN ${NO_LAPACKE})
endif()
if (NOT DEFINED NEED2UNDERSCORES)
set(NEED2UNDERSCORES_IN 0)
else()
set(NEED2UNDERSCORES_IN ${NEED2UNDERSCORES})
endif()
if (NOT DEFINED ONLY_CBLAS)
set(ONLY_CBLAS_IN 0)
else()
set(ONLY_CBLAS_IN ${ONLY_CBLAS})
endif()
if (NOT DEFINED BU)
set(BU _)
endif()
if (NOT ${SYMBOLPREFIX} STREQUAL "")
message(STATUS "adding prefix ${SYMBOLPREFIX} to names of exported symbols in ${OpenBLAS_LIBNAME}")
endif()
if (NOT ${SYMBOLSUFFIX} STREQUAL "")
message(STATUS "adding suffix ${SYMBOLSUFFIX} to names of exported symbols in ${OpenBLAS_LIBNAME}")
endif()
add_custom_command(TARGET ${OpenBLAS_LIBNAME} POST_BUILD
COMMAND perl ${PROJECT_SOURCE_DIR}/exports/gensymbol "objcopy" "${ARCH}" "${BU}" "${EXPRECISION_IN}" "${NO_CBLAS_IN}" "${NO_LAPACK_IN}" "${NO_LAPACKE_IN}" "${NEED2UNDERSCORES_IN}" "${ONLY_CBLAS_IN}" \"${SYMBOLPREFIX}\" \"${SYMBOLSUFFIX}\" "${BUILD_LAPACK_DEPRECATED}" > ${PROJECT_BINARY_DIR}/objcopy.def
COMMAND objcopy -v --redefine-syms ${PROJECT_BINARY_DIR}/objcopy.def ${PROJECT_BINARY_DIR}/lib/lib${OpenBLAS_LIBNAME}.so
COMMENT "renaming symbols"
)
endif()
# Install project
# Install libraries
install(TARGETS ${OpenBLAS_LIBNAME}
EXPORT "OpenBLAS${SUFFIX64}Targets"
RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR}
ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}
LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} )
# Install headers
set(CMAKE_INSTALL_INCLUDEDIR ${CMAKE_INSTALL_INCLUDEDIR}/openblas${SUFFIX64})
set(CMAKE_INSTALL_FULL_INCLUDEDIR ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_INCLUDEDIR})
message(STATUS "Generating openblas_config.h in ${CMAKE_INSTALL_INCLUDEDIR}")
set(OPENBLAS_CONFIG_H ${CMAKE_BINARY_DIR}/openblas_config.h)
@@ -259,11 +342,31 @@ if(NOT NO_LAPACKE)
ADD_CUSTOM_TARGET(genlapacke
COMMAND ${CMAKE_COMMAND} -E copy ${CMAKE_CURRENT_SOURCE_DIR}/lapack-netlib/LAPACKE/include/lapacke_mangling_with_flags.h.in "${CMAKE_BINARY_DIR}/lapacke_mangling.h"
)
install (FILES ${CMAKE_BINARY_DIR}/lapacke_mangling.h DESTINATION ${CMAKE_INSTALL_INCLUDEDIR})
install (FILES ${CMAKE_BINARY_DIR}/lapacke_mangling.h DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/openblas${SUFFIX64})
endif()
include(FindPkgConfig QUIET)
if(PKG_CONFIG_FOUND)
configure_file(${PROJECT_SOURCE_DIR}/cmake/openblas.pc.in ${PROJECT_BINARY_DIR}/openblas.pc @ONLY)
install (FILES ${PROJECT_BINARY_DIR}/openblas.pc DESTINATION ${CMAKE_INSTALL_LIBDIR}/pkgconfig/)
configure_file(${PROJECT_SOURCE_DIR}/cmake/openblas.pc.in ${PROJECT_BINARY_DIR}/openblas${SUFFIX64}.pc @ONLY)
install (FILES ${PROJECT_BINARY_DIR}/openblas${SUFFIX64}.pc DESTINATION ${CMAKE_INSTALL_LIBDIR}/pkgconfig/)
endif()
# GNUInstallDirs "DATADIR" wrong here; CMake search path wants "share".
set(PN OpenBLAS)
set(CMAKECONFIG_INSTALL_DIR "share/cmake/${PN}${SUFFIX64}")
configure_package_config_file(cmake/${PN}Config.cmake.in
"${CMAKE_CURRENT_BINARY_DIR}/${PN}${SUFFIX64}Config.cmake"
INSTALL_DESTINATION ${CMAKECONFIG_INSTALL_DIR})
write_basic_package_version_file(${CMAKE_CURRENT_BINARY_DIR}/${PN}ConfigVersion.cmake
VERSION ${${PN}_VERSION}
COMPATIBILITY AnyNewerVersion)
install(FILES ${CMAKE_CURRENT_BINARY_DIR}/${PN}${SUFFIX64}Config.cmake
DESTINATION ${CMAKECONFIG_INSTALL_DIR})
install(FILES ${CMAKE_CURRENT_BINARY_DIR}/${PN}ConfigVersion.cmake
RENAME ${PN}${SUFFIX64}ConfigVersion.cmake
DESTINATION ${CMAKECONFIG_INSTALL_DIR})
install(EXPORT "${PN}${SUFFIX64}Targets"
NAMESPACE "${PN}${SUFFIX64}::"
DESTINATION ${CMAKECONFIG_INSTALL_DIR})

View File

@@ -1,4 +1,142 @@
OpenBLAS ChangeLog
====================================================================
Version 0.3.3
31-Aug-2018
common:
* thread memory allocation has been switched back to the method
used before version 0.3.1 due to unexpected problems caused by
the new code under some circumstances. A new compile-time option
USE_TLS has been added to enable the new code, and it is hoped
that this can become the default again in the next version.
* LAPAck PR272 has been integrated, which fixes spurious errors
in DSYEVR and related functions caused by missing conversion
from ILAENV to ILAENV_2STAGE in several _2stage routines.
* the cmake-generated OpenBLASConfig.cmake now uses correct case
for the name of the library
* added support for Haiku OS
x86_64:
* added AVX512 implementations of SDOT, DDOT, SAXPY, DAXPY,
DSCAL, DGEMVN and DSYMVL
* added a workaround for a cygwin issue that prevented compilation
of AVX512 code
IBM Z:
* added autodetection of Z14
* fixed TRMM errors in the generic target
====================================================================
Version 0.3.2
30-Jul-2018
common:
* fixes for regressions caused by the rewrite of the thread
initialization code in 0.3.1
POWER:
* fixed cpu autodetection for the BSDs
MIPS64:
* fixed utest errors in AXPY, DSDOT, ROT and SWAP
x86_64:
* added autodetection of AMD Ryzen 2
* fixed build with older versions of MSVC
====================================================================
Version 0.3.1
01-Jul-2018
common:
* rewritten thread initialization code with significantly reduced overhead
* added CBLAS interfaces to the IxAMIN BLAS extension functions
* fixed the lapack-test target
* CMAKE builds now create an OpenBLASConfig.cmake file
* ZAXPY now uses a single thread for small input sizes
* the LAPACK code was updated from Reference-LAPACK/lapack#253
(fixing LAPACKE interfaces to Aasen's functions)
POWER:
* corrected CROT and ZROT behaviour with zero INC_X
ARMV7:
* corrected xDOT behaviour with zero INC_X or INC_Y
x86_64:
* retired some older targets of DYNAMIC_ARCH builds to a new option DYNAMIC_OLDER,
this affects PENRYN,DUNNINGTON,OPTERON,OPTERON_SSE3,BOBCAT,ATOM and NANO
(which will still be supported via the slower PRESCOTT kernels when this option is not set)
* added an option DYNAMIC_LIST that (used in conjunction with DYNAMIC_ARCH) allows to
specify the list of x86_64 targets to include. Any target not on the list will be supported
by the Sandybridge or Nehalem kernels if available, or by Prescott.
* improved SWITCH_RATIO on Haswell for increased GEMM throughput
* added initial support for Intel Skylake X, including an AVX512 SGEMM kernel
* added autodetection of Intel Cannon Lake series as Skylake X
* added a default L2 cache size for hypervisors that return zero here (Chromebook)
* fixed a name clash with recent Windows10 headers that broke the build with (at least)
recent mingw from MSYS2
* fixed a link error in mixed clang/gfortran builds with OpenMP
* updated the OSX deployment target to 10.8
* switched on parallel make for builds on MS Windows by default
x86:
* fixed SSWAP and DSWAP behaviour with zero INC_X and INC_Y
====================================================================
Version 0.3.0
23-May-2108
common:
* fixed some more thread race and locking bugs
* added preliminary support for calling an OpenMP build of the library from multiple threads
* removed performance impact of thread locks added in 0.2.20 on OpenMP code
* general code cleanup
* optimized DSDOT implementation
* improved thread distribution for GEMM
* corrected IMATCOPY/OMATCOPY implementation
* fixed out-of-bounds accesses in the multithreaded xBMV/xPMV and SYMV implementations
* cmake build improvements
* pkgconfig file now contains build options
* openblas_get_config() now reports USE_OPENMP and NUM_THREADS settings used for the build
* corrections and improvements for systems with more than 64 cpus
* LAPACK code updated to 3.8.0 including later fixes
* added ReLAPACK, a recursive implementation of several LAPACK functions
* Rewrote ROTMG to handle cases that the netlib code failed to address
* Disabled (broken) multithreading code for xTRMV
* corrected prototypes of complex CBLAS functions to make our cblas.h match the generally accepted standard
* shared memory access failures on startup are now handled more gracefully
* restored utests from earlier releases (and made them pass on all affected systems)
SPARC:
* several fixes for cpu autodetection
POWER:
* corrected vector register overwriting in several Power8 kernels
* optimized additional BLAS functions
ARM:
* added support for CortexA53 and A72
* added autodetection for ThunderX2T99
* made most optimized kernels the default for generic ARMv8 targets
x86_64:
* parallelized DDOT kernel for Haswell
* changed alignment directives in assembly kernels to boost performance on OSX
* fixed register handling in the GEMV microkernels (bug exposed by gcc7)
* added support for building on OpenBSD and Dragonfly
* updated compiler options to work with Intel release 2018
* support fully optimized build with clang/flang on Microsoft Windows
* fixed building on AIX
IBM Z:
* added optimized BLAS 1/2 functions
MIPS:
* fixed cpu autodetection helper code
* added mips32 1004K cpu (Mediatek MT7621 and similar SoC)
* added mips64 I6500 cpu
====================================================================
Version 0.2.20
24-Jul-2017

View File

@@ -21,6 +21,17 @@ ifeq ($(BUILD_RELAPACK), 1)
RELA = re_lapack
endif
ifeq ($(NO_FORTRAN), 1)
define NOFORTRAN
1
endef
define NO_LAPACK
1
endef
export NOFORTRAN
export NO_LAPACK
endif
LAPACK_NOOPT := $(filter-out -O0 -O1 -O2 -O3 -Ofast,$(LAPACK_FFLAGS))
SUBDIRS_ALL = $(SUBDIRS) test ctest utest exports benchmark ../laswp ../bench
@@ -47,7 +58,7 @@ endif
endif
@echo " C compiler ... $(C_COMPILER) (command line : $(CC))"
ifndef NOFORTRAN
ifeq ($(NOFORTRAN), $(filter 0,$(NOFORTRAN)))
@echo " Fortran compiler ... $(F_COMPILER) (command line : $(FC))"
endif
ifneq ($(OSNAME), AIX)
@@ -86,7 +97,7 @@ endif
shared :
ifndef NO_SHARED
ifeq ($(OSNAME), $(filter $(OSNAME),Linux SunOS Android))
ifeq ($(OSNAME), $(filter $(OSNAME),Linux SunOS Android Haiku))
@$(MAKE) -C exports so
@ln -fs $(LIBSONAME) $(LIBPREFIX).so
@ln -fs $(LIBSONAME) $(LIBPREFIX).so.$(MAJOR_VERSION)
@@ -108,7 +119,7 @@ endif
endif
tests :
ifndef NOFORTRAN
ifeq ($(NOFORTRAN), $(filter 0,$(NOFORTRAN)))
touch $(LIBNAME)
ifndef NO_FBLAS
$(MAKE) -C test all
@@ -153,6 +164,9 @@ ifeq ($(DYNAMIC_ARCH), 1)
do $(MAKE) GOTOBLAS_MAKEFILE= -C kernel TARGET_CORE=$$d kernel || exit 1 ;\
done
@echo DYNAMIC_ARCH=1 >> Makefile.conf_last
ifeq ($(DYNAMIC_OLDER), 1)
@echo DYNAMIC_OLDER=1 >> Makefile.conf_last
endif
endif
ifdef USE_THREAD
@echo USE_THREAD=$(USE_THREAD) >> Makefile.conf_last
@@ -207,7 +221,7 @@ netlib :
else
netlib : lapack_prebuild
ifndef NOFORTRAN
ifeq ($(NOFORTRAN), $(filter 0,$(NOFORTRAN)))
@$(MAKE) -C $(NETLIB_LAPACK_DIR) lapacklib
@$(MAKE) -C $(NETLIB_LAPACK_DIR) tmglib
endif
@@ -228,7 +242,7 @@ prof_lapack : lapack_prebuild
@$(MAKE) -C $(NETLIB_LAPACK_DIR) lapack_prof
lapack_prebuild :
ifndef NOFORTRAN
ifeq ($(NOFORTRAN), $(filter 0,$(NOFORTRAN)))
-@echo "FORTRAN = $(FC)" > $(NETLIB_LAPACK_DIR)/make.inc
-@echo "OPTS = $(LAPACK_FFLAGS)" >> $(NETLIB_LAPACK_DIR)/make.inc
-@echo "POPTS = $(LAPACK_FPFLAGS)" >> $(NETLIB_LAPACK_DIR)/make.inc
@@ -237,7 +251,7 @@ ifndef NOFORTRAN
-@echo "LOADOPTS = $(FFLAGS) $(EXTRALIB)" >> $(NETLIB_LAPACK_DIR)/make.inc
-@echo "CC = $(CC)" >> $(NETLIB_LAPACK_DIR)/make.inc
-@echo "override CFLAGS = $(LAPACK_CFLAGS)" >> $(NETLIB_LAPACK_DIR)/make.inc
-@echo "ARCH = $(AR)" >> $(NETLIB_LAPACK_DIR)/make.inc
-@echo "override ARCH = $(AR)" >> $(NETLIB_LAPACK_DIR)/make.inc
-@echo "ARCHFLAGS = $(ARFLAGS) -ru" >> $(NETLIB_LAPACK_DIR)/make.inc
-@echo "RANLIB = $(RANLIB)" >> $(NETLIB_LAPACK_DIR)/make.inc
-@echo "LAPACKLIB = ../$(LIBNAME)" >> $(NETLIB_LAPACK_DIR)/make.inc
@@ -253,6 +267,8 @@ ifeq ($(F_COMPILER), GFORTRAN)
ifdef SMP
ifeq ($(OSNAME), WINNT)
-@echo "LOADER = $(FC)" >> $(NETLIB_LAPACK_DIR)/make.inc
else ifeq ($(OSNAME), Haiku)
-@echo "LOADER = $(FC)" >> $(NETLIB_LAPACK_DIR)/make.inc
else
-@echo "LOADER = $(FC) -pthread" >> $(NETLIB_LAPACK_DIR)/make.inc
endif
@@ -271,21 +287,21 @@ endif
endif
large.tgz :
ifndef NOFORTRAN
ifeq ($(NOFORTRAN), $(filter 0,$(NOFORTRAN)))
if [ ! -a $< ]; then
-wget http://www.netlib.org/lapack/timing/large.tgz;
fi
endif
timing.tgz :
ifndef NOFORTRAN
ifeq ($(NOFORTRAN), $(filter 0,$(NOFORTRAN)))
if [ ! -a $< ]; then
-wget http://www.netlib.org/lapack/timing/timing.tgz;
fi
endif
lapack-timing : large.tgz timing.tgz
ifndef NOFORTRAN
ifeq ($(NOFORTRAN), $(filter 0,$(NOFORTRAN)))
(cd $(NETLIB_LAPACK_DIR); $(TAR) zxf ../timing.tgz TIMING)
(cd $(NETLIB_LAPACK_DIR)/TIMING; $(TAR) zxf ../../large.tgz )
$(MAKE) -C $(NETLIB_LAPACK_DIR)/TIMING

View File

@@ -66,7 +66,7 @@ endif
#for install shared library
ifndef NO_SHARED
@echo Copying the shared library to $(DESTDIR)$(OPENBLAS_LIBRARY_DIR)
ifeq ($(OSNAME), $(filter $(OSNAME),Linux SunOS Android))
ifeq ($(OSNAME), $(filter $(OSNAME),Linux SunOS Android Haiku))
@install -pm755 $(LIBSONAME) "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)"
@cd "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" ; \
ln -fs $(LIBSONAME) $(LIBPREFIX).so ; \
@@ -98,7 +98,7 @@ endif
@echo Generating openblas.pc in "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)"
@echo 'libdir='$(OPENBLAS_LIBRARY_DIR) > "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/openblas.pc"
@echo 'includedir='$(OPENBLAS_INCLUDE_DIR) >> "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/openblas.pc"
@echo 'openblas_config= USE_64BITINT='$(USE_64BITINT) 'DYNAMIC_ARCH='$(DYNAMIC_ARCH) 'NO_CBLAS='$(NO_CBLAS) 'NO_LAPACK='$(NO_LAPACK) 'NO_LAPACKE='$(NO_LAPACKE) 'NO_AFFINITY='$(NO_AFFINITY) 'USE_OPENMP='$(USE_OPENMP) $(CORE) 'MAX_THREADS='$(NUM_THREADS)>> "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/openblas.pc"
@echo 'openblas_config= USE_64BITINT='$(USE_64BITINT) 'DYNAMIC_ARCH='$(DYNAMIC_ARCH) 'DYNAMIC_OLDER='$(DYNAMIC_OLDER) 'NO_CBLAS='$(NO_CBLAS) 'NO_LAPACK='$(NO_LAPACK) 'NO_LAPACKE='$(NO_LAPACKE) 'NO_AFFINITY='$(NO_AFFINITY) 'USE_OPENMP='$(USE_OPENMP) $(CORE) 'MAX_THREADS='$(NUM_THREADS)>> "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/openblas.pc"
@echo 'version='$(VERSION) >> "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/openblas.pc"
@echo 'extralib='$(EXTRALIB) >> "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/openblas.pc"
@cat openblas.pc.in >> "$(DESTDIR)$(OPENBLAS_PKGCONFIG_DIR)/openblas.pc"

View File

@@ -3,7 +3,7 @@
#
# This library's version
VERSION = 0.3.1.dev
VERSION = 0.3.4.dev
# If you set the suffix, the library name will be libopenblas_$(LIBNAMESUFFIX).a
# and libopenblas_$(LIBNAMESUFFIX).so. Meanwhile, the soname in shared library
@@ -17,6 +17,11 @@ VERSION = 0.3.1.dev
# If you want to support multiple architecture in one binary
# DYNAMIC_ARCH = 1
# If you want the full list of x86_64 architectures supported in DYNAMIC_ARCH
# mode (including individual optimizied codes for PENRYN, DUNNINGTON, OPTERON,
# OPTERON_SSE3, ATOM and NANO rather than fallbacks to older architectures)
# DYNAMIC_OLDER = 1
# C compiler including binary type(32bit / 64bit). Default is gcc.
# Don't use Intel Compiler or PGI, it won't generate right codes as I expect.
# CC = gcc
@@ -55,6 +60,14 @@ VERSION = 0.3.1.dev
# This flag is always set for POWER8. Don't modify the flag
# USE_OPENMP = 1
# The OpenMP scheduler to use - by default this is "static" and you
# will normally not want to change this unless you know that your main
# workload will involve tasks that have highly unbalanced running times
# for individual threads. Changing away from "static" may also adversely
# affect memory access locality in NUMA systems. Setting to "runtime" will
# allow you to select the scheduler from the environment variable OMP_SCHEDULE
# CCOMMON_OPT += -DOMP_SCHED=dynamic
# You can define maximum number of threads. Basically it should be
# less than actual number of cores. If you don't specify one, it's
# automatically detected by the the script.
@@ -96,6 +109,12 @@ BUILD_LAPACK_DEPRECATED = 1
# If you want to use legacy threaded Level 3 implementation.
# USE_SIMPLE_THREADED_LEVEL3 = 1
# If you want to use the new, still somewhat experimental code that uses
# thread-local storage instead of a central memory buffer in memory.c
# Note that if your system uses GLIBC, it needs to have at least glibc 2.21
# for this to work.
# USE_TLS = 1
# If you want to drive whole 64bit region by BLAS. Not all Fortran
# compiler supports this. It's safe to keep comment it out if you
# are not sure(equivalent to "-i8" option).
@@ -133,6 +152,9 @@ NO_AFFINITY = 1
# FUNCTION_PROFILE = 1
# Support for IEEE quad precision(it's *real* REAL*16)( under testing)
# This option should not be used - it is a holdover from unfinished code present
# in the original GotoBLAS2 library that may be usable as a starting point but
# is not even expected to compile in its present form.
# QUAD_PRECISION = 1
# Theads are still working for a while after finishing BLAS operation
@@ -151,8 +173,11 @@ NO_AFFINITY = 1
# CONSISTENT_FPCSR = 1
# If any gemm arguement m, n or k is less or equal this threshold, gemm will be execute
# with single thread. You can use this flag to avoid the overhead of multi-threading
# in small matrix sizes. The default value is 4.
# with single thread. (Actually in recent versions this is a factor proportional to the
# number of floating point operations necessary for the given problem size, no longer
# an individual dimension). You can use this setting to avoid the overhead of multi-
# threading in small matrix sizes. The default value is 4, but values as high as 50 have
# been reported to be optimal for certain workloads (50 is the recommended value for Julia).
# GEMM_MULTITHREAD_THRESHOLD = 4
# If you need santy check by comparing reference BLAS. It'll be very

View File

@@ -9,6 +9,11 @@ ifndef TOPDIR
TOPDIR = .
endif
# Catch conflicting usage of ARCH in some BSD environments
ifeq ($(ARCH), amd64)
override ARCH=x86_64
endif
NETLIB_LAPACK_DIR = $(TOPDIR)/lapack-netlib
# Default C compiler
@@ -248,7 +253,7 @@ endif
ifeq ($(OSNAME), Darwin)
ifndef MACOSX_DEPLOYMENT_TARGET
export MACOSX_DEPLOYMENT_TARGET=10.6
export MACOSX_DEPLOYMENT_TARGET=10.8
endif
MD5SUM = md5 -r
endif
@@ -472,7 +477,18 @@ DYNAMIC_CORE = KATMAI COPPERMINE NORTHWOOD PRESCOTT BANIAS \
endif
ifeq ($(ARCH), x86_64)
DYNAMIC_CORE = PRESCOTT CORE2 PENRYN DUNNINGTON NEHALEM OPTERON OPTERON_SSE3 BARCELONA BOBCAT ATOM NANO
DYNAMIC_CORE = PRESCOTT CORE2
ifeq ($(DYNAMIC_OLDER), 1)
DYNAMIC_CORE += PENRYN DUNNINGTON
endif
DYNAMIC_CORE += NEHALEM
ifeq ($(DYNAMIC_OLDER), 1)
DYNAMIC_CORE += OPTERON OPTERON_SSE3
endif
DYNAMIC_CORE += BARCELONA
ifeq ($(DYNAMIC_OLDER), 1)
DYNAMIC_CORE += BOBCAT ATOM NANO
endif
ifneq ($(NO_AVX), 1)
DYNAMIC_CORE += SANDYBRIDGE BULLDOZER PILEDRIVER STEAMROLLER EXCAVATOR
endif
@@ -486,6 +502,14 @@ endif
endif
endif
ifdef DYNAMIC_LIST
override DYNAMIC_CORE = PRESCOTT $(DYNAMIC_LIST)
XCCOMMON_OPT = -DDYNAMIC_LIST -DDYN_PRESCOTT
XCCOMMON_OPT += $(foreach dcore,$(DYNAMIC_LIST),-DDYN_$(dcore))
CCOMMON_OPT += $(XCCOMMON_OPT)
#CCOMMON_OPT += -DDYNAMIC_LIST='$(DYNAMIC_LIST)'
endif
# If DYNAMIC_CORE is not set, DYNAMIC_ARCH cannot do anything, so force it to empty
ifndef DYNAMIC_CORE
override DYNAMIC_ARCH=
@@ -917,6 +941,10 @@ ifeq ($(DYNAMIC_ARCH), 1)
CCOMMON_OPT += -DDYNAMIC_ARCH
endif
ifeq ($(DYNAMIC_OLDER), 1)
CCOMMON_OPT += -DDYNAMIC_OLDER
endif
ifeq ($(NO_LAPACK), 1)
CCOMMON_OPT += -DNO_LAPACK
#Disable LAPACK C interface
@@ -995,6 +1023,10 @@ ifdef USE_SIMPLE_THREADED_LEVEL3
CCOMMON_OPT += -DUSE_SIMPLE_THREADED_LEVEL3
endif
ifdef USE_TLS
CCOMMON_OPT += -DUSE_TLS
endif
ifndef SYMBOLPREFIX
SYMBOLPREFIX =
endif

View File

@@ -8,6 +8,21 @@ endif
endif
endif
ifeq ($(CORE), SKYLAKEX)
ifndef NO_AVX512
CCOMMON_OPT += -march=skylake-avx512
FCOMMON_OPT += -march=skylake-avx512
ifeq ($(OSNAME), CYGWIN_NT)
CCOMMON_OPT += -fno-asynchronous-unwind-tables
endif
ifeq ($(OSNAME), WINNT)
ifeq ($(C_COMPILER), GCC)
CCOMMON_OPT += -fno-asynchronous-unwind-tables
endif
endif
endif
endif
ifeq ($(OSNAME), Interix)
ARFLAGS = -m x64
endif

View File

@@ -110,6 +110,7 @@ Please read `GotoBLAS_01Readme.txt`.
- **Intel Xeon 56xx (Westmere)**: Used GotoBLAS2 Nehalem codes.
- **Intel Sandy Bridge**: Optimized Level-3 and Level-2 BLAS with AVX on x86-64.
- **Intel Haswell**: Optimized Level-3 and Level-2 BLAS with AVX2 and FMA on x86-64.
- **Intel Skylake**: Optimized Level-3 and Level-2 BLAS with AVX512 and FMA on x86-64.
- **AMD Bobcat**: Used GotoBLAS2 Barcelona codes.
- **AMD Bulldozer**: x86-64 ?GEMM FMA4 kernels. (Thanks to Werner Saar)
- **AMD PILEDRIVER**: Uses Bulldozer codes with some optimizations.
@@ -200,6 +201,7 @@ Please see Changelog.txt to view the differences between OpenBLAS and GotoBLAS2
* Please use GCC version 4.6 and above to compile Sandy Bridge AVX kernels on Linux/MinGW/BSD.
* Please use Clang version 3.1 and above to compile the library on Sandy Bridge microarchitecture.
Clang 3.0 will generate the wrong AVX binary code.
* Please use GCC version 6 or LLVM version 6 and above to compile Skyalke AVX512 kernels.
* The number of CPUs/cores should less than or equal to 256. On Linux `x86_64` (`amd64`),
there is experimental support for up to 1024 CPUs/cores and 128 numa nodes if you build
the library with `BIGNUMA=1`.

View File

@@ -122,7 +122,7 @@ int main(int argc, char *argv[]){
FLOAT *a, *x, *y;
FLOAT alpha[] = {1.0, 1.0};
FLOAT beta [] = {1.0, 1.0};
FLOAT beta [] = {1.0, 0.0};
char trans='N';
blasint m, i, j;
blasint inc_x=1,inc_y=1;

10
c_check
View File

@@ -64,6 +64,7 @@ $os = WINNT if ($data =~ /OS_WINNT/);
$os = CYGWIN_NT if ($data =~ /OS_CYGWIN_NT/);
$os = Interix if ($data =~ /OS_INTERIX/);
$os = Android if ($data =~ /OS_ANDROID/);
$os = Haiku if ($data =~ /OS_HAIKU/);
$architecture = x86 if ($data =~ /ARCH_X86/);
$architecture = x86_64 if ($data =~ /ARCH_X86_64/);
@@ -204,9 +205,9 @@ $binformat = bin64 if ($data =~ /BINARY_64/);
$no_avx512= 0;
if (($architecture eq "x86") || ($architecture eq "x86_64")) {
$code = '"vbroadcastss -4 * 4(%rsi), %zmm2"';
print $tmpf "int main(void){ __asm__ volatile($code); }\n";
$args = " -o $tmpf.o -x c $tmpf";
my @cmd = ("$compiler_name $args");
print $tmpf "#include <immintrin.h>\n\nint main(void){ __asm__ volatile($code); }\n";
$args = " -march=skylake-avx512 -o $tmpf.o -x c $tmpf";
my @cmd = ("$compiler_name $args >/dev/null 2>/dev/null");
system(@cmd) == 0;
if ($? != 0) {
$no_avx512 = 1;
@@ -223,7 +224,6 @@ $data =~ /globl\s([_\.]*)(.*)/;
$need_fu = $1;
$cross = 0;
$cross = 1 if ($os ne $hostos);
if ($architecture ne $hostarch) {
$cross = 1;
@@ -231,6 +231,8 @@ if ($architecture ne $hostarch) {
$cross = 0 if (($hostarch eq "mips64") && ($architecture eq "mips"));
}
$cross = 1 if ($os ne $hostos);
$openmp = "" if $ENV{USE_OPENMP} != 1;
$linker_L = "";

View File

@@ -51,7 +51,8 @@ typedef enum CBLAS_TRANSPOSE {CblasNoTrans=111, CblasTrans=112, CblasConjTrans=1
typedef enum CBLAS_UPLO {CblasUpper=121, CblasLower=122} CBLAS_UPLO;
typedef enum CBLAS_DIAG {CblasNonUnit=131, CblasUnit=132} CBLAS_DIAG;
typedef enum CBLAS_SIDE {CblasLeft=141, CblasRight=142} CBLAS_SIDE;
typedef CBLAS_ORDER CBLAS_LAYOUT;
float cblas_sdsdot(OPENBLAS_CONST blasint n, OPENBLAS_CONST float alpha, OPENBLAS_CONST float *x, OPENBLAS_CONST blasint incx, OPENBLAS_CONST float *y, OPENBLAS_CONST blasint incy);
double cblas_dsdot (OPENBLAS_CONST blasint n, OPENBLAS_CONST float *x, OPENBLAS_CONST blasint incx, OPENBLAS_CONST float *y, OPENBLAS_CONST blasint incy);
float cblas_sdot(OPENBLAS_CONST blasint n, OPENBLAS_CONST float *x, OPENBLAS_CONST blasint incx, OPENBLAS_CONST float *y, OPENBLAS_CONST blasint incy);
@@ -82,6 +83,11 @@ CBLAS_INDEX cblas_idamax(OPENBLAS_CONST blasint n, OPENBLAS_CONST double *x, OPE
CBLAS_INDEX cblas_icamax(OPENBLAS_CONST blasint n, OPENBLAS_CONST void *x, OPENBLAS_CONST blasint incx);
CBLAS_INDEX cblas_izamax(OPENBLAS_CONST blasint n, OPENBLAS_CONST void *x, OPENBLAS_CONST blasint incx);
CBLAS_INDEX cblas_isamin(OPENBLAS_CONST blasint n, OPENBLAS_CONST float *x, OPENBLAS_CONST blasint incx);
CBLAS_INDEX cblas_idamin(OPENBLAS_CONST blasint n, OPENBLAS_CONST double *x, OPENBLAS_CONST blasint incx);
CBLAS_INDEX cblas_icamin(OPENBLAS_CONST blasint n, OPENBLAS_CONST void *x, OPENBLAS_CONST blasint incx);
CBLAS_INDEX cblas_izamin(OPENBLAS_CONST blasint n, OPENBLAS_CONST void *x, OPENBLAS_CONST blasint incx);
void cblas_saxpy(OPENBLAS_CONST blasint n, OPENBLAS_CONST float alpha, OPENBLAS_CONST float *x, OPENBLAS_CONST blasint incx, float *y, OPENBLAS_CONST blasint incy);
void cblas_daxpy(OPENBLAS_CONST blasint n, OPENBLAS_CONST double alpha, OPENBLAS_CONST double *x, OPENBLAS_CONST blasint incx, double *y, OPENBLAS_CONST blasint incy);
void cblas_caxpy(OPENBLAS_CONST blasint n, OPENBLAS_CONST void *alpha, OPENBLAS_CONST void *x, OPENBLAS_CONST blasint incx, void *y, OPENBLAS_CONST blasint incy);

View File

@@ -0,0 +1,79 @@
# OpenBLASConfig.cmake
# --------------------
#
# OpenBLAS cmake module.
# This module sets the following variables in your project::
#
# OpenBLAS_FOUND - true if OpenBLAS and all required components found on the system
# OpenBLAS_VERSION - OpenBLAS version in format Major.Minor.Release
# OpenBLAS_INCLUDE_DIRS - Directory where OpenBLAS header is located.
# OpenBLAS_INCLUDE_DIR - same as DIRS
# OpenBLAS_LIBRARIES - OpenBLAS library to link against.
# OpenBLAS_LIBRARY - same as LIBRARIES
#
#
# Available components::
#
## shared - search for only shared library
## static - search for only static library
# serial - search for unthreaded library
# pthread - search for native pthread threaded library
# openmp - search for OpenMP threaded library
#
#
# Exported targets::
#
# If OpenBLAS is found, this module defines the following :prop_tgt:`IMPORTED`
## target. Target is shared _or_ static, so, for both, use separate, not
## overlapping, installations. ::
#
# OpenBLAS::OpenBLAS - the main OpenBLAS library #with header & defs attached.
#
#
# Suggested usage::
#
# find_package(OpenBLAS)
# find_package(OpenBLAS 0.2.20 EXACT CONFIG REQUIRED COMPONENTS pthread)
#
#
# The following variables can be set to guide the search for this package::
#
# OpenBLAS_DIR - CMake variable, set to directory containing this Config file
# CMAKE_PREFIX_PATH - CMake variable, set to root directory of this package
# PATH - environment variable, set to bin directory of this package
# CMAKE_DISABLE_FIND_PACKAGE_OpenBLAS - CMake variable, disables
# find_package(OpenBLAS) when not REQUIRED, perhaps to force internal build
@PACKAGE_INIT@
set(PN OpenBLAS)
# need to check that the @USE_*@ evaluate to something cmake can perform boolean logic upon
if(@USE_OPENMP@)
set(${PN}_openmp_FOUND 1)
elseif(@USE_THREAD@)
set(${PN}_pthread_FOUND 1)
else()
set(${PN}_serial_FOUND 1)
endif()
check_required_components(${PN})
#-----------------------------------------------------------------------------
# Don't include targets if this file is being picked up by another
# project which has already built this as a subproject
#-----------------------------------------------------------------------------
if(NOT TARGET ${PN}::OpenBLAS)
include("${CMAKE_CURRENT_LIST_DIR}/${PN}Targets.cmake")
get_property(_loc TARGET ${PN}::OpenBLAS PROPERTY LOCATION)
set(${PN}_LIBRARY ${_loc})
get_property(_ill TARGET ${PN}::OpenBLAS PROPERTY INTERFACE_LINK_LIBRARIES)
set(${PN}_LIBRARIES ${_ill})
get_property(_id TARGET ${PN}::OpenBLAS PROPERTY INCLUDE_DIRECTORIES)
set(${PN}_INCLUDE_DIR ${_id})
get_property(_iid TARGET ${PN}::OpenBLAS PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
set(${PN}_INCLUDE_DIRS ${_iid})
endif()

View File

@@ -49,7 +49,18 @@ if (DYNAMIC_ARCH)
endif ()
if (X86_64)
set(DYNAMIC_CORE PRESCOTT CORE2 PENRYN DUNNINGTON NEHALEM OPTERON OPTERON_SSE3 BARCELONA BOBCAT ATOM NANO)
set(DYNAMIC_CORE PRESCOTT CORE2)
if (DYNAMIC_OLDER)
set (DYNAMIC_CORE ${DYNAMIC_CORE} PENRYN DUNNINGTON)
endif ()
set (DYNAMIC_CORE ${DYNAMIC_CORE} NEHALEM)
if (DYNAMIC_OLDER)
set (DYNAMIC_CORE ${DYNAMIC_CORE} OPTERON OPTERON_SSE3)
endif ()
set (DYNAMIC_CORE ${DYNAMIC_CORE} BARCELONA)
if (DYNAMIC_OLDER)
set (DYNAMIC_CORE ${DYNAMIC_CORE} BOBCAT ATOM NANO)
endif ()
if (NOT NO_AVX)
set(DYNAMIC_CORE ${DYNAMIC_CORE} SANDYBRIDGE BULLDOZER PILEDRIVER STEAMROLLER EXCAVATOR)
endif ()

View File

@@ -3,6 +3,11 @@
## Description: Ported from portion of OpenBLAS/Makefile.system
## Sets Fortran related variables.
if (INTERFACE64)
set(SUFFIX64 64)
set(SUFFIX64_UNDERSCORE _64)
endif()
if (${F_COMPILER} STREQUAL "FLANG")
set(CCOMMON_OPT "${CCOMMON_OPT} -DF_INTERFACE_FLANG")
if (BINARY64 AND INTERFACE64)

View File

@@ -1,10 +1,11 @@
libdir=@CMAKE_INSTALL_FULL_LIBDIR@
libsuffix=@SUFFIX64_UNDERSCORE@
includedir=@CMAKE_INSTALL_FULL_INCLUDEDIR@
openblas_config=USE_64BITINT=@USE_64BITINT@ NO_CBLAS=@NO_CBLAS@ NO_LAPACK=@NO_LAPACK@ NO_LAPACKE=@NO_LAPACKE@ DYNAMIC_ARCH=@DYNAMIC_ARCH@ NO_AFFINITY=@NO_AFFINITY@ USE_OPENMP=@USE_OPENMP@ @CORE@ MAX_THREADS=@NUM_THREADS@
openblas_config=USE_64BITINT=@USE_64BITINT@ NO_CBLAS=@NO_CBLAS@ NO_LAPACK=@NO_LAPACK@ NO_LAPACKE=@NO_LAPACKE@ DYNAMIC_ARCH=@DYNAMIC_ARCH@ DYNAMIC_OLDER=@DYNAMIC_OLDER@ NO_AFFINITY=@NO_AFFINITY@ USE_OPENMP=@USE_OPENMP@ @CORE@ MAX_THREADS=@NUM_THREADS@
Name: OpenBLAS
Description: OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version
Version: @OPENBLAS_VERSION@
URL: https://github.com/xianyi/OpenBLAS
Libs: -L${libdir} -lopenblas
Libs: -L${libdir} -lopenblas${libsuffix}
Cflags: -I${includedir}

View File

@@ -85,7 +85,7 @@ if (NOT NOFORTRAN)
endif ()
# Cannot run getarch on target if we are cross-compiling
if (DEFINED CORE AND CMAKE_CROSSCOMPILING)
if (DEFINED CORE AND CMAKE_CROSSCOMPILING AND NOT (${HOST_OS} STREQUAL "WINDOWSSTORE"))
# Write to config as getarch would
# TODO: Set up defines that getarch sets up based on every other target

View File

@@ -163,6 +163,9 @@ endif ()
if (DYNAMIC_ARCH)
set(CCOMMON_OPT "${CCOMMON_OPT} -DDYNAMIC_ARCH")
if (DYNAMIC_OLDER)
set(CCOMMON_OPT "${CCOMMON_OPT} -DDYNAMIC_OLDER")
endif ()
endif ()
if (NO_LAPACK)
@@ -211,6 +214,10 @@ if (CONSISTENT_FPCSR)
set(CCOMMON_OPT "${CCOMMON_OPT} -DCONSISTENT_FPCSR")
endif ()
if (USE_TLS)
set(CCOMMON_OPT "${CCOMMON_OPT} -DUSE_TLS")
endif ()
# Only for development
# set(CCOMMON_OPT "${CCOMMON_OPT} -DPARAMTEST")
# set(CCOMMON_OPT "${CCOMMON_OPT} -DPREFETCHTEST")

View File

@@ -67,8 +67,8 @@ else()
endif()
if (X86_64 OR X86)
file(WRITE ${PROJECT_BINARY_DIR}/avx512.tmp "int main(void){ __asm__ volatile(\"vbroadcastss -4 * 4(%rsi), %zmm2\"); }")
execute_process(COMMAND ${CMAKE_C_COMPILER} -v -o ${PROJECT_BINARY_DIR}/avx512.o -x c ${PROJECT_BINARY_DIR}/avx512.tmp RESULT_VARIABLE NO_AVX512)
file(WRITE ${PROJECT_BINARY_DIR}/avx512.tmp "#include <immintrin.h>\n\nint main(void){ __asm__ volatile(\"vbroadcastss -4 * 4(%rsi), %zmm2\"); }")
execute_process(COMMAND ${CMAKE_C_COMPILER} -march=skylake-avx512 -v -o ${PROJECT_BINARY_DIR}/avx512.o -x c ${PROJECT_BINARY_DIR}/avx512.tmp OUTPUT_QUIET ERROR_QUIET RESULT_VARIABLE NO_AVX512)
if (NO_AVX512 EQUAL 1)
set (CCOMMON_OPT "${CCOMMON_OPT} -DNO_AVX512")
endif()

View File

@@ -105,6 +105,10 @@ extern "C" {
#endif
#endif
#ifdef OS_HAIKU
#define NO_SYSV_IPC
#endif
#ifdef OS_WINDOWS
#ifdef ATOM
#define GOTO_ATOM ATOM
@@ -253,8 +257,14 @@ typedef unsigned long BLASULONG;
#ifdef USE64BITINT
typedef BLASLONG blasint;
#if defined(OS_WINDOWS) && defined(__64BIT__)
#define blasabs(x) llabs(x)
#else
#define blasabs(x) labs(x)
#endif
#else
typedef int blasint;
#define blasabs(x) abs(x)
#endif
#else
#ifdef USE64BITINT

View File

@@ -94,7 +94,7 @@ static inline unsigned int rpcc(void){
#define RPCC_DEFINED
#ifndef NO_AFFINITY
#define WHEREAMI
//#define WHEREAMI
static inline int WhereAmI(void){
int ret=0;
__asm__ __volatile__(".set push \n"

View File

@@ -47,14 +47,15 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
* - large enough to support all architectures and kernel
* Chosing a too small SIZE will lead to a stack smashing.
*/
#define STACK_ALLOC(SIZE, TYPE, BUFFER) \
/* make it volatile because some function (ex: dgemv_n.S) */ \
/* do not restore all register */ \
volatile int stack_alloc_size = SIZE; \
if(stack_alloc_size > MAX_STACK_ALLOC / sizeof(TYPE)) \
stack_alloc_size = 0; \
STACK_ALLOC_PROTECT_SET \
TYPE stack_buffer[stack_alloc_size] __attribute__((aligned(0x20))); \
#define STACK_ALLOC(SIZE, TYPE, BUFFER) \
/* make it volatile because some function (ex: dgemv_n.S) */ \
/* do not restore all register */ \
volatile int stack_alloc_size = SIZE; \
if (stack_alloc_size > MAX_STACK_ALLOC / sizeof(TYPE)) stack_alloc_size = 0; \
STACK_ALLOC_PROTECT_SET \
/* Avoid declaring an array of length 0 */ \
TYPE stack_buffer[stack_alloc_size ? stack_alloc_size : 1] \
__attribute__((aligned(0x20))); \
BUFFER = stack_alloc_size ? stack_buffer : (TYPE *)blas_memory_alloc(1);
#else
//Original OpenBLAS/GotoBLAS codes.

View File

@@ -60,8 +60,13 @@
#endif
*/
#define MB
#define WMB
#ifdef __GNUC__
#define MB do { __asm__ __volatile__("": : :"memory"); } while (0)
#define WMB do { __asm__ __volatile__("": : :"memory"); } while (0)
#else
#define MB do {} while (0)
#define WMB do {} while (0)
#endif
static void __inline blas_lock(volatile BLASULONG *address){

View File

@@ -142,6 +142,52 @@ int detect(void){
return CPUTYPE_PPC970;
#endif
#if defined(__FreeBSD__) || defined(__OpenBSD__) || defined(__DragonFly__)
int id;
id = __asm __volatile("mfpvr %0" : "=r"(id));
switch ( id >> 16 ) {
case 0x4e: // POWER9
return return CPUTYPE_POWER8;
break;
case 0x4d:
case 0x4b: // POWER8/8E
return CPUTYPE_POWER8;
break;
case 0x4a:
case 0x3f: // POWER7/7E
return CPUTYPE_POWER6;
break;
case 0x3e:
return CPUTYPE_POWER6;
break;
case 0x3a:
return CPUTYPE_POWER5;
break;
case 0x35:
case 0x38: // POWER4 /4+
return CPUTYPE_POWER4;
break;
case 0x40:
case 0x41: // POWER3 /3+
return CPUTYPE_POWER3;
break;
case 0x39:
case 0x3c:
case 0x44:
case 0x45:
return CPUTYPE_PPC970;
break;
case 0x70:
return CPUTYPE_CELL;
break;
case 0x8003:
return CPUTYPE_PPCG4;
break;
default:
return CPUTYPE_UNKNOWN;
}
#endif
}
void get_architecture(void){

View File

@@ -1339,6 +1339,23 @@ int get_cpuname(void){
return CPUTYPE_NEHALEM;
}
break;
case 6:
switch (model) {
case 6: // Cannon Lake
#ifndef NO_AVX512
return CPUTYPE_SKYLAKEX;
#else
if(support_avx())
#ifndef NO_AVX2
return CPUTYPE_HASWELL;
#else
return CPUTYPE_SANDYBRIDGE;
#endif
else
return CPUTYPE_NEHALEM;
#endif
}
break;
case 9:
case 8:
switch (model) {
@@ -1435,6 +1452,8 @@ int get_cpuname(void){
switch (model) {
case 1:
// AMD Ryzen
case 8:
// AMD Ryzen2
if(support_avx())
#ifndef NO_AVX2
return CPUTYPE_ZEN;

View File

@@ -29,15 +29,18 @@
#define CPU_GENERIC 0
#define CPU_Z13 1
#define CPU_Z14 2
static char *cpuname[] = {
"ZARCH_GENERIC",
"Z13"
"Z13",
"Z14"
};
static char *cpuname_lower[] = {
"zarch_generic",
"z13"
"z13",
"z14"
};
int detect(void)
@@ -62,6 +65,10 @@ int detect(void)
if (strstr(p, "2964")) return CPU_Z13;
if (strstr(p, "2965")) return CPU_Z13;
/* detect z14, but fall back to z13 */
if (strstr(p, "3906")) return CPU_Z13;
if (strstr(p, "3907")) return CPU_Z13;
return CPU_GENERIC;
}
@@ -107,5 +114,9 @@ void get_cpuconfig(void)
printf("#define Z13\n");
printf("#define DTB_DEFAULT_ENTRIES 64\n");
break;
case CPU_Z14:
printf("#define Z14\n");
printf("#define DTB_DEFAULT_ENTRIES 64\n");
break;
}
}

View File

@@ -101,6 +101,10 @@ OS_INTERIX
OS_LINUX
#endif
#if defined(__HAIKU__)
OS_HAIKU
#endif
#if defined(__i386) || defined(_X86)
ARCH_X86
#endif

View File

@@ -102,7 +102,13 @@ clean ::
rm -f x*
FLDFLAGS = $(FFLAGS:-fPIC=) $(LDFLAGS)
CEXTRALIB =
ifeq ($(USE_OPENMP), 1)
ifeq ($(F_COMPILER), GFORTRAN)
ifeq ($(C_COMPILER), CLANG)
CEXTRALIB = -lomp
endif
endif
endif
# Single real
xscblat1: $(stestl1o) c_sblat1.o $(TOPDIR)/$(LIBNAME)

View File

@@ -91,11 +91,7 @@
#endif
typedef struct {
#if __STDC_VERSION__ >= 201112L
_Atomic
#else
volatile
#endif
BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;
@@ -351,7 +347,7 @@ static int inner_thread(blas_arg_t *args, BLASLONG *range_m, BLASLONG *range_n,
/* Make sure if no one is using workspace */
START_RPCC();
for (i = 0; i < args -> nthreads; i++)
while (job[mypos].working[i][CACHE_LINE_SIZE * bufferside]) {YIELDING;};
while (job[mypos].working[i][CACHE_LINE_SIZE * bufferside]) {YIELDING;MB;};
STOP_RPCC(waiting1);
#if defined(FUSED_GEMM) && !defined(TIMING)
@@ -413,7 +409,7 @@ static int inner_thread(blas_arg_t *args, BLASLONG *range_m, BLASLONG *range_n,
/* Wait until other region of B is initialized */
START_RPCC();
while(job[current].working[mypos][CACHE_LINE_SIZE * bufferside] == 0) {YIELDING;};
while(job[current].working[mypos][CACHE_LINE_SIZE * bufferside] == 0) {YIELDING;MB;};
STOP_RPCC(waiting2);
/* Apply kernel with local region of A and part of other region of B */
@@ -431,6 +427,7 @@ static int inner_thread(blas_arg_t *args, BLASLONG *range_m, BLASLONG *range_n,
/* Clear synchronization flag if this thread is done with other region of B */
if (m_to - m_from == min_i) {
job[current].working[mypos][CACHE_LINE_SIZE * bufferside] &= 0;
WMB;
}
}
} while (current != mypos);
@@ -492,7 +489,7 @@ static int inner_thread(blas_arg_t *args, BLASLONG *range_m, BLASLONG *range_n,
START_RPCC();
for (i = 0; i < args -> nthreads; i++) {
for (js = 0; js < DIVIDE_RATE; js++) {
while (job[mypos].working[i][CACHE_LINE_SIZE * js] ) {YIELDING;};
while (job[mypos].working[i][CACHE_LINE_SIZE * js] ) {YIELDING;MB;};
}
}
STOP_RPCC(waiting3);
@@ -658,8 +655,8 @@ static int gemm_driver(blas_arg_t *args, BLASLONG *range_m, BLASLONG
}
/* Clear synchronization flags */
for (i = 0; i < MAX_CPU_NUMBER; i++) {
for (j = 0; j < MAX_CPU_NUMBER; j++) {
for (i = 0; i < nthreads; i++) {
for (j = 0; j < nthreads; j++) {
for (k = 0; k < DIVIDE_RATE; k++) {
job[i].working[j][CACHE_LINE_SIZE * k] = 0;
}

View File

@@ -70,7 +70,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
/*********************************************************************/
#include "common.h"
#if defined(OS_LINUX) || defined(OS_NETBSD) || defined(OS_DARWIN) || defined(OS_ANDROID) || defined(OS_SUNOS) || defined(OS_FREEBSD) || defined(OS_OPENBSD) || defined(OS_DRAGONFLY)
#if defined(OS_LINUX) || defined(OS_NETBSD) || defined(OS_DARWIN) || defined(OS_ANDROID) || defined(OS_SUNOS) || defined(OS_FREEBSD) || defined(OS_OPENBSD) || defined(OS_DRAGONFLY) || defined(OS_HAIKU)
#include <dlfcn.h>
#include <signal.h>
#include <sys/resource.h>
@@ -582,7 +582,7 @@ int blas_thread_init(void){
if(ret!=0){
struct rlimit rlim;
const char *msg = strerror(ret);
fprintf(STDERR, "OpenBLAS blas_thread_init: pthread_create: %s\n", msg);
fprintf(STDERR, "OpenBLAS blas_thread_init: pthread_create failed for thread %ld of %ld: %s\n", i+1,blas_num_threads,msg);
#ifdef RLIMIT_NPROC
if(0 == getrlimit(RLIMIT_NPROC, &rlim)) {
fprintf(STDERR, "OpenBLAS blas_thread_init: RLIMIT_NPROC "

View File

@@ -48,6 +48,10 @@
#else
#ifndef OMP_SCHED
#define OMP_SCHED static
#endif
int blas_server_avail = 0;
static void * blas_thread_buffer[MAX_PARALLEL_NUMBER][MAX_CPU_NUMBER];
@@ -331,7 +335,7 @@ int exec_blas(BLASLONG num, blas_queue_t *queue){
break;
}
#pragma omp parallel for schedule(static)
#pragma omp parallel for schedule(OMP_SCHED)
for (i = 0; i < num; i ++) {
#ifndef USE_SIMPLE_THREADED_LEVEL3

View File

@@ -49,6 +49,167 @@
#define EXTERN
#endif
#ifdef DYNAMIC_LIST
extern gotoblas_t gotoblas_PRESCOTT;
#ifdef DYN_ATHLON
extern gotoblas_t gotoblas_ATHLON;
#else
#define gotoblas_ATHLON gotoblas_PRESCOTT
#endif
#ifdef DYN_KATMAI
extern gotoblas_t gotoblas_KATMAI;
#else
#define gotoblas_KATMAI gotoblas_PRESCOTT
#endif
#ifdef DYN_BANIAS
extern gotoblas_t gotoblas_BANIAS;
#else
#define gotoblas_BANIAS gotoblas_PRESCOTT
#endif
#ifdef DYN_COPPERMINE
extern gotoblas_t gotoblas_COPPERMINE;
#else
#define gotoblas_COPPERMINE gotoblas_PRESCOTT
#endif
#ifdef DYN_NORTHWOOD
extern gotoblas_t gotoblas_NORTHWOOD;
#else
#define gotoblas_NORTHWOOD gotoblas_PRESCOTT
#endif
#ifdef DYN_CORE2
extern gotoblas_t gotoblas_CORE2;
#else
#define gotoblas_CORE2 gotoblas_PRESCOTT
#endif
#ifdef DYN_NEHALEM
extern gotoblas_t gotoblas_NEHALEM;
#else
#define gotoblas_NEHALEM gotoblas_PRESCOTT
#endif
#ifdef DYN_BARCELONA
extern gotoblas_t gotoblas_BARCELONA;
#elif defined(DYN_NEHALEM)
#define gotoblas_BARCELONA gotoblas_NEHALEM
#else
#define gotoblas_BARCELONA gotoblas_PRESCOTT
#endif
#ifdef DYN_ATOM
extern gotoblas_t gotoblas_ATOM;
elif defined(DYN_NEHALEM)
#define gotoblas_ATOM gotoblas_NEHALEM
#else
#define gotoblas_ATOM gotoblas_PRESCOTT
#endif
#ifdef DYN_NANO
extern gotoblas_t gotoblas_NANO;
#else
#define gotoblas_NANO gotoblas_PRESCOTT
#endif
#ifdef DYN_PENRYN
extern gotoblas_t gotoblas_PENRYN;
#else
#define gotoblas_PENRYN gotoblas_PRESCOTT
#endif
#ifdef DYN_DUNNINGTON
extern gotoblas_t gotoblas_DUNNINGTON;
#else
#define gotoblas_DUNNINGTON gotoblas_PRESCOTT
#endif
#ifdef DYN_OPTERON
extern gotoblas_t gotoblas_OPTERON;
#else
#define gotoblas_OPTERON gotoblas_PRESCOTT
#endif
#ifdef DYN_OPTERON_SSE3
extern gotoblas_t gotoblas_OPTERON_SSE3;
#else
#define gotoblas_OPTERON_SSE3 gotoblas_PRESCOTT
#endif
#ifdef DYN_BOBCAT
extern gotoblas_t gotoblas_BOBCAT;
#elif defined(DYN_NEHALEM)
#define gotoblas_BOBCAT gotoblas_NEHALEM
#else
#define gotoblas_BOBCAT gotoblas_PRESCOTT
#endif
#ifdef DYN_SANDYBRIDGE
extern gotoblas_t gotoblas_SANDYBRIDGE;
#elif defined(DYN_NEHALEM)
#define gotoblas_SANDYBRIDGE gotoblas_NEHALEM
#else
#define gotoblas_SANDYBRIDGE gotoblas_PRESCOTT
#endif
#ifdef DYN_BULLDOZER
extern gotoblas_t gotoblas_BULLDOZER;
#elif defined(DYN_SANDYBRIDGE)
#define gotoblas_BULLDOZER gotoblas_SANDYBRIDGE
#elif defined(DYN_NEHALEM)
#define gotoblas_BULLDOZER gotoblas_NEHALEM
#else
#define gotoblas_BULLDOZER gotoblas_PRESCOTT
#endif
#ifdef DYN_PILEDRIVER
extern gotoblas_t gotoblas_PILEDRIVER;
#elif defined(DYN_SANDYBRIDGE)
#define gotoblas_PILEDRIVER gotoblas_SANDYBRIDGE
#elif defined(DYN_NEHALEM)
#define gotoblas_PILEDRIVER gotoblas_NEHALEM
#else
#define gotoblas_PILEDRIVER gotoblas_PRESCOTT
#endif
#ifdef DYN_STEAMROLLER
extern gotoblas_t gotoblas_STEAMROLLER;
#elif defined(DYN_SANDYBRIDGE)
#define gotoblas_STEAMROLLER gotoblas_SANDYBRIDGE
#elif defined(DYN_NEHALEM)
#define gotoblas_STEAMROLLER gotoblas_NEHALEM
#else
#define gotoblas_STEAMROLLER gotoblas_PRESCOTT
#endif
#ifdef DYN_EXCAVATOR
extern gotoblas_t gotoblas_EXCAVATOR;
#elif defined(DYN_SANDYBRIDGE)
#define gotoblas_EXCAVATOR gotoblas_SANDYBRIDGE
#elif defined(DYN_NEHALEM)
#define gotoblas_EXCAVATOR gotoblas_NEHALEM
#else
#define gotoblas_EXCAVATOR gotoblas_PRESCOTT
#endif
#ifdef DYN_HASWELL
extern gotoblas_t gotoblas_HASWELL;
#elif defined(DYN_SANDYBRIDGE)
#define gotoblas_HASWELL gotoblas_SANDYBRIDGE
#elif defined(DYN_NEHALEM)
#define gotoblas_HASWELL gotoblas_NEHALEM
#else
#define gotoblas_HASWELL gotoblas_PRESCOTT
#endif
#ifdef DYN_ZEN
extern gotoblas_t gotoblas_ZEN;
#elif defined(DYN_HASWELL)
#define gotoblas_ZEN gotoblas_HASWELL
#elif defined(DYN_SANDYBRIDGE)
#define gotoblas_ZEN gotoblas_SANDYBRIDGE
#elif defined(DYN_NEHALEM)
#define gotoblas_ZEN gotoblas_NEHALEM
#else
#define gotoblas_ZEN gotoblas_PRESCOTT
#endif
#ifdef DYN_SKYLAKEX
extern gotoblas_t gotoblas_SKYLAKEX;
#elif defined(DYN_HASWELL)
#define gotoblas_SKYLAKEX gotoblas_HASWELL
#elif defined(DYN_SANDYBRIDGE)
#define gotoblas_SKYLAKEX gotoblas_SANDYBRIDGE
#elif defined(DYN_NEHALEM)
#define gotoblas_SKYLAKEX gotoblas_NEHALEM
#else
#define gotoblas_SKYLAKEX gotoblas_PRESCOTT
#endif
#else // not DYNAMIC_LIST
EXTERN gotoblas_t gotoblas_KATMAI;
EXTERN gotoblas_t gotoblas_COPPERMINE;
EXTERN gotoblas_t gotoblas_NORTHWOOD;
@@ -56,16 +217,27 @@ EXTERN gotoblas_t gotoblas_BANIAS;
EXTERN gotoblas_t gotoblas_ATHLON;
extern gotoblas_t gotoblas_PRESCOTT;
extern gotoblas_t gotoblas_CORE2;
extern gotoblas_t gotoblas_NEHALEM;
extern gotoblas_t gotoblas_BARCELONA;
#ifdef DYNAMIC_OLDER
extern gotoblas_t gotoblas_ATOM;
extern gotoblas_t gotoblas_NANO;
extern gotoblas_t gotoblas_CORE2;
extern gotoblas_t gotoblas_PENRYN;
extern gotoblas_t gotoblas_DUNNINGTON;
extern gotoblas_t gotoblas_NEHALEM;
extern gotoblas_t gotoblas_OPTERON;
extern gotoblas_t gotoblas_OPTERON_SSE3;
extern gotoblas_t gotoblas_BARCELONA;
extern gotoblas_t gotoblas_BOBCAT;
#else
#define gotoblas_ATOM gotoblas_NEHALEM
#define gotoblas_NANO gotoblas_NEHALEM
#define gotoblas_PENRYN gotoblas_CORE2
#define gotoblas_DUNNINGTON gotoblas_CORE2
#define gotoblas_OPTERON gotoblas_CORE2
#define gotoblas_OPTERON_SSE3 gotoblas_CORE2
#define gotoblas_BOBCAT gotoblas_CORE2
#endif
#ifndef NO_AVX
extern gotoblas_t gotoblas_SANDYBRIDGE;
extern gotoblas_t gotoblas_BULLDOZER;
@@ -97,6 +269,7 @@ extern gotoblas_t gotoblas_SKYLAKEX;
#define gotoblas_ZEN gotoblas_BARCELONA
#endif
#endif // DYNAMIC_LIST
#define VENDOR_INTEL 1
#define VENDOR_AMD 2
@@ -327,6 +500,23 @@ static gotoblas_t *get_coretype(void){
return &gotoblas_NEHALEM;
}
return NULL;
case 6:
if (model == 6) {
// Cannon Lake
#ifndef NO_AVX512
return &gotoblas_SKYLAKEX;
#else
if(support_avx())
#ifndef NO_AVX2
return &gotoblas_HASWELL;
#else
return &gotoblas_SANDYBRIDGE;
#endif
else
return &gotoblas_NEHALEM;
#endif
}
return NULL;
case 9:
case 8:
if (model == 14 ) { // Kaby Lake
@@ -417,7 +607,7 @@ static gotoblas_t *get_coretype(void){
}
}
} else if (exfamily == 8) {
if (model == 1) {
if (model == 1 || model == 8) {
if(support_avx())
return &gotoblas_ZEN;
else{

File diff suppressed because it is too large Load Diff

View File

@@ -35,6 +35,12 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#include <string.h>
#if defined(_WIN32) && defined(_MSC_VER)
#if _MSC_VER < 1900
#define snprintf _snprintf
#endif
#endif
static char* openblas_config_str=""
#ifdef USE64BITINT
"USE64BITINT "

View File

@@ -114,15 +114,15 @@ $(LIBDYNNAME) : ../$(LIBNAME).osx.renamed osx.def
endif
ifneq (,$(filter 1 2,$(NOFORTRAN)))
#only build without Fortran
$(CC) $(CFLAGS) -all_load -headerpad_max_install_names -install_name "$(CURDIR)/../$(LIBDYNNAME)" -dynamiclib -o ../$(LIBDYNNAME) $< -Wl,-exported_symbols_list,osx.def $(FEXTRALIB)
$(CC) $(CFLAGS) $(LDFLAGS) -all_load -headerpad_max_install_names -install_name "$(CURDIR)/../$(LIBDYNNAME)" -dynamiclib -o ../$(LIBDYNNAME) $< -Wl,-exported_symbols_list,osx.def $(FEXTRALIB)
else
$(FC) $(FFLAGS) -all_load -headerpad_max_install_names -install_name "$(CURDIR)/../$(LIBDYNNAME)" -dynamiclib -o ../$(LIBDYNNAME) $< -Wl,-exported_symbols_list,osx.def $(FEXTRALIB)
$(FC) $(FFLAGS) $(LDFLAGS) -all_load -headerpad_max_install_names -install_name "$(CURDIR)/../$(LIBDYNNAME)" -dynamiclib -o ../$(LIBDYNNAME) $< -Wl,-exported_symbols_list,osx.def $(FEXTRALIB)
endif
dllinit.$(SUFFIX) : dllinit.c
$(CC) $(CFLAGS) -c -o $(@F) -s $<
ifeq ($(OSNAME), $(filter $(OSNAME),Linux SunOS Android))
ifeq ($(OSNAME), $(filter $(OSNAME),Linux SunOS Android Haiku))
so : ../$(LIBSONAME)

View File

@@ -260,7 +260,7 @@ HPLOBJS = dgemm.$(SUFFIX) dtrsm.$(SUFFIX) \
idamax.$(SUFFIX) daxpy.$(SUFFIX) dcopy.$(SUFFIX) dscal.$(SUFFIX)
CSBLAS1OBJS = \
cblas_isamax.$(SUFFIX) cblas_sasum.$(SUFFIX) cblas_saxpy.$(SUFFIX) \
cblas_isamax.$(SUFFIX) cblas_isamin.$(SUFFIX) cblas_sasum.$(SUFFIX) cblas_saxpy.$(SUFFIX) \
cblas_scopy.$(SUFFIX) cblas_sdot.$(SUFFIX) cblas_sdsdot.$(SUFFIX) cblas_dsdot.$(SUFFIX) \
cblas_srot.$(SUFFIX) cblas_srotg.$(SUFFIX) cblas_srotm.$(SUFFIX) cblas_srotmg.$(SUFFIX) \
cblas_sscal.$(SUFFIX) cblas_sswap.$(SUFFIX) cblas_snrm2.$(SUFFIX) cblas_saxpby.$(SUFFIX)
@@ -277,7 +277,7 @@ CSBLAS3OBJS = \
cblas_sgeadd.$(SUFFIX)
CDBLAS1OBJS = \
cblas_idamax.$(SUFFIX) cblas_dasum.$(SUFFIX) cblas_daxpy.$(SUFFIX) \
cblas_idamax.$(SUFFIX) cblas_idamin.$(SUFFIX) cblas_dasum.$(SUFFIX) cblas_daxpy.$(SUFFIX) \
cblas_dcopy.$(SUFFIX) cblas_ddot.$(SUFFIX) \
cblas_drot.$(SUFFIX) cblas_drotg.$(SUFFIX) cblas_drotm.$(SUFFIX) cblas_drotmg.$(SUFFIX) \
cblas_dscal.$(SUFFIX) cblas_dswap.$(SUFFIX) cblas_dnrm2.$(SUFFIX) cblas_daxpby.$(SUFFIX)
@@ -294,7 +294,7 @@ CDBLAS3OBJS += \
cblas_dgeadd.$(SUFFIX)
CCBLAS1OBJS = \
cblas_icamax.$(SUFFIX) cblas_scasum.$(SUFFIX) cblas_caxpy.$(SUFFIX) \
cblas_icamax.$(SUFFIX) cblas_icamin.$(SUFFIX) cblas_scasum.$(SUFFIX) cblas_caxpy.$(SUFFIX) \
cblas_ccopy.$(SUFFIX) \
cblas_cdotc.$(SUFFIX) cblas_cdotu.$(SUFFIX) \
cblas_cdotc_sub.$(SUFFIX) cblas_cdotu_sub.$(SUFFIX) \
@@ -320,7 +320,7 @@ CCBLAS3OBJS = \
CZBLAS1OBJS = \
cblas_izamax.$(SUFFIX) cblas_dzasum.$(SUFFIX) cblas_zaxpy.$(SUFFIX) \
cblas_izamax.$(SUFFIX) cblas_izamin.$(SUFFIX) cblas_dzasum.$(SUFFIX) cblas_zaxpy.$(SUFFIX) \
cblas_zcopy.$(SUFFIX) \
cblas_zdotc.$(SUFFIX) cblas_zdotu.$(SUFFIX) \
cblas_zdotc_sub.$(SUFFIX) cblas_zdotu_sub.$(SUFFIX) \
@@ -1359,6 +1359,18 @@ cblas_icamax.$(SUFFIX) cblas_icamax.$(PSUFFIX) : imax.c
cblas_izamax.$(SUFFIX) cblas_izamax.$(PSUFFIX) : imax.c
$(CC) $(CFLAGS) -DCBLAS -c -DUSE_ABS -UUSE_MIN $< -o $(@F)
cblas_isamin.$(SUFFIX) cblas_isamin.$(PSUFFIX) : imax.c
$(CC) $(CFLAGS) -DCBLAS -c -DUSE_ABS -DUSE_MIN $< -o $(@F)
cblas_idamin.$(SUFFIX) cblas_idamin.$(PSUFFIX) : imax.c
$(CC) $(CFLAGS) -DCBLAS -c -DUSE_ABS -DUSE_MIN $< -o $(@F)
cblas_icamin.$(SUFFIX) cblas_icamin.$(PSUFFIX) : imax.c
$(CC) $(CFLAGS) -DCBLAS -c -DUSE_ABS -DUSE_MIN $< -o $(@F)
cblas_izamin.$(SUFFIX) cblas_izamin.$(PSUFFIX) : imax.c
$(CC) $(CFLAGS) -DCBLAS -c -DUSE_ABS -DUSE_MIN $< -o $(@F)
cblas_ismax.$(SUFFIX) cblas_ismax.$(PSUFFIX) : imax.c
$(CC) $(CFLAGS) -DCBLAS -c -UUSE_ABS -UUSE_MIN $< -o $(@F)

View File

@@ -40,11 +40,11 @@
#include "common.h"
#ifdef FUNCTION_PROFILE
#include "functable.h"
#endif
#endif
#if defined(Z13)
#define MULTI_THREAD_MINIMAL 200000
#else
#define MULTI_THREAD_MINIMAL 10000
#define MULTI_THREAD_MINIMAL 10000
#endif
#ifndef CBLAS
@@ -83,17 +83,15 @@ void CNAME(blasint n, FLOAT alpha, FLOAT *x, blasint incx, FLOAT *y, blasint inc
if (incy < 0) y -= (n - 1) * incy;
#ifdef SMP
nthreads = num_cpu_avail(1);
//disable multi-thread when incx==0 or incy==0
//In that case, the threads would be dependent.
if (incx == 0 || incy == 0)
nthreads = 1;
//
//Temporarily work-around the low performance issue with small imput size &
//multithreads.
if (n <= MULTI_THREAD_MINIMAL)
if (incx == 0 || incy == 0 || n <= MULTI_THREAD_MINIMAL)
nthreads = 1;
else
nthreads = num_cpu_avail(1);
if (nthreads == 1) {
#endif

View File

@@ -213,7 +213,7 @@ void CNAME(enum CBLAS_ORDER order,
if (trans) lenx = m;
if (trans) leny = n;
if (beta != ONE) SCAL_K(leny, 0, 0, beta, y, abs(incy), NULL, 0, NULL, 0);
if (beta != ONE) SCAL_K(leny, 0, 0, beta, y, blasabs(incy), NULL, 0, NULL, 0);
if (alpha == ZERO) return;

View File

@@ -199,7 +199,7 @@ void CNAME(enum CBLAS_ORDER order,
if (trans) lenx = m;
if (trans) leny = n;
if (beta != ONE) SCAL_K(leny, 0, 0, beta, y, abs(incy), NULL, 0, NULL, 0);
if (beta != ONE) SCAL_K(leny, 0, 0, beta, y, blasabs(incy), NULL, 0, NULL, 0);
if (alpha == ZERO) return;

View File

@@ -97,7 +97,7 @@ int NAME(blasint *N, FLOAT *a, blasint *LDA, blasint *K1, blasint *K2, blasint *
blas_level1_thread(mode, n, k1, k2, dummyalpha,
a, lda, NULL, 0, ipiv, incx,
laswp[flag], nthreads);
(int(*)())laswp[flag], nthreads);
}
#endif

View File

@@ -96,7 +96,7 @@ int NAME(blasint *N, FLOAT *a, blasint *LDA, blasint *K1, blasint *K2, blasint *
mode = BLAS_SINGLE | BLAS_COMPLEX;
#endif
blas_level1_thread(mode, n, k1, k2, dummyalpha, a, lda, NULL, 0, ipiv, incx, laswp[flag], nthreads);
blas_level1_thread(mode, n, k1, k2, dummyalpha, a, lda, NULL, 0, ipiv, incx, (int(*)())laswp[flag], nthreads);
}
#endif

View File

@@ -22,8 +22,8 @@ void CNAME(FLOAT *DA, FLOAT *DB, FLOAT *C, FLOAT *S){
long double s;
long double r, roe, z;
long double ada = fabs(da);
long double adb = fabs(db);
long double ada = fabsl(da);
long double adb = fabsl(db);
long double scale = ada + adb;
#ifndef CBLAS

View File

@@ -184,7 +184,7 @@ void CNAME(enum CBLAS_ORDER order,
if (n == 0) return;
if (beta != ONE) SCAL_K(n, 0, 0, beta, y, abs(incy), NULL, 0, NULL, 0);
if (beta != ONE) SCAL_K(n, 0, 0, beta, y, blasabs(incy), NULL, 0, NULL, 0);
if (alpha == ZERO) return;

View File

@@ -76,10 +76,11 @@ void CNAME(blasint n, FLOAT alpha, FLOAT *x, blasint incx){
#ifdef SMP
nthreads = num_cpu_avail(1);
if (n <= 1048576 )
nthreads = 1;
else
nthreads = num_cpu_avail(1);
if (nthreads == 1) {
#endif

View File

@@ -168,7 +168,7 @@ void CNAME(enum CBLAS_ORDER order,
if (n == 0) return;
if (beta != ONE) SCAL_K(n, 0, 0, beta, y, abs(incy), NULL, 0, NULL, 0);
if (beta != ONE) SCAL_K(n, 0, 0, beta, y, blasabs(incy), NULL, 0, NULL, 0);
if (alpha == ZERO) return;

View File

@@ -166,7 +166,7 @@ void CNAME(enum CBLAS_ORDER order, enum CBLAS_UPLO Uplo, blasint n, FLOAT alpha,
if (n == 0) return;
if (beta != ONE) SCAL_K(n, 0, 0, beta, y, abs(incy), NULL, 0, NULL, 0);
if (beta != ONE) SCAL_K(n, 0, 0, beta, y, blasabs(incy), NULL, 0, NULL, 0);
if (alpha == ZERO) return;

View File

@@ -90,18 +90,16 @@ void CNAME(blasint n, FLOAT *ALPHA, FLOAT *x, blasint incx, FLOAT *y, blasint in
if (incy < 0) y -= (n - 1) * incy * 2;
#ifdef SMP
nthreads = num_cpu_avail(1);
//disable multi-thread when incx==0 or incy==0
//In that case, the threads would be dependent.
if (incx == 0 || incy == 0)
nthreads = 1;
//Work around the low performance issue with small imput size &
//
//Temporarily work-around the low performance issue with small imput size &
//multithreads.
if (n <= MULTI_THREAD_MINIMAL) {
if (incx == 0 || incy == 0 || n <= MULTI_THREAD_MINIMAL)
nthreads = 1;
}
else
nthreads = num_cpu_avail(1);
if (nthreads == 1) {
#endif

View File

@@ -237,7 +237,7 @@ void CNAME(enum CBLAS_ORDER order,
if (trans & 1) lenx = m;
if (trans & 1) leny = n;
if (beta_r != ONE || beta_i != ZERO) SCAL_K(leny, 0, 0, beta_r, beta_i, y, abs(incy), NULL, 0, NULL, 0);
if (beta_r != ONE || beta_i != ZERO) SCAL_K(leny, 0, 0, beta_r, beta_i, y, blasabs(incy), NULL, 0, NULL, 0);
if (alpha_r == ZERO && alpha_i == ZERO) return;

View File

@@ -225,7 +225,7 @@ void CNAME(enum CBLAS_ORDER order,
if (trans & 1) lenx = m;
if (trans & 1) leny = n;
if (beta_r != ONE || beta_i != ZERO) SCAL_K(leny, 0, 0, beta_r, beta_i, y, abs(incy), NULL, 0, NULL, 0);
if (beta_r != ONE || beta_i != ZERO) SCAL_K(leny, 0, 0, beta_r, beta_i, y, blasabs(incy), NULL, 0, NULL, 0);
if (alpha_r == ZERO && alpha_i == ZERO) return;

View File

@@ -190,7 +190,7 @@ void CNAME(enum CBLAS_ORDER order,
if (n == 0) return;
if ((beta_r != ONE) || (beta_i != ZERO)) SCAL_K(n, 0, 0, beta_r, beta_i, y, abs(incy), NULL, 0, NULL, 0);
if ((beta_r != ONE) || (beta_i != ZERO)) SCAL_K(n, 0, 0, beta_r, beta_i, y, blasabs(incy), NULL, 0, NULL, 0);
if ((alpha_r == ZERO) && (alpha_i == ZERO)) return;

View File

@@ -181,7 +181,7 @@ void CNAME(enum CBLAS_ORDER order, enum CBLAS_UPLO Uplo, blasint n, void *VALPHA
if (n == 0) return;
if ((beta_r != ONE) || (beta_i != ZERO)) SCAL_K(n, 0, 0, beta_r, beta_i, y, abs(incy), NULL, 0, NULL, 0);
if ((beta_r != ONE) || (beta_i != ZERO)) SCAL_K(n, 0, 0, beta_r, beta_i, y, blasabs(incy), NULL, 0, NULL, 0);
if ((alpha_r == ZERO) && (alpha_i == ZERO)) return;

View File

@@ -180,7 +180,7 @@ void CNAME(enum CBLAS_ORDER order,
if (n == 0) return;
if ((beta_r != ONE) || (beta_i != ZERO)) SCAL_K(n, 0, 0, beta_r, beta_i, y, abs(incy), NULL, 0, NULL, 0);
if ((beta_r != ONE) || (beta_i != ZERO)) SCAL_K(n, 0, 0, beta_r, beta_i, y, blasabs(incy), NULL, 0, NULL, 0);
if ((alpha_r == ZERO) && (alpha_i == ZERO)) return;

View File

@@ -14,7 +14,7 @@ void NAME(FLOAT *DA, FLOAT *DB, FLOAT *C, FLOAT *S){
long double db_i = *(DB + 1);
long double r;
long double ada = fabs(da_r) + fabs(da_i);
long double ada = fabsl(da_r) + fabsl(da_i);
PRINT_DEBUG_NAME;

View File

@@ -126,7 +126,7 @@ void NAME(char *UPLO, blasint *N, blasint *K, FLOAT *ALPHA, FLOAT *a, blasint *
if (n == 0) return;
if ((beta_r != ONE) || (beta_i != ZERO)) SCAL_K(n, 0, 0, beta_r, beta_i, c, abs(incy), NULL, 0, NULL, 0);
if ((beta_r != ONE) || (beta_i != ZERO)) SCAL_K(n, 0, 0, beta_r, beta_i, c, blasabs(incy), NULL, 0, NULL, 0);
if ((alpha_r == ZERO) && (alpha_i == ZERO)) return;

View File

@@ -90,10 +90,10 @@ void CNAME(blasint n, FLOAT alpha_r, void *vx, blasint incx){
FUNCTION_PROFILE_START();
#ifdef SMP
nthreads = num_cpu_avail(1);
if ( n <= 1048576 )
nthreads = 1;
else
nthreads = num_cpu_avail(1);
if (nthreads == 1) {
#endif

View File

@@ -79,12 +79,12 @@ FLOAT *y = (FLOAT*)vy;
if (incy < 0) y -= (n - 1) * incy * 2;
#ifdef SMP
nthreads = num_cpu_avail(1);
//disable multi-thread when incx==0 or incy==0
//In that case, the threads would be dependent.
if (incx == 0 || incy == 0)
nthreads = 1;
else
nthreads = num_cpu_avail(1);
if (nthreads == 1) {
#endif

View File

@@ -44,7 +44,7 @@ ifeq ($(CORE), POWER8)
USE_TRMM = 1
endif
ifeq ($(CORE), Z13)
ifeq ($(ARCH), zarch)
USE_TRMM = 1
endif

View File

@@ -58,11 +58,11 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F4
pld [ X, #X_PRE ]
fldmiad X!, { d4 - d5 }
vldmia.f64 X!, { d4 - d5 }
vabs.f64 d4, d4
vadd.f64 d0 , d0, d4
vabs.f64 d5, d5
fldmiad X!, { d6 - d7 }
vldmia.f64 X!, { d6 - d7 }
vabs.f64 d6, d6
vadd.f64 d1 , d1, d5
vabs.f64 d7, d7
@@ -73,7 +73,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1
fldmiad X!, { d4 }
vldmia.f64 X!, { d4 }
vabs.f64 d4, d4
vadd.f64 d0 , d0, d4
@@ -82,22 +82,22 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S4
fldmiad X, { d4 }
vldmia.f64 X, { d4 }
vabs.f64 d4, d4
vadd.f64 d0 , d0, d4
add X, X, INC_X
fldmiad X, { d4 }
vldmia.f64 X, { d4 }
vabs.f64 d4, d4
vadd.f64 d0 , d0, d4
add X, X, INC_X
fldmiad X, { d4 }
vldmia.f64 X, { d4 }
vabs.f64 d4, d4
vadd.f64 d0 , d0, d4
add X, X, INC_X
fldmiad X, { d4 }
vldmia.f64 X, { d4 }
vabs.f64 d4, d4
vadd.f64 d0 , d0, d4
add X, X, INC_X
@@ -107,7 +107,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1
fldmiad X, { d4 }
vldmia.f64 X, { d4 }
vabs.f64 d4, d4
vadd.f64 d0 , d0, d4
add X, X, INC_X
@@ -118,11 +118,11 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F4
fldmias X!, { s4 - s5 }
vldmia.f32 X!, { s4 - s5 }
vabs.f32 s4, s4
vadd.f32 s0 , s0, s4
vabs.f32 s5, s5
fldmias X!, { s6 - s7 }
vldmia.f32 X!, { s6 - s7 }
vabs.f32 s6, s6
vadd.f32 s1 , s1, s5
vabs.f32 s7, s7
@@ -133,7 +133,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1
fldmias X!, { s4 }
vldmia.f32 X!, { s4 }
vabs.f32 s4, s4
vadd.f32 s0 , s0, s4
@@ -142,22 +142,22 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S4
fldmias X, { s4 }
vldmia.f32 X, { s4 }
vabs.f32 s4, s4
vadd.f32 s0 , s0, s4
add X, X, INC_X
fldmias X, { s4 }
vldmia.f32 X, { s4 }
vabs.f32 s4, s4
vadd.f32 s0 , s0, s4
add X, X, INC_X
fldmias X, { s4 }
vldmia.f32 X, { s4 }
vabs.f32 s4, s4
vadd.f32 s0 , s0, s4
add X, X, INC_X
fldmias X, { s4 }
vldmia.f32 X, { s4 }
vabs.f32 s4, s4
vadd.f32 s0 , s0, s4
add X, X, INC_X
@@ -167,7 +167,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1
fldmias X, { s4 }
vldmia.f32 X, { s4 }
vabs.f32 s4, s4
vadd.f32 s0 , s0, s4
add X, X, INC_X
@@ -184,11 +184,11 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F4
pld [ X, #X_PRE ]
fldmiad X!, { d4 - d5 }
vldmia.f64 X!, { d4 - d5 }
vabs.f64 d4, d4
vadd.f64 d0 , d0, d4
vabs.f64 d5, d5
fldmiad X!, { d6 - d7 }
vldmia.f64 X!, { d6 - d7 }
vabs.f64 d6, d6
vadd.f64 d1 , d1, d5
vabs.f64 d7, d7
@@ -196,11 +196,11 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
vadd.f64 d1 , d1, d7
pld [ X, #X_PRE ]
fldmiad X!, { d4 - d5 }
vldmia.f64 X!, { d4 - d5 }
vabs.f64 d4, d4
vadd.f64 d0 , d0, d4
vabs.f64 d5, d5
fldmiad X!, { d6 - d7 }
vldmia.f64 X!, { d6 - d7 }
vabs.f64 d6, d6
vadd.f64 d1 , d1, d5
vabs.f64 d7, d7
@@ -212,11 +212,11 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1
fldmiad X!, { d4 }
vldmia.f64 X!, { d4 }
vabs.f64 d4, d4
vadd.f64 d0 , d0, d4
fldmiad X!, { d4 }
vldmia.f64 X!, { d4 }
vabs.f64 d4, d4
vadd.f64 d0 , d0, d4
@@ -226,28 +226,28 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S4
fldmiad X, { d4 -d5 }
vldmia.f64 X, { d4 -d5 }
vabs.f64 d4, d4
vadd.f64 d0 , d0, d4
vabs.f64 d5, d5
vadd.f64 d0 , d0, d5
add X, X, INC_X
fldmiad X, { d4 -d5 }
vldmia.f64 X, { d4 -d5 }
vabs.f64 d4, d4
vadd.f64 d0 , d0, d4
vabs.f64 d5, d5
vadd.f64 d0 , d0, d5
add X, X, INC_X
fldmiad X, { d4 -d5 }
vldmia.f64 X, { d4 -d5 }
vabs.f64 d4, d4
vadd.f64 d0 , d0, d4
vabs.f64 d5, d5
vadd.f64 d0 , d0, d5
add X, X, INC_X
fldmiad X, { d4 -d5 }
vldmia.f64 X, { d4 -d5 }
vabs.f64 d4, d4
vadd.f64 d0 , d0, d4
vabs.f64 d5, d5
@@ -259,7 +259,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1
fldmiad X, { d4 -d5 }
vldmia.f64 X, { d4 -d5 }
vabs.f64 d4, d4
vadd.f64 d0 , d0, d4
vabs.f64 d5, d5
@@ -273,22 +273,22 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F4
pld [ X, #X_PRE ]
fldmias X!, { s4 - s5 }
vldmia.f32 X!, { s4 - s5 }
vabs.f32 s4, s4
vadd.f32 s0 , s0, s4
vabs.f32 s5, s5
fldmias X!, { s6 - s7 }
vldmia.f32 X!, { s6 - s7 }
vabs.f32 s6, s6
vadd.f32 s1 , s1, s5
vabs.f32 s7, s7
vadd.f32 s0 , s0, s6
vadd.f32 s1 , s1, s7
fldmias X!, { s4 - s5 }
vldmia.f32 X!, { s4 - s5 }
vabs.f32 s4, s4
vadd.f32 s0 , s0, s4
vabs.f32 s5, s5
fldmias X!, { s6 - s7 }
vldmia.f32 X!, { s6 - s7 }
vabs.f32 s6, s6
vadd.f32 s1 , s1, s5
vabs.f32 s7, s7
@@ -300,11 +300,11 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1
fldmias X!, { s4 }
vldmia.f32 X!, { s4 }
vabs.f32 s4, s4
vadd.f32 s0 , s0, s4
fldmias X!, { s4 }
vldmia.f32 X!, { s4 }
vabs.f32 s4, s4
vadd.f32 s0 , s0, s4
@@ -313,28 +313,28 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S4
fldmias X, { s4 -s5 }
vldmia.f32 X, { s4 -s5 }
vabs.f32 s4, s4
vadd.f32 s0 , s0, s4
vabs.f32 s5, s5
vadd.f32 s0 , s0, s5
add X, X, INC_X
fldmias X, { s4 -s5 }
vldmia.f32 X, { s4 -s5 }
vabs.f32 s4, s4
vadd.f32 s0 , s0, s4
vabs.f32 s5, s5
vadd.f32 s0 , s0, s5
add X, X, INC_X
fldmias X, { s4 -s5 }
vldmia.f32 X, { s4 -s5 }
vabs.f32 s4, s4
vadd.f32 s0 , s0, s4
vabs.f32 s5, s5
vadd.f32 s0 , s0, s5
add X, X, INC_X
fldmias X, { s4 -s5 }
vldmia.f32 X, { s4 -s5 }
vabs.f32 s4, s4
vadd.f32 s0 , s0, s4
vabs.f32 s5, s5
@@ -346,7 +346,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1
fldmias X, { s4 -s5 }
vldmia.f32 X, { s4 -s5 }
vabs.f32 s4, s4
vadd.f32 s0 , s0, s4
vabs.f32 s5, s5

View File

@@ -146,17 +146,17 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F4
pld [ X, #X_PRE ]
fldmiad X!, { d4 - d7 }
vldmia.f64 X!, { d4 - d7 }
pld [ Y, #X_PRE ]
fldmiad Y , { d8 - d11 }
vldmia.f64 Y , { d8 - d11 }
fmacd d8 , d0, d4
fstmiad Y!, { d8 }
vstmia.f64 Y!, { d8 }
fmacd d9 , d0, d5
fstmiad Y!, { d9 }
vstmia.f64 Y!, { d9 }
fmacd d10, d0, d6
fstmiad Y!, { d10 }
vstmia.f64 Y!, { d10 }
fmacd d11, d0, d7
fstmiad Y!, { d11 }
vstmia.f64 Y!, { d11 }
.endm
@@ -164,19 +164,19 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1
fldmiad X!, { d4 }
fldmiad Y , { d8 }
vldmia.f64 X!, { d4 }
vldmia.f64 Y , { d8 }
fmacd d8 , d0, d4
fstmiad Y!, { d8 }
vstmia.f64 Y!, { d8 }
.endm
.macro KERNEL_S1
fldmiad X , { d4 }
fldmiad Y , { d8 }
vldmia.f64 X , { d4 }
vldmia.f64 Y , { d8 }
fmacd d8 , d0, d4
fstmiad Y , { d8 }
vstmia.f64 Y , { d8 }
add X, X, INC_X
add Y, Y, INC_Y
@@ -186,16 +186,16 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F4
fldmias X!, { s4 - s7 }
fldmias Y , { s8 - s11 }
vldmia.f32 X!, { s4 - s7 }
vldmia.f32 Y , { s8 - s11 }
fmacs s8 , s0, s4
fstmias Y!, { s8 }
vstmia.f32 Y!, { s8 }
fmacs s9 , s0, s5
fstmias Y!, { s9 }
vstmia.f32 Y!, { s9 }
fmacs s10, s0, s6
fstmias Y!, { s10 }
vstmia.f32 Y!, { s10 }
fmacs s11, s0, s7
fstmias Y!, { s11 }
vstmia.f32 Y!, { s11 }
.endm
@@ -203,19 +203,19 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1
fldmias X!, { s4 }
fldmias Y , { s8 }
vldmia.f32 X!, { s4 }
vldmia.f32 Y , { s8 }
fmacs s8 , s0, s4
fstmias Y!, { s8 }
vstmia.f32 Y!, { s8 }
.endm
.macro KERNEL_S1
fldmias X , { s4 }
fldmias Y , { s8 }
vldmia.f32 X , { s4 }
vldmia.f32 Y , { s8 }
fmacs s8 , s0, s4
fstmias Y , { s8 }
vstmia.f32 Y , { s8 }
add X, X, INC_X
add Y, Y, INC_Y
@@ -231,42 +231,42 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F4
pld [ X, #X_PRE ]
fldmiad X!, { d4 - d7 }
vldmia.f64 X!, { d4 - d7 }
pld [ Y, #X_PRE ]
fldmiad Y , { d8 - d11 }
vldmia.f64 Y , { d8 - d11 }
FMAC_R1 d8 , d0, d4
FMAC_R2 d8 , d1, d5
FMAC_I1 d9 , d0, d5
FMAC_I2 d9 , d1, d4
fstmiad Y!, { d8 }
fstmiad Y!, { d9 }
vstmia.f64 Y!, { d8 }
vstmia.f64 Y!, { d9 }
FMAC_R1 d10, d0, d6
FMAC_R2 d10, d1, d7
FMAC_I1 d11, d0, d7
FMAC_I2 d11, d1, d6
fstmiad Y!, { d10 }
fstmiad Y!, { d11 }
vstmia.f64 Y!, { d10 }
vstmia.f64 Y!, { d11 }
pld [ X, #X_PRE ]
fldmiad X!, { d4 - d7 }
vldmia.f64 X!, { d4 - d7 }
pld [ Y, #X_PRE ]
fldmiad Y , { d8 - d11 }
vldmia.f64 Y , { d8 - d11 }
FMAC_R1 d8 , d0, d4
FMAC_R2 d8 , d1, d5
FMAC_I1 d9 , d0, d5
FMAC_I2 d9 , d1, d4
fstmiad Y!, { d8 }
fstmiad Y!, { d9 }
vstmia.f64 Y!, { d8 }
vstmia.f64 Y!, { d9 }
FMAC_R1 d10, d0, d6
FMAC_R2 d10, d1, d7
FMAC_I1 d11, d0, d7
FMAC_I2 d11, d1, d6
fstmiad Y!, { d10 }
fstmiad Y!, { d11 }
vstmia.f64 Y!, { d10 }
vstmia.f64 Y!, { d11 }
@@ -277,15 +277,15 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1
fldmiad X!, { d4 - d5 }
fldmiad Y , { d8 - d9 }
vldmia.f64 X!, { d4 - d5 }
vldmia.f64 Y , { d8 - d9 }
FMAC_R1 d8 , d0, d4
FMAC_R2 d8 , d1, d5
FMAC_I1 d9 , d0, d5
FMAC_I2 d9 , d1, d4
fstmiad Y!, { d8 }
fstmiad Y!, { d9 }
vstmia.f64 Y!, { d8 }
vstmia.f64 Y!, { d9 }
@@ -293,14 +293,14 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1
fldmiad X , { d4 - d5 }
fldmiad Y , { d8 - d9 }
vldmia.f64 X , { d4 - d5 }
vldmia.f64 Y , { d8 - d9 }
FMAC_R1 d8 , d0, d4
FMAC_R2 d8 , d1, d5
FMAC_I1 d9 , d0, d5
FMAC_I2 d9 , d1, d4
fstmiad Y , { d8 - d9 }
vstmia.f64 Y , { d8 - d9 }
add X, X, INC_X
add Y, Y, INC_Y
@@ -314,40 +314,40 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F4
pld [ X, #X_PRE ]
fldmias X!, { s4 - s7 }
vldmia.f32 X!, { s4 - s7 }
pld [ Y, #X_PRE ]
fldmias Y , { s8 - s11 }
vldmia.f32 Y , { s8 - s11 }
FMAC_R1 s8 , s0, s4
FMAC_R2 s8 , s1, s5
FMAC_I1 s9 , s0, s5
FMAC_I2 s9 , s1, s4
fstmias Y!, { s8 }
fstmias Y!, { s9 }
vstmia.f32 Y!, { s8 }
vstmia.f32 Y!, { s9 }
FMAC_R1 s10, s0, s6
FMAC_R2 s10, s1, s7
FMAC_I1 s11, s0, s7
FMAC_I2 s11, s1, s6
fstmias Y!, { s10 }
fstmias Y!, { s11 }
vstmia.f32 Y!, { s10 }
vstmia.f32 Y!, { s11 }
fldmias X!, { s4 - s7 }
fldmias Y , { s8 - s11 }
vldmia.f32 X!, { s4 - s7 }
vldmia.f32 Y , { s8 - s11 }
FMAC_R1 s8 , s0, s4
FMAC_R2 s8 , s1, s5
FMAC_I1 s9 , s0, s5
FMAC_I2 s9 , s1, s4
fstmias Y!, { s8 }
fstmias Y!, { s9 }
vstmia.f32 Y!, { s8 }
vstmia.f32 Y!, { s9 }
FMAC_R1 s10, s0, s6
FMAC_R2 s10, s1, s7
FMAC_I1 s11, s0, s7
FMAC_I2 s11, s1, s6
fstmias Y!, { s10 }
fstmias Y!, { s11 }
vstmia.f32 Y!, { s10 }
vstmia.f32 Y!, { s11 }
@@ -358,15 +358,15 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1
fldmias X!, { s4 - s5 }
fldmias Y , { s8 - s9 }
vldmia.f32 X!, { s4 - s5 }
vldmia.f32 Y , { s8 - s9 }
FMAC_R1 s8 , s0, s4
FMAC_R2 s8 , s1, s5
FMAC_I1 s9 , s0, s5
FMAC_I2 s9 , s1, s4
fstmias Y!, { s8 }
fstmias Y!, { s9 }
vstmia.f32 Y!, { s8 }
vstmia.f32 Y!, { s9 }
@@ -374,14 +374,14 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1
fldmias X , { s4 - s5 }
fldmias Y , { s8 - s9 }
vldmia.f32 X , { s4 - s5 }
vldmia.f32 Y , { s8 - s9 }
FMAC_R1 s8 , s0, s4
FMAC_R2 s8 , s1, s5
FMAC_I1 s9 , s0, s5
FMAC_I2 s9 , s1, s4
fstmias Y , { s8 - s9 }
vstmia.f32 Y , { s8 - s9 }
add X, X, INC_X
add Y, Y, INC_Y

View File

@@ -65,15 +65,15 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY_F4
pld [ X, #X_PRE ]
fldmias X!, { s0 - s7 }
fstmias Y!, { s0 - s7 }
vldmia.f32 X!, { s0 - s7 }
vstmia.f32 Y!, { s0 - s7 }
.endm
.macro COPY_F1
fldmias X!, { s0 - s1 }
fstmias Y!, { s0 - s1 }
vldmia.f32 X!, { s0 - s1 }
vstmia.f32 Y!, { s0 - s1 }
.endm
@@ -83,23 +83,23 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY_S4
nop
fldmias X, { s0 - s1 }
fstmias Y, { s0 - s1 }
vldmia.f32 X, { s0 - s1 }
vstmia.f32 Y, { s0 - s1 }
add X, X, INC_X
add Y, Y, INC_Y
fldmias X, { s2 - s3 }
fstmias Y, { s2 - s3 }
vldmia.f32 X, { s2 - s3 }
vstmia.f32 Y, { s2 - s3 }
add X, X, INC_X
add Y, Y, INC_Y
fldmias X, { s0 - s1 }
fstmias Y, { s0 - s1 }
vldmia.f32 X, { s0 - s1 }
vstmia.f32 Y, { s0 - s1 }
add X, X, INC_X
add Y, Y, INC_Y
fldmias X, { s2 - s3 }
fstmias Y, { s2 - s3 }
vldmia.f32 X, { s2 - s3 }
vstmia.f32 Y, { s2 - s3 }
add X, X, INC_X
add Y, Y, INC_Y
@@ -108,8 +108,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY_S1
fldmias X, { s0 - s1 }
fstmias Y, { s0 - s1 }
vldmia.f32 X, { s0 - s1 }
vstmia.f32 Y, { s0 - s1 }
add X, X, INC_X
add Y, Y, INC_Y

View File

@@ -76,30 +76,30 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
pld [ X, #X_PRE ]
pld [ Y, #X_PRE ]
fldmias X!, { s4 - s5 }
fldmias Y!, { s8 - s9 }
vldmia.f32 X!, { s4 - s5 }
vldmia.f32 Y!, { s8 - s9 }
fmacs s0 , s4, s8
fmacs s1 , s4, s9
fldmias X!, { s6 - s7 }
vldmia.f32 X!, { s6 - s7 }
fmacs s2 , s5, s9
fmacs s3 , s5, s8
fldmias Y!, { s10 - s11 }
vldmia.f32 Y!, { s10 - s11 }
fmacs s0 , s6, s10
fmacs s1 , s6, s11
fmacs s2 , s7, s11
fmacs s3 , s7, s10
fldmias X!, { s4 - s5 }
fldmias Y!, { s8 - s9 }
vldmia.f32 X!, { s4 - s5 }
vldmia.f32 Y!, { s8 - s9 }
fmacs s0 , s4, s8
fmacs s1 , s4, s9
fldmias X!, { s6 - s7 }
vldmia.f32 X!, { s6 - s7 }
fmacs s2 , s5, s9
fmacs s3 , s5, s8
fldmias Y!, { s10 - s11 }
vldmia.f32 Y!, { s10 - s11 }
fmacs s0 , s6, s10
fmacs s1 , s6, s11
fmacs s2 , s7, s11
@@ -109,8 +109,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1
fldmias X!, { s4 - s5 }
fldmias Y!, { s8 - s9 }
vldmia.f32 X!, { s4 - s5 }
vldmia.f32 Y!, { s8 - s9 }
fmacs s0 , s4, s8
fmacs s1 , s4, s9
fmacs s2 , s5, s9
@@ -125,8 +125,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
nop
fldmias X, { s4 - s5 }
fldmias Y, { s8 - s9 }
vldmia.f32 X, { s4 - s5 }
vldmia.f32 Y, { s8 - s9 }
fmacs s0 , s4, s8
fmacs s1 , s4, s9
fmacs s2 , s5, s9
@@ -134,8 +134,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
add X, X, INC_X
add Y, Y, INC_Y
fldmias X, { s4 - s5 }
fldmias Y, { s8 - s9 }
vldmia.f32 X, { s4 - s5 }
vldmia.f32 Y, { s8 - s9 }
fmacs s0 , s4, s8
fmacs s1 , s4, s9
fmacs s2 , s5, s9
@@ -143,8 +143,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
add X, X, INC_X
add Y, Y, INC_Y
fldmias X, { s4 - s5 }
fldmias Y, { s8 - s9 }
vldmia.f32 X, { s4 - s5 }
vldmia.f32 Y, { s8 - s9 }
fmacs s0 , s4, s8
fmacs s1 , s4, s9
fmacs s2 , s5, s9
@@ -152,8 +152,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
add X, X, INC_X
add Y, Y, INC_Y
fldmias X, { s4 - s5 }
fldmias Y, { s8 - s9 }
vldmia.f32 X, { s4 - s5 }
vldmia.f32 Y, { s8 - s9 }
fmacs s0 , s4, s8
fmacs s1 , s4, s9
fmacs s2 , s5, s9
@@ -166,8 +166,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1
fldmias X, { s4 - s5 }
fldmias Y, { s8 - s9 }
vldmia.f32 X, { s4 - s5 }
vldmia.f32 Y, { s8 - s9 }
fmacs s0 , s4, s8
fmacs s1 , s4, s9
fmacs s2 , s5, s9
@@ -215,11 +215,11 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
cmp N, #0
ble cdot_kernel_L999
cmp INC_X, #0
beq cdot_kernel_L999
# cmp INC_X, #0
# beq cdot_kernel_L999
cmp INC_Y, #0
beq cdot_kernel_L999
# cmp INC_Y, #0
# beq cdot_kernel_L999
cmp INC_X, #1
bne cdot_kernel_S_BEGIN

View File

@@ -165,9 +165,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL2x2_I
pld [ AO, #A_PRE ]
fldmias AO!, { s0 - s3 }
vldmia.f32 AO!, { s0 - s3 }
pld [ BO, #B_PRE ]
fldmias BO!, { s4 - s7 }
vldmia.f32 BO!, { s4 - s7 }
fmuls s8 , s0, s4
@@ -197,9 +197,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL2x2_M1
pld [ AO, #A_PRE ]
fldmias AO!, { s0 - s3 }
vldmia.f32 AO!, { s0 - s3 }
pld [ BO, #B_PRE ]
fldmias BO!, { s4 - s7 }
vldmia.f32 BO!, { s4 - s7 }
fmacs s8 , s0, s4
fmacs s9 , s0, s5
@@ -225,8 +225,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL2x2_M2
fldmias AO!, { s0 - s3 }
fldmias BO!, { s4 - s7 }
vldmia.f32 AO!, { s0 - s3 }
vldmia.f32 BO!, { s4 - s7 }
fmacs s8 , s0, s4
fmacs s9 , s0, s5
@@ -254,8 +254,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL2x2_E
fldmias AO!, { s0 - s3 }
fldmias BO!, { s4 - s7 }
vldmia.f32 AO!, { s0 - s3 }
vldmia.f32 BO!, { s4 - s7 }
fmacs s8 , s0, s4
fmacs s9 , s0, s5
@@ -282,8 +282,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL2x2_SUB
fldmias AO!, { s0 - s3 }
fldmias BO!, { s4 - s7 }
vldmia.f32 AO!, { s0 - s3 }
vldmia.f32 BO!, { s4 - s7 }
fmacs s8 , s0, s4
fmacs s9 , s0, s5
@@ -317,7 +317,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s0, ALPHA_R
flds s1, ALPHA_I
fldmias CO1, { s4 - s7 }
vldmia.f32 CO1, { s4 - s7 }
FMAC_R1 s4 , s0 , s8
FMAC_I1 s5 , s0 , s9
@@ -329,9 +329,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 s6 , s1 , s11
FMAC_I2 s7 , s1 , s10
fstmias CO1, { s4 - s7 }
vstmia.f32 CO1, { s4 - s7 }
fldmias CO2, { s4 - s7 }
vldmia.f32 CO2, { s4 - s7 }
FMAC_R1 s4 , s0 , s12
FMAC_I1 s5 , s0 , s13
@@ -343,7 +343,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 s6 , s1 , s15
FMAC_I2 s7 , s1 , s14
fstmias CO2, { s4 - s7 }
vstmia.f32 CO2, { s4 - s7 }
add CO1, CO1, #16
@@ -500,23 +500,23 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s0, ALPHA_R
flds s1, ALPHA_I
fldmias CO1, { s4 - s5 }
vldmia.f32 CO1, { s4 - s5 }
FMAC_R1 s4 , s0 , s8
FMAC_I1 s5 , s0 , s9
FMAC_R2 s4 , s1 , s9
FMAC_I2 s5 , s1 , s8
fstmias CO1, { s4 - s5 }
vstmia.f32 CO1, { s4 - s5 }
fldmias CO2, { s4 - s5 }
vldmia.f32 CO2, { s4 - s5 }
FMAC_R1 s4 , s0 , s12
FMAC_I1 s5 , s0 , s13
FMAC_R2 s4 , s1 , s13
FMAC_I2 s5 , s1 , s12
fstmias CO2, { s4 - s5 }
vstmia.f32 CO2, { s4 - s5 }
add CO1, CO1, #8
@@ -671,7 +671,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s0, ALPHA_R
flds s1, ALPHA_I
fldmias CO1, { s4 - s7 }
vldmia.f32 CO1, { s4 - s7 }
FMAC_R1 s4 , s0 , s8
FMAC_I1 s5 , s0 , s9
@@ -683,7 +683,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 s6 , s1 , s11
FMAC_I2 s7 , s1 , s10
fstmias CO1, { s4 - s7 }
vstmia.f32 CO1, { s4 - s7 }
add CO1, CO1, #16
@@ -800,14 +800,14 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s0, ALPHA_R
flds s1, ALPHA_I
fldmias CO1, { s4 - s5 }
vldmia.f32 CO1, { s4 - s5 }
FMAC_R1 s4 , s0 , s8
FMAC_I1 s5 , s0 , s9
FMAC_R2 s4 , s1 , s9
FMAC_I2 s5 , s1 , s8
fstmias CO1, { s4 - s5 }
vstmia.f32 CO1, { s4 - s5 }
add CO1, CO1, #8

View File

@@ -182,30 +182,30 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL2x2_I
pld [ AO , #A_PRE ]
pld [ BO , #B_PRE ]
fldmias AO!, { s0 - s1 }
fldmias BO!, { s8 - s9 }
vldmia.f32 AO!, { s0 - s1 }
vldmia.f32 BO!, { s8 - s9 }
fmuls s16 , s0, s8
fmuls s24 , s1, s9
fldmias AO!, { s2 - s3 }
vldmia.f32 AO!, { s2 - s3 }
fmuls s17 , s0, s9
fmuls s25 , s1, s8
fldmias BO!, { s10 - s11 }
vldmia.f32 BO!, { s10 - s11 }
fmuls s18 , s2, s8
fmuls s26 , s3, s9
fldmias AO!, { s4 - s5 }
vldmia.f32 AO!, { s4 - s5 }
fmuls s19 , s2, s9
fmuls s27 , s3, s8
fldmias BO!, { s12 - s13 }
vldmia.f32 BO!, { s12 - s13 }
fmuls s20 , s0, s10
fmuls s28 , s1, s11
fldmias AO!, { s6 - s7 }
vldmia.f32 AO!, { s6 - s7 }
fmuls s21 , s0, s11
fmuls s29 , s1, s10
fldmias BO!, { s14 - s15 }
vldmia.f32 BO!, { s14 - s15 }
fmuls s22 , s2, s10
fmuls s30 , s3, s11
fmuls s23 , s2, s11
@@ -218,17 +218,17 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL2x2_M1
fmacs s16 , s0, s8
fldmias AO!, { s4 - s5 }
vldmia.f32 AO!, { s4 - s5 }
fmacs s24 , s1, s9
fmacs s17 , s0, s9
fldmias BO!, { s12 - s13 }
vldmia.f32 BO!, { s12 - s13 }
fmacs s25 , s1, s8
fmacs s18 , s2, s8
fldmias AO!, { s6 - s7 }
vldmia.f32 AO!, { s6 - s7 }
fmacs s26 , s3, s9
fmacs s19 , s2, s9
fldmias BO!, { s14 - s15 }
vldmia.f32 BO!, { s14 - s15 }
fmacs s27 , s3, s8
fmacs s20 , s0, s10
@@ -250,19 +250,19 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
pld [ BO , #B_PRE ]
fmacs s24 , s5, s13
fmacs s17 , s4, s13
fldmias AO!, { s0 - s1 }
vldmia.f32 AO!, { s0 - s1 }
fmacs s25 , s5, s12
fmacs s18 , s6, s12
fmacs s26 , s7, s13
fldmias BO!, { s8 - s9 }
vldmia.f32 BO!, { s8 - s9 }
fmacs s19 , s6, s13
fmacs s27 , s7, s12
fldmias AO!, { s2 - s3 }
vldmia.f32 AO!, { s2 - s3 }
fmacs s20 , s4, s14
fmacs s28 , s5, s15
fldmias BO!, { s10 - s11 }
vldmia.f32 BO!, { s10 - s11 }
fmacs s21 , s4, s15
fmacs s29 , s5, s14
@@ -300,16 +300,16 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL2x2_SUB
fldmias AO!, { s0 - s1 }
fldmias BO!, { s8 - s9 }
vldmia.f32 AO!, { s0 - s1 }
vldmia.f32 BO!, { s8 - s9 }
fmacs s16 , s0, s8
fmacs s24 , s1, s9
fldmias AO!, { s2 - s3 }
vldmia.f32 AO!, { s2 - s3 }
fmacs s17 , s0, s9
fmacs s25 , s1, s8
fldmias BO!, { s10 - s11 }
vldmia.f32 BO!, { s10 - s11 }
fmacs s18 , s2, s8
fmacs s26 , s3, s9
fmacs s19 , s2, s9
@@ -338,8 +338,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s0, ALPHA_R
flds s1, ALPHA_I
fldmias CO1, { s4 - s7 }
fldmias CO2, { s8 - s11 }
vldmia.f32 CO1, { s4 - s7 }
vldmia.f32 CO2, { s8 - s11 }
FADD_R s16, s24 , s16
FADD_I s17, s25 , s17
@@ -370,8 +370,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 s10, s1 , s23
FMAC_I2 s11, s1 , s22
fstmias CO1, { s4 - s7 }
fstmias CO2, { s8 - s11 }
vstmia.f32 CO1, { s4 - s7 }
vstmia.f32 CO2, { s8 - s11 }
add CO1, CO1, #16
@@ -534,8 +534,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s0, ALPHA_R
flds s1, ALPHA_I
fldmias CO1, { s4 - s5 }
fldmias CO2, { s8 - s9 }
vldmia.f32 CO1, { s4 - s5 }
vldmia.f32 CO2, { s8 - s9 }
FADD_R s16, s24 , s16
FADD_I s17, s25 , s17
@@ -552,8 +552,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 s8 , s1 , s21
FMAC_I2 s9 , s1 , s20
fstmias CO1, { s4 - s5 }
fstmias CO2, { s8 - s9 }
vstmia.f32 CO1, { s4 - s5 }
vstmia.f32 CO2, { s8 - s9 }
add CO1, CO1, #8
@@ -716,7 +716,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s0, ALPHA_R
flds s1, ALPHA_I
fldmias CO1, { s4 - s7 }
vldmia.f32 CO1, { s4 - s7 }
FADD_R s16, s24 , s16
FADD_I s17, s25 , s17
@@ -733,7 +733,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 s6 , s1 , s19
FMAC_I2 s7 , s1 , s18
fstmias CO1, { s4 - s7 }
vstmia.f32 CO1, { s4 - s7 }
add CO1, CO1, #16
@@ -851,7 +851,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s0, ALPHA_R
flds s1, ALPHA_I
fldmias CO1, { s4 - s5 }
vldmia.f32 CO1, { s4 - s5 }
FADD_R s16, s24 , s16
FADD_I s17, s25 , s17
@@ -861,7 +861,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 s4 , s1 , s17
FMAC_I2 s5 , s1 , s16
fstmias CO1, { s4 - s5 }
vstmia.f32 CO1, { s4 - s5 }
add CO1, CO1, #8

View File

@@ -85,7 +85,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s6 , [ AO2, #8 ]
flds s7 , [ AO2, #12 ]
fstmias BO!, { s0 - s7 }
vstmia.f32 BO!, { s0 - s7 }
add AO2, AO2, #16
.endm
@@ -99,7 +99,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s3 , [ AO2, #4 ]
add AO1, AO1, #8
fstmias BO!, { s0 - s3 }
vstmia.f32 BO!, { s0 - s3 }
add AO2, AO2, #8
.endm
@@ -111,7 +111,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s2 , [ AO1, #8 ]
flds s3 , [ AO1, #12 ]
fstmias BO!, { s0 - s3 }
vstmia.f32 BO!, { s0 - s3 }
add AO1, AO1, #16
.endm
@@ -122,7 +122,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s0 , [ AO1, #0 ]
flds s1 , [ AO1, #4 ]
fstmias BO!, { s0 - s1 }
vstmia.f32 BO!, { s0 - s1 }
add AO1, AO1, #8
.endm

View File

@@ -73,12 +73,12 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**************************************************************************************/
.macro COPY2x2
fldmias AO1, { s0 - s3 }
vldmia.f32 AO1, { s0 - s3 }
add r3, AO1, LDA
fldmias r3, { s4 - s7 }
vldmia.f32 r3, { s4 - s7 }
fstmias BO1, { s0 - s7 }
vstmia.f32 BO1, { s0 - s7 }
add AO1, AO1, #16
add BO1, BO1, M4
@@ -86,12 +86,12 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY1x2
fldmias AO1, { s0 -s1 }
vldmia.f32 AO1, { s0 -s1 }
add r3, AO1, LDA
fldmias r3, { s2 - s3 }
vldmia.f32 r3, { s2 - s3 }
fstmias BO2, { s0 - s3 }
vstmia.f32 BO2, { s0 - s3 }
add AO1, AO1, #8
add BO2, BO2, #16
@@ -100,9 +100,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
/*************************************************************************************************************************/
.macro COPY2x1
fldmias AO1, { s0 - s3 }
vldmia.f32 AO1, { s0 - s3 }
fstmias BO1, { s0 - s3 }
vstmia.f32 BO1, { s0 - s3 }
add AO1, AO1, #16
add BO1, BO1, M4
@@ -110,9 +110,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY1x1
fldmias AO1, { s0 - s1 }
vldmia.f32 AO1, { s0 - s1 }
fstmias BO2, { s0 - s1 }
vstmia.f32 BO2, { s0 - s1 }
add AO1, AO1, #8
add BO2, BO2, #8

View File

@@ -201,7 +201,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s0, ALPHA_R
flds s1, ALPHA_I
fldmias YO, { s4 - s7 }
vldmia.f32 YO, { s4 - s7 }
FMAC_R1 s4 , s0 , s8
FMAC_I1 s5 , s0 , s9
@@ -213,9 +213,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 s6 , s1 , s11
FMAC_I2 s7 , s1 , s10
fstmias YO!, { s4 - s7 }
vstmia.f32 YO!, { s4 - s7 }
fldmias YO, { s4 - s7 }
vldmia.f32 YO, { s4 - s7 }
FMAC_R1 s4 , s0 , s12
FMAC_I1 s5 , s0 , s13
@@ -227,7 +227,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 s6 , s1 , s15
FMAC_I2 s7 , s1 , s14
fstmias YO!, { s4 - s7 }
vstmia.f32 YO!, { s4 - s7 }
.endm
@@ -266,14 +266,14 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s0, ALPHA_R
flds s1, ALPHA_I
fldmias YO, { s4 - s5 }
vldmia.f32 YO, { s4 - s5 }
FMAC_R1 s4 , s0 , s8
FMAC_I1 s5 , s0 , s9
FMAC_R2 s4 , s1 , s9
FMAC_I2 s5 , s1 , s8
fstmias YO, { s4 - s5 }
vstmia.f32 YO, { s4 - s5 }
add YO, YO, #8
@@ -349,47 +349,47 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s0, ALPHA_R
flds s1, ALPHA_I
fldmias YO, { s4 - s5 }
vldmia.f32 YO, { s4 - s5 }
FMAC_R1 s4 , s0 , s8
FMAC_I1 s5 , s0 , s9
FMAC_R2 s4 , s1 , s9
FMAC_I2 s5 , s1 , s8
fstmias YO, { s4 - s5 }
vstmia.f32 YO, { s4 - s5 }
add YO, YO, INC_Y
fldmias YO, { s6 - s7 }
vldmia.f32 YO, { s6 - s7 }
FMAC_R1 s6 , s0 , s10
FMAC_I1 s7 , s0 , s11
FMAC_R2 s6 , s1 , s11
FMAC_I2 s7 , s1 , s10
fstmias YO, { s6 - s7 }
vstmia.f32 YO, { s6 - s7 }
add YO, YO, INC_Y
fldmias YO, { s4 - s5 }
vldmia.f32 YO, { s4 - s5 }
FMAC_R1 s4 , s0 , s12
FMAC_I1 s5 , s0 , s13
FMAC_R2 s4 , s1 , s13
FMAC_I2 s5 , s1 , s12
fstmias YO, { s4 - s5 }
vstmia.f32 YO, { s4 - s5 }
add YO, YO, INC_Y
fldmias YO, { s6 - s7 }
vldmia.f32 YO, { s6 - s7 }
FMAC_R1 s6 , s0 , s14
FMAC_I1 s7 , s0 , s15
FMAC_R2 s6 , s1 , s15
FMAC_I2 s7 , s1 , s14
fstmias YO, { s6 - s7 }
vstmia.f32 YO, { s6 - s7 }
add YO, YO, INC_Y
@@ -430,14 +430,14 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s0, ALPHA_R
flds s1, ALPHA_I
fldmias YO, { s4 - s5 }
vldmia.f32 YO, { s4 - s5 }
FMAC_R1 s4 , s0 , s8
FMAC_I1 s5 , s0 , s9
FMAC_R2 s4 , s1 , s9
FMAC_I2 s5 , s1 , s8
fstmias YO, { s4 - s5 }
vstmia.f32 YO, { s4 - s5 }
add YO, YO, INC_Y

View File

@@ -150,9 +150,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F2X1
fldmias XO! , { s2 - s3 }
fldmias AO1!, { s4 - s5 }
fldmias AO2!, { s8 - s9 }
vldmia.f32 XO! , { s2 - s3 }
vldmia.f32 AO1!, { s4 - s5 }
vldmia.f32 AO2!, { s8 - s9 }
fmacs s12 , s4 , s2
fmacs s13 , s4 , s3
@@ -168,7 +168,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_F2
fldmias YO, { s4 - s7 }
vldmia.f32 YO, { s4 - s7 }
FMAC_R1 s4 , s0 , s12
FMAC_I1 s5 , s0 , s13
@@ -180,7 +180,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 s6 , s1 , s15
FMAC_I2 s7 , s1 , s14
fstmias YO!, { s4 - s7 }
vstmia.f32 YO!, { s4 - s7 }
.endm
@@ -204,8 +204,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1X1
fldmias XO! , { s2 - s3 }
fldmias AO1!, { s4 - s5 }
vldmia.f32 XO! , { s2 - s3 }
vldmia.f32 AO1!, { s4 - s5 }
fmacs s12 , s4 , s2
fmacs s13 , s4 , s3
@@ -216,14 +216,14 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_F1
fldmias YO, { s4 - s5 }
vldmia.f32 YO, { s4 - s5 }
FMAC_R1 s4 , s0 , s12
FMAC_I1 s5 , s0 , s13
FMAC_R2 s4 , s1 , s13
FMAC_I2 s5 , s1 , s12
fstmias YO!, { s4 - s5 }
vstmia.f32 YO!, { s4 - s5 }
.endm
@@ -249,9 +249,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S2X1
fldmias XO , { s2 - s3 }
fldmias AO1!, { s4 - s5 }
fldmias AO2!, { s8 - s9 }
vldmia.f32 XO , { s2 - s3 }
vldmia.f32 AO1!, { s4 - s5 }
vldmia.f32 AO2!, { s8 - s9 }
fmacs s12 , s4 , s2
fmacs s13 , s4 , s3
@@ -269,25 +269,25 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_S2
fldmias YO, { s4 - s5 }
vldmia.f32 YO, { s4 - s5 }
FMAC_R1 s4 , s0 , s12
FMAC_I1 s5 , s0 , s13
FMAC_R2 s4 , s1 , s13
FMAC_I2 s5 , s1 , s12
fstmias YO, { s4 - s5 }
vstmia.f32 YO, { s4 - s5 }
add YO, YO, INC_Y
fldmias YO, { s6 - s7 }
vldmia.f32 YO, { s6 - s7 }
FMAC_R1 s6 , s0 , s14
FMAC_I1 s7 , s0 , s15
FMAC_R2 s6 , s1 , s15
FMAC_I2 s7 , s1 , s14
fstmias YO, { s6 - s7 }
vstmia.f32 YO, { s6 - s7 }
add YO, YO, INC_Y
@@ -313,8 +313,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1X1
fldmias XO , { s2 - s3 }
fldmias AO1!, { s4 - s5 }
vldmia.f32 XO , { s2 - s3 }
vldmia.f32 AO1!, { s4 - s5 }
fmacs s12 , s4 , s2
fmacs s13 , s4 , s3
@@ -327,14 +327,14 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_S1
fldmias YO, { s4 - s5 }
vldmia.f32 YO, { s4 - s5 }
FMAC_R1 s4 , s0 , s12
FMAC_I1 s5 , s0 , s13
FMAC_R2 s4 , s1 , s13
FMAC_I2 s5 , s1 , s12
fstmias YO, { s4 - s5 }
vstmia.f32 YO, { s4 - s5 }
add YO, YO, INC_Y

View File

@@ -165,9 +165,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL2x2_I
pld [ AO, #A_PRE ]
fldmias AO!, { s0 - s3 }
vldmia.f32 AO!, { s0 - s3 }
pld [ BO, #B_PRE ]
fldmias BO!, { s4 - s7 }
vldmia.f32 BO!, { s4 - s7 }
fmuls s8 , s0, s4
@@ -197,9 +197,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL2x2_M1
pld [ AO, #A_PRE ]
fldmias AO!, { s0 - s3 }
vldmia.f32 AO!, { s0 - s3 }
pld [ BO, #B_PRE ]
fldmias BO!, { s4 - s7 }
vldmia.f32 BO!, { s4 - s7 }
fmacs s8 , s0, s4
fmacs s9 , s0, s5
@@ -225,8 +225,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL2x2_M2
fldmias AO!, { s0 - s3 }
fldmias BO!, { s4 - s7 }
vldmia.f32 AO!, { s0 - s3 }
vldmia.f32 BO!, { s4 - s7 }
fmacs s8 , s0, s4
fmacs s9 , s0, s5
@@ -254,8 +254,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL2x2_E
fldmias AO!, { s0 - s3 }
fldmias BO!, { s4 - s7 }
vldmia.f32 AO!, { s0 - s3 }
vldmia.f32 BO!, { s4 - s7 }
fmacs s8 , s0, s4
fmacs s9 , s0, s5
@@ -282,8 +282,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL2x2_SUB
fldmias AO!, { s0 - s3 }
fldmias BO!, { s4 - s7 }
vldmia.f32 AO!, { s0 - s3 }
vldmia.f32 BO!, { s4 - s7 }
fmacs s8 , s0, s4
fmacs s9 , s0, s5
@@ -331,7 +331,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 s6 , s1 , s11
FMAC_I2 s7 , s1 , s10
fstmias CO1, { s4 - s7 }
vstmia.f32 CO1, { s4 - s7 }
flds s4, FP_ZERO
vmov.f32 s5, s4
@@ -348,7 +348,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 s6 , s1 , s15
FMAC_I2 s7 , s1 , s14
fstmias CO2, { s4 - s7 }
vstmia.f32 CO2, { s4 - s7 }
add CO1, CO1, #16
@@ -513,7 +513,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 s4 , s1 , s9
FMAC_I2 s5 , s1 , s8
fstmias CO1, { s4 - s5 }
vstmia.f32 CO1, { s4 - s5 }
flds s4, FP_ZERO
vmov.f32 s5, s4
@@ -523,7 +523,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 s4 , s1 , s13
FMAC_I2 s5 , s1 , s12
fstmias CO2, { s4 - s5 }
vstmia.f32 CO2, { s4 - s5 }
add CO1, CO1, #8
@@ -693,7 +693,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 s6 , s1 , s11
FMAC_I2 s7 , s1 , s10
fstmias CO1, { s4 - s7 }
vstmia.f32 CO1, { s4 - s7 }
add CO1, CO1, #16
@@ -818,7 +818,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 s4 , s1 , s9
FMAC_I2 s5 , s1 , s8
fstmias CO1, { s4 - s5 }
vstmia.f32 CO1, { s4 - s5 }
add CO1, CO1, #8

View File

@@ -170,30 +170,30 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL2x2_I
pld [ AO , #A_PRE ]
pld [ BO , #B_PRE ]
fldmias AO!, { s0 - s1 }
fldmias BO!, { s8 - s9 }
vldmia.f32 AO!, { s0 - s1 }
vldmia.f32 BO!, { s8 - s9 }
fmuls s16 , s0, s8
fmuls s24 , s1, s9
fldmias AO!, { s2 - s3 }
vldmia.f32 AO!, { s2 - s3 }
fmuls s17 , s0, s9
fmuls s25 , s1, s8
fldmias BO!, { s10 - s11 }
vldmia.f32 BO!, { s10 - s11 }
fmuls s18 , s2, s8
fmuls s26 , s3, s9
fldmias AO!, { s4 - s5 }
vldmia.f32 AO!, { s4 - s5 }
fmuls s19 , s2, s9
fmuls s27 , s3, s8
fldmias BO!, { s12 - s13 }
vldmia.f32 BO!, { s12 - s13 }
fmuls s20 , s0, s10
fmuls s28 , s1, s11
fldmias AO!, { s6 - s7 }
vldmia.f32 AO!, { s6 - s7 }
fmuls s21 , s0, s11
fmuls s29 , s1, s10
fldmias BO!, { s14 - s15 }
vldmia.f32 BO!, { s14 - s15 }
fmuls s22 , s2, s10
fmuls s30 , s3, s11
fmuls s23 , s2, s11
@@ -206,17 +206,17 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL2x2_M1
fmacs s16 , s0, s8
fldmias AO!, { s4 - s5 }
vldmia.f32 AO!, { s4 - s5 }
fmacs s24 , s1, s9
fmacs s17 , s0, s9
fldmias BO!, { s12 - s13 }
vldmia.f32 BO!, { s12 - s13 }
fmacs s25 , s1, s8
fmacs s18 , s2, s8
fldmias AO!, { s6 - s7 }
vldmia.f32 AO!, { s6 - s7 }
fmacs s26 , s3, s9
fmacs s19 , s2, s9
fldmias BO!, { s14 - s15 }
vldmia.f32 BO!, { s14 - s15 }
fmacs s27 , s3, s8
fmacs s20 , s0, s10
@@ -238,19 +238,19 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
pld [ BO , #B_PRE ]
fmacs s24 , s5, s13
fmacs s17 , s4, s13
fldmias AO!, { s0 - s1 }
vldmia.f32 AO!, { s0 - s1 }
fmacs s25 , s5, s12
fmacs s18 , s6, s12
fmacs s26 , s7, s13
fldmias BO!, { s8 - s9 }
vldmia.f32 BO!, { s8 - s9 }
fmacs s19 , s6, s13
fmacs s27 , s7, s12
fldmias AO!, { s2 - s3 }
vldmia.f32 AO!, { s2 - s3 }
fmacs s20 , s4, s14
fmacs s28 , s5, s15
fldmias BO!, { s10 - s11 }
vldmia.f32 BO!, { s10 - s11 }
fmacs s21 , s4, s15
fmacs s29 , s5, s14
@@ -288,16 +288,16 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL2x2_SUB
fldmias AO!, { s0 - s1 }
fldmias BO!, { s8 - s9 }
vldmia.f32 AO!, { s0 - s1 }
vldmia.f32 BO!, { s8 - s9 }
fmacs s16 , s0, s8
fmacs s24 , s1, s9
fldmias AO!, { s2 - s3 }
vldmia.f32 AO!, { s2 - s3 }
fmacs s17 , s0, s9
fmacs s25 , s1, s8
fldmias BO!, { s10 - s11 }
vldmia.f32 BO!, { s10 - s11 }
fmacs s18 , s2, s8
fmacs s26 , s3, s9
fmacs s19 , s2, s9
@@ -354,8 +354,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 s10, s1 , s23
FMAC_I2 s11, s1 , s22
fstmias CO1, { s4 - s7 }
fstmias CO2, { s8 - s11 }
vstmia.f32 CO1, { s4 - s7 }
vstmia.f32 CO2, { s8 - s11 }
add CO1, CO1, #16
@@ -532,8 +532,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 s8 , s1 , s21
FMAC_I2 s9 , s1 , s20
fstmias CO1, { s4 - s5 }
fstmias CO2, { s8 - s9 }
vstmia.f32 CO1, { s4 - s5 }
vstmia.f32 CO2, { s8 - s9 }
add CO1, CO1, #8
@@ -710,7 +710,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 s6 , s1 , s19
FMAC_I2 s7 , s1 , s18
fstmias CO1, { s4 - s7 }
vstmia.f32 CO1, { s4 - s7 }
add CO1, CO1, #16
@@ -835,7 +835,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 s4 , s1 , s17
FMAC_I2 s5 , s1 , s16
fstmias CO1, { s4 - s5 }
vstmia.f32 CO1, { s4 - s5 }
add CO1, CO1, #8

View File

@@ -65,15 +65,15 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY_F4
pld [ X, #X_PRE ]
fldmiad X!, { d0 - d3 }
fstmiad Y!, { d0 - d3 }
vldmia.f64 X!, { d0 - d3 }
vstmia.f64 Y!, { d0 - d3 }
.endm
.macro COPY_F1
fldmiad X!, { d0 }
fstmiad Y!, { d0 }
vldmia.f64 X!, { d0 }
vstmia.f64 Y!, { d0 }
.endm
@@ -83,23 +83,23 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY_S4
nop
fldmiad X, { d0 }
fstmiad Y, { d0 }
vldmia.f64 X, { d0 }
vstmia.f64 Y, { d0 }
add X, X, INC_X
add Y, Y, INC_Y
fldmiad X, { d1 }
fstmiad Y, { d1 }
vldmia.f64 X, { d1 }
vstmia.f64 Y, { d1 }
add X, X, INC_X
add Y, Y, INC_Y
fldmiad X, { d0 }
fstmiad Y, { d0 }
vldmia.f64 X, { d0 }
vstmia.f64 Y, { d0 }
add X, X, INC_X
add Y, Y, INC_Y
fldmiad X, { d1 }
fstmiad Y, { d1 }
vldmia.f64 X, { d1 }
vstmia.f64 Y, { d1 }
add X, X, INC_X
add Y, Y, INC_Y
@@ -108,8 +108,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY_S1
fldmiad X, { d0 }
fstmiad Y, { d0 }
vldmia.f64 X, { d0 }
vstmia.f64 Y, { d0 }
add X, X, INC_X
add Y, Y, INC_Y

View File

@@ -67,26 +67,26 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F4
pld [ X, #X_PRE ]
fldmiad X!, { d8 }
vldmia.f64 X!, { d8 }
pld [ Y, #X_PRE ]
fldmiad Y!, { d4 }
fldmiad Y!, { d5 }
vldmia.f64 Y!, { d4 }
vldmia.f64 Y!, { d5 }
fmacd d0 , d4, d8
fldmiad X!, { d9 }
fldmiad Y!, { d6 }
vldmia.f64 X!, { d9 }
vldmia.f64 Y!, { d6 }
fmacd d1 , d5, d9
fldmiad X!, { d10 }
fldmiad X!, { d11 }
vldmia.f64 X!, { d10 }
vldmia.f64 X!, { d11 }
fmacd d0 , d6, d10
fldmiad Y!, { d7 }
vldmia.f64 Y!, { d7 }
fmacd d1 , d7, d11
.endm
.macro KERNEL_F1
fldmiad X!, { d4 }
fldmiad Y!, { d8 }
vldmia.f64 X!, { d4 }
vldmia.f64 Y!, { d8 }
fmacd d0 , d4, d8
.endm
@@ -97,26 +97,26 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S4
nop
fldmiad X, { d4 }
fldmiad Y, { d8 }
vldmia.f64 X, { d4 }
vldmia.f64 Y, { d8 }
add X, X, INC_X
add Y, Y, INC_Y
fmacd d0 , d4, d8
fldmiad X, { d5 }
fldmiad Y, { d9 }
vldmia.f64 X, { d5 }
vldmia.f64 Y, { d9 }
add X, X, INC_X
add Y, Y, INC_Y
fmacd d1 , d5, d9
fldmiad X, { d6 }
fldmiad Y, { d10 }
vldmia.f64 X, { d6 }
vldmia.f64 Y, { d10 }
add X, X, INC_X
add Y, Y, INC_Y
fmacd d0 , d6, d10
fldmiad X, { d7 }
fldmiad Y, { d11 }
vldmia.f64 X, { d7 }
vldmia.f64 Y, { d11 }
add X, X, INC_X
add Y, Y, INC_Y
fmacd d1 , d7, d11
@@ -126,8 +126,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1
fldmiad X, { d4 }
fldmiad Y, { d8 }
vldmia.f64 X, { d4 }
vldmia.f64 Y, { d8 }
add X, X, INC_X
fmacd d0 , d4, d8
add Y, Y, INC_Y
@@ -164,11 +164,11 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
cmp N, #0
ble ddot_kernel_L999
cmp INC_X, #0
beq ddot_kernel_L999
# cmp INC_X, #0
# beq ddot_kernel_L999
cmp INC_Y, #0
beq ddot_kernel_L999
# cmp INC_Y, #0
# beq ddot_kernel_L999
cmp INC_X, #1
bne ddot_kernel_S_BEGIN

View File

@@ -331,7 +331,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
add r4 , CO2, r3
pld [ CO2 , #C_PRE ]
fldmiad CO1, { d8 - d11 }
vldmia.f64 CO1, { d8 - d11 }
pld [ r4 , #C_PRE ]
fmacd d8 , d0 , d16
@@ -352,7 +352,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fmacd d15, d0 , d23
fstd d11, [CO1, #24 ]
fldmiad r4, { d8 - d11 }
vldmia.f64 r4, { d8 - d11 }
fmacd d8 , d0 , d24
fstd d12, [CO2]
@@ -367,7 +367,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
pld [ CO2 , #C_PRE ]
fldmiad CO2, { d12 - d15 }
vldmia.f64 CO2, { d12 - d15 }
fstd d8 , [r4 ]
fmacd d12, d0 , d28
@@ -378,7 +378,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fstd d11, [r4 , #24 ]
fmacd d15, d0 , d31
fstmiad CO2, { d12 - d15 }
vstmia.f64 CO2, { d12 - d15 }
add CO1, CO1, #32

View File

@@ -73,7 +73,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fldd d3 , [ AO2, #8 ]
add AO1, AO1, #16
fstmiad BO!, { d0 - d3 }
vstmia.f64 BO!, { d0 - d3 }
add AO2, AO2, #16
.endm
@@ -85,7 +85,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fldd d1 , [ AO2, #0 ]
add AO1, AO1, #8
fstmiad BO!, { d0 - d1 }
vstmia.f64 BO!, { d0 - d1 }
add AO2, AO2, #8
.endm
@@ -95,7 +95,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fldd d0 , [ AO1, #0 ]
fldd d1 , [ AO1, #8 ]
fstmiad BO!, { d0 - d1 }
vstmia.f64 BO!, { d0 - d1 }
add AO1, AO1, #16
.endm
@@ -105,7 +105,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fldd d0 , [ AO1, #0 ]
fstmiad BO!, { d0 }
vstmia.f64 BO!, { d0 }
add AO1, AO1, #8
.endm

View File

@@ -105,10 +105,10 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fldd d11, [ AO4, #16 ]
fldd d15, [ AO4, #24 ]
fstmiad BO!, { d0 - d3 }
vstmia.f64 BO!, { d0 - d3 }
add AO4, AO4, #32
fstmiad BO!, { d4 - d7 }
fstmiad BO!, { d8 - d15 }
vstmia.f64 BO!, { d4 - d7 }
vstmia.f64 BO!, { d8 - d15 }
.endm
@@ -122,7 +122,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fldd d3 , [ AO4, #0 ]
add AO3, AO3, #8
fstmiad BO!, { d0 - d3 }
vstmia.f64 BO!, { d0 - d3 }
add AO4, AO4, #8
.endm
@@ -140,7 +140,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fldd d5 , [ AO2, #16 ]
fldd d7 , [ AO2, #24 ]
fstmiad BO!, { d0 - d7 }
vstmia.f64 BO!, { d0 - d7 }
add AO2, AO2, #32
.endm
@@ -152,7 +152,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fldd d1 , [ AO2, #0 ]
add AO1, AO1, #8
fstmiad BO!, { d0 - d1 }
vstmia.f64 BO!, { d0 - d1 }
add AO2, AO2, #8
.endm
@@ -164,7 +164,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fldd d2 , [ AO1, #16 ]
fldd d3 , [ AO1, #24 ]
fstmiad BO!, { d0 - d3 }
vstmia.f64 BO!, { d0 - d3 }
add AO1, AO1, #32
.endm
@@ -174,7 +174,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fldd d0 , [ AO1, #0 ]
fstmiad BO!, { d0 }
vstmia.f64 BO!, { d0 }
add AO1, AO1, #8
.endm

View File

@@ -76,21 +76,21 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY4x4
pld [ AO1, #A_PRE ]
fldmiad AO1, { d0 - d3 }
vldmia.f64 AO1, { d0 - d3 }
add r3, AO1, LDA
pld [ r3, #A_PRE ]
fldmiad r3, { d4 - d7 }
vldmia.f64 r3, { d4 - d7 }
add r3, r3, LDA
pld [ r3, #A_PRE ]
fldmiad r3, { d8 - d11 }
vldmia.f64 r3, { d8 - d11 }
add r3, r3, LDA
pld [ r3, #A_PRE ]
fldmiad r3, { d12 - d15 }
vldmia.f64 r3, { d12 - d15 }
fstmiad BO1, { d0 - d15 }
vstmia.f64 BO1, { d0 - d15 }
add AO1, AO1, #32
add BO1, BO1, M4
@@ -98,18 +98,18 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY2x4
fldmiad AO1, { d0 - d1 }
vldmia.f64 AO1, { d0 - d1 }
add r3, AO1, LDA
fldmiad r3, { d2 - d3 }
vldmia.f64 r3, { d2 - d3 }
add r3, r3, LDA
fldmiad r3, { d4 - d5 }
vldmia.f64 r3, { d4 - d5 }
add r3, r3, LDA
fldmiad r3, { d6 - d7 }
vldmia.f64 r3, { d6 - d7 }
fstmiad BO2, { d0 - d7 }
vstmia.f64 BO2, { d0 - d7 }
add AO1, AO1, #16
add BO2, BO2, #64
@@ -117,18 +117,18 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY1x4
fldmiad AO1, { d0 }
vldmia.f64 AO1, { d0 }
add r3, AO1, LDA
fldmiad r3, { d1 }
vldmia.f64 r3, { d1 }
add r3, r3, LDA
fldmiad r3, { d2 }
vldmia.f64 r3, { d2 }
add r3, r3, LDA
fldmiad r3, { d3 }
vldmia.f64 r3, { d3 }
fstmiad BO3, { d0 - d3 }
vstmia.f64 BO3, { d0 - d3 }
add AO1, AO1, #8
add BO3, BO3, #32
@@ -139,13 +139,13 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY4x2
pld [ AO1, #A_PRE ]
fldmiad AO1, { d0 - d3 }
vldmia.f64 AO1, { d0 - d3 }
add r3, AO1, LDA
pld [ r3, #A_PRE ]
fldmiad r3, { d4 - d7 }
vldmia.f64 r3, { d4 - d7 }
fstmiad BO1, { d0 - d7 }
vstmia.f64 BO1, { d0 - d7 }
add AO1, AO1, #32
add BO1, BO1, M4
@@ -153,12 +153,12 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY2x2
fldmiad AO1, { d0 - d1 }
vldmia.f64 AO1, { d0 - d1 }
add r3, AO1, LDA
fldmiad r3, { d2 - d3 }
vldmia.f64 r3, { d2 - d3 }
fstmiad BO2, { d0 - d3 }
vstmia.f64 BO2, { d0 - d3 }
add AO1, AO1, #16
add BO2, BO2, #32
@@ -166,12 +166,12 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY1x2
fldmiad AO1, { d0 }
vldmia.f64 AO1, { d0 }
add r3, AO1, LDA
fldmiad r3, { d1 }
vldmia.f64 r3, { d1 }
fstmiad BO3, { d0 - d1 }
vstmia.f64 BO3, { d0 - d1 }
add AO1, AO1, #8
add BO3, BO3, #16
@@ -182,9 +182,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY4x1
pld [ AO1, #A_PRE ]
fldmiad AO1, { d0 - d3 }
vldmia.f64 AO1, { d0 - d3 }
fstmiad BO1, { d0 - d3 }
vstmia.f64 BO1, { d0 - d3 }
add AO1, AO1, #32
add BO1, BO1, M4
@@ -192,9 +192,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY2x1
fldmiad AO1, { d0 - d1 }
vldmia.f64 AO1, { d0 - d1 }
fstmiad BO2, { d0 - d1 }
vstmia.f64 BO2, { d0 - d1 }
add AO1, AO1, #16
add BO2, BO2, #16
@@ -202,9 +202,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY1x1
fldmiad AO1, { d0 }
vldmia.f64 AO1, { d0 }
fstmiad BO3, { d0 }
vstmia.f64 BO3, { d0 }
add AO1, AO1, #8
add BO3, BO3, #8

View File

@@ -128,10 +128,10 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fldd d8 , [ BO ]
pld [ AO , #A_PRE ]
fldmiad AO!, { d0 - d1}
vldmia.f64 AO!, { d0 - d1}
fmuld d16 , d0, d8
fldmiad AO!, { d2 - d3}
vldmia.f64 AO!, { d2 - d3}
fmuld d17 , d1, d8
fldd d9 , [ BO, #8 ]
fmuld d18 , d2, d8
@@ -148,10 +148,10 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fmuld d23 , d3, d9
fmuld d24 , d0, d10
fldmiad AO!, { d4 - d5 }
vldmia.f64 AO!, { d4 - d5 }
fmuld d25 , d1, d10
fmuld d26 , d2, d10
fldmiad AO!, { d6 - d7 }
vldmia.f64 AO!, { d6 - d7 }
fmuld d27 , d3, d10
fldd d13, [ BO, #8 ]
@@ -173,10 +173,10 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fldd d8 , [ BO ]
pld [ AO , #A_PRE ]
fldmiad AO!, { d0 - d1}
vldmia.f64 AO!, { d0 - d1}
fmacd d16 , d0, d8
fldmiad AO!, { d2 - d3}
vldmia.f64 AO!, { d2 - d3}
fmacd d17 , d1, d8
fldd d9 , [ BO, #8 ]
fmacd d18 , d2, d8
@@ -193,10 +193,10 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fmacd d23 , d3, d9
fmacd d24 , d0, d10
fldmiad AO!, { d4 - d5 }
vldmia.f64 AO!, { d4 - d5 }
fmacd d25 , d1, d10
fmacd d26 , d2, d10
fldmiad AO!, { d6 - d7 }
vldmia.f64 AO!, { d6 - d7 }
fmacd d27 , d3, d10
fldd d13, [ BO, #8 ]
@@ -225,11 +225,11 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fldd d8 , [ BO ]
fmacd d21 , d5, d13
fmacd d22 , d6, d13
fldmiad AO!, { d0 - d1 }
vldmia.f64 AO!, { d0 - d1 }
fmacd d23 , d7, d13
fmacd d24 , d4, d14
fldmiad AO!, { d2 - d3 }
vldmia.f64 AO!, { d2 - d3 }
fmacd d25 , d5, d14
fldd d9 , [ BO, #8 ]
fmacd d26 , d6, d14
@@ -257,10 +257,10 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fmacd d19 , d3, d8
fmacd d20 , d0, d9
fldmiad AO!, { d4 - d5 }
vldmia.f64 AO!, { d4 - d5 }
fmacd d21 , d1, d9
fmacd d22 , d2, d9
fldmiad AO!, { d6 - d7 }
vldmia.f64 AO!, { d6 - d7 }
fmacd d23 , d3, d9
fmacd d24 , d0, d10
@@ -390,7 +390,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fstd d11, [r4 , #24 ]
fmuld d15, d0 , d31
fstmiad CO2, { d12 - d15 }
vstmia.f64 CO2, { d12 - d15 }
add CO1, CO1, #32

View File

@@ -139,8 +139,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F8X1
pld [ AO2 , #A_PRE ]
fldmiad XO! , { d2 }
fldmiad AO1 , { d4 - d7 }
vldmia.f64 XO! , { d2 }
vldmia.f64 AO1 , { d4 - d7 }
vmla.f64 d8 , d2 , d4
pld [ AO2 , #4*SIZE ]
@@ -150,7 +150,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
vmla.f64 d11 , d2 , d7
fldmiad r3 , { d4 - d7 }
vldmia.f64 r3 , { d4 - d7 }
vmla.f64 d12 , d2 , d4
vmla.f64 d13 , d2 , d5
@@ -164,23 +164,23 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_F8
fldmiad YO, { d4 - d7 }
vldmia.f64 YO, { d4 - d7 }
vmla.f64 d4 , d0, d8
vmla.f64 d5 , d0, d9
vmla.f64 d6 , d0, d10
vmla.f64 d7 , d0, d11
fstmiad YO!, { d4 - d7 }
vstmia.f64 YO!, { d4 - d7 }
fldmiad YO, { d4 - d7 }
vldmia.f64 YO, { d4 - d7 }
vmla.f64 d4 , d0, d12
vmla.f64 d5 , d0, d13
vmla.f64 d6 , d0, d14
vmla.f64 d7 , d0, d15
fstmiad YO!, { d4 - d7 }
vstmia.f64 YO!, { d4 - d7 }
.endm
@@ -195,8 +195,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1X1
fldmiad XO! , { d2 }
fldmiad AO1 , { d8 }
vldmia.f64 XO! , { d2 }
vldmia.f64 AO1 , { d8 }
vmla.f64 d12 , d2 , d8
add AO1, AO1, LDA
@@ -204,9 +204,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_F1
fldmiad YO, { d4 }
vldmia.f64 YO, { d4 }
vmla.f64 d4, d0, d12
fstmiad YO!, { d4 }
vstmia.f64 YO!, { d4 }
.endm
@@ -234,8 +234,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S4X1
pld [ AO2 , #A_PRE ]
fldmiad XO , { d2 }
fldmiad AO1 , { d8 - d11 }
vldmia.f64 XO , { d2 }
vldmia.f64 AO1 , { d8 - d11 }
vmla.f64 d12 , d2 , d8
add AO1, AO1, LDA
@@ -249,24 +249,24 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_S4
fldmiad YO, { d4 }
vldmia.f64 YO, { d4 }
vmla.f64 d4 , d0, d12
fstmiad YO, { d4 }
vstmia.f64 YO, { d4 }
add YO, YO, INC_Y
fldmiad YO, { d5 }
vldmia.f64 YO, { d5 }
vmla.f64 d5 , d0, d13
fstmiad YO, { d5 }
vstmia.f64 YO, { d5 }
add YO, YO, INC_Y
fldmiad YO, { d4 }
vldmia.f64 YO, { d4 }
vmla.f64 d4 , d0, d14
fstmiad YO, { d4 }
vstmia.f64 YO, { d4 }
add YO, YO, INC_Y
fldmiad YO, { d5 }
vldmia.f64 YO, { d5 }
vmla.f64 d5 , d0, d15
fstmiad YO, { d5 }
vstmia.f64 YO, { d5 }
add YO, YO, INC_Y
.endm
@@ -282,8 +282,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1X1
fldmiad XO , { d2 }
fldmiad AO1 , { d8 }
vldmia.f64 XO , { d2 }
vldmia.f64 AO1 , { d8 }
vmla.f64 d12 , d2 , d8
add AO1, AO1, LDA
add XO, XO , INC_X
@@ -292,9 +292,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_S1
fldmiad YO, { d4 }
vldmia.f64 YO, { d4 }
vmla.f64 d4, d0, d12
fstmiad YO , { d4 }
vstmia.f64 YO , { d4 }
add YO, YO, INC_Y
.endm
@@ -338,8 +338,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F8X1
pld [ AO2, #A_PRE ]
fldmias XO! , { s2 }
fldmias AO1 , { s4 - s7 }
vldmia.f32 XO! , { s2 }
vldmia.f32 AO1 , { s4 - s7 }
vmla.f32 s8 , s2 , s4
vmla.f32 s9 , s2 , s5
@@ -348,7 +348,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
add r3, AO1, #4*SIZE
fldmias r3 , { s4 - s7 }
vldmia.f32 r3 , { s4 - s7 }
vmla.f32 s12 , s2 , s4
vmla.f32 s13 , s2 , s5
@@ -362,24 +362,24 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_F8
fldmias YO, { s4 - s7 }
vldmia.f32 YO, { s4 - s7 }
vmla.f32 s4 , s0, s8
vmla.f32 s5 , s0, s9
vmla.f32 s6 , s0, s10
vmla.f32 s7 , s0, s11
fstmias YO!, { s4 - s7 }
vstmia.f32 YO!, { s4 - s7 }
fldmias YO, { s4 - s7 }
vldmia.f32 YO, { s4 - s7 }
vmla.f32 s4 , s0, s12
vmla.f32 s5 , s0, s13
vmla.f32 s6 , s0, s14
vmla.f32 s7 , s0, s15
fstmias YO!, { s4 - s7 }
vstmia.f32 YO!, { s4 - s7 }
.endm
@@ -394,8 +394,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1X1
fldmias XO! , { s2 }
fldmias AO1 , { s8 }
vldmia.f32 XO! , { s2 }
vldmia.f32 AO1 , { s8 }
vmla.f32 s12 , s2 , s8
add AO1, AO1, LDA
@@ -403,9 +403,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_F1
fldmias YO, { s4 }
vldmia.f32 YO, { s4 }
vmla.f32 s4, s0, s12
fstmias YO!, { s4 }
vstmia.f32 YO!, { s4 }
.endm
@@ -434,8 +434,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S4X1
fldmias XO , { s2 }
fldmias AO1 , { s8 - s11 }
vldmia.f32 XO , { s2 }
vldmia.f32 AO1 , { s8 - s11 }
vmla.f32 s12 , s2 , s8
vmla.f32 s13 , s2 , s9
@@ -449,24 +449,24 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_S4
fldmias YO, { s4 }
vldmia.f32 YO, { s4 }
vmla.f32 s4 , s0, s12
fstmias YO, { s4 }
vstmia.f32 YO, { s4 }
add YO, YO, INC_Y
fldmias YO, { s5 }
vldmia.f32 YO, { s5 }
vmla.f32 s5 , s0, s13
fstmias YO, { s5 }
vstmia.f32 YO, { s5 }
add YO, YO, INC_Y
fldmias YO, { s4 }
vldmia.f32 YO, { s4 }
vmla.f32 s4 , s0, s14
fstmias YO, { s4 }
vstmia.f32 YO, { s4 }
add YO, YO, INC_Y
fldmias YO, { s5 }
vldmia.f32 YO, { s5 }
vmla.f32 s5 , s0, s15
fstmias YO, { s5 }
vstmia.f32 YO, { s5 }
add YO, YO, INC_Y
.endm
@@ -482,8 +482,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1X1
fldmias XO , { s2 }
fldmias AO1 , { s8 }
vldmia.f32 XO , { s2 }
vldmia.f32 AO1 , { s8 }
vmla.f32 s12 , s2 , s8
add AO1, AO1, LDA
add XO, XO , INC_X
@@ -492,9 +492,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_S1
fldmias YO, { s4 }
vldmia.f32 YO, { s4 }
vmla.f32 s4, s0, s12
fstmias YO , { s4 }
vstmia.f32 YO , { s4 }
add YO, YO, INC_Y
.endm

View File

@@ -138,8 +138,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F8X1
fldmiad XO! , { d4 }
fldmiad AO1 , { d8 - d15 }
vldmia.f64 XO! , { d4 }
vldmia.f64 AO1 , { d8 - d15 }
vmla.f64 d24 , d4 , d8
pld [ AO2 , #A_PRE ]
@@ -158,7 +158,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_F8
fldmiad YO, { d16 - d23 }
vldmia.f64 YO, { d16 - d23 }
vmla.f64 d16, d0, d24
vmla.f64 d17, d0, d25
@@ -169,7 +169,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
vmla.f64 d22, d0, d30
vmla.f64 d23, d0, d31
fstmiad YO!, { d16 - d23 }
vstmia.f64 YO!, { d16 - d23 }
.endm
@@ -184,8 +184,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1X1
fldmiad XO! , { d4 }
fldmiad AO1 , { d8 }
vldmia.f64 XO! , { d4 }
vldmia.f64 AO1 , { d8 }
vmla.f64 d24 , d4 , d8
add AO1, AO1, LDA
@@ -193,9 +193,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_F1
fldmiad YO, { d16 }
vldmia.f64 YO, { d16 }
vmla.f64 d16, d0, d24
fstmiad YO!, { d16 }
vstmia.f64 YO!, { d16 }
.endm
@@ -234,8 +234,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
pld [ AO2 , #A_PRE ]
pld [ AO2 , #A_PRE+32 ]
fldmiad XO , { d4 }
fldmiad AO1 , { d8 - d15 }
vldmia.f64 XO , { d4 }
vldmia.f64 AO1 , { d8 - d15 }
vmla.f64 d24 , d4 , d8
vmla.f64 d25 , d4 , d9
@@ -253,44 +253,44 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_S8
fldmiad YO, { d16 }
vldmia.f64 YO, { d16 }
vmla.f64 d16, d0, d24
fstmiad YO, { d16 }
vstmia.f64 YO, { d16 }
add YO, YO, INC_Y
fldmiad YO, { d17 }
vldmia.f64 YO, { d17 }
vmla.f64 d17, d0, d25
fstmiad YO, { d17 }
vstmia.f64 YO, { d17 }
add YO, YO, INC_Y
fldmiad YO, { d18 }
vldmia.f64 YO, { d18 }
vmla.f64 d18, d0, d26
fstmiad YO, { d18 }
vstmia.f64 YO, { d18 }
add YO, YO, INC_Y
fldmiad YO, { d19 }
vldmia.f64 YO, { d19 }
vmla.f64 d19, d0, d27
fstmiad YO, { d19 }
vstmia.f64 YO, { d19 }
add YO, YO, INC_Y
fldmiad YO, { d20 }
vldmia.f64 YO, { d20 }
vmla.f64 d20, d0, d28
fstmiad YO, { d20 }
vstmia.f64 YO, { d20 }
add YO, YO, INC_Y
fldmiad YO, { d21 }
vldmia.f64 YO, { d21 }
vmla.f64 d21, d0, d29
fstmiad YO, { d21 }
vstmia.f64 YO, { d21 }
add YO, YO, INC_Y
fldmiad YO, { d22 }
vldmia.f64 YO, { d22 }
vmla.f64 d22, d0, d30
fstmiad YO, { d22 }
vstmia.f64 YO, { d22 }
add YO, YO, INC_Y
fldmiad YO, { d23 }
vldmia.f64 YO, { d23 }
vmla.f64 d23, d0, d31
fstmiad YO, { d23 }
vstmia.f64 YO, { d23 }
add YO, YO, INC_Y
.endm
@@ -306,8 +306,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1X1
fldmiad XO , { d4 }
fldmiad AO1 , { d8 }
vldmia.f64 XO , { d4 }
vldmia.f64 AO1 , { d8 }
vmla.f64 d24 , d4 , d8
add AO1, AO1, LDA
add XO, XO, INC_X
@@ -316,9 +316,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_S1
fldmiad YO, { d16 }
vldmia.f64 YO, { d16 }
vmla.f64 d16, d0, d24
fstmiad YO, { d16 }
vstmia.f64 YO, { d16 }
add YO, YO, INC_Y
.endm
@@ -361,8 +361,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F8X1
pld [ AO2 , #A_PRE ]
fldmias XO! , { s4 }
fldmias AO1 , { s8 - s15 }
vldmia.f32 XO! , { s4 }
vldmia.f32 AO1 , { s8 - s15 }
vmla.f32 s24 , s4 , s8
vmla.f32 s25 , s4 , s9
@@ -379,7 +379,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_F8
fldmias YO, { s16 - s23 }
vldmia.f32 YO, { s16 - s23 }
vmla.f32 s16, s0, s24
vmla.f32 s17, s0, s25
@@ -390,7 +390,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
vmla.f32 s22, s0, s30
vmla.f32 s23, s0, s31
fstmias YO!, { s16 - s23 }
vstmia.f32 YO!, { s16 - s23 }
.endm
@@ -405,8 +405,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1X1
fldmias XO! , { s4 }
fldmias AO1 , { s8 }
vldmia.f32 XO! , { s4 }
vldmia.f32 AO1 , { s8 }
vmla.f32 s24 , s4 , s8
add AO1, AO1, LDA
@@ -414,9 +414,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_F1
fldmias YO, { s16 }
vldmia.f32 YO, { s16 }
vmla.f32 s16, s0, s24
fstmias YO!, { s16 }
vstmia.f32 YO!, { s16 }
.endm
@@ -454,8 +454,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S8X1
pld [ AO2 , #A_PRE ]
fldmias XO , { s4 }
fldmias AO1 , { s8 - s15 }
vldmia.f32 XO , { s4 }
vldmia.f32 AO1 , { s8 - s15 }
vmla.f32 s24 , s4 , s8
vmla.f32 s25 , s4 , s9
@@ -473,44 +473,44 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_S8
fldmias YO, { s16 }
vldmia.f32 YO, { s16 }
vmla.f32 s16, s0, s24
fstmias YO, { s16 }
vstmia.f32 YO, { s16 }
add YO, YO, INC_Y
fldmias YO, { s17 }
vldmia.f32 YO, { s17 }
vmla.f32 s17, s0, s25
fstmias YO, { s17 }
vstmia.f32 YO, { s17 }
add YO, YO, INC_Y
fldmias YO, { s18 }
vldmia.f32 YO, { s18 }
vmla.f32 s18, s0, s26
fstmias YO, { s18 }
vstmia.f32 YO, { s18 }
add YO, YO, INC_Y
fldmias YO, { s19 }
vldmia.f32 YO, { s19 }
vmla.f32 s19, s0, s27
fstmias YO, { s19 }
vstmia.f32 YO, { s19 }
add YO, YO, INC_Y
fldmias YO, { s20 }
vldmia.f32 YO, { s20 }
vmla.f32 s20, s0, s28
fstmias YO, { s20 }
vstmia.f32 YO, { s20 }
add YO, YO, INC_Y
fldmias YO, { s21 }
vldmia.f32 YO, { s21 }
vmla.f32 s21, s0, s29
fstmias YO, { s21 }
vstmia.f32 YO, { s21 }
add YO, YO, INC_Y
fldmias YO, { s22 }
vldmia.f32 YO, { s22 }
vmla.f32 s22, s0, s30
fstmias YO, { s22 }
vstmia.f32 YO, { s22 }
add YO, YO, INC_Y
fldmias YO, { s23 }
vldmia.f32 YO, { s23 }
vmla.f32 s23, s0, s31
fstmias YO, { s23 }
vstmia.f32 YO, { s23 }
add YO, YO, INC_Y
.endm
@@ -526,8 +526,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1X1
fldmias XO , { s4 }
fldmias AO1 , { s8 }
vldmia.f32 XO , { s4 }
vldmia.f32 AO1 , { s8 }
vmla.f32 s24 , s4 , s8
add AO1, AO1, LDA
add XO, XO, INC_X
@@ -536,9 +536,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_S1
fldmias YO, { s16 }
vldmia.f32 YO, { s16 }
vmla.f32 s16, s0, s24
fstmias YO, { s16 }
vstmia.f32 YO, { s16 }
add YO, YO, INC_Y
.endm

View File

@@ -112,13 +112,13 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F2X4
pld [ XO , #X_PRE ]
fldmiad XO! , { d12 - d15 }
vldmia.f64 XO! , { d12 - d15 }
pld [ AO1 , #A_PRE ]
fldmiad AO1!, { d8 - d9 }
vldmia.f64 AO1!, { d8 - d9 }
pld [ AO2 , #A_PRE ]
fldmiad AO2!, { d4 - d5 }
fldmiad AO1!, { d10 - d11 }
fldmiad AO2!, { d6 - d7 }
vldmia.f64 AO2!, { d4 - d5 }
vldmia.f64 AO1!, { d10 - d11 }
vldmia.f64 AO2!, { d6 - d7 }
vmla.f64 d2 , d12 , d8
vmla.f64 d3 , d12 , d4
@@ -133,9 +133,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F2X1
fldmiad XO! , { d1 }
fldmiad AO1!, { d8 }
fldmiad AO2!, { d4 }
vldmia.f64 XO! , { d1 }
vldmia.f64 AO1!, { d8 }
vldmia.f64 AO2!, { d4 }
vmla.f64 d2 , d1 , d8
vmla.f64 d3 , d1 , d4
@@ -143,10 +143,10 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_F2
fldmiad YO, { d4 - d5 }
vldmia.f64 YO, { d4 - d5 }
vmla.f64 d4, d0, d2
vmla.f64 d5, d0, d3
fstmiad YO!, { d4 - d5 }
vstmia.f64 YO!, { d4 - d5 }
.endm
@@ -160,10 +160,10 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1X4
pld [ XO , #X_PRE ]
fldmiad XO! , { d12 - d15 }
vldmia.f64 XO! , { d12 - d15 }
pld [ AO1 , #A_PRE ]
fldmiad AO1!, { d8 - d9 }
fldmiad AO1!, { d10 - d11 }
vldmia.f64 AO1!, { d8 - d9 }
vldmia.f64 AO1!, { d10 - d11 }
vmla.f64 d2 , d12 , d8
vmla.f64 d2 , d13 , d9
vmla.f64 d2 , d14, d10
@@ -173,17 +173,17 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1X1
fldmiad XO! , { d1 }
fldmiad AO1!, { d8 }
vldmia.f64 XO! , { d1 }
vldmia.f64 AO1!, { d8 }
vmla.f64 d2 , d1 , d8
.endm
.macro SAVE_F1
fldmiad YO, { d4 }
vldmia.f64 YO, { d4 }
vmla.f64 d4, d0, d2
fstmiad YO!, { d4 }
vstmia.f64 YO!, { d4 }
.endm
@@ -197,23 +197,23 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S2X4
fldmiad XO , { d12 }
vldmia.f64 XO , { d12 }
add XO, XO, INC_X
pld [ AO1 , #A_PRE ]
fldmiad AO1!, { d8 - d9 }
vldmia.f64 AO1!, { d8 - d9 }
pld [ AO2 , #A_PRE ]
fldmiad AO2!, { d4 - d5 }
vldmia.f64 AO2!, { d4 - d5 }
fldmiad XO , { d13 }
vldmia.f64 XO , { d13 }
add XO, XO, INC_X
fldmiad AO1!, { d10 - d11 }
fldmiad AO2!, { d6 - d7 }
vldmia.f64 AO1!, { d10 - d11 }
vldmia.f64 AO2!, { d6 - d7 }
fldmiad XO , { d14 }
vldmia.f64 XO , { d14 }
add XO, XO, INC_X
fldmiad XO , { d15 }
vldmia.f64 XO , { d15 }
add XO, XO, INC_X
vmla.f64 d2 , d12 , d8
@@ -229,9 +229,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S2X1
fldmiad XO , { d1 }
fldmiad AO1!, { d8 }
fldmiad AO2!, { d4 }
vldmia.f64 XO , { d1 }
vldmia.f64 AO1!, { d8 }
vldmia.f64 AO2!, { d4 }
vmla.f64 d2 , d1 , d8
add XO, XO, INC_X
vmla.f64 d3 , d1 , d4
@@ -240,14 +240,14 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_S2
fldmiad YO, { d4 }
vldmia.f64 YO, { d4 }
vmla.f64 d4, d0, d2
fstmiad YO, { d4 }
vstmia.f64 YO, { d4 }
add YO, YO, INC_Y
fldmiad YO, { d5 }
vldmia.f64 YO, { d5 }
vmla.f64 d5, d0, d3
fstmiad YO, { d5 }
vstmia.f64 YO, { d5 }
add YO, YO, INC_Y
.endm
@@ -261,20 +261,20 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1X4
fldmiad XO , { d12 }
vldmia.f64 XO , { d12 }
add XO, XO, INC_X
pld [ AO1 , #A_PRE ]
fldmiad AO1!, { d8 - d9 }
vldmia.f64 AO1!, { d8 - d9 }
fldmiad XO , { d13 }
vldmia.f64 XO , { d13 }
add XO, XO, INC_X
fldmiad AO1!, { d10 - d11 }
vldmia.f64 AO1!, { d10 - d11 }
fldmiad XO , { d14 }
vldmia.f64 XO , { d14 }
add XO, XO, INC_X
fldmiad XO , { d15 }
vldmia.f64 XO , { d15 }
add XO, XO, INC_X
vmla.f64 d2 , d12 , d8
@@ -286,8 +286,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1X1
fldmiad XO , { d1 }
fldmiad AO1!, { d8 }
vldmia.f64 XO , { d1 }
vldmia.f64 AO1!, { d8 }
vmla.f64 d2 , d1 , d8
add XO, XO, INC_X
@@ -295,9 +295,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_S1
fldmiad YO, { d4 }
vldmia.f64 YO, { d4 }
vmla.f64 d4, d0, d2
fstmiad YO, { d4 }
vstmia.f64 YO, { d4 }
add YO, YO, INC_Y
.endm
@@ -315,11 +315,11 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F2X4
fldmias XO! , { s12 - s15 }
fldmias AO1!, { s8 - s9 }
fldmias AO2!, { s4 - s5 }
fldmias AO1!, { s10 - s11 }
fldmias AO2!, { s6 - s7 }
vldmia.f32 XO! , { s12 - s15 }
vldmia.f32 AO1!, { s8 - s9 }
vldmia.f32 AO2!, { s4 - s5 }
vldmia.f32 AO1!, { s10 - s11 }
vldmia.f32 AO2!, { s6 - s7 }
vmla.f32 s2 , s12 , s8
vmla.f32 s3 , s12 , s4
@@ -334,9 +334,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F2X1
fldmias XO! , { s1 }
fldmias AO1!, { s8 }
fldmias AO2!, { s4 }
vldmia.f32 XO! , { s1 }
vldmia.f32 AO1!, { s8 }
vldmia.f32 AO2!, { s4 }
vmla.f32 s2 , s1 , s8
vmla.f32 s3 , s1 , s4
@@ -344,10 +344,10 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_F2
fldmias YO, { s4 - s5 }
vldmia.f32 YO, { s4 - s5 }
vmla.f32 s4, s0, s2
vmla.f32 s5, s0, s3
fstmias YO!, { s4 - s5 }
vstmia.f32 YO!, { s4 - s5 }
.endm
@@ -359,9 +359,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1X4
fldmias XO! , { s12 - s15 }
fldmias AO1!, { s8 - s9 }
fldmias AO1!, { s10 - s11 }
vldmia.f32 XO! , { s12 - s15 }
vldmia.f32 AO1!, { s8 - s9 }
vldmia.f32 AO1!, { s10 - s11 }
vmla.f32 s2 , s12 , s8
vmla.f32 s2 , s13 , s9
vmla.f32 s2 , s14, s10
@@ -371,17 +371,17 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1X1
fldmias XO! , { s1 }
fldmias AO1!, { s8 }
vldmia.f32 XO! , { s1 }
vldmia.f32 AO1!, { s8 }
vmla.f32 s2 , s1 , s8
.endm
.macro SAVE_F1
fldmias YO, { s4 }
vldmia.f32 YO, { s4 }
vmla.f32 s4, s0, s2
fstmias YO!, { s4 }
vstmia.f32 YO!, { s4 }
.endm
@@ -395,21 +395,21 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S2X4
fldmias XO , { s12 }
vldmia.f32 XO , { s12 }
add XO, XO, INC_X
fldmias AO1!, { s8 - s9 }
fldmias AO2!, { s4 - s5 }
vldmia.f32 AO1!, { s8 - s9 }
vldmia.f32 AO2!, { s4 - s5 }
fldmias XO , { s13 }
vldmia.f32 XO , { s13 }
add XO, XO, INC_X
fldmias AO1!, { s10 - s11 }
fldmias AO2!, { s6 - s7 }
vldmia.f32 AO1!, { s10 - s11 }
vldmia.f32 AO2!, { s6 - s7 }
fldmias XO , { s14 }
vldmia.f32 XO , { s14 }
add XO, XO, INC_X
fldmias XO , { s15 }
vldmia.f32 XO , { s15 }
add XO, XO, INC_X
vmla.f32 s2 , s12 , s8
@@ -425,9 +425,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S2X1
fldmias XO , { s1 }
fldmias AO1!, { s8 }
fldmias AO2!, { s4 }
vldmia.f32 XO , { s1 }
vldmia.f32 AO1!, { s8 }
vldmia.f32 AO2!, { s4 }
vmla.f32 s2 , s1 , s8
add XO, XO, INC_X
vmla.f32 s3 , s1 , s4
@@ -436,14 +436,14 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_S2
fldmias YO, { s4 }
vldmia.f32 YO, { s4 }
vmla.f32 s4, s0, s2
fstmias YO, { s4 }
vstmia.f32 YO, { s4 }
add YO, YO, INC_Y
fldmias YO, { s5 }
vldmia.f32 YO, { s5 }
vmla.f32 s5, s0, s3
fstmias YO, { s5 }
vstmia.f32 YO, { s5 }
add YO, YO, INC_Y
.endm
@@ -456,20 +456,20 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1X4
fldmias XO , { s12 }
vldmia.f32 XO , { s12 }
add XO, XO, INC_X
pld [ AO1 , #A_PRE ]
fldmias AO1!, { s8 - s9 }
vldmia.f32 AO1!, { s8 - s9 }
fldmias XO , { s13 }
vldmia.f32 XO , { s13 }
add XO, XO, INC_X
fldmias AO1!, { s10 - s11 }
vldmia.f32 AO1!, { s10 - s11 }
fldmias XO , { s14 }
vldmia.f32 XO , { s14 }
add XO, XO, INC_X
fldmias XO , { s15 }
vldmia.f32 XO , { s15 }
add XO, XO, INC_X
vmla.f32 s2 , s12 , s8
@@ -481,8 +481,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1X1
fldmias XO , { s1 }
fldmias AO1!, { s8 }
vldmia.f32 XO , { s1 }
vldmia.f32 AO1!, { s8 }
vmla.f32 s2 , s1 , s8
add XO, XO, INC_X
@@ -490,9 +490,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_S1
fldmias YO, { s4 }
vldmia.f32 YO, { s4 }
vmla.f32 s4, s0, s2
fstmias YO, { s4 }
vstmia.f32 YO, { s4 }
add YO, YO, INC_Y
.endm

View File

@@ -108,17 +108,17 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F2X4
pld [ XO , #X_PRE ]
fldmiad XO! , { d28 - d31 }
vldmia.f64 XO! , { d28 - d31 }
pld [ AO1 , #A_PRE ]
fldmiad AO1!, { d8 - d9 }
vldmia.f64 AO1!, { d8 - d9 }
pld [ AO2 , #A_PRE ]
fldmiad AO2!, { d16 - d17 }
vldmia.f64 AO2!, { d16 - d17 }
vmla.f64 d4 , d28 , d8
vmla.f64 d5 , d28 , d16
fldmiad AO1!, { d10 - d11 }
vldmia.f64 AO1!, { d10 - d11 }
vmla.f64 d4 , d29 , d9
vmla.f64 d5 , d29 , d17
fldmiad AO2!, { d18 - d19 }
vldmia.f64 AO2!, { d18 - d19 }
vmla.f64 d4 , d30, d10
vmla.f64 d5 , d30, d18
vmla.f64 d4 , d31, d11
@@ -129,9 +129,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F2X1
fldmiad XO! , { d2 }
fldmiad AO1!, { d8 }
fldmiad AO2!, { d16 }
vldmia.f64 XO! , { d2 }
vldmia.f64 AO1!, { d8 }
vldmia.f64 AO2!, { d16 }
vmla.f64 d4 , d2 , d8
vmla.f64 d5 , d2 , d16
@@ -139,10 +139,10 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_F2
fldmiad YO, { d24 - d25 }
vldmia.f64 YO, { d24 - d25 }
vmla.f64 d24, d0, d4
vmla.f64 d25, d0, d5
fstmiad YO!, { d24 - d25 }
vstmia.f64 YO!, { d24 - d25 }
.endm
@@ -156,23 +156,23 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S2X4
pld [ AO1 , #A_PRE ]
fldmiad XO , { d28 }
vldmia.f64 XO , { d28 }
add XO, XO, INC_X
fldmiad AO1!, { d8 - d9 }
vldmia.f64 AO1!, { d8 - d9 }
pld [ AO2 , #A_PRE ]
fldmiad AO2!, { d16 - d17 }
vldmia.f64 AO2!, { d16 - d17 }
vmla.f64 d4 , d28 , d8
fldmiad XO , { d29 }
vldmia.f64 XO , { d29 }
add XO, XO, INC_X
vmla.f64 d5 , d28 , d16
fldmiad AO1!, { d10 - d11 }
vldmia.f64 AO1!, { d10 - d11 }
vmla.f64 d4 , d29 , d9
fldmiad XO , { d30 }
vldmia.f64 XO , { d30 }
add XO, XO, INC_X
vmla.f64 d5 , d29 , d17
fldmiad AO2!, { d18 - d19 }
vldmia.f64 AO2!, { d18 - d19 }
vmla.f64 d4 , d30, d10
fldmiad XO , { d31 }
vldmia.f64 XO , { d31 }
add XO, XO, INC_X
vmla.f64 d5 , d30, d18
vmla.f64 d4 , d31, d11
@@ -183,10 +183,10 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S2X1
fldmiad XO , { d2 }
fldmiad AO1!, { d8 }
vldmia.f64 XO , { d2 }
vldmia.f64 AO1!, { d8 }
add XO, XO, INC_X
fldmiad AO2!, { d16 }
vldmia.f64 AO2!, { d16 }
vmla.f64 d4 , d2 , d8
vmla.f64 d5 , d2 , d16
@@ -194,14 +194,14 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_S2
fldmiad YO, { d24 }
vldmia.f64 YO, { d24 }
vmla.f64 d24, d0, d4
fstmiad YO, { d24 }
vstmia.f64 YO, { d24 }
add YO, YO, INC_Y
fldmiad YO, { d24 }
vldmia.f64 YO, { d24 }
vmla.f64 d24, d0, d5
fstmiad YO, { d24 }
vstmia.f64 YO, { d24 }
add YO, YO, INC_Y
.endm
@@ -215,11 +215,11 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1X4
pld [ XO , #X_PRE ]
fldmiad XO! , { d28 - d31 }
vldmia.f64 XO! , { d28 - d31 }
pld [ AO1 , #A_PRE ]
fldmiad AO1!, { d8 - d9 }
vldmia.f64 AO1!, { d8 - d9 }
vmla.f64 d4 , d28 , d8
fldmiad AO1!, { d10 - d11 }
vldmia.f64 AO1!, { d10 - d11 }
vmla.f64 d4 , d29 , d9
vmla.f64 d4 , d30, d10
vmla.f64 d4 , d31, d11
@@ -229,17 +229,17 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1X1
fldmiad XO! , { d2 }
fldmiad AO1!, { d8 }
vldmia.f64 XO! , { d2 }
vldmia.f64 AO1!, { d8 }
vmla.f64 d4 , d2 , d8
.endm
.macro SAVE_F1
fldmiad YO, { d24 }
vldmia.f64 YO, { d24 }
vmla.f64 d24, d0, d4
fstmiad YO!, { d24 }
vstmia.f64 YO!, { d24 }
.endm
@@ -252,18 +252,18 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1X4
pld [ AO1 , #A_PRE ]
fldmiad XO , { d28 }
vldmia.f64 XO , { d28 }
add XO, XO, INC_X
fldmiad AO1!, { d8 - d9 }
vldmia.f64 AO1!, { d8 - d9 }
vmla.f64 d4 , d28 , d8
fldmiad XO , { d29 }
vldmia.f64 XO , { d29 }
add XO, XO, INC_X
fldmiad AO1!, { d10 - d11 }
vldmia.f64 AO1!, { d10 - d11 }
vmla.f64 d4 , d29 , d9
fldmiad XO , { d30 }
vldmia.f64 XO , { d30 }
add XO, XO, INC_X
vmla.f64 d4 , d30, d10
fldmiad XO , { d31 }
vldmia.f64 XO , { d31 }
add XO, XO, INC_X
vmla.f64 d4 , d31, d11
@@ -272,8 +272,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1X1
fldmiad XO , { d2 }
fldmiad AO1!, { d8 }
vldmia.f64 XO , { d2 }
vldmia.f64 AO1!, { d8 }
add XO, XO, INC_X
vmla.f64 d4 , d2 , d8
@@ -281,9 +281,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_S1
fldmiad YO, { d24 }
vldmia.f64 YO, { d24 }
vmla.f64 d24, d0, d4
fstmiad YO, { d24 }
vstmia.f64 YO, { d24 }
add YO, YO, INC_Y
.endm
@@ -300,15 +300,15 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F2X4
fldmias XO! , { s28 - s31 }
fldmias AO1!, { s8 - s9 }
fldmias AO2!, { s16 - s17 }
vldmia.f32 XO! , { s28 - s31 }
vldmia.f32 AO1!, { s8 - s9 }
vldmia.f32 AO2!, { s16 - s17 }
vmla.f32 s4 , s28 , s8
vmla.f32 s5 , s28 , s16
fldmias AO1!, { s10 - s11 }
vldmia.f32 AO1!, { s10 - s11 }
vmla.f32 s4 , s29 , s9
vmla.f32 s5 , s29 , s17
fldmias AO2!, { s18 - s19 }
vldmia.f32 AO2!, { s18 - s19 }
vmla.f32 s4 , s30, s10
vmla.f32 s5 , s30, s18
vmla.f32 s4 , s31, s11
@@ -319,9 +319,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F2X1
fldmias XO! , { s2 }
fldmias AO1!, { s8 }
fldmias AO2!, { s16 }
vldmia.f32 XO! , { s2 }
vldmia.f32 AO1!, { s8 }
vldmia.f32 AO2!, { s16 }
vmla.f32 s4 , s2 , s8
vmla.f32 s5 , s2 , s16
@@ -329,10 +329,10 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_F2
fldmias YO, { s24 - s25 }
vldmia.f32 YO, { s24 - s25 }
vmla.f32 s24, s0, s4
vmla.f32 s25, s0, s5
fstmias YO!, { s24 - s25 }
vstmia.f32 YO!, { s24 - s25 }
.endm
@@ -345,22 +345,22 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S2X4
fldmias XO , { s28 }
vldmia.f32 XO , { s28 }
add XO, XO, INC_X
fldmias AO1!, { s8 - s9 }
fldmias AO2!, { s16 - s17 }
vldmia.f32 AO1!, { s8 - s9 }
vldmia.f32 AO2!, { s16 - s17 }
vmla.f32 s4 , s28 , s8
fldmias XO , { s29 }
vldmia.f32 XO , { s29 }
add XO, XO, INC_X
vmla.f32 s5 , s28 , s16
fldmias AO1!, { s10 - s11 }
vldmia.f32 AO1!, { s10 - s11 }
vmla.f32 s4 , s29 , s9
fldmias XO , { s30 }
vldmia.f32 XO , { s30 }
add XO, XO, INC_X
vmla.f32 s5 , s29 , s17
fldmias AO2!, { s18 - s19 }
vldmia.f32 AO2!, { s18 - s19 }
vmla.f32 s4 , s30, s10
fldmias XO , { s31 }
vldmia.f32 XO , { s31 }
add XO, XO, INC_X
vmla.f32 s5 , s30, s18
vmla.f32 s4 , s31, s11
@@ -371,10 +371,10 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S2X1
fldmias XO , { s2 }
fldmias AO1!, { s8 }
vldmia.f32 XO , { s2 }
vldmia.f32 AO1!, { s8 }
add XO, XO, INC_X
fldmias AO2!, { s16 }
vldmia.f32 AO2!, { s16 }
vmla.f32 s4 , s2 , s8
vmla.f32 s5 , s2 , s16
@@ -382,14 +382,14 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_S2
fldmias YO, { s24 }
vldmia.f32 YO, { s24 }
vmla.f32 s24, s0, s4
fstmias YO, { s24 }
vstmia.f32 YO, { s24 }
add YO, YO, INC_Y
fldmias YO, { s24 }
vldmia.f32 YO, { s24 }
vmla.f32 s24, s0, s5
fstmias YO, { s24 }
vstmia.f32 YO, { s24 }
add YO, YO, INC_Y
.endm
@@ -402,10 +402,10 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1X4
fldmias XO! , { s28 - s31 }
fldmias AO1!, { s8 - s9 }
vldmia.f32 XO! , { s28 - s31 }
vldmia.f32 AO1!, { s8 - s9 }
vmla.f32 s4 , s28 , s8
fldmias AO1!, { s10 - s11 }
vldmia.f32 AO1!, { s10 - s11 }
vmla.f32 s4 , s29 , s9
vmla.f32 s4 , s30, s10
vmla.f32 s4 , s31, s11
@@ -415,17 +415,17 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1X1
fldmias XO! , { s2 }
fldmias AO1!, { s8 }
vldmia.f32 XO! , { s2 }
vldmia.f32 AO1!, { s8 }
vmla.f32 s4 , s2 , s8
.endm
.macro SAVE_F1
fldmias YO, { s24 }
vldmia.f32 YO, { s24 }
vmla.f32 s24, s0, s4
fstmias YO!, { s24 }
vstmia.f32 YO!, { s24 }
.endm
@@ -437,18 +437,18 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1X4
fldmias XO , { s28 }
vldmia.f32 XO , { s28 }
add XO, XO, INC_X
fldmias AO1!, { s8 - s9 }
vldmia.f32 AO1!, { s8 - s9 }
vmla.f32 s4 , s28 , s8
fldmias XO , { s29 }
vldmia.f32 XO , { s29 }
add XO, XO, INC_X
fldmias AO1!, { s10 - s11 }
vldmia.f32 AO1!, { s10 - s11 }
vmla.f32 s4 , s29 , s9
fldmias XO , { s30 }
vldmia.f32 XO , { s30 }
add XO, XO, INC_X
vmla.f32 s4 , s30, s10
fldmias XO , { s31 }
vldmia.f32 XO , { s31 }
add XO, XO, INC_X
vmla.f32 s4 , s31, s11
@@ -457,8 +457,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1X1
fldmias XO , { s2 }
fldmias AO1!, { s8 }
vldmia.f32 XO , { s2 }
vldmia.f32 AO1!, { s8 }
add XO, XO, INC_X
vmla.f32 s4 , s2 , s8
@@ -466,9 +466,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro SAVE_S1
fldmias YO, { s24 }
vldmia.f32 YO, { s24 }
vmla.f32 s24, s0, s4
fstmias YO, { s24 }
vstmia.f32 YO, { s24 }
add YO, YO, INC_Y
.endm

View File

@@ -114,7 +114,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro INIT_F
fldmiad X!, { d0 }
vldmia.f64 X!, { d0 }
VABS( d0, d0 )
mov Z, #1
mov INDEX, Z
@@ -123,7 +123,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1
fldmiad X!, { d4 }
vldmia.f64 X!, { d4 }
add Z, Z, #1
VABS( d4, d4 )
vcmpe.f64 d4, d0
@@ -135,7 +135,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro INIT_S
fldmiad X, { d0 }
vldmia.f64 X, { d0 }
VABS( d0, d0 )
mov Z, #1
mov INDEX, Z
@@ -146,7 +146,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1
fldmiad X, { d4 }
vldmia.f64 X, { d4 }
add Z, Z, #1
VABS( d4, d4 )
vcmpe.f64 d4, d0
@@ -161,7 +161,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro INIT_F
fldmias X!, { s0 }
vldmia.f32 X!, { s0 }
VABS( s0, s0 )
mov Z, #1
mov INDEX, Z
@@ -170,7 +170,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1
fldmias X!, { s4 }
vldmia.f32 X!, { s4 }
add Z, Z, #1
VABS( s4, s4 )
vcmpe.f32 s4, s0
@@ -182,7 +182,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro INIT_S
fldmias X, { s0 }
vldmia.f32 X, { s0 }
VABS( s0, s0 )
mov Z, #1
mov INDEX, Z
@@ -193,7 +193,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1
fldmias X, { s4 }
vldmia.f32 X, { s4 }
add Z, Z, #1
VABS( s4, s4 )
vcmpe.f32 s4, s0
@@ -215,7 +215,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro INIT_F
fldmiad X!, { d0 -d1 }
vldmia.f64 X!, { d0 -d1 }
vabs.f64 d0, d0
vabs.f64 d1, d1
vadd.f64 d0 , d0, d1
@@ -227,7 +227,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1
fldmiad X!, { d4 - d5 }
vldmia.f64 X!, { d4 - d5 }
add Z, Z, #1
vabs.f64 d4, d4
vabs.f64 d5, d5
@@ -241,7 +241,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro INIT_S
fldmiad X, { d0 -d1 }
vldmia.f64 X, { d0 -d1 }
vabs.f64 d0, d0
vabs.f64 d1, d1
vadd.f64 d0 , d0, d1
@@ -255,7 +255,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1
fldmiad X, { d4 - d5 }
vldmia.f64 X, { d4 - d5 }
add Z, Z, #1
vabs.f64 d4, d4
vabs.f64 d5, d5
@@ -272,7 +272,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro INIT_F
fldmias X!, { s0 -s1 }
vldmia.f32 X!, { s0 -s1 }
vabs.f32 s0, s0
vabs.f32 s1, s1
vadd.f32 s0 , s0, s1
@@ -284,7 +284,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1
fldmias X!, { s4 - s5 }
vldmia.f32 X!, { s4 - s5 }
add Z, Z, #1
vabs.f32 s4, s4
vabs.f32 s5, s5
@@ -298,7 +298,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro INIT_S
fldmias X, { s0 -s1 }
vldmia.f32 X, { s0 -s1 }
vabs.f32 s0, s0
vabs.f32 s1, s1
vadd.f32 s0 , s0, s1
@@ -312,7 +312,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1
fldmias X, { s4 - s5 }
vldmia.f32 X, { s4 - s5 }
add Z, Z, #1
vabs.f32 s4, s4
vabs.f32 s5, s5

View File

@@ -58,7 +58,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1
fldmiad X!, { d4 }
vldmia.f64 X!, { d4 }
vcmpe.f64 d4, d6 // compare with 0.0
vmrs APSR_nzcv, fpscr
beq KERNEL_F1_NEXT_\@
@@ -95,7 +95,7 @@ KERNEL_F1_NEXT_\@:
.macro KERNEL_S1
fldmiad X, { d4 }
vldmia.f64 X, { d4 }
vcmpe.f64 d4, d6 // compare with 0.0
vmrs APSR_nzcv, fpscr
beq KERNEL_S1_NEXT
@@ -121,7 +121,7 @@ KERNEL_S1_NEXT:
.macro KERNEL_F1
fldmias X!, { s4 }
vldmia.f32 X!, { s4 }
vcmpe.f32 s4, s6 // compare with 0.0
vmrs APSR_nzcv, fpscr
beq KERNEL_F1_NEXT_\@
@@ -158,7 +158,7 @@ KERNEL_F1_NEXT_\@:
.macro KERNEL_S1
fldmias X, { s4 }
vldmia.f32 X, { s4 }
vcmpe.f32 s4, s6 // compare with 0.0
vmrs APSR_nzcv, fpscr
beq KERNEL_S1_NEXT
@@ -191,7 +191,7 @@ KERNEL_S1_NEXT:
.macro KERNEL_F1
fldmiad X!, { d4 - d5 }
vldmia.f64 X!, { d4 - d5 }
vcmpe.f64 d4, d6 // compare with 0.0
vmrs APSR_nzcv, fpscr
@@ -249,7 +249,7 @@ KERNEL_F1_END_\@:
.macro KERNEL_S1
fldmiad X, { d4 - d5 }
vldmia.f64 X, { d4 - d5 }
vcmpe.f64 d4, d6 // compare with 0.0
vmrs APSR_nzcv, fpscr
@@ -294,7 +294,7 @@ KERNEL_S1_END_\@:
.macro KERNEL_F1
fldmias X!, { s4 - s5 }
vldmia.f32 X!, { s4 - s5 }
vcmpe.f32 s4, s6 // compare with 0.0
vmrs APSR_nzcv, fpscr
@@ -350,7 +350,7 @@ KERNEL_F1_END_\@:
.macro KERNEL_S1
fldmias X, { s4 - s5 }
vldmia.f32 X, { s4 - s5 }
vcmpe.f32 s4, s6 // compare with 0.0
vmrs APSR_nzcv, fpscr

View File

@@ -58,7 +58,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1
fldmiad X!, { d4 }
vldmia.f64 X!, { d4 }
vcmpe.f64 d4, d6 // compare with 0.0
vmrs APSR_nzcv, fpscr
beq KERNEL_F1_NEXT_\@
@@ -95,7 +95,7 @@ KERNEL_F1_NEXT_\@:
.macro KERNEL_S1
fldmiad X, { d4 }
vldmia.f64 X, { d4 }
vcmpe.f64 d4, d6 // compare with 0.0
vmrs APSR_nzcv, fpscr
beq KERNEL_S1_NEXT
@@ -121,7 +121,7 @@ KERNEL_S1_NEXT:
.macro KERNEL_F1
fldmias X!, { s4 }
vldmia.f32 X!, { s4 }
vcmpe.f32 s4, s6 // compare with 0.0
vmrs APSR_nzcv, fpscr
beq KERNEL_F1_NEXT_\@
@@ -158,7 +158,7 @@ KERNEL_F1_NEXT_\@:
.macro KERNEL_S1
fldmias X, { s4 }
vldmia.f32 X, { s4 }
vcmpe.f32 s4, s6 // compare with 0.0
vmrs APSR_nzcv, fpscr
beq KERNEL_S1_NEXT
@@ -191,7 +191,7 @@ KERNEL_S1_NEXT:
.macro KERNEL_F1
fldmiad X!, { d4 - d5 }
vldmia.f64 X!, { d4 - d5 }
vcmpe.f64 d4, d6 // compare with 0.0
vmrs APSR_nzcv, fpscr
@@ -249,7 +249,7 @@ KERNEL_F1_END_\@:
.macro KERNEL_S1
fldmiad X, { d4 - d5 }
vldmia.f64 X, { d4 - d5 }
vcmpe.f64 d4, d6 // compare with 0.0
vmrs APSR_nzcv, fpscr
@@ -294,7 +294,7 @@ KERNEL_S1_END_\@:
.macro KERNEL_F1
fldmias X!, { s4 - s5 }
vldmia.f32 X!, { s4 - s5 }
vcmpe.f32 s4, s6 // compare with 0.0
vmrs APSR_nzcv, fpscr
@@ -350,7 +350,7 @@ KERNEL_F1_END_\@:
.macro KERNEL_S1
fldmias X, { s4 - s5 }
vldmia.f32 X, { s4 - s5 }
vcmpe.f32 s4, s6 // compare with 0.0
vmrs APSR_nzcv, fpscr

View File

@@ -77,68 +77,68 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
pld [ X, #X_PRE ]
pld [ Y, #X_PRE ]
fldmiad X, { d4 }
fldmiad Y, { d5 }
vldmia.f64 X, { d4 }
vldmia.f64 Y, { d5 }
vmul.f64 d2 , d0, d4
fmacd d2 , d1, d5
vmul.f64 d3 , d0, d5
vmls.f64 d3 , d1, d4
fstmiad X!, { d2 }
fstmiad Y!, { d3 }
vstmia.f64 X!, { d2 }
vstmia.f64 Y!, { d3 }
fldmiad X, { d4 }
fldmiad Y, { d5 }
vldmia.f64 X, { d4 }
vldmia.f64 Y, { d5 }
vmul.f64 d2 , d0, d4
fmacd d2 , d1, d5
vmul.f64 d3 , d0, d5
vmls.f64 d3 , d1, d4
fstmiad X!, { d2 }
fstmiad Y!, { d3 }
vstmia.f64 X!, { d2 }
vstmia.f64 Y!, { d3 }
fldmiad X, { d4 }
fldmiad Y, { d5 }
vldmia.f64 X, { d4 }
vldmia.f64 Y, { d5 }
vmul.f64 d2 , d0, d4
fmacd d2 , d1, d5
vmul.f64 d3 , d0, d5
vmls.f64 d3 , d1, d4
fstmiad X!, { d2 }
fstmiad Y!, { d3 }
vstmia.f64 X!, { d2 }
vstmia.f64 Y!, { d3 }
fldmiad X, { d4 }
fldmiad Y, { d5 }
vldmia.f64 X, { d4 }
vldmia.f64 Y, { d5 }
vmul.f64 d2 , d0, d4
fmacd d2 , d1, d5
vmul.f64 d3 , d0, d5
vmls.f64 d3 , d1, d4
fstmiad X!, { d2 }
fstmiad Y!, { d3 }
vstmia.f64 X!, { d2 }
vstmia.f64 Y!, { d3 }
.endm
.macro KERNEL_F1
fldmiad X, { d4 }
fldmiad Y, { d5 }
vldmia.f64 X, { d4 }
vldmia.f64 Y, { d5 }
vmul.f64 d2 , d0, d4
fmacd d2 , d1, d5
vmul.f64 d3 , d0, d5
vmls.f64 d3 , d1, d4
fstmiad X!, { d2 }
fstmiad Y!, { d3 }
vstmia.f64 X!, { d2 }
vstmia.f64 Y!, { d3 }
.endm
.macro KERNEL_S1
fldmiad X, { d4 }
fldmiad Y, { d5 }
vldmia.f64 X, { d4 }
vldmia.f64 Y, { d5 }
vmul.f64 d2 , d0, d4
fmacd d2 , d1, d5
vmul.f64 d3 , d0, d5
vmls.f64 d3 , d1, d4
fstmiad X, { d2 }
fstmiad Y, { d3 }
vstmia.f64 X, { d2 }
vstmia.f64 Y, { d3 }
add X, X, INC_X
add Y, Y, INC_Y
@@ -149,68 +149,68 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F4
fldmias X, { s4 }
fldmias Y, { s5 }
vldmia.f32 X, { s4 }
vldmia.f32 Y, { s5 }
vmul.f32 s2 , s0, s4
fmacs s2 , s1, s5
vmul.f32 s3 , s0, s5
vmls.f32 s3 , s1, s4
fstmias X!, { s2 }
fstmias Y!, { s3 }
vstmia.f32 X!, { s2 }
vstmia.f32 Y!, { s3 }
fldmias X, { s4 }
fldmias Y, { s5 }
vldmia.f32 X, { s4 }
vldmia.f32 Y, { s5 }
vmul.f32 s2 , s0, s4
fmacs s2 , s1, s5
vmul.f32 s3 , s0, s5
vmls.f32 s3 , s1, s4
fstmias X!, { s2 }
fstmias Y!, { s3 }
vstmia.f32 X!, { s2 }
vstmia.f32 Y!, { s3 }
fldmias X, { s4 }
fldmias Y, { s5 }
vldmia.f32 X, { s4 }
vldmia.f32 Y, { s5 }
vmul.f32 s2 , s0, s4
fmacs s2 , s1, s5
vmul.f32 s3 , s0, s5
vmls.f32 s3 , s1, s4
fstmias X!, { s2 }
fstmias Y!, { s3 }
vstmia.f32 X!, { s2 }
vstmia.f32 Y!, { s3 }
fldmias X, { s4 }
fldmias Y, { s5 }
vldmia.f32 X, { s4 }
vldmia.f32 Y, { s5 }
vmul.f32 s2 , s0, s4
fmacs s2 , s1, s5
vmul.f32 s3 , s0, s5
vmls.f32 s3 , s1, s4
fstmias X!, { s2 }
fstmias Y!, { s3 }
vstmia.f32 X!, { s2 }
vstmia.f32 Y!, { s3 }
.endm
.macro KERNEL_F1
fldmias X, { s4 }
fldmias Y, { s5 }
vldmia.f32 X, { s4 }
vldmia.f32 Y, { s5 }
vmul.f32 s2 , s0, s4
fmacs s2 , s1, s5
vmul.f32 s3 , s0, s5
vmls.f32 s3 , s1, s4
fstmias X!, { s2 }
fstmias Y!, { s3 }
vstmia.f32 X!, { s2 }
vstmia.f32 Y!, { s3 }
.endm
.macro KERNEL_S1
fldmias X, { s4 }
fldmias Y, { s5 }
vldmia.f32 X, { s4 }
vldmia.f32 Y, { s5 }
vmul.f32 s2 , s0, s4
fmacs s2 , s1, s5
vmul.f32 s3 , s0, s5
vmls.f32 s3 , s1, s4
fstmias X, { s2 }
fstmias Y, { s3 }
vstmia.f32 X, { s2 }
vstmia.f32 Y, { s3 }
add X, X, INC_X
add Y, Y, INC_Y
@@ -230,96 +230,96 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
pld [ X, #X_PRE ]
pld [ Y, #X_PRE ]
fldmiad X, { d4 - d5 }
fldmiad Y, { d6 - d7 }
vldmia.f64 X, { d4 - d5 }
vldmia.f64 Y, { d6 - d7 }
vmul.f64 d2 , d0, d4
fmacd d2 , d1, d6
vmul.f64 d3 , d0, d6
vmls.f64 d3 , d1, d4
fstmiad X!, { d2 }
fstmiad Y!, { d3 }
vstmia.f64 X!, { d2 }
vstmia.f64 Y!, { d3 }
vmul.f64 d2 , d0, d5
fmacd d2 , d1, d7
vmul.f64 d3 , d0, d7
vmls.f64 d3 , d1, d5
fstmiad X!, { d2 }
fstmiad Y!, { d3 }
vstmia.f64 X!, { d2 }
vstmia.f64 Y!, { d3 }
fldmiad X, { d4 - d5 }
fldmiad Y, { d6 - d7 }
vldmia.f64 X, { d4 - d5 }
vldmia.f64 Y, { d6 - d7 }
vmul.f64 d2 , d0, d4
fmacd d2 , d1, d6
vmul.f64 d3 , d0, d6
vmls.f64 d3 , d1, d4
fstmiad X!, { d2 }
fstmiad Y!, { d3 }
vstmia.f64 X!, { d2 }
vstmia.f64 Y!, { d3 }
vmul.f64 d2 , d0, d5
fmacd d2 , d1, d7
vmul.f64 d3 , d0, d7
vmls.f64 d3 , d1, d5
fstmiad X!, { d2 }
fstmiad Y!, { d3 }
vstmia.f64 X!, { d2 }
vstmia.f64 Y!, { d3 }
pld [ X, #X_PRE ]
pld [ Y, #X_PRE ]
fldmiad X, { d4 - d5 }
fldmiad Y, { d6 - d7 }
vldmia.f64 X, { d4 - d5 }
vldmia.f64 Y, { d6 - d7 }
vmul.f64 d2 , d0, d4
fmacd d2 , d1, d6
vmul.f64 d3 , d0, d6
vmls.f64 d3 , d1, d4
fstmiad X!, { d2 }
fstmiad Y!, { d3 }
vstmia.f64 X!, { d2 }
vstmia.f64 Y!, { d3 }
vmul.f64 d2 , d0, d5
fmacd d2 , d1, d7
vmul.f64 d3 , d0, d7
vmls.f64 d3 , d1, d5
fstmiad X!, { d2 }
fstmiad Y!, { d3 }
vstmia.f64 X!, { d2 }
vstmia.f64 Y!, { d3 }
fldmiad X, { d4 - d5 }
fldmiad Y, { d6 - d7 }
vldmia.f64 X, { d4 - d5 }
vldmia.f64 Y, { d6 - d7 }
vmul.f64 d2 , d0, d4
fmacd d2 , d1, d6
vmul.f64 d3 , d0, d6
vmls.f64 d3 , d1, d4
fstmiad X!, { d2 }
fstmiad Y!, { d3 }
vstmia.f64 X!, { d2 }
vstmia.f64 Y!, { d3 }
vmul.f64 d2 , d0, d5
fmacd d2 , d1, d7
vmul.f64 d3 , d0, d7
vmls.f64 d3 , d1, d5
fstmiad X!, { d2 }
fstmiad Y!, { d3 }
vstmia.f64 X!, { d2 }
vstmia.f64 Y!, { d3 }
.endm
.macro KERNEL_F1
fldmiad X, { d4 - d5 }
fldmiad Y, { d6 - d7 }
vldmia.f64 X, { d4 - d5 }
vldmia.f64 Y, { d6 - d7 }
vmul.f64 d2 , d0, d4
fmacd d2 , d1, d6
vmul.f64 d3 , d0, d6
vmls.f64 d3 , d1, d4
fstmiad X!, { d2 }
fstmiad Y!, { d3 }
vstmia.f64 X!, { d2 }
vstmia.f64 Y!, { d3 }
vmul.f64 d2 , d0, d5
fmacd d2 , d1, d7
vmul.f64 d3 , d0, d7
vmls.f64 d3 , d1, d5
fstmiad X!, { d2 }
fstmiad Y!, { d3 }
vstmia.f64 X!, { d2 }
vstmia.f64 Y!, { d3 }
.endm
.macro KERNEL_S1
fldmiad X, { d4 - d5 }
fldmiad Y, { d6 - d7 }
vldmia.f64 X, { d4 - d5 }
vldmia.f64 Y, { d6 - d7 }
vmul.f64 d2 , d0, d4
fmacd d2 , d1, d6
vmul.f64 d3 , d0, d6
@@ -347,96 +347,96 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
pld [ X, #X_PRE ]
pld [ Y, #X_PRE ]
fldmias X, { s4 - s5 }
fldmias Y, { s6 - s7 }
vldmia.f32 X, { s4 - s5 }
vldmia.f32 Y, { s6 - s7 }
vmul.f32 s2 , s0, s4
fmacs s2 , s1, s6
vmul.f32 s3 , s0, s6
vmls.f32 s3 , s1, s4
fstmias X!, { s2 }
fstmias Y!, { s3 }
vstmia.f32 X!, { s2 }
vstmia.f32 Y!, { s3 }
vmul.f32 s2 , s0, s5
fmacs s2 , s1, s7
vmul.f32 s3 , s0, s7
vmls.f32 s3 , s1, s5
fstmias X!, { s2 }
fstmias Y!, { s3 }
vstmia.f32 X!, { s2 }
vstmia.f32 Y!, { s3 }
fldmias X, { s4 - s5 }
fldmias Y, { s6 - s7 }
vldmia.f32 X, { s4 - s5 }
vldmia.f32 Y, { s6 - s7 }
vmul.f32 s2 , s0, s4
fmacs s2 , s1, s6
vmul.f32 s3 , s0, s6
vmls.f32 s3 , s1, s4
fstmias X!, { s2 }
fstmias Y!, { s3 }
vstmia.f32 X!, { s2 }
vstmia.f32 Y!, { s3 }
vmul.f32 s2 , s0, s5
fmacs s2 , s1, s7
vmul.f32 s3 , s0, s7
vmls.f32 s3 , s1, s5
fstmias X!, { s2 }
fstmias Y!, { s3 }
vstmia.f32 X!, { s2 }
vstmia.f32 Y!, { s3 }
pld [ X, #X_PRE ]
pld [ Y, #X_PRE ]
fldmias X, { s4 - s5 }
fldmias Y, { s6 - s7 }
vldmia.f32 X, { s4 - s5 }
vldmia.f32 Y, { s6 - s7 }
vmul.f32 s2 , s0, s4
fmacs s2 , s1, s6
vmul.f32 s3 , s0, s6
vmls.f32 s3 , s1, s4
fstmias X!, { s2 }
fstmias Y!, { s3 }
vstmia.f32 X!, { s2 }
vstmia.f32 Y!, { s3 }
vmul.f32 s2 , s0, s5
fmacs s2 , s1, s7
vmul.f32 s3 , s0, s7
vmls.f32 s3 , s1, s5
fstmias X!, { s2 }
fstmias Y!, { s3 }
vstmia.f32 X!, { s2 }
vstmia.f32 Y!, { s3 }
fldmias X, { s4 - s5 }
fldmias Y, { s6 - s7 }
vldmia.f32 X, { s4 - s5 }
vldmia.f32 Y, { s6 - s7 }
vmul.f32 s2 , s0, s4
fmacs s2 , s1, s6
vmul.f32 s3 , s0, s6
vmls.f32 s3 , s1, s4
fstmias X!, { s2 }
fstmias Y!, { s3 }
vstmia.f32 X!, { s2 }
vstmia.f32 Y!, { s3 }
vmul.f32 s2 , s0, s5
fmacs s2 , s1, s7
vmul.f32 s3 , s0, s7
vmls.f32 s3 , s1, s5
fstmias X!, { s2 }
fstmias Y!, { s3 }
vstmia.f32 X!, { s2 }
vstmia.f32 Y!, { s3 }
.endm
.macro KERNEL_F1
fldmias X, { s4 - s5 }
fldmias Y, { s6 - s7 }
vldmia.f32 X, { s4 - s5 }
vldmia.f32 Y, { s6 - s7 }
vmul.f32 s2 , s0, s4
fmacs s2 , s1, s6
vmul.f32 s3 , s0, s6
vmls.f32 s3 , s1, s4
fstmias X!, { s2 }
fstmias Y!, { s3 }
vstmia.f32 X!, { s2 }
vstmia.f32 Y!, { s3 }
vmul.f32 s2 , s0, s5
fmacs s2 , s1, s7
vmul.f32 s3 , s0, s7
vmls.f32 s3 , s1, s5
fstmias X!, { s2 }
fstmias Y!, { s3 }
vstmia.f32 X!, { s2 }
vstmia.f32 Y!, { s3 }
.endm
.macro KERNEL_S1
fldmias X, { s4 - s5 }
fldmias Y, { s6 - s7 }
vldmia.f32 X, { s4 - s5 }
vldmia.f32 Y, { s6 - s7 }
vmul.f32 s2 , s0, s4
fmacs s2 , s1, s6
vmul.f32 s3 , s0, s6

View File

@@ -64,30 +64,30 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F4
pld [ X, #X_PRE ]
fldmiad X, { d4 - d7 }
vldmia.f64 X, { d4 - d7 }
vmul.f64 d4, d4, d0
vmul.f64 d5, d5, d0
vmul.f64 d6, d6, d0
fstmiad X!, { d4 - d5 }
vstmia.f64 X!, { d4 - d5 }
vmul.f64 d7, d7, d0
fstmiad X!, { d6 - d7 }
vstmia.f64 X!, { d6 - d7 }
.endm
.macro KERNEL_F1
fldmiad X, { d4 }
vldmia.f64 X, { d4 }
vmul.f64 d4, d4, d0
fstmiad X!, { d4 }
vstmia.f64 X!, { d4 }
.endm
.macro KERNEL_S1
fldmiad X, { d4 }
vldmia.f64 X, { d4 }
vmul.f64 d4, d4, d0
fstmiad X, { d4 }
vstmia.f64 X, { d4 }
add X, X, INC_X
.endm
@@ -96,30 +96,30 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F4
fldmias X, { s4 - s7 }
vldmia.f32 X, { s4 - s7 }
vmul.f32 s4, s4, s0
vmul.f32 s5, s5, s0
vmul.f32 s6, s6, s0
fstmias X!, { s4 - s5 }
vstmia.f32 X!, { s4 - s5 }
vmul.f32 s7, s7, s0
fstmias X!, { s6 - s7 }
vstmia.f32 X!, { s6 - s7 }
.endm
.macro KERNEL_F1
fldmias X, { s4 }
vldmia.f32 X, { s4 }
vmul.f32 s4, s4, s0
fstmias X!, { s4 }
vstmia.f32 X!, { s4 }
.endm
.macro KERNEL_S1
fldmias X, { s4 }
vldmia.f32 X, { s4 }
vmul.f32 s4, s4, s0
fstmias X, { s4 }
vstmia.f32 X, { s4 }
add X, X, INC_X
.endm
@@ -136,58 +136,58 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
pld [ X, #X_PRE ]
fldmiad X, { d4 - d5 }
vldmia.f64 X, { d4 - d5 }
vmul.f64 d2, d0, d4
vmls.f64 d2, d1, d5
vmul.f64 d3, d0, d5
fmacd d3, d1, d4
fstmiad X!, { d2 - d3 }
vstmia.f64 X!, { d2 - d3 }
fldmiad X, { d4 - d5 }
vldmia.f64 X, { d4 - d5 }
vmul.f64 d2, d0, d4
vmls.f64 d2, d1, d5
vmul.f64 d3, d0, d5
fmacd d3, d1, d4
fstmiad X!, { d2 - d3 }
vstmia.f64 X!, { d2 - d3 }
pld [ X, #X_PRE ]
fldmiad X, { d4 - d5 }
vldmia.f64 X, { d4 - d5 }
vmul.f64 d2, d0, d4
vmls.f64 d2, d1, d5
vmul.f64 d3, d0, d5
fmacd d3, d1, d4
fstmiad X!, { d2 - d3 }
vstmia.f64 X!, { d2 - d3 }
fldmiad X, { d4 - d5 }
vldmia.f64 X, { d4 - d5 }
vmul.f64 d2, d0, d4
vmls.f64 d2, d1, d5
vmul.f64 d3, d0, d5
fmacd d3, d1, d4
fstmiad X!, { d2 - d3 }
vstmia.f64 X!, { d2 - d3 }
.endm
.macro KERNEL_F1
fldmiad X, { d4 - d5 }
vldmia.f64 X, { d4 - d5 }
vmul.f64 d2, d0, d4
vmls.f64 d2, d1, d5
vmul.f64 d3, d0, d5
fmacd d3, d1, d4
fstmiad X!, { d2 - d3 }
vstmia.f64 X!, { d2 - d3 }
.endm
.macro KERNEL_S1
fldmiad X, { d4 - d5 }
vldmia.f64 X, { d4 - d5 }
vmul.f64 d2, d0, d4
vmls.f64 d2, d1, d5
vmul.f64 d3, d0, d5
fmacd d3, d1, d4
fstmiad X, { d2 - d3 }
vstmia.f64 X, { d2 - d3 }
add X, X, INC_X
.endm
@@ -199,56 +199,56 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
pld [ X, #X_PRE ]
fldmias X, { s4 - s5 }
vldmia.f32 X, { s4 - s5 }
vmul.f32 s2, s0, s4
vmls.f32 s2, s1, s5
vmul.f32 s3, s0, s5
fmacs s3, s1, s4
fstmias X!, { s2 - s3 }
vstmia.f32 X!, { s2 - s3 }
fldmias X, { s4 - s5 }
vldmia.f32 X, { s4 - s5 }
vmul.f32 s2, s0, s4
vmls.f32 s2, s1, s5
vmul.f32 s3, s0, s5
fmacs s3, s1, s4
fstmias X!, { s2 - s3 }
vstmia.f32 X!, { s2 - s3 }
fldmias X, { s4 - s5 }
vldmia.f32 X, { s4 - s5 }
vmul.f32 s2, s0, s4
vmls.f32 s2, s1, s5
vmul.f32 s3, s0, s5
fmacs s3, s1, s4
fstmias X!, { s2 - s3 }
vstmia.f32 X!, { s2 - s3 }
fldmias X, { s4 - s5 }
vldmia.f32 X, { s4 - s5 }
vmul.f32 s2, s0, s4
vmls.f32 s2, s1, s5
vmul.f32 s3, s0, s5
fmacs s3, s1, s4
fstmias X!, { s2 - s3 }
vstmia.f32 X!, { s2 - s3 }
.endm
.macro KERNEL_F1
fldmias X, { s4 - s5 }
vldmia.f32 X, { s4 - s5 }
vmul.f32 s2, s0, s4
vmls.f32 s2, s1, s5
vmul.f32 s3, s0, s5
fmacs s3, s1, s4
fstmias X!, { s2 - s3 }
vstmia.f32 X!, { s2 - s3 }
.endm
.macro KERNEL_S1
fldmias X, { s4 - s5 }
vldmia.f32 X, { s4 - s5 }
vmul.f32 s2, s0, s4
vmls.f32 s2, s1, s5
vmul.f32 s3, s0, s5
fmacs s3, s1, s4
fstmias X, { s2 - s3 }
vstmia.f32 X, { s2 - s3 }
add X, X, INC_X
.endm

View File

@@ -65,17 +65,17 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY_F8
pld [ X, #X_PRE ]
fldmias X!, { s0 - s3 }
fldmias X!, { s4 - s7 }
fstmias Y!, { s0 - s3 }
fstmias Y!, { s4 - s7 }
vldmia.f32 X!, { s0 - s3 }
vldmia.f32 X!, { s4 - s7 }
vstmia.f32 Y!, { s0 - s3 }
vstmia.f32 Y!, { s4 - s7 }
.endm
.macro COPY_F1
fldmias X!, { s0 }
fstmias Y!, { s0 }
vldmia.f32 X!, { s0 }
vstmia.f32 Y!, { s0 }
.endm
@@ -85,23 +85,23 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY_S4
nop
fldmias X, { s0 }
fstmias Y, { s0 }
vldmia.f32 X, { s0 }
vstmia.f32 Y, { s0 }
add X, X, INC_X
add Y, Y, INC_Y
fldmias X, { s1 }
fstmias Y, { s1 }
vldmia.f32 X, { s1 }
vstmia.f32 Y, { s1 }
add X, X, INC_X
add Y, Y, INC_Y
fldmias X, { s0 }
fstmias Y, { s0 }
vldmia.f32 X, { s0 }
vstmia.f32 Y, { s0 }
add X, X, INC_X
add Y, Y, INC_Y
fldmias X, { s1 }
fstmias Y, { s1 }
vldmia.f32 X, { s1 }
vstmia.f32 Y, { s1 }
add X, X, INC_X
add Y, Y, INC_Y
@@ -110,8 +110,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY_S1
fldmias X, { s0 }
fstmias Y, { s0 }
vldmia.f32 X, { s0 }
vstmia.f32 Y, { s0 }
add X, X, INC_X
add Y, Y, INC_Y

View File

@@ -68,26 +68,26 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F4
fldmias X!, { s14 }
fldmias Y!, { s15 }
vldmia.f32 X!, { s14 }
vldmia.f32 Y!, { s15 }
vmul.f32 s15, s14, s15
vcvt.f64.f32 d4, s15
vadd.f64 d0 , d0, d4
fldmias X!, { s14 }
fldmias Y!, { s15 }
vldmia.f32 X!, { s14 }
vldmia.f32 Y!, { s15 }
vmul.f32 s15, s14, s15
vcvt.f64.f32 d4, s15
vadd.f64 d0 , d0, d4
fldmias X!, { s14 }
fldmias Y!, { s15 }
vldmia.f32 X!, { s14 }
vldmia.f32 Y!, { s15 }
vmul.f32 s15, s14, s15
vcvt.f64.f32 d4, s15
vadd.f64 d0 , d0, d4
fldmias X!, { s14 }
fldmias Y!, { s15 }
vldmia.f32 X!, { s14 }
vldmia.f32 Y!, { s15 }
vmul.f32 s15, s14, s15
vcvt.f64.f32 d4, s15
vadd.f64 d0 , d0, d4
@@ -96,8 +96,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1
fldmias X!, { s14 }
fldmias Y!, { s15 }
vldmia.f32 X!, { s14 }
vldmia.f32 Y!, { s15 }
vmul.f32 s15, s14, s15
vcvt.f64.f32 d4, s15
vadd.f64 d0 , d0, d4
@@ -109,32 +109,32 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
nop
fldmias X, { s14 }
fldmias Y, { s15 }
vldmia.f32 X, { s14 }
vldmia.f32 Y, { s15 }
vmul.f32 s15, s14, s15
vcvt.f64.f32 d4, s15
vadd.f64 d0 , d0, d4
add X, X, INC_X
add Y, Y, INC_Y
fldmias X, { s14 }
fldmias Y, { s15 }
vldmia.f32 X, { s14 }
vldmia.f32 Y, { s15 }
vmul.f32 s15, s14, s15
vcvt.f64.f32 d4, s15
vadd.f64 d0 , d0, d4
add X, X, INC_X
add Y, Y, INC_Y
fldmias X, { s14 }
fldmias Y, { s15 }
vldmia.f32 X, { s14 }
vldmia.f32 Y, { s15 }
vmul.f32 s15, s14, s15
vcvt.f64.f32 d4, s15
vadd.f64 d0 , d0, d4
add X, X, INC_X
add Y, Y, INC_Y
fldmias X, { s14 }
fldmias Y, { s15 }
vldmia.f32 X, { s14 }
vldmia.f32 Y, { s15 }
vmul.f32 s15, s14, s15
vcvt.f64.f32 d4, s15
vadd.f64 d0 , d0, d4
@@ -146,8 +146,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1
fldmias X, { s14 }
fldmias Y, { s15 }
vldmia.f32 X, { s14 }
vldmia.f32 Y, { s15 }
vmul.f32 s15, s14, s15
vcvt.f64.f32 d4, s15
vadd.f64 d0 , d0, d4
@@ -162,12 +162,12 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F4
fldmias X!, { s8 - s9 }
fldmias Y!, { s4 - s5}
vldmia.f32 X!, { s8 - s9 }
vldmia.f32 Y!, { s4 - s5}
fmacs s0 , s4, s8
fldmias X!, { s10 - s11 }
vldmia.f32 X!, { s10 - s11 }
fmacs s1 , s5, s9
fldmias Y!, { s6 - s7 }
vldmia.f32 Y!, { s6 - s7 }
fmacs s0 , s6, s10
fmacs s1 , s7, s11
@@ -175,8 +175,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1
fldmias X!, { s4 }
fldmias Y!, { s8 }
vldmia.f32 X!, { s4 }
vldmia.f32 Y!, { s8 }
fmacs s0 , s4, s8
.endm
@@ -185,26 +185,26 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S4
nop
fldmias X, { s4 }
fldmias Y, { s8 }
vldmia.f32 X, { s4 }
vldmia.f32 Y, { s8 }
add X, X, INC_X
add Y, Y, INC_Y
fmacs s0 , s4, s8
fldmias X, { s5 }
fldmias Y, { s9 }
vldmia.f32 X, { s5 }
vldmia.f32 Y, { s9 }
add X, X, INC_X
add Y, Y, INC_Y
fmacs s1 , s5, s9
fldmias X, { s6 }
fldmias Y, { s10 }
vldmia.f32 X, { s6 }
vldmia.f32 Y, { s10 }
add X, X, INC_X
add Y, Y, INC_Y
fmacs s0 , s6, s10
fldmias X, { s7 }
fldmias Y, { s11 }
vldmia.f32 X, { s7 }
vldmia.f32 Y, { s11 }
add X, X, INC_X
add Y, Y, INC_Y
fmacs s1 , s7, s11
@@ -214,8 +214,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1
fldmias X, { s4 }
fldmias Y, { s8 }
vldmia.f32 X, { s4 }
vldmia.f32 Y, { s8 }
add X, X, INC_X
fmacs s0 , s4, s8
add Y, Y, INC_Y
@@ -253,11 +253,11 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
cmp N, #0
ble sdot_kernel_L999
cmp INC_X, #0
beq sdot_kernel_L999
# cmp INC_X, #0
# beq sdot_kernel_L999
cmp INC_Y, #0
beq sdot_kernel_L999
# cmp INC_Y, #0
# beq sdot_kernel_L999
cmp INC_X, #1
bne sdot_kernel_S_BEGIN

View File

@@ -112,8 +112,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL4x2_SUB
fldmias AO! , { s0 - s3 }
fldmias BO! , { s4 - s5 }
vldmia.f32 AO! , { s0 - s3 }
vldmia.f32 BO! , { s4 - s5 }
fmacs s8 , s0, s4
fmacs s9 , s1, s4

View File

@@ -136,29 +136,29 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL4x4_I
pld [ AO , #A_PRE ]
fldmias AO!, { s0 - s1 }
vldmia.f32 AO!, { s0 - s1 }
pld [ BO , #B_PRE ]
fldmias BO!, { s8 - s9 }
vldmia.f32 BO!, { s8 - s9 }
fmuls s16 , s0, s8
fldmias AO!, { s2 - s3 }
vldmia.f32 AO!, { s2 - s3 }
fmuls s17 , s1, s8
fmuls s18 , s2, s8
fldmias BO!, { s10 - s11 }
vldmia.f32 BO!, { s10 - s11 }
fmuls s19 , s3, s8
fmuls s20 , s0, s9
fldmias AO!, { s4 - s5 }
vldmia.f32 AO!, { s4 - s5 }
fmuls s21 , s1, s9
fmuls s22 , s2, s9
fldmias AO!, { s6 - s7 }
vldmia.f32 AO!, { s6 - s7 }
fmuls s23 , s3, s9
fmuls s24 , s0, s10
fldmias BO!, { s12 - s13 }
vldmia.f32 BO!, { s12 - s13 }
fmuls s25 , s1, s10
fmuls s26 , s2, s10
fldmias BO!, { s14 - s15 }
vldmia.f32 BO!, { s14 - s15 }
fmuls s27 , s3, s10
fmuls s28 , s0, s11
@@ -174,20 +174,20 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
pld [ AO , #A_PRE ]
fmacs s16 , s4, s12
fmacs s17 , s5, s12
fldmias AO!, { s0 - s3 }
vldmia.f32 AO!, { s0 - s3 }
fmacs s18 , s6, s12
pld [ BO , #B_PRE ]
fmacs s19 , s7, s12
fmacs s20 , s4, s13
fldmias BO!, { s8 - s11 }
vldmia.f32 BO!, { s8 - s11 }
fmacs s21 , s5, s13
fmacs s22 , s6, s13
//fldmias AO!, { s2 - s3 }
//vldmia.f32 AO!, { s2 - s3 }
fmacs s23 , s7, s13
fmacs s24 , s4, s14
//fldmias BO!, { s10 - s11 }
//vldmia.f32 BO!, { s10 - s11 }
fmacs s25 , s5, s14
fmacs s26 , s6, s14
fmacs s27 , s7, s14
@@ -203,17 +203,17 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL4x4_M1
fmacs s16 , s0, s8
fldmias AO!, { s4 - s7 }
vldmia.f32 AO!, { s4 - s7 }
fmacs s17 , s1, s8
fmacs s18 , s2, s8
fldmias BO!, { s12 - s15 }
//fldmias AO!, { s6 - s7 }
vldmia.f32 BO!, { s12 - s15 }
//vldmia.f32 AO!, { s6 - s7 }
fmacs s19 , s3, s8
fmacs s20 , s0, s9
fmacs s21 , s1, s9
fmacs s22 , s2, s9
//fldmias BO!, { s14 - s15 }
//vldmia.f32 BO!, { s14 - s15 }
fmacs s23 , s3, s9
fmacs s24 , s0, s10
@@ -300,7 +300,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s0, ALPHA
add r4 , CO2, r3
fldmias CO1, { s8 - s11 }
vldmia.f32 CO1, { s8 - s11 }
fmacs s8 , s0 , s16
flds s12, [CO2]
@@ -322,7 +322,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
pld [ CO1 , #C_PRE ]
fldmias r4, { s8 - s11 }
vldmia.f32 r4, { s8 - s11 }
fmacs s8 , s0 , s24
fsts s12, [CO2]
@@ -338,7 +338,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
add CO2, r4 , r3
fldmias CO2, { s12 - s15 }
vldmia.f32 CO2, { s12 - s15 }
fsts s8 , [r4 ]
fmacs s12, s0 , s28
@@ -350,7 +350,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fmacs s15, s0 , s31
pld [ r4 , #C_PRE ]
fstmias CO2, { s12 - s15 }
vstmia.f32 CO2, { s12 - s15 }
pld [ CO2 , #C_PRE ]
add CO1, CO1, #16

View File

@@ -73,7 +73,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s3 , [ AO2, #4 ]
add AO1, AO1, #8
fstmias BO!, { s0 - s3 }
vstmia.f32 BO!, { s0 - s3 }
add AO2, AO2, #8
.endm
@@ -85,7 +85,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s1 , [ AO2, #0 ]
add AO1, AO1, #4
fstmias BO!, { s0 - s1 }
vstmia.f32 BO!, { s0 - s1 }
add AO2, AO2, #4
.endm
@@ -95,7 +95,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s0 , [ AO1, #0 ]
flds s1 , [ AO1, #4 ]
fstmias BO!, { s0 - s1 }
vstmia.f32 BO!, { s0 - s1 }
add AO1, AO1, #8
.endm
@@ -105,7 +105,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s0 , [ AO1, #0 ]
fstmias BO!, { s0 }
vstmia.f32 BO!, { s0 }
add AO1, AO1, #4
.endm

View File

@@ -100,10 +100,10 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s11, [ AO4, #8 ]
flds s15, [ AO4, #12 ]
fstmias BO!, { s0 - s3 }
vstmia.f32 BO!, { s0 - s3 }
add AO4, AO4, #16
fstmias BO!, { s4 - s7 }
fstmias BO!, { s8 - s15 }
vstmia.f32 BO!, { s4 - s7 }
vstmia.f32 BO!, { s8 - s15 }
.endm
@@ -117,7 +117,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s3 , [ AO4, #0 ]
add AO3, AO3, #4
fstmias BO!, { s0 - s3 }
vstmia.f32 BO!, { s0 - s3 }
add AO4, AO4, #4
.endm
@@ -135,7 +135,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s5 , [ AO2, #8 ]
flds s7 , [ AO2, #12 ]
fstmias BO!, { s0 - s7 }
vstmia.f32 BO!, { s0 - s7 }
add AO2, AO2, #16
.endm
@@ -147,7 +147,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s1 , [ AO2, #0 ]
add AO1, AO1, #4
fstmias BO!, { s0 - s1 }
vstmia.f32 BO!, { s0 - s1 }
add AO2, AO2, #4
.endm
@@ -159,7 +159,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s2 , [ AO1, #8 ]
flds s3 , [ AO1, #12 ]
fstmias BO!, { s0 - s3 }
vstmia.f32 BO!, { s0 - s3 }
add AO1, AO1, #16
.endm
@@ -169,7 +169,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
flds s0 , [ AO1, #0 ]
fstmias BO!, { s0 }
vstmia.f32 BO!, { s0 }
add AO1, AO1, #4
.endm

View File

@@ -76,21 +76,21 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY4x4_1
pld [ AO1, #A_PRE ]
fldmias AO1, { s0 - s3 }
vldmia.f32 AO1, { s0 - s3 }
add r3, AO1, LDA
pld [ r3, #A_PRE ]
fldmias r3, { s4 - s7 }
vldmia.f32 r3, { s4 - s7 }
add r3, r3, LDA
pld [ r3, #A_PRE ]
fldmias r3, { s8 - s11 }
vldmia.f32 r3, { s8 - s11 }
add r3, r3, LDA
pld [ r3, #A_PRE ]
fldmias r3, { s12 - s15 }
vldmia.f32 r3, { s12 - s15 }
fstmias BO1, { s0 - s15 }
vstmia.f32 BO1, { s0 - s15 }
add AO1, AO1, #16
add BO1, BO1, M4
@@ -98,18 +98,18 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY4x4_2
fldmias AO1, { s0 - s3 }
vldmia.f32 AO1, { s0 - s3 }
add r3, AO1, LDA
fldmias r3, { s4 - s7 }
vldmia.f32 r3, { s4 - s7 }
add r3, r3, LDA
fldmias r3, { s8 - s11 }
vldmia.f32 r3, { s8 - s11 }
add r3, r3, LDA
fldmias r3, { s12 - s15 }
vldmia.f32 r3, { s12 - s15 }
fstmias BO1, { s0 - s15 }
vstmia.f32 BO1, { s0 - s15 }
add AO1, AO1, #16
add BO1, BO1, M4
@@ -118,18 +118,18 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY2x4
fldmias AO1, { s0 - s1 }
vldmia.f32 AO1, { s0 - s1 }
add r3, AO1, LDA
fldmias r3, { s2 - s3 }
vldmia.f32 r3, { s2 - s3 }
add r3, r3, LDA
fldmias r3, { s4 - s5 }
vldmia.f32 r3, { s4 - s5 }
add r3, r3, LDA
fldmias r3, { s6 - s7 }
vldmia.f32 r3, { s6 - s7 }
fstmias BO2, { s0 - s7 }
vstmia.f32 BO2, { s0 - s7 }
add AO1, AO1, #8
add BO2, BO2, #32
@@ -137,18 +137,18 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY1x4
fldmias AO1, { s0 }
vldmia.f32 AO1, { s0 }
add r3, AO1, LDA
fldmias r3, { s1 }
vldmia.f32 r3, { s1 }
add r3, r3, LDA
fldmias r3, { s2 }
vldmia.f32 r3, { s2 }
add r3, r3, LDA
fldmias r3, { s3 }
vldmia.f32 r3, { s3 }
fstmias BO3, { s0 - s3 }
vstmia.f32 BO3, { s0 - s3 }
add AO1, AO1, #4
add BO3, BO3, #16
@@ -158,12 +158,12 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY4x2
fldmias AO1, { s0 - s3 }
vldmia.f32 AO1, { s0 - s3 }
add r3, AO1, LDA
fldmias r3, { s4 - s7 }
vldmia.f32 r3, { s4 - s7 }
fstmias BO1, { s0 - s7 }
vstmia.f32 BO1, { s0 - s7 }
add AO1, AO1, #16
add BO1, BO1, M4
@@ -171,12 +171,12 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY2x2
fldmias AO1, { s0 - s1 }
vldmia.f32 AO1, { s0 - s1 }
add r3, AO1, LDA
fldmias r3, { s2 - s3 }
vldmia.f32 r3, { s2 - s3 }
fstmias BO2, { s0 - s3 }
vstmia.f32 BO2, { s0 - s3 }
add AO1, AO1, #8
add BO2, BO2, #16
@@ -184,12 +184,12 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY1x2
fldmias AO1, { s0 }
vldmia.f32 AO1, { s0 }
add r3, AO1, LDA
fldmias r3, { s1 }
vldmia.f32 r3, { s1 }
fstmias BO3, { s0 - s1 }
vstmia.f32 BO3, { s0 - s1 }
add AO1, AO1, #4
add BO3, BO3, #8
@@ -199,9 +199,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY4x1
fldmias AO1, { s0 - s3 }
vldmia.f32 AO1, { s0 - s3 }
fstmias BO1, { s0 - s3 }
vstmia.f32 BO1, { s0 - s3 }
add AO1, AO1, #16
add BO1, BO1, M4
@@ -209,9 +209,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY2x1
fldmias AO1, { s0 - s1 }
vldmia.f32 AO1, { s0 - s1 }
fstmias BO2, { s0 - s1 }
vstmia.f32 BO2, { s0 - s1 }
add AO1, AO1, #8
add BO2, BO2, #8
@@ -219,9 +219,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY1x1
fldmias AO1, { s0 }
vldmia.f32 AO1, { s0 }
fstmias BO3, { s0 }
vstmia.f32 BO3, { s0 }
add AO1, AO1, #4
add BO3, BO3, #4

View File

@@ -118,8 +118,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL4x2_SUB
fldmias AO!, { s0 - s3 }
fldmias BO!, { s4 - s5 }
vldmia.f32 AO!, { s0 - s3 }
vldmia.f32 BO!, { s4 - s5 }
fmacs s8 , s0, s4
fmacs s9 , s1, s4

View File

@@ -122,30 +122,30 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL4x4_I
fldmias AO!, { s0 - s1 }
vldmia.f32 AO!, { s0 - s1 }
pld [ AO , #A_PRE-8 ]
fldmias BO!, { s8 - s9 }
vldmia.f32 BO!, { s8 - s9 }
pld [ BO , #B_PRE-8 ]
fmuls s16 , s0, s8
fldmias AO!, { s2 - s3 }
vldmia.f32 AO!, { s2 - s3 }
fmuls s17 , s1, s8
fmuls s18 , s2, s8
fldmias BO!, { s10 - s11 }
vldmia.f32 BO!, { s10 - s11 }
fmuls s19 , s3, s8
fmuls s20 , s0, s9
fldmias AO!, { s4 - s5 }
vldmia.f32 AO!, { s4 - s5 }
fmuls s21 , s1, s9
fmuls s22 , s2, s9
fldmias AO!, { s6 - s7 }
vldmia.f32 AO!, { s6 - s7 }
fmuls s23 , s3, s9
fmuls s24 , s0, s10
fldmias BO!, { s12 - s13 }
vldmia.f32 BO!, { s12 - s13 }
fmuls s25 , s1, s10
fmuls s26 , s2, s10
fldmias BO!, { s14 - s15 }
vldmia.f32 BO!, { s14 - s15 }
fmuls s27 , s3, s10
fmuls s28 , s0, s11
@@ -161,20 +161,20 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
pld [ AO , #A_PRE ]
fmacs s16 , s4, s12
fmacs s17 , s5, s12
fldmias AO!, { s0 - s1 }
vldmia.f32 AO!, { s0 - s1 }
fmacs s18 , s6, s12
pld [ BO , #B_PRE ]
fmacs s19 , s7, s12
fmacs s20 , s4, s13
fldmias AO!, { s2 - s3 }
vldmia.f32 AO!, { s2 - s3 }
fmacs s21 , s5, s13
fmacs s22 , s6, s13
fldmias BO!, { s8 - s9 }
vldmia.f32 BO!, { s8 - s9 }
fmacs s23 , s7, s13
fmacs s24 , s4, s14
fldmias BO!, { s10 - s11 }
vldmia.f32 BO!, { s10 - s11 }
fmacs s25 , s5, s14
fmacs s26 , s6, s14
fmacs s27 , s7, s14
@@ -190,17 +190,17 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL4x4_M1
fmacs s16 , s0, s8
fldmias AO!, { s4 - s5 }
vldmia.f32 AO!, { s4 - s5 }
fmacs s17 , s1, s8
fmacs s18 , s2, s8
fldmias AO!, { s6 - s7 }
vldmia.f32 AO!, { s6 - s7 }
fmacs s19 , s3, s8
fmacs s20 , s0, s9
fldmias BO!, { s12 - s13 }
vldmia.f32 BO!, { s12 - s13 }
fmacs s21 , s1, s9
fmacs s22 , s2, s9
fldmias BO!, { s14 - s15 }
vldmia.f32 BO!, { s14 - s15 }
fmacs s23 , s3, s9
fmacs s24 , s0, s10
@@ -325,7 +325,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fsts s11, [r4 , #12 ]
fmuls s15, s0 , s31
fstmias CO2, { s12 - s15 }
vstmia.f32 CO2, { s12 - s15 }
add CO1, CO1, #16

View File

@@ -103,29 +103,29 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
pld [ X, #X_PRE ]
pld [ Y, #X_PRE ]
fldmiad X, { d0 - d3 }
fldmiad Y, { d4 - d7 }
fstmiad Y!, { d0 - d3 }
fstmiad X!, { d4 - d7}
vldmia.f64 X, { d0 - d3 }
vldmia.f64 Y, { d4 - d7 }
vstmia.f64 Y!, { d0 - d3 }
vstmia.f64 X!, { d4 - d7}
.endm
.macro KERNEL_F1
fldmiad X, { d0 }
fldmiad Y, { d4 }
fstmiad Y!, { d0 }
fstmiad X!, { d4 }
vldmia.f64 X, { d0 }
vldmia.f64 Y, { d4 }
vstmia.f64 Y!, { d0 }
vstmia.f64 X!, { d4 }
.endm
.macro KERNEL_S1
fldmiad X, { d0 }
fldmiad Y, { d4 }
fstmiad Y, { d0 }
fstmiad X, { d4 }
vldmia.f64 X, { d0 }
vldmia.f64 Y, { d4 }
vstmia.f64 Y, { d0 }
vstmia.f64 X, { d4 }
add X, X, INC_X
add Y, Y, INC_Y
@@ -135,29 +135,29 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F4
fldmias X, { s0 - s3 }
fldmias Y, { s4 - s7 }
fstmias Y!, { s0 - s3 }
fstmias X!, { s4 - s7}
vldmia.f32 X, { s0 - s3 }
vldmia.f32 Y, { s4 - s7 }
vstmia.f32 Y!, { s0 - s3 }
vstmia.f32 X!, { s4 - s7}
.endm
.macro KERNEL_F1
fldmias X, { s0 }
fldmias Y, { s4 }
fstmias Y!, { s0 }
fstmias X!, { s4 }
vldmia.f32 X, { s0 }
vldmia.f32 Y, { s4 }
vstmia.f32 Y!, { s0 }
vstmia.f32 X!, { s4 }
.endm
.macro KERNEL_S1
fldmias X, { s0 }
fldmias Y, { s4 }
fstmias Y, { s0 }
fstmias X, { s4 }
vldmia.f32 X, { s0 }
vldmia.f32 Y, { s4 }
vstmia.f32 Y, { s0 }
vstmia.f32 X, { s4 }
add X, X, INC_X
add Y, Y, INC_Y
@@ -174,35 +174,35 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
pld [ X, #X_PRE ]
pld [ Y, #X_PRE ]
fldmiad X, { d0 - d3 }
fldmiad Y, { d4 - d7 }
fstmiad Y!, { d0 - d3 }
fstmiad X!, { d4 - d7}
vldmia.f64 X, { d0 - d3 }
vldmia.f64 Y, { d4 - d7 }
vstmia.f64 Y!, { d0 - d3 }
vstmia.f64 X!, { d4 - d7}
pld [ X, #X_PRE ]
pld [ Y, #X_PRE ]
fldmiad X, { d0 - d3 }
fldmiad Y, { d4 - d7 }
fstmiad Y!, { d0 - d3 }
fstmiad X!, { d4 - d7}
vldmia.f64 X, { d0 - d3 }
vldmia.f64 Y, { d4 - d7 }
vstmia.f64 Y!, { d0 - d3 }
vstmia.f64 X!, { d4 - d7}
.endm
.macro KERNEL_F1
fldmiad X, { d0 - d1 }
fldmiad Y, { d4 - d5 }
fstmiad Y!, { d0 - d1 }
fstmiad X!, { d4 - d5 }
vldmia.f64 X, { d0 - d1 }
vldmia.f64 Y, { d4 - d5 }
vstmia.f64 Y!, { d0 - d1 }
vstmia.f64 X!, { d4 - d5 }
.endm
.macro KERNEL_S1
fldmiad X, { d0 - d1 }
fldmiad Y, { d4 - d5 }
fstmiad Y, { d0 - d1 }
fstmiad X, { d4 - d5 }
vldmia.f64 X, { d0 - d1 }
vldmia.f64 Y, { d4 - d5 }
vstmia.f64 Y, { d0 - d1 }
vstmia.f64 X, { d4 - d5 }
add X, X, INC_X
add Y, Y, INC_Y
@@ -215,33 +215,33 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
pld [ X, #X_PRE ]
pld [ Y, #X_PRE ]
fldmias X, { s0 - s3 }
fldmias Y, { s4 - s7 }
fstmias Y!, { s0 - s3 }
fstmias X!, { s4 - s7}
vldmia.f32 X, { s0 - s3 }
vldmia.f32 Y, { s4 - s7 }
vstmia.f32 Y!, { s0 - s3 }
vstmia.f32 X!, { s4 - s7}
fldmias X, { s0 - s3 }
fldmias Y, { s4 - s7 }
fstmias Y!, { s0 - s3 }
fstmias X!, { s4 - s7}
vldmia.f32 X, { s0 - s3 }
vldmia.f32 Y, { s4 - s7 }
vstmia.f32 Y!, { s0 - s3 }
vstmia.f32 X!, { s4 - s7}
.endm
.macro KERNEL_F1
fldmias X, { s0 - s1 }
fldmias Y, { s4 - s5 }
fstmias Y!, { s0 - s1 }
fstmias X!, { s4 - s5 }
vldmia.f32 X, { s0 - s1 }
vldmia.f32 Y, { s4 - s5 }
vstmia.f32 Y!, { s0 - s1 }
vstmia.f32 X!, { s4 - s5 }
.endm
.macro KERNEL_S1
fldmias X, { s0 - s1 }
fldmias Y, { s4 - s5 }
fstmias Y, { s0 - s1 }
fstmias X, { s4 - s5 }
vldmia.f32 X, { s0 - s1 }
vldmia.f32 Y, { s4 - s5 }
vstmia.f32 Y, { s0 - s1 }
vstmia.f32 X, { s4 - s5 }
add X, X, INC_X
add Y, Y, INC_Y

View File

@@ -66,15 +66,15 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
pld [ X, #X_PRE ]
pld [ X, #X_PRE+32 ]
fldmiad X!, { d0 - d7 }
fstmiad Y!, { d0 - d7 }
vldmia.f64 X!, { d0 - d7 }
vstmia.f64 Y!, { d0 - d7 }
.endm
.macro COPY_F1
fldmiad X!, { d0 - d1 }
fstmiad Y!, { d0 - d1 }
vldmia.f64 X!, { d0 - d1 }
vstmia.f64 Y!, { d0 - d1 }
.endm
@@ -84,23 +84,23 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY_S4
nop
fldmiad X, { d0 - d1 }
fstmiad Y, { d0 - d1 }
vldmia.f64 X, { d0 - d1 }
vstmia.f64 Y, { d0 - d1 }
add X, X, INC_X
add Y, Y, INC_Y
fldmiad X, { d2 - d3 }
fstmiad Y, { d2 - d3 }
vldmia.f64 X, { d2 - d3 }
vstmia.f64 Y, { d2 - d3 }
add X, X, INC_X
add Y, Y, INC_Y
fldmiad X, { d0 - d1 }
fstmiad Y, { d0 - d1 }
vldmia.f64 X, { d0 - d1 }
vstmia.f64 Y, { d0 - d1 }
add X, X, INC_X
add Y, Y, INC_Y
fldmiad X, { d2 - d3 }
fstmiad Y, { d2 - d3 }
vldmia.f64 X, { d2 - d3 }
vstmia.f64 Y, { d2 - d3 }
add X, X, INC_X
add Y, Y, INC_Y
@@ -109,8 +109,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro COPY_S1
fldmiad X, { d0 - d1 }
fstmiad Y, { d0 - d1 }
vldmia.f64 X, { d0 - d1 }
vstmia.f64 Y, { d0 - d1 }
add X, X, INC_X
add Y, Y, INC_Y

View File

@@ -76,15 +76,15 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
pld [ X, #X_PRE ]
pld [ Y, #X_PRE ]
fldmiad X!, { d4 - d5 }
fldmiad Y!, { d8 - d9 }
vldmia.f64 X!, { d4 - d5 }
vldmia.f64 Y!, { d8 - d9 }
fmacd d0 , d4, d8
fmacd d1 , d4, d9
fldmiad X!, { d6 - d7 }
vldmia.f64 X!, { d6 - d7 }
fmacd d2 , d5, d9
fmacd d3 , d5, d8
fldmiad Y!, { d10 - d11 }
vldmia.f64 Y!, { d10 - d11 }
fmacd d0 , d6, d10
fmacd d1 , d6, d11
pld [ X, #X_PRE ]
@@ -93,15 +93,15 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
pld [ Y, #X_PRE ]
fldmiad X!, { d4 - d5 }
fldmiad Y!, { d8 - d9 }
vldmia.f64 X!, { d4 - d5 }
vldmia.f64 Y!, { d8 - d9 }
fmacd d0 , d4, d8
fmacd d1 , d4, d9
fldmiad X!, { d6 - d7 }
vldmia.f64 X!, { d6 - d7 }
fmacd d2 , d5, d9
fmacd d3 , d5, d8
fldmiad Y!, { d10 - d11 }
vldmia.f64 Y!, { d10 - d11 }
fmacd d0 , d6, d10
fmacd d1 , d6, d11
fmacd d2 , d7, d11
@@ -111,8 +111,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_F1
fldmiad X!, { d4 - d5 }
fldmiad Y!, { d8 - d9 }
vldmia.f64 X!, { d4 - d5 }
vldmia.f64 Y!, { d8 - d9 }
fmacd d0 , d4, d8
fmacd d1 , d4, d9
fmacd d2 , d5, d9
@@ -127,8 +127,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
nop
fldmiad X, { d4 - d5 }
fldmiad Y, { d8 - d9 }
vldmia.f64 X, { d4 - d5 }
vldmia.f64 Y, { d8 - d9 }
fmacd d0 , d4, d8
fmacd d1 , d4, d9
fmacd d2 , d5, d9
@@ -136,8 +136,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
add X, X, INC_X
add Y, Y, INC_Y
fldmiad X, { d4 - d5 }
fldmiad Y, { d8 - d9 }
vldmia.f64 X, { d4 - d5 }
vldmia.f64 Y, { d8 - d9 }
fmacd d0 , d4, d8
fmacd d1 , d4, d9
fmacd d2 , d5, d9
@@ -145,8 +145,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
add X, X, INC_X
add Y, Y, INC_Y
fldmiad X, { d4 - d5 }
fldmiad Y, { d8 - d9 }
vldmia.f64 X, { d4 - d5 }
vldmia.f64 Y, { d8 - d9 }
fmacd d0 , d4, d8
fmacd d1 , d4, d9
fmacd d2 , d5, d9
@@ -154,8 +154,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
add X, X, INC_X
add Y, Y, INC_Y
fldmiad X, { d4 - d5 }
fldmiad Y, { d8 - d9 }
vldmia.f64 X, { d4 - d5 }
vldmia.f64 Y, { d8 - d9 }
fmacd d0 , d4, d8
fmacd d1 , d4, d9
fmacd d2 , d5, d9
@@ -168,8 +168,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.macro KERNEL_S1
fldmiad X, { d4 - d5 }
fldmiad Y, { d8 - d9 }
vldmia.f64 X, { d4 - d5 }
vldmia.f64 Y, { d8 - d9 }
fmacd d0 , d4, d8
fmacd d1 , d4, d9
fmacd d2 , d5, d9
@@ -218,11 +218,11 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
cmp N, #0
ble zdot_kernel_L999
cmp INC_X, #0
beq zdot_kernel_L999
# cmp INC_X, #0
# beq zdot_kernel_L999
cmp INC_Y, #0
beq zdot_kernel_L999
# cmp INC_Y, #0
# beq zdot_kernel_L999
cmp INC_X, #1
bne zdot_kernel_S_BEGIN

View File

@@ -360,7 +360,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fldd d0, ALPHA_R
fldd d1, ALPHA_I
fldmiad CO1, { d4 - d7 }
vldmia.f64 CO1, { d4 - d7 }
FMAC_R1 d4 , d0 , d8
FMAC_I1 d5 , d0 , d9
@@ -372,9 +372,9 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 d6 , d1 , d11
FMAC_I2 d7 , d1 , d10
fstmiad CO1, { d4 - d7 }
vstmia.f64 CO1, { d4 - d7 }
fldmiad CO2, { d4 - d7 }
vldmia.f64 CO2, { d4 - d7 }
FMAC_R1 d4 , d0 , d12
FMAC_I1 d5 , d0 , d13
@@ -386,7 +386,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 d6 , d1 , d15
FMAC_I2 d7 , d1 , d14
fstmiad CO2, { d4 - d7 }
vstmia.f64 CO2, { d4 - d7 }
add CO1, CO1, #32
@@ -543,23 +543,23 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fldd d0, ALPHA_R
fldd d1, ALPHA_I
fldmiad CO1, { d4 - d5 }
vldmia.f64 CO1, { d4 - d5 }
FMAC_R1 d4 , d0 , d8
FMAC_I1 d5 , d0 , d9
FMAC_R2 d4 , d1 , d9
FMAC_I2 d5 , d1 , d8
fstmiad CO1, { d4 - d5 }
vstmia.f64 CO1, { d4 - d5 }
fldmiad CO2, { d4 - d5 }
vldmia.f64 CO2, { d4 - d5 }
FMAC_R1 d4 , d0 , d12
FMAC_I1 d5 , d0 , d13
FMAC_R2 d4 , d1 , d13
FMAC_I2 d5 , d1 , d12
fstmiad CO2, { d4 - d5 }
vstmia.f64 CO2, { d4 - d5 }
add CO1, CO1, #16
@@ -714,7 +714,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fldd d0, ALPHA_R
fldd d1, ALPHA_I
fldmiad CO1, { d4 - d7 }
vldmia.f64 CO1, { d4 - d7 }
FMAC_R1 d4 , d0 , d8
FMAC_I1 d5 , d0 , d9
@@ -726,7 +726,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 d6 , d1 , d11
FMAC_I2 d7 , d1 , d10
fstmiad CO1, { d4 - d7 }
vstmia.f64 CO1, { d4 - d7 }
add CO1, CO1, #32
@@ -843,14 +843,14 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fldd d0, ALPHA_R
fldd d1, ALPHA_I
fldmiad CO1, { d4 - d5 }
vldmia.f64 CO1, { d4 - d5 }
FMAC_R1 d4 , d0 , d8
FMAC_I1 d5 , d0 , d9
FMAC_R2 d4 , d1 , d9
FMAC_I2 d5 , d1 , d8
fstmiad CO1, { d4 - d5 }
vstmia.f64 CO1, { d4 - d5 }
add CO1, CO1, #16

View File

@@ -374,8 +374,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fldd d0, ALPHA_R
fldd d1, ALPHA_I
fldmiad CO1, { d4 - d7 }
fldmiad CO2, { d8 - d11 }
vldmia.f64 CO1, { d4 - d7 }
vldmia.f64 CO2, { d8 - d11 }
FADD_R d16, d24 , d16
FADD_I d17, d25 , d17
@@ -406,8 +406,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 d10, d1 , d23
FMAC_I2 d11, d1 , d22
fstmiad CO1, { d4 - d7 }
fstmiad CO2, { d8 - d11 }
vstmia.f64 CO1, { d4 - d7 }
vstmia.f64 CO2, { d8 - d11 }
add CO1, CO1, #32
@@ -570,8 +570,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fldd d0, ALPHA_R
fldd d1, ALPHA_I
fldmiad CO1, { d4 - d5 }
fldmiad CO2, { d8 - d9 }
vldmia.f64 CO1, { d4 - d5 }
vldmia.f64 CO2, { d8 - d9 }
FADD_R d16, d24 , d16
FADD_I d17, d25 , d17
@@ -588,8 +588,8 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 d8 , d1 , d21
FMAC_I2 d9 , d1 , d20
fstmiad CO1, { d4 - d5 }
fstmiad CO2, { d8 - d9 }
vstmia.f64 CO1, { d4 - d5 }
vstmia.f64 CO2, { d8 - d9 }
add CO1, CO1, #16
@@ -752,7 +752,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fldd d0, ALPHA_R
fldd d1, ALPHA_I
fldmiad CO1, { d4 - d7 }
vldmia.f64 CO1, { d4 - d7 }
FADD_R d16, d24 , d16
FADD_I d17, d25 , d17
@@ -769,7 +769,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 d6 , d1 , d19
FMAC_I2 d7 , d1 , d18
fstmiad CO1, { d4 - d7 }
vstmia.f64 CO1, { d4 - d7 }
add CO1, CO1, #32
@@ -887,7 +887,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fldd d0, ALPHA_R
fldd d1, ALPHA_I
fldmiad CO1, { d4 - d5 }
vldmia.f64 CO1, { d4 - d5 }
FADD_R d16, d24 , d16
FADD_I d17, d25 , d17
@@ -897,7 +897,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
FMAC_R2 d4 , d1 , d17
FMAC_I2 d5 , d1 , d16
fstmiad CO1, { d4 - d5 }
vstmia.f64 CO1, { d4 - d5 }
add CO1, CO1, #16

View File

@@ -87,7 +87,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fldd d6 , [ AO2, #16 ]
fldd d7 , [ AO2, #24 ]
fstmiad BO!, { d0 - d7 }
vstmia.f64 BO!, { d0 - d7 }
add AO2, AO2, #32
.endm
@@ -101,7 +101,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fldd d3 , [ AO2, #8 ]
add AO1, AO1, #16
fstmiad BO!, { d0 - d3 }
vstmia.f64 BO!, { d0 - d3 }
add AO2, AO2, #16
.endm
@@ -113,7 +113,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fldd d2 , [ AO1, #16 ]
fldd d3 , [ AO1, #24 ]
fstmiad BO!, { d0 - d3 }
vstmia.f64 BO!, { d0 - d3 }
add AO1, AO1, #32
.endm
@@ -124,7 +124,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
fldd d0 , [ AO1, #0 ]
fldd d1 , [ AO1, #8 ]
fstmiad BO!, { d0 - d1 }
vstmia.f64 BO!, { d0 - d1 }
add AO1, AO1, #16
.endm

Some files were not shown because too many files have changed in this diff Show More