Zhang Xianyi
|
74b0672223
|
Fix c/zaxpyc kernel bug on Cortex-A57.
|
2016-02-23 22:47:53 +00:00 |
Zhang Xianyi
|
6e7be06e07
|
Refs JuliaLang/julia#5728. Fix gemv performance bug on Haswell Mac OS X.
On Mac OS X, the kernel should use .align 4 (equivalent to .align 16 on Linux).
I did not see any performance benefit from .align, so I removed it.
|
2016-02-19 17:56:07 -05:00 |
Zhang Xianyi
|
d06b92906a
|
Add gemm3m building for CMake.
|
2016-02-12 05:02:51 +08:00 |
Zhang Xianyi
|
962376664d
|
Refs #768. Swap the result of zdot x87 fp kernel.
|
2016-02-02 09:15:02 +08:00 |
Zhang Xianyi
|
c44ff4d648
|
Refs #714. Avoid compilation warnings.
|
2016-01-28 04:38:07 +08:00 |
Werner Saar
|
63a7d7fb24
|
updated gemv_n_vfpv3.S for armv7
|
2016-01-25 15:00:13 +01:00 |
Werner Saar
|
b4ede558a5
|
updated nrm2 kernel for armv7
|
2016-01-25 11:55:25 +01:00 |
Werner Saar
|
de3e2d4349
|
updated trmm kernels for armv7
|
2016-01-25 11:08:56 +01:00 |
Werner Saar
|
a0e51e96f1
|
updated gemm kernels for armv7
|
2016-01-25 10:46:10 +01:00 |
Werner Saar
|
c2891330bc
|
updated KERNEL.ARMV6
|
2016-01-24 17:12:07 +01:00 |
Werner Saar
|
ceaa931e48
|
updated gemv kernel for armv6
|
2016-01-24 16:31:19 +01:00 |
Werner Saar
|
eaa63165df
|
updated cgemv and zgemv kernels for armv6
|
2016-01-24 14:42:38 +01:00 |
Werner Saar
|
c65357c566
|
updated trmm_kernels for armv6
|
2016-01-24 13:03:33 +01:00 |
Werner Saar
|
e63e9f9f26
|
updated gemm_kernels for armv6
|
2016-01-24 11:55:50 +01:00 |
Werner Saar
|
aafd3ab60e
|
updated cdot and zdot on arm
|
2016-01-24 10:56:49 +01:00 |
Werner Saar
|
d2f84c9c8a
|
Ref #740: updated nrm2_vfp.S
|
2016-01-23 17:47:58 +01:00 |
Werner Saar
|
ca32253f32
|
Ref #740: updated asum_vfp.S and iamax_vfp.S
|
2016-01-23 14:44:34 +01:00 |
Werner Saar
|
9066d1f982
|
Ref #750 and Ref #740: bugfix for sdot, dsdot and ddot on arm
|
2016-01-23 11:59:51 +01:00 |
Werner Saar
|
692d9c881c
|
Ref #740: simple solution to clear floating point register on arm
|
2016-01-17 15:37:12 +01:00 |
Zhang Xianyi
|
3602a2cd1f
|
#736 Revert #733 patch to fix bus error on ARM.
|
2016-01-12 22:19:58 +00:00 |
Zhang Xianyi
|
e3e20e2242
|
Merge pull request #733 from yuyichao/arm-asm
Do not use vsub to clear the register values
|
2016-01-05 19:35:12 -06:00 |
Yichao Yu
|
594b9f4c73
|
Do not use vsub to clear the register values since it doesn't work with non-normal numbers.
|
2016-01-05 16:54:05 +00:00 |
Werner Saar
|
c8f2c5d636
|
added optimized trsm_kernels
|
2016-01-05 13:05:05 +01:00 |
Ashwin Sekhar T K
|
318f0949c3
|
lapack-test fixes in nrm2 kernels for Cortex A57
|
2015-11-23 13:43:36 +05:30 |
Ashwin Sekhar T K
|
98965da2e8
|
lapack-test fixes for Cortex A57
|
2015-11-20 01:15:04 +05:30 |
Ashwin Sekhar T K
|
c99c43d51e
|
Optimized trmm kernels for CORTEXA57
|
2015-11-09 14:15:54 +05:30 |
Ashwin Sekhar T K
|
1397b47197
|
Optimized zgemm kernel for CORTEXA57
|
2015-11-09 14:15:53 +05:30 |
Ashwin Sekhar T K
|
45f78963ac
|
Optimized cgemm kernel for CORTEXA57
Also, add a generic ztrmm 4x4 kernel
|
2015-11-09 14:15:53 +05:30 |
Ashwin Sekhar T K
|
402443bf9c
|
Optimized dgemm kernel for CORTEXA57
|
2015-11-09 14:15:53 +05:30 |
Ashwin Sekhar T K
|
19fdbee291
|
Improve the sgemm kernel for CORTEXA57
|
2015-11-09 14:15:53 +05:30 |
Ashwin Sekhar T K
|
3b0cdfab1e
|
Optimized gemv kernels for CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
|
2015-11-09 14:15:52 +05:30 |
Ashwin Sekhar T K
|
46efa6a1da
|
Optimized swap kernels for CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
|
2015-11-09 14:15:52 +05:30 |
Ashwin Sekhar T K
|
ea1465cdf8
|
Optimized scal kernels for CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
|
2015-11-09 14:15:52 +05:30 |
Ashwin Sekhar T K
|
fb4be3b3eb
|
Optimized rot kernels for CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
|
2015-11-09 14:15:52 +05:30 |
Ashwin Sekhar T K
|
6c2f4ddbcd
|
Optimized nrm2 kernels for CORTEXA57
|
2015-11-09 14:15:51 +05:30 |
Ashwin Sekhar T K
|
870c4d49c0
|
Optimized dot kernels for CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
|
2015-11-09 14:15:51 +05:30 |
Ashwin Sekhar T K
|
cd7684097c
|
Optimized copy kernels for CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
|
2015-11-09 14:15:51 +05:30 |
Ashwin Sekhar T K
|
2690b71b1f
|
Optimized axpy kernels for CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
|
2015-11-09 14:15:51 +05:30 |
Ashwin Sekhar T K
|
3e4acedf0e
|
Optimized asum kernels for CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
|
2015-11-09 14:15:51 +05:30 |
Ashwin Sekhar T K
|
2610752dbb
|
Optimized iamax kernels for CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
|
2015-11-09 14:15:50 +05:30 |
Ashwin Sekhar T K
|
dbb213655e
|
Optimized amax kernels for CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
|
2015-11-09 14:15:50 +05:30 |
Ashwin Sekhar T K
|
f2f8a0fe8b
|
Adding arm64 target CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
|
2015-11-09 14:15:50 +05:30 |
Ralph Campbell
|
c053559ed9
|
Minor C code fixes in kernel/arm
|
2015-11-09 14:15:49 +05:30 |
Ralph Campbell
|
55e4332f00
|
Remove duplicate -D args in kernel/Makefile.L1
|
2015-11-09 14:15:48 +05:30 |
Zhang Xianyi
|
69363622a8
|
Fix DYNAMIC_ARCH=1 bug.
|
2015-10-27 05:10:40 +08:00 |
Zhang Xianyi
|
53b6023a6c
|
Fix cmake bug on MSVC 32-bit.
|
2015-10-26 14:52:13 -05:00 |
Zhang Xianyi
|
309875de3c
|
Fix cmake bug on x86 32-bit.
e.g. Build 32-bit on 64-bit Linux.
cmake -DBINARY=32
|
2015-10-27 02:54:53 +08:00 |
Zhang Xianyi
|
8fade093aa
|
Fixed cmake bug on Visual Studio.
|
2015-10-20 14:37:22 -05:00 |
Zhang Xianyi
|
96f0bbe067
|
Fixed cmake bug on haswell.
|
2015-10-21 02:24:54 +08:00 |
Zhang Xianyi
|
d8392c1245
|
Fix cmake config bugs.
|
2015-10-20 04:30:55 +08:00 |