Zhang Xianyi
e3e20e2242
Merge pull request #733 from yuyichao/arm-asm
...
Do not use vsub to clear the register values
2016-01-05 19:35:12 -06:00
Yichao Yu
594b9f4c73
Do not use vsub to clear the register values since it doesn't work with non-normal numbers.
2016-01-05 16:54:05 +00:00
Werner Saar
c8f2c5d636
added optimized trsm_kernels
2016-01-05 13:05:05 +01:00
Ashwin Sekhar T K
318f0949c3
lapack-test fixes in nrm2 kernels for Cortex A57
2015-11-23 13:43:36 +05:30
Ashwin Sekhar T K
98965da2e8
lapack-test fixes for Cortex A57
2015-11-20 01:15:04 +05:30
Ashwin Sekhar T K
c99c43d51e
Optimized trmm kernels for CORTEXA57
2015-11-09 14:15:54 +05:30
Ashwin Sekhar T K
1397b47197
Optimized zgemm kernel for CORTEXA57
2015-11-09 14:15:53 +05:30
Ashwin Sekhar T K
45f78963ac
Optimized cgemm kernel for CORTEXA57
...
Also, add a generic ztrmm 4x4 kernel
2015-11-09 14:15:53 +05:30
Ashwin Sekhar T K
402443bf9c
Optimized dgemm kernel for CORTEXA57
2015-11-09 14:15:53 +05:30
Ashwin Sekhar T K
19fdbee291
Improve the sgemm kernel for CORTEXA57
2015-11-09 14:15:53 +05:30
Ashwin Sekhar T K
3b0cdfab1e
Optimized gemv kernels for CORTEXA57
...
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
2015-11-09 14:15:52 +05:30
Ashwin Sekhar T K
46efa6a1da
Optimized swap kernels for CORTEXA57
...
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
2015-11-09 14:15:52 +05:30
Ashwin Sekhar T K
ea1465cdf8
Optimized scal kernels for CORTEXA57
...
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
2015-11-09 14:15:52 +05:30
Ashwin Sekhar T K
fb4be3b3eb
Optimized rot kernels for CORTEXA57
...
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
2015-11-09 14:15:52 +05:30
Ashwin Sekhar T K
6c2f4ddbcd
Optimized nrm2 kernels for CORTEXA57
2015-11-09 14:15:51 +05:30
Ashwin Sekhar T K
870c4d49c0
Optimized dot kernels for CORTEXA57
...
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
2015-11-09 14:15:51 +05:30
Ashwin Sekhar T K
cd7684097c
Optimized copy kernels for CORTEXA57
...
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
2015-11-09 14:15:51 +05:30
Ashwin Sekhar T K
2690b71b1f
Optimized axpy kernels for CORTEXA57
...
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
2015-11-09 14:15:51 +05:30
Ashwin Sekhar T K
3e4acedf0e
Optimized asum kernels for CORTEXA57
...
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
2015-11-09 14:15:51 +05:30
Ashwin Sekhar T K
2610752dbb
Optimized iamax kernels for CORTEXA57
...
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
2015-11-09 14:15:50 +05:30
Ashwin Sekhar T K
dbb213655e
Optimized amax kernels for CORTEXA57
...
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
2015-11-09 14:15:50 +05:30
Ashwin Sekhar T K
f2f8a0fe8b
Adding arm64 target CORTEXA57
...
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
2015-11-09 14:15:50 +05:30
Ralph Campbell
c053559ed9
Minor C code fixes in kernel/arm
2015-11-09 14:15:49 +05:30
Ralph Campbell
55e4332f00
Remove duplicate -D args in kernel/Makefile.L1
2015-11-09 14:15:48 +05:30
Zhang Xianyi
69363622a8
Fix DYNAMIC_ARCH=1 bug.
2015-10-27 05:10:40 +08:00
Zhang Xianyi
53b6023a6c
Fix cmake bug on MSVC 32-bit.
2015-10-26 14:52:13 -05:00
Zhang Xianyi
309875de3c
Fix cmake bug on x86 32-bit.
...
e.g. Build 32-bit on 64-bit Linux.
cmake -DBINARY=32
2015-10-27 02:54:53 +08:00
Zhang Xianyi
8fade093aa
Fixed cmake bug on Visual Studio.
2015-10-20 14:37:22 -05:00
Zhang Xianyi
96f0bbe067
Fixed cmake bug on haswell.
2015-10-21 02:24:54 +08:00
Zhang Xianyi
d8392c1245
Fixe cmake config bugs.
2015-10-20 04:30:55 +08:00
Zhang Xianyi
94b125255f
Merge branch 'develop' into cmake
...
Conflicts:
driver/others/memory.c
2015-10-13 04:46:08 +08:00
Martin Koehler
711ca33bc6
Improved Ximatcopy when lda==ldb.
...
The Ximatcopy functions create a copy of the input matrix
although they seem to work inplace. The new routines
XIMATCOPY_K_YY perform the operations inplace if the leading
dimension does not change.
2015-09-07 14:36:16 +02:00
Zhang Xianyi
7df0820160
Use C kernels for s/dgemv on x86.
2015-08-19 08:07:47 -05:00
Zhang Xianyi
f874465bb8
Use cmake to build OpenBLAS GENERIC Target on MSVC x86 64-bit.
...
Disable CBLAS and LAPACK.
2015-08-10 14:10:44 -05:00
Zhang Xianyi
898fc7552a
Merge pull request #612 from ibmsoe/ppc64le
...
ppc64le platform support (ELF ABI v2)
2015-08-04 16:58:24 -05:00
Zhang Xianyi
ab0a0a75fc
Merge branch 'develop' into cmake
2015-08-03 23:59:01 -05:00
Zhang Xianyi
1cf2b10224
Use pure C generic target on x86 and x86_64.
...
make TARGET=GENERIC
?gemm3m is unimplemented on generic target.
2015-08-03 23:55:56 -05:00
Zhang Xianyi
7ac7e147d4
Fixed cmake building bugs on Linux. Disable LAPACK by default.
2015-08-04 04:37:05 +08:00
Matthew Brandyberry
7ba4fe5afb
ppc64le platform support (ELF ABI v2)
2015-07-21 22:20:19 -05:00
Zhang Xianyi
dcd5ba4443
Merge branch 'cmake' of https://github.com/hpanderson/OpenBLAS into hpanderson_cmake
2015-07-22 04:06:39 +08:00
Werner Saar
e7c969e164
added optimized dtrmm_kernel for haswell
2015-06-13 16:16:29 +02:00
Werner Saar
9bd962f655
modified haswell parameter dgemm_unroll_n
2015-06-13 10:28:27 +02:00
Werner Saar
24f58c8bb1
added optimized cscal and zscal kernels for steamroller
2015-05-18 12:40:07 +02:00
Werner Saar
95b1faf667
added optimized cscal and zscal kernels for steamroller and piledriver
2015-05-18 10:50:57 +02:00
Werner Saar
2d9e406050
added optimized cscal kernel for sandybridge
2015-05-18 08:46:06 +02:00
Werner Saar
59083e3ce1
added optimized cscal kernel for bulldozer
2015-05-18 07:33:52 +02:00
wernsaar
685be40339
Merge pull request #571 from wernsaar/develop
...
added optimized cscal and zscal functions
2015-05-17 14:09:14 +02:00
Werner Saar
31c9e399e9
added optimized cscal kernel for haswell
2015-05-17 13:44:09 +02:00
Werner Saar
7de6bb9889
added optimized zscal kernel for bulldozer
2015-05-17 11:45:19 +02:00
Werner Saar
d63034303b
added optimized zscal kernel for haswell
2015-05-16 16:41:45 +02:00