wernsaar
43fbdb7a5a
added ARMV5 as reference platform
2014-05-13 17:25:19 +02:00
wernsaar
777cebc8c7
added ZERO check to zscal.c because of a bug in lapack-testing
2014-05-13 16:31:00 +02:00
wernsaar
aa5c73e20f
added ZERO check to zscal.c because of a bug in lapack-test
2014-05-13 16:25:21 +02:00
wernsaar
5e5ef28ca0
added ZERO check because of a bug in lapack-test
2014-05-13 15:36:03 +02:00
wernsaar
650ed34336
added ZERO check because of a bug in lapack-test
2014-05-13 15:31:36 +02:00
wernsaar
5f3b68b4d4
replaced sgemm and cgemm kernels because of lapack bugs
2014-05-10 11:24:07 +02:00
wernsaar
2424af62fd
replaced dgemm-kernel because of a bug in lapack
2014-05-10 10:52:37 +02:00
wernsaar
793509a3b5
replaced files for sdot, sgemv_n and sgemv_t for bug #348
2014-05-06 15:29:39 +02:00
wernsaar
47b22763f8
reduced stack usage on windows to 16K
2014-04-24 14:09:26 +02:00
wernsaar
9db0fb8b02
bugfix for sdsdot
2014-02-28 14:59:36 +01:00
wernsaar
f9daebba0a
checked in bugfixes for ARM
2014-02-16 11:45:47 +01:00
Zhang Xianyi
9a557e90da
Refs #340 . Fixed SEGFAULT bug of dgemv_n on OSX.
2014-02-15 23:23:15 +08:00
wangqian
2d557eb1e0
Fixed computational error of dgemv_n.
2014-02-04 21:47:51 +08:00
Zhang Xianyi
05bb391c3a
Refs #330 . Fixed the compatibility issue with clang on Mac OSX.
2013-12-16 20:31:17 +08:00
Zhang Xianyi
9b5be29886
Refs #310 . Fixed Segfault bug on nehalem when Julia calls dgeqrt3 on OSX.
Please also check JuliaLang/julia#4099
Julia test script:
A=rand(256, 256)
qrfact(A)
I found this was a bug in kernel/x86_64/dgemm_ncopy_8.S.
However, I cannot use gdb with julia. Thus, this is a workaround fix.
2013-12-12 23:23:04 +08:00
wernsaar
53eaf41901
added support for HASWELL
2013-12-02 13:17:51 +01:00
wernsaar
9423f980f6
modified trsm kernel
2013-12-02 10:08:14 +01:00
wernsaar
c6156b2ef2
added trsm kernels from origin
2013-12-01 22:39:39 +01:00
wernsaar
034a5b2083
modified zsymv
2013-12-01 21:07:49 +01:00
wernsaar
27d4234d4d
merged symv
2013-12-01 20:56:02 +01:00
wernsaar
402d6e91db
Merge remote branch 'origin/develop' into armv7
2013-12-01 18:18:40 +01:00
wernsaar
b3254eecaf
Merge remote branch 'origin/haswell' into develop
2013-12-01 18:09:12 +01:00
wernsaar
d910404f00
Merge remote branch 'origin/piledriver' into develop
2013-12-01 18:06:51 +01:00
wernsaar
ffe70b1fdc
modified Makefile.L3
2013-12-01 17:58:46 +01:00
wernsaar
0b6e13b689
Merge remote branch 'origin/develop' into haswell
2013-12-01 13:38:11 +01:00
wernsaar
e09dc279a2
Merge remote branch 'origin/develop' into piledriver
2013-12-01 13:33:18 +01:00
wernsaar
4be4db590c
Merge remote branch 'origin/develop' into armv7
2013-12-01 13:16:41 +01:00
wernsaar
5c648a8984
Merge remote branch 'origin/develop' into haswell
2013-12-01 11:25:33 +01:00
wernsaar
c44dc4dd3c
Merge remote branch 'origin/develop' into piledriver
2013-12-01 11:06:36 +01:00
wernsaar
9d3fae15a8
Merge branch 'develop' into armv7
2013-12-01 10:12:07 +01:00
wernsaar
2d3c884294
added complex gemv kernels for ARMV6 and ARMV7
2013-11-29 17:06:33 +01:00
wernsaar
d54a061713
optimized gemv_n_vfp.S
2013-11-28 17:40:21 +01:00
wernsaar
86afb47e83
added optimized ctrmm kernel for ARMV6
2013-11-28 14:35:07 +01:00
wernsaar
42a4dff056
added optimized ztrmm kernel for ARMV6
2013-11-28 13:41:06 +01:00
wernsaar
5bc322a66c
optimized strmm kernel for ARMV6
2013-11-28 12:45:38 +01:00
wernsaar
dec7ad0dfd
optimized dtrmm kernel for ARMV7
2013-11-28 12:32:12 +01:00
wernsaar
274304bd03
add optimized cgemm kernel for ARMV6
2013-11-28 11:54:38 +01:00
wernsaar
5007a534c4
optimized zgemm kernel for ARMV6
2013-11-28 10:04:43 +01:00
wernsaar
a537d7d8d7
optimized zgemm_kernel_2x2_vfp.S
2013-11-28 08:33:44 +01:00
wernsaar
b42145834f
optimized sgemm kernel for ARMV6
2013-11-28 08:08:08 +01:00
wernsaar
3d5e792c72
optimized sgemm kernel for ARMV6
2013-11-27 18:38:32 +01:00
wernsaar
a9bd12da2c
optimized dgemm kernel for ARMV6
2013-11-27 17:37:38 +01:00
wernsaar
697e198e8a
added zgemm_kernel for ARMV6
2013-11-27 16:15:06 +01:00
wernsaar
36b0f7fe1d
added optimized gemv_t kernel for ARMV6
2013-11-25 19:31:27 +01:00
wernsaar
d2b20c5c51
add optimized axpy kernel
2013-11-25 12:25:58 +01:00
wernsaar
fe5f46c330
added experimental support for ARMV8
2013-11-24 15:47:00 +01:00
wernsaar
25c6050593
add single and double precision gemv_n kernel for ARMV6
2013-11-24 12:03:28 +01:00
wernsaar
12e02a00e0
added ncopy kernels for ARMV6
2013-11-24 08:46:47 +01:00
wernsaar
29a3196f56
added optimized sgemm and strmm kernel for ARMV6
2013-11-23 18:09:41 +01:00
wernsaar
8776a73773
added optimized dgemm and dtrmm kernel for ARMV6
2013-11-23 16:24:52 +01:00
wernsaar
7e84acd3e8
fixed bug in SAVE macros that was not found by any test routine
2013-11-23 14:35:19 +01:00
wernsaar
33d3ab6e09
small optimizations for zgemv kernels
2013-11-23 12:35:31 +01:00
wernsaar
9a0f978929
added nrm2 kernel for ARMV6
2013-11-22 17:21:10 +01:00
wernsaar
7f210587f0
renamed some ncopy and tcopy files
2013-11-22 00:20:25 +01:00
wernsaar
9f0a3a35b3
removed obsolete file sdot_vfpv3.S
2013-11-21 23:42:54 +01:00
wernsaar
dbae93110b
added sdot_vfp.S
2013-11-21 23:34:51 +01:00
wernsaar
19cd5c64a2
renamed swap_vfpv3.S to swap_vfp.S
2013-11-21 23:19:32 +01:00
wernsaar
9adf87495e
renamed some dot kernels
2013-11-21 23:07:51 +01:00
wernsaar
440db4cdda
delete rot_vfpv3.S
2013-11-21 22:52:24 +01:00
wernsaar
cd93cae5a7
renamed rot_vfpv3.S to rot_vfp.S
2013-11-21 22:49:28 +01:00
wernsaar
8565afb3c2
renamed asum_vfpv3.S to asum_vfp.S
2013-11-21 22:26:27 +01:00
wernsaar
5bf7cf8d67
renamed scal_vfpv3.S to scal_vfp.S
2013-11-21 22:03:36 +01:00
wernsaar
29a005c635
renamed iamax assembler kernel
2013-11-21 21:12:33 +01:00
wernsaar
f1be3a168a
renamed some BLAS kernels that are compatible with ARMV6
2013-11-21 20:48:57 +01:00
wernsaar
410afda9b4
added cpu detection and target ARMV6, used in raspberry pi
2013-11-21 20:18:51 +01:00
wernsaar
bf04544902
added gemv_n kernel for single and double precision
2013-11-19 15:07:20 +01:00
wernsaar
86283c0be1
added gemv_t kernel for single and double precision
2013-11-19 09:55:54 +01:00
wernsaar
f27cabfd08
added nrm2 kernel for all precisions
2013-11-16 16:17:17 +01:00
wernsaar
23dd474cd0
added rot kernel for all precisions
2013-11-15 14:08:57 +01:00
wernsaar
f1b452e160
added scal kernel for all precisions
2013-11-15 11:56:43 +01:00
wernsaar
3dabd7e6e6
added swap-kernel for all precisions
2013-11-14 19:06:19 +01:00
wernsaar
6f4a0ebe38
added max and min kernels for all precisions
2013-11-14 13:52:47 +01:00
wernsaar
f1db386211
changes for compatibility with Pathscale compiler
2013-11-13 17:59:11 +01:00
wernsaar
6da558d2ab
changes for compatibility with Pathscale compiler
2013-11-13 17:39:13 +01:00
wernsaar
f750103336
small optimizations on dot-kernels
2013-11-11 15:47:56 +01:00
wernsaar
00f33c0134
added asum_kernel for all precisions and complex
2013-11-11 14:20:59 +01:00
wernsaar
5b36cc0f47
added blas level1 dot kernels for complex and double complex
2013-11-08 09:08:11 +01:00
wernsaar
c8f1aeb154
added optimized blas level1 dot kernels for single and double precision
2013-11-07 17:22:03 +01:00
wernsaar
8fa93be06e
added optimized blas level1 copy kernels
2013-11-07 17:18:56 +01:00
wernsaar
1e8128f41c
added cgemm_tcopy_2_vfpv3.S and zgemm_tcopy_2_vfpv3.S
2013-11-07 17:15:50 +01:00
Zhang Xianyi
2f5fdd2000
Refs #314 . Fixed clang compiling bug on OSX.
2013-11-07 08:12:03 +08:00
wernsaar
80a2e901b1
added dgemm_tcopy_4_vfpv3.S and sgemm_tcopy_4_vfpv3.S
2013-11-06 20:01:18 +01:00
wernsaar
ac50bccbd2
added cgemm_ncopy_2_vfpv3.S and made assembler labels unique
2013-11-05 20:21:35 +01:00
wernsaar
82015beaef
added zgemm_ncopy_2_vfpv3.S and made assembler labels unique
2013-11-05 19:31:22 +01:00
wernsaar
6216ab8a7e
removed obsolete gemm_kernels from haswell branch
2013-11-04 08:33:04 +01:00
wernsaar
370e3834a9
added missing file kernel/arm/Makefile
2013-11-03 11:54:39 +01:00
wernsaar
e31186efd4
deleted obsolete dgemm_kernel and dtrmm_kernel
2013-11-02 13:12:21 +01:00
wernsaar
2b801a00a5
small optimizations on sgemm_kernel for ARMV7
2013-11-02 13:06:11 +01:00
wernsaar
b3eab8fcb7
minor optimizations on zgemm_kernel for ARMV7
2013-11-02 09:43:53 +01:00
wernsaar
02bc36ac79
added sgemm_ncopy routine and made some improvements on cgemm_kernel for ARMV7
2013-11-01 18:22:27 +01:00
wernsaar
5118a7f4d1
small optimizations on dgemm_kernel for Piledriver
2013-10-31 11:53:26 +01:00
wernsaar
e172b70ea2
added cgemm_kernel for Piledriver
2013-10-31 08:38:17 +01:00
wernsaar
1cf4b974b2
added zgemm_kernel for Piledriver
2013-10-30 09:12:17 +01:00
wernsaar
7bccff1512
added sgemm_kernel for PILEDRIVER
2013-10-29 22:53:04 +01:00
wernsaar
afe44b0241
tests and code cleanup of gemm_kernels for HASWELL
2013-10-28 14:23:48 +01:00
wernsaar
a77c71eaf5
added highly optimized dgemm_kernel for HASWELL
2013-10-28 10:23:47 +01:00
wernsaar
fe8c5666f9
optimized dgemm_kernel for HASWELL
2013-10-20 16:52:26 +02:00
wernsaar
f6b50057e2
corrected and tested FMA3 code
2013-10-19 10:52:20 +02:00
wernsaar
2840d56aeb
added dgemm_kernel for Piledriver
2013-10-19 09:47:15 +02:00
wernsaar
85484a42df
added kernels for cgemm, ctrmm, zgemm and ztrmm
2013-10-16 18:00:41 +02:00