powderluv
|
ebb9eba987
|
Fix build with ALLOC_SHM=0 (Android NDK)
Refactor such that you can build with ALLOC_SHM=0. HugeTLB
implicitly depends on ALLOC_SHM=1. This patch allows
building for Android NDK r10d.
|
2015-05-10 00:10:26 -07:00 |
Zhang Xianyi
|
8e5a1083bb
|
Refs #532. Improve gemv parallelism for the small m and large n case.
Split the matrix and perform a reduction.
|
2015-05-08 05:33:17 +08:00 |
Zhang Xianyi
|
6743beb748
|
Refs #565. Fix the bug in generating FEXTRALIB.
|
2015-05-07 13:06:53 +08:00 |
Zhang Xianyi
|
bcabf72c08
|
Refs #565. Merge branch 'andreasnoack-anj/bench' into develop
|
2015-05-07 12:52:14 +08:00 |
Andreas Noack
|
cda29f183b
|
Add vecLib benchmarks
|
2015-05-06 21:52:34 -04:00 |
wernsaar
|
e52d36450a
|
Merge pull request #564 from wernsaar/develop
Use only 1 thread in trsm if m or n < 2*GEMM_MULTITHREAD_THRESHOLD
|
2015-05-06 11:10:31 +02:00 |
Werner Saar
|
f8f2e261fe
|
use only 1 thread if m or n < 2*GEMM_MULTITHREAD_THRESHOLD
|
2015-05-06 10:41:53 +02:00 |
Werner Saar
|
be3c843700
|
added loops to trsm.c
|
2015-05-06 09:21:19 +02:00 |
wernsaar
|
e6f57db846
|
Merge pull request #563 from wernsaar/develop
Bugfix for gemm3m tests
|
2015-05-05 12:13:35 +02:00 |
Werner Saar
|
9bfd267d51
|
bugfix for gemm3m tests
|
2015-05-05 11:58:59 +02:00 |
Werner Saar
|
924bc5372e
|
removed gemm3m functions from normal checks
|
2015-05-05 11:39:43 +02:00 |
wernsaar
|
2b83a69650
|
Merge pull request #561 from wernsaar/develop
updated dgemv_n sgemv_n kernels
|
2015-05-04 11:11:13 +02:00 |
Werner Saar
|
133c11a156
|
updated dgemv_n kernel for nehalem
|
2015-04-30 14:38:06 +02:00 |
Werner Saar
|
30f52d53df
|
optimized dgemv_n kernel for haswell
|
2015-04-30 12:11:39 +02:00 |
Zhang Xianyi
|
a124637329
|
Merge pull request #560 from sebastien-villemot/develop
Fix detection of ARM architectures in c_check.
|
2015-04-29 11:36:47 -05:00 |
Sébastien Villemot
|
642aaba2e0
|
Fix detection of ARM architectures in c_check.
This is necessary to avoid the false detection of a cross-compiling environment.
|
2015-04-29 18:14:21 +02:00 |
wernsaar
|
4c616173e4
|
Merge pull request #558 from wernsaar/develop
optimizations for sandybridge
|
2015-04-28 17:30:16 +02:00 |
Werner Saar
|
5e83d80725
|
optimized dger kernel for sandybridge
|
2015-04-28 16:58:11 +02:00 |
Werner Saar
|
b2e1797dc6
|
added optimized sger kernel for sandybridge
|
2015-04-28 15:33:38 +02:00 |
Werner Saar
|
e216f686cb
|
optimized saxpy and daxpy for sandybridge
|
2015-04-28 10:18:32 +02:00 |
Zhang Xianyi
|
e42652f772
|
Merge pull request #554 from wernsaar/develop
added benchmarks for zgeru and cgeru
|
2015-04-25 08:11:36 -05:00 |
Werner Saar
|
e77db2af31
|
add benchmarks for zgeru and cgeru
|
2015-04-25 14:53:07 +02:00 |
Zhang Xianyi
|
37b00841ac
|
Merge pull request #552 from jeromerobert/develop
gemv: Ensure stack buffer is large enough to handle memory alignment
|
2015-04-24 14:12:12 -05:00 |
Werner Saar
|
fc0e0391f3
|
bugfixes: replaced int with BLASLONG
|
2015-04-24 14:30:44 +02:00 |
wernsaar
|
da0f27b9ac
|
Merge pull request #553 from wernsaar/develop
optimized some blas level1 kernels for increments != 1
|
2015-04-24 13:57:48 +02:00 |
Werner Saar
|
c22068c406
|
optimized sdot.c for increments != 1
|
2015-04-24 13:13:20 +02:00 |
Werner Saar
|
dee100d0e4
|
optimized saxpy.c for increments != 1
|
2015-04-24 11:52:59 +02:00 |
Werner Saar
|
0273966abb
|
optimized daxpy kernel for increments != 1
|
2015-04-24 11:39:17 +02:00 |
Werner Saar
|
3a67daa954
|
optimized ddot.c for increments != 1
|
2015-04-24 10:56:55 +02:00 |
Jerome Robert
|
ab567d8443
|
gemv: Ensure stack buffer is large enough to handle memory alignment
Ref #478
|
2015-04-24 10:12:49 +02:00 |
wernsaar
|
3c09cea4b2
|
Merge pull request #550 from wernsaar/develop
added optimized ssymv kernels for haswell and sandybridge
|
2015-04-23 13:27:38 +02:00 |
Werner Saar
|
b4f2153dcd
|
added optimized ssymv kernels for sandybridge
|
2015-04-23 12:19:24 +02:00 |
Werner Saar
|
1c4b0eeae3
|
added optimized ssymv kernels for haswell
|
2015-04-23 10:23:13 +02:00 |
wernsaar
|
406d9d64e9
|
Merge pull request #549 from wernsaar/develop
added optimized dsymv kernels for haswell and sandybridge
|
2015-04-22 12:36:13 +02:00 |
Werner Saar
|
1bec9abb9a
|
added optimized dsymv kernels for sandybridge
|
2015-04-22 12:09:43 +02:00 |
Werner Saar
|
3814bf60d3
|
added optimized dsymv kernels for haswell
|
2015-04-22 10:42:50 +02:00 |
Zhang Xianyi
|
847e19c04e
|
Refs #478, #482. Enable stack alloc for s/dgemv_t (revert 9798491).
|
2015-04-20 23:22:40 -05:00 |
Werner Saar
|
46c7b4d5c8
|
added asum benchmark
|
2015-04-19 11:24:07 +02:00 |
Werner Saar
|
8e05d291b5
|
added scal benchmark
|
2015-04-18 08:41:41 +02:00 |
wernsaar
|
9da555e5f7
|
Merge pull request #546 from wernsaar/develop
added optimized zaxpy-kernels
|
2015-04-16 11:36:51 +02:00 |
Werner Saar
|
6d0db0151f
|
added optimized zaxpy-kernels
|
2015-04-16 11:19:37 +02:00 |
Zhang Xianyi
|
37b9033c90
|
Merge pull request #543 from jeromerobert/develop
Fix a buffer overflow with MAX_STACK_ALLOC size in dgemv_t
|
2015-04-15 11:18:14 -05:00 |
wernsaar
|
59e7a518c6
|
Merge pull request #544 from wernsaar/develop
Optimized caxpy-kernels
|
2015-04-15 17:04:02 +02:00 |
Werner Saar
|
13889515b3
|
added optimized caxpy-kernel for sandybridge
|
2015-04-15 16:29:25 +02:00 |
Werner Saar
|
248c9340c3
|
added optimized caxpy-kernel for haswell
|
2015-04-15 15:16:31 +02:00 |
Werner Saar
|
e9f33b4ca7
|
added optimized caxpy-kernel for steamroller
|
2015-04-15 13:49:23 +02:00 |
Werner Saar
|
f5d847122a
|
updated caxpy_microk_bulldozer-2.c and caxpy.c
|
2015-04-15 11:59:38 +02:00 |
Jerome Robert
|
a4c96eca67
|
Fix a buffer overflow with MAX_STACK_ALLOC size in dgemv_t
Refs #478, #482, 9798481, fd9fd42
|
2015-04-15 11:46:48 +02:00 |
wernsaar
|
fb02cb0a41
|
Merge pull request #540 from wernsaar/develop
Optimized dot- and axpy-kernels
|
2015-04-14 15:53:09 +02:00 |
Werner Saar
|
baa0363ea2
|
add optimized ddot-kernel for piledriver
|
2015-04-14 15:09:13 +02:00 |