Werner Saar
a901b065d3
added optimized ddot-kernel for sandybridge
2015-04-05 20:19:38 +02:00
Werner Saar
3937e2a0a0
add optimized sdot-kernel for sandybridge
2015-04-05 19:47:05 +02:00
Werner Saar
9707d608d5
removed double definition line
2015-04-05 18:35:34 +02:00
Werner Saar
701b9d7556
added optimized sdot- and ddot-kernel for HASWELL
2015-04-05 17:57:53 +02:00
Zhang Xianyi
e5b96e55a7
Fix build bug for ARM64.
2015-03-24 15:27:17 -05:00
Zhang Xianyi
a3491e1e88
Update the doc for 0.2.14.
2015-03-24 15:05:59 -05:00
Zhang Xianyi
e81a5d61e4
Merge branch 'develop' of github.com:xianyi/OpenBLAS into develop
2015-03-24 12:17:12 -05:00
Zhang Xianyi
c674fa32be
Add ARM targets.
2015-03-24 12:17:04 -05:00
Zhang Xianyi
e34911a73d
Fix compiling bug for ARM with setting BINARY.
2015-03-24 17:15:33 +00:00
Zhang Xianyi
76dcaf2281
Merge pull request #521 from maxlevesque/patch-1
...
Correct typo /proc/ instead of /pros/
2015-03-21 12:26:35 -05:00
Maximilien Levesque
770fac92eb
Correct typo /proc/ instead of /pros/
2015-03-20 23:25:11 +01:00
Zhang Xianyi
e95d64333a
Refs #519 . Avoid calling strncpy.
2015-03-19 15:57:22 -05:00
Zhang Xianyi
75c40bcc48
Refs #520 . Fixed ONLY_CBLAS=1 compiling bug on OSX.
2015-03-19 11:52:09 -05:00
Zhang Xianyi
b62f9f4120
Merge pull request #518 from ton/issue-508
...
Fix issue #508
2015-03-18 13:00:07 -05:00
Ton van den Heuvel
b6438dedea
Fix issue #508
...
Fix race condition during shutdown causing a crash in
gotoblas_set_affinity().
2015-03-18 13:22:43 +01:00
Zhang Xianyi
cdefdb21cd
Refs #492 . Fixed c/zsyr bug with negative incx.
2015-02-26 06:37:03 +08:00
Zhang Xianyi
ea7f9dacf4
Refs #509 . Fixed geadd building bug with DYNAMIC_ARCH=1.
2015-02-26 01:47:11 +08:00
Zhang Xianyi
bf5dbb7e2a
Refs #509 . Merge branch 'grisuthedragon-develop' into develop
2015-02-26 01:44:19 +08:00
Martin Koehler
39cc6b21d3
Add ATLAS-style ?geadd function
2015-02-16 13:46:20 +01:00
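For readers unfamiliar with the ATLAS-style ?geadd operation added in 39cc6b21d3, the sketch below shows what such a routine is generally understood to compute: C := alpha*A + beta*C, element by element. The name `ref_dgeadd`, the column-major layout and the argument order are assumptions made for illustration; this is a plain reference loop, not the optimized kernel from the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine (not the OpenBLAS implementation).
 * Column-major storage is assumed: element (i, j) lives at index i + j*ld.
 * Computes C := alpha*A + beta*C for an m-by-n matrix. */
static void ref_dgeadd(size_t m, size_t n,
                       double alpha, const double *a, size_t lda,
                       double beta, double *c, size_t ldc)
{
    size_t i, j;

    /* Leading dimensions must cover a full column. */
    assert(lda >= m && ldc >= m);

    for (j = 0; j < n; j++)
        for (i = 0; i < m; i++)
            c[i + j * ldc] = alpha * a[i + j * lda] + beta * c[i + j * ldc];
}
```

With alpha = 1 and beta = 0 the loop reduces to a matrix copy, and with alpha = beta = 1 to a plain matrix addition.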
Zhang Xianyi
771b18ae9c
Detect the wrong combined flags of USE_OPENMP=1 and USE_THREAD=0.
2015-02-08 01:42:48 -06:00
Zhang Xianyi
cfa9392ffa
Fix openblas_get_num_threads and openblas_get_num_procs bug with single thread.
2015-02-08 01:30:23 -06:00
Zhang Xianyi
1ccd57ce80
Merge pull request #497 from eschnett/develop
...
Introduce openblas_get_num_threads and openblas_get_num_procs
2015-02-03 23:09:38 -06:00
Erik Schnetter
65a847cd36
Introduce openblas_get_num_threads and openblas_get_num_procs
2015-02-03 12:23:41 -05:00
Zhang Xianyi
07ff001981
Merge pull request #495 from jeromerobert/develop
...
Fix a segfault in gemv when MAX_STACK_ALLOC is set
2015-01-29 18:23:50 +08:00
Jerome Robert
b17ccb4c5c
Fix a segfault in gemv when MAX_STACK_ALLOC is set
...
* stack_alloc_size is needed after the implementation call,
but it may be overwritten if it is kept in a register,
because some gemv implementations (e.g. dgemv_n.S) do not
restore all registers (e.g. r10).
* Do the same in ger.c for the same reason, even though the bug
has not been observed there.
2015-01-29 09:55:57 +01:00
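The commit message for b17ccb4c5c describes the problem (a size variable held in a register that the kernel does not restore) but does not spell out the remedy. One common way to keep such a value safe is to declare it `volatile`, so that it is stored in memory and reloaded on every use. The sketch below assumes that approach; `SKETCH_MAX_STACK_ALLOC`, `fill_kernel` and `run_kernel` are hypothetical names, and the C function standing in for the assembly kernel does not actually clobber any register.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed threshold in bytes, standing in for MAX_STACK_ALLOC. */
#define SKETCH_MAX_STACK_ALLOC 2048

/* Hypothetical stand-in for an assembly kernel such as dgemv_n.S. */
static void fill_kernel(double *buf, size_t n)
{
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < n; i++)
        buf[i] = (double)i;
}

/* Returns the last value the kernel wrote, or -1.0 if n is 0
 * or the heap fallback fails. */
static double run_kernel(size_t n)
{
    double stack_buf[SKETCH_MAX_STACK_ALLOC / sizeof(double)];
    /* volatile: every read goes back to memory, so the value used
     * after the kernel call does not depend on a register surviving it. */
    volatile size_t stack_alloc_size;
    double *buf;
    double last;

    if (n == 0)
        return -1.0;

    /* Non-zero means "the stack buffer is in use". */
    stack_alloc_size =
        (n * sizeof(double) <= sizeof(stack_buf)) ? n * sizeof(double) : 0;
    buf = stack_alloc_size ? stack_buf : malloc(n * sizeof(double));
    if (buf == NULL)
        return -1.0;

    fill_kernel(buf, n);
    last = buf[n - 1];

    /* The size is read again here, after the call, to decide whether to free. */
    if (!stack_alloc_size)
        free(buf);
    return last;
}
```

The cost of `volatile` here is a few extra loads and stores per call, which is small next to the kernel itself.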
Zhang Xianyi
63c6fcfa0a
Merge pull request #490 from eschnett/develop
...
Move #include statements outside extern "C" blocks
2015-01-13 15:43:56 +08:00
Erik Schnetter
29cb47fc06
Move #include statements outside extern "C" blocks
2015-01-12 21:27:52 -05:00
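The layout that 29cb47fc06 moves towards can be sketched as follows: includes come first, and only the library's own declarations sit inside the `extern "C"` block. Keeping system headers outside matters for C++ callers, because a C++ standard header pulled in under C linkage can fail to compile. `sketch_count_nonzero` is a hypothetical function used only to give the header something to declare.

```c
/* Includes stay outside the linkage block. */
#include <assert.h>
#include <stddef.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical declaration, present only for illustration. */
size_t sketch_count_nonzero(const int *v, size_t n);

#ifdef __cplusplus
}
#endif

/* Definition, shown here so the sketch compiles on its own. */
size_t sketch_count_nonzero(const int *v, size_t n)
{
    size_t i, count = 0;

    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (v[i] != 0)
            count++;
    return count;
}
```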
Zhang Xianyi
4e6c4046f7
Fix cortex-a15 detecting bug.
2015-01-12 09:35:16 +00:00
Zhang Xianyi
229ce2ccd1
Add cortex-a9 and cortex-a15 targets.
2015-01-12 08:55:29 +00:00
Zhang Xianyi
ef75be0e51
Merge pull request #487 from kortschak/dromtg-test
...
Add test for drotmg bug fixed by 692b14c
2015-01-07 14:13:11 +08:00
kortschak
5344f335a8
Add test for drotmg bug fixed by 692b14c
...
Test requested in issue xianyi/OpenBLAS#484 .
Run tests by applying the following change and then make:
diff --git a/Makefile.rule b/Makefile.rule
index bea1fe1..9852ff3 100644
--- a/Makefile.rule
+++ b/Makefile.rule
@@ -140,7 +140,7 @@ NO_AFFINITY = 1
-# UTEST_CHECK = 1
+UTEST_CHECK = 1
2015-01-07 10:06:55 +10:30
Zhang Xianyi
5cb5af9333
Add configuration options.
2015-01-02 02:42:32 +08:00
Zhang Xianyi
41aad0407f
Merge pull request #482 from jeromerobert/develop
...
Allow to do gemv and ger buffer allocation on the stack
2015-01-02 02:26:17 +08:00
Zhang Xianyi
f8f2e84659
Merge pull request #486 from wernsaar/develop
...
Optimizations for steamroller
2014-12-31 02:36:23 +08:00
Werner Saar
34633fef01
Merge branch 'develop' of github.com:wernsaar/OpenBLAS into develop
2014-12-30 20:16:53 +08:00
Werner Saar
ddf983d643
added optimizations for steamroller
2014-12-30 20:14:45 +08:00
Zhang Xianyi
17b9db20f1
Merge pull request #483 from wernsaar/develop
...
added Steamroller as a cpu target
2014-12-29 12:00:16 +08:00
Werner Saar
0dc559ed30
bugfix in dynamic.c
2014-12-28 17:15:42 +01:00
Werner Saar
9566f5fdb0
added Steamroller as a target processor
2014-12-28 13:45:19 +01:00
Werner Saar
4319769b79
added target processor STEAMROLLER
2014-12-28 20:16:46 +08:00
Jerome Robert
e9d9a8eae3
Allow to do gemv and ger buffer allocation on the stack
...
ger and gemv call blas_memory_alloc/free, which in turn
call blas_lock. blas_lock creates thread contention when matrices
are small and the number of threads is high enough. We avoid
calling blas_memory_alloc by replacing it with stack allocation.
This can be enabled with:
make -DMAX_STACK_ALLOC=2048
The given size (in bytes) must be high enough to avoid thread contention
and small enough to avoid stack overflow.
Fix #478
2014-12-27 14:33:12 +01:00
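The idea in e9d9a8eae3 (use the stack for small scratch buffers and keep a slower allocator for large ones) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code from the commit: `SKETCH_MAX_STACK_ALLOC` and `scaled_sum` are hypothetical names, `malloc`/`free` stand in for blas_memory_alloc/free, and a fixed-size array is used for portability.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed threshold in bytes; plays the role of MAX_STACK_ALLOC. */
#define SKETCH_MAX_STACK_ALLOC 2048

/* Hypothetical routine: computes alpha * sum(x) through a scratch copy,
 * standing in for a gemv/ger driver that needs temporary workspace.
 * Returns 0 on success, -1 if the heap fallback fails. */
static int scaled_sum(const double *x, size_t n, double alpha, double *out)
{
    double stack_buf[SKETCH_MAX_STACK_ALLOC / sizeof(double)];
    size_t bytes = n * sizeof(double);
    /* Small requests use the stack and never touch the shared allocator. */
    int use_stack = bytes <= sizeof(stack_buf);
    double *buf = use_stack ? stack_buf : malloc(bytes);
    double acc = 0.0;
    size_t i;

    assert(out != NULL && (x != NULL || n == 0));
    if (buf == NULL)
        return -1;

    memcpy(buf, x, bytes);
    for (i = 0; i < n; i++)
        acc += alpha * buf[i];

    /* Only the heap path has anything to release. */
    if (!use_stack)
        free(buf);
    *out = acc;
    return 0;
}
```

The threshold is the trade-off named in the commit message: too small and most calls still reach the locked allocator, too large and deep call chains or many threads risk overflowing the stack.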
Zhang Xianyi
cbb3ab80e7
Merge pull request #481 from eschnett/develop
...
Correct ilaver C declaration
2014-12-26 10:09:19 +08:00
Erik Schnetter
cd9868b1b4
Correct ilaver C declaration
2014-12-25 17:41:17 -05:00
Zhang Xianyi
eb738148fe
Merge pull request #479 from wernsaar/develop
...
workaround for sandybridge zgemm kernel
2014-12-23 00:59:41 +08:00
Werner Saar
587e16fba3
Ref #458 : Backport, sandybridge uses nehalem zgemm kernel
2014-12-22 17:01:18 +01:00
Werner Saar
4de7b9ae47
increased NMAX to 128
2014-12-22 14:04:27 +01:00
Werner Saar
887aed634d
modified sources for OS Darwin
2014-12-19 12:40:46 +01:00
Werner Saar
6261342de3
small optimization on dgemm_kernel for N=1
2014-12-18 20:35:51 +01:00
Werner Saar
1e566223ed
added code for the size of n
2014-12-17 15:02:11 +01:00
Werner Saar
113b48ca22
modified makefile for acml6.1
2014-12-17 14:12:21 +01:00