wernsaar
5400a9f4e4
redefined functions for TIMING and YIELDING for ARMV7 processor
2013-11-03 10:34:04 +01:00
Zhang Xianyi
77b572fa0b
Merge branch 'loongson3a' into develop
...
Conflicts:
Makefile.system
2013-07-20 22:33:17 +08:00
Zhang Xianyi
32d2ca3035
Refs #214, #221, #246. Fixed the getrf overflow bug on Windows.
...
I used a smaller threshold since the stack size is 1MB on Windows.
2013-07-11 03:20:02 +08:00
wernsaar
6f008abcef
replaced defined(DOUBLE) by !defined(XDOUBLE)
2013-07-09 18:17:50 +02:00
Zhang Xianyi
5d3312142a
Refs #221, #246. Fixed the stack overflow bug in multithreaded BLAS3.
...
When NUM_THREADS (MAX_CPU_NUMBER) is very large, e.g. 256, the following job array overflows the stack:
typedef struct {
volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;
job_t job[MAX_CPU_NUMBER];
The job array takes 8MB.
Thus, we use malloc instead of stack allocation.
2013-07-08 01:07:05 +08:00
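The size arithmetic in the commit message above can be checked with a minimal sketch. The values `CACHE_LINE_SIZE = 8`, `DIVIDE_RATE = 2`, and an 8-byte `BLASLONG` are assumptions chosen to reproduce the 8MB figure for `MAX_CPU_NUMBER = 256`; the helper names `job_array_bytes` and `alloc_jobs` are hypothetical and are not taken from the repository.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed values for illustration; the real ones come from the build config. */
#define MAX_CPU_NUMBER  256
#define CACHE_LINE_SIZE 8
#define DIVIDE_RATE     2

typedef long long BLASLONG; /* assumption: 8-byte integer type */

typedef struct {
  volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;

/* Bytes that "job_t job[MAX_CPU_NUMBER];" would occupy on the stack. */
static size_t job_array_bytes(void) {
  return sizeof(job_t) * MAX_CPU_NUMBER;
}

/* Hypothetical helper: allocate the job array on the heap instead.
 * Returns NULL on failure; the caller releases it with free(). */
static job_t *alloc_jobs(size_t count) {
  assert(count > 0 && count <= MAX_CPU_NUMBER);
  return (job_t *)malloc(sizeof(job_t) * count);
}
```

Under these assumptions `job_array_bytes()` returns 8388608 (8 MiB), which exceeds the 1MB Windows stack mentioned in the later commit. A caller would use `job_t *job = alloc_jobs(MAX_CPU_NUMBER);`, check the result for `NULL`, and call `free(job)` once the threaded routine finishes.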
wernsaar
25491e42f9
New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S
2013-06-08 09:40:17 +02:00
Xianyi Zhang
6b01d58712
Disable the optimization of multi-threaded gemm on the Loongson3A.
2013-03-30 20:12:43 +00:00
Wang Qian
8163ab7e55
Change the block size on Loongson 3B.
2011-11-23 18:41:49 +00:00
traz
9fe3049de6
Added conditional compilation (#if defined(LOONGSON3A)) to avoid affecting the performance of other platforms.
2011-09-26 15:21:45 +00:00
traz
831858b883
Modify the aligned addresses of sa and sb to improve multi-threaded performance.
2011-09-23 20:59:48 +00:00
Xianyi Zhang
1b97ec1a7c
Added DEBUG option in Makefile.rule. Fixed DEBUG typos.
2011-02-26 11:19:54 +08:00
Xianyi Zhang
342bbc3871
Import GotoBLAS2 1.13 BSD version code.
2011-01-24 14:54:24 +00:00