Commit Graph

119 Commits

Author SHA1 Message Date
wernsaar a748d3a75d enabled optimized trti2 lapack functions again 2014-05-21 11:02:07 +02:00
wernsaar dbaeea7b59 enabled lauu2 and lauum lapack functions again 2014-05-21 09:49:18 +02:00
wernsaar 4f98f8c9b3 enabled and tested optimized potrf lapack functions 2014-05-18 21:42:37 +02:00
wernsaar 536875d463 enabled and tested optimized getrs lapack functions 2014-05-18 21:13:56 +02:00
wernsaar ac029f81b3 enabled and tested optimized dgetrf function 2014-05-18 19:07:51 +02:00
wernsaar a35a1a9ae7 changed makefiles for lapack development 2014-05-07 11:33:02 +02:00
wernsaar 4be4db590c Merge remote branch 'origin/develop' into armv7 2013-12-01 13:16:41 +01:00
wernsaar fe5f46c330 added experimental support for ARMV8 2013-11-24 15:47:00 +01:00
Zhang Xianyi 5048a80032 Refs #283. Fixed the incorrect usage of long data type for Windows 64. 2013-11-14 13:46:42 +08:00
Zhang Xianyi 73770e60b8 Refs #309. Fixed trtri_U single thread computational bug. 2013-11-07 01:08:39 +08:00
wernsaar 95aedfa0ff added missing file arm/Makefile in lapack/laswp 2013-11-03 11:19:32 +01:00
Zhang Xianyi a07cc39571 Refs #266. Fixed the compiling bug with Open64 5.0. 2013-07-31 14:41:39 +08:00
Zhang Xianyi fd0c388681 Refs #191. A workaround for the dtrtri_U single-thread bug.
This function caused the ERKALE serial test to fail.
I replaced it with the LAPACK source code.
2013-07-14 22:16:30 +08:00
Zhang Xianyi 32d2ca3035 Refs #214, #221, #246. Fixed the getrf overflow bug on Windows.
I used a smaller threshold since the stack size is 1MB on Windows.
2013-07-11 03:20:02 +08:00
Zhang Xianyi 5d3312142a Refs #221 #246. Fixed the stack overflow bug in multithreaded BLAS3.
The overflow occurs when NUM_THREADS (MAX_CPU_NUMBER) is very large, e.g. 256.

typedef struct {
  volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;

job_t          job[MAX_CPU_NUMBER];

The job array then occupies 8MB.

Thus, we use malloc instead of stack allocation.
2013-07-08 01:07:05 +08:00
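
The size arithmetic in the commit message above can be checked with a minimal sketch. The values CACHE_LINE_SIZE = 8 and DIVIDE_RATE = 2, the 8-byte BLASLONG typedef, and the helper names job_array_bytes and touch_heap_jobs are assumptions made for illustration (chosen so the total matches the 8MB stated in the message); they are not taken from the commit itself.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed values for illustration only; the real definitions live in the
   library headers and may differ per platform. */
typedef long long BLASLONG;      /* assumed 8-byte integer */
#define MAX_CPU_NUMBER   256
#define CACHE_LINE_SIZE  8       /* assumed, counted in BLASLONG units */
#define DIVIDE_RATE      2       /* assumed */

typedef struct {
  volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;

/* Total bytes a `job_t job[MAX_CPU_NUMBER]` array would need. */
size_t job_array_bytes(void) {
  return sizeof(job_t) * MAX_CPU_NUMBER;
}

/* Hypothetical helper: allocate the job array on the heap instead of the
   stack, touch the last element, and release it.
   Returns 0 on success, -1 if the allocation fails. */
int touch_heap_jobs(void) {
  job_t *job = malloc(MAX_CPU_NUMBER * sizeof(job_t));
  if (job == NULL) {
    return -1;
  }
  job[MAX_CPU_NUMBER - 1].working[0][0] = 1;
  int ok = (job[MAX_CPU_NUMBER - 1].working[0][0] == 1);
  free(job);
  return ok ? 0 : -1;
}
```

Under these assumptions each job_t is 256 x 16 x 8 = 32768 bytes, so 256 of them come to 8MB, which exceeds a 1MB thread stack; a heap allocation avoids that limit at the cost of a malloc/free pair per call.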
Zhang Xianyi 4c2123c334 Fixed the overflow bug in single-threaded Cholesky factorization. 2013-02-23 13:00:52 +08:00
Zhang Xianyi 7bd1834d59 Refs #130 Fixed laswp building bug with DYNAMIC_ARCH=1. 2012-08-09 20:36:29 +08:00
Zhang Xianyi 1b056c5328 Refs #130 Prevent reading ipiv array beyond the bound in ?laswp. Use laswp instead of laswp_oncopy in getrf. 2012-08-09 20:06:51 +08:00
Xianyi Zhang 342bbc3871 Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00