Commit Graph

34 Commits

Author SHA1 Message Date
Hank Anderson 0553476fba Added TRANS defines for complex sources in lapack. 2015-02-24 14:30:35 -06:00
Hank Anderson 0d8e227ea7 Changed strategy for setting preprocessor definitions.
Instead of generating separate object files for each permutation of
defines for a source file, GenerateNamedObjects now writes an entirely
new source file and inserts the defines as #define c statements.

This solves a problem I ran into with ar.exe where it was refusing to
link objects that had the same filename despite having different paths.
2015-02-24 12:26:33 -06:00
Hank Anderson f3f2b3d768 Added complex and single netlib-lapack fortran sources to lapack.cmake. 2015-02-19 12:26:11 -06:00
Hank Anderson 67e39bd8fb Added mangled complex filenames to interface and lapack CMakeLists.txt. 2015-02-17 13:12:30 -06:00
Hank Anderson 4662a0b13a Changed generate functions to iterate through a list of float types.
This will generate obj files for SINGLE/DOUBLE/COMPLEX/DOUBLE COMPLEX.
2015-02-15 17:44:37 -06:00
Hank Anderson e74462a3f5 Moved declarations to start of functions to satisfy MSVC C89 implementation. 2015-02-11 11:16:57 -06:00
Hank Anderson 056ba26755 Changed a number of inline calls to use __inline.
MSVC doesn't inmplement C99, so can't use the inline keyword. __inline
appears to work in MSVC and GCC.
2015-02-11 11:13:17 -06:00
Hank Anderson 3b20b62423 Fixed trti2 name. 2015-02-09 15:29:28 -06:00
Hank Anderson 6ddbfea700 Added generic laswp object. 2015-02-09 15:15:58 -06:00
Hank Anderson e8c39138c6 Removed return value from GenerateNamedObjects.
It sets DBLAS_OBJS directly to save a bunch of list appending in the
CMakeLists.txt files.
2015-02-09 12:28:09 -06:00
Hank Anderson 13d2d48e67 Added yet another naming scheme for lapack functions. 2015-02-06 13:42:20 -06:00
Hank Anderson 373a1bdadb Converted lapack/Makefile to cmake. 2015-02-04 15:47:10 -06:00
Timothy Gu 6c2ead30f0 Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
wernsaar c26bbee489 enabled abd tested optimized trtri lapack functions 2014-05-23 10:55:39 +02:00
wernsaar c4ccb3fbb2 removed lapack/getri because it was never used 2014-05-21 14:21:19 +02:00
wernsaar a748d3a75d enabled optimized trti2 lapack functions again 2014-05-21 11:02:07 +02:00
wernsaar dbaeea7b59 enabled lauu2 and lauum lapack functions again 2014-05-21 09:49:18 +02:00
wernsaar 4f98f8c9b3 enabled and tested optimized potrf lapack functions 2014-05-18 21:42:37 +02:00
wernsaar 536875d463 enabled and tested optimized getrs lapack functions 2014-05-18 21:13:56 +02:00
wernsaar ac029f81b3 enabled and tested optimized dgetrf function 2014-05-18 19:07:51 +02:00
wernsaar a35a1a9ae7 changed makefiles for lapack development 2014-05-07 11:33:02 +02:00
wernsaar 4be4db590c Merge remote branch 'origin/develop' into armv7 2013-12-01 13:16:41 +01:00
wernsaar fe5f46c330 added experimental support for ARMV8 2013-11-24 15:47:00 +01:00
Zhang Xianyi 5048a80032 Refs #283. Fixed the incorrect usage of long data type for Windows 64. 2013-11-14 13:46:42 +08:00
Zhang Xianyi 73770e60b8 Refs #309. Fixed trtri_U single thread computational bug. 2013-11-07 01:08:39 +08:00
wernsaar 95aedfa0ff added missing file arm/Makefile in lapack/laswp 2013-11-03 11:19:32 +01:00
Zhang Xianyi a07cc39571 Refs #266. Fixed the compiling bug with Open64 5.0. 2013-07-31 14:41:39 +08:00
Zhang Xianyi fd0c388681 Refs #191. A walk around for dtrtri_U single thread bug.
This function caused the failure of ERKALE serial test.
I replaced it with LAPACK source code.
2013-07-14 22:16:30 +08:00
Zhang Xianyi 32d2ca3035 Refs #214, #221, #246. Fixed the getrf overflow bug on Windows.
I used a smaller threshold since the stack size is 1MB on windows.
2013-07-11 03:20:02 +08:00
Zhang Xianyi 5d3312142a Refs #221 #246. Fixed the overflowing stack bug in mutlithreading BLAS3.
When NUM_THREADS(MAX_CPU_NUNBERS) is very large ,e.g. 256.

typedef struct {
  volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;

job_t          job[MAX_CPU_NUMBER];

The job array is equal 8MB.

Thus, We use malloc instead of stack allocation.
2013-07-08 01:07:05 +08:00
Zhang Xianyi 4c2123c334 Fixed the overflowing bug in single thread cholesky factorization. 2013-02-23 13:00:52 +08:00
Zhang Xianyi 7bd1834d59 Refs #130 Fixed laswp building bug with DYNAMIC_ARCH=1. 2012-08-09 20:36:29 +08:00
Zhang Xianyi 1b056c5328 Refs #130 Prevent reading ipiv array beyond the bound in ?laswp. Use laswp instead of laswp_oncopy in getrf. 2012-08-09 20:06:51 +08:00
Xianyi Zhang 342bbc3871 Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00