Commit Graph

111 Commits

Author SHA1 Message Date
Andrew bfc2a88594 remove unused buffer 2017-12-22 00:55:40 +01:00
Andrew ef95cd471f elminate unread variable, after reiteration 3 of them (clang4) 2017-11-25 02:54:37 +01:00
Martin Kroeker db72ad8f6a Merge pull request #1320 from timmoon10/develop
2D thread distribution for multi-threaded GEMMs
2017-10-08 23:31:33 +02:00
Martin Kroeker 514d237257 Merge pull request #1279 from xsacha/develop
CMake improvements
2017-10-06 21:13:45 +02:00
Tim Moon 30486a356c Reduce number of data partitions in n. 2017-10-04 12:37:49 -07:00
Tim Moon 9de52b489a Cleaning up and documenting multi-threaded GEMM code. 2017-10-03 16:32:08 -07:00
Tim Moon 860dcfc703 Use 2D thread distribution for small GEMMs.
Allows maximum use of available cores if one of M and N is small and the other is large.
2017-10-03 13:43:39 -07:00
Tim Moon 6aaa107865 Reducing threads for multi-threaded GEMMs on small matrices. 2017-09-27 19:25:33 -07:00
Sacha Refshauge 37858d1146 Fix threading usage in CMake: s/SMP/USE_THREAD/ 2017-08-19 15:07:42 +10:00
Isuru Fernando d245caa49a Support out-of-source build 2017-08-01 15:16:14 +05:30
Martin Kroeker 49e62c0e77 fixed syrk_thread.c taken from wernsaar
Stride calculation fix copied from https://github.com/wernsaar/OpenBLAS/commit/88900e1
2017-07-06 17:30:12 +02:00
Werner Saar a2672d5589 prepared driver/level3 functions for UNROLL values, that are not a power of two 2017-01-09 10:38:15 +01:00
John Biddiscombe 053044ae4d Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PROJECT_BINARY_DIR
If OpenBLAS is built using add_subdirectory(OpenBlas) as part of another project
then the paths set by CMAKE_XXX_DIR are relative to the parent project
and not the OpenBLAS project.
2016-05-25 09:13:28 +02:00
Zhang Xianyi d06b92906a Add gemm3m building for CMake. 2016-02-12 05:02:51 +08:00
Werner Saar b07d733a71 added updates for syrk and syr2k 2016-01-21 13:16:44 +01:00
Zhang Xianyi 055b481386 Fixed CMake bug for single core. 2016-01-15 06:42:54 +08:00
Ralph Campbell fbc21266e6 Minor C code fixes in driver/ 2015-11-09 14:15:49 +05:30
Zhang Xianyi d8392c1245 Fixe cmake config bugs. 2015-10-20 04:30:55 +08:00
Zhang Xianyi f874465bb8 Use cmake to build OpenBLAS GENERIC Target on MSVC x86 64-bit.
Disable CBLAS and LAPACK.
2015-08-10 14:10:44 -05:00
Hank Anderson 9eaea02f33 Added additional gemm defines for complex types. 2015-02-25 09:39:11 -06:00
Hank Anderson 0d8e227ea7 Changed strategy for setting preprocessor definitions.
Instead of generating separate object files for each permutation of
defines for a source file, GenerateNamedObjects now writes an entirely
new source file and inserts the defines as #define c statements.

This solves a problem I ran into with ar.exe where it was refusing to
link objects that had the same filename despite having different paths.
2015-02-24 12:26:33 -06:00
Hank Anderson 371071d461 Added CONJ defines for trmm/trsm. 2015-02-21 10:59:02 -06:00
Hank Anderson 8a143516e3 Added alternate_name to a couple of the name mangling schemes.
Added zherk_k sources to driver/level3.
2015-02-20 17:03:33 -06:00
Hank Anderson e5897ecb9b Added zherk_kernel.c objects to driver/level3. 2015-02-19 16:19:56 -06:00
Hank Anderson 4662a0b13a Changed generate functions to iterate through a list of float types.
This will generate obj files for SINGLE/DOUBLE/COMPLEX/DOUBLE COMPLEX.
2015-02-15 17:44:37 -06:00
Hank Anderson e74462a3f5 Moved declarations to start of functions to satisfy MSVC C89 implementation. 2015-02-11 11:16:57 -06:00
Hank Anderson 056ba26755 Changed a number of inline calls to use __inline.
MSVC doesn't inmplement C99, so can't use the inline keyword. __inline
appears to work in MSVC and GCC.
2015-02-11 11:13:17 -06:00
Hank Anderson e8c39138c6 Removed return value from GenerateNamedObjects.
It sets DBLAS_OBJS directly to save a bunch of list appending in the
CMakeLists.txt files.
2015-02-09 12:28:09 -06:00
Hank Anderson 627d5e7401 Added SMP objects to driver/level3. 2015-02-05 12:22:48 -06:00
Hank Anderson 943fa2fb58 Fixed object names in level2. 2015-02-05 10:49:11 -06:00
Hank Anderson 461e691127 Codes when define is absent are now a parameter to AllCombinations.
The level3 object names should now be correct.
2015-02-05 09:23:47 -06:00
Hank Anderson cfaf1c678f Added option to append define codes with an underscore.
Fixed the code array not getting reset on subsequent AllCombinations
calls.
2015-02-05 09:17:18 -06:00
Hank Anderson 0d7bad1f35 Changed GenerateObjects to append combination codes (e.g. dtrmm_TU). 2015-02-05 09:02:54 -06:00
Hank Anderson d11bde60d0 DOUBLE define for DBLAS objects is now set in main CMakeLists.txt.
Since the objects are the same, could generate SINGLE/COMPLEX/etc here
without having to rewrite all the object enumeration code again.
2015-02-02 15:00:44 -06:00
Hank Anderson 5057a4b4df Added openblas add_library call that uses DBLAS_OBJS ojbects. 2015-01-30 15:21:21 -06:00
Hank Anderson d3dcdddf75 Moved functions into util cmake file. 2015-01-30 13:47:40 -06:00
Hank Anderson e5e7595bf9 Added paramater to GenerateObjects for defines that affect all sources. 2015-01-30 13:31:13 -06:00
Hank Anderson 7693887d61 Added empty set to the combinations generated by AllCombinations. 2015-01-30 13:01:11 -06:00
Hank Anderson 8d9b196e0d Moved loop over define combos into a function.
This function takes a set of sources and a set of preprocessor
definitions. It will iterate over the sources and build an object
file for each combination of preprocessor definitions for each
source file.
2015-01-30 12:14:44 -06:00
Hank Anderson a6cf8aafc0 Updated level3/CMakeLists with correct defines using all combos. 2015-01-30 11:21:50 -06:00
Hank Anderson dbdca7bf0c Added first pass at driver/level3 Makefile conversion.
Added a rather convoluted CMake function to find all combinations
of a given list. This will be useful for the object files that are
compiled multiple times with different combinations of preprocessor
definitions.
2015-01-29 22:53:11 -06:00
wernsaar 7aae4a62e7 enabled use of GEMM3M functions 2014-09-20 14:27:10 +02:00
wernsaar 1d33547222 optimized zgemm kernel for haswell 2014-07-27 11:51:42 +02:00
wernsaar 3ea4dadd30 optimizations for trsm 2014-07-25 11:59:17 +02:00
wernsaar 1b10ff129a optimizations for trmm 2014-07-25 10:00:23 +02:00
wernsaar 125610d23b allow to set custom value for ?GEMM_DEFAULT_UNROLL_MN, optimizations for syrk 2014-07-24 18:43:31 +02:00
wernsaar be94db096c disabled *3M functions for x86_64 platforms 2014-07-01 16:18:05 +02:00
Timothy Gu 6c2ead30f0 Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
wernsaar c947ab85dc changed level3.c 2013-12-01 13:46:30 +01:00
wernsaar 2840d56aeb added dgemm_kernel for Piledriver 2013-10-19 09:47:15 +02:00
Zhang Xianyi 77b572fa0b Merge branch 'loongson3a' into develop
Conflicts:
	Makefile.system
2013-07-20 22:33:17 +08:00
Zhang Xianyi 32d2ca3035 Refs #214, #221, #246. Fixed the getrf overflow bug on Windows.
I used a smaller threshold since the stack size is 1MB on windows.
2013-07-11 03:20:02 +08:00
wernsaar 6f008abcef replaced defined(DOUBLE) by !defined(XDOUBLE) 2013-07-09 18:17:50 +02:00
Zhang Xianyi 5d3312142a Refs #221 #246. Fixed the overflowing stack bug in mutlithreading BLAS3.
When NUM_THREADS(MAX_CPU_NUNBERS) is very large ,e.g. 256.

typedef struct {
  volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;

job_t          job[MAX_CPU_NUMBER];

The job array is equal 8MB.

Thus, We use malloc instead of stack allocation.
2013-07-08 01:07:05 +08:00
wernsaar 25491e42f9 New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S 2013-06-08 09:40:17 +02:00
Xianyi Zhang 6b01d58712 Disable the optimization of muli-threading gemm on the Loongson3A. 2013-03-30 20:12:43 +00:00
Wang Qian 8163ab7e55 Change the block size on Loongson 3B. 2011-11-23 18:41:49 +00:00
traz 9fe3049de6 Adding conditional compilation(#if defined(LOONGSON3A)) to avoid affecting the performance of other platforms. 2011-09-26 15:21:45 +00:00
traz 831858b883 Modify aligned address of sa and sb to improve the performance of multi-threads. 2011-09-23 20:59:48 +00:00
Xianyi Zhang 1b97ec1a7c Added DEBUG option in Makefile.rule. Fixed DEBUG typo mistakes. 2011-02-26 11:19:54 +08:00
Xianyi Zhang 342bbc3871 Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00