Andrew
bfc2a88594
remove unused buffer
2017-12-22 00:55:40 +01:00
Andrew
ef95cd471f
eliminate unread variable, after reiteration 3 of them (clang4)
2017-11-25 02:54:37 +01:00
Martin Kroeker
db72ad8f6a
Merge pull request #1320 from timmoon10/develop
2D thread distribution for multi-threaded GEMMs
2017-10-08 23:31:33 +02:00
Martin Kroeker
514d237257
Merge pull request #1279 from xsacha/develop
CMake improvements
2017-10-06 21:13:45 +02:00
Tim Moon
30486a356c
Reduce number of data partitions in n.
2017-10-04 12:37:49 -07:00
Tim Moon
9de52b489a
Cleaning up and documenting multi-threaded GEMM code.
2017-10-03 16:32:08 -07:00
Tim Moon
860dcfc703
Use 2D thread distribution for small GEMMs.
Allows maximum use of available cores if one of M and N is small and the other is large.
2017-10-03 13:43:39 -07:00
Tim Moon
6aaa107865
Reducing threads for multi-threaded GEMMs on small matrices.
2017-09-27 19:25:33 -07:00
Sacha Refshauge
37858d1146
Fix threading usage in CMake: s/SMP/USE_THREAD/
2017-08-19 15:07:42 +10:00
Isuru Fernando
d245caa49a
Support out-of-source build
2017-08-01 15:16:14 +05:30
Martin Kroeker
49e62c0e77
fixed syrk_thread.c taken from wernsaar
Stride calculation fix copied from https://github.com/wernsaar/OpenBLAS/commit/88900e1
2017-07-06 17:30:12 +02:00
Werner Saar
a2672d5589
prepared driver/level3 functions for UNROLL values that are not a power of two
2017-01-09 10:38:15 +01:00
John Biddiscombe
053044ae4d
Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PROJECT_BINARY_DIR
If OpenBLAS is built using add_subdirectory(OpenBlas) as part of another project
then the paths set by CMAKE_XXX_DIR are relative to the parent project
and not the OpenBLAS project.
2016-05-25 09:13:28 +02:00
Zhang Xianyi
d06b92906a
Add gemm3m building for CMake.
2016-02-12 05:02:51 +08:00
Werner Saar
b07d733a71
added updates for syrk and syr2k
2016-01-21 13:16:44 +01:00
Zhang Xianyi
055b481386
Fixed CMake bug for single core.
2016-01-15 06:42:54 +08:00
Ralph Campbell
fbc21266e6
Minor C code fixes in driver/
2015-11-09 14:15:49 +05:30
Zhang Xianyi
d8392c1245
Fix CMake config bugs.
2015-10-20 04:30:55 +08:00
Zhang Xianyi
f874465bb8
Use cmake to build OpenBLAS GENERIC Target on MSVC x86 64-bit.
Disable CBLAS and LAPACK.
2015-08-10 14:10:44 -05:00
Hank Anderson
9eaea02f33
Added additional gemm defines for complex types.
2015-02-25 09:39:11 -06:00
Hank Anderson
0d8e227ea7
Changed strategy for setting preprocessor definitions.
Instead of generating separate object files for each permutation of
defines for a source file, GenerateNamedObjects now writes an entirely
new source file and inserts the defines as C #define statements.
This solves a problem I ran into with ar.exe where it was refusing to
link objects that had the same filename despite having different paths.
2015-02-24 12:26:33 -06:00
Hank Anderson
371071d461
Added CONJ defines for trmm/trsm.
2015-02-21 10:59:02 -06:00
Hank Anderson
8a143516e3
Added alternate_name to a couple of the name mangling schemes.
Added zherk_k sources to driver/level3.
2015-02-20 17:03:33 -06:00
Hank Anderson
e5897ecb9b
Added zherk_kernel.c objects to driver/level3.
2015-02-19 16:19:56 -06:00
Hank Anderson
4662a0b13a
Changed generate functions to iterate through a list of float types.
This will generate obj files for SINGLE/DOUBLE/COMPLEX/DOUBLE COMPLEX.
2015-02-15 17:44:37 -06:00
Hank Anderson
e74462a3f5
Moved declarations to start of functions to satisfy MSVC C89 implementation.
2015-02-11 11:16:57 -06:00
Hank Anderson
056ba26755
Changed a number of inline calls to use __inline.
MSVC doesn't implement C99, so it can't use the inline keyword. __inline
appears to work in both MSVC and GCC.
2015-02-11 11:13:17 -06:00
Hank Anderson
e8c39138c6
Removed return value from GenerateNamedObjects.
It sets DBLAS_OBJS directly to save a bunch of list appending in the
CMakeLists.txt files.
2015-02-09 12:28:09 -06:00
Hank Anderson
627d5e7401
Added SMP objects to driver/level3.
2015-02-05 12:22:48 -06:00
Hank Anderson
943fa2fb58
Fixed object names in level2.
2015-02-05 10:49:11 -06:00
Hank Anderson
461e691127
Codes when define is absent are now a parameter to AllCombinations.
The level3 object names should now be correct.
2015-02-05 09:23:47 -06:00
Hank Anderson
cfaf1c678f
Added option to append define codes with an underscore.
Fixed the code array not getting reset on subsequent AllCombinations
calls.
2015-02-05 09:17:18 -06:00
Hank Anderson
0d7bad1f35
Changed GenerateObjects to append combination codes (e.g. dtrmm_TU).
2015-02-05 09:02:54 -06:00
Hank Anderson
d11bde60d0
DOUBLE define for DBLAS objects is now set in main CMakeLists.txt.
Since the objects are the same, SINGLE/COMPLEX/etc. could be generated here
without having to rewrite all the object enumeration code.
2015-02-02 15:00:44 -06:00
Hank Anderson
5057a4b4df
Added openblas add_library call that uses DBLAS_OBJS objects.
2015-01-30 15:21:21 -06:00
Hank Anderson
d3dcdddf75
Moved functions into util cmake file.
2015-01-30 13:47:40 -06:00
Hank Anderson
e5e7595bf9
Added parameter to GenerateObjects for defines that affect all sources.
2015-01-30 13:31:13 -06:00
Hank Anderson
7693887d61
Added empty set to the combinations generated by AllCombinations.
2015-01-30 13:01:11 -06:00
Hank Anderson
8d9b196e0d
Moved loop over define combos into a function.
This function takes a set of sources and a set of preprocessor
definitions. It will iterate over the sources and build an object
file for each combination of preprocessor definitions for each
source file.
2015-01-30 12:14:44 -06:00
Hank Anderson
a6cf8aafc0
Updated level3/CMakeLists with correct defines using all combos.
2015-01-30 11:21:50 -06:00
Hank Anderson
dbdca7bf0c
Added first pass at driver/level3 Makefile conversion.
Added a rather convoluted CMake function to find all combinations
of a given list. This will be useful for the object files that are
compiled multiple times with different combinations of preprocessor
definitions.
2015-01-29 22:53:11 -06:00
wernsaar
7aae4a62e7
enabled use of GEMM3M functions
2014-09-20 14:27:10 +02:00
wernsaar
1d33547222
optimized zgemm kernel for haswell
2014-07-27 11:51:42 +02:00
wernsaar
3ea4dadd30
optimizations for trsm
2014-07-25 11:59:17 +02:00
wernsaar
1b10ff129a
optimizations for trmm
2014-07-25 10:00:23 +02:00
wernsaar
125610d23b
allow to set custom value for ?GEMM_DEFAULT_UNROLL_MN, optimizations for syrk
2014-07-24 18:43:31 +02:00
wernsaar
be94db096c
disabled *3M functions for x86_64 platforms
2014-07-01 16:18:05 +02:00
Timothy Gu
6c2ead30f0
Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
wernsaar
c947ab85dc
changed level3.c
2013-12-01 13:46:30 +01:00
wernsaar
2840d56aeb
added dgemm_kernel for Piledriver
2013-10-19 09:47:15 +02:00
Zhang Xianyi
77b572fa0b
Merge branch 'loongson3a' into develop
Conflicts:
Makefile.system
2013-07-20 22:33:17 +08:00
Zhang Xianyi
32d2ca3035
Refs #214, #221, #246. Fixed the getrf overflow bug on Windows.
I used a smaller threshold since the stack size is 1MB on Windows.
2013-07-11 03:20:02 +08:00
wernsaar
6f008abcef
replaced defined(DOUBLE) by !defined(XDOUBLE)
2013-07-09 18:17:50 +02:00
Zhang Xianyi
5d3312142a
Refs #221, #246. Fixed the stack overflow bug in multithreaded BLAS3.
When NUM_THREADS (MAX_CPU_NUMBER) is very large, e.g. 256, the following job array occupies 8MB:
typedef struct {
    volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;
job_t job[MAX_CPU_NUMBER];
Thus, we use malloc instead of stack allocation.
2013-07-08 01:07:05 +08:00
wernsaar
25491e42f9
New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S
2013-06-08 09:40:17 +02:00
Xianyi Zhang
6b01d58712
Disable the optimization of multithreaded GEMM on the Loongson3A.
2013-03-30 20:12:43 +00:00
Wang Qian
8163ab7e55
Change the block size on Loongson 3B.
2011-11-23 18:41:49 +00:00
traz
9fe3049de6
Adding conditional compilation (#if defined(LOONGSON3A)) to avoid affecting the performance of other platforms.
2011-09-26 15:21:45 +00:00
traz
831858b883
Modify the aligned addresses of sa and sb to improve multithreaded performance.
2011-09-23 20:59:48 +00:00
Xianyi Zhang
1b97ec1a7c
Added DEBUG option in Makefile.rule. Fixed DEBUG typo mistakes.
2011-02-26 11:19:54 +08:00
Xianyi Zhang
342bbc3871
Import GotoBLAS2 1.13 BSD version codes.
2011-01-24 14:54:24 +00:00