Commit Graph

725 Commits

Author SHA1 Message Date
Werner Saar
f9f127d838 added optimized ddot kernel for steamroller 2015-04-10 16:18:03 +02:00
wernsaar
62231ab337 Merge pull request #538 from wernsaar/develop
Added optimized cdot- and zdot-kernels
2015-04-10 16:03:37 +02:00
Werner Saar
3119def9a7 updated cdot and zdot 2015-04-10 11:10:31 +02:00
Werner Saar
33b332372a add optimized cdot- and zdot-kernel for sandybridge 2015-04-10 09:37:26 +02:00
Werner Saar
fd838c75bc add optimized cdot- and zdot-kernel for haswell 2015-04-09 15:13:52 +02:00
Werner Saar
b57a60dac8 updated cdot and zdot for piledriver 2015-04-09 10:33:46 +02:00
Werner Saar
5c51163972 added optimized cdot- and zdot-kernel for steamroller 2015-04-09 09:45:23 +02:00
Werner Saar
9299d8cfd6 added optimized cdot- and zdot-kernels for bulldozer 2015-04-08 16:29:55 +02:00
Zhang Xianyi
0a3d3b945d Refs #535. Fix the wrong vector instruction in sgemm sandy bridge kernel. 2015-04-08 03:55:49 +08:00
Werner Saar
60c6dec6e6 updated some lines for bulldozer 2015-04-06 18:47:16 +02:00
Werner Saar
47898cca35 added optimized saxpy- and daxpy-kernel for sandybridge 2015-04-06 16:05:16 +02:00
Werner Saar
53bb924287 added optimized saxpy- and daxpy-kernel for haswell 2015-04-06 12:33:16 +02:00
Werner Saar
a901b065d3 added optimized ddot-kernel for sandybridge 2015-04-05 20:19:38 +02:00
Werner Saar
3937e2a0a0 add optimized sdot-kernel for sandybridge 2015-04-05 19:47:05 +02:00
Werner Saar
9707d608d5 removed double definition line 2015-04-05 18:35:34 +02:00
Werner Saar
701b9d7556 added optimized sdot- and ddot-kernel for HASWELL 2015-04-05 17:57:53 +02:00
Zhang Xianyi
e5b96e55a7 Fix build bug for ARM64. 2015-03-24 15:27:17 -05:00
Hank Anderson
84d90d6ed8 Fixed some compiler errors/warnings for clang. 2015-02-25 11:52:25 -06:00
Hank Anderson
518e2424a8 Fixed bad filename for cpuid.S compile. 2015-02-25 11:51:29 -06:00
Zhang Xianyi
ea7f9dacf4 Refs #509. Fixed geadd building bug with DYNAMIC_ARCH=1. 2015-02-26 01:47:11 +08:00
Hank Anderson
0d8e227ea7 Changed strategy for setting preprocessor definitions.
Instead of generating separate object files for each permutation of
defines for a source file, GenerateNamedObjects now writes an entirely
new source file and inserts the defines as C #define statements.

This solves a problem I ran into with ar.exe where it was refusing to
link objects that had the same filename despite having different paths.
2015-02-24 12:26:33 -06:00
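The strategy described in the commit above can be sketched roughly as follows. This is a minimal illustration in C, not the actual CMake code of GenerateNamedObjects: the function name `write_wrapper_source`, the file name `gemv.c`, and the choice to pull in the original file with `#include` are all assumptions made for the example. The idea is that each permutation of defines gets its own uniquely named generated source, so no two objects end up sharing a filename.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds the text of a generated wrapper source.
   It emits one "#define NAME" line per entry in `defines`, followed by an
   #include of the original source file (the #include is an assumption).
   Returns the number of characters written, or -1 if `out` is too small. */
int write_wrapper_source(char *out, size_t cap, const char *src_path,
                         const char *const *defines, size_t ndefines) {
    assert(out != NULL && src_path != NULL && strlen(src_path) > 0);
    size_t used = 0;
    for (size_t i = 0; i < ndefines; i++) {
        int n = snprintf(out + used, cap - used, "#define %s\n", defines[i]);
        if (n < 0 || (size_t)n >= cap - used) return -1; /* truncated */
        used += (size_t)n;
    }
    int n = snprintf(out + used, cap - used, "#include \"%s\"\n", src_path);
    if (n < 0 || (size_t)n >= cap - used) return -1;     /* truncated */
    used += (size_t)n;
    return (int)used;
}
```

Calling it with the defines `DOUBLE` and `TRANS` for `gemv.c` would produce a three-line source text that a build system could write out under a name unique to that permutation.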
Hank Anderson
12d1fb2e40 Fixed incorrect object name in kernel CMakeLists.txt 2015-02-24 10:30:16 -06:00
Hank Anderson
1b7f427401 Added conj gemv objects for complex build. 2015-02-23 10:24:31 -06:00
Hank Anderson
b2284647a3 More complex objects. 2015-02-23 07:51:05 -06:00
Hank Anderson
a6116e5859 Added some more complex-only objects. 2015-02-22 17:49:28 -06:00
Hank Anderson
714638c187 Added some TRMM objects for complex types. 2015-02-19 16:11:51 -06:00
Hank Anderson
e27c372e53 Fixed reuse of float_char from parent loop.
Fixed in/it/on/otcopy names.
2015-02-19 13:53:29 -06:00
Hank Anderson
f3f2b3d768 Added complex and single netlib-lapack fortran sources to lapack.cmake. 2015-02-19 12:26:11 -06:00
Hank Anderson
9492298048 Added other float types to Makefile.L3. 2015-02-18 13:01:05 -06:00
Hank Anderson
14fd3d35de Added checks for missing defines in kernel. 2015-02-18 10:25:01 -06:00
Hank Anderson
cebc07cebd ParseMakefileVars now recursively parses included makefiles. 2015-02-17 22:09:41 -06:00
Hank Anderson
33c5e8db7f Added a helper function for setting the L1 kernel defaults.
Added loop to build objects with different KERNEL defines.
2015-02-17 21:36:23 -06:00
Martin Koehler
39cc6b21d3 Add ATLAS-style ?geadd function 2015-02-16 13:46:20 +01:00
Hank Anderson
4662a0b13a Changed generate functions to iterate through a list of float types.
This will generate obj files for SINGLE/DOUBLE/COMPLEX/DOUBLE COMPLEX.
2015-02-15 17:44:37 -06:00
Hank Anderson
162791e30e Added common objects from kernel Makefile. 2015-02-10 12:42:05 -06:00
Hank Anderson
c0624a26be Fixed some dgemm_copy function names. 2015-02-09 14:34:29 -06:00
Hank Anderson
4bfaf1ce66 Removed some list appends I missed. 2015-02-09 12:56:55 -06:00
Hank Anderson
e8c39138c6 Removed return value from GenerateNamedObjects.
It sets DBLAS_OBJS directly to save a bunch of list appending in the
CMakeLists.txt files.
2015-02-09 12:28:09 -06:00
Hank Anderson
f992799226 Added the rest of Makefile.L3. 2015-02-09 10:47:35 -06:00
Hank Anderson
4c65afcce1 Changed kernel filenames to vars. These will need to be read from KERNEL.
Added some kernel/L3 objects.
2015-02-09 09:52:14 -06:00
Hank Anderson
7fa5c4e2fd Fixed some case issues with ARCH.
Added some kernel and driver/others objects.
2015-02-08 15:29:18 -06:00
Hank Anderson
fa0e6a6c93 Added the rest of the L1 kernel makefile. 2015-02-07 21:37:46 -06:00
Hank Anderson
38681fb1c6 Added more kernel files. 2015-02-07 12:54:30 -06:00
Hank Anderson
189fadfde0 Started implementing kernel/Makefile in cmake. 2015-02-05 21:05:11 -06:00
Zhang Xianyi
229ce2ccd1 Add cortex-a9 and cortex-a15 targets. 2015-01-12 08:55:29 +00:00
Zhang Xianyi
41aad0407f Merge pull request #482 from jeromerobert/develop
Allow to do gemv and ger buffer allocation on the stack
2015-01-02 02:26:17 +08:00
Werner Saar
ddf983d643 added optimizations for steamroller 2014-12-30 20:14:45 +08:00
Werner Saar
4319769b79 added target processor STEAMROLLER 2014-12-28 20:16:46 +08:00
Jerome Robert
e9d9a8eae3 Allow to do gemv and ger buffer allocation on the stack
ger and gemv call blas_memory_alloc/free, which in turn
call blas_lock. blas_lock creates thread contention when matrices
are small and the number of threads is high enough. We avoid
calling blas_memory_alloc by replacing it with stack allocation.
This can be enabled with:
make -DMAX_STACK_ALLOC=2048
The given size (in bytes) must be high enough to avoid thread contention
and small enough to avoid stack overflow.

Fix #478
2014-12-27 14:33:12 +01:00
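The idea in the commit above (use the stack for small scratch buffers, fall back to the locked allocator for large ones) can be sketched as below. This is an illustration under stated assumptions, not the OpenBLAS implementation: `locked_alloc`, `locked_free`, `heap_allocs`, and `scaled_sum` are hypothetical names standing in for a lock-protected allocator and a routine that needs a temporary buffer. Only the `MAX_STACK_ALLOC` name and the 2048-byte example value come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Threshold in bytes; 2048 is the example value from the commit message. */
#ifndef MAX_STACK_ALLOC
#define MAX_STACK_ALLOC 2048
#endif

/* Hypothetical stand-ins for a lock-protected allocator such as
   blas_memory_alloc/free; these are NOT the real OpenBLAS functions. */
int heap_allocs = 0;  /* counts how often the slow path is taken */
void *locked_alloc(size_t bytes) { heap_allocs++; return malloc(bytes); }
void locked_free(void *p) { free(p); }

/* Returns alpha * sum(x[0..n-1]), staging scaled values in a scratch buffer. */
double scaled_sum(const double *x, size_t n, double alpha) {
    double stack_buf[MAX_STACK_ALLOC / sizeof(double)];
    size_t bytes = n * sizeof(double);

    /* Small requests use the stack and never touch the shared allocator. */
    int on_stack = bytes <= sizeof(stack_buf);
    double *buf = on_stack ? stack_buf : (double *)locked_alloc(bytes);
    assert(buf != NULL);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        buf[i] = alpha * x[i];
        sum += buf[i];
    }

    if (!on_stack) locked_free(buf);
    return sum;
}
```

The trade-off matches the one stated in the commit message: a larger threshold keeps more calls off the contended allocator, while a smaller one limits stack usage per call.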
Werner Saar
587e16fba3 Ref #458: Backport, sandybridge uses nehalem zgemm kernel 2014-12-22 17:01:18 +01:00