Commit Graph

78 Commits

Author SHA1 Message Date
Werner Saar c5b1fbcb2e updated optimized cgemm- and ctrmm-kernel for POWER8 2016-04-04 09:12:08 +02:00
Werner Saar 6a9bbfc227 updated sgemm- and strmm-kernel for POWER8 2016-04-02 17:16:36 +02:00
Werner Saar e1df5a6e23 fixed sgemm- and strmm-kernel 2016-03-18 12:12:03 +01:00
Werner Saar 5c658f8746 add optimized cgemm- and ctrmm-kernel for POWER8 2016-03-18 08:17:25 +01:00
Werner Saar 96284ab295 added sgemm- and strmm-kernel for POWER8 2016-03-14 13:52:44 +01:00
Werner Saar 91e1c5080c modified configuration, to use power6 sgemm kernel for power8 2016-03-04 13:38:57 +01:00
Werner Saar b752858d6c added dgemm-, dtrmm-, zgemm- and ztrmm-kernel for power8 2016-03-01 07:33:56 +01:00
Zhang Xianyi 3e8d6ea74f Init POWER8 kernels by POWER6. 2015-11-03 12:34:23 +08:00
Werner Saar b07d733a71 added updates for syrk and syr2k 2016-01-21 13:16:44 +01:00
Ashwin Sekhar T K 39937d15cd Change BUFFER_SIZE for Cortex A57 to 20 MB
Change the GEMM_P, GEMM_Q, GEMM_R values for Cortex A57
2015-11-20 01:12:04 +05:30
Ashwin Sekhar T K 1397b47197 Optimized zgemm kernel for CORTEXA57 2015-11-09 14:15:53 +05:30
Ashwin Sekhar T K 45f78963ac Optimized cgemm kernel for CORTEXA57
Also, add a generic ztrmm 4x4 kernel
2015-11-09 14:15:53 +05:30
Ashwin Sekhar T K 402443bf9c Optimized dgemm kernel for CORTEXA57 2015-11-09 14:15:53 +05:30
Ashwin Sekhar T K f2f8a0fe8b Adding arm64 target CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
2015-11-09 14:15:50 +05:30
Werner Saar 9bd962f655 modified haswell parameter dgemm_unroll_n 2015-06-13 10:28:27 +02:00
Zhang Xianyi 51ff17d46e Add AMD Excavator target. 2015-05-13 16:16:30 -05:00
Zhang Xianyi 229ce2ccd1 Add cortex-a9 and cortex-a15 targets. 2015-01-12 08:55:29 +00:00
Werner Saar ddf983d643 added optimizations for steamroller 2014-12-30 20:14:45 +08:00
Werner Saar 4319769b79 added target processor STEAMROLLER 2014-12-28 20:16:46 +08:00
Werner Saar 587e16fba3 Ref #458: Backport, sandybridge uses nehalem zgemm kernel 2014-12-22 17:01:18 +01:00
Zhang Xianyi 2fb02626da Update organization info. 2014-11-25 15:28:58 +08:00
Zhang Xianyi a85c2785ae Refs #467. Added generic kernel file for x86_64. 2014-11-24 15:34:48 +08:00
Benedikt Huber 58c90d5937 # The first commit's message is:
Optimizations for APM's xgene-1 (aarch64).

1) general system updates to support armv8 better.  Make all did not work, one needed to supply TARGET=ARMV8.
2) sgemm 4x4 kernel in assembler using SIMD, and configuration changes to use it.
3) strmm 4x4 kernel in C.  Since the sgemm kernel does 4x4, the trmm kernel must also do 4xN.

Added Dave Nuechterlein to the contributors list.
2014-11-11 22:19:23 +08:00
wernsaar 9d7057366d bugfix for GEMM3M functions 2014-09-21 11:41:43 +02:00
wernsaar 7aae4a62e7 enabled use of GEMM3M functions 2014-09-20 14:27:10 +02:00
wernsaar 5087096711 optimization of sandybridge cgemm-kernel 2014-07-29 19:07:21 +02:00
wernsaar 1cc02b4337 optimized sgemm kernel for haswell 2014-07-28 11:50:01 +02:00
wernsaar 125610d23b allow to set custom value for ?GEMM_DEFAULT_UNROLL_MN, optimizations for syrk 2014-07-24 18:43:31 +02:00
Zhang Xianyi 99efbbbad5 Fixed #395. Enable optimized cgemm for Sandybridge. Added optimized sdot kernel.
Fixed c/zgemm, zgemv computational error of haswell, piledriver, bulldozer, and
barcelona on Windows.

Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop

Conflicts:
	kernel/Makefile.L1
	kernel/x86_64/KERNEL
	param.h
2014-06-29 10:34:51 +08:00
Timothy Gu 6c2ead30f0 Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
wernsaar 365e8de346 added optimized cgemm-kernel for SANDYBRIDGE 2014-06-27 13:40:29 +02:00
wernsaar dabab2b5f4 added new optimized sgemm kernel for SANDYBRIDGE 2014-06-26 21:42:08 +02:00
wernsaar aa2709c4e0 enabled optimized dgemm kernel for NEHALEM 2014-06-26 12:22:29 +02:00
wernsaar d83373db61 added parameter for gemm3m kernels 2014-06-25 10:40:25 +02:00
wernsaar 43fbdb7a5a added ARMV5 as reference platform 2014-05-13 17:25:19 +02:00
wernsaar 5f3b68b4d4 replaced sgemm and cgemm kernels because lapack bugs 2014-05-10 11:24:07 +02:00
wernsaar 2424af62fd replaced dgemm-kernel because bug in lapack 2014-05-10 10:52:37 +02:00
wernsaar 47b22763f8 reduced stack usage on windows to 16K 2014-04-24 14:09:26 +02:00
wernsaar aae75b2461 modified param.h 2013-12-01 18:43:24 +01:00
wernsaar b3254eecaf Merge remote branch 'origin/haswell' into develop 2013-12-01 18:09:12 +01:00
wernsaar ecbc85b954 modified param.h 2013-12-01 17:54:53 +01:00
wernsaar afe44b0241 tests and code cleanup of gemm_kernels for HASWELL 2013-10-28 14:23:48 +01:00
wernsaar a77c71eaf5 added highly optimized dgemm_kernel for HASWELL 2013-10-28 10:23:47 +01:00
wernsaar fe8c5666f9 optimized dgemm_kernel for HASWELL 2013-10-20 16:52:26 +02:00
Zhang Xianyi 2638370844 Init code base for Intel Haswell. 2013-08-13 00:54:59 +08:00
Zhang Xianyi 886cbaf4e4 Support AMD Piledriver by bulldozer kernels. 2013-07-06 12:06:43 -03:00
Zhang Xianyi 6e8501c8a1 Fixed #239 bug in param.h about BARCELONA and BULLDOZER. 2013-06-29 10:36:01 +08:00
wernsaar f67fa62851 added dgemv_n_bulldozer.S 2013-06-15 16:42:37 +02:00
wernsaar d65bbec99b added new sgemm kernel for BULLDOZER 2013-06-09 15:57:42 +02:00
wernsaar ba800f0883 correct GEMM_THREAD in param.h 2013-06-08 10:03:59 +02:00