Andrew
|
4938faa822
|
core.IdenticalExpr clang501 checker
|
2018-01-19 23:15:58 +01:00 |
Martin Kroeker
|
42285d8e70
|
Merge pull request #1410 from brada4/develop
Address warnings #1357
|
2018-01-06 20:02:46 +01:00 |
Andrew
|
4d0b005e5b
|
Eliminate remaining unused results in kernels (clang5 analyzer)
|
2018-01-01 20:54:39 +01:00 |
Martin Kroeker
|
b81656936f
|
Merge pull request #1409 from martin-frbg/issue1292-2
Tag %1 and %2 as both input and output operands
|
2017-12-31 20:18:48 +01:00 |
Martin Kroeker
|
b973990df2
|
Tag %1 and %2 as both input and output operands
fix from #1292 extended to the other gemv microkernels
|
2017-12-31 18:03:36 +01:00 |
Martin Kroeker
|
1e31124eb0
|
Merge pull request #1406 from martin-frbg/issue1292
Tag %1 and %2 as both input and output
|
2017-12-30 14:52:03 +01:00 |
Martin Kroeker
|
723f396a20
|
Tag %1 and %2 as both input and output
The inline assembly modifies its input operands, so mark them as output to avoid surprises with optimization. Fixes #1292
|
2017-12-29 23:56:41 +01:00 |
Martin Kroeker
|
43c0622e7b
|
Retire Piledriver/Steamroller/Excavator daxpy microkernels as well
related to issue #1332
|
2017-12-13 18:40:39 +01:00 |
Martin Kroeker
|
0623636c98
|
Use Sandybridge daxpy kernel on Haswell and Zen for now
The testcase from #1332 exposes a problem in daxpy_microk_haswell-2.c that is not seen with
any of the other Intel x86_64 microkernels.
|
2017-12-10 19:24:31 +01:00 |
Andrew
|
281a2b952f
|
warning cleanup (#1380)
* dead increments in driver/level2
* dead increments in kernel/generic
* part dead increments in kernel/x86_64
|
2017-12-05 19:54:10 +01:00 |
Martin Kroeker
|
6c77b5f267
|
Merge pull request #1369 from martin-frbg/dsdot
Add optimized dsdot to all other x86_64 kernels that use sdot.c
|
2017-11-28 18:15:31 +01:00 |
Martin Kroeker
|
c92cd6d162
|
Add trivially optimized dsdot based on sdot
|
2017-11-24 20:05:27 +01:00 |
Martin Kroeker
|
cae5d9a20b
|
Add trivially optimized dsdot based on sdot
|
2017-11-24 20:04:29 +01:00 |
Martin Kroeker
|
3d891c3106
|
Add trivially optimized dsdot based on sdot
|
2017-11-24 20:03:40 +01:00 |
Martin Kroeker
|
4fbdcfa823
|
Add trivially optimized dsdot based on sdot
|
2017-11-24 20:02:28 +01:00 |
Martin Kroeker
|
1bb6a96ebc
|
Add trivially optimized dsdot based on sdot
|
2017-11-24 20:01:42 +01:00 |
Martin Kroeker
|
6bd163f37a
|
Add trivially optimized dsdot based on sdot
|
2017-11-24 20:00:23 +01:00 |
Martin Kroeker
|
f0333333d1
|
Add trivially optimized dsdot based on sdot
|
2017-11-24 19:59:28 +01:00 |
Andrew
|
e89b979b2c
|
fix spurious compiler warning fix (no code change)
|
2017-11-24 18:39:04 +01:00 |
Andrew
|
7e9b29b9b8
|
fix spurious compiler warning (no code change)
|
2017-11-24 18:36:37 +01:00 |
Martin Kroeker
|
6157d0902a
|
Merge pull request #1358 from martin-frbg/unused_vars
Clean up spurious unused variables in the kernels
|
2017-11-15 11:31:43 +01:00 |
Martin Kroeker
|
3fea849bbf
|
Remove unused variables from Haswell dtrmm and Bulldozer dtrsm
|
2017-11-14 23:35:10 +01:00 |
Martin Kroeker
|
8f177621bc
|
Remove unused variables at0...at3 from ?symv_U
|
2017-11-14 23:32:25 +01:00 |
Martin Kroeker
|
5f402b7759
|
Remove unused (loop?) variable j from the gemv_n_4 implementations
|
2017-11-14 23:29:42 +01:00 |
Martin Kroeker
|
a07807caac
|
Eliminate loop code when called as/from dsdot
|
2017-10-25 16:45:41 +02:00 |
Martin Kroeker
|
5e3e91d0fc
|
Split the microkernel workload into chunks of 32 floats for dsdot mode to limit loss of precision
|
2017-10-22 18:18:51 +02:00 |
Martin Kroeker
|
28c3fa8950
|
Add dsdot
|
2017-10-16 23:29:03 +02:00 |
Martin Kroeker
|
8ac87c1cb6
|
Implement DSDOT with unchanged sdot microkernels
|
2017-10-16 23:27:51 +02:00 |
Isuru Fernando
|
505b218829
|
Merge remote-tracking branch 'upstream/develop' into dyn
|
2017-08-06 19:07:00 +05:30 |
Isuru Fernando
|
1d1854032b
|
Add missing EXCAVATOR
|
2017-08-02 19:03:04 +05:30 |
Isuru Fernando
|
2c51a990ac
|
Fix extra whitespace; the CMake parser macro fails on it
TODO: Fix the parser macro to strip trailing whitespace
|
2017-08-02 18:26:57 +05:30 |
Isuru Fernando
|
ca17b4b75c
|
Fix complex support for MSVC headers
|
2017-07-28 11:50:29 +05:30 |
Denis Steckelmacher
|
c9ff735da6
|
Add ZEN support (tested for auto-detected static backend)
|
2017-03-19 15:32:50 +01:00 |
Martin Kroeker
|
a6efabf155
|
Replace GNU __real__, __imag__ extensions in initializers
|
2017-03-13 00:38:37 +01:00 |
Martin Kroeker
|
dc34a0da96
|
Merge pull request #915 from mdong/small_fix_for_icc
remove input from clobbered list
|
2017-02-23 20:00:22 +01:00 |
Martin Kroeker
|
4998e19869
|
Change file comments to work around clang 3.9 assembler bug
|
2016-10-13 16:51:08 +02:00 |
Martin Kroeker
|
16446d1d23
|
Remove explicit include of complex.h
|
2016-09-29 23:45:56 +02:00 |
mdong
|
098d8ec5d6
|
remove input from clobbered list
|
2016-06-24 16:37:58 -04:00 |
Werner Saar
|
298b13bba4
|
updated some kernel files for EXCAVATOR
|
2016-04-25 10:36:23 +02:00 |
Zhang Xianyi
|
f24d5307cf
|
Refs #834. Fix zgemv config bug on Steamroller.
|
2016-04-12 22:26:11 +08:00 |
Zhang Xianyi
|
d4380c1fe4
|
Refs xianyi/OpenBLAS-CI#10. Fix sdot for scipy test_iterative.test_convergence test failure on AMD Bulldozer and Piledriver.
|
2016-04-07 01:44:18 +08:00 |
Werner Saar
|
faa5e2e5e3
|
FIX: forgot to add the files cgemv_n_4.c and cgemv_t_4.c
|
2016-03-10 11:10:38 +01:00 |
Werner Saar
|
fdf291be30
|
Added optimized cgemv_n and cgemv_t kernels for bulldozer, piledriver and steamroller
|
2016-03-10 09:42:07 +01:00 |
Werner Saar
|
c99cc41cbd
|
Added optimized zgemv_n kernel for bulldozer, piledriver and steamroller
|
2016-03-09 14:02:03 +01:00 |
Werner Saar
|
acdff55a6a
|
Bugfix for ztrmv
|
2016-03-07 09:39:34 +01:00 |
Zhang Xianyi
|
7d6b68eb4a
|
Refs #786. Revert to default assembly kernel.
|
2016-03-07 11:34:58 +08:00 |
Zhang Xianyi
|
8f758eeff9
|
Refs #786. avoid old assembly c/zgemv kernels.
|
2016-03-05 08:32:03 +08:00 |
Zhang Xianyi
|
efa4f5c936
|
Refs #695 #783. Replace default x86_64 cgemv_t
asm kernel by C kernel.
|
2016-03-01 11:18:56 +08:00 |
Zhang Xianyi
|
6e7be06e07
|
Refs JuliaLang/julia#5728. Fix gemv performance bug on Haswell Mac OSX.
On Mac OS X, it should use .align 4 (equal to .align 16 on Linux).
I didn't get the performance benefit from .align. Thus, I deleted it.
|
2016-02-19 17:56:07 -05:00 |
Zhang Xianyi
|
962376664d
|
Refs #768. Swap the result of zdot x87 fp kernel.
|
2016-02-02 09:15:02 +08:00 |
Zhang Xianyi
|
c44ff4d648
|
Refs #714. Avoid compiler warnings.
|
2016-01-28 04:38:07 +08:00 |
Werner Saar
|
c8f2c5d636
|
added optimized trsm_kernels
|
2016-01-05 13:05:05 +01:00 |
Zhang Xianyi
|
69363622a8
|
Fix DYNAMIC_ARCH=1 bug.
|
2015-10-27 05:10:40 +08:00 |
Zhang Xianyi
|
f874465bb8
|
Use cmake to build OpenBLAS GENERIC Target on MSVC x86 64-bit.
Disable CBLAS and LAPACK.
|
2015-08-10 14:10:44 -05:00 |
Zhang Xianyi
|
ab0a0a75fc
|
Merge branch 'develop' into cmake
|
2015-08-03 23:59:01 -05:00 |
Zhang Xianyi
|
1cf2b10224
|
Use pure C generic target on x86 and x86_64.
make TARGET=GENERIC
?gemm3m is unimplemented on generic target.
|
2015-08-03 23:55:56 -05:00 |
Zhang Xianyi
|
7ac7e147d4
|
Fixed cmake building bugs on Linux. Disable LAPACK by default.
|
2015-08-04 04:37:05 +08:00 |
Werner Saar
|
e7c969e164
|
added optimized dtrmm_kernel for haswell
|
2015-06-13 16:16:29 +02:00 |
Werner Saar
|
9bd962f655
|
modified haswell parameter dgemm_unroll_n
|
2015-06-13 10:28:27 +02:00 |
Werner Saar
|
24f58c8bb1
|
added optimized cscal and zscal kernels for steamroller
|
2015-05-18 12:40:07 +02:00 |
Werner Saar
|
95b1faf667
|
added optimized cscal and zscal kernels for steamroller and piledriver
|
2015-05-18 10:50:57 +02:00 |
Werner Saar
|
2d9e406050
|
added optimized cscal kernel for sandybridge
|
2015-05-18 08:46:06 +02:00 |
Werner Saar
|
59083e3ce1
|
added optimized cscal kernel for bulldozer
|
2015-05-18 07:33:52 +02:00 |
wernsaar
|
685be40339
|
Merge pull request #571 from wernsaar/develop
added optimized cscal and zscal functions
|
2015-05-17 14:09:14 +02:00 |
Werner Saar
|
31c9e399e9
|
added optimized cscal kernel for haswell
|
2015-05-17 13:44:09 +02:00 |
Werner Saar
|
7de6bb9889
|
added optimized zscal kernel for bulldozer
|
2015-05-17 11:45:19 +02:00 |
Werner Saar
|
d63034303b
|
added optimized zscal kernel for haswell
|
2015-05-16 16:41:45 +02:00 |
Zhang Xianyi
|
51ff17d46e
|
Add AMD Excavator target.
|
2015-05-13 16:16:30 -05:00 |
Werner Saar
|
18e90ee2e3
|
bugfix: added static to functions
|
2015-05-13 13:31:26 +02:00 |
Werner Saar
|
e00cccc41e
|
added optimized dscal kernel for piledriver
|
2015-05-13 13:05:35 +02:00 |
Werner Saar
|
73f09bf64f
|
optimized dscal kernel for increment != 1
|
2015-05-13 12:14:39 +02:00 |
Werner Saar
|
02e772c7e4
|
added optimized dscal kernel for haswell
|
2015-05-12 17:19:58 +02:00 |
Werner Saar
|
7aee913991
|
added optimized dscal kernel for sandybridge
|
2015-05-12 16:27:43 +02:00 |
Werner Saar
|
e50a933037
|
added optimized dscal kernel for bulldozer
|
2015-05-12 12:28:44 +02:00 |
Werner Saar
|
133c11a156
|
updated dgemv_n kernel for nehalem
|
2015-04-30 14:38:06 +02:00 |
Werner Saar
|
30f52d53df
|
optimized dgemv_n kernel for haswell
|
2015-04-30 12:11:39 +02:00 |
Werner Saar
|
5e83d80725
|
optimized dger kernel for sandybridge
|
2015-04-28 16:58:11 +02:00 |
Werner Saar
|
b2e1797dc6
|
added optimized sger kernel for sandybridge
|
2015-04-28 15:33:38 +02:00 |
Werner Saar
|
e216f686cb
|
optimized saxpy and daxpy for sandybridge
|
2015-04-28 10:18:32 +02:00 |
Werner Saar
|
fc0e0391f3
|
bugfixes: replaced int with BLASLONG
|
2015-04-24 14:30:44 +02:00 |
Werner Saar
|
c22068c406
|
optimized sdot.c for increments != 1
|
2015-04-24 13:13:20 +02:00 |
Werner Saar
|
dee100d0e4
|
optimized saxpy.c for increments != 1
|
2015-04-24 11:52:59 +02:00 |
Werner Saar
|
0273966abb
|
optimized daxpy kernel for increments != 1
|
2015-04-24 11:39:17 +02:00 |
Werner Saar
|
3a67daa954
|
optimized ddot.c for increments != 1
|
2015-04-24 10:56:55 +02:00 |
Werner Saar
|
b4f2153dcd
|
added optimized ssymv kernels for sandybridge
|
2015-04-23 12:19:24 +02:00 |
Werner Saar
|
1c4b0eeae3
|
added optimized ssymv kernels for haswell
|
2015-04-23 10:23:13 +02:00 |
Werner Saar
|
1bec9abb9a
|
added optimized dsymv kernels for sandybridge
|
2015-04-22 12:09:43 +02:00 |
Werner Saar
|
3814bf60d3
|
added optimized dsymv kernels for haswell
|
2015-04-22 10:42:50 +02:00 |
Werner Saar
|
6d0db0151f
|
added optimized zaxpy-kernels
|
2015-04-16 11:19:37 +02:00 |
Zhang Xianyi
|
37b9033c90
|
Merge pull request #543 from jeromerobert/develop
Fix a buffer overflow with MAX_STACK_ALLOC size in dgemv_t
|
2015-04-15 11:18:14 -05:00 |
Werner Saar
|
13889515b3
|
added optimized caxpy-kernel for sandybridge
|
2015-04-15 16:29:25 +02:00 |
Werner Saar
|
248c9340c3
|
added optimized caxpy-kernel for haswell
|
2015-04-15 15:16:31 +02:00 |
Werner Saar
|
e9f33b4ca7
|
added optimized caxpy-kernel for steamroller
|
2015-04-15 13:49:23 +02:00 |
Werner Saar
|
f5d847122a
|
updated caxpy_microk_bulldozer-2.c and caxpy.c
|
2015-04-15 11:59:38 +02:00 |
Jerome Robert
|
a4c96eca67
|
Fix a buffer overflow with MAX_STACK_ALLOC size in dgemv_t
Refs #478, #482, 9798481, fd9fd42
|
2015-04-15 11:46:48 +02:00 |
Werner Saar
|
baa0363ea2
|
add optimized ddot-kernel for piledriver
|
2015-04-14 15:09:13 +02:00 |
Werner Saar
|
34ba66606a
|
add optimized daxpy-kernel for piledriver
|
2015-04-14 14:23:29 +02:00 |
Werner Saar
|
f615dc7603
|
added optimized saxpy kernel for steamroller
|
2015-04-14 09:09:39 +02:00 |
Werner Saar
|
331c417637
|
optimized saxpy for piledriver
|
2015-04-14 08:34:11 +02:00 |
Werner Saar
|
d7a17ad85d
|
optimized sdot-kernel for piledriver
|
2015-04-13 13:19:21 +02:00 |