3108 Commits

Author SHA1 Message Date
Andrew
8aafa0473c address last warnings as seen by gcc7 2018-01-01 20:57:12 +01:00
Andrew
11a627c54e remove surplus parentheses to silence clang5 2018-01-01 20:56:26 +01:00
Andrew
4d0b005e5b Eliminate remaining unused results in kernels (clang5 analyzer) 2018-01-01 20:54:39 +01:00
Martin Kroeker
b81656936f Merge pull request #1409 from martin-frbg/issue1292-2
Tag %1 and %2 as both input and output operands
2017-12-31 20:18:48 +01:00
Martin Kroeker
b973990df2 Tag %1 and %2 as both input and output operands
fix from #1292 extended to the other gemv microkernels
2017-12-31 18:03:36 +01:00
Martin Kroeker
8fef2414b5 Merge pull request #1408 from xoviat/flang-ninja
Appveyor: speed up fortran builds
2017-12-30 14:52:21 +01:00
Martin Kroeker
1e31124eb0 Merge pull request #1406 from martin-frbg/issue1292
Tag %1 and %2 as both input and output
2017-12-30 14:52:03 +01:00
Martin Kroeker
cc9500db41 Merge pull request #1403 from brada4/develop
Address few more warnings
2017-12-30 14:51:34 +01:00
xoviat
b0652184ae Appveyor: enable building fortran with ninja 2017-12-29 19:58:35 -06:00
Martin Kroeker
723f396a20 Tag %1 and %2 as both input and output
The inline assembly modifies its input operands, so mark them as output to avoid surprises with optimization. Fixes #1292
2017-12-29 23:56:41 +01:00
Andrew
03e5ff0687 initialize potentially uninitialized variables (clang5) 2017-12-26 09:24:24 +01:00
Andrew
47deec2c1a fix couple of dead assignment warnings 2017-12-22 00:56:35 +01:00
Andrew
bfc2a88594 remove unused buffer 2017-12-22 00:55:40 +01:00
Martin Kroeker
d741fc13d8 Merge pull request #1399 from martin-frbg/issue1398
Fix LAPACKE build problems with both cmake and make
2017-12-21 23:36:52 +01:00
Martin Kroeker
374260027d Add conditionals around ar calls for optional modules
The macOS ar aborts when it gets called with no input, see #1398
2017-12-21 20:42:30 +01:00
Martin Kroeker
599de9e598 Restore LAPACKE files for Xgeqpf, Xggsvd and Xggsvp
These were inadvertently dropped from the list in my PR #1095
2017-12-21 19:43:09 +01:00
Martin Kroeker
893bd14e92 Merge pull request #1393 from martin-frbg/daxpybug
Retire Piledriver/Steamroller/Excavator daxpy microkernels as well
2017-12-13 20:27:14 +01:00
Martin Kroeker
43c0622e7b Retire Piledriver/Steamroller/Excavator daxpy microkernels as well
related to issue #1332
2017-12-13 18:40:39 +01:00
Martin Kroeker
6aba7b66ce Merge pull request #1390 from martin-frbg/daxpybug
Use Sandybridge daxpy kernel on Haswell and Zen for now
2017-12-10 21:46:36 +01:00
Martin Kroeker
0623636c98 Use Sandybridge daxpy kernel on Haswell and Zen for now
The testcase from #1332 exposes a problem in daxpy_microk_haswell-2.c that is not seen with
any of the other Intel x86_64 microkernels.
2017-12-10 19:24:31 +01:00
Martin Kroeker
177b78c8b4 Issue1388 (#1389)
* Calculation of chunk range limits was ignoring num_cpu

bug introduced by me in #1262 - should fix #1388

* Calculation of range limits was ignoring num_cpu

bug introduced by me in #1262
2017-12-09 22:29:03 +01:00
Andrew
281a2b952f warning cleanup (#1380)
* dead increments in driver/level2

* dead increments in kernel/generic

* part dead increments in kernel/x86_64
2017-12-05 19:54:10 +01:00
Martin Kroeker
c49c6b237d Merge pull request #1382 from martin-frbg/dtrmv-1332
Work around errors in multithreaded dtrmv
2017-12-05 19:53:23 +01:00
Martin Kroeker
e2469a9ebc Merge pull request #1386 from martin-frbg/bignuma
Limit MAX_CPU to 1024 for now
2017-12-05 19:52:52 +01:00
Martin Kroeker
5b71f3a8e4 Merge pull request #1387 from martin-frbg/cmakeandroid
Explicitly link against libm on Android with cmake as well
2017-12-05 19:52:03 +01:00
Martin Kroeker
9381ac2748 Explicitly link against libm on Android with cmake as well
Patch from #1384
2017-12-05 13:02:48 +01:00
Martin Kroeker
28ae3ca76f Limit MAX_CPU to 1024 for now
Some Linux distributions (notably SuSE) have raised CPU_SETSIZE to 4096, apparently disregarding API limitations.
From #1348, the highest value to survive array initialization (on a desktop system) is 3232, and 1024, which is the
more usual CPU_SETSIZE limit, was demonstrated to work fine on an actual bignuma system.
2017-12-05 12:54:15 +01:00
Martin Kroeker
b414283f48 Disable gemv unrolling
as a (hopefully temporary) workaround for #1332
2017-12-03 22:41:54 +01:00
Martin Kroeker
38763ec4f3 Disable multithreading for trmv
as a (hopefully temporary) workaround for #1332
2017-12-03 22:40:54 +01:00
Martin Kroeker
452fbef0bf Merge pull request #1381 from martin-frbg/ctest-warnings
Fix compiler warnings in ctest
2017-12-03 21:35:20 +01:00
Martin Kroeker
8c8313983b Fix compiler warnings in ctest
Various fixes for const correctness, stray tab characters and unused labels
2017-12-03 18:19:30 +01:00
Martin Kroeker
881a50c093 Merge pull request #1379 from martin-frbg/warnfix
Work around compiler warnings for unused variables
2017-12-03 13:04:02 +01:00
Martin Kroeker
8213385ab8 Work around compiler warnings for unused variables in the generic zgemm3m_Xcopy kernels 2017-12-02 22:51:58 +01:00
Martin Kroeker
bede1c4fb4 Merge pull request #1372 from martin-frbg/param
Correct zgeadd_k prototype
2017-12-02 16:49:47 +01:00
Martin Kroeker
1d2da67841 Prefix make jobs with travis_wait (#1378)
* Prefix make with travis_wait to prevent it getting killed for producing no output

* Extend travis_wait to 30mins for the windows build

* Trying 45 mins wait time

* Increase travis_wait time to 45 minutes for linux builds as well
2017-12-02 12:59:27 +01:00
Martin Kroeker
0dc291d3fa Merge pull request #1377 from isuruf/threads
Allow overriding NUM_THREADS in cmake
2017-12-01 16:22:35 +01:00
Isuru Fernando
e0ddd7d124 Allow overriding NUM_THREADS 2017-12-01 01:42:45 -06:00
Martin Kroeker
adf4316f0e Merge pull request #1376 from xoviat/patch-2
[appveyor] fix test directory
2017-12-01 08:11:12 +01:00
xoviat
7fce11a5b8 [appveyor] fix test directory 2017-11-30 16:31:09 -06:00
Martin Kroeker
c40f01ccea Merge pull request #1375 from xoviat/patch-1
[appveyor] Use out-of-tree build and cache
2017-11-30 22:43:54 +01:00
xoviat
c567e34e6b [appveyor] fix syntax 2017-11-30 15:33:32 -06:00
xoviat
c917278d23 [appveyor] Use out-of-tree build and cache 2017-11-30 15:30:10 -06:00
Martin Kroeker
0639ed1258 Merge pull request #1373 from mc10/patch-1
README: Use the SVG Travis badge
2017-11-30 12:54:52 +01:00
Kevin Ji
f017e169dc README: Use the SVG Travis badge 2017-11-29 15:21:12 -08:00
Martin Kroeker
7e860acd38 Correct zgeadd_k prototype 2017-11-29 19:57:35 +01:00
Martin Kroeker
db00a51e6b Merge pull request #1371 from martin-frbg/develop
Add trivially optimized DSDOT for POWER8
2017-11-29 19:55:21 +01:00
martin
7a4b3cfbf8 Add trivially optimized DSDOT for POWER8 2017-11-28 18:38:07 +01:00
Martin Kroeker
6c77b5f267 Merge pull request #1369 from martin-frbg/dsdot
Add optimized dsdot to all other x86_64 kernels that use sdot.c
2017-11-28 18:15:31 +01:00
Martin Kroeker
d8b3c3c7db Merge pull request #1368 from brada4/develop
Eliminate warnings
2017-11-28 18:15:04 +01:00
Martin Kroeker
beb18492fd Merge pull request #1366 from martin-frbg/develop
Update LAPACK to 3.8.0
2017-11-26 19:12:00 +01:00