Commit Graph

7452 Commits

Author SHA1 Message Date
Martin Kroeker 1e31124eb0
Merge pull request #1406 from martin-frbg/issue1292
Tag %1 and %2 as both input and output
2017-12-30 14:52:03 +01:00
Martin Kroeker cc9500db41
Merge pull request #1403 from brada4/develop
Address a few more warnings
2017-12-30 14:51:34 +01:00
xoviat b0652184ae Appveyor: enable building fortran with ninja 2017-12-29 19:58:35 -06:00
Martin Kroeker 723f396a20
Tag %1 and %2 as both input and output
The inline assembly modifies its input operands, so mark them as output to avoid surprises with optimization. Fixes #1292
2017-12-29 23:56:41 +01:00
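The issue behind #1292 is a general property of GCC-style extended inline assembly. An operand listed only as an input is assumed to be unchanged after the asm statement, so if the assembly actually modifies it, the optimizer may go on using a stale value. The sketch below is a hypothetical x86_64 fragment, not the OpenBLAS kernel touched by this commit. `advance` is an illustrative name, and the sketch falls back to plain C on other targets.

```c
#include <assert.h>

/* Hypothetical example: move a pointer forward n doubles inside inline asm.
 * The asm writes to both operands, so both use the read-write "+r"
 * constraint. Declaring them as plain inputs ("r") would let the compiler
 * assume they still hold their original values afterwards. */
static const double *advance(const double *x, long n)
{
    assert(n >= 0);
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ (
        "test %1, %1      \n\t"  /* skip the loop if n == 0 */
        "jz 2f            \n\t"
        "1:               \n\t"
        "add $8, %0       \n\t"  /* x++ (sizeof(double) == 8 on x86_64) */
        "dec %1           \n\t"  /* n-- */
        "jnz 1b           \n\t"
        "2:               \n\t"
        : "+r" (x), "+r" (n)     /* %0 and %1 are both input and output */
        :
        : "cc");                 /* the flags register is modified */
#else
    x += n;  /* portable fallback with the same result */
#endif
    return x;
}
```

With `"+r"`, the compiler knows that `x` and `n` hold new values after the asm statement and will not reuse copies it made before it.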
Andrew 03e5ff0687 initialize potentially uninitialized variables (clang5) 2017-12-26 09:24:24 +01:00
Andrew 47deec2c1a fix a couple of dead assignment warnings 2017-12-22 00:56:35 +01:00
Andrew bfc2a88594 remove unused buffer 2017-12-22 00:55:40 +01:00
Martin Kroeker d741fc13d8
Merge pull request #1399 from martin-frbg/issue1398
Fix LAPACKE build problems with both cmake and make
2017-12-21 23:36:52 +01:00
Martin Kroeker 374260027d
Add conditionals around ar calls for optional modules
The macOS ar aborts when it gets called with no input, see #1398
2017-12-21 20:42:30 +01:00
Martin Kroeker 599de9e598
Restore LAPACKE files for Xgeqpf, Xggsvd and Xggsvp
These were inadvertently dropped from the list in my PR #1095
2017-12-21 19:43:09 +01:00
Martin Kroeker 893bd14e92
Merge pull request #1393 from martin-frbg/daxpybug
Retire Piledriver/Steamroller/Excavator daxpy microkernels as well
2017-12-13 20:27:14 +01:00
Martin Kroeker 43c0622e7b
Retire Piledriver/Steamroller/Excavator daxpy microkernels as well
related to issue #1332
2017-12-13 18:40:39 +01:00
Martin Kroeker 6aba7b66ce
Merge pull request #1390 from martin-frbg/daxpybug
Use Sandybridge daxpy kernel on Haswell and Zen for now
2017-12-10 21:46:36 +01:00
Martin Kroeker 0623636c98
Use Sandybridge daxpy kernel on Haswell and Zen for now
The test case from #1332 exposes a problem in daxpy_microk_haswell-2.c that is not seen with any of the other Intel x86_64 microkernels.
2017-12-10 19:24:31 +01:00
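daxpy computes y := alpha * x + y. When a SIMD microkernel such as daxpy_microk_haswell-2.c is suspected, a plain scalar loop is a convenient baseline for comparing results. The following is a minimal sketch; `daxpy_ref` is a hypothetical name, and only positive increments are handled (reference BLAS also accepts negative ones).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference for daxpy: y := alpha * x + y.
 * Only positive strides are supported in this sketch. */
static void daxpy_ref(size_t n, double alpha,
                      const double *x, size_t incx,
                      double *y, size_t incy)
{
    assert(incx > 0 && incy > 0);
    if (n == 0 || alpha == 0.0)
        return;  /* quick return as in reference BLAS: y is unchanged */
    for (size_t i = 0; i < n; i++)
        y[i * incy] += alpha * x[i * incx];
}
```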
Martin Kroeker 177b78c8b4
Issue1388 (#1389)
* Calculation of chunk range limits was ignoring num_cpu

bug introduced by me in #1262 - should fix #1388

* Calculation of range limits was ignoring num_cpu

bug introduced by me in #1262

* Calculation of chunk range limits was ignoring num_cpu

bug introduced by me in #1262
2017-12-09 22:29:03 +01:00
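The actual OpenBLAS range calculation is not shown in this log. As a hypothetical illustration of why the thread count has to enter the boundary computation, `split_range` below divides `[0, n)` into `num_cpu` contiguous chunks. Each chunk gets `n / num_cpu` elements, and the remainder is spread over the first chunks so that the final boundary lands exactly on `n`.

```c
#include <assert.h>

/* Hypothetical sketch: split [0, n) into num_cpu contiguous chunks.
 * Thread i owns the half-open interval [range[i], range[i + 1]).
 * range must have room for num_cpu + 1 entries. */
static void split_range(long n, int num_cpu, long *range)
{
    assert(n >= 0 && num_cpu > 0);
    long base = n / num_cpu;   /* minimum chunk size */
    long extra = n % num_cpu;  /* the first `extra` chunks get one more */
    range[0] = 0;
    for (int i = 0; i < num_cpu; i++)
        range[i + 1] = range[i] + base + (i < extra ? 1 : 0);
}
```

If `num_cpu` were left out of this calculation, the boundaries would no longer match the number of threads actually started.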
Andrew 281a2b952f warning cleanup (#1380)
* dead increments in driver/level2

* dead increments in kernel/generic

* part dead increments in kernel/x86_64
2017-12-05 19:54:10 +01:00
Martin Kroeker c49c6b237d
Merge pull request #1382 from martin-frbg/dtrmv-1332
Work around errors in multithreaded dtrmv
2017-12-05 19:53:23 +01:00
Martin Kroeker e2469a9ebc
Merge pull request #1386 from martin-frbg/bignuma
Limit MAX_CPU to 1024 for now
2017-12-05 19:52:52 +01:00
Martin Kroeker 5b71f3a8e4
Merge pull request #1387 from martin-frbg/cmakeandroid
Explicitly link against libm on Android with cmake as well
2017-12-05 19:52:03 +01:00
Martin Kroeker 9381ac2748
Explicitly link against libm on Android with cmake as well
Patch from #1384
2017-12-05 13:02:48 +01:00
Martin Kroeker 28ae3ca76f
Limit MAX_CPU to 1024 for now
Some Linux distributions (notably SuSE) have raised CPU_SETSIZE to 4096, apparently disregarding API limitations.
From #1348, the highest value to survive array initialization (on a desktop system) is 3232, and 1024, which is the more usual CPU_SETSIZE limit, was demonstrated to work fine on an actual bignuma system.
2017-12-05 12:54:15 +01:00
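How OpenBLAS itself derives MAX_CPU is not shown here. A minimal sketch of the idea, assuming a POSIX system with `<sched.h>`, is to take CPU_SETSIZE when the platform defines it (glibc does so under `_GNU_SOURCE`) and clamp it to 1024. `MAX_CPU_CAP`, `cpu_table` and `cpu_in_range` are hypothetical names.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* exposes CPU_SETSIZE in glibc's <sched.h> */
#endif
#include <assert.h>
#include <sched.h>

/* Hypothetical cap mirroring "Limit MAX_CPU to 1024 for now". */
#define MAX_CPU_CAP 1024

/* Use CPU_SETSIZE if it is smaller than the cap; otherwise clamp to the
 * cap, e.g. on distributions that raised CPU_SETSIZE to 4096. */
#if defined(CPU_SETSIZE) && CPU_SETSIZE < MAX_CPU_CAP
#define MAX_CPU CPU_SETSIZE
#else
#define MAX_CPU MAX_CPU_CAP
#endif

static_assert(MAX_CPU <= MAX_CPU_CAP, "MAX_CPU must not exceed the cap");

/* Per-CPU bookkeeping sized by the clamped limit. */
static int cpu_table[MAX_CPU];

/* Nonzero if cpu is a valid index into cpu_table. */
static int cpu_in_range(int cpu)
{
    return cpu >= 0 && cpu < MAX_CPU;
}
```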
Martin Kroeker b414283f48
Disable gemv unrolling
as a (hopefully temporary) workaround for #1332
2017-12-03 22:41:54 +01:00
Martin Kroeker 38763ec4f3
Disable multithreading for trmv
as a (hopefully temporary) workaround for #1332
2017-12-03 22:40:54 +01:00
Martin Kroeker 452fbef0bf
Merge pull request #1381 from martin-frbg/ctest-warnings
Fix compiler warnings in ctest
2017-12-03 21:35:20 +01:00
Martin Kroeker 8c8313983b
Fix compiler warnings in ctest
Various fixes for const correctness, stray tab characters and unused labels
2017-12-03 18:19:30 +01:00
Martin Kroeker 881a50c093
Merge pull request #1379 from martin-frbg/warnfix
Work around compiler warnings for unused variables
2017-12-03 13:04:02 +01:00
Martin Kroeker 8213385ab8 Work around compiler warnings for unused variables in the generic zgemm3m_Xcopy kernels 2017-12-02 22:51:58 +01:00
Martin Kroeker bede1c4fb4
Merge pull request #1372 from martin-frbg/param
Correct zgeadd_k prototype
2017-12-02 16:49:47 +01:00
Martin Kroeker 1d2da67841
Prefix make jobs with travis_wait (#1378)
* Prefix make with travis_wait to prevent it getting killed for producing no output

* Extend travis_wait to 30 minutes for the Windows build

* Try a 45-minute wait time

* Increase travis_wait time to 45 minutes for linux builds as well
2017-12-02 12:59:27 +01:00
Martin Kroeker 0dc291d3fa
Merge pull request #1377 from isuruf/threads
Allow overriding NUM_THREADS in cmake
2017-12-01 16:22:35 +01:00
Isuru Fernando e0ddd7d124 Allow overriding NUM_THREADS 2017-12-01 01:42:45 -06:00
Martin Kroeker adf4316f0e
Merge pull request #1376 from xoviat/patch-2
[appveyor] fix test directory
2017-12-01 08:11:12 +01:00
xoviat 7fce11a5b8 [appveyor] fix test directory 2017-11-30 16:31:09 -06:00
Martin Kroeker c40f01ccea
Merge pull request #1375 from xoviat/patch-1
[appveyor] Use out-of-tree build and cache
2017-11-30 22:43:54 +01:00
xoviat c567e34e6b [appveyor] fix syntax 2017-11-30 15:33:32 -06:00
xoviat c917278d23 [appveyor] Use out-of-tree build and cache 2017-11-30 15:30:10 -06:00
Martin Kroeker 0639ed1258
Merge pull request #1373 from mc10/patch-1
README: Use the SVG Travis badge
2017-11-30 12:54:52 +01:00
Kevin Ji f017e169dc README: Use the SVG Travis badge 2017-11-29 15:21:12 -08:00
Martin Kroeker 7e860acd38 Correct zgeadd_k prototype 2017-11-29 19:57:35 +01:00
Martin Kroeker db00a51e6b
Merge pull request #1371 from martin-frbg/develop
Add trivially optimized DSDOT for POWER8
2017-11-29 19:55:21 +01:00
martin 7a4b3cfbf8 Add trivially optimized DSDOT for POWER8 2017-11-28 18:38:07 +01:00
Martin Kroeker 6c77b5f267
Merge pull request #1369 from martin-frbg/dsdot
Add optimized dsdot to all other x86_64 kernels that use sdot.c
2017-11-28 18:15:31 +01:00
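In reference BLAS, DSDOT takes two single-precision vectors but accumulates their dot product in double precision. The following minimal sketch shows those semantics, assuming positive strides. `dsdot_ref` is a hypothetical name; it is not the optimized sdot-based kernel added in these commits, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference sketch of DSDOT: float inputs, with each product
 * formed and summed in double precision. Positive strides only. */
static double dsdot_ref(size_t n, const float *x, size_t incx,
                        const float *y, size_t incy)
{
    assert(incx > 0 && incy > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)x[i * incx] * (double)y[i * incy];
    return sum;
}
```

The double accumulator matters for large values. 2^24 + 1 (16777217) is exactly representable as a double, but a float accumulator would round it back to 16777216.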
Martin Kroeker d8b3c3c7db
Merge pull request #1368 from brada4/develop
Eliminate warnings
2017-11-28 18:15:04 +01:00
Martin Kroeker beb18492fd
Merge pull request #1366 from martin-frbg/develop
Update LAPACK to 3.8.0
2017-11-26 19:12:00 +01:00
Andrew 441a9c8385 more dead increments (clang4 scan-build deadcode.deadstores) 2017-11-26 17:24:08 +01:00
Andrew 1236dbe5a6 Eliminate 2-8 dead increments code 2017-11-26 13:26:11 +01:00
Andrew ef95cd471f eliminate unread variables, 3 of them after reiteration (clang4) 2017-11-25 02:54:37 +01:00
Martin Kroeker c92cd6d162 Add trivially optimized dsdot based on sdot 2017-11-24 20:05:27 +01:00
Martin Kroeker cae5d9a20b Add trivially optimized dsdot based on sdot 2017-11-24 20:04:29 +01:00
Martin Kroeker 3d891c3106 Add trivially optimized dsdot based on sdot 2017-11-24 20:03:40 +01:00