Martin Kroeker
1e31124eb0
Merge pull request #1406 from martin-frbg/issue1292
...
Tag %1 and %2 as both input and output
2017-12-30 14:52:03 +01:00
Martin Kroeker
cc9500db41
Merge pull request #1403 from brada4/develop
...
Address a few more warnings
2017-12-30 14:51:34 +01:00
xoviat
b0652184ae
Appveyor: enable building fortran with ninja
2017-12-29 19:58:35 -06:00
Martin Kroeker
723f396a20
Tag %1 and %2 as both input and output
...
The inline assembly modifies its input operands, so mark them as output to avoid surprises with optimization. Fixes #1292
2017-12-29 23:56:41 +01:00
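
The point about operands that the assembly itself modifies can be illustrated with a minimal sketch. It is not the kernel touched by this commit: `sum_longs` is a hypothetical helper, and the sketch assumes GCC-style extended asm on an LP64 x86_64 target (a plain C loop is used elsewhere). Operands `%1` and `%2` are advanced and decremented inside the asm, so they are declared with the `"+r"` read-write constraint rather than as inputs only.

```c
#include <assert.h>

/* Hypothetical helper, not taken from OpenBLAS: returns x[0] + ... + x[n-1]. */
static long sum_longs(const long *x, long n)
{
    long acc = 0;
    assert(n >= 0);
#if defined(__x86_64__) && defined(__GNUC__) && defined(__LP64__)
    if (n == 0)
        return 0;
    __asm__ volatile (
        "1:                 \n\t"
        "addq (%1), %0      \n\t"   /* acc += *x                      */
        "addq $8, %1        \n\t"   /* advance x: operand 1 is written */
        "subq $1, %2        \n\t"   /* decrement n: operand 2 is written */
        "jnz 1b             \n\t"
        : "+r" (acc), "+r" (x), "+r" (n)   /* all three are read AND written */
        :                                  /* no input-only operands */
        : "cc", "memory"
    );
#else
    /* Portable fallback so the sketch builds on other targets. */
    for (long i = 0; i < n; i++)
        acc += x[i];
#endif
    return acc;
}
```

If `x` and `n` were listed as input-only operands, the compiler would be entitled to assume their registers still hold the original values after the asm statement, which is the kind of surprise under optimization that the commit message refers to.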
Andrew
03e5ff0687
initialize potentially uninitialized variables (clang5)
2017-12-26 09:24:24 +01:00
Andrew
47deec2c1a
fix a couple of dead assignment warnings
2017-12-22 00:56:35 +01:00
Andrew
bfc2a88594
remove unused buffer
2017-12-22 00:55:40 +01:00
Martin Kroeker
d741fc13d8
Merge pull request #1399 from martin-frbg/issue1398
...
Fix LAPACKE build problems with both cmake and make
2017-12-21 23:36:52 +01:00
Martin Kroeker
374260027d
Add conditionals around ar calls for optional modules
...
The macOS ar aborts when it gets called with no input, see #1398
2017-12-21 20:42:30 +01:00
Martin Kroeker
599de9e598
Restore LAPACKE files for Xgeqpf, Xggsvd and Xggsvp
...
These were inadvertently dropped from the list in my PR #1095
2017-12-21 19:43:09 +01:00
Martin Kroeker
893bd14e92
Merge pull request #1393 from martin-frbg/daxpybug
...
Retire Piledriver/Steamroller/Excavator daxpy microkernels as well
2017-12-13 20:27:14 +01:00
Martin Kroeker
43c0622e7b
Retire Piledriver/Steamroller/Excavator daxpy microkernels as well
...
related to issue #1332
2017-12-13 18:40:39 +01:00
Martin Kroeker
6aba7b66ce
Merge pull request #1390 from martin-frbg/daxpybug
...
Use Sandybridge daxpy kernel on Haswell and Zen for now
2017-12-10 21:46:36 +01:00
Martin Kroeker
0623636c98
Use Sandybridge daxpy kernel on Haswell and Zen for now
...
The testcase from #1332 exposes a problem in daxpy_microk_haswell-2.c that is not seen with any of the other Intel x86_64 microkernels.
2017-12-10 19:24:31 +01:00
Martin Kroeker
177b78c8b4
Issue1388 ( #1389 )
...
* Calculation of chunk range limits was ignoring num_cpu
bug introduced by me in #1262 - should fix #1388
* Calculation of range limits was ignoring num_cpu
bug introduced by me in #1262
2017-12-09 22:29:03 +01:00
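
Why a chunk-range calculation needs to account for `num_cpu` can be sketched as follows. This is an illustration under stated assumptions, not the code changed in #1389: `split_range` and its parameters are hypothetical names. Each chunk width is computed from the work still remaining divided by the workers not yet assigned, so the count of chunks already handed out (`num_cpu`) enters every step.

```c
#include <assert.h>

/* Hypothetical helper, not the OpenBLAS routine: splits [0, m) into at most
 * nthreads contiguous chunks. range[] must have room for nthreads + 1 entries.
 * Chunk i covers [range[i], range[i+1]). Returns the number of chunks used. */
static int split_range(long m, int nthreads, long *range)
{
    int num_cpu = 0;          /* chunks assigned so far */
    long remaining = m;       /* elements not yet assigned */

    assert(m >= 0 && nthreads > 0);
    range[0] = 0;
    while (remaining > 0) {
        int left = nthreads - num_cpu;                /* workers still free */
        long width = (remaining + left - 1) / left;   /* ceil(remaining / left) */
        range[num_cpu + 1] = range[num_cpu] + width;
        remaining -= width;
        num_cpu++;
    }
    return num_cpu;
}
```

With `m = 10` and `nthreads = 4` the limits come out as 0, 3, 6, 8, 10. Dividing by `nthreads` at every step instead of by `nthreads - num_cpu` would ignore the chunks already assigned and could leave part of the range uncovered once all workers have been used.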
Andrew
281a2b952f
warning cleanup ( #1380 )
...
* dead increments in driver/level2
* dead increments in kernel/generic
* part dead increments in kernel/x86_64
2017-12-05 19:54:10 +01:00
Martin Kroeker
c49c6b237d
Merge pull request #1382 from martin-frbg/dtrmv-1332
...
Work around errors in multithreaded dtrmv
2017-12-05 19:53:23 +01:00
Martin Kroeker
e2469a9ebc
Merge pull request #1386 from martin-frbg/bignuma
...
Limit MAX_CPU to 1024 for now
2017-12-05 19:52:52 +01:00
Martin Kroeker
5b71f3a8e4
Merge pull request #1387 from martin-frbg/cmakeandroid
...
Explicitly link against libm on Android with cmake as well
2017-12-05 19:52:03 +01:00
Martin Kroeker
9381ac2748
Explicitly link against libm on Android with cmake as well
...
Patch from #1384
2017-12-05 13:02:48 +01:00
Martin Kroeker
28ae3ca76f
Limit MAX_CPU to 1024 for now
...
Some Linux distributions (notably SuSE) have raised CPU_SETSIZE to 4096, apparently disregarding API limitations.
From #1348, the highest value to survive array initialization (on a desktop system) is 3232, and 1024, which is the
more usual CPU_SETSIZE limit, was demonstrated to work fine on an actual bignuma system.
2017-12-05 12:54:15 +01:00
Martin Kroeker
b414283f48
Disable gemv unrolling
...
as a (hopefully temporary) workaround for #1332
2017-12-03 22:41:54 +01:00
Martin Kroeker
38763ec4f3
Disable multithreading for trmv
...
as a (hopefully temporary) workaround for #1332
2017-12-03 22:40:54 +01:00
Martin Kroeker
452fbef0bf
Merge pull request #1381 from martin-frbg/ctest-warnings
...
Fix compiler warnings in ctest
2017-12-03 21:35:20 +01:00
Martin Kroeker
8c8313983b
Fix compiler warnings in ctest
...
Various fixes for const correctness, stray tab characters and unused labels
2017-12-03 18:19:30 +01:00
Martin Kroeker
881a50c093
Merge pull request #1379 from martin-frbg/warnfix
...
Work around compiler warnings for unused variables
2017-12-03 13:04:02 +01:00
Martin Kroeker
8213385ab8
Work around compiler warnings for unused variables in the generic zgemm3m_Xcopy kernels
2017-12-02 22:51:58 +01:00
Martin Kroeker
bede1c4fb4
Merge pull request #1372 from martin-frbg/param
...
Correct zgeadd_k prototype
2017-12-02 16:49:47 +01:00
Martin Kroeker
1d2da67841
Prefix make jobs with travis_wait ( #1378 )
...
* Prefix make with travis_wait to prevent it getting killed for producing no output
* Extend travis_wait to 30mins for the windows build
* Trying 45 mins wait time
* Increase travis_wait time to 45 minutes for linux builds as well
2017-12-02 12:59:27 +01:00
Martin Kroeker
0dc291d3fa
Merge pull request #1377 from isuruf/threads
...
Allow overriding NUM_THREADS in cmake
2017-12-01 16:22:35 +01:00
Isuru Fernando
e0ddd7d124
Allow overriding NUM_THREADS
2017-12-01 01:42:45 -06:00
Martin Kroeker
adf4316f0e
Merge pull request #1376 from xoviat/patch-2
...
[appveyor] fix test directory
2017-12-01 08:11:12 +01:00
xoviat
7fce11a5b8
[appveyor] fix test directory
2017-11-30 16:31:09 -06:00
Martin Kroeker
c40f01ccea
Merge pull request #1375 from xoviat/patch-1
...
[appveyor] Use out-of-tree build and cache
2017-11-30 22:43:54 +01:00
xoviat
c567e34e6b
[appveyor] fix syntax
2017-11-30 15:33:32 -06:00
xoviat
c917278d23
[appveyor] Use out-of-tree build and cache
2017-11-30 15:30:10 -06:00
Martin Kroeker
0639ed1258
Merge pull request #1373 from mc10/patch-1
...
README: Use the SVG Travis badge
2017-11-30 12:54:52 +01:00
Kevin Ji
f017e169dc
README: Use the SVG Travis badge
2017-11-29 15:21:12 -08:00
Martin Kroeker
7e860acd38
Correct zgeadd_k prototype
2017-11-29 19:57:35 +01:00
Martin Kroeker
db00a51e6b
Merge pull request #1371 from martin-frbg/develop
...
Add trivially optimized DSDOT for POWER8
2017-11-29 19:55:21 +01:00
martin
7a4b3cfbf8
Add trivially optimized DSDOT for POWER8
2017-11-28 18:38:07 +01:00
Martin Kroeker
6c77b5f267
Merge pull request #1369 from martin-frbg/dsdot
...
Add optimized dsdot to all other x86_64 kernels that use sdot.c
2017-11-28 18:15:31 +01:00
Martin Kroeker
d8b3c3c7db
Merge pull request #1368 from brada4/develop
...
Eliminate warnings
2017-11-28 18:15:04 +01:00
Martin Kroeker
beb18492fd
Merge pull request #1366 from martin-frbg/develop
...
Update LAPACK to 3.8.0
2017-11-26 19:12:00 +01:00
Andrew
441a9c8385
more dead increments (clang4 scan-build deadcode.DeadStores)
2017-11-26 17:24:08 +01:00
Andrew
1236dbe5a6
Eliminate 2-8 dead increments code
2017-11-26 13:26:11 +01:00
Andrew
ef95cd471f
eliminate unread variable, after reiteration 3 of them (clang4)
2017-11-25 02:54:37 +01:00
Martin Kroeker
c92cd6d162
Add trivially optimized dsdot based on sdot
2017-11-24 20:05:27 +01:00
Martin Kroeker
cae5d9a20b
Add trivially optimized dsdot based on sdot
2017-11-24 20:04:29 +01:00
Martin Kroeker
3d891c3106
Add trivially optimized dsdot based on sdot
2017-11-24 20:03:40 +01:00