Martin Kroeker
3f7720ec4b
LAPACKE: fix wrong number of columns in ?ormlq
...
Copied from lapack https://github.com/Reference-LAPACK/lapack/pull/127 by vladimir-ch (with earlier changes from echeresh's
PR 115 "lapacke_*ormlq_work: move declarations under if" there as they touched some of the same files)
2017-02-20 16:20:43 +01:00
Ashwin Sekhar T K
faba876fda
THUNDERX2T99: Bug fix in C/Z IAMAX
2017-02-19 23:11:50 -08:00
Ashwin Sekhar T K
172a62d73e
THUNDERX2T99: Add Optimized C/Z IAMAX Implementation
2017-02-17 03:06:32 -08:00
Martin Kroeker
e545a66a5b
Merge pull request #1091 from staticfloat/sf/corei5_7600k
...
CPUID mappings for Core i5-7600K (Kaby Lake)
2017-02-17 10:30:09 +01:00
Ashwin Sekhar T K
228c75a69c
THUNDERX2T99: Add parallel SCNRM2 Implementation
2017-02-14 04:10:06 -08:00
Martin Kroeker
9e2f316ede
Power8 inline assembly fixes
...
Quoting patch author amodra from #1078:
Lots of issues here.
- The vsx regs weren't listed as clobbered.
- Poor choice of vsx regs, which along with the lack of clobbers led to
  trashing v0..v21 and fr14..fr23. Ideally you'd let gcc choose all
  temp vsx regs, but asms currently have a limit of 30 i/o parms.
- Other regs were clobbered unnecessarily, seemingly in an attempt to
  clobber inputs, with gcc-7 complaining about the clobber of r2.
  (Changed inputs should be also listed as outputs or as an i/o.)
- "r" constraint used instead of "b" for gprs used in insns where the
  r0 encoding means zero rather than r0.
- There were unused asm inputs too.
- All memory was clobbered rather than hooking up memory outputs with
  proper memory constraints, and that and the lack of proper memory
  input constraints meant the asms needed to be volatile and their
  containing function noinline.
- Some parameters were being passed unnecessarily via memory.
- When a copy of a
2017-02-13 23:38:50 +01:00
Martin Kroeker
e2489c9a92
Merge pull request #1096 from martin-frbg/pkg-config
...
Build only openblas.pc for pkg-config and install it from cmake as well
2017-02-12 17:00:17 +01:00
Martin Kroeker
c4ea9eea67
Add cmake template for openblas.pc
2017-02-12 14:38:32 +01:00
Martin Kroeker
cd8f80634f
Create and install openblas.pc in cmake builds
2017-02-12 14:37:33 +01:00
Martin Kroeker
faf06f0d8b
Create and install only a single openblas.pc file
2017-02-12 14:35:48 +01:00
Martin Kroeker
c6fa4aef0c
Rename blas.pc.in to openblas.pc.in
2017-02-12 14:34:03 +01:00
Martin Kroeker
1029dcd60d
Merge pull request #1095 from martin-frbg/lapack370-cmake
...
Update cmakefiles for netlib 3.7.0
2017-02-12 14:30:29 +01:00
Martin Kroeker
d12c8bbcbb
Add zlasyf_aa to lapack.cmake
2017-02-12 13:49:49 +01:00
Martin Kroeker
15f0d65010
Add another bunch of lapack 3.7 functions to cmake list
2017-02-12 01:59:30 +01:00
Martin Kroeker
7d831af1ba
Add LAPACK 3.7 files not mentioned in announcement
2017-02-12 01:37:35 +01:00
Martin Kroeker
ee3e87cf46
Update cmake file list for lapacke 3.7.0
2017-02-12 00:40:16 +01:00
Martin Kroeker
8772c00bb0
Update cmake file list for lapack 3.7.0
2017-02-11 23:11:26 +01:00
Martin Kroeker
0a4a7e18f6
Merge pull request #1094 from martin-frbg/cmake-1
...
Update cmakefiles with changes from netlib 3.6.1
2017-02-11 20:48:41 +01:00
Martin Kroeker
357ef3cd8c
Reflect name change of lapacke_mangling.h template
2017-02-11 19:56:02 +01:00
Martin Kroeker
002e646476
Add new functions from LAPACK 3.6.1
2017-02-11 19:54:02 +01:00
Martin Kroeker
3dad87bbb5
Merge pull request #1093 from martin-frbg/restore-cmakeinstall
...
Restore cmake install target
2017-02-11 17:41:39 +01:00
Martin Kroeker
bdd51cdabc
Add cmake install target
...
Add CMake install target (based on the patch provided by PrimarchOfTheSpaceWolves in #957).
This was originally merged as #988 but accidentally reverted by my subsequent PR the following day.
2017-02-11 16:43:46 +01:00
Elliot Saba
1d8ab99e09
Add `exfamily == 9` case (Kaby Lake) to dynamic arch detection
2017-02-10 15:23:55 -08:00
Elliot Saba
04b2b06665
CPUID mappings for Core i5-7600K (Kaby Lake)
2017-02-10 14:53:15 -08:00
Martin Kroeker
8a83daf4bf
Merge pull request #1084 from isuruf/develop
...
Install pkg-config files
2017-02-08 01:01:18 +01:00
Martin Kroeker
39abb079fb
Merge pull request #1087 from grisuthedragon/enable-a12
...
Enable EXCAVATOR kernels for A12-9800
2017-02-08 01:00:32 +01:00
Martin Koehler
76c6e33e54
Enable EXCAVATOR kernels for A12-9800
2017-02-07 21:38:28 +01:00
Martin Kroeker
a9594e8072
Merge pull request #1085 from vladimir-ch/lapacke_laswp_work
...
LAPACKE: fix incorrect value of lda_t in lapacke_?laswp_work
2017-02-07 11:40:41 +01:00
Ashwin Sekhar T K
8e89668f62
THUNDERX2T99: Fix bug in SNRM2
2017-02-07 02:14:33 -08:00
Ashwin Sekhar T K
f63deae9de
THUNDERX2T99: Add Optimized S/D IAMAX Implementation
2017-02-07 01:35:55 -08:00
Vladimir Chalupecky
4c2b713ce5
LAPACKE: fix incorrect value of lda_t in lapacke_?laswp_work
...
Fixed in Reference LAPACK in commit:
07e1fbd897
2017-02-07 09:21:46 +01:00
Isuru Fernando
cdc954675c
Install pkg-config files
2017-02-06 12:15:58 +05:30
Martin Kroeker
60eea75409
Merge pull request #1076 from ashwinyes/develop_20170130_thunderx2t99
...
More optimized implementations for ThunderX2T99
2017-02-04 17:25:43 +01:00
Ashwin Sekhar T K
071a830e8b
THUNDERX2T99: Add optimized S/D/C/Z SWAP Implementations
2017-02-03 03:55:06 -08:00
Ashwin Sekhar T K
d09f88192c
THUNDERX2T99: Add optimized S/D/C/Z COPY Implementations
2017-02-02 15:26:38 +05:30
Ashwin Sekhar T K
e58233460a
THUNDERX2T99: Add optimized D/C/Z ASUM Implementations
2017-02-02 15:26:22 +05:30
Ashwin Sekhar T K
3918d17025
LAPACK: Fix lapack-test errors in ARM64 threaded version
2017-01-31 23:36:23 +05:30
Ashwin Sekhar T K
99bd2892bf
THUNDERX2T99: Add optimized CASUM Implementation
2017-01-30 17:44:32 +05:30
Ashwin Sekhar T K
ff6f572f2e
THUNDERX2T99: Rename labels for DDOT and SNRM2
2017-01-30 17:44:32 +05:30
Ashwin Sekhar T K
e0dc5f58c5
THUNDERX2T99: Remove Duplicate Code
2017-01-30 17:44:32 +05:30
Ashwin Sekhar T K
2757b49767
THUNDERX2T99: Add Optimized CGEMM Implementation
2017-01-30 17:44:26 +05:30
Zhang Xianyi
ff41e13385
Merge pull request #1074 from ashwinyes/develop_20170116_thunderx2t99_sgemm
...
Add more THUNDERX2T99 Optimized APIs
2017-01-25 22:17:05 +08:00
Ashwin Sekhar T K
1de6fa0f50
Update .gitignore
2017-01-24 23:14:09 -08:00
Ashwin Sekhar T K
efda640723
Benchmark: Add MFlops print in iamax benchmark
2017-01-24 23:13:47 -08:00
Ashwin Sekhar T K
1530e78cfe
Benchmarks: Avoid building lapack benchmarks when NO_LAPACK=1
2017-01-24 20:50:23 -08:00
Ashwin Sekhar T K
907e286eb6
THUNDERX2T99: Add threaded SNRM2 Implementation
2017-01-24 21:39:29 +05:30
Ashwin Sekhar T K
cde3aee08b
ARM64: Rename kernel files to have consistent naming
2017-01-24 14:53:34 +05:30
Ashwin Sekhar T K
ee6ea7e988
THUNDERX2T99: Add Optimized CNRM2 Implementation
2017-01-24 10:23:32 +05:30
Ashwin Sekhar T K
ca0b36b012
THUNDERX2T99: Add Optimized SNRM2 Implementation
2017-01-24 10:23:21 +05:30
Ashwin Sekhar T K
01e1d85339
Update .gitignore
2017-01-19 11:58:59 +05:30