Elliot Saba
7ae64f4f9c
Add `csymv` and `zsymv` into `@lapackobjs2` for exporting
2017-03-09 15:22:40 -08:00
Zhang Xianyi
90e02ccf68
Support ARM softfp ABI for sgemm on ARMV7.
make ARM_SOFTFP_ABI=1
2017-03-06 22:16:13 +08:00
Zhang Xianyi
503dcbfde6
Merge branch 'develop' into arm_soft_fp_abi
2017-03-06 13:53:56 +08:00
Abdurrauf
82e80fa82b
initial strmm(sgemm). not tuned yet
2017-03-06 04:27:40 +04:00
Martin Kroeker
4227049c7d
Merge pull request #1111 from martin-frbg/kaby-no-avx
Fix core detection for Kaby Lake without AVX (G4560)
2017-03-02 18:43:59 +01:00
Martin Kroeker
688267edf3
Fix core detection for Kaby Lake without AVX (G4560)
Should fix #1109
2017-03-02 17:36:16 +01:00
Martin Kroeker
d1fe040d9b
Merge pull request #1110 from quickwritereader/develop
Conventional usage of the register save area.
2017-03-01 23:08:07 +01:00
Abdurrauf
411982715c
conventional usage of the register save area
2017-03-01 20:39:39 +04:00
Abdurrauf
e831d6924e
changed to conventional register save area
2017-03-01 03:13:21 +04:00
Martin Kroeker
ffc1d6c468
Merge pull request #1108 from ashwinyes/develop_20170203_thunderx2t99
Optimized Implementations for ThunderX2T99
2017-02-28 16:02:19 +01:00
Ashwin Sekhar T K
a86474c6f7
THUNDERX2T99: Performance fix for ZGEMM
2017-02-28 06:05:00 -08:00
Ashwin Sekhar T K
67473d09dd
THUNDERX2T99: Bug Fixes in D/Z NRM2 and ZGEMM
2017-02-28 01:11:38 -08:00
Ashwin Sekhar T K
19ba133383
THUNDERX2T99: Add Optimized ZGEMM Implementation
2017-02-28 05:31:41 +00:00
Martin Kroeker
f09a9afa03
Merge pull request #1107 from quickwritereader/develop
ztrmm(zgemm) complex double precision kernel for ibm z13
2017-02-26 09:49:01 +01:00
Abdurrauf
0d96b0e2a7
Merge branch 'z13' into develop
2017-02-26 06:17:33 +04:00
Abdurrauf
848cb27b1e
ztrmm kernel.
2017-02-26 06:14:12 +04:00
Martin Kroeker
dc34a0da96
Merge pull request #915 from mdong/small_fix_for_icc
remove input from clobbered list
2017-02-23 20:00:22 +01:00
Ashwin Sekhar T K
a3935f0dfb
THUNDERX2T99: Add Optimized D/Z NRM2 Implementation
2017-02-23 10:02:15 -08:00
Martin Kroeker
47e9fe0bb4
Merge pull request #1105 from martin-frbg/testing-eig-typos
TESTING/EIG: fix spurious EXTERNAL references to nonexistent functions
2017-02-22 22:42:52 +01:00
Martin Kroeker
c7bc0ee823
Remove spurious names from EXTERNAL list
Remove unused (and nonexistent) functions ZHETRD_SY2SB and ZHETRD_SB2ST from comment and EXTERNAL declaration
2017-02-22 21:48:35 +01:00
Martin Kroeker
6bdee6d50a
Remove spurious names from EXTERNAL list
Remove unused (and nonexistent) ZHETRD_SY2SB and ZHETRD_SB2ST
2017-02-22 21:45:27 +01:00
Martin Kroeker
009c0d2e5a
Fix typo in EXTERNAL declaration
ZHBTRD_HB2ST should be ZHETRD_HB2ST
2017-02-22 21:41:07 +01:00
Martin Kroeker
4d88e1a4ad
Merge pull request #1104 from martin-frbg/lapack-comma
LAPACK: fix missing comma on continued lines
2017-02-22 10:31:39 +01:00
Martin Kroeker
0958b49811
Fix missing comma on continued line
EXTERNAL declaration of subroutines missed a comma before the continuation line,
causing a strange run-together name to appear in the object when compiled with ifort.
2017-02-22 08:40:39 +01:00
Martin Kroeker
09b240f1ef
Fix missing comma on continued line
EXTERNAL declaration of subroutines missed a comma before the continuation line,
causing a strange run-together name to appear in the object when compiled with ifort.
2017-02-22 08:39:06 +01:00
Martin Kroeker
69f4e8b86c
Fix missing comma on continued line
EXTERNAL declaration of subroutines missed a comma before the continuation line,
causing a strange run-together name to appear in the object when compiled with ifort.
2017-02-22 08:34:20 +01:00
Martin Kroeker
e072e68aa0
Fix missing comma in continued line
EXTERNAL declaration of subroutines missed a comma before the continuation line,
causing a strange run-together name to appear in the object when compiled with ifort.
2017-02-22 08:32:20 +01:00
Ashwin Sekhar T K
738628e9a8
ARM64: Remove unused code
2017-02-21 21:42:32 -08:00
Martin Kroeker
e527dbffaa
Merge pull request #1103 from vladimir-ch/fix-lapacke-ormbr
LAPACKE: fix wrong matrix size in ?ormbr
2017-02-21 22:58:30 +01:00
Vladimir Chalupecky
eeaee46e86
LAPACKE: fix wrong matrix size in ?ormbr
Changes made upstream in Reference LAPACK in
https://github.com/Reference-LAPACK/lapack/pull/128
2017-02-21 21:57:18 +01:00
Martin Kroeker
040672ecf6
Merge pull request #1098 from martin-frbg/amodra-power8
Power8 inline assembly fixes
2017-02-21 15:26:14 +01:00
Martin Kroeker
c8ce9e4377
Merge pull request #1101 from martin-frbg/martin-frbg-patch-1
LAPACKE: fix wrong number of columns in ?ormlq
2017-02-21 15:19:56 +01:00
Ashwin Sekhar T K
ab3ffab96a
THUNDERX2T99: Add Optimized C/Z DOT Implementation
2017-02-21 03:40:59 -08:00
Ashwin Sekhar T K
f036be9ce2
THUNDERX2T99: Add Optimized SDOT Implementation
2017-02-21 03:24:32 -08:00
Martin Kroeker
39eecfd20c
Merge pull request #1102 from brada4/develop
Correct Apollo Lake CPUID identification in dynamic_arch builds
2017-02-21 08:26:39 +01:00
Andrew
5088523786
detect apollo lake for real
2017-02-20 23:54:59 +01:00
Martin Kroeker
3f7720ec4b
LAPACKE: fix wrong number of columns in ?ormlq
Copied from lapack https://github.com/Reference-LAPACK/lapack/pull/127 by vladimir-ch (with earlier changes from echeresh's
PR 115 "lapacke_*ormlq_work: move declarations under if" there as they touched some of the same files)
2017-02-20 16:20:43 +01:00
Ashwin Sekhar T K
faba876fda
THUNDERX2T99: Bug fix in C/Z IAMAX
2017-02-19 23:11:50 -08:00
Ashwin Sekhar T K
172a62d73e
THUNDERX2T99: Add Optimized C/Z IAMAX Implementation
2017-02-17 03:06:32 -08:00
Martin Kroeker
e545a66a5b
Merge pull request #1091 from staticfloat/sf/corei5_7600k
CPUID mappings for Core i5-7600K (Kaby Lake)
2017-02-17 10:30:09 +01:00
Ashwin Sekhar T K
228c75a69c
THUNDERX2T99: Add parallel SCNRM2 Implementation
2017-02-14 04:10:06 -08:00
Martin Kroeker
9e2f316ede
Power8 inline assembly fixes
Quoting patch author amodra from #1078:
Lots of issues here.
- The vsx regs weren't listed as clobbered.
- Poor choice of vsx regs, which along with the lack of clobbers led to
  trashing v0..v21 and fr14..fr23. Ideally you'd let gcc choose all
  temp vsx regs, but asms currently have a limit of 30 i/o parms.
- Other regs were clobbered unnecessarily, seemingly in an attempt to
  clobber inputs, with gcc-7 complaining about the clobber of r2.
  (Changed inputs should be also listed as outputs or as an i/o.)
- "r" constraint used instead of "b" for gprs used in insns where the
  r0 encoding means zero rather than r0.
- There were unused asm inputs too.
- All memory was clobbered rather than hooking up memory outputs with
  proper memory constraints, and that and the lack of proper memory
  input constraints meant the asms needed to be volatile and their
  containing function noinline.
- Some parameters were being passed unnecessarily via memory.
- When a copy of a
2017-02-13 23:38:50 +01:00
Martin Kroeker
e2489c9a92
Merge pull request #1096 from martin-frbg/pkg-config
Build only openblas.pc for pkg-config and install it from cmake as well
2017-02-12 17:00:17 +01:00
Martin Kroeker
c4ea9eea67
Add cmake template for openblas.pc
2017-02-12 14:38:32 +01:00
Martin Kroeker
cd8f80634f
Create and install openblas.pc in cmake builds
2017-02-12 14:37:33 +01:00
Martin Kroeker
faf06f0d8b
Create and install only a single openblas.pc file
2017-02-12 14:35:48 +01:00
Martin Kroeker
c6fa4aef0c
Rename blas.pc.in to openblas.pc.in
2017-02-12 14:34:03 +01:00
Martin Kroeker
1029dcd60d
Merge pull request #1095 from martin-frbg/lapack370-cmake
Update cmakefiles for netlib 3.7.0
2017-02-12 14:30:29 +01:00
Martin Kroeker
d12c8bbcbb
Add zlasyf_aa to lapack.cmake
2017-02-12 13:49:49 +01:00
Martin Kroeker
15f0d65010
Add another bunch of lapack 3.7 functions to cmake list
2017-02-12 01:59:30 +01:00