Martin Kroeker
fab746240c
Merge pull request #3304 from xianyi/develop
Merge develop into 0.3.0 for release 0.3.16
2021-07-12 00:12:52 +02:00
Martin Kroeker
847607c768
Merge branch 'release-0.3.0' into develop
2021-07-12 00:12:25 +02:00
Martin Kroeker
4c81d1c3fe
Update version to 0.3.16
2021-07-12 00:09:35 +02:00
Martin Kroeker
db4908ebfa
Update version to 0.3.16
2021-07-12 00:08:55 +02:00
Martin Kroeker
ed3eb18cb2
Merge pull request #3303 from martin-frbg/changelog16
Update Changelog for 0.3.16
2021-07-11 23:50:02 +02:00
Martin Kroeker
239ff330f8
Update Changelog for 0.3.16
2021-07-11 23:48:39 +02:00
Martin Kroeker
19c81a07cb
Merge pull request #3300 from martin-frbg/AzureAlpine
Move Alpine Linux build job from Travis to Azure
2021-07-11 22:50:20 +02:00
Martin Kroeker
e008646ba9
Merge pull request #3302 from martin-frbg/small_cleanup
Clean up some warnings
2021-07-11 22:26:41 +02:00
Martin Kroeker
498479b13e
Update azure-pipelines.yml
2021-07-11 18:29:17 +02:00
Martin Kroeker
b4cbfe6677
Update azure-pipelines.yml
2021-07-11 18:08:30 +02:00
Martin Kroeker
be1a42507c
Merge pull request #3297 from outerpassage/develop
fix compilation with musl libc
2021-07-11 17:10:20 +02:00
Martin Kroeker
7bb59fceb7
Clean up some warnings
2021-07-11 16:00:29 +02:00
Martin Kroeker
eba2cd951e
Revert addition of test_install
2021-07-11 14:38:49 +02:00
Martin Kroeker
836c7fb9f5
Revert addition of test_install target
2021-07-11 14:37:38 +02:00
Martin Kroeker
d2693eac04
Update azure-pipelines.yml
2021-07-11 11:54:02 +02:00
Martin Kroeker
8acb6fe3a8
Update azure-pipelines.yml
2021-07-11 11:29:52 +02:00
Martin Kroeker
c47e35acee
Update azure-pipelines.yml
2021-07-11 09:38:48 +02:00
Martin Kroeker
a27a61bb9a
Update azure-pipelines.yml
2021-07-11 08:24:20 +02:00
Martin Kroeker
69560ad3ce
Update azure-pipelines.yml
2021-07-11 07:25:07 +02:00
Martin Kroeker
b2319fd97a
Merge pull request #3301 from martin-frbg/syr2bench
Handle OPENBLAS_LOOPS in SYR2 benchmark
2021-07-11 07:20:19 +02:00
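The OPENBLAS_LOOPS handling in the benchmarks boils down to repeating the timed call and dividing by the repeat count, which smooths out one-off timing noise. A minimal C sketch, assuming the variable holds a positive integer and that a missing or malformed value means a single run; parse_loops, loops_from_env and mean_cpu_seconds are hypothetical helpers, not the benchmark source:

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: interpret a string such as the value of
   OPENBLAS_LOOPS. Missing, malformed or non-positive values mean one run. */
int parse_loops(const char *s)
{
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return 1;
    v = strtol(s, &end, 10);
    if (*end != '\0' || v < 1 || v > 1000000)
        return 1;
    return (int)v;
}

/* Hypothetical helper: read the loop count from the environment. */
int loops_from_env(void)
{
    return parse_loops(getenv("OPENBLAS_LOOPS"));
}

/* Mean processor time in seconds of `loops` calls to kernel(arg). */
double mean_cpu_seconds(void (*kernel)(void *), void *arg, int loops)
{
    if (loops < 1)
        loops = 1;                      /* avoid dividing by zero */
    clock_t start = clock();
    for (int i = 0; i < loops; i++)
        kernel(arg);
    return (double)(clock() - start) / CLOCKS_PER_SEC / loops;
}
```

With OPENBLAS_LOOPS=10 in the environment, loops_from_env() in this sketch returns 10 and mean_cpu_seconds reports the average over ten calls.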
Martin Kroeker
0266ba7cb6
Update azure-pipelines.yml
2021-07-10 23:21:58 +02:00
Martin Kroeker
7e09570e04
Update azure-pipelines.yml
2021-07-10 22:41:49 +02:00
Martin Kroeker
14e33e0f7e
Handle OPENBLAS_LOOPS in SYR2 benchmark
2021-07-10 21:27:53 +02:00
Martin Kroeker
db57c449dc
Update azure-pipelines.yml
2021-07-10 20:57:21 +02:00
Martin Kroeker
993e56b7b3
Merge pull request #3299 from martin-frbg/issue3298
Fix copy-paste error in LIBCORE assignment for Tiger Lake
2021-07-10 20:48:53 +02:00
Martin Kroeker
c9304199cf
Update azure-pipelines.yml
2021-07-10 20:12:33 +02:00
Martin Kroeker
d86290edf0
add sudo for install in Alpine
2021-07-10 19:52:04 +02:00
Martin Kroeker
89429fdaa2
fix typo
2021-07-10 19:03:42 +02:00
Martin Kroeker
d511063098
Move Alpine Linux build job from Travis to Azure
2021-07-10 18:52:44 +02:00
Martin Kroeker
4f4e286bf6
Fix copy-paste error in LIBCORE assignment for Tiger Lake
2021-07-10 18:20:40 +02:00
River Dillon
ddb6cee0d5
Contribution note
2021-07-10 01:34:47 -07:00
River Dillon
cecc2c65aa
Add test of installed <openblas_config.h>
2021-07-10 01:26:05 -07:00
River Dillon
220f6a1c55
Add feature test macro for proper inclusion of <sched.h>
2021-07-10 00:38:02 -07:00
River Dillon
2f6326a630
Remove <linux/unistd.h>
2021-07-10 00:36:07 -07:00
Martin Kroeker
c0d0406b97
Merge pull request #3296 from martin-frbg/issue3295
Support Zhaoxin/Centaur family 7 processors as Nehalem
2021-07-08 21:24:15 +02:00
Martin Kroeker
8f22ac552b
Add vendor string Shanghai as successor to Centaur
2021-07-08 18:28:49 +02:00
Martin Kroeker
da623ae838
Add vendor string Shanghai as the successor to Centaur
2021-07-08 18:26:23 +02:00
Martin Kroeker
eb2fdd3af0
Recognize newer Zhaoxin/Centaur processors as Nehalem
2021-07-08 12:23:15 +02:00
Martin Kroeker
0d8d261dd4
Recognize newer Zhaoxin/Centaur cpus as Nehalem
2021-07-08 12:20:19 +02:00
Martin Kroeker
40caaef052
Merge pull request #3265 from TAAPArthur/improve_portability
Removed use of non portable '-p' arg to install
2021-07-07 20:58:29 +02:00
Martin Kroeker
25b602d8a6
Merge pull request #3293 from martin-frbg/issue3290
Enable (C)EXTRALIB as for any other platform when building the tests on RISCV C910V
2021-07-07 20:46:54 +02:00
Martin Kroeker
4ed99c2ce3
Merge pull request #3292 from martin-frbg/syrk_limit
Add lower limit for multithreading in xSYRK
2021-07-07 20:46:28 +02:00
Martin Kroeker
f20c4edc33
Merge pull request #3288 from martin-frbg/getrf-2
Add lower threshold for multithreading in ?GETRF
2021-07-07 20:45:57 +02:00
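A lower threshold for multithreading means that small problems skip thread setup entirely, since synchronization costs outweigh the parallel speedup there. A minimal sketch of that decision; the cutoff GETRF_MT_THRESHOLD, its value and choose_threads are hypothetical placeholders, not the limits chosen in these commits:

```c
/* Hypothetical cutoff in matrix elements; the real limits in OpenBLAS
   are tuned per routine and are not reproduced here. */
#define GETRF_MT_THRESHOLD 10000L

/* Choose a thread count for an m x n factorization. */
int choose_threads(long m, long n, int max_threads)
{
    if (max_threads < 1)
        return 1;
    if (m * n < GETRF_MT_THRESHOLD)
        return 1;            /* small problem: stay single-threaded */
    return max_threads;      /* large problem: use every available thread */
}
```

Keeping the cutoff in a single macro makes it straightforward to retune per routine or per architecture.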
Martin Kroeker
3cfdb1770c
Remove code that disabled EXTRALIB on RISCV C910V
2021-07-06 20:21:07 +02:00
Martin Kroeker
8186963d8c
Add lower limit for multithreading
2021-07-04 17:00:26 +02:00
Martin Kroeker
a4543e4918
Handle OPENBLAS_LOOPS
2021-07-04 16:59:43 +02:00
Martin Kroeker
2376aa1e8c
Merge pull request #3289 from martin-frbg/issue3283
Update README to mention availability of the Windows binaries in the Releases section
2021-07-02 00:19:06 +02:00
Martin Kroeker
4620f98812
Mention availability of the Windows binaries in the Releases section
2021-07-01 19:24:35 +02:00
Martin Kroeker
726c44242b
Add lower threshold for multithreading
2021-07-01 17:41:05 +02:00
Martin Kroeker
dcfc5cf714
Handle OPENBLAS_LOOPS for more stable results
2021-07-01 17:39:37 +02:00
Martin Kroeker
06e3b07ecb
Handle OPENBLAS_LOOPS and OPENBLAS_TEST options
2021-07-01 17:38:45 +02:00
Martin Kroeker
623be6600a
Merge pull request #3284 from martin-frbg/potrf_potri
Add lower thresholds for multithreading in POTRF/POTRI and improve the related benchmark
2021-06-30 07:42:45 +02:00
Martin Kroeker
7ddc9d384c
Merge pull request #3287 from martin-frbg/appveyor-conda
Work around current conda/tqdm auto-update problem on Appveyor
2021-06-29 20:09:26 +02:00
Martin Kroeker
6ebcce229f
Work around current conda/tqdm auto-update problem
2021-06-29 17:17:34 +02:00
Martin Kroeker
1b5620b66e
Add lower threshold for multithreading in ?potrf and ?potri
2021-06-26 23:47:41 +02:00
Martin Kroeker
1f8bda71b9
Add OPENBLAS_LOOPS support to potrf/potrs/potri benchmark
2021-06-26 23:46:00 +02:00
Martin Kroeker
3be660c000
Add interface declarations for ?potri
2021-06-26 23:44:56 +02:00
Martin Kroeker
1a8b6134c2
Merge pull request #3278 from brada4/A55
Add CORTEXA55 cpuid 0xd05 support
2021-06-23 13:05:17 +02:00
Martin Kroeker
f0b822a709
Update cpuid_arm64.c
2021-06-23 10:11:01 +02:00
User User-User
130327e9af
OK
2021-06-22 23:58:59 +02:00
User User-User
750719528a
bugz
2021-06-20 16:40:43 +02:00
User User-User
91e2b11d3c
add to cmake listings too
2021-06-20 15:32:42 +02:00
User User-User
548aa522e5
remove misplaced file
2021-06-20 15:29:25 +02:00
User User-User
6423b282a1
dynamic_arch
2021-06-20 14:19:41 +02:00
User User-User
9335d42740
add gcc8 version matching
2021-06-19 22:21:39 +02:00
User User-User
39ef0880ae
copy conf
2021-06-19 21:49:58 +02:00
User User-User
b7da75e4fd
WiP CORTEX A55 support
2021-06-19 21:37:51 +02:00
Martin Kroeker
a7627c5afd
Merge pull request #3276 from martin-frbg/issue3274
Add workaround for another macro name collision with Windows 10 SDK winnt.h
2021-06-16 16:37:30 +02:00
Martin Kroeker
9499ab0d45
Merge pull request #3275 from martin-frbg/lapack580
Fix missing EXTERNAL declarations in LAPACK TESTING (LAPACK PR 580)
2021-06-16 13:41:38 +02:00
Martin Kroeker
307c4c0786
Fix typo
2021-06-16 13:41:16 +02:00
Martin Kroeker
e83df93975
Work around another recent macro name collision with winnt.h
2021-06-16 12:32:34 +02:00
Martin Kroeker
13fa9f737d
Modify defines for CR and RC to work around name collision on Windows
2021-06-16 12:17:25 +02:00
Martin Kroeker
5958ffc9b6
Declare DZASUM as EXTERNAL
2021-06-16 09:43:39 +02:00
Martin Kroeker
cd0e4aadb1
Declare ZDROT as EXTERNAL
2021-06-16 09:41:18 +02:00
Martin Kroeker
e2621ef93a
Declare SROT as EXTERNAL
2021-06-16 09:40:15 +02:00
Martin Kroeker
9e1b43ea9b
Declare DROT as EXTERNAL
2021-06-16 09:39:28 +02:00
Martin Kroeker
5269348178
Declare CSROT as EXTERNAL
2021-06-16 09:35:12 +02:00
Martin Kroeker
92e024bbb3
Declare SCASUM as EXTERNAL
2021-06-16 09:33:23 +02:00
Martin Kroeker
c4b464cac6
Merge pull request #3273 from austinpagan/sbgemm_gcc10_fix
Power10: Fix for SBGEMM
2021-06-15 22:58:48 +02:00
Gordon Fossum
e6dd44d989
Power10: Fix for SBGEMM
While testing the bfloat16 sbgemm kernel, there were some failures for odd value inputs due to updating the result
for additional bytes.
2021-06-15 13:07:47 -05:00
Martin Kroeker
baf03a0937
Merge pull request #3252 from martin-frbg/more_shortcuts
Further shortcuts for (small) cases that do not need buffer allocation
2021-06-15 16:14:20 +02:00
Martin Kroeker
7aab5e826c
Merge pull request #3250 from martin-frbg/gemv-shortcut
Add shortcut for small-size S/D GEMV_N with increments of one
2021-06-15 14:50:14 +02:00
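With unit increments, GEMV_N can accumulate directly into y instead of first copying x and y into an allocated work buffer, which is what makes a small-size shortcut cheap. A minimal column-major sketch of y := alpha*A*x + beta*y; sgemv_n_unit is a hypothetical name and this is not OpenBLAS's kernel:

```c
/* y := alpha*A*x + beta*y for a column-major m x n matrix A with
   leading dimension lda >= m; x and y have unit stride. */
void sgemv_n_unit(int m, int n, float alpha, const float *a, int lda,
                  const float *x, float beta, float *y)
{
    /* As in BLAS, beta == 0 means y need not be initialized on input. */
    for (int i = 0; i < m; i++)
        y[i] = (beta == 0.0f) ? 0.0f : beta * y[i];

    for (int j = 0; j < n; j++) {
        float t = alpha * x[j];                 /* scale once per column */
        const float *col = a + (long)j * lda;   /* start of column j */
        for (int i = 0; i < m; i++)
            y[i] += t * col[i];
    }
}
```

Walking A column by column keeps the inner loop on contiguous memory, which is why the column-major "N" case suits this layout.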
Martin Kroeker
29417adf4c
Merge pull request #3270 from ggouaillardet/topic/dznrm2_tx2
arm64: add the missing d9 register to the clobber list
2021-06-14 13:00:33 +02:00
Gilles Gouaillardet
9d292d37b2
arm64: add the missing d9 register to the clobber list
Refs. numpy/numpy#18422
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2021-06-14 17:01:28 +09:00
Martin Kroeker
2e8ff4a781
Merge pull request #3266 from martin-frbg/powerparam
Remove spurious casts from PPC parameters and fix compilation for older targets
2021-06-10 18:05:47 +02:00
Martin Kroeker
dbba381dc3
Merge pull request #3260 from intelmy/sgemv_t_opt
Optimized sgemv_t for small N based on AVX512
2021-06-10 16:08:24 +02:00
Martin Kroeker
f61991d439
Merge pull request #3264 from RajalakshmiSR/sbgemmp10
POWER10: Fixes for sbgemm kernel
2021-06-10 16:07:47 +02:00
Martin Kroeker
efdbdd8f82
Add prefetch values for power3
2021-06-10 11:20:29 +02:00
Martin Kroeker
3906ef3b0f
Add prefetch values for power3
2021-06-10 11:19:40 +02:00
Martin Kroeker
8adf0971d8
Add prefetch values for power3
2021-06-10 11:18:22 +02:00
Martin Kroeker
08e2e60762
Add prefetch values for power3
2021-06-10 11:17:33 +02:00
Martin Kroeker
fb9e678235
Fix caxpy/zaxpy for big-endian
2021-06-10 11:15:48 +02:00
Martin Kroeker
dc4fcb48df
Fix inverted conditional for caxpy/zaxpy
2021-06-10 11:14:03 +02:00
Martin Kroeker
7a48247761
fix c/zrot and sgemv for POWER5
2021-06-10 11:11:56 +02:00
Martin Kroeker
7dfc45e840
Remove casts for PPC/POWER and complete parameters for POWER3/4
2021-06-10 11:09:50 +02:00
Arthur Williams
7fb6e576c2
Removed use of non portable '-p' arg to install
Not all versions of install support the '-p' flag, and it isn't worth failing
the build if the installed files' timestamps get updated.
2021-06-09 20:50:36 -05:00
Rajalakshmi Srinivasaraghavan
cbb70438df
POWER10: Fixes for sbgemm kernel
While testing the bfloat16 sbgemm kernel, there were some failures
for odd value inputs due to array accesses beyond the boundary.
2021-06-09 12:20:09 -05:00
Ma, Yu
706a08d4a0
Optimized sgemv_t for small N based on AVX512
2021-06-08 15:08:28 -04:00
Zhang Xianyi
9f3d903817
Merge pull request #3259 from zhaofengli/riscv64-fixes
riscv64 fixes
2021-06-08 16:26:56 +08:00
Zhaofeng Li
590be3fae3
riscv64: Add Makefile
2021-06-07 22:55:56 +00:00
Zhaofeng Li
3521cd48cb
RISCV64_GENERIC: Use generic kernel for DSDOT for better precision
The implementation in `riscv64/dot.c` fails the `test_dsdot` test, and
the generic kernel seems to have better precision. Tested on SiFive
FU740 (HiFive Unmatched) and QEMU.
Also see #1469.
2021-06-07 22:50:23 +00:00
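DSDOT takes single-precision vectors but returns a double, so its precision hinges on accumulating in double rather than in float. A minimal sketch of that accumulation with unit strides; dsdot_ref is a hypothetical name, not the generic OpenBLAS kernel:

```c
/* Dot product of two float vectors, accumulated in double precision
   as DSDOT requires; unit strides assumed for brevity. */
double dsdot_ref(int n, const float *x, const float *y)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += (double)x[i] * (double)y[i];   /* widen before multiplying */
    return sum;
}
```

With x = {1e8f, 1.0f, -1e8f} and y = {1.0f, 1.0f, 1.0f}, dsdot_ref returns exactly 1.0. Accumulating the same products in IEEE single precision instead ends at 0, because the float spacing near 1e8 is 8 and 1e8 + 1 rounds back to 1e8.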
Zhaofeng Li
1e0192a5cc
riscv64/imin: Fix wrong comparison
Same as #1990.
2021-06-07 22:49:39 +00:00
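The imin kernels return the 1-based position of the smallest element, so a reversed comparison silently turns them into an index-of-maximum search. A minimal sketch of the intended behaviour, assuming the usual BLAS conventions (first occurrence wins, 0 for empty input); ismin_ref is a hypothetical name and the actual riscv64 fix is not reproduced here:

```c
/* 1-based index of the first smallest element of x with stride incx;
   returns 0 when n <= 0 or incx <= 0, following BLAS index conventions. */
long ismin_ref(long n, const float *x, long incx)
{
    if (n <= 0 || incx <= 0)
        return 0;

    long best = 0;
    float minval = x[0];
    for (long i = 1; i < n; i++) {
        float v = x[i * incx];
        if (v < minval) {        /* strict '<' keeps the first occurrence */
            minval = v;
            best = i;
        }
    }
    return best + 1;             /* convert to 1-based */
}
```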
Martin Kroeker
fe9aff17fe
Merge pull request #3258 from martin-frbg/hbaction
revert "try to work around gcc update problems" in Homebrew workflow
2021-06-06 22:15:29 +02:00
Martin Kroeker
8c25b440a0
revert "try to work around gcc update problems"
...as Homebrew has now dropped at least gcc8
2021-06-06 19:17:36 +02:00
Martin Kroeker
f84197c1a7
Add shortcuts for (small) cases that do not need expensive buffer allocation
2021-05-29 22:28:00 +02:00
Martin Kroeker
734bd265a8
revert symv changes for now
2021-05-29 15:40:03 +02:00
Martin Kroeker
1217eb910d
Fix copy-paste errors in variables used
2021-05-28 09:38:48 +02:00
Martin Kroeker
d6d7a6685d
Add shortcuts for (small) cases that do not need expensive buffer allocation
2021-05-27 22:39:18 +02:00
Martin Kroeker
f0e7345fb8
Add shortcut for small-size gemv_n with increments of one
2021-05-26 22:02:34 +02:00
Martin Kroeker
42f048cf6c
Merge pull request #3249 from MikaelUrankar/develop
Fix typo
2021-05-26 15:26:30 +02:00
MikaelUrankar
4fbc0777f4
Fix typo
2021-05-26 12:14:57 +02:00
Martin Kroeker
d7472606d5
Merge pull request #3244 from martin-frbg/issue3237
Add fast path for small xSYR with INCX==1
2021-05-22 22:38:09 +02:00
Martin Kroeker
03297ff9f0
Add fast path for small xSYR with INCX==1
2021-05-22 20:41:18 +02:00
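For INCX == 1 the symmetric rank-1 update A := alpha*x*x^T + A can be applied column by column without gathering x into a buffer first. A minimal sketch for the upper triangle of a column-major matrix; ssyr_upper_unit is a hypothetical name, and the size limit below which OpenBLAS takes its fast path is not shown:

```c
/* A := alpha*x*x^T + A, updating only the upper triangle of the
   column-major n x n matrix A (lda >= n); x has unit stride. */
void ssyr_upper_unit(int n, float alpha, const float *x, float *a, int lda)
{
    for (int j = 0; j < n; j++) {
        float t = alpha * x[j];
        if (t == 0.0f)
            continue;                  /* nothing to add to this column */
        float *col = a + (long)j * lda;
        for (int i = 0; i <= j; i++)   /* rows 0..j form the upper part */
            col[i] += t * x[i];
    }
}
```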
Martin Kroeker
2d8d0af0ea
Merge pull request #3243 from martin-frbg/lapack564
Fix spurious error exit test failures in the ?chktsqr tests (LAPACK564)
2021-05-22 19:25:56 +02:00
Martin Kroeker
5f677e782e
Merge pull request #3196 from guowangy/skylakex-gemm-batch-k
GEMM: skylake: improve the performance when m is small
2021-05-22 19:25:28 +02:00
Martin Kroeker
04c60cee5d
Merge pull request #3242 from martin-frbg/issue3239
Handle inadvertent use of DYNAMIC_ARCH=0
2021-05-22 19:24:46 +02:00
Martin Kroeker
3a53207cc9
Fix spurious error exit test failures in the ?chktsqr tests (LAPACK564)
2021-05-22 14:29:45 +02:00
Martin Kroeker
0e73d20629
Handle inadvertent use of DYNAMIC_ARCH=0
2021-05-22 14:23:49 +02:00
Martin Kroeker
02087a62e7
Merge pull request #3205 from intelmy/sgemv_n_opt
optimize on sgemv_n for small n
2021-05-17 17:49:01 +02:00
Martin Kroeker
03b4d79a7e
Merge pull request #3238 from martin-frbg/lapack555
Correct function name in error message from SLASQ2 (LAPACK PR555)
2021-05-17 17:32:23 +02:00
Martin Kroeker
5c729c6dce
Correct function name in error message from SLASQ2 (Reference-LAPACK PR 555)
2021-05-17 14:47:14 +02:00
Martin Kroeker
e1911b2e60
Merge pull request #3236 from martin-frbg/issue3234
Add -lm for FreeBSD on ARM/ARM64
2021-05-16 17:17:18 +02:00
Martin Kroeker
8f33da4f94
Merge pull request #3235 from dnoan/develop
Update Makefile.arm64
2021-05-16 17:15:45 +02:00
Martin Kroeker
26ccf643a3
Add -lm for FreeBSD on ARM/ARM64
2021-05-16 13:04:38 +02:00
Noan
32264ba496
Update Makefile.arm64
Added -march and -mtune flags for EMAG processors when using GCC 9 or later
2021-05-16 09:49:13 +00:00
Martin Kroeker
4ecf631f95
Merge pull request #3228 from martin-frbg/issue3226
filter out -mavx flag on Sandybridge zgemm/ztrmm kernels
2021-05-15 09:06:12 +02:00
Martin Kroeker
5af510081d
Merge pull request #3233 from martin-frbg/issue3230
Add autodetection for Intel Ice Lake SP
2021-05-15 01:04:09 +02:00
Martin Kroeker
164551d5a2
Merge pull request #3232 from martin-frbg/lapack553
Reduce stack size requirements in the LAPACK LIN tests (LAPACK PR 553)
2021-05-14 23:28:45 +02:00
Martin Kroeker
310b76aad7
Merge pull request #3231 from martin-frbg/issue3227
Support compilation with pre-C99 versions of MSVC
2021-05-14 23:28:06 +02:00
Martin Kroeker
c4da892ba0
Only filter out -mavx on Sandybridge ZGEMM/ZTRMM kernels
2021-05-14 23:19:10 +02:00
Martin Kroeker
cbfd3c87e1
Recognize Intel Ice Lake SP as Cooper Lake
2021-05-14 20:44:06 +02:00
Martin Kroeker
26e87ac517
Support Intel Ice Lake SP as Cooper Lake
2021-05-14 20:39:55 +02:00
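Recognizing a new CPU model such as Ice Lake SP typically comes down to decoding the family and model fields that CPUID leaf 1 returns in EAX, then mapping them onto an existing target. A minimal sketch of Intel's documented family/model decoding; the function names are hypothetical, and the Ice Lake server model numbers 0x6A and 0x6C are assumptions stated here rather than values taken from these commits:

```c
/* Decode the display family and model from CPUID leaf 1 EAX,
   following Intel's rules for the extended family/model fields. */
void decode_family_model(unsigned eax, unsigned *family, unsigned *model)
{
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base_model  = (eax >> 4) & 0xF;

    *family = base_family;
    if (base_family == 0xF)
        *family += (eax >> 20) & 0xFF;          /* extended family */

    *model = base_model;
    if (base_family == 0x6 || base_family == 0xF)
        *model += ((eax >> 16) & 0xF) << 4;     /* extended model */
}

/* Hypothetical mapping: treat Ice Lake SP/D (assumed models 0x6A and
   0x6C) like Cooper Lake. Returns 1 on a match, 0 otherwise. */
int is_icelake_server(unsigned eax)
{
    unsigned family, model;
    decode_family_model(eax, &family, &model);
    return family == 6 && (model == 0x6A || model == 0x6C);
}
```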
Martin Kroeker
15b9d6b4a7
Delete zchkaa.f
2021-05-14 19:55:31 +02:00
Martin Kroeker
f7bcd962c1
Delete schkaa.f
2021-05-14 19:54:54 +02:00
Martin Kroeker
93cc066921
Delete dchkaa.f
2021-05-14 19:54:13 +02:00
Martin Kroeker
2c7d4a7766
Delete cchkaa.f
2021-05-14 19:53:38 +02:00
Martin Kroeker
eef1c42f03
Convert ?chkaa to use dynamic allocation for the larger arrays
2021-05-14 19:53:03 +02:00
Martin Kroeker
73f637e584
Support compilation with pre-C99 versions of MSVC
2021-05-14 15:08:12 +02:00
Martin Kroeker
8b90e5f202
Drop redundant inclusion of complex.h
2021-05-14 15:06:44 +02:00
Martin Kroeker
bd60fb6ffc
filter out -mavx flag on zgemm kernels as it can cause problems with older gcc
2021-05-13 23:05:00 +02:00
Martin Kroeker
37ea8702ee
Merge pull request #3192 from damonyu1989/develop
Update the intrinsic api to the official name.
2021-05-11 16:00:45 +02:00
Martin Kroeker
ec7d6c02bc
Add an Android crossbuild on OSX to Azure CI (#3224)
* Add an Android crossbuild on OSX
2021-05-10 08:02:01 +02:00
Martin Kroeker
c90c23e78f
Merge pull request #3223 from martin-frbg/develop
Use percent instead of ampersand as placeholder for substitutions
2021-05-07 08:51:45 +02:00
Martin Kroeker
bda8820da7
Use percent instead of ampersand as placeholder for substitutions
2021-05-06 20:20:08 +02:00
Martin Kroeker
c0ca63ea46
Fix missing conditionals for non-SKX kernels
2021-05-05 14:55:36 +02:00
Martin Kroeker
f497bb949b
Merge pull request #3219 from austinpagan/Gemm.ErrorFix
Add error message token for SBGEMM in gemm.c
2021-05-05 14:30:41 +02:00
Martin Kroeker
f86b1bc3da
Merge pull request #3220 from drhpc/drhpc-fixup
Delete lapack_wrappers.c.orig
2021-05-05 14:30:24 +02:00
drhpc
206e03fdac
Delete lapack_wrappers.c.orig
This looks like a leftover from patching and confuses further patching;-)
2021-05-04 21:02:07 +02:00
Gordon Fossum
8b599836db
Add error message token for SBGEMM in gemm.c
2021-05-04 13:55:02 -05:00
Martin Kroeker
9721b57ecf
Update version to 0.3.15.dev
2021-05-03 00:01:08 +02:00
Martin Kroeker
380f955078
Update version to 0.3.15.dev
2021-05-03 00:00:29 +02:00
Martin Kroeker
49d18e65e3
Merge pull request #3217 from xianyi/release-0.3.0
merge 0.3.15 back into develop to copy tag
2021-05-02 23:59:55 +02:00
Martin Kroeker
904f9a267d
Update version to 0.3.15
2021-05-02 23:50:22 +02:00
Martin Kroeker
4c033730bb
Update version to 0.3.15
2021-05-02 23:49:49 +02:00
Martin Kroeker
65502c6af6
Merge pull request #3216 from xianyi/develop
Update from develop for 0.3.15 release
2021-05-02 23:48:28 +02:00
Martin Kroeker
f71627fa2e
Merge pull request #3215 from martin-frbg/cl0315
Update Changelog for 0.3.15
2021-05-02 23:47:24 +02:00
Martin Kroeker
d8d7bd33cb
Update Changelog for 0.3.15
2021-05-02 23:46:55 +02:00
Martin Kroeker
e72420e8c5
Merge pull request #3214 from martin-frbg/lapack-3.9.1hrt
Add new Householder Reconstruction functions from LAPACK 3.9.1
2021-05-02 23:40:03 +02:00
Martin Kroeker
d00709e016
Add files via upload
2021-05-02 20:47:58 +02:00
Martin Kroeker
d444344497
Add LAPACKE interfaces for the new Householder Reconstruction functions from 3.9.1
2021-05-02 19:57:47 +02:00
Martin Kroeker
fb7308b9b5
Add entries for the new Householder Reconstruction functions from 3.9.1
2021-05-02 19:56:11 +02:00
Martin Kroeker
db50b24a4a
Add entries for the new Householder Reconstruction functions from 3.9.1
2021-05-02 19:55:15 +02:00
Martin Kroeker
88b70fba3e
Add new tests for Householder reconstruction functions from 3.9.1
2021-05-02 19:28:21 +02:00
Martin Kroeker
4c1d47098b
Add new files for Householder reconstruction functions from 3.9.1
2021-05-02 19:25:43 +02:00
Martin Kroeker
40000d1f64
Add entries for Householder reconstruction functions from 3.9.1
2021-05-02 19:21:59 +02:00
Martin Kroeker
dc3664993c
Merge pull request #26 from xianyi/develop
rebase
2021-05-02 19:19:28 +02:00
Martin Kroeker
b8232c9054
Merge pull request #3213 from martin-frbg/lapack382
Avoid allocating the transposed triangular matrix in LAPACKE_xlantr_work (Reference-LAPACK 382)
2021-05-02 18:45:15 +02:00
Martin Kroeker
114bbbc6d7
Merge pull request #3212 from martin-frbg/lapack463
Initialize X and Y to zero for N=0 in xGGGLM (Reference-LAPACK PR463)
2021-05-02 18:44:59 +02:00
Martin Kroeker
b67a92c19f
Merge pull request #3211 from martin-frbg/lapack471
Handle norm NaN value in xGESDD (Reference LAPACK PR471)
2021-05-02 18:44:29 +02:00
Martin Kroeker
4bf00da8fb
Avoid allocating the transposed triangular matrix (Reference-LAPACK PR382)
2021-05-02 12:18:17 +02:00
Martin Kroeker
c26780d451
Initialize X and Y to zero for N=0 (Reference-LAPACK PR463)
2021-05-02 11:40:56 +02:00
Martin Kroeker
d77d9bc920
Handle norm NaN value (Reference LAPACK PR471)
2021-05-02 11:24:50 +02:00
Martin Kroeker
37d3e2bd94
Merge pull request #3210 from martin-frbg/lapack502
Fix possible division by zero in LAPACK xTGSJA (Reference-LAPACK PR502)
2021-05-02 09:02:11 +02:00
Martin Kroeker
de8656769c
Fix possible division by zero in xTGSJA (Reference-LAPACK PR502)
2021-05-01 21:31:13 +02:00
Martin Kroeker
d43e07198d
Merge pull request #3208 from martin-frbg/lapack534
Apply MKL team fixes to the LAPACKE interfaces (Reference-LAPACK PR 534)
2021-05-01 20:18:29 +02:00
Martin Kroeker
da16764c7a
Merge pull request #3209 from martin-frbg/issue3160
Add casts to prevent overflow of intermediate results
2021-05-01 20:08:24 +02:00
Martin Kroeker
98ebc8ac59
Add casts to prevent overflow of intermediate result
2021-05-01 14:48:19 +02:00
Martin Kroeker
904b221f03
Add cast to prevent overflow of intermediate result
2021-05-01 14:47:22 +02:00
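This kind of overflow arises when two int dimensions are multiplied before being widened: the product is computed in int and can exceed INT_MAX (undefined behaviour for signed types) before it ever reaches a 64-bit variable. A minimal illustration, with int64_t standing in for OpenBLAS's BLASLONG and element_count as a hypothetical helper:

```c
#include <stdint.h>

/* Number of elements in an m x n matrix.
   Writing (int64_t)(m * n) would still multiply in int and may overflow;
   casting one operand first makes the multiplication itself 64-bit. */
int64_t element_count(int m, int n)
{
    return (int64_t)m * n;
}
```

For m = n = 50000 the product is 2500000000, which exceeds INT_MAX (2147483647), so only the cast-first form yields the correct count.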
Martin Kroeker
5cc35abc3d
Apply MKL team fixes to the LAPACKE interfaces (Reference-LAPACK PR 534)
Removed spurious checks of INFO in xLACPY and xLASET after calls to routines that do not return one, and redundant requirements for ldvt in xGESVD_WORK
2021-05-01 13:22:10 +02:00
Martin Kroeker
254774f5a6
Add const qualifiers
2021-05-01 13:10:16 +02:00
Martin Kroeker
ae9cdee753
Merge pull request #3207 from hjl-tools/hjl/cet/develop
x86: Enable Intel CET
2021-05-01 12:42:54 +02:00
H.J. Lu
53ee0b76bb
x86: Enable Intel CET
When Intel CET is enabled, we need to include <cet.h> in assembly code
to mark Intel CET support and place _CET_ENDBR at each function entry.
2021-04-30 19:45:39 -07:00
Martin Kroeker
dc6b04c375
Merge pull request #3206 from martin-frbg/lapack480535
Import packing improvements to LAPACK xLAQR from Reference-LAPACK (PR 480+535)
2021-04-30 21:42:44 +02:00
pnp
3d4ccd2a13
fix for build error
2021-04-30 12:25:33 -04:00
pnp
c59652f0ce
optimize on sgemv_n for small n
2021-04-30 12:14:58 -04:00
Martin Kroeker
87d2e314db
Import packing improvements in LAPACK xLAQR from Reference-LAPACK PR 480+535
2021-04-30 13:50:55 +02:00
Martin Kroeker
3a30c12019
Merge pull request #25 from xianyi/develop
rebase
2021-04-30 13:47:17 +02:00
Martin Kroeker
c9a82f54d1
Merge pull request #3204 from martin-frbg/lapack506
Correct INFO value returned by SLASQ2/DLASQ2 (Reference-LAPACK 506)
2021-04-30 13:25:48 +02:00
Martin Kroeker
444cb78be5
correct INFO value (Reference-LAPACK 506)
2021-04-30 09:26:54 +02:00
Martin Kroeker
171c20e3b6
Merge pull request #3202 from martin-frbg/issue3201
Fix division by zero in the non-x86 codepath of C/ZROTG
2021-04-29 18:58:27 +02:00
Martin Kroeker
c5fb91f1bc
Fix division by zero in the non-x86 codepath
2021-04-29 09:47:18 +02:00
Martin Kroeker
9a36a283d3
Merge pull request #3199 from martin-frbg/lapack537
Add LAPACKE fixes from Reference-LAPACK PR 537
2021-04-29 05:39:50 +02:00
Martin Kroeker
7e35d25ea0
Merge pull request #3198 from martin-frbg/lapack539
Apply fixes from Reference-LAPACK PR468 and 539 for array declarations in ?ORGBR/?UNGBR
2021-04-29 05:39:35 +02:00
Martin Kroeker
3704f5e5b0
Add missing break statements in the ?lascl functions
2021-04-28 20:56:55 +02:00
Martin Kroeker
6b76066632
Add const qualifiers
2021-04-28 20:55:37 +02:00
Martin Kroeker
2b01132515
Clean up misdeclaration of the dummy stand-in for A in ?ORGBR/?UNGBR workspace queries (Reference-LAPACK PRs 468 and 539)
2021-04-28 19:20:08 +02:00
Martin Kroeker
8e95a1e18d
Merge pull request #3195 from martin-frbg/lapack536
Apply lapack-testing fix from Reference-LAPACK PR536
2021-04-28 18:17:25 +02:00
Wangyang Guo
aa7b3dc3db
GEMM: skylake: improve the performance when m is small
2021-04-28 13:56:06 +00:00
Martin Kroeker
13a29d13fd
Apply lapack-testing fix from Reference-LAPACK PR536
Fixes switching back from a single OpenMP thread for the error exit tests to the originally requested number of threads for the computational tests
2021-04-27 15:48:22 +02:00
Martin Kroeker
a6c2cb8417
Merge pull request #3193 from martin-frbg/lapack538
Apply lapack-testing fixes from Reference-LAPACK PR538
2021-04-27 15:40:51 +02:00
Martin Kroeker
d511a7bb4f
Merge pull request #3191 from martin-frbg/issue3188
Delay creation of the (soft)link until after the library has been built
2021-04-27 13:35:16 +02:00
Martin Kroeker
3526ff2507
Apply fixes from Reference-LAPACK PR538
2021-04-27 12:52:49 +02:00
Martin Kroeker
adcfe7b789
Merge pull request #3190 from martin-frbg/issue3128-2
Replace spurious AVX512 requirement in the Haswell drot microkernel with an AVX2/FMA3 guard
2021-04-27 06:36:28 +02:00
damonyu
ceb44bef14
update the intrinsic api to the official name.
2021-04-27 11:12:29 +08:00
damonyu1989
ed473267df
Merge pull request #1 from xianyi/develop
update
2021-04-27 10:53:59 +08:00
Martin Kroeker
0608bc5d82
delay creation of the softlink until after the library has been created
2021-04-26 22:32:23 +02:00
Martin Kroeker
3d511f0e66
replace spurious avx512 requirement with fma check
2021-04-26 21:55:30 +02:00
Martin Kroeker
0b8a436af9
Add mixed clang/ifort build on OSX to Azure CI (#3185)
* Add mixed clang/ifort build on OSX to the Azure CI config based on https://github.com/oneapi-src/oneapi-ci
(and remove debugging tools from the clang+gfortran job)
* Remove extraneous libgfortran dependency of ifort builds
* Remove FEXTRALIB from the link line of the shared library, as ifort keeps track of dependencies (and they differ for a .dylib from what f_check found for an executable)
2021-04-22 02:11:20 +02:00
Martin Kroeker
352efdd13a
Merge pull request #24 from xianyi/develop
rebase
2021-04-20 21:30:28 +02:00
Martin Kroeker
4855af02a3
Merge pull request #3184 from martin-frbg/ctestfix
Fix obscure ctest crashes on OSX and add OSX builds to Azure CI
2021-04-20 07:31:07 +02:00
Martin Kroeker
94a5a1f0f1
Add OSX build variations to Azure CI
2021-04-19 22:27:08 +02:00
Martin Kroeker
751d127d7c
Include cblas_test.h to achieve int/long size change with INTERFACE64
2021-04-19 22:26:34 +02:00
Martin Kroeker
fc101b67e5
Merge pull request #23 from xianyi/develop
rebase
2021-04-19 22:24:12 +02:00
Martin Kroeker
b0239a05fd
Merge pull request #3183 from martin-frbg/2715-x
Restore __volatile__ keyword in ARM64 DYNAMIC_ARCH detection mechanism
2021-04-16 14:52:12 +02:00
Martin Kroeker
623d580b4c
Restore __volatile__ keyword
2021-04-16 10:27:32 +02:00
Martin Kroeker
974acb39ff
Merge pull request #3181 from RajalakshmiSR/dgemmp10vp
POWER10: Improve dgemm performance
2021-04-14 22:43:02 +02:00
Rajalakshmi Srinivasaraghavan
2379abaa5e
POWER10: Improve dgemm performance
This patch uses a vector pair pointer for input load operations,
which helps generate POWER10 lxvp instructions.
2021-04-13 22:30:06 -05:00
Martin Kroeker
3caf781d7c
Merge pull request #3179 from RajalakshmiSR/zgemvp10
POWER10: Optimized zgemv
2021-04-11 10:01:09 +02:00
Rajalakshmi Srinivasaraghavan
55bb9f639a
POWER10: Optimized zgemv
This patch makes use of Matrix-Multiply Assist (MMA)
feature introduced in POWER ISA v3.1 for zgemv_n and zgemv_t.
2021-04-10 19:00:24 -05:00
Martin Kroeker
0dba04bb58
Merge pull request #3178 from martin-frbg/fix2864
Fix unwanted fallback to implicit typing in slanv2/dlanv2
2021-04-09 13:38:05 +02:00
Martin Kroeker
e96f5e3c65
Fix implicit typing of new variable TWO
2021-04-09 10:04:15 +02:00
Martin Kroeker
558724e99f
Fix implicit typing of new variable TWO
2021-04-09 10:03:31 +02:00
Martin Kroeker
067c96a873
Merge pull request #3177 from martin-frbg/issue3176
Use "old" compute(24) function with clang due to register limitations
2021-04-07 08:22:42 +02:00
Martin Kroeker
4b380c0b40
Merge pull request #3175 from LYP951018/develop
Pass NO_AVX512 macro def when `DYNAMIC_ARCH` is enabled
2021-04-07 08:22:28 +02:00
Martin Kroeker
2dfb24730d
Use "old" compute(24) function with clang due to register limitations
2021-04-06 19:58:32 +02:00
刘雨培
725432efaa
pass NO_AVX512 macro def
2021-04-07 00:10:41 +08:00
Martin Kroeker
a2216ef19f
Merge pull request #3173 from martin-frbg/dyna-sse3
Fix spillover of host-specific build flags into the shared part of x86 DYNAMIC_ARCH builds
2021-04-05 13:39:17 +02:00
Martin Kroeker
5332cbae18
Avoid adding host-specific cpuflags to the common part of DYNAMIC_ARCH builds
2021-04-04 23:12:17 +02:00
Martin Kroeker
209b026e46
Merge pull request #3172 from martin-frbg/lapack477-final
Copy missing fixes from the final revision of Reference-LAPACK PR477
2021-04-04 20:19:09 +02:00
Martin Kroeker
1ae607beca
Update Makefile.x86_64
2021-04-04 12:31:22 +02:00
Martin Kroeker
d393f1923f
Fix spillover of host-specific build flags into the shared part of DYNAMIC_ARCH builds with gmake
for #3139
2021-04-03 22:18:15 +02:00
Martin Kroeker
081d5ae971
Fix typo and potentially undefined variables
(copies fixes made in Reference-LAPACK PR 477 after the initial cherry-pick)
2021-04-03 22:11:14 +02:00
Martin Kroeker
0492f0f3f9
Merge pull request #22 from xianyi/develop
rebase
2021-04-03 21:58:36 +02:00
Martin Kroeker
147e0a75fd
Merge pull request #3170 from CodesWithWolves/sgemm_tcopy_16-invalid-read
Remove Unnecessary/Erroneous Adds/Reads In sgemm_tcopy_16.S COPY1x8 Macro
2021-04-03 19:49:47 +02:00
Martin Kroeker
ee068af843
Merge pull request #3171 from RajalakshmiSR/BE_p10
POWER10: Adding check for little endian
2021-04-01 21:20:24 +02:00
Rajalakshmi Srinivasaraghavan
2dbcddd83d
POWER10: Adding check for little endian
This patch makes sure that recent POWER10 patches are used
only for little endian.
2021-03-31 21:32:42 -05:00
CodesWithWolves
d2bda3b56a
Remove Unnecessary/Erroneous Reads In sgemm_tcopy_16.S COPY1x8 Macro
There appears to have been some code leak when copying from the COPY2x8
macro above where we're reading 8 bytes into d4-d7 directly after
reading 4 bytes into s4-s7. These 32 bytes in d4-7 are unused and can
possibly overrun the boundary of allocated memory -- Valgrind detected
this which is what dragged my attention to it for a 128,1 copy.
Additionally, there is no need to update the addresses stored in A0-A7
as the only possible paths after running this macro will overwrite A0-7
if looping to the next 8 rows, or overwrite A0-3 if moving to 4 rows --
in which case A4-7 are unused.
2021-03-31 15:44:25 -04:00
Martin Kroeker
903fd85c85
Merge pull request #3167 from xianyi/fix3126
Fix compilation of the benchmarks on older OSX versions
2021-03-27 12:40:42 +01:00
Martin Kroeker
d57c681a6d
Fix compilation on older OSX versions
2021-03-26 22:29:29 +01:00
Martin Kroeker
d7efe5857c
Merge pull request #3165 from martin-frbg/azure-osx
Add OSX build to Azure
2021-03-24 14:05:34 +01:00
Martin Kroeker
8fd694c18f
Update .travis.yml
2021-03-24 10:36:29 +01:00
Martin Kroeker
e69b0b1771
Update azure-pipelines.yml
2021-03-24 10:34:24 +01:00
Martin Kroeker
9dc0bfd617
Update azure-pipelines.yml
2021-03-24 08:54:30 +01:00
Martin Kroeker
e6664ec2c9
Update azure-pipelines.yml
2021-03-24 08:41:48 +01:00
Martin Kroeker
dbb33f412f
Update azure-pipelines.yml
2021-03-24 08:30:48 +01:00
Martin Kroeker
70b89a6205
Add OSX build to Azure
2021-03-24 07:50:35 +01:00
Martin Kroeker
07b144855a
Merge pull request #3164 from martin-frbg/travisosxomp
Fix xcode12 build on Travis and add OSX/OpenMP job
2021-03-24 06:56:10 +01:00
Martin Kroeker
292a0aed66
Fix xcode12 build and add OSX/OpenMP
2021-03-24 06:55:14 +01:00
Martin Kroeker
42f0201e21
Merge pull request #20 from xianyi/develop
rebase
2021-03-22 17:53:43 +01:00
Martin Kroeker
22db876d48
Merge pull request #3158 from austinpagan/Gemm.CZPQ
Changed default P/Q values for CGEMM and ZGEMM (Power10 only)
2021-03-19 20:53:21 +01:00
Martin Kroeker
bdd6e3a153
Merge pull request #3157 from martin-frbg/issue3020-final
Add workaround for LAPACK testsuite failures with the NVIDIA HPC compiler on PPC
2021-03-19 15:23:12 +01:00
Martin Kroeker
7b8f580941
Merge pull request #3156 from martin-frbg/omatcopy_d
Move x86_64 DOMATCOPY_RT back to the C implementation
2021-03-19 15:22:48 +01:00
Gordon Fossum
198adea961
Changed default P/Q values for CGEMM and ZGEMM (Power10 only)
2021-03-19 10:05:23 -04:00
Martin Kroeker
86c5a0013f
Add workaround for LAPACK testsuite failures with the NVIDIA HPC compiler
2021-03-19 11:47:58 +01:00
Martin Kroeker
ef85c22474
Add workaround for LAPACK test failures with the NVIDIA HPC compiler
2021-03-19 11:46:25 +01:00
Martin Kroeker
d3555d2e50
Add workaround for LAPACK test failures with the NVIDIA HPC compiler
2021-03-19 11:44:31 +01:00
Martin Kroeker
c4b91bfcf1
Merge pull request #3155 from martin-frbg/issue3152
Fix recent SGEMM_direct breakage on SkylakeX and Cooperlake
2021-03-19 09:55:31 +01:00
Martin Kroeker
0f5e86a0d9
Remove premature entry for DOMATCOPY_RT
2021-03-18 21:53:50 +01:00
Martin Kroeker
7b294a99fd
Move common.h back to the top of the file so that SKYLAKEX (from config.h) is defined in time
2021-03-18 21:28:19 +01:00
Martin Kroeker
1e4b2e98d9
Merge pull request #3154 from martin-frbg/issue3153
Fix premature include in getarch_2nd
2021-03-18 12:35:47 +01:00
Martin Kroeker
3fd6ccdf76
Include just the definition of BLASLONG rather than all of common.h
2021-03-18 07:50:19 +01:00
Martin Kroeker
fa9a30b491
Merge pull request #19 from xianyi/develop
rebase
2021-03-18 07:47:03 +01:00
Martin Kroeker
d90ca75a6c
Update version to 0.3.14.dev
2021-03-17 21:14:42 +01:00
Martin Kroeker
e107454454
Update version to 0.3.14.dev
2021-03-17 21:14:05 +01:00
Martin Kroeker
d43962d013
Merge pull request #3151 from xianyi/release-0.3.0
Merge 0.3.14 release branch back into develop to acquire tag
2021-03-17 21:13:25 +01:00
Martin Kroeker
2f6d35c3d4
Merge pull request #3150 from xianyi/develop
Update branch from develop for 0.3.14 release
2021-03-17 20:21:42 +01:00
Martin Kroeker
86de5f768b
Update version to 0.3.14 for release
2021-03-17 20:20:34 +01:00
Martin Kroeker
2663e44724
Update version to 0.3.14 for release
2021-03-17 20:20:00 +01:00
Martin Kroeker
6f2900c164
Merge pull request #3149 from martin-frbg/changelog14
Update Changelog for 0.3.14
2021-03-17 20:14:50 +01:00
Martin Kroeker
7888b5127c
Update Changelog for 0.3.14
2021-03-17 16:17:55 +01:00
Martin Kroeker
8808c291b9
Merge pull request #3148 from martin-frbg/issue3145
Add workaround for older gcc on big-endian ppc64 not supporting casts in defines
2021-03-17 09:05:43 +01:00
Martin Kroeker
8cdf0825de
Add workaround for older gcc on ppc64be not supporting casts in defines
2021-03-16 21:20:05 +01:00
Martin Kroeker
9e0dbe8e59
Merge pull request #18 from xianyi/develop
rebase
2021-03-16 21:09:45 +01:00
Martin Kroeker
52f99d3944
Merge pull request #3147 from martin-frbg/issue3146
Fix DYNAMIC_ARCH builds with CLANG on ppc64
2021-03-16 20:25:42 +01:00
Martin Kroeker
186368ddc3
Fix compilation with CLANG
2021-03-16 16:52:57 +01:00
Martin Kroeker
c0b94ae1df
Merge pull request #3143 from martin-frbg/fix3088
Resolve circular dependency between common.h and param.h
2021-03-14 23:12:55 +01:00
Martin Kroeker
ddd86309a1
Merge pull request #3144 from xoviat/fix-test
disable openmp
2021-03-14 23:12:33 +01:00
xoviat
e9d453b623
disable openmp
2021-03-14 16:34:02 -05:00
Martin Kroeker
ecb4babcf4
remove inclusion of common.h again to avoid circular dependency
2021-03-14 17:36:51 +01:00
Martin Kroeker
34753eaebb
Include common.h (and indirectly param.h) rather than just param.h to have BLASLONG available w/o circular dependencies
2021-03-14 17:28:43 +01:00
Martin Kroeker
efa72a631b
Merge pull request #17 from xianyi/develop
rebase
2021-03-14 17:20:49 +01:00
Martin Kroeker
30d835168a
Merge pull request #3088 from xoviat/msvc
add misc fixes.
2021-03-14 17:14:28 +01:00
Martin Kroeker
8f6a744807
Merge pull request #3141 from martin-frbg/nagfor-2
Leave out ARM64 march/mtune options when compiling with nagfor
2021-03-13 23:04:53 +01:00
Martin Kroeker
6726771645
Support compilation with NAG fortran
2021-03-13 20:16:18 +01:00
Martin Kroeker
a51cae6b2e
Merge pull request #3140 from martin-frbg/issue3139
Fix compilation on older x86_64 targets with old compilers that lack intrinsics support
2021-03-12 15:35:58 +01:00
Martin Kroeker
d30b943251
Merge pull request #3138 from martin-frbg/nagfor
Add support for compilation with the NAG Fortran compiler
2021-03-12 12:46:19 +01:00
Martin Kroeker
0934568d9c
Move includes under the ifdef for compilers w/o intrinsics support
2021-03-12 12:42:05 +01:00
Martin Kroeker
697e64bbb6
Fix syntax
2021-03-11 23:03:58 +01:00
Martin Kroeker
bffb9b0e95
Merge pull request #3136 from austinpagan/Gemm.PQ
Modifying a couple parameters in the "POWER10"-specific section of pa…
2021-03-11 15:17:48 +01:00
Martin Kroeker
6ae7af78a3
Support compilation with nagfor
2021-03-11 11:53:51 +01:00
Martin Kroeker
041a26fd79
Support compilation with nagfor
2021-03-11 11:52:29 +01:00
Martin Kroeker
3c356b1a1f
Support compilation with the NAG Fortran compiler
2021-03-11 11:51:09 +01:00
Martin Kroeker
b1215f2f8c
Merge pull request #16 from xianyi/develop
rebase
2021-03-11 11:48:37 +01:00
Martin Kroeker
0b73041b16
Merge pull request #3137 from RajalakshmiSR/zscal_p10
Optimize zscal function for POWER10
2021-03-11 07:18:05 +01:00
austinpagan
9579bd47e5
Modifying a couple of parameters in the "POWER10"-specific section of param.h to improve SGEMM and DGEMM performance.
2021-03-10 18:19:12 -05:00
Rajalakshmi Srinivasaraghavan
09d47af2c0
Optimize zscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-03-10 17:15:33 -06:00
Martin Kroeker
ef0238ba2b
Merge pull request #3130 from martin-frbg/issue3128
Replace spurious AVX512 requirement in the Haswell srot microkernel with an AVX2/FMA3 guard
2021-03-06 19:15:53 +01:00
Martin Kroeker
a9f6f7ad39
Remove spurious AVX512 requirement and add AVX2/FMA3 guard
2021-03-06 14:35:49 +01:00
Martin Kroeker
1d254d321b
Merge pull request #3129 from RajalakshmiSR/asum_p10
Optimize s/dasum function for POWER10
2021-03-06 09:13:59 +01:00
Rajalakshmi Srinivasaraghavan
41646ed006
Optimize s/dasum function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-03-05 16:22:36 -06:00
Martin Kroeker
3679781872
Merge pull request #3126 from martin-frbg/m1bench
Support timing Apple M1 in the benchmarks
2021-03-02 21:27:21 +01:00
Martin Kroeker
38dcf3454b
Support timing Apple M1
2021-03-02 17:50:55 +01:00
Martin Kroeker
e34d57ca90
Merge pull request #3125 from martin-frbg/issue3123
Fix AMD AOCC compiler detection
2021-03-02 09:58:40 +01:00
Martin Kroeker
20f492c298
Fix AMD AOCC compiler detection
2021-03-01 21:00:10 +01:00
Martin Kroeker
c7c82be1c3
Merge pull request #3122 from martin-frbg/xeigtstz
Fix unusual stack size requirements of the LAPACK EIG tests (from Reference-LAPACK PR 335)
2021-02-28 22:13:09 +01:00
Martin Kroeker
9564f688c4
Adjust build rules for ?chkee.F
2021-02-28 18:57:05 +01:00
Martin Kroeker
90c1776c86
Adjust build rules for ?chkee.F
2021-02-28 18:53:20 +01:00
Martin Kroeker
9cf861e8fa
Add rewritten cchkee.F from Reference-LAPACK PR335
2021-02-28 18:51:03 +01:00
Martin Kroeker
9b7b1da133
Add rewritten dchkee.F from Reference-LAPACK PR335
2021-02-28 18:50:26 +01:00
Martin Kroeker
a5ab891292
Add rewritten schkee.F from Reference-LAPACK PR335
2021-02-28 18:49:50 +01:00
Martin Kroeker
90bb4ac821
Add rewritten zchkee.F from Reference-LAPACK PR335
2021-02-28 18:49:10 +01:00
Martin Kroeker
23a0d1bc1f
Delete zchkee.f
2021-02-28 18:47:06 +01:00
Martin Kroeker
0e96c378fd
Delete schkee.f
2021-02-28 18:46:52 +01:00
Martin Kroeker
ee16efff3c
Delete dchkee.f
2021-02-28 18:46:38 +01:00
Martin Kroeker
0197519dd7
Delete cchkee.f
2021-02-28 18:46:08 +01:00
Martin Kroeker
865829cfac
Merge pull request #3121 from RajalakshmiSR/mmarename
POWER10: Rename mma builtins
2021-02-27 19:15:49 +01:00
Rajalakshmi Srinivasaraghavan
0571c3187b
POWER10: Rename mma builtins
The LLVM and GCC teams agreed to rename the __builtin_mma_assemble_pair and
__builtin_mma_disassemble_pair built-ins to __builtin_vsx_assemble_pair and
__builtin_vsx_disassemble_pair respectively. This patch makes the
corresponding changes in the dgemm kernel. It also changes the
inputs to those builtins to avoid some potential typecasting issues.
Reference gcc commit id: 77ef995c1fbcab76a2a69b9f4700bcfd005d8e62
2021-02-26 20:56:34 -06:00
Martin Kroeker
d12a2d0d04
Merge pull request #3120 from martin-frbg/3118-x
Fix use of undefined CC variable in f_check
2021-02-26 11:50:47 +01:00
Martin Kroeker
2d369bd916
fix undefined CC variable
2021-02-26 09:09:43 +01:00
Martin Kroeker
93843c55b6
Merge pull request #15 from xianyi/develop
rebase
2021-02-26 09:06:25 +01:00
Martin Kroeker
e3a6132e12
Merge pull request #3119 from xianyi/revert-3118-issue3018-2
Revert "Fix undefined CC in f_check (again)"
2021-02-26 04:18:33 +01:00
Martin Kroeker
736f0146c3
Revert "Fix undefined CC in f_check (again)"
2021-02-26 04:18:04 +01:00
Martin Kroeker
897fc2b6ef
Merge pull request #3118 from martin-frbg/issue3018-2
Fix undefined CC in f_check (again)
2021-02-25 13:48:41 +01:00
Martin Kroeker
441c116105
fix undefined CC again
2021-02-25 13:47:34 +01:00
Martin Kroeker
8ecd80a34a
Merge pull request #14 from xianyi/develop
rebase
2021-02-25 13:45:27 +01:00
Martin Kroeker
4ba53db0da
Merge pull request #3117 from haampie/fix-perl
use /usr/bin/env perl
2021-02-24 18:39:28 +01:00
Martin Kroeker
6c365ff648
Merge pull request #3114 from martin-frbg/issue3113
Fix dll_callback and p_process_term signatures for USE_TLS on Windows x64
2021-02-24 18:38:25 +01:00
Martin Kroeker
e33bcdbb7b
Merge pull request #3115 from martin-frbg/issue2532
Replace unoptimized OMATCOPY_RT with 4x4 blocked version
2021-02-24 18:37:36 +01:00
Harmen Stoppels
ec6b354c32
use /usr/bin/env perl
2021-02-24 14:07:20 +01:00
Martin Kroeker
292d1af1a0
Update omatcopy_rt.c
2021-02-24 09:34:14 +01:00
Martin Kroeker
325b398e3c
Update omatcopy_rt.c
2021-02-24 09:13:12 +01:00
Martin Kroeker
6f5667b4d4
Enable optimized S/D OMATCOPY_RT
2021-02-24 09:03:41 +01:00
Martin Kroeker
cceeee7806
Add optimized omatcopy_rt
2021-02-24 09:00:54 +01:00
Martin Kroeker
0a4546b742
Typo fix
2021-02-23 13:14:35 +01:00
Martin Kroeker
b1eed27a54
Replace naive omatcopy_rt with 4x4 blocked implementation
as suggested by MigMuc in issue 2532
2021-02-22 21:35:42 +01:00
Martin Kroeker
1a3ad4b670
Fix signatures of the TLS-mode dll_callback and p_process_term functions for Win64
2021-02-22 19:40:36 +01:00
Martin Kroeker
86a5f98e4a
Merge pull request #13 from xianyi/develop
rebase
2021-02-22 19:31:41 +01:00
Martin Kroeker
1caa44bea9
Merge pull request #3111 from hawkinsp/forkrace
Fix race in blas_thread_shutdown.
2021-02-19 09:57:18 +01:00
Peter Hawkins
dbbf92c1d1
Fix race in blas_thread_shutdown.
blas_server_avail was read without holding server_lock. If multiple threads call blas_thread_shutdown simultaneously, for example, by calling fork(), then they can attempt to shut down multiple times. This can lead to a segmentation fault.
2021-02-18 13:46:50 -05:00
Martin Kroeker
cb429d6b12
Merge pull request #3110 from martin-frbg/issue3108
Fix get_num_procs() in the USE_TLS branch for non-glibc systems
2021-02-18 15:45:25 +01:00
Martin Kroeker
b0bded3f2f
Fix get_num_procs() in the USE_TLS branch for non-glibc systems
2021-02-18 11:14:05 +01:00
Martin Kroeker
f9aaf22fc3
Merge pull request #3105 from martin-frbg/tigerlake
Recognize Intel Tiger Lake CPUID as SkylakeX
2021-02-12 13:29:53 +01:00
Martin Kroeker
35ff3c731d
Merge pull request #3106 from RajalakshmiSR/ppcbe
Fix build issue on POWER8 with DYNAMIC_ARCH
2021-02-12 13:29:23 +01:00
Rajalakshmi Srinivasaraghavan
63fa6c832e
Fix build issue on POWER8 with DYNAMIC_ARCH
Running make DYNAMIC_ARCH=1 on POWER8 BE with gcc 10.2 gives
the following error due to the difference in UNROLL_M/N:
'No rule to make target 'dgemm_incopy_POWER10.o', needed by kernel'
2021-02-11 21:28:03 -06:00
Martin Kroeker
e4e5042e38
Recognize Intel Tiger Lake as SkylakeX
2021-02-11 20:17:11 +01:00
Martin Kroeker
ae53e3e233
Recognize Intel Tiger Lake as SkylakeX
2021-02-11 20:16:27 +01:00
Martin Kroeker
074d9bff7f
Merge pull request #3104 from martin-frbg/issue3103
Enable optimized Haswell/AVX2 kernels for sasum/dasum and srot/drot on Ryzen
2021-02-11 15:42:47 +01:00
Martin Kroeker
f36862603a
Merge pull request #3101 from jake-arkinstall/issue-3100
Addressed issue #3100 - removing an unnecessary write to the include directory
2021-02-11 15:42:18 +01:00
Martin Kroeker
47691c031f
Use Haswell optimizations for Zen as well
2021-02-11 09:26:15 +01:00
Martin Kroeker
ce7ddd8921
Use Haswell optimizations for Zen as well
2021-02-11 09:25:36 +01:00
Martin Kroeker
950c047b49
Use Haswell optimizations for Zen as well
2021-02-11 09:24:51 +01:00
Martin Kroeker
46509953a9
Use Haswell optimizations for Zen as well
2021-02-11 09:24:16 +01:00
Martin Kroeker
db348dcff2
Enable optimized srot/drot kernels from Haswell
2021-02-11 09:23:05 +01:00
Martin Kroeker
a33f471065
Merge pull request #3102 from martin-frbg/issue3099
Strip pkgversion info from compiler version string before comparing
2021-02-11 08:56:46 +01:00
Martin Kroeker
ece3ce581e
Strip parenthesized (pkgversion) data from GCC version string to avoid misinterpretation
2021-02-10 14:22:59 +01:00
Martin Kroeker
8189a98d85
Merge pull request #12 from xianyi/develop
rebase
2021-02-10 14:17:24 +01:00
Jake Arkinstall
d7a77091a3
Addressed issue #3100, removing an unnecessary write to the include directory
2021-02-10 12:11:17 +00:00
Martin Kroeker
3e1e74fca6
Merge pull request #3094 from xoviat/patch-1
build openmp on appveyor
2021-02-02 13:36:17 +01:00
Martin Kroeker
33b5670122
Merge pull request #3096 from martin-frbg/fixclangcmake
Fix Cooperlake/DYNAMIC_ARCH builds with clang on Windows
2021-02-02 13:33:15 +01:00
Martin Kroeker
95e19e2e23
fix case in compiler name check
Co-authored-by: xoviat <49173759+xoviat@users.noreply.github.com>
2021-02-02 10:53:46 +01:00
Martin Kroeker
99ac042702
remove spurious lines (probably editor malfunction)
2021-02-01 21:02:53 +01:00
Martin Kroeker
774b9f8653
handle AppleClang in Cooperlake support condition
2021-02-01 20:18:53 +01:00
Martin Kroeker
eb1d2344f7
Fix compiler version check for Intel Cooperlake support (clang-cl does not accept -dumpversion)
2021-02-01 19:45:25 +01:00
xoviat
6fa9860dbe
appveyor: cleanup and add openmp run
2021-01-31 16:32:04 -06:00
Martin Kroeker
0cc36770f1
Merge pull request #3073 from xoviat/embedded
add embedded option
2021-01-31 18:02:41 +01:00
Martin Kroeker
558cd543bf
Merge pull request #3093 from martin-frbg/fix3064
fix copy-paste error in build rules for cblas_crotg and cblas_zrotg
2021-01-30 22:21:28 +01:00
Martin Kroeker
bd906e3410
fix copy-paste error in build rules for cblas_crotg and cblas_zrotg
2021-01-30 16:46:25 +01:00
Martin Kroeker
35086cb501
Merge pull request #3092 from RajalakshmiSR/cscal_p10
Optimize cscal function for POWER10
2021-01-30 16:23:37 +01:00
Rajalakshmi Srinivasaraghavan
2056ffc227
Optimize cscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-01-29 13:51:43 -06:00
Martin Kroeker
7745439312
Merge pull request #3091 from martin-frbg/lapack477-2
Fix calculation of the non-exceptional shift values in LAPACK complex QZ
2021-01-29 13:37:23 +01:00
Martin Kroeker
c4b5abbe43
fix data type
2021-01-29 10:45:36 +01:00
Martin Kroeker
f87842483e
fix calculation of non-exceptional shift (from Reference-LAPACK PR 477)
2021-01-29 09:56:12 +01:00
Martin Kroeker
3dbb32c734
Merge pull request #11 from xianyi/develop
rebase
2021-01-29 09:52:21 +01:00
xoviat
609ea80276
enable testing
2021-01-27 16:39:52 -06:00
xoviat
3dfecaaf7c
require nofortran to be set on msvc
2021-01-27 16:39:15 -06:00
xoviat
3165c915b6
fix test helpers
2021-01-27 15:24:49 -06:00
xoviat
457ccc42c9
Merge branch 'develop' into msvc
2021-01-27 14:15:59 -06:00
Martin Kroeker
00880c720a
Merge pull request #3087 from martin-frbg/lapack477
Apply Reference-LAPACK PR 477 for convergence problems in CHGEQZ/ZHGEQZ
2021-01-27 19:11:55 +01:00
Martin Kroeker
856bc36533
Add exceptional shift to fix rare convergence problems
2021-01-27 13:41:45 +01:00
Martin Kroeker
fe71887b68
Merge pull request #10 from xianyi/develop
rebase
2021-01-27 13:39:26 +01:00
Martin Kroeker
10094bd885
Merge pull request #3076 from martin-frbg/dyn-thunderx
Add CI job for ARM64/gcc10 DYNAMIC_ARCH
2021-01-27 13:25:45 +01:00
Martin Kroeker
eea0c0f2ed
Merge pull request #3085 from alexhenrie/memory_alloc
Fix null pointer check in blas_memory_alloc
2021-01-26 20:11:42 +01:00
Martin Kroeker
85be43e0df
Merge pull request #3083 from martin-frbg/develop
Add DYNAMIC_LIST support for ARM64
2021-01-26 15:13:35 +01:00
Martin Kroeker
0cb9e9fc8d
Remove the VORTEX support bits again for now
2021-01-25 19:02:21 +01:00
Martin Kroeker
cb61d3b46b
Add DYNAMIC_LIST support for ARM64
2021-01-25 13:13:20 +01:00
Alex Henrie
113840da12
Fix null pointer check in blas_memory_alloc
2021-01-24 22:20:44 -07:00
Martin Kroeker
deb2e66bcc
Add DYNAMIC_LIST support for ARM64
2021-01-24 23:18:52 +01:00
Martin Kroeker
9b2d69aa80
Add DYNAMIC_LIST option for ARM64
2021-01-24 23:18:01 +01:00
Martin Kroeker
e3ff4cdd23
Merge pull request #9 from xianyi/develop
rebase
2021-01-24 23:14:45 +01:00
Martin Kroeker
0745ba43a4
Merge pull request #3082 from RajalakshmiSR/scalp10
Optimize s/dscal function for POWER10
2021-01-24 19:03:40 +01:00
Rajalakshmi Srinivasaraghavan
3ede843d50
Optimize s/dscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-01-24 07:48:28 -06:00
xoviat
2e8d6e8690
add functions for embedded
2021-01-23 22:12:17 -06:00
Martin Kroeker
69a5558203
Merge pull request #3059 from Guobing-Chen/BF16_gemm
Initial code for Cooperlake BF16 GEMM kernel
2021-01-23 19:08:05 +01:00
Martin Kroeker
d6905403e3
Merge pull request #3068 from alexhenrie/scan-build
scan-build fixes
2021-01-23 19:06:29 +01:00
Martin Kroeker
411926b572
Merge pull request #3079 from RajalakshmiSR/rotp10
Optimize s/drot function for POWER10
2021-01-22 08:26:00 +01:00
Rajalakshmi Srinivasaraghavan
439b93f6d2
Optimize s/drot function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-01-21 13:24:45 -06:00
Martin Kroeker
d6cf67778c
Merge pull request #3075 from martin-frbg/issue3074
Fix DYNAMIC_ARCH compilation on POWER with gcc <11
2021-01-21 08:51:30 +01:00
Martin Kroeker
b94dab5250
patch to support power10 in builtin_cpu_is was backported to gcc 10.2, so allow that as well
2021-01-20 21:34:36 +01:00
Martin Kroeker
6178974cd9
Update .drone.yml
2021-01-20 20:21:27 +01:00
Martin Kroeker
0b9e4d1278
Add gcc10/arm64 DYNAMIC_ARCH build
2021-01-20 18:30:05 +01:00
Martin Kroeker
63fa3c3f8f
Require gcc 11 for builtin_cpu_is(power10)
fixes #3074
2021-01-20 15:41:04 +01:00
Martin Kroeker
3612d9a57a
Merge pull request #8 from xianyi/develop
rebase
2021-01-20 15:38:30 +01:00
xoviat
b60de4447a
add cortex-m platform
2021-01-19 08:57:44 -06:00
Martin Kroeker
16dddb760e
Merge pull request #3070 from RajalakshmiSR/cdot
Optimize cdot function for POWER10
2021-01-16 15:47:34 +01:00
Rajalakshmi Srinivasaraghavan
eff7c9166e
Optimize cdot function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-01-15 13:40:34 -06:00
Alex Henrie
f1bf2603e6
Remove dead assignment to dflag in rotmg functions
2021-01-14 19:40:32 -07:00
Alex Henrie
6f32991eae
Don't define the mode variable when not needed in gemm functions
2021-01-14 19:40:31 -07:00
Alex Henrie
202fc9e8ed
Fix uninitialized argument value in dasum_k
2021-01-14 19:40:31 -07:00
Martin Kroeker
e378b24487
Merge pull request #3067 from albertziegenhagel/fix-generic-cmake
Fix building "generic" TRMM kernel with CMake
2021-01-14 21:35:19 +01:00
Martin Kroeker
3628b22d49
Merge pull request #3064 from martin-frbg/issue3063
Add cblas_crotg, cblas_zrotg, cblas_csrot and cblas_zdrot
2021-01-14 16:47:59 +01:00
Martin Kroeker
af2b0d0205
Merge pull request #3066 from martin-frbg/buffsizefix
Fix compile-time setting of the GEMM buffer size for gmake builds
2021-01-14 16:00:38 +01:00
Martin Kroeker
4bf988959a
Merge pull request #3062 from austinpagan/GemmPreferedSize3
Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWE…
2021-01-14 15:59:53 +01:00
Martin Kroeker
a0e4fb3a28
Merge pull request #3061 from martin-frbg/arm64-pgi
Support NVIDIA HPC SDK on ARM64
2021-01-14 15:59:21 +01:00
Martin Kroeker
2c445be8ba
Merge pull request #3051 from martin-frbg/rocketlake
Add CPUID information for Intel Rocket Lake
2021-01-14 15:56:25 +01:00
Albert Ziegenhagel
e3f4063683
Fix building "generic" TRMM kernel with CMake
The CMake "TARGET_CORE" variable stores the "generic" target name in all lowercase letters, but it is compared to an all-uppercase string, which results in the wrong TRMM kernel being selected.
This commit converts TARGET_CORE to all uppercase before comparing its value, so that case mismatches are no longer an issue.
2021-01-14 10:00:49 +01:00
Martin Kroeker
6bbe6d5b92
Make compile-time BUFFERSIZE setting actually reach the compiler/preprocessor
2021-01-13 22:36:04 +01:00
Martin Kroeker
89ae305e11
Workaround for cmake having its own C_COMPILER variable
2021-01-13 12:30:26 +01:00
Martin Kroeker
da8d7f09f1
try to work around gcc update problems
2021-01-13 09:46:53 +01:00
Martin Kroeker
25c986db5a
Add prototypes for CBLAS_CROTG and CBLAS_ZROTG
2021-01-13 00:30:27 +01:00
Martin Kroeker
a8f249458d
Build CBLAS interfaces for CROTG and ZROTG as well
2021-01-13 00:29:38 +01:00
Martin Kroeker
bc5b35367f
restore Makefile after accidental overwrite
2021-01-13 00:28:43 +01:00
Martin Kroeker
930aff2c2e
Build CBLAS interfaces for CROTG and ZROTG as well
2021-01-13 00:27:42 +01:00
Martin Kroeker
ac3e2a3fdd
Add CBLAS interfaces for csrot and zdrot
2021-01-12 23:22:00 +01:00
Martin Kroeker
9ccb12b031
Add prototypes for cblas_csrot and cblas_zdrot
2021-01-12 23:20:07 +01:00
Martin Kroeker
e18a2c22db
Merge pull request #3060 from martin-frbg/dyn_arm64
Label the assembly part of the ARMV8 dynamic arch detection as volatile
2021-01-12 23:02:05 +01:00
Martin Kroeker
b716c0ef01
Add workaround for NVIDIA HPC
2021-01-12 16:51:35 +01:00
Martin Kroeker
2efa3b70dc
Add workaround for NVIDIA HPC
2021-01-12 16:49:39 +01:00
Martin Kroeker
49959d4f1c
Add workaround for NVIDIA HPC
2021-01-12 16:47:15 +01:00
Martin Kroeker
0f27a03607
Add workaround for NVIDIA HPC mishandling of the asm DOT kernels
2021-01-12 16:39:35 +01:00
Martin Kroeker
c2a8ebfe69
Add workaround for NVIDIA HPC mishandling of the asm DOT kernels
2021-01-12 16:38:51 +01:00
Martin Kroeker
43aac5bacc
Support NVIDIA HPC compiler
2021-01-12 16:36:12 +01:00
Martin Kroeker
bff2b7c94d
Support compilation with NVIDIA HPC compilers (which do not take gcc-style arch options)
2021-01-12 16:34:18 +01:00
Martin Kroeker
2d45a262d9
Support compilation with nvfortran
2021-01-12 16:32:29 +01:00
Gordon Fossum
ed652d8136
Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWER9 and POWER10 specific sections of param.h.
2021-01-11 21:13:53 -05:00
Martin Kroeker
6fe0f1fab9
Label get_cpu_ftr as volatile to keep gcc from rearranging the code
2021-01-11 19:05:29 +01:00
Chen, Guobing
b0beb0b1ca
Initial code for Cooperlake BF16 GEMM kernel
2021-01-11 02:15:21 +08:00
Martin Kroeker
018dec8588
Merge pull request #7 from xianyi/develop
rebase
2021-01-10 17:09:46 +01:00
Martin Kroeker
5d6209e1f9
Merge pull request #3055 from RajalakshmiSR/swapp10
Optimize swap function for POWER10
2021-01-09 00:11:44 +01:00
Rajalakshmi Srinivasaraghavan
601b711c78
Optimize swap function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-01-08 08:01:36 -06:00
Martin Kroeker
78702753f2
Merge pull request #3053 from pkubaj/patch-1
Fix build on FreeBSD/powerpc64le
2021-01-02 16:14:07 +01:00
pkubaj
7aa1ff8ff6
Fix build on FreeBSD/powerpc64le
2021-01-01 21:19:57 +00:00
Martin Kroeker
d6c97cf010
Merge pull request #3052 from ashwinyes/arm64_fix_nrm2
arm64: Fix nrm2 for input vectors with Inf
2021-01-01 15:51:07 +01:00
Ashwin Sekhar T K
1b2508362b
arm64: Fix nrm2 for input vectors with Inf
Fix double precision nrm2 kernels returning NaN when the
input vectors contain Inf/-Inf.
2021-01-01 02:49:37 -08:00
Martin Kroeker
cd898af59f
Merge pull request #3050 from aurel32/riscv64-openblas-supported
getarch.c: define OPENBLAS_SUPPORTED for riscv64
2020-12-29 21:59:40 +01:00
Aurelien Jarno
0a535e58d8
getarch.c: define OPENBLAS_SUPPORTED for riscv64
2020-12-29 12:06:39 +00:00
Martin Kroeker
9ce9e295fe
Merge pull request #3049 from martin-frbg/readme
Expand the introductory paragraph of the README with links to netlib docs and linear algebra lecture videos
2020-12-27 22:54:20 +01:00
Martin Kroeker
9a38592c79
Add pointers to the netlib documentation and Gilbert Strang's linear algebra primers
2020-12-27 21:55:08 +01:00
Martin Kroeker
9b3965b08c
Merge pull request #6 from xianyi/develop
rebase
2020-12-27 21:28:10 +01:00
Martin Kroeker
531cb4f673
Merge pull request #3035 from Joshua-Ashton/patch-1
Define BLAS acronym in README
2020-12-27 21:26:52 +01:00
Martin Kroeker
3559c5d7a2
Merge pull request #3048 from martin-frbg/issue2998
Temporarily revert to the old NRM2 kernels for ThunderX2/3 and NeoverseN1
2020-12-21 13:30:08 +01:00
Martin Kroeker
8631e2976a
Temporarily revert to the old nrm2 kernels
2020-12-21 07:45:13 +01:00
Martin Kroeker
2768bc1764
Temporarily revert to the old nrm2 kernels
2020-12-21 07:42:51 +01:00
Martin Kroeker
6f4698ee1f
Temporarily revert to the old nrm2 kernel
2020-12-21 07:41:18 +01:00
Martin Kroeker
85e5165e98
Merge pull request #3046 from martin-frbg/nvidiasdk-ppc
Support NVIDIA HPC SDK on POWERPC
2020-12-20 11:55:53 +01:00
Martin Kroeker
17c16f2a71
Implement builtin_cpu_is and limit cpu choices to P8 and P9 for NVIDIA compilers
2020-12-19 23:21:22 +01:00
Martin Kroeker
91c3f86c2b
NVIDIA compiler does not yet support POWER10
2020-12-19 23:19:05 +01:00
Martin Kroeker
75b1f3becc
Limit POWERPC DYNAMIC_CORE list to P8 and P9 for NVIDIA compilers
2020-12-19 23:17:40 +01:00
Martin Kroeker
07c5e549b2
Merge pull request #3045 from martin-frbg/nvidiasdk
Support NVIDIA HPC SDK 20.11 compilers on x86_64
2020-12-19 23:14:02 +01:00
Martin Kroeker
114eb159a4
Disable FMA intrinsics in the srot kernel when the compiler is PGI/NVIDIA
2020-12-19 22:15:58 +01:00
Martin Kroeker
005cce5507
Amend SkylakeX options to support the NVIDIA compiler
2020-12-19 22:11:49 +01:00
Martin Kroeker
b859b6e79d
Add nvfortran
2020-12-19 22:09:57 +01:00
Martin Kroeker
b212a2fb9f
Add/modify "PGI" compiler options for NVIDIA SDK 20.11
2020-12-19 22:08:37 +01:00
Martin Kroeker
e40416567a
Add version printout for PGI/NVIDIA compiler
2020-12-19 22:06:56 +01:00
Martin Kroeker
b37e5fa2f8
Merge pull request #5 from xianyi/develop
rebase
2020-12-19 20:11:06 +01:00
Martin Kroeker
326469ef4a
Merge pull request #3042 from martin-frbg/develop
Move FMA3 option setting to the kernel makefile
2020-12-19 20:04:19 +01:00
Martin Kroeker
c73d8ee40d
Conditionally add -mfma to compiler options where needed
2020-12-17 11:34:05 +01:00
Martin Kroeker
abef2ea770
Move -fma option setting to kernel/Makefile.L1
2020-12-17 11:32:27 +01:00
Martin Kroeker
b26e32c3af
Merge pull request #3040 from martin-frbg/fixfcheck
Fix undefined CC variable in check for clang+gfortran combo
2020-12-16 00:05:04 +01:00
Martin Kroeker
7822eff936
Merge pull request #3038 from martin-frbg/issue3037
Fix spurious assumption of cross-compilation on some architectures
2020-12-16 00:04:45 +01:00
Martin Kroeker
865676682d
Add Intel Rocket Lake
2020-12-14 22:40:23 +01:00
Martin Kroeker
0f7776af0b
Add Intel Rocket Lake
2020-12-14 22:30:36 +01:00
Martin Kroeker
b03dc011be
Fix undefined CC variable in clang check
2020-12-14 19:21:52 +01:00
Martin Kroeker
00ce35336e
Fix spurious removal of a trailing character from the hostarch string on x86_64
2020-12-13 21:28:01 +01:00
Martin Kroeker
723776ddf7
Merge pull request #4 from xianyi/develop
rebase
2020-12-13 21:22:41 +01:00
Martin Kroeker
5a77ec7f1c
Merge pull request #3036 from RajalakshmiSR/p10copyalign
POWER10: Improve copy performance
2020-12-13 21:21:34 +01:00
Rajalakshmi Srinivasaraghavan
2fb11f873b
POWER10: Improve copy performance
This patch aligns the stores to a 32-byte boundary for scopy and dcopy
before entering the vector pair loop. For ccopy, the store
instructions were changed to stxv to improve performance in unaligned cases.
2020-12-13 10:41:45 -06:00
Joshie
ad63647446
Define BLAS acronym in README
2020-12-13 09:06:14 +00:00
Martin Kroeker
87315e8a8d
Update version to 0.3.13.dev
2020-12-12 23:28:49 +01:00
Martin Kroeker
9031ebd7d5
Update version to 0.3.13.dev
2020-12-12 23:28:20 +01:00
Martin Kroeker
12b41d5598
Merge pull request #3034 from xianyi/release-0.3.0
Merge back the release branch into develop to copy tag
2020-12-12 23:27:40 +01:00
Martin Kroeker
d2b11c4777
Merge pull request #3033 from xianyi/develop
Update branch from develop to release 0.3.13
2020-12-12 18:19:29 +01:00
Martin Kroeker
7bc0e4a2e0
Update version to 0.3.13 for release
2020-12-12 18:15:33 +01:00
Martin Kroeker
d3ec787f77
Update version to 0.3.13 for release
2020-12-12 18:14:49 +01:00
Martin Kroeker
2c309c235d
Merge pull request #3031 from martin-frbg/changelog13
Update Changelog.txt
2020-12-12 18:13:23 +01:00
Martin Kroeker
3dec81200c
Update Changelog.txt
Co-authored-by: h-vetinari <h.vetinari@gmx.com>
2020-12-12 14:27:37 +01:00
Martin Kroeker
737724607f
Merge pull request #3030 from martin-frbg/fix2994
Make fallback from POWER10 to POWER9 depend on new enough compiler
2020-12-12 10:01:45 +01:00
Martin Kroeker
77edf82c7f
Update Changelog.txt for 0.3.13
2020-12-12 01:25:20 +01:00
Martin Kroeker
6232237dba
Make fallback from P10 to P9 conditional on suitable compiler
2020-12-11 23:41:17 +01:00
Martin Kroeker
7d81acc762
Merge pull request #3 from xianyi/develop
rebase
2020-12-11 23:38:42 +01:00
Martin Kroeker
18d8a67485
Merge pull request #2994 from antonblanchard/power10-fixes
Power10 fixes
2020-12-11 23:37:30 +01:00
Martin Kroeker
043128cbe5
Merge pull request #3029 from RajalakshmiSR/axpyp10
POWER10: Improve axpy performance
2020-12-10 22:49:28 +01:00
Martin Kroeker
3331ca492d
Merge pull request #3021 from austinpagan/trsm_p10
POWER: Added special unrolled vectorized versions of "Solve" for specific si…
2020-12-10 19:42:54 +01:00
Rajalakshmi Srinivasaraghavan
346e30a46a
POWER10: Improve axpy performance
This patch aligns the stores to a 32-byte boundary for saxpy and daxpy
before entering the vector pair loop. For caxpy, the store
instructions were changed to stxv to improve performance in unaligned cases.
2020-12-10 11:51:42 -06:00
Martin Kroeker
83de62c20d
Merge pull request #3026 from martin-frbg/revert747
Revert PR747 - SYRK parameter changes for Haswell and related targets
2020-12-10 16:29:41 +01:00
Martin Kroeker
658da9a769
Merge pull request #3027 from gxw-loongson/develop
Add msa support for loongson
2020-12-10 16:27:30 +01:00
gxw
be24c66a7c
Keep LOONGSON3A and LOONGSON3B for loongson
2020-12-10 10:53:13 +08:00
gxw
4b548857d6
Add msa support for loongson
1. Use cores loongson3r3 and loongson3r4 for loongson
2. Add DYNAMIC_ARCH for loongson
Change-Id: I1c6b54dbeca3a0cc31d1222af36a7e9bd6ab54c1
2020-12-09 10:28:46 +08:00
Martin Kroeker
d71fe4ed4e
Remove GEMM_DEFAULT_UNROLL_MN parameters for Haswell and ZEN (introduced in PR747)
2020-12-08 21:07:57 +01:00
Martin Kroeker
a554712439
remove extra/intermediate size step for min_jj introduced in PR747
2020-12-08 21:01:36 +01:00
Martin Kroeker
5d26223f4a
remove extra/intermediate size step of min_jj from PR747
2020-12-08 20:59:56 +01:00
Martin Kroeker
980ab349bc
Merge pull request #2 from xianyi/develop
...
rebase
2020-12-08 20:53:35 +01:00
gxw
d67babf345
Remove option '-msched-weight', unrecognized by gcc, when checking for MSA
2020-12-08 19:16:39 +08:00
Martin Kroeker
7f11e33e8d
Merge pull request #3025 from TiredNotTear/develop
...
MIPS: Fix two bugs
2020-12-08 09:39:27 +01:00
Xianyi Zhang
7834c10e2f
Add PingTouGe contribution credit.
2020-12-07 16:55:05 +08:00
Martin Kroeker
53e0837809
Merge pull request #3022 from jinboson/develop
...
Fix test errors reported by cblas_cgemm & cblas_ctrmm
2020-12-07 08:09:11 +01:00
Hao Chen
ad38bd0e89
Fix failed cgemv and zgemv test case after using msa optimization
...
The cgemv and zgemv test case will call cgemv_n/t_msa.c zgemv_n/t_msa.c files in MIPS environment.
When the macro CONJ is defined, the calculation result will be wrong due to the wrong definition of OP2.
This patch updates the value of OP2 and passes the corresponding test.
2020-12-07 10:25:01 +08:00
Hao Chen
47b639cc9b
Fix failed sswap and dswap case by using msa optimization
...
The swap test case will call the sswap_msa.c and dswap_msa.c files in a MIPS environment.
When inc_x or inc_y is equal to zero, the calculation result of the two functions will be wrong.
This patch adds the processing of inc_x or inc_y equal to zero, and the swap test case has passed.
2020-12-07 10:24:49 +08:00
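A zero increment is a special case for swap. When inc_x == 0, every iteration reads and writes the same element x[0], so each step depends on the one before it. A kernel that loads several elements at once therefore computes the wrong result. The sequential loop below follows reference BLAS stride semantics and illustrates the expected behaviour. It is a hedged sketch: `swap_ref` is a hypothetical name, not the MSA kernel or the actual patch.

```c
/* Hypothetical reference swap with BLAS-style strides. Elements are
 * processed strictly in order, so zero and negative increments work. */
void swap_ref(long n, double *x, long inc_x, double *y, long inc_y)
{
    if (n <= 0)
        return;
    /* Negative increments start from the far end, as in reference BLAS. */
    long ix = (inc_x < 0) ? (1 - n) * inc_x : 0;
    long iy = (inc_y < 0) ? (1 - n) * inc_y : 0;
    for (long i = 0; i < n; i++) {
        double tmp = x[ix];
        x[ix] = y[iy];
        y[iy] = tmp;
        ix += inc_x;
        iy += inc_y;
    }
}
```

Take x = {1}, inc_x = 0, y = {10, 20, 30}, inc_y = 1. The result is x[0] == 30 and y == {1, 10, 20}. A kernel that swapped several elements in one bulk step would not reproduce this chain of dependent swaps.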
Martin Kroeker
8fef5876d1
Merge pull request #3024 from martin-frbg/sparc
...
Fix 32 and 64bit builds on SPARC with SolarisStudio compilers
2020-12-06 22:34:36 +01:00
Martin Kroeker
6c7d557a16
Fix compiler options for 32 and 64bit SPARC builds with SolarisStudio
2020-12-06 19:20:50 +01:00
Martin Kroeker
b660008c7e
Work around DOT and SWAP test failures
2020-12-06 19:15:37 +01:00
Martin Kroeker
f8346603cf
Fix compilation with SolarisStudio
2020-12-06 19:14:16 +01:00
Martin Kroeker
93473174d6
Fix utest build with SolarisStudio compilers
2020-12-06 19:12:56 +01:00
Martin Kroeker
b0b14f4e9b
Change comments to C style for compatibility
2020-12-06 19:12:02 +01:00
Martin Kroeker
3a1b1b7c8c
Fix complex ABI for 32bit SolarisStudio builds
2020-12-06 19:08:43 +01:00
Martin Kroeker
da6d5d675c
Fix hostarch detection for sparc
2020-12-06 19:07:45 +01:00
Martin Kroeker
04fa17322c
Fix build options for SolarisStudio compilers
2020-12-06 19:05:27 +01:00
Martin Kroeker
3853014ea1
Merge pull request #1 from xianyi/develop
...
rebase
2020-12-06 18:52:51 +01:00
Jin Bo
65de6f5957
Fix test errors reported by cblas_cgemm & cblas_ctrmm
...
The file cgemm_kernel_8x4_msa.c holds the MSA optimization
codes of cblas_cgemm and cblas_ctrmm. It defines two
macros: CGEMM_SCALE_1X2 and CGEMM_TRMM_SCALE_1X2. The pc1
array index in the two macros should be 0 and 1.
2020-12-05 15:08:17 +08:00
Gordon Fossum
213c0e7abb
Added special unrolled vectorized versions of "Solve" for specific sizes,
...
in DTRSM and STRSM, to improve performance in Power9 and Power10.
2020-12-04 17:07:06 -06:00
Martin Kroeker
f21618684b
Merge pull request #3018 from martin-frbg/issue3015
...
Avoid concurrent inclusion of libgomp and libomp in clang+gfortran builds
2020-12-04 22:08:17 +01:00
Martin Kroeker
441c08c9ff
Merge pull request #3016 from xiegengxin/complex-asum
...
Improve the performance of zasum and casum with AVX512 intrinsic
2020-12-04 22:07:16 +01:00
Martin Kroeker
66302b3c06
Merge pull request #3013 from martin-frbg/gcc46
...
Fix 32bit x86 builds and add workaround for x86_64 miscompilations by gcc 4.6 (including our Travis setup)
2020-12-04 08:54:11 +01:00
Martin Kroeker
07e9a12349
Merge pull request #3011 from cyyever/fix_link
...
link math lib on FreeBSD
2020-12-04 08:50:59 +01:00
Martin Kroeker
dd1adbdec4
Merge pull request #3019 from RajalakshmiSR/dgemm_param
...
POWER10: Update param.h
2020-12-04 08:49:28 +01:00
Martin Kroeker
a1eecccda2
Update f_check
2020-12-03 23:43:17 +01:00
Rajalakshmi Srinivasaraghavan
41fe6e864e
POWER10: Update param.h
...
Increasing the values of DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q helps
in improving performance ~10% for DGEMM.
2020-12-03 14:40:11 -06:00
Martin Kroeker
74b5850581
Add libomp to the LAPACK(-test) dependencies in clang/gfortran builds
2020-12-03 21:28:10 +01:00
Martin Kroeker
da0c94c76f
Avoid linking both GNU libgomp and LLVM libomp in clang/gfortran builds
2020-12-03 21:25:57 +01:00
Martin Kroeker
a6692dc129
use gfortran-10 with xcode 12
2020-12-03 14:32:21 +01:00
Martin Kroeker
72a553f5bc
Update .travis.yml
2020-12-03 09:17:27 +01:00
Martin Kroeker
dcbb3b5ef1
fix misplaced lines
2020-12-02 23:13:13 +01:00
Martin Kroeker
57456c248b
fix gfortran requirement in osx interface64 test
2020-12-02 15:56:21 +01:00
Martin Kroeker
c361313564
Disable deprecated 32bit xcode
2020-12-02 07:49:43 +01:00
Gengxin Xie
0cb7a403b2
fix incorrect declaration of function blas_level1_thread_with_return_value
2020-12-02 09:51:52 +08:00
Martin Kroeker
77a538d4ba
Update an overlooked instance of xcode 10.0 as well
2020-12-01 22:05:35 +01:00
Martin Kroeker
9621062eba
Update OSX xcode version to 11.5
2020-12-01 12:23:30 +01:00
Gengxin Xie
b766c1e9bb
Improve the performance of zasum and casum with AVX512 intrinsic
2020-12-01 16:49:26 +08:00
Martin Kroeker
22574b474e
Suppress -mfma as well for gcc 4.6
2020-11-30 21:41:51 +01:00
Martin Kroeker
f662022994
Move the version check to avoid overwriting unprocessed compiler data
2020-11-30 17:24:27 +01:00
Martin Kroeker
5e81e81478
Merge pull request #3014 from RajalakshmiSR/dgemvnp10
...
POWER10: Optimize dgemv_n
2020-11-30 08:18:24 +01:00
Rajalakshmi Srinivasaraghavan
7d46e31de1
POWER10: Optimize dgemv_n
...
Handling as 4x8 with vector pairs gives better performance than
existing code in POWER10.
2020-11-29 15:28:28 -06:00
Martin Kroeker
62a2eb884f
Add SSE flags for x86
2020-11-29 15:33:07 +01:00
Martin Kroeker
2e99e2699b
Add workaround for gcc 4.6 miscompiling assembly kernels with -mavx
2020-11-29 15:32:17 +01:00
Martin Kroeker
006b13299f
Merge pull request #3012 from martin-frbg/restore-getarch
...
Restore RISCV entries accidentally trashed by my PR 3005
2020-11-29 13:27:47 +01:00
Martin Kroeker
ca17d3dc3d
Restore RISCV entries accidentally trashed by my PR 3005
2020-11-29 13:19:51 +01:00
Martin Kroeker
52ed2741c5
Merge pull request #3010 from ggouaillardet/topic/fj_compilers
...
add Fujitsu compilers
2020-11-29 11:36:43 +01:00
cyy
3b4c016110
link math lib on FreeBSD
2020-11-29 17:17:35 +08:00
Gilles Gouaillardet
358100ec15
add Fujitsu compilers
...
Co-authored-by: Tomoki Karatsu <karatsu.spack@gmail.com>
2020-11-29 14:35:42 +09:00
Martin Kroeker
3788b6d156
Merge pull request #3005 from martin-frbg/ssefix
...
Add -msse for x86 and silence build warning in getarch
2020-11-23 08:35:32 +01:00
Martin Kroeker
bc5b1ddf0d
Merge pull request #3004 from martin-frbg/bsd_getauxval
...
ARM64 DYNAMIC_ARCH build fix for BSD/OSX
2020-11-23 08:35:12 +01:00
Martin Kroeker
2f42d23104
Merge pull request #3002 from martin-frbg/issue3000
...
Ensure that all targets in a DYNAMIC_ARCH build on POWER use the same buffer size
2020-11-22 22:51:26 +01:00
Martin Kroeker
b72dd007dc
Merge pull request #3001 from martin-frbg/issue2996
...
Fix ambiguous ifdefs in tests for user-defined options in Makefiles
2020-11-22 22:50:41 +01:00
Martin Kroeker
11ebe5fa25
Avoid redefinition warning
2020-11-22 21:16:07 +01:00
Martin Kroeker
01f01dae98
Add -msse if supported
2020-11-22 21:15:08 +01:00
Martin Kroeker
e7bf8ced6c
Build fix for systems that do not support getauxval
2020-11-22 20:20:28 +01:00
Martin Kroeker
0256294921
Fix syntax mixup
2020-11-22 17:41:44 +01:00
Martin Kroeker
2b114c3f30
Restore proper Makefile
2020-11-22 17:16:22 +01:00
Martin Kroeker
60e1fddca7
Ensure that the same (large) BUFFERSIZE is used for all cpus in DYNAMIC_ARCH builds
2020-11-22 16:48:22 +01:00
Martin Kroeker
ebb8788696
Use ifneq instead of ifdef for CROSS option
2020-11-22 16:33:34 +01:00
Martin Kroeker
857afcc41d
Use ifeq instead of ifdef for user-definable build options
2020-11-22 16:31:44 +01:00
Martin Kroeker
5fa305172a
Use ifeq instead of ifdef for user-definable options
2020-11-22 16:29:56 +01:00
Martin Kroeker
d3ff1f889f
Convert ifndefs to ifneq
2020-11-22 16:27:17 +01:00
Martin Kroeker
65eb7afaf4
Change ifndef CROSS to ifneq
2020-11-22 16:25:36 +01:00
Martin Kroeker
8a6b17f97d
Change ifndefs to ifneq
2020-11-22 16:19:31 +01:00
Martin Kroeker
0f863f96e4
Merge pull request #112 from xianyi/develop
...
rebase
2020-11-22 16:17:19 +01:00
Martin Kroeker
437702e0e1
Merge pull request #2965 from epsilon-0/develop
...
allow setting soname without suffix or prefix
2020-11-22 12:25:33 +01:00
Martin Kroeker
f1bf040b25
Merge pull request #2988 from xiegengxin/smp-asum
...
Improve the performance of dasum and sasum when SMP is defined
2020-11-22 12:24:13 +01:00
Martin Kroeker
613e3b2baf
Merge pull request #2997 from Flamefire/reproduce_crash
...
Add reproducer test for crash after fork
2020-11-22 12:22:57 +01:00
Xianyi Zhang
05a0ea2340
Merge branch 'risc-v' into develop
2020-11-22 16:05:32 +08:00
Xianyi Zhang
7037849498
Merge branch 'develop' into risc-v
2020-11-22 16:04:50 +08:00
Xianyi Zhang
c6c9c24d1b
Update doc for C910.
2020-11-22 16:02:19 +08:00
Martin Kroeker
6dd71af0c3
Merge pull request #2995 from Flamefire/fix_thread_buffer_init
...
Don't overwrite blas_thread_buffer if already set
2020-11-20 09:42:10 +01:00
Alexander Grund
a05dc6e62b
Add reproducer test for crash after fork
...
See #2993 for an analysis
2020-11-19 15:46:37 +01:00
Alexander Grund
60005eb47b
Don't overwrite blas_thread_buffer if already set
...
After a fork it is possible that blas_thread_buffer has already
allocated memory buffers: goto_set_num_threads does allocate those
already, and it may be called by num_cpu_avail if the OpenBLAS
NUM_THREADS value differs from the OpenMP thread count.
This leads to a memory leak which can cause subsequent execution of BLAS
kernels to fail.
Fixes #2993
2020-11-19 14:51:51 +01:00
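The fix makes buffer initialization idempotent. A per-thread buffer is allocated only if its slot is still empty. As a result, a second initialization pass after fork() does not overwrite, and leak, buffers that goto_set_num_threads has already allocated. This is a minimal sketch of that idea. The names `thread_buffer`, `MAX_THREADS`, `BUFFER_BYTES` and `init_thread_buffers` are hypothetical and are not the OpenBLAS internals.

```c
#include <stdlib.h>

#define MAX_THREADS 8          /* hypothetical limit */
#define BUFFER_BYTES (1 << 20) /* hypothetical per-thread buffer size */

/* Hypothetical per-thread scratch buffers, NULL until first allocation. */
static void *thread_buffer[MAX_THREADS];

/* Allocate buffers for the first n threads, keeping any already present.
 * Returns 0 on success, -1 on a bad n or an allocation failure. */
int init_thread_buffers(int n)
{
    if (n < 0 || n > MAX_THREADS)
        return -1;
    for (int i = 0; i < n; i++) {
        if (thread_buffer[i] != NULL)
            continue; /* already set: overwriting would leak it */
        thread_buffer[i] = malloc(BUFFER_BYTES);
        if (thread_buffer[i] == NULL)
            return -1;
    }
    return 0;
}
```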
Anton Blanchard
043f3d6faa
POWER10: Use POWER9 as a fallback
...
If the toolchain is too old, or the MMA feature isn't set on a POWER10,
fall back to the POWER9 loops.
2020-11-19 21:04:10 +11:00
Anton Blanchard
fdf71d66b3
POWER10: Fix ld version detection
...
LDVERSIONGTEQ35 needs to escape the '>' character.
LDVERSIONGTEQ35 is checking the system ld version which may be different
to the toolchain being used to compile OpenBLAS. We don't have a path
to the linker in our Makefiles, so (ab)use gcc -Wl,--version to get the
version of ld in our toolchain.
2020-11-19 20:50:42 +11:00
Martin Kroeker
7e9cb39a25
Merge pull request #2981 from Qiyu8/fix-sum
...
Fix sum optimize issues
2020-11-16 08:40:46 +01:00
Martin Kroeker
be075d53cf
Merge pull request #2983 from Qiyu8/optimize-srot
...
Optimize the performance of rot by using universal intrinsics
2020-11-16 08:38:37 +01:00
Qiyu8
b00a0de132
remove the -mfma flag when the host has AVX.
2020-11-16 09:14:56 +08:00
Martin Kroeker
d341a0fea0
Merge pull request #2989 from martin-frbg/cmake-fma
...
Fix missing -mfma compiler flag in cmake builds without DYNAMIC_ARCH
2020-11-13 12:35:09 +01:00
Martin Kroeker
ec4d77c47c
Add -mfma for HAVE_FMA3 in the non-DYNAMIC_ARCH case as well
2020-11-13 09:16:34 +01:00
Martin Kroeker
02699226d0
Merge pull request #111 from xianyi/develop
...
rebase
2020-11-13 09:14:23 +01:00
Gengxin Xie
d6e7e05bb3
Improve the performance of dasum and sasum when SMP is defined
2020-11-13 14:20:52 +08:00
Qiyu8
ae0b1dea19
modify system.cmake to enable fma flag
2020-11-13 10:20:24 +08:00
Qiyu8
e0dac6b53b
fix the CI failure caused by a target-specific option mismatch
2020-11-12 20:31:03 +08:00
Qiyu8
e5c2ceb675
fix the CI failure caused by a missing header
2020-11-12 17:35:17 +08:00
Qiyu8
a87e537b8c
modify macro
2020-11-11 15:53:48 +08:00
Qiyu8
5bc0a7583f
only FMA3 and vectors wider than 128 bits have positive effects.
2020-11-11 15:18:01 +08:00
Qiyu8
8c0b206d4c
Optimize the performance of rot by using universal intrinsics
2020-11-11 14:33:12 +08:00
Qiyu8
c4c591ac5a
fix sum optimize issues
2020-11-10 16:16:38 +08:00
Xianyi Zhang
1ea6cfefdb
Refs #2899 . Merge branch 'damonyu1989-openblas-open-910' into risc-v
2020-11-10 09:38:43 +08:00
Xianyi Zhang
fc35b72ae1
Refs #2899
...
Merge branch 'openblas-open-910' of git://github.com/damonyu1989/OpenBLAS into damonyu1989-openblas-open-910
2020-11-10 09:38:04 +08:00
Xianyi Zhang
913cc9a4ca
Merge branch 'develop' into risc-v
2020-11-10 09:18:25 +08:00
Martin Kroeker
ff16329cb7
Merge pull request #2972 from xiegengxin/rot-intrinsic
...
Improve the performance of rot by using AVX512 and AVX2 intrinsic
2020-11-08 22:43:00 +01:00
Martin Kroeker
433637ccd8
Merge pull request #2980 from martin-frbg/fixgetarch
...
Fix missing AVX2 and FMA3 capabilities in FORCE_target mode
2020-11-08 17:39:05 +01:00
Martin Kroeker
ec088bf33a
Fix missing AVX2 and FMA3 capabilities in FORCE_target mode
2020-11-08 13:15:40 +01:00
Martin Kroeker
110c7a6de0
Merge pull request #2979 from RajalakshmiSR/dot_power10
...
Optimize sdot/ddot for POWER10
2020-11-08 10:19:34 +01:00
Martin Kroeker
d2faa1be4e
Merge pull request #2978 from martin-frbg/fixdynfeatures
...
Fix handling of cpu capability flags in DYNAMIC_ARCH builds
2020-11-08 10:19:17 +01:00
Martin Kroeker
1c4cfdc139
Stay compatible with old gmake that did not support undefine
2020-11-08 00:12:55 +01:00
Martin Kroeker
f6a57d8f63
Update Makefile.system
2020-11-08 00:01:36 +01:00
Martin Kroeker
f4b7ba12b7
Update Makefile.system
2020-11-07 23:37:21 +01:00
Rajalakshmi Srinivasaraghavan
6e364981a8
Optimize sdot/ddot for POWER10
...
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2020-11-07 15:21:58 -06:00
Martin Kroeker
b976a0bf40
Remove previous workaround for compiler flags related to cpu capabilities in x86_64 DYNAMIC_ARCH builds
2020-11-07 20:39:56 +01:00
Martin Kroeker
a04f532edf
Reset cpu property flags between build cycles in DYNAMIC_ARCH mode
2020-11-07 20:37:03 +01:00
Martin Kroeker
ccb9731c7b
Fix propagation of cpu properties to compiler options
2020-11-07 20:30:15 +01:00
Martin Kroeker
a29338aaa6
Remove extraneous quotes that caused a cmake policy warning
2020-11-07 20:27:42 +01:00
Martin Kroeker
438a8e5624
Fix placement of getarch call and spurious cpu property accumulation in DYNAMIC_ARCH builds
2020-11-07 20:26:12 +01:00
Martin Kroeker
e5967810b7
Merge pull request #110 from xianyi/develop
...
rebase
2020-11-07 20:22:41 +01:00
Martin Kroeker
ff74319ea5
Merge pull request #2977 from martin-frbg/issue2976
...
Fix macro name used in ifdef for POWERPC/PGI
2020-11-07 14:41:34 +01:00
Martin Kroeker
28d2dfe2b3
Fix macro name used in ifdef
2020-11-07 12:17:49 +01:00
Gengxin Xie
725ffbf041
fix typo
2020-11-05 16:25:17 +08:00
Gengxin Xie
d9ba49165a
Improve the performance of rot by using AVX512 and AVX2 intrinsic
2020-11-05 15:12:36 +08:00
Martin Kroeker
60ab9c783f
Merge pull request #2966 from martin-frbg/issue2964
...
Ensure that EXPRECISION is disabled for DYNAMIC_ARCH with TARGET=GENERIC and fix CMAKE DYNAMIC_ARCH builds
2020-11-04 16:02:46 +01:00
Martin Kroeker
8cc73fee98
Export NO_EXPRECISION after overriding for DYNAMIC_ARCH with GENERIC target
2020-11-03 23:47:04 +01:00
Martin Kroeker
0155cd53a3
Add -msse3 where needed for DYNAMIC_ARCH builds
2020-11-03 23:45:49 +01:00
Martin Kroeker
a9f9354296
Fix target test
2020-11-02 23:17:46 +01:00
Martin Kroeker
b9bc76aec4
Add files via upload
2020-11-02 22:43:50 +01:00
Martin Kroeker
f071245939
Merge pull request #2967 from RajalakshmiSR/dgemm88
...
POWER10: Change dgemm unroll factors
2020-11-02 18:54:36 +01:00
Aisha Tammy
60997ddd73
allow setting soname without suffix or prefix
...
Allows to create a library with a different
SONAME without the need to add suffixes to symbols
Backwards compatible and should have no effect
on the workflow and previous users.
Useful for allowing INTERFACE64 library alongside
the standard library without file conflicts
2020-11-02 13:04:53 +00:00
Martin Kroeker
e5f8c2bf8a
typo fix
2020-11-01 22:25:43 +01:00
Martin Kroeker
6baf8af658
Disable EXPRECISION for the combination of DYNAMIC_CORE and GENERIC target
2020-11-01 22:11:48 +01:00
Martin Kroeker
40a93c232b
Disable EXPRECISION for DYNAMIC_ARCH in combination with TARGET=GENERIC
...
NO_EXPRECISION is disabled for the GENERIC_TARGET already, so prevent mixing with code parts that use a different float size by default
2020-11-01 21:58:26 +01:00
Martin Kroeker
fab952bee4
Merge pull request #2962 from brada4/develop
...
add openbsd 68+ gfortran name
2020-11-01 14:24:40 +01:00
Martin Kroeker
1cf04a6f0e
Merge pull request #2963 from martin-frbg/issue2959
...
Reunify default BUFFER_SIZE on ARM64 to avoid crashes in DYNAMIC_ARCH mode
2020-11-01 09:14:54 +01:00
Rajalakshmi Srinivasaraghavan
dd7a9cc5bf
POWER10: Change dgemm unroll factors
...
Changing the unroll factors for dgemm to 8 shows improved performance with
POWER10 MMA feature. Also made some minor changes in sgemm for edge cases.
2020-10-31 18:28:57 -05:00
Martin Kroeker
7f26be4802
Reunify BUFFERSIZE across arm64 platforms to avoid segfaults in DYNAMIC_ARCH
2020-11-01 00:00:43 +01:00
User User-User
9fab65e90a
add openbsd gfortran
2020-11-01 00:38:08 +02:00
Martin Kroeker
9efc3f0815
Merge pull request #109 from xianyi/develop
...
rebase
2020-10-31 22:33:52 +01:00
Martin Kroeker
aa21cb5217
Merge pull request #2960 from thrasibule/avx2_detection
...
fix avx2 detection
2020-10-31 20:24:21 +01:00
Guillaume Horel
1f564d729b
fix avx2 detection
...
reword commits to make it clearer
2020-10-31 10:00:48 -04:00
Martin Kroeker
9349dcd206
Merge pull request #2956 from RajalakshmiSR/caxpy_p10
...
Optimize caxpy for POWER10
2020-10-30 08:54:10 +01:00
Rajalakshmi Srinivasaraghavan
b435491885
Optimize caxpy for POWER10
...
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2020-10-29 14:57:51 -05:00
Martin Kroeker
9a058f2451
Merge pull request #2940 from Qiyu8/optimize-benchmark
...
Refactor the performance measurement system
2020-10-29 20:28:37 +01:00
Martin Kroeker
074927a7d0
Merge pull request #2954 from Guobing-Chen/BF16_gemv_support
...
Implementation of BF16 based gemv
2020-10-29 09:22:33 +01:00
Martin Kroeker
60b22e3462
Merge pull request #2955 from Guobing-Chen/Fix_cooperlake_build_issue
...
Fix cooperlake compile issue
2020-10-29 09:22:07 +01:00
Chen, Guobing
c5e62dad69
Fix cooperlake compile issue
...
Add a missing macro that Makefile.x86_64 requires after a recent
cleanup; its absence caused a build failure on the cooperlake platform.
2020-10-29 03:37:59 +08:00
Chen, Guobing
a7b1f9b1bb
Implementation of BF16 based gemv
...
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-10-29 02:08:23 +08:00
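A generic (non-AVX512) bfloat16 kernel can be written simply because a bf16 value is the upper 16 bits of an IEEE-754 float32. Each element can be widened with a shift, and the arithmetic can then be done in float. The sketch below computes a column-major y = alpha*A*x + beta*y along those lines. `bf16_to_float` and `sbgemv_n_sketch` are hypothetical names, and the signature is an assumption, not the sbgemv API added by this commit.

```c
#include <stdint.h>
#include <string.h>

typedef uint16_t bf16; /* raw bfloat16 bits */

/* Widen bf16 to float: bf16 holds the top 16 bits of a float32. */
static float bf16_to_float(bf16 v)
{
    uint32_t bits = (uint32_t)v << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical generic kernel: y = alpha * A * x + beta * y, where A is an
 * m-by-n column-major bf16 matrix with leading dimension lda >= m. */
void sbgemv_n_sketch(int m, int n, float alpha, const bf16 *a, int lda,
                     const bf16 *x, float beta, float *y)
{
    /* As in reference BLAS, beta == 0 overwrites y instead of scaling it. */
    for (int i = 0; i < m; i++)
        y[i] = (beta == 0.0f) ? 0.0f : beta * y[i];
    for (int j = 0; j < n; j++) {
        float xj = alpha * bf16_to_float(x[j]);
        for (int i = 0; i < m; i++)
            y[i] += bf16_to_float(a[i + (size_t)j * lda]) * xj;
    }
}
```

The hand-written AVX512-BF16 kernel mentioned above would replace the inner loops with dedicated bf16 dot-product instructions. The widening shown here is what makes a portable fallback straightforward.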
Martin Kroeker
67f39ad813
Merge pull request #2939 from thrasibule/Makefile_cleanup
...
reuse variables defined in Makefile.system
2020-10-28 09:38:40 +01:00
Martin Kroeker
6e13a7e99e
Merge pull request #2951 from martin-frbg/cleanup_make
...
Minor Makefile cleanup
2020-10-28 09:37:56 +01:00
Martin Kroeker
2207a16235
Merge pull request #2952 from martin-frbg/issue2931
...
Try to read cpu ID from /sys/devices/.../cpu0 if HWCAP_CPUID fails
2020-10-28 09:37:32 +01:00
Martin Kroeker
5d643929dd
Merge pull request #2948 from martin-frbg/issue2947
...
Expressly enable neon for use with intrinsics if available
2020-10-28 09:37:09 +01:00
Martin Kroeker
e8cbf0fc50
Output predefined HAVE_ entries to Makefile.conf for ARM with specified TARGET
2020-10-27 23:01:19 +01:00
Martin Kroeker
b937d78a6d
Try to read cpu information from /sys/devices/system/cpu/cpu0 if HWCAP_CPUID fails
2020-10-27 17:51:32 +01:00
Martin Kroeker
e2f9005db8
Merge pull request #2950 from RajalakshmiSR/saxpy
...
Optimize saxpy for POWER10
2020-10-27 00:02:18 +01:00
Martin Kroeker
6a1f3e40af
Remove debug printout of object list
2020-10-26 21:37:04 +01:00
Martin Kroeker
878b6d1f41
Remove spurious expr in flang version check
2020-10-26 21:35:40 +01:00
Rajalakshmi Srinivasaraghavan
c24ba8b1dd
Optimize saxpy for POWER10
...
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2020-10-26 13:24:59 -05:00
Qiyu8
f917c26e83
Refactoring remaining benchmark cases.
2020-10-26 10:25:05 +08:00
Martin Kroeker
76203e2120
Merge pull request #2946 from martin-frbg/issue2945
...
Move definitions that are neither needed nor supported on Solaris
2020-10-26 00:43:44 +01:00
Martin Kroeker
eec517af0e
Expressly enable neon for use with intrinsics if available
2020-10-26 00:21:56 +01:00
Martin Kroeker
fd7da56965
Move definitions that are neither needed nor supported on SUNOS
2020-10-25 12:01:50 +01:00
Martin Kroeker
2f9fc9be30
Update version to 0.3.12.dev
2020-10-24 23:29:05 +02:00
Martin Kroeker
81fcfd5ed3
Update version to 0.3.12.dev
2020-10-24 23:28:29 +02:00
Martin Kroeker
addf7593ae
Merge pull request #2944 from xianyi/release-0.3.0
...
Merge back 0.3.12 tag (and Changelog typo fixes) from release
2020-10-24 13:10:51 +02:00
Martin Kroeker
c5f280a7f0
Fix typos
2020-10-24 13:03:28 +02:00
Martin Kroeker
6e3a05f2c9
Merge pull request #2943 from xianyi/develop
...
Merge from develop for 0.3.12 release
2020-10-24 12:52:59 +02:00
Martin Kroeker
89db73569b
Update Changelog with 0.3.12 changes
2020-10-24 12:50:04 +02:00
Martin Kroeker
e1c18e4eeb
Update version to 0.3.12 for release
2020-10-24 12:15:33 +02:00
Martin Kroeker
26f658c9d2
Update version to 0.3.12 for release
2020-10-24 12:14:45 +02:00
Martin Kroeker
dc35477317
Merge pull request #2942 from martin-frbg/makebuildtypes
...
Comment out BUILD_SINGLE etc. in Makefile.rule and add a short explanation
2020-10-24 09:26:50 +02:00
Martin Kroeker
365f28787c
Comment out BUILD_SINGLE etc. and add a short explanation
2020-10-23 23:32:06 +02:00
Martin Kroeker
2f2e9ddb65
Merge pull request #2941 from martin-frbg/exportsfix
...
Fix grouping of sladiv1/dladiv1/ilaenv2stage in gensymbol
2020-10-23 20:47:35 +02:00
Martin Kroeker
0d140e61ac
Fix wrong grouping of dcombssq
2020-10-23 15:53:40 +02:00
Martin Kroeker
4c45cd6294
fix missing split of sladiv1/dladiv/ilaenv2stage by build type
2020-10-23 15:31:25 +02:00
Martin Kroeker
680f744abf
Merge pull request #108 from xianyi/develop
...
rebase
2020-10-23 15:29:48 +02:00
Martin Kroeker
6f9460f0f6
Merge pull request #2937 from martin-frbg/pwr-buffersz
...
Increase and unify BUFFERSIZE on POWER; fix gcc inline warning
2020-10-23 07:15:32 +02:00
Qiyu8
dd6ebdfdab
Refactor the performance measurement system
2020-10-23 10:32:03 +08:00
Guillaume Horel
1917a4e7b8
reuse variables defined in Makefile.system
2020-10-22 22:04:25 -04:00
Martin Kroeker
6c970fa998
Merge pull request #2938 from martin-frbg/2934-3
...
Fix twisted spelling that broke the gfortran version test again
2020-10-23 00:19:49 +02:00
Martin Kroeker
b23cb05231
Fix twisted spelling that broke the gfortran version test again
2020-10-23 00:18:29 +02:00
Martin Kroeker
1d4c96fa0c
Increase BUFFERSIZE further
2020-10-23 00:12:06 +02:00
Martin Kroeker
34c3c407ef
label always_inline function as inline to silence a gcc warning
2020-10-22 22:14:26 +02:00
Martin Kroeker
3f84a9ca15
Merge pull request #2936 from martin-frbg/issue2934-2
...
Fix compiler version check for -mavx2 support (DYNAMIC_ARCH case)
2020-10-22 22:08:46 +02:00
Martin Kroeker
7e265c50bf
Merge pull request #2935 from martin-frbg/lapack458
...
Fix macro used in argument conversion (LAPACK PR 458)
2020-10-22 19:25:58 +02:00
Martin Kroeker
ee90f30384
Increase BUFFERSIZE for POWER8-10 and use same value for POWER6
...
to fix overflow warning for PWR8 ZGEMM and PWR9 C/ZGEMM and avoid size mismatches in DYNAMIC_ARCH
2020-10-22 18:47:07 +02:00
Martin Kroeker
2e48d560ba
Fix compiler version check
2020-10-22 16:23:29 +02:00
Martin Kroeker
ab7f466467
Merge pull request #106 from xianyi/develop
...
rebase
2020-10-22 16:21:09 +02:00
Martin Kroeker
f95031204e
Fix macro used in argument conversion (LAPACK PR 458)
2020-10-22 16:19:26 +02:00
Martin Kroeker
909068facf
Merge pull request #2932 from RajalakshmiSR/copyp10
...
Optimize scopy/ccopy for POWER10
2020-10-22 00:29:46 +02:00
Martin Kroeker
5b7438fdde
Merge pull request #2934 from thrasibule/improve_version_check
...
actually check that version is greater than 4.7
2020-10-22 00:29:02 +02:00
Guillaume Horel
47696b43e9
actually check that version is greater than 4.7
2020-10-21 16:42:37 -04:00
Rajalakshmi Srinivasaraghavan
ad745c0bae
Optimize scopy/ccopy for POWER10
...
This patch makes use of new POWER10 vector pair instructions for
loads and stores. Also reorganized all variants of copy functions
to make use of same kernel.
2020-10-21 09:53:45 -05:00
Martin Kroeker
17c46bf06a
Merge pull request #2930 from ismail/fix-no-return
...
Fix build with -Werror=return-type
2020-10-21 11:43:01 +02:00
Martin Kroeker
28242096cd
Merge pull request #2928 from martin-frbg/issue2917
...
Enable -mavx2 for flang as well where supported
2020-10-21 10:11:02 +02:00
İsmail Dönmez
4a1d00f589
Fix build with -Werror=return-type
...
dgemm_tcopy_16_skylakex.c CNAME function should return an int, add a
return 0 similar to other files.
2020-10-21 08:43:39 +02:00
Martin Kroeker
00813363be
Enable -mavx2 for flang as well
2020-10-20 23:56:30 +02:00
Martin Kroeker
336e35469a
Merge pull request #105 from xianyi/develop
...
rebase
2020-10-20 23:48:53 +02:00
Martin Kroeker
29668458f7
Merge pull request #2925 from martin-frbg/issue2911-2
...
Add binutils version check as prerequisite for POWER10 in DYNAMIC_ARCH build
2020-10-20 11:27:36 +02:00
Martin Kroeker
ee83e29046
Merge pull request #2926 from bartoldeman/vzeroupper-clobber-all
...
x86_64: clobber all xmm registers after vzeroupper
2020-10-20 09:24:47 +02:00
Martin Kroeker
1a0f57c8f0
Fix missing backquotes
2020-10-20 08:37:53 +02:00
Bart Oldeman
b073d759d0
x86_64: clobber all xmm registers after vzeroupper
...
As observed with GCC 10 using -march=native -ftree-vectorize
on Knights Landing, it is now smart enough to find clobbers inside
non-inlined static functions.
In particular, sgemv counted on a kernel to preserve the whole
%ymm2 register (since it was not in the clobber list), but the top
part was destroyed by vzeroupper. This caused many tests to fail.
This patch makes sure all xmm (and ymm/zmm by extension) registers
are listed as clobbered to avoid this happening, as most kernels
already did correctly in fact.
2020-10-20 02:16:47 +00:00
Martin Kroeker
eddc65c7b7
Add POWER10 support flag (unconditionally for now)
2020-10-20 01:09:49 +02:00
Martin Kroeker
bb8c3f6861
Add ld/binutils version check for POWER10 support
2020-10-20 01:04:20 +02:00
Martin Kroeker
ff65952e46
Move HAVE_P10_SUPPORT to the build system
...
to be able to include a binutils version check
2020-10-20 00:55:41 +02:00
Martin Kroeker
6208c9899e
Merge pull request #104 from xianyi/develop
...
rebase
2020-10-20 00:52:08 +02:00
Martin Kroeker
8e20ab21c8
Merge pull request #2924 from martin-frbg/issue2920
...
Put back all symbols accidentally dropped in the reorganization of gensymbol
2020-10-19 23:33:45 +02:00
Martin Kroeker
dc6e44c3f8
Merge pull request #2916 from martin-frbg/issue2911
...
Clean up duplicate definitions in POWER8 kernels and fix power10 option passing
2020-10-19 23:33:31 +02:00
Martin Kroeker
4ad33c46b0
Add back symbols that got dropped when splitting by type
2020-10-19 20:37:52 +02:00
Martin Kroeker
fe2a922ada
Add POWER10 compiler options to CCOMMON_OPT rather than COMMON_OPT
2020-10-19 17:43:53 +02:00
Martin Kroeker
9cac379655
Merge pull request #103 from xianyi/develop
...
rebase
2020-10-19 15:56:20 +02:00
Martin Kroeker
a61c086408
Fix spurious trailing whitespace in comment
2020-10-19 09:12:12 +02:00
Martin Kroeker
5b9ebe4f8a
Merge pull request #2919 from isuruf/export
...
Fix exporting some lapack and cblas symbols
2020-10-19 08:14:27 +02:00
Martin Kroeker
7eddaf0d6f
Remove -mmma again (redundant with cpu=power10) and add override statements
2020-10-19 08:11:22 +02:00
Isuru Fernando
14b1d33933
Fix exporting some lapack and cblas
2020-10-18 22:45:58 -05:00
Martin Kroeker
77669b019d
Merge pull request #2915 from bartoldeman/no-empty_sgemm_direct_skylakex
...
sgemm_direct_skylakex: fix 75eeb26 regression.
2020-10-19 00:09:54 +02:00
Martin Kroeker
5e8ddc9001
Merge pull request #2913 from martin-frbg/issue2910
...
Support cross-compiling for Apple Vortex
2020-10-18 23:04:56 +02:00
Bart Oldeman
03e781b766
sgemm_direct_skylakex: fix 75eeb26 regression.
...
The
`#if defined(SKYLAKEX) || defined (COOPERLAKE)`
from that commit was before #include "common.h" so caused the
compiled function to be empty, returning garbage results for
qualifying sgemm's on those architectures.
Closes #2914
2020-10-18 19:58:07 +00:00
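The failure described here is a preprocessor ordering problem. An `#if defined(...)` test is evaluated at the point where it appears. If the macro is only defined by a header included later, the guarded code is silently dropped and the fallback is compiled instead. The self-contained illustration below uses a hypothetical `TARGET_FEATURE` macro, and a plain `#define` stands in for the header.

```c
/* TARGET_FEATURE is not defined yet here, so the fallback is compiled. */
#if defined(TARGET_FEATURE)
static int kernel_before_include(void) { return 1; }
#else
static int kernel_before_include(void) { return 0; } /* "empty" variant */
#endif

/* Stand-in for #include "common.h", where the target macro would
 * actually be defined. */
#define TARGET_FEATURE 1

/* Evaluated after the definition: the real implementation is compiled. */
#if defined(TARGET_FEATURE)
static int kernel_after_include(void) { return 1; }
#else
static int kernel_after_include(void) { return 0; }
#endif
```

No diagnostic is emitted in either case, which is why this kind of regression tends to surface only as wrong results at run time.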
Martin Kroeker
f1a4071d8c
Clean up STACKSIZE redefinition
2020-10-18 19:41:43 +02:00
Martin Kroeker
97cf10062f
Clean up STACKSIZE redefinition
2020-10-18 19:39:18 +02:00
Martin Kroeker
17e288e18d
Clean up STACKSIZE redefinition
2020-10-18 19:37:04 +02:00
Martin Kroeker
c1422f3e46
Clean up STACKSIZE redefinition
2020-10-18 19:31:01 +02:00
Martin Kroeker
d85b24e103
Clean up STACKSIZE redefinition
2020-10-18 19:29:45 +02:00
Martin Kroeker
7d6c85f9da
Add compiler option -mmma for POWER10
2020-10-18 19:27:51 +02:00
Martin Kroeker
2e7ee7c716
Fix naming of L2 cache size item reported for Vortex
2020-10-18 19:22:05 +02:00
Martin Kroeker
efd47b0104
Merge pull request #2909 from isuruf/patch-1
...
Need a space when redirecting to file
2020-10-18 19:16:08 +02:00
Martin Kroeker
f5902ab0a1
Support cross-compiling for Apple Vortex
2020-10-18 19:10:58 +02:00
Martin Kroeker
1a0c185122
Support cross-compiling for Apple Vortex
2020-10-18 18:54:54 +02:00
Martin Kroeker
89eea6b455
Merge pull request #102 from xianyi/develop
...
rebase
2020-10-18 18:49:59 +02:00
Isuru Fernando
a5c667b55c
Need a space when redirecting to file
...
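The following two commands have completely different meanings; in the second, the final 1 is parsed by the shell as part of the 1> redirection instead of being passed to gensymbol as its last argument:
perl ./gensymbol objcopy x86_64 _ 0 0 0 0 0 0 "" "64_" 1 0 1 1 1 1 > objcopy.def
perl ./gensymbol objcopy x86_64 _ 0 0 0 0 0 0 "" "64_" 1 0 1 1 1 1> objcopy.def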
2020-10-18 09:40:31 -05:00
Martin Kroeker
0ac6102708
Update version string to 0.3.11.dev
2020-10-17 22:40:47 +02:00
Martin Kroeker
26a701f4ad
Update version string to 0.3.11.dev
2020-10-17 22:40:06 +02:00
Martin Kroeker
fcd0fa1a3a
Merge pull request #2908 from xianyi/release-0.3.0
...
Synchronise tag with release 0.3.11
2020-10-17 22:38:58 +02:00
Martin Kroeker
51c22612eb
Merge pull request #2907 from xianyi/develop
...
Update from develop for 0.3.11
2020-10-17 22:14:12 +02:00
Martin Kroeker
b8f689200e
Update version number to 0.3.11
2020-10-17 22:11:34 +02:00
Martin Kroeker
fe9015b619
Update version for 0.3.11 release
2020-10-17 22:10:50 +02:00
Martin Kroeker
f99b8c1502
Merge pull request #2906 from martin-frbg/changelog-0311
...
Update Changelog.txt with the 0.3.11 changes
2020-10-17 22:07:14 +02:00
Martin Kroeker
5381a18056
Update Changelog.txt with the 0.3.11 changes
2020-10-17 22:05:36 +02:00
Martin Kroeker
e35576c6fc
Merge pull request #2905 from martin-frbg/aocc-clang
...
Add -mavx for clang & aocc
2020-10-17 09:45:22 +02:00
Martin Kroeker
f1bb85d378
Add AVX flags for clang/aocc as well
2020-10-16 20:52:15 +02:00
Martin Kroeker
25907e672b
Merge pull request #101 from xianyi/develop
...
rebase
2020-10-16 20:48:58 +02:00
Zhang Xianyi
d7ba7679b6
Merge branch 'develop' into risc-v
2020-10-16 23:27:38 +08:00
Martin Kroeker
9789375389
Merge pull request #2900 from martin-frbg/fixcmake_sse
...
Add compiler options for SSE to the cmake support files
2020-10-16 16:17:36 +02:00
Martin Kroeker
f64243ff57
Add compiler options for sse/sse2/ssse3/sse4.1
2020-10-16 10:47:06 +02:00
Martin Kroeker
786c0a3ce8
Add sse options for use of intrinsics with older compilers
2020-10-16 10:41:53 +02:00
Martin Kroeker
df70667043
fix core list for sse/sse2
2020-10-16 09:55:48 +02:00
Martin Kroeker
e6c5b13a18
Merge pull request #2898 from martin-frbg/morefixes
...
More pre-release fixes
2020-10-16 07:26:39 +02:00
Martin Kroeker
f071d1207a
add sse2
2020-10-15 22:10:32 +02:00
Martin Kroeker
dc6cefd2f5
Expressly enable -msse for 32bit DYNAMIC_ARCH kernels
2020-10-15 20:16:15 +02:00
Martin Kroeker
c339c40c01
Silence a redefinition warning
2020-10-15 19:08:12 +02:00
Martin Kroeker
ac8af9cec6
Add -msse where supported, apparently required for older gcc
2020-10-15 19:06:45 +02:00
Martin Kroeker
10379fc83b
Use ifdef instead of if
2020-10-15 19:05:37 +02:00
Martin Kroeker
a85ac71633
Merge pull request #100 from xianyi/develop
...
rebase
2020-10-15 18:54:20 +02:00
Martin Kroeker
4c25910da0
Merge pull request #2896 from martin-frbg/intrin-double
...
Add compiler flag for SSE4 where available
2020-10-15 11:12:35 +02:00
damonyu
ef8e7d0279
Add the support for RISC-V Vector.
...
Change-Id: Iae7800a32f5af3903c330882cdf6f292d885f266
2020-10-15 16:09:02 +08:00
Martin Kroeker
9b9ee92d5f
Merge pull request #2897 from Qiyu8/usimd-double
...
Add double precision universal intrinsics for X86/ARM
2020-10-15 08:38:24 +02:00
Martin Kroeker
ae6ac83991
Revert "add double precision SSE"
2020-10-15 08:37:02 +02:00
Qiyu8
4fac91ef37
adapt arm platform
2020-10-15 11:08:10 +08:00
Qiyu8
bfdf4b56da
Add double precision universal intrinsics for X86/ARM
2020-10-15 10:29:42 +08:00
Martin Kroeker
ebf0470fc2
add sse4.1 for DYNAMIC_ARCH kernels
2020-10-14 20:34:33 +02:00
Martin Kroeker
ca160bb440
Add -msse4.1 when SSE4.1 is supported
2020-10-14 19:18:07 +02:00
Martin Kroeker
c9c3ae07af
Add double precision operations
2020-10-14 18:10:45 +02:00
Martin Kroeker
a897bc3bd2
Merge pull request #99 from xianyi/develop
...
rebase
2020-10-14 18:09:20 +02:00
Martin Kroeker
756802df61
Merge pull request #2890 from martin-frbg/s-d-sum
...
Revert special handling of Windows xNRM2 and enable C+intrinsics kern…
2020-10-14 09:02:03 +02:00
Martin Kroeker
01492decf4
Merge pull request #2895 from martin-frbg/sb-tests
...
Fix remaining build errors related to bfloat16 and cmake
2020-10-14 09:01:16 +02:00
Martin Kroeker
bd0752444a
Merge pull request #2894 from RajalakshmiSR/bf16_packing
...
POWER10: Change the packing format for bfloat16
2020-10-14 08:12:08 +02:00
Martin Kroeker
c1f4f5d4e7
Replace Makefile with simplified version again
2020-10-14 01:08:50 +02:00
Martin Kroeker
75e3a92df6
Add express -mavx and -msse options (and fix a stray = for cooperlake)
2020-10-14 01:01:58 +02:00
Martin Kroeker
2a329baa81
Add the BFLOAT16 functions to cmake builds
2020-10-13 23:21:38 +02:00
Rajalakshmi Srinivasaraghavan
0826d68f93
POWER10: Change the packing format for bfloat16
...
As the new MMA instructions need the inputs in 4x2 order for bfloat16,
changing the format in copy/packing code. This avoids permute instructions
in the gemm kernel inner loop.
2020-10-13 16:05:10 -05:00
Martin Kroeker
4bb73c0171
Rename "HALF" type to "BFLOAT16"
2020-10-13 20:07:19 +02:00
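bfloat16 keeps the sign bit, the 8-bit exponent and the top 7 mantissa bits of an IEEE-754 binary32 float, so it is a different format from IEEE binary16 "half" precision, which the old "HALF" name could suggest. A minimal sketch of converting between float and bfloat16 with round-to-nearest-even; float_to_bf16() and bf16_to_float() are hypothetical helpers, not OpenBLAS's own conversion routines:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers, not OpenBLAS API.
   bfloat16 = upper 16 bits of an IEEE-754 binary32 value:
   1 sign bit, 8 exponent bits, 7 explicit mantissa bits. */

static uint16_t float_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);          /* type-pun without UB */

    if ((bits & 0x7fffffffu) > 0x7f800000u)  /* NaN: truncate, keep it quiet */
        return (uint16_t)((bits >> 16) | 0x0040u);

    /* Round to nearest, ties to even, on the 16 bits being dropped.
       Large finite values correctly round up to infinity. */
    uint32_t lsb = (bits >> 16) & 1u;
    bits += 0x7fffu + lsb;
    return (uint16_t)(bits >> 16);
}

static float bf16_to_float(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;       /* widening is always exact */
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Only the float-to-bfloat16 direction needs a rounding decision; widening back just appends 16 zero bits.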
Martin Kroeker
bc5c7f9578
Cleanup
2020-10-13 19:56:09 +02:00
Martin Kroeker
437b7fe261
sh prefix renamed to sb
2020-10-13 19:55:14 +02:00
Martin Kroeker
a0ada4bcb8
Merge pull request #98 from xianyi/develop
...
rebase
2020-10-13 18:50:30 +02:00
Martin Kroeker
602a0c7a69
Merge pull request #2892 from RajalakshmiSR/bf16_make
...
Fix build issues with bfloat16
2020-10-13 18:48:37 +02:00
Rajalakshmi Srinivasaraghavan
b5d30b390d
Fix build issues with bfloat16
...
This patch fixes compilation errors due to recent renaming from SH to SB
with BUILD_BFLOAT16.
2020-10-13 11:00:22 -05:00
Martin Kroeker
137ae618db
Fix typo
2020-10-13 15:02:17 +02:00
Martin Kroeker
9e3cff5cf2
Expressly enable -mavx2 on Zen, SkylakeX and Cooperlake as well
2020-10-13 14:41:25 +02:00
Martin Kroeker
d85b968424
Merge pull request #2891 from martin-frbg/fix-2886
...
Fix several bugs and omissions from the BFLOAT16 rename
2020-10-13 13:46:17 +02:00
Martin Kroeker
5f60a32cac
Add -mssse3 if supported by the hardware
2020-10-13 11:57:04 +02:00
Martin Kroeker
fecedc9c69
Add -mssse3
2020-10-13 11:55:41 +02:00
Martin Kroeker
0eacbca85f
Add Haswell and Zen to temporary sse3 whitelist
2020-10-13 11:42:39 +02:00
Martin Kroeker
6999086a2b
whitelist SANDYBRIDGE for SSE3
2020-10-13 10:32:19 +02:00
Martin Kroeker
9dca578c79
Cleanup
2020-10-13 10:14:08 +02:00
Martin Kroeker
1e7eb7b7a9
Fix typos in currently unused sections
2020-10-13 09:17:15 +02:00
Martin Kroeker
84949754a0
Fix bfloat16 conditional
2020-10-13 09:11:36 +02:00
Martin Kroeker
2ae8785603
Add a POWER9 build with BFLOAT16 enabled
2020-10-13 09:07:50 +02:00
Martin Kroeker
e05af6575e
Fix some overlooked "SHBLAS" entries
2020-10-13 09:05:04 +02:00
Martin Kroeker
c1643006ae
Merge pull request #97 from xianyi/develop
...
rebase
2020-10-13 09:01:49 +02:00
Martin Kroeker
8d2df7d066
Revert special handling of Windows xNRM2 and enable C+intrinsics kernel for SSUM/DSUM
2020-10-13 00:14:29 +02:00
Martin Kroeker
08929430cd
Merge pull request #2886 from martin-frbg/issue_2767
...
Rename "HALF" precision functions (sh prefix) to "BFLOAT16" with "sb" prefix
2020-10-13 00:04:35 +02:00
Martin Kroeker
0c84ffe05f
Merge pull request #2881 from mattip/fninit
...
add fninit to reset fpu registers before assembler routines
2020-10-12 23:50:41 +02:00
Martin Kroeker
cb4274e3ad
Merge pull request #2888 from Qiyu8/usimd-sum
...
Optimize the performance of sum by using universal intrinsics
2020-10-12 23:22:08 +02:00
Matti Picus
403eb513a0
use emms instead, add WIN guards
2020-10-12 18:15:01 +03:00
Martin Kroeker
cb839575ed
Convert the prototypes of the unimplemented BFLOAT16 functions to the new naming scheme
2020-10-12 14:44:33 +02:00
Qiyu8
0ed1f07660
Optimize the performance of sum by using universal intrinsics
2020-10-12 19:48:53 +08:00
Martin Kroeker
bb74dd29db
Restore -msse3
2020-10-12 00:42:05 +02:00
Martin Kroeker
629c497b6c
common_sh.h renamed to common_sb.h
2020-10-12 00:27:11 +02:00
Martin Kroeker
2c552f1074
Change "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-12 00:11:31 +02:00
Martin Kroeker
7ae9e8960e
Change "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-12 00:08:29 +02:00
Martin Kroeker
e3a29f6b58
Change "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-12 00:07:37 +02:00
Martin Kroeker
006c7f6671
Change "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-12 00:06:06 +02:00
Martin Kroeker
85154c2e18
Change "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-12 00:05:05 +02:00
Martin Kroeker
ae1ab5bfdf
Change "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-12 00:03:21 +02:00
Martin Kroeker
052f31bc3c
Change "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-12 00:02:16 +02:00
Martin Kroeker
3aecafad80
Change "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-12 00:00:55 +02:00
Martin Kroeker
756062afa5
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-11 23:56:17 +02:00
Martin Kroeker
2061f7fdff
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-11 23:54:53 +02:00
Martin Kroeker
dc8a1afa63
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-11 23:53:50 +02:00
Martin Kroeker
32733ded04
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-11 23:52:45 +02:00
Martin Kroeker
3bc8e8c334
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-11 23:51:34 +02:00
Martin Kroeker
573508f0ee
Rename common_sh.h to common_sb.h
2020-10-11 23:50:54 +02:00
Martin Kroeker
ca31c32693
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-11 23:49:22 +02:00
Martin Kroeker
5800758b43
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-11 23:44:38 +02:00
Martin Kroeker
924fd806d0
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-11 23:43:36 +02:00
Martin Kroeker
4db09c6cec
Rename compare_sgemm_shgemm.c to compare_sgemm_sbgemm.c
2020-10-11 23:42:45 +02:00
Martin Kroeker
fd94236042
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-11 23:42:07 +02:00
Martin Kroeker
68ce719fac
Rename shdot_microk_cooperlake.c to sbdot_microk_cooperlake.c
2020-10-11 23:41:13 +02:00
Martin Kroeker
d7dd9b396c
Rename shdot.c to sbdot.c
2020-10-11 23:40:43 +02:00
Martin Kroeker
9ae80490e0
rename "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-11 23:39:42 +02:00
Martin Kroeker
d314d1f49f
Rename shgemm_kernel_power10.c to sbgemm_kernel_power10.c
2020-10-11 23:37:38 +02:00
Martin Kroeker
f0883740e4
Merge pull request #96 from xianyi/develop
...
rebase
2020-10-11 23:34:36 +02:00
Martin Kroeker
1c0b03efb4
Merge branch 'develop' into develop
2020-10-11 23:34:14 +02:00
Martin Kroeker
c589c3e2a1
Merge pull request #2882 from martin-frbg/issue2709
...
Use generic C for (D/Z)NRM2 on Windows x86_64
2020-10-11 22:22:30 +02:00
Martin Kroeker
ec638a82bf
Merge pull request #2852 from martin-frbg/issue2588-cmake
...
Support building only a subset of variable types
2020-10-11 22:21:33 +02:00
Martin Kroeker
caa0d757ca
repair TABs
2020-10-11 18:29:34 +02:00
Martin Kroeker
6154f72d6d
Copy BUILD_ settings to the LAPACK make.inc
2020-10-11 18:25:16 +02:00
Martin Kroeker
ae8b0d257a
Set BUILD_ options to 1 instead of just defining them
2020-10-11 18:08:21 +02:00
Martin Kroeker
1da32cc1fc
Add cblas_xerbla interface
2020-10-11 17:45:41 +02:00
Martin Kroeker
8c5e08076e
If none of the BUILD_ options is set, enable them all
2020-10-11 17:33:51 +02:00
Martin Kroeker
5f23bdf437
remove debug output
2020-10-11 17:23:08 +02:00
Martin Kroeker
b593e6b650
Merge pull request #2885 from martin-frbg/ifexists
...
Improve CMAKE check for conflicting config_kernel.h
2020-10-11 15:45:24 +02:00
Martin Kroeker
082c86a538
Merge pull request #2884 from martin-frbg/sse_fixup
...
Add workaround for unwanted default activation of -msse3 in DYNAMIC_ARCH builds
2020-10-11 15:14:03 +02:00
Martin Kroeker
e396ec8b56
Allow building support for only a subset of variable types
2020-10-11 15:11:15 +02:00
Martin Kroeker
68e6823d36
Adapt for supporting only a subset of variable types
2020-10-11 15:01:32 +02:00
Martin Kroeker
887e00fd7f
Adapt for supporting only a subset of variable types
2020-10-11 14:58:57 +02:00
Martin Kroeker
886a8e3190
Adapt for supporting only a subset of variable types
2020-10-11 14:57:32 +02:00
Martin Kroeker
0f7d73ff6d
Allow supporting only a subset of variable types
2020-10-11 14:53:26 +02:00
Martin Kroeker
6b6adf8a4a
Allow compiling only a subset of kernels for specific variable types
2020-10-11 14:52:09 +02:00
Martin Kroeker
a6570108c5
Add Makefile support for enabling only some variable types
2020-10-11 14:49:58 +02:00
Martin Kroeker
ef552bc578
Add Makefile support for enabling only some variable types
2020-10-11 14:49:06 +02:00
Martin Kroeker
efe1ad4700
Add Makefile support for enabling only some variable types
2020-10-11 14:48:23 +02:00
Martin Kroeker
b27ca78a21
Adapt to having only a subset of variable types supported
2020-10-11 14:46:24 +02:00
Martin Kroeker
93454022a9
Adapt to having only a subset of variable types supported
2020-10-11 14:45:40 +02:00
Martin Kroeker
20cf1d773f
Adapt to having only a subset of variable types supported
2020-10-11 14:44:56 +02:00
Martin Kroeker
5c657fffad
Adapt to having only a subset of variable types supported
2020-10-11 14:44:13 +02:00
Martin Kroeker
b262058059
Adapt to having only a subset of variable types supported
2020-10-11 14:43:13 +02:00
Martin Kroeker
bc319cee82
Adapt to having only a subset of variable types supported
2020-10-11 14:42:26 +02:00
Martin Kroeker
e5966f8606
Adapt to having only a subset of variable types supported
2020-10-11 14:41:43 +02:00
Martin Kroeker
9df12eb08f
Adapt to having only a subset of variable types supported
2020-10-11 14:40:51 +02:00
Martin Kroeker
cf53970bcb
Adapt to having only a subset of variable types supported
2020-10-11 14:40:06 +02:00
Martin Kroeker
dcd51d5c72
Adapt to having only a subset of variable types supported
2020-10-11 14:39:19 +02:00
Martin Kroeker
b8f95354c7
Adapt to having only a subset of variable types supported
2020-10-11 14:38:25 +02:00
Martin Kroeker
d33de97d60
Adapt to having only a subset of variable types supported
2020-10-11 14:36:45 +02:00
Martin Kroeker
6a83c591d6
Adapt for having only a subset of variable types
2020-10-11 14:34:12 +02:00
Martin Kroeker
f6d2827d0c
Adapt ctests to having only a subset of types in the build
2020-10-11 14:32:00 +02:00
Martin Kroeker
08f4749eb4
Adapt tests to having only a subset of types in the build
2020-10-11 14:25:24 +02:00
Martin Kroeker
63d7dad04c
Adapt utests for builds supporting only some variable types
2020-10-11 14:15:35 +02:00
Martin Kroeker
ac653c94f3
Merge branch 'develop' into issue2588-cmake
2020-10-11 13:57:07 +02:00
Martin Kroeker
190b74dd24
Add files via upload
2020-10-11 13:26:05 +02:00
Martin Kroeker
9d43140d61
Improve check for conflicting config_kernel.h
2020-10-11 12:58:17 +02:00
Martin Kroeker
8ef600f1a3
Merge pull request #95 from xianyi/develop
...
rebase
2020-10-11 12:53:18 +02:00
Martin Kroeker
88928650c4
Merge pull request #2883 from martin-frbg/issue2872
...
Minor CMAKE fixes
2020-10-11 10:30:33 +02:00
Martin Kroeker
7a53128481
Add whitelist of DYNAMIC_ARCH kernels for which -msse3 needs to be enabled
2020-10-11 01:06:46 +02:00
Martin Kroeker
0c773b8205
Do not rely on HAVE_SSE3 in DYNAMIC_ARCH builds
2020-10-11 01:04:57 +02:00
Martin Kroeker
fbda20c856
Merge pull request #94 from xianyi/develop
...
rebase
2020-10-11 01:03:00 +02:00
Martin Kroeker
82a497ec5d
restore PRESCOTT default for DYNAMIC_LIST
2020-10-11 00:43:09 +02:00
Martin Kroeker
de27e4f5fb
Stop DYNAMIC_ARCH build if the toplevel source contains a stray config_kernel.h from a gmake build
...
This is unlikely to happen in practice, but if it does, the rogue file would get included instead of the dynamically generated version for each target_core, leading to very confusing errors like "invalid operands (undefined UND and ABS sections)" when compiling the assembly kernels, because macros like PREFETCH would remain undefined.
2020-10-11 00:40:22 +02:00
Martin Kroeker
e1b7123bbe
Merge pull request #2867 from Qiyu8/usimd-floatdot
...
Optimize the performance of dot by using universal intrinsics in X86/ARM
2020-10-10 12:10:25 +02:00
Qiyu8
f32d34a015
add sse3 compiler flag
2020-10-10 10:36:15 +08:00
Martin Kroeker
599777ecb7
Merge pull request #2879 from martin-frbg/issue2839
...
Default BLAS3_MEM_ALLOC_THRESHOLD on all platforms to 32
2020-10-06 23:26:52 +02:00
Martin Kroeker
7812486091
Use generic C for D/Z nrm2 kernels on Windows to work around fpu exception bug
2020-10-06 21:33:16 +02:00
Matti Picus
a5b164946c
add fninit to reset fpu registers before assembler routines
2020-10-05 22:13:25 +03:00
Martin Kroeker
a5feea6611
make BLAS3_MEM_ALLOC_THRESHOLD configurable on non-Windows
2020-10-04 23:01:06 +02:00
Martin Kroeker
dc8e4e1959
Reduce the BLAS3 heap allocation threshold to 32 and mark it as configurable
2020-10-04 22:59:24 +02:00
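These commits only name the threshold, so the following is a minimal sketch under an assumption: that BLAS3_MEM_ALLOC_THRESHOLD is compared at compile time with the maximum thread count to decide whether a per-thread job array lives on the stack or on the heap. MAX_CPU_NUMBER's value, job_t and run_jobs() are hypothetical stand-ins. Wrapping the default in #ifndef is what lets a build override it with -DBLAS3_MEM_ALLOC_THRESHOLD=..., i.e. what "configurable" means in this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical defaults; a build could pass -DBLAS3_MEM_ALLOC_THRESHOLD=... */
#ifndef MAX_CPU_NUMBER
#define MAX_CPU_NUMBER 64
#endif
#ifndef BLAS3_MEM_ALLOC_THRESHOLD
#define BLAS3_MEM_ALLOC_THRESHOLD 32
#endif

/* Above the threshold, a MAX_CPU_NUMBER-sized array is too big for the stack. */
#if MAX_CPU_NUMBER > BLAS3_MEM_ALLOC_THRESHOLD
#define USE_ALLOC_HEAP 1
#endif

typedef struct { double scratch[512]; } job_t;  /* hypothetical 4 KiB record */

/* Fills one record per thread and returns the sum of their first entries. */
static double run_jobs(int nthreads)
{
#ifdef USE_ALLOC_HEAP
    job_t *job = malloc(MAX_CPU_NUMBER * sizeof *job);
    if (job == NULL)
        return -1.0;
#else
    job_t job[MAX_CPU_NUMBER];
#endif
    double sum = 0.0;
    for (int i = 0; i < nthreads && i < MAX_CPU_NUMBER; i++) {
        job[i].scratch[0] = (double)i;
        sum += job[i].scratch[0];
    }
#ifdef USE_ALLOC_HEAP
    free(job);
#endif
    return sum;
}
```

Lowering the threshold moves more configurations onto the heap path, trading a malloc per call for a smaller stack footprint.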
Martin Kroeker
cccd1438da
Merge pull request #93 from xianyi/develop
...
rebase
2020-10-04 22:57:11 +02:00
Martin Kroeker
f032d8966e
Merge pull request #2874 from Flamefire/memory_fixes
...
Avoid out of bounds access on invalid memory free
2020-10-04 15:16:51 +02:00
Martin Kroeker
f6e4cf2f9d
Merge pull request #2876 from Flamefire/omp_fork_fix
...
Lazily reinit threads after a fork in OMP mode
2020-10-03 22:52:17 +02:00
Martin Kroeker
9828343e12
Merge pull request #2878 from brada4/asms
...
fix clang std=c18 compilation on aarch64
2020-10-03 22:51:49 +02:00
User User-User
d2333e7842
aarch64 fix std=c18 compilation
2020-10-03 18:00:34 +03:00
Alexander Grund
3094fc6c83
Lazily reinit threads after a fork in OMP mode
...
This initializes the per-thread memory buffers, which get
cleared/released on a fork via pthread_atfork. Not doing so leads to
each thread calling blas_memory_alloc on almost every execution, which
slows down the code significantly because the threads race for the memory
allocation, which is serialized by locks.
2020-10-01 15:41:42 +02:00
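The lazy re-initialization described above can be illustrated with a minimal sketch. The buffer array, the counter and ensure_buffers() are hypothetical stand-ins, not OpenBLAS's blas_memory_alloc machinery: a pthread_atfork child handler only marks the per-thread buffers as stale, and the next call re-creates them once, instead of every call going back to a lock-protected allocator. It assumes ensure_buffers() is called by the submitting thread before work is handed to the workers; build with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NUM_WORKERS  4      /* hypothetical thread count */
#define BUFFER_BYTES 4096   /* hypothetical per-thread buffer size */

static void *worker_buffer[NUM_WORKERS];
static int buffers_valid = 0;
static int alloc_count = 0;  /* instrumentation for the example */
static pthread_once_t atfork_once = PTHREAD_ONCE_INIT;

/* Runs in the child after fork(): keep it trivial, just flag the buffers. */
static void mark_buffers_stale(void)
{
    buffers_valid = 0;
}

static void register_fork_handler(void)
{
    pthread_atfork(NULL, NULL, mark_buffers_stale);
}

/* Called at the start of every threaded operation. Re-initializes the
   buffers at most once per fork, not on every call. */
static int ensure_buffers(void)
{
    pthread_once(&atfork_once, register_fork_handler);
    if (buffers_valid)
        return 0;
    for (int i = 0; i < NUM_WORKERS; i++) {
        free(worker_buffer[i]);               /* free(NULL) is a no-op */
        worker_buffer[i] = calloc(1, BUFFER_BYTES);
        if (worker_buffer[i] == NULL)
            return -1;
        alloc_count++;
    }
    buffers_valid = 1;
    return 0;
}
```

Calling mark_buffers_stale() directly simulates what the handler does in a forked child: exactly one further round of allocations follows, then calls are cheap again.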
Alexander Grund
3c05f54df8
Avoid out of bounds access on invalid memory free
2020-10-01 10:48:45 +02:00
Alexander Grund
dee7c49938
Fix TABs and trailing space
2020-10-01 10:43:16 +02:00
Martin Kroeker
d3c0d6811b
Merge pull request #2873 from martin-frbg/issue2871
...
Check for __linux rather than linux in cpuid code and benchmarks
2020-10-01 06:38:22 +02:00
Martin Kroeker
9637cd1fd1
Merge pull request #2865 from thisch/backticks
...
Consolidate usage of backticks for build options
2020-10-01 06:38:06 +02:00
Martin Kroeker
2367726578
Remove redundant status message
2020-09-30 23:28:49 +02:00
Martin Kroeker
5464eb13ea
Change ifdef linux to __linux for C11 compatibility
2020-09-30 22:59:41 +02:00
Martin Kroeker
e1574cbc83
Change ifdef linux to __linux for C11 compatibility
...
and add a fallback for unsupported operating systems in detect()
2020-09-30 22:50:21 +02:00
Martin Kroeker
0b2bb5696a
Change ifdef linux to __linux for C11 compatibility
2020-09-30 22:47:25 +02:00
Martin Kroeker
a7d5d0078d
Change ifdef linux to __linux for C11 compatibility
2020-09-30 22:46:25 +02:00
Martin Kroeker
be40440ec5
Change ifdef linux to __linux for C11 compatibility
2020-09-30 22:45:18 +02:00
Martin Kroeker
2bf70c8e3b
Change ifdef linux to __linux for C11 compatibility
2020-09-30 22:43:25 +02:00
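In strict ISO modes such as -std=c11 or -std=c18, GCC and Clang do not predefine the non-reserved macro linux, only the reserved spellings __linux and __linux__, so an "#ifdef linux" silently takes the non-Linux branch. A minimal sketch of the portable test, with a fallback for other systems similar in spirit to the one added to detect(); os_name() is a hypothetical helper:

```c
#include <assert.h>
#include <string.h>

/* Under -std=c11 the plain `linux` macro is not defined, so test the
   reserved spellings that every language mode provides. */
static const char *os_name(void)
{
#if defined(__linux) || defined(__linux__)
    return "Linux";
#elif defined(__APPLE__)
    return "Darwin";
#elif defined(_WIN32)
    return "Windows";
#else
    return "unknown";   /* fallback for unsupported operating systems */
#endif
}
```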
Qiyu8
60e6c68e38
Adapt to ARM architecture
2020-09-29 16:36:14 +08:00
Martin Kroeker
64629cb5c7
Merge pull request #91 from xianyi/develop
...
rebase
2020-09-28 22:48:53 +02:00
Qiyu8
1b1a757f5f
Optimize the performance of dot by using universal intrinsics in X86/ARM
2020-09-28 20:36:53 +08:00
Martin Kroeker
0d98ce202c
Merge pull request #2866 from RajalakshmiSR/p10_dcopy
...
Optimize dcopy/zcopy for POWER10
2020-09-28 07:22:54 +02:00
Rajalakshmi Srinivasaraghavan
2df4235e00
Optimize dcopy/zcopy for POWER10
...
This patch makes use of new POWER10 vector pair instructions for
loads and stores. Tested in simulator and no new failures.
2020-09-27 21:42:32 -05:00
Thomas Hisch
fe8cd5ae7e
Consolidate usage of backticks for build options
...
There were some build options in the README that were not
highlighted. Now all are highlighted.
2020-09-28 00:42:17 +02:00
Martin Kroeker
ba31c8f5f9
Merge pull request #2853 from Qiyu8/usimd-daxpy
...
Optimize the performance of daxpy by using universal intrinsics
2020-09-27 23:19:59 +02:00
Martin Kroeker
e961d4d609
Merge pull request #2864 from martin-frbg/lapack445
...
Fix underflow/rounding errors in LAPACK (S,D)LANV2
2020-09-27 23:11:17 +02:00
Martin Kroeker
7ed25e9e10
Fix underflow/rounding errors in LAPACK (S,D)LANV2
...
Reference-LAPACK PR 445, fixing their issue 263
2020-09-27 22:59:20 +02:00
Martin Kroeker
7b169379e0
Merge pull request #2863 from martin-frbg/readmefixes
...
Readmefixes
2020-09-27 22:50:25 +02:00
Martin Kroeker
7f539fb850
Update cpu list, outline cmake build, clarify scope of set_num_threads extension
2020-09-27 22:48:41 +02:00
Martin Kroeker
caf7a12295
Merge pull request #90 from xianyi/develop
...
rebase
2020-09-27 22:35:45 +02:00
Martin Kroeker
72b5b73647
Merge pull request #2850 from xiaojiayuan111/develop
...
fix a bug of trmm
2020-09-27 12:12:35 +02:00
Qiyu8
881c15179f
remove default support for FMA4 on zen architecture
2020-09-27 09:35:50 +08:00
Martin Kroeker
896bbd55e1
Add support for building only selected variable types
2020-09-26 23:25:55 +02:00
Martin Kroeker
c5a32288c6
Work around sgemm_r/dgemm_r not being properly defined with BUILD_COMPLEX/BUILD_COMPLEX16
2020-09-26 23:24:37 +02:00
Martin Kroeker
dfaafd3b55
Merge pull request #2854 from martin-frbg/travis-graviton
...
Add an AWS-Graviton2 build to Travis CI
2020-09-23 21:59:18 +02:00
Martin Kroeker
f2e9a24e1a
Add AWS Graviton2 build
2020-09-23 19:02:20 +02:00
Martin Kroeker
98153875e9
Adapt tests to having only a subset of types in the library
2020-09-22 23:28:57 +02:00
Martin Kroeker
0eaae30e8c
Adapt tests to having only a subset of types in the build
2020-09-22 23:28:03 +02:00
Martin Kroeker
dfbc62ef7e
Support building only a subset of types
2020-09-22 23:25:59 +02:00
Martin Kroeker
b475b4bd0d
Support building only a subset of types
2020-09-22 23:25:04 +02:00
Martin Kroeker
357bff06b5
Add BUILD_vartype defines
2020-09-22 23:24:22 +02:00
Martin Kroeker
988a6f429e
Add BUILD_vartype defines
2020-09-22 23:23:33 +02:00
Martin Kroeker
e5e2fbd593
Support building only selected types
2020-09-22 23:21:30 +02:00
Martin Kroeker
3287848c8f
Support building only selected types
2020-09-22 23:20:51 +02:00
Martin Kroeker
26611af8e1
fix grouping of sources used for more than one type
2020-09-22 23:20:05 +02:00
Martin Kroeker
b886bd672b
add defines for building a subset of types
2020-09-22 23:18:55 +02:00
Martin Kroeker
61fae59298
Merge pull request #88 from xianyi/develop
...
rebase
2020-09-22 23:15:33 +02:00
Martin Kroeker
33d22f99f1
Merge pull request #2851 from martin-frbg/travis-xcode12
...
Add an OSX build with xcode12
2020-09-22 21:44:55 +02:00
Martin Kroeker
5ba01dd1a8
Add an OSX build with xcode12
2020-09-22 17:26:19 +02:00
Qiyu8
14f7dad3b7
performance improved
2020-09-22 16:52:15 +08:00
y00512012
06cf73a239
fix a bug of trmm
2020-09-22 16:47:10 +08:00
Qiyu8
325b539c26
Optimize the performance of daxpy by using universal intrinsics
2020-09-22 10:38:35 +08:00
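Universal intrinsics put a thin layer of vector types and operations between a kernel and the instruction set, so one kernel body can target SSE2 on x86 and NEON on ARM. A hypothetical, much reduced sketch of the idea: the v_* names and HAVE_V_F64 are invented here, not taken from OpenBLAS's headers. The daxpy loop is written against the layer, with a scalar tail and a scalar-only fallback when no vector mapping exists.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>
/* Hypothetical mini layer: kernels only use the v_* names below. */
typedef __m128d v_f64;
#define V_NLANES_F64          2
#define v_loadu_f64(p)        _mm_loadu_pd(p)
#define v_storeu_f64(p, v)    _mm_storeu_pd((p), (v))
#define v_setall_f64(x)       _mm_set1_pd(x)
#define v_muladd_f64(a, b, c) _mm_add_pd(_mm_mul_pd((a), (b)), (c))
#define HAVE_V_F64 1
#endif

/* y := alpha * x + y for contiguous vectors of length n. */
static void daxpy_kernel(size_t n, double alpha, const double *x, double *y)
{
    size_t i = 0;
#ifdef HAVE_V_F64
    const v_f64 va = v_setall_f64(alpha);
    for (; i + V_NLANES_F64 <= n; i += V_NLANES_F64) {
        v_f64 vy = v_muladd_f64(va, v_loadu_f64(x + i), v_loadu_f64(y + i));
        v_storeu_f64(y + i, vy);
    }
#endif
    for (; i < n; i++)      /* remainder, or the whole loop without SIMD */
        y[i] += alpha * x[i];
}
```

An AArch64 mapping would bind the same names to float64x2_t, vld1q_f64, vst1q_f64, vdupq_n_f64 and vfmaq_f64; the last one fuses the multiply-add, so results may differ in the last bit from the SSE2 mapping.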
Martin Kroeker
0f112077e6
Merge pull request #2847 from mhillenibm/fixup_cscal
...
s390x: fix cscal and zscal implementations
2020-09-21 22:22:43 +02:00
Marius Hillenbrand
22aa81f3e5
s390x: fix cscal and zscal implementations
...
The implementation of complex scalar * vector multiplication for Z14
makes some LAPACK tests fail because the numerical differences to the
reference implementation exceed the threshold (as can be seen by running
make lapack-test and replacing kernel/zarch/cscal.c with a generic
implementation for comparison).
The complex multiplication uses terms of the form a * b + c * d for both
real and imaginary parts. The assembly code (and compiler-emitted code
as well) uses fused multiply add operations for the second product and
sum. The results can be "surprising", for example when both terms in the
imaginary part nearly cancel each other out. In that case, the second
product contributes more digits to the sum than the first product that
has been rounded before.
One option is to use separate multiplications (which then round the same
way) and a distinct add. Change the code to pursue that path, by (1)
requesting the compiler not to contract the operations into FMAs and (2)
replacing the assembly kernel with corresponding vectorized C code
(where change 1 also applies).
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-21 13:10:05 +02:00
Marius Hillenbrand
77ea73f5e5
s390x: for clang use fp-contract=on instead of fast
...
Make clang slightly more cautious when contracting floating-point
operations (e.g., when applying fused multiply add) by setting
-ffp-contract=on (instead of fast).
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-21 11:32:08 +02:00
Marius Hillenbrand
f91057cbad
s390x: move common vector definitions and utils into header
...
... to facilitate reuse beyond gemm_vec.c and avoid code duplication.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-21 11:32:08 +02:00
Martin Kroeker
992d7ca63d
Merge pull request #2845 from martin-frbg/lapack443
...
Fix workspace query in LAPACK xGELQ (Reference-LAPACK 443)
2020-09-18 23:18:41 +02:00
Martin Kroeker
7e4d5c237c
Fix workspace query in xGELQ (Reference-LAPACK PR443)
2020-09-18 09:19:46 +02:00
Martin Kroeker
8d12027a79
Merge pull request #86 from xianyi/develop
...
rebase
2020-09-18 09:17:49 +02:00
Martin Kroeker
b1e0bcceec
Merge pull request #2844 from RajalakshmiSR/daxpy_p10
...
Optimize daxpy/zaxpy for POWER10
2020-09-17 23:46:32 +02:00
Rajalakshmi Srinivasaraghavan
be43d2cb96
Optimize daxpy/zaxpy for POWER10
...
This patch makes use of new POWER10 vector pair instructions for
loads and stores. Tested in simulator and no new failures.
2020-09-17 12:56:28 -05:00
Martin Kroeker
2855e6000c
Merge pull request #2841 from martin-frbg/cpp_gemvtest
...
Make thread safety tests available to CMAKE and support running only the GEMV version
2020-09-17 17:29:56 +02:00
Martin Kroeker
144a03446d
Merge pull request #2843 from mhillenibm/fixup_merge_dynamic_zarch
...
s390x/DYNAMIC_ARCH: fixup broken merge and reapply simplification
2020-09-17 17:28:43 +02:00
Marius Hillenbrand
75d440caa0
s390x/DYNAMIC_ARCH: fixup broken merge and reapply simplification
...
An unrelated commit and merge inadvertently reverted our recent two
changes for simplifying DYNAMIC_ARCH on s390x. Simply reapply the
changes.
Simplify detection of which kernels we can compile on s390x. Instead of
decoding the gcc version in a complicated manner, just check if CC
supports a given -march=archXY flag. Together with the next patch, we
thereby gain support for builds with LLVM/clang with DYNAMIC_ARCH=1.
To enable builds with DYNAMIC_ARCH with older compiler releases, the
Makefile and drivers/other/dynamic_arch.c need a common view of the
architecture support built into the library.
We follow the notation from x86 when used with DYNAMIC_LIST, where
defines DYN_<ARCH NAME> denote support for a given generation to be
built in. Since there are far fewer architecture generations in OpenBLAS
for s390x, that does not bloat command lines too much.
Closes: #2842
Fixes: ba644378dc ("Copy BUILD_ options available to the compiler flags")
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-17 17:09:03 +02:00
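The DYN_<ARCH NAME> convention can be sketched as follows: only the generations that the Makefile found the compiler able to build get defined, and the run-time selector simply has no branch for the others. select_core() and its boolean inputs are hypothetical; on Linux the inputs would come from the hwcap vector-facility bits, and the returned names follow OpenBLAS's Z13, Z14 and ZARCH_GENERIC targets.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* In a real build these would come from the Makefile, e.g. -DDYN_Z13 -DDYN_Z14;
   they are defined here only so the sketch compiles on its own. */
#define DYN_Z13
#define DYN_Z14

/* Pick the newest kernel set that was both compiled in and is supported
   by the running CPU (VXE for z14, VX for z13). */
static const char *select_core(bool has_vx, bool has_vxe)
{
#ifdef DYN_Z14
    if (has_vxe) return "Z14";
#endif
#ifdef DYN_Z13
    if (has_vx) return "Z13";
#endif
    return "ZARCH_GENERIC";
}
```

Removing a DYN_ define drops that branch entirely, so a compiler too old for -march=z14 still produces a working library that falls back to older kernels.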
Martin Kroeker
6abca76c4e
Add option for running only the less demanding GEMV version of the thread safety tests
2020-09-17 13:49:24 +02:00
Martin Kroeker
84c00c3c6e
Support running just the GEMV version of the thread safety test
2020-09-17 13:46:41 +02:00
Martin Kroeker
8c5c991bd7
Add cpp_thread_test options
2020-09-17 13:45:40 +02:00
Martin Kroeker
2e3b15d68b
Add CMakeLists.txt
2020-09-17 13:43:55 +02:00
Martin Kroeker
eaf7f825bd
Merge pull request #85 from xianyi/develop
...
rebase
2020-09-17 13:42:47 +02:00
Martin Kroeker
4c10a1673d
Merge pull request #2840 from martin-frbg/fixup2833
...
Fix for cmake BUILD_ settings PR 2833
2020-09-16 18:55:50 +02:00
Martin Kroeker
c4aeeeb9f4
Activate all BUILD_ options if none was specified
2020-09-15 23:15:34 +02:00
Martin Kroeker
3843bd188c
Merge pull request #84 from xianyi/develop
...
rebase
2020-09-15 23:13:30 +02:00
Martin Kroeker
ddec244a5a
Merge pull request #2838 from austinpagan/gordon_trmm
...
Adding performance patch for trmm, just like trsm (#2836)
2020-09-15 21:17:48 +02:00
fossum
dfeca46098
Adding performance patch for trmm, just like #2836
2020-09-15 08:59:50 -05:00
Martin Kroeker
f8950f40a2
Merge pull request #2836 from austinpagan/gordon_trsm
...
Fixing a performance bug in trsm_[LR].c.
2020-09-15 11:26:37 +02:00
fossum
274d6e015b
Fixing a performance bug in trsm_[LR].c.
2020-09-14 13:10:48 -05:00
Martin Kroeker
91c84e1c01
Merge pull request #2796 from Guobing-Chen/BF16_dot_coversion_apis
...
Add bfloat16 based dot and conversion with single/double
2020-09-14 15:00:19 +02:00
Martin Kroeker
1ee1e7b495
Merge pull request #2833 from martin-frbg/issue2830
...
Make building the tests for individual data types conditional on the respective BUILD option
2020-09-14 07:24:23 +02:00
Martin Kroeker
ba644378dc
Copy BUILD_ options available to the compiler flags
2020-09-14 00:03:33 +02:00
Martin Kroeker
9e11c2d62f
Add BUILD_SINGLE etc
2020-09-13 23:55:11 +02:00
Martin Kroeker
4d250d0cdf
Rearrange ifdefs
2020-09-13 23:29:01 +02:00
Martin Kroeker
de139337b8
Remove spurious tests for complex ASUM and NRM2
2020-09-13 22:20:41 +02:00
Martin Kroeker
ec2948f147
Make tests conditional on BUILD_DOUBLE
2020-09-13 22:17:46 +02:00
Martin Kroeker
ce89398636
Make tests for individual variable types conditional on the respective BUILD_ option
2020-09-13 21:52:18 +02:00
Martin Kroeker
593ce9e237
Make building individual tests depend on BUILD_SINGLE etc defines
2020-09-13 21:50:12 +02:00
Martin Kroeker
74e358bcd5
Remove spurious complex16 tests
2020-09-13 21:49:01 +02:00
Martin Kroeker
26792d2096
Copy BUILD_* directives to the compiler options to allow ifdef in tests
2020-09-13 21:47:55 +02:00
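Once the BUILD_ settings reach the compiler as defines, a test source can exclude whole groups of tests with the preprocessor. A minimal sketch of that pattern, also mirroring the "activate all BUILD_ options if none was specified" fallback from the related commits; run_enabled_tests() is hypothetical and only counts the groups it would run.

```c
#include <assert.h>

/* The build passes the requested types, e.g. -DBUILD_SINGLE=1 -DBUILD_DOUBLE=1.
   With none of them given, fall back to building everything. */
#if !defined(BUILD_SINGLE) && !defined(BUILD_DOUBLE) && \
    !defined(BUILD_COMPLEX) && !defined(BUILD_COMPLEX16)
#define BUILD_SINGLE 1
#define BUILD_DOUBLE 1
#define BUILD_COMPLEX 1
#define BUILD_COMPLEX16 1
#endif

/* Hypothetical test driver: each block stands for the tests of one type. */
static int run_enabled_tests(void)
{
    int groups = 0;
#ifdef BUILD_SINGLE
    groups++;   /* s* tests, e.g. sdot, sgemm */
#endif
#ifdef BUILD_DOUBLE
    groups++;   /* d* tests */
#endif
#ifdef BUILD_COMPLEX
    groups++;   /* c* tests */
#endif
#ifdef BUILD_COMPLEX16
    groups++;   /* z* tests */
#endif
    return groups;
}
```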
Martin Kroeker
6b52c7e172
Merge pull request #2832 from martin-frbg/issue2831
...
Fix gfortran detection by vendor matching
2020-09-13 21:20:30 +02:00
Martin Kroeker
746ad3bd19
Fix vendor match for GCC gfortran
2020-09-13 18:40:59 +02:00
Martin Kroeker
55d4d470ec
Merge pull request #83 from xianyi/develop
...
rebase
2020-09-13 18:30:11 +02:00
Martin Kroeker
a270894730
Merge pull request #2829 from mhillenibm/clang_s390x
...
Fix DYNAMIC_ARCH=1 with clang s390x
2020-09-08 23:36:41 +02:00
Marius Hillenbrand
047b8d7aff
Add an s390 build with clang to the Travis configuration
...
Since clang builds have been fixed on s390x, including support for
DYNAMIC_ARCH, cover that build type in Travis.
Explicitly request Ubuntu 20.04 (codename focal) to get a recent
LLVM/clang version 10.x and thereby cover all s390x architecture
generations supported in OpenBLAS. Ubuntu 18.10's LLVM/clang 6.x cannot
build the inline assembly in some of the Z13 and Z14 kernels.
LLVM/clang currently does not support OpenMP on s390x, so disable that
in the build.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-08 20:59:06 +02:00
Marius Hillenbrand
f7731a358a
Update CONTRIBUTORS.md - clang build fixes for IBM z
...
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-08 19:34:18 +02:00
Marius Hillenbrand
a55fe06f25
s390x/DYNAMIC_ARCH: define a HW_CAP flag to support slightly older glibc versions
...
Enable building DYNAMIC_ARCH support with older versions of glibc that
do not know about the hwcap flag HWCAP_S390_VXE yet.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-08 19:34:18 +02:00
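Supplying a hwcap constant that older C library headers lack is a small preprocessor fallback. A minimal sketch, assuming the kernel values 8192 for HWCAP_S390_VXE and 2048 for HWCAP_S390_VX as found in current glibc headers; cpu_has_vx() and cpu_has_vxe() are hypothetical helpers. On s390x Linux they would be called as cpu_has_vxe(getauxval(AT_HWCAP)), with <sys/auxv.h> included before these lines so that the system definitions take precedence when present.

```c
#include <assert.h>

/* Only used if the C library headers do not already provide them. */
#ifndef HWCAP_S390_VX
#define HWCAP_S390_VX  2048
#endif
#ifndef HWCAP_S390_VXE
#define HWCAP_S390_VXE 8192
#endif

/* hwcap is the AT_HWCAP word from the auxiliary vector. */
static int cpu_has_vx(unsigned long hwcap)  { return (hwcap & HWCAP_S390_VX) != 0; }
static int cpu_has_vxe(unsigned long hwcap) { return (hwcap & HWCAP_S390_VXE) != 0; }
```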
Marius Hillenbrand
4f34bcfb5e
s390x/DYNAMIC_ARCH: pass supported arch levels from Makefile to run-time code
...
... instead of duplicating the (old) mechanism from the Makefile that
aimed to derive supported architecture generations from the gcc
version.
To enable builds with DYNAMIC_ARCH with older compiler releases, the
Makefile and drivers/other/dynamic_arch.c need a common view of the
architecture support built into the library.
We follow the notation from x86 when used with DYNAMIC_LIST, where
defines DYN_<ARCH NAME> denote support for a given generation to be
built in. Since there are far fewer architecture generations in OpenBLAS
for s390x, that does not bloat command lines too much.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-08 19:34:18 +02:00
Marius Hillenbrand
0629d8ebdb
s390x/DYNAMIC_ARCH: generalize detecting supported archs for clang
...
Simplify detection of which kernels we can compile on s390x. Instead of
decoding the gcc version in a complicated manner, just check if CC
supports a given -march=archXY flag. Together with the next patch, we
thereby gain support for builds with LLVM/clang with DYNAMIC_ARCH=1.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-08 19:34:18 +02:00
Martin Kroeker
15da2f9acb
Merge pull request #2828 from martin-frbg/lapack438
...
Correct xLASET arguments in LAPACK EIG tests
2020-09-08 10:25:19 +02:00
Martin Kroeker
7d9c77f421
Correct dimension argument to xLASET
...
from Reference-LAPACK PR 438
2020-09-07 22:03:46 +02:00
Martin Kroeker
c8f029a518
Merge pull request #82 from xianyi/develop
...
rebase
2020-09-07 21:59:13 +02:00
Martin Kroeker
e72430fe46
Merge pull request #2803 from xiegengxin/AVX2-asum
...
Implementation of dasum, sasum with AVX2 & AVX512 intrinsics
2020-09-06 18:32:15 +02:00
Martin Kroeker
6e0f6c5f00
Merge pull request #2824 from martin-frbg/asumbench
...
Use POSIX.1-2001 clock_gettime in asum benchmark if available
2020-09-06 10:05:47 +02:00
Martin Kroeker
6f8fad87c5
Use POSIX.1-2001 clock_gettime for higher resolution
2020-09-05 19:44:01 +02:00
Martin Kroeker
ed0f2d3dd7
Merge pull request #2816 from martin-frbg/silicon
...
Add basic support for Apple Vortex (ARM64) cpu
2020-09-05 19:17:59 +02:00
Martin Kroeker
43a31b7786
Merge pull request #2823 from martin-frbg/fix2778
...
Improve fix for lapack-test EIG/cchkhb2stg from PR 2778
2020-09-05 17:29:38 +02:00
Martin Kroeker
8a2a137a9e
Correct argument to SLASET (Improves fix from PR2778)
...
as explained by serguei-patchkovskii in a comment on Reference-LAPACK/lapack#438, passing in an index of 1 instead of N leads to a standards violation when accessing matrix A in SLASET, i.e. undefined behavior
2020-09-05 13:06:31 +02:00
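The SLASET fixes hinge on how LAPACK addresses a column-major array: element (i, j) of A is stored at A[i + j*LDA], and the routine requires LDA >= max(1, M). If a dimension argument such as LDA is 1 while N > 1, columns overlap and that requirement is violated. A minimal C sketch of the UPLO='A' behaviour, using a hypothetical helper `laset_all` (not the Reference-LAPACK routine, which has no INFO argument and does not check this) that rejects a too-small leading dimension instead of writing into overlapping storage:

```c
#include <stddef.h>

/* Hypothetical sketch of xLASET semantics for UPLO = 'A': set every
 * off-diagonal element of the m-by-n column-major matrix A to alpha and every
 * diagonal element to beta. Element (i, j) lives at a[i + j*lda], so lda must
 * be at least max(1, m). Returns 0 on success, -1 if the arguments are invalid
 * (a deliberate addition; SLASET itself performs no such check). */
static int laset_all(int m, int n, float alpha, float beta, float *a, int lda)
{
    int min_lda = m > 1 ? m : 1;
    if (m < 0 || n < 0 || lda < min_lda)
        return -1; /* refuse instead of aliasing columns */
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            a[i + (size_t)j * lda] = (i == j) ? beta : alpha;
    return 0;
}
```

Calling `laset_all(3, 3, 0.0f, 1.0f, a, 3)` produces a 3x3 identity; the same call with `lda = 1` is rejected.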
Martin Kroeker
0d1f30a297
Merge pull request #81 from xianyi/develop
rebase
2020-09-05 12:47:03 +02:00
Martin Kroeker
70a254d507
Merge pull request #2822 from martin-frbg/issue2821
Fix potential domain error in sqrt
2020-09-05 12:39:32 +02:00
Martin Kroeker
330044d821
Fix potential domain error in sqrt
2020-09-05 09:44:33 +02:00
Martin Kroeker
97636b2c8a
Merge pull request #2819 from h-vetinari/carry_lapack_437
Carry lapack#437
2020-09-04 23:50:43 +02:00
Martin Kroeker
4d36711547
Merge pull request #2820 from RajalakshmiSR/clang
POWER9: Fix mcpu option with clang
2020-09-04 23:09:31 +02:00
Rajalakshmi Srinivasaraghavan
718f67421a
POWER9: Fix mcpu option with clang
Adding check for compiler type before checking GCC version in Makefile.
This allows clang to use power9 instead of power8 when CORE is POWER9.
2020-09-04 10:36:19 -05:00
H. Vetinari
3426519ae2
adapt ?ggsv?-functions to ambient code style in LAPACKE/include/lapack.h
2020-09-04 17:33:24 +02:00
H. Vetinari
1c6c71fa85
Follow-up to lapack#434 & lapack#409: add missing 'const' in signatures
Based on how the surrounding functions in lapack.h are handling the
parameters, particularly the ?ggsv?3-variants of the affected functions
2020-09-04 17:33:11 +02:00
H. Vetinari
860247b5da
Follow-up to lapack#434 & lapack#409: fix signature mismatches
2020-09-04 17:32:53 +02:00
Martin Kroeker
c61771e335
Merge pull request #2778 from martin-frbg/lapackeig
Fix various wrong calls to SLASET/DLASET in the EIG part of the LAPACK testsuite
2020-09-04 10:06:02 +02:00
Chen, Guobing
deaeb6c5b8
Add bfloat16 based dot and conversion with single/double
1. Added bfloat16 based dot as new API: shdot
2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot
3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod
shstobf16 -- convert single float array to bfloat16 array
shdtobf16 -- convert double float array to bfloat16 array
sbf16tos -- convert bfloat16 array to single float array
dbf16tod -- convert bfloat16 array to double float array
4. Implemented generic kernels for all 4 conversion APIs, and a cooperlake-specific kernel for shstobf16 and shdtobf16
5. Updated the level-1 threading helper functions and macros to support multi-threading for these new APIs
6. Fixed a Cooperlake platform detection/specification issue when building with DYNAMIC_ARCH
7. Changed the typedef of bfloat16 from unsigned short to the stricter uint16_t
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-09-04 02:31:25 +08:00
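The commit above adds a bfloat16 dot product and float/bfloat16 conversions. A portable sketch of what a generic (non-AVX512-BF16) kernel might do, assuming round-to-nearest-even for the narrowing conversion and single-precision accumulation for the dot product; all function names are hypothetical and do not reproduce the shdot, shstobf16 or sbf16tos signatures:

```c
#include <stdint.h>
#include <string.h>

/* bfloat16 keeps the upper 16 bits of an IEEE-754 binary32 value:
 * sign, 8 exponent bits, 7 mantissa bits. */
typedef uint16_t bf16_t;

/* float -> bfloat16, round to nearest even; NaNs stay (quiet) NaNs. */
static bf16_t float_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);            /* reinterpret without aliasing UB */
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)    /* NaN: truncate, set quiet bit */
        return (bf16_t)((bits >> 16) | 0x0040u);
    uint32_t lsb = (bits >> 16) & 1u;          /* ties round toward an even result */
    bits += 0x7FFFu + lsb;
    return (bf16_t)(bits >> 16);
}

/* bfloat16 -> float is exact: the 16 bits become the high half again. */
static float bf16_to_float(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Dot product of two bfloat16 vectors, accumulated in single precision. */
static float bf16_dot(int n, const bf16_t *x, const bf16_t *y)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += bf16_to_float(x[i]) * bf16_to_float(y[i]);
    return sum;
}
```

Only the narrowing direction needs a rounding rule; widening is a plain 16-bit shift.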
Martin Kroeker
c7ef7174e4
Merge pull request #2817 from martin-frbg/lapack436
LAPACKE: fix declaration of work arrays in [cz]gesvdq
2020-09-03 17:10:23 +02:00
Martin Kroeker
775a87242d
Rename KERNEL.SILICON to KERNEL.VORTEX
2020-09-03 08:44:20 +02:00
Martin Kroeker
af5bc95503
Rename SILICON to VORTEX and fix duplicate numbering
2020-09-03 08:43:26 +02:00
Martin Kroeker
ea3a58c844
Rename SILICON to VORTEX
2020-09-03 08:38:53 +02:00
Martin Kroeker
17dca035de
rename SILICON to VORTEX
2020-09-03 08:38:08 +02:00
Gengxin Xie
1b0f17eeed
align to 64, using SSE when input size is small
2020-09-03 14:25:54 +08:00
Martin Kroeker
c31b72965e
Fix data type of work array in zgesvdq prototype
2020-09-02 23:44:44 +02:00
Martin Kroeker
0ce2aa3163
Fix data type of rwork array
2020-09-02 23:41:51 +02:00
Martin Kroeker
80794fe8fd
Create KERNEL.SILICON
2020-09-02 22:56:58 +02:00
Martin Kroeker
4a4d1ca6e0
Add AppleSilicon cpu
2020-09-02 22:52:12 +02:00
Martin Kroeker
b37d17382a
Add Apple Silicon
2020-09-02 22:48:49 +02:00
Martin Kroeker
029fd01cfb
Detect AppleSilicon cpu on OSX
2020-09-02 22:47:38 +02:00
Martin Kroeker
9d1ea75aa0
Merge pull request #80 from xianyi/develop
rebase
2020-09-02 22:16:41 +02:00
Martin Kroeker
776d005f4c
Merge pull request #2815 from mhillenibm/clang_s390x
Fix build with clang on s390x
2020-09-02 16:56:01 +02:00
Marius Hillenbrand
2ee5b899ce
s390x: enable S/DGEMM block with explicit loop unrolling + interleaving with clang
The code for SGEMM 16x4 and DGEMM 8x4 blocks on z14 and z15 uses
explicit unrolling and interleaving to improve performance. The code
employs an empty inline asm statement with operands that constrain the
compiler's instruction scheduling and thereby enforce proper overlapping
of load and compute phases. Fix an ifdef to apply that for clang builds,
as well.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-02 13:49:31 +02:00
Marius Hillenbrand
095f4e6964
s390x: allow clang to emit fused multiply-adds (replicates gcc's default behavior)
gcc's default setting for floating-point expression contraction is
"fast", which allows the compiler to emit fused multiply adds instead of
separate multiplies and adds (amongst others). Fused multiply-adds,
which assembly kernels typically apply, also bring a significant
performance advantage to the C implementation for matrix-matrix
multiplication on s390x. To enable that performance advantage for builds
with clang, add -ffp-contract=fast to the compiler options.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-02 13:49:31 +02:00
Marius Hillenbrand
87e5bbd887
s390x: avoid variable-length arrays in struct for asm operands
... since it is not required and clang does not support that gcc
extension. Instead, use a variable-length array directly for these
operands.
Note that, while the actual inline assembly code does not directly use
these memory operands, they serve to inform the compiler that it cannot
reorder reads or writes to/from the input and output data across the
inline asm statements.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-02 13:49:31 +02:00
Marius Hillenbrand
b9b3265ec8
s390x: avoid inline assembly for vector loads for clang
... since clang does not support the instruction format for inline
assembly and also it is not required for current versions of clang.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-02 13:49:30 +02:00
Marius Hillenbrand
a1616a0b86
s390x: replace nop with "nop 0" in inline assembly
... as a band-aid for building with clang until LLVM's internal assembler
supports nops without an operand.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-02 13:49:30 +02:00
Marius Hillenbrand
60ef193258
s390x: use "lghi" for immediate values to fix build with clang
Some of the kernels written in assembly utilize a "load address"
instruction for loading an immediate value into a register. That is
both unnecessarily complex and LLVM's assembler does not understand that
specific syntax. Thus, replace with the appropriate "load immediate"
instruction, which is also clearer to read.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-02 13:49:30 +02:00
Martin Kroeker
18bfb6d6f7
Merge pull request #2813 from martin-frbg/issue2804-2
Fix for c_check misinterpreting arm64 in uname -m output as armv7
2020-09-01 23:39:46 +02:00
Martin Kroeker
e4900caa11
Fix c_check misinterpreting arm64 in uname output to mean armv7
Additional fix for upcoming OSX on ARM64 related to #2804, as suggested by fxcoudert in #2805
2020-09-01 19:54:08 +02:00
Martin Kroeker
68b1713c30
Merge pull request #2811 from martin-frbg/issue2806
Make NO_AVX512 option override the AVX512 compile test in CMAKE builds as well
2020-09-01 17:19:14 +02:00
Martin Kroeker
4074770d00
Merge pull request #2797 from martin-frbg/relafixes1
ReLAPACK fixes
2020-09-01 16:04:03 +02:00
Martin Kroeker
b87a77da02
Merge pull request #79 from xianyi/develop
rebase
2020-09-01 12:03:53 +02:00
Martin Kroeker
f42e84d46c
Fix misnaming of LAPACK_?ggsvp function prototypes as LAPACKE_ ( #2808 )
* Fix misnaming of LAPACK_?ggsvp and ?ggsvd function prototypes as LAPACKE_
* Drop the LAPACKE matrix_layout parameter from the argument lists, change ints to pointers and add missing work arguments.
2020-09-01 10:44:48 +02:00
Martin Kroeker
0a4c5c4c44
Merge pull request #2807 from martin-frbg/issue2804
Work around ARMV8 build-time cpu detection problems on non-Linux systems
2020-08-31 23:44:56 +02:00
Martin Kroeker
3210a42734
Report cpu as ARMV8 instead of just giving up on non-Linux hosts
2020-08-31 20:03:21 +02:00
Martin Kroeker
5feb087c05
Handle Apple labeling armv8 as arm64 rather than aarch64
2020-08-31 20:02:08 +02:00
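The c_check fixes for "arm64" and the FreeBSD armv6/armv7 aliases from #2792 all concern mapping a `uname -m` style string to a build target. A hypothetical C helper `classify_machine` (c_check itself is a script, and this is not its code) illustrates the ordering pitfall: the 64-bit spellings must be matched before any generic "arm" prefix, or Apple's "arm64" is taken for a 32-bit ARM.

```c
#include <string.h>

/* Hypothetical classifier for uname -m strings. The 64-bit names are tested
 * first; a bare "arm" prefix test would otherwise swallow "arm64". */
static const char *classify_machine(const char *m)
{
    if (strcmp(m, "aarch64") == 0 || strcmp(m, "arm64") == 0)
        return "ARMV8";             /* Linux says aarch64, Apple says arm64 */
    if (strncmp(m, "armv7", 5) == 0)
        return "ARMV7";             /* e.g. "armv7l", FreeBSD "armv7" */
    if (strncmp(m, "armv6", 5) == 0)
        return "ARMV6";
    if (strncmp(m, "arm", 3) == 0)
        return "ARM";               /* hypothetical generic 32-bit fallback */
    return "UNKNOWN";
}
```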
Gengxin Xie
448152cdd8
define __AVX2__ to ensure the haswell code compiled with avx2
2020-08-31 14:39:08 +08:00
Gengxin Xie
cb3c190a3a
Implementaion of dasum, sasum with AVX2 & AVX512 intrinsic
2020-08-31 11:44:08 +08:00
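For reference, a portable scalar version of what the dasum kernels compute (the sum of absolute values), written as a hypothetical `dasum_ref` that handles unit stride only. The four independent accumulators mimic how vectorized kernels keep several partial sums in flight; because the additions are reordered, results can differ from a strictly sequential sum in the last bits.

```c
#include <stddef.h>

static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Hypothetical scalar reference: sum of |x[i]| for i < n, unit stride. */
static double dasum_ref(size_t n, const double *x)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {      /* main loop: four partial sums */
        s0 += abs_d(x[i]);
        s1 += abs_d(x[i + 1]);
        s2 += abs_d(x[i + 2]);
        s3 += abs_d(x[i + 3]);
    }
    for (; i < n; i++)                /* tail: leftover elements */
        s0 += abs_d(x[i]);
    return (s0 + s1) + (s2 + s3);
}
```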
Martin Kroeker
59e01b1aec
Merge pull request #2799 from RajalakshmiSR/p10_ger
POWER10: Avoid setting accumulators to zero in gemm kernels
2020-08-28 22:52:11 +02:00
Rajalakshmi Srinivasaraghavan
317ff27cda
POWER10: Avoid setting accumulators to zero in gemm kernels
For the first iteration, it is better to use the xvf*ger builtins instead of xvf*gerpp,
which avoids having to set the accumulators to zero. This saves
a few instructions.
2020-08-28 10:42:54 -05:00
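A scalar C illustration of the idea behind this POWER10 change, using plain loops rather than the MMA builtins (the 4x4 tile and the array layout are assumptions for illustration): the first rank-1 update writes the accumulator directly, so no separate zeroing pass is needed, and only later updates accumulate.

```c
/* Hypothetical 4x4 tile update. For step p, column a[p*4 .. p*4+3] and row
 * b[p*4 .. p*4+3] form one rank-1 update. Requires k >= 1. */
static void outer_product_4x4(int k, const double *a, const double *b,
                              double acc[4][4])
{
    /* First step: overwrite (the "ger" style), no prior zeroing. */
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            acc[i][j] = a[i] * b[j];
    /* Remaining steps: accumulate (the "gerpp" style). */
    for (int p = 1; p < k; p++)
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++)
                acc[i][j] += a[p * 4 + i] * b[p * 4 + j];
}
```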
Martin Kroeker
514a3d7d63
Merge pull request #2798 from kadler/aix-cpuid
Fix compile error on AIX cpuid detection
2020-08-28 08:30:59 +02:00
Kevin Adler
085aae8bdb
Fix compile error on AIX cpuid detection
In 589c74a the cpuid detection was changed to use systemcfg, but a copy
and paste error was introduced during some refactoring that caused
POWER7 detection to reference CPUTYPE_POWER7 (which doesn't exist)
instead of CPUTYPE_POWER6.
2020-08-27 23:10:45 -05:00
Martin Kroeker
de63675717
Add early returns and fix sign errors in workspace calculations
2020-08-27 11:25:18 +02:00
Martin Kroeker
d64cc2be81
Add early returns
2020-08-27 11:22:50 +02:00
Martin Kroeker
c9b67141f0
Add early returns
2020-08-27 11:20:31 +02:00
Martin Kroeker
6797a3a1e0
Add early returns
2020-08-27 11:15:12 +02:00
Martin Kroeker
936966a42c
Make ILAENV and xGETRF2 functions available
2020-08-27 10:59:08 +02:00
Martin Kroeker
5c6c2cd4f6
Merge pull request #2775 from Guobing-Chen/Fix_OMP_threads_specify
Fix OMP num specify issue
2020-08-24 20:18:09 +02:00
Martin Kroeker
e54be4ba1c
Merge pull request #2792 from pkubaj/patch-1
Add aliases for armv6, armv7
2020-08-24 08:03:39 +02:00
pkubaj
48a1364e10
Add aliases for armv6, armv7
FreeBSD uses those names for 32-bit ARM variants.
2020-08-23 18:50:19 +00:00
Chen, Guobing
0c1c903f1e
Fix OMP num specify issue
In the current code, no matter what number of threads is specified, all
available CPUs are used when invoking OpenMP, which leads to very poor
performance when the workload is small but the number of available CPUs is large.
Much time is wasted on inter-thread synchronization. Fix this issue by actually
using the number specified by the variable 'num' from the calling API.
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-08-24 02:45:54 +08:00
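A minimal sketch of the fix's intent, using a hypothetical `parallel_sum` helper rather than OpenBLAS's own thread dispatch: the OpenMP `num_threads` clause limits the team to the count the caller requested instead of every available CPU. Compile with -fopenmp to run in parallel; without it the pragma is ignored and the loop runs serially.

```c
/* Hypothetical helper: sum n doubles using at most nthreads threads. */
static double parallel_sum(long n, const double *x, int nthreads)
{
    double s = 0.0;
    if (nthreads < 1)
        nthreads = 1;                 /* guard against a bogus request */
    /* num_threads() sets the team size to what the caller asked for. */
    #pragma omp parallel for num_threads(nthreads) reduction(+:s)
    for (long i = 0; i < n; i++)
        s += x[i];
    return s;
}
```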
Martin Kroeker
a073fa870e
Merge pull request #2791 from martin-frbg/issue2787
Fix crashes in parallelized x86_64 ZDOT particularly on Windows
2020-08-23 19:33:03 +02:00
Martin Kroeker
b2053239fc
Fix missing dummy parameter (imag part of alpha) of zdot_thread_function
2020-08-23 15:08:16 +02:00
Martin Kroeker
b11bb6e728
Merge pull request #2790 from martin-frbg/issue2789
Add OpenMP dependency to pkgconfig information if needed
2020-08-23 14:42:35 +02:00
Martin Kroeker
1840bc5b52
Add OpenMP dependency to pkgconfig file if needed
2020-08-22 13:55:18 +02:00
Martin Kroeker
7c0977c267
Add OpenMP dependency to pkgconfig file if needed
2020-08-22 13:53:44 +02:00
Martin Kroeker
fb3d80c42a
Merge pull request #78 from xianyi/develop
rebase
2020-08-22 13:52:29 +02:00
Martin Kroeker
9ee21a0a39
Merge pull request #2780 from Guobing-Chen/CPL_build_support
Enable COOPERLAKE build target
2020-08-20 19:54:29 +02:00
Martin Kroeker
bd3207b4b4
Update system.cmake
2020-08-19 22:51:10 +02:00
Martin Kroeker
b8ebfc9335
Update system.cmake
2020-08-19 22:30:19 +02:00
Martin Kroeker
7c1986640b
fallback from cooperlake to skylake if gcc<10
2020-08-19 20:48:39 +02:00
Martin Kroeker
71d33c952d
Typo fix
2020-08-19 17:44:23 +02:00
Martin Kroeker
6a3c074786
-march=cooperlake requires gcc10
2020-08-19 17:22:12 +02:00
Martin Kroeker
430f741b30
-march=cooperlake requires gcc10
2020-08-19 17:17:53 +02:00
Martin Kroeker
6f4dc7445d
Fix typo
2020-08-19 16:36:55 +02:00
Martin Kroeker
81fbe8d088
-march=cooperlake only available in gcc >= 10
2020-08-19 16:10:15 +02:00
Martin Kroeker
bb9cf766f5
make march=cooperlake option conditional on gcc >= 10.1
2020-08-19 15:06:30 +02:00
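The cooperlake fallback and the POWER9 clang fix above rest on the same rule: clang (and the classic Intel compiler) also define `__GNUC__` for compatibility, so the compiler type has to be checked before `__GNUC__` is read as a GCC version. A compile-time C sketch with hypothetical helper names; the real checks live in the Makefiles and cmake files.

```c
/* GCC major version, or -1 when the compiler is not GCC proper. */
static int real_gcc_major(void)
{
#if defined(__clang__) || defined(__INTEL_COMPILER)
    return -1;        /* these define __GNUC__ too, but it is not their version */
#elif defined(__GNUC__)
    return __GNUC__;
#else
    return -1;
#endif
}

/* -march=cooperlake needs GCC 10 or newer. */
static int gcc_supports_cooperlake(int gcc_major)
{
    return gcc_major >= 10;
}

/* Fall back from COOPERLAKE to SKYLAKEX when the compiler is too old. */
static const char *avx512_target(int gcc_major)
{
    return gcc_supports_cooperlake(gcc_major) ? "COOPERLAKE" : "SKYLAKEX";
}
```

In this sketch a non-GCC compiler always gets SKYLAKEX; a real build would consult that compiler's own version macros instead.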
Martin Kroeker
75eeb265d7
[WIP] Refactor the driver code for direct SGEMM ( #2782 )
Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available
(on x86_64 targets only for now) in DYNAMIC_ARCH builds
* Add sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt
* Add direct_sgemm functions to the gotoblas struct in common_param.h
* Move sgemm_direct_performant helper to separate file
* Update gemm.c to use macros for sgemm_direct to support dynamic_arch naming via common_s.h
* (Conditionally) add sgemm_direct functions in setparam-ref.c
2020-08-19 14:51:09 +02:00
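The refactoring above routes direct SGEMM through the per-core function table used by DYNAMIC_ARCH builds. A hypothetical sketch of that dispatch pattern (the struct, slot types and size heuristic are illustrative, not the gotoblas struct in common_param.h): a core either fills in a predicate and a kernel, or leaves them NULL and the caller falls back to the regular path.

```c
typedef int  (*direct_ok_fn)(long m, long n, long k);
typedef void (*direct_gemm_fn)(long m, long n, long k,
                               const float *a, const float *b, float *c);

/* Per-core table; NULL slots mean "no direct path on this core". */
struct kernel_table {
    direct_ok_fn   sgemm_direct_performant;
    direct_gemm_fn sgemm_direct;
};

/* Illustrative kernel: naive row-major C = A * B. */
static void naive_sgemm(long m, long n, long k,
                        const float *a, const float *b, float *c)
{
    for (long i = 0; i < m; i++)
        for (long j = 0; j < n; j++) {
            float s = 0.0f;
            for (long p = 0; p < k; p++)
                s += a[i * k + p] * b[p * n + j];
            c[i * n + j] = s;
        }
}

/* Illustrative heuristic: only small problems take the direct path. */
static int small_enough(long m, long n, long k)
{
    return m * n * k <= 64L * 64 * 64;
}

/* Returns 1 if the direct path was taken, 0 if the caller must fall back. */
static int try_direct(const struct kernel_table *t, long m, long n, long k,
                      const float *a, const float *b, float *c)
{
    if (t->sgemm_direct_performant && t->sgemm_direct_performant(m, n, k)) {
        t->sgemm_direct(m, n, k, a, b, c);
        return 1;
    }
    return 0;
}
```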
Martin Kroeker
2c72972570
Merge pull request #2785 from albertziegenhagel/always-generate-pkg-config
Do not require pkg-config to generate the *.pc file
2020-08-19 14:42:58 +02:00
Albert Ziegenhagel
6b731d917f
Do not require pkg-config to generate the *.pc file
Generating the pkg-config file does not actually depend on pkg-config being available.
2020-08-18 08:48:48 +02:00
Martin Kroeker
5dcf47cd97
Merge pull request #2784 from martin-frbg/issue2783
Add fallback typedef for bfloat16 to openblas_config.h template
2020-08-17 19:06:13 +02:00
Martin Kroeker
aa286e301b
Add typedef for bfloat16 if needed
2020-08-17 15:32:14 +02:00
Martin Kroeker
9f0ef9cdfc
Merge pull request #77 from xianyi/develop
rebase
2020-08-17 15:28:15 +02:00
Martin Kroeker
6bfc66663c
revert
2020-08-17 15:20:41 +02:00
Martin Kroeker
a8c6fb9e1c
revert
2020-08-17 15:20:16 +02:00
Martin Kroeker
5ec8f716cf
revert
2020-08-17 15:19:40 +02:00
Martin Kroeker
82f8a0aeba
Update .drone.yml
2020-08-15 15:46:18 +02:00
Martin Kroeker
d57d503c15
Update Makefile
2020-08-15 14:46:26 +02:00
Martin Kroeker
37ac23e8a3
Add simple MT sgemm precision test and INTERFACE64 build
2020-08-15 13:38:05 +02:00
Martin Kroeker
6a93e3b2ba
Add simple sgemm precision test
2020-08-15 13:33:52 +02:00
Martin Kroeker
47ce1dd08f
Update gemm64.cpp
2020-08-15 13:31:28 +02:00
Martin Kroeker
f5fcc5baec
Add trivial gemm test for multithread consistency
2020-08-15 13:30:29 +02:00
Martin Kroeker
597010a968
Fix incorrect argument to SLASET
Reference-LAPACK issue 425 (and 318)
2020-08-14 00:41:56 +02:00
Martin Kroeker
d64f1ef26b
Fix incorrect argument to SLASET
Reference-LAPACK issue 425 (and 318)
2020-08-14 00:40:24 +02:00
Martin Kroeker
c62aad62e5
Fix incorrect calls to DLASET
Reference-LAPACK issue 429
2020-08-14 00:35:45 +02:00
Chen, Guobing
e740c4873d
Enable COOPERLAKE build target
Enable new build target platform -- COOPERLAKE. This target platform
supports all the SKYLAKEX supported ISAs + avx512bf16. So all the
SKYLAKEX specific kernels/drivers and related code are now extended
to be also active on COOPERLAKE. Besides, new BF16 related kernels
are active under this target.
2020-08-13 06:18:00 +08:00
Martin Kroeker
efdd237a91
Add a dedicated POWER9 build to the Travis CI ( #2774 )
* Add dedicated POWER9 build (using new syntax to ensure it runs as a P9-only containerized job rather than a VM that
might end up on P8 hardware half of the time)
* Bump gcc version for POWER9 build
2020-08-12 23:08:38 +02:00
Martin Kroeker
4573cb2f43
Merge pull request #2765 from martin-frbg/issue2760
Add memory barrier to the PPC blas_lock implementation for Linux
2020-08-11 22:40:17 +02:00
Martin Kroeker
2a4bb797db
Merge pull request #2773 from martin-frbg/issue2770
Fix Makefiles still mishandling NO_CBLAS=0 and NO_LAPACKE=0
2020-08-11 21:02:55 +02:00
Martin Kroeker
cbbe38bb88
Merge pull request #2772 from mhillenibm/s390x_gemm_tuning
s390x: GEMM tuning for z14
2020-08-11 18:14:09 +02:00
Martin Kroeker
619343278d
Fix mishandling of NO_CBLAS=0 and NO_LAPACKE=0
2020-08-11 13:40:40 +02:00
Martin Kroeker
fee361ae64
fix another source of NO_CBLAS=0 surprise
2020-08-11 13:27:19 +02:00
Martin Kroeker
62f4c84f27
Merge pull request #76 from xianyi/develop
rebase
2020-08-11 13:25:12 +02:00
Marius Hillenbrand
e115c97e05
s390x/SGEMM: adjust default P and Q to multiples of M
We recently changed the register blocking for SGEMM on s390x to 16x4.
However, we did not adjust Q to a multiple of 16 and thus unnecessarily
fell back to the 8x4 kernel at each block's margin. Adjust P and Q to
multiples of 16 to employ the faster 16x4 kernel for complete full-sized
blocks.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-08-11 12:56:46 +02:00
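The blocking arithmetic behind this adjustment, as hypothetical helpers: a block of q rows yields q / 16 full tiles for the 16x4 kernel and leaves q % 16 rows for a narrower kernel at the margin, so rounding q up to a multiple of 16 removes that margin. The value 500 used below is only an example, not an OpenBLAS default.

```c
/* How a block of q rows splits into full tiles and a leftover margin. */
struct tile_split { int full_tiles; int remainder; };

static struct tile_split split_rows(int q, int unroll)
{
    struct tile_split s = { q / unroll, q % unroll };
    return s;
}

/* Round a block size up to the next multiple of the unroll factor. */
static int round_up(int q, int unroll)
{
    return (q + unroll - 1) / unroll * unroll;
}
```

For q = 500, `split_rows(500, 16)` gives 31 full tiles and 4 leftover rows; `round_up(500, 16)` is 512, which splits into 32 tiles with nothing left over.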
Marius Hillenbrand
07c334e7be
s390x: Factor out small block sizes for SGEMM/DGEMM on z14
For small register blockings that are too small to fill up vector
registers with column vectors, we currently use a generic code block.
Replace that with instantiations of the generic code as individual
functions, so that the compiler can optimize each one separately.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-08-11 12:56:39 +02:00
Marius Hillenbrand
e2828e30aa
s390x: Optimize SGEMM/DGEMM blocks for z14 with explicit loop unrolling/interleaving
Improve performance of SGEMM and DGEMM on z14 and z15 by unrolling and
interleaving the inner loop of the SGEMM 16x4 and DGEMM 8x4 blocks.
Specifically, we explicitly interleave vector register loads and
computation of two iterations.
Note that this change only adds one C function, since SGEMM 16x4 and
DGEMM 8x4 actually map to the same C code: they both hold intermediate
results in a 4x4 grid of vector registers, and the C implementation is
built around that.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-08-11 12:55:42 +02:00
Martin Kroeker
7219c9cb87
Merge pull request #2764 from martin-frbg/lapacktests
Fix array overruns in the LIN part of the LAPACK testsuite
2020-08-10 13:27:51 +02:00
Martin Kroeker
c9d32674ea
Add memory barrier to the blas_lock implementation for Linux
as recommended by cparrott73 in #2760
2020-08-09 19:17:04 +02:00
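A memory barrier in a lock keeps the protected loads and stores from being reordered across acquiring and releasing it. A minimal C11 spinlock sketch expresses this with acquire and release orderings; it is a stand-in for illustration, not the PPC assembly in OpenBLAS's blas_lock, and `simple_spinlock` is a hypothetical name.

```c
#include <stdatomic.h>

typedef struct { atomic_flag flag; } simple_spinlock;

/* Acquire ordering: reads/writes in the critical section cannot move
 * before the lock is taken. */
static void spin_lock(simple_spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
        ; /* busy-wait until the holder releases the lock */
}

/* Release ordering: writes in the critical section become visible
 * before the lock is seen as free. */
static void spin_unlock(simple_spinlock *l)
{
    atomic_flag_clear_explicit(&l->flag, memory_order_release);
}
```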
Martin Kroeker
64259d521a
Fix use of unallocated array in workspace query and wrong type of argument to xSCAL
2020-08-09 13:02:27 +02:00
Martin Kroeker
6f5ca44c1a
Expand TAU array as SGEMQR/DGEMQR read elements 2 and 3
2020-08-09 12:59:20 +02:00
Martin Kroeker
d28b3f2776
Create Jenkinsfile for OSUOSL PowerCI
2020-08-08 18:05:20 +02:00
Martin Kroeker
ba3f7b3acf
Merge pull request #2761 from RajalakshmiSR/Makefile_err
Remove extra symbol in Makefile
2020-08-08 12:20:04 +02:00
Rajalakshmi Srinivasaraghavan
475b5c95b9
Remove extra symbol in Makefile
While trying out different unroll values, noted that
make failed due to this extra symbol.
2020-08-07 15:27:44 -05:00
Martin Kroeker
cd60080d4a
Merge pull request #2758 from martin-frbg/undef_shift
Fix GCC ubsan warnings in x86_64 complex dot and gemv_t kernels
2020-08-03 23:30:26 +02:00
Martin Kroeker
4847bfdddd
Merge pull request #2757 from martin-frbg/cmake64
Fix lapack-tests linking to a suffixed libopenblas in cmake builds
2020-08-02 23:05:21 +02:00
Martin Kroeker
81dcfdcf39
Multiply by 2 instead of left-shifting a potentially negative number
fixes GCC ubsan warning in the BLAS tests
2020-08-02 18:29:56 +02:00
Martin Kroeker
0ef4b3f1f2
Multiply instead of doing a left shift of a potentially negative number
fixes GCC ubsan report in the BLAS tests
2020-08-02 18:27:40 +02:00
Martin Kroeker
aa53a8a5cb
Multiply by two instead of left-shifting one place
fixes GCC ubsan report of "left shift of negative value -2" in the BLAS tests
2020-08-02 18:25:09 +02:00
Martin Kroeker
aa3a1e7d8c
Multiply by two rather than left shift by one place
fixes GCC ubsan report of "left shift of negative value -2" in the BLAS tests
2020-08-02 18:22:31 +02:00
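In C, left-shifting a negative signed integer is undefined behaviour, which is what ubsan flagged; multiplying by two is well defined for negative values, and optimizing compilers typically still emit a shift or add for it. A hypothetical helper, assuming the doubled value is something like a scalar stride for interleaved (re, im) complex data, where a negative BLAS increment is legal:

```c
/* Scalar stride for inc complex elements stored as (re, im) pairs.
 * 2 * inc is well defined for negative inc; inc << 1 would not be. */
static long complex_stride(long inc)
{
    return 2 * inc;
}
```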
Martin Kroeker
aaf1a17168
Apply current library name suffix
2020-08-02 17:58:33 +02:00
Martin Kroeker
53add6a80d
Apply library name suffix to openblas if any
2020-08-02 17:57:12 +02:00
Martin Kroeker
9eb897cc01
Merge pull request #75 from xianyi/develop
rebase
2020-08-02 17:50:06 +02:00
Martin Kroeker
7cead56258
Merge pull request #2753 from martin-frbg/issue2751
Add SYMBOLPREFIX and/or SYMBOLSUFFIX to cblas prototypes
2020-08-02 15:32:46 +02:00
Martin Kroeker
6794ac3415
Add SYMBOLPREFIX and/or -SUFFIX to cblas.h if needed
2020-08-02 11:20:08 +02:00
Martin Kroeker
ecf4b9e0fc
Improve substitution rules for SYMBOLPREFIX and -SUFFIX addition
2020-08-01 17:06:03 +02:00
Martin Kroeker
dfe5d09641
Merge pull request #2756 from martin-frbg/issue2755
Protect against inadvertent activation of USE_CUDA
2020-08-01 15:19:02 +02:00
Martin Kroeker
60cd5e55fc
Protect against inadvertent activation of USE_CUDA
2020-08-01 12:31:39 +02:00
Martin Kroeker
da9e2a7ada
Add SYMBOLPREFIX and/or SYMBOLSUFFIX to cblas prototypes
2020-07-31 16:03:33 +02:00
Martin Kroeker
c88cbc5e0d
Merge pull request #2752 from kadler/cpuid_aix
Use systemcfg APIs for CPU detection on AIX
2020-07-31 12:52:24 +02:00
Kevin Adler
589c74aed3
Use systemcfg APIs for CPU detection on AIX
AIX libc already provides ready access to an integer that contains a bit
identifying the CPU it's running on, so there's no need to call a
program and grep its output. Additionally, prtconf is not available in
the PASE runtime, which provides an AIX emulation layer on the IBM i
operating system.
The AIX systemcfg.h also provides macro definitions like POWER_8,
POWER_9, etc for all the bits defining the CPUs as well as macros like
__power_8(), __power_9_andup() that return booleans, but I did not use
them. Since these macros depend on the level of the OS in which it is
built, they may not be defined and instead the associated hex literals
are used directly.
2020-07-30 21:06:40 -05:00
Martin Kroeker
104aa678b0
Fix inadvertent version number reversal to 0.3.9.dev caused by #2710
2020-07-30 11:40:52 +02:00
Martin Kroeker
c6b48e0394
Merge pull request #2749 from martin-frbg/make_ppc
Reorganize OpenMP build options for POWER and allow compiling for POWER9 with old gcc
2020-07-30 11:35:53 +02:00
Martin Kroeker
4927251298
Merge pull request #2750 from RajalakshmiSR/dgemv_p10
dgemv optimization for POWER10
2020-07-30 10:13:19 +02:00
Rajalakshmi Srinivasaraghavan
f77b6a83f4
dgemv optimization for POWER10
Making use of new vector pair POWER10 instructions in dgemv_n and dgemv_t.
Also adding a new block 4x128 to make use of Matrix-Multiply Assist (MMA)
feature introduced in POWER ISA v3.1. Tested on simulator and there
are no new test failures.
2020-07-29 18:59:32 -05:00
Martin Kroeker
39724e8128
Separate OpenMP handling and allow compilation of Power9 code with older gcc
2020-07-30 01:14:08 +02:00
Martin Kroeker
525db5401c
Merge pull request #74 from xianyi/develop
...
rebase
2020-07-30 01:04:09 +02:00
Martin Kroeker
cb097beba2
Merge pull request #2741 from martin-frbg/issue2739
Adjust A53 SGEMM parameters to reflect recent switch to 8x8 kernel
2020-07-29 10:01:14 +02:00
Martin Kroeker
7c02f4b1f7
Merge pull request #2744 from martin-frbg/issue2738
Add AMD Renoir/Matisse cpu autodetection and preliminary support for Zen3
2020-07-28 19:32:04 +02:00
Martin Kroeker
383262035d
Merge pull request #2740 from RajalakshmiSR/clang-power
Fix compilation issues with clang on POWER
2020-07-28 18:15:25 +02:00
Martin Kroeker
5fa581c87e
Put hint to use git develop rather than master branch in README
2020-07-28 14:22:41 +00:00
Martin Kroeker
12918358aa
Add AMD Renoir/Matisse and preliminary support for Zen3 as Zen2
also support AMD family 22 Jaguar/Puma as Bobcat
2020-07-28 13:53:17 +00:00
Martin Kroeker
200f5c44cc
Add AMD Renoir models and preliminary support for ZEN3 as ZEN2
also remap erroneous family 16 entry to BOBCAT and reclaim erroneous family 25 "Barcelona" for Zen3
2020-07-28 13:45:23 +00:00
Martin Kroeker
64e2e4aaf3
missing braces
2020-07-27 20:19:22 +00:00
Martin Kroeker
921ec4e9e2
Adjust A53 SGEMM parameters to reflect move to 8x8 kernel
2020-07-27 19:54:46 +00:00
Rajalakshmi Srinivasaraghavan
d557584b71
Fix compilation issues with clang on POWER
As gcc defaults to -malign-power, removing that option. Also
adding -fno-integrated-as to use GNU assembler for powerpc
assembly optimization files. Fixed other compilation errors
reported in dgemv_t.c file.
2020-07-27 14:11:07 -05:00
Martin Kroeker
a4ceb1ade9
Merge pull request #2737 from ashwinyes/add_thunderx3_target
ARM64: Add THUNDERX3T110 Target
2020-07-27 15:19:47 +02:00
Ashwin Sekhar T K
4e1be0e481
ARM64: Add THUNDERX3T110 Target
2020-07-26 23:32:24 -07:00
Martin Kroeker
49b83e00b7
Merge pull request #2735 from martin-frbg/move_potrf
Move potrf_parallel.c from lapack/getrf to lapack/potrf where it belongs
2020-07-26 19:54:11 +02:00
Martin Kroeker
769ed9ffad
Merge pull request #2734 from RajalakshmiSR/p10_fix
Fix to store results in correct order for POWER10 GEMM kernels
2020-07-25 09:02:32 +02:00
Martin Kroeker
f194ad59e1
Use _Atomic instead of volatile where available (file moved from ../getrf)
must have misplaced this in ../getrf when I made that change in March 2018 (40160ff)
the only changes since then were
RFC : Add half precision gemm for bfloat16 in OpenBLAS Rajalakshmi Srinivasaraghavan
Rajalakshmi Srinivasaraghavan committed on 14 Apr 2020 as 7ebbb50
Change _STDC_VERSION__ to __STDC_VERSION__
Zhiyong Dang committed on 11 May 2018 as 3716267
2020-07-25 08:52:24 +02:00
Martin Kroeker
4fda217f99
Delete potrf_parallel.c (moving it to ../potrf)
2020-07-25 06:42:39 +00:00
Rajalakshmi Srinivasaraghavan
9be2688c78
Fix to store results in correct order for POWER10 GEMM kernels
There is a recent compiler change in __builtin_mma_disassemble_acc() which
affects the order of storing results in POWER10. Also removing the new LDFLAG
-mno-power10-stub as it is handled by the linker automatically.
2020-07-24 23:08:11 -05:00
Martin Kroeker
6a2a60038c
Merge pull request #2720 from martin-frbg/issue2694
WIP Further fixes for 32bit POWER8
2020-07-24 23:19:45 +02:00
Martin Kroeker
251a09ec90
Typo fix
2020-07-24 16:04:58 +00:00
Martin Kroeker
95d37e1575
Regroup the 32 and 64bit sections and restore 64bit CAXPY
2020-07-24 10:13:46 +00:00
Martin Kroeker
3523bb778e
Merge pull request #2721 from martin-frbg/p8align
Fix alignment errors in the power8 saxpy kernel
2020-07-24 11:06:20 +02:00
Martin Kroeker
a50d0e29c8
Merge pull request #2731 from martin-frbg/pgippc
Fixes for compilation on POWER with PGI compilers
2020-07-24 11:05:16 +02:00
Martin Kroeker
bf1f0734ff
Use OPENBLAS_MAKE_COMPLEX_FLOAT on PPC only
2020-07-23 20:40:13 +00:00
Martin Kroeker
ca3561cab9
Add ifdefs around call to altivec microkernel
2020-07-23 18:30:42 +00:00
Martin Kroeker
21072e502a
Typo fix
2020-07-23 17:34:56 +00:00
Martin Kroeker
7c6e56b5df
Rewrite assignment to complex for better portability
2020-07-23 17:10:59 +02:00
Martin Kroeker
661c6bfa5a
Exclude altivec code paths if the compiler does not support them
2020-07-23 17:08:20 +02:00
Martin Kroeker
9796e552ea
Avoid undefining NAME,CNAME etc for pgcc as it makes it ignore the new definitions
2020-07-23 17:03:28 +02:00
Martin Kroeker
d6b6e5ccd7
Merge pull request #73 from xianyi/develop
rebase
2020-07-23 16:59:06 +02:00
Martin Kroeker
349b722d8d
Merge pull request #2729 from martin-frbg/issue2728
Unify BUFFER_SIZE settings for x86_64 again to fix DYNAMIC_ARCH crashes
2020-07-22 22:45:57 +02:00
Martin Kroeker
6c33764ca4
Unify BUFFER_SIZE settings for x86_64 again to fix potentially fatal mismatch in DYNAMIC_ARCH builds
2020-07-22 17:30:55 +00:00
Martin Kroeker
d1b9613fd4
Merge pull request #2727 from wyphan/develop
Patch for building on POWERPC with PGI compilers (was Patch for building on Summit)
2020-07-21 17:06:53 +02:00
Martin Kroeker
3cfc74b1a0
Merge pull request #2726 from martin-frbg/2725-2
Add detection of stdatomic.h for cmake
2020-07-21 16:42:06 +02:00
Wileam Phan
9ae154ba89
Patch for building on Summit
2020-07-20 23:30:28 -04:00
Martin Kroeker
9e21a100e3
Add trivial check for stdatomic.h
2020-07-20 22:52:09 +00:00
Martin Kroeker
31d30312dc
Merge pull request #72 from xianyi/develop
rebase
2020-07-21 00:49:12 +02:00
Martin Kroeker
fcfb7ffafb
Merge pull request #2725 from martin-frbg/ccheck_c11
Have c_check probe availability of C11 atomics support and stdatomic.h
2020-07-18 23:08:08 +02:00
Martin Kroeker
bbe119ee3b
Update conditional for atomics to use HAVE_C11
2020-07-18 17:19:59 +00:00
Martin Kroeker
f4f74941bd
Update conditional for atomics to use HAVE_C11
2020-07-18 17:14:50 +00:00
Martin Kroeker
a36eb19ae0
Update conditional for C11 atomics to use HAVE_C11
2020-07-18 17:13:24 +00:00
Martin Kroeker
ce45af8151
Update conditional for atomics to use HAVE_C11
2020-07-18 17:09:56 +00:00
Martin Kroeker
6f38de06d2
Update conditional for atomics to use HAVE_C11
2020-07-18 17:09:01 +00:00
Martin Kroeker
09eb9d2584
Update conditional for atomics to HAVE_C11
2020-07-18 17:07:38 +00:00
Martin Kroeker
791e046744
Update conditional for atomics to use HAVE_C11
2020-07-18 17:05:59 +00:00
Martin Kroeker
94bab9d1f9
Update conditional for atomics to use HAVE_C11
2020-07-18 17:03:31 +00:00
Martin Kroeker
97d6eb97b1
Report availability of C11 support
2020-07-18 16:59:33 +00:00
Martin Kroeker
4afd11dae5
Add a check for C11 atomics and stdatomic.h
2020-07-18 16:57:41 +00:00
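With these commits c_check probes for C11 atomics and stdatomic.h, and the conditionals test HAVE_C11 so that _Atomic can replace volatile where available. A compile-time analogue using only the standard macros `__STDC_VERSION__` and `__STDC_NO_ATOMICS__`; `HAVE_C11_ATOMICS`, `counter_t` and the helpers are hypothetical names.

```c
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_ATOMICS__)
#include <stdatomic.h>
#define HAVE_C11_ATOMICS 1
typedef _Atomic int counter_t;
#else
#define HAVE_C11_ATOMICS 0
typedef volatile int counter_t;   /* fallback: visible, but not atomic */
#endif

static void counter_add(counter_t *c, int v)
{
#if HAVE_C11_ATOMICS
    atomic_fetch_add(c, v);       /* atomic read-modify-write */
#else
    *c += v;                      /* only safe without concurrent writers */
#endif
}

static int counter_get(counter_t *c)
{
#if HAVE_C11_ATOMICS
    return atomic_load(c);
#else
    return *c;
#endif
}
```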
Martin Kroeker
72ec6280c7
Merge pull request #2724 from martin-frbg/loongsonreadme
Update cross-compiling example in README to reflect change in Loongson gcc
2020-07-18 18:08:40 +02:00
Martin Kroeker
26b7f24d16
Update cross-compiling example to reflect change in Loongson gcc
for #2723
2020-07-18 12:51:37 +00:00
Martin Kroeker
0db4218fed
Merge pull request #2722 from martin-frbg/cmakefcheck
Handle lack of fortran compiler more gracefully in cmake
2020-07-17 10:33:03 +02:00
Martin Kroeker
9d000ecaa2
include CheckLanguage module
2020-07-16 22:36:35 +00:00
Martin Kroeker
a847d00366
handle lack of fortran compiler more gracefully
2020-07-16 22:17:39 +00:00
Martin Kroeker
0033f8be0d
Use vec_vsx_ld/st to fix misaligned accesses flagged by asan
2020-07-16 23:32:54 +02:00
Martin Kroeker
f308e741b2
remove debug output and revert changes to cdot and crot
2020-07-15 10:00:07 +02:00
Martin Kroeker
4f5d26bb02
Merge pull request #2716 from RajalakshmiSR/p10_ldflag
Add new linker option for POWER10
2020-07-15 01:20:54 +02:00
Rajalakshmi Srinivasaraghavan
417c4e8af8
Add new linker option for POWER10
While building with DYNAMIC_ARCH on POWER9 with a POWER10-aware
toolchain, a new LDFLAG is needed to avoid POWER10
instructions in PLT calls.
2020-07-14 11:54:04 -05:00
Martin Kroeker
da17abec87
fix trailing whitespace
2020-07-14 18:20:03 +02:00
Martin Kroeker
f8c2697701
Use POWER6 GEMM, TRMM and DTRSM on 32bit POWER8
2020-07-14 18:11:19 +02:00
Martin Kroeker
b144423f0f
Do not define USE_TRMM for 32bit POWER8
2020-07-14 18:10:12 +02:00
Martin Kroeker
bd2498c886
Use POWER6 GEMM parameters on 32bit POWER8
2020-07-14 18:07:58 +02:00
Martin Kroeker
d8e2edfc20
Merge pull request #71 from xianyi/develop
rebase
2020-07-14 18:01:34 +02:00
Martin Kroeker
419b8686d1
Merge pull request #2682 from martin-frbg/aix
[WIP] fix compilation on AIX
2020-07-13 14:43:24 +02:00
Martin Kroeker
3ab15ff34c
Merge pull request #2651 from leezu/actionsflang
Add flang build to Github Actions
2020-07-13 13:00:39 +02:00
Martin Kroeker
8916c4ae2c
Merge branch 'develop' into actionsflang
2020-07-12 20:37:29 +02:00
Martin Kroeker
4fa283de66
Merge pull request #2706 from jussienko/use-always-omp-threads
Fix OpenMP builds defaulting to singlethreading with OMP_PLACES or OMP_PROC_BIND set
2020-07-12 20:17:11 +02:00
Martin Kroeker
5865c7d4d6
Make 32bit POWER8 use POWER6 kernels for now
2020-07-12 18:59:01 +02:00
Martin Kroeker
ae3a90f78f
merge overwritten part of power10 support
2020-07-12 18:51:58 +02:00
Martin Kroeker
009864edde
Merge pull request #2710 from martin-frbg/cmake-lapacktest
...
Add LAPACK-TESTING to the cmake build
2020-07-10 12:06:50 +02:00
Martin Kroeker
3de80b3f5a
Merge pull request #2713 from RajalakshmiSR/p10-gcc10
...
Change minimum gcc version for POWER10
2020-07-10 10:43:33 +02:00
Rajalakshmi Srinivasaraghavan
af1e140e35
Change minimum gcc version for POWER10
...
As the MMA patches for POWER10 are backported to gcc10.2, changing
the minimum gcc version needed to build OpenBLAS for POWER10.
2020-07-09 21:46:06 -05:00
Martin Kroeker
d4a0299e16
Do not build lapack-test on MSVC for now (same as with BLAS test)
2020-07-09 13:57:27 +02:00
Martin Kroeker
f766024749
enable fortran for cmake
2020-07-09 13:44:25 +02:00
Martin Kroeker
c502760bef
Modify for building with OpenBLAS
2020-07-09 13:13:16 +02:00
Martin Kroeker
29b5887d5f
Modify for building with OpenBLAS
2020-07-09 13:12:35 +02:00
Martin Kroeker
60188a8c82
Append crude hack for enabling lapack tests in the OpenBLAS build
2020-07-09 11:44:31 +02:00
Martin Kroeker
1d63631afe
Add lapack-test
2020-07-09 11:42:02 +02:00
Martin Kroeker
e82bb953a7
Merge pull request #2708 from RajalakshmiSR/p10_future
...
Changing mcpu option as power10
2020-07-08 12:26:44 +02:00
Martin Kroeker
ed7e155c35
Merge branch 'develop' into aix
2020-07-07 18:52:06 +02:00
Rajalakshmi Srinivasaraghavan
45d819ca82
Changing mcpu option as power10
...
As the compiler now accepts power10 as an mcpu option, change it from future.
2020-07-07 11:25:20 -05:00
Martin Kroeker
8751a69271
Obtain actual cpu count on AIX and suppress spurious NO_AVX512 on non-x86
2020-07-07 15:46:32 +02:00
Jussi Enkovaara
10a2923f64
fixes #2238
...
Always obey omp_get_max_threads() when built with USE_OPENMP
2020-07-07 13:35:43 +03:00
Martin Kroeker
5ff83a4261
Merge pull request #2670 from mhillenibm/dumpfullversion_on_gcc7
...
RFC: Use -dumpfullversion to get minor version on gcc-7 and newer
2020-07-07 00:12:28 +02:00
Martin Kroeker
5bc9680a86
Merge pull request #2703 from martin-frbg/issue2702
...
Compatibility fix for gcc < 4.7
2020-07-02 22:32:51 +02:00
Martin Kroeker
4ab3651591
Option -mavx2 requires at least gcc 4.7
2020-07-02 17:00:15 +02:00
Martin Kroeker
a83680b40b
Merge pull request #69 from xianyi/develop
...
rebase
2020-07-02 16:56:00 +02:00
Martin Kroeker
c3aa036e99
Merge pull request #2693 from EGuesnet/AIX-build-on-POWER8-32bits
...
AIX build on POWER8 32bits
2020-07-01 08:29:52 +02:00
EGuesnet
634e1305f9
Update cgemm_kernel_8x4_power8.S
2020-06-30 15:16:39 +02:00
Martin Kroeker
c467516132
Merge pull request #2688 from martin-frbg/cometlake
...
Add autodetection of Intel Comet Lake H and S models
2020-06-27 17:47:24 +02:00
Martin Kroeker
83f4746825
Add support for Comet Lake H and S
2020-06-27 14:41:24 +02:00
Martin Kroeker
584ef8d4ae
Add support for Comet Lake H & S
2020-06-27 14:36:37 +02:00
Martin Kroeker
8dfda02e89
Merge pull request #68 from xianyi/develop
...
rebase
2020-06-27 14:29:29 +02:00
Martin Kroeker
28d69e0097
Merge pull request #2687 from martin-frbg/utfbom
...
Strip UTF8 byte order marker from source files
2020-06-26 22:53:09 +02:00
Martin Kroeker
c2467c9619
Merge pull request #2686 from RajalakshmiSR/p10_shgemm
...
powerpc: Optimized SHGEMM kernel for POWER10
2020-06-26 22:52:45 +02:00
Martin Kroeker
f86e749df4
Merge pull request #2683 from mtreinish/add-comet-lake-support
...
Add cpu detection support for comet lake U
2020-06-26 12:11:03 +02:00
Martin Kroeker
d199c2787d
Merge pull request #2680 from kavanabhat/aix_makefile_fix
...
Fix for #2671
2020-06-26 11:27:28 +02:00
Martin Kroeker
e30ad0e521
Strip UTF8 byte order marker from source
2020-06-26 09:00:43 +02:00
Rajalakshmi Srinivasaraghavan
d23419accc
powerpc: Optimized SHGEMM kernel for POWER10
...
This patch introduces new optimized version of SHGEMM kernel
using power10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. This patch makes use of new POWER10 compute instructions
for matrix multiplication operation.
Tested on simulator and there are no new test failures.
2020-06-25 22:19:08 -05:00
Matthew Treinish
2f9c10810c
Also set CPUTYPE in get_cpuname()
2020-06-25 15:53:56 -04:00
Matthew Treinish
f37e941d52
Add support to driver/others/dynamic.c too
2020-06-25 11:56:49 -04:00
Matthew Treinish
2a91452bdd
Add cpu detection support for comet lake U
...
Comet Lake U CPUs (family: 6, model: 6, extended family: 0,
extended model: 10) were not being correctly detected by GETARCH during
openblas builds and would show CORE=UNKNOWN and LIBCORE=unknown. This
commit adds the necessary information to cpuid_x86 to detect extended
model 10, model 6 and return the proper core information. It's
essentially just a skylake cpu, not skylake x, so I just used
the same return fields as skylake.
2020-06-25 11:32:09 -04:00
Martin Kroeker
c854ef5471
Fix variable names in conditional
2020-06-25 13:29:52 +02:00
Martin Kroeker
c0afc11742
Fix POWERPC builds on AIX (gcc/gfortran 7)
...
1. macro preprocessing for POWER8 and later kernels only
2. default buffer size used by AIX version of m4 is too small
2020-06-25 13:12:36 +02:00
Martin Kroeker
c592f0f80a
Fix utest build on AIX
2020-06-25 12:58:13 +02:00
Martin Kroeker
3f613b1301
Tentative changes for building on AIX
2020-06-25 12:57:00 +02:00
Martin Kroeker
72a0ec8e75
Fix reading of CPU name from prtconf output on AIX
2020-06-25 12:55:10 +02:00
Martin Kroeker
3446e58daf
Fix handling of uname output on AIX
2020-06-25 12:31:35 +02:00
Martin Kroeker
4ca8becc4b
Merge pull request #67 from xianyi/develop
...
rebase
2020-06-25 10:33:03 +02:00
Martin Kroeker
dfe819f3bd
Merge pull request #2679 from RajalakshmiSR/P10_GEMM
...
POWER10 Optimized GEMM kernels
2020-06-25 08:31:38 +02:00
Martin Kroeker
4369e52555
Merge pull request #2677 from brada4/develop
...
address vs2019 C4293 warning
2020-06-25 08:31:17 +02:00
Gordon Fossum
bb2f52844b
powerpc: Optimized ZGEMM kernel for POWER10
...
This patch introduces new optimized version of ZGEMM kernel
using power10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. This patch makes use of new POWER10 compute instructions
for matrix multiplication operation.
Tested on simulator and there are no new test failures.
Cycle count reduced by 30-50% compared to POWER9 version depending on
M/N/K sizes.
2020-06-24 14:50:12 -05:00
Rajalakshmi Srinivasaraghavan
571eadb880
powerpc: Optimized SGEMM/DGEMM/CGEMM for POWER10
...
This patch introduces new optimized version of SGEMM, CGEMM and DGEMM
using power10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. This patch makes use of new POWER10 compute instructions
for matrix multiplication operation.
Tested on simulator and there are no new test failures.
Cycle count reduced by 30-50% compared to POWER9 version depending on
M/N/K sizes.
MMA GCC patch for reference:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=8ee2640bfdc62f835ec9740278f948034bc7d9f1
2020-06-24 14:48:15 -05:00
Kavana Bhat
df4ade070f
Fix for #2671
2020-06-24 04:25:47 -05:00
User User-User
e6b9275034
address vs2019 C4293
2020-06-24 09:12:23 +03:00
Martin Kroeker
53ea5bfece
Merge pull request #66 from xianyi/develop
...
rebase
2020-06-23 10:13:44 +02:00
Martin Kroeker
93592d1260
Merge pull request #2675 from wjc404/develop
...
AVX512 DGEMM TCOPY_16 Function
2020-06-23 09:29:02 +02:00
Martin Kroeker
6eaeb01263
Merge pull request #2658 from RajalakshmiSR/p10
...
powerpc: Add support for future processor
2020-06-23 00:02:37 +02:00
Martin Kroeker
45d542c9d1
Merge pull request #65 from xianyi/develop
...
rebase
2020-06-21 12:41:01 +02:00
wjc404
086d87a302
AVX512 dgemm tcopy_16 function
2020-06-20 00:07:43 +08:00
Martin Kroeker
af501eb753
Merge pull request #2669 from mhillenibm/zarch_fix_gcc_detection
...
Zarch fix gcc detection
2020-06-17 17:55:25 +02:00
Martin Kroeker
0eb6c4dded
Merge pull request #2672 from mhillenibm/test_num_threads
...
cpp_thread_test: Change adjustment of concurrency on systems with <52 hw threads
2020-06-17 17:54:31 +02:00
Marius Hillenbrand
de838c38ef
cpp_thread_test/dgemv: fail early if concurrency is zero
...
The two test cases dgemv_tester and dgemm_tester accept the degree of
concurrency as command line argument (amongst others). Fail early if
value 0 has been specified, instead of later with less-clear symptoms.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-06-17 16:15:44 +02:00
Marius Hillenbrand
478898b37a
cpp_thread_test/dgemv: cap concurrency to number of hw threads on small systems
...
... instead of (number of hw threads - 4) to avoid invalid numbers on
smaller systems. Currently, systems with 4 or fewer CPUs (e.g., small CI
VMs) would fail the test. Fixes one of the issues discussed in #2668
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-06-17 16:08:48 +02:00
Marius Hillenbrand
cde4690721
RFC: Use gcc -dumpfullversion to get minor version with gcc-7.x
...
In gcc-7.1, the behavior of -dumpversion changed to be configured
at compile-time. On some distributions it only dumps the major version
(e.g., Ubuntu), so the current checks for the gcc minor version report
false negatives. As a replacement, gcc-7.1 introduced -dumpfullversion
which always prints the full version.
Update the gcc version detection in Makefile.system to employ
-dumpfullversion with gcc-7 and newer.
Posting this patch for discussion, since it emerged from discussions
around issue #2668 and PR #2669. It is not solving a problem right now,
but may be useful in the future.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-06-16 15:45:59 +02:00
Marius Hillenbrand
2389291766
Makefile.system: remove duplicate variable GCCVERSIONGT5
...
... to bring unified gcc version detection with common variables to the
one remaining spot in Makefile.system.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-06-16 15:06:03 +02:00
Marius Hillenbrand
a2d13ea611
Fix gcc version detection for zarch
...
Employ common variables for gcc version detection and fix the broken
check for gcc >= 5.2.
Fixes #2668
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-06-16 15:06:03 +02:00
Martin Kroeker
1bd3cd66c2
Increment version to 0.3.10.dev
2020-06-14 22:05:19 +02:00
Martin Kroeker
1c53e1366d
Increment version to 0.3.10.dev
2020-06-14 22:04:37 +02:00
Martin Kroeker
63b03efc2a
Merge pull request #2667 from xianyi/develop
...
Merge develop into 0.3.0 for 0.3.10 release
2020-06-14 22:03:04 +02:00
Martin Kroeker
95dbeff66d
Merge branch 'release-0.3.0' into develop
2020-06-14 22:02:45 +02:00
Martin Kroeker
3b673a24b7
Increment version to 0.3.10.dev
2020-06-14 21:57:52 +02:00
Martin Kroeker
1eb1979050
Increment version to 0.3.10.dev
2020-06-14 21:57:15 +02:00
Martin Kroeker
efc53b6e7e
Merge pull request #2665 from martin-frbg/flang-fixes-2a
...
Fix spelling of flang option -Mrecursive, add -Kieee and workaround for AOCC optimizer bug
2020-06-14 21:56:08 +02:00
Martin Kroeker
72888497e2
Update with 0.3.10 changes
2020-06-14 21:55:31 +02:00
Martin Kroeker
7e3e006af6
Merge pull request #2666 from martin-frbg/blastest
...
Update BLAS tests to what netlib 3.9.0 uses
2020-06-14 18:28:37 +02:00
Martin Kroeker
d906d14402
Merge pull request #2664 from ACSimon33/exported_symbols
...
Add missing exported symbols.
2020-06-14 18:27:03 +02:00
Martin Kroeker
3785c0e82b
Merge pull request #2663 from martin-frbg/issue2654
...
Respect predefined defaults for AR, AS, LD and RANLIB
2020-06-14 18:26:43 +02:00
Martin Kroeker
f2d8879af6
Merge pull request #2661 from martin-frbg/issue2660
...
Report selected DYNAMIC_ARCH kernel rather than one of its aliases in gotoblas_corename
2020-06-14 18:25:37 +02:00
Martin Kroeker
6876221cf3
Remove optimization level limit for flang again and add -fno-unroll-loops for AOCC flang 2.x instead
2020-06-14 17:40:24 +02:00
Martin Kroeker
79cdcde717
Re-enable higher optimization levels for flang while disabling loop unrolling for AOCC flang
2020-06-14 17:18:16 +02:00
Martin Kroeker
18a11137f1
Update BLAS tests to correspond to Reference-LAPACK 3.9.0
...
replaces calculation of machine precision with call to epsilon intrinsic and removes the requirement for previous output files to be removed before rerunning tests
2020-06-14 10:26:25 +02:00
Martin Kroeker
1dd712131e
Fix spelling of flang option -Mrecursive and add -Kieee
2020-06-14 00:09:31 +02:00
Martin Kroeker
0ed2adf0b2
Fix spelling of flang option -Mrecursive and add -Kieee
2020-06-14 00:01:20 +02:00
Martin Kroeker
abf670757b
Respect predefined defaults for AR, AS, LD and RANLIB
2020-06-13 23:21:13 +02:00
Simon Märtens
41fc6f3cd2
Added missing exported symbols.
2020-06-13 22:37:39 +02:00
Martin Kroeker
007d9f97d7
Make gotoblas_corename report the name of the selected TARGET rather than its aliases
2020-06-13 19:25:28 +02:00
Martin Kroeker
63d26090f5
Merge pull request #64 from xianyi/develop
...
rebase
2020-06-13 19:14:47 +02:00
Rajalakshmi Srinivasaraghavan
9fe930f205
powerpc: Add support for future processor
...
This is the initial patch to support build infrastructure
for POWER10 architecture.
2020-06-11 15:47:20 -05:00
Martin Kroeker
3a1b58d54a
Merge pull request #2653 from craft-zhang/cortex-a53
...
fix INIT8x4 of SGEMM on Arm Cortex-A53
2020-06-10 12:19:33 +02:00
Martin Kroeker
f7659be4a0
Merge pull request #2652 from martin-frbg/flang-fixes
...
Fixes for compilation with flang binary release 20190329
2020-06-09 20:31:06 +02:00
ZhangDanfeng
bc6fd20a40
fix INIT8x4
...
Signed-off-by: ZhangDanfeng <467688405@qq.com>
2020-06-10 01:01:16 +08:00
Martin Kroeker
3ce469a34f
Limit optimization level to O1 for flang and add -frecursive
2020-06-09 16:11:13 +02:00
Martin Kroeker
ba2c5b404d
When building with flang, use it also for the final link step to get dependencies right
2020-06-09 16:09:34 +02:00
Martin Kroeker
f07a80354b
Apply previously AOCC-specific workaround to all versions of flang
2020-06-09 16:07:03 +02:00
Martin Kroeker
fdd1b50263
Merge pull request #63 from xianyi/develop
...
rebase
2020-06-09 15:54:30 +02:00
Leonard Lausen
b98923f33a
Test enforce -O1 for flang
2020-06-09 06:54:47 +00:00
Leonard Lausen
4cb1db0e3b
Test flang build
2020-06-09 06:31:17 +00:00
Martin Kroeker
430e8b45fe
Merge pull request #2648 from martin-frbg/lapack411
...
Break out of potentially infinite rescaling loop in LAPACK xLARGV/xLARTG/xLARTGP
2020-06-07 19:45:52 +02:00
Martin Kroeker
88fe85f4e0
Merge pull request #2647 from martin-frbg/aocc-flang
...
Small fixes for flang in general and the AMD AOCC version of it in particular
2020-06-07 19:45:11 +02:00
Martin Kroeker
89091e6b64
Merge pull request #2645 from martin-frbg/misc_fixes
...
Miscellaneous fixes
2020-06-07 19:44:50 +02:00
Martin Kroeker
522aaf53bf
Break out of potentially infinite rescaling loop in LAPACK xLARGV/xLARTG/xLARTGP
...
Reference-LAPACK issue 411
2020-06-07 14:30:20 +02:00
Martin Kroeker
c3574ffe53
Merge pull request #2646 from wjc404/develop
...
Optimize AVX512 parallel DGEMM performance
2020-06-07 13:18:22 +02:00
Martin Kroeker
4e28dc6353
Use only -O1 with AMD AOCC version of flang
...
to prevent miscompilation of LAPACK codes and tests on Ryzen
2020-06-07 00:05:02 +02:00
Martin Kroeker
13c28889a2
Update "cosmetic fixes for non-C99 compilers"
2020-06-06 15:22:27 +02:00
wjc404
0e3ac4a06b
Add files via upload
2020-06-06 14:56:57 +08:00
Martin Kroeker
28915eed72
Cosmetic fixes for non-C99 compilers
2020-06-05 10:05:34 +02:00
Martin Kroeker
7f60fb6b91
Delete spurious copy of common_param.h
2020-06-05 10:04:16 +02:00
Martin Kroeker
0464e662ad
make blas_quickdivide unsigned and guard against miscompilation
2020-06-05 10:03:36 +02:00
Martin Kroeker
0f9a935a5a
Merge pull request #62 from xianyi/develop
...
rebase
2020-06-05 09:51:06 +02:00
Martin Kroeker
79cd69fea4
Merge pull request #2644 from martin-frbg/cmake-maxstack
...
Add CMAKE support for MAX_STACK_ALLOC setting
2020-06-05 08:33:48 +02:00
Martin Kroeker
bb12c2c854
Limit MAX_STACK_ALLOC availability to non-Windows
2020-06-04 19:07:27 +02:00
Martin Kroeker
32c1c1e125
Update azure-pipelines.yml
2020-06-04 19:03:46 +02:00
Martin Kroeker
f1953b8b81
Update azure-pipelines.yml
2020-06-04 17:58:13 +02:00
Martin Kroeker
6e97df7b47
Add CMAKE support for MAX_STACK_ALLOC setting
2020-06-04 14:45:31 +02:00
Martin Kroeker
729303e5ed
Merge pull request #2643 from craft-zhang/cortex-a53
...
Improve performance of SGEMM on Arm Cortex-A53
2020-06-04 07:58:45 +02:00
Martin Kroeker
547965530f
Merge pull request #2638 from leezu/actions
...
Add Github Actions test for DYNAMIC_ARCH builds on Linux and macOS
2020-06-04 00:02:37 +02:00
ZhangDanfeng
9b7877ccf1
sgemm copy source init
...
Signed-off-by: ZhangDanfeng <467688405@qq.com>
2020-06-04 02:10:45 +08:00
ZhangDanfeng
f82fa802d1
Insert prefetch
...
Signed-off-by: ZhangDanfeng <467688405@qq.com>
2020-06-04 02:08:48 +08:00
Martin Kroeker
3eda3d34c3
Merge pull request #2641 from martin-frbg/ppcg4
...
Work around PPC G4 test failures
2020-06-03 16:43:46 +02:00
Martin Kroeker
a8f42ae85c
set cmake build type to Release
2020-06-03 15:28:59 +02:00
Martin Kroeker
e6e2e531bc
revert clang pragma
2020-06-03 15:16:27 +02:00
Martin Kroeker
456dc04441
Update sgemm_kernel_16x4_skylakex_3.c
2020-06-03 15:15:41 +02:00
Martin Kroeker
89323458a9
preset optimization level for apple clang
2020-06-03 15:07:25 +02:00
Martin Kroeker
e153bdeb70
Update dynamic_arch.yml
2020-06-03 13:46:43 +02:00
Martin Kroeker
c2001f7756
Make cmake build verbose to see options in use
2020-06-03 12:18:15 +02:00
Martin Kroeker
c2b3f0b3f6
Revert "keep Apple Clang from optimizing this"
2020-06-03 10:22:15 +02:00
Martin Kroeker
f16e39554d
Change PPCG4 CGEMM_M to match kernel change
2020-06-03 09:15:29 +02:00
Martin Kroeker
b1ee81228a
Change complex DOT and ROT to generic kernels and switch CGEMM
...
in response to test failures seen in #2628 and BLAS-Tester
2020-06-03 09:13:29 +02:00
Martin Kroeker
9f7358d7dc
Keep Apple Clang from optimizing this
2020-06-03 08:52:53 +02:00
Martin Kroeker
54fa90fb25
Keep apple clang 11.0.3 from trying to optimize this (and running out of registers)
2020-06-02 17:31:45 +02:00
Leonard Lausen
5a709b8340
Print CPU info in output
2020-06-01 20:51:11 +00:00
Leonard Lausen
b31a68b835
Add Github Actions test for DYNAMIC_ARCH builds
2020-06-01 20:16:59 +00:00
Martin Kroeker
86552bf4c7
Update f_check
2020-05-31 15:22:12 +02:00
Martin Kroeker
a349d48d89
Merge pull request #2636 from martin-frbg/issue2634
...
Fix CMAKE build Issues on OS X
2020-05-31 15:16:09 +02:00
Martin Kroeker
4db00121dc
Disable EXPRECISION and add -lm on OSX (same as the BSDs and Linux)
2020-05-31 12:39:36 +02:00
Martin Kroeker
909897f13b
Document option USE_LOCKING
2020-05-31 12:37:57 +02:00
Martin Kroeker
e79245acd9
Merge pull request #2635 from ilayn/patch-1
...
BUG: Fix the loop range in ZHEEQUB.f
2020-05-30 14:37:12 +02:00
Ilhan Polat
76d2612e0c
BUG: Fix the loop range in ZHEEQUB.f
2020-05-30 14:11:11 +02:00
Martin Kroeker
ced49466f0
Use the fortran compiler to link LAPACK-related benchmarks
...
to fix linking problems with (at least) the AMD version of flang that creates dependencies on more than just the fortran runtime.
2020-05-29 13:35:51 +02:00
Martin Kroeker
6e270f91ec
add support for RETURN_BY_STACK semantics, e.g. clang
2020-05-29 13:29:10 +02:00
Martin Kroeker
200296b0f4
remove libomp from link list only for pgfortran
...
at least the AMD (aocc) flavor of flang wants to link to a (real or dummy) libomp by default
2020-05-29 13:23:51 +02:00
Martin Kroeker
dd7a650792
Merge pull request #59 from xianyi/develop
...
rebase
2020-05-29 13:06:25 +02:00
Martin Kroeker
4a4c50a7ce
Merge pull request #2627 from pkubaj/patch-1
...
Add powerpc (32-bit)
2020-05-26 08:36:24 +02:00
Martin Kroeker
d069780e63
Merge pull request #2626 from docularxu/working-gcc-version-detections
...
make GCC version detection OS-independent
2020-05-26 08:35:58 +02:00
pkubaj
33c8790603
Add powerpc (32-bit)
...
Only powerpc64 is present.
2020-05-25 13:14:09 +02:00
Guodong Xu
06387ac0e6
make GCC version detection OS-independent
...
Previous design put GCC version detection inside of OSNAME 'WINNT'.
However, such detections are required for 'Linux' and possibly other
OSes as well. For example, Makefile.arm64 uses the GCC version
variables. When compiling on a Linux machine with the previous
design, Makefile.arm64 would not know the correct GCC version.
The fix is to move GCC version detection into common part, not
wrapped by anything.
Signed-off-by: Guodong Xu <guodong.xu@linaro.com>
2020-05-25 10:57:03 +00:00
Martin Kroeker
f1a18d245b
Merge pull request #2618 from craft-zhang/cortex-A53
...
Improve performance of SGEMM and STRMM on Arm Cortex-A53
2020-05-25 12:14:46 +02:00
张丹枫
2a3aa91354
update CONTRIBUTORS.md, adding myself
2020-05-20 22:35:26 +08:00
张丹枫
ea5bdc3f72
split cortex-a53 param to match 8x8 kernel
2020-05-20 22:34:47 +08:00
张丹枫
9df79ae9a3
update sgemm and strmm kernel selecting strategy
2020-05-20 22:26:58 +08:00
张丹枫
a1fc6041cd
use general register to speedup
2020-05-20 22:26:58 +08:00
张丹枫
edb423d772
align general register usage with strmm_kernel_8x8
2020-05-20 22:26:58 +08:00
zhangdanfeng
0e6eb8c247
sgemm kernel use sgemm_kernel_8x8_cortexa53
...
Signed-off-by: zhangdanfeng <zhangdanfeng@cloudwalk.cn>
2020-05-20 22:26:58 +08:00
zhangdanfeng
d475db29c6
optimized for cortex-a53
...
Signed-off-by: zhangdanfeng <zhangdanfeng@cloudwalk.cn>
2020-05-20 22:26:58 +08:00
Martin Kroeker
729ac6bd4a
Merge pull request #2623 from mhillenibm/zarch_dgemm_z14
...
s390x: Use new sgemm kernel also for DGEMM and DTRMM on Z14 (+ small cleanup)
2020-05-20 14:51:04 +02:00
Marius Hillenbrand
89fe17f20e
s390x: Use new sgemm kernel also for DGEMM and DTRMM on Z14
...
Apply our new GEMM kernel implementation, written in C with vector intrinsics,
also for DGEMM and DTRMM on Z14 and newer (i.e., architectures with FP32 SIMD
instructions). As a result, we gain around 10% in performance on z15, in
addition to improving maintainability.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-20 10:23:35 +02:00
Marius Hillenbrand
bdd795ed03
s390x/GEMM: replace 0-init with peeled first iteration
...
... since it gains another ~2% of SGEMM and DGEMM performance on z15;
also, the code just called for that cleanup.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-20 10:23:35 +02:00
Martin Kroeker
e1038ea836
Merge pull request #2622 from martin-frbg/issue2619
...
Improve declaration of LAPACKE_get_nancheck
2020-05-19 23:07:22 +02:00
Martin Kroeker
6baa9a778d
Improve declaration of LAPACKE_get_nancheck
2020-05-19 17:59:31 +02:00
Martin Kroeker
cf46c9f84e
Merge pull request #2617 from martin-frbg/issue2616
...
Add workaround for unhandled gmake jobserver flags in c_check/f_check
2020-05-18 13:23:58 +02:00
Martin Kroeker
55602fce56
Ignore spurious all-numeric library names derived from mishandled jobserver flags
2020-05-17 15:28:14 +02:00
Martin Kroeker
3d5e159e7a
Ignore spurious all-numeric library names derived from mishandled jobserver flags
2020-05-17 15:26:57 +02:00
Martin Kroeker
2931feb575
Merge pull request #58 from xianyi/develop
...
rebase
2020-05-17 15:23:32 +02:00
Martin Kroeker
20245ded5f
Merge pull request #2615 from mhillenibm/z14_alignment_hints
...
s390x: improvise vector alignment hints for older compilers
2020-05-14 21:06:34 +02:00
Marius Hillenbrand
2840432e49
s390x: improvise vector alignment hints for older compilers
...
Introduce inline assembly so that we can employ vector loads with
alignment hints on older compilers (pre gcc-9), since these are still
used in distributions such as RHEL 8 and Ubuntu 18.04 LTS.
Informing the hardware about alignment can speed up vector loads. For
that purpose, we can encode hints about 8-byte or 16-byte alignment of
the memory operand into the opcodes. gcc-9 and newer automatically emit
such hints, where applicable. Add a bit of inline assembly that achieves
the same for older compilers. Since an older binutils may not know about
the additional operand for the hints, we explicitly encode the opcode in
hex.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-14 15:36:03 +02:00
Martin Kroeker
ea78106c71
Merge pull request #2614 from mhillenibm/gemm_vec_z14
...
s390x: Improve performance of SGEMM and STRMM on z14 and newer
2020-05-13 15:09:23 +02:00
Marius Hillenbrand
cb9dc36dd5
Update CONTRIBUTORS.md
...
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 16:14:00 +02:00
Marius Hillenbrand
1b0b4349a1
s390x/Z14: Change register blocking for SGEMM to 16x4
...
Change register blocking for SGEMM (and STRMM) on z14 from 8x4 to 16x4
by adjusting SGEMM_DEFAULT_UNROLL_M and choosing the appropriate copy
implementations. Actually make KERNEL.Z14 more flexible, so that the
change in param.h suffices. As a result, performance for SGEMM improves
by around 30% on z15.
On z14, FP SIMD instructions can operate on float-sized scalars in
vector registers, while z13 could do that for double-sized scalars only.
Thus, we can double the number of elements of C that are held in
registers in an SGEMM kernel.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 15:59:51 +02:00
Marius Hillenbrand
71b6eaf459
s390x: Use new sgemm kernel also for strmm on Z14 and newer
...
Employ the newly added GEMM kernel also for STRMM on Z14. The
implementation in C with vector intrinsics exploits FP32 SIMD operations
and thereby gains performance over the existing assembly code. Extend
the implementation for handling triangular matrix multiplication,
accordingly. As added benefit, the more flexible C code enables us to
adjust register blocking in the subsequent commit.
Tested via make -C test / ctest / utest and by a couple of additional
unit tests that exercise blocking.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 15:59:51 +02:00
Marius Hillenbrand
43c0d4f312
s390x: Add vectorized sgemm kernel for Z14 and newer
...
Add a new GEMM kernel implementation to exploit the FP32 SIMD
operations introduced with z14 and employ it for SGEMM on z14 and newer
architectures.
The SIMD extensions introduced with z13 support operations on
double-sized scalars in vector registers. Thus, the existing SGEMM code
would extend floats to doubles before operating on them. z14 extended
SIMD support to operations on 32-bit floats. By employing these
instructions, we can operate on twice the number of scalars per
instruction (four floats in each vector register) and avoid the
conversion operations.
The code is written in C with explicit vectorization. In experiments,
this kernel improves performance on z14 and z15 by around 2x over the
current implementation in assembly. The flexibility of the C code paves
the way for adjustments in subsequent commits.
Tested via make -C test / ctest / utest and by a couple of additional
unit tests that exercise blocking (e.g., partial register blocks with
fewer than UNROLL_M rows and/or fewer than UNROLL_N columns).
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 15:59:51 +02:00
Marius Hillenbrand
d7c1677c20
Update CONTRIBUTORS.md, adding myself
...
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 11:09:28 +02:00
Marius Hillenbrand
0dbe61a612
s390x: choose SIMD kernels at run-time based on OS and compiler support
...
Extend and simplify the run-time detection for dynamic architecture support for z
to check HW_CAP and only use SIMD features if advertised by the OS.
While at it, also honor the env variable LD_HWCAP_MASK and do not use
the CPU features masked there.
Note that we can only use the SIMD features on z13 or newer (i.e.,
Vector Facility or Vector-Enhancements Facilities) when the operating
system supports properly context-switching the vector registers. The OS
advertises that support as a bit in the HW_CAP value in the auxiliary
vector. While all recent Linux kernels have that support, we should
maintain compatibility with older versions that may still be in use.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 11:01:16 +02:00
Marius Hillenbrand
62cf391cbb
s390x: only build kernels supported by gcc with dynamic arch support
...
When building with dynamic arch support, only build kernels for
architectures that are supported by the gcc we are building with.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 11:01:16 +02:00
Marius Hillenbrand
8c338616f9
s390x: gate dynamic arch detection on gcc version and add generic
...
When building OpenBLAS with DYNAMIC_ARCH=1 on s390x (aka zarch), make
sure to include support for systems without the facilities introduced
with z13 (i.e., zarch_generic). Adjust runtime detection to fall back to
that generic code when running on an unknown platform other than Z13
through Z15.
When detecting a Z13 or newer system, add a check for gcc support for
the architecture-specific features before selecting the respective
kernel. Fall back to Z13 or generic code otherwise.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 11:01:16 +02:00
Martin Kroeker
f94c53ec0a
Merge pull request #2612 from RajalakshmiSR/testshgemm
...
Improve shgemm test
2020-05-12 08:34:02 +02:00
Rajalakshmi Srinivasaraghavan
8efba9b7c0
Improve shgemm test
...
This patch adds another check to test shgemm results.
2020-05-11 17:15:10 -05:00
Martin Kroeker
4fffa556d8
Merge pull request #2611 from RajalakshmiSR/bench_half
...
Include shgemm in benchtest
2020-05-11 21:08:41 +02:00
Rajalakshmi Srinivasaraghavan
ce90e2bd3f
Include shgemm in benchtest
...
This patch is to enable benchtest for half precision gemm
when BUILD_HALF is set during make.
2020-05-11 09:57:46 -05:00
Martin Kroeker
948b6712ba
Merge pull request #2610 from martin-frbg/issue2552-3
...
Temporary workaround for excessive LAPACK test failures with COMPLEX on Skylake-X
2020-05-10 13:10:31 +02:00
Martin Kroeker
2271c3506b
Work around excessive LAPACK test failures on Skylake-X
...
Something in the plain C parts of x86_64 cscal.c and zscal.c appears to be miscompiled by both gfortran9 and ifort when compiling for skylakex-avx512, even when the optimized Haswell microkernel is not in use.
2020-05-09 23:49:18 +02:00
Martin Kroeker
db00b21445
Merge pull request #2609 from martin-frbg/issue2552-2
...
Correct ifort options
2020-05-09 21:33:02 +02:00
Martin Kroeker
58d26b4448
Correct ifort options
...
to same as suggested by reference-lapack
2020-05-09 17:15:36 +02:00
Martin Kroeker
8e47d14053
Merge pull request #2608 from martin-frbg/issue2604
...
Handle trailing whitespace and empty variables in KERNEL files
2020-05-09 16:36:14 +02:00
Martin Kroeker
cd10b35fe9
Handle trailing spaces and empty condition variables
2020-05-09 13:42:33 +02:00
Martin Kroeker
9472dd99cd
Merge pull request #57 from xianyi/develop
...
rebase
2020-05-09 13:20:44 +02:00
Martin Kroeker
7181665452
Merge pull request #2605 from RajalakshmiSR/cmake-power
...
Fix cmake compilation issue - POWER9
2020-05-09 11:29:28 +02:00
Rajalakshmi Srinivasaraghavan
bd9ff820bc
Fix cmake compilation issue - POWER9
...
This patch removes extra space in the sgemmotcopy filename
thereby allowing it to create entry in kernel/Makefile
created by cmake.
2020-05-08 20:31:56 -05:00
Martin Kroeker
63e45def70
Merge pull request #2603 from martin-frbg/issue2552
...
Add FFLAGS_DRV entry to the generated make.inc to fix lapack-test failure with Intel compilers
2020-05-08 22:08:39 +02:00
Martin Kroeker
ec0f228632
Add FFLAGS_DRV to the generated make.inc to fix lapack-test on x86_64 with icc/ifort
...
fixes #2552
2020-05-08 18:06:12 +02:00
Martin Kroeker
90e2941c61
Merge pull request #56 from xianyi/develop
...
rebase
2020-05-07 22:43:48 +02:00
Martin Kroeker
10d5f3c87b
Merge pull request #2602 from ashwinyes/thunderx2_develop
...
DAXPY Optimizations for ThunderX2
2020-05-07 22:06:41 +02:00
Ashwin Sekhar T K
8353cb245a
ARM64: Improve DAXPY for ThunderX2
...
Improve performance of DAXPY for ThunderX2
when the vector fits in L1 Cache.
2020-05-07 09:22:50 -07:00
Martin Kroeker
ec2dd7b875
Merge pull request #2601 from martin-frbg/issue818
...
Undefine NAME/CNAME etc in Makefile.system before defining them
2020-05-07 10:12:33 +02:00
Martin Kroeker
4e82eb9f8a
Undefine ASMNAME/NAME/CNAME before defining them
...
to avoid redefinition warning when environment variables like CFLAGS are being used (fixes #818 )
2020-05-07 00:31:32 +02:00
Martin Kroeker
61300bb735
Merge pull request #55 from xianyi/develop
...
rebase
2020-05-07 00:27:14 +02:00
Martin Kroeker
33e9b12464
Merge pull request #2597 from martin-frbg/appleclang
...
Use Clang 9.0.0 miscompilation fix for corresponding AppleClang version as well
2020-05-05 13:55:08 +02:00
Martin Kroeker
90dba9f716
Duplicate earlier Clang 9.0.0 workaround for corresponding Apple Clang version
...
As discussed on the original PR #2329 , "Apple Clang 11.0.3", which appears to be based on the same LLVM release, produces the same miscompilation of this file.
2020-05-05 10:44:50 +02:00
Martin Kroeker
424d551e01
Merge pull request #53 from xianyi/develop
...
rebase
2020-05-01 15:18:46 +02:00
Martin Kroeker
596f5df9e8
Merge pull request #2591 from RajalakshmiSR/testhalf
...
Add test for shgemm
2020-05-01 09:59:39 +02:00
Martin Kroeker
5dd14e3d48
Make building the bfloat16 functions conditional on option BUILD_HALF ( #2590 )
...
* make building the bfloat16 BLAS functions conditional on BUILD_HALF
* pass the BUILD_HALF option to gensymbol
* Pass BUILD_HALF as a compiler define for dynamic_arch builds
2020-05-01 09:58:30 +02:00
Martin Kroeker
a54e35e780
Merge pull request #2586 from martin-frbg/miscfixes
...
Trivial fix for compiler warnings
2020-04-29 22:01:41 +02:00
Rajalakshmi Srinivasaraghavan
564b0d39ef
Add test for shgemm
...
This patch has Makefile changes to add test for shgemm which
compares sgemm and shgemm result.
2020-04-29 13:40:34 -05:00
Martin Kroeker
5d58b11101
Merge pull request #52 from xianyi/develop
...
rebase
2020-04-29 14:36:15 +02:00
Martin Kroeker
d394d4e677
Merge pull request #2585 from martin-frbg/mips64fix
...
Increase default BUFFER_SIZE on MIPS64
2020-04-28 19:47:55 +02:00
Martin Kroeker
f4248af26e
Fix compiler warnings
2020-04-28 10:43:12 +02:00
Martin Kroeker
2d89603e9d
Increase BUFFER_SIZE on mips64 to match SGEMM parameters
2020-04-28 10:40:40 +02:00
Martin Kroeker
26bc15258a
Merge pull request #51 from xianyi/develop
...
rebase
2020-04-28 10:38:50 +02:00
Martin Kroeker
141998dce2
Merge pull request #2584 from martin-frbg/issue2583
...
[WIP] Have CMAKE parse conditional lines in KERNEL files
2020-04-28 10:35:12 +02:00
Martin Kroeker
3bd56846bb
Silence a debug message
2020-04-27 16:27:09 +02:00
Martin Kroeker
e7bbdfdf84
Have CMAKE parse conditional lines in KERNEL files
...
Supports ifeq and ifneq, but requires both to have an else branch
2020-04-27 15:20:03 +02:00
Martin Kroeker
b6795db731
Merge pull request #2582 from martin-frbg/mips32fix
...
Increase BUFFER_SIZE on MIPS32 to accommodate SGEMM requirements
2020-04-27 09:18:34 +02:00
Martin Kroeker
5e0dbf8dfe
Increase default BUFFER_SIZE to accommodate SGEMM parameters
...
in response to compile-time warning from #2551
2020-04-26 22:21:05 +02:00
Martin Kroeker
955d73127f
Merge pull request #50 from xianyi/develop
...
rebase
2020-04-26 22:17:56 +02:00
Martin Kroeker
a8c1bea7ae
Merge pull request #2581 from martin-frbg/raji
...
Fix travis configuration and update CONTRIBUTORS.md
2020-04-25 19:57:10 +02:00
Martin Kroeker
e43b49e064
Drop the set -e from travis scripts
2020-04-25 16:18:54 +02:00
Martin Kroeker
3e28db7f38
Update CONTRIBUTORS.md
2020-04-25 13:51:44 +02:00
Martin Kroeker
4b69ee31af
Merge pull request #2580 from martin-frbg/issue2538-3
...
Increase POWER8 ZGEMM_R and use same R values for POWER9
2020-04-25 00:28:18 +02:00
Martin Kroeker
03ff213c51
Increase POWER8 ZGEMM_R and use same R values for POWER9
...
fixes lapack-test zger failures seen in #2299 after application of my PR #2551
2020-04-24 21:46:54 +02:00
Martin Kroeker
299d1c8de0
Merge pull request #2578 from martin-frbg/issue2576
...
Quote getarch include paths in prebuild.cmake
2020-04-24 14:32:46 +02:00
Martin Kroeker
70869d571f
Quote include paths for getarch to protect any embedded spaces
2020-04-24 10:30:44 +02:00
Martin Kroeker
cba87222b2
Merge pull request #49 from xianyi/develop
...
rebase
2020-04-24 10:21:48 +02:00
Martin Kroeker
f80dd2151e
xcode 11.4.1 for homebrew ?
2020-04-23 14:31:09 +02:00
Martin Kroeker
4412ee1754
Switch homebrew build env to new xcode 11.4
...
default 11.3.1 in the github image is causing brew to fail with "outdated xcode" message
2020-04-23 10:54:46 +02:00
Martin Kroeker
f6104b68c1
Merge pull request #2571 from martin-frbg/issue2299
...
Work around IDAMAX/IZAMAX bugs on POWER8BE with ELFv2 FreeBSD
2020-04-22 18:27:13 +02:00
Martin Kroeker
84f2c71e93
Merge pull request #2573 from martin-frbg/issue2572
...
Enable cblas interfaces to GEMM3M in CMAKE builds
2020-04-22 15:04:49 +02:00
Martin Kroeker
06208c8d01
Limit this fix to ELFv2 builds
2020-04-22 14:16:40 +02:00
Martin Kroeker
c90b28dee6
Export ELF_VERSION for use in powerpc kernel configurations
2020-04-22 14:14:20 +02:00
Martin Kroeker
6275b43918
Avoid duplicate printout of byte order and report ELF_VERSION
2020-04-22 14:12:27 +02:00
Martin Kroeker
2db5178e2d
enable cblas interfaces to GEMM3M in CMAKE builds
2020-04-22 11:01:28 +02:00
Martin Kroeker
57549f5c92
Merge pull request #2569 from martin-frbg/issue2472-2
...
Fix linker option passing for MSVS and ReLAPACK
2020-04-21 20:26:53 +02:00
Martin Kroeker
f5c4c28b98
Work around POWER8BE bugs on FreeBSD (ELFv2)
...
for #2299
2020-04-21 17:17:17 +02:00
Martin Kroeker
239282d5e2
Use CMAKE_SHARED_LINKER_FLAGS to pass MSVC linker option
...
target_link_libraries does not work here according to issue 2472
2020-04-20 22:30:51 +02:00
Martin Kroeker
568674477c
Merge pull request #48 from xianyi/develop
...
rebase
2020-04-20 21:51:59 +02:00
Martin Kroeker
fa42588e1f
Merge pull request #2565 from martin-frbg/mips24k
...
Support MIPS32 24K family as P5600
2020-04-20 17:13:53 +02:00
Martin Kroeker
8a6d26458b
Merge pull request #2559 from RajalakshmiSR/shgemm
...
Add half precision gemm for bfloat16 in OpenBLAS
2020-04-19 22:09:55 +02:00
Martin Kroeker
db86f516b9
Merge pull request #2568 from martin-frbg/azure-win
...
Add a Windows/CL build job to the Azure CI
2020-04-19 19:06:33 +02:00
Martin Kroeker
aec353b5a7
Add a Windows/CL build to the Azure CI configuration
2020-04-19 19:04:33 +02:00
Martin Kroeker
c62fbefad4
Merge pull request #2567 from xianyi/revert-2566-azurewin
...
Revert "Add Windows build job on Azure CI"
2020-04-19 19:01:58 +02:00
Martin Kroeker
04706e760d
Revert "Add Windows build job on Azure CI ( #2566 )"
...
This reverts commit e1e543b145 .
2020-04-19 19:00:37 +02:00
Martin Kroeker
e1e543b145
Add Windows build job on Azure CI ( #2566 )
...
* Add Windows-CL build job on Azure
2020-04-19 16:16:15 +02:00
Martin Kroeker
e55ec82bb9
Delete KERNEL.1004K
2020-04-19 15:44:30 +02:00
Martin Kroeker
7353ea5afc
Delete KERNEL.24K
2020-04-19 15:44:19 +02:00
Martin Kroeker
6a04efb122
Rename KERNEL files to include MIPS prefix
2020-04-19 15:43:54 +02:00
Martin Kroeker
5afb66812f
Update getarch.c
2020-04-19 14:55:31 +02:00
Martin Kroeker
0d18f231fc
Update getarch.c
2020-04-19 13:52:58 +02:00
Martin Kroeker
2f4a8e5bc4
Rename the FORCE entries for 24K and 1004K to include the MIPS prefix
2020-04-19 13:22:19 +02:00
Martin Kroeker
4f70512b97
Update kernel.cmake
2020-04-19 08:10:26 +02:00
Martin Kroeker
8792fc4d5f
Disable RPCC macro on MIPS24K
2020-04-19 07:21:48 +02:00
Martin Kroeker
577c5d9f8f
Update README.md
2020-04-19 06:54:52 +02:00
Martin Kroeker
6721f2750e
Update TargetList.txt
2020-04-19 06:51:57 +02:00
Martin Kroeker
b0b02a080d
Add compiler options for MIPS32 24K/1004K
2020-04-19 06:50:51 +02:00
Martin Kroeker
a1fc98dc57
rename 1004K, 24K to MIPS1004K, MIPS24K to avoid identifier naming problem
2020-04-18 23:50:23 +02:00
Martin Kroeker
d0737b0142
Update kernel.cmake
2020-04-18 21:36:28 +02:00
Martin Kroeker
7dbb59b256
Update common_macro.h
2020-04-18 21:34:14 +02:00
Martin Kroeker
00172d440b
Typo fix in MIPS24K addition
2020-04-18 21:16:49 +02:00
Martin Kroeker
d712ea724c
Add MIPS24K support
2020-04-18 21:10:18 +02:00
Martin Kroeker
61bbae3ac1
Handle MIPS24K like P5600
...
and allow enforcing TARGET=1004K as well (omission from earlier 1004K merge and later introduction of TARGET check)
2020-04-18 21:09:32 +02:00
Martin Kroeker
1c1ca2bc0a
Merge pull request #47 from xianyi/develop
...
rebase
2020-04-18 21:07:14 +02:00
Martin Kroeker
c7d668c248
Update common_macro.h
2020-04-18 16:04:38 +02:00
Martin Kroeker
a83a59b038
Use generic kernels for ishama,shasum,shdot,shrot
2020-04-18 15:53:51 +02:00
Martin Kroeker
0a19bd813c
Use generic codes for shamax and shcopy
2020-04-18 12:52:51 +02:00
Martin Kroeker
e7afe8a969
Define AXPBY_K fallback for float16
2020-04-18 11:10:15 +02:00
Martin Kroeker
f361de30a3
Use generic axpy.c for SHAXPY as x86 lacks saxpy.c
2020-04-18 11:07:16 +02:00
Martin Kroeker
9f6d6f6cb6
use saxpy.c instead of axpy.S for SHAXPY
2020-04-17 22:27:58 +02:00
Rajalakshmi Srinivasaraghavan
22bb50fb81
cmake fixes
2020-04-17 13:35:17 -05:00
Martin Kroeker
236a3d8ce6
Merge pull request #2563 from zelong-1024/develop
...
[OpenBLAS]: benchmark error of potrf
2020-04-16 11:45:32 +02:00
l00536773
6b7ef6543a
[OpenBLAS]: benchmark error of potrf
...
[description]: when the matrix size exceeds 5800 in the cpotrf test, an error such as "Potrf info = 5679" is returned on ARM64 and x86 machines. Uplo = L & F.
[solution]: changed the function that builds the matrix so that the complex Hermitian matrix stays positive definite during the computation.
[dts]:
2020-04-16 10:55:10 +08:00
Rajalakshmi Srinivasaraghavan
67cc4b9e16
Fix warnings in clang and export symbol
2020-04-15 19:15:23 -05:00
Martin Kroeker
250e6f8039
Merge pull request #2557 from martin-frbg/dronebadge
...
Update and reformat README
2020-04-15 20:23:43 +02:00
Martin Kroeker
7a6d0016b0
Merge pull request #2556 from martin-frbg/epicdrone
...
Add a drone.io multithread test for x86_64
2020-04-15 20:23:17 +02:00
Martin Kroeker
e8e8a6e608
Restore USE_OPENMP in the x86 thread test
2020-04-15 19:26:12 +02:00
Martin Kroeker
579811fb6a
Move all 19.04-based jobs back to ubuntu 18.04
2020-04-15 17:38:33 +02:00
Rajalakshmi Srinivasaraghavan
a87793e03c
Fix DYNAMIC_ARCH compilation errors
2020-04-15 09:09:50 -05:00
Rajalakshmi Srinivasaraghavan
ac6a22ae78
Update header
2020-04-14 22:58:39 -05:00
Rajalakshmi Srinivasaraghavan
ff010f496e
Build shgemm for all architecture
2020-04-14 20:38:53 -05:00
Rajalakshmi Srinivasaraghavan
7eb55504b1
RFC : Add half precision gemm for bfloat16 in OpenBLAS
...
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes). Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.
Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.
This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
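bfloat16 keeps the sign bit, the 8-bit exponent and the top 7 mantissa bits of an IEEE-754 binary32 value, which is why it can be stored in an unsigned short on architectures without native support, as described above. A minimal sketch of the conversions and of a float-accumulated dot product (one element of an shgemm-style result). The helper names are hypothetical, and narrowing here simply truncates, whereas production code usually rounds to nearest even and treats NaN separately.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef unsigned short bfloat16;   /* storage-only type, 2 bytes */

/* Widen: the bfloat16 bits become the upper half of a binary32. */
static float bf16_to_float(bfloat16 h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Narrow by truncation: drop the low 16 mantissa bits.
 * (No rounding and no special handling of NaN payloads.) */
static bfloat16 float_to_bf16_trunc(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (bfloat16)(bits >> 16);
}

/* Dot product of bfloat16 inputs, accumulated in float. */
static float shdot(int n, const bfloat16 *x, const bfloat16 *y)
{
    assert(n >= 0);
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += bf16_to_float(x[i]) * bf16_to_float(y[i]);
    return acc;
}
```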
Martin Kroeker
84a9614345
try x86_64 test without openmp
2020-04-14 19:18:35 +02:00
Martin Kroeker
b969533703
Add drone.io badge, mention EMAG8180 support, reformat the DYNAMIC_ARCH paragraph
2020-04-14 10:53:28 +02:00
Martin Kroeker
0f08f3efa6
Add a multithread test for x86_64
2020-04-13 22:46:12 +02:00
Martin Kroeker
c861b2a7bd
Merge pull request #2553 from martin-frbg/issue2444
...
Add a read memory barrier to the traversal of the buffer slot list
2020-04-13 21:28:59 +02:00
Martin Kroeker
cf62adffbb
Merge pull request #2555 from martin-frbg/issue1137
...
Handle unaligned data in the SSE2 copy kernel
2020-04-13 18:29:56 +02:00
Martin Kroeker
3eec7d382c
ARMV7 does not support DMB ISHLD, use DMB ISH
2020-04-13 15:56:31 +02:00
Martin Kroeker
5b0093b5fe
Convert aligned moves to unaligned
...
should have no performance impact on reasonably modern cpus and fixes occasional crashes in actual user code.
2020-04-13 14:58:52 +02:00
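The difference between the two kinds of move is easiest to see with SSE2 intrinsics: _mm_load_pd and _mm_store_pd require 16-byte aligned addresses and fault otherwise, while _mm_loadu_pd and _mm_storeu_pd accept any address. A minimal sketch of a double-precision copy loop using the unaligned forms. The actual OpenBLAS kernel is written in assembly, dcopy_unaligned is a hypothetical name, and without SSE2 the sketch falls back to a scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copy n doubles from x to y; neither pointer needs 16-byte alignment. */
static void dcopy_unaligned(size_t n, const double *x, double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 2 <= n; i += 2) {
        __m128d v = _mm_loadu_pd(x + i);   /* unaligned load of 2 doubles */
        _mm_storeu_pd(y + i, v);           /* unaligned store */
    }
#endif
    for (; i < n; i++)                     /* scalar tail (or whole loop) */
        y[i] = x[i];
}
```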
Martin Kroeker
f41600e66f
Add a read barrier in the traversing of the buffer list
...
Needed on systems with weak memory ordering - the inferior, partially working fix from #2544 was already removed in #2551
2020-04-13 12:34:02 +02:00
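Conceptually, a reader must not use a slot's contents before it has observed the flag that publishes them, and on weakly ordered CPUs such as ARM or POWER that ordering has to be requested explicitly. A minimal sketch with C11 atomics, where an acquire load stands in for the read barrier; the slot table and function names are hypothetical and much simpler than the structures in memory.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_SLOTS 8   /* hypothetical table size */

struct slot {
    void *addr;       /* payload, written before the slot is published */
    atomic_int used;  /* 0 = free, 1 = published */
};

static struct slot table[NUM_SLOTS];   /* zero-initialized: all free */

/* Writer: fill the payload, then publish with release ordering. */
static void publish_slot(int k, void *addr)
{
    assert(k >= 0 && k < NUM_SLOTS);
    table[k].addr = addr;
    atomic_store_explicit(&table[k].used, 1, memory_order_release);
}

/* Reader: the acquire load keeps the addr read from being satisfied
 * before the flag read, which is what a read barrier guarantees. */
static int find_slot(const void *addr)
{
    for (int k = 0; k < NUM_SLOTS; k++) {
        if (atomic_load_explicit(&table[k].used, memory_order_acquire)
            && table[k].addr == addr)
            return k;
    }
    return -1;
}
```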
Martin Kroeker
f5efecb7ca
Add (empty) read barrier definition
2020-04-13 12:24:10 +02:00
Martin Kroeker
a52bdd9d7b
Add (empty) read barrier definition
2020-04-13 12:22:35 +02:00
Martin Kroeker
db3226a646
Add (empty) read barrier definition
2020-04-13 12:18:48 +02:00
Martin Kroeker
69b6e258d8
Add (empty) read barrier definition
2020-04-13 12:17:41 +02:00
Martin Kroeker
3d4db4d002
Add read barrier definition
2020-04-13 12:16:44 +02:00
Martin Kroeker
99dde1d2c9
Add read barrier definition
2020-04-13 12:14:58 +02:00
Martin Kroeker
ee6b3df02c
Add read barrier definition
2020-04-13 12:14:06 +02:00
Martin Kroeker
25e879fe92
Add (empty) read barrier definition
2020-04-13 12:12:54 +02:00
Martin Kroeker
d237dc1360
Add read barrier definition
2020-04-13 12:11:58 +02:00
Martin Kroeker
8692456226
Add read barrier definition
2020-04-13 12:10:37 +02:00
Martin Kroeker
d1d69e1b9a
Add read barrier definition
2020-04-13 12:09:24 +02:00
Martin Kroeker
20d0cb2f65
Merge pull request #46 from xianyi/develop
...
rebase
2020-04-13 12:06:40 +02:00
Martin Kroeker
e7f0da9295
Merge pull request #2551 from martin-frbg/issue2538-2
...
Increase BUFFER_SIZEs and add a safeguard; supply GEMM_R for POWER8/9
2020-04-12 22:34:41 +02:00
Martin Kroeker
e9bfa2291a
Fix parameter overflow
2020-04-12 19:47:02 +02:00
Martin Kroeker
2a28448a96
Add safeguards for sufficient BUFFER_SIZE
2020-04-12 19:45:36 +02:00
Martin Kroeker
a33d177430
Increase default BUFFER_SIZE on ARM, ZARCH and newer x86_64, add GEMM_R for POWER8/9
...
As shown in #2538 , default buffersizes on some platforms were smaller than required in memory.c
and the requirement could never be fulfilled for a calculated GEMM_R on PPC given the formula used
2020-04-12 19:44:48 +02:00
Martin Kroeker
f73391c9c9
Merge pull request #45 from xianyi/develop
...
rebase
2020-04-12 19:39:05 +02:00
Martin Kroeker
7905383cb5
Merge pull request #2547 from sharvil/develop
...
Add API to set thread affinity on Linux.
2020-04-11 00:35:38 +02:00
Martin Kroeker
a8cbd451bf
Merge pull request #2541 from bapt/develop
...
libname: treat FreeBSD and DragonFly like linux and sunos
2020-04-11 00:35:07 +02:00
Martin Kroeker
eecd8c3204
Merge pull request #2548 from gxw-loongson/develop
...
Add a GENERIC target for 64bit MIPS
2020-04-11 00:34:04 +02:00
Martin Kroeker
ea85eb2e02
Merge pull request #2549 from martin-frbg/fixthreadtest
...
Match thread count in cpp_thread_test to host capability
2020-04-10 23:54:40 +02:00
Martin Kroeker
66f89c0aaf
Match thread count to machine capability
2020-04-10 22:06:44 +02:00
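How cpp_thread_test determines the machine's capability is not shown here, and the test itself is C++. A minimal C sketch of the idea, assuming sysconf(_SC_NPROCESSORS_ONLN), a widely available extension (glibc, the BSDs, macOS) rather than strict POSIX; capped_thread_count is a hypothetical helper.

```c
#include <assert.h>
#include <unistd.h>

/* Cap a requested thread count at the number of online processors
 * reported by the OS, never returning less than 1. */
static long capped_thread_count(long requested)
{
    assert(requested >= 1);
    long ncpu = sysconf(_SC_NPROCESSORS_ONLN);
    if (ncpu < 1)
        ncpu = 1;               /* sysconf returns -1 on failure */
    return requested < ncpu ? requested : ncpu;
}
```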
gxw
8d07cf9b67
Fix compilation problem on loongson platform
...
Using "make TARGET=GENERIC" on loongson platform will get the following
error messages:
"make[1]: *** No rule to make target 'sgemm_incopy.o', needed by 'libs'"
Add kernel/mips64/KERNEL.generic to solve the problem.
2020-04-09 19:28:15 +08:00
Sharvil Nanavati
7b4773b24d
Add API to set thread affinity on Linux.
...
Issue: #2545
2020-04-08 12:49:35 -07:00
Martin Kroeker
69f277f8ee
Add another memory barrier for ARM and a multicore test run on ThunderX to help detect such issues ( #2544 )
...
* Add another memory barrier in memory.c to prevent races in memory slot allocation
* Add an all-core test on Drone.io's ThunderX platform and modify dgemm_tester to use all 96 cores
2020-04-08 11:04:51 +02:00
Martin Kroeker
3a6d51c2fd
Merge pull request #44 from xianyi/develop
...
Add a Z13 build to the Travis configuration (#2542 )
2020-04-04 22:48:53 +02:00
Martin Kroeker
1c7771df96
Merge pull request #43 from martin-frbg/revert-42-z12ci
...
Revert 42 z12ci to keep forked develop clean
2020-04-04 22:46:58 +02:00
Martin Kroeker
a56c9ec52a
Revert "Add IBM Z to Travis configuration ( #42 )"
...
This reverts commit 7972beb375 .
2020-04-04 22:45:01 +02:00
Martin Kroeker
4ae6d1a01b
Add a Z13 build to the Travis configuration ( #2542 )
...
* Add IBM Z to Travis configuration
2020-04-03 16:02:11 +02:00
Martin Kroeker
7972beb375
Add IBM Z to Travis configuration ( #42 )
...
* Add IBM Z to Travis configuration
2020-04-03 15:59:18 +02:00
Baptiste Daroussin
41e802443a
libname: treat FreeBSD and DragonFly like linux and sunos
...
There is no difference in the way libnames are handled between FreeBSD
and linux or sunos. FreeBSD and DragonFly prefer having sonames as well
2020-04-03 06:20:42 +02:00
Martin Kroeker
7bd8624b79
Merge pull request #41 from xianyi/develop
...
rebase
2020-04-02 10:32:19 +02:00
Martin Kroeker
806f89166e
Make ARMV7 compile with xcode and add a CI job for it ( #2537 )
...
* Add an ARMV7 iOS build on Travis
* thread_local appears to be unavailable on ARMV7 iOS
* Add no-thumb option for ARMV7 IOS build to get it to accept DMB ISH
* Make local labels in macros of nrm2_vfpv3.S compatible with the xcode assembler
2020-04-02 10:30:37 +02:00
Martin Kroeker
f059e614eb
Merge pull request #2536 from martin-frbg/recurs
...
Add "recursive" option for LAPACK builds with ifort or pgfort as well
2020-04-01 20:00:13 +02:00
Martin Kroeker
e13b6773ee
ifort and pgfort need "recursive" for safe compilation of LAPACK as well
2020-04-01 15:39:16 +02:00
Martin Kroeker
a05243d0f2
ifort and pgfort need "recursive" for compiling LAPACK as well
...
as shown in Reference-LAPACK issue 401 (their PR 403)
2020-04-01 15:38:07 +02:00
Martin Kroeker
c6af9bbb32
Merge pull request #2534 from martin-frbg/issue2496
...
Fix zero initialization for beta=0 case
2020-03-31 20:53:13 +02:00
Martin Kroeker
144be81ca1
fix initialization to zero in the NEON SGEMM_BETA kernel as well
2020-03-31 16:53:56 +02:00
Martin Kroeker
07cdd5d05c
Fix zero initialization for beta=0 case
...
use immediate initialization instead of multiplication in case register content is a NaN
2020-03-31 00:21:02 +02:00
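Multiplication is unsafe here because IEEE-754 defines 0 * NaN and 0 * Inf as NaN, so scaling a NaN-filled or uninitialized C by beta = 0 does not produce zeros. The reference BLAS also states that C need not be set on input when beta is zero. A minimal scalar sketch of the fixed behaviour; scale_by_beta is a hypothetical name, and the real kernels are architecture-specific.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Scale C by beta as in the GEMM update C = alpha*A*B + beta*C.
 * For beta == 0 the entries are overwritten with zero instead of
 * multiplied, so NaN or Inf already present in C cannot leak through. */
static void scale_by_beta(size_t n, float beta, float *c)
{
    assert(n == 0 || c != NULL);
    if (beta == 0.0f) {
        for (size_t i = 0; i < n; i++)
            c[i] = 0.0f;
    } else {
        for (size_t i = 0; i < n; i++)
            c[i] *= beta;
    }
}
```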
Martin Kroeker
567d2760e6
Merge pull request #2520 from wjc404/develop
...
Fix avx512 sgemm performance bug when ldc is a multiple of 1024
2020-03-30 20:15:59 +02:00
Martin Kroeker
018bb3e433
Merge pull request #2533 from martin-frbg/gemmdirect2
...
Use runtime check for AVX512 capability in DYNAMIC_ARCH builds made on SKX
2020-03-30 20:15:37 +02:00
Martin Kroeker
79fd006c58
Expose the support_avx512 function provided in dynamic.c
2020-03-26 21:25:39 +01:00
Martin Kroeker
8229c163b7
Use runtime check for AVX512 (sgemm_direct) capability when using DYNAMIC_ARCH
2020-03-26 21:12:56 +01:00
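A runtime check matters because a DYNAMIC_ARCH library built on an AVX512 machine may later run on a CPU without AVX512. These commits use OpenBLAS's own support_avx512() from dynamic.c; the sketch below substitutes the GCC/Clang builtin __builtin_cpu_supports("avx512f") and uses hypothetical stand-ins for the direct and generic SGEMM paths. On non-x86 targets it always selects the generic path.

```c
#include <assert.h>

typedef void (*sgemm_path_fn)(void);   /* hypothetical signature */

/* Hypothetical stand-ins that only count how often they are called. */
static int direct_calls, generic_calls;
static void sgemm_direct_path(void)  { direct_calls++; }
static void sgemm_generic_path(void) { generic_calls++; }

/* Query the CPU at run time instead of trusting the build host. */
static int cpu_has_avx512f(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f");
#else
    return 0;
#endif
}

static sgemm_path_fn select_sgemm_path(void)
{
    return cpu_has_avx512f() ? sgemm_direct_path : sgemm_generic_path;
}
```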
Martin Kroeker
a986d42ea6
Merge pull request #39 from xianyi/develop
...
rebase
2020-03-26 21:06:51 +01:00
Martin Kroeker
b6a948fbee
Merge pull request #2530 from martin-frbg/dynmsg
...
Add message highlighting minimum target choice at end of DYNAMIC_ARCH…
2020-03-24 15:44:46 +01:00
Martin Kroeker
0cc352417e
Merge pull request #2529 from shengyang-3390/dev1
...
add ctest for drotm and modified ctest for drot.
2020-03-24 15:44:27 +01:00
Martin Kroeker
fe47dc8673
Add message highlighting minimum target choice at end of DYNAMIC_ARCH builds
...
related to #2526
2020-03-23 19:35:51 +01:00
Martin Kroeker
9f67d03d3b
Merge pull request #2527 from martin-frbg/gemmdirect
...
Avoid calling DIRECT codepath in DYNAMIC_ARCH on non-SKX
2020-03-23 12:47:19 +01:00
shengyang
50f4fb2fbd
add ctest for drotm and modified ctest for drot.
...
make sure that the test cases cover all code paths when the kernel uses loop unrolling.
2020-03-23 10:47:05 +08:00
Martin Kroeker
6a14b34c20
Avoid calling DIRECT codepath in DYNAMIC_ARCH on non-SKX
2020-03-22 14:33:16 +01:00
Martin Kroeker
8c7c1395da
Merge pull request #2521 from martin-frbg/cm-avx512
...
Use proper extension on the avx512 testcase filename
2020-03-22 01:03:42 +01:00
Martin Kroeker
5f6f6a2c7d
Merge pull request #2525 from andreas-schwab/develop
...
Fix ARCHCONFIG for Neoverse-N1
2020-03-21 18:47:48 +01:00
Andreas Schwab
71cf2acdef
Fix ARCHCONFIG for Neoverse-N1
...
../config_kernel.h:24:9: warning: missing whitespace after the macro name
24 | #define ARMV8-march armv8.2-a
| ^~~~~
2020-03-21 17:35:42 +01:00
Martin Kroeker
1d9773b800
Use proper extension on the avx512 testcase filename
...
The need to call it .tmp existed only when it was generated by a tmpfile call, and the "-x c" option to tell the compiler it is actually a C source is not universally supported (this broke the test with clang-cl at least)
2020-03-20 23:05:53 +01:00
Martin Kroeker
a46a8c4956
Merge pull request #2518 from shengyang-3390/dev
...
add ctest for srotm and modified ctest for srot.
2020-03-20 23:00:06 +01:00
Martin Kroeker
7ae737e04c
Merge pull request #2519 from martin-frbg/issue2472
...
Fix cmake compilation with ifort on Windows
2020-03-20 22:57:44 +01:00
wjc404
64daad4365
Update param.h
2020-03-20 21:46:18 +00:00
wjc404
b8307768e2
Add files via upload
2020-03-21 05:42:10 +08:00
Martin Kroeker
6d54c94760
Make ifort on Windows create lowercase symbols with appended underscore
...
tentative fix for #2472
2020-03-20 01:08:10 +01:00
Martin Kroeker
c0da205412
Merge pull request #38 from xianyi/develop
...
rebase
2020-03-20 01:05:22 +01:00
shengyang
a06d78556d
add ctest for srotm and modified ctest for srot.
...
make sure that the test cases cover all code paths when the kernel uses loop unrolling.
2020-03-18 14:17:32 +08:00
Martin Kroeker
af8a619e1f
Merge pull request #2517 from wjc404/develop
...
Temporary fix for SKX STRSM
2020-03-17 10:12:53 +01:00
wjc404
62b9608986
Update KERNEL.SKYLAKEX
2020-03-17 12:52:55 +08:00
Martin Kroeker
717c604aeb
Merge pull request #2515 from zelong-1024/develop
...
[OpenBLAS]: benchmark for her/her2 LEVEL2 functions
2020-03-16 21:59:55 +01:00
Martin Kroeker
ce33da4cab
Merge pull request #2513 from aaawuanjun/develop
...
[OpenBlas]: Add benchmark tpsv file and modify benchmark/Makefile
2020-03-16 21:58:55 +01:00
Martin Kroeker
a1b181cea2
Merge pull request #2516 from wjc404/develop
...
AVX2 STRSM kernels
2020-03-16 21:58:34 +01:00
wjc404
cdc0e9011e
Update KERNEL.ZEN
2020-03-16 16:39:37 +00:00
wjc404
fa049d49c2
AVX2 STRSM kernel
2020-03-17 00:34:08 +08:00
l00536773
d45c53ecf1
[OpenBLAS]: benchmark for her/her2 LEVEL2 functions
...
[description]: benchmark for her/her2
[solution]: added benchmark for her/her2, modified makefile in benchmark
[dts]:
2020-03-16 11:19:05 +08:00
Martin Kroeker
c2840997db
Merge pull request #2508 from liujingjue/develop
...
[OpenBLAS]:fix the iamax benchmark error
2020-03-14 14:21:30 +01:00
Martin Kroeker
c775458299
Merge pull request #2512 from martin-frbg/lapackh
...
Move declarations of lapack_complex_custom types outside the extern C
2020-03-14 13:27:40 +01:00
Martin Kroeker
c0649aa694
Merge pull request #2506 from xiaofengF/develop
...
Add benchmark for SPMV and fix segmentation fault when data size >= 50000
2020-03-14 13:08:36 +01:00
wuanjun 00447568
2428dc9fd3
[OpenBlas]: Add benchmark tpsv file and modify benchmark/Makefile
...
[Description]: Solve lack of tpsv benchmark.
2020-03-14 09:11:08 +08:00
Martin Kroeker
0fe0914437
Merge pull request #2505 from aaawuanjun/develop
...
[OpenBlas]:Add benchmark tpmv.c and modify benchmark/Makefile
2020-03-13 23:16:35 +01:00
Martin Kroeker
4cfd8a3f57
Merge pull request #2511 from martin-frbg/fixppctest
...
Prevent attempts to run ctest or test when fortran is not available
2020-03-13 23:04:01 +01:00
Martin Kroeker
ee2e758278
Move declarations of lapack_complex_custom types outside the extern C
...
fixes #2510
2020-03-13 20:34:13 +01:00
Martin Kroeker
2d8781b0dc
Do not attempt to run test without fortran
2020-03-13 20:11:19 +01:00
Martin Kroeker
c436e8af7b
Do not attempt to run ctest without fortran
...
The main Makefile takes care of this in the build process, but users or CI jobs may try to run this directly
2020-03-13 20:10:26 +01:00
l00546269
a0a3bf7c81
[OpenBLAS]:fix the iamax benchmark error
...
[Description]: the result for i?amax is measured in MBytes, not MFlops
2020-03-13 10:58:39 +08:00
jayfely@qq.com
ae3f2c2e49
Remove cspmv and zspmv to fix the error that occurred in Travis CI
2020-03-11 17:02:34 +08:00
jayfely@qq.com
83ecf9fea7
Modify Makefile in interface to fix the error that occurred in Travis CI
2020-03-11 16:36:45 +08:00
jayfely@qq.com
649733ff15
Only keep spmv.goto and spmv.atlas
2020-03-11 15:48:58 +08:00
wuanjun 00447568
3e8f1c6cc5
[OpenBlas]:Add benchmark tpmv.c and modify Makefile
...
[Description]:Solve the problem of missing tpmv.c benchmark file
2020-03-11 12:31:48 +08:00
jayfely@qq.com
2f4c5bb3a9
Update spmv.c: solve segmentation fault when m and n are larger than 50000
2020-03-11 10:30:09 +08:00
Martin Kroeker
4e1c4e67d4
Merge pull request #2503 from martin-frbg/xerbl
...
Apply fix for LAPACK issue 394 (fixed-form code beyond column 72)
2020-03-10 23:38:07 +01:00
Martin Kroeker
b9a2a3c540
Merge pull request #2502 from martin-frbg/issue2497
...
Fix INTERFACE64 not propagating to the fortran codes on ARMV8
2020-03-10 20:01:23 +01:00
Martin Kroeker
047dfb216d
Merge pull request #2501 from jijiwawa/Fix_mistakes
...
Fix pr #2487 error
2020-03-10 16:44:40 +01:00
s00527847
cd8871f1a1
Use the correct unit of measure
2020-03-10 19:26:06 -04:00
Martin Kroeker
b25ae1fc60
Apply fix for Reference-LAPACK issue 394
...
reference to XERBLA extending beyond column 72, breaking builds with compilers that default to traditional punch card format
2020-03-10 13:37:41 +01:00
Martin Kroeker
3f7f7ab7e2
Restore INTERFACE64 for arm64
2020-03-10 12:51:07 +01:00
Martin Kroeker
9c22170f52
Merge pull request #37 from xianyi/develop
...
rebase
2020-03-10 12:49:21 +01:00
jayfely@qq.com
08e1d8cbae
Modify Makefile in Benchmark
2020-03-10 14:32:18 +08:00
jayfely@qq.com
ff40a4e726
Add benchmark for SPMV
2020-03-10 14:22:18 +08:00
Zhang Xianyi
51019feae1
Merge pull request #2498 from njutcz/develop
...
Add benchmark for ?amax, ?max, ?amin, ?min, i?max, i?amin and i?min.
2020-03-09 16:04:33 +08:00
s00548429
bec7923a0d
Fix the functional bugs for zamax.
2020-03-09 15:36:50 +08:00
s00548429
c5bdd21352
Add benchmark for ?amax, ?max, ?amin, ?min, i?max, i?amin and i?min.
2020-03-09 14:59:03 +08:00
njutcz
d2d16d091e
Merge pull request #1 from xianyi/develop
...
update
2020-03-09 10:39:40 +08:00
Martin Kroeker
b6a6ccbbea
Merge pull request #2495 from ZuoQ3/develop
...
add benchmark for axpby test
2020-03-08 08:09:58 +01:00
Martin Kroeker
8b720f7365
Merge pull request #2494 from shengyang-3390/develop
...
add benchmark for csrot and zdrot
2020-03-07 23:04:21 +01:00
Martin Kroeker
14df234edb
Merge pull request #2489 from jijiwawa/brightness
...
Remove redundant code
2020-03-07 22:26:00 +01:00
s00527847
bbeda55b7b
add trmm.c
2020-03-07 13:09:19 -05:00
s00527847
efcf89aec7
Remove redundant code
2020-03-07 12:03:05 -05:00
Martin Kroeker
37d456f7e0
Merge pull request #2493 from martin-frbg/plainmake
...
Fix use of make vs $(MAKE) in building lapack-testing
2020-03-07 16:55:53 +01:00
Martin Kroeker
0b9e96922b
Merge pull request #2488 from liujingjue/develop
...
Modify the main Makefile in OpenBLAS
2020-03-07 16:52:29 +01:00
zq
0c8162eba6
Add benchmark file axpby.c and modify benchmark/Makefile to test s/d/c/zaxpby
2020-03-07 17:48:55 +08:00
zq
9a94a30132
Merge pull request #1 from xianyi/develop
...
update
2020-03-07 17:04:59 +08:00
shengyang
09c7a191bd
add benchmark for csrot and zdrot
...
modified: benchmark/Makefile
modified: benchmark/rot.c
2020-03-07 15:17:49 +08:00
l00546269
8a8df530e2
[OpenBLAS]:modified the Makefile
...
[Description]: check the compiler version and show the detail info
2020-03-07 10:14:33 +08:00
Martin Kroeker
37f46f2fa0
Fix another spot where make was used instead of $(MAKE)
...
Broke lapack-testing on BSD as their default "make" does not support GNU Makefile syntax
2020-03-06 15:37:26 +01:00
Martin Kroeker
9afc561be4
Merge pull request #36 from xianyi/develop
...
rebase
2020-03-06 15:32:27 +01:00
Martin Kroeker
dca3e0cf20
Merge pull request #2491 from chenxuqiang/hbmv_benchmark
...
benchmark/hpmv&hbmv: add benchmark/hpmv.c and benchmark/hbmv.c
2020-03-06 15:06:42 +01:00
Martin Kroeker
c9f8db979b
Merge pull request #2490 from shengyang-3390/develop
...
Add benchmark file rotm.c and modify benchmark/Makefile to test s/drotm
2020-03-06 15:05:55 +01:00
Martin Kroeker
18099de976
Merge pull request #2487 from jijiwawa/develop
...
add benchmark for spr/spr2
2020-03-06 14:42:25 +01:00
Martin Kroeker
97c36ca58c
Merge branch 'develop' into develop
2020-03-06 14:41:40 +01:00
Martin Kroeker
9f5a74f3c7
Merge pull request #2486 from qqqil/develop
...
add benchmark for trsv
2020-03-06 14:30:09 +01:00
Martin Kroeker
2afb10975d
Merge pull request #2485 from Darkness303/develop
...
Add syr2 benchmark
2020-03-06 14:29:27 +01:00
Martin Kroeker
dbef479227
Merge pull request #2469 from AGSaidi/acq-rel-2
...
Use acq/rel semantics to pass flags/pointers in getrf_parallel.
2020-03-06 14:28:58 +01:00
Ali Saidi
208c7e7ca5
Use acq/rel semantics to pass flags/pointers in getrf_parallel.
...
The current implementation has locks, but the locks each only
have a critical section of one variable so atomic reads/writes
with barriers can be used to achieve the same behavior.
Like the previous patch, pthread_mutex_lock isn't fair, so in a
tight loop the previous thread that has the lock can keep it
starving another thread, even if that thread is about to write
the data that will stop the current thread from spinning.
On a 64c Arm system this improves performance by 20x on sgesv.goto.
2020-03-06 06:22:31 +00:00
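A minimal sketch of the acquire/release handoff pattern described above, using C11 atomics and POSIX threads. The producer writes its data and then sets a flag with release ordering; the consumer spins on the flag with acquire ordering, after which the data is guaranteed to be visible. No mutex is involved, so the spinning thread cannot hold a lock that the producer needs. The names are hypothetical, getrf_parallel uses its own flag arrays, and older toolchains may need -pthread when linking.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static double shared_value;   /* plain data, published through the flag */
static atomic_int ready;

static void *producer(void *arg)
{
    shared_value = *(double *)arg;
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static double consume(void)
{
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;                     /* spin; real code may add a pause hint */
    return shared_value;      /* visible thanks to acquire/release */
}

/* Run one producer thread and consume its value on the calling thread. */
static double handoff(double v)
{
    pthread_t t;
    atomic_store_explicit(&ready, 0, memory_order_relaxed);
    int rc = pthread_create(&t, NULL, producer, &v);
    assert(rc == 0);
    (void)rc;
    double got = consume();
    pthread_join(t, NULL);
    return got;
}
```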
chenxuqiang
32c847df45
benchmark/hpmv&hbmv: add benchmark/hpmv.c and benchmark/hbmv.c
...
Signed-off-by: Xuqiang Chen chenxuqiang3@hisilicon.com
2020-03-06 01:02:02 -05:00
shengyang
e0df9485d4
Add benchmark file rotm.c and modify benchmark/Makefile to test s/drotm
...
modified: benchmark/Makefile
new file: benchmark/rotm.c
2020-03-05 10:05:59 +08:00
s00527847
0f1a2b12f9
add benchmark for spr/spr2
2020-03-04 15:50:19 -05:00
q00437336
233838b4bc
change clock to CLOCK_PROCESS_CPUTIME_ID
2020-03-04 03:54:40 -05:00
l00546269
13f9afbd99
[OpenBLAS]:modified the Makefile
...
[Description]:add c/fortran compiler version information in final note
2020-03-04 16:47:23 +08:00
q00437336
de74e11641
add benchmark for trsv
2020-03-04 03:23:22 -05:00
Martin Kroeker
ad9e53154d
Merge pull request #2484 from RajalakshmiSR/power-dynamic
...
Fix DYNAMIC_ARCH build for POWER9
2020-03-04 08:06:06 +01:00
Martin Kroeker
6e70621b0d
Merge pull request #2483 from aaawuanjun/develop
...
Add benchmark file trmv.c and modify benchmark/Makefile to test s/d/c/ztrmv
2020-03-04 07:59:56 +01:00
Martin Kroeker
e6edb7431f
Merge pull request #2466 from AGSaidi/acq-rel-1
...
Switch blas_server to use acq/rel semantics
2020-03-04 07:59:31 +01:00
Darkness303
114dbec947
1.Add syr2 benchmark
...
2.Fixed some errors
2020-03-04 14:09:10 +08:00
Martin Kroeker
d68e4ba59b
Fix cut/paste glitch
2020-03-03 21:37:48 +01:00
Martin Kroeker
635c9e4e09
Restore initializers for mutex and conditional
2020-03-03 21:04:12 +01:00
Rajalakshmi Srinivasaraghavan
2afc074803
Fix DYNAMIC_ARCH build for POWER9
...
Setting DYNAMIC_ARCH=1 on POWER9 does not build POWER9 files due to some
compiler version checks. This patch fixes some of the macros that are used
to check compiler version. On fixing those checks, there are some new make
failures related to icamin, icamax, isamin, isamax and caxpy files on POWER9.
This patch fixes those failures as well.
2020-03-03 12:35:10 -06:00
wuanjun 00447568
5d6c688a7e
Merge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop
2020-03-03 19:03:57 +08:00
wuanjun 00447568
87baf9cfe6
Merge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop
2020-03-03 19:03:28 +08:00
wuanjun 00447568
c0ca7d6258
Merge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop
2020-03-03 17:39:26 +08:00
wuanjun 00447568
f682d19ed4
[OpenBlas]: add benchmark file trmv.c and modify benchmark/Makefile to test s/d/c/ztrmv
2020-03-03 17:37:33 +08:00
wuanjun 00447568
790d50fbba
[OpenBlas]: add benchmark file trmv.c and modify benchmark/Makefile to test s/d/c/ztrmv
2020-03-03 17:13:49 +08:00
Martin Kroeker
59243d49ab
Merge pull request #2479 from Darkness303/develop
...
Fix potential index overflows at large matrix sizes in the benchmark codes
2020-03-03 08:46:49 +01:00
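Index overflows of this kind typically arise when an element count such as m * n, or an offset such as i + j * lda, is computed in 32-bit int. The following is a minimal sketch of the safer pattern, assuming column-major storage. `matrix_elems` and `col_major_index` are hypothetical helpers, not the benchmark code from the PR. With lda = 46341, column 46341 already starts past INT_MAX (2147483647).

```c
#include <stddef.h>
#include <stdint.h>

/* Element count of an m x n matrix computed in size_t rather than int.
   Returns 0 if m * n would overflow size_t itself. */
static size_t matrix_elems(size_t m, size_t n)
{
    if (n != 0 && m > SIZE_MAX / n)
        return 0;
    return m * n;
}

/* Column-major offset of element (i, j) with leading dimension lda,
   evaluated in size_t so that a large j * lda does not wrap around. */
static size_t col_major_index(size_t i, size_t j, size_t lda)
{
    return i + j * lda;
}
```

The same expressions written with int operands would overflow (undefined behaviour in C) for these sizes.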
Martin Kroeker
d41f83e128
Merge pull request #2436 from marxin/improve-utest-coverage
...
Improve test coverage for utests.
2020-03-03 08:43:00 +01:00
Martin Kroeker
ee4ca7ca6b
Merge pull request #2481 from ChinouneMehdi/fix2480
...
Fix #2480
2020-03-02 21:21:29 +01:00
Martin Kroeker
e326c89ae8
Merge pull request #2478 from MacChen02/develop
...
Update benchmark statistical time function
2020-03-02 21:20:51 +01:00
مهدي شينون (Mehdi Chinoune)
21f6c4b5a9
fixes #2480
2020-03-02 17:22:28 +01:00
Martin Liska
7ca4ffdbdd
Improve test coverage for utests.
2020-03-02 13:38:17 +01:00
jianghesong
0f65c05cd1
fix core dumped error
2020-03-02 19:13:45 +08:00
MacChen02
917d243580
Update benchmark statistical time function
...
The function gettimeofday() does not register any elapsed time when benchmarking the axpy cases with small data sizes.
Use clock_gettime() instead of gettimeofday() to measure the time.
2020-03-02 14:36:27 +08:00
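This is a minimal sketch of the timing approach described above, assuming a benchmark that times repeated calls of a small kernel. `now_seconds`, `time_kernel`, `axpy_args` and `saxpy_ref` are hypothetical names, not the benchmark's actual code. The clock is passed in as a parameter because the later commit 233838b4bc switched the benchmarks to CLOCK_PROCESS_CPUTIME_ID.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Read the given POSIX clock and return its value in seconds
   (nanosecond fields, unlike gettimeofday's microseconds).
   Returns -1.0 if the clock is unavailable. */
static double now_seconds(clockid_t clk)
{
    struct timespec ts;
    if (clock_gettime(clk, &ts) != 0)
        return -1.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical small kernel for timing: y += alpha * x. */
typedef struct {
    int n;
    float alpha;
    const float *x;
    float *y;
} axpy_args;

static void saxpy_ref(void *p)
{
    axpy_args *a = p;
    for (int i = 0; i < a->n; i++)
        a->y[i] += a->alpha * a->x[i];
}

/* Time `loops` calls of `kernel` and return the average seconds per call. */
static double time_kernel(void (*kernel)(void *), void *arg, int loops,
                          clockid_t clk)
{
    double start = now_seconds(clk);
    for (int i = 0; i < loops; i++)
        kernel(arg);
    double end = now_seconds(clk);
    return (end - start) / loops;
}
```

CLOCK_PROCESS_CPUTIME_ID counts the CPU time consumed by all threads of the process, so for a multithreaded kernel it is not directly comparable with a wall-clock measurement from CLOCK_MONOTONIC.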
Ali Saidi
43c2e845ab
Switch blas_server to use acq/rel semantics
...
Heavy-weight locking isn't required to pass the work queue
pointer between threads and simple atomic acquire/release
semantics can be used instead. This is especially important as
pthread_mutex_lock() isn't fair.
We've observed substantial variation in runtime because of the
unfairness of these locks, which completely goes away with
this implementation.
The locks themselves are left to provide a portable way for
idling threads to sleep/wakeup after many unsuccessful iterations
waiting.
2020-03-02 02:52:49 +00:00
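The acquire/release hand-off described above can be sketched with C11 atomics. This is a simplified single-slot mailbox, not the blas_server code. `mailbox_t`, `work_t`, `mailbox_publish`, `mailbox_take` and `worker` are hypothetical names. The sketch also spins indefinitely instead of falling back to the mutex-based sleep/wakeup that the commit keeps for idle threads.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical unit of work handed from the caller to a worker thread. */
typedef struct {
    int payload;
} work_t;

/* Single-slot mailbox: NULL means "no work queued". */
typedef struct {
    _Atomic(work_t *) slot;
} mailbox_t;

/* Producer: fill in *w first, then publish the pointer with release
   semantics so those writes are visible to a thread that acquires it. */
static void mailbox_publish(mailbox_t *mb, work_t *w)
{
    atomic_store_explicit(&mb->slot, w, memory_order_release);
}

/* Consumer: spin until a pointer appears. The acquire exchange pairs with
   the release store and empties the slot in the same atomic step. */
static work_t *mailbox_take(mailbox_t *mb)
{
    work_t *w;
    while ((w = atomic_exchange_explicit(&mb->slot, NULL,
                                         memory_order_acquire)) == NULL)
        ; /* a real server would back off and eventually sleep here */
    return w;
}

static void *worker(void *arg)
{
    mailbox_t *mb = arg;
    work_t *w = mailbox_take(mb);
    w->payload *= 2; /* stand-in for the actual computation */
    return NULL;
}
```

On the fast path the only synchronization is the release store and the acquire exchange. No mutex is taken, so lock fairness does not affect which thread picks up the work.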
Martin Kroeker
33f76a6c37
Version 0.3.9
2020-03-02 00:10:20 +01:00
Martin Kroeker
960dec234f
Version 0.3.9
2020-03-02 00:09:49 +01:00
Martin Kroeker
6b92979f35
Merge pull request #2476 from xianyi/develop
...
Update from develop in preparation for 0.3.9
2020-03-02 00:08:32 +01:00
Martin Kroeker
014fc13995
Merge pull request #2475 from martin-frbg/039changes
...
Update ChangeLog for 0.3.9
2020-03-02 00:04:26 +01:00
Martin Kroeker
f1e05676a0
Merge pull request #2474 from martin-frbg/p9be
...
Use POWER8 kernels on big-endian POWER9 for now
2020-03-02 00:04:08 +01:00
Martin Kroeker
d221c50f27
Add Ampere EMAG8180
2020-03-02 00:02:36 +01:00
Martin Kroeker
f14013da7f
Update with 0.3.9 changes
2020-03-02 00:01:22 +01:00
Martin Kroeker
4f371b0fbf
Use POWER8 kernels on big-endian POWER9 for now
2020-03-01 23:45:58 +01:00
Martin Kroeker
02d60c1563
Merge pull request #35 from xianyi/develop
...
rebase
2020-03-01 23:44:10 +01:00
Martin Kroeker
2e6963259b
Merge pull request #2471 from AGSaidi/l3-fix-2
...
Fix barriers in level3_thread
2020-03-01 19:41:07 +01:00
Martin Kroeker
e94590e400
Merge pull request #2468 from AGSaidi/wfe
...
Use wait-for-event to not spin in the blas_lock
2020-03-01 19:40:46 +01:00
Martin Kroeker
69d4687142
Merge pull request #2464 from Darkness303/develop
...
Add syr benchmark
2020-03-01 13:02:34 +01:00
Martin Kroeker
a731a9bbb9
Merge pull request #2467 from AGSaidi/rpcc
...
Make rpcc() on arm64 get closer to what x86 returns
2020-02-29 22:43:02 +01:00
Martin Kroeker
1aa5907a2c
Merge pull request #2463 from martin-frbg/mingwfix
...
Apply MinGW AVX512 compilation fix to fortran options as well
2020-02-29 19:08:03 +01:00
Martin Kroeker
ea8eec5d17
Merge pull request #2422 from wjc404/develop
...
Adjust SkylakeX GEMM3M parameters, add an AVX512 STRMM kernel and fix performance bugs in AVX2 s/c/z GEMM
2020-02-29 19:07:35 +01:00
Ali Saidi
97ce6bbce2
Fix barriers in level3_thread
2020-02-29 17:45:17 +00:00
Martin Kroeker
a9aeb6745c
Merge pull request #2465 from AGSaidi/neoverse-n1
...
Add Neoverse-N1 core
2020-02-29 13:24:44 +01:00
Ali Saidi
0af9991cc9
Use wait-for-event to not spin in the blas_lock
2020-02-29 04:23:48 +00:00
Ali Saidi
19f3a4091c
Make rpcc() on arm64 get closer to what x86 returns
...
The Arm implementation of rpcc() uses the architected timer
which is defined by the SBSA to be between 10 and 400 MHz. These numbers
are much smaller than the cycle counter frequency used by x86. Make
the numbers closer by shifting the cycle counter up by the number of
leading zeros in the cntfrq_el0 register, which gets us closer to a
normal CPU clock cycle range.
2020-02-29 04:23:22 +00:00
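The shift described above can be illustrated with a portable function that takes the counter value and the frequency as arguments instead of reading cntvct_el0 and cntfrq_el0 directly. This sketch assumes a 32-bit leading-zero count via the GCC/Clang builtin `__builtin_clz`. `tick_shift` and `scale_ticks` are hypothetical names. For any nonzero 32-bit frequency f, f << clz(f) lies in [2^31, 2^32), so the apparent rate lands at roughly 2.1 to 4.3 GHz. A 100 MHz timer has 5 leading zeros and therefore appears as 3.2 GHz.

```c
#include <assert.h>
#include <stdint.h>

/* Number of leading zero bits in the 32-bit timer frequency
   (the value of CNTFRQ_EL0 on arm64). __builtin_clz(0) is undefined,
   hence the assert. */
static unsigned tick_shift(uint32_t cntfrq)
{
    assert(cntfrq != 0);
    return (unsigned)__builtin_clz(cntfrq);
}

/* Scale raw timer ticks so that their apparent rate falls in
   [2^31, 2^32) ticks per second, close to a CPU cycle counter. */
static uint64_t scale_ticks(uint64_t ticks, uint32_t cntfrq)
{
    return ticks << tick_shift(cntfrq);
}
```

One second's worth of ticks (ticks == cntfrq) therefore scales to 3.2e9 for 25, 100 or 400 MHz timers, and to 2.56e9 for a 10 MHz timer.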
Ali Saidi
c623a965f9
Add Neoverse-N1 core
...
The implementation is a hybrid of the ARMV8 one with some of the
improved TX2 routines along with specifying -march=v8.2-a
2020-02-29 03:22:04 +00:00
j00520245
e1062400c4
Add syr benchmark
2020-02-28 16:36:53 +08:00
Martin Kroeker
a66f4d80c8
Apply MinGW AVX512 compilation fix to fortran options as well
...
The original issue was #1708. I see now that the same problem affects gfortran compilation. The underlying issue is said to be fixed (but not yet released) on all branches of gcc as of a few days ago, but it will certainly take time to reach mingw/msys.
2020-02-27 23:09:40 +01:00
wjc404
dd22eb7621
Update cgemm_kernel_8x2_haswell.c
2020-02-27 22:26:15 +08:00
wjc404
2352331e60
Update zgemm_kernel_4x2_haswell.c
2020-02-27 22:25:19 +08:00
Martin Kroeker
430ee31e66
Merge pull request #2447 from martin-frbg/issue2446
...
Always select ARMV8 parameters for big servers when cpu is TSV110 or EMAG8180
2020-02-27 15:07:02 +01:00
Xianyi Zhang
265ab484c8
Change default RISC-V 64-bit corename to RISCV64_GENERIC
...
e.g. make CC=riscv64-unknown-linux-gnu-gcc FC=riscv64-unknown-linux-gnu-gfortran TARGET=RISCV64_GENERIC HOSTCC=gcc
2020-02-27 14:46:15 +08:00
Xianyi Zhang
44020a42a4
Fixed compile bug for RV64.
2020-02-27 14:29:42 +08:00
Xianyi Zhang
4aa2d89217
Merge branch 'develop' into risc-v
2020-02-27 13:53:49 +08:00
Martin Kroeker
8164fd1328
Always assume server-class cpu count for TSV110 and EMAG8180
2020-02-26 22:19:57 +01:00
Martin Kroeker
531c6b96d6
Merge pull request #34 from xianyi/develop
...
rebase
2020-02-26 22:16:28 +01:00
wjc404
1b980001dd
Update zgemm_kernel_4x2_haswell.c
2020-02-26 18:38:12 +08:00
wjc404
2515e1152f
Update cgemm_kernel_8x2_haswell.c
2020-02-26 18:36:54 +08:00
Martin Kroeker
ddcbed6690
Merge pull request #2437 from martin-frbg/issue2434
...
[WIP] Add support for Ampere EMAG8180 ARMV8 cpu
2020-02-25 18:42:52 +01:00
Martin Kroeker
f8ec538c82
Add Ampere EMAG8180
2020-02-25 14:30:00 +01:00
Martin Kroeker
ca4f7dceff
Add parameters for EMAG8180 DYNAMIC_ARCH support with cmake
2020-02-24 20:23:18 +01:00
Martin Kroeker
1ddf9f1067
Add EMAG8180 to arm64 DYNAMIC_ARCH list for cmake
2020-02-24 20:16:18 +01:00
Martin Kroeker
4c5fac5a2b
Typo fix
2020-02-24 20:15:04 +01:00
Martin Kroeker
320e2648cd
Add EMAG8180 to DYNAMIC_CORE list for ARM64
2020-02-24 19:23:46 +01:00
Martin Kroeker
9b732696c6
Add DYNAMIC_ARCH support for ARMV8 EMAG8180
2020-02-24 19:20:00 +01:00
Martin Kroeker
c9dcb3d4a4
Merge pull request #2443 from aaawuanjun/develop
...
[OpenBlas]: benchmark/copy.c has time, x, y data loop problems
2020-02-24 13:14:51 +01:00
Martin Kroeker
3bb7f0138e
Merge pull request #2442 from martin-frbg/lapackpr390
...
Apply fix from Reference-LAPACK PR 390
2020-02-24 12:27:01 +01:00
wuanjun 00447568
c93ae92579
[OpenBlas]: benchmark/copy.c has time, x, y data loop problems
2020-02-24 11:23:39 +08:00
Martin Kroeker
87ac1ceb0b
Apply fix from Reference-LAPACK PR390, NaN not propagating
2020-02-23 22:40:40 +01:00
Martin Kroeker
9e40c080f2
Apply fix from Reference-LAPACK PR390, NaN not propagating
2020-02-23 22:39:01 +01:00
wjc404
903854c168
Add files via upload
2020-02-22 23:40:02 +08:00
wjc404
a2ff577a30
Update KERNEL.ZEN
2020-02-22 23:39:43 +08:00
wjc404
97a32cb0a5
Update KERNEL.HASWELL
2020-02-22 23:39:20 +08:00
wjc404
f1746e7284
Delete sgemm_kernel_8x4_haswell_2.c
2020-02-22 23:38:48 +08:00
wjc404
f6fcbd7906
Fix performance bug when LDC is a multiple of 1024
2020-02-22 23:37:45 +08:00
Martin Kroeker
1e8410f18c
Merge pull request #2441 from martin-frbg/ismin2
...
Add proper defaults for the IxMIN/IxMAX kernels on mips64 and power
2020-02-22 11:21:03 +01:00
Martin Kroeker
07454bf4d5
Add proper defaults for IxMIN/IxMAX kernels
...
The fallbacks from Makefile.L1 assume a combined source for the absolute-value and non-absolute variants (selected with ifdef USE_ABS), but here we have separate implementations.
2020-02-21 11:58:15 +01:00
Martin Kroeker
4046985913
Add proper defaults for IxMIN/IxMAX kernels
...
The fallbacks from Makefile.L1 assume a combined source for the absolute-value and non-absolute variants (selected with ifdef USE_ABS), but here we have separate implementations.
2020-02-21 11:55:52 +01:00
Martin Kroeker
75577f95a7
Merge pull request #33 from xianyi/develop
...
rebase
2020-02-21 09:56:05 +01:00
Martin Kroeker
33d92c7a37
Merge pull request #2435 from martin-frbg/issue2433
...
Fix handling of ppc endianness
2020-02-21 00:01:58 +01:00
Martin Kroeker
e57b11acca
Add preliminary support for EMAG8180
2020-02-19 19:00:28 +01:00
Martin Kroeker
71e5669c3e
Add preliminary support for EMAG8180 ARMV8 processor
2020-02-19 18:57:26 +01:00
Martin Kroeker
e8d82c01d4
Recognize Ampere EMAG8180
2020-02-19 18:49:13 +01:00
Martin Kroeker
0b39cf95b0
Fix endianness conditionals
2020-02-19 18:09:54 +01:00
Martin Kroeker
76b2cec6ce
Get endianness into Makefile variable
2020-02-19 18:08:20 +01:00
Martin Kroeker
276c1791ea
Merge pull request #32 from xianyi/develop
...
rebase
2020-02-19 18:06:39 +01:00
Martin Kroeker
c5bbfd8fee
Merge pull request #2432 from isuruf/install_name
...
Fix install name on osx again
2020-02-19 08:14:28 +01:00
Isuru Fernando
130c1741e5
Fix install name on osx again
2020-02-18 10:22:49 -08:00
Martin Kroeker
8f782f0673
Merge pull request #2426 from zbeekman/nightly-homebrew-check
...
Nightly homebrew check
2020-02-18 12:09:15 +01:00
Martin Kroeker
6a517dcb6a
Merge pull request #2427 from martin-frbg/powermin
...
Fix ISMIN and ISMAX kernel choices for POWER8
2020-02-18 08:15:02 +01:00
Martin Kroeker
9f39f0a2c3
Specify ismin/ismax assembly kernels for POWER8 directly
...
to fix utest failure in new ismin test - Makefile.L1 defaults look wrong
2020-02-17 19:55:39 +01:00
Izaak Beekman
1a88c4ab26
Fix bottle upload problem & typo
2020-02-17 13:36:17 -05:00
Izaak Beekman
0b44802164
Test push & PRs only when workflow file changes
...
Also, add comments to clarify what the test is testing
2020-02-17 13:22:09 -05:00
Izaak Beekman
2c242b4cef
Add Github Action to build development branch nightly with Homebrew
2020-02-17 12:36:37 -05:00
Martin Kroeker
0bfb7336d2
Merge pull request #2424 from isuruf/osx
...
Fix building on osx
2020-02-17 17:00:08 +01:00
Martin Kroeker
403cde104e
Merge pull request #30 from xianyi/develop
...
rebase
2020-02-17 14:53:46 +01:00
Martin Kroeker
634f2bddda
Merge pull request #2414 from marxin/fix-iamax_sse-implementation
...
Fix iamax sse implementation and add utests
2020-02-17 14:50:18 +01:00
Martin Liska
aeea14ee40
Come up with LOAD_AND_COMPARE_TO_MXX macro in iamax_sse.S.
2020-02-17 09:01:53 +01:00
Martin Liska
18bcc36a69
Fix implementation of iamax_sse.S as reported in #2116.
...
There was a typo in iamax_sse.S where one of the comparisons
was cmpeqps instead of cmpeqss. This misdetected the index
for sequences where the minimum value was 0.
2020-02-17 09:01:53 +01:00
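A plain-C reference for the result the ISAMAX/ISAMIN utests check may help when reading this fix: the 1-based index of the first element with the largest (or smallest) absolute value. This is a hypothetical sketch, not the utest or kernel code. It follows the reference BLAS convention of returning 0 for n < 1 or incx < 1, and does not treat NaN inputs specially. The first test vector has its minimum absolute value 0 at position 2, the kind of input the commit says was misdetected.

```c
#include <stddef.h>

static float abs_f(float v) { return v < 0.0f ? -v : v; }

/* 1-based index of the first element of x (stride incx) whose absolute
   value is largest (find_max != 0) or smallest (find_max == 0).
   Ties keep the earliest index. Returns 0 for n < 1 or incx < 1. */
static int ref_iamax_iamin(int n, const float *x, int incx, int find_max)
{
    if (n < 1 || incx < 1)
        return 0;
    int best = 1;
    float best_val = abs_f(x[0]);
    for (int i = 1; i < n; i++) {
        float v = abs_f(x[(size_t)i * (size_t)incx]);
        if (find_max ? (v > best_val) : (v < best_val)) {
            best_val = v;
            best = i + 1;
        }
    }
    return best;
}
```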
Martin Liska
0e7f43c898
Add missing USE_MIN in kernel/CMakeLists.txt.
2020-02-17 09:01:53 +01:00
Martin Kroeker
79e201fbba
Merge pull request #2423 from xianyi/issue2419
...
Restore -march flag for Android builds
2020-02-17 07:24:02 +01:00
Isuru Fernando
4326dcb460
Pass CFLAGS from env to Makefile.prebuild and remove iOS hack
2020-02-16 15:13:01 -06:00
Martin Kroeker
e32f3b1447
Restore -march flag for Android builds
...
fixes #2419 - renewed discussion in #2112 suggests removal of the option was primarily aimed at non-Android builds
2020-02-16 17:32:13 +01:00
Martin Kroeker
d483e9270a
Update KERNEL.POWER8
2020-02-16 17:29:35 +01:00
Martin Kroeker
01834aee33
Merge pull request #29 from xianyi/develop
...
rebase
2020-02-16 17:28:10 +01:00
wjc404
b0558c11b9
Update param.h
2020-02-16 23:01:31 +08:00
wjc404
f566787e6e
Update KERNEL.SKYLAKEX
2020-02-16 22:58:44 +08:00
wjc404
e3368cbf18
AVX512 STRMM kernel
2020-02-16 22:58:00 +08:00
Martin Kroeker
d92bd5be24
Update KERNEL.POWER8
2020-02-15 23:07:50 +01:00
Martin Kroeker
46e4b12946
Update KERNEL.POWER8
2020-02-15 23:06:51 +01:00
Martin Kroeker
5e94aa4877
Merge pull request #2417 from marxin/make-ctest-verbose-for-drone
...
Make ctest verbose for drone
2020-02-15 21:57:41 +01:00
Martin Kroeker
93f3e27574
Merge pull request #2415 from marxin/add-cmake-to-gitignore
...
Add CMake related files to .gitignore.
2020-02-15 21:57:03 +01:00
Martin Kroeker
785c389b0e
Merge pull request #2420 from martin-frbg/issue2396
...
Correct generation of GETRF files by the CMAKE build
2020-02-15 21:56:16 +01:00
Martin Kroeker
c222b25b81
Correct generation of GETRF files by the CMAKE build
...
fixes #2396
2020-02-15 19:29:14 +01:00
Martin Kroeker
221da8bf05
Merge pull request #2411 from martin-frbg/fix2254-038
...
Fix pre-processed POWER8 codes and wrong conditionals in the POWER8,PPC440 and PPC970 KERNEL files
2020-02-14 23:07:43 +01:00
Martin Liska
eb285b4d20
Make ctest verbose for drone builder.
2020-02-14 10:46:37 +01:00
Martin Kroeker
cafdd999b8
Update caxpy_power8.S
2020-02-13 22:44:09 +01:00
Martin Kroeker
92ca92a46c
Update caxpy_power8.S
2020-02-13 21:24:54 +01:00
Martin Kroeker
486c35c5dc
Update icamin_power8.S
2020-02-13 18:38:43 +01:00
Martin Liska
0e05ea9bac
Add CMake related files to .gitignore.
2020-02-13 14:51:55 +01:00
Martin Kroeker
5ba3699f41
Update isamin_power8.S
2020-02-13 00:00:32 +01:00
Martin Kroeker
8eefa530cd
Update isamax_power8.S
2020-02-12 23:59:50 +01:00
Martin Kroeker
de40d47edf
Update isamin_power8.S
2020-02-12 23:57:48 +01:00
Martin Kroeker
7c162b8a21
Update isamax_power8.S
2020-02-12 23:56:57 +01:00
Martin Kroeker
0544cbc806
Fix syntax of endianness conditional
2020-02-12 20:00:29 +01:00
Martin Kroeker
120d20731f
Fix syntax of endianness conditional
2020-02-12 19:58:42 +01:00
Martin Kroeker
dc345d84df
Fix syntax of endianness conditional and add gcc version check for workaround
2020-02-12 19:56:52 +01:00
Martin Kroeker
616921fd91
Merge pull request #27 from xianyi/develop
...
rebase
2020-02-12 19:16:14 +01:00
Martin Kroeker
8a9e9a82a1
Merge pull request #2410 from bartoldeman/fix-dscal-inline-asm
...
Fix inline asm in dscal: mark x, x1 as clobbered. Fixes #2408
2020-02-12 15:38:37 +01:00
Bart Oldeman
7ea5e07d1c
Fix inline asm in dscal: mark x, x1 as clobbered. Fixes #2408
...
The leaq instructions in dscal_kernel_inc_8 modify x and x1 so they
must be declared as input/output constraints, otherwise the compiler
may assume the corresponding registers are not modified.
2020-02-12 14:11:44 +00:00
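A minimal illustration of the constraint rule behind this fix, assuming x86-64 and GCC/Clang extended asm. It is not the dscal kernel: `advance_doubles` is a hypothetical helper whose `leaq` overwrites the pointer register, so the pointer must be a read-write (`+r`) operand. Declaring it as a plain input (`r`) would let the compiler assume the register still holds the old value after the asm. Other targets use the plain C fallback.

```c
#include <stddef.h>

/* Return p + n (in units of double) using a single leaq on x86-64. */
static double *advance_doubles(double *p, ptrdiff_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__("leaq (%0,%1,8), %0"
            : "+r"(p)  /* read-write: the asm overwrites this register */
            : "r"(n)); /* input only: n is not modified */
#else
    p += n; /* portable fallback */
#endif
    return p;
}
```

ptrdiff_t is used for the index so that the operand is a 64-bit register on both LP64 and LLP64 (Windows) x86-64 targets.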
Martin Kroeker
cb6ef49857
Merge pull request #2407 from susilehtola/patch-2
...
Patch out instances of Z15 in dynamic_zarch.c
2020-02-11 13:04:44 +01:00
Martin Kroeker
63994e1cdb
Merge pull request #2405 from susilehtola/patch-1
...
Fix typo in dynamic_zarch.c
2020-02-11 13:03:35 +01:00
Martin Kroeker
496e3019bc
Merge pull request #2404 from martin-frbg/issue2395
...
Fix spurious application of USE_TRMM in cmake builds
2020-02-11 13:00:36 +01:00
Martin Kroeker
169be3f097
Merge pull request #2403 from martin-frbg/issue2400
...
Fix coretype identification of Intel Cannon Lake, Ice Lake and Goldmont
2020-02-11 13:00:16 +01:00
Martin Kroeker
6ccbb089c2
Merge pull request #2402 from gxw-loongson/develop
...
Avoid printing the following information on mips and mips64 when checking for MSA
2020-02-11 12:59:53 +01:00
Martin Kroeker
59ebe3636a
Merge pull request #2399 from martin-frbg/buffersize
...
Make BUFFER_SIZE configurable at build time
2020-02-11 12:56:56 +01:00
Susi Lehtola
5a6bba3061
Patch out instances of Z15 in dynamic_zarch.c
...
There does not appear to be a Z15 kernel yet, causing link errors from the code. This patch fixes the issue.
2020-02-11 15:07:33 +13:00
Susi Lehtola
dff173e50e
Fix typo in dynamic_zarch.c
2020-02-11 14:46:30 +13:00
Martin Kroeker
7e5cbb6f35
Fix bad conditional syntax that caused spurious application of USE_TRMM
2020-02-10 21:17:39 +01:00
Martin Kroeker
303bdb673b
Fix coretype detection for Intel extended models 6 and 7
...
affecting Goldmont, Cannon Lake, Ice Lake autodetection
2020-02-10 19:17:32 +01:00
gxw
754433f420
Avoid printing the following information on mips and mips64 when checking for MSA:
...
"unrecognized command line option ‘-mmsa’"
2020-02-10 19:11:45 +08:00
Martin Kroeker
7f0d523b42
Make BUFFER_SIZE configurable
2020-02-09 23:32:57 +01:00
Martin Kroeker
c353d8b106
Make BUFFER_SIZE configurable
2020-02-09 23:30:22 +01:00
Martin Kroeker
579be3aa9d
Add configuration option for BUFFER_SIZE
2020-02-09 23:28:04 +01:00
Martin Kroeker
449e8ea443
Merge pull request #26 from xianyi/develop
...
rebase
2020-02-09 23:23:55 +01:00
Martin Kroeker
3bec250cf9
Increment version to 0.3.9.dev
2020-02-09 23:18:44 +01:00
Martin Kroeker
f03dd23e90
Increment version to 0.3.9.dev
2020-02-09 23:18:07 +01:00
Martin Kroeker
fb5eb47558
Merge pull request #2398 from xianyi/develop
...
Update from develop in preparation of the 0.3.8 release
2020-02-09 23:16:28 +01:00
Martin Kroeker
fa93d63365
Merge branch 'release-0.3.0' into develop
2020-02-09 23:16:06 +01:00
Martin Kroeker
90e6c66a57
Merge pull request #2397 from martin-frbg/038changes
...
Update Changelog with changes from 0.3.8
2020-02-09 23:01:52 +01:00
Martin Kroeker
32d97330b3
Update with changes from 0.3.8
2020-02-09 23:00:36 +01:00
Martin Kroeker
29eaf4b6d7
Merge pull request #25 from xianyi/develop
...
rebase
2020-02-09 22:48:15 +01:00
Martin Kroeker
47c1bf7f4d
typo fixes
2020-02-09 01:06:40 +01:00
Martin Kroeker
2b55f0ad30
Merge pull request #2393 from martin-frbg/issue2388
...
Provide more documentation in README.md
2020-02-09 01:00:33 +01:00
Martin Kroeker
a5b32ab06c
Merge pull request #2390 from martin-frbg/pgi
...
Small corrections for compilation with PGI compilers
2020-02-09 00:13:40 +01:00
Martin Kroeker
50545b19d0
Update CPU and OS support and document DYNAMIC_ARCH option in README.md
...
prompted by #2388
2020-02-09 00:06:07 +01:00
Martin Kroeker
b3cbd60d7a
Remove PGI from list again as it is actually still not capable
2020-02-08 10:20:13 +01:00
Martin Kroeker
70199d1905
Merge pull request #2389 from Zeyiii/develop
...
Fix bugs in benchmark of gemv
2020-02-07 16:05:46 +01:00
Martin Kroeker
cfe63d8cc2
Remove OpenMP libraries from link list
2020-02-07 16:03:51 +01:00
Martin Kroeker
d55b10830f
Remove OpenMP libraries from link list
2020-02-07 16:02:17 +01:00
Martin Kroeker
c1c10cbb21
Merge pull request #2384 from wjc404/develop
...
Optimize AVX512 DGEMM (& DTRMM)
2020-02-07 13:47:12 +01:00
Martin Kroeker
5989841524
Add PGI to avx512-supporting compilers
2020-02-07 13:01:31 +01:00
Martin Kroeker
68a43db358
Fix utest compilation with PGI
2020-02-07 10:15:18 +01:00
Martin Kroeker
9694037b23
Set SUFFIX in tempfile commands, fix bad architecture option for PGI compiler in avx512 test
2020-02-07 10:09:25 +01:00
Martin Kroeker
71faa1c1a7
Merge pull request #24 from xianyi/develop
...
rebase
2020-02-07 10:03:02 +01:00
wjc404
3447d04eaf
Update dgemm_kernel_16x2_skylakex.c
2020-02-06 02:14:10 +00:00
wjc404
8b5cdcc64c
Update sgemm_kernel_8x4_haswell.c
2020-02-06 01:47:46 +00:00
wjc404
4e00d96a78
Update dgemm_kernel_16x2_skylakex.c
2020-02-06 01:46:36 +00:00
w00421467
ce9ea8f826
Fix another branch
2020-02-05 15:07:18 +08:00
w00421467
0b909203cb
Fix bugs in benchmark of gemv
2020-02-05 14:53:37 +08:00
wjc404
096da2f51a
Update dgemm_kernel_16x2_skylakex.c
2020-02-05 13:36:57 +08:00
wjc404
2f96a2c55b
Update trmm_R.c
2020-02-05 10:15:02 +08:00
wjc404
833bd0f8ff
Update trmm_L.c
2020-02-05 10:09:41 +08:00
wjc404
77b8f49556
Update level3_thread.c
2020-02-04 20:33:08 +08:00
wjc404
1c3e20ce48
Update level3.c
2020-02-04 20:30:23 +08:00
wjc404
83b6be7976
Update param.h
2020-02-04 19:55:26 +08:00
wjc404
081b188529
Update KERNEL.SKYLAKEX
2020-02-03 21:38:08 +08:00
wjc404
f3f969f681
Update param.h
2020-02-03 21:34:12 +08:00
wjc404
8019e70211
AVX512 16x2 DGEMM kernel
2020-02-03 21:32:56 +08:00
Martin Kroeker
8d2a796f49
Merge pull request #2378 from martin-frbg/issue2377
...
Add -march option for AVX512 in cmake as well
2020-01-30 17:07:19 +01:00
Martin Kroeker
8dc9fd4dfe
Add -march option for AVX512
2020-01-30 12:41:18 +01:00
Martin Kroeker
abc67bdd74
Merge pull request #2375 from ewanglong/master
...
fix a few performance drops at some matrix sizes for each data type
2020-01-30 10:27:29 +01:00
Martin Kroeker
1f62a82789
Merge pull request #2376 from wjc404/develop
...
Fix remaining bugs in parallel GEMM3M
2020-01-23 21:50:19 +01:00
wjc404
e9fb8f62b1
Update level3_gemm3m_thread.c
2020-01-22 17:40:03 +00:00
Wang,Long
fbf4f48f4a
fix a few performance drops at some matrix sizes for each data type
...
Signed-off-by: Wang,Long <long1.wang@intel.com>
2020-01-22 15:15:04 +00:00
Martin Kroeker
b9ad450295
Merge pull request #2373 from Qiyu8/optimize#gemmbeta
...
Optimize general GEMM beta
2020-01-21 15:05:38 +01:00
Martin Kroeker
e011ad820a
Merge pull request #2372 from martin-frbg/winexit
...
Do not run any cleanup if the program is exiting anyway
2020-01-21 14:56:45 +01:00
Qiyu8
ff42e68652
Optimize general GEMM beta
2020-01-20 11:49:42 +08:00
Martin Kroeker
23f322f997
Do not run any cleanup if the program is exiting anyway
...
From Keno's PR #2350. This avoids the potential hang in blas_thread_shutdown, where we may wait for threads to exit while they are waiting on the loader lock from DllMain.
2020-01-19 13:28:27 +01:00
Martin Kroeker
093d37de8d
Merge pull request #2371 from martin-frbg/issue2370
...
Free Windows thread memory with MEM_RELEASE rather than MEM_DECOMMIT
2020-01-18 20:39:34 +01:00
Martin Kroeker
d65e9a2bbd
Merge pull request #2253 from thrasibule/xerbla
...
fix error messages
2020-01-18 20:39:04 +01:00
Martin Kroeker
78100b8093
Free Windows thread memory with MEM_RELEASE rather than MEM_DECOMMIT
...
as suggested by hjmndv in #2370
2020-01-18 15:06:39 +01:00
Martin Kroeker
70f45749b9
Merge pull request #2367 from wjc404/develop
...
Improve paralleled SGEMM performance on SKYLAKEX CPUs
2020-01-15 21:13:43 +01:00
wjc404
e5dcdeb550
Update sgemm_direct_skylakex.c
2020-01-13 16:59:23 +08:00
wjc404
952cc2ba38
Update sgemm_kernel_16x4_skylakex_2.c
2020-01-13 16:58:54 +08:00
wjc404
feaafbedd3
make skylakex sgemm code more friendly for readers
...
BTW some kernels were adjusted to improve performance
2020-01-13 16:28:41 +08:00
wjc404
1c67567008
improve skylakex paralleled sgemm performance
2020-01-13 16:26:03 +08:00
Martin Kroeker
4e979bf75b
Merge pull request #2366 from martin-frbg/install390
...
Add new file lapack.h from LAPACK 3.9.0 to installable headers
2020-01-13 09:00:21 +01:00
Martin Kroeker
daa4310db5
Install new lapack.h
...
new file in LAPACK 3.9.0, split off from lapacke.h
2020-01-12 22:00:50 +01:00
Martin Kroeker
b8f3605132
Merge pull request #23 from xianyi/develop
...
rebase
2020-01-12 21:57:23 +01:00
Martin Kroeker
b36018be6d
Merge pull request #2365 from wjc404/develop
...
Fix SKYLAKEX STRMM issues
2020-01-09 23:23:09 +01:00
wjc404
3a100b2797
Update KERNEL.SKYLAKEX
2020-01-09 13:48:41 +08:00
Martin Kroeker
38742d5547
Merge pull request #2361 from wjc404/develop
...
Optimize AVX2 SGEMM & STRMM
2020-01-08 16:20:28 +01:00
wjc404
bd4c032f52
Update sgemm_kernel_8x4_haswell.c
2020-01-07 11:22:46 +08:00
wjc404
9dc9b7b95e
Update sgemm_kernel_8x4_haswell.c
2020-01-06 20:11:36 +08:00
wjc404
9f5cdc49d4
Update CONTRIBUTORS.md
2020-01-06 12:28:43 +08:00
wjc404
b7b408a120
optimize AVX2 SGEMM
2020-01-06 12:16:09 +08:00
wjc404
92b10212de
optimize AVX2 SGEMM
2020-01-06 12:11:21 +08:00
wjc404
b73bf01378
optimize AVX2 SGEMM
2020-01-06 12:09:14 +08:00
wjc404
eb3c9f1db9
optimize AVX2 SGEMM
2020-01-06 12:07:02 +08:00
Martin Kroeker
fd2ff2714f
Merge pull request #2359 from martin-frbg/lapack-pr330
...
Apply LAPACKE fix for eigenvector transposition in symmetric eigensolvers
2020-01-03 15:03:30 +01:00
Martin Kroeker
2ea2bd99c7
Apply LAPACKE fix for eigenvector transposition in symmetric eigensolvers
...
from Reference-LAPACK PR 330
2020-01-03 11:10:00 +01:00
Martin Kroeker
fbb894948c
Merge pull request #22 from xianyi/develop
...
rebase
2020-01-03 10:23:25 +01:00
Martin Kroeker
e711659c90
Merge pull request #2358 from shengyang-3390/develop
...
Test all 7 declared values of N in float and double cblas3 tests
2020-01-03 09:02:03 +01:00
shengyang
893e6e57c4
modified: ctest/din3 ctest/sin3
2020-01-03 10:03:33 +08:00
Martin Kroeker
456ee2e1f0
Merge pull request #2357 from chenxuqiang/dgemm_beta_zero
...
kernel/arm64/dgemm_beta.S: add beta == zero branch
2020-01-02 22:28:36 +01:00
Martin Kroeker
9998f8ed8b
Merge pull request #2356 from shengyang-3390/develop
...
Use arm neon instructions to optimize ncopy operation (and enable 7th column in float+complex cblas3 test drivers)
2020-01-02 22:27:44 +01:00
shengyang
80db5f11e1
update
2020-01-02 11:01:57 +08:00
chenxuqiang
52de4cc8fd
kernel/arm64/dgemm_beta.S: add beta == zero branch
...
Added a beta == zero branch, which does not need to load the C matrix.
Signed by: Xuqiang Chen <chenxuqiang3@hisilicon.com>
2020-01-01 21:50:45 -05:00
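A plain-C reference for the C := beta * C step, assuming column-major storage with leading dimension ldc. `gemm_beta_ref` is a hypothetical name and this is not the NEON assembly from the commit; it only shows why a beta == 0 branch can skip loading C. The reference BLAS documentation states that C need not be set on input when beta is zero, so storing zeros directly also discards any NaN or Inf left in C, which multiplying by 0.0 would otherwise propagate.

```c
#include <stddef.h>

/* C := beta * C for an m x n column-major matrix with leading dimension
   ldc (ldc >= m). Rows m..ldc-1 of each column are left untouched. */
static void gemm_beta_ref(size_t m, size_t n, double beta, double *c,
                          size_t ldc)
{
    for (size_t j = 0; j < n; j++) {
        double *col = c + j * ldc;
        if (beta == 0.0) {
            /* Store only: C is never read, so stale NaN/Inf values vanish. */
            for (size_t i = 0; i < m; i++)
                col[i] = 0.0;
        } else {
            for (size_t i = 0; i < m; i++)
                col[i] *= beta;
        }
    }
}
```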
Martin Kroeker
44028581cc
Merge pull request #2355 from Zeyiii/dev-zeyi2
...
Use arm neon instructions to optimize sgemm_beta operation
2020-01-01 22:14:16 +01:00
Martin Kroeker
86ab939936
Merge pull request #2354 from ZuoQ3/develop
...
[WIP] Use arm neon instructions to optimize tcopy operation
2020-01-01 22:13:37 +01:00
Martin Kroeker
375b1875c8
[WIP] Update LAPACK to 3.9.0 (#2353)
...
* Update make.inc entries for LAPACK 3.9.0
Reference-LAPACK PR 347 changed some variable names and relative paths
* Update LAPACK to 3.9.0
* Add new functions from LAPACK 3.9.0
* Add new functions from LAPACK 3.9.0
* Restore LOADER command
as it makes it easier to specify pthread as needed
* Restore LOADER
* Restore EIG/LIN prefixes in cmdbase
* add binary path to lapack_testing.py call
* Restore OpenMP version check
* Restore OpenMP version check
* Restore fix for out-of-bounds array accesses
from #2096
2020-01-01 13:18:53 +01:00
Martin Kroeker
6c85cb1869
Merge pull request #2352 from wjc404/develop
...
AVX2 ZGEMM3M kernel
2019-12-31 18:08:10 +01:00
Martin Kroeker
995768bbc5
Merge pull request #2351 from Zeyiii/develop
...
prefetching for dgemm_beta
2019-12-31 18:07:37 +01:00
int_13h
96ad579428
add in runtime cpu detection for zarch (#2349)
...
add in runtime cpu detection for zarch
2019-12-31 18:03:27 +01:00
shengyang
8d84403205
Use arm neon instructions to optimize ncopy operation
...
modified: KERNEL.ARMV8
modified: KERNEL.TSV110
new file: sgemm_ncopy_4.S
2019-12-31 17:06:35 +08:00
shengyang
8729db117c
modified: ctest/din3
...
modified: ctest/sin3
2019-12-31 15:59:52 +08:00
w00421467
0833a4846a
Use arm neon instructions to optimize sgemm_beta operation
2019-12-31 10:42:03 +08:00
zq
50f7fc1401
[WIP] Use arm neon instructions to optimize tcopy operation
2019-12-31 10:21:23 +08:00
w00421467
d1b53806be
Merge remote-tracking branch 'pub/develop' into develop
2019-12-31 10:13:24 +08:00
wjc404
a0f0a802fc
Update zgemm3m_kernel_4x4_haswell.c
2019-12-30 17:33:42 +08:00
wjc404
700fe5b5ee
Add files via upload
2019-12-30 17:18:59 +08:00
wjc404
bb2729c855
Update CONTRIBUTORS.md
2019-12-30 16:11:37 +08:00
wjc404
aae44d040d
Update CONTRIBUTORS.md
2019-12-30 16:10:08 +08:00
wjc404
6362c34ee6
Update param.h
2019-12-30 16:08:19 +08:00
wjc404
f60840c420
Update KERNEL.ZEN
2019-12-30 16:04:23 +08:00
wjc404
109e18cd96
Update KERNEL.HASWELL
2019-12-30 16:03:24 +08:00
wjc404
ae1579be13
Create zgemm3m_kernel_4x4_haswell.c
2019-12-30 16:02:51 +08:00
w00421467
3ccf8885ac
prefetching for dgemm_beta
2019-12-30 11:45:49 +08:00
Martin Kroeker
454847588e
Update LAPACK to 3.9.0
2019-12-29 21:27:18 +01:00
Martin Kroeker
0257f26488
Merge pull request #21 from xianyi/develop
...
rebase
2019-12-29 18:08:55 +01:00
Martin Kroeker
c45b7aef14
Merge pull request #2348 from wjc404/develop
...
AVX2 CGEMM3M kernel
2019-12-28 20:07:56 +01:00
wjc404
312060d0d6
Update CONTRIBUTORS.md
2019-12-27 23:36:13 +08:00
wjc404
cd765f094b
Update cgemm3m_kernel_8x4_haswell.c
2019-12-27 18:23:29 +08:00
wjc404
64639f440f
Update param.h
2019-12-27 18:06:42 +08:00
wjc404
3a66c8cac1
Update KERNEL.ZEN
2019-12-27 18:04:08 +08:00
wjc404
4c35b8dbaa
Update gemm3m_level3.c
2019-12-27 18:03:01 +08:00
wjc404
ed9af2f7da
Update KERNEL.HASWELL
2019-12-27 18:01:38 +08:00
wjc404
5fd1edead9
Create cgemm3m_kernel_8x4_haswell.c
2019-12-27 18:00:55 +08:00
Martin Kroeker
26478eb0d0
Merge pull request #2345 from wjc404/develop
...
Optimize AVX2 CGEMM
2019-12-25 22:26:41 +01:00
wjc404
eeecd623d8
Update cgemm_kernel_8x2_haswell.c
2019-12-24 00:40:16 +08:00
wjc404
3ce6bcdb5f
Update CONTRIBUTORS.md
2019-12-24 00:30:16 +08:00
wjc404
6fbe51072b
Update CONTRIBUTORS.md
2019-12-24 00:24:40 +08:00
wjc404
611445c7f8
Update param.h
2019-12-23 23:44:55 +08:00
wjc404
2cd9306bb5
Update KERNEL.ZEN
2019-12-23 23:42:30 +08:00
wjc404
c418c81224
Update KERNEL.HASWELL
2019-12-23 23:41:44 +08:00
wjc404
025741f16a
Fast Haswell CGEMM kernel
2019-12-23 23:40:03 +08:00
Martin Kroeker
0ae49d2990
Merge pull request #2344 from wjc404/develop
...
Optimize AVX2 ZGEMM
2019-12-21 12:16:55 +01:00
wjc404
105e26e12a
Adjust Haswell ZGEMM blocking parameters
2019-12-21 14:38:51 +08:00
wjc404
f41d52665d
Fast Haswell ZGEMM kernel
2019-12-21 14:37:06 +08:00
wjc404
d573d24de7
Fast Haswell ZGEMM kernel
2019-12-21 14:35:15 +08:00
Martin Kroeker
31d6c2eb7d
Merge pull request #2340 from Zeyiii/develop
...
[WIP] Use arm neon instructions to optimize gemm beta operation
2019-12-20 08:38:57 +01:00
w00421467
b7cc69ee62
declare DGEMM_BETA in KERNEL.ARMV8 rather than the generic KERNEL
2019-12-20 10:11:50 +08:00
w00421467
aeef942c4f
use arm neon instructions to optimize gemm beta operation
2019-12-17 10:00:13 +08:00
Martin Kroeker
445ca2f418
Merge pull request #2339 from Jehan/wip/Jehan/fix-timeout
...
driver: more reasonable thread wait timeout on Windows.
2019-12-13 14:57:26 +01:00
Jehan
13226e3101
driver: more reasonable thread wait timeout on Windows.
...
It used to be 5ms, which might not be long enough in some cases for the
thread to exit well, but then when set to 5000 (5s), it would slow down
any program depending on OpenBlas.
Let's just set it to 50ms, which is at least 10 times longer than
originally, but still reasonable in case of failed thread termination.
2019-12-13 09:52:33 +01:00
Martin Kroeker
1a6ea8ee6d
Merge pull request #2338 from kavanabhat/aix_mod
...
Changes to build on AIX in POWER8 mode
2019-12-09 17:54:49 +01:00
Martin Kroeker
c6ecb195e6
Merge pull request #2337 from martin-frbg/issue2336
...
Support two-digit version numbers in gcc version check
2019-12-07 09:38:06 +01:00
Martin Kroeker
b28db31429
Support two-digit version numbers in gcc version check
...
fixes #2336 (non-recognition of gcc 10) with patch provided by JeffreyALaw.
2019-12-06 21:23:56 +01:00
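A check that treats the compiler version as text can misread a two-digit major version: as strings, "10" sorts before "9", and its first character is "1". The actual fix lives in the build system's version check and is not reproduced here. This C sketch, with the hypothetical helpers `parse_version` and `gcc_at_least`, only illustrates comparing the major and minor numbers numerically, for version strings such as those printed by `gcc -dumpversion`, which may be either a bare major number or a full MAJOR.MINOR.PATCH string.

```c
#include <stdio.h>

/* Parse "MAJOR" or "MAJOR.MINOR[.PATCH]" into integers.
   Returns 1 if at least MAJOR was parsed, 0 otherwise. */
static int parse_version(const char *s, int *major, int *minor)
{
    *major = 0;
    *minor = 0; /* stays 0 for a bare "MAJOR" string */
    return sscanf(s, "%d.%d", major, minor) >= 1;
}

/* True if version string `ver` is >= want_major.want_minor, compared
   numerically so that "10" correctly ranks above "9". */
static int gcc_at_least(const char *ver, int want_major, int want_minor)
{
    int major, minor;
    if (!parse_version(ver, &major, &minor))
        return 0;
    return major > want_major || (major == want_major && minor >= want_minor);
}
```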
Kavana Bhat
6baa9b07d7
AIX changes for Power8
2019-12-06 04:33:32 -06:00
Martin Kroeker
a4896b5538
Update DYNAMIC_ARCH support for ARM64 and PPC (#2332)
...
* Update DYNAMIC_ARCH list of ARM64 targets for gmake
* Update arm64 cpu list for runtime detection
* Update DYNAMIC_ARCH list of ARM64 targets for cmake and add POWERPC targets
2019-12-04 11:06:03 +01:00
Kavana Bhat
3938e59569
AIX changes for Power8
2019-12-04 00:23:46 -06:00
Martin Kroeker
9d5079008f
Merge pull request #2334 from martin-frbg/fix2228
...
Remove misplaced file
2019-12-03 22:23:52 +01:00
Martin Kroeker
3518617f5b
Add Intel Goldmont+ cpuid
...
was originally in #2228 but that PR had misplaced the file in the toplevel directory
2019-12-03 08:32:29 +01:00
Martin Kroeker
715f4650d9
Delete stray copy of dynamic.c from PR 2228
2019-12-03 08:24:10 +01:00
Martin Kroeker
10705183ce
Merge pull request #20 from xianyi/develop
...
Rebase
2019-12-03 08:22:40 +01:00
Martin Kroeker
235599f17a
Merge pull request #2329 from isuruf/patch-1
...
Workaround an ICE in clang 9.0.0
2019-12-02 08:30:43 +01:00
Isuru Fernando
b863b32ac5
Workaround an ICE in clang 9.0.0
...
This bug is not there in 8.x nor in the 9.0 daily snapshot.
2019-12-01 12:59:46 -06:00
Martin Kroeker
dd04143d4a
Merge pull request #2328 from martin-frbg/ppc9
...
Fix precompiled kernels on POWER9 and make their use conditional on (old) gcc version
2019-11-30 12:23:57 +01:00
Martin Kroeker
f3a6164bff
Merge pull request #2324 from antonblanchard/power9_segv
...
Fix SEGV in cdot_power9
2019-11-30 00:03:42 +01:00
Martin Kroeker
dedd822d1a
Fix caxpy/caxpyc naming in localentry
2019-11-29 23:56:57 +01:00
Martin Kroeker
2181fb7047
Fix caxpy/caxpyc naming in localentry
2019-11-29 23:54:15 +01:00
Martin Kroeker
a9b62c03f8
Substitute precompiled gcc7 codes only when gcc is older than 9.x
2019-11-29 23:49:50 +01:00
Martin Kroeker
97762234f9
Add variable for gcc >=9 test
...
used in KERNEL.POWER9
2019-11-29 23:47:23 +01:00
Martin Kroeker
948d11fc51
Merge pull request #19 from xianyi/develop
...
rebase
2019-11-29 23:44:09 +01:00
Martin Kroeker
c815b8fb85
Merge pull request #2323 from wjc404/develop
...
some optimizations of AVX512 DGEMM
2019-11-28 20:55:16 +01:00
wjc404
e20709e976
Update param.h
2019-11-28 19:57:50 +08:00
wjc404
934e601e93
Update dgemm_kernel_4x8_skylakex_2.c
2019-11-28 19:56:35 +08:00
Martin Kroeker
a4c3668f99
Merge pull request #2321 from martin-frbg/issue2319
...
Fix race conditions in multithreaded GEMM3M
2019-11-28 09:30:24 +01:00
Martin Kroeker
867232c6a4
Merge pull request #2327 from martin-frbg/travisosx
...
Cleanup IOS build and disable FORTRAN on 32bit and ios builds for now
2019-11-28 08:43:45 +01:00
Martin Kroeker
5aaf70ef95
Merge pull request #2326 from xianyi/revert-2325-travisosx
...
Revert "Cleanup Travis IOS xbuild and disable FORTRAN on 32bit and ios builds for now"
2019-11-28 00:17:19 +01:00
Martin Kroeker
ae2a0995cc
Cleanup IOS build and disable FORTRAN on 32bit and ios builds for now
...
Travis recently appears unable to find a matching homebrew package for 32bit gfortran,
and the IOS crossbuild suffered from excessive output due to the known problem with "ASMNAME redefined"
warnings when CFLAGS is set in the environment
2019-11-28 00:15:36 +01:00
Martin Kroeker
83dae28ae2
Revert "Cleanup Travis IOS xbuild and disable FORTRAN on 32bit and ios builds for now"
2019-11-28 00:09:06 +01:00
Martin Kroeker
da986d2e83
Merge pull request #2325 from martin-frbg/travisosx
...
Cleanup Travis IOS xbuild and disable FORTRAN on 32bit and ios builds for now
2019-11-27 21:59:36 +01:00
Martin Kroeker
6bc487de35
Cleanup IOS build and disable FORTRAN on 32bit and ios builds for now
...
Travis recently appears unable to find a matching homebrew package for 32bit gfortran,
and the IOS crossbuild suffered from excessive output due to the known problem with "ASMNAME redefined"
warnings when CFLAGS is set in the environment
2019-11-27 15:10:57 +01:00
Anton Blanchard
cf2a8e410c
Fix SEGV in cdot_power9
...
We were corrupting r2 because the local entry wasn't being
set up correctly.
2019-11-26 21:55:04 -07:00
wjc404
eb1e9c8c92
some optimizations
2019-11-26 14:12:20 +08:00
Martin Kroeker
f95989cbc1
Fix AVX512 capability test (always returning zero)
...
from #2322
2019-11-23 22:38:07 +01:00
Martin Kroeker
f3065a0eed
Fix race conditions in multithreaded GEMM3M
...
by adding barriers (and a mutex lock for the non-OpenMP case), as was already done for GEMM in level3_thread.c some time ago
2019-11-23 19:54:56 +01:00
Martin Kroeker
04226f1e97
Add the cpuid of the business/rackmount version of z15 as well
2019-11-21 18:14:29 +01:00
Martin Kroeker
0925ef70db
Merge pull request #2316 from sharkcz/s390x
...
zarch: treat z15 as z14 instead of generic
2019-11-21 18:03:00 +01:00
Martin Kroeker
371e6f73d4
Merge pull request #2317 from aarnez/develop
...
Change bad usage of "asum" to "sum" in ZARCH versions of ?sum
2019-11-21 17:59:21 +01:00
Andreas Arnez
d117dfd505
Change bad usage of "asum" to "sum" in ZARCH versions of ?sum
...
The ZARCH implementations of ?sum contain a cut & paste-error: An inline
assembly argument is named "sum", but the assembly references "asum"
instead. The mismatch causes a build error. This is fixed.
2019-11-21 13:49:13 +01:00
Dan Horák
883c39773a
zarch: treat z15 as z14 instead of generic
...
Signed-off-by: Dan Horák <dan@danny.cz>
2019-11-21 12:53:23 +01:00
Martin Kroeker
b09b5be0a4
Merge pull request #2315 from ewanglong/develop
...
Revised Windows-compatible fix for #2313
2019-11-21 05:06:44 +01:00
Wang, Long
bfb5fbdb4d
Revised Windows-compatible fix for #2313
...
Signed-off-by: Wang, Long <long1.wang@intel.com>
2019-11-21 10:22:58 +08:00
Martin Kroeker
3da6d66da9
Merge pull request #2314 from Jehan/wip/Jehan/fix-openblas-crash
...
Fix usage of TerminateThread() causing critical section corruption.
2019-11-20 16:16:35 +01:00
Martin Kroeker
08fa83aba2
Merge pull request #2312 from martin-frbg/power8be
...
Further Power8 big-endian corrections
2019-11-20 15:12:06 +01:00
Martin Kroeker
63d3ee8dfc
Merge pull request #2313 from ewanglong/develop
...
Fix the integer overflow issue for large matrix size
2019-11-20 14:49:15 +01:00
Wang, Long
1191db1a49
For the sake of Windows compatibility, use "unsigned long long" to ensure a 64-bit length
...
Signed-off-by: Wang, Long <long1.wang@intel.com>
2019-11-20 21:30:47 +08:00
Jehan
1f6071590d
Fix usage of TerminateThread() causing critical section corruption.
...
This patch was submitted to the GIMP project by a publisher wishing to
keep confidentiality (hence anonymously). I just pass along the patch.
Here is the explanation that came with the patch:
First they remind us what Microsoft documentation says about
TerminateThread:
> TerminateThread is a dangerous function that should only be used in
> the most extreme cases. You should call TerminateThread only if you
> know exactly what the target thread is doing, and you control all of
> the code that the target thread could possibly be running at the time
> of the termination.
(https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-terminatethread)
Then they say that a 5-millisecond time-out might not be long enough for
the thread to exit gracefully. They propose setting it to a much higher
value (for instance here 5 seconds).
And finally you should always check the return value of
WaitForSingleObject(). In particular you want to run TerminateThread()
only if WaitForSingleObject() failed, not on success case.
2019-11-20 13:00:49 +01:00
Wang, Long
0caf1434c9
Fix the integer overflow issue for large matrix size
...
For large matrices, e.g. M=N=K with M>1290, int mnk=M*N*K overflows.
This leads to wrongly branching to the single-threaded path, which
degrades performance significantly.
Signed-off-by: Wang, Long <long1.wang@intel.com>
2019-11-20 14:11:17 +08:00
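Here is a minimal sketch of the overflow and its fix. It assumes a hypothetical helper that computes the M*N*K work estimate used to decide on threading; the real OpenBLAS interface code is not shown. Each factor is widened before the multiplication. `unsigned long long` is used because `long` is only 32 bits on 64-bit Windows.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: the M*N*K work estimate that decides on threading.
   Widening every factor to unsigned long long (at least 64 bits on all
   platforms, including LLP64 Windows) before multiplying avoids overflow. */
unsigned long long work_size(int m, int n, int k) {
    assert(m >= 0 && n >= 0 && k >= 0);
    return (unsigned long long)m * (unsigned long long)n * (unsigned long long)k;
}

/* Nonzero when the product no longer fits in a signed int, i.e. when
   the old `int mnk = M*N*K` would have overflowed. */
int overflows_int(int m, int n, int k) {
    return work_size(m, n, k) > (unsigned long long)INT_MAX;
}
```

With M=N=K=1290 the product is 2,146,689,000, just below INT_MAX (2,147,483,647). At 1291 it is 2,151,685,171 and no longer fits in an int.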
Martin Kroeker
73128f3883
Merge pull request #2310 from martin-frbg/ppc440
...
Fix PPC440 big-endian support and disable the QCDOC qalloc routine by default
2019-11-17 23:19:48 +01:00
Martin Kroeker
cad0d150db
Define alternate kernels for big-endian POWER8
2019-11-17 23:12:10 +01:00
Martin Kroeker
eba0aeb7cd
Fix compilation for big-endian POWER8
2019-11-17 22:58:32 +01:00
Martin Kroeker
0c07c356c1
Define alternate kernels for big-endian PPC440
2019-11-17 19:25:08 +01:00
Martin Kroeker
82b75f97e5
Disable the old QCDOC qalloc by default and copy utility functions from memory.c
...
1. qalloc() appears to have been a special routine written for the PPC440-based QCDOC supercomputer(s) from around 2005; its source does not seem to be readily available. So switch the #if 1 in the code to rely on standard malloc() by default.
2. Utility functions like get_num_procs, get_num_threads that were added to the "normally" used memory.c in the meantime were still missing here.
2019-11-17 19:22:04 +01:00
Martin Kroeker
7887c45077
Merge pull request #17 from xianyi/develop
...
rebase
2019-11-17 19:09:49 +01:00
Martin Kroeker
3e67017ac8
Merge pull request #2309 from martin-frbg/ppc970-be
...
Fix PPC970 big-endian support
2019-11-17 18:22:24 +01:00
Martin Kroeker
b3ac6ee222
Define alternate kernels for big-endian PPC970
...
The altivec versions of SGEMM and CGEMM fail most tests in LAPACK-TESTING when compiled for big endian, and STRSM/CTRSM even cause segfaults. The rot kernels either fail the corresponding utest or lead to failures in LAPACK-TESTING.
2019-11-17 15:19:39 +01:00
Martin Kroeker
6082e556cd
Use "generic" S/CGEMM unroll M on big-endian PPC970
...
as the respective PPC970 "altivec" kernels give wrong results when compiled for big endian
2019-11-17 15:10:26 +01:00
Martin Kroeker
92315173d5
Merge pull request #2308 from martin-frbg/ctestfix
...
Fix potential issue in the c/z blas3 ctests
2019-11-15 08:33:17 +01:00
Martin Kroeker
351d12b94e
Fix potential spurious failure from uninitialized variable
2019-11-15 00:20:36 +01:00
Martin Kroeker
bf73aa141b
Fix potential spurious failure from uninitialized variable
2019-11-15 00:19:24 +01:00
Martin Kroeker
71e96163db
Merge pull request #2305 from wjc404/develop
...
AVX512 CGEMM & ZGEMM kernels
2019-11-12 07:38:37 +01:00
wjc404
819e852ae7
AVX512 CGEMM & ZGEMM kernels
...
96-99% of the single-threaded performance of MKL 2018
2019-11-11 20:04:52 +08:00
Martin Kroeker
4e466d739c
Merge pull request #15 from xianyi/develop
...
rebase
2019-11-09 18:52:08 +01:00
Martin Kroeker
4c6a457358
Merge pull request #2300 from wjc404/develop
...
Optimize SGEMM on SKYLAKEX CPUs
2019-11-06 07:27:33 +01:00
wjc404
836c414e22
optimizations of software prefetching
2019-11-05 13:36:56 +08:00
Martin Kroeker
d403eb3c2f
Merge pull request #2302 from martin-frbg/ppc970
...
Disable three-operand DCBT on PPC970 regardless of operating system
2019-11-04 22:55:05 +01:00
Martin Kroeker
3cd97f1a80
Merge pull request #2301 from martin-frbg/ppc8be
...
Disable IDAMIN/MAX and IZAMIN/MAX optimizations on big-endian POWER8
2019-11-04 22:54:28 +01:00
Martin Kroeker
9955f0996f
Merge pull request #2294 from martin-frbg/ios-cleanup
...
Remove obsolete workarounds for IOS on ARMV8
2019-11-04 22:53:58 +01:00
wjc404
430c11e135
Add files via upload
2019-11-04 20:10:12 +08:00
wjc404
fbacd2605d
optimizations via software prefetches
2019-11-04 19:37:19 +08:00
Martin Kroeker
6fa89b06a1
Use the two-operand form of DCBT on all PPC970 regardless of OS
...
There seems to be no advantage to the three-operand form used in the earliest GotoBLAS kernels, and it causes compilation problems on platforms other than the previously special-cased ones as well
2019-11-03 22:55:31 +01:00
Martin Kroeker
68597002ea
The assembly microkernel is not safe to use on ELFv1
2019-11-03 22:42:46 +01:00
Martin Kroeker
d2a6285549
The assembly microkernel is not safe to use on ELFv1
2019-11-03 22:41:19 +01:00
Martin Kroeker
d999688d1a
The assembly microkernel is not safe to use on ELFv1
2019-11-03 22:39:06 +01:00
Martin Kroeker
928fe1b28e
The assembly microkernel is not safe to use on ELFv1
2019-11-03 22:37:27 +01:00
Martin Kroeker
ccc28c6d60
Merge pull request #13 from xianyi/develop
...
resync with upstream
2019-11-03 22:33:31 +01:00
wjc404
ae43b75a6a
Add files via upload
2019-11-02 10:09:19 +08:00
wjc404
54fc06fd70
Add files via upload
2019-11-02 10:06:13 +08:00
wjc404
1df9a2013d
new sgemm kernel for skylakex
2019-11-02 00:00:48 +08:00
wjc404
274ff5cdb8
update sgemm_q on skylakex cpus
2019-11-01 23:59:18 +08:00
Martin Kroeker
eb2eddf241
Merge pull request #2296 from kdunee/develop
...
Fixed a minor cmake problem, occurring when DYNAMIC_ARCH=ON and CMAKE_C_FLAGS was empty
2019-10-28 13:24:18 +01:00
k.dunikowski
8691825944
Fixed a minor cmake problem, occurring when DYNAMIC_CORE=ON and CMAKE_C_FLAGS was empty
2019-10-28 08:51:05 +01:00
Martin Kroeker
7dc8a76f60
Merge pull request #2293 from martin-frbg/pr2288
...
Add support for NetBSD by adding it to the existing xBSD conditionals
2019-10-25 23:46:39 +02:00
Martin Kroeker
df857551c0
Remove special parameter set for obsolete IOS/ARMV8 workaround
2019-10-25 23:07:00 +02:00
Martin Kroeker
85ccdce8c4
Remove the IOS fallbacks to generic C kernels
2019-10-25 23:02:37 +02:00
Martin Kroeker
aeabe0a83f
Fix regex to parse -R options with and without whitespace
...
Both forms are seen on NetBSD (#2288)
2019-10-25 22:52:30 +02:00
Martin Kroeker
1b90989662
Add NetBSD to the xBSD conditionals
2019-10-25 12:52:49 +02:00
Martin Kroeker
e3e8b5cdca
Add NetBSD
2019-10-25 12:51:06 +02:00
Martin Kroeker
69b16a894d
Merge pull request #2292 from martin-frbg/g95fixes
...
Improve support for g95 and non-GNU ld
2019-10-25 10:35:17 +02:00
Martin Kroeker
6782e5767d
Merge pull request #2291 from martin-frbg/gensymbol
...
Fix netlib 3.7/3.8 function enumeration for linktest
2019-10-25 10:34:50 +02:00
Martin Kroeker
48f5a89f92
Merge pull request #2282 from martin-frbg/issue2281
...
Optimize RPCC function on ARM64
2019-10-25 09:56:30 +02:00
Martin Kroeker
4ae1610f37
Merge pull request #2290 from martin-frbg/cpuidfixes
...
Fixup x86 cpuid changes from #2283
2019-10-24 22:52:15 +02:00
Martin Kroeker
911c3e2f4b
Improve support for g95 and non-GNU ld
...
Auto-add "-fno-second-underscore" option to make LAPACKE compile (as it calls LAPACK functions that may have gotten a second underscore added otherwise). Also support -R for rpath when parsing compiler directives in f_check
2019-10-24 22:43:27 +02:00
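A small C helper can illustrate the second-underscore convention. It is a sketch of the g77-style rule, not code from OpenBLAS: one trailing underscore, plus a second one if the name already contains an underscore. `fortran_symbol` is a hypothetical name. The sketch shows why a LAPACK routine with an embedded underscore, such as dsytrf_aa, can end up under a different symbol name unless -fno-second-underscore is in effect.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of g77-style external naming: lower-case the name,
   append one underscore, and append a second one if the name already
   contains an underscore and second_underscore is nonzero.
   second_underscore == 0 models -fno-second-underscore.
   Writes the symbol into out (capacity cap); returns 0, or -1 if it does not fit. */
int fortran_symbol(const char *name, int second_underscore, char *out, size_t cap) {
    assert(name != NULL && out != NULL);
    size_t len = strlen(name);
    int has_underscore = strchr(name, '_') != NULL;
    size_t extra = (second_underscore && has_underscore) ? 2 : 1;
    if (len + extra + 1 > cap) return -1;      /* room for suffix and '\0' */
    for (size_t i = 0; i < len; i++)
        out[i] = (char)tolower((unsigned char)name[i]);
    out[len] = '_';
    if (extra == 2) out[len + 1] = '_';
    out[len + extra] = '\0';
    return 0;
}
```

Under the second-underscore rule, dsytrf_aa becomes dsytrf_aa__ while dgemm stays dgemm_. Calls that expect dsytrf_aa_ would then fail to link.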
Martin Kroeker
fab49e49e5
Move most lapack 3.7/3.8 additions to the embedded_underscores list
...
to allow linktest to pass with a compiler that adds a second underscore to such names
2019-10-24 21:26:20 +02:00
Martin Kroeker
b687fba5bc
Disable direct clock register access on IOS and Android
...
as I find conflicting information on accessibility from non-privileged processes
2019-10-24 21:18:17 +02:00
luzpaz
46a8c2519a
Remove prototype of unused, unimplemented function (#2274)
...
* Fix source typo
Found via `codespell -q 3 -L amin,als,ba,dum,mone,nd,nto,orign -S Changelog.txt,./lapack*`
* Remove beta-thread function per request
2019-10-24 18:56:53 +02:00
Martin Kroeker
e9437eebd2
Restore Goldmont ID and improve QEMU support
...
#2283 had inadvertently removed Goldmont+, and cpuid was reporting a mix of Core2 and Pentium2 for some QEMU configurations
2019-10-24 18:45:27 +02:00
Martin Kroeker
3a39062cfc
Merge pull request #12 from xianyi/develop
...
resync with upstream
2019-10-24 18:40:13 +02:00
Martin Kroeker
eaa0be1313
Merge pull request #2286 from wjc404/develop
...
AVX512 DGEMM kernel
2019-10-20 12:44:19 +02:00
wjc404
6ff013bae0
native support for icopy_4
...
90% of MKL single-threaded performance.
2019-10-19 03:54:44 +08:00
wjc404
0d669e04bb
Update dgemm_kernel_8x8_skylakex.c
2019-10-18 15:00:17 +08:00
wjc404
17cdd9f9e1
some correction
2019-10-18 14:58:07 +08:00
wjc404
6bcb06fcb1
make further changes to icopy_8 easier
2019-10-18 10:47:31 +08:00
wjc404
b7315f8401
Add files via upload
2019-10-16 19:23:36 +08:00
wjc404
9b19e9e1b0
Update dgemm_kernel_8x8_skylakex.c
2019-10-16 10:14:51 +08:00
wjc404
6bd67ddbab
Update dgemm_kernel_8x8_skylakex.c
2019-10-16 03:20:08 +08:00
wjc404
5da9484d93
Add files via upload
2019-10-16 02:01:13 +08:00
wjc404
844629af57
Add files via upload
2019-10-16 02:00:34 +08:00
Martin Kroeker
2beaa82c05
Merge pull request #2283 from martin-frbg/issue2176
...
Support QEMU virtual cpu in 64bit mode as CORE2 or BARCELONA
2019-10-09 22:06:09 +02:00
Martin Kroeker
e8a2aed2b9
Support QEMU cpu calling itself 64bit AMD Athlon as well
...
Some QEMU instances pretend to be "AuthenticAMD" with the same family 6/model 6 even when running on an Intel host
(could be related to qemu or libvirt version and/or kvm availability). Also fix the define to depend on __x86_64__ set by the
compiler, the defines using __64BIT__ will only work for getarch_2nd.
2019-10-09 18:24:13 +02:00
Martin Kroeker
f262031685
Support QEMU virtual cpu as CORE2
...
qemu itself claims it is a 64bit P6, which does not exist in the wild.
2019-10-08 22:30:02 +02:00
Martin Kroeker
5f6206fa2d
Simplify OSX/IOS cross-compilation and add a CI test for it (#2279)
...
* Add automatic fixups for OSX/IOS cross-compilation
* Add OSX/IOS cross-compilation test to Travis CI
* Handle platforms that lack hwcap.h by falling back to ARMV8
* Fix PROLOGUE for OSX/IOS
2019-10-08 20:13:14 +02:00
Martin Kroeker
f2cde2ccfb
Update common_arm64.h
2019-10-08 20:12:08 +02:00
Martin Kroeker
ba7838d2e1
Merge pull request #2280 from martin-frbg/iosfix
...
Add overlooked part of IOS compilation fix
2019-10-08 10:25:25 +02:00
Martin Kroeker
a448884a63
Remove automatic label postfixes from macro included only once
2019-10-08 08:37:50 +02:00
Martin Kroeker
17609f88f1
Merge pull request #11 from xianyi/develop
...
sync with upstream
2019-10-08 08:32:52 +02:00
Martin Kroeker
3a2df19db6
Fix accidental duplication of jump instruction
2019-10-08 08:09:26 +02:00
Martin Kroeker
d2093a40d3
Merge pull request #2277 from martin-frbg/issue2275
...
Rewrite ARMV8 code to allow cross-compilation for IOS
2019-10-06 23:01:54 +02:00
Martin Kroeker
aa04b0925e
Merge pull request #2276 from xianyi/revert-2272-thread-sqrt-of-negative
...
Revert "Avoid taking root of negative number in symv_thread.c"
2019-10-06 11:12:44 +02:00
Martin Kroeker
258ac56e0a
Move 32bit OSX build back to xcode 8.3 but switch to gcc8
2019-10-05 10:52:47 +02:00
Martin Kroeker
56837e9d92
Make local labels in macro compatible with the xcode assembler
...
... which does not perform the automatic numbering on instantiation that the _@ suffix signifies
2019-10-04 14:53:23 +02:00
Martin Kroeker
bb5413863f
Rewrite ARM64 PROLOGUE to make it compatible with xcode/ios
2019-10-04 14:50:03 +02:00
Martin Kroeker
32f5907fef
Update 32bit macOS again to xcode 9.3
...
os version 10.13 "High Sierra" appears to be the oldest release now for which Homebrew provides a gcc package.
Anything older and the Travis job will run out of time building gcc from source
2019-10-03 01:09:02 +02:00
Martin Kroeker
ac10236cc8
Update the OSX BINARY=32 test to xcode9.2
...
in response to Homebrew updates
2019-10-02 22:35:34 +02:00
Martin Kroeker
8617d75548
Revert "Avoid taking root of negative number in symv_thread.c"
2019-10-01 23:50:41 +02:00
Martin Kroeker
c07d78b9e9
Merge pull request #2272 from seberg/thread-sqrt-of-negative
...
Avoid taking root of negative number in symv_thread.c
2019-09-30 11:27:29 +02:00
Sebastian Berg
6355c25dde
Avoid taking root of negative number in symv_thread.c
...
This is similar to fixes in gh-1929, but there was one remaining
occurrence of this type of pattern in the driver/level2/*_thread.c
files.
2019-09-29 22:03:12 -07:00
Martin Kroeker
5e244d80f2
Merge pull request #2271 from quickwritereader/strmm_fix
...
Fixed POWER9 STRMM bug; BLAS-TESTER passes
2019-09-29 13:53:45 +02:00
AbdelRauf
ede5efebab
trmm fix
2019-09-29 02:28:34 +00:00
Martin Kroeker
84908d60d2
Merge pull request #2269 from martin-frbg/ppc-fixes
...
PPC fixes
2019-09-27 09:52:19 +02:00
Martin Kroeker
596a22325a
Fix prologue of power9 assembly cdot(c) kernel to provide cdotc
2019-09-27 00:47:18 +02:00
Martin Kroeker
7f58f3ad0e
Fix mis-edits in the gcc-derived power8 caxpy kernel
2019-09-27 00:44:26 +02:00
Martin Kroeker
c0d570a357
Merge pull request #7 from xianyi/develop
...
update
2019-09-27 00:42:32 +02:00
Martin Kroeker
6b83079368
Count cpu cores on ARMV8 and use that to pick the GEMM_PQ parameters (#2267)
...
There is currently no simple way to query cache sizes on ARMV8, so this takes the number of cores as a trivial indication if the target is a server-class device with a big cache, or just a single-board toy or smartphone.
2019-09-25 23:13:24 +02:00
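Here is a sketch of the core-count heuristic. It assumes a `sysconf(_SC_NPROCESSORS_CONF)` call, as available on Linux/glibc, for the count. The threshold, the struct `gemm_pq` and both parameter sets are hypothetical placeholders chosen for illustration. This commit message does not give the actual cut-off or the GEMM_P/GEMM_Q values OpenBLAS uses.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical blocking-parameter pair (not OpenBLAS's real type). */
typedef struct { int p; int q; } gemm_pq;

/* Hypothetical values: small-cache boards/phones vs. big-cache servers. */
static const gemm_pq SMALL_CACHE_PQ = {128, 256};
static const gemm_pq BIG_CACHE_PQ   = {512, 1024};

/* Hypothetical cut-off: more than 8 cores is taken to mean a server part. */
#define SERVER_CORE_THRESHOLD 8

gemm_pq choose_gemm_pq(long ncores) {
    return ncores > SERVER_CORE_THRESHOLD ? BIG_CACHE_PQ : SMALL_CACHE_PQ;
}

/* Number of configured cores; falls back to 1 if the query fails. */
long count_cores(void) {
    long n = sysconf(_SC_NPROCESSORS_CONF);
    return n > 0 ? n : 1;
}
```

The design trades accuracy for simplicity. The core count is only a proxy for cache size, which matches the commit's remark that there is no simple way to query cache sizes on ARMV8.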
Martin Kroeker
673e5a0495
Replace several POWER8/9 C kernels with their gcc7-generated assembly versions (#2263)
...
* Add gcc7-generated assembly files for POWER8/9 isa/ica-min/max and POWER9 caxpy
To work around internal compiler errors encountered when compiling the original C source with gcc 4 and 5, and wrong code generated by gcc 8.3.0
* Use gcc-generated assembly instead of original C sources
to work around internal compiler errors encountered with gcc 4.8/5.4 and wrong code generation by gcc 8.3
* Use gcc-generated assembly instead of the original C source
to work around internal compiler errors encountered with gcc 4.8 and 5.4, and wrong code generation by gcc 8.3
* Add gcc7-generated assembler version of caxpy for power8
to work around wrong code generated by gcc 8.3
* Handle CONJ define for caxpyc
* Handle CONJ define for caxpyc
* Add gcc7-generated assembly cdot for POWER9
* Use prebuilt assembly for POWER9 cdot
created with gcc 7.3.1 to work around ICE in older gcc versions
* Exclude POWER9 from DYNAMIC_ARCH when the gcc version is lower than 6
* Update Makefile.system
* Use PROLOGUE macro to ensure correct function name for DYNAMIC_ARCH
* Disable POWER9 with old gcc versions
2019-09-22 22:35:22 +02:00
Martin Kroeker
bfa2cc7d64
Restore ppc64 CI job and remove the travis_wait that caused the problem with it
2019-09-20 10:29:35 +02:00
Martin Kroeker
e7c4d6705a
Revert #2051 and replace with a better fix (#2261)
...
* Revert #2051 and add a better fix for TARGET=generic with DYNAMIC_ARCH
fixes #2257 without breaking #2048 again
2019-09-17 18:56:04 +02:00
Martin Kroeker
2a1911cc14
Merge pull request #6 from xianyi/develop
...
update to current develop
2019-09-13 14:00:23 +02:00
Martin Kroeker
9f7a9a32e3
Merge pull request #2252 from thrasibule/trtrs
...
Optimized ?trtrs
2019-09-12 21:45:47 +02:00
Guillaume Horel
2463938879
fix error message
2019-09-11 10:35:25 -04:00
Guillaume Horel
5d6525c87c
more bugfix
2019-09-10 17:30:57 -04:00
Guillaume Horel
6cb47ea3f0
fix Makefile
2019-09-10 17:11:01 -04:00
Guillaume Horel
459bb9291d
fix error codes
2019-09-10 17:10:33 -04:00
Martin Kroeker
3f1077ce6f
Merge pull request #2249 from brada4/gcc7minor
...
Address minor warnings popping up in gcc7+
2019-09-10 08:27:32 +02:00
Martin Kroeker
eb45eb6942
Fix C compiler handling and BINARY=32 mode in CMAKE builds (#2248)
...
* Fix compiler identification and option setting
* Handle BINARY=32 option on X86_64
* Add xGEMM3M unroll parameters for crossbuild-target CORE2
* Replace bogus mingw64/32bit CI job with actual 32bit build
mingw64 is not multilib-capable, so using an x86_64-mingw with BINARY=32 in the CI was not going to work anyway (but build passed while BINARY=32 was ignored).
2019-09-10 08:27:06 +02:00
Guillaume Horel
f2becb777a
fix Makefile
2019-09-09 11:36:50 -04:00
Guillaume Horel
5997b6b491
bugfix
2019-09-08 11:14:49 -04:00
Guillaume Horel
4b21b646ea
turn on optimized code
2019-09-08 11:14:49 -04:00
Guillaume Horel
7ec7b999a5
add missing file
2019-09-08 11:14:49 -04:00
Guillaume Horel
af9ac0898a
fix Makefile
2019-09-08 11:14:49 -04:00
Guillaume Horel
c7b5a459b6
add missing defines and headers
2019-09-08 11:14:49 -04:00
Guillaume Horel
9b2f0323d6
update Makefile
2019-09-08 11:14:49 -04:00
Guillaume Horel
9f6984fe4b
add missing files
2019-09-08 11:14:49 -04:00
Guillaume Horel
42203dafdc
add logic
2019-09-08 11:14:49 -04:00
Guillaume Horel
a4f17a9297
add missing objects
2019-09-08 11:14:49 -04:00
Guillaume Horel
733d97b2df
add files
2019-09-08 11:14:49 -04:00
Guillaume Horel
ea747cf933
start working on ?trtrs
2019-09-08 11:14:49 -04:00
Andrew
4de545aa7d
address minor warnings from gcc7
2019-09-07 10:21:08 +03:00
Andrew
6e9a93ec19
init
2019-09-07 10:18:46 +03:00
Martin Kroeker
fde8a8e6a0
Improve cmake build behaviour with non-host cpu targets (#2246)
...
1. Supply appropriate values for C/Z GEMM unroll when cross-compiling for CORE2 or ARMV7
2. Add the required xLOCAL_BUFFER_SIZE parameters for cross-compiling CORE2
3. Add -DFORCE_<target> option to getarch when building with -DTARGET=target
for #2245
2019-09-03 22:41:17 +02:00
Martin Kroeker
256fc15f5f
Merge pull request #2 from xianyi/develop
...
update
2019-09-03 15:12:14 +02:00
Martin Kroeker
ee498525e0
Merge pull request #2242 from martin-frbg/issue2235
...
Add arch data for cmake cross-compiling to CORE2
2019-09-02 22:06:29 +02:00
Martin Kroeker
1fec0570f6
Add cgemm and zgemm unroll factors for core2
2019-09-02 15:03:45 +02:00
Martin Kroeker
b5af7b9c78
Disable ppc64le test environment on Travis CI
...
as this semi-official beta option has suddenly reverted to a standard x86_64 environment causing spurious failures
2019-08-31 18:06:12 +02:00
Martin Kroeker
f3c314550c
Merge pull request #2243 from quickwritereader/develop
...
possible cgemv,caxpy,cdot fix
2019-08-30 23:06:23 +02:00
AbdelRauf
847c20c9b7
fix uninitialized variables i
2019-08-30 11:14:55 +00:00
AbdelRauf
4c22828812
caxpy and cdot are using vec_vsx_ld
2019-08-30 04:09:15 +00:00
AbdelRauf
e79712d969
cgemv using vec_vsx_ld instead of letting gcc decide
2019-08-30 02:52:04 +00:00
AbdelRauf
be09551cdf
aligned
2019-08-29 23:22:23 +00:00
Martin Kroeker
ec1ef6aa9e
Merge pull request #2241 from martin-frbg/zdotfix
...
Make x86_64 zdot compile with PGI and Sun C again
2019-08-29 07:12:54 +02:00
Martin Kroeker
11c59acfb1
Keep both PGI/SUN and default code paths to avoid breaking Clang/Windows
2019-08-28 18:07:44 +02:00
Martin Kroeker
bf0d92a310
Add arch data for cross-compiling to CORE2
...
for #2235
2019-08-28 17:35:56 +02:00
Martin Kroeker
db066151ee
Merge pull request #2240 from martin-frbg/issue2237
...
Fix PGI build options (again)
2019-08-28 15:30:53 +02:00
Martin Kroeker
3a55dca2dc
Make x86_64 zdot compile with PGI and Sun C again
...
broken by #2222 as CREAL,CIMAG do not expand to a valid lvalue with these compilers
2019-08-28 11:35:31 +02:00
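Here is a sketch of one portable way to fill in a complex result without assigning through CREAL/CIMAG. It relies on the C99/C11 guarantee that a complex type has the same representation as a two-element array of the corresponding real type. `make_zcomplex` is a hypothetical helper. This is not necessarily the approach the commit took.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical helper: build a double complex from its parts without
   assigning to creal()/cimag(), which return values, not lvalues.
   C99/C11 guarantee a complex type is laid out like double[2]:
   element 0 is the real part, element 1 the imaginary part. */
double complex make_zcomplex(double re, double im) {
    double complex z;
    double *parts = (double *)&z;
    parts[0] = re;
    parts[1] = im;
    return z;
}
```

A kernel that accumulates the real and imaginary dot-product sums in two plain doubles can then return `make_zcomplex(sum_re, sum_im)`, with no compiler-specific lvalue extension required.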
Martin Kroeker
7d380f7d79
Fix PGI build options (again)
...
for #2237
2019-08-28 11:31:20 +02:00
Martin Kroeker
300f158d3b
Merge pull request #2239 from martin-frbg/issue2231
...
Fix 32bit armv8 compilation regression
2019-08-28 07:54:57 +02:00
Martin Kroeker
3635fdbf2b
Do not abuse the global ARCH variable as a local temporary
...
Setting it with a simple "uname -m" just to be able to decide whether to compile getarch.c with -march=native
may actually keep getarch from doing a proper probe. Fixes #2231, a regression caused by #2110
2019-08-27 22:52:17 +02:00
Martin Kroeker
b6552b11eb
Merge pull request #2 from xianyi/develop
...
merge develop
2019-08-27 22:41:31 +02:00
Kavana Bhat
3dc6b26eff
AIX changes for Power8
2019-08-20 06:51:35 -05:00
Martin Kroeker
5f36f18148
Update with 0.3.7 changes
2019-08-11 23:23:27 +02:00
Martin Kroeker
d47fe78b0e
Set version to 0.3.7
2019-08-11 23:16:45 +02:00
Martin Kroeker
ebe2f47a0f
Set version to 0.3.7
2019-08-11 23:16:11 +02:00
Martin Kroeker
20d417762f
Merge pull request #2213 from xianyi/develop
...
Update from develop in preparation of the 0.3.7 release
2019-08-11 23:14:49 +02:00
xoviat
6cfd6195c5
param: define constant as blaslong to prevent overflow
2019-05-05 13:10:36 -05:00
xoviat
5163a85d40
add gitignore directory
2019-05-05 13:09:48 -05:00
xoviat
dbf9ad1f3d
tests: add windows compatibility
2019-05-05 13:09:39 -05:00
Martin Kroeker
15cb124012
Merge pull request #2100 from xianyi/develop
...
Merge develop in preparation of 0.3.6 release
2019-04-29 19:22:19 +02:00
Martin Kroeker
d93cf1126e
Merge pull request #1753 from dloghin/risc-v
...
Override ARCH (archiver) in lapack-netlib/make.inc to avoid conflict with ARCH (architecture)
2018-09-07 11:01:23 +02:00
Dumi Loghin
a1bdc308b8
override ARCH (archiver) in lapack-netlib/make.inc
2018-09-06 13:13:36 +08:00
Dumi Loghin
0b7ccb9e38
Revert "replace ARCH with AR in lapack-netlib"
...
This reverts commit db17ce896f.
2018-09-06 13:08:30 +08:00
Dumi Loghin
db17ce896f
replace ARCH with AR in lapack-netlib
2018-09-05 12:49:37 +08:00
Martin Kroeker
cbc46163bd
Merge pull request #1526 from jerryz123/upstream_riscv
...
Add support for RISC-V
2018-04-28 11:55:45 +02:00
Jerry Zhao
0ee395db35
Fixed TRMM and SYMM for RISCV
2018-04-18 18:03:32 -07:00
Jerry Zhao
c167a3d6f4
Added RISCV build
2018-04-16 14:08:31 -07:00