Commit Graph

5343 Commits

Author SHA1 Message Date
Gengxin Xie
d9ba49165a Improve the performance of rot by using AVX512 and AVX2 intrinsic 2020-11-05 15:12:36 +08:00
Martin Kroeker
60ab9c783f Merge pull request #2966 from martin-frbg/issue2964
Ensure that EXPRECISION is disabled for DYNAMIC_ARCH with TARGET=GENERIC and fix CMAKE DYNAMIC_ARCH builds
2020-11-04 16:02:46 +01:00
Martin Kroeker
8cc73fee98 Export NO_EXPRECISION after overriding for DYNAMIC_ARCH with GENERIC target 2020-11-03 23:47:04 +01:00
Martin Kroeker
0155cd53a3 Add -msse3 where needed for DYNAMIC_ARCH builds 2020-11-03 23:45:49 +01:00
Martin Kroeker
a9f9354296 Fix target test 2020-11-02 23:17:46 +01:00
Martin Kroeker
b9bc76aec4 Add files via upload 2020-11-02 22:43:50 +01:00
Martin Kroeker
f071245939 Merge pull request #2967 from RajalakshmiSR/dgemm88
POWER10: Change dgemm unroll factors
2020-11-02 18:54:36 +01:00
Aisha Tammy
60997ddd73 allow setting soname without suffix or prefix
Allows creating a library with a different
SONAME without the need to add suffixes to symbols.
Backwards compatible and should have no effect
on the workflow of previous users.
Useful for allowing an INTERFACE64 library alongside
the standard library without file conflicts.
2020-11-02 13:04:53 +00:00
Martin Kroeker
e5f8c2bf8a typo fix 2020-11-01 22:25:43 +01:00
Martin Kroeker
6baf8af658 Disable EXPRECISION for the combination of DYNAMIC_CORE and GENERIC target 2020-11-01 22:11:48 +01:00
Martin Kroeker
40a93c232b Disable EXPRECISION for DYNAMIC_ARCH in combination with TARGET=GENERIC
NO_EXPRECISION is disabled for the GENERIC_TARGET already, so prevent mixing with code parts that use a different float size by default
2020-11-01 21:58:26 +01:00
Martin Kroeker
fab952bee4 Merge pull request #2962 from brada4/develop
add openbsd 68+ gfortran name
2020-11-01 14:24:40 +01:00
Martin Kroeker
1cf04a6f0e Merge pull request #2963 from martin-frbg/issue2959
Reunify default BUFFER_SIZE on ARM64 to avoid crashes in DYNAMIC_ARCH mode
2020-11-01 09:14:54 +01:00
Rajalakshmi Srinivasaraghavan
dd7a9cc5bf POWER10: Change dgemm unroll factors
Changing the unroll factors for dgemm to 8 shows improved performance with
POWER10 MMA feature. Also made some minor changes in sgemm for edge cases.
2020-10-31 18:28:57 -05:00
Martin Kroeker
7f26be4802 Reunify BUFFERSIZE across arm64 platforms to avoid segfaults in DYNAMIC_ARCH 2020-11-01 00:00:43 +01:00
User User-User
9fab65e90a add openbsd gfortran 2020-11-01 00:38:08 +02:00
Martin Kroeker
9efc3f0815 Merge pull request #109 from xianyi/develop
rebase
2020-10-31 22:33:52 +01:00
Martin Kroeker
aa21cb5217 Merge pull request #2960 from thrasibule/avx2_detection
fix avx2 detection
2020-10-31 20:24:21 +01:00
Guillaume Horel
1f564d729b fix avx2 detection
reword commits to make it clearer
2020-10-31 10:00:48 -04:00
Martin Kroeker
9349dcd206 Merge pull request #2956 from RajalakshmiSR/caxpy_p10
Optimize caxpy for POWER10
2020-10-30 08:54:10 +01:00
Rajalakshmi Srinivasaraghavan
b435491885 Optimize caxpy for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2020-10-29 14:57:51 -05:00
Martin Kroeker
9a058f2451 Merge pull request #2940 from Qiyu8/optimize-benchmark
Refactor the performance measurement system
2020-10-29 20:28:37 +01:00
Martin Kroeker
074927a7d0 Merge pull request #2954 from Guobing-Chen/BF16_gemv_support
Implementation of BF16 based gemv
2020-10-29 09:22:33 +01:00
Martin Kroeker
60b22e3462 Merge pull request #2955 from Guobing-Chen/Fix_cooperlake_build_issue
Fix cooperlake compile issue
2020-10-29 09:22:07 +01:00
Chen, Guobing
c5e62dad69 Fix cooperlake compile issue
Add a macro required in Makefile.x86_64 that went missing in the recent
cleanup, which caused build failures on the cooperlake platform.
2020-10-29 03:37:59 +08:00
Chen, Guobing
a7b1f9b1bb Implementation of BF16 based gemv
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-10-29 02:08:23 +08:00
Martin Kroeker
67f39ad813 Merge pull request #2939 from thrasibule/Makefile_cleanup
reuse variables defined in Makefile.system
2020-10-28 09:38:40 +01:00
Martin Kroeker
6e13a7e99e Merge pull request #2951 from martin-frbg/cleanup_make
Minor Makefile cleanup
2020-10-28 09:37:56 +01:00
Martin Kroeker
2207a16235 Merge pull request #2952 from martin-frbg/issue2931
Try to read cpu ID from /sys/devices/.../cpu0 if HWCAP_CPUID fails
2020-10-28 09:37:32 +01:00
Martin Kroeker
5d643929dd Merge pull request #2948 from martin-frbg/issue2947
Expressly enable neon for use with intrinsics if available
2020-10-28 09:37:09 +01:00
Martin Kroeker
e8cbf0fc50 Output predefined HAVE_ entries to Makefile.conf for ARM with specified TARGET 2020-10-27 23:01:19 +01:00
Martin Kroeker
b937d78a6d Try to read cpu information from /sys/devices/system/cpu/cpu0 if HWCAP_CPUID fails 2020-10-27 17:51:32 +01:00
Martin Kroeker
e2f9005db8 Merge pull request #2950 from RajalakshmiSR/saxpy
Optimize saxpy for POWER10
2020-10-27 00:02:18 +01:00
Martin Kroeker
6a1f3e40af Remove debug printout of object list 2020-10-26 21:37:04 +01:00
Martin Kroeker
878b6d1f41 Remove spurious expr in flang version check 2020-10-26 21:35:40 +01:00
Rajalakshmi Srinivasaraghavan
c24ba8b1dd Optimize saxpy for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2020-10-26 13:24:59 -05:00
Qiyu8
f917c26e83 Refactoring remaining benchmark cases. 2020-10-26 10:25:05 +08:00
Martin Kroeker
76203e2120 Merge pull request #2946 from martin-frbg/issue2945
Move definitions that are neither needed nor supported on Solaris
2020-10-26 00:43:44 +01:00
Martin Kroeker
eec517af0e Expressly enable neon for use with intrinsics if available 2020-10-26 00:21:56 +01:00
Martin Kroeker
fd7da56965 Move definitions that are neither needed nor supported on SUNOS 2020-10-25 12:01:50 +01:00
Martin Kroeker
2f9fc9be30 Update version to 0.3.12.dev 2020-10-24 23:29:05 +02:00
Martin Kroeker
81fcfd5ed3 Update version to 0.3.12.dev 2020-10-24 23:28:29 +02:00
Martin Kroeker
addf7593ae Merge pull request #2944 from xianyi/release-0.3.0
Merge back 0.3.12 tag (and Changelog typo fixes) from release
2020-10-24 13:10:51 +02:00
Martin Kroeker
c5f280a7f0 Fix typos v0.3.12 2020-10-24 13:03:28 +02:00
Martin Kroeker
6e3a05f2c9 Merge pull request #2943 from xianyi/develop
Merge from develop for 0.3.12 release
2020-10-24 12:52:59 +02:00
Martin Kroeker
89db73569b Update Changelog with 0.3.12 changes 2020-10-24 12:50:04 +02:00
Martin Kroeker
e1c18e4eeb Update version to 0.3.12 for release 2020-10-24 12:15:33 +02:00
Martin Kroeker
26f658c9d2 Update version to 0.3.12 for release 2020-10-24 12:14:45 +02:00
Martin Kroeker
dc35477317 Merge pull request #2942 from martin-frbg/makebuildtypes
Comment out BUILD_SINGLE etc. in Makefile.rule and add a short explanation
2020-10-24 09:26:50 +02:00
Martin Kroeker
365f28787c Comment out BUILD_SINGLE etc. and add a short explanation 2020-10-23 23:32:06 +02:00