Martin Kroeker
a9f9354296
Fix target test
2020-11-02 23:17:46 +01:00
Martin Kroeker
b9bc76aec4
Add files via upload
2020-11-02 22:43:50 +01:00
Martin Kroeker
f071245939
Merge pull request #2967 from RajalakshmiSR/dgemm88
...
POWER10: Change dgemm unroll factors
2020-11-02 18:54:36 +01:00
Aisha Tammy
60997ddd73
allow setting soname without suffix or prefix
...
Allows creating a library with a different
SONAME without the need to add suffixes to symbols.
Backwards compatible; should have no effect
on the workflow of previous users.
Useful for allowing an INTERFACE64 library alongside
the standard library without file conflicts.
2020-11-02 13:04:53 +00:00
Martin Kroeker
e5f8c2bf8a
typo fix
2020-11-01 22:25:43 +01:00
Martin Kroeker
6baf8af658
Disable EXPRECISION for the combination of DYNAMIC_CORE and GENERIC target
2020-11-01 22:11:48 +01:00
Martin Kroeker
40a93c232b
Disable EXPRECISION for DYNAMIC_ARCH in combination with TARGET=GENERIC
...
NO_EXPRECISION is disabled for the GENERIC_TARGET already, so prevent mixing with code parts that use a different float size by default
2020-11-01 21:58:26 +01:00
Martin Kroeker
fab952bee4
Merge pull request #2962 from brada4/develop
...
add openbsd 68+ gfortran name
2020-11-01 14:24:40 +01:00
Martin Kroeker
1cf04a6f0e
Merge pull request #2963 from martin-frbg/issue2959
...
Reunify default BUFFER_SIZE on ARM64 to avoid crashes in DYNAMIC_ARCH mode
2020-11-01 09:14:54 +01:00
Rajalakshmi Srinivasaraghavan
dd7a9cc5bf
POWER10: Change dgemm unroll factors
...
Changing the unroll factors for dgemm to 8 shows improved performance with
POWER10 MMA feature. Also made some minor changes in sgemm for edge cases.
2020-10-31 18:28:57 -05:00
Martin Kroeker
7f26be4802
Reunify BUFFERSIZE across arm64 platforms to avoid segfaults in DYNAMIC_ARCH
2020-11-01 00:00:43 +01:00
User User-User
9fab65e90a
add openbsd gfortran
2020-11-01 00:38:08 +02:00
Martin Kroeker
9efc3f0815
Merge pull request #109 from xianyi/develop
...
rebase
2020-10-31 22:33:52 +01:00
Martin Kroeker
aa21cb5217
Merge pull request #2960 from thrasibule/avx2_detection
...
fix avx2 detection
2020-10-31 20:24:21 +01:00
Guillaume Horel
1f564d729b
fix avx2 detection
...
reword commits to make it clearer
2020-10-31 10:00:48 -04:00
Martin Kroeker
9349dcd206
Merge pull request #2956 from RajalakshmiSR/caxpy_p10
...
Optimize caxpy for POWER10
2020-10-30 08:54:10 +01:00
Rajalakshmi Srinivasaraghavan
b435491885
Optimize caxpy for POWER10
...
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2020-10-29 14:57:51 -05:00
Martin Kroeker
9a058f2451
Merge pull request #2940 from Qiyu8/optimize-benchmark
...
Refactor the performance measurement system
2020-10-29 20:28:37 +01:00
Martin Kroeker
074927a7d0
Merge pull request #2954 from Guobing-Chen/BF16_gemv_support
...
Implementation of BF16 based gemv
2020-10-29 09:22:33 +01:00
Martin Kroeker
60b22e3462
Merge pull request #2955 from Guobing-Chen/Fix_cooperlake_build_issue
...
Fix cooperlake compile issue
2020-10-29 09:22:07 +01:00
Chen, Guobing
c5e62dad69
Fix cooperlake compile issue
...
Add a missing macro which is required in Makefile.x86_64 due to recent
cleanup, which causes cooperlake platform build failure.
2020-10-29 03:37:59 +08:00
Chen, Guobing
a7b1f9b1bb
Implementation of BF16 based gemv
...
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-10-29 02:08:23 +08:00
Martin Kroeker
67f39ad813
Merge pull request #2939 from thrasibule/Makefile_cleanup
...
reuse variables defined in Makefile.system
2020-10-28 09:38:40 +01:00
Martin Kroeker
6e13a7e99e
Merge pull request #2951 from martin-frbg/cleanup_make
...
Minor Makefile cleanup
2020-10-28 09:37:56 +01:00
Martin Kroeker
2207a16235
Merge pull request #2952 from martin-frbg/issue2931
...
Try to read cpu ID from /sys/devices/.../cpu0 if HWCAP_CPUID fails
2020-10-28 09:37:32 +01:00
Martin Kroeker
5d643929dd
Merge pull request #2948 from martin-frbg/issue2947
...
Expressly enable neon for use with intrinsics if available
2020-10-28 09:37:09 +01:00
Martin Kroeker
e8cbf0fc50
Output predefined HAVE_ entries to Makefile.conf for ARM with specified TARGET
2020-10-27 23:01:19 +01:00
Martin Kroeker
b937d78a6d
Try to read cpu information from /sys/devices/system/cpu/cpu0 if HWCAP_CPUID fails
2020-10-27 17:51:32 +01:00
Martin Kroeker
e2f9005db8
Merge pull request #2950 from RajalakshmiSR/saxpy
...
Optimize saxpy for POWER10
2020-10-27 00:02:18 +01:00
Martin Kroeker
6a1f3e40af
Remove debug printout of object list
2020-10-26 21:37:04 +01:00
Martin Kroeker
878b6d1f41
Remove spurious expr in flang version check
2020-10-26 21:35:40 +01:00
Rajalakshmi Srinivasaraghavan
c24ba8b1dd
Optimize saxpy for POWER10
...
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2020-10-26 13:24:59 -05:00
Qiyu8
f917c26e83
Refactoring remaining benchmark cases.
2020-10-26 10:25:05 +08:00
Martin Kroeker
76203e2120
Merge pull request #2946 from martin-frbg/issue2945
...
Move definitions that are neither needed nor supported on Solaris
2020-10-26 00:43:44 +01:00
Martin Kroeker
eec517af0e
Expressly enable neon for use with intrinsics if available
2020-10-26 00:21:56 +01:00
Martin Kroeker
fd7da56965
Move definitions that are neither needed nor supported on SUNOS
2020-10-25 12:01:50 +01:00
Martin Kroeker
2f9fc9be30
Update version to 0.3.12.dev
2020-10-24 23:29:05 +02:00
Martin Kroeker
81fcfd5ed3
Update version to 0.3.12.dev
2020-10-24 23:28:29 +02:00
Martin Kroeker
addf7593ae
Merge pull request #2944 from xianyi/release-0.3.0
...
Merge back 0.3.12 tag (and Changelog typo fixes) from release
2020-10-24 13:10:51 +02:00
Martin Kroeker
c5f280a7f0
Fix typos
2020-10-24 13:03:28 +02:00
Martin Kroeker
6e3a05f2c9
Merge pull request #2943 from xianyi/develop
...
Merge from develop for 0.3.12 release
2020-10-24 12:52:59 +02:00
Martin Kroeker
89db73569b
Update Changelog with 0.3.12 changes
2020-10-24 12:50:04 +02:00
Martin Kroeker
e1c18e4eeb
Update version to 0.3.12 for release
2020-10-24 12:15:33 +02:00
Martin Kroeker
26f658c9d2
Update version to 0.3.12 for release
2020-10-24 12:14:45 +02:00
Martin Kroeker
dc35477317
Merge pull request #2942 from martin-frbg/makebuildtypes
...
Comment out BUILD_SINGLE etc. in Makefile.rule and add a short explanation
2020-10-24 09:26:50 +02:00
Martin Kroeker
365f28787c
Comment out BUILD_SINGLE etc. and add a short explanation
2020-10-23 23:32:06 +02:00
Martin Kroeker
2f2e9ddb65
Merge pull request #2941 from martin-frbg/exportsfix
...
Fix grouping of sladiv1/dladiv1/ilaenv2stage in gensymbol
2020-10-23 20:47:35 +02:00
Martin Kroeker
0d140e61ac
Fix wrong grouping of dcombssq
2020-10-23 15:53:40 +02:00
Martin Kroeker
4c45cd6294
fix missing split of sladiv1/dladiv1/ilaenv2stage by build type
2020-10-23 15:31:25 +02:00
Martin Kroeker
680f744abf
Merge pull request #108 from xianyi/develop
...
rebase
2020-10-23 15:29:48 +02:00