Commit Graph

5876 Commits

Author SHA1 Message Date
Rajalakshmi Srinivasaraghavan b435491885 Optimize caxpy for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2020-10-29 14:57:51 -05:00
Martin Kroeker 9a058f2451
Merge pull request #2940 from Qiyu8/optimize-benchmark
Refactor the performance measurement system
2020-10-29 20:28:37 +01:00
Martin Kroeker 074927a7d0
Merge pull request #2954 from Guobing-Chen/BF16_gemv_support
Implementation of BF16 based gemv
2020-10-29 09:22:33 +01:00
Martin Kroeker 60b22e3462
Merge pull request #2955 from Guobing-Chen/Fix_cooperlake_build_issue
Fix cooperlake compile issue
2020-10-29 09:22:07 +01:00
Chen, Guobing c5e62dad69 Fix cooperlake compile issue
Add a missing macro that Makefile.x86_64 requires after the recent
cleanup; its absence caused the cooperlake platform build to fail.
2020-10-29 03:37:59 +08:00
Chen, Guobing a7b1f9b1bb Implementation of BF16 based gemv
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-10-29 02:08:23 +08:00
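
The generic kernel mentioned in this commit can be pictured with a small reference sketch. It is not the code from the commit: the function name `sbgemv_ref`, its argument order, the row-major layout, and the truncating float-to-bfloat16 conversion are all assumptions made for illustration. The only fact it relies on is that a bfloat16 value is the upper 16 bits of an IEEE-754 float32.

```python
import struct


def float_to_bf16(value: float) -> int:
    """Return the bfloat16 bit pattern of value by truncating its float32 encoding.

    Truncation (no rounding) is an assumption chosen for simplicity.
    """
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return bits >> 16


def bf16_to_float(bits: int) -> float:
    """Widen a 16-bit bfloat16 pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def sbgemv_ref(m, n, alpha, a, x, beta, y):
    """Hypothetical reference gemv: returns alpha * A @ x + beta * y.

    a: row-major m x n list of bfloat16 bit patterns (ints)
    x: list of n bfloat16 bit patterns
    y: list of m ordinary floats (inputs are bf16, the result is not)
    """
    result = []
    for i in range(m):
        acc = 0.0  # accumulate in wider precision than the bf16 inputs
        for j in range(n):
            acc += bf16_to_float(a[i * n + j]) * bf16_to_float(x[j])
        result.append(alpha * acc + beta * y[i])
    return result


# Usage: a 2 x 2 matrix whose entries are exactly representable in bfloat16.
A = [float_to_bf16(v) for v in (1.0, 2.0, 3.0, 4.0)]
X = [float_to_bf16(v) for v in (1.0, 0.5)]
Y = sbgemv_ref(2, 2, 2.0, A, X, 1.0, [1.0, 1.0])
```

Python accumulates in double precision here, whereas a real kernel would typically accumulate in single precision, so results can differ in the last bits for inputs that are not exactly representable.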
Martin Kroeker 67f39ad813
Merge pull request #2939 from thrasibule/Makefile_cleanup
reuse variables defined in Makefile.system
2020-10-28 09:38:40 +01:00
Martin Kroeker 6e13a7e99e
Merge pull request #2951 from martin-frbg/cleanup_make
Minor Makefile cleanup
2020-10-28 09:37:56 +01:00
Martin Kroeker 2207a16235
Merge pull request #2952 from martin-frbg/issue2931
Try to read cpu ID from /sys/devices/.../cpu0 if HWCAP_CPUID fails
2020-10-28 09:37:32 +01:00
Martin Kroeker 5d643929dd
Merge pull request #2948 from martin-frbg/issue2947
Expressly enable neon for use with intrinsics if available
2020-10-28 09:37:09 +01:00
Martin Kroeker e8cbf0fc50
Output predefined HAVE_ entries to Makefile.conf for ARM with specified TARGET 2020-10-27 23:01:19 +01:00
Martin Kroeker b937d78a6d
Try to read cpu information from /sys/devices/system/cpu/cpu0 if HWCAP_CPUID fails 2020-10-27 17:51:32 +01:00
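
The fallback named in this commit can be sketched as follows. The sketch is illustrative, not the commit's code: the exact file below `/sys/devices/system/cpu/cpu0` (`regs/identification/midr_el1`) is an assumption, since the commit message names only the directory, and the helper names `decode_midr` and `read_midr` are hypothetical. The field layout follows the Arm MIDR_EL1 register definition.

```python
# Assumed location of the MIDR value; the commit message names only the cpu0 directory.
MIDR_PATH = "/sys/devices/system/cpu/cpu0/regs/identification/midr_el1"


def decode_midr(text: str) -> dict:
    """Split a hexadecimal MIDR_EL1 string into its architectural fields."""
    value = int(text.strip(), 16)
    return {
        "implementer": (value >> 24) & 0xFF,   # bits 31:24
        "variant": (value >> 20) & 0xF,        # bits 23:20
        "architecture": (value >> 16) & 0xF,   # bits 19:16
        "part": (value >> 4) & 0xFFF,          # bits 15:4
        "revision": value & 0xF,               # bits 3:0
    }


def read_midr(path: str = MIDR_PATH):
    """Return the decoded fields, or None if the file is missing or unreadable."""
    try:
        with open(path) as handle:
            return decode_midr(handle.read())
    except (OSError, ValueError):
        return None


# Usage with a sample value instead of a live system:
fields = decode_midr("0x00000000410fd083\n")
```

Returning `None` rather than raising lets a caller fall through to another detection method, which matches the "try this if the first method fails" wording of the commit.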
Martin Kroeker e2f9005db8
Merge pull request #2950 from RajalakshmiSR/saxpy
Optimize saxpy for POWER10
2020-10-27 00:02:18 +01:00
Martin Kroeker 6a1f3e40af
Remove debug printout of object list 2020-10-26 21:37:04 +01:00
Martin Kroeker 878b6d1f41
Remove spurious expr in flang version check 2020-10-26 21:35:40 +01:00
Rajalakshmi Srinivasaraghavan c24ba8b1dd Optimize saxpy for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2020-10-26 13:24:59 -05:00
Qiyu8 f917c26e83 Refactoring remaining benchmark cases. 2020-10-26 10:25:05 +08:00
Martin Kroeker 76203e2120
Merge pull request #2946 from martin-frbg/issue2945
Move definitions that are neither needed nor supported on Solaris
2020-10-26 00:43:44 +01:00
Martin Kroeker eec517af0e
Expressly enable neon for use with intrinsics if available 2020-10-26 00:21:56 +01:00
Martin Kroeker fd7da56965
Move definitions that are neither needed nor supported on SUNOS 2020-10-25 12:01:50 +01:00
Martin Kroeker 2f9fc9be30
Update version to 0.3.12.dev 2020-10-24 23:29:05 +02:00
Martin Kroeker 81fcfd5ed3
Update version to 0.3.12.dev 2020-10-24 23:28:29 +02:00
Martin Kroeker addf7593ae
Merge pull request #2944 from xianyi/release-0.3.0
Merge back 0.3.12 tag (and Changelog typo fixes) from release
2020-10-24 13:10:51 +02:00
Martin Kroeker c5f280a7f0
Fix typos 2020-10-24 13:03:28 +02:00
Martin Kroeker 6e3a05f2c9
Merge pull request #2943 from xianyi/develop
Merge from develop for 0.3.12 release
2020-10-24 12:52:59 +02:00
Martin Kroeker 89db73569b
Update Changelog with 0.3.12 changes 2020-10-24 12:50:04 +02:00
Martin Kroeker e1c18e4eeb
Update version to 0.3.12 for release 2020-10-24 12:15:33 +02:00
Martin Kroeker 26f658c9d2
Update version to 0.3.12 for release 2020-10-24 12:14:45 +02:00
Martin Kroeker dc35477317
Merge pull request #2942 from martin-frbg/makebuildtypes
Comment out BUILD_SINGLE etc. in Makefile.rule and add a short explanation
2020-10-24 09:26:50 +02:00
Martin Kroeker 365f28787c
Comment out BUILD_SINGLE etc. and add a short explanation 2020-10-23 23:32:06 +02:00
Martin Kroeker 2f2e9ddb65
Merge pull request #2941 from martin-frbg/exportsfix
Fix grouping of sladiv1/dladiv1/ilaenv2stage in gensymbol
2020-10-23 20:47:35 +02:00
Martin Kroeker 0d140e61ac
Fix wrong grouping of dcombssq 2020-10-23 15:53:40 +02:00
Martin Kroeker 4c45cd6294
fix missing split of sladiv1/dladiv/ilaenv2stage by build type 2020-10-23 15:31:25 +02:00
Martin Kroeker 680f744abf
Merge pull request #108 from xianyi/develop
rebase
2020-10-23 15:29:48 +02:00
Martin Kroeker 6f9460f0f6
Merge pull request #2937 from martin-frbg/pwr-buffersz
Increase and unify BUFFERSIZE on POWER; fix gcc inline warning
2020-10-23 07:15:32 +02:00
Qiyu8 dd6ebdfdab Refactor the performance measurement system 2020-10-23 10:32:03 +08:00
Guillaume Horel 1917a4e7b8 reuse variables defined in Makefile.system 2020-10-22 22:04:25 -04:00
Martin Kroeker 6c970fa998
Merge pull request #2938 from martin-frbg/2934-3
Fix twisted spelling that broke the gfortran version test again
2020-10-23 00:19:49 +02:00
Martin Kroeker b23cb05231
Fix twisted spelling that broke the gfortran version test again 2020-10-23 00:18:29 +02:00
Martin Kroeker 1d4c96fa0c
Increase BUFFERSIZE further 2020-10-23 00:12:06 +02:00
Martin Kroeker 34c3c407ef
label always_inline function as inline to silence a gcc warning 2020-10-22 22:14:26 +02:00
Martin Kroeker 3f84a9ca15
Merge pull request #2936 from martin-frbg/issue2934-2
Fix compiler version check for -mavx2 support (DYNAMIC_ARCH case)
2020-10-22 22:08:46 +02:00
Martin Kroeker 7e265c50bf
Merge pull request #2935 from martin-frbg/lapack458
Fix macro used in argument conversion (LAPACK PR 458)
2020-10-22 19:25:58 +02:00
Martin Kroeker ee90f30384
Increase BUFFERSIZE for POWER8-10 and use same value for POWER6
to fix overflow warning for PWR8 ZGEMM and PWR9 C/ZGEMM and avoid size mismatches in DYNAMIC_ARCH
2020-10-22 18:47:07 +02:00
Martin Kroeker 2e48d560ba
Fix compiler version check 2020-10-22 16:23:29 +02:00
Martin Kroeker ab7f466467
Merge pull request #106 from xianyi/develop
rebase
2020-10-22 16:21:09 +02:00
Martin Kroeker f95031204e
Fix macro used in argument conversion (LAPACK PR 458) 2020-10-22 16:19:26 +02:00
Martin Kroeker 909068facf
Merge pull request #2932 from RajalakshmiSR/copyp10
Optimize scopy/ccopy for POWER10
2020-10-22 00:29:46 +02:00
Martin Kroeker 5b7438fdde
Merge pull request #2934 from thrasibule/improve_version_check
actually check that version is greater than 4.7
2020-10-22 00:29:02 +02:00
Guillaume Horel 47696b43e9 actually check that version is greater than 4.7 2020-10-21 16:42:37 -04:00
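
A version check of the kind this commit describes is easy to get wrong when major and minor numbers are tested separately (for example, requiring minor >= 7 would reject 5.1). The sketch below shows one robust approach, comparing (major, minor) as a tuple. It is not the repository's Makefile logic: the function names are hypothetical, and the strict "greater than" comparison simply follows the commit wording.

```python
import re


def parse_major_minor(version: str) -> tuple:
    """Extract (major, minor) from strings such as '4.8.5' or '10'."""
    match = re.match(r"\s*(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major = int(match.group(1))
    minor = int(match.group(2)) if match.group(2) is not None else 0
    return (major, minor)


def is_newer_than(version: str, minimum=(4, 7)) -> bool:
    """True if the (major, minor) pair of version is strictly greater than minimum."""
    # Tuples compare element by element, so (10, 2) > (4, 7) and (5, 1) > (4, 7).
    return parse_major_minor(version) > minimum


# Usage:
checks = {v: is_newer_than(v) for v in ("4.6.3", "4.7", "4.8.5", "5.1", "10.2.0")}
```

Patch levels are ignored in this sketch, so "4.7.1" is treated the same as "4.7"; use `>=` instead of `>` if the minimum itself should be accepted.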