Commit Graph

22 Commits

Author SHA1 Message Date
Martin Kroeker 675cd551da
fix improper function prototypes (empty parentheses) 2023-09-30 12:56:38 +02:00
Martin Kroeker 66b39b835c
Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:44:45 +02:00
Martin Kroeker 739c3c44a7
Work around windows/osx gcc12 x86_64 tree-optimizer problem and add an osx/gcc12 build to Azure CI (#3745)
Add pragma to disable the gcc tree-optimizer for some x86_64 S and Z kernels with gcc12 on OSX or Windows
2022-09-03 15:01:22 +02:00
Martin Kroeker d1ee6ff73f
fix function typecasts 2021-12-21 18:45:28 +01:00
Wangyang Guo 3dc6052c7e initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
Martin Kroeker b2053239fc
Fix mssing dummy parameter (imag part of alpha) of zdot_thread_function 2020-08-23 15:08:16 +02:00
Chen, Guobing e740c4873d Enable COOPERLAKE build target
Enable new build target platform -- COOPERLAKE. This target platform
supports all the SKYLAKEX supported ISAs + avx512bf16. So all the
SKYLAKEX specific kernels/drivers and related code are now extended
to be also active on COOPERLAKE. Besides, new BF16 related kernels
are active under this target.
2020-08-13 06:18:00 +08:00
Martin Kroeker aa53a8a5cb
Multiply by two instead of left-shifting one place
fixes GCC ubsan report of "left shift of negative value -2" in the BLAS tests
2020-08-02 18:25:09 +02:00
Martin Kroeker 11c59acfb1
Keep both PGI/SUN and default code paths to avoid breaking Clang/WIndows 2019-08-28 18:07:44 +02:00
Martin Kroeker 3a55dca2dc
Make x86_64 zdot compile with PGI and Sun C again
broken by #2222 as CREAL,CIMAG do not expand to a valid lvalue with these compilers
2019-08-28 11:35:31 +02:00
Martin Kroeker 9ef96b32a6
Add multithreading support to the x86_64 zdot kernel (#2222)
* Add multithreading support

copied from the ThunderX2T99 kernel. For #2221
2019-08-15 22:09:12 +02:00
Arjan van de Ven 99c7bba8e4 Initial support for SkylakeX / AVX512
This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server)
target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set,
which brings 2 basic things:
1) 512 bit wide SIMD (2x width of AVX2)
2) 32 SIMD registers (2x the number on AVX2)

This initial patch only contains a trivial transofrmation of the Haswell SGEMM kernel
to AVX512VL; more will follow later but this patch aims to get the infrastructure
in place for this "later".

Full performance tuning has not been done yet; with more registers and wider SIMD
it's in theory possible to retune the kernels but even without that there's an
interesting enough performance increase (30-40% range) with just this change.
2018-06-03 07:58:52 +00:00
Isuru Fernando ca17b4b75c Fix complex support for MSVC headers 2017-07-28 11:50:29 +05:30
Denis Steckelmacher c9ff735da6 Add ZEN support (tested for auto-detected static backend) 2017-03-19 15:32:50 +01:00
Martin Kroeker a6efabf155 Replace gnu _real_ , _imag_ extensions in initializers 2017-03-13 00:38:37 +01:00
Martin Kroeker 16446d1d23 Remove explicit include of complex.h 2016-09-29 23:45:56 +02:00
Werner Saar 298b13bba4 updated some kernel files for EXCAVATOR 2016-04-25 10:36:23 +02:00
Werner Saar fc0e0391f3 bugfixes: replaced int with BLASLONG 2015-04-24 14:30:44 +02:00
Werner Saar 3119def9a7 updated cdot and zdot 2015-04-10 11:10:31 +02:00
Werner Saar b57a60dac8 updated cdot and zdot for piledriver 2015-04-09 10:33:46 +02:00
Werner Saar 5c51163972 added optimized cdot- and zdot-kernel for steamroller 2015-04-09 09:45:23 +02:00
Werner Saar 9299d8cfd6 added optimized cdot- and zdot-kernels for bulldozer 2015-04-08 16:29:55 +02:00