Martin Kroeker
ed7e155c35
Merge branch 'develop' into aix
2020-07-07 18:52:06 +02:00
Martin Kroeker
c854ef5471
Fix variable names in conditional
2020-06-25 13:29:52 +02:00
Martin Kroeker
c0afc11742
Fix POWERPC builds on AIX (gcc/gfortran 7)
...
1. macro preprocessing for POWER8 and later kernels only
2. default buffer size used by AIX version of m4 is too small
2020-06-25 13:12:36 +02:00
Kavana Bhat
df4ade070f
Fix for #2671
2020-06-24 04:25:47 -05:00
Rajalakshmi Srinivasaraghavan
9fe930f205
powerpc: Add support for future processor
...
This is the initial patch to support build infrastructure
for POWER10 architecture.
2020-06-11 15:47:20 -05:00
Martin Kroeker
5dd14e3d48
Make building the bfloat16 functions conditional on option BUILD_HALF (#2590)
...
* make building the bfloat16 BLAS functions conditional on BUILD_HALF
* pass the BUILD_HALF option to gensymbol
* Pass BUILD_HALF as a compiler define for dynamic_arch builds
2020-05-01 09:58:30 +02:00
Rajalakshmi Srinivasaraghavan
ff010f496e
Build shgemm for all architectures
2020-04-14 20:38:53 -05:00
Rajalakshmi Srinivasaraghavan
7eb55504b1
RFC : Add half precision gemm for bfloat16 in OpenBLAS
...
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes). Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.
Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.
This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
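The bfloat16 storage described in the commit above (a 2-byte unsigned integer on architectures without native support) can be illustrated with a minimal sketch. It assumes the standard bfloat16 layout, which is the upper 16 bits of an IEEE-754 binary32 value. The type name `bf16_t` and both helper names are hypothetical and are not taken from OpenBLAS; the narrowing step truncates rather than rounds, which may differ from what the actual kernels do.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical type name: 16-bit storage for a bfloat16 value,
   mirroring the "unsigned short (2 bytes)" fallback in the commit text. */
typedef uint16_t bf16_t;

/* Widen: place the 16 stored bits in the upper half of a binary32 value. */
static float bf16_to_float(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);   /* bit copy avoids aliasing problems */
    return f;
}

/* Narrow by truncation: keep the sign, the 8 exponent bits and the
   top 7 mantissa bits; the low 16 mantissa bits are discarded. */
static bf16_t float_to_bf16_trunc(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (bf16_t)(bits >> 16);
}
```

A comparison such as the one the commit mentions between sgemm and shgemm output would need a tolerance, since each bfloat16 input carries only 8 bits of significand precision.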
Martin Kroeker
1a6ea8ee6d
Merge pull request #2338 from kavanabhat/aix_mod
...
Changes to build on AIX in POWER8 mode
2019-12-09 17:54:49 +01:00
Kavana Bhat
6baa9b07d7
AIX changes for Power8
2019-12-06 04:33:32 -06:00
Kavana Bhat
3938e59569
AIX changes for Power8
2019-12-04 00:23:46 -06:00
Martin Kroeker
e7c4d6705a
Revert #2051 and replace with a better fix (#2261)
...
* Revert #2051 and add a better fix for TARGET=generic with DYNAMIC_ARCH
fixes #2257 without breaking #2048 again
2019-09-17 18:56:04 +02:00
Kavana Bhat
3dc6b26eff
AIX changes for Power8
2019-08-20 06:51:35 -05:00
Martin Kroeker
7c51cc8527
Merge branch 'develop' into develop
2019-03-29 19:36:29 +01:00
AbdelRauf
853a18bc17
Add POWER9 makefile. dgemm is based on the POWER8 kernel with the following changes: 32x unrolled 16x4 kernel and 8x4 kernel using (lxv, stxv, butterfly rank-1 update). Performance improves from 17 to 22-23 GFLOPS. The dtrmm cases were added into dgemm itself.
2019-03-29 15:49:40 +00:00
Martin Kroeker
5b95534afc
Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1
...
for issue #2048
2019-03-09 11:21:16 +01:00
Martin Kroeker
885a3c4350
USE_TRMM on Z14
...
from patch provided by aarnez in #991
2019-01-31 21:18:09 +01:00
Martin Kroeker
f3fd44a731
Set USE_TRMM for all ZARCH variants to fix TRMM faults with zarch-generic
...
fixes #1743
2018-08-28 21:34:07 +02:00
Arjan van de Ven
99c7bba8e4
Initial support for SkylakeX / AVX512
...
This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server)
target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set,
which brings 2 basic things:
1) 512 bit wide SIMD (2x width of AVX2)
2) 32 SIMD registers (2x the number on AVX2)
This initial patch only contains a trivial transformation of the Haswell SGEMM kernel
to AVX512VL; more will follow later but this patch aims to get the infrastructure
in place for this "later".
Full performance tuning has not been done yet; with more registers and wider SIMD
it's in theory possible to retune the kernels but even without that there's an
interesting enough performance increase (30-40% range) with just this change.
2018-06-03 07:58:52 +00:00
Martin Kroeker
82012b960b
Revert "Switch mips32 target to USE_TRMM to fix complex TRMM"
...
... as it was just a silly workaround for the issue seen in #1563, caused by #1419
2018-05-17 20:30:03 +02:00
Martin Kroeker
018f2dad27
Switch mips32 target to USE_TRMM to fix complex TRMM
2018-05-02 20:25:32 +02:00
Martin Kroeker
9c5518319a
Revert "Fix 32bit HASWELL builds"
2018-04-22 20:20:04 +02:00
Martin Kroeker
0e2cf102e1
Fix 32bit HASWELL
2017-10-24 10:07:44 +02:00
Denis Steckelmacher
c9ff735da6
Add ZEN support (tested for auto-detected static backend)
2017-03-19 15:32:50 +01:00
Zhang Xianyi
b678471d65
Merge branch 'z13' into develop
...
Conflicts:
CONTRIBUTORS.md
2017-01-09 05:52:42 -05:00
Zhang Xianyi
864e202afd
Add USE_TRMM=1 for IBM z13 in kernel/Makefile.L3
2017-01-09 05:48:09 -05:00
Kaustubh Raste
c8a7860eb3
STRSM optimized
...
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
2016-05-30 21:17:00 +05:30
Werner Saar
b752858d6c
added dgemm-, dtrmm-, zgemm- and ztrmm-kernel for power8
2016-03-01 07:33:56 +01:00
Zhang Xianyi
94b125255f
Merge branch 'develop' into cmake
...
Conflicts:
driver/others/memory.c
2015-10-13 04:46:08 +08:00
Martin Koehler
711ca33bc6
Improved Ximatcopy when lda==ldb.
...
The Ximatcopy functions create a copy of the input matrix
although they seem to work inplace. The new routines
XIMATCOPY_K_YY perform the operations inplace if the leading
dimension does not change.
2015-09-07 14:36:16 +02:00
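The in-place idea from the Ximatcopy commit above can be sketched for the simplest case: no transpose, column-major storage, and an unchanged leading dimension. When lda equals ldb, every element stays at the same address, so scaling needs no temporary copy of the matrix. The function name `scale_inplace_colmajor` is hypothetical; it is not the signature of the XIMATCOPY_K_YY routines, and the transposed variants need more care than is shown here.

```c
#include <stddef.h>

/* Hypothetical helper: scale a column-major rows x cols matrix by alpha
   in place. lda is the leading dimension (distance between columns) and
   is assumed to be the same before and after the operation. */
static void scale_inplace_colmajor(size_t rows, size_t cols,
                                   double alpha, double *a, size_t lda)
{
    for (size_t j = 0; j < cols; j++) {
        for (size_t i = 0; i < rows; i++) {
            a[i + j * lda] *= alpha;   /* padding rows beyond `rows` are untouched */
        }
    }
}
```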
Zhang Xianyi
f874465bb8
Use cmake to build OpenBLAS GENERIC Target on MSVC x86 64-bit.
...
Disable CBLAS and LAPACK.
2015-08-10 14:10:44 -05:00
Werner Saar
9bd962f655
modified haswell parameter dgemm_unroll_n
2015-06-13 10:28:27 +02:00
Zhang Xianyi
ea7f9dacf4
Refs #509. Fixed geadd building bug with DYNAMIC_ARCH=1.
2015-02-26 01:47:11 +08:00
Martin Koehler
39cc6b21d3
Add ATLAS-style ?geadd function
2015-02-16 13:46:20 +01:00
Zhang Xianyi
a85c2785ae
Refs #467. Added generic kernel file for x86_64.
2014-11-24 15:34:48 +08:00
wernsaar
e80b144932
enabled compiling of *3M functions
2014-07-02 14:11:53 +02:00
wernsaar
be94db096c
disabled *3M functions for x86_64 platforms
2014-07-01 16:18:05 +02:00
Timothy Gu
6c2ead30f0
Remove all trailing whitespace except lapack-netlib
...
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
wernsaar
cee257f384
Ref #51: added blas extensions zomatcopy and comatcopy
2014-06-10 10:34:54 +02:00
wernsaar
7bfb3011e8
Ref #51: added blas extension somatcopy
2014-06-09 20:21:13 +02:00
wernsaar
8c8f596238
Ref #51: added blas extension domatcopy as a non-optimized reference
2014-06-09 17:11:07 +02:00
wernsaar
ffe70b1fdc
modified Makefile.L3
2013-12-01 17:58:46 +01:00
wernsaar
cff70a666d
added generic trmm kernels and modified Makefile.L3
2013-07-30 20:18:57 +02:00
wernsaar
d854b30ae6
Added UNROLL values for 3M to getarch_2nd.c, Makefile.system and Makefile.L3
2013-06-09 17:26:42 +02:00
Wang Qian
8e53b57bb2
Add gemmkernel and trmmkernel C code in kernel/generic. This code can be used on a new platform which does not have an optimized assembly kernel.
2012-01-10 17:16:13 +00:00
Xianyi Zhang
342bbc3871
Import GotoBLAS2 1.13 BSD version code.
2011-01-24 14:54:24 +00:00