TGY
b5ba95a6c0
Modernize obsolete inline order
2023-08-16 00:48:40 +02:00
H.J. Lu
53ee0b76bb
x86: Enable Intel CET
...
When Intel CET is enabled, we need to include <cet.h> in assembly codes
to mark Intel CET support and place _CET_ENDBR at the function entry.
2021-04-30 19:45:39 -07:00
Chen, Guobing
deaeb6c5b8
Add bfloat16 based dot and conversion with single/double
...
1. Added bfloat16 based dot as new API: shdot
2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot
3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod
shstobf16 -- convert single float array to bfloat16 array
shdtobf16 -- convert double float array to bfloat16 array
sbf16tos -- convert bfloat16 array to single float array
dbf16tod -- convert bfloat16 array to double float array
4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16
5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs
6. Fix Cooperlake platform detection/specification issue in dynamic-arch builds
7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-09-04 02:31:25 +08:00
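
As a rough illustration of what the bfloat16 conversion routines listed in the commit above do for each array element, the sketch below converts one float to bfloat16 and back. The helper names `float_to_bf16` and `bf16_to_float` are hypothetical and are not the OpenBLAS APIs. Round-to-nearest-even and the NaN handling are assumptions; the actual generic and Cooperlake kernels may behave differently.

```c
#include <stdint.h>
#include <string.h>

/* The commit message says bfloat16 is typedef'd to uint16_t. */
typedef uint16_t bfloat16;

/* Hypothetical helper: float -> bfloat16, rounding to nearest even (assumed). */
static bfloat16 float_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);          /* reinterpret without aliasing issues */
    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        /* NaN: truncate and force a quiet-NaN mantissa bit so it stays a NaN. */
        return (bfloat16)((bits >> 16) | 0x0040u);
    }
    /* Add 0x7fff plus the lowest kept bit: ties round to the even result. */
    bits += 0x7fffu + ((bits >> 16) & 1u);
    return (bfloat16)(bits >> 16);
}

/* Hypothetical helper: bfloat16 -> float is exact, just a 16-bit shift. */
static float bf16_to_float(bfloat16 h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Widening back to single precision is lossless because bfloat16 keeps the sign bit, the 8 exponent bits, and the top 7 mantissa bits of an IEEE 754 single.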
Martin Kroeker
6c33764ca4
Unify BUFFER_SIZE settings for x86_64 again to fix potentially fatal mismatch in DYNAMIC_ARCH builds
2020-07-22 17:30:55 +00:00
Martin Kroeker
0464e662ad
make blas_quickdivide unsigned and guard against miscompilation
2020-06-05 10:03:36 +02:00
Martin Kroeker
a52bdd9d7b
Add (empty) read barrier definition
2020-04-13 12:22:35 +02:00
Martin Kroeker
a33d177430
Increase default BUFFER_SIZE on ARM, ZARCH and newer x86_64, add GEMM_R for POWER8/9
...
As shown in #2538, default buffer sizes on some platforms were smaller than required in memory.c
and the requirement could never be fulfilled for a calculated GEMM_R on PPC given the formula used
2020-04-12 19:44:48 +02:00
Martin Kroeker
c353d8b106
Make BUFFER_SIZE configurable
2020-02-09 23:30:22 +01:00
Martin Kroeker
280552b988
Fix mov syntax
2019-06-16 18:35:43 +02:00
Martin Kroeker
bbd4bb0154
Zero ecx with a mov instruction
...
PGI assembler does not like the initialization in the constraints.
2019-06-16 15:04:10 +02:00
luz.paz
daf2fec12d
Misc. typo fixes
...
Found via `codespell -q 3 -w -L ith,als,dum,nd,amin,nto,wis,ba -S ./relapack,./kernel,./lapack-netlib`
2019-04-29 17:03:56 -04:00
Martin Kroeker
b55c586fac
Fix missing clobber in x86/x86_64 blas_quickdivide inline assembly function ( #2017 )
...
* Fix missing clobber in blas_quickdivide assembly
2019-02-14 15:21:36 +01:00
Martin Kroeker
0afaae4b23
Query AVX2 and AVX512VL capability in x86 cpu detection
2019-01-05 16:58:56 +01:00
Arjan van de Ven
2ddc96c9e5
make WMB / MB safer on x86-64
...
make it so that
    if (foo)
        RMB;
    else
        MB;
is always done correctly and without syntax surprises
2018-06-17 18:06:24 +00:00
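
The if/else concern in the commit above is usually addressed with the `do { ... } while (0)` macro idiom. The sketch below assumes that idiom; it is not the actual OpenBLAS definition. `SAFE_MB`, `SAFE_RMB`, `barrier_count`, and `pick_barrier` are hypothetical names, and the macro bodies only bump a counter so the control flow can be observed.

```c
/* Hypothetical stand-ins for barrier macros; they only count invocations. */
static int barrier_count = 0;

/*
 * A brace-only definition such as
 *     #define BAD_MB { barrier_count += 1; }
 * expands "BAD_MB;" to "{ ... };" and the stray ';' detaches a following
 * "else" from its "if", which is a syntax error.
 */

/* do { } while (0) makes each macro behave like one ordinary statement. */
#define SAFE_MB  do { barrier_count += 1;  } while (0)
#define SAFE_RMB do { barrier_count += 10; } while (0)

/* Uses the exact shape from the commit message: if (foo) RMB; else MB; */
static int pick_barrier(int foo) {
    barrier_count = 0;
    if (foo)
        SAFE_RMB;
    else
        SAFE_MB;
    return barrier_count;
}
```

With this form the caller's trailing semicolon completes the `while (0)`, so the macro can be used anywhere a single statement is allowed.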
Arjan van de Ven
7e39ffe113
On x86-64, make MB/WMB compiler barriers
...
While on x86(64) one does not normally need full memory barriers, it's
good practice to at least use compiler barriers for places where on other
architectures memory barriers are used; this prevents the compiler
from over-optimizing.
2018-06-17 17:53:15 +00:00
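
A compiler barrier of the kind the commit above describes can be sketched with GCC/Clang extended inline assembly. This is an assumption about the general technique, not a copy of the OpenBLAS macros. `COMPILER_BARRIER`, `shared_data`, `shared_flag`, and `publish` are hypothetical names.

```c
/*
 * Empty asm statement with a "memory" clobber (GCC/Clang syntax, assumed).
 * It emits no instruction; it only tells the compiler that memory may have
 * changed, so loads and stores are not reordered or cached across it.
 */
#define COMPILER_BARRIER() __asm__ __volatile__("" : : : "memory")

static int shared_data;
static int shared_flag;

/* Hypothetical writer: store the payload first, then set the ready flag. */
static int publish(int value) {
    shared_data = value;
    COMPILER_BARRIER();   /* compiler must not sink the data store below here */
    shared_flag = 1;
    return shared_data + shared_flag;
}
```

This constrains only the compiler. On architectures with a weaker memory model than x86-64, a hardware fence instruction would be needed in addition.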
Martin Kroeker
88e224f4c0
Merge pull request #1542 from martin-frbg/quickdiv64
...
Avoid out-of-bounds accesses in blas_quickdivide on big X86 systems
2018-05-02 18:11:50 +02:00
Martin Kroeker
d0c0506588
Omit the divide table overflow check on small systems
2018-05-02 14:44:50 +02:00
Martin Kroeker
c1eb06e102
Update common_x86_64.h
2018-04-29 14:40:12 +02:00
Martin Kroeker
26ce518d46
Avoid out of bounds reads from blas_quick_divide_table on big systems
...
Should fix #1541
2018-04-29 14:34:33 +02:00
Alex Arslan
a41d241a0e
Add support for DragonFly BSD
2018-04-03 16:39:29 -07:00
Alex Arslan
8da6b6ae52
Allow building on OpenBSD
...
With this change, OpenBLAS builds and all tests pass on OpenBSD 6.2
using Clang. Tested on x86-64 only, with and without DYNAMIC_ARCH=1.
2018-04-02 10:48:22 -07:00
Paul Osmialowski
d7afdf9137
build: Flang has the same interface as PGI
...
Signed-off-by: Paul Osmialowski <pawel.osmialowski@arm.com>
2017-05-27 06:26:48 +01:00
Keno Fischer
d5e1255ca7
Don't pass REALNAME to `.end`
...
Putting the procedure there is an MSVC-ism, where it is optional. GCC silently ignores and Clang errors, so it is best to remove this.
2016-03-13 18:56:21 -04:00
Zhang Xianyi
94b125255f
Merge branch 'develop' into cmake
...
Conflicts:
driver/others/memory.c
2015-10-13 04:46:08 +08:00
Grazvydas Ignotas
6b92204a7c
add fallback blas_lock implementation
...
to be used on armv5 and new platforms
2015-08-16 18:59:17 +02:00
Grazvydas Ignotas
e12cf1123e
add fallback rpcc implementation
...
- use on arm, arm64 and any new platform
- use faster integer math instead of double
- use similar scale as rdtsc so that timeouts work
2015-08-16 18:59:16 +02:00
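
The three points in the fallback rpcc commit above can be sketched as a portable tick counter built on integer arithmetic. The use of POSIX `clock_gettime` with `CLOCK_MONOTONIC`, the nanosecond scale, and the name `fallback_rpcc` are assumptions for illustration; the commit message does not state which clock source or scale factor the real implementation uses.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* request clock_gettime under strict modes */
#endif
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical fallback cycle counter.
 * Integer math only (no double), and nanosecond ticks so the magnitude is
 * roughly comparable to a GHz-range rdtsc counter (assumed scale).
 */
static uint64_t fallback_rpcc(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
```

Keeping the scale close to that of a hardware cycle counter means timeout constants tuned for rdtsc remain in a sensible range on platforms that use the fallback.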
Zhang Xianyi
f8eba3d548
Fixed cmake build bugs on Linux.
2015-08-11 16:25:16 -05:00
Zhang Xianyi
f874465bb8
Use cmake to build OpenBLAS GENERIC Target on MSVC x86 64-bit.
...
Disable CBLAS and LAPACK.
2015-08-10 14:10:44 -05:00
Zhang Xianyi
51ff17d46e
Add AMD Excavator target.
2015-05-13 16:16:30 -05:00
Werner Saar
4319769b79
added target processor STEAMROLLER
2014-12-28 20:16:46 +08:00
wernsaar
7794237475
undef WHEREAMI
2014-09-06 11:01:42 +02:00
wernsaar
2021d0f9d6
experimentally removed expensive function calls
2014-09-05 15:05:53 +02:00
Timothy Gu
6c2ead30f0
Remove all trailing whitespace except lapack-netlib
...
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
Zhang Xianyi
16eb780e13
Refs #262 . Fixed compatibility issues of GNU stack markings with PathScale EKOPath(tm) Compiler Suite: Version 4.0.12.1
2013-09-22 09:37:59 +08:00
Zhang Xianyi
a2930664f4
Refs #262 . Added executable stack markings.
2013-07-28 00:09:40 +08:00
Zhang Xianyi
886cbaf4e4
Support AMD Piledriver by bulldozer kernels.
2013-07-06 12:06:43 -03:00
Zhang Xianyi
88c272f6a7
Refs #83 . Added the missing ALIGN_5 macro on Mac OSX. However, a SEGFAULT bug still exists.
2012-06-20 09:20:20 +08:00
Zhang Xianyi
37edae1c90
Refs #75 . Check ffreep macro before the define.
2012-05-31 17:17:02 +08:00
Xianyi Zhang
a4daa34db7
Refs #75 . Use ffreep opcode directly. Please check out http://www.sandpile.org/x86/opc_fpu.htm .
2012-05-30 20:25:01 +08:00
Zaheer Chothia
a431042475
Fix inconsistent case for OS_* macros (Refs pull request #111 )
2012-05-23 00:01:14 +02:00
Mike Nolta
4e29b6ffc0
FreeBSD: fix OS_FreeBSD -> OS_FREEBSD typos
2012-05-21 16:57:19 -04:00
Xianyi Zhang
342bbc3871
Import GotoBLAS2 1.13 BSD version codes.
2011-01-24 14:54:24 +00:00