Update with 0.3.10 changes
This commit is contained in:
parent
7e3e006af6
commit
72888497e2
|
@ -1,4 +1,77 @@
|
|||
OpenBLAS ChangeLog
|
||||
====================================================================
|
||||
Version 0.3.10
|
||||
14-Jun-2020
|
||||
|
||||
common:
|
||||
* Improved thread locking behaviour in blas_server and parallel getrf
|
||||
* Imported bugfix 394 from LAPACK (spurious reference to "XERBL"
|
||||
due to overlong lines)
|
||||
* Imported bugfix 403 from LAPACK (compile option "recursive" required
|
||||
for correctness with Intel and PGI)
|
||||
* Imported bugfix 408 from LAPACK (wrong scaling in ZHEEQUB)
|
||||
* Imported bugfix 411 from LAPACK (infinite loop in LARGV/LARTG/LARTGP)
|
||||
* Fixed mismatches between BUFFERSIZE and GEMM_UNROLL parameters that
|
||||
could lead to crashes at large matrix sizes
|
||||
* Restored internal soname in dynamic libraries on FreeBSD and Dragonfly
|
||||
* Added API (openblas_setaffinity) to set the thread affinity on Linux
|
||||
* Added initial infrastructure for half-precision floating point
|
||||
(bfloat16) support with a generic implementation of SHGEMM
|
||||
* Added CMAKE build system support for building the cblas_Xgemm3m
|
||||
functions
|
||||
* Fixed CMAKE support for building in a path with embedded spaces
|
||||
* Fixed CMAKE (non)handling of NO_EXPRECISION and MAX_STACK_ALLOC
|
||||
* Fixed GCC version detection in the Makefiles
|
||||
* Allowed overriding the names of AR, AS and LD in Makefile builds
|
||||
|
||||
POWER:
|
||||
* Fixed big-endian POWER8 ELFv2 builds on FreeBSD
|
||||
* Fixed GCC version checks and DYNAMIC_ARCH builds on POWER9
|
||||
* Fixed CMAKE build support for POWER9
|
||||
* fixed a potential race condition in the thread buffer allocation
|
||||
* Worked around LAPACK test failures on PPC G4
|
||||
|
||||
MIPS:
|
||||
* Fixed a potential race condition in the thread buffer allocation
|
||||
* Added support for MIPS 24K/24KE family based on P5600 kernels
|
||||
|
||||
MIPS64:
|
||||
* fixed a potential race condition in the thread buffer allocation
|
||||
* Added TARGET=GENERIC
|
||||
|
||||
ARMV7:
|
||||
* Fixed a race condition in the thread buffer allocation
|
||||
|
||||
ARMV8:
|
||||
* Fixed a race condition in the thread buffer allocation
|
||||
* Fixed zero initialisation in the assembly for SGEMM and DGEMM BETA
|
||||
* Improved performance of the ThunderX2 DAXPY kernel
|
||||
* Added an optimized SGEMM kernel for Cortex A53
|
||||
* Fixed Makefile support for INTERFACE64 (8-byte integer)
|
||||
|
||||
x86_64:
|
||||
* Fixed a syntax error in the CMAKE setup for SkylakeX
|
||||
* Improved performance of STRSM on Haswell, SkylakeX and Ryzen
|
||||
* Improved SGEMM performance on SGEMM for workloads with ldc a
|
||||
multiple of 1024
|
||||
* Improved DGEMM performance on Skylake X
|
||||
* Fixed unwanted AVX512-dependency of SGEMM in DYNAMIC_ARCH
|
||||
builds created on SkylakeX
|
||||
* Removed data alignment requirement in the SSE2 copy kernels
|
||||
that could cause spurious crashes
|
||||
* Added a workaround for an optimizer bug in AppleClang 11.0.3
|
||||
* Fixed LAPACK test failures due to wrong options for Intel Fortran
|
||||
* Fixed compilation and LAPACK test results with recent Flang
|
||||
and AMD AOCC
|
||||
* Fixed DYNAMIC_ARCH builds with CMAKE on OS X
|
||||
* Fixed missing exports of cblas_i?amin, cblas_i?min, cblas_i?max,
|
||||
cblas_?sum, cblas_?gemm3m in the shared library on OS
|
||||
* Fixed reporting of cpu name in DYNAMIC_ARCH builds (would sometimes
|
||||
show the name of an older generation chip supported by the same kernels)
|
||||
|
||||
IBM Z:
|
||||
* Improved performance of SGEMM/STRMM and DGEMM/DTRMM on Z14
|
||||
|
||||
====================================================================
|
||||
Version 0.3.9
|
||||
1-Mar-2020
|
||||
|
|
Loading…
Reference in New Issue