Update Changelog with changes from 0.3.6
parent 9c4edd38f2
commit 9763f872fc
@ -1,4 +1,82 @@
OpenBLAS ChangeLog
====================================================================
Version 0.3.6
29-Apr-2019

common:
    * the build tools now check that a given cpu TARGET is actually valid
    * the build-time check of system features (c_check) has been made
      less dependent on particular perl features (this should mainly
      benefit building on Windows)
    * several problems with the ReLAPACK integration were fixed,
      including INTERFACE64 support and building a shared library
    * building with CMAKE on BSD systems was improved
    * a non-absolute SUM function was added based on the
      existing optimized code for ASUM
    * CBLAS interfaces to the IxMIN and IxMAX functions were added
    * a name clash between LAPACKE and BOOST headers was resolved
    * CMAKE builds with OpenMP now include the appropriate getrf_parallel
      kernels, which were previously left out
    * a crash on thread (key) deletion with the USE_TLS=1 memory management
      option was fixed
    * restored several earlier fixes, in particular for OpenMP performance,
      building on BSD, and calling fork on CYGWIN, which had inadvertently
      been dropped in the 0.3.3 rewrite of the memory management code

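To illustrate the difference between the new SUM function and the existing ASUM, here is a minimal plain-Python sketch of the reference semantics only. The helper names `ref_asum` and `ref_sum` are hypothetical and are not OpenBLAS entry points; the sketch assumes a positive stride and does not reflect how the optimized kernels are written.

```python
# Reference semantics only; ref_asum and ref_sum are illustrative names,
# not functions exported by OpenBLAS.

def ref_asum(n, x, incx=1):
    """Sum of the absolute values of n elements of x, taken with stride incx."""
    return sum(abs(x[i * incx]) for i in range(n))


def ref_sum(n, x, incx=1):
    """Plain (signed) sum of the same n strided elements."""
    return sum(x[i * incx] for i in range(n))


x = [1.0, -2.0, 3.0, -4.0]
print(ref_asum(4, x))  # 10.0
print(ref_sum(4, x))   # -2.0
```

The two results differ whenever the vector contains negative entries, which is why a separate non-absolute routine is useful.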
x86_64:
    * the AVX512 DGEMM kernel has been disabled again due to unresolved problems
    * building with old versions of MSVC was fixed
    * it is now possible to build a static library on Windows with CMAKE
    * accessing environment variables on CYGWIN at run time was fixed
    * the CMAKE build system now recognizes 32-bit userspace on 64-bit hardware
    * Intel "Denverton" Atom and Hygon "Dhyana" Zen CPUs are now autodetected
    * building for DYNAMIC_ARCH with a DYNAMIC_LIST of targets is now supported
      with CMAKE as well
    * building for DYNAMIC_ARCH with GENERIC as the default target is now supported
    * a buffer overflow in the SSE GEMM kernel for VIA Nano targets was fixed
    * assembly bugs involving undeclared modification of input operands were fixed
      in the AXPY, DOT, GEMV, GER, SCAL, SYMV and TRSM microkernels for Nehalem,
      Sandybridge, Haswell, Bulldozer and Piledriver. These would typically cause
      test failures or segfaults when compiled with gcc 8 or later.
    * a similar bug was fixed in the blas_quickdivide code used to split workloads
      in most functions
    * a bug in the IxMIN implementation for the GENERIC target that made it return
      the result of IxMAX was fixed
    * fixed building on SkylakeX systems when either the compiler or the (emulated)
      operating environment does not support AVX512
    * improved GEMM performance on ZEN targets

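The IxMIN/IxMAX mix-up mentioned above is easy to see with a small reference model. The following plain-Python sketch uses hypothetical helpers (`ref_imax`, `ref_imin`) and makes several assumptions: it compares signed values, returns a 0-based position among the strided elements, picks the first match on ties, and ignores the n <= 0 case. The actual index base and edge-case handling depend on which interface (Fortran or CBLAS) is called.

```python
# Illustrative reference model; ref_imax and ref_imin are not OpenBLAS symbols.

def ref_imax(n, x, incx=1):
    """0-based position of the first largest value among n strided elements."""
    best = 0
    for i in range(1, n):
        if x[i * incx] > x[best * incx]:
            best = i
    return best


def ref_imin(n, x, incx=1):
    """0-based position of the first smallest value among n strided elements."""
    best = 0
    for i in range(1, n):
        if x[i * incx] < x[best * incx]:
            best = i
    return best


x = [3.0, -7.0, 5.0, 1.0]
print(ref_imax(4, x))  # 2
print(ref_imin(4, x))  # 1
```

A quick regression check along these lines (a vector whose minimum and maximum sit at different positions) would have been enough to catch an IxMIN that returns the IxMAX result.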
x86:
    * build failures caused by the recently added checks for AVX512 were fixed
    * an inline assembly bug involving undeclared modification of an input argument was
      fixed in the blas_quickdivide code used to split workloads in most functions
    * a bug in the IxMIN implementation for the GENERIC target that made it return
      the result of IxMAX was fixed

MIPS32:
    * a bug in the IxMIN implementation that made it return the result of IxMAX
      was fixed

POWER:
    * single precision BLAS1/2 functions have received optimized POWER8 kernels
    * POWER9 is now a separate target, with an optimized DGEMM/DTRMM kernel
    * building on PPC970 systems under OSX Leopard or Tiger is now supported
    * out-of-bounds memory accesses in the gemm_beta microkernels were fixed
    * building a shared library on AIX is now supported for POWER6
    * DYNAMIC_ARCH support has been added for POWER6 and newer

ARMv7:
    * corrected xDOT behaviour with zero INC_X or INC_Y
    * a bug in the IxMIN implementation that made it return the result of IxMAX
      was fixed

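For the zero-increment xDOT case, a minimal sketch of what the textbook reference loop computes may help. The helper `ref_dot` is hypothetical and follows the conventional reference-BLAS indexing (start at element 0 for non-negative increments, at the far end for negative ones). The changelog does not say what the ARMv7 kernel did wrong, so this only shows the expected result, under the assumption that the reference loop defines the correct behaviour.

```python
# Illustrative reference loop; ref_dot is not an OpenBLAS symbol.

def ref_dot(n, x, incx, y, incy):
    """Dot product of n strided elements of x and y, reference-BLAS style."""
    # Negative increments walk backwards from the far end of the vector.
    ix = (1 - n) * incx if incx < 0 else 0
    iy = (1 - n) * incy if incy < 0 else 0
    total = 0.0
    for _ in range(n):
        total += x[ix] * y[iy]
        ix += incx
        iy += incy
    return total


x = [2.0, 100.0, 100.0]
y = [1.0, 2.0, 3.0]
# With incx == 0 the loop reuses x[0] for every term: 2*1 + 2*2 + 2*3.
print(ref_dot(3, x, 0, y, 1))  # 12.0
print(ref_dot(3, y, 1, y, 1))  # 14.0
```

With a zero increment the same element is multiplied against every element of the other vector, so a kernel that assumes a non-zero stride can easily produce a different value.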
ARMv8:
    * added support for HiSilicon TSV110 cpus
    * the CMAKE build system now recognizes 32-bit userspace on 64-bit hardware
    * cross-compilation with CMAKE now works again
    * a bug in the IxMIN implementation that made it return the result of IxMAX
      was fixed
    * ARMV8 builds with the BINARY=32 option are now automatically handled as ARMV7

IBM Z:
    * optimized microkernels for single precision BLAS1/2 functions have been added
      for both Z13 and Z14

====================================================================
Version 0.3.5
31-Dec-2018