Update with changes from 0.3.4
parent 86cff4effc
commit c0827a7164
@@ -1,4 +1,77 @@
OpenBLAS ChangeLog
====================================================================
Version 0.3.4
02-Dec-2018

common:
	* The new, experimental thread-local memory allocation had
	  inadvertently been left enabled for gmake builds in 0.3.3
	  despite the announcement. It is now disabled by default, and
	  single-threaded builds will keep using the old allocator even
	  if the USE_TLS option is turned on.
	* OpenBLAS will now provide enough buffer space for at least 50
	  threads by default.
	* The output of openblas_get_config() now contains the version
	  number.
	* A serious thread safety bug in the GEMV operation with small M and
	  large N size has been fixed.
	* The code will now automatically call blas_thread_init after a
	  fork if needed before handling a call to openblas_set_num_threads.
	* Accesses to parallelized level3 functions from multiple callers
	  are now serialized to avoid thread races (unless using OpenMP).
	  This should provide better performance than the known-threadsafe
	  (but non-default) USE_SIMPLE_THREADED_LEVEL3 option.
	* When building LAPACK with gfortran, -frecursive is now (again)
	  enabled by default to ensure correct behaviour.
	* The OpenBLAS version of cblas.h now supports both CBLAS_ORDER and
	  CBLAS_LAYOUT as the name of the matrix row/column order option.
	* Externally set LDFLAGS are now passed through to the final compile/link
	  steps to facilitate setting platform-specific linker flags.
	* A potential race condition during the build of LAPACK (that would
	  usually manifest itself as a failure to build TESTING/MATGEN) has been
	  fixed.
	* xHEMV has been changed to stay single-threaded for small input sizes
	  where the overhead of multithreading exceeds any possible gains.
	* CSWAP and ZSWAP have been limited to a single thread except on ARMV8 or
	  ThunderX hardware with sizable input.
	* Linker flags for the PGI compiler have been updated.
	* Behaviour of AXPY with zero increments is now handled in the C interface,
	  correcting the result on at least Intel Atom.
	* The result matrix from calling SGELSS with an all-zero input matrix is
	  now zeroed completely.

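Since the string returned by openblas_get_config() now carries the version number, a caller can check it at run time. The following is only a sketch: the changelog does not specify the layout of the string, so the code assumes the version shows up as a dotted token such as "0.3.4" among whitespace-separated fields, and both the helper name `version_from_config` and the sample string are hypothetical.

```python
import re

# Assumption: the version appears as a dotted "major.minor.patch" token
# at the start of one of the whitespace-separated fields.
_VERSION_TOKEN = re.compile(r"^(\d+)\.(\d+)\.(\d+)")


def version_from_config(config):
    """Return (major, minor, patch) found in a config string, or None."""
    for token in config.split():
        match = _VERSION_TOKEN.match(token)
        if match:
            return tuple(int(part) for part in match.groups())
    return None


if __name__ == "__main__":
    # Hypothetical sample; a real string would come from openblas_get_config().
    sample = "OpenBLAS 0.3.4 NO_AFFINITY Haswell MAX_THREADS=50"
    print(version_from_config(sample))
```

Returning None rather than raising lets the same check run against older builds, whose config string carries no version at all.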
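The serialization of level3 calls described above can be pictured as a single lock that every caller must hold while inside the parallelized routine. The sketch below illustrates that idea only and is not how OpenBLAS implements it; `SerializedRegion` and `run_concurrent_callers` are hypothetical names, and the "level3 function" is stood in for by an arbitrary Python callable.

```python
import threading


class SerializedRegion:
    """Lets one caller at a time run a function and records peak concurrency."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak_active = 0  # highest number of callers ever inside at once

    def call(self, func, *args):
        # Every caller takes the same lock, so entries are serialized.
        with self._lock:
            self._active += 1
            self.peak_active = max(self.peak_active, self._active)
            try:
                return func(*args)
            finally:
                self._active -= 1


def run_concurrent_callers(region, func, inputs):
    """Call func once per input from separate threads, through the region."""
    results = [None] * len(inputs)

    def caller(slot, value):
        results[slot] = region.call(func, value)

    threads = [
        threading.Thread(target=caller, args=(slot, value))
        for slot, value in enumerate(inputs)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    region = SerializedRegion()
    print(run_concurrent_callers(region, lambda v: v * v, list(range(8))))
    print(region.peak_active)
```

Callers still arrive concurrently; only the guarded section runs one at a time, which is why the recorded peak never exceeds one.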
x86_64:
	* Autodetection of AMD Ryzen2 has been fixed (again).
	* CMake builds now support labeling of an INTERFACE64=1 build of
	  the library with the _64 suffix.
	* An AVX512 version of DGEMM has been added and the AVX512 SGEMM kernel
	  has been sped up by rewriting with C intrinsics.
	* Fixed compilation on RHEL5/CENTOS5 (issue with typename __WAIT_STATUS).

POWER:
	* Added support for building on AIX (with gcc and GNU tools from AIX Toolbox).
	* CPU type detection has been implemented for AIX.
	* CPU type detection has been fixed for NetBSD.

MIPS64:
	* AXPY on LOONGSON3A has been corrected to pass the "zero increment" utest.
	* DSDOT on LOONGSON3A has been fixed.
	* The SGEMM microkernel has been hardened against potential data loss.

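The "zero increment" AXPY case mentioned in the entries above is easiest to see against the plain strided loop of the reference BLAS. The sketch below follows that loop in Python with 0-based indexing; it is not taken from the OpenBLAS sources, and the function names are hypothetical. With both increments equal to zero, every iteration reads x[0] and updates y[0], so the loop reduces to adding n * alpha * x[0] to y[0].

```python
def axpy_reference(n, alpha, x, incx, y, incy):
    """Compute y := alpha*x + y over n strided elements, in place."""
    if n <= 0 or alpha == 0:
        return y
    # Negative increments walk the vector backwards, as in the reference BLAS.
    ix = (1 - n) * incx if incx < 0 else 0
    iy = (1 - n) * incy if incy < 0 else 0
    for _ in range(n):
        y[iy] += alpha * x[ix]
        ix += incx
        iy += incy
    return y


def axpy_zero_increments(n, alpha, x0, y0):
    """Closed form of the loop above when incx == incy == 0."""
    return y0 + n * alpha * x0


if __name__ == "__main__":
    y = [1.0, 10.0]
    axpy_reference(4, 2.0, [1.5, 9.0], 0, y, 0)
    print(y[0], axpy_zero_increments(4, 2.0, 1.5, 1.0))
```

An optimized kernel that assumes nonzero strides can get this case wrong, which is one reason to settle it before a kernel is called.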
ARMV8:
	* DYNAMIC_ARCH support is now available for 64-bit ARM.
	* Cross-compiling for ARMV8 under iOS now works.
	* CPU-specific code has been rearranged to make better use of both
	  hardware commonalities and model-specific compiler optimizations.
	* XGENE1 has been removed as a TARGET, superseded by the improved generic
	  ARMV8 support.

ARMV7:
	* Older assembly mnemonics have been converted to UAL form to allow
	  building with clang 7.0.
	* Cross compiling LAPACKE for Android has been fixed again (broken by
	  the update to LAPACK 3.7.0 some time ago).

====================================================================
Version 0.3.3
31-Aug-2018