parent 61659f8765
commit c52a831ae4
111 Changelog.txt
@@ -1,4 +1,115 @@
OpenBLAS ChangeLog
====================================================================
Version 0.3.2
30-Jul-2018

common:
* fixes for regressions caused by the rewrite of the thread
  initialization code in 0.3.1

POWER:
* fixed cpu autodetection for the BSDs

MIPS64:
* fixed utest errors in AXPY, DSDOT, ROT and SWAP

x86_64:
* added autodetection of AMD Ryzen 2
* fixed build with older versions of MSVC

====================================================================
Version 0.3.1
01-Jul-2018

common:
* rewritten thread initialization code with significantly reduced overhead
* added CBLAS interfaces to the IxAMIN BLAS extension functions
* fixed the lapack-test target
* CMAKE builds now create an OpenBLASConfig.cmake file
* ZAXPY now uses a single thread for small input sizes
* the LAPACK code was updated from Reference-LAPACK/lapack#253
  (fixing LAPACKE interfaces to Aasen's functions)

POWER:
* corrected CROT and ZROT behaviour with zero INC_X

ARMV7:
* corrected xDOT behaviour with zero INC_X or INC_Y

x86_64:
* retired some older targets of DYNAMIC_ARCH builds to a new option DYNAMIC_OLDER,
  this affects PENRYN,DUNNINGTON,OPTERON,OPTERON_SSE3,BOBCAT,ATOM and NANO
  (which will still be supported via the slower PRESCOTT kernels when this option is not set)
* added an option DYNAMIC_LIST that (used in conjunction with DYNAMIC_ARCH) allows to
  specify the list of x86_64 targets to include. Any target not on the list will be supported
  by the Sandybridge or Nehalem kernels if available, or by Prescott.
* improved SWITCH_RATIO on Haswell for increased GEMM throughput
* added initial support for Intel Skylake X, including an AVX512 SGEMM kernel
* added autodetection of Intel Cannon Lake series as Skylake X
* added a default L2 cache size for hypervisors that return zero here (Chromebook)
* fixed a name clash with recent Windows10 headers that broke the build with (at least)
  recent mingw from MSYS2
* fixed a link error in mixed clang/gfortran builds with OpenMP
* updated the OSX deployment target to 10.8
* switched on parallel make for builds on MS Windows by default

x86:
* fixed SSWAP and DSWAP behaviour with zero INC_X and INC_Y

====================================================================
Version 0.3.0
23-May-2018

common:
* fixed some more thread race and locking bugs
* added preliminary support for calling an OpenMP build of the library from multiple threads
* removed performance impact of thread locks added in 0.2.20 on OpenMP code
* general code cleanup
* optimized DSDOT implementation
* improved thread distribution for GEMM
* corrected IMATCOPY/OMATCOPY implementation
* fixed out-of-bounds accesses in the multithreaded xBMV/xPMV and SYMV implementations
* cmake build improvements
* pkgconfig file now contains build options
* openblas_get_config() now reports USE_OPENMP and NUM_THREADS settings used for the build
* corrections and improvements for systems with more than 64 cpus
* LAPACK code updated to 3.8.0 including later fixes
* added ReLAPACK, a recursive implementation of several LAPACK functions
* Rewrote ROTMG to handle cases that the netlib code failed to address
* Disabled (broken) multithreading code for xTRMV
* corrected prototypes of complex CBLAS functions to make our cblas.h match the generally accepted standard
* shared memory access failures on startup are now handled more gracefully
* restored utests from earlier releases (and made them pass on all affected systems)

SPARC:
* several fixes for cpu autodetection

POWER:
* corrected vector register overwriting in several Power8 kernels
* optimized additional BLAS functions

ARM:
* added support for CortexA53 and A72
* added autodetection for ThunderX2T99
* made most optimized kernels the default for generic ARMv8 targets

x86_64:
* parallelized DDOT kernel for Haswell
* changed alignment directives in assembly kernels to boost performance on OSX
* fixed register handling in the GEMV microkernels (bug exposed by gcc7)
* added support for building on OpenBSD and Dragonfly
* updated compiler options to work with Intel release 2018
* support fully optimized build with clang/flang on Microsoft Windows
* fixed building on AIX

IBM Z:
* added optimized BLAS 1/2 functions

MIPS:
* fixed cpu autodetection helper code
* added mips32 1004K cpu (Mediatek MT7621 and similar SoC)
* added mips64 I6500 cpu

====================================================================
Version 0.2.20
24-Jul-2017