Update Changelog.txt for 0.3.28
commit 1df95bb23a (parent d92cc96978)
OpenBLAS ChangeLog
====================================================================
Version 0.3.28
8-Aug-2024

general:
- Reworked the unfinished implementation of HUGETLB from GotoBLAS
  for allocating huge memory pages as buffers on suitable systems
- Changed the unfinished implementation of GEMM3M for the generic
  target on all architectures to at least forward to regular GEMM
- Improved multithreaded GEMM performance for large non-skinny matrices
- Improved BLAS3 performance on larger multicore systems through improved
  parallelism
- Improved performance of the initial memory allocation by reducing
  locking overhead
- Improved performance of GBMV at small problem sizes by introducing
  a size barrier for the switch to multithreading
- Added an implementation of the CBLAS_GEMM_BATCH extension
- Fixed miscompilation of CAXPYC and ZAXPYC on all architectures in
  CMAKE builds (error introduced in 0.3.27)
- Fixed corner cases involving the handling of NAN and INFINITY
  arguments in ?SCAL on all architectures
- Added support for cross-compiling to WEBM with CMAKE (in addition
  to the already present makefile support)
- Fixed NAN handling and potential accuracy issues in compilations with
  Intel ICX by supplying a suitable fp-model option by default
- The contents of the GitHub project wiki have been converted into
  a new set of documentation included with the source code
- It is now possible to register a callback function that replaces
  the built-in support for multithreading with an external backend
  like TBB (openblas_set_threads_callback_function)
- Fixed potential duplication of suffixes in shared library naming
- Improved C compiler detection by the build system to tolerate more
  naming variants for gcc builds
- Fixed an unnecessary dependency of the utest on CBLAS
- Fixed spurious error reports from the BLAS extensions utest
- Fixed unwanted invocation of the GEMM3M tests in cross-compilation
- Fixed a flaw in the makefile build that could lead to the pkgconfig
  file containing an entry of UNKNOWN for the target cpu after installing
- Integrated fixes from the Reference-LAPACK project:
  - Fixed uninitialized variables in the LAPACK tests for ?QP3RK (PR 961)
  - Fixed potential bounds error in ?UNHR_COL/?ORHR_COL (PR 1018)
  - Fixed potential infinite loop in the LAPACK testsuite (PR 1024)
  - Made the variable type used for hidden length arguments configurable (PR 1025)
  - Fixed SYTRD workspace computation and various typos (PR 1030)
  - Prevented compiler use of FMA that could increase numerical error in ?GEEVX (PR 1033)
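
The ?SCAL corner cases come down to IEEE 754 arithmetic: 0 * Inf and 0 * NaN are both NaN, so a kernel that treats alpha == 0 as "fill with zeros" returns something different from a plain multiply loop. The minimal sketch below contrasts the two. Both function names are hypothetical, and the sketch assumes (the entry does not say so explicitly) that the fix means matching the plain multiply loop; it is not OpenBLAS's kernel code.

```c
#include <assert.h>
#include <math.h> /* INFINITY, NAN and isnan for callers probing corner cases */

/* Hypothetical shortcut variant: treats alpha == 0 as "fill with zeros",
   which silently replaces NaN and Inf entries of x by 0. */
void dscal_shortcut(int n, double alpha, double *x, int incx) {
    assert(incx > 0); /* positive strides only in this sketch */
    for (int i = 0; i < n; i++) {
        double *xi = x + (long)i * incx;
        *xi = (alpha == 0.0) ? 0.0 : alpha * *xi;
    }
}

/* Hypothetical IEEE-consistent variant: always multiplies, so
   0 * Inf and 0 * NaN both yield NaN, as a plain reference loop would. */
void dscal_ieee(int n, double alpha, double *x, int incx) {
    assert(incx > 0);
    for (int i = 0; i < n; i++) {
        x[(long)i * incx] *= alpha;
    }
}
```

Compiler options that relax IEEE semantics (for example -ffast-math, or an unsuitable fp-model with Intel ICX as in the entry above) can make NaN handling unreliable even in the second variant.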
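
The threading callback lets an external backend decide where OpenBLAS's parallel jobs run: OpenBLAS hands over a batch of job descriptors plus a function that executes one of them. The minimal sketch below is a serial backend that simply runs every job in order. The two typedefs are an assumption meant to mirror the declarations in OpenBLAS's cblas.h and should be checked against the installed header; `serial_threads_callback` and `demo_job` are hypothetical names, and `demo_job` exists only so the backend can be exercised without linking OpenBLAS.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror the callback typedefs in OpenBLAS's cblas.h (0.3.28);
   check them against the header you actually compile with. */
typedef void (*openblas_dojob_callback)(int thread_num, void *jobdata,
                                        int dojob_data);
typedef void (*openblas_threads_callback)(int sync,
                                          openblas_dojob_callback dojob,
                                          int numjobs, size_t jobdata_elsize,
                                          void *jobdata, int dojob_data);

/* Serial backend: runs every job on the calling thread, in order.
   jobdata points to numjobs consecutive elements of jobdata_elsize bytes. */
void serial_threads_callback(int sync, openblas_dojob_callback dojob,
                             int numjobs, size_t jobdata_elsize,
                             void *jobdata, int dojob_data) {
    (void)sync; /* serial execution finishes every job before returning */
    assert(dojob != NULL && numjobs >= 0);
    char *base = (char *)jobdata;
    for (int i = 0; i < numjobs; i++) {
        dojob(i, base + (size_t)i * jobdata_elsize, dojob_data);
    }
}

/* Hypothetical job, only for exercising the backend without OpenBLAS:
   each element is an int that receives its job index plus dojob_data. */
void demo_job(int thread_num, void *jobdata, int dojob_data) {
    int *value = (int *)jobdata;
    *value += thread_num + dojob_data;
}
```

With OpenBLAS linked, registering the backend would presumably be a single call such as `openblas_set_threads_callback_function(serial_threads_callback);` before the first BLAS call. A TBB or OpenMP backend would replace the loop with a parallel one; how it should treat the `sync` argument, which this sketch ignores, is not covered by the changelog entry.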

x86-64:
- reverted thread management under Windows to its state before 0.3.26
  due to signs of race conditions in some circumstances now under study
- fixed accidental selection of the unoptimized generic SBGEMM kernel
  in CMAKE builds for CooperLake and SapphireRapids targets
- fixed a potential thread buffer overrun in SBSTOBF16 on small systems
- fixed an accuracy issue in ZSCAL introduced in 0.3.26
- fixed compilation with CMAKE and recent releases of LLVM
- added support for Intel Emerald Rapids and Meteor Lake cpus
- added autodetection support for the Zhaoxin KX-7000 cpu
- fixed autodetection of Intel Prescott (probably broken since 0.3.19)
- fixed compilation for older targets with the Yocto SDK
- fixed compilation of the converter-generated C versions
  of the LAPACK sources with gcc-14
- improved compiler options when building with CMAKE and LLVM for
  AVX512-capable targets
- added support for supplying the L2 cache size via an environment
  variable (OPENBLAS_L2_SIZE) in case it is not correctly reported
  (as in some VM configurations)
- improved the error message shown when thread creation fails on startup
- fixed setting the rpath entry of the dylib in CMAKE builds on MacOS
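
The OPENBLAS_L2_SIZE entry follows the usual "environment override with fallback" pattern: use the value the user supplies if it is valid, otherwise keep what detection reported. The minimal sketch below shows that pattern only. `l2_size_with_override` is hypothetical, and since the entry does not state the unit or validation rules OpenBLAS applies, the sketch passes the value through unconverted, assuming it is in the same unit as `detected`.

```c
#define _POSIX_C_SOURCE 200112L /* exposes setenv/unsetenv for experiments */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: prefer a positive integer from OPENBLAS_L2_SIZE over
   the detected value. The value is returned unconverted, i.e. it is assumed
   to be in the same unit as `detected`. */
long l2_size_with_override(long detected) {
    const char *env = getenv("OPENBLAS_L2_SIZE");
    if (env == NULL || *env == '\0') {
        return detected; /* no override requested */
    }
    char *end = NULL;
    errno = 0;
    long value = strtol(env, &end, 10);
    if (errno != 0 || *end != '\0' || value <= 0) {
        return detected; /* malformed, out of range or non-positive */
    }
    return value;
}
```

In practice the variable would be set in the environment of the process before it starts, for example in the shell that launches the program inside the affected VM, rather than from C code.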

arm:
- fixed building for baremetal targets with make

arm64:
- Added a fast path forwarding SGEMM and DGEMM calls with a 1xN or Mx1
  matrix to the corresponding GEMV kernel
- added optimized SGEMV and DGEMV kernels for A64FX
- added optimized SVE kernels for small-matrix GEMM
- added A64FX to the cpu list for DYNAMIC_ARCH
- fixed building with support for cpu affinity
- worked around accuracy problems with C/ZNRM2 on NeoverseN1 and
  Apple M targets
- improved GEMM performance on Neoverse V1
- fixed compilation for NEOVERSEN2 with older compilers
- fixed potential miscompilation of the SVE SDOT and DDOT kernels
- fixed potential miscompilation of the non-SVE CDOT and ZDOT kernels
- fixed a potential overflow when using very large user-defined BUFFERSIZE
- fixed setting the rpath entry of the dylib in CMAKE builds on MacOS
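
When one dimension of the result C is 1, GEMM degenerates into a matrix-vector product: an Mx1 result is alpha*A*b + beta*c, and a 1xN result is the transpose of alpha*B^T*a^T + beta*c^T. The minimal column-major sketch below shows such a fast path. It reads "1xN or Mx1 matrix" as the shape of C (an assumption), omits the transpose options of A and B, and uses hypothetical names `dgemm_with_fast_path` and `dgemv_ref`, with naive loops standing in for the optimized kernels.

```c
#include <assert.h>
#include <stddef.h>

/* y := alpha*op(M)*x + beta*y for a column-major rows x cols matrix M,
   where op(M) = M if trans == 0 and op(M) = M^T otherwise. */
static void dgemv_ref(int trans, int rows, int cols, double alpha,
                      const double *M, int ldm, const double *x, int incx,
                      double beta, double *y, int incy) {
    int ylen = trans ? cols : rows;
    int xlen = trans ? rows : cols;
    for (int i = 0; i < ylen; i++) {
        double sum = 0.0;
        for (int p = 0; p < xlen; p++) {
            double mip = trans ? M[p + (size_t)i * ldm] : M[i + (size_t)p * ldm];
            sum += mip * x[(size_t)p * incx];
        }
        y[(size_t)i * incy] = alpha * sum + beta * y[(size_t)i * incy];
    }
}

/* C := alpha*A*B + beta*C with A m x k, B k x n, C m x n, all column-major.
   Unlike reference BLAS, beta == 0 is not special-cased here. */
void dgemm_with_fast_path(int m, int n, int k, double alpha,
                          const double *A, int lda, const double *B, int ldb,
                          double beta, double *C, int ldc) {
    assert(m >= 0 && n >= 0 && k >= 0);
    if (n == 1) {
        /* C is m x 1: C = alpha*A*b + beta*C, a non-transposed GEMV. */
        dgemv_ref(0, m, k, alpha, A, lda, B, 1, beta, C, 1);
        return;
    }
    if (m == 1) {
        /* C is 1 x n: C^T = alpha*B^T*a^T + beta*C^T, a transposed GEMV on B
           whose x is the single row of A (stride lda) and whose y is the
           single row of C (stride ldc). */
        dgemv_ref(1, k, n, alpha, B, ldb, A, lda, beta, C, ldc);
        return;
    }
    /* General case: a naive triple loop stands in for the blocked GEMM kernel. */
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            double sum = 0.0;
            for (int p = 0; p < k; p++) {
                sum += A[i + (size_t)p * lda] * B[p + (size_t)j * ldb];
            }
            C[i + (size_t)j * ldc] = alpha * sum + beta * C[i + (size_t)j * ldc];
        }
    }
}
```

The point of the fast path is that a GEMV kernel streams through the matrix once, whereas a blocked GEMM kernel pays packing and tiling overhead that a single row or column of output cannot amortize.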

power:
- Added a fast path forwarding SGEMM and DGEMM calls with a 1xN or Mx1
  matrix to the corresponding GEMV kernel
- significantly improved performance of SBGEMM on POWER10
- fixed compilation with OpenMP and the XLF compiler
- fixed building of the BLAS extension utests under AIX
- fixed building of parts of the LAPACK testsuite with XLF
- fixed CSWAP/ZSWAP on big-endian POWER10 targets
- fixed a performance regression in SAXPY on POWER10 with OpenXL
- fixed accuracy issues in CSCAL/ZSCAL when compiled with LLVM
- fixed building for POWER9 under FreeBSD
- fixed a potential overflow when using very large user-defined BUFFERSIZE
- fixed an accuracy issue in the POWER6 kernels for GEMM and GEMV

riscv64:
- Added a fast path forwarding SGEMM and DGEMM calls with a 1xN or Mx1
  matrix to the corresponding GEMV kernel
- fixed building for RISCV64_GENERIC with OpenMP enabled
- added DYNAMIC_ARCH support (comprising RISCV64_GENERIC and the two
  RVV 1.0 targets with vector lengths of 128 and 256 bits)
- worked around the ZVL128B kernels for AXPBY mishandling the special
  case of zero Y increment
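
AXPBY (y := alpha*x + beta*y) is naturally defined by a sequential loop. With a zero Y increment every step reads and writes the same y element, so the result is a running recurrence rather than n independent updates, and a vector kernel that updates several y lanes at once cannot treat them as distinct. The minimal reference-style sketch below shows those semantics, assuming non-negative strides; `daxpby_ref` is a hypothetical name, and the sketch is not the workaround OpenBLAS applied.

```c
#include <assert.h>

/* Reference-style AXPBY, y := alpha*x + beta*y, evaluated element by element
   in index order. With incy == 0 every step reads and writes y[0], giving the
   recurrence y = alpha*x[i] + beta*y instead of n independent updates. */
void daxpby_ref(int n, double alpha, const double *x, int incx,
                double beta, double *y, int incy) {
    assert(incx >= 0 && incy >= 0); /* negative strides left out of this sketch */
    for (int i = 0; i < n; i++) {
        double *yi = y + (long)i * incy;
        *yi = alpha * x[(long)i * incx] + beta * *yi;
    }
}
```

For example, with x = {1, 2, 3}, alpha = beta = 1, y = 0 and incy = 0, the single y element ends up as 6, the sum accumulated step by step.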
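
DYNAMIC_ARCH builds the kernels for several targets into one library and picks a set at runtime from what the CPU supports. The minimal sketch below shows that dispatch pattern for the riscv64 case, with a table ordered from most to least demanding target. The kernel signature, the two stand-in kernels and `select_target` are hypothetical, the target names are taken from the entries above only as labels, and `vlen_bits` is a plain parameter standing in for real CPU detection.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel signature shared by every target in the table. */
typedef double (*dsum_kernel)(size_t n, const double *x);

/* Plain loop, standing in for the generic (scalar) kernel. */
static double dsum_generic(size_t n, const double *x) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += x[i];
    return s;
}

/* Four independent accumulators, standing in for a vector kernel. */
static double dsum_unrolled(size_t n, const double *x) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]; s1 += x[i + 1]; s2 += x[i + 2]; s3 += x[i + 3];
    }
    for (; i < n; i++) s0 += x[i];
    return (s0 + s1) + (s2 + s3);
}

typedef struct {
    const char *name;       /* target label, for diagnostics */
    unsigned min_vlen_bits; /* smallest vector length the kernels require */
    dsum_kernel dsum;
} target_kernels;

/* Ordered from most to least demanding; the last entry always matches. */
static const target_kernels targets[] = {
    {"RISCV64_ZVL256B", 256, dsum_unrolled},
    {"RISCV64_ZVL128B", 128, dsum_unrolled},
    {"RISCV64_GENERIC", 0, dsum_generic},
};

/* vlen_bits would come from runtime CPU detection (0 = no usable RVV 1.0). */
const target_kernels *select_target(unsigned vlen_bits) {
    size_t count = sizeof targets / sizeof targets[0];
    assert(count > 0);
    for (size_t i = 0; i < count; i++) {
        if (vlen_bits >= targets[i].min_vlen_bits) return &targets[i];
    }
    return &targets[count - 1]; /* not reached: the generic entry needs 0 bits */
}
```

Selecting the whole table once at startup, rather than testing CPU features inside every call, keeps the per-call overhead to a single indirect function call.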

loongarch64:
- improved GEMM performance on servers of the 3C5000 generation
- improved performance and stability of DGEMM
- improved GEMV and TRSM kernels for LSX and LASX vector ABIs
- fixed CMAKE compilation with the INTERFACE64 option set
- fixed compilation with CMAKE
- worked around spurious errors flagged by the BLAS3 tests
- worked around a miscompilation of the POTRS utest by gcc 14.1

mips64:
- fixed ASUM and SUM kernels to accept negative step sizes in X
- fixed complex GEMV kernels for MSA

====================================================================
Version 0.3.27
4-Apr-2024