Update Changelog.txt for 0.3.27
parent f5e5109318
commit c5184078b4
100
Changelog.txt
@@ -1,4 +1,104 @@
OpenBLAS ChangeLog
====================================================================
Version 0.3.27
4-Apr-2024

general:
- added initial (generic) support for the CSKY architecture
- capped the maximum number of threads used in GEMM, GETRF and POTRF to avoid creating
  underutilized or idle threads
- sped up multithreaded POTRF on all platforms
- added extension openblas_set_num_threads_local() that returns the previous thread count
- re-evaluated the SGEMV and DGEMV load thresholds to avoid activating multithreading
  for too small workloads
- improved the fallback code used when the precompiled number of threads is exceeded,
  and made it callable multiple times during the lifetime of an instance
- added CBLAS interfaces for the BLAS extensions ?AMIN, ?AMAX, CAXPYC and ZAXPYC
- fixed a potential buffer overflow in the interface to the GEMMT kernels
- fixed use of incompatible pointer types in GEMMT and C/ZAXPBY as flagged by GCC-14
- fixed unwanted case sensitivity of the character parameters in ?TRTRS
- sped up the OpenMP thread management code
- fixed sizing of logical variables in INTERFACE64 builds of the C version of LAPACK
- fixed inclusion of new LAPACK and LAPACKE functions from LAPACK 3.11 in the shared library
- added a testsuite for the BLAS extensions
- modified the error thresholds for SGS/DGS functions in the LAPACK testsuite to suppress
  spurious errors
- added support for building the benchmark collection with CMAKE
- added rewriting of linker options to avoid linking both libgomp and libomp in CMAKE builds
  with OpenMP enabled that use clang with gfortran
- fixed building on systems with uClibc
- added support for calling ?NRM2 with a negative increment value on all architectures
- added support for the LLVM18 version of the flang-new compiler
- fixed handling of the OPENBLAS_LOOPS variable in several benchmarks
- Integrated fixes from the Reference-LAPACK project:
  - Increased accuracy in C/ZLARFGP (Reference-LAPACK PR 981)
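The ?NRM2 entry in the list above concerns negative increment values. As a rough illustration only, the following pure-Python sketch (not OpenBLAS code) assumes the Reference BLAS convention that a negative increment visits the same n elements in reverse order, so the norm equals the positive-increment result. The name nrm2_ref is hypothetical, and zero increments are deliberately left out.

```python
import math


def nrm2_ref(n, x, incx):
    """Euclidean norm of n elements of x taken with stride incx.

    Hypothetical reference helper, not an OpenBLAS symbol.
    """
    if n <= 0:
        return 0.0
    if incx == 0:
        raise ValueError("zero increment is outside the scope of this sketch")
    # Assumed convention: with a negative increment, start at the far end
    # of the strided range and walk backwards over the same elements.
    start = 0 if incx > 0 else (n - 1) * (-incx)
    picked = [x[start + i * incx] for i in range(n)]
    # Scale by the largest magnitude to limit overflow in the squares.
    scale = max(abs(v) for v in picked)
    if scale == 0.0:
        return 0.0
    return scale * math.sqrt(sum((v / scale) ** 2 for v in picked))


data = [3.0, 99.0, 4.0]
print(nrm2_ref(2, data, 2))   # uses elements 3.0 and 4.0
print(nrm2_ref(2, data, -2))  # same elements, visited in reverse
```

Under this assumption both calls return the same value, which is the behaviour a caller would expect from a negative stride.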
x86:
- fixed handling of NaN and Inf arguments in ZSCAL
- fixed GEMM3M functions failing in CMAKE builds

x86-64:
- removed all instances of sched_yield() on Linux and BSD
- fixed a potential deadlock in the thread server on MSWindows (introduced in 0.3.26)
- fixed GEMM3M functions failing in CMAKE builds
- fixed handling of NaN and Inf arguments in ZSCAL
- added compiler checks for AVX512BF16 compatibility
- fixed LLVM compiler options for Sapphire Rapids
- fixed cpu handling fallbacks for Sapphire Rapids with
  disabled AVX2 in DYNAMIC_ARCH mode
- fixed extensions SCSUM and DZSUM
- improved GEMM performance for ZEN targets

arm:
- fixed handling of NaN and Inf arguments in ZSCAL

arm64:
- added initial support for the Cortex-A76 cpu
- fixed handling of NaN and Inf arguments in ZSCAL
- fixed default compiler options for gcc (-march and -mtune)
- added support for ArmCompilerForLinux
- added support for the NeoverseV2 cpu in DYNAMIC_ARCH builds
- fixed mishandling of the INTERFACE64 option in CMAKE builds
- corrected SCSUM kernels (erroneously duplicating SCASUM behaviour)
- added SVE-enabled kernels for CSUM/ZSUM
- worked around an inaccuracy in the NRM2 kernels for NeoverseN1 and Apple M
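The SCSUM correction listed for arm64 depends on the difference between two reductions over a complex vector. The sketch below is a pure-Python illustration, assuming that SCASUM adds |re| + |im| per element (as in standard BLAS) and that the SCSUM extension adds the signed real and imaginary parts. The helper names are hypothetical and are not OpenBLAS symbols.

```python
def scasum_ref(x):
    """Sum of |re| + |im| over all elements (assumed SCASUM semantics)."""
    return sum(abs(z.real) + abs(z.imag) for z in x)


def scsum_ref(x):
    """Sum of re + im over all elements, signs kept (assumed SCSUM semantics)."""
    return sum(z.real + z.imag for z in x)


x = [complex(1.0, -2.0), complex(-3.0, 4.0)]
print(scasum_ref(x))  # 10.0
print(scsum_ref(x))   # 0.0
```

A kernel that returned the first value where the second was requested would only be noticed on inputs with negative components, which is one reason a dedicated testsuite for the extensions is useful.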
power:
- improved performance of SGEMM on POWER8/9/10
- improved performance of DGEMM on POWER10
- added support for OpenMP builds with xlc/xlf on AIX
- improved cpu autodetection for DYNAMIC_ARCH builds on older AIX
- fixed cpu core counting on AIX
- added support for building a shared library on AIX

riscv64:
- added support for the X280 cpu
- added support for semi-generic RISCV models with vector length 128 or 256
- added support for compiling with either RVV 0.7.1 or RVV 1.0 standard compilers
- fixed handling of NaN and Inf arguments in ZSCAL
- improved cpu model autodetection
- fixed corner cases in ?AXPBY for C910V
- fixed handling of zero increments in ?AXPY kernels for C910V

loongarch64:
- added optimized kernels for ?AMIN and ?AMAX
- fixed handling of NaN and Inf arguments in ZSCAL
- fixed handling of corner cases in ?AXPBY
- fixed computation of SAMIN and DAMIN in LSX mode
- fixed computation of ?ROT
- added optimized SSYMV and DSYMV kernels for LSX and LASX mode
- added optimized CGEMM and ZGEMM kernels for LSX and LASX mode
- added optimized CGEMV and ZGEMV kernels

mips:
- fixed utilizing MSA on P5600 and related cpus (broken in 0.3.22)
- fixed handling of NaN and Inf arguments in ZSCAL
- fixed mishandling of the INTERFACE64 option in CMAKE builds

zarch:
- fixed handling of NaN and Inf arguments in ZSCAL
- fixed calculation of ?SUM on Z13
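Most architecture sections above list a ZSCAL fix for NaN and Inf arguments without describing it. The following pure-Python sketch is only an assumption about the kind of problem such inputs can cause, not a description of the actual patches: a scaling routine that writes zeros whenever alpha is zero hides NaN and Inf values that plain IEEE 754 multiplication would propagate. Both function names are hypothetical.

```python
import math


def zscal_ieee(alpha, x):
    """Scale each (re, im) pair by alpha with plain IEEE arithmetic."""
    ar, ai = alpha
    return [(ar * xr - ai * xi, ar * xi + ai * xr) for xr, xi in x]


def zscal_zero_shortcut(alpha, x):
    """Hypothetical shortcut: write zeros for alpha == 0 without reading x."""
    if alpha == (0.0, 0.0):
        return [(0.0, 0.0) for _ in x]
    return zscal_ieee(alpha, x)


vec = [(math.nan, 1.0), (math.inf, 0.0)]
zero = (0.0, 0.0)
print(zscal_ieee(zero, vec))           # NaN survives, since 0 * NaN and 0 * Inf are NaN
print(zscal_zero_shortcut(zero, vec))  # special values silently replaced by zeros
```

Which of the two results a given BLAS implementation aims for is a design decision; the sketch only shows that the two strategies disagree on such inputs.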
====================================================================
Version 0.3.26
2-Jan-2024