6685 Commits

Author SHA1 Message Date
Martin Kroeker
d39978cd7f Fix includes 2022-10-30 12:53:19 +01:00
Martin Kroeker
ce7ea72de1 Fix include paths 2022-10-30 12:50:51 +01:00
Martin Kroeker
3ebf5d219d handle INCLUDE_ALL and optional function prefixes 2022-10-30 12:49:07 +01:00
Martin Kroeker
a082d54035 Rename to avoid conflict with OpenBLAS' toplevel config.h 2022-10-30 12:47:01 +01:00
Martin Kroeker
eeebaf2294 move INCLUDE_ALL to (c)make options 2022-10-30 12:45:54 +01:00
Martin Kroeker
06b022b139 Fix ReLAPACK source selection 2022-10-30 12:42:36 +01:00
Martin Kroeker
03bd1157d8 Merge pull request #3793 from imzhuhl/new_sbgemm
New sbgemm implementation for Neoverse N2
2022-10-30 12:09:46 +01:00
Honglin Zhu
79066b6bf3 Change file name to match the norm and delete useless code. 2022-10-28 17:09:39 +08:00
Honglin Zhu
4989e039a5 Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build 2022-10-27 14:10:26 +08:00
Honglin Zhu
843e9fd0b9 Fix typo error 2022-10-26 17:06:33 +08:00
Honglin Zhu
b00d5b9746 New sbgemm implementation for Neoverse N2
1. Use UZP instructions instead of gather-load and scatter-store instructions to get lower latency.
2. Pad k to a power of 4.
2022-10-26 15:09:41 +08:00
Martin Kroeker
8c10f0abba Merge pull request #3794 from bartoldeman/benchmark-align-malloc
Benchmarks: align malloc'ed buffers.
2022-10-21 16:13:58 +02:00
Bart Oldeman
9e6b060bf3 Fix comment.
It stores the pointer, not an offset (that would be an alternative approach).
2022-10-20 20:11:09 -04:00
Bart Oldeman
9959a60873 Benchmarks: align malloc'ed buffers.
Benchmarks should allocate with cacheline (often 64 bytes) alignment
to avoid unreliable timings. This technique, which stores the offset in
the byte before the pointer, avoids C11's aligned_alloc and so stays
compatible with older compilers.

For example, Glibc's x86_64 malloc returns 16-byte aligned buffers, which is
not sufficient for AVX/AVX2 (32-byte preferred) or AVX512 (64-byte).
2022-10-20 13:28:20 -04:00
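The alignment technique described in this commit can be sketched as follows. This is a minimal illustration, not the code from the benchmark sources: the names `aligned_malloc_sketch`, `aligned_free_sketch`, and `CACHELINE` are hypothetical, and the 64-byte line size is an assumption. Following the correction in 9e6b060bf3, the sketch stores the pointer returned by malloc (not an offset) just below the aligned address.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHELINE 64 /* assumed cache-line size in bytes; must be a power of two */

/* Hypothetical helper: over-allocate, round up to a CACHELINE boundary,
 * and stash the pointer returned by malloc just below the aligned address
 * so that it can be recovered when freeing. */
static void *aligned_malloc_sketch(size_t size)
{
    assert((CACHELINE & (CACHELINE - 1)) == 0);

    /* Guard against overflow of the padded request. */
    if (size > SIZE_MAX - CACHELINE - sizeof(void *))
        return NULL;

    void *raw = malloc(size + CACHELINE + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* Leave room for one stored pointer, then round up. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + CACHELINE - 1) & ~(uintptr_t)(CACHELINE - 1);

    ((void **)aligned)[-1] = raw; /* remember the original allocation */
    return (void *)aligned;
}

/* Hypothetical counterpart: free the original allocation. */
static void aligned_free_sketch(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

A buffer obtained this way must be released with the matching free helper, never with plain `free`, because the aligned address is generally not the one malloc returned.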
Martin Kroeker
ad424fce08 Merge pull request #3791 from martin-frbg/issue3790
Fix pkgconfig file generation for INTERFACE64 builds
2022-10-19 07:11:33 +02:00
Martin Kroeker
5f72415f10 Suffix the pkgconfig file itself in INTERFACE64 builds 2022-10-18 20:29:24 +02:00
Martin Kroeker
747ade5adf fix INTERFACE64/USE64BITINT reporting 2022-10-18 17:28:07 +02:00
Martin Kroeker
8bacea1254 Pass libsuffix to openblas.pc and fix passing of INTERFACE64/USE64BITINT flag 2022-10-18 16:18:29 +02:00
Martin Kroeker
b2523471c9 Add libsuffix support 2022-10-18 16:16:26 +02:00
Martin Kroeker
11b2570c13 Merge pull request #3786 from martin-frbg/issue3784
Disable the gfortran tree vectorizer for lapack-netlib
2022-10-13 18:34:28 +02:00
Martin Kroeker
ab6009b0b6 Merge pull request #3773 from staticfloat/sf/openblas_default_num_threads
Add `OPENBLAS_DEFAULT_NUM_THREADS`
2022-10-13 14:15:14 +02:00
Martin Kroeker
32566bfb44 Disable the gfortran tree vectorizer for netlib LAPACK 2022-10-13 14:04:25 +02:00
Martin Kroeker
57809526c4 Disable the gfortran tree vectorizer for lapack-netlib 2022-10-13 09:12:23 +02:00
Martin Kroeker
eece0dfd14 Merge pull request #3781 from martin-frbg/issue3779
Fix building with only a subset of variable types on Windows
2022-10-01 19:26:09 +02:00
Martin Kroeker
db50ab4a72 Add BUILD_vartype defines 2022-10-01 15:14:51 +02:00
Martin Kroeker
a84a8a7096 Merge pull request #3778 from martin-frbg/issue3775
Fix misdetection of gfortran on Cray systems
2022-10-01 15:12:40 +02:00
Martin Kroeker
79d842047a Move Cray case after GNU as Cray builds of gfortran have both names in the version string 2022-09-30 11:58:15 +02:00
Martin Kroeker
5e78493d95 Move Cray case after GNU as Cray builds of gfortran have both names in the version string 2022-09-30 11:55:56 +02:00
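The ordering issue behind these two commits can be illustrated with a small sketch. It is not the actual detection script: the enum, the function name `classify_fortran`, and the idea of matching plain substrings are assumptions made for illustration. The point is only that when a version string contains both names, the GNU test has to come first.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical vendor tags for illustration only. */
enum fc_vendor { FC_UNKNOWN, FC_GNU, FC_CRAY };

/* Classify a Fortran compiler from its version string.
 * GNU is tested before Cray because a Cray build of gfortran
 * may carry both names in the same string. */
static enum fc_vendor classify_fortran(const char *version)
{
    assert(version != NULL);
    if (strstr(version, "GNU") != NULL)
        return FC_GNU;
    if (strstr(version, "Cray") != NULL)
        return FC_CRAY;
    return FC_UNKNOWN;
}
```

With the two tests swapped, a string naming both vendors would be classified as Cray, which is the misdetection the commit message describes.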
Elliot Saba
d2ce93179f Add OPENBLAS_DEFAULT_NUM_THREADS
This allows Julia to set a default number of threads (usually `1`) to be
used when no other thread counts are specified [0], to short-circuit the
default OpenBLAS thread initialization routine that spins up a different
number of threads than Julia would otherwise choose.

The reason to add a new environment variable is that we want to be able
to configure OpenBLAS to avoid performing its initial memory
allocation/thread startup, as that can consume significant amounts of
memory, but we still want to be sensitive to legacy codebases that set
things like `OMP_NUM_THREADS` or `GOTOBLAS_NUM_THREADS`.  Creating a new
environment variable that is openblas-specific and is not already
publicly used to control the overall number of threads of programs like
Julia seems to be the best way forward.

[0] https://github.com/JuliaLang/julia/pull/46844
2022-09-30 01:21:44 +00:00
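The fallback behaviour described in this commit message can be sketched as a priority lookup. This is an assumption-laden illustration, not OpenBLAS's initialization code: the function names, the `MAX_THREADS_SKETCH` cap, and the exact order among the explicit variables are hypothetical. The only property taken from the message is that `OPENBLAS_DEFAULT_NUM_THREADS` is consulted last, when no other thread count is specified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_THREADS_SKETCH 4096 /* hypothetical sanity cap */

/* Parse a positive thread count; return 0 if unset or invalid. */
static int parse_count(const char *s)
{
    if (s == NULL || *s == '\0')
        return 0;
    char *end;
    long v = strtol(s, &end, 10);
    if (*end != '\0' || v <= 0 || v > MAX_THREADS_SKETCH)
        return 0;
    return (int)v;
}

/* Return the first valid count from values[], which is given in
 * priority order; fall back to `fallback` if none is usable. */
static int resolve_thread_count(const char *const values[], size_t n, int fallback)
{
    assert(fallback > 0);
    for (size_t i = 0; i < n; i++) {
        int v = parse_count(values[i]);
        if (v > 0)
            return v;
    }
    return fallback;
}

/* Assumed lookup order: explicit settings first, the new default last.
 * The variable names come from the commit message, except
 * OPENBLAS_NUM_THREADS, which is added here as an assumption. */
static int thread_count_from_env(int ncpu)
{
    const char *const values[] = {
        getenv("OPENBLAS_NUM_THREADS"),
        getenv("GOTOBLAS_NUM_THREADS"),
        getenv("OMP_NUM_THREADS"),
        getenv("OPENBLAS_DEFAULT_NUM_THREADS"),
    };
    return resolve_thread_count(values, sizeof values / sizeof values[0], ncpu);
}
```

Under this scheme a launcher such as Julia could export `OPENBLAS_DEFAULT_NUM_THREADS=1` without overriding a user who has already set `OMP_NUM_THREADS`.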
Martin Kroeker
8e851160d7 Merge pull request #3772 from siko1056/develop
Support CONSISTENT_FPCSR on aarch64 systems
2022-09-29 20:22:50 +02:00
Martin Kroeker
cf132deb14 Merge pull request #3774 from sashashura/patch-1
GitHub Workflows security hardening
2022-09-29 18:49:50 +02:00
Martin Kroeker
6077d81161 Merge pull request #3777 from martin-frbg/fixmips64generic2
Fix MIPS64_GENERIC copyobj declarations for DYNAMIC_ARCH
2022-09-29 13:50:59 +02:00
Martin Kroeker
f6f35a4288 fix copyobj declarations to work with DYNAMIC_ARCH 2022-09-29 08:47:14 +02:00
Alex
c726604319 build: harden dynamic_arch.yml permissions
Signed-off-by: Alex <aleksandrosansan@gmail.com>
2022-09-26 13:48:11 +02:00
Alex
4de8e1b8f9 build: harden mips64.yml permissions
Signed-off-by: Alex <aleksandrosansan@gmail.com>
2022-09-26 13:47:15 +02:00
Alex
11cd108095 build: harden nightly-Homebrew-build.yml permissions
Signed-off-by: Alex <aleksandrosansan@gmail.com>
2022-09-26 13:46:34 +02:00
Kai T. Ohlhus
c2892f0e31 Makefile.rule: update CONSISTENT_FPCSR documentation 2022-09-22 00:25:13 +09:00
Kai T. Ohlhus
84453b924f Support CONSISTENT_FPCSR on AARCH64 2022-09-22 00:20:40 +09:00
Martin Kroeker
667d0e0b48 Merge pull request #3771 from martin-frbg/fixmips64generic
Add KERNEL file for MIPS64_GENERIC as a copy of GENERIC
2022-09-19 18:58:14 +02:00
Martin Kroeker
b1d69fb3ac Add MIPS64_GENERIC as a copy of GENERIC 2022-09-17 23:52:32 +02:00
Martin Kroeker
63d063cb6d Merge pull request #3769 from XiWeiGu/mips64-test
[WIP,Testing]: Add test for mips64
2022-09-17 23:48:53 +02:00
gxw
edea1bcfaf MIPS64: Fixed failed utest dsdot:dsdot_n_1 when TARGET=I6500 2022-09-17 16:43:22 +08:00
gxw
548a11b9d9 [WIP,Testing]: Add test for mips64 2022-09-16 09:23:01 +08:00
Martin Kroeker
47120f20ca Merge pull request #3768 from martin-frbg/fixwarnings
Fix some warnings in x86_64 kernels
2022-09-15 13:26:21 +02:00
Martin Kroeker
101a2c77c3 Fix warnings 2022-09-15 09:19:19 +02:00
Martin Kroeker
7ee3cab4ff Merge pull request #3767 from martin-frbg/decl_adaptive
Fix missing external declaration of openblas_omp_adaptive_env()
2022-09-15 07:20:07 +02:00
Martin Kroeker
9402df5604 Fix missing external declaration 2022-09-14 21:44:34 +02:00
Martin Kroeker
dd846e72ed Merge pull request #3766 from martin-frbg/issue3640
Add (minimal) initial support for processing with the Emscripten Javascript converter
2022-09-14 20:03:57 +02:00
Martin Kroeker
b285307e18 Add a kludge for the Emscripten js converter 2022-09-14 17:05:24 +02:00
Martin Kroeker
9773a9d6b3 undefine YIELDING for the Emscripten js converter 2022-09-14 17:04:11 +02:00