Commit Graph

5918 Commits

Author SHA1 Message Date
Wangyang Guo
a87736346f Small Matrix: skylakex: sgemm nn: add n6 to improve performance 2021-08-02 07:06:54 +00:00
Wangyang Guo
4c9d9940fd Small Matrix: skylakex: sgemm nn: reduce store 4 N at a time 2021-08-02 07:06:54 +00:00
Wangyang Guo
13b32f69b7 Small Matrix: skylakex: sgemm nn: reduce store 4 M at a time 2021-08-02 07:06:54 +00:00
Wangyang Guo
3d8c6d9607 Small Matrix: skylakex: sgemm nn: clean up unused code 2021-08-02 07:06:54 +00:00
Wangyang Guo
49b61a3f30 Small Matrix: skylakex: sgemm_nn: optimize for M <= 8 2021-08-02 07:06:54 +00:00
Wangyang Guo
f88470323b Optimize M < 16 using AVX512 mask 2021-08-02 07:06:54 +00:00
Wangyang Guo
9186456a12 small matrix: SkylakeX: add SGEMM NN kernel 2021-08-02 07:06:54 +00:00
Xianyi Zhang
6022e5629c Refs #2587 fix small matrix c/zgemm bug. 2021-08-02 07:06:54 +00:00
Xianyi Zhang
57ed58cefe Refs #2587 Add small matrix optimization reference kernel for c/zgemm. 2021-08-02 07:06:54 +00:00
Xianyi Zhang
17d32a4a82 Change a1b0 gemm to b0 gemm. 2021-08-02 07:06:54 +00:00
Xianyi Zhang
59cb5de46b Refs #2587 Fix typos. 2021-08-02 07:06:54 +00:00
Xianyi Zhang
4271cfcc6f Fix gemm interface bug for small matrix. 2021-08-02 07:06:51 +00:00
Xianyi Zhang
be3349405d Add alpha=1.0 beta=0.0 for small gemm. 2021-08-02 07:01:47 +00:00
Xianyi Zhang
0a2077901c Add small marix optimization kernel interface.
make SMALL_MATRIX_OPT=1
2021-08-02 07:01:47 +00:00
Martin Kroeker
e6d6d3ee43 Merge pull request #3331 from gxw-loongson/develop
Fixed typos about LOONGARCH64
2021-08-02 07:21:46 +02:00
gxw
0b8f7c8c10 Add cmake support for LOONGARCH64 2021-08-02 10:00:41 +08:00
Martin Kroeker
e0e88f9edc Merge pull request #3329 from martin-frbg/issue3272
Work around gcc11+ miscompiling C/ZBLAS3 tests at -O3
2021-07-30 20:39:38 +02:00
Martin Kroeker
5dc6aa74f0 Disable gfortran tree vectorizer to avoid gcc11+ miscompilation at O3 2021-07-30 14:46:19 +02:00
Martin Kroeker
e78fbe4654 Disable gfortran tree vectorizer to avoid gcc11+ miscompilation at O3 2021-07-30 14:44:54 +02:00
Martin Kroeker
b4f4ed378b Disable gfortran tree vectorizer to avoid gcc11+ miscompilation at O3 2021-07-30 14:21:08 +02:00
Martin Kroeker
cbc41973fd Disable gfortran tree vectorizer to avoid gcc11+ miscompilation at O3 2021-07-30 14:20:12 +02:00
gxw
34207bdf5b Fixed typos about LOONGARCH64 2021-07-30 18:11:12 +08:00
Martin Kroeker
1b6db3dbba Merge pull request #3327 from h-vetinari/lapack597_redux
Complete the carry of lapack PR 597
2021-07-28 23:04:02 +02:00
Martin Kroeker
f681553c6a Merge pull request #3326 from wattoc/develop
Include Haiku in processor count checks
2021-07-28 23:03:37 +02:00
Martin Kroeker
afadeeba2a Merge pull request #3325 from gxw-loongson/develop
Add support for LOONGARCH64
2021-07-28 23:03:15 +02:00
Isuru Fernando
02d4a49761 Also make sure the 1 is INTEGER*4 for OMP_SET_NUM_THREADS 2021-07-27 23:44:51 +02:00
Craig Watson
4d7dfe4845 Include Haiku in processor count checks 2021-07-27 09:00:30 +00:00
gxw
af0a69f355 Add support for LOONGARCH64 2021-07-27 15:29:12 +08:00
Martin Kroeker
5a2fe5bfb9 Merge pull request #3323 from martin-frbg/issue3322
GCC did not support -mtune for ARM64 before 5.1
2021-07-23 22:46:02 +02:00
Martin Kroeker
342d3e8b5c Merge pull request #3314 from martin-frbg/lapack597
Fix LAPACK testsuite compatibility with libomp (Reference-LAPACK PR 597)
2021-07-23 15:30:27 +02:00
Martin Kroeker
efbd7c7840 GCC did not support -mtune for ARM64 before 5.1 2021-07-23 13:42:52 +02:00
Martin Kroeker
3a7955cd93 Merge pull request #3320 from martin-frbg/issue3318
Empirical workaround for numpy SVD NaN problem from issue 3318
2021-07-22 21:28:50 +02:00
Martin Kroeker
47ba85f314 Fix regex to match kernels suffixed with cpuname too 2021-07-22 17:24:15 +02:00
Martin Kroeker
30f23be0f9 Rework setting of -mfma to only apply it where necessary 2021-07-22 12:00:03 +02:00
Martin Kroeker
49bbf330ca Empirical workaround for numpy SVD NaN problem from issue 3318 2021-07-18 22:19:19 +02:00
Martin Kroeker
38d5b4b124 Update version to 0.3.17.dev 2021-07-15 15:00:01 +02:00
Martin Kroeker
6e3fbe8ac5 Update version to 0.3.17.dev 2021-07-15 14:59:15 +02:00
Martin Kroeker
86273392e5 Merge pull request #3317 from xianyi/release-0.3.0
merge 0.3.17 back into develop to copy tag
2021-07-15 14:58:20 +02:00
Martin Kroeker
d909f9f3d4 Update version to 0.3.17 v0.3.17 2021-07-15 14:52:54 +02:00
Martin Kroeker
12d3d94e2e Merge pull request #3316 from xianyi/develop
Merge develop for bugfix release 0.3.17
2021-07-15 14:51:50 +02:00
Martin Kroeker
f349be3bdb Merge branch 'release-0.3.0' into develop 2021-07-15 14:50:20 +02:00
Martin Kroeker
4777eb678f Update version to 0.3.17 2021-07-15 14:46:24 +02:00
Martin Kroeker
415876d117 Merge pull request #3315 from martin-frbg/changelog0317
Update Changelog for 0.3.17
2021-07-15 14:44:59 +02:00
Martin Kroeker
da8435dc36 Update Changelog for 0.3.17 2021-07-15 14:44:17 +02:00
Martin Kroeker
4c7065f3ee Merge pull request #3313 from martin-frbg/3266-2
Remove BLASLONG casts from SPARC parameter entries
2021-07-15 08:00:57 +02:00
Martin Kroeker
f62bfaafe8 Merge pull request #3312 from martin-frbg/revert_3260
Temporarily disable the SkylakeX sgemv_t microkernel
2021-07-15 08:00:34 +02:00
Martin Kroeker
d947116390 Merge pull request #3311 from martin-frbg/issue3309
Revert PR #3250 (shortcut without buffer allocation) as it is unsafe …
2021-07-15 07:58:47 +02:00
Martin Kroeker
f176ff90af Declare N_THREADS as *4 for compatibility of INTERFACE64 builds with LLVM libomp 2021-07-14 22:42:43 +02:00
Martin Kroeker
f4d4abd423 Declare N_THREADS as *4 for compatibility of INTERFACE64 builds with LLVM libomp 2021-07-14 22:41:45 +02:00
Martin Kroeker
2b9443b7e7 Declare N_THREADS as *4 for compatibility of INTERFACE64 builds with LLVM libomp 2021-07-14 22:40:29 +02:00