Wangyang Guo
|
8592c21af4
|
Small Matrix: skylakex: dgemm nn: fix typo in idx load
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
3e79f6d89a
|
Small Matrix: skylakex: add dgemm tn kernel
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
323d7da4f7
|
Small Matrix: skylakex: add dgemm tt kernel
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
f57fc932ac
|
Small Matrix: skylakex: add dgemm nt kernel
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
91ec21202b
|
Small Matrix: skylakex: add dgemm nn kernel
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
72e070539c
|
Small Matrix: skylakex: add sgemm tt kernel
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
02c6e764f2
|
Small Matrix: skylakex: add SGEMM_SMALL_M_PERMIT and tune for TN kernel
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
5dc7c3c8e5
|
Small Matrix: add GEMM_SMALL_MATRIX_PERMIT to tune small matrices case
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
642c393879
|
Small Matrix: skylakex: add sgemm tn kernel
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
ae3f5c737c
|
Small Matrix: skylakex: sgemm nt: optimize for M < 12
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
0d72d75bf9
|
Small Matrix: skylakex: add sgemm nt kernel
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
ca7682e3a3
|
Small Matrix: skylakex: sgemm nn: fix n6 conflicts with n4
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
9967e61abb
|
Small Matrix: skylakex: sgemm nn: fix error when beta not zero
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
a87736346f
|
Small Matrix: skylakex: sgemm nn: add n6 to improve performance
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
4c9d9940fd
|
Small Matrix: skylakex: sgemm nn: reduce store 4 N at a time
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
13b32f69b7
|
Small Matrix: skylakex: sgemm nn: reduce store 4 M at a time
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
3d8c6d9607
|
Small Matrix: skylakex: sgemm nn: clean up unused code
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
49b61a3f30
|
Small Matrix: skylakex: sgemm_nn: optimize for M <= 8
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
f88470323b
|
Optimize M < 16 using AVX512 mask
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
9186456a12
|
small matrix: SkylakeX: add SGEMM NN kernel
|
2021-08-02 07:06:54 +00:00 |
Xianyi Zhang
|
6022e5629c
|
Refs #2587 fix small matrix c/zgemm bug.
|
2021-08-02 07:06:54 +00:00 |
Xianyi Zhang
|
57ed58cefe
|
Refs #2587 Add small matrix optimization reference kernel for c/zgemm.
|
2021-08-02 07:06:54 +00:00 |
Xianyi Zhang
|
17d32a4a82
|
Change a1b0 gemm to b0 gemm.
|
2021-08-02 07:06:54 +00:00 |
Xianyi Zhang
|
59cb5de46b
|
Refs #2587 Fix typos.
|
2021-08-02 07:06:54 +00:00 |
Xianyi Zhang
|
4271cfcc6f
|
Fix gemm interface bug for small matrix.
|
2021-08-02 07:06:51 +00:00 |
Xianyi Zhang
|
be3349405d
|
Add alpha=1.0 beta=0.0 for small gemm.
|
2021-08-02 07:01:47 +00:00 |
Xianyi Zhang
|
0a2077901c
|
Add small matrix optimization kernel interface.
make SMALL_MATRIX_OPT=1
|
2021-08-02 07:01:47 +00:00 |
Martin Kroeker
|
e6d6d3ee43
|
Merge pull request #3331 from gxw-loongson/develop
Fixed typos about LOONGARCH64
|
2021-08-02 07:21:46 +02:00 |
gxw
|
0b8f7c8c10
|
Add cmake support for LOONGARCH64
|
2021-08-02 10:00:41 +08:00 |
Martin Kroeker
|
f2a7a67f5a
|
Improve the "tried to allocate too many buffers" error message
|
2021-07-31 17:23:40 +02:00 |
Martin Kroeker
|
e0e88f9edc
|
Merge pull request #3329 from martin-frbg/issue3272
Work around gcc11+ miscompiling C/ZBLAS3 tests at -O3
|
2021-07-30 20:39:38 +02:00 |
Martin Kroeker
|
5dc6aa74f0
|
Disable gfortran tree vectorizer to avoid gcc11+ miscompilation at O3
|
2021-07-30 14:46:19 +02:00 |
Martin Kroeker
|
e78fbe4654
|
Disable gfortran tree vectorizer to avoid gcc11+ miscompilation at O3
|
2021-07-30 14:44:54 +02:00 |
Martin Kroeker
|
b4f4ed378b
|
Disable gfortran tree vectorizer to avoid gcc11+ miscompilation at O3
|
2021-07-30 14:21:08 +02:00 |
Martin Kroeker
|
cbc41973fd
|
Disable gfortran tree vectorizer to avoid gcc11+ miscompilation at O3
|
2021-07-30 14:20:12 +02:00 |
gxw
|
34207bdf5b
|
Fixed typos about LOONGARCH64
|
2021-07-30 18:11:12 +08:00 |
Martin Kroeker
|
1b6db3dbba
|
Merge pull request #3327 from h-vetinari/lapack597_redux
Complete the carry of lapack PR 597
|
2021-07-28 23:04:02 +02:00 |
Martin Kroeker
|
f681553c6a
|
Merge pull request #3326 from wattoc/develop
Include Haiku in processor count checks
|
2021-07-28 23:03:37 +02:00 |
Martin Kroeker
|
afadeeba2a
|
Merge pull request #3325 from gxw-loongson/develop
Add support for LOONGARCH64
|
2021-07-28 23:03:15 +02:00 |
Isuru Fernando
|
02d4a49761
|
Also make sure the `1` is INTEGER*4 for OMP_SET_NUM_THREADS
|
2021-07-27 23:44:51 +02:00 |
Craig Watson
|
4d7dfe4845
|
Include Haiku in processor count checks
|
2021-07-27 09:00:30 +00:00 |
gxw
|
af0a69f355
|
Add support for LOONGARCH64
|
2021-07-27 15:29:12 +08:00 |
Martin Kroeker
|
5a2fe5bfb9
|
Merge pull request #3323 from martin-frbg/issue3322
GCC did not support -mtune for ARM64 before 5.1
|
2021-07-23 22:46:02 +02:00 |
Martin Kroeker
|
342d3e8b5c
|
Merge pull request #3314 from martin-frbg/lapack597
Fix LAPACK testsuite compatibility with libomp (Reference-LAPACK PR 597)
|
2021-07-23 15:30:27 +02:00 |
Martin Kroeker
|
efbd7c7840
|
GCC did not support -mtune for ARM64 before 5.1
|
2021-07-23 13:42:52 +02:00 |
Martin Kroeker
|
3a7955cd93
|
Merge pull request #3320 from martin-frbg/issue3318
Empirical workaround for numpy SVD NaN problem from issue 3318
|
2021-07-22 21:28:50 +02:00 |
Martin Kroeker
|
47ba85f314
|
Fix regex to match kernels suffixed with cpuname too
|
2021-07-22 17:24:15 +02:00 |
Martin Kroeker
|
30f23be0f9
|
Rework setting of -mfma to only apply it where necessary
|
2021-07-22 12:00:03 +02:00 |
Martin Kroeker
|
49bbf330ca
|
Empirical workaround for numpy SVD NaN problem from issue 3318
|
2021-07-18 22:19:19 +02:00 |
Martin Kroeker
|
38d5b4b124
|
Update version to 0.3.17.dev
|
2021-07-15 15:00:01 +02:00 |