OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Wangyang Guo	8592c21af4	Small Matrix: skylakex: dgemm nn: fix typo in idx load	2021-08-02 07:06:54 +00:00
Wangyang Guo	3e79f6d89a	Small Matrix: skylakex: add dgemm tn kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	323d7da4f7	Small Matrix: skylakex: add dgemm tt kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	f57fc932ac	Small Matrix: skylakex: add dgemm nt kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	91ec21202b	Small Matrix: skylakex: add dgemm nn kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	72e070539c	Small Matrix: skylakex: add sgemm tt kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	02c6e764f2	Small Matrix: skylakex: add SGEMM_SMALL_M_PERMIT and tune for TN kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	5dc7c3c8e5	Small Matrix: add GEMM_SMALL_MATRIX_PERMIT to tune small matrics case	2021-08-02 07:06:54 +00:00
Wangyang Guo	642c393879	Small Matrix: skylakex: add sgemm tn kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	ae3f5c737c	Small Matrix: skylakex: sgemm nt: optimize for M < 12	2021-08-02 07:06:54 +00:00
Wangyang Guo	0d72d75bf9	Small Matrix: skylakex: add sgemm nt kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	ca7682e3a3	Small Matrix: skylakex: sgemm nn: fix n6 conflicts with n4	2021-08-02 07:06:54 +00:00
Wangyang Guo	9967e61abb	Small Matrix: skylakex: sgemm nn: fix error when beta not zero	2021-08-02 07:06:54 +00:00
Wangyang Guo	a87736346f	Small Matrix: skylakex: sgemm nn: add n6 to improve performance	2021-08-02 07:06:54 +00:00
Wangyang Guo	4c9d9940fd	Small Matrix: skylakex: sgemm nn: reduce store 4 N at a time	2021-08-02 07:06:54 +00:00
Wangyang Guo	13b32f69b7	Small Matrix: skylakex: sgemm nn: reduce store 4 M at a time	2021-08-02 07:06:54 +00:00
Wangyang Guo	3d8c6d9607	Small Matrix: skylakex: sgemm nn: clean up unused code	2021-08-02 07:06:54 +00:00
Wangyang Guo	49b61a3f30	Small Matrix: skylakex: sgemm_nn: optimize for M <= 8	2021-08-02 07:06:54 +00:00
Wangyang Guo	f88470323b	Optimize M < 16 using AVX512 mask	2021-08-02 07:06:54 +00:00
Wangyang Guo	9186456a12	small matrix: SkylakeX: add SGEMM NN kernel	2021-08-02 07:06:54 +00:00
Xianyi Zhang	6022e5629c	Refs #2587 fix small matrix c/zgemm bug.	2021-08-02 07:06:54 +00:00
Xianyi Zhang	57ed58cefe	Refs #2587 Add small matrix optimization reference kernel for c/zgemm.	2021-08-02 07:06:54 +00:00
Xianyi Zhang	17d32a4a82	Change a1b0 gemm to b0 gemm.	2021-08-02 07:06:54 +00:00
Xianyi Zhang	59cb5de46b	Refs #2587 Fix typos.	2021-08-02 07:06:54 +00:00
Xianyi Zhang	4271cfcc6f	Fix gemm interface bug for small matrix.	2021-08-02 07:06:51 +00:00
Xianyi Zhang	be3349405d	Add alpha=1.0 beta=0.0 for small gemm.	2021-08-02 07:01:47 +00:00
Xianyi Zhang	0a2077901c	Add small marix optimization kernel interface. make SMALL_MATRIX_OPT=1	2021-08-02 07:01:47 +00:00
Martin Kroeker	e6d6d3ee43	Merge pull request #3331 from gxw-loongson/develop Fixed typos about LOONGARCH64	2021-08-02 07:21:46 +02:00
gxw	0b8f7c8c10	Add cmake support for LOONGARCH64	2021-08-02 10:00:41 +08:00
Martin Kroeker	f2a7a67f5a	Improve the "tried to allocate too many buffers" error message	2021-07-31 17:23:40 +02:00
Martin Kroeker	e0e88f9edc	Merge pull request #3329 from martin-frbg/issue3272 Work around gcc11+ miscompiling C/ZBLAS3 tests at -O3	2021-07-30 20:39:38 +02:00
Martin Kroeker	5dc6aa74f0	Disable gfortran tree vectorizer to avoid gcc11+ miscompilation at O3	2021-07-30 14:46:19 +02:00
Martin Kroeker	e78fbe4654	Disable gfortran tree vectorizer to avoid gcc11+ miscompilation at O3	2021-07-30 14:44:54 +02:00
Martin Kroeker	b4f4ed378b	Disable gfortran tree vectorizer to avoid gcc11+ miscompilation at O3	2021-07-30 14:21:08 +02:00
Martin Kroeker	cbc41973fd	Disable gfortran tree vectorizer to avoid gcc11+ miscompilation at O3	2021-07-30 14:20:12 +02:00
gxw	34207bdf5b	Fixed typos about LOONGARCH64	2021-07-30 18:11:12 +08:00
Martin Kroeker	1b6db3dbba	Merge pull request #3327 from h-vetinari/lapack597_redux Complete the carry of lapack PR 597	2021-07-28 23:04:02 +02:00
Martin Kroeker	f681553c6a	Merge pull request #3326 from wattoc/develop Include Haiku in processor count checks	2021-07-28 23:03:37 +02:00
Martin Kroeker	afadeeba2a	Merge pull request #3325 from gxw-loongson/develop Add support for LOONGARCH64	2021-07-28 23:03:15 +02:00
Isuru Fernando	02d4a49761	Also make sure the `1` is INTEGER*4 for OMP_SET_NUM_THREADS	2021-07-27 23:44:51 +02:00
Craig Watson	4d7dfe4845	Include Haiku in processor count checks	2021-07-27 09:00:30 +00:00
gxw	af0a69f355	Add support for LOONGARCH64	2021-07-27 15:29:12 +08:00
Martin Kroeker	5a2fe5bfb9	Merge pull request #3323 from martin-frbg/issue3322 GCC did not support -mtune for ARM64 before 5.1	2021-07-23 22:46:02 +02:00
Martin Kroeker	342d3e8b5c	Merge pull request #3314 from martin-frbg/lapack597 Fix LAPACK testsuite compatibility with libomp (Reference-LAPACK PR 597)	2021-07-23 15:30:27 +02:00
Martin Kroeker	efbd7c7840	GCC did not support -mtune for ARM64 before 5.1	2021-07-23 13:42:52 +02:00
Martin Kroeker	3a7955cd93	Merge pull request #3320 from martin-frbg/issue3318 Empirical workaround for numpy SVD NaN problem from issue 3318	2021-07-22 21:28:50 +02:00
Martin Kroeker	47ba85f314	Fix regex to match kernels suffixed with cpuname too	2021-07-22 17:24:15 +02:00
Martin Kroeker	30f23be0f9	Rework setting of -mfma to only apply it where necessary	2021-07-22 12:00:03 +02:00
Martin Kroeker	49bbf330ca	Empirical workaround for numpy SVD NaN problem from issue 3318	2021-07-18 22:19:19 +02:00
Martin Kroeker	38d5b4b124	Update version to 0.3.17.dev	2021-07-15 15:00:01 +02:00

7433 Commits
