Commit Graph

  • 4f057bffd6 Fix NULL pointer checks in blas_memory_alloc Martin Kroeker 2021-11-05 10:43:17 +01:00
  • 2c32e462ac Merge pull request #3431 from MehdiChinoune/export-shared-only Martin Kroeker 2021-11-04 23:48:02 +01:00
  • 7bcd64357d Merge pull request #3442 from martin-frbg/cpuid_x86 Martin Kroeker 2021-11-04 23:47:11 +01:00
  • 08f8bb66c0 Add CPUIDs for Alder Lake and other recent Intel cpus Martin Kroeker 2021-11-04 20:36:39 +01:00
  • faae86fba2 Add CPUIDs for Alder Lake and some other recent Intel cpus Martin Kroeker 2021-11-04 20:35:41 +01:00
  • 4ea9a14567 Merge pull request #3429 from martin-frbg/issue3428 Martin Kroeker 2021-11-04 12:13:22 +01:00
  • 3737766bdd Merge pull request #3440 from mhillenbrand/fix_gemv_indices Martin Kroeker 2021-11-04 12:11:50 +01:00
  • efb16fafb0 Fix miscounting of threadpool size on Linux with OMP_PROC_BIND=TRUE (#3437) Martin Kroeker 2021-11-04 12:11:16 +01:00
  • f119e26354 Fix flipped indices in benchmark for gemv Marius Hillenbrand 2021-11-03 12:45:09 +01:00
  • 7093372e32 add ARMV8SVE target Bine Brank 2021-11-01 22:53:21 +01:00
  • abf45f7235 Merge pull request #3427 from mhillenbrand/zarch-detection-notes Martin Kroeker 2021-11-01 21:45:33 +01:00
  • 2847a24498 Merge pull request #3434 from gxw-loongson/develop Martin Kroeker 2021-11-01 21:44:49 +01:00
  • 25f99fa9f8 Add cblas_{c/z}srot cblas_{c/z}rotg support gxw 2021-11-01 20:15:42 +08:00
  • a8fbdbac34 fix sve dgemm kernel + sve dtrmm Bine Brank 2021-10-31 10:24:25 +01:00
  • a6fd497820 Fix nvidia HPC version checks Martin Kroeker 2021-10-30 17:31:19 +02:00
  • 746b4f0f17 added SVE ncopy and tcopy Bine Brank 2021-10-30 12:11:44 +02:00
  • 9874cd11cb Fix exported OpenBLASTargets.cmake Mehdi Chinoune 2021-10-29 21:28:21 +01:00
  • bb01e26cfe Adjust compiler options for nvidia hpc 21.9 (and fix a long-standing typo in dynamic_arch settings) Martin Kroeker 2021-10-29 16:39:03 +02:00
  • 77747bc536 cpuid_zarch/hwcaps: add documentation and dump hwcaps in init Marius Hillenbrand 2021-10-27 17:26:28 +02:00
  • aa231b5875 Merge pull request #3426 from martin-frbg/pr3424 Martin Kroeker 2021-10-28 07:31:12 +02:00
  • 22a616bd8f Add model number for Tiger Lake H (mobile variant) Martin Kroeker 2021-10-27 22:17:58 +02:00
  • 1a10d3e09d add sve dgemm prototype Bine Brank 2021-10-27 16:37:18 +02:00
  • f573ccd107 Merge pull request #3424 from Neutron3529/patch-1 Martin Kroeker 2021-10-27 16:28:12 +02:00
  • dd103eac93 Merge pull request #3423 from mhillenbrand/fix-static-detection Martin Kroeker 2021-10-27 16:27:47 +02:00
  • ead476025d auto-detect for Intel i7-11800H Neutron3529 2021-10-27 14:16:37 +08:00
  • 44950ca173 s390x: use DYNAMIC_ARCH's cpu detection for compile-time choice Marius Hillenbrand 2021-10-26 15:19:49 +02:00
  • 03f1354336 Merge pull request #3422 from martin-frbg/issue3421 Martin Kroeker 2021-10-25 23:37:28 +02:00
  • 4b3769823a Revert #3252 Martin Kroeker 2021-10-24 23:57:06 +02:00
  • 059d3a04c1 Merge pull request #3420 from martin-frbg/issue3419 Martin Kroeker 2021-10-20 12:00:06 +02:00
  • 2845f54eb8 Remove dangerous optimization from previous #3252 - buffer is never unused here Martin Kroeker 2021-10-20 10:50:02 +02:00
  • c6208bbb45 Merge pull request #3418 from martin-frbg/issue2927-2 Martin Kroeker 2021-10-20 08:23:53 +02:00
  • 6975cbe1f0 Enable SVE for A64FX Martin Kroeker 2021-10-19 23:23:40 +02:00
  • 22bf5c27ba Add basic support for the Fujitsu A64FX (#3415) Martin Kroeker 2021-10-18 15:00:19 +02:00
  • 8cbf61792d Merge pull request #3416 from guowangy/spr-bf16 Martin Kroeker 2021-10-18 14:59:21 +02:00
  • 63a103ba6e sbgemm: spr: disable small matrix path by default Wangyang Guo 2021-10-12 01:18:37 -07:00
  • 82194ea9d2 sbgemm: spr: implement otcopy_16 Wangyang Guo 2021-09-23 01:08:40 -07:00
  • 8632380a96 sbgemm: spr: reuse ncopy_16 from cooperlake as incopy Wangyang Guo 2021-09-18 01:11:31 -07:00
  • 6bc8204ce5 sbgemm: spr: optimization for tmp_c buffer Wangyang Guo 2021-09-17 23:59:32 -07:00
  • f018aa342a sbgemm: spr: kernel handle alpha != 1.0 Wangyang Guo 2021-09-17 00:48:52 -07:00
  • a52456b168 sbgemm: spr: oncopy: use tile load/store instead Wangyang Guo 2021-09-16 20:08:42 -07:00
  • f2485352a6 sbgemm: spr: only load A once in tail_k handling Wangyang Guo 2021-09-16 01:04:01 -07:00
  • 9ab33228bb sbgemm: spr: process k2 and odd k at the same time Wangyang Guo 2021-09-15 23:59:38 -07:00
  • 7b2f5cb3b7 sbgemm: spr: enlarge P to 256 for performance Wangyang Guo 2021-09-15 20:29:49 -07:00
  • 10d52646e2 sbgemm: spr: oncopy: avoid handling too much pointer at a time Wangyang Guo 2021-09-15 19:36:02 -07:00
  • 88154ed02d sbgemm: spr: reduce tile conf loading by seperate tail k handling Wangyang Guo 2021-09-15 01:11:15 -07:00
  • 0abbcd19c1 sbgemm: spr: tuning for blocking params Wangyang Guo 2021-09-13 01:44:53 -07:00
  • a70bfb52d5 sbgemm: spr: kernel works for NN case when alpha is 1.0 Wangyang Guo 2021-09-12 19:22:58 -07:00
  • 6051c86741 sbgemm: spr: kernel works for m32 in NN case Wangyang Guo 2021-09-10 01:14:05 -07:00
  • d0b253ac6e sbgemm: spr: implement oncopy_16 Wangyang Guo 2021-09-08 19:41:12 -07:00
  • 1d48b7cb16 sbgemm: spr: add dummy source files Wangyang Guo 2021-09-06 19:48:23 -07:00
  • b57acdf2d3 Add march/mtune flags for clang builds on ARM64 as well (#3414) Martin Kroeker 2021-10-18 00:26:14 +02:00
  • 02ea3db8e7 Merge pull request #3404 from guowangy/spr-build Martin Kroeker 2021-10-17 23:05:11 +02:00
  • 4e4f78442e Merge pull request #3413 from MehdiChinoune/cmake-readibiltiy Martin Kroeker 2021-10-17 22:46:48 +02:00
  • 556788281d [NFC] Improve CMakeLists.txt file readibility Mehdi Chinoune 2021-10-17 05:19:30 +01:00
  • f348506463 Merge pull request #3411 from MehdiChinoune/both_shared_static Martin Kroeker 2021-10-17 20:07:14 +02:00
  • 28a77a8698 Support building both static and shared libraries Mehdi Chinoune 2021-10-16 08:33:47 +01:00
  • 481b3dc4b4 Merge pull request #3410 from MehdiChinoune/mingw-clang-64 Martin Kroeker 2021-10-16 13:52:41 +02:00
  • efd7ac241d Fix MinGW/Clang 64 bits detection. مهدي شينون (Mehdi Chinoune) 2021-10-16 07:55:10 +01:00
  • 1eca91f315 Fix build error in legacy gcc Wangyang Guo 2021-10-12 02:01:20 -07:00
  • 4280dff103 Add NO_AVX=1 fallbacks to Sapphire Rapids build Wangyang Guo 2021-10-12 01:39:09 -07:00
  • 3dc6052c7e initial support for Sapphire Rapids platform Wangyang Guo 2021-09-03 00:39:50 -07:00
  • 8a87e80c74 Update conda in Appveyor CI and move jobs from Appveyor to Azure (#3400) Martin Kroeker 2021-10-10 23:24:52 +02:00
  • b54b50fe3a Merge pull request #3399 from martin-frbg/issue2814 Martin Kroeker 2021-10-07 08:09:34 +02:00
  • 24233b7c49 Use "big arm server" GEMM defaults for Vortex Martin Kroeker 2021-10-06 11:10:19 +02:00
  • 8c20ca345a Use Neoverse's current mix of ThunderX2 kernels for Vortex as well Martin Kroeker 2021-10-06 11:06:43 +02:00
  • 8e4c209002 Merge pull request #3398 from kavanabhat/aix_p10_gnuas Martin Kroeker 2021-10-05 18:59:47 +02:00
  • e819341ec1 Merge pull request #3396 from martin-frbg/makesys_typo Martin Kroeker 2021-10-04 22:23:19 +02:00
  • 2b28b88cab Merge pull request #3397 from martin-frbg/m1detect Martin Kroeker 2021-10-04 22:22:57 +02:00
  • d7351deccf Fix cache reporting for Apple M1 Martin Kroeker 2021-10-04 17:58:29 +02:00
  • 1cce778585 Fix detection of Apple M1 "Vortex" Martin Kroeker 2021-10-04 16:46:41 +02:00
  • 04f3ecd026 Fix minor typo Martin Kroeker 2021-10-04 16:14:32 +02:00
  • dcb005351e Update version to 0.3.18.dev Martin Kroeker 2021-10-02 21:15:39 +02:00
  • 32b4d01d16 Update version to 0.3.18.dev Martin Kroeker 2021-10-02 21:15:00 +02:00
  • de459ed806 Merge pull request #3395 from xianyi/release-0.3.0 Martin Kroeker 2021-10-02 21:14:11 +02:00
  • efe42481e2 Update version to 0.3.18 (tag: v0.3.18) Martin Kroeker 2021-10-02 19:38:09 +02:00
  • c75759876c Merge pull request #3394 from xianyi/develop Martin Kroeker 2021-10-02 19:35:27 +02:00
  • 9549167357 Merge branch 'release-0.3.0' into develop Martin Kroeker 2021-10-02 19:35:03 +02:00
  • 686e1f0052 Update version to 0.3.18 Martin Kroeker 2021-10-02 19:29:59 +02:00
  • 5a468ae87a Update Changelog for 0.3.18 (#3388) Martin Kroeker 2021-10-02 19:25:58 +02:00
  • f0e8560660 Merge pull request #3393 from martin-frbg/azurealpine Martin Kroeker 2021-10-02 19:07:04 +02:00
  • ad87d62748 Update Alpine version Martin Kroeker 2021-10-02 16:27:34 +02:00
  • 4f86650979 Merge pull request #3392 from martin-frbg/lapack625 Martin Kroeker 2021-10-01 14:56:09 +02:00
  • 9cc95e5657 AIX changes for P10 with GNU Compiler kavanabhat 2021-10-01 05:18:35 -05:00
  • 337b65133d Fix out of bounds read in ?llarv (Reference-LAPACK PR 625) Martin Kroeker 2021-10-01 11:19:53 +02:00
  • ddb0ff5353 Fix out of bounds read in ?llarv (Reference-LAPACK PR 625) Martin Kroeker 2021-10-01 11:19:07 +02:00
  • fe497efa05 Fix out of bounds read in ?llarv (Reference-LAPACK PR 625) Martin Kroeker 2021-10-01 11:18:20 +02:00
  • 2be5ee3cca Fix out of bounds read in ?llarv (Reference-LAPACK PR 625) Martin Kroeker 2021-10-01 11:17:21 +02:00
  • c34e63ff2f Merge pull request #3390 from Keno/patch-4 Martin Kroeker 2021-10-01 09:11:12 +02:00
  • fe3c778c51 AIX changes for P10 with GNU Compiler kavanabhat 2021-09-30 06:06:27 -05:00
  • 2d33e12a11 Make sure that Netlib LAPACK respects FFLAGS Keno Fischer 2021-09-30 03:14:15 -04:00
  • 240555033b Merge pull request #3389 from guowangy/bf16-build-warn-fix Martin Kroeker 2021-09-28 19:44:36 +02:00
  • ee5ca8a328 x86_64: BFLOAT16: fix build warning Wangyang Guo 2021-09-28 18:22:15 +08:00
  • 9f52abf12c Merge pull request #3387 from commodo/adjust-mips-el-archs Martin Kroeker 2021-09-26 14:16:50 +02:00
  • b7bb2e36b8 Makefile.system: adjust mipsel/mips64el ARCH variables Alexandru Ardelean 2021-09-26 12:17:21 +03:00
  • e76ff6a44e Merge pull request #3385 from martin-frbg/update_readme Martin Kroeker 2021-09-19 18:16:02 +02:00
  • 5c537a5de0 Update README.md Martin Kroeker 2021-09-19 14:54:35 +02:00
  • 5edd88c919 Merge pull request #3384 from martin-frbg/issue3383 Martin Kroeker 2021-09-17 14:53:39 +02:00
  • 90cc944625 Move alphaI to x22 to leave x18 unused (reserved on OSX) Martin Kroeker 2021-09-17 09:53:18 +02:00
  • 590fbff06e move alpha to x19/x20 to leave x18 unused for OSX Martin Kroeker 2021-09-17 09:42:17 +02:00
  • 380940271b Move temp to x21 to leave x18 unused (reserved on OSX) Martin Kroeker 2021-09-17 09:28:19 +02:00