Commit Graph

  • 8792fc4d5f Disable RPCC macro on MIPS24K Martin Kroeker 2020-04-19 07:21:48 +02:00
  • 577c5d9f8f Update README.md Martin Kroeker 2020-04-19 06:54:52 +02:00
  • 6721f2750e Update TargetList.txt Martin Kroeker 2020-04-19 06:51:57 +02:00
  • b0b02a080d Add compiler options for MIPS32 24K/1004K Martin Kroeker 2020-04-19 06:50:51 +02:00
  • a1fc98dc57 rename 1004K, 24K to MIPS1004K, MIPS24K to avoid identifier naming problem Martin Kroeker 2020-04-18 23:50:23 +02:00
  • d0737b0142 Update kernel.cmake Martin Kroeker 2020-04-18 21:36:28 +02:00
  • 7dbb59b256 Update common_macro.h Martin Kroeker 2020-04-18 21:34:14 +02:00
  • 00172d440b Typo fix in MIPS24K addition Martin Kroeker 2020-04-18 21:16:49 +02:00
  • d712ea724c Add MIPS24K support Martin Kroeker 2020-04-18 21:10:18 +02:00
  • 61bbae3ac1 Handle MIPS24K like P5600 Martin Kroeker 2020-04-18 21:09:32 +02:00
  • 1c1ca2bc0a Merge pull request #47 from xianyi/develop Martin Kroeker 2020-04-18 21:07:14 +02:00
  • c7d668c248 Update common_macro.h Martin Kroeker 2020-04-18 16:04:38 +02:00
  • a83a59b038 Use generic kernels for ishama,shasum,shdot,shrot Martin Kroeker 2020-04-18 15:53:51 +02:00
  • 0a19bd813c Use generic codes for shamax and shcopy Martin Kroeker 2020-04-18 12:52:51 +02:00
  • e7afe8a969 Define AXPBY_K fallback for float16 Martin Kroeker 2020-04-18 11:10:15 +02:00
  • f361de30a3 Use generic axpy.c for SHAXPY as x86 lacks saxpy.c Martin Kroeker 2020-04-18 11:07:16 +02:00
  • 9f6d6f6cb6 use saxpy.c instead of axpy.S for SHAXPY Martin Kroeker 2020-04-17 22:27:58 +02:00
  • 22bb50fb81 cmake fixes Rajalakshmi Srinivasaraghavan 2020-04-17 13:35:17 -05:00
  • 236a3d8ce6 Merge pull request #2563 from zelong-1024/develop Martin Kroeker 2020-04-16 11:45:32 +02:00
  • 6b7ef6543a [OpenBLAS]: benchmark error of potrf [description]: when the matrix size goes higher than 5800 during the cpotrf test, error info, such as "Potrf info = 5679", will be returned on ARM64 and x86 machines. Uplo = L & F. [solution]: changed the func for building the matrix so that the complex Hermitian matrix can stay positive definite during the computation. [dts]: l00536773 2020-04-16 10:55:10 +08:00
  • 67cc4b9e16 Fix warnings in clang and export symbol Rajalakshmi Srinivasaraghavan 2020-04-15 19:15:23 -05:00
  • 250e6f8039 Merge pull request #2557 from martin-frbg/dronebadge Martin Kroeker 2020-04-15 20:23:43 +02:00
  • 7a6d0016b0 Merge pull request #2556 from martin-frbg/epicdrone Martin Kroeker 2020-04-15 20:23:17 +02:00
  • e8e8a6e608 Restore USE_OPENMP in the x86 thread test Martin Kroeker 2020-04-15 19:26:12 +02:00
  • 579811fb6a Move all 19.04-based jobs back to ubuntu 18.04 Martin Kroeker 2020-04-15 17:38:33 +02:00
  • a87793e03c Fix DYNAMIC_ARCH compilation errors Rajalakshmi Srinivasaraghavan 2020-04-15 09:09:50 -05:00
  • ac6a22ae78 Update header Rajalakshmi Srinivasaraghavan 2020-04-14 22:58:39 -05:00
  • ff010f496e Build shgemm for all architecture Rajalakshmi Srinivasaraghavan 2020-04-14 20:38:53 -05:00
  • 7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS Rajalakshmi Srinivasaraghavan 2020-04-14 14:55:08 -05:00
  • 84a9614345 try x86_64 test without openmp Martin Kroeker 2020-04-14 19:18:35 +02:00
  • b969533703 Add drone.io badge, mention EMAG8180 support, reformat the DYNAMIC_ARCH paragraph Martin Kroeker 2020-04-14 10:53:28 +02:00
  • 0f08f3efa6 Add a multithread test for x86_64 Martin Kroeker 2020-04-13 22:46:12 +02:00
  • c861b2a7bd Merge pull request #2553 from martin-frbg/issue2444 Martin Kroeker 2020-04-13 21:28:59 +02:00
  • cf62adffbb Merge pull request #2555 from martin-frbg/issue1137 Martin Kroeker 2020-04-13 18:29:56 +02:00
  • 3eec7d382c ARMV7 does not support DMB ISHLD, use DMB ISH Martin Kroeker 2020-04-13 15:56:31 +02:00
  • 5b0093b5fe Convert aligned moves to unaligned Martin Kroeker 2020-04-13 14:58:52 +02:00
  • f41600e66f Add a read barrier in the traversing of the buffer list Martin Kroeker 2020-04-13 12:34:02 +02:00
  • f5efecb7ca Add (empty) read barrier definition Martin Kroeker 2020-04-13 12:24:10 +02:00
  • a52bdd9d7b Add (empty) read barrier definition Martin Kroeker 2020-04-13 12:22:35 +02:00
  • db3226a646 Add (empty) read barrier definition Martin Kroeker 2020-04-13 12:18:48 +02:00
  • 69b6e258d8 Add (empty) read barrier definition Martin Kroeker 2020-04-13 12:17:41 +02:00
  • 3d4db4d002 Add read barrier definition Martin Kroeker 2020-04-13 12:16:44 +02:00
  • 99dde1d2c9 Add read barrier definition Martin Kroeker 2020-04-13 12:14:58 +02:00
  • ee6b3df02c Add read barrier definition Martin Kroeker 2020-04-13 12:14:06 +02:00
  • 25e879fe92 Add (empty) read barrier definition Martin Kroeker 2020-04-13 12:12:54 +02:00
  • d237dc1360 Add read barrier definition Martin Kroeker 2020-04-13 12:11:58 +02:00
  • 8692456226 Add read barrier definition Martin Kroeker 2020-04-13 12:10:37 +02:00
  • d1d69e1b9a Add read barrier definition Martin Kroeker 2020-04-13 12:09:24 +02:00
  • 20d0cb2f65 Merge pull request #46 from xianyi/develop Martin Kroeker 2020-04-13 12:06:40 +02:00
  • e7f0da9295 Merge pull request #2551 from martin-frbg/issue2538-2 Martin Kroeker 2020-04-12 22:34:41 +02:00
  • e9bfa2291a Fix parameter overflow Martin Kroeker 2020-04-12 19:47:02 +02:00
  • 2a28448a96 Add safeguards for sufficient BUFFER_SIZE Martin Kroeker 2020-04-12 19:45:36 +02:00
  • a33d177430 Increase default BUFFER_SIZE on ARM, ZARCH and newer x86_64, add GEMM_R for POWER8/9 Martin Kroeker 2020-04-12 19:44:48 +02:00
  • f73391c9c9 Merge pull request #45 from xianyi/develop Martin Kroeker 2020-04-12 19:39:05 +02:00
  • 7905383cb5 Merge pull request #2547 from sharvil/develop Martin Kroeker 2020-04-11 00:35:38 +02:00
  • a8cbd451bf Merge pull request #2541 from bapt/develop Martin Kroeker 2020-04-11 00:35:07 +02:00
  • eecd8c3204 Merge pull request #2548 from gxw-loongson/develop Martin Kroeker 2020-04-11 00:34:04 +02:00
  • ea85eb2e02 Merge pull request #2549 from martin-frbg/fixthreadtest Martin Kroeker 2020-04-10 23:54:40 +02:00
  • 66f89c0aaf Match thread count to machine capability Martin Kroeker 2020-04-10 22:06:44 +02:00
  • 8d07cf9b67 Fix compilation problem on loongson platform gxw 2020-04-09 19:25:13 +08:00
  • 7b4773b24d Add API to set thread affinity on Linux. Sharvil Nanavati 2020-04-08 12:47:41 -07:00
  • 69f277f8ee Add another memory barrier for ARM and a multicore test run on ThunderX to help detect such issues (#2544) Martin Kroeker 2020-04-08 11:04:51 +02:00
  • 3a6d51c2fd Merge pull request #44 from xianyi/develop Martin Kroeker 2020-04-04 22:48:53 +02:00
  • 1c7771df96 Merge pull request #43 from martin-frbg/revert-42-z12ci Martin Kroeker 2020-04-04 22:46:58 +02:00
  • a56c9ec52a Revert "Add IBM Z to Travis configuration (#42)" Martin Kroeker 2020-04-04 22:45:01 +02:00
  • 4ae6d1a01b Add a Z13 build to the Travis configuration (#2542) Martin Kroeker 2020-04-03 16:02:11 +02:00
  • 7972beb375 Add IBM Z to Travis configuration (#42) Martin Kroeker 2020-04-03 15:59:18 +02:00
  • 41e802443a libname: treat FreeBSD and DragonFly like linux and sunos Baptiste Daroussin 2020-04-03 06:20:42 +02:00
  • 7bd8624b79 Merge pull request #41 from xianyi/develop Martin Kroeker 2020-04-02 10:32:19 +02:00
  • 806f89166e Make ARMV7 compile with xcode and add a CI job for it (#2537) Martin Kroeker 2020-04-02 10:30:37 +02:00
  • f059e614eb Merge pull request #2536 from martin-frbg/recurs Martin Kroeker 2020-04-01 20:00:13 +02:00
  • e13b6773ee ifort and pgfort need "recursive" for safe compilation of LAPACK as well Martin Kroeker 2020-04-01 15:39:16 +02:00
  • a05243d0f2 ifort and pgfort need "recursive" for compiling LAPACK as well Martin Kroeker 2020-04-01 15:38:07 +02:00
  • c6af9bbb32 Merge pull request #2534 from martin-frbg/issue2496 Martin Kroeker 2020-03-31 20:53:13 +02:00
  • 144be81ca1 fix initialization to zero in the NEON SGEMM_BETA kernel as well Martin Kroeker 2020-03-31 16:53:56 +02:00
  • 07cdd5d05c Fix zero initialization for beta=0 case Martin Kroeker 2020-03-31 00:21:02 +02:00
  • 567d2760e6 Merge pull request #2520 from wjc404/develop Martin Kroeker 2020-03-30 20:15:59 +02:00
  • 018bb3e433 Merge pull request #2533 from martin-frbg/gemmdirect2 Martin Kroeker 2020-03-30 20:15:37 +02:00
  • 79fd006c58 Expose the support_avx512 function provided in dynamic.c Martin Kroeker 2020-03-26 21:25:39 +01:00
  • 8229c163b7 Use runtime check for AVX512 (sgemm_direct) capability when using DYNAMIC_ARCH Martin Kroeker 2020-03-26 21:12:56 +01:00
  • a986d42ea6 Merge pull request #39 from xianyi/develop Martin Kroeker 2020-03-26 21:06:51 +01:00
  • b6a948fbee Merge pull request #2530 from martin-frbg/dynmsg Martin Kroeker 2020-03-24 15:44:46 +01:00
  • 0cc352417e Merge pull request #2529 from shengyang-3390/dev1 Martin Kroeker 2020-03-24 15:44:27 +01:00
  • fe47dc8673 Add message highlighting minimum target choice at end of DYNAMIC_ARCH builds Martin Kroeker 2020-03-23 19:35:51 +01:00
  • 9f67d03d3b Merge pull request #2527 from martin-frbg/gemmdirect Martin Kroeker 2020-03-23 12:47:19 +01:00
  • 50f4fb2fbd add ctest for drotm and modified ctest for drot. make sure that test cases cover all code paths when kernel uses loop unrolling. shengyang 2020-03-21 15:58:21 +08:00
  • 6a14b34c20 Avoid calling DIRECT codepath in DYNAMIC_ARCH on non-SKX Martin Kroeker 2020-03-22 14:33:16 +01:00
  • 8c7c1395da Merge pull request #2521 from martin-frbg/cm-avx512 Martin Kroeker 2020-03-22 01:03:42 +01:00
  • 5f6f6a2c7d Merge pull request #2525 from andreas-schwab/develop Martin Kroeker 2020-03-21 18:47:48 +01:00
  • 71cf2acdef Fix ARCHCONFIG for Neoverse-N1 Andreas Schwab 2020-03-21 17:33:33 +01:00
  • 1d9773b800 Use proper extension on the avx512 testcase filename Martin Kroeker 2020-03-20 23:05:53 +01:00
  • a46a8c4956 Merge pull request #2518 from shengyang-3390/dev Martin Kroeker 2020-03-20 23:00:06 +01:00
  • 7ae737e04c Merge pull request #2519 from martin-frbg/issue2472 Martin Kroeker 2020-03-20 22:57:44 +01:00
  • 64daad4365 Update param.h wjc404 2020-03-20 21:46:18 +00:00
  • b8307768e2 Add files via upload wjc404 2020-03-21 05:42:10 +08:00
  • 6d54c94760 Make ifort on Windows create lowercase symbols with appended underscore Martin Kroeker 2020-03-20 01:08:10 +01:00
  • c0da205412 Merge pull request #38 from xianyi/develop Martin Kroeker 2020-03-20 01:05:22 +01:00
  • a06d78556d add ctest for srotm and modified ctest for srot. make sure that test cases cover all code paths when kernel uses loop unrolling. shengyang 2020-03-18 14:17:32 +08:00
  • af8a619e1f Merge pull request #2517 from wjc404/develop Martin Kroeker 2020-03-17 10:12:53 +01:00
  • 62b9608986 Update KERNEL.SKYLAKEX wjc404 2020-03-17 12:52:55 +08:00