Commit Graph

  • 571eadb880 powerpc: Optimized SGEMM/DGEMM/CGEMM for POWER10 Rajalakshmi Srinivasaraghavan 2020-06-24 14:48:15 -05:00
  • df4ade070f Fix for #2671 Kavana Bhat 2020-06-24 04:25:47 -05:00
  • e6b9275034 address vs2019 C4293 User User-User 2020-06-24 09:12:23 +03:00
  • 53ea5bfece Merge pull request #66 from xianyi/develop Martin Kroeker 2020-06-23 10:13:44 +02:00
  • 93592d1260 Merge pull request #2675 from wjc404/develop Martin Kroeker 2020-06-23 09:29:02 +02:00
  • 6eaeb01263 Merge pull request #2658 from RajalakshmiSR/p10 Martin Kroeker 2020-06-23 00:02:37 +02:00
  • 45d542c9d1 Merge pull request #65 from xianyi/develop Martin Kroeker 2020-06-21 12:41:01 +02:00
  • 086d87a302 AVX512 dgemm tcopy_16 function wjc404 2020-06-20 00:07:43 +08:00
  • af501eb753 Merge pull request #2669 from mhillenibm/zarch_fix_gcc_detection Martin Kroeker 2020-06-17 17:55:25 +02:00
  • 0eb6c4dded Merge pull request #2672 from mhillenibm/test_num_threads Martin Kroeker 2020-06-17 17:54:31 +02:00
  • de838c38ef cpp_thread_test/dgemv: fail early if concurrency is zero Marius Hillenbrand 2020-06-17 16:15:44 +02:00
  • 478898b37a cpp_thread_test/dgemv: cap concurrency to number of hw threads on small systems Marius Hillenbrand 2020-06-17 16:08:48 +02:00
  • cde4690721 RFC: Use gcc -dumpfullversion to get minor version with gcc-7.x Marius Hillenbrand 2020-06-16 15:45:59 +02:00
  • 2389291766 Makefile.system: remove duplicate variable GCCVERSIONGT5 Marius Hillenbrand 2020-06-16 14:45:09 +02:00
  • a2d13ea611 Fix gcc version detection for zarch Marius Hillenbrand 2020-06-16 14:40:50 +02:00
  • 1bd3cd66c2 Increment version to 0.3.10.dev Martin Kroeker 2020-06-14 22:05:19 +02:00
  • 1c53e1366d Increment version to 0.3.10.dev Martin Kroeker 2020-06-14 22:04:37 +02:00
  • 63b03efc2a Merge pull request #2667 from xianyi/develop (tag: v0.3.10) Martin Kroeker 2020-06-14 22:03:04 +02:00
  • 95dbeff66d Merge branch 'release-0.3.0' into develop Martin Kroeker 2020-06-14 22:02:45 +02:00
  • 3b673a24b7 Increment version to 0.3.10.dev Martin Kroeker 2020-06-14 21:57:52 +02:00
  • 1eb1979050 Increment version to 0.3.10.dev Martin Kroeker 2020-06-14 21:57:15 +02:00
  • efc53b6e7e Merge pull request #2665 from martin-frbg/flang-fixes-2a Martin Kroeker 2020-06-14 21:56:08 +02:00
  • 72888497e2 Update with 0.3.10 changes Martin Kroeker 2020-06-14 21:55:31 +02:00
  • 7e3e006af6 Merge pull request #2666 from martin-frbg/blastest Martin Kroeker 2020-06-14 18:28:37 +02:00
  • d906d14402 Merge pull request #2664 from ACSimon33/exported_symbols Martin Kroeker 2020-06-14 18:27:03 +02:00
  • 3785c0e82b Merge pull request #2663 from martin-frbg/issue2654 Martin Kroeker 2020-06-14 18:26:43 +02:00
  • f2d8879af6 Merge pull request #2661 from martin-frbg/issue2660 Martin Kroeker 2020-06-14 18:25:37 +02:00
  • 6876221cf3 Remove optimization level limit for flang again and add -fno-unroll-loops for AOCC flang 2.x instead Martin Kroeker 2020-06-14 17:40:24 +02:00
  • 79cdcde717 Re-enable higher optimization levels for flang while disabling loop unrolling for AOCC flang Martin Kroeker 2020-06-14 17:18:16 +02:00
  • 18a11137f1 Update BLAS tests to correspond to Reference-LAPACK 3.9.0 Martin Kroeker 2020-06-14 10:26:25 +02:00
  • 1dd712131e Fix spelling of flang option -Mrecursive and add -Kieee Martin Kroeker 2020-06-14 00:09:31 +02:00
  • 0ed2adf0b2 Fix spelling of flang option -Mrecursive and add -Kieee Martin Kroeker 2020-06-14 00:01:20 +02:00
  • abf670757b Respect predefined defaults for AR, AS, LD and RANLIB Martin Kroeker 2020-06-13 23:21:13 +02:00
  • 41fc6f3cd2 Added missing exported symbols. Simon Märtens 2020-06-13 22:37:39 +02:00
  • 007d9f97d7 Make gotoblas_corename report the name of the selected TARGET rather than its aliases Martin Kroeker 2020-06-13 19:25:28 +02:00
  • 63d26090f5 Merge pull request #64 from xianyi/develop Martin Kroeker 2020-06-13 19:14:47 +02:00
  • 9fe930f205 powerpc: Add support for future processor Rajalakshmi Srinivasaraghavan 2020-06-11 15:47:20 -05:00
  • 3a1b58d54a Merge pull request #2653 from craft-zhang/cortex-a53 Martin Kroeker 2020-06-10 12:19:33 +02:00
  • f7659be4a0 Merge pull request #2652 from martin-frbg/flang-fixes Martin Kroeker 2020-06-09 20:31:06 +02:00
  • bc6fd20a40 fix INIT8x4 ZhangDanfeng 2020-06-10 01:01:16 +08:00
  • 3ce469a34f Limit optimization level to O1 for flang and add -frecursive Martin Kroeker 2020-06-09 16:11:13 +02:00
  • ba2c5b404d When building with flang, use it also for the final link step to get dependencies right Martin Kroeker 2020-06-09 16:09:34 +02:00
  • f07a80354b Apply previously AOCC-specific workaround to all versions of flang Martin Kroeker 2020-06-09 16:07:03 +02:00
  • fdd1b50263 Merge pull request #63 from xianyi/develop Martin Kroeker 2020-06-09 15:54:30 +02:00
  • b98923f33a Test enforce -O1 for flang Leonard Lausen 2020-06-09 06:54:42 +00:00
  • 4cb1db0e3b Test flang build Leonard Lausen 2020-06-09 06:25:45 +00:00
  • 430e8b45fe Merge pull request #2648 from martin-frbg/lapack411 Martin Kroeker 2020-06-07 19:45:52 +02:00
  • 88fe85f4e0 Merge pull request #2647 from martin-frbg/aocc-flang Martin Kroeker 2020-06-07 19:45:11 +02:00
  • 89091e6b64 Merge pull request #2645 from martin-frbg/misc_fixes Martin Kroeker 2020-06-07 19:44:50 +02:00
  • 522aaf53bf Break out of potentially infinite rescaling loop in LAPACK xLARGV/xLARTG/xLARTGP Martin Kroeker 2020-06-07 14:30:20 +02:00
  • c3574ffe53 Merge pull request #2646 from wjc404/develop Martin Kroeker 2020-06-07 13:18:22 +02:00
  • 4e28dc6353 Use only -O1 with AMD AOCC version of flang Martin Kroeker 2020-06-07 00:05:02 +02:00
  • 13c28889a2 Update "cosmetic fixes for non-C99 compilers" Martin Kroeker 2020-06-06 15:22:27 +02:00
  • 0e3ac4a06b Add files via upload wjc404 2020-06-06 14:56:57 +08:00
  • 28915eed72 Cosmetic fixes for non-C99 compilers Martin Kroeker 2020-06-05 10:05:34 +02:00
  • 7f60fb6b91 Delete spurious copy of common_param.h Martin Kroeker 2020-06-05 10:04:16 +02:00
  • 0464e662ad make blas_quickdivide unsigned and guard against miscompilation Martin Kroeker 2020-06-05 10:03:36 +02:00
  • 0f9a935a5a Merge pull request #62 from xianyi/develop Martin Kroeker 2020-06-05 09:51:06 +02:00
  • 79cd69fea4 Merge pull request #2644 from martin-frbg/cmake-maxstack Martin Kroeker 2020-06-05 08:33:48 +02:00
  • bb12c2c854 Limit MAX_STACK_ALLOC availability to non-Wndows Martin Kroeker 2020-06-04 19:07:27 +02:00
  • 32c1c1e125 Update azure-pipelines.yml Martin Kroeker 2020-06-04 19:03:46 +02:00
  • f1953b8b81 Update azure-pipelines.yml Martin Kroeker 2020-06-04 17:58:13 +02:00
  • 6e97df7b47 Add CMAKE support for MAX_STACK_ALLOC setting Martin Kroeker 2020-06-04 14:45:31 +02:00
  • 729303e5ed Merge pull request #2643 from craft-zhang/cortex-a53 Martin Kroeker 2020-06-04 07:58:45 +02:00
  • 547965530f Merge pull request #2638 from leezu/actions Martin Kroeker 2020-06-04 00:02:37 +02:00
  • 9b7877ccf1 sgemm copy source init ZhangDanfeng 2020-06-04 02:09:38 +08:00
  • f82fa802d1 Insert prefetch ZhangDanfeng 2020-06-04 02:08:48 +08:00
  • 3eda3d34c3 Merge pull request #2641 from martin-frbg/ppcg4 Martin Kroeker 2020-06-03 16:43:46 +02:00
  • a8f42ae85c set cmake build type to Release Martin Kroeker 2020-06-03 15:28:59 +02:00
  • e6e2e531bc revert clang pragma Martin Kroeker 2020-06-03 15:16:27 +02:00
  • 456dc04441 Update sgemm_kernel_16x4_skylakex_3.c Martin Kroeker 2020-06-03 15:15:41 +02:00
  • 89323458a9 preset optimization level for apple clang Martin Kroeker 2020-06-03 15:07:25 +02:00
  • e153bdeb70 Update dynamic_arch.yml Martin Kroeker 2020-06-03 13:46:43 +02:00
  • c2001f7756 Make cmake build verbose to see options in use Martin Kroeker 2020-06-03 12:18:15 +02:00
  • c2b3f0b3f6 Revert "keep Apple Clang from optimizing this" Martin Kroeker 2020-06-03 10:22:15 +02:00
  • f16e39554d Change PPCG4 CGEMM_M to match kernel change Martin Kroeker 2020-06-03 09:15:29 +02:00
  • b1ee81228a Change complex DOT and ROT to generic kernels and switch CGEMM Martin Kroeker 2020-06-03 09:13:29 +02:00
  • 9f7358d7dc Keep Apple Clang from optimizing this Martin Kroeker 2020-06-03 08:52:53 +02:00
  • 54fa90fb25 Keep apple clang 11.0.3 from trying to optimize this (and running out of registers) Martin Kroeker 2020-06-02 17:31:45 +02:00
  • 5a709b8340 Print CPU info in output Leonard Lausen 2020-06-01 20:51:11 +00:00
  • b31a68b835 Add Github Actions test for DYNAMIC_ARCH builds Leonard Lausen 2020-05-31 01:17:05 +00:00
  • 86552bf4c7 Update f_check Martin Kroeker 2020-05-31 15:22:12 +02:00
  • a349d48d89 Merge pull request #2636 from martin-frbg/issue2634 Martin Kroeker 2020-05-31 15:16:09 +02:00
  • 4db00121dc Disable EXPRECISION and add -lm on OSX (same as the BSDs and Linux) Martin Kroeker 2020-05-31 12:39:36 +02:00
  • 909897f13b Document option USE_LOCKING Martin Kroeker 2020-05-31 12:37:57 +02:00
  • e79245acd9 Merge pull request #2635 from ilayn/patch-1 Martin Kroeker 2020-05-30 14:37:12 +02:00
  • 76d2612e0c BUG: Fix the loop range in ZHEEQUB.f Ilhan Polat 2020-05-30 14:11:11 +02:00
  • ced49466f0 Use the fortran compiler to link LAPACK-related benchmarks Martin Kroeker 2020-05-29 13:35:51 +02:00
  • 6e270f91ec add support for RETURN_BY_STACK semantics, e.g. clang Martin Kroeker 2020-05-29 13:29:10 +02:00
  • 200296b0f4 remove libomp from link list only for pgfortran Martin Kroeker 2020-05-29 13:23:51 +02:00
  • dd7a650792 Merge pull request #59 from xianyi/develop Martin Kroeker 2020-05-29 13:06:25 +02:00
  • 4a4c50a7ce Merge pull request #2627 from pkubaj/patch-1 Martin Kroeker 2020-05-26 08:36:24 +02:00
  • d069780e63 Merge pull request #2626 from docularxu/working-gcc-version-detections Martin Kroeker 2020-05-26 08:35:58 +02:00
  • 33c8790603 Add powerpc (32-bit) pkubaj 2020-05-25 13:14:09 +02:00
  • 06387ac0e6 make GCC version detection OS-independent Guodong Xu 2020-05-25 10:40:12 +00:00
  • f1a18d245b Merge pull request #2618 from craft-zhang/cortex-A53 Martin Kroeker 2020-05-25 12:14:46 +02:00
  • 2a3aa91354 update CONTRIBUTORS.md, adding myself 张丹枫 2020-05-20 22:35:26 +08:00
  • ea5bdc3f72 split cortex-a53 param to match 8x8 kernel 张丹枫 2020-05-20 22:34:47 +08:00
  • 9df79ae9a3 update sgemm and strmm kernel selecting strategy 张丹枫 2020-05-20 21:57:12 +08:00
  • a1fc6041cd use general register to speedup 张丹枫 2020-05-20 21:55:32 +08:00