Commit Graph

  • 5ba01dd1a8 Add an OSX build with xcode12 Martin Kroeker 2020-09-22 17:26:19 +02:00
  • 14f7dad3b7 performance improved Qiyu8 2020-09-22 16:52:15 +08:00
  • 06cf73a239 fix a bug of trmm y00512012 2020-09-22 16:47:10 +08:00
  • 325b539c26 Optimize the performance of daxpy by using universal intrinsics Qiyu8 2020-09-22 10:38:35 +08:00
  • 0f112077e6 Merge pull request #2847 from mhillenibm/fixup_cscal Martin Kroeker 2020-09-21 22:22:43 +02:00
  • 22aa81f3e5 s390x: fix cscal and zscal implementations Marius Hillenbrand 2020-09-14 18:36:31 +02:00
  • 77ea73f5e5 s390x: for clang use fp-contract=on instead of fast Marius Hillenbrand 2020-09-16 15:55:38 +02:00
  • f91057cbad s390x: move common vector definitions and utils into header Marius Hillenbrand 2020-09-15 10:54:37 +02:00
  • 992d7ca63d Merge pull request #2845 from martin-frbg/lapack443 Martin Kroeker 2020-09-18 23:18:41 +02:00
  • 7e4d5c237c Fix workspace query in xGELQ (Reference-LAPACK PR443) Martin Kroeker 2020-09-18 09:19:46 +02:00
  • 8d12027a79 Merge pull request #86 from xianyi/develop Martin Kroeker 2020-09-18 09:17:49 +02:00
  • b1e0bcceec Merge pull request #2844 from RajalakshmiSR/daxpy_p10 Martin Kroeker 2020-09-17 23:46:32 +02:00
  • be43d2cb96 Optimize daxpy/zaxpy for POWER10 Rajalakshmi Srinivasaraghavan 2020-09-17 12:56:28 -05:00
  • 2855e6000c Merge pull request #2841 from martin-frbg/cpp_gemvtest Martin Kroeker 2020-09-17 17:29:56 +02:00
  • 144a03446d Merge pull request #2843 from mhillenibm/fixup_merge_dynamic_zarch Martin Kroeker 2020-09-17 17:28:43 +02:00
  • 75d440caa0 s390x/DYNAMIC_ARCH: fixup broken merge and reapply simplification Marius Hillenbrand 2020-09-17 16:45:07 +02:00
  • 6abca76c4e Add option for running only the less demanding GEMV version of the thread safety tests Martin Kroeker 2020-09-17 13:49:24 +02:00
  • 84c00c3c6e Support running just the GEMV version of the thread safety test Martin Kroeker 2020-09-17 13:46:41 +02:00
  • 8c5c991bd7 Add cpp_thread_test options Martin Kroeker 2020-09-17 13:45:40 +02:00
  • 2e3b15d68b Add CMakeLists.txt Martin Kroeker 2020-09-17 13:43:55 +02:00
  • eaf7f825bd Merge pull request #85 from xianyi/develop Martin Kroeker 2020-09-17 13:42:47 +02:00
  • 4c10a1673d Merge pull request #2840 from martin-frbg/fixup2833 Martin Kroeker 2020-09-16 18:55:50 +02:00
  • c4aeeeb9f4 Activate all BUILD_ options if none was specified Martin Kroeker 2020-09-15 23:15:34 +02:00
  • 3843bd188c Merge pull request #84 from xianyi/develop Martin Kroeker 2020-09-15 23:13:30 +02:00
  • ddec244a5a Merge pull request #2838 from austinpagan/gordon_trmm Martin Kroeker 2020-09-15 21:17:48 +02:00
  • dfeca46098 Adding performance patch for trmm, just like #2836 fossum 2020-09-15 08:59:50 -05:00
  • f8950f40a2 Merge pull request #2836 from austinpagan/gordon_trsm Martin Kroeker 2020-09-15 11:26:37 +02:00
  • 274d6e015b Fixing a performance bug in trsm_[LR].c. fossum 2020-09-14 13:10:48 -05:00
  • 91c84e1c01 Merge pull request #2796 from Guobing-Chen/BF16_dot_coversion_apis Martin Kroeker 2020-09-14 15:00:19 +02:00
  • 1ee1e7b495 Merge pull request #2833 from martin-frbg/issue2830 Martin Kroeker 2020-09-14 07:24:23 +02:00
  • ba644378dc Copy BUILD_ options available to the compiler flags Martin Kroeker 2020-09-14 00:03:33 +02:00
  • 9e11c2d62f Add BUILD_SINGLE etc Martin Kroeker 2020-09-13 23:55:11 +02:00
  • 4d250d0cdf Rearrange ifdefs Martin Kroeker 2020-09-13 23:29:01 +02:00
  • de139337b8 Remove spurious tests for complex ASUM and NRM2 Martin Kroeker 2020-09-13 22:20:41 +02:00
  • ec2948f147 Make tests conditional on BUILD_DOUBLE Martin Kroeker 2020-09-13 22:17:46 +02:00
  • ce89398636 Make tests for individual variable types conditional on the respective BUILD_ option Martin Kroeker 2020-09-13 21:52:18 +02:00
  • 593ce9e237 Make building individual tests depend on BUILD_SINGLE etc defines Martin Kroeker 2020-09-13 21:50:12 +02:00
  • 74e358bcd5 Remove spurious complex16 tests Martin Kroeker 2020-09-13 21:49:01 +02:00
  • 26792d2096 Copy BUILD_* directives to the compiler options to allow ifdef in tests Martin Kroeker 2020-09-13 21:47:55 +02:00
  • 6b52c7e172 Merge pull request #2832 from martin-frbg/issue2831 Martin Kroeker 2020-09-13 21:20:30 +02:00
  • 746ad3bd19 Fix vendor match for GCC gfortran Martin Kroeker 2020-09-13 18:40:59 +02:00
  • 55d4d470ec Merge pull request #83 from xianyi/develop Martin Kroeker 2020-09-13 18:30:11 +02:00
  • a270894730 Merge pull request #2829 from mhillenibm/clang_s390x Martin Kroeker 2020-09-08 23:36:41 +02:00
  • 047b8d7aff Add an s390 build with clang to the Travis configuration Marius Hillenbrand 2020-09-08 19:30:37 +02:00
  • f7731a358a Update CONTRIBUTERS.md - clang build fixes for IBM z Marius Hillenbrand 2020-09-08 15:15:15 +02:00
  • a55fe06f25 s390x/DYNAMIC_ARCH: define a HW_CAP flag to support slightly older glibc versions Marius Hillenbrand 2020-09-07 17:13:03 +02:00
  • 4f34bcfb5e s390x/DYNAMIC_ARCH: pass supported arch levels from Makefile to run-time code Marius Hillenbrand 2020-09-07 17:04:03 +02:00
  • 0629d8ebdb s390x/DYNAMIC_ARCH: generalize detecting supported archs for clang Marius Hillenbrand 2020-09-04 16:32:45 +02:00
  • 15da2f9acb Merge pull request #2828 from martin-frbg/lapack438 Martin Kroeker 2020-09-08 10:25:19 +02:00
  • 7d9c77f421 Correct dimension argument to xLASET Martin Kroeker 2020-09-07 22:03:46 +02:00
  • c8f029a518 Merge pull request #82 from xianyi/develop Martin Kroeker 2020-09-07 21:59:13 +02:00
  • e72430fe46 Merge pull request #2803 from xiegengxin/AVX2-asum Martin Kroeker 2020-09-06 18:32:15 +02:00
  • 6e0f6c5f00 Merge pull request #2824 from martin-frbg/asumbench Martin Kroeker 2020-09-06 10:05:47 +02:00
  • 6f8fad87c5 Use POSIX2001 clock_gettime for higher resolution Martin Kroeker 2020-09-05 19:44:01 +02:00
  • ed0f2d3dd7 Merge pull request #2816 from martin-frbg/silicon Martin Kroeker 2020-09-05 19:17:59 +02:00
  • 43a31b7786 Merge pull request #2823 from martin-frbg/fix2778 Martin Kroeker 2020-09-05 17:29:38 +02:00
  • 8a2a137a9e Correct argument to SLASET (Improves fix from PR2778) Martin Kroeker 2020-09-05 13:06:31 +02:00
  • 0d1f30a297 Merge pull request #81 from xianyi/develop Martin Kroeker 2020-09-05 12:47:03 +02:00
  • 70a254d507 Merge pull request #2822 from martin-frbg/issue2821 Martin Kroeker 2020-09-05 12:39:32 +02:00
  • 330044d821 Fix potential domain error in sqrt Martin Kroeker 2020-09-05 09:44:33 +02:00
  • 97636b2c8a Merge pull request #2819 from h-vetinari/carry_lapack_437 Martin Kroeker 2020-09-04 23:50:43 +02:00
  • 4d36711547 Merge pull request #2820 from RajalakshmiSR/clang Martin Kroeker 2020-09-04 23:09:31 +02:00
  • 718f67421a POWER9: Fix mcpu option with clang Rajalakshmi Srinivasaraghavan 2020-09-04 10:36:19 -05:00
  • 3426519ae2 adapt ?ggsv?-functions to ambient code style in LAPACKE/include/lapack.h H. Vetinari 2020-09-02 22:46:47 +02:00
  • 1c6c71fa85 Follow-up to lapack#434 & lapack#409: add missing 'const' in signatures H. Vetinari 2020-09-02 22:41:50 +02:00
  • 860247b5da Follow-up to lapack#434 & lapack#409: fix signature mismatches H. Vetinari 2020-09-02 22:38:56 +02:00
  • c61771e335 Merge pull request #2778 from martin-frbg/lapackeig Martin Kroeker 2020-09-04 10:06:02 +02:00
  • deaeb6c5b8 Add bfloat16 based dot and conversion with single/double Chen, Guobing 2020-08-27 06:42:28 +08:00
  • c7ef7174e4 Merge pull request #2817 from martin-frbg/lapack436 Martin Kroeker 2020-09-03 17:10:23 +02:00
  • 775a87242d Rename KERNEL.SILICON to KERNEL.VORTEX Martin Kroeker 2020-09-03 08:44:20 +02:00
  • af5bc95503 Rename SILICON to VORTEX and fix duplicate numbering Martin Kroeker 2020-09-03 08:43:26 +02:00
  • ea3a58c844 Rename SILICON to VORTEX Martin Kroeker 2020-09-03 08:38:53 +02:00
  • 17dca035de rename SILICON to VORTEX Martin Kroeker 2020-09-03 08:38:08 +02:00
  • 1b0f17eeed align to 64, using SSE when input size is small Gengxin Xie 2020-09-01 15:41:48 +08:00
  • c31b72965e Fix data type of work array in zgesvdq prototype Martin Kroeker 2020-09-02 23:44:44 +02:00
  • 0ce2aa3163 Fix data type of rwork array Martin Kroeker 2020-09-02 23:41:51 +02:00
  • 80794fe8fd Create KERNEL.SILICON Martin Kroeker 2020-09-02 22:56:58 +02:00
  • 4a4d1ca6e0 Add AppleSilicon cpu Martin Kroeker 2020-09-02 22:52:12 +02:00
  • b37d17382a Add Apple Silicon Martin Kroeker 2020-09-02 22:48:49 +02:00
  • 029fd01cfb Detect AppleSilicon cpu on OSX Martin Kroeker 2020-09-02 22:47:38 +02:00
  • 9d1ea75aa0 Merge pull request #80 from xianyi/develop Martin Kroeker 2020-09-02 22:16:41 +02:00
  • 776d005f4c Merge pull request #2815 from mhillenibm/clang_s390x Martin Kroeker 2020-09-02 16:56:01 +02:00
  • 2ee5b899ce s390x: enable S/DGEMM block with explicit loop unrolling + interleaving with clang Marius Hillenbrand 2020-09-01 16:16:53 +02:00
  • 095f4e6964 s390x: allow clang to emit fused multiply-adds (replicates gcc's default behavior) Marius Hillenbrand 2020-09-01 15:09:32 +02:00
  • 87e5bbd887 s390x: avoid variable-length arrays in struct for asm operands Marius Hillenbrand 2020-09-01 12:08:05 +02:00
  • b9b3265ec8 s390x: avoid inline assembly for vector loads for clang Marius Hillenbrand 2020-09-01 12:04:28 +02:00
  • a1616a0b86 s390x: replace nop with "nop 0" in inline assembly Marius Hillenbrand 2020-09-01 11:58:48 +02:00
  • 60ef193258 s390x: use "lghi" for immediate values to fix build with clang Marius Hillenbrand 2020-09-01 13:59:06 +02:00
  • 18bfb6d6f7 Merge pull request #2813 from martin-frbg/issue2804-2 Martin Kroeker 2020-09-01 23:39:46 +02:00
  • e4900caa11 Fix c_check misinterpreting arm64 in uname output to mean armv7 Martin Kroeker 2020-09-01 19:54:08 +02:00
  • 68b1713c30 Merge pull request #2811 from martin-frbg/issue2806 Martin Kroeker 2020-09-01 17:19:14 +02:00
  • 4074770d00 Merge pull request #2797 from martin-frbg/relafixes1 Martin Kroeker 2020-09-01 16:04:03 +02:00
  • b87a77da02 Merge pull request #79 from xianyi/develop Martin Kroeker 2020-09-01 12:03:53 +02:00
  • f42e84d46c Fix misnaming of LAPACK_?ggsvp function prototypes as LAPACKE_ (#2808) Martin Kroeker 2020-09-01 10:44:48 +02:00
  • 0a4c5c4c44 Merge pull request #2807 from martin-frbg/issue2804 Martin Kroeker 2020-08-31 23:44:56 +02:00
  • 3210a42734 Report cpu as ARMV8 instead of just giving up on non-Linux hosts Martin Kroeker 2020-08-31 20:03:21 +02:00
  • 5feb087c05 Handle Apple labeling armv8 as arm64 rather than aarch64 Martin Kroeker 2020-08-31 20:02:08 +02:00
  • 448152cdd8 define __AVX2__ to ensure the haswell code compiled with avx2 Gengxin Xie 2020-08-31 14:39:08 +08:00
  • cb3c190a3a Implementation of dasum, sasum with AVX2 & AVX512 intrinsic Gengxin Xie 2020-08-21 14:44:36 +08:00
  • 59e01b1aec Merge pull request #2799 from RajalakshmiSR/p10_ger Martin Kroeker 2020-08-28 22:52:11 +02:00