Commit Graph

5000 Commits

Author SHA1 Message Date
Martin Kroeker
bc319cee82 Adapt to having only a subset of variable types supported 2020-10-11 14:42:26 +02:00
Martin Kroeker
e5966f8606 Adapt to having only a subset of variable types supported 2020-10-11 14:41:43 +02:00
Martin Kroeker
9df12eb08f Adapt to having only a subset of variable types supported 2020-10-11 14:40:51 +02:00
Martin Kroeker
cf53970bcb Adapt to having only a subset of variable types supported 2020-10-11 14:40:06 +02:00
Martin Kroeker
dcd51d5c72 Adapt to having only a subset of variable types supported 2020-10-11 14:39:19 +02:00
Martin Kroeker
b8f95354c7 Adapt to having only a subset of variable types supported 2020-10-11 14:38:25 +02:00
Martin Kroeker
d33de97d60 Adapt to having only a subset of variable types supported 2020-10-11 14:36:45 +02:00
Martin Kroeker
6a83c591d6 Adapt for having only a subset of variable types 2020-10-11 14:34:12 +02:00
Martin Kroeker
f6d2827d0c Adapt ctests to having only a subset of types in the build 2020-10-11 14:32:00 +02:00
Martin Kroeker
08f4749eb4 Adapt tests to having only a subset of types in the build 2020-10-11 14:25:24 +02:00
Martin Kroeker
63d7dad04c Adapt utests for builds supporting only some variable types 2020-10-11 14:15:35 +02:00
Martin Kroeker
ac653c94f3 Merge branch 'develop' into issue2588-cmake 2020-10-11 13:57:07 +02:00
Martin Kroeker
190b74dd24 Add files via upload 2020-10-11 13:26:05 +02:00
Martin Kroeker
88928650c4 Merge pull request #2883 from martin-frbg/issue2872
Minor CMAKE fixes
2020-10-11 10:30:33 +02:00
Martin Kroeker
82a497ec5d restore PRESCOTT default for DYNAMIC_LIST 2020-10-11 00:43:09 +02:00
Martin Kroeker
de27e4f5fb Stop DYNAMIC_ARCH build if the toplevel source contains a stray config_kernel.h from a gmake build
This is unlikely to happen in practice, but if it does, the rogue file is included instead of the dynamically generated version for each target_core. Macros like PREFETCH then remain undefined, and compiling the assembly kernels fails with very confusing errors such as "invalid operands (undefined UND and ABS sections)".
2020-10-11 00:40:22 +02:00
Martin Kroeker
e1b7123bbe Merge pull request #2867 from Qiyu8/usimd-floatdot
Optimize the performance of dot by using universal intrinsics in X86/ARM
2020-10-10 12:10:25 +02:00
Qiyu8
f32d34a015 add sse3 compiler flag 2020-10-10 10:36:15 +08:00
Martin Kroeker
599777ecb7 Merge pull request #2879 from martin-frbg/issue2839
Default BLAS3_MEM_ALLOC_THRESHOLD on all platforms to 32
2020-10-06 23:26:52 +02:00
Martin Kroeker
a5feea6611 make BLAS3_MEM_ALLOC_THRESHOLD configurable on non-Windows 2020-10-04 23:01:06 +02:00
Martin Kroeker
dc8e4e1959 Reduce the BLAS3 heap allocation threshold to 32 and mark it as configurable 2020-10-04 22:59:24 +02:00
Martin Kroeker
cccd1438da Merge pull request #93 from xianyi/develop
rebase
2020-10-04 22:57:11 +02:00
Martin Kroeker
f032d8966e Merge pull request #2874 from Flamefire/memory_fixes
Avoid out of bounds access on invalid memory free
2020-10-04 15:16:51 +02:00
Martin Kroeker
f6e4cf2f9d Merge pull request #2876 from Flamefire/omp_fork_fix
Lazily reinit threads after a fork in OMP mode
2020-10-03 22:52:17 +02:00
Martin Kroeker
9828343e12 Merge pull request #2878 from brada4/asms
fix clang std=c18 compilation on aarch64
2020-10-03 22:51:49 +02:00
User User-User
d2333e7842 aarch64 fix std=c18 compilation 2020-10-03 18:00:34 +03:00
Alexander Grund
3094fc6c83 Lazily reinit threads after a fork in OMP mode
This initializes the per-thread memory buffers which get
cleared/released on a fork via pthread_atfork. Not doing so leads to
each thread calling blas_memory_alloc on almost every execution, which
slows down the code significantly as the threads race for the memory
allocation, using locks to serialize it.
2020-10-01 15:41:42 +02:00
Alexander Grund
3c05f54df8 Avoid out of bounds access on invalid memory free 2020-10-01 10:48:45 +02:00
Alexander Grund
dee7c49938 Fix TABs and trailing space 2020-10-01 10:43:16 +02:00
Martin Kroeker
d3c0d6811b Merge pull request #2873 from martin-frbg/issue2871
Check for __linux rather than linux in cpuid code and benchmarks
2020-10-01 06:38:22 +02:00
Martin Kroeker
9637cd1fd1 Merge pull request #2865 from thisch/backticks
Consolidate usage of backticks for build options
2020-10-01 06:38:06 +02:00
Martin Kroeker
2367726578 Remove redundant status message 2020-09-30 23:28:49 +02:00
Martin Kroeker
5464eb13ea Change ifdef linux to __linux for C11 compatibility 2020-09-30 22:59:41 +02:00
Martin Kroeker
e1574cbc83 Change ifdef linux to __linux for C11 compatibility
and add a fallback for unsupported operating systems in detect()
2020-09-30 22:50:21 +02:00
Martin Kroeker
0b2bb5696a Change ifdef linux to __linux for C11 compatibility 2020-09-30 22:47:25 +02:00
Martin Kroeker
a7d5d0078d Change ifdef linux to __linux for C11 compatibility 2020-09-30 22:46:25 +02:00
Martin Kroeker
be40440ec5 Change ifdef linux to __linux for C11 compatibility 2020-09-30 22:45:18 +02:00
Martin Kroeker
2bf70c8e3b Change ifdef linux to __linux for C11 compatibility 2020-09-30 22:43:25 +02:00
Qiyu8
60e6c68e38 Adapt ARM architect 2020-09-29 16:36:14 +08:00
Martin Kroeker
64629cb5c7 Merge pull request #91 from xianyi/develop
rebase
2020-09-28 22:48:53 +02:00
Qiyu8
1b1a757f5f Optimize the performance of dot by using universal intrinsics in X86/ARM 2020-09-28 20:36:53 +08:00
Martin Kroeker
0d98ce202c Merge pull request #2866 from RajalakshmiSR/p10_dcopy
Optimize dcopy/zcopy for POWER10
2020-09-28 07:22:54 +02:00
Rajalakshmi Srinivasaraghavan
2df4235e00 Optimize dcopy/zcopy for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores. Tested in simulator and no new failures.
2020-09-27 21:42:32 -05:00
Thomas Hisch
fe8cd5ae7e Consolidate usage of backticks for build options
There were some build options in the README that were not
highlighted. Now all are highlighted.
2020-09-28 00:42:17 +02:00
Martin Kroeker
ba31c8f5f9 Merge pull request #2853 from Qiyu8/usimd-daxpy
Optimize the performance of daxpy by using universal intrinsics
2020-09-27 23:19:59 +02:00
Martin Kroeker
e961d4d609 Merge pull request #2864 from martin-frbg/lapack445
Fix underflow/rounding errors in LAPACK (S,D)LANV2
2020-09-27 23:11:17 +02:00
Martin Kroeker
7ed25e9e10 Fix underflow/rounding errors in LAPACK (S,D)LANV2
Reference-LAPACK PR 445, fixing their issue 263
2020-09-27 22:59:20 +02:00
Martin Kroeker
7b169379e0 Merge pull request #2863 from martin-frbg/readmefixes
Readmefixes
2020-09-27 22:50:25 +02:00
Martin Kroeker
7f539fb850 Update cpu list, outline cmake build, clarify scope of set_num_threads extension 2020-09-27 22:48:41 +02:00
Martin Kroeker
caf7a12295 Merge pull request #90 from xianyi/develop
rebase
2020-09-27 22:35:45 +02:00