Martin Kroeker
f032d8966e
Merge pull request #2874 from Flamefire/memory_fixes
...
Avoid out of bounds access on invalid memory free
2020-10-04 15:16:51 +02:00
Martin Kroeker
f6e4cf2f9d
Merge pull request #2876 from Flamefire/omp_fork_fix
...
Lazily reinit threads after a fork in OMP mode
2020-10-03 22:52:17 +02:00
Martin Kroeker
9828343e12
Merge pull request #2878 from brada4/asms
...
fix clang std=c18 compilation on aarch64
2020-10-03 22:51:49 +02:00
User User-User
d2333e7842
aarch64 fix std=c18 compilation
2020-10-03 18:00:34 +03:00
Alexander Grund
3094fc6c83
Lazily reinit threads after a fork in OMP mode
...
This initializes the per-thread memory buffers, which get
cleared/released on a fork via pthread_atfork. Without this,
each thread calls blas_memory_alloc on almost every execution, which
slows down the code significantly because the threads race for the memory
allocation and locks serialize them.
2020-10-01 15:41:42 +02:00
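The lazy reinitialization described in the commit above can be illustrated with a minimal sketch. This is not the OpenBLAS implementation: it assumes a single global buffer in place of the per-thread buffers, and the names `thread_buffer`, `needs_reinit`, `slow_alloc`, `get_buffer`, and `BUF_SIZE` are hypothetical. Only `pthread_atfork` is a real POSIX call. The point is that the child-side fork handler only flags the state as stale, and the next caller pays the allocation cost once instead of on every call.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define BUF_SIZE 4096               /* hypothetical buffer size */

static void *thread_buffer = NULL;  /* stand-in for a per-thread buffer */
static int   needs_reinit  = 0;     /* set in the child after fork() */
static int   alloc_calls   = 0;     /* counts trips through the slow path */

/* Child-side fork handler: release the buffer and mark it stale. */
static void after_fork_in_child(void) {
    free(thread_buffer);
    thread_buffer = NULL;
    needs_reinit = 1;
}

/* Register the handler once, e.g. at library initialization. */
static int install_fork_handler(void) {
    return pthread_atfork(NULL, NULL, after_fork_in_child);
}

/* Expensive path, standing in for a lock-protected allocation. */
static void *slow_alloc(void) {
    alloc_calls++;
    return malloc(BUF_SIZE);
}

/* Fast path: reuse the cached buffer; reallocate only if it is stale. */
static void *get_buffer(void) {
    if (needs_reinit || thread_buffer == NULL) {
        thread_buffer = slow_alloc();
        assert(thread_buffer != NULL);  /* sketch only: no error recovery */
        needs_reinit = 0;
    }
    return thread_buffer;
}
```

Calling `get_buffer()` repeatedly goes through `slow_alloc()` only once before a fork and once more after the child handler has run, which is the behaviour the commit message aims for.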
Alexander Grund
3c05f54df8
Avoid out of bounds access on invalid memory free
2020-10-01 10:48:45 +02:00
Alexander Grund
dee7c49938
Fix TABs and trailing space
2020-10-01 10:43:16 +02:00
Martin Kroeker
d3c0d6811b
Merge pull request #2873 from martin-frbg/issue2871
...
Check for __linux rather than linux in cpuid code and benchmarks
2020-10-01 06:38:22 +02:00
Martin Kroeker
9637cd1fd1
Merge pull request #2865 from thisch/backticks
...
Consolidate usage of backticks for build options
2020-10-01 06:38:06 +02:00
Martin Kroeker
2367726578
Remove redundant status message
2020-09-30 23:28:49 +02:00
Martin Kroeker
5464eb13ea
Change ifdef linux to __linux for C11 compatibility
2020-09-30 22:59:41 +02:00
Martin Kroeker
e1574cbc83
Change ifdef linux to __linux for C11 compatibility
...
and add a fallback for unsupported operating systems in detect()
2020-09-30 22:50:21 +02:00
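As background for this series of commits: in strict ISO modes such as `-std=c11`, GCC and Clang do not predefine the unprefixed macro `linux`, because that name belongs to the user's namespace, while `__linux` and `__linux__` remain defined. A rough sketch of a check written that way, with a fallback branch for unrecognized systems, might look as follows. The function names `detect_os` and `os_is_known` are hypothetical and are not taken from the OpenBLAS sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: report the OS using only reserved-namespace macros. */
static const char *detect_os(void) {
#if defined(__linux__) || defined(__linux)
    return "linux";     /* still defined under -std=c11, unlike plain `linux` */
#elif defined(__APPLE__)
    return "macos";
#elif defined(_WIN32)
    return "windows";
#else
    return "unknown";   /* fallback for unsupported operating systems */
#endif
}

/* Returns 1 if the OS was recognized, 0 if the fallback branch was taken. */
static int os_is_known(void) {
    const char *name = detect_os();
    assert(name != NULL);
    return name[0] != 'u';
}
```

Code guarded by `#ifdef linux` is silently skipped when the compiler runs in a strict standard mode, which is why an explicit fallback branch is useful for noticing the problem.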
Martin Kroeker
0b2bb5696a
Change ifdef linux to __linux for C11 compatibility
2020-09-30 22:47:25 +02:00
Martin Kroeker
a7d5d0078d
Change ifdef linux to __linux for C11 compatibility
2020-09-30 22:46:25 +02:00
Martin Kroeker
be40440ec5
Change ifdef linux to __linux for C11 compatibility
2020-09-30 22:45:18 +02:00
Martin Kroeker
2bf70c8e3b
Change ifdef linux to __linux for C11 compatibility
2020-09-30 22:43:25 +02:00
Qiyu8
60e6c68e38
Adapt ARM architecture
2020-09-29 16:36:14 +08:00
Martin Kroeker
64629cb5c7
Merge pull request #91 from xianyi/develop
...
rebase
2020-09-28 22:48:53 +02:00
Qiyu8
1b1a757f5f
Optimize the performance of dot by using universal intrinsics in X86/ARM
2020-09-28 20:36:53 +08:00
Martin Kroeker
0d98ce202c
Merge pull request #2866 from RajalakshmiSR/p10_dcopy
...
Optimize dcopy/zcopy for POWER10
2020-09-28 07:22:54 +02:00
Rajalakshmi Srinivasaraghavan
2df4235e00
Optimize dcopy/zcopy for POWER10
...
This patch makes use of new POWER10 vector pair instructions for
loads and stores. Tested in simulator and no new failures.
2020-09-27 21:42:32 -05:00
Thomas Hisch
fe8cd5ae7e
Consolidate usage of backticks for build options
...
There were some build options in the README that were not
highlighted. Now all are highlighted.
2020-09-28 00:42:17 +02:00
Martin Kroeker
ba31c8f5f9
Merge pull request #2853 from Qiyu8/usimd-daxpy
...
Optimize the performance of daxpy by using universal intrinsics
2020-09-27 23:19:59 +02:00
Martin Kroeker
e961d4d609
Merge pull request #2864 from martin-frbg/lapack445
...
Fix underflow/rounding errors in LAPACK (S,D)LANV2
2020-09-27 23:11:17 +02:00
Martin Kroeker
7ed25e9e10
Fix underflow/rounding errors in LAPACK (S,D)LANV2
...
Reference-LAPACK PR 445, fixing their issue 263
2020-09-27 22:59:20 +02:00
Martin Kroeker
7b169379e0
Merge pull request #2863 from martin-frbg/readmefixes
...
Readmefixes
2020-09-27 22:50:25 +02:00
Martin Kroeker
7f539fb850
Update cpu list, outline cmake build, clarify scope of set_num_threads extension
2020-09-27 22:48:41 +02:00
Martin Kroeker
caf7a12295
Merge pull request #90 from xianyi/develop
...
rebase
2020-09-27 22:35:45 +02:00
Martin Kroeker
72b5b73647
Merge pull request #2850 from xiaojiayuan111/develop
...
fix a bug of trmm
2020-09-27 12:12:35 +02:00
Qiyu8
881c15179f
remove default support for FMA4 on Zen architecture
2020-09-27 09:35:50 +08:00
Martin Kroeker
896bbd55e1
Add support for building only selected variable types
2020-09-26 23:25:55 +02:00
Martin Kroeker
c5a32288c6
Work around sgemm_r/dgemm_r not being properly defined with BUILD_COMPLEX/BUILD_COMPLEX16
2020-09-26 23:24:37 +02:00
Martin Kroeker
dfaafd3b55
Merge pull request #2854 from martin-frbg/travis-graviton
...
Add an AWS-Graviton2 build to Travis CI
2020-09-23 21:59:18 +02:00
Martin Kroeker
f2e9a24e1a
Add AWS Graviton2 build
2020-09-23 19:02:20 +02:00
Martin Kroeker
98153875e9
Adapt tests to having only a subset of types in the library
2020-09-22 23:28:57 +02:00
Martin Kroeker
0eaae30e8c
Adapt tests to having only a subset of types in the build
2020-09-22 23:28:03 +02:00
Martin Kroeker
dfbc62ef7e
Support building only a subset of types
2020-09-22 23:25:59 +02:00
Martin Kroeker
b475b4bd0d
Support building only a subset of types
2020-09-22 23:25:04 +02:00
Martin Kroeker
357bff06b5
Add BUILD_vartype defines
2020-09-22 23:24:22 +02:00
Martin Kroeker
988a6f429e
Add BUILD_vartype defines
2020-09-22 23:23:33 +02:00
Martin Kroeker
e5e2fbd593
Support building only selected types
2020-09-22 23:21:30 +02:00
Martin Kroeker
3287848c8f
Support building only selected types
2020-09-22 23:20:51 +02:00
Martin Kroeker
26611af8e1
fix grouping of sources used for more than one type
2020-09-22 23:20:05 +02:00
Martin Kroeker
b886bd672b
add defines for building a subset of types
2020-09-22 23:18:55 +02:00
Martin Kroeker
61fae59298
Merge pull request #88 from xianyi/develop
...
rebase
2020-09-22 23:15:33 +02:00
Martin Kroeker
33d22f99f1
Merge pull request #2851 from martin-frbg/travis-xcode12
...
Add an OSX build with xcode12
2020-09-22 21:44:55 +02:00
Martin Kroeker
5ba01dd1a8
Add an OSX build with xcode12
2020-09-22 17:26:19 +02:00
Qiyu8
14f7dad3b7
performance improved
2020-09-22 16:52:15 +08:00
y00512012
06cf73a239
fix a bug of trmm
2020-09-22 16:47:10 +08:00
Qiyu8
325b539c26
Optimize the performance of daxpy by using universal intrinsics
2020-09-22 10:38:35 +08:00