OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	0cc36770f1	Merge pull request #3073 from xoviat/embedded add embedded option	2021-01-31 18:02:41 +01:00
Martin Kroeker	eea0c0f2ed	Merge pull request #3085 from alexhenrie/memory_alloc Fix null pointer check in blas_memory_alloc	2021-01-26 20:11:42 +01:00
Martin Kroeker	0cb9e9fc8d	Remove the VORTEX support bits again for now	2021-01-25 19:02:21 +01:00
Alex Henrie	113840da12	Fix null pointer check in blas_memory_alloc	2021-01-24 22:20:44 -07:00
Martin Kroeker	deb2e66bcc	Add DYNAMIC_LIST support for ARM64	2021-01-24 23:18:52 +01:00
xoviat	2e8d6e8690	add functions for embedded	2021-01-23 22:12:17 -06:00
Martin Kroeker	b94dab5250	patch to support power10 in builtin_cpu_is was backported to gcc 10.2, so allow that as well	2021-01-20 21:34:36 +01:00
Martin Kroeker	63fa3c3f8f	Require gcc 11 for builtin_cpu_is(power10) fixes #3074	2021-01-20 15:41:04 +01:00
xoviat	b60de4447a	add cortex-m platform	2021-01-19 08:57:44 -06:00
Martin Kroeker	2c445be8ba	Merge pull request #3051 from martin-frbg/rocketlake Add CPUID information for Intel Rocket Lake	2021-01-14 15:56:25 +01:00
Martin Kroeker	6fe0f1fab9	Label get_cpu_ftr as volatile to keep gcc from rearranging the code	2021-01-11 19:05:29 +01:00
Martin Kroeker	17c16f2a71	Implement builtin_cpu_is and limit cpu choices to P8 and P9 for NVIDIA compilers	2020-12-19 23:21:22 +01:00
Martin Kroeker	865676682d	Add Intel Rocket Lake	2020-12-14 22:40:23 +01:00
Martin Kroeker	6232237dba	Make fallback from P10 to P9 conditional on suitable compiler	2020-12-11 23:41:17 +01:00
Martin Kroeker	18d8a67485	Merge pull request #2994 from antonblanchard/power10-fixes Power10 fixes	2020-12-11 23:37:30 +01:00
Martin Kroeker	83de62c20d	Merge pull request #3026 from martin-frbg/revert747 Revert PR747 - SYRK parameter changes for Haswell and related targets	2020-12-10 16:29:41 +01:00
gxw	4b548857d6	Add msa support for loongson 1. Using core loongson3r3 and loongson3r4 for loongson 2. Add DYNAMIC_ARCH for loongson Change-Id: I1c6b54dbeca3a0cc31d1222af36a7e9bd6ab54c1	2020-12-09 10:28:46 +08:00
Martin Kroeker	a554712439	remove extra/intermediate size step for min_jj introduced in PR747	2020-12-08 21:01:36 +01:00
Martin Kroeker	5d26223f4a	remove extra/intermediate size step of min_jj from PR747	2020-12-08 20:59:56 +01:00
Martin Kroeker	bc5b1ddf0d	Merge pull request #3004 from martin-frbg/bsd_getauxval ARM64 DYNAMIC_ARCH build fix for BSD/OSX	2020-11-23 08:35:12 +01:00
Martin Kroeker	e7bf8ced6c	Build fix for systems that do not support getauxval	2020-11-22 20:20:28 +01:00
Martin Kroeker	5fa305172a	Use ifeq instead of ifdef for user-definable options	2020-11-22 16:29:56 +01:00
Martin Kroeker	d3ff1f889f	Convert ifndefs to ifneq	2020-11-22 16:27:17 +01:00
Alexander Grund	60005eb47b	Don't overwrite blas_thread_buffer if already set After a fork it is possible that blas_thread_buffer has already allocated memory buffers: goto_set_num_threads does allocate those already and it may be called by num_cpu_avail in case the OpenBLAS NUM_THREADS differ from the OMP num threads. This leads to a memory leak which can cause subsequent execution of BLAS kernels to fail. Fixes #2993	2020-11-19 14:51:51 +01:00
Anton Blanchard	043f3d6faa	POWER10: Use POWER9 as a fallback If the toolchain is too old, or the mma features isn't set on a POWER10 fall back to the POWER9 loops.	2020-11-19 21:04:10 +11:00
Martin Kroeker	ff16329cb7	Merge pull request #2972 from xiegengxin/rot-intrinsic Improve the performance of rot by using AVX512 and AVX2 intrinsic	2020-11-08 22:43:00 +01:00
Gengxin Xie	d9ba49165a	Improve the performance of rot by using AVX512 and AVX2 intrinsic	2020-11-05 15:12:36 +08:00
Martin Kroeker	aa21cb5217	Merge pull request #2960 from thrasibule/avx2_detection fix avx2 detection	2020-10-31 20:24:21 +01:00
Guillaume Horel	1f564d729b	fix avx2 detection reword commits to make it clearer	2020-10-31 10:00:48 -04:00
Chen, Guobing	a7b1f9b1bb	Implementation of BF16 based gemv 1. Add a new API -- sbgemv to support bfloat16 based gemv 2. Implement a generic kernel for sbgemv 3. Implement an avx512-bf16 based kernel for sbgemv Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-10-29 02:08:23 +08:00
Martin Kroeker	2207a16235	Merge pull request #2952 from martin-frbg/issue2931 Try to read cpu ID from /sys/devices/.../cpu0 if HWCAP_CPUID fails	2020-10-28 09:37:32 +01:00
Martin Kroeker	b937d78a6d	Try to read cpu information from /sys/devices/system/cpu/cpu0 if HWCAP_CPUID fails	2020-10-27 17:51:32 +01:00
Martin Kroeker	fd7da56965	Move definitions that are neither needed nor supported on SUNOS	2020-10-25 12:01:50 +01:00
Martin Kroeker	ff65952e46	Move HAVE_P10_SUPPORT to the build system to be able to include a binutils version check	2020-10-20 00:55:41 +02:00
Rajalakshmi Srinivasaraghavan	b5d30b390d	Fix build issues with bfloat16 This patch fixes compilation errors due to recent renaming from SH to SB with BUILD_BFLOAT16.	2020-10-13 11:00:22 -05:00
Martin Kroeker	006c7f6671	Change "HALF" and "sh" to "BFLOAT16" and "sb"	2020-10-12 00:06:06 +02:00
Martin Kroeker	85154c2e18	Change "HALF" and "sh" to "BFLOAT16" and "sb"	2020-10-12 00:05:05 +02:00
Martin Kroeker	887e00fd7f	Adapt for supporting only a subset of variable types	2020-10-11 14:58:57 +02:00
Martin Kroeker	886a8e3190	Adapt for supporting only a subset of variable types	2020-10-11 14:57:32 +02:00
Martin Kroeker	ac653c94f3	Merge branch 'develop' into issue2588-cmake	2020-10-11 13:57:07 +02:00
Martin Kroeker	f032d8966e	Merge pull request #2874 from Flamefire/memory_fixes Avoid out of bounds access on invalid memory free	2020-10-04 15:16:51 +02:00
Martin Kroeker	f6e4cf2f9d	Merge pull request #2876 from Flamefire/omp_fork_fix Lazily reinit threads after a fork in OMP mode	2020-10-03 22:52:17 +02:00
User User-User	d2333e7842	aarch64 fix std=c18 compilation	2020-10-03 18:00:34 +03:00
Alexander Grund	3094fc6c83	Lazily reinit threads after a fork in OMP mode This initializes the per-thread memory buffers which get cleared/released on a fork via pthread_at_fork. Not doing so leads to each thread calling blas_memory_alloc on almost every execution which slows down the code significantly as the threads race for the memory allocation using locks to serialize that.	2020-10-01 15:41:42 +02:00
Alexander Grund	3c05f54df8	Avoid out of bounds access on invalid memory free	2020-10-01 10:48:45 +02:00
Alexander Grund	dee7c49938	Fix TABs and trailing space	2020-10-01 10:43:16 +02:00
Martin Kroeker	896bbd55e1	Add support for building only selected variable types	2020-09-26 23:25:55 +02:00
Martin Kroeker	357bff06b5	Add BUILD_vartype defines	2020-09-22 23:24:22 +02:00
Martin Kroeker	988a6f429e	Add BUILD_vartype defines	2020-09-22 23:23:33 +02:00
Martin Kroeker	e5e2fbd593	Support building only selected types	2020-09-22 23:21:30 +02:00
Martin Kroeker	3287848c8f	Support building only selected types	2020-09-22 23:20:51 +02:00
y00512012	06cf73a239	fix a bug of trmm	2020-09-22 16:47:10 +08:00
Martin Kroeker	ddec244a5a	Merge pull request #2838 from austinpagan/gordon_trmm Adding performance patch for trmm, just like trsm (#2836)	2020-09-15 21:17:48 +02:00
fossum	dfeca46098	Adding performance patch for trmm, just like #2836	2020-09-15 08:59:50 -05:00
fossum	274d6e015b	Fixing a performance bug in trsm_[LR].c.	2020-09-14 13:10:48 -05:00
Martin Kroeker	91c84e1c01	Merge pull request #2796 from Guobing-Chen/BF16_dot_coversion_apis Add bfloat16 based dot and conversion with single/double	2020-09-14 15:00:19 +02:00
Marius Hillenbrand	a55fe06f25	s390x/DYNAMIC_ARCH: define a HW_CAP flag to support slightly older glibc versions Enable building DYNAMIC_ARCH support with older versions of glibc that do not know about the hwcap flag HWCAP_S390_VXE yet. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-08 19:34:18 +02:00
Marius Hillenbrand	4f34bcfb5e	s390x/DYNAMIC_ARCH: pass supported arch levels from Makefile to run-time code ... instead of duplicating the (old) mechanism from the Makefile that aimed to derive supported architecture generations from the gcc version. To enable builds with DYNAMIC_ARCH with older compiler releases, the Makefile and drivers/other/dynamic_arch.c need a common view of the architecture support built into the library. We follow the notation from x86 when used with DYNAMIC_LIST, where defines DYN_<ARCH NAME> denote support for a given generation to be built in. Since there are far fewer architecture generations in OpenBLAS for s390x, that does not bloat command lines too much. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-08 19:34:18 +02:00
Martin Kroeker	330044d821	Fix potential domain error in sqrt	2020-09-05 09:44:33 +02:00
Chen, Guobing	deaeb6c5b8	Add bfloat16 based dot and conversion with single/double 1. Added bfloat16 based dot as new API: shdot 2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot 3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod shstobf16 -- convert single float array to bfloat16 array shdtobf16 -- convert double float array to bfloat16 array sbf16tos -- convert bfloat16 array to single float array dbf16tod -- convert bfloat16 array to double float array 4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16 5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs 6. Fix Cooperlake platform detection/specify issue when under dynamic-arch building 7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-09-04 02:31:25 +08:00
Chen, Guobing	0c1c903f1e	Fix OMP num specify issue In current code, no matter what number of threads specified, all available CPU count is used when invoking OMP, which leads to very bad performance if the workload is small while all available CPUs are big. Lots of time are wasted on inter-thread sync. Fix this issue by really using the number specified by the variable 'num' from calling API. Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-08-24 02:45:54 +08:00
Chen, Guobing	e740c4873d	Enable COOPERLAKE build target Enable new build target platform -- COOPERLAKE. This target platform supports all the SKYLAKEX supported ISAs + avx512bf16. So all the SKYLAKEX specific kernels/drivers and related code are now extended to be also active on COOPERLAKE. Besides, new BF16 related kernels are active under this target.	2020-08-13 06:18:00 +08:00
Martin Kroeker	60cd5e55fc	Protect against inadvertent activation of USE_CUDA	2020-08-01 12:31:39 +02:00
Martin Kroeker	7c02f4b1f7	Merge pull request #2744 from martin-frbg/issue2738 Add AMD Renoir/Matisse cpu autodetection and preliminary support for Zen3	2020-07-28 19:32:04 +02:00
Martin Kroeker	12918358aa	Add AMD Renoir/Matisse and preliminary support for Zen3 as Zen2 also support AMD family 22 Jaguar/Puma as Bobcat	2020-07-28 13:53:17 +00:00
Ashwin Sekhar T K	4e1be0e481	ARM64: Add THUNDERX3T110 Target	2020-07-26 23:32:24 -07:00
Martin Kroeker	ce45af8151	Update conditional for atomics to use HAVE_C11	2020-07-18 17:09:56 +00:00
Martin Kroeker	6f38de06d2	Update conditional for atomics to use HAVE_C11	2020-07-18 17:09:01 +00:00
Martin Kroeker	09eb9d2584	Update conditional for atomics to HAVE_C11	2020-07-18 17:07:38 +00:00
Martin Kroeker	791e046744	Update conditional for atomics to use HAVE_C11	2020-07-18 17:05:59 +00:00
Martin Kroeker	94bab9d1f9	Update conditional for atomics to use HAVE_C11	2020-07-18 17:03:31 +00:00
Rajalakshmi Srinivasaraghavan	af1e140e35	Change minimum gcc version for POWER10 As the MMA patches for POWER10 are backported to gcc10.2, changing the minimum gcc version needed to build OpenBLAS for POWER10.	2020-07-09 21:46:06 -05:00
Rajalakshmi Srinivasaraghavan	45d819ca82	Changing mcpu option as power10 As compiler enabled mcpu option as power10, changing it from future.	2020-07-07 11:25:20 -05:00
Martin Kroeker	584ef8d4ae	Add support for Comet Lake H & S	2020-06-27 14:36:37 +02:00
Matthew Treinish	f37e941d52	Add support to driver/others/dynamic.c too	2020-06-25 11:56:49 -04:00
User User-User	e6b9275034	address vs2019 C4293	2020-06-24 09:12:23 +03:00
Martin Kroeker	6eaeb01263	Merge pull request #2658 from RajalakshmiSR/p10 powerpc: Add support for future processor	2020-06-23 00:02:37 +02:00
Martin Kroeker	007d9f97d7	Make gotoblas_corename report the name of the selected TARGET rather than its aliases	2020-06-13 19:25:28 +02:00
Rajalakshmi Srinivasaraghavan	9fe930f205	powerpc: Add support for future processor This is the initial patch to support build infrastructure for POWER10 architecture.	2020-06-11 15:47:20 -05:00
Marius Hillenbrand	0dbe61a612	s390x: choose SIMD kernels at run-time based on OS and compiler support Extend and simplify the run-time detection for dynamic architecture support for z to check HW_CAP and only use SIMD features if advertised by the OS. While at it, also honor the env variable LD_HWCAP_MASK and do not use the CPU features masked there. Note that we can only use the SIMD features on z13 or newer (i.e., Vector Facility or Vector-Enhancements Facilities) when the operating system supports properly context-switching the vector registers. The OS advertises that support as a bit in the HW_CAP value in the auxiliary vector. While all recent Linux kernels have that support, we should maintain compatibility with older versions that may still be in use. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-05-12 11:01:16 +02:00
Marius Hillenbrand	8c338616f9	s390x: gate dynamic arch detection on gcc version and add generic When building OpenBLAS with DYNAMIC_ARCH=1 on s390x (aka zarch), make sure to include support for systems without the facilities introduced with z13 (i.e., zarch_generic). Adjust runtime detection to fallback to that generic code when running on a unknown platform other than Z13 through Z15. When detecting a Z13 or newer system, add a check for gcc support for the architecture-specific features before selecting the respective kernel. Fallback to Z13 or generic code, in case. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-05-12 11:01:16 +02:00
Martin Kroeker	5dd14e3d48	Make building the bfloat16 functions conditional on option BUILD_HALF (#2590 ) * make building the bfloat16 BLAS functions conditional on BUILD_HALF * pass the BUILD_HALF option to gensymbol * Pass BUILD_HALF as a compiler define for dynamic_arch builds	2020-05-01 09:58:30 +02:00
Martin Kroeker	f4248af26e	Fix compiler warnings	2020-04-28 10:43:12 +02:00
Rajalakshmi Srinivasaraghavan	7eb55504b1	RFC : Add half precision gemm for bfloat16 in OpenBLAS This patch adds support for bfloat16 data type matrix multiplication kernel. For architectures that don't support bfloat16, it is defined as unsigned short (2 bytes). Default unroll sizes can be changed as per architecture as done for SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be changed as per architecture requirement and for now, size 2 is used. Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare sgemm and shgemm output. This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm. Complex type implementation can be discussed and added once this is approved.	2020-04-14 14:55:08 -05:00
Martin Kroeker	f41600e66f	Add a read barrier in the traversing of the buffer list Needed on systems with weak memory ordering - the inferior, partially working fix from #2544 was already removed in #2551	2020-04-13 12:34:02 +02:00
Martin Kroeker	2a28448a96	Add safeguards for sufficient BUFFER_SIZE	2020-04-12 19:45:36 +02:00
Sharvil Nanavati	7b4773b24d	Add API to set thread affinity on Linux. Issue: #2545	2020-04-08 12:49:35 -07:00
Martin Kroeker	69f277f8ee	Add another memory barrier for ARM and a multicore test run on ThunderX to help detect such issues (#2544 ) * Add another memory barrier in memory.c to prevent races in memory slot allocation * Add an all-core test on Drone.io's ThunderX platform and modify dgemm_tester to use all 96 cores	2020-04-08 11:04:51 +02:00
Martin Kroeker	806f89166e	Make ARMV7 compile with xcode and add a CI job for it (#2537 ) * Add an ARMV7 iOS build on Travis * thread_local appears to be unavailable on ARMV7 iOS * Add no-thumb option for ARMV7 IOS build to get it to accept DMB ISH * Make local labels in macros of nrm2_vfpv3.S compatible with the xcode assembler	2020-04-02 10:30:37 +02:00
Martin Kroeker	ad9e53154d	Merge pull request #2484 from RajalakshmiSR/power-dynamic Fix DYNAMIC_ARCH build for POWER9	2020-03-04 08:06:06 +01:00
Martin Kroeker	e6edb7431f	Merge pull request #2466 from AGSaidi/acq-rel-1 Switch blas_server to use acq/rel semantics	2020-03-04 07:59:31 +01:00
Martin Kroeker	d68e4ba59b	Fix cut/paste glitch	2020-03-03 21:37:48 +01:00
Martin Kroeker	635c9e4e09	Restore initializers for mutex and conditional	2020-03-03 21:04:12 +01:00
Rajalakshmi Srinivasaraghavan	2afc074803	Fix DYNAMIC_ARCH build for POWER9 Setting DYNAMIC_ARCH=1 on POWER9 does not build POWER9 files due to some compiler version checks. This patch fixes some of the macros that are used to check compiler version. On fixing those checks, there are some new make failures related to icamin, icamax, isamin, isamax and caxpy files on POWER9. This patch fixes those failures as well.	2020-03-03 12:35:10 -06:00
Ali Saidi	43c2e845ab	Switch blas_server to use acq/rel semantics Heavy-weight locking isn't required to pass the work queue pointer between threads and simple atomic acquire/release semantics can be used instead. This is especially important as pthread_mutex_lock() isn't fair. We've observed substantial variation in runtime because of the unfairness of these locks which completely goes away with this implementation. The locks themselves are left to provide a portable way for idling threads to sleep/wakeup after many unsuccessful iterations waiting.	2020-03-02 02:52:49 +00:00
Martin Kroeker	2e6963259b	Merge pull request #2471 from AGSaidi/l3-fix-2 Fix barriers in level3_thread	2020-03-01 19:41:07 +01:00
Ali Saidi	97ce6bbce2	Fix barriers in level3_thread	2020-02-29 17:45:17 +00:00
Ali Saidi	c623a965f9	Add Neoverse-N1 core The implementation is a hybrid of the ARMV8 one with some of the improved TX2 routines along with specifying -march=v8.2-a	2020-02-29 03:22:04 +00:00
Martin Kroeker	4c5fac5a2b	Typo fix	2020-02-24 20:15:04 +01:00
Martin Kroeker	9b732696c6	Add DYNAMIC_ARCH support for ARMV8 EMAG8180	2020-02-24 19:20:00 +01:00
Martin Kroeker	cb6ef49857	Merge pull request #2407 from susilehtola/patch-2 Patch out instances of Z15 in dynamic_zarch.c	2020-02-11 13:04:44 +01:00
Susi Lehtola	5a6bba3061	Patch out instances of Z15 in dynamic_zarch.c There does not appear to be a Z15 kernel yet, causing link errors from the code. This patch fixes the issue.	2020-02-11 15:07:33 +13:00
Susi Lehtola	dff173e50e	Fix typo in dynamic_zarch.c	2020-02-11 14:46:30 +13:00
wjc404	2f96a2c55b	Update trmm_R.c	2020-02-05 10:15:02 +08:00
wjc404	833bd0f8ff	Update trmm_L.c	2020-02-05 10:09:41 +08:00
wjc404	77b8f49556	Update level3_thread.c	2020-02-04 20:33:08 +08:00
wjc404	1c3e20ce48	Update level3.c	2020-02-04 20:30:23 +08:00
Martin Kroeker	1f62a82789	Merge pull request #2376 from wjc404/develop Fix remaining bugs in parallel GEMM3M	2020-01-23 21:50:19 +01:00
wjc404	e9fb8f62b1	Update level3_gemm3m_thread.c	2020-01-22 17:40:03 +00:00
Martin Kroeker	78100b8093	Free Windows thread memory with MEM_RELEASE rather than MEM_DECOMMIT as suggested by hjmndv in #2370	2020-01-18 15:06:39 +01:00
int_13h	96ad579428	add in runtime cpu detection for zarch (#2349 ) add in runtime cpu detection for zarch	2019-12-31 18:03:27 +01:00
wjc404	4c35b8dbaa	Update gemm3m_level3.c	2019-12-27 18:03:01 +08:00
Jehan	13226e3101	driver: more reasonable thread wait timeout on Windows. It used to be 5ms, which might not be long enough in some cases for the thread to exit well, but then when set to 5000 (5s), it would slow down any program depending on OpenBlas. Let's just set it to 50ms, which is at least 10 times longer than originally, but still reasonable in case of failed thread termination.	2019-12-13 09:52:33 +01:00
Martin Kroeker	a4896b5538	Update DYNAMIC_ARCH support for ARM64 and PPC (#2332 ) * Update DYNAMIC_ARCH list of ARM64 targets for gmake * Update arm64 cpu list for runtime detection * Update DYNAMIC_ARCH list of ARM64 targets for cmake and add POWERPC targets	2019-12-04 11:06:03 +01:00
Martin Kroeker	3518617f5b	Add Intel Goldmont+ cpuid was originally in #2228 but that PR had misplaced the file in the toplevel directory	2019-12-03 08:32:29 +01:00
Martin Kroeker	a4c3668f99	Merge pull request #2321 from martin-frbg/issue2319 Fix race conditions in multithreaded GEMM3M	2019-11-28 09:30:24 +01:00
Martin Kroeker	f95989cbc1	Fix AVX512 capability test (always returning zero) from #2322	2019-11-23 22:38:07 +01:00
Martin Kroeker	f3065a0eed	Fix race conditions in multithreaded GEMM3M by adding barriers (and a mutex lock for the non-OpenMP case) like it was already done for GEMM in level3_thread.c some time ago	2019-11-23 19:54:56 +01:00
Jehan	1f6071590d	Fix usage of TerminateThread() causing critical section corruption. This patch was submitted to the GIMP project by a publisher wishing to keep confidentiality (hence anonymously). I just pass along the patch. Here is the patch explanation which came with: First they remind us what Microsoft documentation says about TerminateThread: > TerminateThread is a dangerous function that should only be used in > the most extreme cases. You should call TerminateThread only if you > know exactly what the target thread is doing, and you control all of > the code that the target thread could possibly be running at the time > of the termination. (https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-terminatethread) Then they say that 5 milliseconds time-out might not be long enough for the thread to exit gracefully. They propose to set it to a much higher value (for instance here 5 seconds). And finally you should always check the return value of WaitForSingleObject(). In particular you want to run TerminateThread() only if WaitForSingleObject() failed, not on success case.	2019-11-20 13:00:49 +01:00
Martin Kroeker	82b75f97e5	Disable the old QCDOC qalloc by default and copy utility functions from memory.c 1. qalloc() appears to have been a special routine written for the PPC440-based QCDOC supercomputer(s) from around 2005, its source does not seem to be readily available. So switch the #if 1 in the code to rely on standard malloc() by default. 2. Utility functions like get_num_procs, get_num_threads that were added to the "normally" used memory.c in the meantime were still missing here.	2019-11-17 19:22:04 +01:00
Martin Kroeker	1b90989662	Add NetBSD to the xBSD conditionals	2019-10-25 12:52:49 +02:00
Martin Kroeker	5f6206fa2d	Simplify OSX/IOS cross-compilation and add a CI test for it (#2279 ) * Add automatic fixups for OSX/IOS cross-compilation * Add OSX/IOS cross-compilation test to Travis CI * Handle platforms that lack hwcap.h by falling back to ARMV8 * Fix PROLOGUE for OSX/IOS	2019-10-08 20:13:14 +02:00
Martin Kroeker	8617d75548	Revert "Avoid taking root of negative number in symv_thread.c"	2019-10-01 23:50:41 +02:00
Sebastian Berg	6355c25dde	Avoid taking root of negative number in symv_thread.c This is similar to fixes in gh-1929, but there was one remaining occurrence of this type of pattern in the driver/level2/*_thread.c files.	2019-09-29 22:03:12 -07:00
Martin Kroeker	673e5a0495	Replace several POWER8/9 C kernels with their gcc7-generated assembly versions (#2263 ) * Add gcc7-generated assembly files for POWER8/9 isa/ica-min/max and POWER9 caxpy To work around internal compiler errors encountered when compiling the original C source with gcc 4 and 5, and wrong code generated by gcc 8.3.0 * Use gcc-generated assembly instead of original C sources to work around internal compiler errors encountered with gcc 4.8/5.4 and wrong code generation by gcc 8.3 * Use gcc-generated assembly instead of the original C source to work around internal compiler errors encountered with gcc 4.8 and 5.4, and wrong code generation by gcc 8.3 * Add gcc7-generated assembler version of caxpy for power8 to work around wrong code generated by gcc 8.3 * Handle CONJ define for caxpyc * Handle CONJ define for caxpyc * Add gcc7-generated assembly cdot for POWER9 * Use prebuilt assembly for POWER9 cdot created with gcc 7.3.1 to work around ICE in older gcc versions * Exclude POWER9 from DYNAMIC_ARCH when gcc versions is lower than 6 * Update Makefile.system * Use PROLOGUE macro to ensure correct function name for DYNAMIC_ARCH * Disable POWER9 with old gcc versions	2019-09-22 22:35:22 +02:00
Andrew	4de545aa7d	address minor warnings from gcc7	2019-09-07 10:21:08 +03:00
Martin Kroeker	bf1430f7d7	Merge pull request #2208 from martin-frbg/munmap-debug Provide more information on mmap/munmap failure	2019-08-09 07:55:35 +02:00
Martin Kroeker	1776ad82c0	Add files via upload	2019-08-09 00:08:11 +02:00
Martin Kroeker	4e2f81cfa1	Provide more information on mmap/munmap failure for #2207	2019-08-08 23:15:35 +02:00
Martin Kroeker	3d36c45116	Add CPUID identification of Intel Ice Lake	2019-08-01 22:52:35 +02:00
Martin Kroeker	21d05a4835	Merge pull request #2140 from martin-frbg/pgi19 Do not try ancient PGI hacks with recent versions of that compiler	2019-05-26 12:39:20 +02:00
Martin Kroeker	1778fd4219	Do not try ancient PGI hacks with recent versions of that compiler should fix #2139	2019-05-22 13:48:27 +02:00
Martin Kroeker	86dda5c2fa	Add option USE_LOCKING for SMP-like locking in USE_THREAD=0 builds	2019-05-15 23:21:20 +02:00
Martin Kroeker	5cabda79d0	Merge pull request #2117 from martin-frbg/issue2114 Fix errors in cpu affinity setup with glibc 2.6	2019-05-07 18:18:16 +02:00
Martin Kroeker	a6a8cc2b7f	Fix errors in cpu enumeration with glibc 2.6 for #2114	2019-05-07 13:34:52 +02:00
Martin Kroeker	a387a23518	Merge pull request #2101 from luzpaz/misc-typos Misc. typo fixes in comments and documentation	2019-05-04 22:28:29 +02:00
Martin Kroeker	b43c8382c8	Correct argument of CPU_ISSET for glibc <2.5 fixes #2104	2019-05-01 10:46:46 +02:00
luz.paz	daf2fec12d	Misc. typo fixes Found via `codespell -q 3 -w -L ith,als,dum,nd,amin,nto,wis,ba -S ./relapack,./kernel,./lapack-netlib`	2019-04-29 17:03:56 -04:00
Jeff Baylor	40e53e52d6	snprintf define consolidated to common.h	2019-04-22 17:01:34 -07:00
Rashmica Gupta	bcdf1d4917	Add in runtime CPU detection for POWER.	2019-04-09 14:20:16 +10:00
Erik M. Bray	8ba9e2a61a	Also call CloseHandle on each thread, as well as on the event so as to not leak thread handles.	2019-03-19 11:21:44 +01:00
Erik M. Bray	4ad694eda1	Fix for #2063 : The DllMain used in Cygwin did not run the thread memory pool cleanup upon THREAD_DETACH which is needed when compiled with USE_TLS=1.	2019-03-19 09:26:50 +01:00
Martin Kroeker	3ce28fb81a	Merge pull request #2055 from martin-frbg/atomid Add CPUID data for Intel Denverton (as Nehalem)	2019-03-12 22:57:07 +01:00
Martin Kroeker	04f2226ea6	Add Intel Denverton	2019-03-12 16:09:55 +01:00
Martin Kroeker	4741ce803b	Merge pull request #2045 from martin-frbg/2033-3 Do not compile in AVX512 check if AVX support is disabled	2019-03-06 22:40:26 +01:00
Martin Kroeker	11cfd0bd75	Do not compile in AVX512 check if AVX support is disabled The xgetbv function depends on NO_AVX being undefined - we could change that too, but that combo is unlikely to work anyway	2019-03-05 16:04:25 +01:00
Martin Kroeker	d7b2c53c0b	Merge pull request #2039 from brada4/meminit Address warning in memory.c	2019-03-05 12:11:15 +01:00
Martin Kroeker	10d841d8b9	Merge pull request #2026 from martin-frbg/trmv_threads Correct range limiting in trmv_thread and re-enable TRMV multithreading	2019-03-04 15:08:31 +01:00
Martin Kroeker	6c83b878f6	Merge pull request #2040 from martin-frbg/locks2002 Restore locking optimizations for OpenMP case	2019-03-04 15:07:14 +01:00
Martin Kroeker	af480b02a4	Restore locking optimizations for OpenMP case restore another accidentally dropped part of #1468 that was missed in #2004 to address performance regression reported in #1461	2019-03-03 14:17:07 +01:00

656 Commits