OpenBLAS

Commit Graph

Author	SHA1	Message	Date
barracuda156	d9653af018	KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing. Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4366	2023-12-14 12:00:11 +08:00
Martin Kroeker	302ca7edc7	Merge pull request #4371 from barracuda156/970. cc.cmake: add optflags for G5 and G4 kernels	2023-12-13 14:32:37 +01:00
barracuda156	a8d3619f65	cc.cmake: add optflags for G5 and G4 kernels	2023-12-13 19:42:56 +08:00
Martin Kroeker	aa46f1e4e7	revert addition of MSVC-compatible complex (moved to lapacke_config.h)	2023-12-12 23:07:48 +01:00
Martin Kroeker	dcdc351272	Add MSVC-compatible complex types	2023-12-12 23:06:22 +01:00
Martin Kroeker	55a0718f72	Merge pull request #4369 from ChipKerchner/power10Copies. Replace two vector loads with one vector pair load.	2023-12-12 18:49:21 +01:00
Chip-Kerchner	93747fb377	Merge remote-tracking branch 'origin/develop' into power10Copies	2023-12-12 09:32:49 -06:00
Martin Kroeker	dcf6999c4e	remove extraneous endif	2023-12-12 11:27:17 +01:00
Martin Kroeker	330101e0b3	Add complex type definitions for MSVC	2023-12-11 21:52:00 +01:00
Martin Kroeker	d9f1478068	Merge pull request #4367 from barracuda156/unbreak_powerpc. Fix arch detection with CMake build for PowerPC	2023-12-11 21:38:32 +01:00
barracuda156	9dbc8129b3	cpuid_power.c: add CPU_SUBTYPE_POWERPC_7400 case	2023-12-11 21:09:06 +08:00
barracuda156	c732f275a2	system_check.cmake: fix arch detection for Darwin PowerPC	2023-12-11 21:05:31 +08:00
Martin Kroeker	e60fb0f397	Merge pull request #4359 from mseminatore/win_perf. Improve Windows threading performance scaling	2023-12-09 23:40:26 +01:00
Mark Seminatore	efa9515a23	Merge branch 'OpenMathLib:develop' into win_perf	2023-12-09 10:09:49 -08:00
Chip-Kerchner	4e738e561a	Replace two vector loads with one vector pair load and fix endianness of stores.	2023-12-08 12:36:08 -06:00
Martin Kroeker	1332f8a822	Merge pull request #4159 from OMaghiarIMG/risc-v-tail-policy. Set tail policy to undisturbed for RVV intrinsics accumulators	2023-12-08 10:25:41 +01:00
Mark Seminatore	edac80d7e8	some cleanup, dynamically scale threads, add missing WIN_CASE defn	2023-12-07 14:59:27 -08:00
Martin Kroeker	2d316c2920	Merge pull request #4125 from OMaghiarIMG/risc-v. Fixes RVV masked intrinsics for iamax/iamin/imax/imin kernels	2023-12-07 14:50:58 +01:00
Martin Kroeker	5b09833b1c	Merge pull request #4019 from uniontech-lilinjie/develop. Fix typo	2023-12-07 14:46:17 +01:00
Martin Kroeker	3193aa9c7e	Merge pull request #4362 from yinshiyou/la-dev. Add 15 level1 optimizations for LoongArch.	2023-12-07 09:15:15 +01:00
yancheng	d32f38fb37	loongarch64: Add optimizations for nrm2.	2023-12-07 14:36:26 +08:00
yancheng	f9b468990e	loongarch64: Add optimizations for rot.	2023-12-07 14:36:26 +08:00
yancheng	c80e7e27d1	loongarch64: Add optimizations for sum and asum.	2023-12-07 14:36:26 +08:00
yancheng	d4c96a35a8	loongarch64: Add optimizations for axpy and axpby.	2023-12-07 14:36:26 +08:00
yancheng	360acc0a41	loongarch64: Add optimizations for swap.	2023-12-07 14:36:26 +08:00
yancheng	174c25766b	loongarch64: Add optimizations for copy.	2023-12-07 14:36:26 +08:00
yancheng	49829b2b7d	loongarch64: Add optimizations for iamin.	2023-12-07 14:36:07 +08:00
yancheng	be83f5e4e0	loongarch64: Add optimizations for iamax.	2023-12-07 14:36:07 +08:00
yancheng	e3fb2b5afa	loongarch64: Add optimizations for imin.	2023-12-07 14:36:07 +08:00
yancheng	e46b48e372	loongarch64: Add optimizations for imax.	2023-12-07 14:36:07 +08:00
yancheng	702fc1d56d	loongarch64: Add optimization for min.	2023-12-07 14:36:07 +08:00
yancheng	346b384d1c	loongarch64: Add optimization for max.	2023-12-07 14:36:07 +08:00
yancheng	ff2ecc6cda	loongarch64: Add optimization for amin.	2023-12-07 14:36:07 +08:00
yancheng	265b5f2e80	loongarch64: Add optimizations for amax.	2023-12-07 14:36:07 +08:00
yancheng	993ede7c70	loongarch64: Add optimizations for scal.	2023-12-07 14:36:07 +08:00
Mark Seminatore	4ebf814b42	fix bug failing to mark task as finished.	2023-12-05 23:28:37 -08:00
Mark Seminatore	5f51811728	try at new threading model	2023-12-05 22:43:36 -08:00
Martin Kroeker	a8cb611157	Merge pull request #4358 from martin-frbg/lapack954. Fix keyword used to count successful tests (Reference-LAPACK PR 954)	2023-12-05 22:20:15 +01:00
Martin Kroeker	589f2b6466	Fix search phrase used to count successful tests (Reference-LAPACK PR 954)	2023-12-05 20:10:20 +01:00
Martin Kroeker	6aa5f53e26	Merge pull request #4357 from martin-frbg/lapack953. Fix memory leak in LAPACK testing framework (Reference-LAPACK PR 953)	2023-12-05 20:03:21 +01:00
Martin Kroeker	effb7af2a2	Fix memory leak (Reference-LAPACK PR 953)	2023-12-05 17:55:38 +01:00
Martin Kroeker	5915a69734	Merge pull request #4356 from martin-frbg/lapack736-2. Add LAPACK tests for the Dynamic Mode Decomposition functions from Reference-LAPACK PR 736	2023-12-05 17:48:42 +01:00
Martin Kroeker	226a14c549	Restore library path adjustments	2023-12-05 15:50:06 +01:00
Martin Kroeker	c5fa318add	Add tests for DMD (Reference-LAPACK PR 736)	2023-12-05 15:45:59 +01:00
Martin Kroeker	fa03e5497a	Add tests for the DMD functions (Reference-LAPACK PR 736)	2023-12-05 15:43:28 +01:00
Martin Kroeker	a53a79e059	Add tests for the DMD functions (Reference-LAPACK PR 736)	2023-12-05 15:41:39 +01:00
Martin Kroeker	e3039fa7f6	Merge pull request #4351 from catap/cmake-old-macos. Use 64bit build on `CMAKE_SYSTEM_PROCESSOR=i386` on Darwin	2023-12-05 14:40:18 +01:00
Octavian Maghiar	4a12cf53ec	[RISC-V] Improve RVV kernel generator LMUL usage. The RVV kernel generation script uses the provided LMUL to increase the number of accumulator registers. Since the effect of the LMUL is to group together the vector registers into larger ones, it actually should be used as a multiplier in the calculation of vlenmax. At the moment, no matter what LMUL is provided, the generated kernels would only set the maximum number of vector elements equal to VLEN/SEW. Commit changes the use of LMUL to properly adjust vlenmax. Note that an increase in LMUL results in a decrease in the number of effective vector registers.	2023-12-04 11:13:35 +00:00
Octavian Maghiar	e4586e81b8	[RISC-V] Add RISC-V Vector 128-bit target. Current RVV x280 target depends on vlen=512-bits for Level 3 operations. Commit adds generic target that supports vlen=128-bits. New target uses the same scalable kernels as x280 for Level 1&2 operations, and autogenerated kernels for Level 3 operations. Functional correctness of Level 3 operations tested on vlen=128-bits using QEMU v8.1.1 for ctests and BLAS-Tester.	2023-12-04 11:02:18 +00:00
Erik Bråthen Solem	2381132ada	Darwin < 20: always write xerbla.c.o into archive. Write xerbla.c.o into archive regardless of timestamp by using ar -rs instead of ar -ru.	2023-12-03 19:13:56 +01:00

7962 Commits