The ZARCH implementations of ?sum contain a cut & paste-error: An inline
assembly argument is named "sum", but the assembly references "asum"
instead. The mismatch causes a build error. This is fixed.
This patch was submitted to the GIMP project by a publisher wishing to
keep confidentiality (hence anonymously). I just pass along the patch.
Here is the patch explanation which came with:
First they remind us what Microsoft documentation says about
TerminateThread:
> TerminateThread is a dangerous function that should only be used in
> the most extreme cases. You should call TerminateThread only if you
> know exactly what the target thread is doing, and you control all of
> the code that the target thread could possibly be running at the time
> of the termination.
(https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-terminatethread)
Then they say that 5 milliseconds time-out might not be long enough for
the thread to exit gracefully. They propose to set it to a much higher
value (for instance here 5 seconds).
And finally you should always check the return value of
WaitForSingleObject(). In particular you want to run TerminateThread()
only if WaitForSingleObject() failed, not on success case.
For large matrix, e.g. M=N=K, and M>1290, int mnk=M*N*K will overflow.
This will lead to wrong branching to single-threading. The performance
is downgraded significantly.
Signed-off-by: Wang, Long <long1.wang@intel.com>
1. qalloc() appears to have been a special routine written for the PPC440-based QCDOC supercomputer(s) from around 2005, its source does not seem to be readily available. So switch the #if 1 in the code to rely on standard malloc() by default.
2. Utility functions like get_num_procs, get_num_threads that were added to the "normally" used memory.c in the meantime were still missing here.
The altivec versions of SGEMM and CGEMM fail most test in LAPACK-TESTING when compiled for big endian, STRSM/CTRSM even cause segfaults. The rot kernels either fail the corresponding utest or lead to failures in LAPACK-TESTING.
There seems to be no advantage to the three-operand form used in the earliest GotoBLAS kernels, and it causes compilation problems on other than the previously special-cased platforms as well