The current implementation has locks, but the locks each only
have a critical section of one variable so atomic reads/writes
with barriers can be used to achieve the same behavior.
Like the previous patch, pthread_mutex_lock isn't fair, so in a
tight loop the previous thread that has the lock can keep it
starving another thread, even if that thread is about to write
the data that will stop the current thread from spinning.
On a 64c Arm system this improves performance by 20x on sgesv.goto.
Setting DYNAMIC_ARCH=1 on POWER9 does not build POWER9 files due to some
compiler version checks. This patch fixes some of the macros that are used
to check compiler version. On fixing those checks, there are some new make
failures related to icamin, icamax, isamin, isamax and caxpy files on POWER9.
This patch fixes those failures as well.
The function gettimeofday does not count the time,when testing the axpy small data volume use case.
Use the function clock_gettime to replace the gettimeofday function to count the time.
Heavy-weight locking isn't required to pass the work queue
pointer between threads and simple atomic acquire/release
semantics can be used instead. This is especially important as
pthread_mutex_lock() isn't fair.
We've observed substantial variation in runtime because of the
the unfairness of these locks which complety goes away with
this implementation.
The locks themselves are left to provide a portable way for
idling threads to sleep/wakeup after many unsuccessful iterations
waiting.
The Arm implementation of rpcc() uses the architected timer
which is defined by the SBSA to be between 10-400MHz. These numbers
are much smaller than the cycle counter frequency used by x86. Make
the numbers closer by shifting the cycle counter up by the number of
leading zeros in the cntfrq_el0 register which gets us closer to a
noraml cpu clock cycle range.
original issue was #1708, I see now that the same problem affects gfortran compilation. The underlying issue is said to be fixed (but not yet released) on all branches of gcc as of a few days ago but it will certainly take time to reach mingw/msys.