This initializes the per-thread memory buffers, which are cleared and released on a fork by the handler registered via pthread_atfork. Without this initialization, each thread calls blas_memory_alloc on almost every execution, which slows the code down significantly because the threads contend for the locks that serialize memory allocation.
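The pattern described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the library's actual code: `NUM_WORKERS`, `BUFFER_BYTES`, `worker_buffer`, `release_worker_buffers`, `init_worker_buffers` and `get_worker_buffer` are hypothetical names, and plain `malloc` stands in for the real allocator. Only `pthread_atfork` is a real POSIX call.

```c
#include <pthread.h>
#include <stdlib.h>

#define NUM_WORKERS  4            /* hypothetical number of worker threads */
#define BUFFER_BYTES (1u << 20)   /* hypothetical per-thread buffer size */

/* One scratch buffer per worker thread (hypothetical layout). */
static void *worker_buffer[NUM_WORKERS];
static int fork_handler_registered = 0;

/* Fork "prepare" handler: runs in the parent just before fork(),
 * releasing every buffer so neither process inherits stale state. */
static void release_worker_buffers(void) {
    for (int i = 0; i < NUM_WORKERS; i++) {
        free(worker_buffer[i]);
        worker_buffer[i] = NULL;
    }
}

/* Allocate all per-thread buffers up front. Calling this again after a
 * fork restores the buffers that the prepare handler released.
 * Returns 0 on success and -1 on failure. */
static int init_worker_buffers(void) {
    if (!fork_handler_registered) {
        if (pthread_atfork(release_worker_buffers, NULL, NULL) != 0)
            return -1;
        fork_handler_registered = 1;
    }
    for (int i = 0; i < NUM_WORKERS; i++) {
        if (worker_buffer[i] == NULL) {
            worker_buffer[i] = malloc(BUFFER_BYTES);
            if (worker_buffer[i] == NULL)
                return -1;
        }
    }
    return 0;
}

/* Fast path: a worker reads its own slot without taking any lock.
 * A NULL result means the buffers were released and not re-initialized,
 * which is the case that would force a locked allocation on every call. */
static void *get_worker_buffer(int worker) {
    return worker_buffer[worker];
}
```

In this sketch, calling `init_worker_buffers()` once at startup and again after a fork keeps `get_worker_buffer()` on the lock-free path. Skipping the second call would leave every slot `NULL`, so each worker would have to fall back to a serialized allocation on each execution.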