This initializes the per-thread memory buffers, which are cleared and released on fork by the handler registered with pthread_atfork. Without this, each thread calls blas_memory_alloc on almost every execution, which slows the code down significantly: the threads race for the memory allocation, and the locks that guard it serialize them.
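
The pattern described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the library's actual code: `get_scratch`, `scratch_alloc_count`, `reset_in_child`, and `SCRATCH_BYTES` are hypothetical names, and plain `malloc` stands in for the real allocator. Each thread caches its buffer after the first request, and a child handler registered through `pthread_atfork` drops the cached buffer so that it is created again lazily after a fork.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define SCRATCH_BYTES (1u << 20)   /* hypothetical buffer size */

/* One cached buffer per thread (C11 thread-local storage). */
static _Thread_local void *scratch = NULL;

/* Counts real allocations, for demonstration only. */
static atomic_int alloc_calls;

static pthread_once_t fork_once = PTHREAD_ONCE_INIT;

/* Runs in the child after fork(): only the forking thread survives,
 * so release its cached buffer and let it be re-created on demand. */
static void reset_in_child(void)
{
    free(scratch);
    scratch = NULL;
}

static void register_fork_handler(void)
{
    /* No prepare or parent handlers in this sketch, only a child handler. */
    pthread_atfork(NULL, NULL, reset_in_child);
}

/* Returns the calling thread's buffer, allocating it on first use.
 * Later calls from the same thread skip the allocator entirely. */
void *get_scratch(void)
{
    pthread_once(&fork_once, register_fork_handler);
    if (scratch == NULL) {
        scratch = malloc(SCRATCH_BYTES);
        if (scratch != NULL)
            atomic_fetch_add(&alloc_calls, 1);
    }
    return scratch;
}

/* Number of times the allocator was actually invoked. */
int scratch_alloc_count(void)
{
    return atomic_load(&alloc_calls);
}
```

With the buffer cached per thread, repeated calls to `get_scratch` from one thread reach the allocator only once, which is the contention the change is meant to avoid.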

| Name |
|---|
| level2 |
| level3 |
| mapper |
| others |