Martin Kroeker
61659f8765
Merge pull request #1648 from martin-frbg/nofort
...
Handle NOFORTRAN=0
2018-07-01 11:56:40 +02:00
Martin Kroeker
3a8f0a6a1f
Merge pull request #1656 from xianyi/develop
...
Update the 0.3 branch from develop
2018-07-01 11:55:21 +02:00
Martin Kroeker
3d3c19717c
Merge pull request #1655 from martin-frbg/issue1641
...
Fix apparent off-by-one error in calculation of MAX_ALLOCATING_THREADS
2018-07-01 08:41:22 +02:00
Martin Kroeker
24e344038d
Merge pull request #1654 from martin-frbg/avx512check
...
Add compiler option to avx512 test and hide test output
2018-07-01 01:17:03 +02:00
Martin Kroeker
4e9c34018e
Fix apparent off-by-one error in calculation of MAX_ALLOCATING_THREADS
...
fixes #1641
2018-06-30 23:57:50 +02:00
Martin Kroeker
f5243e8e1f
Add compiler option to avx512 test and hide test output
2018-06-30 23:47:44 +02:00
Martin Kroeker
ba8388cee0
Merge pull request #1651 from martin-frbg/avx512-nodgemm
...
Disable the 16x2 DTRMM kernel on SkylakeX as well
2018-06-30 17:48:03 +02:00
Martin Kroeker
6e54b0a027
Disable the 16x2 DTRMM kernel on SkylakeX as well
2018-06-30 17:31:06 +02:00
Martin Kroeker
40c8cbc3bf
Merge pull request #1650 from martin-frbg/avx512-nodgemm
...
Disable the AVX512 DGEMM kernel for now
2018-06-30 13:05:46 +02:00
Martin Kroeker
d3c9eb4c7d
Merge pull request #1639 from martin-frbg/dyn_list
...
Add DYNAMIC_LIST option for user-defined list of dynamic targets
2018-06-30 13:05:30 +02:00
Martin Kroeker
f0a8dc2eec
Disable the AVX512 DGEMM kernel for now
...
due to #1643
2018-06-30 11:34:48 +02:00
Martin Kroeker
cc92257ea6
Update Makefile
2018-06-27 00:09:21 +02:00
Martin Kroeker
2aba1b1658
Merge branch 'develop' into nofort
2018-06-27 00:07:32 +02:00
Martin Kroeker
8396e9e777
Handle NOFORTRAN=0
2018-06-27 00:00:27 +02:00
Martin Kroeker
bfad307ed7
Merge pull request #1647 from martin-frbg/armv7-dot
...
Remove premature exits from ARMV7 xdot codes
2018-06-26 22:27:30 +02:00
Martin Kroeker
b83e4c60c7
Remove premature exit for INC_X or INC_Y zero
2018-06-26 20:46:42 +02:00
Martin Kroeker
e344db269b
Remove premature exit for INC_X or INC_Y zero
2018-06-26 20:45:57 +02:00
Martin Kroeker
545b82efd3
Remove premature exit for INC_X or INC_Y zero
2018-06-26 20:45:00 +02:00
Martin Kroeker
e322a951fe
Remove premature exit for INC_X or INC_Y zero
2018-06-26 20:44:13 +02:00
Martin Kroeker
ff2f171036
Merge pull request #1644 from martin-frbg/revert-filterout
...
Revert changes to NOFORTRAN handling in Makefile
2018-06-26 10:15:15 +02:00
Martin Kroeker
092175cfec
Revert changes to NOFORTRAN handling from 952541e
2018-06-26 08:09:52 +02:00
Martin Kroeker
750162a05f
Try gradual fallback for cores not in the dynamic core list
2018-06-25 21:02:31 +02:00
Martin Kroeker
e6d93f20f1
Merge pull request #2 from martin-frbg/develop
...
merge develop
2018-06-25 20:48:10 +02:00
Martin Kroeker
c38c65eb65
Merge pull request #1 from xianyi/develop
...
Merge xianyi:develop into develop
2018-06-25 20:45:56 +02:00
Martin Kroeker
ce3651516f
Merge pull request #1642 from oon3m0oo/develop
...
Rewrite &= -> = and simplify the initial blocking phase.
2018-06-25 19:23:40 +02:00
Craig Donner
0144068537
Rewrite &= -> = and simplify the initial blocking phase.
2018-06-25 15:08:55 +01:00
Martin Kroeker
1833a67071
Add support for a user-defined list of dynamic targets
2018-06-23 19:42:15 +02:00
Martin Kroeker
0b2b83d9ed
Add support for a user-defined list of dynamic targets
2018-06-23 19:41:32 +02:00
Martin Kroeker
62cf769aa6
Merge pull request #1638 from martin-frbg/issue1637
...
Expose the CBLAS interface to the IxAMIN functions and have make build it
2018-06-23 15:01:02 +02:00
Martin Kroeker
eb71d61c7c
Expose CBLAS interface to BLAS extensions iXamin
2018-06-23 13:31:09 +02:00
Martin Kroeker
9cf22b7d91
Build cblas_iXamin interfaces
2018-06-23 13:27:30 +02:00
Martin Kroeker
cc66743b66
Merge pull request #1634 from oon3m0oo/develop
...
Fix data races reported by TSAN.
2018-06-21 21:01:03 +02:00
oon3m0oo
2aa0a5804e
Use BLAS rather than CBLAS in test_fork.c ( #1626 )
...
This is handy for people not using lapack.
2018-06-21 18:47:45 +02:00
Craig Donner
28c28ed275
Fix data races reported by TSAN.
2018-06-21 16:41:02 +01:00
oon3m0oo
a399d00425
Further improvements to memory.c. ( #1625 )
...
- Compiler TLS is now only used when the compiler supports it
- If compiler TLS is unsupported, we use platform-specific TLS
- Only one variable (an index) is now in TLS
- We only access TLS once per alloc, and never when freeing
- Allocation / release info is now stored within the allocation itself, by
over-allocating; this saves having external structures do the bookkeeping, and
reduces some of the redundant data that was being stored (such as addresses)
- We never hit the alloc lock when not using SMP or when using OpenMP (that was
my fault)
- Now that there are fewer tracking structures I think this is a bit easier to
read than before
2018-06-20 22:04:03 +02:00
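
The idea in the commit above of storing allocation info inside the allocation itself, by over-allocating, can be sketched roughly as follows. This is only an illustration of the general technique and not the code in memory.c; the names demo_header, demo_alloc, demo_size and demo_free are hypothetical, and plain malloc stands in for whatever allocator is really used.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bookkeeping header placed in front of every buffer.
 * The union with max_align_t keeps the user pointer suitably aligned. */
typedef union demo_header {
    struct {
        size_t size;   /* number of bytes requested by the caller */
        int    in_use; /* 1 while the buffer is handed out */
    } info;
    max_align_t align;
} demo_header;

/* Over-allocate by one header and return the memory just past it. */
void *demo_alloc(size_t size)
{
    demo_header *h;

    if (size > SIZE_MAX - sizeof(demo_header))
        return NULL; /* the size computation would overflow */

    h = malloc(sizeof(demo_header) + size);
    if (h == NULL)
        return NULL;

    h->info.size = size;
    h->info.in_use = 1;
    return h + 1;
}

/* Recover the bookkeeping data from the user pointer alone. */
size_t demo_size(const void *p)
{
    return ((const demo_header *)p - 1)->info.size;
}

/* Release a buffer; no external table has to be searched. */
void demo_free(void *p)
{
    demo_header *h;

    if (p == NULL)
        return;
    h = (demo_header *)p - 1;
    h->info.in_use = 0;
    free(h);
}
```

Because the header sits directly in front of the user pointer, releasing a buffer needs neither a lookup structure nor a thread-local access, which is in line with the points listed in the commit message.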
Martin Kroeker
f66b9c8826
Merge pull request #1630 from martin-frbg/x86-march
...
Add -march=skylake-avx512 to flags if target is skylake x
2018-06-20 21:51:57 +02:00
Martin Kroeker
2946c46024
Merge pull request #1631 from oon3m0oo/stack
...
Avoid declaring arrays of size 0 when making large stack allocations.
2018-06-20 21:51:38 +02:00
Craig Donner
05978528c3
Avoid declaring arrays of size 0 when making large stack allocations.
2018-06-20 17:03:18 +01:00
Martin Kroeker
ef6f0b645e
Merge pull request #1629 from martin-frbg/issue1628
...
Make gfortran link libomp for clang in the tests; avoid two typical gotchas with NOFORTRAN
2018-06-20 16:41:13 +02:00
Martin Kroeker
0c5b7b400b
Add -march=skylake-avx512 to flags if target is skylake x
2018-06-20 15:16:19 +02:00
Martin Kroeker
952541e840
Need to use filter-out to handle NOFORTRAN not set
2018-06-20 13:20:30 +02:00
Martin Kroeker
9369d3e6e5
Modify NOFORTRAN tests to always check the value; fix rewriting of NO_FORTRAN
2018-06-19 23:28:06 +02:00
Martin Kroeker
10b70c904d
Handle erroneous user settings NOFORTRAN=0 and NO_FORTRAN
2018-06-19 20:53:19 +02:00
Martin Kroeker
6a5ab083b7
Handle special case of gfortran+clang+OpenMP
2018-06-19 20:47:33 +02:00
Martin Kroeker
1f9e4f3193
Handle special case of gfortran+clang+OpenMP
2018-06-19 20:46:36 +02:00
Martin Kroeker
5a6a2bed9a
Merge pull request #1623 from fenrus75/fast-thread
...
Initialize only the required subset of the jobs array, fix barriers and improve switch ratio on SkylakeX and Haswell. For issue #1622
2018-06-18 09:02:40 +02:00
Martin Kroeker
2d8cc7193a
Support upcoming Intel Cannon Lake CPUs as Skylake X ( #1621 )
...
* Support upcoming Cannon Lake as Skylake X
2018-06-17 23:38:14 +02:00
Arjan van de Ven
2ddc96c9e5
make WMB / MB safer on x86-64
...
make it so that
if (foo)
RMB;
else
MB;
is always done correctly and without syntax surprises
2018-06-17 18:06:24 +00:00
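
One common way to make a statement-like macro safe in an unbraced if/else, as the commit above asks for, is to wrap its body in do { ... } while (0). The sketch below assumes that idiom; whether the commit uses exactly this form is not stated in the log. DEMO_MB, DEMO_RMB and demo_barrier_branch are hypothetical names, not OpenBLAS macros.

```c
/* Hypothetical compiler barrier; empty on compilers without GNU-style asm. */
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")
#else
#define DEMO_COMPILER_BARRIER() ((void)0)
#endif

/* A macro defined as a bare { ... } block would leave a stray ';' before
 * 'else' and fail to compile in the code below. The do/while(0) wrapper
 * turns the macro into a single statement that needs the trailing ';'. */
#define DEMO_MB  do { DEMO_COMPILER_BARRIER(); } while (0)
#define DEMO_RMB do { DEMO_COMPILER_BARRIER(); } while (0)

/* Returns 1 if the RMB branch was taken and 2 if the MB branch was taken. */
int demo_barrier_branch(int foo)
{
    int taken = 0;

    if (foo)
        DEMO_RMB;
    else
        DEMO_MB;

    taken = foo ? 1 : 2;
    return taken;
}
```

With this form the macro behaves like an ordinary function call statement, so callers do not need to know that it expands to more than one token.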
Arjan van de Ven
7e39ffe113
On x86-64, make MB/WMB compiler barriers
...
While on x86(64) one does not normally need full memory barriers, it's
good practice to at least use compiler barriers for places where on other
architectures memory barriers are used; this prevents the compiler
from over-optimizing.
2018-06-17 17:53:15 +00:00
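
A compiler barrier of the kind described in the commit above can be sketched with an empty asm statement that carries a "memory" clobber. This is an assumption about the general technique for GCC-compatible compilers, not a copy of the OpenBLAS definition; DEMO_COMPILER_BARRIER, demo_publish and demo_try_consume are hypothetical names.

```c
/* Emits no instruction, but the compiler may not move memory accesses
 * across it. It does not constrain the CPU itself. */
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")
#else
#define DEMO_COMPILER_BARRIER() ((void)0)
#endif

static int demo_payload;        /* data being handed over */
static volatile int demo_ready; /* flag that announces the data */

/* Writer: store the data, then raise the flag. The barrier keeps the
 * compiler from sinking the payload store below the flag store. */
void demo_publish(int value)
{
    demo_payload = value;
    DEMO_COMPILER_BARRIER();
    demo_ready = 1;
}

/* Reader: check the flag, then read the data. Returns 1 and fills *out
 * if data was available, 0 otherwise. */
int demo_try_consume(int *out)
{
    if (!demo_ready)
        return 0;
    DEMO_COMPILER_BARRIER();
    *out = demo_payload;
    return 1;
}
```

On architectures with a weaker memory model the same places would need a real hardware barrier, which is why keeping at least a compiler barrier at those points is useful.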
Arjan van de Ven
73de17664d
Add missing barriers in gemm scheduler
...
a few places in the gemm scheduler code were missing barriers;
the code likely worked OK due to heavy use of volatile / _Atomic
but there's no reason to get this incorrect
2018-06-17 17:50:43 +00:00