Martin Kroeker
1b5620b66e
Add lower threshold for multithreading in ?potrf and ?potri
2021-06-26 23:47:41 +02:00
Martin Kroeker
baf03a0937
Merge pull request #3252 from martin-frbg/more_shortcuts
...
Further shortcuts for (small) cases that do not need buffer allocation
2021-06-15 16:14:20 +02:00
Martin Kroeker
7aab5e826c
Merge pull request #3250 from martin-frbg/gemv-shortcut
...
Add shortcut for small-size S/D GEMV_N with increments of one
2021-06-15 14:50:14 +02:00
Martin Kroeker
f84197c1a7
Add shortcuts for (small) cases that do not need expensive buffer allocation
2021-05-29 22:28:00 +02:00
Martin Kroeker
734bd265a8
revert symv changes for now
2021-05-29 15:40:03 +02:00
Martin Kroeker
1217eb910d
Fix copy-paste errors in variables used
2021-05-28 09:38:48 +02:00
Martin Kroeker
d6d7a6685d
Add shortcuts for (small) cases that do not need expensive buffer allocation
2021-05-27 22:39:18 +02:00
Martin Kroeker
f0e7345fb8
Add shortcut for small-size gemv_n with increments of one
2021-05-26 22:02:34 +02:00
Martin Kroeker
03297ff9f0
Add fast path for small xSYR with INCX==1
2021-05-22 20:41:18 +02:00
Gordon Fossum
8b599836db
Add error message token for SBGEMM in gemm.c
2021-05-04 13:55:02 -05:00
Martin Kroeker
904b221f03
Add cast to prevent overflow of intermediate result
2021-05-01 14:47:22 +02:00
Martin Kroeker
c5fb91f1bc
Fix division by zero in the non-x86 codepath
2021-04-29 09:47:18 +02:00
Harmen Stoppels
ec6b354c32
use /usr/bin/env perl
2021-02-24 14:07:20 +01:00
Martin Kroeker
bd906e3410
fix copy-paste error in build rules for cblas_crotg and cblas_zrotg
2021-01-30 16:46:25 +01:00
Alex Henrie
f1bf2603e6
Remove dead assignment to dflag in rotmg functions
2021-01-14 19:40:32 -07:00
Alex Henrie
6f32991eae
Don't define the mode variable when not needed in gemm functions
2021-01-14 19:40:31 -07:00
Martin Kroeker
a8f249458d
Build CBLAS interfaces for CROTG and ZROTG as well
2021-01-13 00:29:38 +01:00
Martin Kroeker
ac3e2a3fdd
Add CBLAS interfaces for csrot and zdrot
2021-01-12 23:22:00 +01:00
Martin Kroeker
857afcc41d
Use ifeq instead of ifdef for user-definable build options
2020-11-22 16:31:44 +01:00
Chen, Guobing
a7b1f9b1bb
Implementation of BF16 based gemv
...
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-10-29 02:08:23 +08:00
Martin Kroeker
6a1f3e40af
Remove debug printout of object list
2020-10-26 21:37:04 +01:00
Rajalakshmi Srinivasaraghavan
b5d30b390d
Fix build issues with bfloat16
...
This patch fixes compilation errors due to recent renaming from SH to SB
with BUILD_BFLOAT16.
2020-10-13 11:00:22 -05:00
Martin Kroeker
1e7eb7b7a9
Fix typos in currently unused sections
2020-10-13 09:17:15 +02:00
Martin Kroeker
052f31bc3c
Change "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-12 00:02:16 +02:00
Martin Kroeker
0f7d73ff6d
Allow supporting only a subset of variable types
2020-10-11 14:53:26 +02:00
Martin Kroeker
b475b4bd0d
Support building only a subset of types
2020-09-22 23:25:04 +02:00
Chen, Guobing
deaeb6c5b8
Add bfloat16 based dot and conversion with single/double
...
1. Added bfloat16 based dot as new API: shdot
2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot
3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod
shstobf16 -- convert single float array to bfloat16 array
shdtobf16 -- convert double float array to bfloat16 array
sbf16tos -- convert bfloat16 array to single float array
dbf16tod -- convert bfloat16 array to double float array
4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16
5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs
6. Fix Cooperlake platform detection/specification issue in dynamic-arch builds
7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-09-04 02:31:25 +08:00
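As a rough illustration of the single <=> bfloat16 conversions listed in the commit above, here is a minimal generic sketch in C. The names `bf16_t`, `float_to_bf16` and `bf16_to_float` are hypothetical and are not the OpenBLAS entry points. Round-to-nearest-even is an assumption; the commit message does not say which rounding mode the kernels use.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical storage type; the commit only states that bfloat16 is a uint16_t. */
typedef uint16_t bf16_t;

/* Convert one float to bfloat16 by keeping the upper 16 bits of its
   IEEE-754 binary32 encoding, rounding to nearest even (assumed mode). */
static bf16_t float_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);          /* bit copy without aliasing issues */

    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        /* NaN: set a mantissa bit that survives the shift so it stays a NaN */
        return (bf16_t)((bits >> 16) | 0x0040u);
    }

    /* Add 0x7fff plus the lowest kept bit: ties round to the even value. */
    uint32_t rounding_bias = 0x7fffu + ((bits >> 16) & 1u);
    return (bf16_t)((bits + rounding_bias) >> 16);
}

/* Convert one bfloat16 back to float: the 16 stored bits become the
   upper half of the binary32 encoding, the lower half is zero. */
static float bf16_to_float(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Array versions would simply loop over these helpers; widening back to float is exact, while narrowing loses the low 16 mantissa bits.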
Martin Kroeker
75eeb265d7
[WIP] Refactor the driver code for direct SGEMM ( #2782 )
...
Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available
(on x86_64 targets only for now) in DYNAMIC_ARCH builds
* Add sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt
* Add direct_sgemm functions to the gotoblas struct in common_param.h
* Move sgemm_direct_performant helper to separate file
* Update gemm.c to use macros for sgemm_direct to support dynamic_arch naming via common_s.h
* (Conditionally) add sgemm_direct functions in setparam-ref.c
2020-08-19 14:51:09 +02:00
Martin Kroeker
fee361ae64
fix another source of NO_CBLAS=0 surprise
2020-08-11 13:27:19 +02:00
Ashwin Sekhar T K
4e1be0e481
ARM64: Add THUNDERX3T110 Target
2020-07-26 23:32:24 -07:00
Martin Kroeker
5dd14e3d48
Make building the bfloat16 functions conditional on option BUILD_HALF ( #2590 )
...
* make building the bfloat16 BLAS functions conditional on BUILD_HALF
* pass the BUILD_HALF option to gensymbol
* Pass BUILD_HALF as a compiler define for dynamic_arch builds
2020-05-01 09:58:30 +02:00
Martin Kroeker
2db5178e2d
enable cblas interfaces to GEMM3M in CMAKE builds
2020-04-22 11:01:28 +02:00
Rajalakshmi Srinivasaraghavan
7eb55504b1
RFC : Add half precision gemm for bfloat16 in OpenBLAS
...
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes). Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.
Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.
This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
Martin Kroeker
8229c163b7
Use runtime check for AVX512 (sgemm_direct) capability when using DYNAMIC_ARCH
2020-03-26 21:12:56 +01:00
Martin Kroeker
6a14b34c20
Avoid calling DIRECT codepath in DYNAMIC_ARCH on non-SKX
2020-03-22 14:33:16 +01:00
Martin Kroeker
d65e9a2bbd
Merge pull request #2253 from thrasibule/xerbla
...
fix error messages
2020-01-18 20:39:04 +01:00
Guillaume Horel
2463938879
fix error message
2019-09-11 10:35:25 -04:00
Guillaume Horel
5d6525c87c
more bugfix
2019-09-10 17:30:57 -04:00
Guillaume Horel
459bb9291d
fix error codes
2019-09-10 17:10:33 -04:00
Guillaume Horel
5997b6b491
bugfix
2019-09-08 11:14:49 -04:00
Guillaume Horel
7ec7b999a5
add missing file
2019-09-08 11:14:49 -04:00
Guillaume Horel
af9ac0898a
fix Makefile
2019-09-08 11:14:49 -04:00
Guillaume Horel
9b2f0323d6
update Makefile
2019-09-08 11:14:49 -04:00
Guillaume Horel
ea747cf933
start working on ?trtrs
2019-09-08 11:14:49 -04:00
luz.paz
daf2fec12d
Misc. typo fixes
...
Found via `codespell -q 3 -w -L ith,als,dum,nd,amin,nto,wis,ba -S ./relapack,./kernel,./lapack-netlib`
2019-04-29 17:03:56 -04:00
Martin Kroeker
268c28db7d
Merge pull request #2095 from martin-frbg/trsm
...
Correct length of name string in xerbla call
2019-04-28 09:55:25 +02:00
Martin Kroeker
0bd956fd21
Correct length of name string in xerbla call
2019-04-27 22:49:04 +02:00
Martin Kroeker
79cfc24a62
Add interface for ?sum (derived from ?asum)
2019-03-30 21:59:18 +01:00
Martin Kroeker
c19a449096
Merge pull request #2071 from martin-frbg/issue2068
...
Provide CBLAS interfaces to I?MIN and I?MAX
2019-03-30 14:54:28 +01:00
Martin Kroeker
3d1e36d4cb
Build CBLAS interfaces for I?MIN and I?MAX
2019-03-30 12:38:41 +01:00
Martin Kroeker
e29b0cfcc4
Allow multithreading TRMV again
...
revert workaround introduced for issue #1332 as the actual cause appears to be my incorrect fix from #1262 (see #1388 )
2019-02-19 21:03:30 +01:00
Martin Kroeker
8533aca964
Avoid penalizing tall skinny matrices
2019-01-23 10:03:00 +01:00
Martin Kroeker
cda81cfae0
Shift transition to multithreading towards larger matrix sizes
...
See #1886 and JuliaRobotics issue 500. trsm benchmarks on Haswell and Zen showed that with these values performance is roughly doubled for matrix sizes between 8x8 and 14x14, and still 10 to 20 percent better near the new cutoff at 32x32.
2019-01-19 00:10:01 +01:00
Arjan van de Ven
cdc668d82b
Add a "sgemm direct" mode for small matrixes
...
OpenBLAS has a fancy algorithm for copying the input data while laying
it out in a more CPU friendly memory layout.
This is great for large matrixes; the cost of the copy is easily
amortized by the gains from the better memory layout.
But for small matrixes (on CPUs that can do efficient unaligned loads) this
copy can be a net loss.
This patch adds (for SKYLAKEX initially) a "sgemm direct" mode, that bypasses
the whole copy machinery for ALPHA=1/BETA=0/... standard arguments,
for small matrixes only.
What is small? For the non-threaded case this has been measured to be
in the M*N*K = 28 * 512 * 512 range, while in the threaded case it's
less, around M*N*K = 1 * 512 * 512
2018-12-13 13:47:31 +00:00
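The size cutoffs quoted in the commit above can be sketched as a simple dispatch test. This is only an illustration: the function name `use_direct_sgemm` and the two macros are hypothetical, and the real kernel may check additional conditions (such as the ALPHA/BETA values) or use a different formula.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limits built from the M*N*K figures in the commit message. */
#define DIRECT_LIMIT_SINGLE_THREAD (28.0 * 512.0 * 512.0)
#define DIRECT_LIMIT_MULTI_THREAD  (1.0 * 512.0 * 512.0)

/* Return true when the problem is small enough that skipping the
   packing copy is expected to be a net win. */
static bool use_direct_sgemm(size_t m, size_t n, size_t k, bool threaded)
{
    /* Compute the product in double so large dimensions cannot overflow. */
    double volume = (double)m * (double)n * (double)k;
    double limit = threaded ? DIRECT_LIMIT_MULTI_THREAD
                            : DIRECT_LIMIT_SINGLE_THREAD;
    return volume <= limit;
}
```

A caller would fall back to the regular copy-and-compute path whenever this returns false.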
Martin Kroeker
5393759a98
Merge pull request #1869 from martin-frbg/axpy0
...
Handle special case INCX=0,INCY=0 in the axpy interface
2018-11-25 20:52:49 +01:00
Martin Kroeker
c171b8ad13
Handle special case INCX=0,INCY=0 in the axpy interface
2018-11-13 13:57:18 +01:00
Martin Kroeker
96d2f2c9b2
Merge pull request #1831 from brada4/hemv
...
disable threading in C/ZSWAP copying from S/DSWAP
2018-11-07 08:49:21 +01:00
Andrew
2992e3886a
disable threading in C/ZSWAP copying from S/DSWAP
2018-10-22 23:21:49 +03:00
Martin Kroeker
e3c262e5cf
Merge pull request #1825 from brada4/hemv
...
Delay _hemv threading in attempt to address #1820
2018-10-21 20:34:05 +02:00
Andrew
a293bdcd5e
re-arrange new code for readability
2018-10-20 21:37:53 +03:00
Andrew
c7bbf9c987
Attempt to tame _hemv threading #1820
2018-10-20 11:13:29 +03:00
Ashwin Sekhar T K
21f46a1cf2
ARM64: Use THUNDERX2T99 Neon Kernels for ARMV8
...
Currently the generic ARMV8 target uses C implementations
for many routines. Replace these with the neon implementations
written for the THUNDERX2T99 target, which are up to 6x faster for
certain routines.
2018-10-17 10:44:37 -07:00
Martin Kroeker
b991570210
Merge pull request #1762 from martin-frbg/issue1710-2
...
Add explicit casts to silence compiler warnings
2018-09-19 18:16:21 +02:00
Martin Kroeker
f3c262156e
Add an explicit cast to silence a warning
...
for #1710
2018-09-13 14:24:29 +02:00
Martin Kroeker
30f5a69ab8
Add explicit cast to silence a warning
...
for #1710
2018-09-13 14:23:31 +02:00
Martin Kroeker
4a553e8678
Merge pull request #1713 from martin-frbg/issue1710
...
Introduce blasabs macro and use it to switch between abs and labs for INTERFACE64
2018-08-04 23:51:31 +02:00
Martin Kroeker
165f00c159
fabs -> fabsl
2018-08-04 20:14:51 +02:00
Martin Kroeker
933896a1d0
Use blasabs to switch between abs and labs as needed for INTERFACE64
2018-08-04 20:06:49 +02:00
Steven G. Johnson
a4e321400b
fabs -> fabsl
...
Fixes two calls that were using `fabs` on a `long double` argument rather than `fabsl`, which looks like it is doing an unintentional truncation to `double` precision.
2018-08-03 13:00:10 -04:00
Martin Kroeker
9cf22b7d91
Build cblas_iXamin interfaces
2018-06-23 13:27:30 +02:00
Craig Donner
c2545b0fd6
Fixed a few more unnecessary calls to num_cpu_avail.
...
I don't have as many benchmarks for these as for gemm, but it should still
make a difference for small matrices.
2018-06-11 10:17:16 +01:00
Craig Donner
66316b9f4c
Improve performance of GEMM for small matrices when SMP is defined.
...
Always checking num_cpu_avail() regardless of whether threading will actually
be used adds noticeable overhead for small matrices. Most other uses of
num_cpu_avail() do so only if threading will be used, so do the same here.
2018-06-07 15:29:13 +01:00
Martin Kroeker
e8880c1699
Use a single thread for small input size
...
copies daxpy improvement from #27 , see #1560
2018-06-07 10:26:55 +02:00
Martin Kroeker
1d27fa8507
Merge pull request #1539 from martin-frbg/ztrmv-1332
...
Disable multithreading in ztrmv
2018-04-27 23:10:21 +02:00
Martin Kroeker
a8ed428bab
Disable multithreading in ztrmv
...
BLAS-Tester shows that the same problem exists as with DTRMV (issue #1332 )
2018-04-25 22:35:46 +02:00
Martin Kroeker
809fd0d451
Rewrite ROTMG to address cases not covered by the netlib algorithm ( #1480 )
...
* Rewrite ROTMG based on the new implementation in GONUM based on the algorithm proposed by Tim Hopkins, see issue 1452 for the reference
* Correct ROTMG utest for issue1452 and add another from gonum, also correct transposition of expected and observed values in error messages
2018-03-04 17:39:56 +01:00
Martin Kroeker
72f14a0363
Fix conditionals in the rescaling against GAMSQ
2018-02-18 12:54:52 +01:00
Martin Kroeker
798f1595d5
Fix condition in both second scaling loops
2018-02-18 12:37:09 +01:00
Martin Kroeker
0464aa6784
Remove debug printfs
2018-02-09 23:06:50 +01:00
Martin Kroeker
55840f0bc9
Keep the flag handling separate from the scaling loops
...
Fixes #1452 and is more in line with how ATLAS does it. The earlier fix from #356 only moved the bug elsewhere, but we will never want the iterative rescaling to change the dflag setting and variable associations with each cycle.
2018-02-09 23:00:03 +01:00
Andrew
47deec2c1a
fix couple of dead assignment warnings
2017-12-22 00:56:35 +01:00
Martin Kroeker
38763ec4f3
Disable multithreading for trmv
...
as a (hopefully temporary) workaround for #1332
2017-12-03 22:40:54 +01:00
Martin Kroeker
9251a2efde
Merge pull request #1359 from brada4/develop
...
Eliminate mode variable where not needed in syrk interface
2017-11-18 23:47:17 +01:00
Martin Kroeker
b46e2b57cc
Make return parameter of cblas_Xdotc_sub, cblas_Xdotu_sub a void pointer as well
2017-11-18 20:28:02 +01:00
Martin Kroeker
3ce401f51b
Make last parameter of cblas_Xdotc_sub/cblas_Xdotu_sub a void pointer as well
2017-11-18 18:58:40 +01:00
Andrew
27575d200a
Eliminate mode variable where not needed
2017-11-15 15:32:38 +01:00
Martin Kroeker
2c222f1faa
Modify complex CBLAS functions to take void pointers
...
Modify complex CBLAS functions to take void pointers instead of float or double arguments (to bring the prototypes in line with netlib and other implementations' cblas.h)
2017-11-05 15:53:14 +01:00
Martin Kroeker
742f54c235
Merge pull request #1303 from martin-frbg/imatcopy-rowscols
...
Fix cols/rows mixup in omatcopy 2nd step for BlasTrans cases
2017-09-14 21:46:26 +02:00
Martin Kroeker
d674fbb4c7
Fix cols/rows mixup in omatcopy 2nd step for BlasTrans cases
...
Equivalent of #1244 (issue #899 ) for the non-complex cases. Fixes #1289
2017-09-14 19:59:05 +02:00
Martin Kroeker
46c9357c72
Merge pull request #1288 from quickwritereader/develop
...
Optimized standard Blas Level-1,2 (excluding nrm2 functions) for z13 (double precision). Issue 884
2017-09-09 23:47:17 +02:00
Abdurrauf
1cfdb2295d
Optimized standard Blas Level-1,2 (excluding nrm2 functions) for z13 (double precision)
2017-09-06 16:41:08 +04:00
Martin Kroeker
00740c0e34
Merge pull request #1290 from martin-frbg/imatcopy
...
Use in-place transform shortcut only if matrix is square
2017-09-03 13:02:10 +02:00
Martin Kroeker
254db9bd7c
Use in-place transform shortcut only if matrix is square
2017-09-03 09:52:55 +02:00
Isuru Fernando
d245caa49a
Support out-of-source build
2017-08-01 15:16:14 +05:30
Martin Kroeker
376048156b
Use in-place transform shortcut only if matrix is square
2017-07-21 11:20:15 +02:00
Martin Kroeker
d1c5b8f913
Add files via upload
2017-07-20 20:51:06 +02:00
Martin Kroeker
91bde7d315
Exchange rows and cols in final omatcopy with BlasTrans
...
This is MicMuc's patch from #899
2017-07-15 22:02:53 +02:00
Martin Kroeker
1e06b49854
Update xerbla.c
2017-04-26 20:29:30 +02:00
Martin Kroeker
7f546f54fa
Add cblas_xerbla
2017-04-26 20:01:34 +02:00
Martin Kroeker
a809431e34
Add cblas_xerbla()
2017-04-26 19:58:59 +02:00
Andrew
99880f7906
Address unlikely memleak in zimatcopy interface ( #1129 )
...
* fix unlikely memleak in zimatcopy interface
* fix only unlikely memleak in zimatcopy interface
2017-03-16 13:13:31 +01:00
Martin Kroeker
211d2eceb5
Update zdot.c
2017-03-13 18:08:00 +01:00
Martin Kroeker
5813ed095b
Update zdot.c
2017-03-13 17:49:07 +01:00
Martin Kroeker
e44b028fe5
Replace gnu _real_, _imag_ extensions in initializers
2017-03-13 00:40:11 +01:00
Ashwin Sekhar T K
071a830e8b
THUNDERX2T99: Add optimized S/D/C/Z SWAP Implementations
2017-02-03 03:55:06 -08:00
Werner Saar
dd6212e684
updated some level1 functions that are not thread-safe
2017-01-10 14:05:07 +01:00
jiahaipeng
84b8170bfb
Adding multi-threading for copy, dot, rot, and asum functions
2017-01-10 11:48:58 +08:00
Werner Saar
ae4ac6f984
removed obj-files, that are moved to lapack 3.7.0
2017-01-06 16:14:53 +01:00
Jerome Robert
d346c533b1
Fix z/ctrmv stack allocation on AMD bulldozer and barcelona target
...
* Hopefully, because this was found by trial and error (dark magic)
* Ref #786
2016-06-07 16:11:09 +02:00
Werner Saar
f04af36ad0
Merge pull request #898 from wernsaar/develop
...
added experimental support for optimized lapack fortran functions
2016-05-31 14:13:52 +02:00
Werner Saar
41000c8443
added directory for optimized lapack Fortran codes and added dlaqr5.f
2016-05-31 12:53:07 +02:00
John Biddiscombe
053044ae4d
Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PROJECT_BINARY_DIR
...
If OpenBLAS is built using add_subdirectory(OpenBlas) as part of another project
then the paths set by CMAKE_XXX_DIR are relative to the parent project
and not the OpenBLAS project.
2016-05-25 09:13:28 +02:00
Jerome Robert
40af513669
Disable multi-threading in swap
...
* Close #873
2016-05-16 13:07:55 +00:00
Jerome Robert
16ec5323c9
Fix zgemv.c compilation when stack allocation is disabled
2016-02-08 12:05:02 +01:00
Jerome Robert
5fc2203d8a
zgemv: Add a workaround for #746
2016-02-08 11:25:15 +01:00
Jerome Robert
78dcf5c3d5
Improve performance of ztrmv on small matrices
...
* Use stack allocation
* Disable multi-threading
* Ref #727
2016-02-08 11:25:02 +01:00
Jerome Robert
32f793195f
Use stack allocation in zgemv and zger
...
For better performance with small matrices
Ref #727
2016-02-08 11:24:21 +01:00
Jerome Robert
1fe3aab047
Use GEMM_MULTITHREAD_THRESHOLD as a number of ops
...
...not a matrix size. For GEMM_MULTITHREAD_THRESHOLD=4
(the default value) this does not change anything but
for other values it makes the GEMM and GEMV thresholds
change in the same way.
Close #742
2016-01-24 11:31:40 +01:00
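The idea in the commit above, treating the threshold as an operation count instead of a matrix dimension, can be sketched as follows. The scale factor `OPS_PER_THRESHOLD_UNIT` and the function `choose_gemm_threads` are assumptions made for illustration. The factor is chosen so that the default threshold of 4 corresponds to a 64x64x64 problem, which is not necessarily the value OpenBLAS uses.

```c
#ifndef GEMM_MULTITHREAD_THRESHOLD
#define GEMM_MULTITHREAD_THRESHOLD 4   /* default value named in the commit */
#endif

/* Hypothetical scale: operations represented by one threshold unit.
   65536 * 4 = 262144 = 64 * 64 * 64. */
#define OPS_PER_THRESHOLD_UNIT 65536.0

/* Pick a thread count from the amount of work (m*n*k), not from a single
   dimension, so that scaling the threshold scales the cutoff linearly. */
static int choose_gemm_threads(long m, long n, long k, int available_threads)
{
    double ops = (double)m * (double)n * (double)k;
    if (ops <= OPS_PER_THRESHOLD_UNIT * GEMM_MULTITHREAD_THRESHOLD)
        return 1;                      /* too little work to split up */
    return available_threads;
}
```

Because the comparison is on the operation count, doubling GEMM_MULTITHREAD_THRESHOLD doubles the amount of work required before threading starts, and a GEMV test written the same way (with m*n) would move in step with it.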
Jerome Robert
1a1935507b
[z]ger: increase multithread threshold
...
The thresholds given in 3ae30cd
were far too low because I
mixed up m and m*n in my measurements. Note that the new ones
are close to the [z]gemv ones, which is reassuring
evidence that both are right.
2016-01-24 10:46:35 +01:00
Jerome Robert
66eafb16cf
swap: disable multi-threading for small matrices
...
Close #731
2016-01-19 17:14:46 +01:00
Jerome Robert
3ae30cd6b9
Disable multi-threading for small matrices in [z]ger
...
Ref #731
2016-01-19 17:14:31 +01:00
Jerome Robert
87a2ccc37c
Factorize MAX_STACK_ALLOC code to common_stackalloc.h
...
Ref #727
2016-01-08 16:03:52 +01:00
Jerome Robert
f9890a6452
Fix compilation when MAX_STACK_ALLOC is not set
...
Close #722
2015-12-31 14:43:09 +01:00
Zhang Xianyi
285d042b10
Fixed rotg bug on ARM.
2015-12-14 10:07:01 -06:00
Zhang Xianyi
640cccc2b1
Refs #697 . Fixed gemv bug for Windows.
...
Thanks to matzeri for the patch.
2015-11-30 15:19:45 -06:00
Ralph Campbell
55a0b27c01
Minor C code fixes in interface/
2015-11-09 14:15:49 +05:30
Zhang Xianyi
2feef49fa8
Merge branch 'develop' into cmake
...
Conflicts:
driver/others/memory.c
2015-10-26 14:54:34 -05:00
Zhang Xianyi
5a291606ad
Refs #671 . The return value of i?max cannot be larger than N.
2015-10-24 01:16:34 +08:00
Zhang Xianyi
8fade093aa
Fixed cmake bug on Visual Studio.
2015-10-20 14:37:22 -05:00
Zhang Xianyi
94b125255f
Merge branch 'develop' into cmake
...
Conflicts:
driver/others/memory.c
2015-10-13 04:46:08 +08:00
Zhang Xianyi
baec8f5cac
Refs #638 . Fixed compiling bug with clang on Mac OS X.
2015-09-10 10:32:07 -05:00
Martin Koehler
711ca33bc6
Improved Ximatcopy when lda==ldb.
...
The Ximatcopy functions create a copy of the input matrix
although they seem to work inplace. The new routines
XIMATCOPY_K_YY perform the operations inplace if the leading
dimension does not change.
2015-09-07 14:36:16 +02:00
Zhang Xianyi
f874465bb8
Use cmake to build OpenBLAS GENERIC Target on MSVC x86 64-bit.
...
Disable CBLAS and LAPACK.
2015-08-10 14:10:44 -05:00
Zhang Xianyi
dcd5ba4443
Merge branch 'cmake' of https://github.com/hpanderson/OpenBLAS into hpanderson_cmake
2015-07-22 04:06:39 +08:00
Werner Saar
f8f2e261fe
use only 1 thread if m or n < 2*GEMM_MULTITHREAD_THRESHOLD
2015-05-06 10:41:53 +02:00
Jerome Robert
ab567d8443
gemv: Ensure stack buffer is large enough to handle memory alignment
...
Ref #478
2015-04-24 10:12:49 +02:00
Zhang Xianyi
847e19c04e
Refs #478,#482, Enable stack alloc for s/dgemv_t.(revert 9798491)
2015-04-20 23:22:40 -05:00
Zhang Xianyi
fd9fd42936
Refs #478 , #482 . Fixed bug on previous commit.
2015-04-13 23:22:27 -05:00
Zhang Xianyi
9798481979
Refs #478 , #482 . Fix segfault bug for gemv_t with MAX_STACK_ALLOC flag.
...
For gemv_t, directly use malloc to create the buffer.
2015-04-13 19:45:27 -05:00
Zhang Xianyi
cdefdb21cd
Refs #492 . Fixed c/zsyr bug with negative incx.
2015-02-26 06:37:03 +08:00
Hank Anderson
0d8e227ea7
Changed strategy for setting preprocessor definitions.
...
Instead of generating separate object files for each permutation of
defines for a source file, GenerateNamedObjects now writes an entirely
new source file and inserts the defines as #define c statements.
This solves a problem I ran into with ar.exe where it was refusing to
link objects that had the same filename despite having different paths.
2015-02-24 12:26:33 -06:00
Hank Anderson
b2284647a3
More complex objects.
2015-02-23 07:51:05 -06:00
Hank Anderson
a6116e5859
Added some more complex-only objects.
2015-02-22 17:49:28 -06:00
Hank Anderson
67e39bd8fb
Added mangled complex filenames to interface and lapack CMakeLists.txt.
2015-02-17 13:12:30 -06:00
Hank Anderson
9eb1499095
Added another param to GenerateNamedObjects to mangle complex source names.
...
There are a lot of sources for complex float types that are the same
names as the real sources, except with z prepended.
2015-02-17 10:30:28 -06:00
Martin Koehler
39cc6b21d3
Add ATLAS-style ?geadd function
2015-02-16 13:46:20 +01:00
Hank Anderson
4662a0b13a
Changed generate functions to iterate through a list of float types.
...
This will generate obj files for SINGLE/DOUBLE/COMPLEX/DOUBLE COMPLEX.
2015-02-15 17:44:37 -06:00
Hank Anderson
e74462a3f5
Moved declarations to start of functions to satisfy MSVC C89 implementation.
2015-02-11 11:16:57 -06:00
Hank Anderson
e8c39138c6
Removed return value from GenerateNamedObjects.
...
It sets DBLAS_OBJS directly to save a bunch of list appending in the
CMakeLists.txt files.
2015-02-09 12:28:09 -06:00
Hank Anderson
58cff2fed8
Added CBLAS define/naming convention to GenerateNamedObjects.
2015-02-04 11:30:15 -06:00