Martin Kroeker
36fcb52094
Fix logic - we want real OR imaginary part of X to be nonzero here
2023-04-01 00:02:54 +02:00
H. Vetinari
f2659516ef
remove unqualified ifdef's for NO_LAPACK(E)
2023-03-28 19:01:31 +11:00
Martin Kroeker
7f0b11fbc1
Exclude some complex drivers when NO_LAPACK is set
2022-01-27 22:00:39 +01:00
Martin Kroeker
5f6a609253
Add sbgemv
2021-09-14 16:13:57 +02:00
Chen, Guobing
a7b1f9b1bb
Implementation of BF16 based gemv
...
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-10-29 02:08:23 +08:00
Martin Kroeker
887e00fd7f
Adapt for supporting only a subset of variable types
2020-10-11 14:58:57 +02:00
Martin Kroeker
3287848c8f
Support building only selected types
2020-09-22 23:20:51 +02:00
Martin Kroeker
806f89166e
Make ARMV7 compile with xcode and add a CI job for it (#2537)
...
* Add an ARMV7 iOS build on Travis
* thread_local appears to be unavailable on ARMV7 iOS
* Add no-thumb option for ARMV7 IOS build to get it to accept DMB ISH
* Make local labels in macros of nrm2_vfpv3.S compatible with the xcode assembler
2020-04-02 10:30:37 +02:00
Martin Kroeker
8617d75548
Revert "Avoid taking root of negative number in symv_thread.c"
2019-10-01 23:50:41 +02:00
Sebastian Berg
6355c25dde
Avoid taking root of negative number in symv_thread.c
...
This is similar to fixes in gh-1929, but there was one remaining
occurrence of this type of pattern in the driver/level2/*_thread.c
files.
2019-09-29 22:03:12 -07:00
Martin Kroeker
45333d5793
Fix error introduced during cleanup
2019-02-19 22:16:33 +01:00
Martin Kroeker
78d9910236
Correct range_n limiting
...
same bug as seen in #1388, somehow missed in corresponding PR #1389
2019-02-19 20:59:48 +01:00
Martin Kroeker
5a720cf9ca
Re-enable loop unrolling in trmv and remove the scary warning
...
fixes #1748 as that half of the fix for #1332 appears to have been an overreaction on my part.
2018-12-30 15:22:37 +01:00
Martin Kroeker
368d14f8c8
Fix harmless typo
...
fixes #1872
2018-11-16 14:58:28 +01:00
Martin Kroeker
0427277cef
Allow optimization for small m, large n only if it can be made threadsafe
...
otherwise the introduction of a static array in 8e5a108
to improve #532 breaks concurrent calls from multiple threads as seen in #1844
2018-11-10 15:45:54 +01:00
Martin Kroeker
cc9500db41
Merge pull request #1403 from brada4/develop
...
Address a few more warnings
2017-12-30 14:51:34 +01:00
Andrew
bfc2a88594
remove unused buffer
2017-12-22 00:55:40 +01:00
Martin Kroeker
177b78c8b4
Issue1388 (#1389)
...
* Calculation of chunk range limits was ignoring num_cpu
bug introduced by me in #1262 - should fix #1388
* Calculation of range limits was ignoring num_cpu
bug introduced by me in #1262
2017-12-09 22:29:03 +01:00
Andrew
281a2b952f
warning cleanup (#1380)
...
* dead increments in driver/level2
* dead increments in kernel/generic
* part dead increments in kernel/x86_64
2017-12-05 19:54:10 +01:00
Martin Kroeker
b414283f48
Disable gemv unrolling
...
as a (hopefully temporary) workaround for #1332
2017-12-03 22:41:54 +01:00
Andrew
e14d50d86e
eliminate Wunused-const gcc7 warning
2017-11-24 19:13:24 +01:00
Sacha Refshauge
37858d1146
Fix threading usage in CMake: s/SMP/USE_THREAD/
2017-08-19 15:07:42 +10:00
Martin Kroeker
719fcc56b0
Merge pull request #1262 from martin-frbg/xmv_thread-splitting
...
Make sure that range limit of last thread never exceeds data size
2017-08-06 14:11:44 +02:00
Martin Kroeker
0ba64cee60
Update trmv_thread.c
2017-08-02 12:03:54 +02:00
Martin Kroeker
c4e5ba1bfe
Make sure that range_n of last thread never exceeds the actual data size when splitting the workload
2017-08-02 00:37:58 +02:00
Martin Kroeker
a6f533b248
Revert "Fix calculated range limit exceeding actual data size for last thread"
2017-08-01 19:28:08 +02:00
Isuru Fernando
d245caa49a
Support out-of-source build
2017-08-01 15:16:14 +05:30
Martin Kroeker
585c0010a5
Fix range limit exceeding actual data size in last step
2017-07-28 00:27:02 +02:00
Martin Kroeker
857f61bc5d
Fix range limit exceeding data size in last step
2017-07-28 00:21:53 +02:00
Martin Kroeker
9332042d5f
Fix range exceeding actual data size in quick_divide
2017-07-28 00:13:24 +02:00
Andrew
529bfc36ec
Fix write past fixed size buffer
2017-07-12 00:59:30 +02:00
John Biddiscombe
053044ae4d
Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PROJECT_BINARY_DIR
...
If OpenBLAS is built using add_subdirectory(OpenBlas) as part of another project
then the paths set by CMAKE_XXX_DIR are relative to the parent project
and not the OpenBLAS project.
2016-05-25 09:13:28 +02:00
Jerome Robert
53ba1a77c8
ztrmv_L.c: no longer need a 4kB buffer
...
Fix #786
2016-03-05 19:07:03 +01:00
Jerome Robert
78dcf5c3d5
Improve performance of ztrmv on small matrices
...
* Use stack allocation
* Disable multi-threading
* Ref #727
2016-02-08 11:25:02 +01:00
Ralph Campbell
fbc21266e6
Minor C code fixes in driver/
2015-11-09 14:15:49 +05:30
Zhang Xianyi
d8392c1245
Fixed cmake config bugs.
2015-10-20 04:30:55 +08:00
Zhang Xianyi
f8eba3d548
Fixed cmake build bugs on Linux.
2015-08-11 16:25:16 -05:00
Zhang Xianyi
f874465bb8
Use cmake to build OpenBLAS GENERIC Target on MSVC x86 64-bit.
...
Disable CBLAS and LAPACK.
2015-08-10 14:10:44 -05:00
Zhang Xianyi
dcd5ba4443
Merge branch 'cmake' of https://github.com/hpanderson/OpenBLAS into hpanderson_cmake
2015-07-22 04:06:39 +08:00
Zhang Xianyi
8e5a1083bb
Refs #532. Improve gemv parallelism for the small m, large n case.
...
Split the matrix and perform a reduction.
2015-05-08 05:33:17 +08:00
Hank Anderson
ab7043373f
Fixed bug generating trmv complex source names.
2015-02-24 15:18:41 -06:00
Hank Anderson
0553476fba
Added TRANS defines for complex sources in lapack.
2015-02-24 14:30:35 -06:00
Hank Anderson
2416d9dbac
Fixed TRANSA defines for complex sources in driver/level2.
2015-02-24 13:18:07 -06:00
Hank Anderson
0d8e227ea7
Changed strategy for setting preprocessor definitions.
...
Instead of generating separate object files for each permutation of
defines for a source file, GenerateNamedObjects now writes an entirely
new source file and inserts the defines as C #define statements.
This solves a problem I ran into with ar.exe where it was refusing to
link objects that had the same filename despite having different paths.
2015-02-24 12:26:33 -06:00
Hank Anderson
1b7f427401
Added conj gemv objects for complex build.
2015-02-23 10:24:31 -06:00
Hank Anderson
fb5d5bb971
Added defines for complex trmv.
2015-02-21 12:39:03 -06:00
Hank Anderson
33c5e8db7f
Added a helper function for setting the L1 kernel defaults.
...
Added loop to build objects with different KERNEL defines.
2015-02-17 21:36:23 -06:00
Hank Anderson
4662a0b13a
Changed generate functions to iterate through a list of float types.
...
This will generate obj files for SINGLE/DOUBLE/COMPLEX/DOUBLE COMPLEX.
2015-02-15 17:44:37 -06:00
Hank Anderson
e8c39138c6
Removed return value from GenerateNamedObjects.
...
It sets DBLAS_OBJS directly to save a bunch of list appending in the
CMakeLists.txt files.
2015-02-09 12:28:09 -06:00
Hank Anderson
2f59135eb6
Added gemv to level2 CMakeLists.txt.
2015-02-07 21:15:21 -06:00