Commit Graph

217 Commits

Author SHA1 Message Date
Zhang Xianyi e95d64333a Refs #519. Avoid calling strncpy. 2015-03-19 15:57:22 -05:00
Ton van den Heuvel b6438dedea Fix issue #508
Fix race condition during shutdown causing a crash in
gotoblas_set_affinity().
2015-03-18 13:22:43 +01:00
Hank Anderson 5ae8993752 Added intrinsics for MSVC. 2015-02-25 11:52:51 -06:00
Hank Anderson 84d90d6ed8 Fixed some compiler errors/warnings for clang. 2015-02-25 11:52:25 -06:00
Hank Anderson 9eaea02f33 Added additional gemm defines for complex types. 2015-02-25 09:39:11 -06:00
Hank Anderson ab7043373f Fixed bug generating trmv complex source names. 2015-02-24 15:18:41 -06:00
Hank Anderson 0553476fba Added TRANS defines for complex sources in lapack. 2015-02-24 14:30:35 -06:00
Hank Anderson 2416d9dbac Fixed TRANSA defines for complex sources in driver/level2. 2015-02-24 13:18:07 -06:00
Hank Anderson 0d8e227ea7 Changed strategy for setting preprocessor definitions.
Instead of generating separate object files for each permutation of
defines for a source file, GenerateNamedObjects now writes an entirely
new source file and inserts the defines as #define c statements.

This solves a problem I ran into with ar.exe where it was refusing to
link objects that had the same filename despite having different paths.
2015-02-24 12:26:33 -06:00
Hank Anderson 1b7f427401 Added conj gemv objects for complex build. 2015-02-23 10:24:31 -06:00
Hank Anderson fb5d5bb971 Added defines for complex trmv. 2015-02-21 12:39:03 -06:00
Hank Anderson 371071d461 Added CONJ defines for trmm/trsm. 2015-02-21 10:59:02 -06:00
Hank Anderson 8a143516e3 Added alternate_name to a couple of the name mangling schemes.
Added zherk_k sources to driver/level3.
2015-02-20 17:03:33 -06:00
Hank Anderson e5897ecb9b Added zherk_kernel.c objects to driver/level3. 2015-02-19 16:19:56 -06:00
Hank Anderson 33c5e8db7f Added a helper function for setting the L1 kernel defaults.
Added loop to build objects with different KERNEL defines.
2015-02-17 21:36:23 -06:00
Hank Anderson 4662a0b13a Changed generate functions to iterate through a list of float types.
This will generate obj files for SINGLE/DOUBLE/COMPLEX/DOUBLE COMPLEX.
2015-02-15 17:44:37 -06:00
Hank Anderson e74462a3f5 Moved declarations to start of functions to satisfy MSVC C89 implementation. 2015-02-11 11:16:57 -06:00
Hank Anderson 056ba26755 Changed a number of inline calls to use __inline.
MSVC doesn't inmplement C99, so can't use the inline keyword. __inline
appears to work in MSVC and GCC.
2015-02-11 11:13:17 -06:00
Hank Anderson c94fe71278 Removed incoming-stack-boundary for MSVC.
Made float type optional for GenerateNamedObjects.

Called GenerateNamedObjects for a couple of driver/others files that
needed NAME/CNAME set.
2015-02-11 10:54:14 -06:00
Hank Anderson e8c39138c6 Removed return value from GenerateNamedObjects.
It sets DBLAS_OBJS directly to save a bunch of list appending in the
CMakeLists.txt files.
2015-02-09 12:28:09 -06:00
Hank Anderson 7fa5c4e2fd Fixed some case issues with ARCH.
Added some kernel and driver/others objects.
2015-02-08 15:29:18 -06:00
Zhang Xianyi cfa9392ffa Fix openblas_get_num_threads and openblas_get_num_procs bug with single thread. 2015-02-08 01:30:23 -06:00
Hank Anderson 2f59135eb6 Added gemv to level2 CMakeLists.txt. 2015-02-07 21:15:21 -06:00
Hank Anderson 6b5d26e07b Added SMP sources to level2 CMakeLists.txt. 2015-02-06 16:52:19 -06:00
Hank Anderson 627d5e7401 Added SMP objects to driver/level3. 2015-02-05 12:22:48 -06:00
Hank Anderson 943fa2fb58 Fixed object names in level2. 2015-02-05 10:49:11 -06:00
Hank Anderson 461e691127 Codes when define is absent are now a parameter to AllCombinations.
The level3 object names should now be correct.
2015-02-05 09:23:47 -06:00
Hank Anderson cfaf1c678f Added option to append define codes with an underscore.
Fixed the code array not getting reset on subsequent AllCombinations
calls.
2015-02-05 09:17:18 -06:00
Hank Anderson 0d7bad1f35 Changed GenerateObjects to append combination codes (e.g. dtrmm_TU). 2015-02-05 09:02:54 -06:00
Hank Anderson 2828f6630c Added SMP sources to COMMONOBJS. 2015-02-04 14:01:36 -06:00
Erik Schnetter 65a847cd36 Introduce openblas_get_num_threads and openblas_get_num_procs 2015-02-03 12:23:41 -05:00
Hank Anderson 7194424fef Added missing common objects to the library. 2015-02-02 15:21:29 -06:00
Hank Anderson d11bde60d0 DOUBLE define for DBLAS objects is now set in main CMakeLists.txt.
Since the objects are the same, could generate SINGLE/COMPLEX/etc here
without having to rewrite all the object enumeration code again.
2015-02-02 15:00:44 -06:00
Hank Anderson 5057a4b4df Added openblas add_library call that uses DBLAS_OBJS ojbects. 2015-01-30 15:21:21 -06:00
Hank Anderson 3e8ea7a351 Added COMMONOBJS to driver/others CMakeLists.txt. 2015-01-30 14:06:14 -06:00
Hank Anderson d3dcdddf75 Moved functions into util cmake file. 2015-01-30 13:47:40 -06:00
Hank Anderson e5e7595bf9 Added paramater to GenerateObjects for defines that affect all sources. 2015-01-30 13:31:13 -06:00
Hank Anderson 7693887d61 Added empty set to the combinations generated by AllCombinations. 2015-01-30 13:01:11 -06:00
Hank Anderson 8d9b196e0d Moved loop over define combos into a function.
This function takes a set of sources and a set of preprocessor
definitions. It will iterate over the sources and build an object
file for each combination of preprocessor definitions for each
source file.
2015-01-30 12:14:44 -06:00
Hank Anderson a6cf8aafc0 Updated level3/CMakeLists with correct defines using all combos. 2015-01-30 11:21:50 -06:00
Hank Anderson dbdca7bf0c Added first pass at driver/level3 Makefile conversion.
Added a rather convoluted CMake function to find all combinations
of a given list. This will be useful for the object files that are
compiled multiple times with different combinations of preprocessor
definitions.
2015-01-29 22:53:11 -06:00
Hank Anderson 8c23965da3 prebuild.cmake now reads the output from getarch into CMake vars. 2015-01-28 22:57:44 -06:00
Hank Anderson 8ede4a8da4 getarch now compiles and sets config.h defines properly.
Still isn't parsed into CMake variables, and getarch_2 needs to
get the same treatment.
2015-01-28 17:18:26 -06:00
Hank Anderson 1c5b6bb4f7 Added CORE define to config.h in prebuild.cmake (temporarily). 2015-01-28 16:33:48 -06:00
Hank Anderson 9a508abdc7 Added first pass at driver/level2 makefile conversion. 2015-01-28 14:52:15 -06:00
Werner Saar 0dc559ed30 bugfix in dynamic.c 2014-12-28 17:15:42 +01:00
Werner Saar 4319769b79 added target processor STEAMROLLER 2014-12-28 20:16:46 +08:00
Zhang Xianyi 2fb02626da Update organization info. 2014-11-25 15:28:58 +08:00
Zhang Xianyi 695e0fa649 #463 fixed a compiling bug on AIX. 2014-11-10 14:39:56 +08:00
wernsaar 7aae4a62e7 enabled use of GEMM3M functions 2014-09-20 14:27:10 +02:00
wernsaar a64fe9bcc9 added optimized sgemv_n kernel for sandybridge 2014-09-06 08:41:53 +02:00
wernsaar 2021d0f9d6 experimentally removed expensive function calls 2014-09-05 15:05:53 +02:00
Isaac Dunham f7eb81a846 Fix link error on Linux/musl.
get_nprocs() is a GNU convenience function equivalent to POSIX2008
sysconf(_SC_NPROCESSORS_ONLN); the latter should be available in unistd.h
on any current *nix. (OS X supports this call since 10.5, and FreeBSD
currently supports it. But this commit does not change FreeBSD or OS X
versions.)
2014-08-03 15:06:30 -07:00
wernsaar 793175be3a added experimental support for big numa machines 2014-08-02 13:40:16 +02:00
wernsaar 1d33547222 optimized zgemm kernel for haswell 2014-07-27 11:51:42 +02:00
wernsaar 3ea4dadd30 optimizations for trsm 2014-07-25 11:59:17 +02:00
wernsaar 1b10ff129a optimizations for trmm 2014-07-25 10:00:23 +02:00
wernsaar 125610d23b allow to set custom value for ?GEMM_DEFAULT_UNROLL_MN, optimizations for syrk 2014-07-24 18:43:31 +02:00
Zhang Xianyi c94762bb56 Refs #401. Added NO_AVX2 flag for old binutils (e.g. RHEL6) 2014-07-16 08:38:25 +08:00
Zhang Xianyi 552119c484 Fixed #407. Support outputing the CPU corename on runtime.
The user can use char * openblas_get_config() or char * openblas_get_corename().
2014-07-08 12:48:08 +08:00
wernsaar 50e99a52ea added definitions for PILEDRIVER and HASWELL 2014-07-06 12:08:27 +02:00
wernsaar be94db096c disabled *3M functions for x86_64 platforms 2014-07-01 16:18:05 +02:00
Zhang Xianyi 7a8949e0ce Merge branch 'develop' of https://github.com/TimothyGu/OpenBLAS into TimothyGu-develop
Conflicts:
	driver/others/memory.c
2014-06-28 20:51:31 +08:00
Timothy Gu 6c2ead30f0 Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
Jameson Nash f41f03ab83 fix #394. this cleans up some handles after using them, and doesn't disable ALL process privileges upon success 2014-06-27 12:16:57 -04:00
wernsaar 438002204d Ref #393: fix for INTERFACE64=0 and ARCH_X86 in divtable 2014-06-21 12:29:23 +02:00
wernsaar 53bfa51ee0 Ref #385: fixed warnings in dynamic.c 2014-06-12 18:17:08 +02:00
wernsaar a86d349a51 Ref #380: enhancements for dynamic_arch 2014-06-12 14:20:03 +02:00
wernsaar a35a1a9ae7 changed makefiles for lapack development 2014-05-07 11:33:02 +02:00
Olivier Grisel 2c556f093a Add cast to function pointer to remove warning 2014-02-25 11:08:32 +01:00
Olivier Grisel 3b027d2528 Do not reference pthread_atfork in non-SMP_SERVER mode 2014-02-25 11:08:32 +01:00
Olivier Grisel 49bd98f410 Do not reference pthread_atfork under windows 2014-02-19 19:25:48 +01:00
Olivier Grisel 138a841390 FIX #294: make OpenBLAS thread-pool resilient to fork via pthread_atfork 2014-02-19 19:01:15 +01:00
Olivier Grisel 046e4013cb Revert "Refs #294. Used pthread_atfork to avoid hang after a Unix fork."
This reverts commit 3617c22a56.
2014-02-19 18:32:54 +01:00
Zhang Xianyi 3617c22a56 Refs #294. Used pthread_atfork to avoid hang after a Unix fork.
The problem is the mutex we used in blas_server. Thus, we must clear
the mutex before the fork and re-init them at parent and child process.

If you used OpenMP, GOMP has the same problem by now. Please try other OpenMP
implemantation.
2014-02-18 15:36:04 +08:00
Zhang Xianyi 8c7687b419 Refs #338. Added OPENBLAS_VERBOSE environment variable on runtime
By default, OpenBLAS doesn't output the warning message. You can set
OPENBLAS_VERBOSE (e.g. export OPENBLAS_VERBOSE=1) to enable the warning
message on runtime.
2014-01-24 02:05:59 +08:00
Zhang Xianyi ab69443bd4 Refs #332. Added addtional Intel Ivy Bridge and Haswell CPU-id. 2014-01-05 23:44:29 +08:00
Zhang Xianyi b263e096af Refs #307. Delete debug printf. 2013-12-31 15:53:13 +08:00
wernsaar 8191efc420 changed level3.c 2013-12-01 13:52:08 +01:00
wernsaar 0b6e13b689 Merge remote branch 'origin/develop' into haswell 2013-12-01 13:38:11 +01:00
wernsaar 5c648a8984 Merge remote branch 'origin/develop' into haswell 2013-12-01 11:25:33 +01:00
Zhang Xianyi 5048a80032 Refs #283. Fixed the incorrect usage of long data type for Windows 64. 2013-11-14 13:46:42 +08:00
Zhang Xianyi a2942456ef Refs #307. Fixed the hang bug when free OpenBLAS dll in Windows. 2013-11-13 10:00:18 +08:00
wernsaar a77c71eaf5 added highly optimized dgemm_kernel for HASWELL 2013-10-28 10:23:47 +01:00
wernsaar fe8c5666f9 optimized dgemm_kernel for HASWELL 2013-10-20 16:52:26 +02:00
Sébastien Villemot eae4cfa3f6 Avoid failure on qemu guests declaring an Athlon CPU without 3dnow!
The present patch verifies that, on machines declaring an Athlon CPU model and
family, the 3dnow and 3dnowext feature flags are indeed present. If they are
not, it fallbacks on the most generic x86 kernel. This prevents crashes due to
illegal instruction on qemu guests with a weird configuration.

Closes #272
2013-08-28 14:29:42 +02:00
Zhang Xianyi 2638370844 Init code base for Intel Haswell. 2013-08-13 00:54:59 +08:00
Zhang Xianyi 673e453b3f Enable bulldozer kernels. 2013-08-05 16:07:54 +08:00
Zhang Xianyi 143cca4dd5 Merge branch 'develop' into bulldozer 2013-08-05 15:51:53 +08:00
Zhang Xianyi 534c5ec919 Fixed #261. Use strncmp instead of a comparing trick. 2013-07-29 16:48:35 +08:00
Zhang Xianyi 5b504d6c23 Refs #263. Rollback bulldozer and piledriver kernels to barcelona kernels. 2013-07-28 17:39:24 +08:00
Zhang Xianyi 72b1edaf1b Merge branch 'develop' into bulldozer
Conflicts:
	kernel/x86_64/KERNEL.BULLDOZER
2013-07-28 06:38:25 +02:00
Zhang Xianyi 4471c77905 Fixed #261. Use strncmp instead of a comparing trick. 2013-07-26 23:43:54 +08:00
Zhang Xianyi 77b572fa0b Merge branch 'loongson3a' into develop
Conflicts:
	Makefile.system
2013-07-20 22:33:17 +08:00
Zhang Xianyi 2a7503e563 Refs #225. Fixed a bug in GEMM OpenMP threading. 2013-07-15 09:56:19 +08:00
grisuthedragon c19a488af2 create openblas_get_parallel to retrieve information which
parallelization model is used by OpenBLAS.
2013-07-11 21:39:19 +08:00
Zhang Xianyi 32d2ca3035 Refs #214, #221, #246. Fixed the getrf overflow bug on Windows.
I used a smaller threshold since the stack size is 1MB on windows.
2013-07-11 03:20:02 +08:00
wernsaar 6f008abcef replaced defined(DOUBLE) by !defined(XDOUBLE) 2013-07-09 18:17:50 +02:00
Zhang Xianyi f54f5bac9e Refs #248. Fixed the LSB compatiable issue for BLAS only.
For example, make CC=lsbcc NO_LAPACK=1.
2013-07-09 15:38:03 +08:00
Zhang Xianyi 5d3312142a Refs #221 #246. Fixed the overflowing stack bug in mutlithreading BLAS3.
When NUM_THREADS(MAX_CPU_NUNBERS) is very large ,e.g. 256.

typedef struct {
  volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;

job_t          job[MAX_CPU_NUMBER];

The job array is equal 8MB.

Thus, We use malloc instead of stack allocation.
2013-07-08 01:07:05 +08:00