Commit Graph

709 Commits

Author SHA1 Message Date
Zhang Xianyi
d744c9590a In OpenMP threading, preallocate the thread buffer instead of allocating the buffer every time. This patch improved the performance slightly. 2013-03-01 14:36:47 +08:00
Zhang Xianyi
3cc6ae793e Refs #174. Return sb pointer when OpenMP or Windows. 2013-02-26 00:48:21 +08:00
Zhang Xianyi
5155e3f509 Refs #174. Fixed the overflowing buffer bug of multithreading hbmv and sbmv.
Instead of using thread 0 buffer, each thread uses its own sb buffer.
Thus, it can avoid overflowing thread 0 buffer.
2013-02-13 16:05:58 +08:00
Zhang Xianyi
5c8bf6ae0e Merge branch 'bulldozer' into develop 2013-02-10 01:19:42 +08:00
Zhang Xianyi
6ae2f868fd Set the affinity. Only use 1 core of each module on bulldozer. 2013-02-09 18:19:02 +01:00
Zhang Xianyi
299b5a44dc Merge branch 'develop' of github.com:xianyi/OpenBLAS into bulldozer 2013-02-09 16:22:04 +01:00
Zhang Xianyi
8cdb795438 Refs #187. Use binary code for xgetbv, which is compatible with old compiler. 2013-01-22 00:25:08 +08:00
Zhang Xianyi
a4ee6f3915 Fixed #172. Support Intel Xeon E7540. 2012-12-18 08:57:46 +08:00
Zhang Xianyi
fba6b590f2 Merge branch 'master' into develop 2012-12-15 22:49:37 +08:00
Julian Taylor
1138817dd2 add a sanity check on the detected cpu type
if we have 64 bit pointers we can't have a 32 bit cpu, so fall back to
the 64bit cpu fallback (prescott)
E.g. the cpu detection fails in amd qemu64 emulation (family 6 model 2)
causing it to use the uninitialized gotoblas_ATHLON
2012-12-15 13:29:46 +01:00
Zhang Xianyi
bdf8d9411e Refs #163. Obtain the build configure on runtime.
openblas_get_config function returns the configure string.
So far, it supports USE64BITINT, NO_CBLAS, NO_LAPACK, NO_LAPACKE,
DYNAMIC_ARCH, NO_AFFINITY.

Example:
 #include <stdio.h>
extern char * openblas_get_config();
void main()
{
  printf("%s\n",openblas_get_config());
  return;
}
2012-12-10 15:52:51 +08:00
Zhang Xianyi
bfaaa975e6 Added BULLDOZER target. So far it uses barcelona kernels. 2012-12-07 00:53:31 +08:00
Zhang Xianyi
b7c0fa6bd2 Init AMD Bulldozer codebase. 2012-12-06 07:29:54 -05:00
Zhang Xianyi
6751f7b9a7 Fixed #157. Only detect the number of physical CPU cores on Mac OSX. 2012-11-13 15:48:57 +08:00
Zhang Xianyi
538c764d2b Refs #153. Restore the original CPU affinity when calling openblas_set_num_threads(1).
Please read the issue on github.com for the detail.
2012-11-06 18:21:46 +08:00
Zhang Xianyi
6c5899dff5 Don't use xgetbv instruction when NO_AVX=1 2012-10-09 14:52:35 +08:00
Zhang Xianyi
735ca38b8f Refs #139. Check OS supporting AVX on runtime. 2012-09-18 15:46:20 +08:00
Zhang Xianyi
f76a384841 Refs #139. Added NO_AVX flag to use old Nehalem kernels on Sandy Bridge.
For example, make NO_AVX=1 or make DYNAMIC_ARCH=1 NO_AVX=1
2012-09-17 23:25:46 +08:00
Jameson Nash
d0e731e8b8 provide support for passing CFLAGS, FFLAGS, PFLAGS, FPFLAGS to make on the command line 2012-08-21 00:31:12 -04:00
Zhang Xianyi
fe4ab95cd5 Refs #136. Fixed a bug about controlling the number of threads on Windows. 2012-08-19 23:50:54 +08:00
Xianyi Zhang
801383effe Fixed a hang bug when shutdown blas threads server on Windows. Added the feature about dynamic changing the number of threads on Windows. 2012-08-14 18:34:32 +08:00
Zhang Xianyi
54cd65e47f Use sandy bridge kernel when DYNAMIC_ARCH=1. 2012-08-13 15:25:08 +08:00
Zhang Xianyi
a55821a2ec Refs #132. Kill the threads when unload the library. 2012-08-11 21:33:15 +08:00
Zhang Xianyi
d007cca61d Refs #134. Fixed the building bug on IBM Power. 2012-08-10 11:54:21 +08:00
Xianyi Zhang
25f1a573fd Fixed the build bug when DYNAMIC_ARCH=0. 2012-07-07 12:12:24 +08:00
Sylvestre Ledru
3692b4d631 Improve the detection of sparc 2012-07-02 02:51:38 +02:00
Xianyi Zhang
a507b56ab1 Refs #119 #118. Fixed disabling hyper threading bug. 2012-06-29 15:53:24 +08:00
Xianyi Zhang
853d16ed7e Added openblas_set_num_threads dummy function on Windows. We plan to implement this feature in next version. 2012-06-23 13:07:38 +08:00
Zhang Xianyi
422359d09a Export openblas_set_num_threads in shared library. 2012-06-23 11:32:43 +08:00
Zhang Xianyi
d3b67d0bd8 Refs #113. Fixed the typo BOBCATE -> BOBCAT 2012-05-31 22:40:15 +08:00
Zhang Xianyi
d6cab3f37e Refs #113. Support AMD Bobcate using Barcelona kernel codes. Replace 3DNow! with MMX. 2012-05-31 18:17:45 +08:00
Zhang Xianyi
90d6ad569d Merge branch 'sandybridge' into develop
Just copy the kernel codes from Nehalem. The optimization is ongoing.
2012-05-31 12:44:55 +08:00
Xianyi Zhang
a6adbb299d Refs #112. Improved setting thread affinity in Linux. Remove the limit (64) about the number of CPU cores. 2012-05-29 15:23:52 +08:00
Xianyi Zhang
a53c6e2440 Merge branch 'develop' into sandybridge 2012-05-25 23:16:44 +08:00
Zaheer Chothia
a431042475 Fix inconsistent case for OS_* macros (Refs pull request #111) 2012-05-23 00:01:14 +02:00
Mike Nolta
4e29b6ffc0 FreeBSD: fix OS_FreeBSD -> OS_FREEBSD typos 2012-05-21 16:57:19 -04:00
Xianyi Zhang
19a48b82cf Init Sandybridge codes based on Nehalem. 2012-03-30 20:01:03 +08:00
Xianyi Zhang
0b89a7a92d Ref #82. Disable outputing debug information in alloc_mmap. 2012-03-23 18:17:12 +08:00
Wang Qian
8163ab7e55 Change the block size on Loongson 3B. 2011-11-23 18:41:49 +00:00
Xianyi Zhang
ef6f7f32ae Fixed mbind bug on Loongson 3B. Check the return value of my_mbind function. 2011-11-23 17:17:41 +00:00
Xianyi Zhang
b95ad4cfaf Support detecting ICT Loongson-3B CPU. 2011-11-09 19:29:50 +00:00
traz
9fe3049de6 Adding conditional compilation(#if defined(LOONGSON3A)) to avoid affecting the performance of other platforms. 2011-09-26 15:21:45 +00:00
traz
831858b883 Modify aligned address of sa and sb to improve the performance of multi-threads. 2011-09-23 20:59:48 +00:00
Xianyi Zhang
16fc083322 Refs #47. Fixed the seting parameter bug on Loongson 3A single thread version. 2011-09-08 16:39:34 +00:00
Xianyi Zhang
3c856c0c1a Check the return value of pthread_create. Update the docs with known issue on Loongson 3A. 2011-09-06 18:27:33 +00:00
Xianyi Zhang
4727fe8abf Refs #47. On Loongson 3A, set DGEMM_R parameter depending on different number of threads. It would improve double precision BLAS3 on multi-threads. 2011-09-05 15:13:52 +00:00
Xianyi Zhang
82f5274828 Refs #39. It's unnecessary to include sys/mman.h file in blas_server_omp.c. 2011-06-22 01:52:20 +08:00
Xianyi Zhang
1496383224 Print the wall time (cycles) with enabling FUNCTION_PROFILE. 2011-06-09 10:40:15 +08:00
Xianyi Zhang
af40551c9f Fixed the makefile bug about openblas_set_num_threads. 2011-05-27 21:15:30 +08:00
Xianyi Zhang
417b8ec792 Added openblas_set_num_threads for Fortran. 2011-05-06 17:03:35 +08:00