
82 Commits

Author SHA1 Message Date
wernsaar 5400a9f4e4 redefined functions for TIMING and YIELDING for ARMV7 processor 2013-11-03 10:34:04 +01:00
Sébastien Villemot eae4cfa3f6 Avoid failure on qemu guests declaring an Athlon CPU without 3dnow!
The present patch verifies that, on machines declaring an Athlon CPU model and
family, the 3dnow and 3dnowext feature flags are indeed present. If they are
not, it falls back to the most generic x86 kernel. This prevents crashes due to
illegal instructions on qemu guests with a weird configuration.

Closes #272
2013-08-28 14:29:42 +02:00
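The flag check described in this commit can be sketched as follows. This is a minimal sketch, not the patch itself: it assumes the flags are read from CPUID leaf 0x80000001, where EDX bit 31 reports 3DNow! and bit 30 reports the 3DNow! extensions, and the names `core_t`, `CORE_ATHLON`, `CORE_GENERIC` and `select_athlon_core` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical core identifiers for this sketch only. */
typedef enum { CORE_GENERIC, CORE_ATHLON } core_t;

/* CPUID leaf 0x80000001, register EDX (AMD extended feature flags). */
#define EXT_EDX_3DNOWEXT (1u << 30)
#define EXT_EDX_3DNOW    (1u << 31)

/* Keep the Athlon kernel only if both 3DNow! flags are present;
   otherwise fall back to the most generic x86 kernel. */
core_t select_athlon_core(uint32_t ext_edx)
{
    const uint32_t needed = EXT_EDX_3DNOW | EXT_EDX_3DNOWEXT;
    return ((ext_edx & needed) == needed) ? CORE_ATHLON : CORE_GENERIC;
}
```

A qemu guest that reports an Athlon family/model but clears these bits would then get the generic kernel instead of hitting an illegal instruction.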
Zhang Xianyi 673e453b3f Enable bulldozer kernels. 2013-08-05 16:07:54 +08:00
Zhang Xianyi 143cca4dd5 Merge branch 'develop' into bulldozer 2013-08-05 15:51:53 +08:00
Zhang Xianyi 534c5ec919 Fixed #261. Use strncmp instead of a comparing trick. 2013-07-29 16:48:35 +08:00
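The commit does not show which strings were compared or what the removed trick looked like. As an assumption for illustration, this sketch uses the 12-byte CPUID vendor string, which is not NUL-terminated, to show why a length-bounded `strncmp` is a safe comparison; `vendor_matches` is a hypothetical helper.

```c
#include <string.h>

/* CPUID leaf 0 returns a 12-character vendor string (e.g. "AuthenticAMD")
   split across EBX, EDX, ECX; it is not NUL-terminated, so bound the
   comparison to 12 characters. */
#define VENDOR_LEN 12

int vendor_matches(const char vendor[VENDOR_LEN], const char *expected)
{
    return strncmp(vendor, expected, VENDOR_LEN) == 0;
}
```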
Zhang Xianyi 5b504d6c23 Refs #263. Roll back bulldozer and piledriver kernels to barcelona kernels. 2013-07-28 17:39:24 +08:00
Zhang Xianyi 72b1edaf1b Merge branch 'develop' into bulldozer
Conflicts:
	kernel/x86_64/KERNEL.BULLDOZER
2013-07-28 06:38:25 +02:00
Zhang Xianyi 4471c77905 Fixed #261. Use strncmp instead of a comparing trick. 2013-07-26 23:43:54 +08:00
Zhang Xianyi 77b572fa0b Merge branch 'loongson3a' into develop
Conflicts:
	Makefile.system
2013-07-20 22:33:17 +08:00
Zhang Xianyi 2a7503e563 Refs #225. Fixed a bug in GEMM OpenMP threading. 2013-07-15 09:56:19 +08:00
grisuthedragon c19a488af2 create openblas_get_parallel to retrieve which
parallelization model is used by OpenBLAS.
2013-07-11 21:39:19 +08:00
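A caller might turn the returned code into a readable name as in this sketch. It assumes the convention in OpenBLAS's `cblas.h`, where `OPENBLAS_SEQUENTIAL` is 0, `OPENBLAS_THREAD` is 1 and `OPENBLAS_OPENMP` is 2; the macros are redefined locally so the sketch compiles without OpenBLAS, and `parallel_model_name` is a hypothetical helper.

```c
/* Local copies of the return codes declared in OpenBLAS's cblas.h. */
#ifndef OPENBLAS_SEQUENTIAL
#define OPENBLAS_SEQUENTIAL 0
#define OPENBLAS_THREAD     1
#define OPENBLAS_OPENMP     2
#endif

/* Translate an openblas_get_parallel() result into a readable name. */
const char *parallel_model_name(int code)
{
    switch (code) {
    case OPENBLAS_SEQUENTIAL: return "sequential";
    case OPENBLAS_THREAD:     return "threaded";
    case OPENBLAS_OPENMP:     return "OpenMP";
    default:                  return "unknown";
    }
}
```

When linked against OpenBLAS, `printf("%s\n", parallel_model_name(openblas_get_parallel()));` would print the model the library was built with.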
Zhang Xianyi 32d2ca3035 Refs #214, #221, #246. Fixed the getrf overflow bug on Windows.
I used a smaller threshold since the stack size is 1MB on Windows.
2013-07-11 03:20:02 +08:00
wernsaar 6f008abcef replaced defined(DOUBLE) with !defined(XDOUBLE) 2013-07-09 18:17:50 +02:00
Zhang Xianyi f54f5bac9e Refs #248. Fixed the LSB compatibility issue for BLAS only.
For example, make CC=lsbcc NO_LAPACK=1.
2013-07-09 15:38:03 +08:00
Zhang Xianyi 5d3312142a Refs #221 #246. Fixed the overflowing stack bug in multithreading BLAS3
when NUM_THREADS (MAX_CPU_NUMBER) is very large, e.g. 256:

typedef struct {
  volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;

job_t          job[MAX_CPU_NUMBER];

The job array is 8MB.

Thus, we use malloc instead of stack allocation.
2013-07-08 01:07:05 +08:00
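The 8MB figure and the switch to heap allocation can be checked with this sketch. It assumes values that reproduce the number in the message: `BLASLONG` as a 64-bit integer, `CACHE_LINE_SIZE` = 8 and `DIVIDE_RATE` = 2. Under those assumptions one `job_t` is 256 × 16 × 8 bytes = 32 KiB, and 256 of them make 8 MiB. The helper `alloc_jobs` is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed values for illustration. */
typedef int64_t BLASLONG;
#define MAX_CPU_NUMBER  256
#define CACHE_LINE_SIZE 8
#define DIVIDE_RATE     2

typedef struct {
  volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;

/* Heap-allocate one job_t per thread instead of declaring
   job_t job[MAX_CPU_NUMBER] on the stack. The caller frees the result. */
job_t *alloc_jobs(int nthreads)
{
    if (nthreads <= 0 || nthreads > MAX_CPU_NUMBER)
        return NULL;
    return malloc((size_t)nthreads * sizeof(job_t));
}
```

An 8 MiB local array exceeds the common default stack sizes (for example the 1MB Windows default mentioned above), which is why the allocation moves to the heap.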
Zhang Xianyi 886cbaf4e4 Support AMD Piledriver by bulldozer kernels. 2013-07-06 12:06:43 -03:00
Zhang Xianyi 32dbeb636d Refs #221. Set stack limit to 16MB to prevent a SEGFAULT bug on Mac OS X with DYNAMIC_ARCH=1 & NUM_THREADS=256. 2013-07-02 14:17:55 +08:00
Dan Luu 88ef307cef Refs #241. Add Haswell support (using sandybridge optimizations) 2013-06-30 22:35:14 +08:00
Zhang Xianyi cd1d473ba0 Merge pull request #230 from wernsaar/develop
Refs #230. New dgemm and sgemm kernels for BULLDOZER
2013-06-13 07:29:27 -07:00
wernsaar 25491e42f9 New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S 2013-06-08 09:40:17 +02:00
Zhang Xianyi 65ffead0cf Refs #124. Check XSAVE flag on x86 CPU. 2013-06-06 22:50:43 +08:00
Xianyi Zhang 6b01d58712 Disable the optimization of multithreading gemm on the Loongson3A. 2013-03-30 20:12:43 +00:00
Zhang Xianyi f1ce74ffdd Improved the message printed when the OS doesn't support AVX. 2013-03-02 14:15:54 +08:00
Zhang Xianyi d744c9590a In OpenMP threading, preallocate the thread buffer instead of allocating the buffer every time. This patch improved the performance slightly. 2013-03-01 14:36:47 +08:00
Zhang Xianyi 3cc6ae793e Refs #174. Return the sb pointer when using OpenMP or on Windows. 2013-02-26 00:48:21 +08:00
Zhang Xianyi 5155e3f509 Refs #174. Fixed the overflowing buffer bug of multithreading hbmv and sbmv.
Instead of using thread 0's buffer, each thread uses its own sb buffer.
Thus, it avoids overflowing thread 0's buffer.
2013-02-13 16:05:58 +08:00
Zhang Xianyi 5c8bf6ae0e Merge branch 'bulldozer' into develop 2013-02-10 01:19:42 +08:00
Zhang Xianyi 6ae2f868fd Set the affinity. Only use 1 core of each module on bulldozer. 2013-02-09 18:19:02 +01:00
Zhang Xianyi 299b5a44dc Merge branch 'develop' of github.com:xianyi/OpenBLAS into bulldozer 2013-02-09 16:22:04 +01:00
Zhang Xianyi 8cdb795438 Refs #187. Use binary code for xgetbv, which is compatible with old compilers. 2013-01-22 00:25:08 +08:00
Zhang Xianyi a4ee6f3915 Fixed #172. Support Intel Xeon E7540. 2012-12-18 08:57:46 +08:00
Zhang Xianyi fba6b590f2 Merge branch 'master' into develop 2012-12-15 22:49:37 +08:00
Julian Taylor 1138817dd2 add a sanity check on the detected CPU type
If we have 64-bit pointers we can't have a 32-bit CPU, so fall back to
the default 64-bit CPU (prescott).
E.g. the CPU detection fails in AMD qemu64 emulation (family 6, model 2),
causing it to use the uninitialized gotoblas_ATHLON.
2012-12-15 13:29:46 +01:00
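The sanity check can be sketched like this. The enum values and `is_32bit_only` are hypothetical stand-ins; the only logic taken from the message is "64-bit pointers plus a 32-bit-only core means fall back to Prescott". A caller would pass `sizeof(void *)` as `pointer_size`.

```c
#include <stddef.h>

/* Hypothetical core identifiers for this sketch. */
typedef enum { CORE_ATHLON, CORE_PRESCOTT, CORE_BARCELONA } core_t;

/* Hypothetical predicate: cores that only have 32-bit kernels. */
static int is_32bit_only(core_t core)
{
    return core == CORE_ATHLON;
}

/* With 64-bit pointers a 32-bit-only core cannot be correct,
   so replace it with the 64-bit default (Prescott). */
core_t sanitize_core(core_t detected, size_t pointer_size)
{
    if (pointer_size == 8 && is_32bit_only(detected))
        return CORE_PRESCOTT;
    return detected;
}
```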
Zhang Xianyi bdf8d9411e Refs #163. Obtain the build configuration at runtime.
The openblas_get_config function returns the configuration string.
So far, it supports USE64BITINT, NO_CBLAS, NO_LAPACK, NO_LAPACKE,
DYNAMIC_ARCH, NO_AFFINITY.

Example:
#include <stdio.h>
extern char *openblas_get_config(void);
int main(void)
{
  printf("%s\n", openblas_get_config());
  return 0;
}
2012-12-10 15:52:51 +08:00
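One way such a string could be assembled at compile time is by concatenating string literals under preprocessor guards. This is a hedged sketch, not OpenBLAS's actual source: the exact format of the real string may differ, and `config_string` and `sketch_get_config` are hypothetical names. Only the options listed in the message are covered.

```c
/* Each option contributes a token only if its macro was defined
   at build time (e.g. via -DDYNAMIC_ARCH). */
static const char config_string[] = ""
#ifdef USE64BITINT
    " USE64BITINT"
#endif
#ifdef NO_CBLAS
    " NO_CBLAS"
#endif
#ifdef NO_LAPACK
    " NO_LAPACK"
#endif
#ifdef NO_LAPACKE
    " NO_LAPACKE"
#endif
#ifdef DYNAMIC_ARCH
    " DYNAMIC_ARCH"
#endif
#ifdef NO_AFFINITY
    " NO_AFFINITY"
#endif
    ;

const char *sketch_get_config(void)
{
    return config_string;
}
```

Because the string is fixed when the library is compiled, querying it at runtime reports how the installed binary was built rather than how the caller was built.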
Zhang Xianyi bfaaa975e6 Added BULLDOZER target. So far it uses barcelona kernels. 2012-12-07 00:53:31 +08:00
Zhang Xianyi b7c0fa6bd2 Init AMD Bulldozer codebase. 2012-12-06 07:29:54 -05:00
Zhang Xianyi 6751f7b9a7 Fixed #157. Only detect the number of physical CPU cores on Mac OS X. 2012-11-13 15:48:57 +08:00
Zhang Xianyi 538c764d2b Refs #153. Restore the original CPU affinity when calling openblas_set_num_threads(1).
Please read the issue on github.com for details.
2012-11-06 18:21:46 +08:00
Zhang Xianyi 6c5899dff5 Don't use the xgetbv instruction when NO_AVX=1 2012-10-09 14:52:35 +08:00
Zhang Xianyi 735ca38b8f Refs #139. Check whether the OS supports AVX at runtime. 2012-09-18 15:46:20 +08:00
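The runtime AVX check touched on by several commits in this log (checking the XSAVE flag, emitting xgetbv as binary code, skipping xgetbv when NO_AVX=1) follows a well-known pattern, sketched here under stated assumptions. It uses the documented bits CPUID.1:ECX[27] (OSXSAVE), CPUID.1:ECX[28] (AVX) and XCR0 bits 1 and 2 (SSE and AVX state); the function names are hypothetical and this is not the OpenBLAS source.

```c
#include <stdint.h>

/* CPUID leaf 1, ECX feature bits. */
#define CPUID1_ECX_OSXSAVE (1u << 27)  /* OS has enabled XSAVE/XGETBV */
#define CPUID1_ECX_AVX     (1u << 28)  /* CPU implements AVX */

/* XCR0 bit 1 = SSE (XMM) state, bit 2 = AVX (upper YMM) state. */
#define XCR0_SSE_AVX 0x6u

/* AVX is usable only if the CPU has it, the OS enabled XSAVE, and the
   OS saves both XMM and YMM state across context switches. */
int avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    if (!(cpuid1_ecx & CPUID1_ECX_AVX) || !(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
        return 0;
    return (xcr0 & XCR0_SSE_AVX) == XCR0_SSE_AVX;
}

#if (defined(__x86_64__) || defined(__i386__)) && !defined(NO_AVX)
/* Read XCR0 with xgetbv (ECX = 0). The instruction is emitted as the raw
   bytes 0F 01 D0 so assemblers that predate the mnemonic accept it.
   Call only after confirming the OSXSAVE bit, or it raises #UD. */
uint64_t read_xcr0(void)
{
    uint32_t eax, edx;
    __asm__ volatile(".byte 0x0f, 0x01, 0xd0"
                     : "=a"(eax), "=d"(edx)
                     : "c"(0));
    return ((uint64_t)edx << 32) | eax;
}
#endif
```

Keeping the decision in a pure function separate from the instruction makes it testable on any machine, while the guarded `read_xcr0` supplies the real XCR0 value on x86.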
Zhang Xianyi f76a384841 Refs #139. Added NO_AVX flag to use old Nehalem kernels on Sandy Bridge.
For example, make NO_AVX=1 or make DYNAMIC_ARCH=1 NO_AVX=1
2012-09-17 23:25:46 +08:00
Jameson Nash d0e731e8b8 provide support for passing CFLAGS, FFLAGS, PFLAGS, FPFLAGS to make on the command line 2012-08-21 00:31:12 -04:00
Zhang Xianyi fe4ab95cd5 Refs #136. Fixed a bug in controlling the number of threads on Windows. 2012-08-19 23:50:54 +08:00
Xianyi Zhang 801383effe Fixed a hang when shutting down the BLAS thread server on Windows. Added support for dynamically changing the number of threads on Windows. 2012-08-14 18:34:32 +08:00
Zhang Xianyi 54cd65e47f Use sandy bridge kernel when DYNAMIC_ARCH=1. 2012-08-13 15:25:08 +08:00
Zhang Xianyi a55821a2ec Refs #132. Kill the threads when the library is unloaded. 2012-08-11 21:33:15 +08:00
Zhang Xianyi d007cca61d Refs #134. Fixed the build bug on IBM Power. 2012-08-10 11:54:21 +08:00
Xianyi Zhang 25f1a573fd Fixed the build bug when DYNAMIC_ARCH=0. 2012-07-07 12:12:24 +08:00
Sylvestre Ledru 3692b4d631 Improve the detection of sparc 2012-07-02 02:51:38 +02:00
Xianyi Zhang a507b56ab1 Refs #119 #118. Fixed a bug in disabling hyper-threading. 2012-06-29 15:53:24 +08:00