Zhang Xianyi
2638370844
Init code base for Intel Haswell.
2013-08-13 00:54:59 +08:00
Zhang Xianyi
673e453b3f
Enable bulldozer kernels.
2013-08-05 16:07:54 +08:00
Zhang Xianyi
143cca4dd5
Merge branch 'develop' into bulldozer
2013-08-05 15:51:53 +08:00
Zhang Xianyi
534c5ec919
Fixed #261 . Use strncmp instead of a comparison trick.
2013-07-29 16:48:35 +08:00
Zhang Xianyi
5b504d6c23
Refs #263 . Rollback bulldozer and piledriver kernels to barcelona kernels.
2013-07-28 17:39:24 +08:00
Zhang Xianyi
72b1edaf1b
Merge branch 'develop' into bulldozer
...
Conflicts:
kernel/x86_64/KERNEL.BULLDOZER
2013-07-28 06:38:25 +02:00
Zhang Xianyi
4471c77905
Fixed #261 . Use strncmp instead of a comparison trick.
2013-07-26 23:43:54 +08:00
Zhang Xianyi
77b572fa0b
Merge branch 'loongson3a' into develop
...
Conflicts:
Makefile.system
2013-07-20 22:33:17 +08:00
Zhang Xianyi
2a7503e563
Refs #225 . Fixed a bug in GEMM OpenMP threading.
2013-07-15 09:56:19 +08:00
grisuthedragon
c19a488af2
create openblas_get_parallel to retrieve information which
...
parallelization model is used by OpenBLAS.
2013-07-11 21:39:19 +08:00
Zhang Xianyi
32d2ca3035
Refs #214 , #221 , #246 . Fixed the getrf overflow bug on Windows.
...
I used a smaller threshold since the stack size is 1MB on Windows.
2013-07-11 03:20:02 +08:00
wernsaar
6f008abcef
replaced defined(DOUBLE) by !defined(XDOUBLE)
2013-07-09 18:17:50 +02:00
Zhang Xianyi
f54f5bac9e
Refs #248 . Fixed the LSB compatibility issue for BLAS only.
...
For example, make CC=lsbcc NO_LAPACK=1.
2013-07-09 15:38:03 +08:00
Zhang Xianyi
5d3312142a
Refs #221 #246 . Fixed the stack overflow bug in multithreaded BLAS3.
...
When NUM_THREADS (MAX_CPU_NUMBER) is very large, e.g. 256:
typedef struct {
  volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;
job_t job[MAX_CPU_NUMBER];
The job array takes 8MB.
Thus, we use malloc instead of stack allocation.
2013-07-08 01:07:05 +08:00
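The size estimate in commit 5d3312142a can be checked with a small sketch. The constants below are assumptions chosen for illustration (MAX_CPU_NUMBER = 256 as in the message, CACHE_LINE_SIZE = 8, DIVIDE_RATE = 2, BLASLONG as an 8-byte integer); the real values come from the OpenBLAS build. The helpers job_array_bytes, alloc_jobs and free_jobs are hypothetical names, not OpenBLAS functions. Under these assumptions each job_t is 256 x 16 x 8 = 32768 bytes, so 256 of them take 8 MiB, which is why the array is moved to the heap.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative values only; the real ones are set by the OpenBLAS build. */
#define MAX_CPU_NUMBER  256
#define CACHE_LINE_SIZE 8
#define DIVIDE_RATE     2

typedef long long BLASLONG;   /* assumed to be an 8-byte integer */

typedef struct {
  volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;

/* Hypothetical helper: size in bytes of the whole job array. */
static size_t job_array_bytes(void) {
  assert(sizeof(BLASLONG) == 8);   /* the estimate depends on this */
  return (size_t)MAX_CPU_NUMBER * sizeof(job_t);
}

/* Hypothetical helper: allocate the job array on the heap (zeroed)
   instead of declaring `job_t job[MAX_CPU_NUMBER];` on the stack.
   Returns NULL if the allocation fails. */
static job_t *alloc_jobs(void) {
  return (job_t *)calloc(MAX_CPU_NUMBER, sizeof(job_t));
}

/* Hypothetical helper: release an array obtained from alloc_jobs(). */
static void free_jobs(job_t *job) {
  free(job);
}
```

A caller would check the result of alloc_jobs() for NULL before use and call free_jobs() on every exit path, which is the main cost of trading a stack array for a heap one.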
Zhang Xianyi
886cbaf4e4
Support AMD Piledriver by bulldozer kernels.
2013-07-06 12:06:43 -03:00
Zhang Xianyi
32dbeb636d
Refs #221 . Set stack limit to 16MB to prevent a SEGFAULT bug on Mac OS X with DYNAMIC_ARCH=1 & NUM_THREADS=256.
2013-07-02 14:17:55 +08:00
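Commit 32dbeb636d does not say how the 16MB stack limit is set. One way a POSIX process can raise its own soft stack limit is getrlimit/setrlimit, sketched below; this is an assumption for illustration and the commit may use a different mechanism. raise_stack_limit is a hypothetical helper name.

```c
#include <sys/resource.h>

/* Hypothetical helper: make sure the soft stack limit is at least `want`
   bytes, without exceeding the hard limit. Returns 0 on success and -1
   if the limits cannot be read or changed. */
static int raise_stack_limit(rlim_t want) {
  struct rlimit rl;

  if (getrlimit(RLIMIT_STACK, &rl) != 0)
    return -1;

  /* Nothing to do if the soft limit is already large enough. */
  if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur >= want)
    return 0;

  /* Clamp the request to the hard limit; an unprivileged process
     cannot raise its soft limit above it. */
  if (rl.rlim_max != RLIM_INFINITY && rl.rlim_max < want)
    want = rl.rlim_max;

  rl.rlim_cur = want;
  return setrlimit(RLIMIT_STACK, &rl) == 0 ? 0 : -1;
}
```

A call such as raise_stack_limit((rlim_t)16 * 1024 * 1024) would request the 16MB mentioned in the message. Whether a raised limit affects a stack that already exists is platform dependent, so it is safest to call it early in startup.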
Dan Luu
88ef307cef
Refs #241 . Add Haswell support (using sandybridge optimizations)
2013-06-30 22:35:14 +08:00
Zhang Xianyi
cd1d473ba0
Merge pull request #230 from wernsaar/develop
...
Refs #230 . New dgemm and sgemm Kernel for BULLDOZER
2013-06-13 07:29:27 -07:00
wernsaar
25491e42f9
New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S
2013-06-08 09:40:17 +02:00
Zhang Xianyi
65ffead0cf
Refs #124 . Check XSAVE flag on x86 CPU.
2013-06-06 22:50:43 +08:00
Xianyi Zhang
6b01d58712
Disable the optimization of multi-threading gemm on the Loongson3A.
2013-03-30 20:12:43 +00:00
Zhang Xianyi
f1ce74ffdd
Improved the message printed when the OS doesn't support AVX.
2013-03-02 14:15:54 +08:00
Zhang Xianyi
d744c9590a
In OpenMP threading, preallocate the thread buffer instead of allocating the buffer every time. This patch improved the performance slightly.
2013-03-01 14:36:47 +08:00
Zhang Xianyi
3cc6ae793e
Refs #174 . Return sb pointer when OpenMP or Windows.
2013-02-26 00:48:21 +08:00
Zhang Xianyi
5155e3f509
Refs #174 . Fixed the overflowing buffer bug of multithreading hbmv and sbmv.
...
Instead of using thread 0 buffer, each thread uses its own sb buffer.
Thus, it can avoid overflowing thread 0 buffer.
2013-02-13 16:05:58 +08:00
Zhang Xianyi
5c8bf6ae0e
Merge branch 'bulldozer' into develop
2013-02-10 01:19:42 +08:00
Zhang Xianyi
6ae2f868fd
Set the affinity. Only use 1 core of each module on bulldozer.
2013-02-09 18:19:02 +01:00
Zhang Xianyi
299b5a44dc
Merge branch 'develop' of github.com:xianyi/OpenBLAS into bulldozer
2013-02-09 16:22:04 +01:00
Zhang Xianyi
8cdb795438
Refs #187 . Use binary code for xgetbv, which is compatible with old compiler.
2013-01-22 00:25:08 +08:00
Zhang Xianyi
a4ee6f3915
Fixed #172 . Support Intel Xeon E7540.
2012-12-18 08:57:46 +08:00
Zhang Xianyi
fba6b590f2
Merge branch 'master' into develop
2012-12-15 22:49:37 +08:00
Julian Taylor
1138817dd2
add a sanity check on the detected cpu type
...
if we have 64 bit pointers we can't have a 32 bit cpu, so fall back to
the 64bit cpu fallback (prescott)
E.g. the cpu detection fails in amd qemu64 emulation (family 6 model 2)
causing it to use the uninitialized gotoblas_ATHLON
2012-12-15 13:29:46 +01:00
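The sanity check described in commit 1138817dd2 can be sketched as follows. The enum values and both function names are hypothetical stand-ins, not the identifiers used in OpenBLAS, and the set of cores treated as 32-bit only is an assumption for illustration.

```c
#include <stddef.h>

/* Hypothetical core identifiers, for illustration only. */
typedef enum { CORE_ATHLON, CORE_PRESCOTT, CORE_NEHALEM } core_kind;

/* Hypothetical predicate: cores assumed here to exist only as 32-bit CPUs. */
static int is_32bit_only(core_kind c) {
  return c == CORE_ATHLON;
}

/* If pointers are 64 bits wide, the CPU cannot be a 32-bit-only core,
   so a detection result that says otherwise is replaced by the 64-bit
   fallback (Prescott in the commit message). */
static core_kind sanitize_core(core_kind detected, size_t pointer_bytes) {
  if (pointer_bytes == 8 && is_32bit_only(detected))
    return CORE_PRESCOTT;
  return detected;
}
```

In a real build the second argument would be sizeof(void *); it is a parameter here only so that both the 32-bit and 64-bit cases can be exercised on one machine.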
Zhang Xianyi
bdf8d9411e
Refs #163 . Obtain the build configuration at runtime.
...
The openblas_get_config function returns the configuration string.
So far, it supports USE64BITINT, NO_CBLAS, NO_LAPACK, NO_LAPACKE,
DYNAMIC_ARCH, NO_AFFINITY.
Example:
#include <stdio.h>
/* Returns the build configuration string. */
extern char *openblas_get_config(void);
int main(void)
{
    printf("%s\n", openblas_get_config());
    return 0;
}
2012-12-10 15:52:51 +08:00
Zhang Xianyi
bfaaa975e6
Added BULLDOZER target. So far it uses barcelona kernels.
2012-12-07 00:53:31 +08:00
Zhang Xianyi
b7c0fa6bd2
Init AMD Bulldozer codebase.
2012-12-06 07:29:54 -05:00
Zhang Xianyi
6751f7b9a7
Fixed #157 . Only detect the number of physical CPU cores on Mac OSX.
2012-11-13 15:48:57 +08:00
Zhang Xianyi
538c764d2b
Refs #153 . Restore the original CPU affinity when calling openblas_set_num_threads(1).
...
Please read the issue on github.com for the detail.
2012-11-06 18:21:46 +08:00
Zhang Xianyi
6c5899dff5
Don't use xgetbv instruction when NO_AVX=1
2012-10-09 14:52:35 +08:00
Zhang Xianyi
735ca38b8f
Refs #139 . Check at runtime whether the OS supports AVX.
2012-09-18 15:46:20 +08:00
Zhang Xianyi
f76a384841
Refs #139 . Added NO_AVX flag to use old Nehalem kernels on Sandy Bridge.
...
For example, make NO_AVX=1 or make DYNAMIC_ARCH=1 NO_AVX=1
2012-09-17 23:25:46 +08:00
Jameson Nash
d0e731e8b8
provide support for passing CFLAGS, FFLAGS, PFLAGS, FPFLAGS to make on the command line
2012-08-21 00:31:12 -04:00
Zhang Xianyi
fe4ab95cd5
Refs #136 . Fixed a bug about controlling the number of threads on Windows.
2012-08-19 23:50:54 +08:00
Xianyi Zhang
801383effe
Fixed a hang when shutting down the BLAS thread server on Windows. Added support for dynamically changing the number of threads on Windows.
2012-08-14 18:34:32 +08:00
Zhang Xianyi
54cd65e47f
Use sandy bridge kernel when DYNAMIC_ARCH=1.
2012-08-13 15:25:08 +08:00
Zhang Xianyi
a55821a2ec
Refs #132 . Kill the threads when unloading the library.
2012-08-11 21:33:15 +08:00
Zhang Xianyi
d007cca61d
Refs #134 . Fixed the build bug on IBM Power.
2012-08-10 11:54:21 +08:00
Xianyi Zhang
25f1a573fd
Fixed the build bug when DYNAMIC_ARCH=0.
2012-07-07 12:12:24 +08:00
Sylvestre Ledru
3692b4d631
Improve the detection of sparc
2012-07-02 02:51:38 +02:00
Xianyi Zhang
a507b56ab1
Refs #119 #118 . Fixed the bug in disabling hyper-threading.
2012-06-29 15:53:24 +08:00
Xianyi Zhang
853d16ed7e
Added an openblas_set_num_threads dummy function on Windows. We plan to implement this feature in the next version.
2012-06-23 13:07:38 +08:00
Zhang Xianyi
422359d09a
Export openblas_set_num_threads in shared library.
2012-06-23 11:32:43 +08:00
Zhang Xianyi
d3b67d0bd8
Refs #113 . Fixed the typo BOBCATE -> BOBCAT
2012-05-31 22:40:15 +08:00
Zhang Xianyi
d6cab3f37e
Refs #113 . Support AMD Bobcat using Barcelona kernel codes. Replace 3DNow! with MMX.
2012-05-31 18:17:45 +08:00
Zhang Xianyi
90d6ad569d
Merge branch 'sandybridge' into develop
...
Just copy the kernel codes from Nehalem. The optimization is ongoing.
2012-05-31 12:44:55 +08:00
Xianyi Zhang
a6adbb299d
Refs #112 . Improved setting thread affinity in Linux. Removed the limit (64) on the number of CPU cores.
2012-05-29 15:23:52 +08:00
Xianyi Zhang
a53c6e2440
Merge branch 'develop' into sandybridge
2012-05-25 23:16:44 +08:00
Zaheer Chothia
a431042475
Fix inconsistent case for OS_* macros (Refs pull request #111 )
2012-05-23 00:01:14 +02:00
Mike Nolta
4e29b6ffc0
FreeBSD: fix OS_FreeBSD -> OS_FREEBSD typos
2012-05-21 16:57:19 -04:00
Xianyi Zhang
19a48b82cf
Init Sandybridge codes based on Nehalem.
2012-03-30 20:01:03 +08:00
Xianyi Zhang
0b89a7a92d
Ref #82 . Disable outputting debug information in alloc_mmap.
2012-03-23 18:17:12 +08:00
Wang Qian
8163ab7e55
Change the block size on Loongson 3B.
2011-11-23 18:41:49 +00:00
Xianyi Zhang
ef6f7f32ae
Fixed mbind bug on Loongson 3B. Check the return value of my_mbind function.
2011-11-23 17:17:41 +00:00
Xianyi Zhang
b95ad4cfaf
Support detecting ICT Loongson-3B CPU.
2011-11-09 19:29:50 +00:00
traz
9fe3049de6
Adding conditional compilation (#if defined(LOONGSON3A)) to avoid affecting the performance of other platforms.
2011-09-26 15:21:45 +00:00
traz
831858b883
Modify aligned address of sa and sb to improve the performance of multi-threads.
2011-09-23 20:59:48 +00:00
Xianyi Zhang
16fc083322
Refs #47 . Fixed the parameter-setting bug in the Loongson 3A single-thread version.
2011-09-08 16:39:34 +00:00
Xianyi Zhang
3c856c0c1a
Check the return value of pthread_create. Update the docs with known issue on Loongson 3A.
2011-09-06 18:27:33 +00:00
Xianyi Zhang
4727fe8abf
Refs #47 . On Loongson 3A, set DGEMM_R parameter depending on different number of threads. It would improve double precision BLAS3 on multi-threads.
2011-09-05 15:13:52 +00:00
Xianyi Zhang
82f5274828
Refs #39 . It's unnecessary to include sys/mman.h file in blas_server_omp.c.
2011-06-22 01:52:20 +08:00
Xianyi Zhang
1496383224
Print the wall time (cycles) with enabling FUNCTION_PROFILE.
2011-06-09 10:40:15 +08:00
Xianyi Zhang
af40551c9f
Fixed the makefile bug about openblas_set_num_threads.
2011-05-27 21:15:30 +08:00
Xianyi Zhang
417b8ec792
Added openblas_set_num_threads for Fortran.
2011-05-06 17:03:35 +08:00
Xianyi Zhang
989c6f8b06
Fixed #14 the SEGFAULT bug on 64 cores. On SMP server, the number of CPUs or cores should be less than or equal to 64.
2011-04-07 14:48:10 +08:00
Xianyi Zhang
e4bb6f2482
Fixed the detection bug on Intel Core i5. Thanks to ggl329 for the patch.
2011-03-22 14:09:47 +08:00
Xianyi Zhang
f7a5e049e2
Enable Debug flags in memory alloc and init functions.
2011-02-26 11:51:39 +08:00
Xianyi Zhang
1b97ec1a7c
Added DEBUG option in Makefile.rule. Fixed DEBUG typos.
2011-02-26 11:19:54 +08:00
Xianyi Zhang
128418f49b
Fixed #10 . Supported GOTO_NUM_THREADS & GOTO_THREADS_TIMEOUT environment variables.
2011-02-24 16:32:13 +08:00
Xianyi Zhang
e51364edb4
Fixed #5 Detected Intel Westmere (using Nehalem codes) in build and dynamic arch build.
...
Thanks to Cao He from Dawning for providing the Intel Xeon 5660 testbed.
2011-02-19 00:03:50 +08:00
Xianyi Zhang
e6c13e2b3c
changed library name to openblas and modified environment variable.
2011-01-24 17:58:05 +00:00
Xianyi Zhang
5c9f1ebbf9
Fixed a bug when compiling dynamic ARCH x86 in GotoBLAS2.
2011-01-24 16:04:17 +00:00
Xianyi Zhang
342bbc3871
Import GotoBLAS2 1.13 BSD version codes.
2011-01-24 14:54:24 +00:00