kaustubh
1480f3df71
Add msa optimization for AXPY, COPY, SCALE, SWAP
...
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2017-01-09 18:27:23 +05:30
kaustubh
88afb3bc94
Add msa optimization for AXPY, COPY, SCALE, SWAP
...
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2017-01-09 18:22:09 +05:30
Zhang Xianyi
b678471d65
Merge branch 'z13' into develop
...
Conflicts:
CONTRIBUTORS.md
2017-01-09 05:52:42 -05:00
Zhang Xianyi
864e202afd
Add USE_TRMM=1 for IBM z13 in kernel/Makefile.L3
2017-01-09 05:48:09 -05:00
Abdurrauf
6418667818
dtrmm and dgemm for z13
2017-01-04 19:32:33 +04:00
Shivraj Patil
a9bf8a781a
Added prefetch to CGEMV and ZGEMV.
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-12-27 11:33:51 +05:30
kaustubh
5f93aa5f87
Updated data prefetch in TRSM, ASUM, DOT functions
...
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-12-14 14:05:11 +05:30
kaustubh
9db451acd0
Updated data prefetch in TRSM, ASUM, DOT functions
...
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-12-13 14:02:14 +05:30
kaustubh
3eaff85191
Updated data prefetch in TRSM, ASUM, DOT functions
...
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-12-13 11:41:17 +05:30
kaustubh
00abce3b93
Add data prefetch in DOT and ASUM functions
...
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-11-22 11:21:03 +05:30
Andrew
becf8bc7a0
remove dead code
2016-10-31 12:46:56 +01:00
kaustubh
f3419e634c
SGEMM, DGEMM, CGEMM, ZGEMM functions data prefetch
...
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-10-17 18:29:38 +05:30
Zhang Xianyi
7472c79ea6
Merge pull request #984 from ksraste/develop
...
STRSM, DTRSM functions data prefetch
2016-10-17 11:33:16 +08:00
kaustubh
90e2321ac3
STRSM, DTRSM functions data prefetch
...
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-10-14 16:41:28 +05:30
Martin Kroeker
4998e19869
Change file comments to work around clang 3.9 assembler bug
2016-10-13 16:51:08 +02:00
Martin Kroeker
91610f3835
Update zdot_msa.c
2016-10-05 18:59:09 +02:00
Martin Kroeker
6e22ecf102
Update zdot.c
2016-10-05 18:58:03 +02:00
Martin Kroeker
6221d6df5f
Update zdot.c
2016-10-05 18:57:14 +02:00
Martin Kroeker
16446d1d23
Remove explicit include of complex.h
2016-09-29 23:45:56 +02:00
Martin Kroeker
a6e9e0b94b
Remove explicit include of complex.h
2016-09-29 23:43:28 +02:00
Martin Kroeker
3178e4fea0
Remove explicit include of complex.h
2016-09-29 23:41:43 +02:00
Martin Kroeker
95c245ddb0
Remove explicit include of complex.h
2016-09-29 23:40:36 +02:00
Martin Kroeker
4b1b27347f
Remove explicit include of complex.h
2016-09-29 23:39:35 +02:00
Shivraj Patil
54747fe24a
DGEMM function split and data prefetch
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-09-22 17:25:46 +05:30
Zhang Xianyi
515bc56ea9
Refs #946 . Use nrm2 reference implementation for Power8.
2016-08-18 18:59:43 -07:00
Zhang Xianyi
ae70b916f4
Refs #929 . Deal with zero and NaNs for scale.
2016-08-18 10:24:42 -07:00
Shivraj Patil
9687437928
MIPS n32 ABI and build time mips simd support check
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-08-10 17:44:22 +05:30
Shivraj Patil
d1c6469283
MIPS n32 ABI support, MSA support detection and rename ARCH, ARCHFLAGS
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-08-08 11:58:01 +05:30
Ashwin Sekhar T K
c54a29bb48
Cortex A57: Improvements to DGEMM 8x4 kernel
2016-07-26 10:58:21 +05:30
Shivraj Patil
beb1d076a4
Added MSA optimization for GEMV_N, GEMV_T, ASUM, DOT functions
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-07-15 18:38:25 +05:30
Zhang Xianyi
8a592ee386
Merge pull request #924 from ashwinyes/develop_aarch64_improvements_20160714
...
Improvements to Aarch64 kernels
2016-07-14 15:47:55 -04:00
Ashwin Sekhar T K
0a5ff9f9f9
Improvements to TRMM and GEMM kernels
2016-07-14 13:56:04 +05:30
Ashwin Sekhar T K
8a40f1355e
Improvements to GEMV kernels
2016-07-14 13:50:38 +05:30
Ashwin Sekhar T K
78782485b6
Improvements to COPY and IAMAX kernels
2016-07-14 13:49:34 +05:30
Shivraj Patil
57df7956ee
Added CGEMM, ZGEMM, STRMM, DTRMM, CTRMM, ZTRMM. Updated macros in SGEMM, DGEMM, STRMM.
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-06-28 17:51:10 +05:30
Zhang Xianyi
4a30a2584a
Merge pull request #897 from ksraste/develop
...
STRSM optimized for MSA
2016-06-27 10:04:18 -04:00
mdong
098d8ec5d6
remove input from clobbered list
2016-06-24 16:37:58 -04:00
Werner Saar
f04af36ad0
Merge pull request #898 from wernsaar/develop
...
added experimental support for optimized lapack fortran functions
2016-05-31 14:13:52 +02:00
Kaustubh Raste
011431b9d7
STRSM optimized for MSA
...
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
2016-05-31 10:17:23 +05:30
Kaustubh Raste
c8a7860eb3
STRSM optimized
...
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
2016-05-30 21:17:00 +05:30
Zhang Xianyi
2daad2bcb5
Merge pull request #893 from biddisco/develop
...
Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PRO…
2016-05-30 14:52:58 +08:00
John Biddiscombe
053044ae4d
Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PROJECT_BINARY_DIR
...
If OpenBLAS is built using add_subdirectory(OpenBlas) as part of another project
then the paths set by CMAKE_XXX_DIR are relative to the parent project
and not the OpenBLAS project.
2016-05-25 09:13:28 +02:00
Aleksey Kuleshov
fca66262c4
mips64/axpy: fix error when INCY == 0
2016-05-23 13:30:27 +03:00
Werner Saar
412bcd187a
optimized dtrsm_logic_LT_16x4_power8.S and dtrsm_macros_LT_16x4_power8.S
2016-05-23 11:20:41 +02:00
Werner Saar
bd06b246cc
Merge pull request #890 from wernsaar/develop
...
optimized dtrsm_kernel_LT for POWER8
2016-05-22 16:01:35 +02:00
Werner Saar
8b140220c8
optimized dtrsm_kernel_LT for POWER8
2016-05-22 15:20:04 +02:00
Werner Saar
8fb5a1aaff
added optimized dtrsm_LT kernel for POWER8
2016-05-22 13:09:05 +02:00
Kaustubh Raste
ad9f317870
STRSM optimization for MIPS P5600 and I6400 using MSA
...
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
2016-05-20 10:59:03 +05:30
Shivraj Patil
c4ba40e308
SGEMM optimization for MIPS P5600 and I6400 using MSA. Unrolled k loop in DGEMM kernel function
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-05-19 11:04:42 +05:30
Zhang Xianyi
7a19065369
Merge pull request #878 from ksraste/develop
...
DTRSM bug fix for MIPS P5600 and I6400
2016-05-19 11:16:43 +08:00
Werner Saar
6a2bde7a2d
optimized dgemm and dgetrf for POWER8
2016-05-17 14:45:27 +02:00
Kaustubh Raste
d7cbc7ac13
DTRSM bug fix for MIPS P5600 and I6400
...
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
2016-05-17 15:48:02 +05:30
Werner Saar
88011f625d
Merge pull request #876 from wernsaar/develop
...
optimized dgemm on power8 for 20 threads
2016-05-16 14:52:40 +02:00
Werner Saar
8310d4d3f7
optimized dgemm for 20 threads
2016-05-16 14:14:25 +02:00
Kaustubh Raste
edb5980c13
DTRSM optimization for MIPS P5600 and I6400 using MSA
...
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
2016-05-09 15:15:26 +05:30
Shivraj Patil
085cf236c2
conflict resolved by syncing with 'xianyi:develop'
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-05-04 11:07:14 +05:30
Shivraj Patil
b7b3d8ec8e
DGEMM optimization for MIPS P5600 and I6400 using MSA
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-05-03 14:42:26 +05:30
Zhang Xianyi
cd7af5260a
Merge pull request #847 from sva-img/develop
...
MIPS P5600(32 bit) and I6400(64 bit) cores support added.
2016-04-29 11:44:36 -04:00
Werner Saar
56948dbf0f
optimized dgemm for POWER8
2016-04-29 12:52:47 +02:00
Werner Saar
0d0c6f7d7d
optimized dgemm for POWER8
2016-04-27 14:01:08 +02:00
Werner Saar
298b13bba4
updated some kernel files for EXCAVATOR
2016-04-25 10:36:23 +02:00
Werner Saar
78b05f6476
bugfix for EXCAVATOR and DYNAMIC_ARCH
2016-04-25 10:13:30 +02:00
Werner Saar
a3da10662f
added sgemm_tcopy_8_power8.S
2016-04-23 10:04:41 +02:00
Werner Saar
d46f07bb4e
added cgemm_tcopy_8_power8.S
2016-04-23 07:37:18 +02:00
Werner Saar
879a51165f
Optimized zgemm and tested zgemm again
2016-04-22 13:07:12 +02:00
Shivraj Patil
2c3dfe2bf3
MIPS P5600(32 bit) and I6400(64 bit) cores support added.
...
Separated mips and mips64 files.
Configurations support for mips 32 bit.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-04-22 14:03:18 +05:30
Werner Saar
9276c9012f
Optimized sgemm and dgemm and tested again.
2016-04-21 11:37:57 +02:00
wernsaar
6fbca2a4a1
Merge pull request #845 from wernsaar/develop
...
optimized sgemm for power8
2016-04-20 13:44:22 +02:00
Werner Saar
0001260f4b
optimized sgemm
2016-04-20 13:06:38 +02:00
Werner Saar
3c6294ca3d
added optimized sgemm_tcopy for power8
2016-04-19 16:08:54 +02:00
Zhang Xianyi
dd43661cfd
Init IBM z system (s390x) porting.
2016-04-15 18:02:24 -04:00
Zhang Xianyi
f24d5307cf
Refs #834 . Fix zgemv config bug on Steamroller.
2016-04-12 22:26:11 +08:00
Werner Saar
8037d78eed
bugfix for arm scal.c and zscal.c
2016-04-11 11:21:36 +02:00
wernsaar
0a4276bc2f
Merge pull request #837 from wernsaar/develop
...
updated zgemm- and ztrmm-kernel for POWER8
2016-04-08 11:13:27 +02:00
Werner Saar
e173c51c04
updated zgemm- and ztrmm-kernel for POWER8
2016-04-08 09:05:37 +02:00
Werner Saar
9c42f0374a
Updated cgemm- and sgemm-kernel for POWER8 SMP
2016-04-07 15:08:15 +02:00
Zhang Xianyi
d4380c1fe4
Refs xianyi/OpenBLAS-CI#10 , Fix sdot for scipy test_iterative.test_convergence test failure on AMD bulldozer and piledriver.
2016-04-07 01:44:18 +08:00
Werner Saar
a51102e9b7
bugfixes for sgemm- and cgemm-kernel
2016-04-06 11:15:21 +02:00
Werner Saar
c5b1fbcb2e
updated optimized cgemm- and ctrmm-kernel for POWER8
2016-04-04 09:12:08 +02:00
Werner Saar
d4c0330967
updated cgemm- and ctrmm-kernel for POWER8
2016-04-03 14:30:49 +02:00
Werner Saar
6a9bbfc227
updated sgemm- and strmm-kernel for POWER8
2016-04-02 17:16:36 +02:00
Werner Saar
68a69c5b50
added optimized dgemv_n kernel for POWER8
2016-03-30 11:10:53 +02:00
Werner Saar
c2464a7c4a
added optimized casum kernel for POWER8
2016-03-28 14:12:08 +02:00
Werner Saar
294f933869
added optimized zasum kernel for POWER8
2016-03-28 13:37:32 +02:00
Werner Saar
f59c9bd6ef
added optimized sasum kernel for POWER8
2016-03-28 12:44:25 +02:00
Werner Saar
c53be46d78
added optimized dasum kernel for POWER8
2016-03-28 12:17:15 +02:00
Werner Saar
659ed16591
added optimized cswap and zswap kernels for POWER8
2016-03-27 18:31:37 +02:00
Werner Saar
35c98a3556
added optimized zscal kernel for POWER8
2016-03-27 16:31:50 +02:00
Werner Saar
f1a5dd06c5
added optimized sscal kernel for POWER8
2016-03-27 11:05:56 +02:00
wernsaar
e125a3dc33
Merge pull request #824 from wernsaar/develop
...
added optimized drot-kernel and srot-kernel for POWER8
2016-03-27 10:43:17 +02:00
Werner Saar
35f1f21a7f
added drot- and srot-kernel optimized for POWER8
2016-03-27 08:57:11 +02:00
Zhang Xianyi
7b4b7179ba
Merge pull request #819 from ashwinyes/develop_20160324_fixes_optimizations
...
Cortex-A57: Fixes and Optimizations
2016-03-27 00:04:20 -04:00
Werner Saar
3d9a50e841
added optimized sswap kernel for POWER8
2016-03-25 17:34:55 +01:00
Werner Saar
828c849b44
added optimized ccopy kernel for POWER8
2016-03-25 16:54:25 +01:00
Werner Saar
ecc0bc9813
added optimized scopy kernel for POWER8
2016-03-25 16:06:56 +01:00
Werner Saar
12f209b7b0
added optimized zswap kernel for POWER8
2016-03-25 15:27:34 +01:00
Werner Saar
7316a87930
added optimized dswap kernel for POWER8
2016-03-25 14:35:43 +01:00
Werner Saar
0bff057a87
added optimized dcopy kernel for POWER8
2016-03-25 13:03:02 +01:00
Werner Saar
1e6cf9808c
added optimized dscal kernel for POWER8
2016-03-25 09:42:08 +01:00
Ashwin Sekhar T K
278511ad2d
Cortex-A57: Fix clang compilation errors
2016-03-24 10:42:04 +05:30
Ashwin Sekhar T K
3b5ffb49d3
Cortex-A57: Improve DGEMM 8x4 Implementation
2016-03-24 10:25:18 +05:30
Werner Saar
55eda3813b
added optimized zaxpy kernel for POWER8
2016-03-23 11:20:23 +01:00
Werner Saar
0664ba4c97
added optimized daxpy kernel for POWER8
2016-03-22 14:50:03 +01:00
Werner Saar
11c44dede1
added optimized sdot kernel for POWER8
2016-03-21 13:18:23 +01:00
Werner Saar
9e4584d069
added optimized zdot kernel for POWER8
2016-03-21 10:12:07 +01:00
Werner Saar
cd9fafc054
ddot for POWER8: updated licence information
2016-03-20 11:19:27 +01:00
Werner Saar
84b92e6373
added optimized ddot kernel for POWER8
2016-03-20 11:06:06 +01:00
wernsaar
c279a53ed8
Merge pull request #806 from wernsaar/develop
...
adding optimized single precision blas level3 kernels for POWER8
2016-03-18 12:46:16 +01:00
Werner Saar
e1df5a6e23
fixed sgemm- and strmm-kernel
2016-03-18 12:12:03 +01:00
Werner Saar
5c658f8746
add optimized cgemm- and ctrmm-kernel for POWER8
2016-03-18 08:17:25 +01:00
Ashwin Sekhar T K
5ac02f6dc7
Optimize Dgemm 4x4 for Cortex A57
2016-03-14 19:35:23 +05:30
Ashwin Sekhar T K
7aa1ad4923
Functional Assembly Kernels for CortexA57
...
Adding functional (non-optimized) kernels for Cortex-A57
with the following layouts.
SGEMM - 16x4, 8x8
CGEMM - 8x4
DGEMM - 8x4, 4x8
2016-03-14 19:33:21 +05:30
Werner Saar
dcd15b546c
BUGFIX: KERNEL.POWER8
2016-03-14 14:36:59 +01:00
Werner Saar
96284ab295
added sgemm- and strmm-kernel for POWER8
2016-03-14 13:52:44 +01:00
Werner Saar
faa5e2e5e3
FIX: forgot to add the files cgemv_n_4.c and cgemv_t_4.c
2016-03-10 11:10:38 +01:00
Werner Saar
fdf291be30
Added optimized cgemv_n and cgemv_t kernels for bulldozer, piledriver and steamroller
2016-03-10 09:42:07 +01:00
Werner Saar
c99cc41cbd
Added optimized zgemv_n kernel for bulldozer, piledriver and steamroller
2016-03-09 14:02:03 +01:00
Werner Saar
acdff55a6a
Bugfix for ztrmv
2016-03-07 09:39:34 +01:00
Zhang Xianyi
7d6b68eb4a
Refs #786 . Revert to default assembly kernel.
2016-03-07 11:34:58 +08:00
Werner Saar
cd5241d0cf
modified KERNEL for power, to use the generic DSDOT-KERNEL
2016-03-06 09:07:24 +01:00
Zhang Xianyi
8c43d7fa5f
Merge remote-tracking branch 'origin/power8' into develop
...
Refs #774
2016-03-05 06:03:19 -05:00
Werner Saar
085f215257
Modified assembly label names, so that they are hidden.
...
Added license information.
2016-03-05 10:27:27 +01:00
Zhang Xianyi
8f758eeff9
Refs #786 . avoid old assembly c/zgemv kernels.
2016-03-05 08:32:03 +08:00
Werner Saar
0afc76fd65
enabled gemm_beta assembly kernels
2016-03-04 15:01:15 +01:00
Werner Saar
91e1c5080c
modified configuration, to use power6 sgemm kernel for power8
2016-03-04 13:38:57 +01:00
Werner Saar
73f04c2c72
enabled hemv assembly function for power8
2016-03-04 13:20:50 +01:00
Werner Saar
3e633152c6
enabled symv assembly kernels on power8
2016-03-04 13:08:18 +01:00
Werner Saar
d5130ce7e3
enabled gemv assembly on power8
2016-03-04 12:53:31 +01:00
Werner Saar
4824b88fcb
enabled all level1 assembly kernels for power8
2016-03-04 12:35:25 +01:00
Werner Saar
b752858d6c
added dgemm-, dtrmm-, zgemm- and ztrmm-kernel for power8
2016-03-01 07:33:56 +01:00
Zhang Xianyi
efa4f5c936
Refs #695 #783 . Replace default x86_64 cgemv_t
...
asm kernel by C kernel.
2016-03-01 11:18:56 +08:00
Zhang Xianyi
74b0672223
Fix c/zaxpyc kernel bug on Cortex-A57.
2016-02-23 22:47:53 +00:00
Zhang Xianyi
6e7be06e07
Refs JuliaLang/julia#5728 . Fix gemv performance bug on Haswell Mac OSX.
...
On Mac OS X, it should use .align 4 (equal to .align 16 on Linux).
I didn't get the performance benefit from .align. Thus, I deleted it.
2016-02-19 17:56:07 -05:00
Zhang Xianyi
d06b92906a
Add gemm3m building for CMake.
2016-02-12 05:02:51 +08:00
Zhang Xianyi
962376664d
Refs #768 . Swap the result of zdot x87 fp kernel.
2016-02-02 09:15:02 +08:00
Zhang Xianyi
c44ff4d648
Refs #714 . avoid compiling warnings.
2016-01-28 04:38:07 +08:00
Werner Saar
63a7d7fb24
updated gemv_n_vfpv3.S for armv7
2016-01-25 15:00:13 +01:00
Werner Saar
b4ede558a5
updated nrm2 kernel for armv7
2016-01-25 11:55:25 +01:00
Werner Saar
de3e2d4349
updated trmm kernels for armv7
2016-01-25 11:08:56 +01:00
Werner Saar
a0e51e96f1
updated gemm kernels for armv7
2016-01-25 10:46:10 +01:00
Werner Saar
c2891330bc
updated KERNEL.ARMV6
2016-01-24 17:12:07 +01:00
Werner Saar
ceaa931e48
updated gemv kernel for armv6
2016-01-24 16:31:19 +01:00
Werner Saar
eaa63165df
updated cgemv and zgemv kernels for armv6
2016-01-24 14:42:38 +01:00
Werner Saar
c65357c566
updated trmm_kernels for armv6
2016-01-24 13:03:33 +01:00
Werner Saar
e63e9f9f26
updated gemm_kernels for armv6
2016-01-24 11:55:50 +01:00
Werner Saar
aafd3ab60e
updated cdot and zdot on arm
2016-01-24 10:56:49 +01:00
Werner Saar
d2f84c9c8a
Ref #740 : updated nrm2_vfp.S
2016-01-23 17:47:58 +01:00
Werner Saar
ca32253f32
Ref #740 : updated asum_vfp.S and iamax_vfp.S
2016-01-23 14:44:34 +01:00
Werner Saar
9066d1f982
Ref #750 and Ref #740 : bugfix for sdot, dsdot and ddot on arm
2016-01-23 11:59:51 +01:00
Werner Saar
692d9c881c
Ref #740 : simple solution to clear floating point register on arm
2016-01-17 15:37:12 +01:00
Zhang Xianyi
3602a2cd1f
#736 Revert #733 patch to fix bus error on ARM.
2016-01-12 22:19:58 +00:00
Zhang Xianyi
e3e20e2242
Merge pull request #733 from yuyichao/arm-asm
...
Do not use vsub to clear the register values
2016-01-05 19:35:12 -06:00
Yichao Yu
594b9f4c73
Do not use vsub to clear the register values since it doesn't work with non-normal numbers.
2016-01-05 16:54:05 +00:00
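The problem named in the commit above can be reproduced in plain C. This is a minimal sketch, assuming that "non-normal numbers" refers to infinities and NaNs; the function names are hypothetical and the code is ordinary IEEE 754 arithmetic, not the ARM assembly touched by the commit. Subtracting a value from itself yields zero only for finite inputs, whereas loading a literal zero ignores the previous contents.

```c
#include <assert.h>
#include <math.h>

/* Clears a value by subtracting it from itself, mirroring the vsub idiom.
 * volatile keeps the compiler from folding v - v into a constant. */
double clear_by_sub(double x)
{
    volatile double v = x;
    return v - v;
}

/* Clears by returning a literal zero, independent of the old contents. */
double clear_by_zero(double x)
{
    (void)x;
    return 0.0;
}

/* Small self-check showing where the two idioms agree and differ. */
void check_clear_idioms(void)
{
    assert(clear_by_sub(3.5) == 0.0);          /* finite input: works   */
    assert(isnan(clear_by_sub(INFINITY)));     /* inf - inf is NaN      */
    assert(isnan(clear_by_sub(NAN)));          /* NaN - NaN is NaN      */
    assert(clear_by_zero(INFINITY) == 0.0);    /* literal zero is safe  */
    assert(clear_by_zero(NAN) == 0.0);
}
```

A register that happens to hold an infinity or NaN from earlier work would therefore stay "dirty" after the subtraction and poison any accumulation that follows.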
Werner Saar
c8f2c5d636
added optimized trsm_kernels
2016-01-05 13:05:05 +01:00
Ashwin Sekhar T K
318f0949c3
lapack-test fixes in nrm2 kernels for Cortex A57
2015-11-23 13:43:36 +05:30
Ashwin Sekhar T K
98965da2e8
lapack-test fixes for Cortex A57
2015-11-20 01:15:04 +05:30
Ashwin Sekhar T K
c99c43d51e
Optimized trmm kernels for CORTEXA57
2015-11-09 14:15:54 +05:30
Ashwin Sekhar T K
1397b47197
Optimized zgemm kernel for CORTEXA57
2015-11-09 14:15:53 +05:30
Ashwin Sekhar T K
45f78963ac
Optimized cgemm kernel for CORTEXA57
...
Also, add a generic ztrmm 4x4 kernel
2015-11-09 14:15:53 +05:30
Ashwin Sekhar T K
402443bf9c
Optimized dgemm kernel for CORTEXA57
2015-11-09 14:15:53 +05:30
Ashwin Sekhar T K
19fdbee291
Improve the sgemm kernel for CORTEXA57
2015-11-09 14:15:53 +05:30
Ashwin Sekhar T K
3b0cdfab1e
Optimized gemv kernels for CORTEXA57
...
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
2015-11-09 14:15:52 +05:30
Ashwin Sekhar T K
46efa6a1da
Optimized swap kernels for CORTEXA57
...
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
2015-11-09 14:15:52 +05:30
Ashwin Sekhar T K
ea1465cdf8
Optimized scal kernels for CORTEXA57
...
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
2015-11-09 14:15:52 +05:30
Ashwin Sekhar T K
fb4be3b3eb
Optimized rot kernels for CORTEXA57
...
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
2015-11-09 14:15:52 +05:30
Ashwin Sekhar T K
6c2f4ddbcd
Optimized nrm2 kernels for CORTEXA57
2015-11-09 14:15:51 +05:30
Ashwin Sekhar T K
870c4d49c0
Optimized dot kernels for CORTEXA57
...
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
2015-11-09 14:15:51 +05:30
Ashwin Sekhar T K
cd7684097c
Optimized copy kernels for CORTEXA57
...
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
2015-11-09 14:15:51 +05:30
Ashwin Sekhar T K
2690b71b1f
Optimized axpy kernels for CORTEXA57
...
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
2015-11-09 14:15:51 +05:30
Ashwin Sekhar T K
3e4acedf0e
Optimized asum kernels for CORTEXA57
...
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
2015-11-09 14:15:51 +05:30
Ashwin Sekhar T K
2610752dbb
Optimized iamax kernels for CORTEXA57
...
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
2015-11-09 14:15:50 +05:30
Ashwin Sekhar T K
dbb213655e
Optimized amax kernels for CORTEXA57
...
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
2015-11-09 14:15:50 +05:30
Ashwin Sekhar T K
f2f8a0fe8b
Adding arm64 target CORTEXA57
...
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
2015-11-09 14:15:50 +05:30
Ralph Campbell
c053559ed9
Minor C code fixes in kernel/arm
2015-11-09 14:15:49 +05:30
Ralph Campbell
55e4332f00
Remove duplicate -D args in kernel/Makefile.L1
2015-11-09 14:15:48 +05:30
Zhang Xianyi
3e8d6ea74f
Init POWER8 kernels by POWER6.
2015-11-03 12:34:23 +08:00
Zhang Xianyi
69363622a8
Fix DYNAMIC_ARCH=1 bug.
2015-10-27 05:10:40 +08:00
Zhang Xianyi
53b6023a6c
Fix cmake bug on MSVC 32-bit.
2015-10-26 14:52:13 -05:00
Zhang Xianyi
309875de3c
Fix cmake bug on x86 32-bit.
...
e.g. Build 32-bit on 64-bit Linux.
cmake -DBINARY=32
2015-10-27 02:54:53 +08:00
Zhang Xianyi
8fade093aa
Fixed cmake bug on Visual Studio.
2015-10-20 14:37:22 -05:00
Zhang Xianyi
96f0bbe067
Fixed cmake bug on haswell.
2015-10-21 02:24:54 +08:00
Zhang Xianyi
d8392c1245
Fixed cmake config bugs.
2015-10-20 04:30:55 +08:00
Zhang Xianyi
94b125255f
Merge branch 'develop' into cmake
...
Conflicts:
driver/others/memory.c
2015-10-13 04:46:08 +08:00
Martin Koehler
711ca33bc6
Improved Ximatcopy when lda==ldb.
...
The Ximatcopy functions create a copy of the input matrix
although they seem to work in place. The new routines
XIMATCOPY_K_YY perform the operations in place if the leading
dimension does not change.
2015-09-07 14:36:16 +02:00
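The in-place condition described in the commit above can be sketched in C. This is an illustration under stated assumptions, not the XIMATCOPY_K_YY code: the function name imatcopy_notrans is hypothetical, only the no-transpose scaling case is shown, and the matrix is assumed to be column-major. When lda equals ldb every element keeps its position, so no temporary is needed; otherwise the values are staged in a copy before being written back with the new leading dimension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch: scales a column-major rows x cols matrix stored in a
 * with leading dimension lda, leaving the result in the same buffer with
 * leading dimension ldb. The caller must size the buffer for
 * max(lda, ldb) * cols elements.
 * Returns 1 if the work was done in place, 0 if a temporary copy was
 * needed, -1 on allocation failure. */
int imatcopy_notrans(size_t rows, size_t cols, double alpha,
                     double *a, size_t lda, size_t ldb)
{
    assert(a != NULL && lda >= rows && ldb >= rows);

    if (rows == 0 || cols == 0)
        return 1;                       /* nothing to move or scale */

    if (lda == ldb) {
        /* Same layout before and after: every element stays where it is. */
        for (size_t j = 0; j < cols; j++)
            for (size_t i = 0; i < rows; i++)
                a[j * lda + i] *= alpha;
        return 1;
    }

    /* Layout changes: stage the scaled values in a temporary first. */
    double *tmp = malloc(rows * cols * sizeof(double));
    if (tmp == NULL)
        return -1;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            tmp[j * rows + i] = alpha * a[j * lda + i];
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            a[j * ldb + i] = tmp[j * rows + i];
    free(tmp);
    return 0;
}
```

The return value makes the chosen path visible to the caller, which is only useful for demonstration; a real kernel would not need it.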
Zhang Xianyi
7df0820160
Use C kernels for s/dgemv on x86.
2015-08-19 08:07:47 -05:00
Zhang Xianyi
f874465bb8
Use cmake to build OpenBLAS GENERIC Target on MSVC x86 64-bit.
...
Disable CBLAS and LAPACK.
2015-08-10 14:10:44 -05:00
Zhang Xianyi
898fc7552a
Merge pull request #612 from ibmsoe/ppc64le
...
ppc64le platform support (ELF ABI v2)
2015-08-04 16:58:24 -05:00
Zhang Xianyi
ab0a0a75fc
Merge branch 'develop' into cmake
2015-08-03 23:59:01 -05:00
Zhang Xianyi
1cf2b10224
Use pure C generic target on x86 and x86_64.
...
make TARGET=GENERIC
?gemm3m is unimplemented on generic target.
2015-08-03 23:55:56 -05:00
Zhang Xianyi
7ac7e147d4
Fixed cmake building bugs on Linux. Disable LAPACK by default.
2015-08-04 04:37:05 +08:00
Matthew Brandyberry
7ba4fe5afb
ppc64le platform support (ELF ABI v2)
2015-07-21 22:20:19 -05:00
Zhang Xianyi
dcd5ba4443
Merge branch 'cmake' of https://github.com/hpanderson/OpenBLAS into hpanderson_cmake
2015-07-22 04:06:39 +08:00
Werner Saar
e7c969e164
added optimized dtrmm_kernel for haswell
2015-06-13 16:16:29 +02:00
Werner Saar
9bd962f655
modified haswell parameter dgemm_unroll_n
2015-06-13 10:28:27 +02:00
Werner Saar
24f58c8bb1
added optimized cscal and zscal kernels for steamroller
2015-05-18 12:40:07 +02:00
Werner Saar
95b1faf667
added optimized cscal and zscal kernels for steamroller and piledriver
2015-05-18 10:50:57 +02:00
Werner Saar
2d9e406050
added optimized cscal kernel for sandybridge
2015-05-18 08:46:06 +02:00
Werner Saar
59083e3ce1
added optimized cscal kernel for bulldozer
2015-05-18 07:33:52 +02:00
wernsaar
685be40339
Merge pull request #571 from wernsaar/develop
...
added optimized cscal and zscal functions
2015-05-17 14:09:14 +02:00
Werner Saar
31c9e399e9
added optimized cscal kernel for haswell
2015-05-17 13:44:09 +02:00
Werner Saar
7de6bb9889
added optimized zscal kernel for bulldozer
2015-05-17 11:45:19 +02:00
Werner Saar
d63034303b
added optimized zscal kernel for haswell
2015-05-16 16:41:45 +02:00
Zhang Xianyi
51ff17d46e
Add AMD Excavator target.
2015-05-13 16:16:30 -05:00
Werner Saar
18e90ee2e3
bugfix: added static to functions
2015-05-13 13:31:26 +02:00
Werner Saar
e00cccc41e
added optimized dscal kernel for piledriver
2015-05-13 13:05:35 +02:00
Werner Saar
73f09bf64f
optimized dscal kernel for increment != 1
2015-05-13 12:14:39 +02:00
Werner Saar
02e772c7e4
added optimized dscal kernel for haswell
2015-05-12 17:19:58 +02:00
Werner Saar
7aee913991
added optimized dscal kernel for sandybridge
2015-05-12 16:27:43 +02:00
Werner Saar
e50a933037
added optimized dscal kernel for bulldozer
2015-05-12 12:28:44 +02:00
Werner Saar
133c11a156
updated dgemv_n kernel for nehalem
2015-04-30 14:38:06 +02:00
Werner Saar
30f52d53df
optimized dgemv_n kernel for haswell
2015-04-30 12:11:39 +02:00
Werner Saar
5e83d80725
optimized dger kernel for sandybridge
2015-04-28 16:58:11 +02:00
Werner Saar
b2e1797dc6
added optimized sger kernel for sandybridge
2015-04-28 15:33:38 +02:00
Werner Saar
e216f686cb
optimized saxpy and daxpy for sandybridge
2015-04-28 10:18:32 +02:00
Werner Saar
fc0e0391f3
bugfixes: replaced int with BLASLONG
2015-04-24 14:30:44 +02:00
Werner Saar
c22068c406
optimized sdot.c for increments != 1
2015-04-24 13:13:20 +02:00
Werner Saar
dee100d0e4
optimized saxpy.c for increments != 1
2015-04-24 11:52:59 +02:00
Werner Saar
0273966abb
optimized daxpy kernel for increments != 1
2015-04-24 11:39:17 +02:00
Werner Saar
3a67daa954
optimized ddot.c for increments != 1
2015-04-24 10:56:55 +02:00
Werner Saar
b4f2153dcd
added optimized ssymv kernels for sandybridge
2015-04-23 12:19:24 +02:00
Werner Saar
1c4b0eeae3
added optimized ssymv kernels for haswell
2015-04-23 10:23:13 +02:00
Werner Saar
1bec9abb9a
added optimized dsymv kernels for sandybridge
2015-04-22 12:09:43 +02:00
Werner Saar
3814bf60d3
added optimized dsymv kernels for haswell
2015-04-22 10:42:50 +02:00
Werner Saar
6d0db0151f
added optimized zaxpy-kernels
2015-04-16 11:19:37 +02:00
Zhang Xianyi
37b9033c90
Merge pull request #543 from jeromerobert/develop
...
Fix a buffer overflow with MAX_STACK_ALLOC size in dgemv_t
2015-04-15 11:18:14 -05:00
Werner Saar
13889515b3
added optimized caxpy-kernel for sandybridge
2015-04-15 16:29:25 +02:00
Werner Saar
248c9340c3
added optimized caxpy-kernel for haswell
2015-04-15 15:16:31 +02:00
Werner Saar
e9f33b4ca7
added optimized caxpy-kernel for steamroller
2015-04-15 13:49:23 +02:00
Werner Saar
f5d847122a
updated caxpy_microk_bulldozer-2.c and caxpy.c
2015-04-15 11:59:38 +02:00
Jerome Robert
a4c96eca67
Fix a buffer overflow with MAX_STACK_ALLOC size in dgemv_t
...
Refs #478 , #482 , 9798481
, fd9fd42
2015-04-15 11:46:48 +02:00
Werner Saar
baa0363ea2
add optimized ddot-kernel for piledriver
2015-04-14 15:09:13 +02:00
Werner Saar
34ba66606a
add optimized daxpy-kernel for piledriver
2015-04-14 14:23:29 +02:00
Werner Saar
f615dc7603
added optimized saxpy kernel for steamroller
2015-04-14 09:09:39 +02:00
Werner Saar
331c417637
optimized saxpy for piledriver
2015-04-14 08:34:11 +02:00
Werner Saar
d7a17ad85d
optimized sdot-kernel for piledriver
2015-04-13 13:19:21 +02:00
Werner Saar
d35f6c63c2
add optimized daxpy-kernel for steamroller
2015-04-13 12:22:43 +02:00
Werner Saar
166d76e864
added optimized sdot-kernel for steamroller
2015-04-11 08:48:18 +02:00
Werner Saar
f9f127d838
added optimized ddot kernel for steamroller
2015-04-10 16:18:03 +02:00
wernsaar
62231ab337
Merge pull request #538 from wernsaar/develop
...
Added optimized cdot- and zdot-kernels
2015-04-10 16:03:37 +02:00
Werner Saar
3119def9a7
updated cdot and zdot
2015-04-10 11:10:31 +02:00
Werner Saar
33b332372a
add optimized cdot- and zdot-kernel for sandybridge
2015-04-10 09:37:26 +02:00
Werner Saar
fd838c75bc
add optimized cdot- and zdot-kernel for haswell
2015-04-09 15:13:52 +02:00
Werner Saar
b57a60dac8
updated cdot and zdot for piledriver
2015-04-09 10:33:46 +02:00
Werner Saar
5c51163972
added optimized cdot- and zdot-kernel for steamroller
2015-04-09 09:45:23 +02:00
Werner Saar
9299d8cfd6
added optimized cdot- and zdot-kernels for bulldozer
2015-04-08 16:29:55 +02:00
Zhang Xianyi
0a3d3b945d
Refs #535 . Fix the wrong vector instruction in sgemm sandy bridge kernel.
2015-04-08 03:55:49 +08:00
Werner Saar
60c6dec6e6
updated some lines for bulldozer
2015-04-06 18:47:16 +02:00
Werner Saar
47898cca35
added optimized saxpy- and daxpy-kernel for sandybridge
2015-04-06 16:05:16 +02:00
Werner Saar
53bb924287
added optimized saxpy- and daxpy-kernel for haswell
2015-04-06 12:33:16 +02:00
Werner Saar
a901b065d3
added optimized ddot-kernel for sandybridge
2015-04-05 20:19:38 +02:00
Werner Saar
3937e2a0a0
add optimized sdot-kernel for sandybridge
2015-04-05 19:47:05 +02:00
Werner Saar
9707d608d5
removed double definition line
2015-04-05 18:35:34 +02:00
Werner Saar
701b9d7556
added optimized sdot- and ddot-kernel for HASWELL
2015-04-05 17:57:53 +02:00
Zhang Xianyi
e5b96e55a7
Fix build bug for ARM64.
2015-03-24 15:27:17 -05:00
Hank Anderson
84d90d6ed8
Fixed some compiler errors/warnings for clang.
2015-02-25 11:52:25 -06:00
Hank Anderson
518e2424a8
Fixed bad filename for cpuid.S compile.
2015-02-25 11:51:29 -06:00
Zhang Xianyi
ea7f9dacf4
Refs #509 . Fixed geadd building bug with DYNAMIC_ARCH=1.
2015-02-26 01:47:11 +08:00
Hank Anderson
0d8e227ea7
Changed strategy for setting preprocessor definitions.
...
Instead of generating separate object files for each permutation of
defines for a source file, GenerateNamedObjects now writes an entirely
new source file and inserts the defines as #define c statements.
This solves a problem I ran into with ar.exe where it was refusing to
link objects that had the same filename despite having different paths.
2015-02-24 12:26:33 -06:00
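The wrapper-file strategy described in the commit above can be sketched as follows. GenerateNamedObjects itself is a CMake function; the C helper below is a hypothetical stand-in (build_wrapper, the define names, and the file name gemv_n.c are all illustrative assumptions) that only shows the shape of the generated text: one #define line per definition, followed by an #include of the original source. Because each permutation gets its own uniquely named wrapper file, the resulting object files no longer share a filename.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of OpenBLAS: builds the text of a wrapper
 * source file that turns each define into a #define line and then pulls in
 * the original source with #include.
 * Returns the number of characters written, or -1 if out is too small. */
int build_wrapper(char *out, size_t cap, const char *source,
                  const char *const *defines, size_t ndefines)
{
    size_t used = 0;
    int n;

    assert(out != NULL && source != NULL);

    for (size_t i = 0; i < ndefines; i++) {
        n = snprintf(out + used, cap - used, "#define %s\n", defines[i]);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;                  /* output would be truncated */
        used += (size_t)n;
    }

    n = snprintf(out + used, cap - used, "#include \"%s\"\n", source);
    if (n < 0 || (size_t)n >= cap - used)
        return -1;
    used += (size_t)n;

    return (int)used;
}
```

Writing the result to a file named after the target object (for example one wrapper per precision) is left out to keep the sketch short.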
Hank Anderson
12d1fb2e40
Fixed incorrect object name in kernel CMakeLists.txt
2015-02-24 10:30:16 -06:00
Hank Anderson
1b7f427401
Added conj gemv objects for complex build.
2015-02-23 10:24:31 -06:00
Hank Anderson
b2284647a3
More complex objects.
2015-02-23 07:51:05 -06:00
Hank Anderson
a6116e5859
Added some more complex-only objects.
2015-02-22 17:49:28 -06:00
Hank Anderson
714638c187
Added some TRMM objects for complex types.
2015-02-19 16:11:51 -06:00
Hank Anderson
e27c372e53
Fixed reuse of float_char from parent loop.
...
Fixed in/it/on/otcopy names.
2015-02-19 13:53:29 -06:00
Hank Anderson
f3f2b3d768
Added complex and single netlib-lapack fortran sources to lapack.cmake.
2015-02-19 12:26:11 -06:00
Hank Anderson
9492298048
Added other float types to Makefile.L3.
2015-02-18 13:01:05 -06:00
Hank Anderson
14fd3d35de
Added checks for missing defines in kernel.
2015-02-18 10:25:01 -06:00
Hank Anderson
cebc07cebd
ParseMakefileVars now recursively parses included makefiles.
2015-02-17 22:09:41 -06:00
Hank Anderson
33c5e8db7f
Added a helper function for setting the L1 kernel defaults.
...
Added loop to build objects with different KERNEL defines.
2015-02-17 21:36:23 -06:00
Martin Koehler
39cc6b21d3
Add ATLAS-style ?geadd function
2015-02-16 13:46:20 +01:00
Hank Anderson
4662a0b13a
Changed generate functions to iterate through a list of float types.
...
This will generate obj files for SINGLE/DOUBLE/COMPLEX/DOUBLE COMPLEX.
2015-02-15 17:44:37 -06:00
Hank Anderson
162791e30e
Added common objects from kernel Makefile.
2015-02-10 12:42:05 -06:00
Hank Anderson
c0624a26be
Fixed some dgemm_copy function names.
2015-02-09 14:34:29 -06:00
Hank Anderson
4bfaf1ce66
Removed some list appends I missed.
2015-02-09 12:56:55 -06:00
Hank Anderson
e8c39138c6
Removed return value from GenerateNamedObjects.
...
It sets DBLAS_OBJS directly to save a bunch of list appending in the
CMakeLists.txt files.
2015-02-09 12:28:09 -06:00
Hank Anderson
f992799226
Added the rest of Makefile.L3.
2015-02-09 10:47:35 -06:00
Hank Anderson
4c65afcce1
Changed kernel filenames to vars. These will need to be read from KERNEL.
...
Added some kernel/L3 objects.
2015-02-09 09:52:14 -06:00
Hank Anderson
7fa5c4e2fd
Fixed some case issues with ARCH.
...
Added some kernel and driver/others objects.
2015-02-08 15:29:18 -06:00
Hank Anderson
fa0e6a6c93
Added the rest of the L1 kernel makefile.
2015-02-07 21:37:46 -06:00
Hank Anderson
38681fb1c6
Added more kernel files.
2015-02-07 12:54:30 -06:00
Hank Anderson
189fadfde0
Started implementing kernel/Makefile in cmake.
2015-02-05 21:05:11 -06:00
Zhang Xianyi
229ce2ccd1
Add cortex-a9 and cortex-a15 targets.
2015-01-12 08:55:29 +00:00
Zhang Xianyi
41aad0407f
Merge pull request #482 from jeromerobert/develop
...
Allow to do gemv and ger buffer allocation on the stack
2015-01-02 02:26:17 +08:00
Werner Saar
ddf983d643
added optimizations for steamroller
2014-12-30 20:14:45 +08:00
Werner Saar
4319769b79
added target processor STEAMROLLER
2014-12-28 20:16:46 +08:00
Jerome Robert
e9d9a8eae3
Allow to do gemv and ger buffer allocation on the stack
...
ger and gemv call blas_memory_alloc/free, which in turn
call blas_lock. blas_lock creates thread contention when matrices
are small and the number of threads is high enough. We avoid
calling blas_memory_alloc by replacing it with stack allocation.
This can be enabled with:
make -DMAX_STACK_ALLOC=2048
The given size (in bytes) must be high enough to avoid thread contention
and small enough to avoid stack overflow.
Fix #478
2014-12-27 14:33:12 +01:00
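The stack-allocation idea in the commit message above can be sketched roughly as follows. This is an illustrative sketch only, not the OpenBLAS implementation: `scaled_sum` and `fits_on_stack` are hypothetical names, `malloc`/`free` stand in for blas_memory_alloc/free, and the 2048-byte limit is simply the value quoted in the message.

```c
#include <stddef.h>
#include <stdlib.h>

/* Size limit in bytes; the value mirrors the example in the commit message. */
#define MAX_STACK_ALLOC 2048

/* Hypothetical helper: does a scratch buffer of n doubles fit in the limit? */
static int fits_on_stack(size_t n) {
    return n * sizeof(double) <= MAX_STACK_ALLOC;
}

/* Hypothetical routine that needs a scratch buffer of n doubles.
 * Small requests use a fixed-size stack array and never touch the
 * allocator; larger ones fall back to the heap (malloc stands in for
 * the lock-protected blas_memory_alloc). Returns the sum of alpha * x[i]. */
double scaled_sum(const double *x, size_t n, double alpha) {
    double stack_buf[MAX_STACK_ALLOC / sizeof(double)];
    double *buf = stack_buf;

    if (!fits_on_stack(n)) {
        buf = malloc(n * sizeof(double));
        if (buf == NULL) {
            return 0.0; /* allocation failure: nothing computed */
        }
    }

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        buf[i] = alpha * x[i];
        sum += buf[i];
    }

    if (buf != stack_buf) {
        free(buf); /* only the heap path needs releasing */
    }
    return sum;
}
```

The trade-off named in the message shows up in the constant: a larger limit keeps more small calls off the shared allocator, while a smaller one keeps each call's stack frame modest.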
Werner Saar
587e16fba3
Ref #458 : Backport, sandybridge uses nehalem zgemm kernel
2014-12-22 17:01:18 +01:00
Werner Saar
6261342de3
small optimization on dgemm_kernel for N=1
2014-12-18 20:35:51 +01:00
Werner Saar
bc5fff7085
changed inline assembler labels to short form
2014-12-07 12:38:54 +01:00
Zhang Xianyi
0cf29ba6d2
Fixed a bug of sgemm sandy bridge kernel.
...
Reported by Julia project. JuliaLang/julia#9084
2014-12-03 17:38:41 +08:00
Zhang Xianyi
2fb02626da
Update organization info.
2014-11-25 15:28:58 +08:00
Zhang Xianyi
a85c2785ae
Refs #467 . Added generic kernel file for x86_64.
2014-11-24 15:34:48 +08:00
Benedikt Huber
58c90d5937
# The first commit's message is:
...
Optimizations for APM's xgene-1 (aarch64).
1) General system updates to support armv8 better. "make all" did not work; one had to supply TARGET=ARMV8.
2) sgemm 4x4 kernel in assembler using SIMD, and configuration changes to use it.
3) strmm 4x4 kernel in C. Since the sgemm kernel does 4x4, the trmm kernel must also do 4xN.
Added Dave Nuechterlein to the contributors list.
2014-11-11 22:19:23 +08:00
wernsaar
7aae4a62e7
enabled use of GEMM3M functions
2014-09-20 14:27:10 +02:00
wernsaar
b7c9566eea
removed obsolete gemv kernel files
2014-09-14 11:00:53 +02:00
wernsaar
6df1b0be81
optimized zgemv_n_microk_sandy-4.c
2014-09-14 10:21:22 +02:00
wernsaar
2ac1e076c1
added optimized zgemv_n kernel for sandybridge
2014-09-14 09:02:05 +02:00
wernsaar
9908b6031c
bugfix in KERNEL.PILEDRIVER
2014-09-13 16:26:53 +02:00
wernsaar
8f100a14f2
optimized cgemv_t kernel for haswell
2014-09-13 16:13:27 +02:00
wernsaar
53b5726b04
added optimized cgemv_t kernel for haswell
2014-09-13 15:14:12 +02:00
wernsaar
1a352b24e6
updated KERNEL.HASWELL
2014-09-13 12:23:27 +02:00
wernsaar
5194818d4b
updated zgemv_t_4.c
2014-09-13 09:48:34 +02:00
wernsaar
8a39cdb1c1
added optimized zgemv_t kernel for haswell
2014-09-13 09:47:07 +02:00
wernsaar
0a1390f2d8
enabled optimized zgemv_t kernel for bulldozer
2014-09-12 17:43:47 +02:00
wernsaar
a8b0812feb
optimized zgemv_t for bulldozer
2014-09-12 17:42:25 +02:00
wernsaar
a0fb68ab42
added optimized zgemv_t kernel for bulldozer
2014-09-12 17:04:22 +02:00
wernsaar
44c11165d5
bugfix in cgemv_t_4.c
2014-09-12 14:12:24 +02:00
wernsaar
564be4eb72
added optimized cgemv_t kernel
2014-09-12 13:38:01 +02:00
wernsaar
107c3ea7d5
added optimized zgemv_t routine
2014-09-12 12:35:20 +02:00
wernsaar
bb8d698335
optimized zgemv_n_microk_haswell-4.c for small size
2014-09-11 13:44:55 +02:00
wernsaar
e0192a6914
bugfix in zgemv_n_4.c
2014-09-11 13:18:00 +02:00
wernsaar
bced4594bb
added optimized zgemv_n kernel
2014-09-11 12:34:57 +02:00
wernsaar
cafba99b6b
bugfix in cgemv_n_microk_haswell-4.c
2014-09-11 11:12:44 +02:00
wernsaar
ac8f232b2a
more optimizations
2014-09-11 10:25:48 +02:00
wernsaar
f98e1244c4
optimized cgemv_n_4.c
2014-09-10 19:26:14 +02:00
wernsaar
be95700b30
added optimized cgemv_kernel for haswell
2014-09-10 14:11:24 +02:00
wernsaar
4aa534ae93
added cgemv_n kernel, optimized for small sizes
2014-09-10 13:45:13 +02:00
wernsaar
baa46e4fba
added and tested optimized dgemv_n kernel for haswell
2014-09-09 16:17:45 +02:00
wernsaar
faab7a181d
added optimized dgemv_n kernel for haswell
2014-09-09 15:32:32 +02:00
wernsaar
8109d8232c
optimized dgemv_t kernel for haswell
2014-09-09 14:38:08 +02:00
wernsaar
debc6d1a05
bugfix in KERNEL.HASWELL
2014-09-09 14:04:44 +02:00
wernsaar
e73a0113ec
added optimized gemv kernels
2014-09-09 13:54:55 +02:00
wernsaar
44f2bf9bae
added optimized dgemv_t kernel for haswell
2014-09-09 13:34:22 +02:00
wernsaar
cd34e9701b
removed obsolete files
2014-09-08 19:15:31 +02:00
wernsaar
658939faaa
optimized dgemv_n kernel for small sizes
2014-09-08 15:22:35 +02:00
wernsaar
c4d9d4e5f8
added haswell optimized kernel
2014-09-08 12:25:16 +02:00
wernsaar
7c0a94ff47
bugfix in sgemv_n_microk_haswell-4.c
2014-09-08 10:54:33 +02:00
wernsaar
cbbc80aad3
added optimized sgemv_t kernel for haswell
2014-09-08 10:13:39 +02:00
wernsaar
2be5c7a640
bugfix for windows
2014-09-07 21:48:42 +02:00
wernsaar
80f7786875
enabled optimized sgemv kernels for piledriver
2014-09-07 21:13:57 +02:00
wernsaar
553e275407
optimized sgemv_n kernel for sandybridge
2014-09-07 20:53:30 +02:00
wernsaar
7b3932b3f3
optimized sgemv_n kernel for nehalem
2014-09-07 19:20:08 +02:00
wernsaar
75207b1148
optimized sgemv_n for very small size of m
2014-09-07 18:23:48 +02:00
wernsaar
274828fa50
optimizations for very small sizes
2014-09-07 13:45:03 +02:00
wernsaar
5ae1731fe6
better optimizations for sgemv_t kernel
2014-09-06 21:28:57 +02:00
wernsaar
c8eaf3ae2d
optimized sgemv_t_4 kernel for very small sizes
2014-09-06 19:41:57 +02:00
wernsaar
3a7ab47ee9
optimized sgemv_t
2014-09-06 18:34:25 +02:00
wernsaar
cf5544b417
optimization for small size
2014-09-06 13:17:56 +02:00
wernsaar
d143f84dd2
added optimized sgemv_n kernel for haswell
2014-09-06 12:08:48 +02:00
wernsaar
a64fe9bcc9
added optimized sgemv_n kernel for sandybridge
2014-09-06 08:41:53 +02:00
wernsaar
6df7a88930
optimized sgemv_t for sandybridge
2014-09-05 10:22:50 +02:00
wernsaar
53de943690
bugfix for sgemv_n_4.c
2014-09-04 18:55:52 +02:00
wernsaar
7f910010a0
optimized sgemv_n kernel for small sizes
2014-09-04 13:09:27 +02:00
wernsaar
3a5d8dbff9
optimized sgemv_n_4.c
2014-09-03 15:34:30 +02:00
wernsaar
2a60c6d4b0
optimized sgemv_n for small sizes
2014-09-03 14:48:45 +02:00
wernsaar
0fc560ba23
bugfix for buffer overflow
2014-09-03 10:13:47 +02:00
wernsaar
f3b50dcf5b
removed obsolete instructions from sgemv_t_4.c
2014-09-02 13:35:41 +02:00
wernsaar
93eaba959d
optimized sgemv_t for bulldozer
2014-09-02 12:42:36 +02:00
wernsaar
9570e56965
optimized sgemv_t_4.c for small sizes
2014-09-01 15:11:37 +02:00
wernsaar
bc99faef1b
optimized sgemv_t_4.c for uneven sizes
2014-08-31 14:33:15 +02:00
wernsaar
848c0f16f7
optimized sgemv_t_4.c for small size
2014-08-31 13:23:44 +02:00
wernsaar
53e6dbf6ca
optimized sgemv_t kernel for small sizes
2014-08-30 13:36:27 +02:00
wernsaar
20cd850125
modification for clang compiler
2014-08-27 09:00:20 +02:00
wernsaar
3885eebdb8
added optimized zaxpy bulldozer kernel
2014-08-25 15:52:35 +02:00
wernsaar
ee74445155
added optimized caxpy kernel for bulldozer
2014-08-25 14:53:28 +02:00
wernsaar
9d2ace8bac
added optimized daxpy kernel for bulldozer
2014-08-24 10:57:12 +02:00
wernsaar
b55f997302
added optimized daxpy kernel for nehalem
2014-08-23 17:53:07 +02:00
wernsaar
e45c960c2c
added optimized saxpy kernel for nehalem
2014-08-23 17:15:21 +02:00
wernsaar
ac76b6267f
added optimized dgemv_n kernel for nehalem
2014-08-23 10:40:57 +02:00
wernsaar
f1b96c4846
added optimized ddot kernel for bulldozer
2014-08-22 21:19:29 +02:00
wernsaar
16d6be852d
added optimized ddot kernel for nehalem
2014-08-22 20:34:41 +02:00
wernsaar
95a707ced3
update of KERNEL.BULLDOZER
2014-08-22 17:01:27 +02:00
wernsaar
5d97b0754c
added optimized sdot kernel for nehalem
2014-08-22 17:00:26 +02:00
wernsaar
8a9e868919
added optimized sdot for bulldozer
2014-08-22 14:29:17 +02:00
wernsaar
c8b0645266
added optimized symv_L kernels for nehalem
2014-08-21 14:27:00 +02:00
wernsaar
ec05ff3f64
added optimized ssymv_L kernel for bulldozer
2014-08-21 13:32:06 +02:00
wernsaar
f6f9122660
added optimized dsymv_L kernel for bulldozer
2014-08-21 13:02:53 +02:00
wernsaar
8247f38dc1
added optimized dsymv_U kernel for nehalem
2014-08-20 09:58:04 +02:00
wernsaar
ef6374196d
updated optimized dsymv_U kernel for bulldozer
2014-08-20 09:00:56 +02:00
wernsaar
f824c2b751
updated optimized ssymv_U for bulldozer
2014-08-19 19:25:03 +02:00
wernsaar
4ba4ab623f
added optimized ssymv_U kernel for nehalem
2014-08-19 17:09:45 +02:00
wernsaar
4f39447c05
added optimized ssymv_U kernel for bulldozer
2014-08-18 13:52:24 +02:00
wernsaar
74c9465672
added optimized dsymv_U kernel for bulldozer
2014-08-18 12:18:10 +02:00
wernsaar
101dd08173
add reference in C for symv_U
2014-08-16 13:52:50 +02:00
wernsaar
493d4fe7e5
added reference in C for symv_L
2014-08-16 11:36:48 +02:00
wernsaar
11eab4c019
added optimized cgemv_n for haswell
2014-08-14 19:00:30 +02:00
wernsaar
4568d32b6b
added optimized cgemv_t kernel for haswell
2014-08-14 14:10:29 +02:00
wernsaar
c1a6374c6f
optimized zgemv_n kernel for sandybridge
2014-08-13 16:10:03 +02:00
wernsaar
2470129132
added fast return, if m or n < 1
2014-08-13 13:54:19 +02:00
wernsaar
8c582d362d
optimized zgemv_t_microk_haswell-2.c
2014-08-13 13:42:22 +02:00
wernsaar
11e34ddd1b
bugfix for zgemv_n_microk_haswell-2.c
2014-08-13 12:54:18 +02:00
wernsaar
9528f0d9ee
bugfix in zgemv_n_microk_sandy-2.c
2014-08-13 12:18:03 +02:00
wernsaar
b06550519e
added optimized cgemv_t c-kernel
2014-08-12 12:15:41 +02:00
wernsaar
6093ee5363
bugfix in zgemv_n_microk_haswell-2.c
2014-08-12 10:02:25 +02:00
wernsaar
07c66b1960
modified algorithm for better numerical stability
2014-08-12 08:35:42 +02:00
wernsaar
58b075daef
added optimized zgemv_t kernel for haswell
2014-08-11 16:57:52 +02:00
wernsaar
09fcd3a341
add optimized zgemv_t kernel for bulldozer
2014-08-11 14:19:25 +02:00
wernsaar
726ad085cb
added optimized zgemv_t for haswell
2014-08-11 13:10:12 +02:00
wernsaar
6fe416976d
added optimized zgemv_t c-kernel
2014-08-11 09:13:18 +02:00
wernsaar
dbc2eff029
disabled optimized haswell zgemv_n kernel for windows ( bad rounding )
2014-08-10 11:57:24 +02:00
wernsaar
462b4885ff
added optimized zgemv_n kernel for haswell
2014-08-10 08:39:17 +02:00
wernsaar
aa54fe064c
added zgemv_n c-function
2014-08-07 22:30:20 +02:00
wernsaar
006ef3ea01
added optimized dgemv_t kernel for haswell
2014-08-07 10:08:54 +02:00
wernsaar
60f17628cc
added optimized dgemv_n kernel for haswell
2014-08-07 09:18:02 +02:00
wernsaar
c9bad1403a
added optimized sgemv_t kernel for sandybridge
2014-08-07 07:49:33 +02:00
wernsaar
2f8927376f
enabled optimized nehalem sgemv_t kernel for windows
2014-08-06 16:58:21 +02:00
wernsaar
d945a2b06d
added optimized sgemv_t kernel for nehalem
2014-08-06 16:21:48 +02:00
wernsaar
ca6c8d06ce
enabled optimized sgemv kernels for windows
2014-08-06 14:24:36 +02:00
wernsaar
7aa43c8928
enabled optimized sgemv kernels for windows
2014-08-06 14:06:30 +02:00
wernsaar
891b960854
added optimized sgemv_t kernel for haswell
2014-08-06 13:42:41 +02:00