Commit Graph

817 Commits

Author SHA1 Message Date
Andrew Pinski 8fdb0655e9 THUNDERX: Add an optimized version of ddot 2017-01-10 15:01:37 +05:30
Andrew Pinski fb200c7245 ARM64: Add Cavium THUNDERX Target 2017-01-10 15:01:37 +05:30
Ashwin Sekhar T K 0b8e876d89 VULCAN: Add optimized DGEMM implementation 2017-01-10 15:01:37 +05:30
Ashwin Sekhar T K 4713e7c47f ARM64: Add the VULCAN Target 2017-01-10 15:01:17 +05:30
Ashwin Sekhar T K 6085386b10 CORTEXA57: Add assembly kernels for copy routines 2017-01-10 15:01:05 +05:30
kaustubh 1480f3df71 Add msa optimization for AXPY, COPY, SCALE, SWAP
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2017-01-09 18:27:23 +05:30
kaustubh 88afb3bc94 Add msa optimization for AXPY, COPY, SCALE, SWAP
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2017-01-09 18:22:09 +05:30
Zhang Xianyi b678471d65 Merge branch 'z13' into develop
Conflicts:
	CONTRIBUTORS.md
2017-01-09 05:52:42 -05:00
Zhang Xianyi 864e202afd Add USE_TRMM=1 for IBM z13 in kernel/Makefile.L3 2017-01-09 05:48:09 -05:00
Abdurrauf 6418667818 dtrmm and dgemm for z13 2017-01-04 19:32:33 +04:00
Shivraj Patil a9bf8a781a Added prefetch to CGEMV and ZGEMV.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-12-27 11:33:51 +05:30
kaustubh 5f93aa5f87 Updated data prefetch in TRSM, ASUM, DOT functions
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-12-14 14:05:11 +05:30
kaustubh 9db451acd0 Updated data prefetch in TRSM, ASUM, DOT functions
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-12-13 14:02:14 +05:30
kaustubh 3eaff85191 Updated data prefetch in TRSM, ASUM, DOT functions
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-12-13 11:41:17 +05:30
kaustubh 00abce3b93 Add data prefetch in DOT and ASUM functions
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-11-22 11:21:03 +05:30
Andrew becf8bc7a0 remove dead code 2016-10-31 12:46:56 +01:00
kaustubh f3419e634c SGEMM, DGEMM, CGEMM, ZGEMM functions data prefetch
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-10-17 18:29:38 +05:30
Zhang Xianyi 7472c79ea6 Merge pull request #984 from ksraste/develop
STRSM, DTRSM functions data prefetch
2016-10-17 11:33:16 +08:00
kaustubh 90e2321ac3 STRSM, DTRSM functions data prefetch
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-10-14 16:41:28 +05:30
Martin Kroeker 4998e19869 Change file comments to work around clang 3.9 assembler bug 2016-10-13 16:51:08 +02:00
Martin Kroeker 91610f3835 Update zdot_msa.c 2016-10-05 18:59:09 +02:00
Martin Kroeker 6e22ecf102 Update zdot.c 2016-10-05 18:58:03 +02:00
Martin Kroeker 6221d6df5f Update zdot.c 2016-10-05 18:57:14 +02:00
Martin Kroeker 16446d1d23 Remove explicit include of complex.h 2016-09-29 23:45:56 +02:00
Martin Kroeker a6e9e0b94b Remove explicit include of complex.h 2016-09-29 23:43:28 +02:00
Martin Kroeker 3178e4fea0 Remove explicit include of complex.h 2016-09-29 23:41:43 +02:00
Martin Kroeker 95c245ddb0 Remove explicit include of complex.h 2016-09-29 23:40:36 +02:00
Martin Kroeker 4b1b27347f Remove explicit include of complex.h 2016-09-29 23:39:35 +02:00
Shivraj Patil 54747fe24a DGEMM function split and data prefetch
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-09-22 17:25:46 +05:30
Zhang Xianyi 515bc56ea9 Refs #946. Use nrm2 reference implementation for Power8. 2016-08-18 18:59:43 -07:00
Zhang Xianyi ae70b916f4 Refs #929. Deal with zero and NaNs for scale. 2016-08-18 10:24:42 -07:00
Shivraj Patil 9687437928 MIPS n32 ABI and build time mips simd support check
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-08-10 17:44:22 +05:30
Shivraj Patil d1c6469283 MIPS n32 ABI support, MSA support detection and rename ARCH, ARCHFLAGS
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-08-08 11:58:01 +05:30
Ashwin Sekhar T K c54a29bb48 Cortex A57: Improvements to DGEMM 8x4 kernel 2016-07-26 10:58:21 +05:30
Shivraj Patil beb1d076a4 Added MSA optimization for GEMV_N, GEMV_T, ASUM, DOT functions
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-07-15 18:38:25 +05:30
Zhang Xianyi 8a592ee386 Merge pull request #924 from ashwinyes/develop_aarch64_improvements_20160714
Improvements to Aarch64 kernels
2016-07-14 15:47:55 -04:00
Ashwin Sekhar T K 0a5ff9f9f9 Improvements to TRMM and GEMM kernels 2016-07-14 13:56:04 +05:30
Ashwin Sekhar T K 8a40f1355e Improvements to GEMV kernels 2016-07-14 13:50:38 +05:30
Ashwin Sekhar T K 78782485b6 Improvements to COPY and IAMAX kernels 2016-07-14 13:49:34 +05:30
Shivraj Patil 57df7956ee Added CGEMM, ZGEMM, STRMM, DTRMM, CTRMM, ZTRMM. Updated macros in SGEMM, DGEMM, STRMM.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-06-28 17:51:10 +05:30
Zhang Xianyi 4a30a2584a Merge pull request #897 from ksraste/develop
STRSM optimized for MSA
2016-06-27 10:04:18 -04:00
mdong 098d8ec5d6 remove input from clobbered list 2016-06-24 16:37:58 -04:00
Werner Saar f04af36ad0 Merge pull request #898 from wernsaar/develop
added experimental support for optimized lapack fortran functions
2016-05-31 14:13:52 +02:00
Kaustubh Raste 011431b9d7 STRSM optimized for MSA
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
2016-05-31 10:17:23 +05:30
Kaustubh Raste c8a7860eb3 STRSM optimized
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
2016-05-30 21:17:00 +05:30
Zhang Xianyi 2daad2bcb5 Merge pull request #893 from biddisco/develop
Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PRO…
2016-05-30 14:52:58 +08:00
John Biddiscombe 053044ae4d Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PROJECT_BINARY_DIR
If OpenBLAS is built using add_subdirectory(OpenBlas) as part of another project
then the paths set by CMAKE_XXX_DIR are relative to the parent project
and not the OpenBLAS project.
2016-05-25 09:13:28 +02:00
Aleksey Kuleshov fca66262c4 mips64/axpy: fix error when INCY == 0 2016-05-23 13:30:27 +03:00
Werner Saar 412bcd187a optimized dtrsm_logic_LT_16x4_power8.S and dtrsm_macros_LT_16x4_power8.S 2016-05-23 11:20:41 +02:00
Werner Saar bd06b246cc Merge pull request #890 from wernsaar/develop
optimized dtrsm_kernel_LT for POWER8
2016-05-22 16:01:35 +02:00
Werner Saar 8b140220c8 optimized dtrsm_kernel_LT for POWER8 2016-05-22 15:20:04 +02:00
Werner Saar 8fb5a1aaff added optimized dtrsm_LT kernel for POWER8 2016-05-22 13:09:05 +02:00
Kaustubh Raste ad9f317870 STRSM optimization for MIPS P5600 and I6400 using MSA
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
2016-05-20 10:59:03 +05:30
Shivraj Patil c4ba40e308 SGEMM optimization for MIPS P5600 and I6400 using MSA. Unrolled k loop in DGEMM kernel function
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-05-19 11:04:42 +05:30
Zhang Xianyi 7a19065369 Merge pull request #878 from ksraste/develop
DTRSM bug fix for MIPS P5600 and I6400
2016-05-19 11:16:43 +08:00
Werner Saar 6a2bde7a2d optimized dgemm and dgetrf for POWER8 2016-05-17 14:45:27 +02:00
Kaustubh Raste d7cbc7ac13 DTRSM bug fix for MIPS P5600 and I6400
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
2016-05-17 15:48:02 +05:30
Werner Saar 88011f625d Merge pull request #876 from wernsaar/develop
optimized dgemm on power8 for 20 threads
2016-05-16 14:52:40 +02:00
Werner Saar 8310d4d3f7 optimized dgemm for 20 threads 2016-05-16 14:14:25 +02:00
Kaustubh Raste edb5980c13 DTRSM optimization for MIPS P5600 and I6400 using MSA
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
2016-05-09 15:15:26 +05:30
Shivraj Patil 085cf236c2 conflict resolved by syncing with 'xianyi:develop'
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-05-04 11:07:14 +05:30
Shivraj Patil b7b3d8ec8e DGEMM optimization for MIPS P5600 and I6400 using MSA
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-05-03 14:42:26 +05:30
Zhang Xianyi cd7af5260a Merge pull request #847 from sva-img/develop
MIPS P5600(32 bit) and I6400(64 bit) cores support added.
2016-04-29 11:44:36 -04:00
Werner Saar 56948dbf0f optimized dgemm for POWER8 2016-04-29 12:52:47 +02:00
Werner Saar 0d0c6f7d7d optimized dgemm for POWER8 2016-04-27 14:01:08 +02:00
Werner Saar 298b13bba4 updated some kernel files for EXCAVATOR 2016-04-25 10:36:23 +02:00
Werner Saar 78b05f6476 bugfix for EXCAVATOR and DYNAMIC_ARCH 2016-04-25 10:13:30 +02:00
Werner Saar a3da10662f added sgemm_tcopy_8_power8.S 2016-04-23 10:04:41 +02:00
Werner Saar d46f07bb4e added cgemm_tcopy_8_power8.S 2016-04-23 07:37:18 +02:00
Werner Saar 879a51165f Optimized zgemm and tested zgemm again 2016-04-22 13:07:12 +02:00
Shivraj Patil 2c3dfe2bf3 MIPS P5600(32 bit) and I6400(64 bit) cores support added.
Separated mips and mips64 files.
Configurations support for mips 32 bit.

Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-04-22 14:03:18 +05:30
Werner Saar 9276c9012f Optimized sgemm and dgemm and tested again. 2016-04-21 11:37:57 +02:00
wernsaar 6fbca2a4a1 Merge pull request #845 from wernsaar/develop
optimized sgemm for power8
2016-04-20 13:44:22 +02:00
Werner Saar 0001260f4b optimized sgemm 2016-04-20 13:06:38 +02:00
Werner Saar 3c6294ca3d added optimized sgemm_tcopy for power8 2016-04-19 16:08:54 +02:00
Zhang Xianyi dd43661cfd Init IBM z system (s390x) porting. 2016-04-15 18:02:24 -04:00
Zhang Xianyi f24d5307cf Refs #834. Fix zgemv config bug on Steamroller. 2016-04-12 22:26:11 +08:00
Werner Saar 8037d78eed bugfix for arm scal.c and zscal.c 2016-04-11 11:21:36 +02:00
wernsaar 0a4276bc2f Merge pull request #837 from wernsaar/develop
updated zgemm- and ztrmm-kernel for POWER8
2016-04-08 11:13:27 +02:00
Werner Saar e173c51c04 updated zgemm- and ztrmm-kernel for POWER8 2016-04-08 09:05:37 +02:00
Werner Saar 9c42f0374a Updated cgemm- and sgemm-kernel for POWER8 SMP 2016-04-07 15:08:15 +02:00
Zhang Xianyi d4380c1fe4 Refs xianyi/OpenBLAS-CI#10 , Fix sdot for scipy test_iterative.test_convergence test failure on AMD bulldozer and piledriver. 2016-04-07 01:44:18 +08:00
Werner Saar a51102e9b7 bugfixes for sgemm- and cgemm-kernel 2016-04-06 11:15:21 +02:00
Werner Saar c5b1fbcb2e updated optimized cgemm- and ctrmm-kernel for POWER8 2016-04-04 09:12:08 +02:00
Werner Saar d4c0330967 updated cgemm- and ctrmm-kernel for POWER8 2016-04-03 14:30:49 +02:00
Werner Saar 6a9bbfc227 updated sgemm- and strmm-kernel for POWER8 2016-04-02 17:16:36 +02:00
Werner Saar 68a69c5b50 added optimized dgemv_n kernel for POWER8 2016-03-30 11:10:53 +02:00
Werner Saar c2464a7c4a added optimized casum kernel for POWER8 2016-03-28 14:12:08 +02:00
Werner Saar 294f933869 added optimized zasum kernel for POWER8 2016-03-28 13:37:32 +02:00
Werner Saar f59c9bd6ef added optimized sasum kernel for POWER8 2016-03-28 12:44:25 +02:00
Werner Saar c53be46d78 added optimized dasum kernel for POWER8 2016-03-28 12:17:15 +02:00
Werner Saar 659ed16591 added optimized cswap and zswap kernels for POWER8 2016-03-27 18:31:37 +02:00
Werner Saar 35c98a3556 added optimized zscal kernel for POWER8 2016-03-27 16:31:50 +02:00
Werner Saar f1a5dd06c5 added optimized sscal kernel for POWER8 2016-03-27 11:05:56 +02:00
wernsaar e125a3dc33 Merge pull request #824 from wernsaar/develop
added optimized drot-kernel and srot-kernel for POWER8
2016-03-27 10:43:17 +02:00
Werner Saar 35f1f21a7f added drot- and srot-kernel optimized for POWER8 2016-03-27 08:57:11 +02:00
Zhang Xianyi 7b4b7179ba Merge pull request #819 from ashwinyes/develop_20160324_fixes_optimizations
Cortex-A57: Fixes and Optimizations
2016-03-27 00:04:20 -04:00
Werner Saar 3d9a50e841 added optimized sswap kernel for POWER8 2016-03-25 17:34:55 +01:00
Werner Saar 828c849b44 added optimized ccopy kernel for POWER8 2016-03-25 16:54:25 +01:00
Werner Saar ecc0bc9813 added optimized scopy kernel for POWER8 2016-03-25 16:06:56 +01:00