| \documentclass[11pt]{report}
 | |
| 
 | |
| \usepackage{indentfirst}
 | |
| \usepackage[body={6in,8.5in}]{geometry}
 | |
| \usepackage{hyperref}
 | |
| \usepackage{graphicx}
 | |
| \DeclareGraphicsRule{.ps}{eps}{}{}
 | |
| 
 | |
| \renewcommand{\thesection}{\arabic{section}}
 | |
| \setcounter{tocdepth}{3}
 | |
| \setcounter{secnumdepth}{3}
 | |
| 
 | |
| \begin{document}
 | |
| \begin{center}
 | |
|   {\Large LAPACK Working Note 81\\
 | |
|   Quick Installation Guide for LAPACK on Unix Systems\footnote{This work was
 | |
|  supported by NSF Grant No. ASC-8715728  and NSF Grant No. 0444486}}
 | |
| \end{center}
 | |
| \begin{center}
 | |
| %  Edward Anderson\footnote{Current address:  Cray Research Inc.,
 | |
| %                           655F Lone Oak Drive, Eagan, MN  55121},
 | |
|   The LAPACK Authors\\
 | |
|   Department of Computer Science \\
 | |
|   University of Tennessee \\
 | |
|   Knoxville, Tennessee  37996-1301 \\
 | |
| \end{center}
 | |
| \begin{center}
 | |
|   REVISED:  VERSION 3.1.1, February 2007 \\
 | |
|   REVISED:  VERSION 3.2.0, November 2008
 | |
| \end{center}
 | |
| 
 | |
| \begin{center}
 | |
| Abstract
 | |
| \end{center}
 | |
| This working note describes how to install and test version 3.2.0
 | |
| of LAPACK, a linear algebra package for high-performance
 | |
| computers, on a Unix System.  The timing routines are not actually included in
 | |
| release 3.2.0, and that part of the LAWN refers to release 3.0.  Also,
 | |
| version 3.2.0 contains many prototype routines needing user feedback.
 | |
| Non-Unix installation instructions and
 | |
| further details of the testing and timing suites are only contained in
 | |
| LAPACK Working Note 41, and not in this abbreviated version.
 | |
| %Separate instructions are provided for the Unix and non-Unix
 | |
| %versions of the test package.
 | |
| %Further details are also given on the design of the test and timing
 | |
| %programs. 
 | |
| \newpage
 | |
| 
 | |
| \tableofcontents
 | |
| 
 | |
| \newpage
 | |
| % Introduction to Implementation Guide
 | |
| 
 | |
| \section{Introduction}
 | |
| 
 | |
| LAPACK is a linear algebra library for high-performance
 | |
| computers.
 | |
| The library includes Fortran subroutines for 
 | |
| the analysis and solution of systems of simultaneous linear algebraic
 | |
| equations, linear least-squares problems, and matrix eigenvalue
 | |
| problems.
 | |
| Our approach to achieving high efficiency is based on the use of
 | |
| a standard set of Basic Linear Algebra Subprograms (the BLAS),
 | |
| which can be optimized for each computing environment.
 | |
| By confining most of the computational work to the BLAS,
 | |
| the subroutines should be 
 | |
| transportable and efficient across a wide range of computers.
 | |
| 
 | |
| This working note describes how to install, test, and time this
 | |
| release of LAPACK on a Unix System.
 | |
| 
 | |
| The instructions for installing, testing, and timing 
 | |
| \footnote{Timing routines are provided only in LAPACK 3.0 and earlier.} 
 | |
| are designed for a person whose
 | |
| responsibility is the maintenance of a mathematical software library.
 | |
| We assume the installer has experience in compiling and running 
 | |
| Fortran programs and in creating object libraries.
 | |
| The installation process involves untarring the file, creating a set of
 | |
| libraries, and compiling and running the test and timing programs 
 | |
| \footnotemark[\value{footnote}]. 
 | |
| 
 | |
| %This guide combines the instructions for the Unix and non-Unix
 | |
| %versions of the LAPACK test package (the non-Unix version is in Appendix
 | |
| %~\ref{appendixe}).
 | |
| %At this time, the non-Unix version of LAPACK can only be obtained
 | |
| %after first untarring the Unix tar tape and then following the instructions in
 | |
| %Appendix ~\ref{appendixe}.
 | |
| 
 | |
| Section~\ref{fileformat} describes how the files are organized in the
 | |
| file, and
 | |
| Section~\ref{overview} gives a general overview of the parts of the test package.
 | |
| Step-by-step instructions appear in Section~\ref{installation}.
 | |
| %for the Unix version and in the appendix for the non-Unix version.
 | |
| 
 | |
| Users desiring additional information should refer to LAPACK
 | |
| Working Note 41.
 | |
| % Sections~\ref{moretesting}
 | |
| %and ~\ref{moretiming} give
 | |
| %details of the test and timing programs and their input files.
 | |
| %Appendices ~\ref{appendixa} and ~\ref{appendixb} briefly describe 
 | |
| %the LAPACK routines and auxiliary routines provided
 | |
| %in this release.  
 | |
| %Appendix ~\ref{appendixc} lists the operation counts we have computed 
 | |
| %for the BLAS and for some of the LAPACK routines.
 | |
| Appendix~\ref{appendixd}, entitled ``Caveats'', is a compendium of the known 
 | |
| problems from our own experiences, with suggestions on how to 
 | |
| overcome them.
 | |
| 
 | |
| \textbf{It is strongly advised that the user read Appendix~\ref{appendixd}
 | |
| before proceeding with the installation process.}
 | |
| %Appendix E contains the execution times of the different test
 | |
| %and timing runs on two sample machines.
 | |
| %Appendix ~\ref{appendixe} contains the instructions to install LAPACK on a non-Unix
 | |
| %system.
 | |
| 
 | |
| \section{Revisions Since the First Public Release}
 | |
| 
 | |
| Since its first public release in February, 1992, LAPACK has had
 | |
| several updates, which have encompassed the introduction of new routines
 | |
| as well as extending the functionality of existing routines.  The first
 | |
| update,
 | |
| June 30, 1992, was version 1.0a; the second update, October 31, 1992,
 | |
| was version 1.0b; the third update, March 31, 1993, was version 1.1;
 | |
| version 2.0 on September 30, 1994, coincided with the release of the
 | |
| Second Edition of the LAPACK Users' Guide; 
 | |
| version 3.0 on June 30, 1999 coincided with the release of the Third Edition of
 | |
| the LAPACK Users' Guide; 
 | |
| version 3.1 was released in November 2006;
 | |
| version 3.1.1 was released in February 2007;
 | |
| and version 3.2.0 was released in November 2008.
 | |
| 
 | |
| All LAPACK routines reflect the current version number with the date
 | |
| on the routine indicating when it was last modified.
 | |
| For more information on revisions in the latest release, please refer
 | |
| to the \texttt{revisions.info} file in the lapack directory on netlib.
 | |
| \begin{quote}
 | |
| \url{http://www.netlib.org/lapack/revisions.info}
 | |
| \end{quote}
 | |
| 
 | |
| %The distribution \texttt{tar} file \texttt{lapack.tar.z} that is
 | |
| %available on netlib is always the most up-to-date.
 | |
| %
 | |
| %On-line manpages (troff files) for LAPACK driver and computational
 | |
| %routines, as well as most of the BLAS routines, are available via 
 | |
| %the \texttt{lapack} index on netlib.
 | |
| 
 | |
| \section{File Format}\label{fileformat}
 | |
| 
 | |
| The software for LAPACK is distributed in the form of a
 | |
| gzipped tar file (via anonymous ftp or the World Wide Web),
 | |
| which contains the Fortran source for LAPACK,
 | |
| the Basic Linear Algebra Subprograms
 | |
| (the Level 1, 2, and 3 BLAS) needed by LAPACK, the testing programs,
 | |
| and the timing programs\footnotemark[\value{footnote}]. 
 | |
| Users who wish to have a non-Unix installation should refer to LAPACK
 | |
| Working Note 41,
 | |
| although the overview in Section~\ref{overview} applies to both the Unix and non-Unix
 | |
| versions.
 | |
| %Users who wish to have a non-Unix installation should go to Appendix ~\ref{appendixe},
 | |
| %although the overview in section ~\ref{overview} applies to both the Unix and non-Unix
 | |
| %versions.
 | |
| 
 | |
| The package may be accessed via the World Wide Web through
 | |
| the URL address:
 | |
| \begin{quote}
 | |
| \url{http://www.netlib.org/lapack/lapack.tgz}
 | |
| \end{quote}
 | |
| 
 | |
| Or, you can retrieve the file via anonymous ftp at netlib:
 | |
| 
 | |
| \begin{verbatim}
 | |
|      ftp ftp.netlib.org
 | |
|      login:  anonymous
 | |
|      password:  <your email address>
 | |
|      cd lapack
 | |
|      binary
 | |
|      get lapack.tgz
 | |
|      quit
 | |
| \end{verbatim}
 | |
| 
 | |
| The software in the \texttt{tar} file
 | |
| is organized in a number of essential directories as shown
 | |
| in Figure 1.  Please note that this figure does not reflect everything
 | |
| that is contained in the \texttt{LAPACK} directory.  Input and instructional
 | |
| files are also located at various levels.
 | |
| \begin{figure}
 | |
| \vspace{11pt}
 | |
| \centerline{\includegraphics[width=6.5in,height=3in]{org2.ps}}
 | |
| \caption{Unix organization of LAPACK 3.0}
 | |
| \vspace{11pt}
 | |
| \end{figure}
 | |
| Libraries are created in the LAPACK directory and 
 | |
| executable files are created in one of the directories BLAS, TESTING,
 | |
| or TIMING\footnotemark[\value{footnote}].  Input files for the test and 
 | |
| timing\footnotemark[\value{footnote}]  programs are also
 | |
| found in these three directories so that testing may be carried out
 | |
| in the directories LAPACK/BLAS, LAPACK/TESTING, and LAPACK/TIMING \footnotemark[\value{footnote}].
 | |
| A top-level makefile in the LAPACK directory is provided to perform the 
 | |
| entire installation procedure.
 | |
| 
 | |
| \section{Overview of Tape Contents}\label{overview}
 | |
| 
 | |
| Most routines in LAPACK occur in four versions: REAL,
 | |
| DOUBLE PRECISION, COMPLEX, and COMPLEX*16.
 | |
| The first three versions (REAL, DOUBLE PRECISION, and COMPLEX)
 | |
| are written in standard Fortran and are completely portable;
 | |
| the COMPLEX*16 version is provided for
 | |
| those compilers which allow this data type.
 | |
| Some routines use features of Fortran 90.
 | |
| For convenience, we often refer to routines by their single precision
 | |
| names; the leading `S' can be replaced by a `D' for double precision,
 | |
| a `C' for complex, or a `Z' for complex*16.  
 | |
| For LAPACK use and testing you must decide which version(s)
 | |
| of the package you intend to install at your site (for example,
 | |
| REAL and COMPLEX on a Cray computer or DOUBLE PRECISION and
 | |
| COMPLEX*16 on an IBM computer).
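
As an illustration of this naming convention, the following Python sketch
derives the four precision variants from a single precision routine name.
The helper \texttt{precision\_variants} is hypothetical and is not part of
LAPACK; SGESV is used only as a familiar example name.

```python
# Hypothetical helper: maps a single precision LAPACK routine name to the
# names of its four precision variants by swapping the leading letter.
PREFIXES = {
    "REAL": "S",
    "DOUBLE PRECISION": "D",
    "COMPLEX": "C",
    "COMPLEX*16": "Z",
}


def precision_variants(single_name):
    """Return a dict of data type -> routine name for a name starting with 'S'."""
    name = single_name.upper()
    if not name.startswith("S"):
        raise ValueError("expected a single precision name starting with 'S'")
    return {dtype: prefix + name[1:] for dtype, prefix in PREFIXES.items()}


if __name__ == "__main__":
    for dtype, routine in precision_variants("SGESV").items():
        print(f"{dtype:17s} {routine}")
```

This covers only the common case.  Some routines do not follow the simple
substitution; for instance, the complex counterparts of the real symmetric
eigenvalue routines are Hermitian routines with different names.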
 | |
| 
 | |
| \subsection{LAPACK Routines}
 | |
| 
 | |
| There are three classes of LAPACK routines:
 | |
| \begin{itemize}
 | |
| 
 | |
| \item \textbf{driver} routines solve a complete problem, such as solving
 | |
| a system of linear equations or computing the eigenvalues of a real
 | |
| symmetric matrix.  Users are encouraged to use a driver routine if there
 | |
| is one that meets their requirements.  The driver routines are listed
 | |
| in LAPACK Working Note 41~\cite{WN41} and the LAPACK Users' Guide~\cite{LUG}.
 | |
| %in Appendix ~\ref{appendixa}.
 | |
| 
 | |
| \item \textbf{computational} routines, also called simply LAPACK routines,
 | |
| perform a distinct computational task, such as computing
 | |
| the $LU$ decomposition of an $m$-by-$n$ matrix or finding the
 | |
| eigenvalues and eigenvectors of a symmetric tridiagonal matrix using
 | |
| the $QR$ algorithm. 
 | |
| The LAPACK routines are listed in LAPACK Working Note 41~\cite{WN41}
 | |
| and the LAPACK Users' Guide~\cite{LUG}.
 | |
| %The LAPACK routines are listed in Appendix ~\ref{appendixa}; see also LAPACK
 | |
| %Working Note \#5 \cite{WN5}.
 | |
| 
 | |
| \item \textbf{auxiliary} routines are all the other subroutines called
 | |
| by the driver routines and computational routines.  
 | |
| %Among them are subroutines to perform subtasks of block algorithms,
 | |
| %in particular, the unblocked versions of the block algorithms;
 | |
| %extensions to the BLAS, such as matrix-vector operations involving
 | |
| %complex symmetric matrices;
 | |
| %the special routines LSAME and XERBLA which first appeared with the
 | |
| %BLAS;
 | |
| %and a number of routines to perform common low-level computations,
 | |
| %such as computing a matrix norm, generating an elementary Householder
 | |
| %transformation, and applying a sequence of plane rotations.
 | |
| %Many of the auxiliary routines may be of use to numerical analysts
 | |
| %or software developers, so we have documented the Fortran source for
 | |
| %these routines with the same level of detail used for the LAPACK
 | |
| %routines and driver routines.
 | |
| The auxiliary routines are listed in LAPACK Working Note 41~\cite{WN41}
 | |
| and the LAPACK Users' Guide~\cite{LUG}.
 | |
| %The auxiliary routines are listed in Appendix ~\ref{appendixb}.
 | |
| \end{itemize}
 | |
| 
 | |
| \subsection{Level 1, 2, and 3 BLAS}
 | |
| 
 | |
| The BLAS are a set of Basic Linear Algebra Subprograms that perform
 | |
| vector-vector, matrix-vector, and matrix-matrix operations. 
 | |
| LAPACK is designed around the Level 1, 2, and 3 BLAS, and nearly all
 | |
| of the parallelism in the LAPACK routines is contained in the BLAS.
 | |
| Therefore,
 | |
| the key to getting good performance from LAPACK lies in having an
 | |
| efficient version of the BLAS optimized for your particular machine. 
 | |
| Optimized BLAS libraries are available for a variety of architectures;
 | |
| refer to the BLAS FAQ on netlib for further information.
 | |
| \begin{quote}
 | |
| \url{http://www.netlib.org/blas/faq.html}
 | |
| \end{quote}
 | |
| There are also freely available BLAS generators that automatically
 | |
| tune a subset of the BLAS for a given architecture, for example ATLAS:
 | |
| \begin{quote}
 | |
| \url{http://www.netlib.org/atlas/}
 | |
| \end{quote}
 | |
| And, if all else fails, there is the Fortran~77 reference implementation
 | |
| of the Level 1, 2, and 3 BLAS available on netlib (also included in
 | |
| the LAPACK distribution tar file).
 | |
| \begin{quote}
 | |
| \url{http://www.netlib.org/blas/blas.tgz}
 | |
| \end{quote}
 | |
| No matter which BLAS library is used, the BLAS test programs should
 | |
| always be run.
 | |
| 
 | |
| Users should not expect too much from the Fortran~77 reference implementation
 | |
| BLAS; these versions were written to define the basic operations and do not
 | |
| employ the standard tricks for optimizing Fortran code.
 | |
| 
 | |
| The formal definitions of the Level 1, 2, and 3 BLAS
 | |
| are in \cite{BLAS1}, \cite{BLAS2}, and \cite{BLAS3}. 
 | |
| The BLAS Quick Reference card is available on netlib.
 | |
| 
 | |
| \subsection{Mixed- and Extended-Precision BLAS: XBLAS}
 | |
| 
 | |
| The XBLAS extend the BLAS to work with mixed input and output
 | |
| precisions as well as using extra precision internally.  The XBLAS are
 | |
| used in the prototype extra-precise iterative refinement codes.
 | |
| 
 | |
| The current release of the XBLAS is available through
 | |
| Netlib\footnote{Development versions may be available through
 | |
|   \url{http://www.cs.berkeley.edu/~yozo/} or
 | |
|   \url{http://www.nersc.gov/~xiaoye/XBLAS/}.}  at
 | |
| \begin{quote}
 | |
|   \url{http://www.netlib.org/xblas}
 | |
| \end{quote}
 | |
| Their formal definition is in \cite{XBLAS}.
 | |
| 
 | |
| \subsection{LAPACK Test Routines}
 | |
| 
 | |
| This release contains two distinct test programs for LAPACK routines
 | |
| in each data type.  One test program tests the routines for solving
 | |
| linear equations and linear least squares problems,
 | |
| and the other tests routines for the matrix eigenvalue problem.
 | |
| The routines for generating test matrices are used by both test
 | |
| programs and are compiled into a library for use by both test programs.
 | |
| 
 | |
| \subsection{LAPACK Timing Routines (for LAPACK 3.0 and before) }
 | |
| 
 | |
| This release also contains two distinct timing programs for the
 | |
| LAPACK routines in each data type. 
 | |
| The linear equation timing program gathers performance data in
 | |
| megaflops on the factor, solve, and inverse routines for solving
 | |
| linear systems, the routines to generate or apply an orthogonal matrix
 | |
| given as a sequence of elementary transformations, and the reductions
 | |
| to bidiagonal, tridiagonal, or Hessenberg form for eigenvalue
 | |
| computations.
 | |
| The operation counts used in computing the megaflop rates are computed
 | |
| from a formula;
 | |
| see LAPACK Working Note 41~\cite{WN41}.
 | |
| % see Appendix ~\ref{appendixc}.
 | |
| The eigenvalue timing program is used with the eigensystem routines
 | |
| and returns the execution time, number of floating point operations, and
 | |
| megaflop rate for each of the requested subroutines.
 | |
| In this program, the number of operations is computed while the
 | |
| code is executing using special instrumented versions of the LAPACK
 | |
| subroutines.
 | |
| 
 | |
| \section{Installing LAPACK on a Unix System}\label{installation}
 | |
| 
 | |
| Installing, testing, and timing\footnotemark[\value{footnote}] the Unix version of LAPACK
 | |
| involves the following steps: 
 | |
| \begin{enumerate}
 | |
| \item Gunzip and untar the file.
 | |
| 
 | |
| \item Copy the file \texttt{LAPACK/make.inc.example} to \texttt{LAPACK/make.inc} and edit it.
 | |
|  
 | |
| \item Edit the file \texttt{LAPACK/Makefile} and type \texttt{make}.
 | |
| 
 | |
| %\item Test and Install the Machine-Dependent Routines \\
 | |
| %\emph{(WARNING:  You may need to supply a correct version of second.f and
 | |
| %dsecnd.f for your machine)}
 | |
| %{\tt
 | |
| %\begin{list}{}{}
 | |
| %\item cd LAPACK
 | |
| %\item make install
 | |
| %\end{list} }
 | |
| %
 | |
| %\item Create the BLAS Library, \emph{if necessary} \\
 | |
| %\emph{(NOTE:  For best performance, it is recommended you use the manufacturers' BLAS)}
 | |
| %{\tt
 | |
| %\begin{list}{}{}
 | |
| %\item \texttt{cd LAPACK}
 | |
| %\item \texttt{make blaslib}
 | |
| %\end{list} }
 | |
| %
 | |
| %\item Run the Level 1, 2, and 3 BLAS Test Programs
 | |
| %\begin{list}{}{}
 | |
| %\item \texttt{cd LAPACK}
 | |
| %\item \texttt{make blas\_testing}
 | |
| %\end{list}
 | |
| %
 | |
| %\item Create the LAPACK Library
 | |
| %\begin{list}{}{}
 | |
| %\item \texttt{cd LAPACK}
 | |
| %\item \texttt{make lapacklib}
 | |
| %\end{list}
 | |
| %
 | |
| %\item Create the Library of Test Matrix Generators
 | |
| %\begin{list}{}{}
 | |
| %\item \texttt{cd LAPACK}
 | |
| %\item \texttt{make tmglib}
 | |
| %\end{list}
 | |
| %
 | |
| %\item Run the LAPACK Test Programs
 | |
| %\begin{list}{}{}
 | |
| %\item \texttt{cd LAPACK}
 | |
| %\item \texttt{make testing}
 | |
| %\end{list}
 | |
| %
 | |
| %\item Run the LAPACK Timing Programs
 | |
| %\begin{list}{}{}
 | |
| %\item \texttt{cd LAPACK}
 | |
| %\item \texttt{make timing}
 | |
| %\end{list}
 | |
| %
 | |
| %\item Run the BLAS Timing Programs
 | |
| %\begin{list}{}{}
 | |
| %\item \texttt{cd LAPACK}
 | |
| %\item \texttt{make blas\_timing}
 | |
| %\end{list}
 | |
| \end{enumerate}
 | |
|  
 | |
| \subsection{Untar the File}
 | |
| 
 | |
| If you received a tar file of LAPACK via the World Wide
 | |
| Web or anonymous ftp, enter the following command:
 | |
| 
 | |
| \begin{list}{}{}
 | |
| \item{\texttt{gunzip -c lapack.tgz | tar xvf -}}
 | |
| \end{list}
 | |
| 
 | |
| \noindent
 | |
| This will create a top-level directory called \texttt{LAPACK}, which
 | |
| requires approximately 34 Mbytes of disk space.
 | |
| The total space requirement, including the object files and executables,
 | |
| is approximately 100 Mbytes for all four data types.
 | |
| 
 | |
| \subsection{Copy \texttt{LAPACK/make.inc.example} to \texttt{LAPACK/make.inc} and Edit It}
 | |
| 
 | |
| Before the libraries can be built, or the testing and timing\footnotemark[\value{footnote}] programs
 | |
| run, you must define all machine-specific parameters for the
 | |
| architecture to which you are installing LAPACK.  All machine-specific
 | |
| parameters are contained in the file \texttt{LAPACK/make.inc}.
 | |
| An example of  \texttt{LAPACK/make.inc} for a LINUX machine with GNU compilers is given 
 | |
| in \texttt{LAPACK/make.inc.example}.  Copy that file to \texttt{LAPACK/make.inc} by entering the following command:
 | |
| 
 | |
| \begin{list}{}{}
 | |
| \item{\texttt{cp LAPACK/make.inc.example LAPACK/make.inc}}
 | |
| \end{list}
 | |
| 
 | |
| \noindent
 | |
| Now modify your \texttt{LAPACK/make.inc} by applying the following recommendations.
 | |
| The first line of this \texttt{make.inc} file is:
 | |
| \begin{quote}
 | |
| SHELL = /bin/sh
 | |
| \end{quote}
 | |
| and it will need to be modified to \texttt{SHELL = /sbin/sh} if you are
 | |
| installing LAPACK on an SGI architecture.
 | |
| Second, you will
 | |
| need to modify the \texttt{PLAT} definition, which is appended to all
 | |
| library names, to specify the architecture to which you are installing
 | |
| LAPACK.  This feature avoids confusion in library names when you are
 | |
| installing LAPACK on more than one architecture.  Next, you will need
 | |
| to modify \texttt{FORTRAN}, \texttt{OPTS}, \texttt{DRVOPTS}, \texttt{NOOPT}, \texttt{LOADER},
 | |
| and \texttt{LOADOPTS} to specify
 | |
| the compiler, compiler options, compiler options for the testing and
 | |
| timing\footnotemark[\value{footnote}] main programs, loader, and loader options.
 | |
| Next, choose the timing function that the \texttt{SECOND} and \texttt{DSECND} routines will call.
 | |
| \begin{verbatim}
 | |
| #The Default : SECOND and DSECND will use a call to the EXTERNAL FUNCTION ETIME
 | |
| TIMER    = EXT_ETIME
 | |
| # For RS6K : SECOND and DSECND will use a call to the EXTERNAL FUNCTION ETIME_ 
 | |
| # TIMER    = EXT_ETIME_
 | |
| # For gfortran compiler: SECOND and DSECND will use the INTERNAL FUNCTION ETIME
 | |
| # TIMER    = INT_ETIME
 | |
| # If your Fortran compiler does not provide etime (like Nag Fortran Compiler, etc...)
 | |
| # SECOND and DSECND will use a call to the INTERNAL FUNCTION CPU_TIME
 | |
| # TIMER    = INT_CPU_TIME
 | |
| # If neither of this works...you can use the NONE value... 
 | |
| # In that case, SECOND and DSECND will always return 0
 | |
| # TIMER     = NONE
 | |
| \end{verbatim}
 | |
| Refer to Section~\ref{second} for more information.
 | |
| 
 | |
| 
 | |
| Next, you will need to modify \texttt{ARCH}, \texttt{ARCHFLAGS}, and \texttt{RANLIB} to specify archiver,
 | |
| archiver options, and ranlib for your machine.  If your architecture
 | |
| does not require \texttt{ranlib} to be run after each archive command (as
 | |
| is the case with CRAY computers running UNICOS, Hewlett Packard
 | |
| computers running HP-UX, or SUN SPARCstations running Solaris), set
 | |
| \texttt{RANLIB = echo}.  And finally, you must
 | |
| modify the \texttt{BLASLIB} definition to specify the BLAS library to which
 | |
| you will be linking.  If an optimized version of the BLAS is available
 | |
| on your machine, we strongly recommend linking to that library.
 | |
| Otherwise, by default, \texttt{BLASLIB} is set to the Fortran~77 version.
 | |
| 
 | |
| If you want to enable the XBLAS, define the variable \texttt{USEXBLAS}
 | |
| to some value, for example \texttt{USEXBLAS = Yes}.  Then set the
 | |
| variable \texttt{XBLASLIB} to point at the XBLAS library.  Note that
 | |
| the prototype iterative refinement routines and their testers will not
 | |
| be built unless \texttt{USEXBLAS} is defined.
 | |
| 
 | |
| \textbf{NOTE:}  Example \texttt{make.inc} include files are contained in the
 | |
| \texttt{LAPACK/INSTALL} directory.  Please refer to
 | |
| Appendix~\ref{appendixd} for machine-specific installation hints, and/or
 | |
| the \texttt{release\_notes} file on \texttt{netlib}.
 | |
| \begin{quote}
 | |
| \url{http://www.netlib.org/lapack/release\_notes}
 | |
| \end{quote}
 | |
| 
 | |
| \subsection{Edit the file \texttt{LAPACK/Makefile}}\label{toplevelmakefile}
 | |
| 
 | |
| This \texttt{Makefile} can be modified to perform as much of the
 | |
| installation process as the user desires.  Ideally, this is the ONLY
 | |
| makefile the user must modify.  However, modification of lower-level
 | |
| makefiles may be necessary if a specific routine needs to be compiled
 | |
| with a different level of optimization.  
 | |
| 
 | |
| First, edit the definitions of \texttt{blaslib}, \texttt{lapacklib},
 | |
| \texttt{tmglib}, \texttt{lapack\_testing}, and \texttt{timing}\footnotemark[\value{footnote}] in the file \texttt{LAPACK/Makefile}
 | |
| to specify the data types desired.  For example,
 | |
| if you only wish to compile the single precision real version of the
 | |
| LAPACK library, you would modify the \texttt{lapacklib} definition to be:
 | |
| 
 | |
| \begin{verbatim}
 | |
| lapacklib:
 | |
|         ( cd SRC; $(MAKE) single )
 | |
| \end{verbatim}
 | |
| 
 | |
| Likewise, you could specify \texttt{double}, \texttt{complex}, or \texttt{complex16} to
 | |
| build the double precision real, single precision complex, or double
 | |
| precision complex libraries, respectively.  By default, the presence of
 | |
| no arguments following the \texttt{make} command will result in the
 | |
| building of all four data types.
 | |
| The make command can be run more than once to add another
 | |
| data type to the library if necessary.
 | |
| 
 | |
| %If you are installing LAPACK on a Silicon Graphics machine, you must
 | |
| %modify the respective definitions of \texttt{testing} and \texttt{timing} to be
 | |
| %\begin{verbatim}
 | |
| %testing:
 | |
| %        ( cd TESTING; $(MAKE) -f Makefile.sgi )
 | |
| %\end{verbatim}
 | |
| %and
 | |
| %\begin{verbatim}
 | |
| %timing:
 | |
| %        ( cd TIMING; $(MAKE) -f Makefile.sgi )
 | |
| %\end{verbatim}
 | |
|       
 | |
| Next, if you will be using a locally available BLAS library, you will need
 | |
| to remove \texttt{blaslib} from the \texttt{lib} definition.  And finally,
 | |
| if you do not wish to build all of the libraries individually and
 | |
| likewise run all of the testing and timing separately, you can
 | |
| modify the \texttt{all} definition to specify the amount of the
 | |
| installation process that you want performed.  By default,
 | |
| the \texttt{all} definition is set to
 | |
| \begin{verbatim}
 | |
| all: lapack_install lib lapack_testing blas_testing
 | |
| \end{verbatim}
 | |
| which will perform all phases of the installation
 | |
| process -- testing of machine-dependent routines, building the libraries,
 | |
| BLAS testing and LAPACK testing.
 | |
| 
 | |
| The entire installation process will then be performed by typing
 | |
| \texttt{make}.
 | |
| 
 | |
| Questions and/or comments can be directed to the
 | |
| authors as described in Section~\ref{sendresults}.  If test failures
 | |
| occur, please refer to the appropriate subsection in
 | |
| Section~\ref{furtherdetails}.
 | |
| 
 | |
| If disk space is limited, we suggest building each data type separately
 | |
| and/or deleting all object files after building the libraries.  Likewise, all
 | |
| testing and timing executables can be deleted after the testing and timing
 | |
| process is completed.  The removal of all object files and executables
 | |
| can be accomplished by the following:
 | |
| 
 | |
| \begin{list}{}{}
 | |
| \item \texttt{cd LAPACK}
 | |
| \item \texttt{make clean}
 | |
| \end{list}
 | |
| 
 | |
| \section{Further Details of the Installation Process}\label{furtherdetails}
 | |
| 
 | |
| Alternatively, you can choose to run each of the phases of the
 | |
| installation process separately.  The following sections give details
 | |
| on how this may be achieved.
 | |
| 
 | |
| \subsection{Test and Install the Machine-Dependent Routines}
 | |
| 
 | |
| There are six machine-dependent functions in the test and timing
 | |
| package, at least three of which must be installed.  They are
 | |
| 
 | |
| \begin{tabbing}
 | |
| MONOMO  \=  DOUBLE PRECYSION  \=  \kill
 | |
| LSAME   \>  LOGICAL      \> Test if two characters are the same regardless of case \\
 | |
| SLAMCH  \>  REAL  \> Determine machine-dependent parameters \\
 | |
| DLAMCH  \>  DOUBLE PRECISION \> Determine machine-dependent parameters \\
 | |
| SECOND  \>  REAL  \> Return time in seconds from a fixed starting time \\
 | |
| DSECND  \>  DOUBLE PRECISION  \> Return time in seconds from a fixed starting time\\
 | |
| ILAENV  \>  INTEGER \> Checks that NaN and infinity arithmetic are IEEE-754 compliant
 | |
| \end{tabbing}
 | |
| 
 | |
| \noindent
 | |
| If you are working only in single precision, you do not need to install
 | |
| DLAMCH and DSECND, and if you are working only in double precision,
 | |
| you do not need to install SLAMCH and SECOND.
 | |
| 
 | |
| These six subroutines are provided in \texttt{LAPACK/INSTALL},
 | |
| along with six test programs.
 | |
| To compile the six test programs and run the tests, go to \texttt{LAPACK} and
 | |
| type \texttt{make lapack\_install}.  The test programs are called
 | |
| \texttt{testlsame, testslamch, testdlamch, testsecond, testdsecnd} and
 | |
| \texttt{testieee}.
 | |
| If you do not wish to run all tests, you will need to modify the 
 | |
| \texttt{lapack\_install} definition in the \texttt{LAPACK/Makefile} to only include the
 | |
| tests you wish to run.  Otherwise, all tests will be performed.
 | |
| The expected results of each test program are described below.
 | |
| 
 | |
| \subsubsection{Installing LSAME}
 | |
| 
 | |
| LSAME is a logical function with two character parameters, A and B.
 | |
| It returns .TRUE. if A and B are the same regardless of case, or .FALSE.
 | |
| if they are different. 
 | |
| For example, the expression
 | |
| 
 | |
| \begin{list}{}{} 
 | |
| \item \texttt{LSAME( UPLO, 'U' )}
 | |
| \end{list}
 | |
| \noindent 
 | |
| is equivalent to
 | |
| \begin{list}{}{} 
 | |
| \item \texttt{( UPLO.EQ.'U' ).OR.( UPLO.EQ.'u' )}
 | |
| \end{list} 
 | |
|  
 | |
| The test program in \texttt{lsametst.f} tests all combinations of
 | |
| the same character in upper and lower case for A and B, and two
 | |
| cases where A and B are different characters.
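
The behaviour being tested can be sketched in Python as follows.  This is
an illustration only, not the Fortran source: the function \texttt{lsame}
below is a hypothetical stand-in that compares the first characters of its
arguments without regard to case, and the checks mirror the kinds of cases
described above.

```python
def lsame(ca, cb):
    """Return True if the first characters of ca and cb match, ignoring case."""
    return ca[:1].upper() == cb[:1].upper()


def run_checks():
    """Exercise the same kinds of cases as the description above."""
    # The same letter in every combination of upper and lower case.
    same = all(lsame(a, b) for a in "Aa" for b in "Aa")
    # Two cases where the characters differ.
    different = not lsame("A", "B") and not lsame("a", "B")
    return same and different


if __name__ == "__main__":
    print("Tests completed" if run_checks() else "LSAME sketch failed")
```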
 | |
| 
 | |
| Run the test program by typing \texttt{testlsame}.
 | |
| If LSAME works correctly, the only message you should see after the
 | |
| execution of \texttt{testlsame} is
 | |
| \begin{verbatim}
 | |
|  ASCII character set
 | |
|  Tests completed
 | |
| \end{verbatim}
 | |
| The file \texttt{lsame.f} is automatically copied to
 | |
| \texttt{LAPACK/BLAS/SRC/} and \texttt{LAPACK/SRC/}. 
 | |
| The function LSAME is needed by both the BLAS and LAPACK, so it is safer
 | |
| to have it in both libraries as long as this does not cause trouble
 | |
| in the link phase when both libraries are used.
 | |
| 
 | |
| \subsubsection{Installing SLAMCH and DLAMCH}
 | |
| 
 | |
| SLAMCH and DLAMCH are real and double precision functions, respectively, with a single character parameter
 | |
| that indicates the machine parameter to be returned.  The test 
 | |
| program in \texttt{slamchtst.f}
 | |
| simply prints out the different values computed by SLAMCH,
 | |
| so you need to know something about what the values should be. 
 | |
| For example, the output of the test program executable \texttt{testslamch}
 | |
| for SLAMCH on a Sun SPARCstation is
 | |
| \begin{verbatim}
 | |
|  Epsilon                      =     5.96046E-08
 | |
|  Safe minimum                 =     1.17549E-38
 | |
|  Base                         =     2.00000
 | |
|  Precision                    =     1.19209E-07
 | |
|  Number of digits in mantissa =     24.0000
 | |
|  Rounding mode                =     1.00000
 | |
|  Minimum exponent             =    -125.000
 | |
|  Underflow threshold          =     1.17549E-38
 | |
|  Largest exponent             =     128.000
 | |
|  Overflow threshold           =     3.40282E+38
 | |
|  Reciprocal of safe minimum   =     8.50706E+37
 | |
| \end{verbatim}
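
Most of the values above follow from three of them: the base, the number
of digits in the mantissa $t$, and the exponent range.  The Python sketch
below recomputes the others for this sample output, assuming the relations
$\varepsilon = \mathrm{base}^{1-t}/2$ (rounding arithmetic),
$\mathrm{underflow} = \mathrm{base}^{e_{\min}-1}$, and
$\mathrm{overflow} = \mathrm{base}^{e_{\max}} (1 - \varepsilon)$.  These
relations are assumptions chosen to be consistent with the sample output,
and the function and variable names are illustrative.

```python
# Parameters read from the sample output (IEEE single precision).
BASE = 2.0
DIGITS = 24    # number of base-2 digits in the mantissa
EMIN = -125    # minimum exponent
EMAX = 128     # largest exponent


def machine_parameters(base, digits, emin, emax):
    """Derive the remaining parameters from base, digits, and exponent range."""
    eps = base ** (1 - digits) / 2  # relative machine precision, with rounding
    underflow = base ** (emin - 1)
    return {
        "Epsilon": eps,
        "Precision": eps * base,
        "Underflow threshold": underflow,
        "Overflow threshold": base ** emax * (1 - eps),
        "Reciprocal of safe minimum": 1.0 / underflow,
    }


if __name__ == "__main__":
    for name, value in machine_parameters(BASE, DIGITS, EMIN, EMAX).items():
        print(f" {name:28s} = {value:15.5E}")
```

Comparing the printed values with the output of \texttt{testslamch} on
your own machine is one way to judge whether SLAMCH is returning sensible
results.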
 | |
| On a Cray machine, the safe minimum underflows its output
 | |
| representation and the overflow threshold overflows its output
 | |
| representation, so the safe minimum is printed as 0.00000 and overflow
 | |
| is printed as R.  This is normal.
 | |
| If you would prefer to print a representable number, you can modify
 | |
| the test program to print SFMIN*100. and RMAX/100. for the safe
 | |
| minimum and overflow thresholds.
 | |
| 
 | |
| Likewise, the test executable \texttt{testdlamch} is run for DLAMCH.
 | |
| 
 | |
| If both tests were successful, go to Section~\ref{second}.
 | |
| 
 | |
| If SLAMCH (or DLAMCH) returns an invalid value, you will have to create
 | |
| your own version of this function.  The following options are used in
 | |
| LAPACK and must be set:
 | |
| 
 | |
| \begin{list}{}{}
 | |
| \item {`B': }  Base of the machine
 | |
| \item {`E': }  Epsilon (relative machine precision)
 | |
| \item {`O': }  Overflow threshold
 | |
| \item {`P': }  Precision = Epsilon*Base
 | |
| \item {`S': }  Safe minimum (often same as underflow threshold)
 | |
| \item {`U': }  Underflow threshold
 | |
| \end{list}
 | |
| 
 | |
| Some people may be familiar with R1MACH (D1MACH), a primitive
 | |
| routine for setting machine parameters in which the user must
 | |
| uncomment the appropriate assignment statements for the target
 | |
| machine.  If a version of R1MACH is on hand, the assignments in
 | |
| SLAMCH can be made to refer to R1MACH using the correspondence
 | |
| 
 | |
| \begin{list}{}{}
 | |
| \item {SLAMCH( `U' )}  $=$ R1MACH( 1 )
 | |
| \item {SLAMCH( `O' )}  $=$ R1MACH( 2 )
 | |
| \item {SLAMCH( `E' )}  $=$ R1MACH( 3 )
 | |
| \item {SLAMCH( `B' )}  $=$ $10^{\mathrm{R1MACH}( 5 )}$ (R1MACH( 5 ) is $\log_{10}$ of the base)
 | |
| \end{list}
 | |
| 
 | |
| \noindent
 | |
| The safe minimum returned by SLAMCH( `S' ) is initially set to the
 | |
| underflow value, but if $1/(\mathrm{overflow}) \geq (\mathrm{underflow})$
 | |
| it is recomputed as $(1/(\mathrm{overflow})) * ( 1 + \varepsilon )$,
 | |
| where $\varepsilon$ is the machine precision.
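 | |
| 
 | |
| As a check against the SPARCstation output shown above (a worked example that uses
 | |
| only the values printed there), $1/(\mathrm{overflow})$ is smaller than the underflow
 | |
| threshold, so the safe minimum is left equal to the underflow threshold:
 | |
| \[
 | |
|   \frac{1}{3.40282 \times 10^{38}} \approx 2.93874 \times 10^{-39}
 | |
|   \; < \; 1.17549 \times 10^{-38} .
 | |
| \]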
 | |
| 
 | |
| BE AWARE that the initial call to SLAMCH or DLAMCH is expensive.  
 | |
| We suggest that installers run it once, save the results, and hard-code
 | |
| the constants in the version they put in their library.
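 | |
| 
 | |
| A hard-coded version might look like the following sketch.  The constants are
 | |
| the standard IEEE single precision values, consistent with the sample output above;
 | |
| they are an assumption about the target machine and should be replaced by the values
 | |
| that \texttt{testslamch} prints on your own system.  Only the six options listed above are handled.
 | |
| \begin{verbatim}
 | |
|       REAL FUNCTION SLAMCH( CMACH )
 | |
| *     Sketch only: constants assume IEEE single precision.
 | |
|       CHARACTER          CMACH
 | |
|       LOGICAL            LSAME
 | |
|       EXTERNAL           LSAME
 | |
| *     Options not listed in this guide return zero in this sketch.
 | |
|       SLAMCH = 0.0E+0
 | |
|       IF( LSAME( CMACH, 'B' ) ) THEN
 | |
|          SLAMCH = 2.0E+0
 | |
|       ELSE IF( LSAME( CMACH, 'E' ) ) THEN
 | |
|          SLAMCH = 5.96046448E-08
 | |
|       ELSE IF( LSAME( CMACH, 'P' ) ) THEN
 | |
|          SLAMCH = 1.19209290E-07
 | |
|       ELSE IF( LSAME( CMACH, 'S' ) ) THEN
 | |
|          SLAMCH = 1.17549435E-38
 | |
|       ELSE IF( LSAME( CMACH, 'U' ) ) THEN
 | |
|          SLAMCH = 1.17549435E-38
 | |
|       ELSE IF( LSAME( CMACH, 'O' ) ) THEN
 | |
|          SLAMCH = 3.40282347E+38
 | |
|       END IF
 | |
|       RETURN
 | |
|       END
 | |
| \end{verbatim}
 | |
| A corresponding DLAMCH would need the double precision constants printed by \texttt{testdlamch}.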
 | |
| 
 | |
| \subsubsection{Installing SECOND and DSECND}\label{second}
 | |
| 
 | |
| Both the timing routines\footnotemark[\value{footnote}]  and the test routines call SECOND
 | |
| (DSECND), a real function with no arguments that returns the time
 | |
| in seconds from some fixed starting time.
 | |
| Our version of this routine 
 | |
| returns only ``user time'', and not ``user time $+$ system time''. 
 | |
| The versions of SECOND in \texttt{second\_EXT\_ETIME.f} and \texttt{second\_INT\_ETIME.f} call
 | |
| ETIME, a Fortran library routine available on some computer systems.
 | |
| If ETIME is not available or a better local timing function exists,
 | |
| you will have to provide the correct interface to SECOND and DSECND
 | |
| on your machine.
 | |
| 
 | |
| Since LAPACK 3.1.1, we provide five different flavours of the SECOND and DSECND routines.
 | |
| The version that will be used depends on the value of the TIMER variable in \texttt{make.inc}:
 | |
| 
 | |
| \begin{itemize}
 | |
| \item If ETIME is available as an external function, set the value of the TIMER variable in your 
 | |
| make.inc to \texttt{EXT\_ETIME}:\texttt{second\_EXT\_ETIME.f} and \texttt{dsecnd\_EXT\_ETIME.f} will be used.
 | |
| Usually on HPPA architectures,
 | |
| the compiler and loader flag \texttt{+U77} should be included to access
 | |
| the function \texttt{ETIME}.
 | |
| 
 | |
| \item If ETIME\_ is available as an external function, set the value of the TIMER variable in your make.inc 
 | |
| to \texttt{EXT\_ETIME\_}:\texttt{second\_EXT\_ETIME\_.f} and \texttt{dsecnd\_EXT\_ETIME\_.f} will be used.
 | |
| This is the case on some IBM architectures, such as the IBM RS/6000.
 | |
| 
 | |
| \item If ETIME is available as an internal function, set the value of the TIMER variable in your make.inc
 | |
| to \texttt{INT\_ETIME}:\texttt{second\_INT\_ETIME.f}  and \texttt{dsecnd\_INT\_ETIME.f} will be used. 
 | |
| This is the case with gfortran.
 | |
| 
 | |
| \item If CPU\_TIME is available as an internal function, set the value of the TIMER variable in your make.inc
 | |
| to \texttt{INT\_CPU\_TIME}:\texttt{second\_INT\_CPU\_TIME.f} and \texttt{dsecnd\_INT\_CPU\_TIME.f} will be used.
 | |
| 
 | |
| \item If none of these functions is available, set the value of the TIMER variable in your make.inc
 | |
| to \texttt{NONE}: \texttt{second\_NONE.f} and \texttt{dsecnd\_NONE.f} will be used.
 | |
| These routines will always return zero.
 | |
| \end{itemize}
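 | |
| 
 | |
| For example, for a compiler such as gfortran that provides ETIME as an internal
 | |
| function, the corresponding line in \texttt{make.inc} would read as follows.  This is a
 | |
| one-line fragment only; the spacing is illustrative and the rest of \texttt{make.inc} is unchanged.
 | |
| \begin{verbatim}
 | |
| TIMER    = INT_ETIME
 | |
| \end{verbatim}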
 | |
| 
 | |
| The test program in \texttt{secondtst.f}
 | |
| performs a million operations using 5000 iterations of 
 | |
| the SAXPY operation $y := y + \alpha x$ on a vector of length 100.
 | |
| The total time and megaflop rate for this test are reported; then
 | |
| the operation is repeated including a call to SECOND on each of
 | |
| the 5000 iterations to determine the overhead due to calling SECOND.
 | |
| The test program executable is called \texttt{testsecond} (or \texttt{testdsecnd}).
 | |
| There is no single right answer, but the times
 | |
| in seconds should be positive and the megaflop rates should be 
 | |
| appropriate for your machine.
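 | |
| 
 | |
| The operation count follows from the two floating-point operations (one multiply and
 | |
| one add) that SAXPY performs per vector element.  Writing $t$ for the reported time in
 | |
| seconds (a symbol introduced here only for illustration),
 | |
| \[
 | |
|   5000 \times 2 \times 100 = 10^{6} \mbox{ operations},
 | |
|   \qquad
 | |
|   \mbox{megaflop rate} = \frac{10^{6}}{10^{6}\, t} = \frac{1}{t} .
 | |
| \]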
 | |
| 
 | |
| \subsubsection{Testing IEEE arithmetic and ILAENV}\label{testieee}
 | |
| 
 | |
| %\textbf{If you are installing LAPACK on a non-IEEE machine, you MUST 
 | |
| %modify ILAENV!  Otherwise, ILAENV will crash .  By default, ILAENV
 | |
| %assumes an IEEE machine, and does a test for IEEE-754 compliance.}
 | |
| 
 | |
| As some new routines in LAPACK rely on IEEE-754 compliance,
 | |
| two settings (\texttt{ISPEC=10} and \texttt{ISPEC=11}) have been added to ILAENV
 | |
| (\texttt{LAPACK/SRC/ilaenv.f}) to denote IEEE-754 compliance for NaN and
 | |
| infinity arithmetic, respectively.  By default, ILAENV assumes an IEEE
 | |
| machine, and does a test for IEEE-754 compliance.  \textbf{NOTE:  If you
 | |
| are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
 | |
| as this test inside ILAENV will crash!}
 | |
| 
 | |
| If \texttt{ILAENV( 10, $\ldots$ )} or \texttt{ILAENV( 11, $\ldots$ )} is 
 | |
| issued, then \texttt{ILAENV=1} is returned to signal IEEE-754 compliance,
 | |
| and \texttt{ILAENV=0} if the architecture is non-IEEE-754 compliant.
 | |
| 
 | |
| Thus, for non-IEEE machines, the user must hard-code the setting
 | |
| \texttt{ILAENV=0} for both \texttt{ISPEC=10} and \texttt{ISPEC=11} in the version
 | |
| of \texttt{LAPACK/SRC/ilaenv.f} to be put in
 | |
| the library.  There are also specialized testing and timing\footnotemark[\value{footnote}] versions of
 | |
| ILAENV that must be modified in the same way:
 | |
| \begin{itemize}
 | |
| \item Testing version in \texttt{LAPACK/TESTING/LIN/ilaenv.f}
 | |
| \item Testing version in \texttt{LAPACK/TESTING/EIG/ilaenv.f}
 | |
| \item Timing version in \texttt{LAPACK/TIMING/LIN/ilaenv.f}
 | |
| \item Timing version in \texttt{LAPACK/TIMING/EIG/ilaenv.f}
 | |
| \end{itemize}
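 | |
| 
 | |
| One way to hard-code this setting is sketched below.  It is an illustrative fragment, not
 | |
| the code distributed in \texttt{ilaenv.f}: it assumes the test is placed at the start of the
 | |
| executable statements of ILAENV, ahead of the existing handling of ISPEC.
 | |
| \begin{verbatim}
 | |
| *     Sketch only: fixed reply on a machine known to be non-IEEE.
 | |
|       IF( ISPEC.EQ.10 .OR. ISPEC.EQ.11 ) THEN
 | |
|          ILAENV = 0
 | |
|          RETURN
 | |
|       END IF
 | |
| \end{verbatim}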
 | |
| 
 | |
| %Some new routines in LAPACK rely on IEEE-754 compliance, and if non-compliance
 | |
| %is detected (via a call to the function ILAENV), alternative (slower)
 | |
| %algorithms will be chosen.
 | |
| %For further details, refer to the leading comments of routines such
 | |
| %as \texttt{LAPACK/SRC/sstevr.f}.
 | |
| 
 | |
| The test program in \texttt{LAPACK/INSTALL/tstiee.f} checks an installation
 | |
| architecture
 | |
| to see if infinity arithmetic and NaN arithmetic are IEEE-754 compliant.  
 | |
| A warning message to the user is printed if non-compliance is detected.
 | |
| This same test is performed inside the function ILAENV.
 | |
| 
 | |
| To avoid running this IEEE test every time you call
 | |
| \texttt{ILAENV( 10, $\ldots$ )} or \texttt{ILAENV( 11, $\ldots$ )}, we suggest
 | |
| that the user hard-code the setting
 | |
| \texttt{ILAENV=1} or \texttt{ILAENV=0} in the version of \texttt{LAPACK/SRC/ilaenv.f} to be put in
 | |
| the library.  As mentioned above, there are also specialized testing and
 | |
| timing\footnotemark[\value{footnote}] versions of ILAENV that will also need to be modified.
 | |
| 
 | |
| \subsection{Create the BLAS Library} 
 | |
| 
 | |
| Ideally, a highly optimized version of the BLAS library already
 | |
| exists on your machine. 
 | |
| In this case you can go directly to Section~\ref{testblas} to
 | |
| make the BLAS test programs.  
 | |
| 
 | |
| \begin{itemize}
 | |
| \item[a)]
 | |
| Go to \texttt{LAPACK} and edit the definition of \texttt{blaslib} in the
 | |
| file \texttt{Makefile} to specify the data types desired, as in the example
 | |
| in Section~\ref{toplevelmakefile}.
 | |
| 
 | |
| If you already have some of the BLAS, you will need to edit the file
 | |
| \texttt{LAPACK/BLAS/SRC/Makefile} to comment out the lines 
 | |
| defining the BLAS you have.  
 | |
| 
 | |
| \item[b)]
 | |
| Type \texttt{make blaslib}.
 | |
| The make command can be run more than once to add another
 | |
| data type to the library if necessary.  
 | |
| \end{itemize}
 | |
| 
 | |
| \noindent
 | |
| The BLAS library is created in \texttt{LAPACK/blas\_PLAT.a}, where
 | |
| \texttt{PLAT} is the user-defined architecture suffix specified in the file
 | |
| \texttt{LAPACK/make.inc}.
 | |
| 
 | |
| \subsection{Run the BLAS Test Programs}\label{testblas}
 | |
| 
 | |
| Test programs for the Level 1, 2, and 3 BLAS are in the directory 
 | |
| \texttt{LAPACK/BLAS/TESTING}.
 | |
| 
 | |
| To compile and run the Level 1, 2, and 3 BLAS test programs,
 | |
| go to \texttt{LAPACK} and type \texttt{make blas\_testing}.  The executable
 | |
| files are called \texttt{xblat\_s}, \texttt{xblat\_d}, \texttt{xblat\_c}, and
 | |
| \texttt{xblat\_z}, where the \_ (underscore) is replaced by 1, 2, or 3,
 | |
| depending upon the level of BLAS that it is testing.  All executable and
 | |
| output files are created in \texttt{LAPACK/BLAS/}.
 | |
| For the Level 1 BLAS tests, the output file names are \texttt{sblat1.out},
 | |
| \texttt{dblat1.out}, \texttt{cblat1.out}, and \texttt{zblat1.out}.  For the Level
 | |
| 2 and 3 BLAS, the name of the output file is indicated on the first line of the
 | |
| input file and is currently defined to be \texttt{sblat2.out} for
 | |
| the Level 2 REAL version, and \texttt{sblat3.out} for the Level 3 REAL
 | |
| version, with similar names for the other data types.
 | |
| 
 | |
| If the tests using the supplied data files were completed successfully,
 | |
| consider whether the tests were sufficiently thorough.
 | |
| For example, on a machine with vector registers, at least one value
 | |
| of $N$ greater than the length of the vector registers should be used;
 | |
| otherwise, important parts of the compiled code may not be
 | |
| exercised by the tests. 
 | |
| If the tests were not successful, either because the program did not
 | |
| finish or the test ratios did not pass the threshold, you will
 | |
| probably have to find and correct the problem before continuing. 
 | |
| If you have been testing a system-specific
 | |
| BLAS library, try using the Fortran BLAS for the routines that
 | |
| did not pass the tests.
 | |
| For more details on the BLAS test programs, 
 | |
| see \cite{BLAS2-test} and \cite{BLAS3-test}.
 | |
| 
 | |
| \subsection{Create the LAPACK Library}
 | |
| 
 | |
| \begin{itemize}
 | |
| \item[a)]
 | |
| Go to the directory \texttt{LAPACK} and edit the definition of
 | |
| \texttt{lapacklib} in the file \texttt{Makefile} to specify the data types desired,
 | |
| as in the example in Section~\ref{toplevelmakefile}.
 | |
| 
 | |
| \item[b)]
 | |
| Type \texttt{make lapacklib}.
 | |
| The make command can be run more than once to add another
 | |
| data type to the library if necessary.  
 | |
| 
 | |
| \end{itemize}
 | |
| 
 | |
| \noindent
 | |
| The LAPACK library is created in \texttt{LAPACK/lapack\_PLAT.a}, where
 | |
| \texttt{PLAT} is the user-defined architecture suffix specified in the file
 | |
| \texttt{LAPACK/make.inc}.
 | |
| 
 | |
| \subsection{Create the Test Matrix Generator Library}
 | |
| 
 | |
| \begin{itemize}
 | |
| \item[a)]
 | |
| Go to the directory \texttt{LAPACK} and edit the definition of \texttt{tmglib}
 | |
| in the file \texttt{Makefile} to specify the data types desired, as in the
 | |
| example in Section~\ref{toplevelmakefile}.
 | |
| 
 | |
| \item[b)]
 | |
| Type \texttt{make tmglib}.
 | |
| The make command can be run more than once to add another
 | |
| data type to the library if necessary.  
 | |
| 
 | |
| \end{itemize}
 | |
| 
 | |
| \noindent
 | |
| The test matrix generator library is created in \texttt{LAPACK/tmglib\_PLAT.a},
 | |
| where \texttt{PLAT} is the user-defined architecture suffix specified in the 
 | |
| file \texttt{LAPACK/make.inc}.
 | |
| 
 | |
| \subsection{Run the LAPACK Test Programs}
 | |
| 
 | |
| There are two distinct test programs for LAPACK routines
 | |
| in each data type, one for the linear equation routines and
 | |
| one for the eigensystem routines.
 | |
| In each data type, there is one input file for testing the linear
 | |
| equation routines and eighteen input files for testing the eigenvalue
 | |
| routines.
 | |
| The input files reside in \texttt{LAPACK/TESTING}.
 | |
| For more information on the test programs and how to modify the
 | |
| input files, please refer to LAPACK Working Note 41~\cite{WN41}.
 | |
| % see Section~\ref{moretesting}. 
 | |
| 
 | |
| If you do not wish to run each of the tests individually, you can
 | |
| go to \texttt{LAPACK}, edit the definition of \texttt{lapack\_testing} in the file
 | |
| \texttt{Makefile} to specify the data types desired, and type \texttt{make
 | |
| lapack\_testing}.  This will
 | |
| compile and run the tests as described in Sections~\ref{testlin}
 | |
| and~\ref{testeig}.
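 | |
| 
 | |
| The top-level targets named in this guide can also be run in sequence from the
 | |
| \texttt{LAPACK} directory.  The following is only a summary of the commands given in the
 | |
| individual sections; it assumes that \texttt{make.inc} and \texttt{Makefile} have already been
 | |
| edited as described there.
 | |
| \begin{verbatim}
 | |
| cd LAPACK
 | |
| make blaslib
 | |
| make blas_testing
 | |
| make lapacklib
 | |
| make tmglib
 | |
| make lapack_testing
 | |
| \end{verbatim}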
 | |
| 
 | |
| %If you are installing LAPACK on a Silicon Graphics machine, you must
 | |
| %modify the definition of \texttt{testing} to be
 | |
| %\begin{verbatim}
 | |
| %testing:
 | |
| %        ( cd TESTING; $(MAKE) -f Makefile.sgi )
 | |
| %\end{verbatim}
 | |
|  
 | |
| \subsubsection{Testing the Linear Equations Routines}\label{testlin}
 | |
| 
 | |
| \begin{itemize}
 | |
| 
 | |
| \item[a)]
 | |
| Go to \texttt{LAPACK/TESTING/LIN} and type \texttt{make} followed by the data types
 | |
| desired.  The executable files are called \texttt{xlintsts}, \texttt{xlintstc},
 | |
| \texttt{xlintstd}, and \texttt{xlintstz} and are created in \texttt{LAPACK/TESTING}.
 | |
| 
 | |
| \item[b)]
 | |
| Go to \texttt{LAPACK/TESTING} and run the tests for each data type.
 | |
| For the REAL version, the command is
 | |
| \begin{list}{}{}
 | |
| \item{} \texttt{xlintsts  < stest.in > stest.out}
 | |
| \end{list}
 | |
| 
 | |
| \noindent
 | |
| The tests using \texttt{xlintstd}, \texttt{xlintstc}, and \texttt{xlintstz} are similar
 | |
| with the leading `s' in the input and output file names replaced
 | |
| by `d', `c', or `z'.
 | |
| 
 | |
| \end{itemize}
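 | |
| 
 | |
| The four runs can be scripted, for instance as in the sketch below.  It assumes a
 | |
| Bourne-compatible shell and that all four executables have been built; the \texttt{./} prefix
 | |
| is an assumption for systems where the current directory is not on the search path.
 | |
| \begin{verbatim}
 | |
| cd LAPACK/TESTING
 | |
| # p runs over the data type prefixes s, d, c, and z
 | |
| for p in s d c z; do
 | |
|     ./xlintst$p < ${p}test.in > ${p}test.out
 | |
| done
 | |
| \end{verbatim}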
 | |
| 
 | |
| If you encountered failures in this phase of the testing process, please
 | |
| refer to Section~\ref{sendresults}.
 | |
| 
 | |
| \subsubsection{Testing the Eigensystem Routines}\label{testeig}
 | |
| 
 | |
| \begin{itemize}
 | |
| 
 | |
| \item[a)]
 | |
| Go to \texttt{LAPACK/TESTING/EIG} and type \texttt{make} followed by the data types
 | |
| desired.  The executable files are called \texttt{xeigtsts},
 | |
| \texttt{xeigtstc}, \texttt{xeigtstd}, and \texttt{xeigtstz} and are created
 | |
| in \texttt{LAPACK/TESTING}.
 | |
| 
 | |
| \item[b)]
 | |
| Go to \texttt{LAPACK/TESTING} and run the tests for each data type.
 | |
| The tests for the eigensystem routines use eighteen separate input files
 | |
| for testing the nonsymmetric eigenvalue problem,
 | |
| the symmetric eigenvalue problem, the banded symmetric eigenvalue
 | |
| problem, the generalized symmetric eigenvalue
 | |
| problem, the generalized nonsymmetric eigenvalue problem, the
 | |
| singular value decomposition, the banded singular value decomposition,
 | |
| the generalized singular value
 | |
| decomposition, the generalized QR and RQ factorizations, the generalized
 | |
| linear regression model, and the constrained linear least squares
 | |
| problem.
 | |
| The tests for the REAL version are as follows:
 | |
| \begin{list}{}{}
 | |
| \item \texttt{xeigtsts  < nep.in > snep.out}
 | |
| \item \texttt{xeigtsts  < sep.in > ssep.out}
 | |
| \item \texttt{xeigtsts  < svd.in > ssvd.out}
 | |
| \item \texttt{xeigtsts  < sec.in > sec.out}
 | |
| \item \texttt{xeigtsts  < sed.in > sed.out}
 | |
| \item \texttt{xeigtsts  < sgg.in > sgg.out}
 | |
| \item \texttt{xeigtsts  < sgd.in > sgd.out}
 | |
| \item \texttt{xeigtsts  < ssg.in > ssg.out}
 | |
| \item \texttt{xeigtsts  < ssb.in > ssb.out}
 | |
| \item \texttt{xeigtsts  < sbb.in > sbb.out}
 | |
| \item \texttt{xeigtsts  < sbal.in > sbal.out}
 | |
| \item \texttt{xeigtsts  < sbak.in > sbak.out}
 | |
| \item \texttt{xeigtsts  < sgbal.in > sgbal.out}
 | |
| \item \texttt{xeigtsts  < sgbak.in > sgbak.out}
 | |
| \item \texttt{xeigtsts  < glm.in > sglm.out}
 | |
| \item \texttt{xeigtsts  < gqr.in > sgqr.out}
 | |
| \item \texttt{xeigtsts  < gsv.in > sgsv.out}
 | |
| \item \texttt{xeigtsts  < lse.in > slse.out}
 | |
| \end{list}
 | |
| The tests using \texttt{xeigtstc}, \texttt{xeigtstd}, and \texttt{xeigtstz} also
 | |
| use the input files \texttt{nep.in}, \texttt{sep.in}, \texttt{svd.in},
 | |
| \texttt{glm.in}, \texttt{gqr.in}, \texttt{gsv.in}, and \texttt{lse.in},
 | |
| but the leading `s' in the other input file names must be changed
 | |
| to `c', `d', or `z'.
 | |
| \end{itemize}
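 | |
| 
 | |
| The eighteen REAL runs listed above can be scripted, for instance as in the sketch
 | |
| below.  It assumes a Bourne-compatible shell; the \texttt{./} prefix is an assumption for
 | |
| systems where the current directory is not on the search path.  The first loop covers
 | |
| the seven input files shared by all data types, and the second covers the eleven
 | |
| whose names already begin with `s'.
 | |
| \begin{verbatim}
 | |
| cd LAPACK/TESTING
 | |
| # shared input files: output name gains the leading s
 | |
| for t in nep sep svd glm gqr gsv lse; do
 | |
|     ./xeigtsts < $t.in > s$t.out
 | |
| done
 | |
| # type-specific input files: output name matches the input name
 | |
| for t in sec sed sgg sgd ssg ssb sbb sbal sbak sgbal sgbak; do
 | |
|     ./xeigtsts < $t.in > $t.out
 | |
| done
 | |
| \end{verbatim}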
 | |
| 
 | |
| If you encountered failures in this phase of the testing process, please
 | |
| refer to Section~\ref{sendresults}.
 | |
| 
 | |
| \subsection{Run the LAPACK Timing Programs (For LAPACK 3.0 and before)}
 | |
| 
 | |
| There are two distinct timing programs for LAPACK routines
 | |
| in each data type, one for the linear equation routines and
 | |
| one for the eigensystem routines.  The timing program for the
 | |
| linear equation routines is also used to time the BLAS.
 | |
| We encourage you to conduct these timing experiments
 | |
| in REAL and COMPLEX or in DOUBLE PRECISION and COMPLEX*16; it is
 | |
| not necessary to send timing results in all four data types.
 | |
| 
 | |
| Two sets of input files are provided, a small set and a large set.
 | |
| The small data sets are appropriate for a standard workstation or
 | |
| other non-vector machine.
 | |
| The large data sets are appropriate for supercomputers, vector
 | |
| computers, and high-performance workstations.
 | |
| We are mainly interested in results from the large data sets, and
 | |
| it is not necessary to run both the large and small sets.
 | |
| The values of N in the large data sets are about five times larger
 | |
| than those in the small data set,
 | |
| and the large data sets use additional values for parameters such as the
 | |
| block size NB and the leading array dimension LDA.
 | |
| The small data sets have \texttt{\_small} in their names, such as
 | |
| \texttt{stime\_small.in}, and the large data sets have \texttt{\_large} in their names,
 | |
| such as \texttt{stime\_large.in}.
 | |
| Except as noted, the leading `s' in the input file name must be
 | |
| replaced by `d', `c', or `z' for the other data types.
 | |
| 
 | |
| We encourage you to obtain timing results with the large data sets,
 | |
| as this allows us to compare different machines.
 | |
| If this would take too much time, suggestions for paring back the large
 | |
| data sets are given in the instructions below.
 | |
| We also encourage you to experiment with these timing
 | |
| programs and send us any interesting results, such as results for
 | |
| larger problems or for a wider range of block sizes.
 | |
| The main programs are dimensioned for the large data sets,
 | |
| so the parameters in the main program may have to be reduced in order
 | |
| to run the small data sets on a small machine, or increased to run
 | |
| experiments with larger problems.
 | |
| 
 | |
| The minimum time each subroutine will be timed is set to 0.0 in
 | |
| the large data files and to 0.05 in the small data files, and on
 | |
| many machines this value should be increased.
 | |
| If the timing interval is not long
 | |
| enough, the time for the subroutine after subtracting the overhead
 | |
| may be very small or zero, resulting in megaflop rates that are
 | |
| very large or zero. (To avoid division by zero, the megaflop rate is
 | |
| set to zero if the time is less than or equal to zero.)
 | |
| The minimum time that should be used depends on the machine and the
 | |
| resolution of the clock.
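 | |
| 
 | |
| In symbols, the rule described above can be written as follows, where $\mathit{ops}$ is
 | |
| the operation count and $t$ is the measured time in seconds after subtracting the
 | |
| overhead (both symbols are introduced here only for illustration):
 | |
| \[
 | |
|   \mbox{megaflop rate} = \left\{
 | |
|   \begin{array}{ll}
 | |
|     \mathit{ops} / ( 10^{6}\, t ) & \mbox{if } t > 0, \\
 | |
|     0 & \mbox{if } t \leq 0 .
 | |
|   \end{array} \right.
 | |
| \]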
 | |
| 
 | |
| For more information on the timing programs and how to modify the
 | |
| input files, please refer to LAPACK Working Note 41~\cite{WN41}.
 | |
| % see Section~\ref{moretiming}.
 | |
| 
 | |
| If you do not wish to run each of the timings individually, you can
 | |
| go to \texttt{LAPACK}, edit the definition of \texttt{lapack\_timing} in the file
 | |
| \texttt{Makefile} to specify the data types desired, and type \texttt{make
 | |
| lapack\_timing}.  This will compile
 | |
| and run the timings for the linear equation routines and the eigensystem
 | |
| routines (see Sections~\ref{timelin} and~\ref{timeeig}).
 | |
| 
 | |
| %If you are installing LAPACK on a Silicon Graphics machine, you must
 | |
| %modify the definition of \texttt{timing} to be
 | |
| %\begin{verbatim}
 | |
| %timing:
 | |
| %        ( cd TIMING; $(MAKE) -f Makefile.sgi )
 | |
| %\end{verbatim}
 | |
| 
 | |
| If you encounter failures in any phase of the timing process, please
 | |
| feel free to contact the authors as directed in Section~\ref{sendresults}.
 | |
| Tell us the 
 | |
| type of machine on which the tests were run, the version of the operating
 | |
| system, the compiler and compiler options that were used,
 | |
| and details of the BLAS library or libraries that you used.  You should
 | |
| also include a copy of the output file in which the failure occurs.
 | |
| 
 | |
| Please note that the BLAS
 | |
| timing runs will still need to be run as instructed in Section~\ref{timeblas}.
 | |
| 
 | |
| \subsubsection{Timing the Linear Equations Routines}\label{timelin}
 | |
| 
 | |
| The linear equation timing program is found in \texttt{LAPACK/TIMING/LIN}
 | |
| and the input files are in \texttt{LAPACK/TIMING}.
 | |
| Three input files are provided in each data type for timing the
 | |
| linear equation routines, one for square matrices, one for band
 | |
| matrices, and one for rectangular matrices.  The small data sets for the REAL version
 | |
| are \texttt{stime\_small.in}, \texttt{sband\_small.in}, and \texttt{stime2\_small.in}, respectively,
 | |
| and the large data sets are
 | |
| \texttt{stime\_large.in}, \texttt{sband\_large.in}, and \texttt{stime2\_large.in}.
 | |
| 
 | |
| The timing program for the least squares routines uses special instrumented
 | |
| versions of the LAPACK routines to time individual sections of the code.
 | |
| The first step in compiling the timing program is therefore to make a library
 | |
| of the instrumented routines.
 | |
| 
 | |
| \begin{itemize}
 | |
| \item[a)]
 | |
| \begin{sloppypar}
 | |
| To make a library of the instrumented LAPACK routines, first
 | |
| go to \texttt{LAPACK/TIMING/LIN/LINSRC} and type \texttt{make} followed
 | |
| by the data types desired, as in the examples of Section~\ref{toplevelmakefile}. 
 | |
| The library of instrumented code is created in
 | |
| \texttt{LAPACK/TIMING/LIN/linsrc\_PLAT.a},
 | |
| where \texttt{PLAT} is the user-defined architecture suffix specified in the
 | |
| file \texttt{LAPACK/make.inc}.
 | |
| \end{sloppypar}
 | |
| 
 | |
| \item[b)]
 | |
| To make the linear equation timing programs,
 | |
| go to \texttt{LAPACK/TIMING/LIN} and type \texttt{make} followed by the data
 | |
| types desired, as in the examples in Section~\ref{toplevelmakefile}.
 | |
| The executable files are called \texttt{xlintims},
 | |
| \texttt{xlintimc}, \texttt{xlintimd}, and \texttt{xlintimz} and are created
 | |
| in \texttt{LAPACK/TIMING}.
 | |
| 
 | |
| \item[c)]
 | |
| Go to \texttt{LAPACK/TIMING} and
 | |
| make any necessary modifications to the input files.
 | |
| You may need to set the minimum time a subroutine will
 | |
| be timed to a positive value, or to restrict the size of the tests
 | |
| if you are using a computer with performance in between that of a
 | |
| workstation and that of a supercomputer.
 | |
| The computational requirements can be cut in half by using only one
 | |
| value of LDA.
 | |
| If it is necessary to also reduce the matrix sizes or the values of
 | |
| the blocksize, corresponding changes should be made to the 
 | |
| BLAS input files (see Section~\ref{timeblas}).
 | |
| 
 | |
| \item[d)]
 | |
| Run the programs for each data type you are using. 
 | |
| For the REAL version, the commands for the small data sets are
 | |
| 
 | |
| \begin{list}{}{}
 | |
| \item{} \texttt{xlintims < stime\_small.in > stime\_small.out }
 | |
| \item{} \texttt{xlintims < sband\_small.in > sband\_small.out }
 | |
| \item{} \texttt{xlintims < stime2\_small.in > stime2\_small.out }
 | |
| \end{list}
 | |
| and the commands for the large data sets are
 | |
| \begin{list}{}{}
 | |
| \item{} \texttt{xlintims < stime\_large.in > stime\_large.out }
 | |
| \item{} \texttt{xlintims < sband\_large.in > sband\_large.out }
 | |
| \item{} \texttt{xlintims < stime2\_large.in > stime2\_large.out }
 | |
| \end{list}
 | |
| 
 | |
| \noindent
 | |
| Similar commands should be used for the other data types.
 | |
| \end{itemize}
 | |
| 
 | |
| \subsubsection{Timing the BLAS}\label{timeblas}
 | |
| 
 | |
| The linear equation timing program is also used to time the BLAS.
 | |
| Three input files are provided in each data type for timing the Level
 | |
| 2 and 3 BLAS. 
 | |
| These input files time the BLAS using the matrix shapes encountered
 | |
| in the LAPACK routines, and we will use the results to analyze the
 | |
| performance of the LAPACK routines. 
 | |
| For the REAL version, the small data files are
 | |
| \texttt{sblasa\_small.in}, \texttt{sblasb\_small.in}, and \texttt{sblasc\_small.in}
 | |
| and the large data files are
 | |
| \texttt{sblasa\_large.in}, \texttt{sblasb\_large.in}, and \texttt{sblasc\_large.in}.
 | |
| There are three sets of inputs because there are three
 | |
| parameters in the Level 3 BLAS, M, N, and K, and
 | |
| in most applications one of these parameters is small (on the order
 | |
| of the blocksize) while the other two are large (on the order of the
 | |
| matrix size).  
 | |
| In \texttt{sblasa\_small.in}, M and N are large but K is
 | |
| small, while in \texttt{sblasb\_small.in} the small parameter is M, and
 | |
| in \texttt{sblasc\_small.in} the small parameter is N.  
 | |
| The Level 2 BLAS are timed only in the first data set, where K
 | |
| is also used as the bandwidth for the banded routines.
 | |
| 
 | |
| \begin{itemize}
 | |
| 
 | |
| \item[a)]
 | |
| Go to \texttt{LAPACK/TIMING} and
 | |
| make any necessary modifications to the input files.
 | |
| You may need to set the minimum time a subroutine will
 | |
| be timed to a positive value.
 | |
| If you modified the values of N or NB 
 | |
| in Section~\ref{timelin}, set M, N, and K accordingly.
 | |
| The large parameters among M, N, and K
 | |
| should be the same as the matrix sizes used in timing the linear
 | |
| equation routines,
 | |
| and the small parameter should be the same as the
 | |
| blocksizes used in timing the linear equation routines.
 | |
| If necessary, the large data set can be simplified by using only one
 | |
| value of LDA.
 | |
| 
 | |
| \item[b)]
 | |
| Run the programs for each data type you are using. 
 | |
| For the REAL version, the commands for the small data sets are
 | |
| 
 | |
| \begin{list}{}{}
 | |
| \item{} \texttt{xlintims < sblasa\_small.in > sblasa\_small.out }
 | |
| \item{} \texttt{xlintims < sblasb\_small.in > sblasb\_small.out }
 | |
| \item{} \texttt{xlintims < sblasc\_small.in > sblasc\_small.out }
 | |
| \end{list}
 | |
| and the commands for the large data sets are
 | |
| \begin{list}{}{}
 | |
| \item{} \texttt{xlintims < sblasa\_large.in > sblasa\_large.out }
 | |
| \item{} \texttt{xlintims < sblasb\_large.in > sblasb\_large.out }
 | |
| \item{} \texttt{xlintims < sblasc\_large.in > sblasc\_large.out }
 | |
| \end{list}
 | |
| 
 | |
| \noindent
 | |
| Similar commands should be used for the other data types.
 | |
| \end{itemize}
 | |
| 
 | |
| \subsubsection{Timing the Eigensystem Routines}\label{timeeig}
 | |
| 
 | |
| The eigensystem timing program is found in \texttt{LAPACK/TIMING/EIG}
 | |
| and the input files are in \texttt{LAPACK/TIMING}.
 | |
| Four input files are provided in each data type for timing the
 | |
| eigensystem routines,
 | |
| one for the generalized nonsymmetric eigenvalue problem, 
 | |
| one for the nonsymmetric eigenvalue problem, 
 | |
| one for the symmetric and generalized symmetric eigenvalue problem,
 | |
| and one for the singular value decomposition.
 | |
| For the REAL version, the small data sets are called \texttt{sgeptim\_small.in},
 | |
| \texttt{sneptim\_small.in}, \texttt{sseptim\_small.in}, and \texttt{ssvdtim\_small.in}, respectively,
 | |
| and the large data sets are called \texttt{sgeptim\_large.in}, \texttt{sneptim\_large.in},
 | |
| \texttt{sseptim\_large.in}, and \texttt{ssvdtim\_large.in}.
 | |
| Each of the four input files reads a different set of parameters,
 | |
| and the format of the input is indicated by a 3-character code
 | |
| on the first line.
 | |
| 
 | |
| The timing program for eigenvalue/singular value routines accumulates
 | |
| the operation count as the routines are executing using special
 | |
| instrumented versions of the LAPACK routines.  The first step in
 | |
| compiling the timing program is therefore to make a library of the
 | |
| instrumented routines.
 | |
| 
 | |
| \begin{itemize}
 | |
| \item[a)]
 | |
| \begin{sloppypar}
 | |
| To make a library of the instrumented LAPACK routines, first
 | |
| go to \texttt{LAPACK/TIMING/EIG/EIGSRC} and type \texttt{make} followed
 | |
| by the data types desired, as in the examples of Section~\ref{toplevelmakefile}. 
 | |
| The library of instrumented code is created in
 | |
| \texttt{LAPACK/TIMING/EIG/eigsrc\_PLAT.a},
 | |
| where \texttt{PLAT} is the user-defined architecture suffix specified in the
 | |
| file \texttt{LAPACK/make.inc}.
 | |
| \end{sloppypar}
 | |
| 
 | |
| \item[b)]
 | |
| To make the eigensystem timing programs, 
 | |
| go to \texttt{LAPACK/TIMING/EIG} and
 | |
| type \texttt{make} followed by the data types desired, as in the examples
 | |
| of Section~\ref{toplevelmakefile}.  The executable files are called
 | |
| \texttt{xeigtims}, \texttt{xeigtimc}, \texttt{xeigtimd}, and \texttt{xeigtimz}
 | |
| and are created in \texttt{LAPACK/TIMING}.
 | |
| 
 | |
| \item[c)]
 | |
| Go to \texttt{LAPACK/TIMING} and
 | |
| make any necessary modifications to the input files.
 | |
| You may need to set the minimum time a subroutine will
 | |
| be timed to a positive value, or to restrict the number of tests
 | |
| if you are using a computer with performance in between that of a
 | |
| workstation and that of a supercomputer.
 | |
| Instead of decreasing the matrix dimensions to reduce the time,
 | |
| it would be better to reduce the number of matrix types to be timed,
 | |
| since the performance varies more with the matrix size than with the
 | |
| type.  For example, for the nonsymmetric eigenvalue routines,
 | |
| you could use only one matrix of type 4 instead of four matrices of
 | |
| types 1, 3, 4, and 6.
 | |
| Refer to LAPACK Working Note 41~\cite{WN41} for further details.
 | |
| %  See Section~\ref{moretiming} for further details.
 | |
| 
 | |
| \item[d)]
 | |
| Run the programs for each data type you are using. 
 | |
| For the REAL version, the commands for the small data sets are
 | |
| 
 | |
| \begin{list}{}{}
 | |
| \item{} \texttt{xeigtims < sgeptim\_small.in > sgeptim\_small.out }
 | |
| \item{} \texttt{xeigtims < sneptim\_small.in > sneptim\_small.out }
 | |
| \item{} \texttt{xeigtims < sseptim\_small.in > sseptim\_small.out }
 | |
| \item{} \texttt{xeigtims < ssvdtim\_small.in > ssvdtim\_small.out }
 | |
| \end{list}
 | |
| and the commands for the large data sets are
 | |
| \begin{list}{}{}
 | |
| \item{} \texttt{xeigtims < sgeptim\_large.in > sgeptim\_large.out }
 | |
| \item{} \texttt{xeigtims < sneptim\_large.in > sneptim\_large.out }
 | |
| \item{} \texttt{xeigtims < sseptim\_large.in > sseptim\_large.out }
 | |
| \item{} \texttt{xeigtims < ssvdtim\_large.in > ssvdtim\_large.out }
 | |
| \end{list}
 | |
| 
 | |
| \noindent
 | |
| Similar commands should be used for the other data types.
 | |
| \end{itemize}
 | |
| 
 | |
| \subsection{Send the Results to Tennessee}\label{sendresults}
 | |
| 
 | |
| Congratulations!  You have now finished installing, testing, and
 | |
| timing LAPACK.  If you encountered failures in any phase of the 
 | |
| testing or timing process, please
 | |
| consult our \texttt{release\_notes} file on netlib.
 | |
| \begin{quote}
 | |
| \url{http://www.netlib.org/lapack/release\_notes}
 | |
| \end{quote}
 | |
| This file contains machine-dependent installation hints that we hope will
 | |
| alleviate your difficulties or at least let you know that other users
 | |
| have had similar difficulties on that machine.  If there is not an entry
 | |
| for your machine or the suggestions do not fix your problem, please feel
 | |
| free to contact the authors at
 | |
| \begin{list}{}{}
 | |
| \item \href{mailto:lapack@cs.utk.edu}{\texttt{lapack@cs.utk.edu}}.
 | |
| \end{list}
 | |
| Tell us the 
 | |
| type of machine on which the tests were run, the version of the operating
 | |
| system, the compiler and compiler options that were used,
 | |
| and details of the BLAS library or libraries that you used.  You should
 | |
| also include a copy of the output file in which the failure occurs.
 | |
| 
 | |
| We would like to keep our \texttt{release\_notes} file as up-to-date as possible.
 | |
| Therefore, if you do not see an entry for your machine, please contact us
 | |
| with your testing results.
 | |
|  
 | |
| Comments and suggestions are also welcome.
 | |
| 
 | |
| We encourage you to make the LAPACK library available to your
 | |
| users and provide us with feedback from their experiences.
 | |
| %This release of LAPACK is not guaranteed to be compatible
 | |
| %with any previous test release.
 | |
| 
 | |
| \subsection{Get support}\label{getsupport}
 | |
| First, take a look at the complete installation manual in the LAPACK Working Note 41~\cite{WN41}. 
 | |
| If you still cannot solve your problem, you have two options:
 | |
| \begin{itemize}
 | |
| \item
 | |
| either post a message to the LAPACK forum
 | |
| \begin{quote}
 | |
| \url{http://icl.cs.utk.edu/lapack-forum}
 | |
| \end{quote}
 | |
| \item
 | |
| or send an email to the LAPACK mailing list: 
 | |
| \begin{list}{}{}
 | |
| \item \href{mailto:lapack@cs.utk.edu}{\texttt{lapack@cs.utk.edu}}.
 | |
| \end{list}
 | |
| \end{itemize}
 | |
| \section*{Acknowledgments}
 | |
| 
 | |
| Ed Anderson and Susan Blackford contributed to previous versions of this report.
 | |
| 
 | |
| \appendix
 | |
| 
 | |
| \chapter{Caveats}\label{appendixd}
 | |
| 
 | |
| In this appendix we list a few of the machine-specific difficulties we
 | |
| have
 | |
| encountered in our own experience with LAPACK.  A more detailed list
 | |
| of machine-dependent problems, bugs, and compiler errors encountered
 | |
| in the LAPACK installation process is maintained
 | |
| on \emph{netlib}.
 | |
| \begin{quote}
 | |
| \url{http://www.netlib.org/lapack/release\_notes}
 | |
| \end{quote}
 | |
| 
 | |
| We assume the user has installed the machine-specific routines
 | |
| correctly and that the Level 1, 2 and 3 BLAS test programs have run
 | |
| successfully, so we do not list any warnings associated with those
 | |
| routines.
 | |
| 
 | |
| \section{\texttt{LAPACK/make.inc}}
 | |
| 
 | |
| All machine-specific
 | |
| parameters are specified in the file \texttt{LAPACK/make.inc}.
 | |
| 
 | |
| The first line of this \texttt{make.inc} file is:
 | |
| \begin{quote}
 | |
| SHELL = /bin/sh
 | |
| \end{quote}
 | |
| and will need to be modified to \texttt{SHELL = /sbin/sh} if you are
 | |
| installing LAPACK on an SGI architecture.
 | |
| 
 | |
| \section{ETIME}
 | |
| 
 | |
| On HPPA architectures,
 | |
| the compiler and loader flag \texttt{+U77} should be included to access
 | |
| the function \texttt{ETIME}.
 | |
| 
 | |
| \section{ILAENV and IEEE-754 compliance}
 | |
| 
 | |
| %By default, ILAENV (\texttt{LAPACK/SRC/ilaenv.f}) assumes an IEEE and IEEE-754
 | |
| %compliant architecture, and thus sets (\texttt{ILAENV=1}) for (\texttt{ISPEC=10})
 | |
| %and (\texttt{ISPEC=11}) settings in ILAENV.
 | |
| %
 | |
| %If you are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
 | |
| %as this test inside ILAENV will crash!
 | |
| 
 | |
| As some new routines in LAPACK rely on IEEE-754 compliance,
 | |
| two settings (\texttt{ISPEC=10} and \texttt{ISPEC=11}) have been added to ILAENV
 | |
| (\texttt{LAPACK/SRC/ilaenv.f}) to denote IEEE-754 compliance for NaN and
 | |
| infinity arithmetic, respectively.  By default, ILAENV assumes an IEEE
 | |
| machine, and does a test for IEEE-754 compliance.  \textbf{NOTE:  If you
 | |
| are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
 | |
| as this test inside ILAENV will crash!}
 | |
| 
 | |
| Thus, for non-IEEE machines, the user must hard-code the setting of
 | |
| (\texttt{ILAENV=0}) for (\texttt{ISPEC=10} and \texttt{ISPEC=11}) in the version
 | |
| of \texttt{LAPACK/SRC/ilaenv.f} to be put in
 | |
| the library.  For further details, refer to Section~\ref{testieee}.
 | |
| 
 | |
| Be aware
 | |
| that some IEEE compilers by default do not enforce IEEE-754 compliance, and
 | |
| a compiler flag must be explicitly set by the user.
 | |
| 
 | |
| On SGIs, for example, you must set the \texttt{-OPT:IEEE\_NaN\_inf=ON} compiler
 | |
| flag to enable IEEE-754 compliance.
 | |
| 
 | |
| Finally, the test inside ILAENV that detects IEEE-754 compliance will
 | |
| result in IEEE exceptions for ``Divide by Zero'' and ``Invalid Operation''.
 | |
| Thus, if the user is installing on a machine that issues IEEE exception
 | |
| warning messages (like a Sun SPARCstation), the user can disregard these
 | |
| messages.  To avoid these messages, the user can hard-code the values
 | |
| inside ILAENV as explained in Section~\ref{testieee}.
 | |
| 
 | |
| \section{Lack of \texttt{/tmp} space}
 | |
| 
 | |
| If \texttt{/tmp} space is small (i.e., less than approximately 16 MB) on your
 | |
| architecture, you may run out of space
 | |
| when compiling.  There are a few possible solutions to this problem.
 | |
| \begin{enumerate}
 | |
| \item You can ask your system administrator to increase the size of the
 | |
| \texttt{/tmp} partition.
 | |
| \item You can change the environment variable \texttt{TMPDIR} to point to
 | |
| your home directory for temporary space.  E.g.,
 | |
| \begin{quote}
 | |
| \texttt{setenv TMPDIR /home/userid/}
 | |
| \end{quote}
 | |
| where \texttt{/home/userid/} is the user's home directory.
 | |
| \item If your archive command has an \texttt{l} option, you can change the
 | |
| archive command to \texttt{ar crl} so that the
 | |
| archive command will only place temporary files in the current working
 | |
| directory rather than in the default temporary directory \texttt{/tmp}.
 | |
| \end{enumerate}
 | |
| 
 | |
| \section{BLAS}
 | |
| 
 | |
| If you suspect a BLAS-related problem and you are linking
 | |
| with an optimized version of the BLAS, we would strongly suggest
 | |
| as a first step that you link to the Fortran~77 version of
 | |
| the suspected BLAS routine and see if the error has disappeared.
 | |
| 
 | |
| We have included test programs for the Level 1 BLAS.
 | |
| Users should therefore beware of a common problem in machine-specific
 | |
| implementations of xNRM2,
 | |
| the function to compute the 2-norm of a vector.
 | |
| The Fortran version of xNRM2 avoids underflow or overflow
 | |
| by scaling intermediate results, but some library versions of xNRM2
 | |
| are not so careful about scaling.
 | |
| If xNRM2 is implemented without scaling intermediate results, some of
 | |
| the LAPACK test ratios may be unusually high, or
 | |
| a floating point exception may occur in the problems scaled near
 | |
| underflow or overflow.
 | |
| The solution to these problems is to link the Fortran version of
 | |
| xNRM2 with the test program.  \emph{On some CRAY architectures, the Fortran~77
 | |
| version of xNRM2 should be used.}
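
The scaling idea can be illustrated with a short Fortran~77 sketch.  It is
not the reference BLAS source: the names \texttt{NRMDEM} and \texttt{SNRM2S}
are hypothetical, and the loop shows only one common way of accumulating a
scaled sum of squares.  Assuming IEEE single precision, the test vector is
chosen so that the squares of its entries overflow although its 2-norm,
$5 \times 10^{30}$, does not.
\begin{verbatim}
      PROGRAM NRMDEM
*     Demonstration only: SNRM2S is a hypothetical name, not a
*     routine from the BLAS or LAPACK.
      REAL X( 2 ), SNRM2S
      EXTERNAL SNRM2S
*     Each square overflows in IEEE single precision.
      X( 1 ) = 3.0E+30
      X( 2 ) = 4.0E+30
      WRITE( *, * ) 'scaled 2-norm = ', SNRM2S( 2, X )
      END
*
      REAL FUNCTION SNRM2S( N, X )
*     2-norm of X( 1:N ), accumulated as SCALE*SQRT( SSQ ) so that
*     every squared term is a ratio no larger than one.
      INTEGER N, I
      REAL X( N ), SCALE, SSQ, ABSXI
      SCALE = 0.0E+0
      SSQ = 1.0E+0
      DO 10 I = 1, N
         IF( X( I ).NE.0.0E+0 ) THEN
            ABSXI = ABS( X( I ) )
            IF( SCALE.LT.ABSXI ) THEN
*              New largest entry: rescale the running sum.
               SSQ = 1.0E+0 + SSQ*( SCALE / ABSXI )**2
               SCALE = ABSXI
            ELSE
               SSQ = SSQ + ( ABSXI / SCALE )**2
            END IF
         END IF
   10 CONTINUE
      SNRM2S = SCALE*SQRT( SSQ )
      RETURN
      END
\end{verbatim}
A loop that sums \texttt{X(I)**2} directly would overflow on this input,
which is the kind of behavior that can produce unusually high test ratios
or floating point exceptions with an unscaled library xNRM2.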
 | |
| 
 | |
| \section{Optimization}
 | |
| 
 | |
| If a large number of test failures occur for a specific matrix type
 | |
| or operation, it could be that there is an optimization problem with
 | |
| your compiler.  In that case, try reducing the level of
 | |
| optimization or eliminating optimization entirely for those routines
 | |
| to see if the failures disappear when you rerun the tests.
 | |
| 
 | |
| %LAPACK is written in Fortran 77.  Prospective users with only a
 | |
| %Fortran 66 compiler will not be able to use this package.
 | |
| 
 | |
| \section{Compiling testing/timing drivers}
 | |
| 
 | |
| The testing and timing main programs (xCHKAA, xCHKEE, xTIMAA, and
 | |
| xTIMEE)
 | |
| allocate a large number of local variables.  Therefore, it is vitally
 | |
| important that the user know whether the compiler by default allocates local
 | |
| variables statically or on the stack.  It is not uncommon for those
 | |
| compilers which place local variables on the stack to cause a stack
 | |
| overflow at runtime in the testing or timing process.  The user then
 | |
| has two options:  increase the stack size, or force all local variables
 | |
| to be allocated statically.
 | |
| 
 | |
| On HPPA architectures, the 
 | |
| compiler and loader flag \texttt{-K} should be used when compiling these testing
 | |
| and timing main programs to avoid such a stack overflow.  I.e., set
 | |
| \texttt{DRVOPTS = -K} in the \texttt{LAPACK/make.inc} file.
 | |
| 
 | |
| For similar reasons,
 | |
| on SGI architectures, the compiler and loader flag \texttt{-static} should be
 | |
| used.  I.e., set \texttt{DRVOPTS = -static} in the \texttt{LAPACK/make.inc} file.
 | |
| 
 | |
| \section{IEEE arithmetic}
 | |
| 
 | |
| Some of our test matrices are scaled near overflow or underflow,
 | |
| but on the Crays, problems with the arithmetic near overflow and
 | |
| underflow forced us to scale by only the square root of overflow
 | |
| and underflow.
 | |
| The LAPACK auxiliary routine SLABAD (or DLABAD) is called to
 | |
| take the square root of underflow and overflow in cases where it
 | |
| could cause difficulties.
 | |
| We assume we are on a Cray if $ \log_{10} (\mathrm{overflow})$
 | |
| is greater than 2000
 | |
| and take the square root of underflow and overflow in this case.
 | |
| The test in SLABAD is as follows:
 | |
| \begin{verbatim}
 | |
|       IF( LOG10( LARGE ).GT.2000. ) THEN
 | |
|          SMALL = SQRT( SMALL )
 | |
|          LARGE = SQRT( LARGE )
 | |
|       END IF
 | |
| \end{verbatim}
 | |
| Users of other machines with similar restrictions on the effective
 | |
| range of usable numbers may have to modify this test so that the
 | |
| square roots are done on their machine as well.  \emph{Usually on
 | |
| HPPA architectures, a similar restriction in SLABAD should be enforced
 | |
| for all testing involving complex arithmetic.}
 | |
| SLABAD is located in \texttt{LAPACK/SRC}.
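
For a machine with a similarly restricted range, one possible modification
is to take the square roots unconditionally.  The following is only a
sketch under that assumption: the name \texttt{SLABDX} is hypothetical, and
in practice the change would be made to the test inside SLABAD (and DLABAD)
itself.
\begin{verbatim}
      SUBROUTINE SLABDX( SMALL, LARGE )
*     Hypothetical variant of SLABAD for a machine whose usable
*     range is narrower than its representable range.  The square
*     roots are always taken, instead of only when
*     LOG10( LARGE ) exceeds 2000.
      REAL SMALL, LARGE
      SMALL = SQRT( SMALL )
      LARGE = SQRT( LARGE )
      RETURN
      END
\end{verbatim}
A less drastic alternative is to keep the conditional form and lower the
threshold \texttt{2000.} to a value suited to the machine in question.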
 | |
| 
 | |
| For machines which have a narrow exponent range or lack gradual
 | |
| underflow (DEC VAXes for example), it is not uncommon to experience
 | |
| failures in sec.out and/or dec.out with SLAQTR/DLAQTR or DTRSYL.
 | |
| The failures in SLAQTR/DLAQTR and DTRSYL
 | |
| occur with test problems which are very badly scaled when the norm of
 | |
| the solution is very close to the underflow
 | |
| threshold (or even underflows to zero).  We believe that these failures
 | |
| could probably be avoided by an even greater degree of care in scaling,
 | |
| but we did not want to delay the release of LAPACK any further.  These
 | |
| tests pass successfully on most other machines.  An example failure in
 | |
| dec.out on a MicroVAX II looks like the following:
 | |
| 
 | |
| \begin{verbatim}
 | |
| Tests of the Nonsymmetric eigenproblem condition estimation routines
 | |
| DLALN2, DLASY2, DLANV2, DLAEXC, DTRSYL, DTREXC, DTRSNA, DTRSEN, DLAQTR
 | |
| 
 | |
| Relative machine precision (EPS) =     0.277556D-16
 | |
| Safe minimum (SFMIN)             =     0.587747D-38
 | |
| 
 | |
| Routines pass computational tests if test ratio is less than   20.00
 | |
| 
 | |
| DEC routines passed the tests of the error exits ( 35 tests done)
 | |
| Error in DTRSYL: RMAX =   0.155D+07
 | |
| LMAX =     5323 NINFO=    1600 KNT=   27648
 | |
| Error in DLAQTR: RMAX =   0.344D+04
 | |
| LMAX =    15792 NINFO=   26720 KNT=   45000
 | |
| \end{verbatim}
 | |
| 
 | |
| \section{Timing programs}
 | |
| 
 | |
| In the eigensystem timing program, calls are made to the LINPACK
 | |
| and EISPACK equivalents of the LAPACK routines to allow a direct
 | |
| comparison of performance measures.
 | |
| In some cases we have increased the maximum number of
 | |
| iterations in the LINPACK and EISPACK routines to allow
 | |
| them to converge for our test problems, but
 | |
| even this may not be enough.
 | |
| One goal of the LAPACK project is to improve the convergence
 | |
| properties of these routines, so error messages in the output
 | |
| file indicating that a LINPACK or EISPACK routine did not
 | |
| converge should not be regarded with alarm.
 | |
| 
 | |
| In the eigensystem timing program, we have equivalenced some work
 | |
| arrays and then passed them to a subroutine, where both arrays are
 | |
| modified.  This is a violation of the Fortran~77 standard, which
 | |
| says ``if a subprogram reference causes a dummy argument in the
 | |
| referenced subprogram to become associated with another dummy
 | |
| argument in the referenced subprogram, neither dummy argument may
 | |
| become defined during execution of the subprogram.''
 | |
| \footnote{ ANSI X3.9-1978, sec. 15.9.3.6}
 | |
| If this causes any difficulties, the equivalence
 | |
| can be commented out as explained in the comments for the main
 | |
| eigensystem timing programs.
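
The pattern in question can be sketched as follows.  This is an
illustration only, not code taken from the timing programs; the names
\texttt{ALIAS}, \texttt{FILL}, \texttt{WORK1}, and \texttt{WORK2} are
hypothetical.  The program is deliberately nonconforming, so the standard
does not define its result, although many compilers accept it.
\begin{verbatim}
      PROGRAM ALIAS
      REAL A( 4 ), B( 4 )
*     A and B occupy the same storage.
      EQUIVALENCE ( A( 1 ), B( 1 ) )
*     Both actual arguments refer to that storage, so the two
*     dummy arguments of FILL become associated with each other.
      CALL FILL( 4, A, B )
      WRITE( *, * ) A
      END
*
      SUBROUTINE FILL( N, WORK1, WORK2 )
      INTEGER N, I
      REAL WORK1( N ), WORK2( N )
      DO 10 I = 1, N
*        Defining both associated dummy arguments is what the
*        quoted passage of the standard prohibits.
         WORK1( I ) = 1.0E+0
         WORK2( I ) = 2.0E+0
   10 CONTINUE
      RETURN
      END
\end{verbatim}
Removing the \texttt{EQUIVALENCE} statement gives \texttt{A} and \texttt{B}
separate storage and makes the call conforming, at the cost of extra
workspace.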
 | |
| 
 | |
| %\section*{MACHINE-SPECIFIC DIFFICULTIES}
 | |
| %Some IBM compilers do not recognize DBLE as a generic function as used
 | |
| %in LAPACK.  The software tools we use to convert from single precision
 | |
| %to double precision convert REAL(C) and AIMAG(C), where C is COMPLEX,
 | |
| %to DBLE(Z) and DIMAG(Z), where Z is COMPLEX*16, but
 | |
| %IBM compilers use DREAL(Z) and DIMAG(Z) to take the real and
 | |
| %imaginary parts of a double complex number.
 | |
| %IBM users can fix this problem by changing DBLE to DREAL when the
 | |
| %argument of DBLE is COMPLEX*16.
 | |
| %
 | |
| %IBM compilers do not permit the data type COMPLEX*16 in a FUNCTION
 | |
| %subprogram definition.  The data type on the first line of the
 | |
| %function subprogram must be changed from COMPLEX*16 to DOUBLE COMPLEX
 | |
| %for the following functions:
 | |
| %
 | |
| %\begin{tabbing}
 | |
| %\dent ZLATMOO \= from the test matrix generator library \kill
 | |
| %\dent ZBEG \> from the Level 2 BLAS test program  \\
 | |
| %\dent ZBEG \> from the Level 3 BLAS test program  \\
 | |
| %\dent ZLADIV \> from the LAPACK library \\
 | |
| %\dent ZLARND \> from the test matrix generator library \\
 | |
| %\dent ZLATM2 \> from the test matrix generator library \\
 | |
| %\dent ZLATM3 \> from the test matrix generator library
 | |
| %\end{tabbing}
 | |
| %The functions ZDOTC and ZDOTU from the Level 1 BLAS are already
 | |
| %declared DOUBLE COMPLEX.  If that doesn't work, try the declaration
 | |
| %COMPLEX FUNCTION*16.
 | |
| 
 | |
| 
 | |
| \newpage
 | |
| \addcontentsline{toc}{section}{Bibliography}
 | |
| 
 | |
| \begin{thebibliography}{9}
 | |
| 
 | |
| \bibitem{LUG}
 | |
| E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra,
 | |
| J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney,
 | |
| S. Ostrouchov, and D. Sorensen, 
 | |
| \textit{LAPACK Users' Guide}, Second Edition,
 | |
| {SIAM}, Philadelphia, PA, 1995.
 | |
| 
 | |
| \bibitem{WN16}
 | |
| E. Anderson and J. Dongarra,
 | |
| \textit{LAPACK Working Note 16: 
 | |
| Results from the Initial Release of LAPACK},
 | |
| University of Tennessee, CS-89-89, November 1989.
 | |
| 
 | |
| \bibitem{WN41}
 | |
| E. Anderson, J. Dongarra, and S. Ostrouchov,
 | |
| \textit{LAPACK Working Note 41: 
 | |
| Installation Guide for LAPACK},
 | |
| University of Tennessee, CS-92-151, February 1992 (revised June 1999).
 | |
| 
 | |
| \bibitem{WN5}
 | |
| C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum,
 | |
| S. Hammarling, and D. Sorensen,
 | |
| \textit{LAPACK Working Note \#5:  Provisional Contents},
 | |
| Argonne National Laboratory, ANL-88-38, September 1988.
 | |
| 
 | |
| \bibitem{WN13}
 | |
| Z. Bai, J. Demmel, and A. McKenney,
 | |
| \textit{LAPACK Working Note \#13: On the Conditioning of the Nonsymmetric
 | |
| Eigenvalue Problem:  Theory and Software}, 
 | |
| University of Tennessee, CS-89-86, October 1989.
 | |
| 
 | |
| \bibitem{XBLAS}
 | |
| X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar,
 | |
| W. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung,
 | |
| and D. J. Yoo, \textit{Design, implementation and testing of extended
 | |
|   and mixed precision BLAS},
 | |
| \textit{ACM Trans. Math. Soft.}, 28, 2:152--205, June 2002.
 | |
|  
 | |
| \bibitem{BLAS3}
 | |
| J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling,
 | |
| ``A Set of Level 3 Basic Linear Algebra Subprograms,''
 | |
| \textit{ACM Trans. Math. Soft.}, 16, 1:1--17, March 1990.
 | |
| %Argonne National Laboratory, ANL-MCS-P88-1, August 1988.
 | |
| 
 | |
| \bibitem{BLAS3-test}
 | |
| J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling,
 | |
| ``A Set of Level 3 Basic Linear Algebra Subprograms:
 | |
| Model Implementation and Test Programs,''
 | |
| \textit{ACM Trans. Math. Soft.}, 16, 1:18--28, March 1990.
 | |
| %Argonne National Laboratory, ANL-MCS-TM-119, June 1988.
 | |
| 
 | |
| \bibitem{BLAS2}
 | |
| J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson,
 | |
| ``An Extended Set of Fortran Basic Linear Algebra Subprograms,''
 | |
| \textit{ACM Trans. Math. Soft.}, 14, 1:1--17, March 1988.
 | |
| 
 | |
| \bibitem{BLAS2-test}
 | |
| J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson,
 | |
| ``An Extended Set of Fortran Basic Linear Algebra Subprograms:
 | |
| Model Implementation and Test Programs,''
 | |
| \textit{ACM Trans. Math. Soft.}, 14, 1:18--32, March 1988.
 | |
| 
 | |
| \bibitem{BLAS1}
 | |
| C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh,
 | |
| ``Basic Linear Algebra Subprograms for Fortran Usage,''
 | |
| \textit{ACM Trans. Math. Soft.}, 5, 3:308--323, September 1979.
 | |
| 
 | |
| \end{thebibliography}
 | |
| 
 | |
| \end{document}
 |