Saturday, November 14, 2009

Notes Intel Fortran


NotesIntelFortran
=====================

Contents
=========
msvcrtd.dll issue
dumpbin, editbin, stack
Fortran routines for DLL
Example Fortran routines for DLL called by C#
Fortran C# wrappers and data compatibility
Setup IMSL
Setup MKL
Setup - Intel Fortran / IMSL Environment Variables
Intel Fortran 10.1 and VS. Net 2005
Intel Fortran 11.0
Managed Code
BLAS, IMSL and MKL
Building DLLs (Fortran DLLs used in Fortran apps)
!DEC$ ATTRIBUTES directives
Passing module variables and functions in DLL
Best Practice
Errors - Debugging
How to Add Version and other Metadata to DLL or EXE
Using VTune
Compiler Options
Build Macros (eg $(OUTDIR))
Using MKL
Using LAPACK95 & General Comment on DLLs, LIBs, Mod files
Mixed language programming
Stack Checking
Enable Vectorization and Report
Enable OpenMP
Using Thread Profiler
Using Thread Checker
Profile Guided Optimization
Using code coverage


msvcrtd.dll issue
====================

***************
libmmdd.dll is dependent on msvcrtd.dll which is no longer distributed.

Symptom(s):
Note: This only applies to the compilers for Intel® Extended Memory 64 Technology and for the Itanium® Architecture.
Applications or DLLs that are built with /MDd or that link directly against Intel's libmmdd.dll may emit this runtime error:
"This application has failed to start because msvcrtd.dll was not found. Re-installing the application may fix this problem."

Cause:
The Platform SDK distributed with Microsoft* Visual Studio* 2005 does not contain msvcrtd.dll. Using /MDd links against the Intel math library libmmdd.dll which has a dependency on msvcrtd.dll.

Solution:
This is a known issue that may be resolved in a future product release. As a work-around, use the msvcrtd.dll distributed with the Microsoft* Platform SDK available at http://www.microsoft.com/downloads/details.aspx?FamilyId=0BAF2B35-C656-4969-ACE8-E4C0C0716ADB&displaylang=en
***************

You may need to obtain msvcrtd.dll from somewhere and put it into "c:\windows\system32".




dumpbin, editbin, stack
========================
To run these command line tools,
- go to "Start" button -> "All Programs"
  -> "Intel Software Development Tools"
  -> "Intel Compiler 8.0"
  -> "Build Environment for Fortran IA-32 Applications"

To check the stack size of a program:
Run "dumpbin /headers executable_file", and you can see the "size of stack reserve" information in "optional header values".

To enlarge the stack of a program:
Run "editbin /STACK:reserve[,commit] program.exe", where reserve is the new stack size in bytes.



Alternatively
http://www.atalasoft.com/cs/blogs/rickm/archive/2008/04/22/increasing-the-size-of-your-stack-net-memory-management-part-3.aspx
The Easiest Way ( .NET 2.0 )
In .NET 2.0 and newer you can simply specify the stack size in a thread’s constructor. Unfortunately, this method is only compatible with Windows XP and newer operating systems. On older platforms you can still specify this parameter, but it will have no effect; the stack size in the binary header will be used.
using System.Threading;

Thread T = new Thread(threadDelegate, stackSizeInBytes);
T.Start();


Fortran routines for DLL
=========================

  interface
    subroutine my_sub(i)
      !DEC$ ATTRIBUTES C, ALIAS:"_My_Sub" :: my_sub
      integer i
    end subroutine
  end interface


- Case sensitivity: Fortran is not case sensitive, C/C++ is.
- Arrays are always passed by reference.
- ATTRIBUTES for a routine may be: C, STDCALL, REFERENCE, VARYING
- ATTRIBUTES for an argument may be: VALUE, REFERENCE
- C or STDCALL makes all arguments pass by value, except arrays.
- The VALUE or REFERENCE argument options override the routine option
  of C or STDCALL.
- On IA-32 systems, the routine name needs a leading underscore to be called from C.
- Internal procedures cannot be called from outside the program unit that contains them.
- To pass a variable number of arguments, use C and VARYING, not STDCALL.


Example Fortran routines for DLL called by C#
================================================
! Public wrapper for status_msg_get_code
integer*4 pure function StatusMsgGetCode(msg)
!DEC$ ATTRIBUTES DLLEXPORT, STDCALL, ALIAS:'_StatusMsgGetCode' :: StatusMsgGetCode
!DEC$ ATTRIBUTES REFERENCE :: msg
    StatusMsgGetCode = status_msg_get_code(msg)
end function StatusMsgGetCode


- uses STDCALL (which cannot handle optional arguments; see the Intel Fortran User Guide)
- uses an alias with a leading underscore
- wraps and renames the code to get rid of underscores in the function name, eg.
    status_msg_get_code  --> StatusMsgGetCode
- uses REFERENCE to pass arguments

Fortran C# wrappers and data compatibility
=============================================
This is best illustrated by example:

Fortran function in Fort.dll:
    subroutine Foo_dll()
    !DEC$ ATTRIBUTES DLLEXPORT, STDCALL, ALIAS:'_Foo_dll'  :: Foo_dll
    !DEC$ ATTRIBUTES REFERENCE ::
.........
    end subroutine

C# declaration
#if x64
        [DllImport("Fort.dll", EntryPoint = "_Foo_dll")]
#else
        [DllImport("Fort.dll", EntryPoint = "Foo_dll")]
#endif
        private static extern void Foo_dll();

public static void CallFoo_dll()
        {
........
Foo_dll();
        }

The table below lists the Fortran to C# data type declarations, with X as the variable name:
Fortran          C# extern declaration     C# wrapper parameter   C# call argument
integer(4) X     [In, Out] ref int X       ref int X              ref X
real(8)    X     [In, Out] ref double X    ref double X           ref X
real(8)    X(N)  [In, Out] double[] X      ref double[] X         X


Setup IMSL
===========
Documentation -
1. Start -> Programs -> IMSL Fortran Library 5.0. This contains:
QuickStart, Readme, User's Guide
2. PDF docs contains
Math Library V1, V2, Statistical Libraries, Special Functions

IMSL is not Thread safe. It is still safe to use, provided that calls to the
IMSL routines are made from a single thread.

VS.Net integration
1. In VS.Net, goto Tools -> Options -> Intel(R) Fortran -> Project Directories ->
type in the Include and Libraries directory path.
2. Specify the following include statements;
   include 'link_f90_static.h'
   include 'link_f90_dll.h'
   include 'link_f90_static_smp.h'
   include 'link_f90_dll_smp.h'
or go to Projects -> Add Existing Item ... browse to add the library.

The link*.h files contain directives that name the import libraries (*.lib) of the corresponding
DLLs. For example, the contents of link_f90_dll.h are:
!dec$objcomment lib:'imsl_dll.lib'
!dec$objcomment lib:'imslscalar_dll.lib'
!dec$objcomment lib:'imslblas_dll.lib'

3. Inside the code, in addition to the include directives in step 2, some USE statements
are needed. For example, to use the random number generator rnun, we need one of:
i) use rnun_int; or
ii) use imsl_libraries; or
iii) use numerical_libraries

Option iii provides backward compatibility with previous IMSL libraries and the Fortran77
version of the library. It may not be necessary to use iii; calling the functions as before
will continue to work.

Option ii provides access to all the IMSL functions, so individual use statements are not needed.
However, some may choose to use i because it shows explicitly which functions are called.

Using BLAS
1. Intel MKL Blas library used automatically when IMSL is linked with the
SMP (ie. multiprocessing) option.
2. See ia32 or ia64 Readme to link 3rd party blas with IMSL.

IMSL version 6.0
- IMSL is now THREAD SAFE
- Env Var - run ia32\bin\fnlsetup.bat .
- MUST remove old references, eg. include 'link_f90_dll.h'   (because the new headers have different names)
- MUST rename the directory of older installations of IMSL, so that any old env vars cannot
  accidentally point to it.
- Add an include statement in the relevant source files:
     include 'link_fnl_shared.h'       ! for dynamic dlls
     include 'link_fnl_shared_hpc.h'   ! for dynamic dlls and SMP (OpenMP)
- Add include directory in VS.Net
    Project - Properties - Fortran - Include Directories: $(FNL_DIR)\ia32\include\dll
- Add library directory in VS.Net
    Project - Properties - Fortran - Library Directories: $(FNL_DIR)\ia32\lib
- Run the ASSURANCE tests provided by IMSL in ...\examples\eiat. Note that
  run_test.bat needs to use %LINK_FNL_STATIC%


Setup MKL
==========
Linking to MKL can be done either statically *.lib or dynamically *.dll

For ia32 apps, when linking statically, link to mkl_c.lib or mkl_s.lib
For ia32 apps, when linking dynamically, link to these STATIC libs:
   mkl_c_dll.lib or mkl_s_dll.lib
that will provide interfaces to the correct DLLs.

For MKL v 10.0
- Major changes, MKL divided into layers: Interface, Threading, Computation, RTL.
- Support for 64bit via ILP64/LP64.
- Use of OpenMP for threading, and of MPI and ScaLAPACK for distributed computing.
- Env Vars: The following variables would have been set by running tool/environment/mklvars32.bat
  $(MKLPATH) = root location of MKL directory - eg D:\programs\Intel\MKL\10.0...
  lib=$(MKLPATH)\ia32\lib
  include=$(MKLPATH)\include
  bin=$(MKLPATH)\ia32\bin
  LIBRARY_PATH=$(MKLPATH)\ia32\lib
  CPATH=$(MKLPATH)\include
  FPATH=$(MKLPATH)\include
- Visual Studio config
  Project -> Properties -> Linker -> General -> Additional Library Directories
       $(MKLPATH10)\ia32\lib
  Project -> Properties -> Fortran -> General -> Additional Include Directories
       $(MKLPATH10)\include
       $(MKLPATH10)\interfaces\lapack95

- Linking
  Intel advises linking libguide and libiomp dynamically even if other libraries are
  linked statically. (MKL user guide, Chap 5)
  Link items to consider:
  Interface:           Threading:              Computation:      RTL:    Description:
  mkl_intel_c_dll.lib  mkl_sequential_dll.lib  mkl_core_dll.lib  (none)  Dynamic, non-parallel, 32bit

  Actual linking is done in code by using !dec$ directives such as:
        !dec$objcomment lib:'mkl_intel_c_dll.lib'
        !dec$objcomment lib:'mkl_sequential_dll.lib'
        !dec$objcomment lib:'mkl_core_dll.lib'
        !dec$objcomment lib:'mkl_lapack95.lib'

- To use THREADED / PARALLEL / OPENMP Intel MKL, it is highly recommended to compile your code with the /MT
option. The compiler driver passes the option to the linker, which then loads the
multi-thread (MT) run-time libraries.
- For multi-threading based on Intel OpenMP
  Interface:    lib\mkl_intel_c_dll.lib
  Threading:    lib\mkl_intel_thread_dll.lib, bin\mkl_intel_thread.dll
  Computation:  lib\mkl_core_dll.lib, (plus many DLLs in bin)
  RTL:          lib\libguide40.lib OR lib\libiomp5md.lib,
                bin\libguide40.dll OR bin\libiomp5md.dll









Setup - Intel Fortran / IMSL Environment Variables
=====================================================
user defined:
INCLUDE
C:\Program Files\VNI\CTT5.0\CTT5.0\INCLUDE\IA32;C:\Program Files\Microsoft Visual Studio .NET 2003\SDK\v1.1\include\
LIB
C:\Program Files\VNI\CTT5.0\CTT5.0\LIB\IA32;C:\Program Files\Microsoft Visual Studio .NET 2003\SDK\v1.1\Lib\
PATH
C:\Program Files\VNI\CTT5.0\CTT5.0\LIB\IA32;%PATH%;d:\DATA\UsercheeOnD\tools\NixTools\bin;%MSNET_C%\bin;C:\Program Files\Microsoft Visual Studio .NET 2003\Common7\IDE;C:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\bin;%PStill%;D:\Program\UnderstandF90\bin\pc-win95

system variables:
CTT_DIR
C:\Program Files\VNI\CTT5.0\CTT5.0\LIB\IA32;%PATH%;d:\DATA\UsercheeOnD\tools\NixTools\bin;%MSNET_C%\bin;C:\Program Files\Microsoft Visual Studio .NET 2003\Common7\IDE;C:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\bin;%PStill%;D:\Program\UnderstandF90\bin\pc-win95
CTT_EXAMPLES
"C:\Program Files\VNI\CTT5.0\CTT5.0\examples\IA32"
CTT_FORTRAN_COMPILER
Intel(R) Fortran Compiler for 32-bit applications, Version 8.1
CTT_OS_VERSION
Microsoft Windows XP/2000/2003
F90
ifort
F90FLAGS
/w /I:"C:\Program Files\VNI\CTT5.0\CTT5.0\include\IA32" /fpe:3 /nologo
FC
ifort
FFLAGS
/w /I:"C:\Program Files\VNI\CTT5.0\CTT5.0\include\IA32" /fpe:3 /nologo
FP_NO_HOST_CHECK
NO
INCLUDE
C:\Program Files\VNI\CTT5.0\CTT5.0\INCLUDE\IA32;%INTEL_FORTRAN80%\ia32\include;C:\Program Files\Microsoft Visual Studio .NET 2003\SDK\v1.1\include\
INCLUDE_DIR
"C:\Program Files\VNI\CTT5.0\CTT5.0\include\IA32"
INTEL_FORTRAN80
C:\Program Files\Intel\Fortran\Compiler80
INTEL_LICENSE_FILE
C:\Program Files\Common Files\Intel\Licenses
KMP_DUPLICATE_LIB_OK
TRUE
LIB
C:\Program Files\VNI\CTT5.0\CTT5.0\LIB\IA32;%INTEL_FORTRAN80%\ia32\lib;C:\Program Files\Microsoft Visual Studio .NET 2003\SDK\v1.1\Lib\
LIB_ARCH
IA32
LINK_F90
imsl_dll.lib imslscalar_dll.lib imslblas_dll.lib
LINK_F90_DLL
imsl_dll.lib imslscalar_dll.lib imslblas_dll.lib
LINK_F90_DLL_SMP
/Qopenmp /F6000000 /fpp imsl_dll.lib imslsmp_dll.lib mkl_c_dll.lib /link /nodefaultlib:libc.lib
LINK_F90_SMP
/Qopenmp /F6000000 /fpp imsl_dll.lib imslsmp_dll.lib mkl_c_dll.lib /link /nodefaultlib:libc.lib
LINK_F90_STATIC
imsl.lib imslscalar.lib imslblas.lib imsls_err.lib
LINK_F90_STATIC_SMP
/Qopenmp /F6000000 /fpp imsl.lib imslsmp.lib mkl_c_dll.lib imsls_err.lib /link /nodefaultlib:libc.lib
Path
d:\Program\Intel\VTune\CGGlbCache;d:\Program\Intel\VTune\Analyzer\Bin;d:\Program\Intel\VTune\Shared\Bin;C:\Program Files\PC Connectivity Solution\;c:\program files\vni\ctt5.0\ctt5.0\lib\ia32;%systemroot%\system32;%systemroot%;%systemroot%\system32\wbem;c:\program files\ibm\trace facility;c:\program files\personal communications;c:\program files\ati technologies\ati control panel;c:\program files\common files\adaptec shared\system;c:\program files\ibm\trace facility\;c:\program files\intel\fortran\idb80\bin;%intel_fortran80%\ia32\bin;c:\program files\host integration server\system;c:\program files\ibm\personal communications\;c:\progra~1\ca\shared~1\scanen~1;c:\program files\ca\sharedcomponents\scanengine;c:\program files\ca\sharedcomponents\caupdate\;c:\program files\ca\sharedcomponents\thirdparty\;c:\program files\ca\sharedcomponents\subscriptionlicense\;c:\progra~1\ca\etrust~1;C:\Program Files\MATLAB\R2007a\bin;C:\Program Files\MATLAB\R2007a\bin\win32;C:\Program Files\Common Files\Roxio Shared\DLLShared\;C:\Program Files\Microsoft Visual Studio .NET 2003\Common7\IDE

VNI_DIR
C:\Program Files\VNI\CTT5.0\CTT5.0\..
VNI_F90_MSG
C:\Program Files\VNI\CTT5.0\CTT5.0\BIN\IA32


Intel Fortran 10.1 and VS. Net 2005
=====================================
Manually add this to SYSTEM VARIABLE -> Path from Control Panel

D:\Program\VNI\imsl\fnl600\IA32\LIB;
C:\Program files\MPICH2\bin;
D:\Program\Intel\Compiler\Fortran\10.1.013\Ia32\Bin;
C:\Program Files\Common Files\Intel\Shared Files\Ia32\Bin;
D:\Program Files\Microsoft Visual Studio 8\Common7\IDE;
D:\Program Files\Microsoft Visual Studio 8\VC\BIN;
D:\Program Files\Microsoft Visual Studio 8\Common7\Tools;
D:\Program Files\Microsoft Visual Studio 8\Common7\Tools\bin;
D:\Program Files\Microsoft Visual Studio 8\VC\PlatformSDK\bin;


Manually add this to SYSTEM VARIABLE -> Lib from Control Panel
C:\Program files\MPICH2\LIB;%IFORT_COMPILER10%Ia32\Lib;%MSVS8%\VC\atlmfc\lib;%MSVS8%\VC\lib;%MSVS8%\VC\PlatformSDK\lib;%FNL_DIR%\IA32\lib;



Intel Fortran 11.0
===================
1. New: Floating Point Model, some are not compatible with Floating Point Speculation
2. New: OpenMP 3.0 standard included
3. New: Fortran 2003 features included
4. Some functions may fail -> use macro like CBAEXPMODTEST=1 to mark out certain things.
5. See Fortran User / Ref Guide -> Building Apps -> Using Libraries -> Using IMSL
6. IMSL Readme.txt -> KAPMR does not behave in thread safe manner.
        Use OpenMP critical region around KAPMR to be safe.









Managed Code
=============
Mixed-Language Programming and Intel Visual Fortran Project Types
This version of Intel Visual Fortran produces only unmanaged code, which is architecture-specific
code. You cannot create an Intel Visual Fortran main program that directly calls a
subprogram implementing managed code. To call managed code, you can call an unmanaged
code subprogram in a different language that does support calling managed code.


BLAS, IMSL and MKL
===================
BLAS is implemented by IMSL - details are found in Chapter 9: Basic Matrix/Vector Operations.
BLAS is also implemented by the hardware vendor - in this case Intel - in Intel's MKL library,
which may be written in machine code.

The BLAS API, i.e. the calling convention of the routines, is the same whether it is
implemented by MKL or IMSL. For example, SDOT is the routine that finds the dot product of two
vectors.
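
As a minimal sketch of that shared calling convention, the program below calls SDOT through the standard BLAS interface sdot(n, x, incx, y, incy). The program name and the vector values are made up for illustration. It only links when a BLAS library (IMSL's or MKL's, as described in this section) is supplied, and the source stays the same whichever one is chosen.

```fortran
program dot_demo
  implicit none
  ! SDOT is the single-precision BLAS dot product function.
  real, external :: sdot
  real :: x(3), y(3), d

  ! Illustrative data only.
  x = (/ 1.0, 2.0, 3.0 /)
  y = (/ 4.0, 5.0, 6.0 /)

  ! A stride of 1 visits every element of each vector.
  d = sdot(3, x, 1, y, 1)

  ! 1*4 + 2*5 + 3*6 = 32
  print *, 'dot product =', d
end program dot_demo
```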

To use a different implementation, the program has to link with different libraries.
For IMSL: imslblas_dll.dll
For MKL: mkl_p4.dll

By default, link_f90_dll.h includes IMSL's BLAS (see section "Setup IMSL").
By default, link_f90_dll_smp.h includes MKL's BLAS (see section "Setup IMSL").

If we want to use MKL without the SMP (parallel processing) feature, then instead of using
link_f90_dll.h, we have to manually add the directives and point to the correct BLAS, eg:

!dec$objcomment lib:'imsl_dll.lib'
!dec$objcomment lib:'imslscalar_dll.lib'
!dec$objcomment lib:'mkl_ia32.lib'

The DLL (*.dll) can be placed anywhere the system knows of, eg:
c:\windows\system32\ mkl_def.dll, mkl_p3.dll, mkl_p4.dll
(IMSL provides these 3 dlls from the MKL package)

The mkl_ia32.lib contains STATIC INTERFACES to dlls including BLAS, cblas, FFTs, VML.
However, there is no corresponding single mkl_ia32.dll. Instead the code is spread over a few DLLs,
such as mkl_def.dll, mkl_vml_def.dll, mkl_lapack32.dll, etc.

If a function (eg vsinv from the VML package of MKL) is included in the library mkl_ia32.lib,
but the dll does not exist, then the code WILL COMPILE. But at runtime, a fatal error
will occur because the dll cannot be found and used.

NOTE: the IMSL dlls and libs are installed in
C:\Program Files\VNI\CTT5.0\CTT5.0\lib\IA32



Building DLLs (Fortran DLLs used in Fortran apps)
=================================================

Note:
When a DLL is built, the output is two files:
1) *.dll - has the library's executable code
2) *.lib - the import library providing the interface between the program and the dll.

These notes present two cases:
Case A: The DLL is created in its own separate VS solution, called solnA, in project projA.
        The two generated outputs will be projA.dll and projA.lib
Case B: The DLL is created in a project (projB) in the same solution (solnB) as the
        application project (projC).
        (The application project contains the code that uses the DLL.)
        The two generated outputs will be projB.dll and projB.lib


1. Build DLL project in its own solution
- Say we call this Solution solnFoo, and Project projFoo

Case A:
- From VS.Net - in new solution, create a new DLL project by:
  File -> New -> Project -> Intel Fortran projects -> Dynamic link library

Case B:
- From VS.Net - in existing solution, create a new DLL project by:
  File -> New -> Project -> Intel Fortran projects -> Dynamic link library


2. Write a subroutine and expose it, eg:
subroutine hello()
  !DEC$ ATTRIBUTES DLLEXPORT, STDCALL, ALIAS:'_hello' :: hello
    (do blah blah)
end subroutine hello

- put this subroutine by itself into a file (eg. hello.f90) or into
a module (eg hello_mod.f90)

- DLLEXPORT is needed to expose the name of the routine
- the alias is needed for compatibility with the Intel Fortran and VS.NET environment


3. Build the DLL in VS.NET by:
- Build (menu) -> Build or Build Solution
- Copy the *.lib and *.dll files and put them into same directory as the
executable code for the application; i.e. same directory as projC.exe

4. Link the DLL via the lib file by:
- Go to the application project's "program" file or "module" file and put this near the
start of the file:
CASE A:       !dec$objcomment lib:'projA.lib'
CASE B:       !dec$objcomment lib:'projB.lib'

CASE B only:
- From the Project menu or by right clicking on the project, go to "Project Dependencies..."
  and ensure that the dependency, eg projB, is UNchecked in the Project Dependencies dialog box of projC.
- In the Solution Explorer in VS.NET, click on the application's
project name, eg projC.
- From the Project menu or by right clicking on the project, go to "Add existing item ..."
- Browse and choose "projB.lib" to add. The lib file should appear under Solution Explorer.
- An alternative to the "Add Existing item..." way is to specify the location through the linker:
  with the project name highlighted, go to Project menu -> Properties -> Linker
  -> "Additional Library Directories" -> type in the dir path where the *.lib is located.

5. Add an interface to the DLL routine in the application.
- go into the subroutine of the application that calls the DLL routine and add the following:

module projC_app
    contains
    subroutine app()
        interface
            subroutine hello()
            !DEC$ ATTRIBUTES DLLIMPORT, STDCALL, ALIAS:'_hello' :: hello
            end subroutine hello
        end interface
        call hello()
    end subroutine
end module

- DO NOT ADD the interface at the top level, eg DO NOT add it in the starting part of a module. Instead
add the interface inside the module's subroutine that makes the call to the DLL routine.

- compile and run. Ensure that the build mode is RELEASE, not DEBUG.
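
Steps 2 to 5 can be put together in a minimal two-file sketch for Case A. The file names and the printed text are assumptions for illustration. The calling side uses DLLIMPORT, the counterpart attribute to DLLEXPORT for a routine that lives in another DLL. The two parts are meant to be built in their separate projects, not compiled into one executable.

```fortran
! ---- hello.f90 : DLL project (projA), builds projA.dll and projA.lib ----
subroutine hello()
  !DEC$ ATTRIBUTES DLLEXPORT, STDCALL, ALIAS:'_hello' :: hello
  implicit none
  print *, 'hello from the DLL'
end subroutine hello

! ---- app.f90 : application project (projC) ----
program app
  ! Ask the linker to search the DLL's import library.
  !dec$objcomment lib:'projA.lib'
  implicit none
  ! The interface must repeat the calling convention and alias
  ! that the DLL routine was exported with.
  interface
    subroutine hello()
      !DEC$ ATTRIBUTES DLLIMPORT, STDCALL, ALIAS:'_hello' :: hello
    end subroutine hello
  end interface

  call hello()
end program app
```

At run time projA.dll has to be in the same directory as the executable, or somewhere else on the search path.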


!DEC$ ATTRIBUTES directives
============================
1. C vs STDCALL - these control which side cleans up the stack of passed arguments.
- both of these make arguments pass by value, rather than the Fortran default of
passing by reference.
- arrays are always passed by reference
- C -> the calling routine cleans up the stack. This gives larger code.
- C -> it is possible to pass a variable number of arguments; MUST use "C, VARYING" to let
  Fortran know that a variable number of arguments is passed.
- C -> is the default in C/C++ code. To use it with Fortran code, either
  i) change the C code to STDCALL; or
     extern void __stdcall foo_f90(int n);
  ii) change the f90 code to use the C convention
     !DEC$ ATTRIBUTES C :: foo_f90
- STDCALL -> the called routine cleans up the stack.

2. VALUE vs REFERENCE
- in Fortran, C or STDCALL changes the default to passing by value, except for arrays, which are
  still passed by reference
- But each argument of the subroutine can be declared with VALUE or REFERENCE to override the
  default mode, eg:
     subroutine foo(a, b)
     !DEC$ ATTRIBUTES VALUE :: a
     !DEC$ ATTRIBUTES REFERENCE :: b
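
A fuller sketch of the same idea follows, with a caller added. The argument types and the arithmetic are assumptions chosen for illustration. The interface in the caller repeats the directives so that both sides agree on how each argument is passed. A C caller would see the arguments roughly as (int a, double *b).

```fortran
! Callee: 'a' arrives by value, 'b' by reference.
subroutine foo(a, b)
  !DEC$ ATTRIBUTES VALUE :: a
  !DEC$ ATTRIBUTES REFERENCE :: b
  implicit none
  integer(4) :: a
  real(8)    :: b
  ! The change to b is visible to the caller because b is passed by reference.
  b = b + real(a, 8)
end subroutine foo

program value_ref_demo
  implicit none
  interface
    subroutine foo(a, b)
      !DEC$ ATTRIBUTES VALUE :: a
      !DEC$ ATTRIBUTES REFERENCE :: b
      integer(4) :: a
      real(8)    :: b
    end subroutine foo
  end interface
  real(8) :: total

  total = 1.5d0
  call foo(2, total)
  ! total is now 1.5 + 2 = 3.5
  print *, total
end program value_ref_demo
```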


Passing module variables and functions in DLL
==============================================
Consider passing the variable 'foo' and calling method fooA() defined in a module 'mod_foo'

1. Expose the variable foo and fooA()
!DEC$ ATTRIBUTES DLLEXPORT :: foo
!DEC$ ATTRIBUTES DLLEXPORT :: fooA

Do not use ALIAS.

2. Build and Copy the following files from the DLL build directory to the application directory.
mod_foo.dll, mod_foo.lib, mod_foo.mod

3. In the application that uses 'foo', add the statement:
use mod_foo

This technique is only useful when both the application and the DLL are written in Fortran. The variable
names will have a leading underscore "_". This is transparent to the user who uses "use mod_foo".
Such DLLs are not convenient to use from other languages because of the leading
underscore on variable names.
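
A minimal sketch of steps 1 to 3, shown as two files. The types, the initial value and the body of fooA are assumptions for illustration; only the names mod_foo, foo and fooA come from the notes above.

```fortran
! ---- mod_foo.f90 : DLL project, builds mod_foo.dll, mod_foo.lib, mod_foo.mod ----
module mod_foo
  implicit none
  real(8) :: foo = 1.0d0
  !DEC$ ATTRIBUTES DLLEXPORT :: foo
contains
  function fooA(x) result(y)
    !DEC$ ATTRIBUTES DLLEXPORT :: fooA
    real(8), intent(in) :: x
    real(8) :: y
    ! Uses the exported module variable.
    y = foo * x
  end function fooA
end module mod_foo

! ---- app.f90 : application project ----
program app
  ! Needs mod_foo.mod at compile time and mod_foo.lib at link time.
  use mod_foo
  implicit none

  foo = 2.0d0
  ! fooA(3) = foo * 3 = 6
  print *, fooA(3.0d0)
end program app
```

No interface block is needed in the application, because the .mod file already carries the interface of fooA.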


Best Practice
==============
1. For optimized code:
- use /fast
- use "Configuration Properties -> Fortran -> Optimization -> Require Intel Processor Extension"

2. To check for stack overflow
- /Qfpstkchk
- /Ge, /Gsn, /Fn

3. Fortran DLL structure
- Put constant data into a module, say mod_consts, and expose the data as:
!DEC$ ATTRIBUTES DLLEXPORT :: eps4
  Note: do not ALIAS
- Put subroutines into another module, say mod_funcs, and expose the routines:
use mod_consts
subroutine testInt4()
!DEC$ ATTRIBUTES DLLEXPORT, STDCALL, ALIAS:'_testInt4'  :: testInt4
  Note: use alias so it is accessible outside
- Construct interface modules for the application:
module interface_mod
    use mod_consts
    interface
        subroutine testInt4()
        !DEC$ ATTRIBUTES DLLIMPORT, STDCALL, ALIAS:'_testInt4'  :: testInt4
        ............
- Include the interface in the application
    use interface_mod

This technique allows other Fortran projects to make use of both data and functions in DLLs.
However, other languages will not be able to make use of the data directly (may need to have underscore
for variable names in the other languages calling this Fortran DLL).


Errors - Debugging
===================

General Sources:
"List of Run-Time Error Messages", Intel Visual Fortran compiler doc
- from Building Applications -> Error Handling -> Handling Run Time Errors ->

Cryptic LNK errors
1. When using a function from another place, eg a DLL, ensure that an "interface" block is written
in the code which calls the function.
2. Ensure the library path is defined, eg. in VS.Net -> right click project -> Properties -> Linker
-> General ->  "Additional Library Directories"

Access Violation
1. Passing integer*4 into a subroutine with parameter declared as integer*8
2. Subroutine A in a module is DLL exported. If another subroutine within the same project uses subroutine A from another module, this WILL cause a CONFLICT. Since it is being used within the same project, subroutine A needs a wrapper which is NOT DLL exported. This wrapper can be called by other module subroutines within the same project.
3. When an ARRAY of derived type contains components which are also derived types, then it must be declared with a fixed size (i.e. hardcoded dimension) or the variable must be a dynamic array (i.e. declared ALLOCATABLE). It cannot be declared with a size specified by a parameter.
eg.
function foo(a, b)
  real :: NestedDerivedTypeA(4)                ! GOOD
  real, allocatable :: NestedDerivedTypeB(:)   ! GOOD
  real :: NestedDerivedTypeC(b)                ! BAD
4. Crash pointing to a problem with allocatable arrays which are used in an OpenMP region. Message: "Subscript #x of the array has value xxxx which is greater than the upper bound of ..."
Reason: Known bug in the Intel Fortran Compiler that occurs when code is compiled using the /check:pointer option (under the Runtime category in project properties).


Derived Data Type - Nested
1. Complicated derived data types that involve nested derived types cannot be displayed correctly in the debugger / variable watch space. The displayed numbers are grossly in error.


DLL not exposed properly -
When calling a function in a dll that has not been exposed, the following error may occur:
"The procedure entry point ..... could not be located in the dynamic link library ....dll"


VSL/MKL errors
Message:
MKL ERROR : Parameter 2 was incorrect on entry to vslNewStre
Cause:
The code uses MKL, VSL or VML routines from Intel and has directives like:
    !dec$objcomment lib:'mkl_c_dll.lib'
    !dec$objcomment lib:'mkl_ia32.lib'
but the project is missing the path to ...mkl\ia32\lib
Solution:
In VS.Net, within the dll/console project that uses them, add the path to the library files in:
Project -> Properties -> Linker -> General -> Additional Library Directories

IMSL Errors
Message:
Error: There is no matching specific subroutine for this generic subroutine call.
Cause:
The IMSL documentation shows the Fortran90 version with the name D_RNCHI, but unless the Fortran90
interface is somehow in use, the call still obeys Fortran77 naming. So use the Fortran77 name, which is DRNCHI.
Solution:
Instead of using the Fortran90 style -> D_RNCHI
we use -> DRNCHI

ThreadChecker Errors:
Problem Description: Several problem reports were received recently. If the user's application is extremely big, the application (launched by Thread Profiler) may run slowly.
Cause: Thread Profiler's engine uses 600MB (default) of the heap. If the application also needs a large amount of heap space and the machine has a low memory configuration, this problem occurs.
Resolution: Use Configure -> Modify -> "Execution" tab -> "Limit the size of the heap used by the analysis engine to [ ] MB", and adjust to a smaller number. Note that when Thread Checker reaches the memory limit, it may discard older statistics, causing some loss of results.



How to Add Version and other Metadata to DLL or EXE
=====================================================
Assume platform is Intel Fortran 8.1 and VS.Net 2003, but may apply to later versions too.
1. Go to Solutions Explorer and right click on the project name.
2. Choose Add New Item. In the Add New Item dialog, choose resource. A resourceX.rc will be created in the "Resource Files" folder directly under the project directory. If this file already exists, this step can probably be skipped.
3. Double click to open the resourceX.rc file.
4. In the resourceX.rc file, right click on the name resourceX.rc and choose "Add resource..."
5. In the "Add Resource" dialog, choose Version.
6. Fill in the relevant versioning and metadata info that is required.
7. Then build the project.
8. Check by right-clicking on the dll or exe file.


Using VTune
============
To use VTune, the following needs to be set up:
1. From VS.NET -> Project -> Properties -> Linker -> Debug -> Generate Program Database File
.... ensure this pdb file is defined.
From VS.NET -> Project -> Properties -> Fortran -> Debugging -> Debug Information Format
.... Full(/Zi)


2. Put this "/FIXED:NO" in:
VS.NET -> Project -> Properties -> Linker -> Command Line -> Additional Options
.... this is to ensure that VTune's Call Graph can be used. This only applies to the executable project.

3. Application to Launch - select an app or driver/dll that is already running.
Call Graph - must specify an application to launch.
Sampling and Counter Monitor - may select "No App to launch".

4. Counter Monitor - Intel recommends using this first.
- uses native Windows performance counters, eg. processor queue, memory, processor time
- Has the following info:
   - the Logged Data view
   - the Legend
   - the Summary view
   - the Data Table - click on Logged Data View first to access
- Two main monitors to check are:
   - %Processor Time: The closer to 100% the better. This is calculated by taking the percentage
     of time spent in the Idle thread and subtracting it from 100%.
   - System Processor Queue Length - There is a single queue for processor time even on
     multiprocessor systems. This counter should be less than 2. It measures how many
     threads are waiting to execute.
- Intel Tuning Advice - to get the advice, from the Logged Data View, highlight the
  section of the graph of interest. Then click on the Tuning Assistant button.
- Drill Down to Correlated Sampling Data View.
   - To use sampling data, sampling data must be collected while the counter data is collected.

5. Sampling Mode
- Look at Samples or Events of CPU_CLK_UNHALTED.CORE --- CPU cycles when a core is active
This shows where most cpu cycles are used.

     Definitions:
CPU_CLK_UNHALTED.CORE
Event Code: Counted by fixed counter number 1
Category: Basic Performance Tuning Events;Multi-Core Events;
Definition: Core cycles when core is not halted.
Description: This event counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios.
In mobile systems the core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time. In systems with a constant core frequency, this event can give you a measurement of the elapsed time while the core was not in halt state by dividing the event count by the core frequency.

INST_RETIRED.ANY
Event Code: Counted by fixed counter number 0
Category: Basic Performance Tuning Events;
Definition: Instructions retired.
Description: This event counts the number of instructions that retire execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. The counter continues counting during hardware interrupts, traps, and inside interrupt handlers.

Clocks per Instructions Retired - CPI
Equation: CPU_CLK_UNHALTED.CORE / INST_RETIRED.ANY
Category: Basic Performance Tuning Ratios; Ratios for Tuning Assistant Advice;
Definition: High CPI indicates that instructions require more cycles to execute than they should. In this case there may be opportunities to modify your code to improve the efficiency with which instructions are executed within the processor. CPI can get as low as 0.25 cycles per instructions.

SAV = Sample After Value
This is the number of events counted between successive samples, i.e. the sampling interval rather than a frequency. Typically it is 2,000,000.
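
The definitions above reduce to a few lines of arithmetic. The following is a minimal sketch only: the function names and the sample counts are invented for illustration and do not come from a real VTune run. It assumes a constant core frequency, as required by the CPU_CLK_UNHALTED.CORE description, and that each sample stands for roughly SAV events.

```python
def estimated_events(samples, sav=2_000_000):
    """Approximate event count: each sample represents `sav` events."""
    return samples * sav

def cpi(clk_unhalted_core, inst_retired_any):
    """Clocks per instruction retired = CPU_CLK_UNHALTED.CORE / INST_RETIRED.ANY."""
    return clk_unhalted_core / inst_retired_any

def unhalted_seconds(clk_unhalted_core, core_hz):
    """Time the core was not halted; only meaningful at constant core frequency."""
    return clk_unhalted_core / core_hz

# Hypothetical sample counts for one hotspot (not measured data)
cycles = estimated_events(1900)        # CPU_CLK_UNHALTED.CORE samples
instructions = estimated_events(950)   # INST_RETIRED.ANY samples

print("CPI            :", cpi(cycles, instructions))
print("unhalted time s:", unhalted_seconds(cycles, 3.8e9))
```

A CPI well above the 0.25 lower bound quoted above is the hint to look more closely at that hotspot.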


Compiler Options
===================
/iface:[no]mixed_str_len_arg
Default: /iface:nomixed_str_len_arg

Specifies the type of argument-passing conventions used for general arguments and for hidden-length character arguments.
Possible values are:
/iface:mixed_str_len_arg: The hidden lengths should be placed immediately after their corresponding character argument in the argument list, which is the method used by Microsoft* Fortran PowerStation.
/iface:nomixed_str_len_arg: The hidden lengths should be placed in sequential order at the end of the argument list. When porting mixed-language programs that pass character arguments, either this option must be specified correctly or the order of hidden length arguments changed in the source code.

See also Programming with Mixed Languages Overview and related sections.

Compiling - Diagnostics.
To perform diagnostics such as using VTune, Thread Profiler or Thread Checker, some of these options may be needed:
/Zi - include debug symbols   = /debug:full
/Od - disable optimization
/fixed:no - linker option to make code relocatable
/MDd - to build with the multithreaded, debug, DLL run-time libraries =   /libs:dll /threads /dbglibs




Build Macros (eg $(OUTDIR))
============================
See Intel Visual Fortran - User Guide - Volume I: - Building apps from MS Visual Studio.Net - Supported Build Macros.

Example:
   In the project properties - Linker - Output File, the value is "$(OUTDIR)/xxx.dll".
   The macro $(OUTDIR) has a value defined in:
       project properties - Output Directory
   Similarly $(INTDIR) is defined in
       project properties - Intermediate Directory

Using MKL
===========
1. CBA desktop PC - Pentium 4 CPU 3.8GHz
   - from intel website:
       CPU No.: 670;   90 nm;   Cache: 2 MB L2;   Clock Speed: 3.80 GHz;  FSB: 800 MHz
  Hyperthreading, Enhanced SpeedStep, Intel64 (need Bios and OS), ExecuteBit Enabled

2. Installation Directories:
   - c:\Program Files\intel\mkl\8.1.1
   - tools\environment -> mklvarsXXX.bat to build environment variables.
   - 3 options: ia32, em64t, ia64; within these are dlls and libs files
   - for the ia32 option, ia32\bin contain:
mkl_lapack_YY.dll, mkl_XXX.dll, mkl_vml_XXX.dll, mkl_ias.dll, libguide40.dll    
YY = 32, 64
XXX = def, p3,p4, p4p, p4m
   - for the ia32 option, ia32\lib contain:
mkl_X.lib, mkl_X_dll.lib, mkl_lapack.lib, mkl_solver.lib, mkl_ia32.lib, libguide40.lib, libguide.lib
X = c (for c), s (for Fortran)

3. Configuring to use MKL
- at installation time, say yes to add vars to PATH, LIB, INCLUDE.
- alternatively, run mklvars32.bat

4. Using Fortran95 BLAS or LAPACK
    - The interfaces need to be built from Intel's sources; go to mkl\8.1.1\interfaces\blas95 or mkl\8.1.1\interfaces\lapack95
    - nmake PLAT=win32 lib -> a *.mod file will be created
    - or go to the INCLUDE directory and run: ifort -c mkl_lapack.f90 (or mkl_blas.f90)
    - Or, to build it in the user's own directory:
      1. copy mkl\8.1.1\interfaces\blas95 and lapack95 into the user's directory
      2. copy mkl_blas.f90 and mkl_lapack.f90 from INCLUDE into these copied directories
      3. run in the blas95 and lapack95 directories: nmake PLAT=win32 INTERFACE=mkl_blas.f90 lib (or INTERFACE=mkl_lapack.f90)
    For 64 bit:
    - nmake can be found in C:\Program Files\Microsoft Visual Studio 8\VC\bin\
    - from the Start Menu, open Intel Visual Fortran Build Environment using Intel 64.
    - nmake PLAT=win32e lib
    - mod files will be automatically copied to ..../em64t


5. Linking to library:
a) see "Linking your application with Intel MKL" in "Getting Started with the Intel Math
Kernel Library 8.1.1 for Windows" for reference.
b) In VS.Net, go to Project menu -> Properties -> Linker -> General -> Additional Library Directories
   and put:
C:\Program Files\Intel\MKL\8.1.1\ia32\lib

6. Errors
a) Link error:
SortProj1  error LNK2019: unresolved external symbol _VSLNEWSTREAM referenced in function _MAIN__.L
   Solution:
   1. put the following in the code at start of module or program, NOT subroutine or function
    use MKL_VSL_TYPE
    use MKL_VSL
    !dec$objcomment lib:'mkl_c_dll.lib'
    !dec$objcomment lib:'mkl_ia32.lib'
   2. Sometimes DLLIMPORT may be needed rather than DLLEXPORT, especially in the RELEASE version (unconfirmed).
   3. If the routine is called through its Fortran95 generic name, such as gemv, a work-around is to call the
      specific Fortran77 routine instead: "call dgemv.." rather than "call gemv...".

b) Runtime error:
MKL ERROR: Parameter 2 was incorrect on entry to vslNewStre
   Solution:
   In VS.Net, go to Project menu -> Properties -> Linker -> General -> Additional Library Directories
   and put:
C:\Program Files\Intel\MKL\8.1.1\ia32\lib

7. Prerequisite Directories - these need to be set in Project -> Properties, on the command line, etc.
  1. Include Directories: C:\Program Files\Intel\MKL\8.1.1\include
  2. Library Directories: C:\Program Files\Intel\MKL\8.1.1\ia32\lib
  3. Put the following line at the start of one of the source files, before the program or module keyword.
 include 'mkl_vsl.fi'    ! This is a full-fledged module by MKL
  4. Put the following at the start of a module or program, not within a function or subroutine
    use MKL_VSL_TYPE
    use MKL_VSL
    !dec$objcomment lib:'mkl_c_dll.lib'
    !dec$objcomment lib:'mkl_ia32.lib'
    implicit none


Using LAPACK95 & General Comment on DLLs, LIBs, Mod files
==========================================================
   To illustrate the usage of Lapack functions with Fortran95 interface,
 suppose we want to use subroutine GESV
Fortran77 call: sgesv, dgesv, cgesv, zgesv
Fortran95 call: gesv

gesv is an Interface in mkl_lapack.f90(module MKL95_LAPACK)
gesv interface overloads wrappers like DGESV_MKL95, etc....

Only two items are needed by the user -> *.lib and *.mod

DLL
- not needed because we will be using explicit interfaces.
- Also F95 lapack routines have optional arguments which REQUIRE interfaces (eg gesv).

LIB
- mkl_lapack95.lib needed (created once off by administrator or first user)
- Use in the code as:
!dec$objcomment lib:'mkl_lapack95.lib'
    !dec$objcomment lib:'mkl_c_dll.lib'
    !dec$objcomment lib:'mkl_ia32.lib'  
- Don't need
!dec$objcomment lib:'mkl_lapack.lib'
- must be linked at build time, either:
    i) on the command line: ifort ..... mkl_lapack95.lib; or
    ii) by specifying the path in "Additional Library Directories"

MOD
- mkl95_lapack.mod needed (created once off by administrator or first user from mkl_lapack.f90)
- contains the collection of interfaces to be used in the code by having:
USE MKL95_LAPACK
- must be found at compile time in either:
    i) the same location as the application's .f90 source files; or
    ii) the INCLUDE directories specified in VS.Net as "Additional Include Directories"


Mixed language programming
============================
Hi Clinton,
It looks like a library format incompatibility problem. We adhere to the Microsoft format.
Please follow these steps as a work-around.
Once you have generated the .dll with the Intel Fortran compiler, do the following:

1. D:\>pedump /exp MatlabFunctions.dll > MatlabFunctions.exp

D:\>notepad MatlabFunctions.exp (Edit this file and replace MATEXP with _MATEXP)

D:\>buildlib MatlabFunctions.exp MatlabFunctions.lib

D:\> lcc hello.c

D:\>lcclnk hello.obj MatlabFunctions.lib

D:\>hello.exe

Stack Checking
===============
Checking and Setting Space
The following options perform checking and setting space for stacks (these options are supported on Windows only):

The /Gs0 option enables stack-checking for all functions.
The /Gsn option checks by default the stack space allocated for functions with more than 4KB.
The /Fn option sets the stack reserve amount for the program. The /Fn  option passes /stack:n to the linker.



Enable Vectorization and Report
================================
To enable automatic vectorization, use these switches:
   /Qx...  or /Qax....
To enable report, use:
   /Qvec-report....


Enable OpenMP
==============
1. To enable OpenMP:
  by Command line: /Qopenmp /Qfpp
  by VS.net:  Project -> Properties -> Preprocessor -> OpenMP conditional compilation -> Yes
              Project -> Properties -> Preprocessor -> Preprocess source file -> Yes (/fpp)
              Project -> Properties -> Language -> Process OpenMP directives -> Generate Parallel code (/Qopenmp)

Note: preprocessor must be enabled for the OpenMP directives to be processed.

2. For diagnostic report:
   by Command line: /Qopenmp-report

3. To compile OpenMP code in sequential mode:
   by Command line: /Qopenmp-stubs

Alternatively, to compile for a single thread, use the preprocessor (/Qfpp) but not the OpenMP option (/Qopenmp).

4. DO NOT USE /Qprof-genx with OpenMP - spurious errors like array out of bounds will result.

5. To use OpenMP functions such as omp_get_num_threads(), instead of using
     include "omp_lib.h",
   it is better to declare them explicitly:
        external omp_get_num_threads
        integer omp_get_num_threads


Using Thread Profiler
=======================
1. Compiler options to enable Thread Profiling:
a) /Zi         - full debugging format
b) /fixed:no   - linker option to make code relocatable
c) /MDd        - option tells the linker to search for unresolved references
               in a multithreaded, debug, dynamic-link (DLL) run-time library.
               This is the same as specifying options /libs:dll /threads /dbglibs.
d) /Qopenmp-profile - enable profiling of OpenMP.
   WARNING: this option should not be used with IMSL since IMSL will link to libguide or libguide40, but
   this option creates code that will link to libguide_stats or libguide40_stats


Using Thread Checker
=====================

Add the following library path:
     .....VTune\Analyzer\Lib
Without this, a build error occurs stating that libassuret40.lib was not found.

Options for Thread Checker
/ZI, /Z7 (Fortran - General - Debug Information Format - Full)
/Od (Fortran - Optimization - Optimization - Disable)
/libs:dll /threads /dbglibs (Fortran - Libraries - Runtime Library - debug Multithreaded Dll)
/Qtcheck - to enable use by Thread Checker

To run Intel Thread Checker, run VTune first in a NEW project. When the VTune is finished analysing, then run thread checker from the SAME project, by running as a NEW Activity.

Troubleshooting:
- ensure /Qtcheck is only on the EXECUTABLE, not other dlls.
- check working directory is correct.
- when EXECUTABLE has /Qtcheck, it cannot be run from console mode.


Profile Guided Optimization
============================
This is a 3 step process:
1. Compile with /Qprof-gen. Using /Qprof-genx allows Code Coverage tool to be used.
   DO NOT USE /Qprof-genx WITH OPENMP.
   Note: For Code Coverage, new option is /Qprof-gen:srcpos

2. Run the code one or many times with different data sets.
   This will create .dyn files.
3. Compile with /Qprof-use. This uses the .dyn file created in step 2.
4. Usually specify /O2 in step 1, and the more aggressive /Qipo in step 3.
5. Need the following:
C:\Program Files\Microsoft Visual Studio 8\Common7\IDE;
C:\Program Files\Microsoft Visual Studio 8\VC\BIN;
C:\Program Files\Microsoft Visual Studio 8\Common7\Tools;
C:\Program Files\Microsoft Visual Studio 8\Common7\Tools\bin;
C:\Program Files\Microsoft Visual Studio 8\VC\PlatformSDK\bin;
msvcr80d.dll -> C:\Program Files\Microsoft Visual Studio 8\VC\redist\Debug_NonRedist\x86\Microsoft.VC80.DebugCRT


Using code coverage
====================
Ref: Intel_compiler_code-coverage.pdf

To use the code coverage tool, which is available for Intel compilers, the code must be prepared during compilation and the application must then be run. The general method is as follows.

1. Compile source code with /Qprof-gen:srcpos option.
By default, pgopti.spi, a static profile file is created. This name can be changed using the -prof-file option.
2. Run the application. This will create multiple dyn files for dynamic profile information.
3. Use the profmerge tool to merge the dyn files into the pgopti.dpi file.
     profmerge -prof_dpi pgopti.dpi
4. Run code coverage using both the static and dynamic files
     codecov -spi pgopti.spi -dpi pgopti.dpi
5. The results are published into CODE_COVERAGE.HTML

Note that these commands should be run in the directory that holds the source code and in which the application was executed.



NOTES IIS web server


IIS - Internet Information Services

Contents
=========
Install
Configuration
Start-Stop
Tutorials
Test IIS Working



Install
========
1. Installed as part of operating system. Go to:
Control Panel -> Add / Remove Programs -> Add Windows components -> IIS

2. Versions - IIS version is tied to the Operating System.
IIS 5 - is associated with Windows 2000 (all versions)
IIS 5.1 - is associated with Windows XP Professional
IIS 6 - is associated with Windows .NET Server (released as Windows Server 2003)
IIS versions 3 and 4 are designed for Windows NT 4.0
(technical support for this was expected to be terminated by the end of 2002).

3. Security Issues (Hardening)
Ref: http://www.windowsecurity.com/articles/Installing_Securing_IIS_Servers_Part1.html

4. On a Command Console:
- goto c:\Windows\Microsoft.Net\Framework\vx.xxxx
- type: aspnet_regiis -i


Configuration
==============
1. To open the configuration box:
i) Control Panel -> Administrative Tools -> Internet Information Services
ii) c:\windows\system32\inetsrv\iis.msc

2. For each ASP.NET web application, ensure folders have Read and Execute permissions

Start-Stop
===========
To control the start/stop/automatic-startup ->
Control panel -> administrative tools -> Component Services -> services -> IIS Admin


Tutorials
==========
IIS online getting started: http://localhost/iishelp/iis/misc/default.asp

Test IIS Working
=================
To test that IIS is working with asp pages:
1. Open browser.
2. Browse to URL: http://localhost/localstart.asp
3. If the Welcome to IIS page is not displayed, then check that IIS is working.
4. IIS control panel: system32\inetsrv\iis.msc
5. Under IIS -> ComputerName -> Websites: Check that "Default Website" is "running" state.

Notes Ant


Introduction
Install
Running Ant
Structure
Example
Using Ant with Eclipse


Introduction
============
- Ant is a build tool similar to make. Its build file is named "build.xml" by default.
- Ant is a Java program.
- Requires Java and an XML parser to be installed


Reference: http://ant.apache.org/manual/index.html



Install
==========
1. Unpack ANT into a directory eg. c:\ant.
2. Set Env variables:

Windows and OS/2
Assume Ant is installed in c:\ant\. The following sets up the environment:

set ANT_HOME=c:\ant
set JAVA_HOME=c:\jdk1.2.2
set PATH=%PATH%;%ANT_HOME%\bin

(Start -> Control Panel -> System -> Advanced -> Env Variables)

Unix (bash)
Assume Ant is installed in /usr/local/ant. The following sets up the environment:

export ANT_HOME=/usr/local/ant
export JAVA_HOME=/usr/local/jdk-1.2.2
export PATH=${PATH}:${ANT_HOME}/bin

(/etc/profile.d/*.sh)


Running Ant
============
To run:
ant -buildfile <file>

ant -buildfile test.xml -Dbuild=build/classes dist
runs Ant using the test.xml file in the current directory,
on the target called dist, setting the build property to
the value build/classes.


Structure
==========
Structure of build.xml:
project - attribs: name, default (target), basedir
Children: , ,

target - attribs: name, depends, if, unless, description
Children: various tasks

Tasks
attribs: id

property - attribs: name, value, file, location, url, refid, resource,
environment, classpath, classpathref, prefix
Example
========
simple example build file



The following allows ant to make use of local environment variables







Using Ant with Eclipse
=======================
1. Create a new xml in Eclipse, to be the ant build file, eg. build.xml
2. Open the build.xml file with Eclipse editor
3. Edit the xml file, note there is context sensitive help in Eclipse.
4. To run the build.xml: Run As -> Ant Build, then select the targets
to be built.

Notes 64bit

Notes64bit

Contents
============
References
Definition
Article - The 64-Bit Advantage
Article - x86: registered offender
Itanium2
Feature Comparison
Registers
AMD K8 vs Conroe FPU
Porting to a 64-bit Intel® architecture
How to check if code is 32bit or 64bit
Limitations
Large Arrays



References
=============
http://www.intel.com/cd/ids/developer/asmo-na/eng/197664.htm?page=5


Definition
===========
Ref: http://en.wikipedia.org/wiki/64-bit
"64-bit" computer architecture generally has integer registers that are 64 bits wide, which allows it to support (both internally and externally) 64-bit "chunks" of integer data.



The sizes to consider are those of the registers, address buses, and data buses.


Most modern CPUs such as the Pentium and PowerPC have 128-bit vector registers used to store several smaller numbers, such as 4 32-bit floating-point numbers. A single instruction can operate on all these values in parallel (SIMD). They are 128-bit processors in the sense that they have 128-bit registers and in some cases a 128-bit ALU, but they do not operate on individual numbers that are 128 binary digits in length.


Article - The 64-Bit Advantage
===============================
Ref: http://www.pcmag.com/print_article/0,3048,a=116259,00.asp

The 32-bit Pentium-class chips that dominate today's desktops fetch and execute instructions from system memory in 32-bit chunks; 64-bit chips handle 64-bit instructions. And that's just what the workstation-class Intel Itanium 2 and HP Alpha chips do inside the TeraGrid's clusters.

New desktop-class 64-bit chips, such as the AMD Athlon64 and the Apple/ IBM PowerPC G5, can handle 64-bit instructions as well, but most PC apps—even the few that optimize some operations to exploit 64-bit processing—still rely on 32-bit instructions. A new generation of games and apps will no doubt take fuller advantage of 64-bit chips. But their ability to harness the new architecture fully may be hampered by the need to interact with Windows, since none of the desktop versions of the OS is yet slated for 64-bit optimization.

A major advantage to 64-bit processors over their 32-bit cousins is support for greater amounts of memory. In theory, a 64-bit processor can address exabytes (billions of billions of bytes) of RAM; 32-bit chips can directly address a maximum of 4GB of RAM. This breakthrough is used to good advantage at the National Center for Supercomputing Applications' (NCSA) TeraGrid, which allocates 12GB of system memory each to half of its 256 Itanium 2 processor nodes. It will be a while before anyone knows how fast Quake would run with that much memory, since PC motherboards don't exceed 8GB of RAM.

Future 64-bit apps will be able to chew on a class of computations known as floating-point operations far faster than 32-bit apps can. Necessary for 3-D rendering and animation of everything from molecular models to Halo aliens, floating-point calculations are so essential to complex scientific analysis that FLOPS (floating-point operations per second) are used as the unit of supercomputing performance. The ability of 64-bit chips to process floating-point operations faster and far more precisely than their 32-bit counterparts make them powerhouses for simulations and visualization.


Article - x86: registered offender
===================================
Ref: http://techreport.com/reviews/2005q1/64-bits/index.x?pg=2

"Another problem with the x86 ISA is the number of general-purpose registers (GPRs) available. Registers are fast, local slots inside a processor where programs can store values. Data stored in registers is quickly accessible for reuse, and registers are even faster than on-chip cache. The x86 ISA only provides eight general-purpose registers, and thus is generally considered register-poor. Most reasonably contemporary ISAs offer more. The PowerPC 604 RISC architecture, to give one example, has 32 general-purpose registers. Without a sufficient number of registers for the task at hand, x86 compilers must sometimes direct programs to spend time shuffling data around in order to make the right data available for an operation. This creates overhead that slows down computation.

To help alleviate this bottleneck, the x86-64 ISA brings more and better registers to the table. x86-64 packs 8 more general-purpose registers, for a total of 16, and they are no longer limited to 32-bit values—all 16 can store 64-bit datatypes. In addition to the new GPRs, x86-64 also includes 8 new 128-bit SSE/SSE2 registers, for a total of 16 of those. These additional registers bring x86 processors up to snuff with the competition, and they will quite likely bring the largest performance gains of any aspect of the move to the x86-64 ISA.

What is the magnitude of those performance gains? Well, it depends. Some tasks aren't constrained by the number of registers available now, while others will benefit greatly when recompiled for x86-64 because the compiler will have more slots for local data storage. The amount of "register pressure" presented by a program depends on its nature, as this paper on 64-bit technical computing with Fortran explains:

The performance gains from having 16 GPRs available will vary depending on the complexity of your code. Compute-intensive applications with deeply nested loops, as in most Fortran codes, will experience higher levels of register pressure than simpler algorithms that follow a mostly linear execution path. "

Summary -
x86 - 8x 32-bit General Purpose Registers
x86-64 - 16x 64-bit General Purpose Registers
Fortran - deeply nested DO loops create high register pressure, so Fortran codes tend to benefit from more and wider GPRs


Itanium2
=========
Ref: http://www.itmanagersjournal.com/feature/8611

Intel's Itanium 2, or IA64, is unlike any of the other 64-bit processors in production. It uses a Very Long Instruction Word (VLIW) design that depends on the software's compiler for performance. When the compiler creates program binaries for the Itanium 2, it predicts the most efficient method of execution, so the processor does less work when the program is running -- the software schedules its own resources beforehand, rather than forcing the hardware to do it on the fly. IA64 is used in the same kinds of workstations that UltraSPARC processors are used in, and can also scale up to 128 processors in high-powered servers. Silicon Graphics and Hewlett-Packard both sell computers based on the Itanium 2. GNU/Linux is generally the operating system of choice for IA64-based systems, but HP-UX and Windows 2003 Server will work on HP Itanium 2 servers.



Feature Comparison
==================

Size of Fetch and Execute Instructions - 64bit vs 32bit chunks
Number of General Purpose Registers (Fetch Registers?)
Memory Access - 18.4x10^9 GB vs 4GB
Floating Point Operations - faster with 64bit than 32bit

Vector Registers - eg Pentium has 128bit vector registers, each of which stores up to 4 32bit values.
ALU
FPU


Itanium2
- good for FP processing
- 2FPU (=1 FMAC or 2multiplication and 1 add), plus additional 2 FMACs for 3D processing.
- 64 bit address space
- a derivative of VLIW, dubbed Explicitly Parallel Instruction Computing (EPIC). It is theoretically capable of performing roughly 8 times more work per clock cycle than a non-superscalar CISC or RISC architecture due to its Parallel Computing Microarchitecture.
- supports 128 integer, 128 floating point, 8 branch and 64 predicate registers (for comparison, IA-32 processors support 8 registers and other RISC processors support 32 registers)


UltraSparc T1
- 8x Integer Cores share 1 FPU
- good for integer processing compared to Itanium


Throughout its history, Itanium has had the best floating point performance relative to fixed-point performance of any general-purpose microprocessor. This capability is not needed for most enterprise server workloads. Sun's latest server-class microprocessor, the UltraSPARC T1 acknowledges this explicitly, with performance dramatically skewed toward the improvement of integer processing at the expense of floating point performance (eight integer cores share a single FPU). Thus Itanium and Sun appear to be addressing separate subsets of the market. By contrast, IBM's cell microprocessor, with a single general-purpose POWER core controlling eight simpler cores optimized for floating point, may eventually compete against Itanium for floating-point workloads.


Registers
==========
Integer - can be used to store pointers
Floating Point - most CPUs also have FPUs
Other

examples:
x86 - has x87 FPU with 8 x 80bit registers
x86 with SSE - 8x 128bit FP registers
x86-64 - has SSE with 16x 128bit FP registers
Alpha - has 32x 64bit FP registers and 32x 64bit integer registers.
Itanium2 - 128x 64bit GPRs, 128x 82bit FPregisters, 64x 1bit predicates, 8x 64bit branch registers

AMD K8 vs Conroe FPU
======================
Possibly better floating point performance of K8 processors

http://www.xbitlabs.com/articles/cpu/display/amd-k8l_5.html


Porting to a 64-bit Intel® architecture
============================================
(ref: http://www.developers.net/intelisnshowcase/view/358)

Porting Application Source Code
The most significant issues that software developers should face in porting source code to the 64-bit world concern the changes in pointer size and fundamental integer types. As such, these differences should appear most prominently in C and C++ programs. Code written in Fortran, COBOL, Visual Basic, and most other languages (except assembly language, which must be completely rewritten), will need no modification. A simple recompilation is often all that is needed. Java code should not even need recompilation; Java classes should execute the same on a 64-bit JVM as on any 32-bit virtual machine.

C (from here on, C++ is included in all discussions of C) code, however, by allowing casting across types and direct access to machine-specific integral types will need some attention.

The first aspect is the size of pointers; 64-bit operating systems use 64-bit pointers. This means that the following will equal eight (and no longer four):

sizeof (ptrdiff_t)

As a result, structures that contain pointers will have different sizes as well. As such, if data laid out for these structures is stored on disk, reading it in or writing it out will cause errors. Likewise, unions with pointer fields will have different sizes and can cause unpredictable results.

The greatest effect, though, is felt wherever pointers are cast to integral types. This practice, which has been condemned for years as inimical to portability, will come back to haunt programmers who did not abandon it. The problems caused by it are traceable to the different widths used by pointers, integers and longs on the various platforms. Let's examine these.
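
The effect of casting a pointer to a narrower integer can be illustrated without a C compiler. The sketch below uses Python's standard ctypes module, which exposes the C type sizes of the platform the interpreter was built for; the address value and the helper name cast_to_32bit_int are hypothetical and chosen only for the illustration.

```python
import ctypes

# Sizes as seen by the C ABI of the running interpreter:
# a pointer is 4 bytes on a 32-bit build and 8 bytes on a 64-bit build.
pointer_bytes = ctypes.sizeof(ctypes.c_void_p)
print("pointer size: %d bytes" % pointer_bytes)
print("sizeof(int) : %d bytes" % ctypes.sizeof(ctypes.c_int))
print("sizeof(long): %d bytes" % ctypes.sizeof(ctypes.c_long))

def cast_to_32bit_int(address):
    """Mimic the C cast (unsigned int)pointer: only the low 32 bits survive."""
    return ctypes.c_uint32(address).value

address = 0x000000012345ABCD   # hypothetical address above the 4GB line
truncated = cast_to_32bit_int(address)
print(hex(address), "->", hex(truncated))
```

The top bits of the address are silently lost, which is why code that stores pointers in 32-bit integers may appear to work until an allocation lands above 4GB.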


How to check if code is 32bit or 64bit
========================================
Use the dumpbin utility and look at the output under FILE HEADER VALUES.
eg.
dumpbin /headers hello.exe

Results if 64 bit:
FILE HEADER VALUES
8664 machine (x64)

Results if 32 bit:
FILE HEADER VALUES
14C machine (x86)
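
When dumpbin is not at hand, the same machine field can be read directly from the file header. This is a minimal sketch, assuming a standard PE layout (DOS header with the PE offset at 0x3C, followed by the "PE" signature and the 2-byte machine field). The function names are made up for the example; the 14C and 8664 codes are the ones shown above, and 0200 (Itanium) is added from the PE format definition.

```python
import struct

MACHINE_NAMES = {
    0x014C: "x86 (32-bit)",
    0x8664: "x64 (64-bit)",
    0x0200: "Itanium (64-bit)",
}

def pe_machine(data):
    """Return the COFF machine code from the raw bytes of an EXE or DLL."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE image")
    # Offset 0x3C of the DOS header holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The 2-byte little-endian machine field follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return machine

def describe(path):
    """Read a file and describe its target machine."""
    with open(path, "rb") as f:
        machine = pe_machine(f.read())
    return MACHINE_NAMES.get(machine, "unknown machine 0x%04X" % machine)
```

For example, describe("hello.exe") would be expected to report the same machine type that dumpbin /headers prints for that file.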

Limitations
=============
Virtual Address Limit - theoretical 16EB
Virtual Address Limit - practical
i) Windows uses 44 bits -> 16TB, but apparently allows only 8TB to be used.
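
The limits quoted above follow directly from the number of address bits. The short sketch below just does that arithmetic using binary units (1 KB = 1024 bytes); the helper name human is made up for the example.

```python
def human(nbytes):
    """Format a byte count in binary units (1 KB = 1024 bytes)."""
    units = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]
    value = nbytes
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return "%d %s" % (value, unit)
        value //= 1024

# 32 bits: classic 32-bit limit; 43 and 44 bits: the Windows figures above;
# 64 bits: the theoretical limit.
for bits in (32, 43, 44, 64):
    print("%2d-bit address space: %s" % (bits, human(2 ** bits)))
```
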


Large Arrays
=============
http://episteme.arstechnica.com/eve/forums/a/tpc/f/6330927813/m/420003239831/r/308002539831

BigArray, getting around the 2GB array size limit
http://blogs.msdn.com/joshwil/archive/2005/08/10/450202.aspx