Fortran 90/95/2003: Comments and Tricks
Carlos Labra invites you to the Café de CIMNE
July 16
Introduction
Main Differences
Language
Code standardization
Important things
Useful things
Fortran-C++
Optimization
Future
Requirements for the Code
Easy to read
Easy to modify (flexible)
Multi-platform
Efficient
...
Main Differences
File format
Dynamic memory management
Interfaces
User-defined types
Modules
Intrinsic modules
Kinds of routines
Fortran 77 vs. Fortran 95/2003
SUBROUTINE RUN77(NCOORD)
REAL*8 COORD(3,NCOORD), POINT(3), PMIN(3), PMAX(3), &
      &DISTS(NCOORD), DIST, RADIUS, T1, T2, A, B, C
INTEGER*4 RESS(NCOORD), RES, NRES, I, J, NSRCH, NSRCHN
CHARACTER*256 LINE
OPEN(UNIT=22,FILE='data250000b.pts')
READ(22,*) I
DO I = 1, NCOORD
  READ(UNIT=22,FMT=*,ERR=31) COORD(1,I), COORD(2,I), COORD(3,I)
END DO
RADIUS = 16.0
DO I = 1, 3
  POINT(I) = (PMIN(I)+PMAX(I)) / 2.0
END DO
CALL BCRT3D(BIN,NCOORD,COORD)
WRITE(*,100) T1
100 FORMAT('BINS_77 - GENERATION :',F12.6)
NSRCH = 10000
CALL TIMBEG()
DO I = 1, NSRCH
  CALL BSRD3D(BIN,POINT,RADIUS,RESS,DISTS,NCOORD,NRES)
END DO
CALL TIMEND(T1)
WRITE(*,101) T1,NRES
101 FORMAT('BINS_77 - SEARCH IN RADIUS :',F12.6,' (NRES:',I6,')')
END SUBROUTINE
Fortran 77 vs. Fortran 95/2003
use kinds, only: dp
use class_point, only: point_t
use kd_tree ! kd_tree_t, create, search_radius
integer :: max_points, n_search
type(kd_tree_t) :: tree
type(point_t) :: point
type(point_t), dimension(:), allocatable :: points
type(point_t), pointer :: nearest_point
integer, dimension(:), allocatable :: results
real(dp), dimension(:), allocatable :: distances
open(unit=fid,file=trim(filename),action="read",iostat=stat)
if(stat/=0) then
  stop 'file not found.'
end if
read(fid,*) max_points
allocate( points(max_points), results(max_points), distances(max_points), iresults(max_points))
do i = 1, max_points
  read(fid,*) points(i)%coord
end do
call create(tree, max_points, points, bucket_size=20)
write(*,"('BINS_90 - GENERATION :',F12.6)") T1
n_search = 10000
call cpu_time(t1)
do i = 1, n_search
  nres = search_radius( tree, point, radius2, results, distances, max_points )
end do
call cpu_time(t2)
write(*,"('BINS_90 - SEARCH IN RADIUS :',F12.6)") t2-t1
File format
Free format: 132 characters per line
Variable names: 31 characters per name
More than one routine per file -> group by use
...
call get_stress_variable( variable_1, variable_2 )
...
Interfaces
Explicit definition of routines: equivalent to C/C++ prototypes
Helps the compiler (argument type checking)
Not always required
Routine overloading: generic interfaces
Operator definition
interface
  subroutine get_stress_variable(variable_1,variable_2)
    import :: dp
    real(dp), dimension(:), intent(out) :: variable_1
    real(dp), dimension(:), pointer :: variable_2
  end subroutine get_stress_variable
end interface
interface get_variable
  module procedure get_variable_integer
  module procedure get_variable_real4
  module procedure get_variable_real8
end interface
interface operator(+)
  module procedure sum_sparse_matrix
end interface
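The generic interface and the operator definition above can be tried out with a small self-contained program. This is only a sketch: the module `vec2_mod`, the type `vec2_t` and the routines `describe_*` and `add_vec2` are hypothetical names invented for the illustration and do not come from the slides.

```fortran
module vec2_mod
  implicit none
  private
  public :: vec2_t, describe, operator(+)

  ! Hypothetical type used only for this illustration.
  type vec2_t
    real :: x = 0.0, y = 0.0
  end type vec2_t

  ! Generic name: the compiler picks the specific routine from the argument type.
  interface describe
    module procedure describe_integer
    module procedure describe_real
  end interface

  ! User-defined meaning of "+" for two vec2_t values.
  interface operator(+)
    module procedure add_vec2
  end interface

contains

  function describe_integer(value) result(text)
    integer, intent(in) :: value
    character(len=7) :: text
    text = 'integer'
  end function describe_integer

  function describe_real(value) result(text)
    real, intent(in) :: value
    character(len=7) :: text
    text = 'real'
  end function describe_real

  function add_vec2(a, b) result(c)
    type(vec2_t), intent(in) :: a, b
    type(vec2_t) :: c
    c%x = a%x + b%x
    c%y = a%y + b%y
  end function add_vec2

end module vec2_mod

program demo_generic
  use vec2_mod
  implicit none
  type(vec2_t) :: s

  ! Same generic name, two different specific routines.
  if (trim(describe(1)) /= 'integer') error stop 'wrong specific routine'
  if (trim(describe(1.0)) /= 'real') error stop 'wrong specific routine'

  ! The overloaded "+" calls add_vec2.
  s = vec2_t(1.0, 2.0) + vec2_t(3.0, 4.0)
  if (s%x /= 4.0 .or. s%y /= 6.0) error stop 'unexpected sum'
  print *, trim(describe(1)), ' ', trim(describe(1.0)), s%x, s%y
end program demo_generic
```

Only the generic name and the operator are public, so the specific routines can be renamed or extended without touching the calling code.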
User-defined types
Specific uses
Can help performance: calculations with local variables
Fewer arguments in routines
type sparse_matrix_t
  [ private / sequence ]
  integer :: n
  integer :: nnz
  real(dp), dimension(:), allocatable :: values
  integer, dimension(:), allocatable :: i_index
  integer, dimension(:), allocatable :: j_index
end type sparse_matrix_t
Modules
Encapsulation of variables and routines: equivalent to a C++ class?
Automatic interfaces
Operator definition
Global variables
Private local routines are possible
module NAME
  use other_module
  implicit none
  private/public
  [ variables ]
  [ interfaces ]
contains
  [ routines ]
end module [NAME]
module sparse_matrix
  use kinds, only: dp, zero
  implicit none
  private
  type sparse_matrix_t
    ...
  end type sparse_matrix_t
  interface assignment(=)
    module procedure sparse_to_dense
  end interface
  interface operator(+)
    ...
  interface allocate
    ...
  real(dp), save :: memory_tolerance = 0.2_dp
  ...
  public:: sparse_matrix_t, memory_tolerance
  public:: allocate, deallocate, resize, &
           matmul, maxval, maxloc, operator(+), &
           operator(*), assignment(=)
contains
  subroutine sp_allocate(...)
  subroutine sp_deallocate(...)
  subroutine sparse_to_dense(...)
  function maxval(...)
  function maxloc(...)
  ...
end module sparse_matrix
Modules
Intrinsic modules:
ISO_C_BINDING
• Mixing languages (C kind parameters such as C_INT, C_PTR, C_FUNPTR, C_LOC, ...)
ISO_FORTRAN_ENV
• Utilities for I/O (INPUT_UNIT, ERROR_UNIT, IOSTAT_EOR, IOSTAT_END, ...)
IEEE_ARITHMETIC
IEEE_EXCEPTIONS
IEEE_FEATURES
• Handling of floating-point operations
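As an illustration of the I/O constants listed for ISO_FORTRAN_ENV, the following sketch reads a file until `IOSTAT_END` is returned and reports any other failure on `ERROR_UNIT`. The unit number 10, the variable names and the scratch file (which stands in for a real input file) are assumptions made for the example.

```fortran
program demo_iso_env
  use, intrinsic :: iso_fortran_env, only: error_unit, iostat_end
  implicit none
  integer, parameter :: fid = 10   ! arbitrary unit number chosen for the example
  integer :: stat, value, total, n_read

  ! Scratch file standing in for a real input file.
  open(unit=fid, status='scratch', action='readwrite')
  write(fid, *) 1
  write(fid, *) 2
  write(fid, *) 3
  rewind(fid)

  total = 0
  n_read = 0
  do
    read(fid, *, iostat=stat) value
    if (stat == iostat_end) exit            ! clean end of file
    if (stat /= 0) then                     ! any other value is a real error
      write(error_unit, *) 'read error, iostat =', stat
      error stop
    end if
    total = total + value
    n_read = n_read + 1
  end do
  close(fid)

  if (n_read /= 3 .or. total /= 6) error stop 'unexpected result'
  print *, 'values read:', n_read, ' sum:', total
end program demo_iso_env
```

Comparing against the named constants avoids hard-coding compiler-specific `iostat` values.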
Kinds of routines
Recursive: pointer to routine
Pure: simple calculations, no pointers, arrays as arguments
Elemental: element-wise array operations, scalar arguments only, implicit routines
recursive subroutine get_node( tree, value1 )
  ...
  call get_node( tree%left, value1 )
  ...
end subroutine
pure subroutine calculate_mass( volumen, rho )
  ...
end subroutine
elemental subroutine swap( value1, value2 )
  real(dp), intent(inout) :: value1, value2
  real(dp) :: tmp
  tmp = value1; value1 = value2; value2 = tmp
end subroutine
...
call swap( array1(1:n), array2(1:n) )
...
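The elemental `swap` from the slide can be wrapped in a module and run as is. In this sketch the module name `swap_mod` and the local definition of `dp` are assumptions; the routine body is the one shown above.

```fortran
module swap_mod
  implicit none
  ! Assumed definition of dp; the slides import it from a "kinds" module.
  integer, parameter :: dp = selected_real_kind(15, 307)
contains
  ! Elemental: written for scalars, applicable to conforming arrays.
  elemental subroutine swap(value1, value2)
    real(dp), intent(inout) :: value1, value2
    real(dp) :: tmp
    tmp = value1; value1 = value2; value2 = tmp
  end subroutine swap
end module swap_mod

program demo_elemental
  use swap_mod
  implicit none
  real(dp) :: a(3), b(3), x, y

  a = (/ 1.0_dp, 2.0_dp, 3.0_dp /)
  b = (/ 10.0_dp, 20.0_dp, 30.0_dp /)
  call swap(a, b)              ! array call: applied element by element
  if (any(a /= (/ 10.0_dp, 20.0_dp, 30.0_dp /))) error stop 'array swap failed'
  if (any(b /= (/ 1.0_dp, 2.0_dp, 3.0_dp /))) error stop 'array swap failed'

  x = 1.0_dp; y = 2.0_dp
  call swap(x, y)              ! scalar call uses the same routine
  if (x /= 2.0_dp .or. y /= 1.0_dp) error stop 'scalar swap failed'
  print *, a, b
end program demo_elemental
```

An elemental routine needs an explicit interface at the call site, which the module provides automatically.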
Array handling: dynamic memory
Kinds:
Scalar pointer: real(dp), pointer :: scalar
Array pointer: real(dp), pointer, dimension(:,:) :: array
Allocatable: real(dp), allocatable, dimension(:,:) :: array
Different behaviours!!
Different run time (calculations run best on static arrays)
Different internal structure (descriptors, need for interfaces)
Speed [ slowest -> fastest ]
Pointer -> Allocatable -> Static
Array handling: descriptor
! private types for fortran descriptors ( intel compiler )
! file: iso_c_binding.f90
TYPE, PRIVATE :: for_desc_triplet
INTEGER(C_INTPTR_T) :: extent
INTEGER(C_INTPTR_T) :: mult ! multiplier for this dimension
INTEGER(C_INTPTR_T) :: lowerbound
END TYPE for_desc_triplet
TYPE, PRIVATE :: for_array_descriptor
INTEGER(C_INTPTR_T) :: base
INTEGER(C_INTPTR_T) :: len ! len of data type
INTEGER(C_INTPTR_T) :: offset
INTEGER(C_INTPTR_T) :: flags
INTEGER(C_INTPTR_T) :: rank
INTEGER(C_INTPTR_T) :: reserved1
TYPE(for_desc_triplet) :: diminfo(for_desc_max_rank)
END TYPE for_array_descriptor
Array handling: arguments in routines I
SUBROUTINE foo( array )
! assumed shape ( passes descriptor )
real(dp) :: array(:,:)
! pointer ( passes descriptor )
real(dp), pointer :: array(:,:)
! allocatable ( descriptor )
real(dp), allocatable :: array(:,:)
! explicit shape ( passes address of first element, lower and upper bounds known )
real(dp) :: array(n,n) ! best when the actual argument is static or allocatable (EFFICIENT)
Array handling: arguments in routines II
real(dp), pointer :: array(:,:)
real(dp), target :: target_array(n1,n2)
array => target_array
CALL foo( n1, n2, array ) ! efficient
array => target_array(:,1:2)
CALL foo( n1, 2, array ) ! efficient (compiler dependent)
array => target_array(1:1,:) ! rank-2 section, non-contiguous in memory
CALL foo( 1, n2, array ) ! inefficient!! copy to a temporary array
! better to pass the descriptor (pointer)
Array handling: loops
DO, FORALL and WHERE
do i = 1, n
  if( value(i) .ne. 0.0_dp ) then
    inv_value(i) = 1.0_dp / value(i)
  end if
end do
forall ( i = 1:n, value(i) .ne. 0.0_dp )
  inv_value(i) = 1.0_dp / value(i)
end forall
where ( value .ne. 0.0_dp )
  inv_value = 1.0_dp / value
end where
or
where(value/=zero) inv_value = 1.0_dp / value
outer : forall (i=1:n)
  inner : forall ( j=1:n, i.ne.j )
    a(i,j)=a(j,i)
  end forall inner
end forall outer
or
forall(i=1:n,j=1:n,i.ne.j) a(i,j) = a(j,i)
elemental function calc_algo(val)
  ...
end function calc_algo
...
forall (i=1:100, index(i)>0)
  array(i) = calc_algo( array2(i) )
end forall
where( index > 0 ) &
  array = calc_algo( array2 )
Side effects: routines called inside FORALL must be pure
Elemental routines
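The three loop forms shown above (DO, FORALL and WHERE) compute the same masked reciprocal. A minimal sketch that runs all three and compares the results follows; the array contents and the names `inv_do`, `inv_forall` and `inv_where` are chosen only for the example.

```fortran
program demo_loops
  implicit none
  integer, parameter :: dp = selected_real_kind(15, 307)
  integer, parameter :: n = 5
  real(dp) :: value(n), inv_do(n), inv_forall(n), inv_where(n)
  integer :: i

  value = (/ 2.0_dp, 0.0_dp, 4.0_dp, 0.0_dp, 8.0_dp /)
  inv_do = 0.0_dp; inv_forall = 0.0_dp; inv_where = 0.0_dp

  ! Explicit loop with a test inside.
  do i = 1, n
    if (value(i) /= 0.0_dp) inv_do(i) = 1.0_dp / value(i)
  end do

  ! FORALL with a scalar mask expression.
  forall (i = 1:n, value(i) /= 0.0_dp)
    inv_forall(i) = 1.0_dp / value(i)
  end forall

  ! WHERE with an array mask; no index needed.
  where (value /= 0.0_dp) inv_where = 1.0_dp / value

  if (any(inv_do /= inv_forall) .or. any(inv_do /= inv_where)) error stop 'results differ'
  print *, inv_do
end program demo_loops
```

Elements where the mask is false are left untouched by all three forms, which is why the result arrays are initialized first.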
Array handling: more comments
Temporary or auxiliary arrays: avoid unnecessary allocate/deallocate
Keep arrays around for use as temporaries
integer, allocatable :: int_array(:,:) ! ndim, npoint
real(dp), allocatable :: real_array(:,:) ! ndim, npoint
Useful uses of the language
Generic interfaces to standardize code: easy modification inside modules
Classic routines for a class: set, create, allocate, operator(*), assignment(=), add, ...
Use of modules: group routines and global variables
Definitions (types and operators)
Public/Private
Access only the information that is needed
Useful uses of the language: modules
module material_1
  use kinds, only: dp, zero
  use material_type ! material_t, DENSITY
  private
  type material_1_t ! a type cannot share the name of its module
    ...
  end type
  interface read
  interface calculate_forces
  public:: material_1_t, read, initialize, calculate_forces
contains
  subroutine read_material_1
  subroutine some_local_calculation
  subroutine other_local_calculation
  elemental subroutine calculate_forces_material_1
end module material_1
Useful uses of the language: parameters
Like macros or enums in C/C++
Precision handling
integer, parameter :: double_k = 8 ! selected_real_kind(#)
real(double_k), parameter :: zero = 0.0_double_k
Definition of cases or types
integer, parameter :: MATERIAL_ELASTIC= 1, MATERIAL_PLASTIC= 2
select case( material_type )
case( MATERIAL_ELASTIC )
case( MATERIAL_PLASTIC )
end select
Character length
character(len=name_size) :: name
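The three uses of parameters above (precision kind, case labels, character length) fit in one small module. This is a sketch under assumptions: the module name `kinds_demo`, the function `material_name` and the value 32 for `name_size` are invented for the example, and the kind is obtained with `selected_real_kind` instead of the literal 8.

```fortran
module kinds_demo
  implicit none
  ! At least 15 significant digits and a decimal exponent range of 307.
  integer, parameter :: dp = selected_real_kind(15, 307)
  real(dp), parameter :: zero = 0.0_dp
  ! Named constants used like an enum.
  integer, parameter :: MATERIAL_ELASTIC = 1, MATERIAL_PLASTIC = 2
  ! Assumed value; the slides do not give name_size.
  integer, parameter :: name_size = 32
contains
  function material_name(material_type) result(name)
    integer, intent(in) :: material_type
    character(len=name_size) :: name
    select case (material_type)
    case (MATERIAL_ELASTIC)
      name = 'elastic'
    case (MATERIAL_PLASTIC)
      name = 'plastic'
    case default
      name = 'unknown'
    end select
  end function material_name
end module kinds_demo

program demo_parameters
  use kinds_demo
  implicit none
  if (precision(zero) < 15) error stop 'dp is not precise enough'
  if (trim(material_name(MATERIAL_PLASTIC)) /= 'plastic') error stop 'wrong case selected'
  if (trim(material_name(99)) /= 'unknown') error stop 'default case not taken'
  print *, 'digits:', precision(zero), ' name: ', trim(material_name(MATERIAL_ELASTIC))
end program demo_parameters
```

Requesting the kind by precision rather than by the number 8 keeps the code independent of how a given compiler numbers its kinds.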
Useful uses of the language: routines as arguments
subroutine calculate( mesh, temperature_function )
  type(mesh_t), intent(inout) :: mesh
  interface
    real(8) function temperature_function( element )
      import :: element_t ! make the type visible inside the interface body
      type(element_t), intent(in) :: element
    end function temperature_function
  end interface
  ...
  temp(i) = temperature_function( element(i) )
  ...
end subroutine calculate
Code reuse (trick?)
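A runnable sketch of the same idea, a routine received as an argument through an interface block, is given below. The midpoint-rule integrator and the names `integrate_mod`, `integrate` and `linear` are assumptions made for the illustration; they are not part of the original example.

```fortran
module integrate_mod
  implicit none
  integer, parameter :: dp = selected_real_kind(15, 307)
contains
  ! Midpoint rule; the integrand f is received as a dummy procedure.
  function integrate(f, a, b, n) result(total)
    interface
      function f(x) result(y)
        import :: dp
        real(dp), intent(in) :: x
        real(dp) :: y
      end function f
    end interface
    real(dp), intent(in) :: a, b
    integer, intent(in) :: n
    real(dp) :: total, h
    integer :: i
    h = (b - a) / real(n, dp)
    total = 0.0_dp
    do i = 1, n
      total = total + f(a + (real(i, dp) - 0.5_dp) * h)
    end do
    total = total * h
  end function integrate

  ! Example integrand: y = 2x, whose integral over [0,1] is 1.
  function linear(x) result(y)
    real(dp), intent(in) :: x
    real(dp) :: y
    y = 2.0_dp * x
  end function linear
end module integrate_mod

program demo_procedure_argument
  use integrate_mod
  implicit none
  real(dp) :: area
  area = integrate(linear, 0.0_dp, 1.0_dp, 100)
  ! The midpoint rule is exact for a linear integrand, up to rounding.
  if (abs(area - 1.0_dp) > 1.0e-12_dp) error stop 'unexpected integral'
  print *, area
end program demo_procedure_argument
```

The same `integrate` can be reused with any function matching the interface, which is the code-reuse point made on the slide.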
Useful uses of the language: portable code
Multi-compiler: avoid calls to compiler-specific libraries
If they are needed, group them so they are easy to change
Use of macros
Multi-platform: file handling (directories)
System calls
Use the standard-conformance checking options
Important things: global types (mesh, solver, ...)
Performance: layered programming (trick?)
type adt_type
  real(dp), dimension(:,:,:), allocatable :: Ex, Ey, Ez, Hx, Hy, Hz
  integer :: n
end type adt_type
subroutine update_adt(a) ! 3D Maxwell equations
  type(adt_type) :: a
  do k=1,a%n-1; do j=1,a%n-1; do i=1,a%n-1
    a%Hx(i,j,k) = a%Hx(i,j,k)+((a%Ey(i,j,k+1)-a%Ey(i,j,k))*C+(a%Ez(i,j,k)-a%Ez(i,j+1,k))*D)
    a%Hy(i,j,k) = a%Hy(i,j,k)+((a%Ez(i+1,j,k)-a%Ez(i,j,k))*C+(a%Ex(i,j,k)-a%Ex(i,j,k+1))*D)
    a%Hz(i,j,k) = a%Hz(i,j,k)+((a%Ex(i,j+1,k)-a%Ex(i,j,k))*C+(a%Ey(i,j,k)-a%Ey(i+1,j,k))*D)
  end do; end do; end do
  do k=1,a%n-1; do j=1,a%n-1; do i=1,a%n-1
    a%Hx(i,j,k) = a%Hx(i,j,k)+((a%Ey(i,j,k+1)-a%Ey(i,j,k))*C+(a%Ez(i,j,k)-a%Ez(i,j+1,k))*D)
    ...
  end do; end do; end do
end subroutine update_adt
Important things: global types (mesh, solver, ...)
Performance: layered programming (trick?)
SUBROUTINE UPDATE77(n,Ex,Ey,Ez,Hx,Hy,Hz) ! 3D Maxwell equations
INTEGER n
REAL*8 Ex(n,n,n), Ey(n,n,n), Ez(n,n,n), Hx(n,n,n), Hy(n,n,n), Hz(n,n,n)
DO k=1,n-1; DO j=1,n-1; DO i=1,n-1
  Hx(i,j,k) = Hx(i,j,k) + ((Ey(i,j,k+1)-Ey(i,j,k))*C + (Ez(i,j,k)-Ez(i,j+1,k))*D)
  Hy(i,j,k) = Hy(i,j,k) + ((Ez(i+1,j,k)-Ez(i,j,k))*C + (Ex(i,j,k)-Ex(i,j,k+1))*D)
  Hz(i,j,k) = Hz(i,j,k) + ((Ex(i,j+1,k)-Ex(i,j,k))*C + (Ey(i,j,k)-Ey(i+1,j,k))*D)
END DO; END DO; END DO
DO k=2,n; DO j=2,n; DO i=2,n
  Ex(i,j,k) = Ex(i,j,k) + ((Hz(i,j,k)-Hz(i,j-1,k))*C + (Hy(i,j,k-1)-Hy(i,j,k))*D)
  Ey(i,j,k) = Ey(i,j,k) + ((Hx(i,j,k)-Hx(i,j,k-1))*C + (Hz(i-1,j,k)-Hz(i,j,k))*D)
  Ez(i,j,k) = Ez(i,j,k) + ((Hy(i,j,k)-Hy(i-1,j,k))*C + (Hx(i,j-1,k)-Hx(i,j,k))*D)
END DO; END DO; END DO
END
Important things: global types (mesh, solver, ...)
Performance: layered programming (trick?)
Time update : 0.4400
Time update_adt : 0.5520
subroutine update_adt_layer(a)
  type(adt_type) :: a
  call update77(a%n,a%Ex,a%Ey,a%Ez,a%Hx,a%Hy,a%Hz)
end subroutine update_adt_layer
Time update_adt_layer : 0.4401
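The layering idea (a thin routine that unpacks the derived type and forwards plain arrays to a kernel with explicit-shape arguments) can be reduced to a small runnable sketch. The type `field_t` and the routines `axpy_kernel` and `axpy_layer` are hypothetical stand-ins for the Maxwell update; no timings are claimed for this example.

```fortran
module layer_mod
  implicit none
  integer, parameter :: dp = selected_real_kind(15, 307)

  ! Hypothetical "global" type holding allocatable fields.
  type field_t
    real(dp), dimension(:), allocatable :: u, v
    integer :: n
  end type field_t

contains

  ! Kernel layer: explicit-shape dummies, no derived-type access in the loop.
  subroutine axpy_kernel(n, alpha, u, v)
    integer, intent(in) :: n
    real(dp), intent(in) :: alpha, u(n)
    real(dp), intent(inout) :: v(n)
    integer :: i
    do i = 1, n
      v(i) = v(i) + alpha * u(i)
    end do
  end subroutine axpy_kernel

  ! Thin layer: unpacks the components and forwards them to the kernel.
  subroutine axpy_layer(f, alpha)
    type(field_t), intent(inout) :: f
    real(dp), intent(in) :: alpha
    call axpy_kernel(f%n, alpha, f%u, f%v)
  end subroutine axpy_layer

end module layer_mod

program demo_layer
  use layer_mod
  implicit none
  type(field_t) :: f

  f%n = 4
  allocate(f%u(f%n), f%v(f%n))
  f%u = 1.0_dp
  f%v = 2.0_dp
  call axpy_layer(f, 3.0_dp)          ! v = v + 3*u = 5
  if (any(f%v /= 5.0_dp)) error stop 'unexpected result'
  print *, f%v
end program demo_layer
```

Whether the extra layer pays off depends on the compiler and its options, so it is worth timing both variants as done on the slide.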
Important things: local types (node, element, ...)
type node_t
  real(dp) :: coord(3)
  real(dp) :: veloc(3)
  real(dp) :: force(3)
  integer :: id
  integer :: flag
end type node_t
Important:
Order of the variables (the size of the first variable governs)
The compiler may reorder them (careful when combining with C++)
BETTER NOT TO USE DYNAMIC ARRAYS
Important things: local types (node, element, ...)
type node2_t
  real(dp) :: coord(2), veloc(2), force(2)
  integer :: id
end type node2_t
type node2b_t
  real(dp) :: coord(2), veloc(2), force(2)
  integer :: id
  integer :: flag
end type node2b_t
sizeof( node2_t ) = 56 : (6*8) + (1*4) = 52, rounded up to a multiple of 8
sizeof( node2b_t ) = 56 : (6*8) + (1*4) + (1*4)
TRICK! sizeof := size(transfer(<data>,(/ch(1)/)))
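The `transfer` trick can be checked directly on the two types above. In this sketch the program name and the variables `node`, `nodeb`, `size_a` and `size_b` are assumptions; it also assumes 1-byte characters and an 8-byte `dp`, which holds on common platforms but is not guaranteed by the standard.

```fortran
program demo_sizeof
  implicit none
  integer, parameter :: dp = selected_real_kind(15, 307)

  type node2_t
    real(dp) :: coord(2), veloc(2), force(2)
    integer :: id
  end type node2_t

  type node2b_t
    real(dp) :: coord(2), veloc(2), force(2)
    integer :: id
    integer :: flag
  end type node2b_t

  type(node2_t) :: node
  type(node2b_t) :: nodeb
  character(len=1) :: ch(1)
  integer :: size_a, size_b

  ! Give every component a value before the bits are reinterpreted.
  node%coord = 0.0_dp; node%veloc = 0.0_dp; node%force = 0.0_dp; node%id = 0
  nodeb%coord = 0.0_dp; nodeb%veloc = 0.0_dp; nodeb%force = 0.0_dp
  nodeb%id = 0; nodeb%flag = 0
  ch = ' '

  ! transfer() reinterprets the object as an array of 1-character elements;
  ! the number of elements is the storage size in bytes.
  size_a = size(transfer(node, ch))
  size_b = size(transfer(nodeb, ch))

  if (size_a < 6 * 8) error stop 'node2_t smaller than its real components'
  if (size_b < size_a) error stop 'extra component made the type smaller'
  print *, 'node2_t :', size_a, ' bytes'
  print *, 'node2b_t:', size_b, ' bytes'
end program demo_sizeof
```

If both sizes come out equal, the second integer has simply filled the padding that the compiler had already added to the first type.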
Important things: local types (node, element, ...)
type node2_t
  real(dp) :: coord(2), veloc(2), accel(2)
  integer :: id
end type node2_t
type nodeN_t
  real(dp), pointer :: coord(:), veloc(:), accel(:)
  integer :: id
end type nodeN_t
sizeof( node2_t ) = 56 (56)
sizeof( node3_t ) = 80 (80)
sizeof( nodeN_t ) = 112 (224)
type coord_t
  real(dp), pointer :: coord(:)
end type coord_t
sizeof( coord_t ) = 36 (72) ! size of the descriptor: 6 + 3 words of 4 (8) bytes each
Important things: local types (node, element, ...)
Local calculations
Example: LibAtoms -> subroutine AdvanceVerlet()
Case 1:
! real(8), allocatable :: array(:,:)
! position (1:3,:), veloc (4:6,:), acceler (7:9,:), mass (10,:)
real(8), allocatable :: posit(:,:), veloc(:,:), accel(:,:), mass(:)
Case 2:
type atom
  real(8) :: pos(3), veloc(3), accel(3), mass
end type
type(atom), allocatable :: array(:) ! Atom
Timings: 107 atoms
case 1 = 1.70 s
case 2 = 0.45 s
Generic programming: templates in Fortran? Trick!
Example: linked list
module List_Type_1
  use Types, only: type_1
  type List
    type(type_1) :: stuff
    type(List), pointer :: next => null()
  end type List
  interface add
    module procedure add_item_type_1
  end interface
contains
  subroutine add_item_type_1(this,item)
    type(List), intent(inout) :: this
    type(type_1), intent(in) :: item
    ...
end module List_Type_1
module List_Type
  type List
    type(<anything>) :: stuff
    type(List), pointer :: &
      next => null()
  contains
    procedure :: add
    procedure :: delete
  end type List
contains
  subroutine add(this,item)
  subroutine delete(this,item)
end module List_Type
Generic programming: templates in Fortran? Trick!
!module List_Type
!use Types, only: type_1
type List
  type(dummy) :: stuff
  type(List), pointer :: next => null()
end type List
interface add
  module procedure add_item_type_1
end interface
contains
subroutine add_item_type_1(this,item)
  type(List), intent(inout) :: this
  type(dummy), intent(in) :: item
  ...
!end module List_Type
File: list.inc
module List_Type_1
  use Types, only: dummy => type_1
  include "list.inc"
end module List_Type_1
module List_Type_2
  use Types, only: dummy => type_2
  include "list.inc"
end module List_Type_2
subroutine foo(...)
  use List_Type_1, List_t1 => List
  use List_Type_2, List_t2 => List
  ...
end subroutine foo
Reading data: namelist
CONTROL
END_TIME= 1.0
TIME_STEP= 0.001
OUTPUT_STEP= 0.1
END CONTROL
MATERIAL
TYPE= ELASTIC
DENSITY= 2650
YOUNG= 3.4E+10
COULOMB= 0.89
...
END MATERIAL
...
...
File: input.dat
Data blocks
Optional reading of variables
Simpler reading routines
Case-insensitive
namelist /bloque/ var1, var2
read(unit,nml=bloque,...)
Reading data: namelist
&CONTROL
END_TIME= 1.0
TIME_STEP= 0.001
OUTPUT_STEP= 0.1
/
&MATERIAL
TYPE= 'ELASTIC' /
&PROPERTIES
! property definitions
DENSITY= 2650
YOUNG= 3.4E+10
COULOMB= 0.89 /
...
...
File: new_input.dat
real(dp) :: end_time,time_step,output_step
character(name_size) :: type, subtype
real(dp) :: density,young,coulomb
namelist /CONTROL/ end_time,time_step, &
                   output_step
namelist /material/ type, subtype
read(unit=fid,nml=control,iostat=io_status)
if(io_status/=0) stop 'error reading CONTROL'
read(unit=fid,nml=material,iostat=io_status)
if(io_status/=0) stop 'error reading MATERIAL'
select case( type )
case( MATERIAL_ELASTIC )
  call read_material_elastic()
...
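A minimal runnable sketch of namelist reading follows. It writes a CONTROL group to a scratch file (standing in for new_input.dat) and reads it back; the unit number, the default values and the decision to leave OUTPUT_STEP out of the file are assumptions made to show the optional-variable behaviour.

```fortran
program demo_namelist
  implicit none
  integer, parameter :: dp = selected_real_kind(15, 307)
  integer, parameter :: fid = 10   ! arbitrary unit number chosen for the example
  real(dp) :: end_time, time_step, output_step
  integer :: io_status
  namelist /control/ end_time, time_step, output_step

  ! Defaults: a variable missing from the file keeps its value.
  end_time = 0.0_dp
  time_step = 0.0_dp
  output_step = 0.5_dp

  ! Scratch file standing in for new_input.dat; names are case-insensitive.
  open(unit=fid, status='scratch', action='readwrite', form='formatted')
  write(fid, '(a)') '&CONTROL'
  write(fid, '(a)') ' END_TIME = 1.0'
  write(fid, '(a)') ' time_step = 0.001'
  write(fid, '(a)') '/'
  rewind(fid)

  read(unit=fid, nml=control, iostat=io_status)
  if (io_status /= 0) error stop 'error reading CONTROL'
  close(fid)

  if (abs(end_time - 1.0_dp) > 1.0e-12_dp) error stop 'end_time not read'
  if (abs(time_step - 0.001_dp) > 1.0e-12_dp) error stop 'time_step not read'
  if (output_step /= 0.5_dp) error stop 'default value was overwritten'
  print *, end_time, time_step, output_step
end program demo_namelist
```

Because `output_step` does not appear in the group, it keeps the default assigned before the read.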
Combining Fortran and C++
Intrinsic module
use ISO_C_BINDING, only: C_INT, C_PTR, ...
Symbol definition
subroutine foo() bind(C, name='cpp_foo')
Equivalent types
type node_t
  sequence
  ...
end type
Interfaces
Definition of arguments (value or reference)
Changing the form of the arguments (always pointers) Trick!
Trick
Array of pointers in C/C++: integer(C_INTPTR_T), dimension(n) :: c_array
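As a small illustration of `bind(C)` with an explicit interface, the sketch below calls `strlen` from the C standard library. The Fortran-side name `c_strlen` and the test string are assumptions; the example relies on the C library being linked, which is the default for common compilers.

```fortran
program demo_c_binding
  use, intrinsic :: iso_c_binding, only: c_char, c_size_t, c_null_char
  implicit none

  interface
    ! C prototype: size_t strlen(const char *s)
    function c_strlen(s) bind(C, name='strlen') result(length)
      import :: c_char, c_size_t
      character(kind=c_char), dimension(*), intent(in) :: s   ! passed by reference
      integer(c_size_t) :: length
    end function c_strlen
  end interface

  character(kind=c_char, len=6) :: text
  integer(c_size_t) :: n

  ! C strings are terminated by a null character.
  text = 'CIMNE' // c_null_char
  n = c_strlen(text)
  if (n /= 5_c_size_t) error stop 'unexpected length'
  print *, 'strlen =', n
end program demo_c_binding
```

The `name=` specifier fixes the external symbol exactly as written, so no compiler-specific name mangling is involved on the Fortran side.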
Compilers: important notes
In debug builds, enable warnings and checks (-check all -warn all)
Copies of variables
Better bug hunting
...
Architecture-specific optimization: higher performance
More aggressive optimization
Optimization report (Intel compiler): helps to understand what is happening and to look for improvements
• Loop unrolling
• Routine inlining
• ...
Compilers: optimization report
Application example
...
subroutine try_ia_work(a,size)
  use type_vars !dp, ip, one, two, three
  implicit none
  integer(ip) :: size
  real(dp),dimension(3,size) :: a
  integer(ip) :: i
  do i=1,size
    a(1,i) = one
    a(2,i) = two
    a(3,i) = three
  end do
  do i=1,size
    a(3,i) = a(1,i) + a(2,i)
  end do
end subroutine try_ia_work
...
Compilation:
ifort -O3 -ipo -static -xT -opt-report-file=opt.txt ...
Time try_ia_work = 21.0356 s
...
6 ntimes = 1000
7 size = 1000000
...
143 call cpu_time(t1)
144 do i = 1, ntimes
145 call try_ia_work(a,size)
146 end do
147 call cpu_time(t2)
...
Compilers: optimization report
opt.txt
...
> for_cpusec_t(EXTERN)
> __resetsp_inlined(EXTERN)
> INLINE: tryi_mp_try_ia_work_(16) (isz = 52) (sz = 59 (24+35))
> _alloca(EXTERN)
> __getsp_inlined(EXTERN)
> for_cpusec_t(EXTERN)
...
test1.F90(144:1-144:1):VEC:MAIN__: loop was not vectorized: not inner loop
test1.F90(145:6-145:6):VEC:MAIN__: loop was not vectorized: not inner loop
loop was not vectorized: vectorization possible but seems inefficient
loop was not vectorized: vectorization possible but seems inefficient
test1.F90(150:1-150:1):VEC:MAIN__: loop was not vectorized: existence of vector dependence
test1.F90(159:1-159:1):VEC:MAIN__: loop was not vectorized: not inner loop
LOOP WAS VECTORIZED
loop was not vectorized: not inner loop
loop was not vectorized: vectorization possible but seems inefficient
...
<test1.F90;145:145;hlo_unroll;MAIN__;0>
Loop at line 145 completely unrolled by 3
Loop at line 145 unrolled without remainder by 2
...
Compilers: optimization report
Application example
...
subroutine try_ia_work(a,size)
  use type_vars !dp, ip, one, two, three
  implicit none
  integer(ip) :: size
  real(dp),dimension(3,size) :: a
  integer(ip) :: i
  forall(i=1:size) !do i=1,size
    a(1,i) = one
    a(2,i) = two
    a(3,i) = three
  end forall !end do
  forall(i=1:size) !do i=1,size
    a(3,i) = a(1,i) + a(2,i)
  end forall !end do
end subroutine try_ia_work
...
Time try_ia_work = 10.6007 s
new_opt.txt
...
test1.F90(144:1-144:1):VEC:MAIN__:
loop was not vectorized: not inner loop
test1.F90(145:6-145:6):VEC:MAIN__:
FUSED LOOP WAS VECTORIZED
test1.F90(150:1-150:1):VEC:MAIN__:
loop was not vectorized: existence of vector dependence
...
Fused Loops: ( 176 176 )
Fused Loops: ( 145 145 )
Fused Loops: ( 145 145 )
...
The future: Fortran 2003/2008
Classes and subclasses
Parameterized types
New intrinsic type: BITS
Smart macros
Parallelization in the language (co-arrays)
Parameterized modules
More mathematical functions
More flexible pointers