Design Details

The interface makes use of the iso_c_binding introduced in Fortran 2003 standard. This allows interoperability between C and Fortran. Meanwhile, Cython can call C functions based on given C declarations. Hence, the Fortran subroutines can be called as a C function based on its declarations in Cython.

One of the biggest benefit of using Cython instead of f2py is that C declarations conform to C standard and allow us to avoid undefined behaviors.

pyPDAF wrapper to Fortran call

In pyPDAF.PDAF, .pxd files define Fortran subroutines as external C functions. For example, the following Fortran subroutine interface:

subroutine c__PDAF_get_state(steps, time, doexit, &
                             U_next_observation, &
                             U_distribute_state, &
                             U_prepoststep, flag) bind(c)
      ! Flag and number of time steps
      INTEGER(c_int), INTENT(inout) :: steps
      ! current model time
      REAL(c_double), INTENT(out) :: time
      ! Whether to exit from forecasts
      INTEGER(c_int), INTENT(inout) :: doexit
      ! Status flag
      INTEGER(c_int), INTENT(inout) :: flag
      ! Provide time step and time of next observation
      procedure(c__next_observation_pdaf) :: U_next_observation 
      ! Routine to distribute a state vector
      procedure(c__distribute_state_pdaf) :: U_distribute_state 
      ! User supplied pre/poststep routine
      procedure(c__prepoststep_pdaf) :: U_prepoststep
end subroutine c__PDAF_get_state

can be translated into:

cdef extern void c__pdaf_get_state (int* steps, double* time, int* doexit,
                                    void (*c__next_observation_pdaf)(int*,   int*,
                                                                     int*, double*),
                                    void (*c__distribute_state_pdaf)(int*, double*),
                                    void (*c__prepoststep_pdaf)(int*, int*, int*,
                                                                int*, int*, double*,
                                                                double*, double*, int*),
                                    int* flag);

Here, all arguments in the C declaration are pointers because Fortran is by default pass by reference, and user-defined procedures are provided with explicit interface in the C declaration. This is a good benefits compared to the f2py approach, where more tuning is required in the signature file to enable user-supplied external functions.

To provide a more Pythonic interface, the above C call is wrapped by a Python definition:

get_state (int steps, int doexit,
           py__next_observation_pdaf,
           py__distribute_state_pdaf,
           py__prepoststep_pdaf,
           int flag
          )

such that the user can have:

import pyPDAF.PDAF as PDAF
PDAF.get_state(...)

Treatment of arguments

There are input and output arguments in Fortran subroutines. In pyPDAF, input arguments with intent(in) or intent(inout) are declared in Python function arguments, and arguments with intent(inout) or intent(out) are treated as returned values. For example, double* time is not present in the get_state Python function argument as it is an intent(out) argument. They are defined as cdef double time before calling c__pdaf_get_state.

For some detailed explanation of the following sections, it is recommended to read Cython documentation for memory view, and syntax.

Treatment of scalar arguments

In Cython, int steps declares the argument as integer value, which can be directly passed to C function calls by accesing its reference using &.

Treatment of array arguments

In pyPDAF, all input arrays are numpy arrays. These arrays have to be convert to memoryview in Cython and then passed their reference to Fortran regardless of the shape of the array. To avoid problems with the different treatment of C ordering and Fortran ordering of multidimensional arrays, they’re all raveled using Fortran order, e.g. np.ravel(a, order='F').

The same applies to the returned array, where we reshape the memoryview back to its original shape using np.reshape(a, order='F').

Treatment of procedure arguments (user-supplied routines)

One of the most important aspects of PDAF is the user-supplied routines. This provides the flexibility to different models and observations. In pyPDAF.UserFunc, C functions are defined and declared with dummy Python functions. For example,

cdef void c__init_ens_pdaf (int* filtertype, int* dim_p,
                            int* dim_ens, double* state_p,
                            double* uinv, double* ens_p,
                            int* flag
                           );

the above C function calls the dummy Python function defined as:

def py__init_ens_pdaf(filtertype, dim_p, dim_ens, state_p, uinv, ens_p, flag):
    raise RuntimeError('...Wrong py__init_ens_pdaf is called!!!...')

Hence, pyPDAF can raise an error if required user-supplied function is not given. In pyPDAF.PDAF, the user-supplied function has to be given to pyPDAF.UserFunc. Taking the previous example for c__PDAF_get_state, we have the following treatment:

import pyPDAF.UserFunc as PDAFcython
cimport pyPDAF.UserFunc as c__PDAFcython
# giving user-supplied functions to UserFunc
PDAFcython.py__next_observation_pdaf = py__next_observation_pdaf
PDAFcython.py__distribute_state_pdaf = py__distribute_state_pdaf
PDAFcython.py__prepoststep_pdaf = py__prepoststep_pdaf
# call the actual functions
c__pdaf_get_state (&steps,
                   &time,
                   &doexit,
                   c__PDAFcython.c__next_observation_pdaf,
                   c__PDAFcython.c__distribute_state_pdaf,
                   c__PDAFcython.c__prepoststep_pdaf,
                   &flag
                  )

Treatment of array input in user-supplied routines

Because user-supplied routines are given as C pointers, we need to translate them into numpy array for Python function in a similar manner as Treatment of array arguments, e.g. state_p_np = np.asarray(<double[:np.prod((dim_p[0]))]> state_p).reshape((dim_p[0]), order='F').

Caveat
  1. Fortran subroutines use pass by reference by default

  2. Python always pass/assignment by reference The above two points can lead to undefined behavior. For example, in the user-supplied routines, we have Fortran interface:

subroutine c__collect_state_pdaf(dim_p, state_p) bind(c)
  use iso_c_binding, only: c_double, c_int
  implicit none
  ! pe-local state dimension
  integer(c_int), intent(in) :: dim_p
  ! local state vector
  real(c_double), intent(inout) :: state_p(dim_p)
end subroutine c__collect_state_pdaf

If we provide the following Python routine:

def collect_state_pdaf(model, assim_dim, dim_p, state_p):
    state_p = model.field_p.reshape(assim_dim.dim_state_p, order='F')
    return state_p

The Python function changes the reference of state_p as the reshape method generates a new numpy array instance. Hence, it is necessary to add a layer of safety in pyPDAF.UserFunc, where we pass the return value to the original input numpy array. We further safeguard this behavor with an assertion.

state_p_np_tmp = py__collect_state_pdaf(dim_p[0], state_p_np)
state_p_np[:] = state_p_np_tmp[:]
cdef double[::1] state_p_view = state_p_np.ravel(order='F')
assert state_p == &state_p_view[0], 'reference (memory address) of state_p has changed in c__collect_state_pdaf.'

Note

In pyPDAF/fortran, pyPDAF provides bind(C) wrapper for common PDAF public subroutines with iso_c_binding arguments. Some subroutine uses assumed size arrays. The interoperability with C is recently supported in Fortran 2018 standard via ISO_Fortran_binding.h in C. To avoid unexpected issues for these new features, the dimension sizes are still given explicitly in the Fortran wrapper.

The same issue applies to allocatable array in derived types in PDAFomi feature, where obs_f and obs_l are used. Hence, an array of the derived type are defined in the fortran wrapper with setters for many class members. These can be changed later when Fortran 2018 standard is better supported.