Compiling Parflow

  • When building with configure, especially if not using the GNU compilers, this might pose problems because C++ compilation is not set up correctly in the configure script. As PF's configure is going to be deprecated soon, my recommendation is to use cmake. Still, cmake has errors too, especially when cross compiling for IBM's Blue Gene/Q architecture.

Debugging

  • compile with -g -fbounds-check (the second option is especially of interest for the Fortran compiler)

  • start in debugger (gdb $PARFLOW_DIR/bin/parflow <ProblemName>)

  • simplify the simulation problem as much as possible!

  • Start from a working simulation (e.g. the test cases) and add the things you want in your simulation one by one

  • Use the PFBChecker to check all inputs (see the header-reading sketch at the end of this list)

    • check dimensions
    • subgrids
    • ...
  • Test-driven development when creating input files!

  • When you get segmentation faults, this list might be quite helpful: https://software.intel.com/en-us/articles/determining-root-cause-of-sigsegv-or-sigbus-errors
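
A minimal PFBChecker-style sketch for the dimension and subgrid checks above, assuming the standard .pfb layout (big-endian: three doubles X, Y, Z; three ints NX, NY, NZ; three doubles DX, DY, DZ; an int subgrid count; then per subgrid nine ints ix, iy, iz, nx, ny, nz, rx, ry, rz followed by nx*ny*nz doubles of data). The input file name is hypothetical.

import struct

def check_pfb(path):
    with open(path, "rb") as f:
        x, y, z = struct.unpack(">3d", f.read(24))
        nx, ny, nz = struct.unpack(">3i", f.read(12))
        dx, dy, dz = struct.unpack(">3d", f.read(24))
        (n_subgrids,) = struct.unpack(">i", f.read(4))
        print(f"{path}: NX NY NZ = {nx} {ny} {nz}, DX DY DZ = {dx} {dy} {dz}, {n_subgrids} subgrids")
        total = 0
        for _ in range(n_subgrids):
            ix, iy, iz, snx, sny, snz, rx, ry, rz = struct.unpack(">9i", f.read(36))
            total += snx * sny * snz
            f.seek(8 * snx * sny * snz, 1)  # skip this subgrid's data block
        # the subgrids should tile the full grid exactly
        assert total == nx * ny * nz, "subgrid cells do not add up to NX*NY*NZ"

check_pfb("press.init.pfb")  # hypothetical input file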

PF-CLM couplings

  • CLM root zone

    • apply rootzone changes to
      • pfsimulator/clm/drv_clmini.F90 (Add/Remove lines of dzlak and zlak)
      • pfsimulator/clm/clm_varpar.F90 (And change nlevsoi and nlevlak)
      • the tcl: pfset Solver.CLM.RootZoneNZ <your rootzone value>
  • PF has no units, CLM is in mm/s (for H2O fluxes)

    • => hardwired in clm.F90 is a dt*3.6, assuming PF is actually in m/hr
    • => So any PF-CLM simulation with PF units different from m/hr should change that value AND the conversion when CLM passes back to PF, in pf_couple.F90
  • When the input files get too large, it segfaults. We tried e.g. to put all forcings of one year into one file, resulting in pfbs of 253 MB. This caused segfaults during distribution of the input files and during the simulation. 64 MB per input file works without problems. (See the size estimate in the sketch below.)
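
A couple of back-of-the-envelope helpers for the two points above (plain arithmetic, not ParFlow API): the mm/s-to-m/hr factor behind the hardwired dt*3.6, and a rough .pfb size estimate (one 8-byte double per cell, header ignored) to stay below the roughly 64 MB per file that worked for us. The grid size in the example is hypothetical.

MM_PER_S_TO_M_PER_HR = 3600.0 * 1.0e-3  # 1 mm/s = 3600 mm/hr = 3.6 m/hr

def pfb_size_mb(nx, ny, nz):
    # data payload only: one 8-byte double per cell; the header is negligible
    return nx * ny * nz * 8 / 1e6

print(MM_PER_S_TO_M_PER_HR)        # 3.6
print(pfb_size_mb(150, 150, 365))  # hypothetical forcing file: ~65.7 MB, i.e. too big, split it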

Format of Met File 1D

  • Should be written as floats. It is read by this function: amps_SFBCast(amps_CommWorld, metf1d, invoice); following the invoice given by
amps_NewInvoice("%d%d%d%d%d%d%d%d", &sw, &lw, &prcp, &tas, &u, &v, &patm, &qatm)

This python line should work:

forc.to_csv(filenameOUT,float_format='%6f', sep='\t', header=False, index=False)
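
A slightly fuller sketch of writing such a file with pandas; the column names, values and one-day length are assumptions, only the column order (matching the invoice above) and the final write call matter.

import numpy as np
import pandas as pd

n = 24  # hypothetical: one day of hourly forcing
forc = pd.DataFrame({
    "sw":   np.zeros(n),           # shortwave radiation
    "lw":   np.zeros(n),           # longwave radiation
    "prcp": np.zeros(n),           # precipitation
    "tas":  np.full(n, 283.15),    # air temperature
    "u":    np.zeros(n),           # wind, x component
    "v":    np.zeros(n),           # wind, y component
    "patm": np.full(n, 101325.0),  # air pressure
    "qatm": np.full(n, 0.005),     # specific humidity
})
forc.to_csv("forcing_1d.txt", float_format='%6f', sep='\t', header=False, index=False)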

End of File bug

  • Error Message:
At line 88 of file parflow/pfsimulator/clm/drv_readvegtf.F90 (unit = 2, file = 'drv_vegm.dat')
Fortran runtime error: End of file
  • Solution: sed -i -e '$a\' LCC.dat or, better, apply it to all .dat files (see the sketch below)
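
A Python equivalent of the sed fix that covers every .dat file in the current directory (directory and pattern are assumptions): append a final newline where it is missing, which is what the Fortran reader stumbles over at end of file.

from pathlib import Path

for dat in Path(".").glob("*.dat"):
    data = dat.read_bytes()
    if data and not data.endswith(b"\n"):
        dat.write_bytes(data + b"\n")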

restart [WIP]:

  • see parflow-manual.pdf
  • CLM restart file: written every day or overwritten every time step; then, to start from a restart, change 2 -> 1 in drv_clmin in two locations:
startcode       1                             1=restart file,2=defined
clm_ic          1                             1=restart file,2=defined
  • in the .tcl file: Solver.CLM.IstepStart should be TimingInfo.StartCount - 1, because PF is implicit from t to t+1 and CLM is explicit, so PF takes the CLM output from the previous time step; => if PF has a dump interval of 24 hr, that doesn't work, because we may miss a PF value. => So either rename the PF or CLM outputs to cheat, or output PF at 1 hr: read the PF & CLM outputs at 1 hour and write daily restarts.

  • Conclusion: If planning to restart ParFlow-CLM simulations, the parameters must be set extremely carefully!

Checklist to restart

Specific storage

=> Specific storage is applied over all pressure values (including < 0)! Probably to avoid non-linear effects (if h > 0 do ...). In case of doubt, set the specific storage to 0 in the tcl.

pfdist

  • DZ and rx, ry, rz for subgrids in pfb files are reset on pfdist.
  • We figured out that for the clm input files it does not matter how we set DZ. Still, it is much cleaner to set it to the timestep (e.g. 0.5 for 0.5 hr) when disting clm forcing files (see the sketch after this list).
  • pfdist is normally compiler independent. So if you have multiple versions of parflow compiled with different compilers, it is not necessary to redo the pfdist when changing the executable.
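
A minimal sketch of setting DZ in a CLM forcing .pfb header to the forcing timestep before disting, assuming the same header layout as in the checker sketch above (DZ is the third double after the three NX, NY, NZ ints, i.e. bytes 52-60). The file name and timestep are hypothetical.

import struct

def set_pfb_dz(path, dz):
    # assumed header: X,Y,Z (3 doubles) | NX,NY,NZ (3 ints) | DX,DY,DZ (3 doubles) | ...
    with open(path, "r+b") as f:
        f.seek(24 + 12 + 16)  # skip X,Y,Z and NX,NY,NZ and DX,DY -> file position of DZ
        f.write(struct.pack(">d", dz))

set_pfb_dz("clm_forcing.pfb", 0.5)  # hypothetical file, 0.5 hr forcing timestep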

Scaling

  • BlueGene:
    • we found that our MAO domain (ca. 2.1M cells) runs best on 512 cores (≈4074 cells/core) when writing outputs
      (otherwise even more cores can be used)

Grids, Subgrids ...

Everything is a vector...

  • Parameters that are for example defined by Geom in the tcl are put into a vector internally

Changing Vectors

  • Each process holds one subgrid of each variable
    • ForSubgridI loops are implicitly run in parallel on multiple processes (TODO: look this up)
  • Loops can be executed on
    • all cells
    • only on the inner cells that are not boundary condition cells (GrGeomInLoop)
      • if you want to calculate the water storage, this is what you want
      • patches are boundary condition cells ;)

Example:

// taken from nl_function_eval.c:NlFunctionEval() (line 109...); most variable declarations are omitted in this excerpt
  int is;
  Vector * vector;
  Subgrid * subgrid;
  Grid * grid = VectorGrid(pressure);
  ForSubgridI(is, GridSubgrids(grid))
  {
    subgrid = GridSubgrid(grid, is);

    d_sub = VectorSubvector(density, is);
    
    // Subgrid resolution
    rx = SubgridRX(subgrid);
    ry = SubgridRY(subgrid);
    rz = SubgridRZ(subgrid);
    
    // Subgrid start index (shows where this grid is located in the real grid)
    ix = SubgridIX(subgrid);
    iy = SubgridIY(subgrid);
    iz = SubgridIZ(subgrid);

    // number of cells in the Subgrid
    nx = SubgridNX(subgrid);
    ny = SubgridNY(subgrid);
    nz = SubgridNZ(subgrid);

    // their spacing...
    dx = SubgridDX(subgrid);
    dy = SubgridDY(subgrid);
    dz = SubgridDZ(subgrid);

    vol = dx * dy * dz;

    nx_f = SubvectorNX(f_sub);
    ny_f = SubvectorNY(f_sub);
    nz_f = SubvectorNZ(f_sub);

    nx_po = SubvectorNX(po_sub);
    ny_po = SubvectorNY(po_sub);
    nz_po = SubvectorNZ(po_sub);

    dp = SubvectorData(d_sub);
  

    GrGeomInLoop(i, j, k, gr_domain, r, ix, iy, iz, nx, ny, nz,
    {
      ip = SubvectorEltIndex(f_sub, i, j, k);
      ipo = SubvectorEltIndex(po_sub, i, j, k);
      io = SubvectorEltIndex(x_ssl_sub, i, j, grid2d_iz);      

      fp[ip] = (sp[ip] * dp[ip] - osp[ip] * odp[ip]) * pop[ipo] * vol *
           del_x_slope * del_y_slope * z_mult_dat[ip];
    });
  }

Ghost cells

  • are synchronized between the different subgrids once per iteration and lie at the subgrid borders.

Definitions ... TODO

  • Patches are Boundary conditions....

  • Grid

  • Subgrid

  • Region

  • Subregion

  • Vector

  • Subvector

  • Matrix

  • Submatrix

  • Theory:

    • it seems that a Sub_ is what each process holds of the type _ . When I loop over e.g. all Subgrids, every process will just loop over the ones it holds.
    • vectors have a pointer to their grid
    • Grids hold subgrids and the background grid (the latter holds all points the subgrids lie in)
    • a Vector holds its grid, its subvectors and a pointer to the subgrids that describe the grids of the different subvectors
    • subgrids are subregions
  • TODO: when to call the Vector update? Only if the shape changed? No, it seems it needs to be called every time after you change subgrid data!

     /* Pass pressure values to neighbors.  */
    handle = InitVectorUpdate(pressure, VectorUpdateAll); 
    FinalizeVectorUpdate(handle);

MPI Bus Error (7)

Cannot send after transport endpoint shutdown
Signal: Bus error (7)
Signal code: Non-existant physical address (2)
  • Are you running out of disk quota?
  • Did you compile with esoteric flags (such as -pg for profiling)?

NetCDF and cmake

Symptoms:

undefined reference ....

Reason/Solution:

there are cmake packages on some systems called

  • NETCDF
  • netCDF
  • NetCDF

leading to correspondingly named variables in the build system. To keep this clean, there is now one place in the root CMakeLists.txt where we look for all of them and then write everything into NETCDF_FOUND, NETCDF_LIBRARIES and NETCDF_INCLUDE_DIRS, so PLEASE use these and don't do another find_package(nEtCdF) somewhere else

NetCDF - files that are half empty

If NetCDF output files come out only partially written, build NetCDF with parallel support enabled (and shared libraries), e.g.:
CC=mpicc CPPFLAGS=-I${PREFIX}/include LDFLAGS=-L${PREFIX}/lib \
  ./configure --enable-parallel-tests --prefix=${PREFIX}

(^^ as on https://www.unidata.ucar.edu/software/netcdf/docs/getting_and_building_netcdf.html#build_parallel but without disabling shared)
