Preprocessing

Forcing data

Disaggregation

The forcing data are generated from the 1/16-degree Livneh forcing dataset. The original Livneh dataset is located at HYDRA:/raid/blivneh/forcings/outputs/CONUS/asc.v.1.2.b/data_${latitude}_${longitude}. A copy of the data has been placed at HYAK:/civil/hydro/michaelou/summaProj/summaData/blivneh/asc.v.1.2.b. The daily forcing data are disaggregated to an hourly time step by running VIC 4.1.2 (a hedged command sketch follows). The results of the disaggregation are placed at HYAK:/civil/hydro/michaelou/summaProj/summaData/blivneh/workdir.
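
The exact VIC configuration is not recorded here; the following is a minimal sketch of the disaggregation step, assuming the standard VIC 4.1.x classic driver and its OUTPUT_FORCE option. The global parameter file name and the settings shown are illustrative only.

# Hypothetical global parameter file (global_disagg.txt); the settings that
# matter for disaggregation in VIC 4.1.x are assumed, not taken from this gist:
#   TIME_STEP     1       # hourly model step
#   OUTPUT_FORCE  TRUE    # write disaggregated forcings instead of running the water balance
#   FORCING1      /civil/hydro/michaelou/summaProj/summaData/blivneh/asc.v.1.2.b/data_
#   RESULT_DIR    /civil/hydro/michaelou/summaProj/summaData/blivneh/workdir/
vicNl -g global_disagg.txt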

Converting to SUMMA inputs

The hourly gridded forcing data are converted to the HRU-based mesh of the SUMMA model using areal weights. The areal weights are calculated with a GIS tool (QGIS). A CSV file containing the areal weights is saved as HYAK:/civil/hydro/michaelou/summaProj/summaData/blivneh/summa_forcing/hru_grid_frac.csv. The Python script convert_grid_forcing_2_hru.py, listed below, performs the conversion from the binary VIC output to netCDF automatically.

# -*- coding: utf-8 -*-
"""
Created on Thu Sep 22 21:29:06 2016

@author: Michael Ou
"""

import pandas as pd
import numpy as np
from netCDF4 import Dataset, date2num
from datetime import datetime
import argparse
import os


#%% constants
seconds_per_hour = 3600.0
timeunit = 'hours since 1990-01-01 00:00:00'
# define column names for gridded dataframe
col_names  = 'year', 'month', 'day', 'hour', 'pptrate', 'airtemp', 'windspd', 'spechum', 'SWRadAtm', 'LWRadAtm', 'airpres'
# define column formats for gridded dataframe
col_format = 'i4',   'i4',    'i4',   'i4',  'f4',      'f4',      'f4',      'f4',      'f4',       'f4',       'f4'
# define endian (system-dependent)
endian = 'little' # could be big-endian

#%% set arguments
parser = argparse.ArgumentParser(description = 'Convert gridded forcing data for SUMMA HRU')
parser.add_argument('-f', '--fracfile', help = 'file describing hru and grid intersection', required = True)
parser.add_argument('-d', '--datadir',  help = 'directory of gridded forcing data', required = True)
parser.add_argument('-o', '--outdir',   help = 'directory of output files', required = True)

# assign argument values
args      = parser.parse_args()
frac_file = args.fracfile
data_dir  = args.datadir
out_dir   = args.outdir
if not os.path.exists(out_dir):
    os.makedirs(out_dir)
os.chdir(out_dir)

#%% function read gridded forcing data
def read_vic_bin_forcing(vic_file, col_names, col_format, endian = 'little'):
    # open data file
    with open(vic_file, "rb") as f:
        # read 4 header identifiers
        id_header = [int.from_bytes(f.read(2), byteorder = endian, signed = False) for i in range(4)]
        # get the number of header bytes
        num_header_byte = int.from_bytes(f.read(2), byteorder = endian, signed = False)
        # read data in the binary file and output as a pandas DataFrame
        dt = np.dtype({'names': col_names, 'formats': col_format}, align=True)
        f.seek(num_header_byte,  0)
        df = pd.DataFrame(np.fromfile(f, dt))
        return df


#%% create ncdf file
def to_nc(n, times, df, ihru, nc_name):
    '''
    Inputs:
        n:       number of rows
        times:   time stamps as hours since yyyy-mm-dd HH:MM:SS
        df:      dataframe including pptrate, airtemp, windspd, spechum, airpres, LWRadAtm, SWRadAtm
        ihru:    index of HRU
        nc_name: name of the output netcdf file
    Outputs:
        write a netcdf file named nc_name
    '''

    # each nc file contains data for one HRU
    rootgrp = Dataset(nc_name, "w", format="NETCDF4_CLASSIC")

    # define dimensions
    dim_hru  = rootgrp.createDimension("hru",  None)     # unlimited dimension to allow concatenation across HRUs
    dim_time = rootgrp.createDimension("time", n)       # the time dimension

    # define variables
    time = rootgrp.createVariable("time","f4",("time",),zlib=True)
    time.units = timeunit
    time[:] = times

    data_step = rootgrp.createVariable("data_step","f4",zlib=True)
    data_step.units = 'seconds'
    data_step.assignValue(seconds_per_hour)

    hruId = rootgrp.createVariable("hruId","i4",("hru",),zlib=True)
    hruId[:] = (np.array(ihru))

    pptrate = rootgrp.createVariable("pptrate","f4",("hru","time",),zlib=True)
    pptrate[:] = df['pptrate'].values.reshape([1,n])

    airtemp = rootgrp.createVariable("airtemp","f4",("hru","time",),zlib=True)
    airtemp[:] = df['airtemp'].values.reshape([1,n])

    windspd = rootgrp.createVariable("windspd","f4",("hru","time",),zlib=True)
    windspd[:] = df['windspd'].values.reshape([1,n])

    spechum = rootgrp.createVariable("spechum","f4",("hru","time",),zlib=True)
    spechum[:] = df['spechum'].values.reshape([1,n])

    airpres = rootgrp.createVariable("airpres","f4",("hru","time",),zlib=True)
    airpres[:] = df['airpres'].values.reshape([1,n])

    LWRadAtm = rootgrp.createVariable("LWRadAtm","f4",("hru","time",),zlib=True)
    LWRadAtm[:] = df['LWRadAtm'].values.reshape([1,n])

    SWRadAtm = rootgrp.createVariable("SWRadAtm","f4",("hru","time",),zlib=True)
    SWRadAtm[:] = df['SWRadAtm'].values.reshape([1,n])

    return rootgrp.close()

#%% calculate the areal average of forcing data
if not data_dir.endswith('/'):
    data_dir = data_dir + '/'

# open fraction file
df_frac = pd.read_csv(frac_file)
df_frac.lat = df_frac.lat.astype(str)
df_frac.lon = df_frac.lon.astype(str)
df_frac['latlon'] =  df_frac.lat + '_' + df_frac.lon

# read a sample file and extract time information
df_sample = read_vic_bin_forcing(data_dir + 'full_data_' + df_frac.latlon[0], col_names, col_format, endian)
nrow = df_sample.shape[0]

# present time in netcdf preferred unit
df_times = pd.to_datetime(df_sample.iloc[:,:4])
times = date2num(df_times.astype(datetime), units=timeunit)

# group by hru
group_hru = df_frac.groupby('hruID')
hru_index = group_hru.hruID.first().values


for hru in hru_index:
    # skip this HRU if its nc file already exists
    if  os.path.isfile(str(hru) + '.nc'):
        continue
    # build a pandas Panel (pd.Panel was removed in pandas >= 1.0) holding the
    # fraction-weighted forcing of each overlapping grid cell, then sum across
    # grid cells to get the areal-weighted HRU forcing
    hru_panel = pd.Panel({
                    'frac' + str(row.Index) : read_vic_bin_forcing(data_dir + 'full_data_' + row.latlon, col_names, col_format, endian).iloc[:,4:] * row.frac
                    for row in group_hru.get_group(hru).itertuples() })
    df_hru = hru_panel.sum(axis = 0)
    to_nc(nrow, times, df_hru, hru, str(hru) + '.nc')

print('finish !!!')

The HRU-based forcing data are then saved as HYAK:/civil/hydro/michaelou/summaProj/summaData/blivneh/summa_forcing/outputs_YYYY/YYYYMM.nc (the per-HRU files written by the script were apparently combined into monthly files with NCO, as the global attributes below suggest). Running ncdump on one of the nc files gives:

netcdf \199001 {
dimensions:
        time = UNLIMITED ; // (744 currently)
        hru = 11723 ;
variables:
        float time(time) ;
                time:units = "hours since 1990-01-01 00:00:00" ;
        float data_step ;
                data_step:units = "seconds" ;
        int hruId(hru) ;
        float pptrate(time, hru) ;
        float airtemp(time, hru) ;
        float windspd(time, hru) ;
        float spechum(time, hru) ;
        float airpres(time, hru) ;
        float LWRadAtm(time, hru) ;
        float SWRadAtm(time, hru) ;

// global attributes:
                :NCO = "\"4.6.1-alpha01\"" ;
                :nco_openmp_thread_number = 1 ;
}
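
The header-only listing above can be reproduced with ncdump's -h option, for example:

ncdump -h outputs_1990/199001.nc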

Other SUMMA input files

A copy of the SUMMA parameter files is placed at HYAK:/civil/hydro/michaelou/summaProj/summaData/blivneh/explicitEuler. The master file is filemanager_columbia.txt, which lists the paths of all the files used by SUMMA. If you move this folder to another location, you will need to update the directories in the master file, for example as sketched below.
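
A hedged example, following the same sed-on-template approach used by runGRU_v5.sh later in this document and assuming the paths in the master file share the original prefix:

# hypothetical one-liner: point the master file at a new location
sed -i "s|/civil/hydro/michaelou/summaProj/summaData/blivneh/explicitEuler|/new/location/explicitEuler|g" filemanager_columbia.txt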

Most parameter values used in this implementation are borrowed from other hydrologic models such as Noah-MP. The distribution of the parameters is determined by the land cover and soil types. The meanings of the variables are mostly self-explanatory in the file. The details of the parameter specification will be explained in the paper.

Run SUMMA

Compile SUMMA on HYAK

The source code repository of SUMMA is https://github.com/NCAR/summa.git. To compile SUMMA on HYAK, follow the steps below (a command sketch is given after the list):

  1. Load the needed modules
    • netcdf_fortran+c_4.4.4-icc_17
    • icc_17-impi_2017
  2. Clone the repository
  3. Edit makefile
    • F_MASTER = your_summa_directory
    • FC = ifort
    • NCDF_PATH = /sw/netcdf-fortran+c-4.4.4_icc-17
    • LAPK_PATH = ${MKLROOT}
    • LIBLAPACK = -L$(LAPK_PATH)/lib/intel64 -lmkl_rt
  4. (Optional) Apply fix https://github.com/luckymichael/summa/tree/bugFix/lineWidth
  5. Compile (type make)
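
A minimal command sketch of the steps above, assuming the makefile lives in build/ as in the upstream repository; the makefile edits of step 3 are shown as comments:

# 1. load the required modules on HYAK
module load netcdf_fortran+c_4.4.4-icc_17
module load icc_17-impi_2017

# 2. clone the repository
git clone https://github.com/NCAR/summa.git
cd summa/build

# 3. edit the makefile: set F_MASTER to your summa directory, FC = ifort,
#    NCDF_PATH = /sw/netcdf-fortran+c-4.4.4_icc-17, LAPK_PATH = ${MKLROOT},
#    LIBLAPACK = -L$(LAPK_PATH)/lib/intel64 -lmkl_rt
# 4. (optional) apply the lineWidth fix from
#    https://github.com/luckymichael/summa/tree/bugFix/lineWidth

# 5. compile
make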

Parallelization

The poor man's parallelization can easily be activated in SUMMA by automatically subsetting the simulation. Subsetting can be accomplished in two running modes.

  1. Single-HRU mode
    It is activated by the command-line option -h followed by the HRU index.

  2. GRU-Parallelization mode
    It is activated by the command-line option -g followed by the starting GRU index and the number of GRUs to run (example invocations for both modes are shown after this list).
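
For illustration, hedged example invocations; the executable and file manager names are placeholders, while the -m, -h, and -g options are as used elsewhere in this document:

# single-HRU mode: run only the HRU with index 101
./summa.exe -m summa_filemanager.txt -h 101
# GRU-parallelization mode: run 20 GRUs starting at GRU index 101
./summa.exe -m summa_filemanager.txt -g 101 20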

The GRU-parallelization mode is mainly used in this implementation. A bash script (HYAK:/civil/hydro/michaelou/summaProj/summaData/blivneh/explicitEuler/runGRU_v5.sh, listed below) controls the runs, including subsetting and restarting.

#!/bin/bash

# Created to run a subset of Columbia Model

# print hostname when needed
hostname=`hostname`

settingdir=$1  # setting directory
istart=$2      # starting GRU index
ngru=$3        # number of the GRUs of this subset
outputdir=$4   # output directory
bfnode=yes     # flag if this is a backfill node

cd $settingdir
# check arguments
if [ -z $ngru ]; then ngru=1; fi
if [ -z $outputdir ]; then outputdir=$settingdir; fi

# check the index and number of GRU
maxgru=11723
if [ $istart -gt $maxgru ]; then exit; fi
if [ $((istart + ngru - 1)) -gt $maxgru ]; then
  ngru=$(( maxgru-istart+1 ))
fi

# summa executable
summaexe=/usr/lusers/mgou/uwhydro/summaProj/summa/bin/summa_columbia_o2.exe


## 1 create input files and output directory
iend=$((istart+ngru-1))
runname=`printf %05i $istart`'-'`printf %05i ${iend}`
workdir=$outputdir/$runname
mkdir -p $workdir
#rm $workdir/*
echo output dir: $workdir

### file manager
fileman=$workdir/summa_filemanager.txt
cp -f $settingdir/filemanager_columbia_template.txt $fileman
sed -i "s|:outputdir:|$workdir|g" $fileman


## 2 check interrupted run
# check if any restart file
nrestart=`find $workdir/ -name "*summaRestart*" -size +1 | wc -l`
echo 'found number of restart files: '$nrestart
if [ $nrestart -gt 1 ]; then
  # in case it was interrupted when writing the last restart file, 
  # we use the second last restart file when their sizes are different. 
  initfile1=`ls -1 $workdir/*__summaRestart_* | tail -1`
  initfile2=`ls -1 $workdir/*__summaRestart_* | tail -2 | head -1`
  size1=`find $initfile1 -printf '%s'`
  size2=`find $initfile2 -printf '%s'`
  echo $initfile1 is $size1 byte      $initfile2 is $size2 byte
  if [ $size1 -lt $size2 ]; then initfile=$initfile2; else initfile=$initfile1; fi
  # extract the year/month/day/hour
  stmp=${initfile#*Restart_}
  iyy=${stmp:0:4}
  imm=${stmp:5:2}
  idd=${stmp:8:2}
  ihh=${stmp:11:2}
  ihh='00'   # force the restart to begin at hour 00
  newinitfile=reStart/ini${runname}_${iyy}-${imm}-${idd}.nc
  # fill the netcdf file with the subset output
  $settingdir/catRestart.sh $istart $ngru $initfile $newinitfile
else
  iyy=1915
  imm=01
  idd=01
  ihh=00
  newinitfile=ini_crb.nc
fi

start_time="${iyy}-${imm}-${idd} ${ihh}:00"
echo "starting time is $start_time"
newdecision=decision/decision_${runname}.txt
sed "s/:startTime:/$start_time/g" decision_template.txt > $newdecision
sed -i "s|:decision:|$newdecision|g"  $fileman
sed -i "s|:init:|$newinitfile|g" $fileman

## 4 the forcing file list
newforcelist=forcingList/summa_zForcingFileList_${runname}.txt
echo '! new forcing data' > $newforcelist
for (( iy=iyy; iy<=2011; iy++ )); do
  for im in {01..12}; do
    echo "'"outputs_$iy/$iy$im.nc"'" >> $newforcelist
  done
done
sed -i "s|:forcing:|$newforcelist|g" $fileman

## 5 run model
echo "GRU $istart to $iend running summa at `date`"
$summaexe -m $fileman -g $istart $ngru -p m -r y  #&> $workdir/exprun${iyy}${imm}.log 

The script HYAK:/civil/hydro/michaelou/summaProj/summaData/blivneh/explicitEuler/00_create_job_list.sh is used to create the job list.

# failed GRU index
failedGRU="8523 8670 8689 8737 9039 9862 11130 11132 11204 11484 11724"
# the first GRU index
GRU1=1
maxGRU=11723
# the size of each subset
nGRU=20
rundir=`pwd`
resultdir="/gscratch/hydro/mgou/summarun/final_run"
> job.list
for GRU2 in $failedGRU; do
  echo $GRU2
  for ((iGRU=GRU1; iGRU<GRU2; iGRU+=nGRU )); do
    # this 'break' disables the full sweep below, so only the single
    # failed-GRU jobs are written to job.list
    break
    kGRU=$(( iGRU + nGRU - 1 ))
    if [ $kGRU -ge $GRU2 ]; then
      n=$(( GRU2 - iGRU ))
    else
      n=$nGRU
    fi
    echo "$rundir/runGRU_v5.sh $rundir $iGRU $n $resultdir &> $rundir/logs/f`printf %05i $iGRU`.log" >> job.list
  done
  if [ $GRU2 -le $maxGRU ]; then
    echo "$rundir/runGRU_v5.sh $rundir $GRU2 1 $resultdir &> $rundir/logs/f`printf %05i $iGRU`.log" >> job.list
  fi
  GRU1=$(( GRU2 + 1 ))
done

After job.list is created, we can load it into the parallel-sql job database with:

cat job.list | psu --load

The job can be submitted to the batch or backfill queue. A sample job script:

#!/bin/bash
##
## !! _NEVER_ remove # signs from in front of PBS or from the line above !!
##
## RENAME FOR YOUR JOB
#PBS -N summa-hydro

## EDIT FOR YOUR JOB
## For 16 core nodes.
## Nodes should _never_ be > 1.
#PBS -l nodes=1:ppn=16

## WALLTIME DEFAULTS TO ONE HOUR - ALWAYS SPECIFY FOR LONGER JOBS
## The job is cancelled if it does not finish within this walltime
#PBS -l walltime=99:59:59

## EDIT FOR YOUR JOB
## Put the STDOUT and STDERR from jobs into the below directory
#PBS -o /usr/lusers/mgou/uwhydro/summaProj/summaData/blivneh/summa_run/logs
## Put both the stderr and stdout into a single file
#PBS -j oe

## EDIT FOR YOUR JOB
## Specify the working directory for this job bundle
#PBS -d /usr/lusers/mgou/uwhydro/summaProj/summaData/blivneh/summa_run

HYAK_SLOTS=`wc -l < $PBS_NODEFILE`

module load parallel_sql
module load netcdf_fortran+c_4.4.4-icc_17
module load icc_17-impi_2017
# load netcdf tools
module load contrib/hydro

export LD_LIBRARY_PATH=/sw/netcdf_intel/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
echo "using $HYAK_SLOTS processes"
parallel-sql --sql -a parallel --exit-on-term -j $HYAK_SLOTS

This file can be submitted multiple times, according to the number of nodes that are available for computation; for example, with a simple loop as sketched below.
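
Assuming the job script above is saved as summa_parallel_sql.pbs (a hypothetical name), it could be submitted once per available node:

# submit one copy of the job script per available node (here, 8 nodes)
for i in $(seq 1 8); do
  qsub summa_parallel_sql.pbs
done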

Postprocessing

Concatenate SUMMA outputs

In my setup, the outputs are /gscratch/hydro/mgou/summarun/final_run/XXXXX-XXXXX/explicitEuler_YYYY-10-01-00_GXXXXX-XXXXX_24.nc, where XXXXX is the GRU index and YYYY is the year. Because some GRUs failed, I used xarray to concatenate the outputs into a single netCDF file, since xarray fills in the missing data when concatenating files. There are two levels of concatenation. First, I concatenate the outputs over time within each batch:

module load epel_packages
cmd=''
rundir=/gscratch/hydro/mgou/summarun/final_run
for batch in `ls -1 $rundir`; do
  n=`ls -1 $rundir/$batch/*_24.nc | wc -l`

  if [ $n -gt 0 ]; then
    ncout="/gscratch/hydro/mgou/summarun/aggregate/batch_output/${batch}.nc"
    if [ -f $ncout ]; then
      #echo "Output file found at $ncout"
      continue
    fi
    echo "Concatenate $batch"
    cmd+="cd $rundir/$batch && ncrcat -O -o $ncout "'*_24.nc \n'
  else
    echo "Found $n files in $batch, bad batch"
    ihru0=${batch:(0):5}
    ihru1=${batch:(6):5}
    nhru=`bc <<< "$ihru1 - $ihru0 + 1"`
    echo /usr/lusers/mgou/uwhydro/summaProj/summaData/blivneh/explicitEuler/runGRU_v5.sh \
         /usr/lusers/mgou/uwhydro/summaProj/summaData/blivneh/explicitEuler $ihru0 $nhru \
         /gscratch/hydro/mgou/summarun/single_hru \
         '&> /usr/lusers/mgou/uwhydro/summaProj/summaData/blivneh/explicitEuler/logs/failed_hru/s'$ihru0'.log' \
         >> failed_batch.job.list

  fi
  #if [ "$batch" > "00100" ]; then break; fi
done
echo -e $cmd | parallel -j 16 #--dry-run

To concatenate over batches:

import xarray as xr
nc_all = xr.open_mfdataset('*.nc', concat_dim='hru')
nc_all.to_netcdf('../summa_crb_all_daily.nc')

Streamflow Routing

The mizuRoute model (https://github.com/NCAR/mizuRoute) is used to perform streamflow routing of the runoff, which is the sum of the surface runoff (scalarSurfaceRunoff_mean), the subsurface runoff (scalarSoilBaseflow_mean), and the baseflow (scalarAquiferBaseflow_mean). A runoff netCDF file is the input for mizuRoute and is prepared as follows:

import numpy as np
import xarray as xr
# match hru IDs
nc_arr = xr.open_dataset('/usr/lusers/mgou/uwhydro/summaProj/summaData/blivneh/explicitEuler/summa_zLocalAttributes_columbia_gru.nc')
hid = nc_arr['hruId'].values
hid = np.where(hid > 17900000, hid + 5445, hid)
# sum runoff
nc_all = xr.open_mfdataset('*.nc', concat_dim='hru')
nc_all['runoff'] = (nc_all['scalarSurfaceRunoff_mean'] + \
                    nc_all['scalarSoilBaseflow_mean'] + \
                    nc_all['scalarAquiferBaseflow_mean']) * 1000.0

q = nc_all['runoff'].values
q = np.where((q > 1.0) |(q < 0.0) | np.isnan(q), 0.0, q)
nc_all['runoff'].values = q
nc_all['hruId'] = ('hru', hid)
nc_all[['runoff', 'hruId']].to_netcdf('../summa_crb_runoff_daily.nc')

The mizuRoute master file for the Columbia simulation is /civil/hydro/michaelou/summaProj/summaData/blivneh/route/mizuRoute/route/columbia.control; the inputs are explained in the file. A hedged example of invoking mizuRoute with this control file is sketched below.
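
As a sketch only (the executable name depends on how mizuRoute was built), the model takes the control file as its single command-line argument:

# run mizuRoute with the Columbia control file (executable name illustrative)
./mizuRoute.exe /civil/hydro/michaelou/summaProj/summaData/blivneh/route/mizuRoute/route/columbia.control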

VIC simulation

The results of the VIC simulation are provided by Oriana. The outputs are aggregated into a netCDF file at /raid3/oriana/bpa/runs/historical/vic/160912_september2016_calibration/nc/calibrated_all_fluxes.19500101-20111231.nc on HYDRA.

Failed HRUs

Here I list the failed HRUs and their errors. Each entry gives the HruId, the corresponding hru index in the netCDF file, and the error message:

  1. 17008568 -> 8523
    FATAL ERROR: coupled_em/opSplittin/varSubstep/systemSolv/summaSolve/eval8summa_wrapper/eval8summa/computResid/we found some Indian bread (NaN)
  2. 17008725 -> 8670
    FATAL ERROR: coupled_em/opSplittin/varSubstep/systemSolv/state is not feasible in explicit Euler (reduce time step)systemSolv/state is not feasible in explicit Euler (reduce time step)systemSolv/state is not feasible in explicit Euler (reduce time step)systemSolv/state is not fea
  3. 17008744 -> 8689
    FATAL ERROR: coupled_em/opSplittin/varSubstep/systemSolv/summaSolve/eval8summa_wrapper/eval8summa/computResid/we found some Indian bread (NaN)
  4. 17008795 -> 8737
    FATAL ERROR: coupled_em/volicePack/layerMerge/layer_combine/problem with enthalpy-->temperature conversion
  5. 17009106 -> 9039
    FATAL ERROR: coupled_em/opSplittin/varSubstep/systemSolv/summaSolve/eval8summa_wrapper/eval8summa/computResid/we found some Indian bread (NaN)
  6. 17009945 -> 9862
    FATAL ERROR: coupled_em/opSplittin/varSubstep/systemSolv/summaSolve/eval8summa_wrapper/eval8summa/computResid/we found some Indian bread (NaN)
  7. 17900119 -> 11130
    FATAL ERROR: coupled_em/opSplittin/varSubstep/systemSolv/state is not feasible in explicit Euler (reduce time step)systemSolv/state is not feasible in explicit Euler (reduce time step)systemSolv/state is not feasible in explicit Euler (reduce time step)systemSolv/state is not fea
  8. 17900121 -> 11132
    FATAL ERROR: coupled_em/opSplittin/varSubstep/systemSolv/summaSolve/eval8summa_wrapper/eval8summa/computResid/we found some Indian bread (NaN)
  9. 17900193 -> 11204
    FATAL ERROR: coupled_em/opSplittin/varSubstep/systemSolv/state is not feasible in explicit Euler (reduce time step)systemSolv/state is not feasible in explicit Euler (reduce time step)systemSolv/summaSolve/eval8summa_wrapper/eval8summa/computResid/we found some Indian bread (NaN)
  10. 17900473 -> 11484
    FATAL ERROR: coupled_em/opSplittin/varSubstep/systemSolv/state is not feasible in explicit Euler (reduce time step)systemSolv/state is not feasible in explicit Euler (reduce time step)systemSolv/state is not feasible in explicit Euler (reduce time step)systemSolv/state is not fea