Introduction

This document serves as a guideline for performing a feedback file (ff) based verification on any UNIX based system. Provided that all the necessary software and scripts are available on the system, the user only has to supply a set of namelists, run the verification scripts and can eventually inspect the results through an interactive browser application. Here, the system requirements, the namelist options and tips on how to run the verification efficiently are provided.

System Requirements

Software

In order to use the verification suite, the following software is required:

  • NetCDF
  • R (>= version 3.4.1)
    • sp +rgeos +parallel +data.table +SpecsVerification +matrixStats +RNetCDF +stringr +survival +grid +verification +reshape2 +pcaPP
  • shiny-server (recommended for the visualization of results, on the same system or any other)

The NetCDF library is available at https://www.unidata.ucar.edu/downloads/netcdf/index.jsp

The Rfdbk package serves as the interface between ff and R and can be obtained and installed as follows:

# Download and install the FFV2 package
git clone https://gitlab.com/rfxf/ffv2.git
R CMD INSTALL ffv2

Some of the listed packages will be installed during the Rfdbk installation process. If not, each package is available from the CRAN archive https://cran.r-project.org/web/packages/available_packages_by_name.html and can be installed in the same manner as FFV2.

The package contains the necessary scripts to run a verification task, either domain average or station based, for deterministic or ensemble forecasts, and for a number of different observation systems. This is a non-public git project. If you want permission to access the git, please contact Felix.Fundel[at]dwd.de

Hardware

The verification can be memory intensive, depending on the number of observations, the number of runs, the length of the period and the desired subdomains. For a typical ICON-D2 DWD routine verification of one month, 40 GB of memory should suffice. A monthly verification of the ICON DWD routine runs can use up to 200 GB. By design, the station based verification is the most demanding; the domain average verification needs considerably less memory.

Verification jobs can be performed in parallel. The feasible number of parallel processes depends on the available memory and the scope of the verification task. The user is invited to find their own optimal settings.

Input Data

Observations and forecasts, as well as most of the additional information necessary to perform a verification, are contained in NetCDF ff. The data contained in an ff is typically valid for a certain date or a time window of a few hours around this date. Analyses and forecasts valid at this date are gathered in the ff. An older documentation of the content and structure of ff can be found at http://www.cosmo-model.org/content/model/cosmo/coreDocumentation/cosmoFeedbackFileDefinition.pdf The requirements for the production of ff are gathered in a document resulting from the COSMO CARMA project (http://www.cosmo-model.org/view/repository/wg5/PP-CARMA/Task1).

In order for the verification software to run properly, a few requirements must be met. First, there need to be one or more ff available per experiment/model. For each experiment/model the ff need to be provided in a separate directory. The ff for the individual dates, observation systems and forecast types are collected in one directory. The names of the ff are made up of three parts: a prefix indicating the forecast type (ver for deterministic, vep for ensemble), the observation system and the valid date (YYYYMMDDHH),

so that a typical ff would be named verSYNOP.2018100100

Any other naming will also work; however, to ensure full functionality, it is strongly recommended to use the syntax described above and, if necessary, add pre- or suffixes to the file names.

The experiment verification checks for identically named ff in the different experiment directories; identical naming is therefore mandatory. Only the ff existing in all experiment directories will be used for the experiment verification (this can be changed by namelist). In case only one experiment is verified (e.g. without reference), all existing ff will be used (if no other restrictions on dates are set in the namelist). If the verification is not performed in experiment mode, all found ff will be used (if no other restrictions on dates are set in the namelist).
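As an illustration, a directory layout fulfilling these requirements for two experiments could look as follows (directories and dates are placeholders):

# Hypothetical example: identically named ff in two experiment directories
/dir/Exp1/verSYNOP.2018100100
/dir/Exp1/verTEMP.2018100100
/dir/Exp2/verSYNOP.2018100100
/dir/Exp2/verTEMP.2018100100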

Workflow

This section describes the correct sequence of script calls and the necessary/optional input and arguments that should be provided.

Domain average verification

The domain average verification produces domain average scores as a function of forecast lead-time for a user defined verification period. Domains can be the entire set of observations found in the ff, pre-defined subdomains or user defined subdomains (polygons or station lists). In addition to scores averaged over the verification period, time series of the scores (score as a function of valid time in the verification period), also as a function of the forecast lead-time, are produced. In a first step, intermediate scores are calculated with the script starter_scores_by_date.R. This script requires the namelist, the type of observations, the type of forecast and the number of parallel processes as arguments.
Example call

# Calling a deterministic SYNOP/TEMP verification script running on 6 cores
Rscript demo/starter_scores_by_date.R namelist SYNOP DET 6
Rscript demo/starter_scores_by_date.R namelist TEMP  DET 6

# Calling an ensemble SYNOP/TEMP verification script running on 6 cores
Rscript demo/starter_scores_by_date.R namelist SYNOP EPS 6
Rscript demo/starter_scores_by_date.R namelist TEMP  EPS 6

The above script calls result in a number of intermediate score files, one file per valid date of the ff. Already existing intermediate score files will not be recreated, unless the underlying ff has changed (i.e. is more recent). These intermediate score files are then aggregated and the final score files are subsequently produced. To do so, the script starter_aggregate.R in the package ‘demo’ folder has to be called. It requires the namelist, the type of observations, the type of forecast and, optionally, the number of cores for parallelization as arguments. Already existing final score files will be replaced. Example call

# Calling the R aggregation script with arguments namelist, observation system, forecast type and number of cores (deterministic)
Rscript demo/starter_aggregate.R namelist SYNOP DET 6
Rscript demo/starter_aggregate.R namelist TEMP  DET 6

# Calling the R aggregation script with arguments namelist, observation system, forecast type and number of cores (ensemble)
Rscript demo/starter_aggregate.R namelist SYNOP EPS 6
Rscript demo/starter_aggregate.R namelist TEMP  EPS 6

The aggregation scripts produce the final scores as a data.table and save them as Rdata files. Those Rdata files can be loaded with R and saved in many other data formats in order to use the results with other software. The SYNOP verification produces 6 final score files: one for each of the verification types (continuous, categorical and percent-correct scores), each as average and as time series of scores. The TEMP verification produces 2 final score files. The resulting score files are meant to be used with a dedicated visualization shiny app.

Station based verification

Additionally, a station based verification (scores as a function of observation station) can be performed. It requires only one script call for each observation and verification type. The script takes the namelist, observation system, forecast type, verification method and, optionally, the number of processes as arguments. Example call

# Calling the R station verification script with arguments namelist, observation system, forecast system, verification method and number of processes.
Rscript demo/starter_scores_by_station.R namelist SYNOP DET CONT 6
Rscript demo/starter_scores_by_station.R namelist SYNOP DET CAT 6
Rscript demo/starter_scores_by_station.R namelist SYNOP EPS XXX 6  #XXX is just a placeholder

The categorical verification is available for SYNOP only. This verification produces no intermediate score files, only a final score file that can be used with a dedicated visualization application. It does not produce time series of scores.

Namelists

The ff verification is controlled with namelists. Namelists are simple text files that contain key/value pairs which are evaluated by the R verification scripts. Namelists have to be created at least for each forecast type and for each observation system. The intermediate score calculation and the aggregation can be controlled with a single namelist; in some cases, however, it is advisable to use separate namelists. Syntax errors in the namelist will most likely lead to an immediate crash of the verification scripts. The following table gives an overview of the existing namelist keys. Some are mandatory, some have default values, and not all keys are used by all verification scripts.

| Key | Descr. | Value Example | Fcst Type | Obs System | Mandatory | Default |
| --- | --- | --- | --- | --- | --- | --- |
| varnoContinuous | variable name | see example | Det./Ens. | All | Y | None |
| pecthresholds | list of thresh. for hit rates | see example | Det. | SY | N | None |
| catthresholds | list of thresh. for categorical verif. | see example | Det. | SY | N | None |
| cat2thresholds | list of thresh. for categorical verif. using an upper and lower threshold | see example | Det. | SY | N | None |
| subdomains | verification subdomains | ‘ALL’,‘GER’,‘SYNOP’,‘WMO’,‘USER’ or combinations | Det./Ens. | SY/TE | Y | None |
| filePattern | ff name pattern | ‘verSYNOP’ or ‘vepTEMP’ etc. | Det./Ens. | SY/TE | Y | None |
| timeSteps | valid time binning steps | ‘0’ | All | All | Y | None |
| timeBreaks | valid time binning breaks | ‘30,-30’ | All | All | Y | None |
| veri_run_class | ff forecast run class | ‘0,2’ | All | All | Y | None |
| veri_run_type | ff forecast run type | ‘0,4’ | All | All | Y | None |
| state | state of observation | ‘0,1,5’ | All | All | Y | None |
| r_state | ff state of report | ‘0,1,5’ | All | All | Y | None |
| sigTest | significance testing | ‘T’ | Det./Ens. | SY/TE | N | T |
| expType | standard or singlerun | ‘standard’ or ‘singlerun’ | Det. | SY/TE | Y | None |
| experiment | experiment or routine | ‘T’ or ‘F’ | Det./Ens. | SY/TE | Y | None |
| expDescr | name for intermed. score files | ‘Exp1,Exp2’ | Det./Ens. | SY/TE | Y | None |
| fileDescr | name for final score files | ‘Exp1-Exp2’ | Det./Ens. | SY/TE | Y | None |
| fdbkDirs | ff directories | ‘/dir/Exp1/,/dir/Exp2/’ | Det./Ens. | SY/TE | Y | None |
| expIds | name(s) of experiment(s) | ‘Exp1,Exp2’ | Det./Ens. | SY/TE | Y | None |
| outDir | output directory | ‘/outdir/’ | Det./Ens. | SY/TE | Y | None |
| lonlims | bounding lon limits | ‘-180,180’ | Det./Ens. | SY/TE | N | Null |
| latlims | bounding lat limits | ‘-90,90’ | Det./Ens. | SY/TE | N | Null |
| iniTimes | runs | ‘0,12’ or ‘ALL’ | Det./Ens. | SY/TE | N | Null |
| mimicVersus | use Versus quality check | ‘F’ or ‘T’ | Det. | SY/TE | N | Null |
| veri_ens_member | user defined member(s) | ‘1,2,3,4’ | Det./Ens. | SY/TE | N | Det: -1, Eps: >0 |
| state | user defined state | ‘0,1’ | Det. | SY/TE | N | 0,1,5,11 |
| domainTable | file with specifications | see below | Det./Ens. | SY/TE | N | Null |
| whitelists | use whitelist by variable | see below | Det./Ens. | SY | N | None |
| blacklists | use blacklist by variable | see below | Det./Ens. | SY | N | None |
| dateList | stratify by list of dates | see below | Det. | SY | N | None |
| shinyServer | name of the shiny server | remote.machine.de | All | All | N | None |
| shinyAppPath | shiny application path on server | /data/user/shiny/ | All | All | N | None |
| alignObs | turn off the data alignment | ‘T’ or ‘F’ | Det. | SY/TE | N | ‘T’ |
| startIniDate | date of first run to use | ‘YYYYMMDDHH’ | Det./Ens. | SY/TE | N | None |
| stopIniDate | date of last run to use | ‘YYYYMMDDHH’ | Det./Ens. | SY/TE | N | None |
| startValDate | date of first valid date to use | ‘YYYYMMDDHH’ | Det./Ens. | SY/TE | N | None |
| stopValDate | date of last valid date to use | ‘YYYYMMDDHH’ | Det./Ens. | SY/TE | N | None |
| aggMixed | aggregate also incomplete score files | ‘T’ or ‘F’ | Det./Ens. | SY/TE | N | ‘F’ |
| customLevels | custom level choice | ’1000,950,900,850’ | Det./Ens. | TE | N | sign. levels |
| nMembers | number of EPS members | ‘20,40’ | Ens. | SY | Y | None |
| useObsBias | remove bias correction from obs | ‘T’ or ‘F’ | Det./Ens. | SY | N | ‘F’ |
| veri_forecast_time | select forecast times (refers to the ff variable veri_forecast_time) | ‘1200,2400’ | Det./Ens. | All | N | None |
| instype | filter on instype | ‘21,22’ | All | All | N | None |
| rejectFlags | reject obs if flag bit is set | ‘1,2,3’ | All | All | N | None |
| mdlsfc | filter mdlsfc bit | ‘1,2,3’ | All | All | N | None |

Example Namelist

# SYNOP deterministic namelist (ICON-D2 settings)
varnoContinuous 'T2M,RH,RH2M,PS,TD2M,FF,DD,RAD_GL_1h,RAD_DF_1h,TMIN_12h,TMAX_12h,JJ_12h,N,N_L,N_M,N_H'
pecthresholds   list('PS'=list('lower'=c(-500),'upper'=c(500)),'FF'=list('lower'=c(-3),'upper'=c(3)))
catthresholds   list('GUST_1h'=c(5,10,15),'RR_1h'=c(0.1,1,2))
subdomains      'ALL,GER'
filePattern     'verSYNOP.'
timeSteps       '0,-60,-120,-180'
timeBreaks      '0,-30,-90,-150,-180'
veri_run_type   '0,2'
veri_run_class  '0,2'
state           '0,1,5,7,11'
r_state         '0,1,5,7,11'
sigTest         'T'
expType         'standard'
experiment      'T'
expIds          'Exp1,Exp2'
expDescr        'Exp1,Exp2'
fileDescr       'Exp_Exp1_vs_Exp2_username'
fdbkDirs        '/Directory/Of/Exp1/,/Directory/Of/Exp2/'
outDir          '/Output/Directory/'



# TEMP deterministic namelist (ICON-D2 settings)
varnoContinuous 'Z,T,TD,RH,FF,DD'
subdomains      'ALL,GER'
filePattern     'verTEMP.'
timeSteps       '0'
timeBreaks      '0,-180'
veri_run_type   '0,2'
veri_run_class  '0,2'
state           '0,1,5'
r_state         '0,1,5'
sigTest         'T'
expType         'standard'
experiment      'T'
expIds          'Exp1,Exp2'
expDescr        'Exp1,Exp2'
fileDescr       'Exp_Exp1_vs_Exp2_username'
fdbkDirs        '/Directory/Of/Exp1/,/Directory/Of/Exp2/'
outDir          '/Output/Directory/'



# SYNOP EPS namelist (ICON-D2 settings)
varnoContinuous 'T2M,RH,RH2M,PS,TD2M,FF,RAD_GL_1h,RAD_DF_1h,TMIN_12h,TMAX_12h,N,N_L,N_M,N_H,GUST_1h,RR_1h'
catthresholds   list('GUST_1h'=c(5,10,15),'RR_1h'=c(0.1,1,2))
subdomains      'ALL,GER'
filePattern     'vepSYNOP'
timeSteps       '0,-60,-120,-180'
timeBreaks      '0,-30,-90,-150,-180'
veri_run_type   '0,2'
veri_run_class  '0,3'
state           '0,1,5,7,11'
r_state         '0,1,5,7,11'
sigTest         'T'
iniTimes        '0,12'
expType         'standard'
experiment      'T'
expIds          'Exp,Ref'
expDescr        'Exp_Ref_username'
fileDescr       'Exp_Ref_username'
fdbkDirs        '/Directory/Of/Exp/,/Directory/Of/Ref/'
outDir          '/Output/Directory/'
whitelists      '/path/to/whitelist_clouds'
blacklists      '/path/to/blacklist'
nMembers        '20,40'



# TEMP EPS namelist (ICON-D2 settings)
varnoContinuous 'Z,T,RH,U,V,QV,QV_N'
subdomains      'ALL,GER'
filePattern     'vepTEMP'
customLevels    '1000,950,900,850,800,750,700,650,600,550,500,450,400,350,300,250,200,150,100'
timeSteps       '0'
timeBreaks      '0,-180'
veri_run_type   '0,3'
veri_run_class  '0,2'
state           '0,1,5'
r_state         '0,1,5'
sigTest         'T'
iniTimes        '0,12'
expType         'standard'
experiment      'T'
expIds          'Exp,Ref'
expDescr        'Exp_Ref_username'
fileDescr       'Exp_Ref_username'
fdbkDirs        '/Directory/Of/Exp/,/Directory/Of/Ref/'
outDir          '/Output/Directory/'
whitelists      '/path/to/whitelist_clouds'
blacklists      '/path/to/blacklist'
nMembers        '20,40'

Experiment Definition

The verification allows for an arbitrary number of experiments (>0) to be verified simultaneously. Mandatory is the namelist key expIds that can take one or more comma separated values (e.g. exp1,exp2,exp3 for 3 experiments). The experiment names chosen in expIds will be used by the verification to name the experiments. Correspondingly, also mandatory, is the specification of the directories that contain the ff of the experiments. This is done with the namelist key fdbkDirs, which has as many values as expIds, in the same order!

Valid Time Definition

The time granularity of the observations has to be set by the user. The time settings have to be given relative to the valid time of the feedback file. The relevant namelist keys are timeSteps and timeBreaks. The values for these keys have to be given in minutes. The example namelists show the settings used for the routine verification of ICON-D2 at DWD. The feedback files in this case are available in intervals of 3 hours. They contain observations from a 3 hour interval prior to the valid date of a feedback file. To get an hourly verification from the 3 hourly feedback files, the observations are binned. The bins are labeled by timeSteps, in this case 0,-60,-120,-180 minutes relative to the valid date of the feedback file. The bin ranges are defined by timeBreaks, in this case 0,-30,-90,-150,-180 (0 included, -180 not included). The length of the breaks has to be the length of the steps + 1. The binning can result in overlapping valid times; the verification will aggregate this correctly, but the user has to take care that observations are not used twice. In this way, a fine time granularity can be achieved even if a feedback file covers a longer period. All users are advised to have a look at the feedback files in order to set the time parameters correctly.
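As a minimal sketch (assuming feedback files that only contain observations within half an hour of their valid date), a single bin centered on the valid time could be configured like this; the values correspond to the examples in the namelist key table above:

# Minimal valid-time binning: one bin labeled 0, covering +30 to -30 minutes around the valid date
timeSteps  '0'
timeBreaks '30,-30'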

Forecast and Analysis Definition

The namelist keys veri_run_type and veri_run_class are used to select the type of analysis and forecast. The relevant values can be found in the tables of the feedback file documentation. The class is typically set to 0,2, causing the verification of model data from the main run and the analysis run. With type 0, forecasts are used. With type >=1, an analysis is used. The verification can handle only one kind of analysis at a time, i.e. the user has to make sure that only one of the available analyses is used by specifying run_type and run_class correctly. Hence veri_run_type should have only one entry besides 0.
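A sketch of such a selection (the analysis type code 4 is taken from the value example in the namelist key table above; the meaning of the codes has to be checked against the feedback file documentation):

# Verify forecasts (type 0) plus one analysis type (here 4) from the main and analysis run classes
veri_run_class '0,2'
veri_run_type  '0,4'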

Observation State Definition

The namelist keys state and r_state are used to include or exclude observations and reports based on their quality flag. The allowed values can be found in the feedback file documentation. Recommended values to use all non-rejected observations are 0,1,5,11 for both keys. State 7 (rejected) is set for the LAM verification in order to use surface wind observations >100 m (this will become obsolete as soon as those wind observations are no longer rejected in assimilation).

Verification Period Definition

The verification period results from the ff that are found. If the mandatory namelist key experiment is set to TRUE, then only those ff that have a matching file in all experiment directories are used by the verification. If the mandatory namelist key experiment is set to FALSE, then all ff will be used by the verification, regardless of whether there are matching files in the other experiment directories. If the verification period should be shorter than the period covered by the available ff, the recommended way is to set experiment to TRUE and to remove the unwanted ff from the directory of at least one experiment.

The namelist also takes the keys startValDate and stopValDate that can be used to trim the verification period by valid date. These keys take values of the type YYYYMMDDHH, which only works if the ff are named like *YYYYMMDDHH!

Additionally, the verification period can be trimmed by the forecast initialization date. For this purpose the keys startIniDate and stopIniDate can be used with values of the form YYYYMMDDHH.
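A sketch of such a restriction, using hypothetical dates:

# Use only valid dates in October 2018 and runs initialized up to 2018-10-30 12 UTC
startValDate '2018100100'
stopValDate  '2018103123'
startIniDate '2018100100'
stopIniDate  '2018103012'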

Variable and Threshold Selection

Variables used for the verification are set using the namelist key varno. For variables not in this list, no scores will be calculated. The deterministic SYNOP verification deviates from this syntax. Here, the namelist keys varnoContinuous, pecthresholds, catthresholds and cat2thresholds define which variables will be used. In case of a SYNOP namelist, the variables have to be provided by their name (ff definition); in case of a TEMP namelist, the variables have to be provided by their number (ff definition).

Using thresholds to perform a categorical or probabilistic verification is only possible with the SYNOP verification. In case of a deterministic forecast, a threshold will result in categorical (contingency table based) scores. The user has the possibility to define thresholds or uncertainty limits for all variables. A simple threshold can be set with the namelist key catthresholds that takes an R list as argument (see the example SYNOP namelist). Additionally, if the namelist key pecthresholds is set, hit rates (percent correct forecasts) for the forecast hitting the observation within the given limits are calculated. The ensemble verification evaluates the namelist key thresholds, which follows the same syntax as the key for the categorical verification.
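Since the ensemble key follows the same list syntax, a probabilistic threshold definition could look like the categorical example; the values below are taken from the example SYNOP EPS namelist and serve only as a sketch:

# Thresholds for probabilistic verification of gusts and precipitation (same syntax as catthresholds)
thresholds list('GUST_1h'=c(5,10,15),'RR_1h'=c(0.1,1,2))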

Forecast/Analysis Selection

By default, the verification will use all forecasts from all runs that are contained in the ff. If desired, the user can restrict this using the following options.

Run Selection

The namelist key iniTimes can be used to evaluate only those forecast runs with the initial times given as value. The value can be one or more comma separated initial times, e.g. using 0,12 will result in a verification of the 00UTC and 12UTC runs only.

Member Selection

Members can be selected using the namelist key veri_ens_member. With this, a single member verification of an ensemble prediction system can be performed, or the ensemble verification can be restricted to a subset of members. The namelist key veri_ens_member can be used to select one or more members as a comma separated list (e.g. 1,2,3,4,5 for members 1-5). In the namelist for the deterministic verification, the keys expIds and fdbkDirs have to be repeated accordingly (e.g. 5x if 5 members are selected). The veri_ens_member key is also evaluated in the ensemble verification; here, however, the keys expIds and fdbkDirs must not be repeated.

Forecast/Analysis Selection

The selection of analysis and forecast types via the namelist keys veri_run_type and veri_run_class works as described in the section Forecast and Analysis Definition above.

Score Selection

There is no possibility to restrict the number of scores calculated.

Data alignment

If two or more experiments are verified simultaneously, the set of observation/forecast pairs used will be the same for all experiments. This allows for true comparability of the scores from the different experiments. The alignment basically checks for station ID and level and uses an observation/forecast pair only if it is available for all experiments. The value of the observation is explicitly not used for the alignment, i.e. different experiments may use observations of different values (e.g. due to a bias correction). The data alignment can be turned off by setting the namelist key alignObs to FALSE.

Domain Stratification

The mandatory namelist key subdomains can be used to further divide the verification domain into sub-domains. If this is not wanted, its value should be set to ALL. Possible further values are:

  • LAT Stratification in NH,TR,SH
  • CEU Domain of what used to be COSMO-EU
  • CDE Domain of what used to be LM1
  • GER Germany (w.o. islands)
  • SYNOP Set of predefined regions used for SYNOP verification (see Appendix), only use with SYNOP namelist
  • WMO Set of predefined TEMP stations (see Appendix), only use with TEMP namelist
  • USER Works only in combination with the key domainTable, either polygons or stations. Values SYNOP or WMO are overridden!

The namelist key domainTable points to an ASCII file that contains either polygon or station ID information for one or more sub-domains. Example syntax for a domain stratification using domainTable:

# Example polygon domain table
name    lon     lat
NORD    8       50.001
NORD    15      50.001
NORD    15      55
NORD    8       55
SUED    8       45
SUED    15      45
SUED    15      50
SUED    8       50

or

# Example station domain table
name    id
DE      Q887
DE      10837
DE      10184
CH      06670
CH      06612
CH      06610

Important to note:

  • Station sets or polygons must not overlap.
  • Polygons can be arbitrarily complex.
  • Table columns are tab separated.
  • Do not use subdomain key words ( i.e. ALL,LAT,SYNOP,WMO,CEU,CDE,GER ) in the name column.
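To activate such user defined domains, the namelist combines the USER value with the path to the table; a sketch (the file path is a placeholder):

# Use the built-in domains plus user defined sub-domains from a domain table
subdomains  'ALL,USER'
domainTable '/path/to/myDomainTable.txt'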

Using Whitelists

One or several user defined whitelists can be used to filter observations by variable. Name and path of the whitelists have to be specified in the namelist using the whitelists key. If more than one whitelist should be used, they need to be separated by commas in the namelist value. A whitelist has to have two named columns: the first column, named “statid”, gives the quoted station ID; the second column, named “varno”, gives the quoted variable name (see ff documentation). By default all observations are used; only if a variable is part of a whitelist is it filtered accordingly.

# Example whitelist: use total cloud cover and high clouds from 3 stations only, excluding all other stations for these variables.
"statid","varno"
"07024","N"
"07412","N"
"07134","N"
"07791","N_H"
"07770","N_H"
"07666","N_H"

Using Blacklists

Blacklists can be used to exclude suspect observations from stations. The file format and the namelist usage (key blacklists) are analogous to the whitelists.

# Example blacklist removing RH observations from 3 stations.
"statid","varno"
"G334","RH"
"G315","RH"
"G297","RH"

Conditional Verification

Conditional verification allows scores to be stratified by user defined conditions on the observations and/or forecasts. This is so far only implemented in the deterministic and ensemble SYNOP verification.

# Example conditions on cloud cover observations and forecast error
# NOTE: Conditions must not overlap!!!
condition1  "list(N='obs==0',N='abs(veri_data-obs)<1')"
condition2  "list(N='obs==0',N='abs(veri_data-obs)>=1')"
condition3  "list(N='obs%between%c(1,4)',N='abs(veri_data-obs)<1')"
condition4  "list(N='obs%between%c(1,4)',N='abs(veri_data-obs)>=1')"
condition5  "list(N='obs%between%c(5,7)',N='abs(veri_data-obs)<1')"
condition6  "list(N='obs%between%c(5,7)',N='abs(veri_data-obs)>=1')"
condition7  "list(N='obs==8',N='abs(veri_data-obs)<1')"
condition8  "list(N='obs==8',N='abs(veri_data-obs)>=1')"

Features:

  • Arbitrary number of conditions (given as R-lists) is possible.
  • Arbitrary number of sub-conditions by condition (list elements) is possible.
  • Conditions have to be written as properties of variables of the observation system.
  • Only reports that contain all variables used in a condition will be used.
  • Conditions are not aware of each other, i.e. overlap in used data will not cause an error and should be prevented.
  • Properties of variables that can be used to define a condition are: “obs”,“veri_data”,“lon”,“lat”,“statid”,“z_station”,“veri_initial_date”.
  • Using “lon”,“lat”,“statid”,“z_station”,“veri_initial_date” might cause interferences with built-in stratifications.
  • Be careful with R functions like %between% as a shortcut for (x >= left & x <= right) that can cause overlapping conditions.

Hints:

  • Don’t stratify on the forecast (veri_data) in case of an ensemble verification.

Example:

Verification results for the 8 example conditions


Stratification by Date

A list of dates (YYYYMMDD) can be used to further divide the verification results into two subsets. The list contains all the dates of one subset; all the other dates of the verification period will be attributed to the other subset. This refinement happens in the aggregation part, where the experiment name is extended by a + (dates in dateList) or - (dates not in dateList). A possible application is the division of the verification period by weather regime. To utilize a date list, use the namelist key dateList with the name of the list (including path) as value.

# Example dateList
# All dates below will be used for subset +, all other dates will be in subset -.
20170410
20170413
20170415
20170417
20170422
20170423
20170425
20170429
20170509
20170512
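The corresponding namelist entry could then look as follows (the path is a placeholder):

# Stratify the aggregation by the dates listed in the file
dateList '/path/to/dateList.txt'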

Significance Testing

A significance test for score differences between 2 experiments can optionally be performed by setting the namelist key sigTest to TRUE. So far, this is implemented for the domain average deterministic SYNOP and TEMP verification as well as the domain average ensemble TEMP verification. The significance test is executed in the aggregation scripts. This test (t-test) searches for a significant difference in the mean of the distributions of scores from 2 experiments within the verification period. The significance level of the test is set to 95%. To avoid autocorrelation, each run time is evaluated individually. Testing all scores for all possible combinations of experiments can take a significant amount of time.

Example:

Verification results showing the results of the significance test by differently coloured circles. Red: significant difference, Gray: no significant difference, White: no test possible. The test results are shown only for scores of a specific ini-time and two experiments.


Hindcast & Standard Verification

The verification distinguishes between two experiment types, namely standard and hindcast experiments. Standard experiments consist of several runs, each with a certain lead-time (e.g. one month of ICON-D2 runs with 27 hours). Hindcast experiments consist of one run with a very long lead-time (e.g. one ICON-D2 run with a 744 hour forecast). Technically this does not make any difference in the verification, except that in hindcast mode the lead-time is mapped to the time of day. Hindcast experiments therefore appear as experiments with lead-times from 1h-24h in the verification. The type of verification has to be set in the namelist with the key expType, which can take the values standard or singlerun.
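As a sketch, the hindcast mode would be selected like this:

# Treat the experiment as one long run; lead-times are mapped to the time of day
expType 'singlerun'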

Single Member Verification

The members of an EPS can be verified deterministically with a few namelist modifications and by subsequently calling the deterministic verification scripts. First, the namelist key filePattern has to be changed so that ensemble feedback files are found (e.g. from verSYNOP to vepSYNOP). In order not to interfere with the ensemble verification, make sure that the namelist key expDescr differs between the deterministic and the ensemble namelist. The namelist key expIds has to be repeated so that each member is given an individual name (e.g. mem1,mem2,mem3,...). The namelist key fdbkDirs also needs to be repeated for each member. The namelist key veri_ens_member needs to be added, giving the member number for each expId (e.g. 1,2,3,...). In this manner the single member verification can also be performed for more than one experiment.

Example

Verification results of a single member evaluation of one experiment (of 20 members)


Some examples of how namelist keys can be used to utilize different members for deterministic or ensemble verification tasks:

Deterministic verification of 2 EPS experiments using only members 1-3

expIds          'exp1_m1,exp1_m2,exp1_m3,exp2_m1,exp2_m2,exp2_m3'
fdbkDirs        '/path/exp1/ff/,/path/exp1/ff/,/path/exp1/ff/,/path/exp2/ff/,/path/exp2/ff/,/path/exp2/ff/'
veri_ens_member '1,2,3,1,2,3'

Ensemble verification of 2 EPS experiments using only members 1-3 of each

expIds          'exp1,exp2'
fdbkDirs        '/path/exp1/ff/,/path/exp2/ff/'
veri_ens_member '1,2,3,1,2,3'
nMembers        '3,3'

Ensemble verification of 1 EPS experiment with different member subsets

expIds          'exp1a,exp1b'
fdbkDirs        '/path/exp1/ff/,/path/exp1/ff/'
veri_ens_member '1,2,3,10,11,12'
nMembers        '3,3'

Bias Corrected Observations

Observations might be subject to a bias correction. In this case it might be relevant to verify against observations with and without the bias correction applied. The verification can account for that by setting the namelist key useObsBias to TRUE. In this case, the individual observation bias corrections will be undone. The scores will be computed against the bias corrected observation and against the raw observation. Both variants will show up in the score file and in the visualization. This option is, so far, implemented for the SYNOP deterministic and ensemble verification.

Example

Verification against bias corrected T2M and raw T2M_raw observations


Verification Output

The verification output is saved in the directory set with the namelist key outDir. The optional keys shinyServer and shinyAppPath are meant to copy (scp) the final score files to the correct shiny application directory.

Domain average verification

The domain average verification produces intermediate score files, one for each ff. These files are used by the aggregation script, which returns one or more final score files.

Station based verification

The station based verification scripts produce one final score file directly from the ff. There is no intermediate file production.

Running the Verification

# SYNOP/TEMP DET/EPS verification by valid date, 6 dates in parallel
Rscript starter_scores_by_date.R namelist SYNOP DET 6
Rscript starter_scores_by_date.R namelist TEMP  DET 6
Rscript starter_scores_by_date.R namelist SYNOP EPS 6
Rscript starter_scores_by_date.R namelist TEMP  EPS 6
# Aggregation of determ. SYNOP/TEMP scores (only file reading is parallelized)
Rscript starter_aggregate.R namelist SYNOP DET 6
Rscript starter_aggregate.R namelist TEMP  DET 6
Rscript starter_aggregate.R namelist SYNOP EPS 6
Rscript starter_aggregate.R namelist TEMP  EPS 6
# Station based SYNOP/TEMP verification
Rscript starter_scores_by_station.R namelist SYNOP DET CONT 6
Rscript starter_scores_by_station.R namelist SYNOP DET CAT 6
Rscript starter_scores_by_station.R namelist TEMP  DET CONT 6
Rscript starter_scores_by_station.R namelist SYNOP EPS CONT 6
Rscript starter_scores_by_station.R namelist TEMP  EPS CONT 6

Quality Control Within the Verification

A general rule of the ff based verification is that all forecast/observation pairs that fulfill the default or user defined demands are used in the verification. This is mainly due to the assumption that the quality control through data assimilation is sufficient. However, some exceptions exist, which are listed here.

Mimicing VERSUS Quality Check

There is an option to mimic the quality check as performed within VERSUS with the ff verification. This can be done by setting the namelist key mimicVersus to TRUE. In this case the quality flags contained in the ff will be ignored, i.e. observations rejected by data assimilation will also be used. Observations will only be discarded if

|fcst-obs| >= 50 m/s for surface windspeed
|fcst-obs| >= 25 hPa for surface pressure
|fcst-obs| >= 30 K for 2m dew-point or 2m temperature

There is no further quality control in the TEMP verification.
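A minimal R sketch of this gross-error check, assuming vectors fcst and obs plus a variable label varname; the function and variable names are illustrative only, the limits are those listed above (pressure assumed in Pa, so 25 hPa = 2500 Pa):

# Return TRUE for pairs that pass the VERSUS-like gross-error check (illustrative sketch)
passes_versus_check <- function(fcst, obs, varname) {
  limit <- switch(varname,
                  "FF"   = 50,    # surface wind speed [m/s]
                  "PS"   = 2500,  # surface pressure [Pa], i.e. 25 hPa
                  "T2M"  = 30,    # 2 m temperature [K]
                  "TD2M" = 30,    # 2 m dew point [K]
                  Inf)            # no limit for other variables
  abs(fcst - obs) < limit
}
# Example: two surface pressure pairs, the second one would be discarded
passes_versus_check(fcst = c(100200, 98000), obs = c(100150, 101000), varname = "PS")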

Visualization

Shiny-server applications

The visualization is done with shiny (http://shiny.rstudio.com) applications. Shiny applications basically are R scripts (ui.R and server.R) that can be started from within an R session or with the shiny-server. It is recommended to use shiny-server in order to ensure the permanent availability of the applications. The free version of shiny-server has been found to perform very well when providing verification results to an entire meteorological office. A detailed description of the shiny applications is not (yet) part of this document. To familiarize yourself with the concept and possibilities of shiny, visit https://shiny.rstudio.com/gallery/

The motivation behind using shiny applications for visualization is the high flexibility in browsing and composing the numerous results of the verification. Instead of producing a plot for every calculated score, the desired plot is generated interactively, on demand. Besides that, smaller data modifications can be performed within the app (e.g. differences between scores, blending data in/out, smoothing, etc.). The applications can be used rather intuitively. Generally, the user has to choose and load a score file from a selection. Then, the user can choose the run(s), valid-time(s), variable(s), score(s), subdomain(s), height interval(s) and experiment(s), and the corresponding plot is generated on the fly. The apps generally have a “Bookmark” button that allows reloading the app with the current settings. Plots can be exported by copy/paste or by right-click & save as png. Besides the score plot, a summary plot is provided that serves as a short summary of the experiment outcome. Also, the data behind the plot can be exported in different formats from inside the app.

The drawback of this interactivity is a small temporal delay due to data handling and rendering; in most cases, however, plot generation is sufficiently quick.
Example of the deterministic SYNOP verification visualization application

To get a first impression (outside DWD), visit http://www.cosmo-model.org/shiny/users/fdbk/

Other

The final score files are saved in the Rdata (binary) format (https://bookdown.org/ndphillips/YaRrr/rdata-files.html). The files contain one or more tables of scores. The tables can be loaded in R, saved in various other formats and then used with other plotting tools. For faster data handling in R it is recommended to load the data.table package beforehand.

# Loading a verification output file in R
require(data.table)
load("/path/to/scoreFileName")
# Print file structure
str(scores)
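If the scores are needed outside of R, the loaded table can, for example, be written to CSV with data.table's fwrite; this sketch assumes the loaded object is called scores, as in the example above, and uses a placeholder output path:

# Export the loaded score table to a CSV file for use with other software
require(data.table)
fwrite(scores, file = "/path/to/scoreFileName.csv")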

Further Tips

Generating Polygon Domain Tables for Countries

Download the required sp shape file (level 0) from https://gadm.org/download_country_v3.html

require(sp)          # provides the S4 classes used in the shape file
require(data.table)
# load shape file, example Romania
shapeFile=readRDS("~/gadm36_ROU_0_sp.rds")
# Romania seems to have 4 individual polygons, the main one is number 4 (others might be islands)
regions = shapeFile@polygons[[1]]@plotOrder
# Bind all regions to a single table, regions separated by NAs
poly=c()
for (region in regions){
    poly = rbind(poly,shapeFile@polygons[[1]]@Polygons[[region]]@coords)
    poly = rbind(poly,c(NA,NA))
    print(dim(poly))
}
#Test
#plot(poly,type="l")
#Save to FFV conform domain table
DT = data.table(name="Romania",lon=poly[,1],lat=poly[,2])
fwrite(DT,file="~/polyRomania.txt",sep="\t")

Additional Tools

Appendix

Scores

deterministic continuous

  • ME - mean error, bias - \(ME=\frac{1}{n}\sum_{t=1}^{n}(F_t - O_t)\)
  • MAE - mean absolute error - \(MAE=\frac{1}{n}\sum_{t=1}^{n}|F_t - O_t|\)
  • RMSE - root mean squared error - \(RMSE=\sqrt{\frac{1}{n}\sum_{t=1}^{n}(F_t - O_t)^2}\)
  • SD - standard deviation of error - \(SD=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^{2}}\)
  • R2 - correlation coefficient - \(R^2=\left(\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}}\right)^{2}\)
  • TCC - tendency correlation - \(TCC=\frac{\sum\left((F-C)-(\overline{F-C})\right)\left((O-C)-(\overline{O-C})\right)}{\sqrt{\sum(F-C)^{2}}\sqrt{\sum(O-C)^{2}}}\)
  • LEN - number of observations

deterministic categorical

  • Numerous scores based on contingency tables (POD,FAR,ETS,FBI,OR,…)

ensemble

  • CRPS - continuous ranked probability score - \(CRPS=\frac{1}{N}\sum_{i=1}^{N}\int_{-\infty }^{\infty }\left(F_{i}^{f}(x)-F_{i}^{o}(x)\right)^{2}dx\)
  • CRPSF - fair crps, accounting for the limited number of EPS members
  • IGN - ignorance score - \(IGN = -log_{2}(f_{i})\) (only TEMP)
  • OUTLIER - percentage of observations outside ensemble range (only TEMP)
  • SKILL - RMSE of ensemble mean
  • SPREAD - mean ensemble spread
  • ME - mean error of ensemble mean
  • SD - standard deviation of ensemble mean error
  • SPREAD/SKILL
  • Talagrand (rank) histogram (only SYNOP)
  • Reliability diagram (forecast probability vs. observed frequency) (only SYNOP)
  • ROC Curve (FAR vs POD) (only SYNOP)
  • Value score vs Cost/Loss Ratio (only SYNOP)

probabilistic

  • Brier Score - \(BS= \frac{1}{N}\sum_{i=1}^{N}\left ( p_{i}-o_{i} \right )^{2}\)
  • Brier skill score - \(BSS = 1-\frac{BS_{fcst}}{BS_{ref}}\)
  • Brier rel. - \(BS_{rel}=\frac{1}{N}\sum_{k=1}^{K}n_{k}\left (p_{k}-\bar{o}_{k} \right )^{2}\)
  • Brier res. - \(BS_{res}=\frac{1}{N}\sum_{k=1}^{K}n_{k}\left (\bar{o}_{k}-\bar{o} \right )^{2}\)
  • Uncertainty - \(BS_{unc}=\bar{o}(1-\bar{o})\)
  • Roc area - Integral of curve obtained from plotting POD vs FAR for different probabilities
  • Events - Number of observed events
  • ROC curves
  • Relative economic value as fct. of the cost/loss ratio

Subdomains

The verification offers a set of pre-defined subdomains that can be used to further stratify the scores. Their use is optional; the user can substitute these subdomains with their own definition (see the section Domain Stratification).

SYNOP sub-domains

Namelist entry: subdomains SYNOP

TEMP sub-domains

Namelist entry: subdomains WMO

TEMP sites used at DWD in the ICON domain on an arbitrary day


Observations

| Short Name | Long Name | varno |
| --- | --- | --- |
| PS | surface pressure [Pa] | 110 |
| T2M | 2 metre temperature [K] | 39 |
| T | upper air temperature [K] | 2 |
| TD2M | 2 metre dew point [K] | 40 |
| TD | upper air dew point [K] | 59 |
| RH2M | 2 metre relative humidity [0..1] | 58 |
| RH | upper air relative humidity [0..1] | 29 |
| FF | wind speed [m/s] | 112 |
| U10M | 10 metre u-wind speed [m/s] | 41 |
| U | upper air u-wind speed [m/s] | 3 |
| V10M | 10 metre v-wind speed [m/s] | 42 |
| V | upper air v-wind speed [m/s] | 4 |
| DD | wind direction [deg.] | 111 |
| GUST_Xh | max. wind gust in the last X hour(s) [m/s] | 242 |
| RR_Xh | total precipitation in the last X hour(s) [mm] | 80 |
| N(_L/M/H) | cloud cover (total, low, mid, high) [oct.] | 91/67/93/94 |
| JJ_Xh | maximum temperature in the last X hour(s) [K] | 81 |
| TMIN_Xh | minimum temperature in the last X hour(s) [K] | 243 |
| RAD_DF_Xh | diffuse radiation in the last X hour(s) [J/m^2] | 238 |
| RAD_GL_Xh | global radiation in the last X hour(s) [J/m^2] | 237 |

SYNOP sites (LAM)

Observation sites used at DWD in the COSMO-D2 domain on an arbitrary day


SYNOP sites (GLOBAL)

Observation sites used at DWD in the ICON domain on an arbitrary day


SYNOP sites (EU Nest)

Observation sites used at DWD in the ICON-EU nest domain on an arbitrary day
