This document is meant to serve as a guideline for performing a feedback file (ff) based verification on any UNIX based system. Given that all the necessary software and scripts are available on the system, the user has to provide a set of namelists, run the verification scripts and can eventually have a look at the results through an interactive browser application. Here, the system requirements, the namelist options and tips on how to run the verification efficiently are provided.
In order to use the verification suite, the following software is required:
The NetCDF library is available at https://www.unidata.ucar.edu/downloads/netcdf/index.jsp
The Rfdbk package serves as interface between ff and R and can be obtained and installed as follows:
# Download and install the FFV2 package
git clone https://gitlab.com/rfxf/ffv2.git
R CMD INSTALL ffv2
Some of the listed packages will be installed during the Rfdbk installation process. If not, each package is available in the CRAN archive (https://cran.r-project.org/web/packages/available_packages_by_name.html) and can be installed in the same manner as FFV2, or directly from within R.
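For instance, a missing dependency can be installed from the command line; a minimal sketch (data.table is just an example package, any CRAN mirror can be used):
# Install a missing dependency from CRAN (example: data.table)
Rscript -e 'install.packages("data.table", repos="https://cran.r-project.org")'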
The package contains the necessary scripts to run a verification task, either domain average or station based, for deterministic or ensemble forecasts, for a number of different observation systems. This is a non-public git project; if you want permission to access the git, please contact Felix.Fundel[at]dwd.de
The verification can be memory intensive, depending on the number of observations, the number of runs, the length of the period and the desired subdomains. For a typical one-month ICON-D2 DWD routine verification, 40 GB of memory should suffice. A monthly verification of the ICON DWD routine runs can use up to 200 GB. By design, the station based verification is the most demanding; the domain average verification needs considerably less memory.
Verification jobs can be performed in parallel. The sensible number of parallel processes depends on the available memory and the scope of the verification task. Users are invited to find their own optimal settings.
Observations and forecasts, as well as most of the additional information necessary to perform a verification, are contained in NetCDF ff. The data contained in a ff is typically valid for a certain date or a time window of a few hours around this date. Analyses and forecasts valid at this date are gathered in the ff. An older documentation of the content and structure of ff can be found at http://www.cosmo-model.org/content/model/cosmo/coreDocumentation/cosmoFeedbackFileDefinition.pdf The requirements for the production of ff are documented as a result of the COSMO CARMA project (http://www.cosmo-model.org/view/repository/wg5/PP-CARMA/Task1).
In order for the verification software to run properly, a few requirements must be met. First, there need to be one or more ff available per experiment/model. For each experiment/model the ff need to be provided in a separate directory. The ff for the individual dates, observation systems and forecast types are collected in one directory. The names of the ff are made up of three parts: a prefix denoting the forecast type (ver for deterministic, vep for ensemble), the observation system (e.g. SYNOP, TEMP) and the valid date (YYYYMMDDHH),
so that a typical ff would be named verSYNOP.2018100100
Any other naming will also work; however, to ensure full functionality, it is strongly recommended to use the syntax described above and, if necessary, add prefixes or suffixes to the file names.
The experiment verification checks for identically named ff in the different experiment directories; identical naming is therefore mandatory. Only the ff existing in all experiment directories will be used for the experiment verification (this can be changed by namelist). In case only one experiment is verified (e.g. without reference), all existing ff will be used (if no other restrictions on dates are set in the namelist). If the verification is not performed in experiment mode, all found ff will be used (again, unless restrictions on dates are set in the namelist).
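The matching of identically named ff across the experiment directories can be illustrated with a few lines of R (a sketch only; directory names and the file pattern are placeholders):
# Sketch: find the ff names present in all experiment directories
dirs <- c("/dir/Exp1/", "/dir/Exp2/")
files <- lapply(dirs, list.files, pattern = "^verSYNOP\\.")
common <- Reduce(intersect, files)  # only these ff are used in experiment mode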
This section describes the correct sequence of script calls and the necessary/optional input and arguments that should be provided.
The domain average verification produces domain average scores as a function of forecast lead-time for a user defined verification period. Domains can be the entire set of observations found in the ff, pre-defined subdomains or user defined subdomains (polygons or station lists). In addition to scores averaged over the verification period, time series of the scores (score as a function of valid time in the verification period), also as a function of the forecast lead-time, are produced. In a first step, intermediate scores are calculated. The corresponding script requires the namelist, the type of observations, the type of forecast and the number of parallel processes as arguments.
Example call
# Calling a deterministic SYNOP/TEMP verification script running on 6 cores
Rscript demo/starter_scores_by_date.R namelist SYNOP DET 6
Rscript demo/starter_scores_by_date.R namelist TEMP DET 6
# Calling an ensemble SYNOP/TEMP verification script running on 6 cores
Rscript demo/starter_scores_by_date.R namelist SYNOP EPS 6
Rscript demo/starter_scores_by_date.R namelist TEMP EPS 6
The above script calls result in a number of intermediate score files, one file per valid date of the ff. Already existing intermediate score files will not be recreated, unless the underlying ff has changed (i.e. is more recent). These intermediate score files are then aggregated, and final score files are subsequently produced. To this end, the script starter_aggregate.R in the package 'demo' folder has to be called. It requires the namelist, the observation type, the forecast type and, optionally, the number of cores for parallelization as arguments. Already existing final score files will be replaced. Example call
# Calling the R aggregation script with arguments namelist and number of cores (deterministic)
Rscript demo/starter_aggregate.R namelist SYNOP DET 6
Rscript demo/starter_aggregate.R namelist TEMP DET 6
# Calling the R aggregation script with arguments namelist and number of cores (ensemble)
Rscript demo/starter_aggregate.R namelist SYNOP EPS 6
Rscript demo/starter_aggregate.R namelist TEMP EPS 6
The aggregation script produces the final scores as a data.table and saves them in an Rdata file. Those Rdata files can be loaded with R and saved in many other data formats in order to use the results with other software. The SYNOP verification produces 6 final score files: one for each of the three verification types (continuous, categorical and percent-correct scores), each as period average and as time series of scores. The TEMP verification produces 2 final score files. The resulting score files are meant to be used with a dedicated visualization shiny app.
Additionally, a station based verification (scores as a function of observation station) can be called. One script call is needed per observation system and verification method. The script takes the namelist, observation system, forecast type, verification method and number of processes as arguments. Example call
# Calling the R station verification script with arguments namelist, observation system, forecast system, verification method and number of processes.
Rscript demo/starter_scores_by_station.R namelist SYNOP DET CONT 6
Rscript demo/starter_scores_by_station.R namelist SYNOP DET CAT 6
Rscript demo/starter_scores_by_station.R namelist SYNOP EPS XXX 6 #XXX is just a placeholder
The categorical verification is available for SYNOP only. This verification produces no intermediate score files, only a final score file that can be used with a dedicated visualization application. This verification will not produce time series of scores.
The ff verification is controlled with namelists. Namelists are simple text files that contain key/value pairs which are evaluated by the R verification scripts. Namelists have to be created at least for each forecast type and for each observation system. The intermediate score calculation and the aggregation can be controlled with a single namelist; in some cases, however, it is advisable to use separate namelists. Syntax errors in the namelist will probably lead to an immediate crash of the verification scripts. The following table gives an overview of the existing namelist keys. Some are mandatory, some have default values, and not all keys are used by all verification scripts.
Key | Descr. | Value Example | Fcst type | Obs System | Mandatory | Default |
---|---|---|---|---|---|---|
varnoContinuous | variable name | see example | Det./Ens. | All | Y | None |
pecthresholds | list of thresh. for hit rates | see example | Det. | SY | N | None |
catthresholds | list of thresh. for categorical verif. | see example | Det | SY | N | None |
cat2thresholds | list of thresh. for categorical verif. using an upper and lower threshold | see example | Det | SY | N | None |
subdomains | verification subdomains | ‘ALL’,‘GER’,‘SYNOP’,‘WMO’,‘USER’ or combinations | Det./Ens. | SY/TE | Y | None |
filePattern | ff name pattern | ‘verSYNOP’ or ‘vepTEMP’ etc | Det./Ens. | SY/TE | Y | None |
timeSteps | valid time binning steps | ‘0’ | All | All | Y | None |
timeBreaks | valid time binning breaks | ‘30,-30’ | All | All | Y | None |
veri_run_class | ff forecast run class | ‘0,2’ | All | All | Y | None |
veri_run_type | ff forecast run type | ‘0,4’ | All | All | Y | None |
state | state of observation | ‘0,1,5’ | All | All | Y | None |
r_state | ff state of report | ‘0,1,5’ | All | All | Y | None |
sigTest | significance testing | ‘T’ | Det./Ens. | SY/TE | N | T |
expType | standard or singlerun | ‘standard’ or ‘singlerun’ | Det. | SY/TE | Y | None |
experiment | experiment or routine | ‘T’ or ‘F’ | Det./Ens. | SY/TE | Y | None |
expDescr | name for intermed. score files | ‘Exp1,Exp2’ | Det./Ens. | SY/TE | Y | None |
fileDescr | name for final score files | ‘Exp1-Exp2’ | Det./Ens. | SY/TE | Y | None |
fdbkDirs | ff directories | ‘/dir/Exp1/,/dir/Exp2/’ | Det./Ens. | SY/TE | Y | None |
expIds | name(s) of experiment(s) | ‘Exp1,Exp2’ | Det./Ens. | SY/TE | Y | None |
outDir | output directory | ‘/outdir/’ | Det./Ens. | SY/TE | Y | None |
lonlims | bounding lon limits | ‘-180,180’ | Det./Ens. | SY/TE | N | Null |
latlims | bounding lat limits | ‘-90,90’ | Det./Ens. | SY/TE | N | Null |
iniTimes | runs | ‘0,12’ or ‘ALL’ | Det./Ens. | SY/TE | N | Null |
mimicVersus | use Versus quality check | ‘F’ or ‘T’ | Det. | SY/TE | N | Null |
veri_ens_member | user defined member(s) | '1,2,3,4' | Ens./Det. | SY/TE | N | Det:-1,Eps:>0 |
state | user defined state | ‘0,1’ | Det. | SY/TE | N | 0,1,5,11 |
domainTable | file with specifications | see below | Det./Ens | SY/TE | N | Null |
whitelists | use whitelist by variable | see below | Det./Ens. | SY | N | None |
blacklists | use blacklist by variable | see below | Det./Ens. | SY | N | None |
dateList | stratify by list of dates | see below | Det. | SY | N | None |
shinyServer | name of the shiny server | remote.machine.de | All | All | N | None |
shinyAppPath | shiny application path on server | /data/user/shiny/ | All | All | N | None |
alignObs | turn off the data alignment | ‘T’ or ‘F’ | Det. | SY/TE | N | ‘T’ |
startIniDate | date of first run to use | 'YYYYMMDDHH' | Det./Ens. | SY/TE | N | None |
stopIniDate | date of last run to use | 'YYYYMMDDHH' | Det./Ens. | SY/TE | N | None |
startValDate | date of first valid date to use | 'YYYYMMDDHH' | Det./Ens. | SY/TE | N | None |
stopValDate | date of last valid date to use | 'YYYYMMDDHH' | Det./Ens. | SY/TE | N | None |
aggMixed | aggregate also incomplete score files | 'T' or 'F' | Det./Ens. | SY/TE | N | 'F' |
customLevels | custom level choice | '1000,950,900,850' | Det./Ens. | TE | N | sign. levels |
nMembers | number of EPS members | '20,40' | Ens. | SY | Y | None |
useObsBias | remove bias correction from obs | 'T' or 'F' | Det./Ens. | SY | N | 'F' |
veri_forecast_time | select forecast lead times (ff variable veri_forecast_time) | '1200,2400' | Det./Ens. | All | N | None |
instype | filter on instype | ‘21,22’ | All | All | N | None |
rejectFlags | reject obs if flag bit is set | ‘1,2,3’ | All | All | N | None |
mdlsfc | filter mdlsfc bit | ‘1,2,3’ | All | All | N | None |
roundLatLon | turn on/off rounding to 2nd digit | ‘T’ or ‘F’ | All | All | N | True |
qSave | save as qs instead of Rdata if True | ‘T’ or ‘F’ | All | All | N | False |
veri_model | filter on veri_model | ‘ICON,COSMO’ | All | All | N | None |
useSSOSTDH | use sso_stdh to filter obs | ‘T’ or ‘F’ | All | All | N | True |
useRejectedWinds | use wind of state=rejected (7) | ‘T’ or ‘F’ | All | All | N | False |
# SYNOP deterministic namelist (ICON-D2 settings)
varnoContinuous 'T2M,RH,RH2M,PS,TD2M,FF,DD,RAD_GL_1h,RAD_DF_1h,TMIN_12h,TMAX_12h,JJ_12h,N,N_L,N_M,N_H'
pecthresholds list('PS'=list('lower'=c(-500),'upper'=c(500)),'FF'=list('lower'=c(-3),'upper'=c(3)))
catthresholds list('GUST_1h'=c(5,10,15),'RR_1h'=c(0.1,1,2))
subdomains 'ALL,GER'
filePattern 'verSYNOP.'
timeSteps '0,-60,-120,-180'
timeBreaks '0,-30,-90,-150,-180'
veri_run_type '0,2'
veri_run_class '0,2'
state '0,1,5,7,11'
r_state '0,1,5,7,11'
sigTest 'T'
expType 'standard'
experiment 'T'
expIds 'Exp1,Exp2'
expDescr 'Exp1,Exp2'
fileDescr 'Exp_Exp1_vs_Exp2_username'
fdbkDirs '/Directory/Of/Exp1/,/Directory/Of/Exp2/'
outDir '/Output/Directory/'
# TEMP deterministic namelist (ICON-D2 settings)
varnoContinuous 'Z,T,TD,RH,FF,DD'
subdomains 'ALL,GER'
filePattern 'verTEMP.'
timeSteps '0'
timeBreaks '0,-180'
veri_run_type '0,2'
veri_run_class '0,2'
state '0,1,5'
r_state '0,1,5'
sigTest 'T'
expType 'standard'
experiment 'T'
expIds 'Exp1,Exp2'
expDescr 'Exp1,Exp2'
fileDescr 'Exp_Exp1_vs_Exp2_username'
fdbkDirs '/Directory/Of/Exp1/,/Directory/Of/Exp2/'
outDir '/Output/Directory/'
# SYNOP EPS namelist (ICON-D2 settings)
varnoContinuous 'T2M,RH,RH2M,PS,TD2M,FF,RAD_GL_1h,RAD_DF_1h,TMIN_12h,TMAX_12h,N,N_L,N_M,N_H,GUST_1h,RR_1h'
catthresholds list('GUST_1h'=c(5,10,15),'RR_1h'=c(0.1,1,2))
subdomains 'ALL,GER'
filePattern 'vepSYNOP'
timeSteps '0,-60,-120,-180'
timeBreaks '0,-30,-90,-150,-180'
veri_run_type '0,2'
veri_run_class '0,3'
state '0,1,5,7,11'
r_state '0,1,5,7,11'
sigTest 'T'
iniTimes '0,12'
expType 'standard'
experiment 'T'
expIds 'Exp,Ref'
expDescr 'Exp_Ref_username'
fileDescr 'Exp_Ref_username'
fdbkDirs '/Directory/Of/Exp/,/Directory/Of/Ref/'
outDir '/Output/Directory/'
whitelists '/path/to/whitelist_clouds'
blacklists '/path/to/blacklist'
nMembers '20,40'
# TEMP EPS namelist (ICON-D2 settings)
varnoContinuous 'Z,T,RH,U,V,QV,QV_N'
subdomains 'ALL,GER'
filePattern 'vepTEMP'
customLevels '1000,950,900,850,800,750,700,650,600,550,500,450,400,350,300,250,200,150,100'
timeSteps '0'
timeBreaks '0,-180'
veri_run_type '0,3'
veri_run_class '0,2'
state '0,1,5'
r_state '0,1,5'
sigTest 'T'
iniTimes '0,12'
expType 'standard'
experiment 'T'
expIds 'Exp,Ref'
expDescr 'Exp_Ref_username'
fileDescr 'Exp_Ref_username'
fdbkDirs '/Directory/Of/Exp/,/Directory/Of/Ref/'
outDir '/Output/Directory/'
whitelists '/path/to/whitelist_clouds'
blacklists '/path/to/blacklist'
nMembers '20,40'
The verification allows for an arbitrary number of experiments (>0) to be verified simultaneously. Mandatory is the namelist key expIds, which takes one or more comma separated values (e.g. exp1,exp2,exp3 for 3 experiments). The experiment names chosen in expIds will be used by the verification to name the experiments. Correspondingly, also mandatory is the specification of the directories that contain the ff of the experiments. This happens with the namelist key fdbkDirs, which has as many values as expIds, in the same order!
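For example, a namelist fragment for three experiments could look as follows (directory names are placeholders):
# Three experiments, one ff directory per experiment, same order
expIds 'exp1,exp2,exp3'
fdbkDirs '/dir/exp1/,/dir/exp2/,/dir/exp3/'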
The time granularity of the observations has to be set by the user. The time settings have to be given relative to the valid time of the feedback file. The relevant namelist keys are timeSteps and timeBreaks. The values for these keys have to be given in minutes. The namelist example shows the settings used for the routine verification of ICON-D2 at DWD. The feedback files in this case are available in intervals of 3 hours. They contain observations from a 3 hour interval prior to the valid date of a feedback file. To get an hourly verification from the 3 hourly feedback files, the observations are binned. The bins are labeled by timeSteps, in this case 0,-60,-120,-180 minutes relative to the valid date of the feedback file. The bin ranges are defined by timeBreaks, in this case 0,-30,-90,-150,-180 (0 included, -180 not included). The length of the breaks has to be the length of the steps + 1. The binning can result in overlapping valid times; the verification will aggregate this correctly, but the user has to take care that observations are not used twice. In this way a fine time granularity can be achieved even if a feedback file covers a longer period. All users are recommended to have a look at the feedback files in order to set the time parameters correctly.
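The binning can be illustrated in plain R (a sketch only, not part of the verification scripts), using the ICON-D2 example settings:
# Map observation times [min, relative to the ff valid date] onto the bins
breaks <- c(0, -30, -90, -150, -180)  # timeBreaks (bin edges)
steps <- c(0, -60, -120, -180)        # timeSteps (bin labels)
obsTimes <- c(-10, -45, -100, -170)   # example observation times
cut(obsTimes, breaks = rev(breaks), labels = rev(steps))
# result: bins 0, -60, -120, -180 (0 included, -180 excluded)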
The namelist keys veri_run_type and veri_run_class are used to select the type of analysis and forecast. The relevant values can be found in the tables of the feedback file documentation. The class is typically set to 0,2, causing the verification of model data from the main run and the analysis run. With type 0, forecasts are used; with type >=1, an analysis is used. The verification can handle only one kind of analysis at once, i.e. the user has to make sure that only one of the available analyses is used by specifying run_type and run_class correctly. Hence veri_run_type should have only one entry besides 0.
The namelist keys state and r_state are used to include or exclude observations and reports based on their quality flag. The allowed values can be found in the feedback file documentation. Recommended values to use all non-rejected observations are 0,1,5,11 for both. State 7 (rejected) is set for LAM verification in order to use surface wind observations >100 m (this will become obsolete as soon as those wind observations are no longer rejected in assimilation).
The verification period results from the ff found. If the mandatory namelist key experiment is set to TRUE, then only those ff that have a matching file in all experiment directories are used by the verification. If experiment is set to FALSE, then all ff will be used, no matter whether there are matching files in the other experiment directories. If the verification period should be shorter than the period covered by the available ff, the recommended way is to set experiment to TRUE and to remove the unwanted ff from the directory of at least one experiment.
The namelist also accepts the keys startValDate and stopValDate, which can be used to trim the verification period by valid date. These keys take values of the type YYYYMMDDHH, which only works if the ff are named like *YYYYMMDDHH! Additionally, the verification period can be trimmed by the forecast initialization date. For this, the keys startIniDate and stopIniDate can be used with values like YYYYMMDDHH.
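For example, a namelist fragment restricting the verification to one month (the dates are illustrative placeholders):
# Use only runs initialized and valid in October 2018
startIniDate '2018100100'
stopIniDate '2018103112'
startValDate '2018100100'
stopValDate '2018103123'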
Variables used for verification are set using the namelist key varno. For variables not in this list, no scores will be calculated. The deterministic SYNOP verification deviates from this syntax: here the namelist keys varnoContinuous, pecthresholds, catthresholds and cat2thresholds define which variables will be used. In a SYNOP namelist, the variables have to be provided by their name (ff definition); in a TEMP namelist, the variables have to be provided by their number (ff definition).
Using thresholds to perform a categorical or probabilistic verification is only possible with the SYNOP verification. In case of a deterministic forecast, a threshold will result in categorical (contingency table based) scores. The user has the possibility to define thresholds or uncertainty limits for all variables. A simple threshold can be set with the namelist key catthresholds, which takes an R list as argument (see the example SYNOP namelist). Additionally, if the namelist key pecthresholds is set, hit rates (percent correct forecasts) for the forecast hitting the observation within the given limits are calculated. The ensemble verification evaluates the namelist key thresholds, which follows the same syntax as the key for the categorical verification.
By default the verification will use all forecasts from all runs that are contained in the ff. If wanted, the user can restrict this using the following options.
The namelist key iniTimes can be used to evaluate only those forecast runs with initial times as given in the value. The value can be one or more comma separated initial times; e.g. using 0,12 will result in a verification of the 00UTC and 12UTC runs only.
Members can be selected using the namelist key veri_ens_member. With this, a single member verification of an ensemble prediction system can be performed, or the ensemble verification can be restricted to a subset of members. The key takes one or more members as a comma separated list (e.g. 1,2,3,4,5 for members 1-5). In the namelist for the deterministic verification, the keys expIds and fdbkDirs have to be repeated accordingly (e.g. 5x if 5 members are selected). The veri_ens_member key is also evaluated in the ensemble verification; here, however, the keys expIds and fdbkDirs should not be repeated.
The namelist keys veri_run_type and veri_run_class can likewise be used to restrict the verified runs to certain run types and classes, as described above.
There is no possibility to restrict the number of scores calculated.
If two or more experiments are verified simultaneously, the same set of observation/forecast pairs will be used for all experiments. This allows for true comparability of the scores from the different experiments. The alignment basically checks for station ID and level and uses the observation and forecast only if they are available for all experiments. The value of the observation is explicitly not used for the alignment, i.e. different experiments may use observations of different values (e.g. due to a bias correction). The data alignment can be turned off by setting the namelist key alignObs to FALSE.
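Conceptually, the alignment corresponds to an inner join on station ID and level; a minimal R sketch with invented data (not the package code):
# Only obs/fcst pairs present in both experiments are used
require(data.table)
exp1 <- data.table(statid = c("10837", "10184"), level = 2)
exp2 <- data.table(statid = c("10837", "06670"), level = 2)
common <- merge(exp1, exp2, by = c("statid", "level"))  # keeps station 10837 only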
The mandatory namelist key subdomains can be used to further divide the verification domain into sub-domains. If this is not wanted, its value should be set to ALL. Possible further values are GER, SYNOP, WMO and USER, or combinations thereof (see the namelist key table). User defined sub-domains, either polygons or station lists, are specified via the namelist key domainTable, which points to an ASCII file that contains either polygon or station ID information for one or more sub-domains. If a domainTable is given, the pre-defined values SYNOP and WMO are overridden! Example syntax for a domain stratification using domainTable
# Example polygon domain table
name lon lat
NORD 8 50.001
NORD 15 50.001
NORD 15 55
NORD 8 55
SUED 8 45
SUED 15 45
SUED 15 50
SUED 8 50
or
# Example station domain table
name id
DE Q887
DE 10837
DE 10184
CH 06670
CH 06612
CH 06610
One or several user defined whitelists can be used to filter observations by variable. Name and path of the whitelists have to be specified in the namelist using the whitelists key. If more than one whitelist should be used, they need to be separated by commas in the namelist value. A whitelist has to have two named columns: the first column, named "statid", gives the quoted station ID; the second column, named "varno", gives the quoted variable name (see ff documentation). By default all observations are used; only if a variable name is part of a whitelist is it filtered accordingly.
# Example whitelist: use total cloud cover and high clouds from 3 stations only, excluding the rest.
"statid","varno"
"07024","N"
"07412","N"
"07134","N"
"07791","N_H"
"07770","N_H"
"07666","N_H"
Blacklists can be used to exclude suspect observations from stations.
# Example blacklist removing RH observations from 3 stations.
"statid","varno"
"G334","RH"
"G315","RH"
"G297","RH"
Conditional verification is so far only implemented in the deterministic and ensemble SYNOP verification. Conditions are given as namelist keys condition1, condition2, ..., each taking an R list that combines, for a variable, conditions on the observation (obs) and/or the forecast (veri_data).
# Example conditions on cloud cover observations and forecast error
# NOTE: Conditions must not overlap!!!
condition1 "list(N='obs==0',N='abs(veri_data-obs)<1')"
condition2 "list(N='obs==0',N='abs(veri_data-obs)>=1')"
condition3 "list(N='obs%between%c(1,4)',N='abs(veri_data-obs)<1')"
condition4 "list(N='obs%between%c(1,4)',N='abs(veri_data-obs)>=1')"
condition5 "list(N='obs%between%c(5,7)',N='abs(veri_data-obs)<1')"
condition6 "list(N='obs%between%c(5,7)',N='abs(veri_data-obs)>=1')"
condition7 "list(N='obs==8',N='abs(veri_data-obs)<1')"
condition8 "list(N='obs==8',N='abs(veri_data-obs)>=1')"
The operator %between% is available as a shortcut for (x >= left & x <= right); since both limits are included, it can cause overlapping conditions.
Example:
Verification results for the 8 example conditions
A list of dates (YYYYMMDD) can be used to further divide the verification results into 2 subsets. The list contains all the dates of one subset; all other dates of the verification period will be attributed to the other subset. This refinement happens in the aggregation part, where the experiment name is extended by a + (dates in dateList) or - (dates not in dateList). A possible application is the division of the verification period by weather regime. To utilize a date list, use the namelist key dateList with the name of the list (including path) as value.
# Example dateList
# All dates below will be used for subset +, all other dates will be in subset -.
20170410
20170413
20170415
20170417
20170422
20170423
20170425
20170429
20170509
20170512
A significance test for score differences between 2 experiments can optionally be performed by setting the namelist key sigTest to TRUE. So far, this is implemented for the domain average deterministic SYNOP and TEMP verification as well as the domain average ensemble TEMP verification. The significance test is executed in the aggregation scripts. This test (a t-test) searches for a significant difference in the mean of the distributions of scores from the 2 experiments within the verification period. The significance level of the test is set to 95%. To avoid autocorrelation, each run time is evaluated individually. Testing all scores for all possible combinations of experiments can take a significant amount of time.
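For illustration, such a test on the scores of two experiments can be sketched in plain R (invented numbers, not the package code; whether the test is paired is an assumption here):
# t-test on daily RMSE values of two experiments at the 95% level
rmse_exp <- c(1.10, 1.22, 0.98, 1.35, 1.18)
rmse_ref <- c(1.15, 1.30, 1.02, 1.33, 1.25)
t.test(rmse_exp, rmse_ref, paired = TRUE, conf.level = 0.95)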
Example:
Verification results showing the results of the significance test by differently coloured circles. Red: significant difference, Gray: no significant difference, White: no test possible. The test results are shown only for scores of a specific ini-time and two experiments.
The verification distinguishes between two types of experiments, namely standard and hindcast experiments. Standard experiments consist of several runs, each with a certain lead-time (e.g. one month of ICON-D2 runs with 27 hours). Hindcast experiments consist of one run with a very long lead-time (e.g. one ICON-D2 run with a 744 hour forecast). Technically this does not make any difference in the verification, except that in hindcast mode the lead-time is mapped to the time of day. Hindcast experiments therefore appear as experiments with lead-times from 1h-24h in the verification. The type of verification has to be set in the namelist with the key expType, which can take the values standard or singlerun.
The members of an EPS can be verified deterministically with some namelist modifications and by subsequently calling the deterministic verification scripts. First, the namelist key filePattern has to be changed so that ensemble feedback files are found (e.g. from verSYNOP to vepSYNOP). In order not to interfere with the ensemble verification, make sure that the namelist key expDescr differs between the deterministic and the ensemble namelist. The namelist key expIds has to be repeated so that each member is given an individual name (e.g. mem1,mem2,mem3,...). The namelist key fdbkDirs also needs to be repeated for each member. The namelist key veri_ens_member needs to be introduced and gives the member number for each expId (e.g. 1,2,3,...). In this manner the single member verification can also be performed for more than one experiment.
Example
Verification results of a single member evaluation of one experiment (of 20 members)
Some examples on how namelist keys can be used to utilize different members for deterministic or ensemble verification tasks.
Deterministic verification of 2 EPS experiments using only members 1-3
expIds 'exp1_m1,exp1_m2,exp1_m3,exp2_m1,exp2_m2,exp2_m3'
fdbkDirs '/path/exp1/ff/,/path/exp1/ff/,/path/exp1/ff/,/path/exp2/ff/,/path/exp2/ff/,/path/exp2/ff/'
veri_ens_member '1,2,3,1,2,3'
Ensemble verification of 2 EPS experiments using only members 1-3 of each
expIds 'exp1,exp2'
fdbkDirs '/path/exp1/ff/,/path/exp2/ff/'
veri_ens_member '1,2,3,1,2,3'
nMembers '3,3'
Ensemble verification of 1 EPS experiment with different member subsets
expIds 'exp1a,exp1b'
fdbkDirs '/path/exp1/ff/,/path/exp1/ff/'
veri_ens_member '1,2,3,10,11,12'
nMembers '3,3'
Observations might be subject to bias correction. In this case it might be relevant to verify against observations with and without the bias correction applied. The verification can account for that by setting the namelist key useObsBias to TRUE. In this case, the individual observation bias corrections will be undone. The scores will be computed both against the bias corrected observation and against the raw observation. Both variants will show up in the score file and in the visualization. This option is, so far, implemented for the SYNOP deterministic and ensemble verification.
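The corresponding namelist fragment:
# Verify against bias corrected and raw observations (SYNOP only)
useObsBias 'T'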
Example
Verification against bias corrected T2M and raw T2M_raw observations
The verification output is saved in the directory set with the namelist key outDir. The optional keys shinyServer and shinyAppPath are meant to copy (scp) the final score files to the correct shiny application directory. A passwordless ssh login from the verification server to the shiny app server is recommended in order for scp to work without interruption (instructions: http://www.linuxproblem.org/art_9.html). If only shinyAppPath is given, it is assumed that verification and shiny apps run on the same server, and the final score files are moved to shinyAppPath by cp, not scp (i.e. no passwordless ssh login is required).
The domain average verification produces intermediate score files, one for each ff. These files are used by the aggregation script, which should return one or more final score files.
The station based verification scripts produce one final score file directly from the ff. There is no intermediate file production.
# SYNOP/TEMP DET/EPS verification by valid-date, 6 dates in parallel
Rscript starter_scores_by_date.R namelist SYNOP DET 6
Rscript starter_scores_by_date.R namelist TEMP DET 6
Rscript starter_scores_by_date.R namelist SYNOP EPS 6
Rscript starter_scores_by_date.R namelist TEMP EPS 6
# Aggregation of determ. SYNOP/TEMP scores (only file reading is parallelized)
Rscript starter_aggregate.R namelist SYNOP DET 6
Rscript starter_aggregate.R namelist TEMP DET 6
Rscript starter_aggregate.R namelist SYNOP EPS 6
Rscript starter_aggregate.R namelist TEMP EPS 6
# Station based SYNOP/TEMP verification
Rscript starter_scores_by_station.R namelist SYNOP DET CONT 6
Rscript starter_scores_by_station.R namelist SYNOP DET CAT 6
Rscript starter_scores_by_station.R namelist TEMP DET CONT 6
Rscript starter_scores_by_station.R namelist SYNOP EPS CONT 6
Rscript starter_scores_by_station.R namelist TEMP EPS CONT 6
A general rule of the ff based verification is that all forecast/observation pairs that fulfill the default or user defined demands are used in the verification. This is mainly due to the assumption that the quality control through data assimilation is sufficient. However, some exceptions exist, which are listed here.
There is an option to mimic the quality check as performed within VERSUS with the ff verification. This can be done by setting the namelist key mimicVersus to TRUE. In this case the quality flags contained in the ff will be ignored, i.e. observations rejected by data assimilation will also be used. Observations will only be discarded if
|fcst-obs| >= 50 m/s for surface wind speed
|fcst-obs| >= 25 hPa for surface pressure
|fcst-obs| >= 30 K for 2m dew-point or 2m temperature
There is no further quality control in the TEMP verification.
The visualization is done by shiny (http://shiny.rstudio.com) applications. Shiny applications basically are R scripts (ui.R and server.R) that can be started from within an R session or with the shiny-server. It is recommended to use shiny-server in order to ensure the permanent availability of the applications. The free version of shiny-server has been found to perform very well for providing verification results to an entire meteorological office. A detailed description of the shiny application is not (yet) part of this document. To familiarize yourself with the concept and possibilities of shiny, visit https://shiny.rstudio.com/gallery/
The motivation behind using shiny applications for visualization is the high flexibility in browsing and composing the numerous results of the verification. Instead of producing a plot for every calculated score, the desired plot is generated interactively, on demand. Besides that, smaller data modifications can be performed within the app (e.g. differences between scores, blending data out/in, smoothing, etc.). The applications can be used rather intuitively. Generally the user has to choose and load a score file from a selection. Then, the user can choose the run(s), valid-time(s), variable(s), score(s), subdomain(s), height interval(s) and experiment(s), and the according plot is generated on the fly. The apps generally have a "Bookmark" button that allows reloading the app with the current settings. Plots can be exported by copy/paste or by right-click & save as png. Besides the score plot, a summary plot is provided that serves as a short summary of the experiment outcome. Also, the data behind the plot can be exported in different formats from inside the app.
The drawback of this interactivity is a small temporal delay due to data handling and rendering; in most cases, however, plot generation is sufficiently quick.
Example of the deterministic SYNOP verification visualization application
To get a first impression (outside DWD), visit http://www.cosmo-model.org/shiny/users/fdbk/
The final score files are saved in the Rdata (binary) format (https://bookdown.org/ndphillips/YaRrr/rdata-files.html). The files contain one or more tables of scores. The tables can be loaded in R, saved in various other formats and then used with other plotting tools. For faster data handling in R it is recommended to load the data.table package beforehand.
# Loading a verification output file in R
require(data.table)
load("/path/to/scoreFileName")
# Print file structure
str(scores)
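For example, the loaded table can then be exported to CSV (assuming, as above, that the loaded object is named scores):
# Export the score table to CSV for use with other software
require(data.table)
fwrite(scores, file = "/path/to/scoreFileName.csv")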
Adapt the filePattern key in the namelist so that the EPS files are found. filePattern search patterns can be combined using the OR operator |, e.g. to find files from one season use
filePattern 'verSYNOP.202312|verSYNOP.202401|verSYNOP.202402'
Download the required sp shape file (level 0) from https://gadm.org/download_country_v3.html
require(data.table)
require(sp)  # provides the class definitions for the sp shape file
# load shape file, example Romania
shapeFile=readRDS("~/gadm36_ROU_0_sp.rds")
# Romania seems to have 4 individual polygons, the main one is number 4 (others might be islands)
regions = shapeFile@polygons[[1]]@plotOrder
# Bind all regions to a single table, regions separated by NAs
poly=c()
for (region in regions){
poly = rbind(poly,shapeFile@polygons[[1]]@Polygons[[region]]@coords)
poly = rbind(poly,c(NA,NA))
print(dim(poly))
}
#Test
#plot(poly,type="l")
#Save to FFV conform domain table
DT = data.table(name="Romania",lon=poly[,1],lat=poly[,2])
fwrite(DT,file="~/polyRomania.txt",sep="\t")
The verification offers a set of pre-defined subdomains that can be used to further stratify the scores. Their use is optional; the user can substitute these subdomains with her own definitions (see the domainTable key above).
Namelist entry: subdomains SYNOP
Namelist entry: subdomains WMO
TEMP sites used at DWD in the ICON domain on an arbitrary day
Short Name | Long Name | varno |
---|---|---|
PS | surface pressure [Pa] | 110 |
T2M | 2 metre temperature [K] | 39 |
T | upper air temperature [K] | 2 |
TD2M | 2 metre dew point [K] | 40 |
TD | upper air dew point [K] | 59 |
RH2M | 2 metre relative humidity [0..1] | 58 |
RH | upper air relative humidity [0..1] | 29 |
FF | wind speed [m/s] | 112 |
U10M | 10 metre u-wind speed [m/s] | 41 |
U | upper air u-wind speed [m/s] | 3 |
V10M | 10 metre v-wind speed [m/s] | 42 |
V | upper air v-wind speed [m/s] | 4 |
DD | wind direction [deg.] | 111 |
GUST_Xh | max. wind gust in the last X hour(s) [m/s] | 242 |
RR_Xh | total precipitation in the last X hour(s) [mm] | 80 |
N(_L/M/H) | cloud cover (total, low, mid, high) [oct.] | 91/67/93/94 |
JJ_Xh | maximum temperature in the last X hour(s) [K] | 81 |
TMIN_Xh | minimum temperature in the last X hour(s) [K] | 243 |
RAD_DF_Xh | diffuse radiation in the last X hour(s) [J/m^2] | 238 |
RAD_GL_Xh | global radiation in the last X hour(s) [J/m^2] | 237 |
Observation sites used at DWD in the COSMO-D2 domain on an arbitrary day
Observation sites used at DWD in the ICON domain on an arbitrary day
Observation sites used at DWD in the ICON-EU nest domain on an arbitrary day