Priority project introduction

Priority Project "KENDA"
Km-Scale Ensemble-Based Data Assimilation

Last updated: 6 Jun 2012

Project leader: Christoph Schraff (DWD)

Introduction

The aim of the project is to develop a novel ensemble-based data assimilation system for the convective scale (i.e. 1 - 3 km model mesh-size) and to show that it works scientifically and gives a systematic positive impact (compared to nudging), in particular in convective situations, but also for low stratus conditions and near steep orography. The system has to be able to provide the initial conditions for convective-scale ensemble forecasting.

Two approaches have been envisaged at the beginning of the project, that is the Local Ensemble Transform Kalman Filter (LETKF, see Hunt et al., 2007) and the Sequential Importance Resampling (SIR) filter (van Leeuwen, 2003). The distribution of the resources should depend on the relevance of non-Gaussianity. In some preliminary investigation, only moderate deviations from Gaussianity have been found in most observation minus forecast statistics. Therefore, all the resources from the weather services in COSMO are being devoted to the LETKF in view of the fact that for this method, successful meteorological data assimilation applications exist and less practical problems are expected than for the SIR approach. Then, the more basic research required for the latter should rely mainly on resources from cooperating universities and research institutes.

Motivation

Our strategy for (very) short-range NWP in the coming years is to deliver not only deterministic forecasts, but a representation of the probability density function (PDF) for the atmospheric state, e.g. in the form of probabilities assigned to members of an ensemble running at a mesh-size of about 1 - 3 km. Furthermore, the use of indirect observations at high frequency is considered to become more important in the future.

Compared to larger scales, several conditions relevant for data assimilation are much more predominant in the convective scale. These include non-Gaussian PDFs, flow-dependent and poorly-known balance, and strong non-linearity. Therefore, it is considered more appropriate for the future to develop a separate data assimilation scheme for the convective scale and use a potentially different (but in practice similar) approach for a generalised system for global and regional modelling. Such a system combining global and regional modelling is being developed in the form of a global non-hydrostatic model with regional grid refinement at DWD and MPI Hamburg in the project ICON. Its analysis component will be based a hybrid 3DVAR (PSAS - physical space analysis system) - LETKF system.

Thus, an alternative data assimilation technique appropriate for the convective scale needs to be developed. However, the problem of convective-scale data assimilation is far from being solved nowadays. The main efforts by other groups and weather services in Europe are directed mainly towards 3D- and 4DVAR, but it is not clear how well this will work on these scales. For sure, developing a new 4DVAR system would require huge efforts. In the light of the human resources in COSMO, which are more limited than within MeteoFrance/HIRLAM/HARMONIE, let alone the UK Met Office, embarking on this approach would imply the risk of always lagging behind the other groups.

More recently, ensemble approaches, and in particular (variants of) the Ensemble Kalman Filter, have received increased attention (above all in USA). They usually require significantly less resources for development. Moreover, they can naturally be used to provide the initial conditions for convective-scale Ensemble prediction systems and are therefore much better suited for forecasting and delivering representations of the PDFs. By embarking on this approach, COSMO can attain a strong position within Europe with a chance for leadership in this area.

Actions proposed

Potentially, two methods will be assessed, that is the Local Ensemble Transform Kalman Filter (LETKF, see Hunt et al., 2007) and the Sequential Importance Resampling (SIR) filter (van Leeuwen, 2003). The role of non-Gaussianity will be investigated at the beginning of the project, and unless non-Gaussianity is expected to impede the success of the LETKF approach, the main focus for COSMO resources will be on the development of an LETKF system. The more basic research required for the SIR should then rely (mainly) on resources from cooperating universities and research institutes.

The SIR filter

SIR filtering (van Leeuwen, 2003) is a novel Ensemble or Monte Carlo type approach. It uses an ensemble of very-short range forecasts and selects the most likely members by comparing them to observations (Bayesian weighting). From the selected members, a new ensemble is created for the next analysis time. In its pure form, no direct modification of model fields by observations is required in this method. The filter consists of the following steps:

Take an ensemble together with a prior probability density function (PDF)
Find the distance of each member to the observations (using any "norm" or forward operator; neither inverse nor tangent linear approximations of operators are required)
Combine the prior PDF with the distance to the observations to obtain a posterior PDF
Construct a new ensemble reflecting the posterior PDF (importance re-sampling)
Integrate to the next observation time

While it has been tested successively for strongly non-linear oceanographic flow (van Leeuwen, 2003), it has not yet been applied to NWP. By design, it avoids some of the shortcomings of the more standard assimilation methods and can in principle handle the major challenges on the convective scale:

non-Gaussian PDFs
flow-dependent and unknown balance
model errors
highly non-linear processes (for which tangent linear approximation need not be valid)
direct and indirect observations with highly non-linear and complex observation operators and norms

Of particular interest is a localised approach of the method (called LSIR). Here, the Bayesian weighting and updating (re-sampling) is done separately for each grid point using the nearby observations, rather than globally for the whole ensemble member at once. This allows to significantly reduce the number of required ensemble members, however at the price of having to glue different ensemble members together, and this can easily result in imbalances.

The LETKF

In the LETKF (Hunt et al., 2007), an ensemble of forecasts is used to represent a situation-dependent background error covariance. This is implicitly deployed then in three ways taking into account the observations and their errors. Firstly, the analysis mean state (best linear unbiased estimate) is equal to the ensemble mean forecast plus a weighted sum of the forecast perturbations (i.e. the deviations of the individual ensemble forecasts from the ensemble mean forecast) where the weights depend on the deviations of the ensemble members to the observations. Secondly, the analysis covariance is calculated. Thirdly, the analysis perturbations are determined as a linear combination of the forecast perturbations such that they reflect the analysis covariance. Asynoptic, high-frequency, and indirect observations can also be accounted for.

Localisation, e.g. by using only observations in the vicinity of a certain grid point, requires the ensemble to represent uncertainty in only a rather lower-dimensional local unstable sub-space. This reduces the sampling errors (i.e. the errors in sampling the forecast errors) and rank deficiency (which expresses that due to the limited ensemble size, the ensemble is not able to explore the complete space of uncertainty in general). Note that the method provides a framework how to make smooth transitions between different linear combinations of the ensemble members in different regions (e.g. by gradually increasing the specified observation errors with increasing distance from the observation). This should allow to keep both the ensemble size and the imbalances caused by localisation reasonably small.

Compared to the SIR approach, the main drawback of the LETKF is that it is a linear scheme and assumes Gaussian error distributions. Apart from that, the LETKF seems clearly more suitable to the high-dimensional problem of (convective-scale) data assimilation for NWP.

Scientific risks

Besides a general risk of insufficient human resources, there are the following scientific risks:

Ensemble size (particularly SIR)

With the possibility of multi-modal distributions, the number of ensemble members required to properly sample the PDF can become large. The required computer resources may become too large in practice. In the ETKF, the problem is alleviated in the sense that it takes the Gaussian assumption and thereby reduces the ensemble size to sample that PDF. Localisation approaches increase the effective ensemble size, but may pose balance problems (mainly in LSIR). For the LETKF, an ensemble size of 20 may be too small, but a size of 40 is expected to be sufficient to render good results (Chris Snyder, personal communication at the COSMO GM at Athens). A larger size, however, would be more optimal to describe and account for the strongly situation-dependent error covariances.

Filter divergence (mainly SIR)

As in the pure SIR filter, no corrections by observations to the ensemble members are applied, there is the potential danger of all ensemble members drifting away from reality. Although the filter will notice this, there is no immediate way to bring the ensembles back on track. Because of the strong control of the interior solution of the convective-scale model on its limited domain by the lateral and lower boundary conditions, it is expected that this could be a transient problem at most. Should it happen, there is the possibility to include "nudged" members into the ensemble as a fall back position. Or, filter localisation reduces the problem to local drift. In the LETKF, the members are directly influenced by the observations, and ensemble drift would imply here (only) to sample incorrect background covariances and apply non-optimal analysis increments and analysis perturbations. Adaptive approaches should be tested to maintain a realistic amount of ensemble spread.

Incomplete use of observations (particularly SIR)

In the pure SIR, the total observational information is condensed to selecting appropriate ensemble members. By this, the high information density from remote sensing data such as radar data may be used far from completely. Localisation of the filter (LSIR) strongly increases the information density of the analysis step, however at the price of having to glue together different ensemble members, and this can easily cause imbalances. These imbalances are not such a big problem in the LETKF (unless the localisation scale is very small). However also in the LETKF, the available number of degrees freedom to fit the observations within the localisation scale in the analysis step does not exceed the number of ensemble members.

Non-Gaussian distributions (LETKF)

The LETKF makes the Gaussian assumption. The PDF"s and errors, however, are expected more non-Gaussian and multi-modal in the convective than in the larger scales. Yet, it is not clear how relevant it is to take this into account. Note that if the analysis system is able to closely follow the true (observed) trajectory by assimilating frequent observations in a very rapid update cycle, it is required to treat only the part of the attractor that is close to the observations, and the dynamics may not have enough time to become strongly non-linear and non-Gaussian.

Systematic model errors

Model bias is difficult to account for in data assimilation, and current operational schemes do not do this (except that bias corrections can be applied to observations in order to unbias the latter with respect to the model). In the presence of significant systematic model errors, analysis increments tend to be inferred to make up for these errors in the initial conditions at the observation locations (and for the observed variables). However, such increments are done for the wrong reason, and usually result in erroneous corrections to the initial state elsewhere. Thus, the model quality strongly influences the performance of the data assimilation, and this is particularly true for advanced data assimilation techniques like the ones proposed (or 4DVAR) which make strong use of the forecast model. Therefore, improving the model quality is as important as developing an advanced data assimilation system.

Links with other projects or activities

COSMO Priority Project "CONSENS": for the provision of an ensemble of perturbed lateral boundary conditions, possibly for the implementation of an ETKF
DWD project COSMO-DE-EPS: will deliver the infrastructure for generating forecast ensembles for COSMO-DE
DWD / MPI project "ICON": alternative option for perturbed lateral boundary conditions later on, by means of implementation of global ETKF / 3DVAR
COSMO Priority Project "UTCS": will improve the model physics and reduce systematic model errors
COSMO Priority Project "VERSUS2": for verification based on NetCDF feedback files.
DAQUA and ENSEMBLE projects within the DFG (German Research Foundation) program SPP1167; the DAQUA partners plan to set up a first version of a SIR system and assess it for test cases; within the ENSEMBLE project, an EnKF is developed for the global models GME / ICON
Hans-Ertel Centre (HErZ): LMU (University) of Munich, together with DLR (Deutsche Luft- und Raumfahrt) Oberpfaffenhofen, has become a research competence centre (HErZ) for data assimilation which is strongly supported by DWD funds. It has several projects which are related to KENDA:
- implementation and testing of a fast forward observation operator for SEVIRI VIS + NIR channels (KENDA task 4.4 is about IR channels)
- implementation and evaluation of the diagnostic methods proposed by Liu and Kalnay (2008), Liu et al. (2009) and Li et al. (2010) which quantify the analysis impact and the reduction of forecast error by individual observations
- assessment of the impact of perturbations from the initial condition, lateral boundary conditions, and physics (inc. stochastic physics); and assessment of the forecast sensitivity to different observations and its flow-dependency
- evaluation of robust state filters to treat non-linearity
(EUMETSAT fellowship at DWD: see task 4.4)

Project tasks

Task 1: General issues in the convective scale, evaluation of COSMO-DE-EPS
This task guides the subsequent decision on how the resources should be split between LETKF and SIR (COSMO-NWS and universities); it is part of the learning process.

Task 2: Implementation of an ensemble data assimilation framework
The technical implementation of a prototype LETKF on the convective scale will adopt the approach that the analysis step for the LETKF is in a program separate from the COSMO model. In fact, it is included as a module in the 3DVAR package so that the code shall be shared with the LETKF developed for GME / ICON (at least for test purposes; it is planned at least for GME / ICON to implement different LETKF versions).

Task 3: Evaluation and tuning of LETKF, comparison with nudging
Address primary scientific issues related to the LETKF on the convective scale and refine the system to the extent, that it runs stably and gives physically consistent results. At least in the early stage, only conventional in-situ data are used in this task.
(Or show that the limitations of the method (e.g. related to the Gaussian assumption) in the convective scale make the success unlikely and hence would suggest that further resources should be put elsewhere (e.g. in the SIR approach).)

Task 4: Inclusion of additional observations in LETKF
High-resolution and -frequency observational data have to be used in addition to conventional in-situ observations. Radar radial velocity and radar reflectivity are considered key data for the convective scale and mandatory to be used operationally KENDA, at least as for the radial winds from the beginning. Ground-based GPS data (preferably slant path delay data) and satellite-based information of cloud are seen very important, but not mandatory for operational KENDA from the beginning. The use of other satellite data are of a lesser importance for a convective-scale models with small domains mainly over land areas covered well by radar (but may still be added to KENDA at a later stage).
Note: All testing and tuning will be done in the framework implemented in task 2. This implies remote access to the experimentation system for cooperating partners.

Task 5: SIR filter: Development of core modules, evaluation of the system
General remark: The research and development for the SIR filter, at least for the time being, should take place mainly at cooperating universities and research institutes. The method is part of the COSMO strategy and therefore (formally) included in this project plan.

References:

Buizza, R., Miller. M., and Palmer, T.N.	1999	Stochastic representation of model uncertainties in the ECMWP ensemble prediction system. Q.J.R.Meteorol. Soc., 125, 2887 - 2908.
Hunt, B.R., Kostelich, E.J., and Szunyogh, I.	2007	Efficient data assimilation for spatiotemporal chaos: a Local Ensemble Transform Kalman Filter. Physica D, in press.
Li, H., J. Liu and E. Kalnay	2010	Correction of Estimating observation impact without adjoint model in an ensemble Kalman filter. Quart. J. Roy. Meteor. Soc., 136, 1652-1654.
Liu, J., E. Kalnay, T. Miyoshi and C. Cardinali	2009	Analysis sensitivity calculation in an ensemble Kalman filter. . J. Roy. Meteor. Soc., 135, 1842-1851.
Liu, J. and E. Kalnay	2008	Estimating observation impact without an adjoint model in an ensemble Kalman filter. Quart. J. Roy. Meteor. Soc., 134, 1327-1335.
Palmer, T.N., Buizza. R., Doblas-Reyes, F., Jung, T., Leutbecher, M., Shutts, G.J., Steinheimer, M., and Weisheimer, A.	2009	Stochastic parameterization and model uncertainty. ECMWF Tech. Memo. 598.
Van Leeuwen, P.J.	2003	A variance minimizing filter for large scale applications. MWR, 131, 2071 2084

Priority Project "KENDA"Km-Scale Ensemble-Based Data Assimilation