Preprint

RC1: 'Comment on hess-2020-672', Anonymous Referee #1, 26 Feb 2021

Review of the manuscript “Ensemble-based data assimilation of atmospheric boundary layer observations improves the soil moisture analysis” by Tobias Sebastian Finn, Gernot Geppert and Felix Ament

The manuscript “Ensemble-based data assimilation of atmospheric boundary layer observations improves the soil moisture analysis” is devoted to the very important and interesting topic of comparing two land-surface data assimilation (DA) methods used in NWP, which belong to two families: Extended Kalman Filters and Ensemble Kalman Filters (namely, the Simplified Extended Kalman Filter and the Localized Ensemble Transform Kalman Filter). Currently, Extended Kalman Filters are widely used in NWP. However, the family of Ensemble Kalman Filters is attracting strong interest because of the development of ensemble systems and the use of ensemble methods in upper-air DA. There is also hope that ensemble methods will allow the development of coupled DA, in which forecast errors in both the land-surface and atmospheric models are corrected by observations in the surface layer (2-metre observations). Current NWP systems are so-called weakly coupled systems, because the assimilation of 2-metre observations affects only the land-surface model variables (soil moisture and temperature). In the manuscript, an attempt is made to test the feasibility of ensemble DA methods for land-surface DA.

Unfortunately, the manuscript is written in a way that makes it impossible to understand. The structure of the manuscript is illogical and even confusing. For example, the Simplified Extended Kalman Filter is described in Section 2.2.1, a subsection of Section 2 “Fully-coupled ensemble data assimilation framework”, yet the Simplified Extended Kalman Filter is neither fully coupled nor ensemble-based. Very general descriptions of DA methods and of the use of ensembles in DA are given (Section 2.2 and the beginning of Section 2.2.2), which are not needed. At the same time, for the important details of the SEKF and LETKF (How are they applied? Which variables are in the state vector? What are the observation operators and how are they linearized? Which variables are perturbed for the ensemble? How is coupling in DA achieved?), only references are provided. Specific parameters of the experiments, descriptions of different parts of the DA system, the experiment setup, technical details, etc. are all mixed together and scattered throughout the text. As a result, it takes far too much effort to read the manuscript and understand even parts of it.

Also, there are many places that are confusing or wrong. The authors are either too imprecise or do not even understand what they are doing. Examples are:

1) Lines 18-27. The authors mix up physical coupling in the model and coupled DA. The 2m observations have no effect on the soil moisture when there is no physical coupling (!) between the soil and the atmosphere: no physical coupling in the model, and none in reality either. This happens, for example, under strong advection or in very cloudy and windy conditions. Coupled DA will not help to correct the soil moisture from the 2m observations in these situations. What coupled DA would do in these situations (and what we expect from it) is correct the variables of the lowest atmospheric model layer, for example cloudiness or wind.

2) Lines 37-39: “The soil moisture analysis is moreover often estimated in its own daily assimilation cycle in addition to assimilation cycles for the atmosphere on shorter, hourly-like, time-scales. To combine these assimilation cycles into one single cycle, EnKFs are one candidate because of their ensemble-based flow-dependency.” Impossible to understand. What is estimated? Why would flow-dependency help to combine the cycles? Why is flow-dependency important for the soil? In soil, there is no flow.

3) Lines 41-43: “We additionally test with this EnKF setup the hypothesis of hourly updating the soil moisture based on a flow-dependent coupling between land surface and atmosphere.” Impossible to understand. What is the hypothesis?

4) Lines 50-51: “Together with TerrSysMP, we perform idealized twin experiments, using the same system configuration for our nature run and our data assimilation experiments.” What is a “nature run”? I guess it is a run without DA, but this is only a guess.

5) Lines 52-53: “With this distilled setup, we are able to isolate the effect of perturbations within the soil moisture on the 2-metre-temperature without having model errors.” Absolutely impossible to understand.

6) Lines 74-75: “We further restrict our grid points to single plant functional types (PFTs) to simplify the setup.” How is it possible to restrict grid points to plant types???

7) Lines 109-110: “This background forecast is updated at 00:00 UTC based on gridpoint observations at 12:00 UTC and Eq. (3).” Why do you use this half-day shift?

8) Lines 128-130: Impossible to understand. How can Eq. (3) be used to estimate ensemble weights? There is nothing about weights in it. How can the observation operator be linearized around the ensemble mean? Linearization means calculating a derivative. How does the ensemble mean help to calculate a derivative?

9) Lines 134-135: “These experiments are thus model-error-free ...” No, they are free of initial-state errors. Model errors remain.

10) Line 153: “Our data assimilation framework ...” What is the data assimilation framework? The code?

11) Line 155: “… the background and first guess are read-in as output files from the models”. What is the difference between the background and the first guess in your case?

12) Line 158: “We define a nature run (NATURE) as our truth in this study and to get our 2-metre-temperature observations.” Impossible to understand. You define a run without DA as the truth? How can that be?

13) Lines 168-169: “Our only perturbations in the atmosphere and soil are a result of initial soil moisture and soil temperature perturbations.” Impossible to understand. You have no perturbations in the atmosphere?

14) Lines 169-170: “A single run with a similar model configuration and a spin-up of 6 years builds the foundation for our initial conditions in the atmosphere and soil.” A model configuration similar to what? Why do you need such a long spin-up?

15) Lines 171-173: “As horizontal correlation function, we use a truncated Gaussian kernel with a standard deviation of 14 grid points (14 km) and a truncation radius of 42 grid points. The same type of truncated Gaussian correlation is used in vertical dimensions with a standard deviation of 0.5 m and a truncation after 1 m.” How can the dimension of a standard deviation be “grid points” or “metres”??? The dimension of a standard deviation is the same as that of the random variable for which it is defined. What is your random variable, such that its standard deviation is defined in grid points???
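The specification quoted in item 15 can be made concrete. In correlation and localization functions of this type, the “standard deviation” is commonly used as the length-scale parameter of a Gaussian-shaped function of separation distance, which is why it carries a distance unit. The following is a minimal Python sketch under that assumption, and under the further assumptions of a 1 km grid (suggested by “14 grid points (14 km)”) and a weight of exp(-d²/(2σ²)) that is set to zero beyond the truncation radius. The function `truncated_gaussian` is hypothetical and is not taken from the manuscript's code.

```python
import math


def truncated_gaussian(distance: float, sigma: float, cutoff: float) -> float:
    """Correlation weight between two points separated by `distance`.

    `sigma` is a length scale in the same unit as `distance` (grid points
    or metres), not the standard deviation of a random variable.
    Weights beyond `cutoff` (the truncation radius) are set to zero.
    """
    if distance > cutoff:
        return 0.0
    return math.exp(-0.5 * (distance / sigma) ** 2)


# Horizontal: sigma = 14 grid points, truncation radius = 42 grid points
for d in (0, 14, 28, 42, 43):
    print(f"horizontal d={d:2d} gp: {truncated_gaussian(d, 14.0, 42.0):.3f}")

# Vertical: sigma = 0.5 m, truncation after 1 m
for z in (0.0, 0.5, 1.0, 1.5):
    print(f"vertical dz={z:.1f} m: {truncated_gaussian(z, 0.5, 1.0):.3f}")
```

With these parameters, the horizontal weight is exp(-1/2) ≈ 0.61 at one length scale and has already decayed to about 0.011 at the truncation radius (3σ), whereas the vertical truncation at 2σ cuts off a larger tail (weight ≈ 0.135 at 1 m).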

I stop here. Too much work is needed to reach even an approximate understanding.

My recommendation to the authors is to rewrite the manuscript completely, elaborate the text and resubmit it. For this kind of manuscript, the usual structure is the following:

1. Introduction. Why is this study carried out? What is its purpose?

2. Observations. Which observations are used for assimilation? Which observations are used for verification (or validation)?

3. Methods. Description of the model and DA methods (including observation operators), and how they are applied.

4. Experiment setup. Description of the domain and specific parameters of the DA system (or, other important parameters).

5. Results. Description of results, including the criteria and methods used to evaluate them. What is better, what is worse? How is verification (or validation) performed?

6. Discussion. What is the applicability of the results? What are limitations of the study?

7. Conclusions. General conclusions and suggestions for future studies.

Citation: https://doi.org/10.5194/hess-2020-672-RC1

AC1: 'Comment on hess-2020-672', Tobias Sebastian Finn, 04 Mar 2021

We appreciate the review by referee #1 and are grateful for his useful insights into why the text currently appears illogical and difficult to understand. Here, we briefly respond to the major issues raised and we will provide a detailed response along with a revised manuscript later on.

We agree with the concerns raised with respect to comprehensibility, and we have decided to improve the language and the technical details provided, and to completely revise the structure of the paper as follows:

We will reduce the use of jargon and provide explanations, e.g. for terms like "flow-dependency" in the context of data assimilation and "nature run" in the context of twin experiments.

We will provide a new introduction section in which we outline the physical coupling between the land surface and the boundary layer and why boundary layer observations might be beneficial for the estimation of land surface states in data assimilation. We will elaborate more on the separate NWP data assimilation cycles for the atmosphere and land surface, as well as on the concept of twin experiments with synthetic observations. We will additionally state the scientific questions that our study answers.

Furthermore, we will rearrange the material in the current Sections 2 and 3 into the following new sections with additional details:

Section 2 - Twin experiment case

Section 3 - Data assimilation methods

Section 4 - Experiments

Section 2 will cover the twin experiment case, including a description of the model and the data used in the paper, what the nature run is and how synthetic observations are generated from it. We will also describe the general meteorological conditions in our simulated time period.

Section 3 will describe the data assimilation methods used and how we apply them together with observation operators. We will provide technical details and equations in a new appendix.

Section 4 will outline our experiments and, related to this, give details about the variables in the state vector; we will additionally show how our ensemble is generated and constructed.

We believe that these changes will address the specific comments made by referee #1 and improve the readability of the paper significantly.

Citation: https://doi.org/10.5194/hess-2020-672-AC1

RC2: 'Reply on AC1', Anonymous Referee #1, 04 Mar 2021

Yes, I am satisfied. Good luck!

Citation: https://doi.org/10.5194/hess-2020-672-RC2
- AC3: 'Reply on RC2', Tobias Sebastian Finn, 03 Jul 2021
  Additionally incorporating the comments by referee #2, we have decided to completely overhaul our manuscript to improve its clarity. As a first step, we will provide additional explanations of the concepts that are needed in our manuscript.
  To be specific, we will rearrange existing sections and create a new, hopefully more logical, structure:
  Introduction: We will provide a gentler entry into the study, explaining the physical circumstances that allow atmospheric boundary layer observations to be assimilated into the land surface. In addition, we will make the step from the simplified extended Kalman filter (SEKF) to the localized ensemble transform Kalman filter (LETKF) more explicit and explain in more detail what is needed for this step. In this introduction, we will clarify the scope of our manuscript as a proof of concept. Since strongly coupled data assimilation is only an additional result of our study, we will remove the last paragraph in favor of a more comprehensive introduction to the other parts.
  
  Twin experiments: We will introduce a new section explaining the idea behind our twin experiments. Here, we will briefly discuss the advantages and disadvantages of these experiments compared to “real” data assimilation experiments. In addition, we will explain how our model setup and the ensemble have been created, and how we have synthesized our observations from a single simulation called our “nature” run. To show that the ensemble generation was successful, we will introduce a new figure that shows the error of the ensemble mean with respect to the nature run and the ensemble spread for the atmospheric temperature at a height of 10 meters.
  
  Data assimilation: We will introduce the data assimilation concepts and elaborate more on the differences between SEKF and LETKF. Additionally, we will briefly explain how these algorithms have been implemented in our data assimilation system. We will provide more technical details and equations in a new appendix.
  
  Experimental design: Here, we will briefly explain our experiments. We will specifically elaborate on the question of what is assimilated where and on the differences between these experiments.
  
  Results: We will shorten this results section to streamline it and make it easier to understand.
  
  Discussion and Summary: We will streamline this section in places to shorten the manuscript as a whole. In addition, we will sketch further steps towards longer and more realistic experiments, needed to prove the applicability of the LETKF in real global land data assimilation systems.
  
  Conclusion: We will rethink the conclusions to clarify the implications of our results, especially with regard to our idealized experiments.
  
  We hope that this new structure will improve the clarity of our study and manuscript. Its more thorough beginning should then explain the concepts needed for our study.
  
  Citation: https://doi.org/10.5194/hess-2020-672-AC3

RC3: 'Comment on hess-2020-672', Anonymous Referee #2, 26 May 2021

I recommend major revision of the manuscript before accepting it for publication.

The study assimilates synthetic 2-metre temperature observations into soil moisture in a fully-coupled limited-area model system for a seven-day period in summer 2015. The study investigates many assimilation schemes and compares them against each other and against the nature run. The outcome of the study is interesting and explains certain land-atmosphere coupling aspects that are already known. The data assimilation experiments are detailed and explained. However, there are various weaknesses in the synthetic setup of the model, the assumptions made in the model's initial conditions are unphysical, and, moreover, the phenomenon of land-atmosphere interaction cannot be explained with a 7-day experiment window.

Major comments:

Setting up different scenarios with wet, moderate, and dry soil profile conditions for the major hydroclimatic regions of the world is critical to highlight the importance and applicability of the work. The current setup, as described in Section 3.2, is totally unphysical. See the comment given in the manuscript.
The 7-day timeframe of the synthetic experiment is inadequate. To capture the whole gamut of physical processes in land-atmosphere interactions, a longer time window, possibly covering different seasons, is critical. Alternatively, longer synthetic experiments for different seasons are recommended.
Another suggestion is to validate the outcome of the study with a real scenario in a hindcast experiment, as the authors are using an operational system. This approach would validate the outcome of the data assimilation experiment and its capability for future operational implementation, through the incremental improvement in the analysis of the 10m air temperature.

I have recommendations to make the manuscript more relevant and easier to read and understand: cut some of the flab and shorten Section 4.

More remarks are provided in the manuscript as comments.

Citation: https://doi.org/10.5194/hess-2020-672-RC3

AC2: 'Reply on RC3', Tobias Sebastian Finn, 03 Jul 2021

Please see the attached supplementary file.

Citation: https://doi.org/10.5194/hess-2020-672-AC2