This paper presents a time series modelling-based (TSM-based) data preparation approach applied to 30-year groundwater time series from the Netherlands. The TSM-based approach is evaluated for its reliability and potential for groundwater drought prediction.
The authors combine this evaluation of the data preparation method with a discussion on and analysis of the 2018-2019 groundwater drought in the southeastern Netherlands in terms of drought propagation and drivers of its spatial cohesion.
The topic of this work is of high interest to the research community, as it discusses important points on data reliability and methods to overcome these hindrances, and presents a timely and interesting overview of recent drought developments in the Netherlands. The manuscript follows a clear structure, and I commend the authors on its good overall structure and readability. Some improvements to the method description are required with regard to the choice of thresholds and cut-off values, as is an adjustment of the figures to allow for better comparison between measurement-based SGIs and model-based SGIs (see specific comments below), but overall, I recommend this manuscript for publication after revisions.
Specific comments
Title/focus of the manuscript – The text itself is focussed on the TSM-based approach, while the title draws more attention to the drought analysis. I would reconsider the title so that it better reflects the manuscript’s content. I am also not sure about the use of the term “data-based” approach (most studies are based on data?).
Figures – The figures are easy to read and support the conclusions. On Figure 9 though, it would be more useful to present the SGIs of the measured time series for the same months as for the simulated time series (Figure 5).
Methods – The methods are clearly explained in principle, but in some instances, it would be helpful to elaborate a bit on why specific cut-offs were chosen. Please add some more information e.g. on:
L165 – ‘one or a few’, Did you use a specific threshold for ‘a few’? If yes, this should be added.
L175 – Please elaborate on which basis you decided on the cut-off at 20 cm.
L183 – As in L175, a bit more explanation on how the procedures were chosen (in this case repeating outlier removal twice) would be helpful.
L131-132 – Why were those series with > 10 years of data selected? It would be helpful to elaborate a bit on the cut-offs. Do the 10 years refer to the amount of datapoints, or to 10 years of consecutive data? Did you also consider a maximum length of allowed data gaps?
L204-206 – The selection of the time series could be explained in a bit more detail; e.g. why ‘around 120 series’ were selected at first.
L191 – 4 consecutive years? Also add why data from June-August 2018 was considered as particularly crucial.
L235 – Elaborate on why you chose the three-month-aggregated SPEI
L255 – Why did you choose 1/n instead of 1/2n?
Table 1 – Please add the reference for drought classification. It might be good to discuss in the text why this classification was chosen. In other literature (e.g. Svoboda, M., Hayes, M., and Wood, D.: Standardized precipitation index user guide, World Meteorological Organization, Geneva, Switzerland, 24 pp., 2012.) drought periods are only defined if the index is continuously negative and reaches an intensity of -1.0 or less.
L270 – Why January 1993?
L272 – Please add the type of regression analysis
L321 – Please reiterate based on what the dry, normal and wet conditions are defined.
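To make the Table 1 comment concrete, the event definition quoted from Svoboda et al. (2012) can be written down in a few lines. This is only a minimal sketch: the function name `drought_events` and the example index values are my own assumptions and are not taken from the manuscript.

```python
def drought_events(index, threshold=-1.0):
    """Return (start, end) position pairs (inclusive) of runs in which the
    index is continuously negative and reaches `threshold` or lower."""
    events = []
    start = None
    for i, value in enumerate(index):
        if value < 0:
            if start is None:
                start = i  # a negative run begins here
        elif start is not None:
            # The run ended at i - 1; keep it only if it reached the threshold.
            if min(index[start:i]) <= threshold:
                events.append((start, i - 1))
            start = None
    # Handle a run that is still ongoing at the end of the series.
    if start is not None and min(index[start:]) <= threshold:
        events.append((start, len(index) - 1))
    return events


# Hypothetical monthly index values, for illustration only.
sgi = [0.3, -0.2, -0.6, 0.1, -0.4, -1.2, -0.9, 0.2, -1.5]
print(drought_events(sgi))  # [(4, 6), (8, 8)]
```

In this made-up series the first negative run (positions 1 to 2) never reaches -1.0 and is therefore not counted as a drought, whereas a class-based table would still label those months as dry. Stating which of the two conventions the manuscript follows would remove the ambiguity.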
Minor specific comments:
L10 – Could mention limited quantity along with limited data quality?
L13 – As later discussed in the text, the groundwater drought continued into 2020 in much of the area, so the term 2018-2019 groundwater drought could lead to some confusion. Maybe just leave it with 2018-2019 drought/ 2018-2019 meteorological drought.
L32 – Add in which regions droughts are expected to become more frequent.
L33 – ‘most hit by the weather extremes’, By those of the 2018-2019 drought or in general?
L54 – ‘Bloomfield et al, 2018’, you could add ‘Brauns, B., Cuba, D., Bloomfield, J. P., Hannah, D. M., Jackson, C., Marchant, B. P., ... & Schubert, G. (2020). The Groundwater Drought Initiative (GDI): analysing and understanding groundwater drought across Europe. Proceedings of the International Association of Hydrological Sciences, 383, 297-305. https://doi.org/10.5194/piahs-383-297-2020’ for full paper.
L61 – ‘in the Netherlands and elsewhere’, could be more specific. For which type of (hydrogeological?) settings is this particularly applicable?
L98 – ‘with some skill’, this is slightly vague. Maybe clarify if this depends on skills, or other factors (data availability and quality? Hydrogeological setting?)
L102 – ‘abnormal drought conditions’, do you mean during drought conditions in general, or during particular (abnormal) droughts?
L108 – ‘usefulness’ seems a little vague here, maybe ‘reliability’ or ‘accuracy’ could be a better term?
L110-114 – This may be a matter of taste, but you could also leave this paragraph out in my opinion.
L118 – ‘Higher elevation’, higher elevation than 30 m AMSL (if yes, what is the maximum), or ‘The higher elevations’?
L120 – How high is the precipitation surplus?
L120-122 – You could consider swapping the last two sentences of this paragraph (the abstractions tie in quite well with the afore-mentioned agricultural activities). Also, do you have any information if irrigation of the agricultural land is similar across the region, or are there some areas that have particularly high water demand (if yes, then this might contribute to the later discussion on groundwater drought development).
L129 – You could add the start of the time series (‘from XXX to 2019’)
L135 – What was the distribution of the weather stations? Are there more data-dense areas, or are they distributed quite homogeneously?
L136 – If data from another station was used, what was the furthest distance to it, i.e. would you expect any impacts on data quality from this?
L148-149 – Also potentially overlain by abstractions?
L155 – ‘relocation of wells’; do you mean renaming?
L254 – ‘aggregated’ by averaging?
Table 2 – Please spell out true/false positives/negatives or add them to the table description. From my perspective, it would also be easier to interpret the performance data if they were given as a percentage of the total number of time series rather than as a number of series. This would also tie in better with the manuscript’s main text. You could also consider
L235 – This is in contrast to the overall observation on positive bias during low groundwater levels?
Table 3/4 – It would be beneficial here to add in the MAE as in Table 4. In both tables, a standard deviation could be used instead of the range (though this might be a matter of taste).
L330 – It is later stated (e.g. L562) that the drought peaks in Oct/Nov, so in autumn. Please correct accordingly.
Figure 3 – Minor y-axis breaks (e.g. quarterly) would be helpful.
L340 – In which part of the study area did the heavy rain occur?
Figure 6 – Please add water table depth to caption (as you did in Figure 5).
L401-405 – If the overestimations are larger in southern Noord-Brabant, is the prediction performance then really independent of the catchment characteristics?
L440 – Please add the approximate depths of ‘very deep groundwater tables’, as this categorization is clearly very region/country-specific.
L444 – ‘large proportion’, how large in %?
L445 – Please give an indication of approximate thickness in m for ‘thick unsaturated zones’.
L445-446 – How long would you recommend?
L452-455 – Very nice discussion.
L515-517 – During which months did the drying up of the stream occur?
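Regarding the Table 2 comment above, the conversion I have in mind is simply each count divided by the total number of series. The sketch below uses made-up counts, and the function name `confusion_percentages` is hypothetical.

```python
def confusion_percentages(tp, fp, tn, fn):
    """Express true/false positive/negative counts as a percentage of all series."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no series to summarise")
    counts = {
        "true positive": tp,
        "false positive": fp,
        "true negative": tn,
        "false negative": fn,
    }
    return {label: 100.0 * count / total for label, count in counts.items()}


# Hypothetical counts, not taken from Table 2 of the manuscript.
shares = confusion_percentages(tp=60, fp=10, tn=25, fn=5)
for label, share in shares.items():
    print(f"{label}: {share:.1f} %")
```

Reporting the percentages in brackets next to the counts would keep the original information while making the table directly comparable with the percentages quoted in the main text.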
Technical corrections
L12 – This is very minor, but I think ‘especially with’ would read a bit easier than ‘with especially’.
L35 – ‘getting rid of water surpluses’, ‘avoiding flooding’ might sound a bit more elegant.
L37 – ‘IenW’, supposed to be ‘Ien, W.’?
L39 – ‘related to deep drawdowns’, consider replacing with ‘declines in groundwater levels’
L58 – ‘done based on’, ‘made based on’ might be more elegant.
L65-67 – Consider splitting the sentence at the semicolon.
L82 (and others)– ‘Time series modelling’, this was already introduced as TSM. Use abbreviation throughout the manuscript for consistency.
L92 – ‘surface water influence’, consider rephrasing as groundwater-surface water interaction?
L116 – ‘Pleistoncene-era’, only ‘Pleistocene’ would be sufficient
L118 – ‘AMSL’, could be spelled on first use.
L148 – ‘yearly cycly’, ‘annual cycle’ may be more common to use.
L150 – You might want to replace ‘shifts’ with ‘step changes’?
L197 – Needs spaces around larger/smaller signs (‘p < 0.05 and r2 > 0.15’)
L222 – Change to ‘data becomes available’.
L285 – Revise to ‘on transforming a time series’ or ‘on the transformation of a time series’
L319 – I may have missed it, but I think ‘EVP’ was not previously spelled out.
L334 – ‘less dry conditions’, replace with ‘fewer dry conditions’
L351 – ‘also’ can be removed from the sentence
L360 – ’50 %’, remove space (‘50%’)
L361 – ‘found’ can be removed from the sentence.
L387 – ‘31 %’, remove space before ‘%’
L534 – Adjust ‘late, long-lasting’ to ‘later, longer-lasting’.
L557 – Adjust to ‘may be impossible to obtain with’
Review comments on the manuscript “Spatiotemporal development of the 2018–2019 groundwater drought in the Netherlands: a data-based approach” by E. Brakkee et al.
The manuscript presents an approach of using impulse-response time series modelling to estimate groundwater heads covering several regions in the Netherlands during a recent drought event. The model results are used to calculate the standardized groundwater level index (SGI) during the drought event in order to analyze the spatial variability, severity and development of the groundwater drought. The manuscript combines several data pre-processing steps, groundwater head estimation, drought analysis and drought estimation based solely on precipitation data for more than 2500 groundwater observation wells. The topic of the manuscript is overall interesting and important for both researchers and stakeholders; based on personal experience, the latter are still not always well aware of this topic in cold and temperate climate countries.
Applying impulse-response time series modelling to so many groundwater observation wells in order to analyze the drought pattern on a regional scale is impressive work. However, I am not really convinced by the necessity and the advantages of the presented approach in the analysis of a drought event. This is partly connected to missing or sometimes rather vague explanations/justifications, and partly to the approach itself. To overcome the first issue, it might be advisable to focus either on a) the analysis of the drought event or b) the drought prediction.
My detailed comments are listed below.
Main Comments
In general, I think impulse-response time series modelling can be a useful tool in the analysis of groundwater systems. As you point out, the approach is able to give site-specific information (e.g. response time) and might also be a powerful tool in forecasting groundwater heads. However, reading the manuscript I am still not convinced by the advantages of this approach in connection with the SGI in the analysis of a drought event. Most of the following statements are related to the question of how much information you gain or lose before calculating SGI values.
L177: Why do you use 20% as a cutoff, please elaborate.
L179: Data cleaning step 2, which uses the PASTAS model results based on time series of precipitation and evapotranspiration to remove outliers, homogenizes the time series. Don’t you lose specific features in the time series, especially those related to dry conditions? Is this what you mean by ‘over-filtering’ (L211)?
L196-200: What are the reasons for ‘atypical behaviour’? Could site-specific characteristics, i.e. hydrogeology, play a role? In general, I think it would be good for the reader to see either some of the original groundwater head time series or the model outcome.
L201: Why do you use 60%? It seems to be very low. The distribution (histogram) of the models’ EVP would be interesting. It might also be interesting to compare time series with different EVPs, e.g. 60% and 80%. Where are the differences – extremes/timing etc.?
L254: How do you aggregate the daily data? Do you really need a time series length of 30 years at daily resolution to calculate the SGI values for 2018? Would it not be sufficient to use time series with a coarser temporal resolution and a length shorter than 30 years? How many of the 2722 measured data sets fulfil weaker requirements in terms of measurement frequency and length, e.g. at least weekly data from the past 20 years? What would the SGI values (using normal score transformation) and their spatial distribution look like for these time series? How much extra information do you gain from using your approach? This kind of analysis would be interesting in order to show the advantages of this approach.
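As an illustration of the kind of comparison asked for in the L254 comment, the sketch below aggregates a synthetic daily record and a weekly-thinned version of the same record to monthly means. It assumes aggregation by simple averaging, which the manuscript does not confirm; the function name `monthly_means` and all data values are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def monthly_means(series):
    """Average head observations per calendar month.

    `series` maps a datetime.date to a head value (m).
    """
    buckets = defaultdict(list)
    for day, head in series.items():
        buckets[(day.year, day.month)].append(head)
    return {month: mean(values) for month, values in sorted(buckets.items())}


# Hypothetical daily heads for June and July 2018: a steady decline of 1 cm/day.
start = date(2018, 6, 1)
daily = {start + timedelta(days=k): 10.0 - 0.01 * k for k in range(61)}

# The same record thinned to one observation every seventh day.
weekly = {
    day: head
    for k, (day, head) in enumerate(sorted(daily.items()))
    if k % 7 == 0
}

monthly_from_daily = monthly_means(daily)
monthly_from_weekly = monthly_means(weekly)
for month in monthly_from_daily:
    diff = monthly_from_weekly[month] - monthly_from_daily[month]
    print(month, f"difference = {diff:+.3f} m")
```

In this smooth synthetic case the two monthly means differ by about half a centimetre. Real records will of course behave differently, which is exactly why a comparison on the measured series with weaker frequency and length requirements would be informative.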
Secondary Comments
L159: How do you define short-term disturbance? Depending on the aquifer system, human disturbance can cause or magnify groundwater droughts, for example through increased short-term(?) water consumption as a consequence of high temperatures/evapotranspiration rates. Is that not the case here?
L188: Why do you use a threshold of 4 years missing data?
L211: ‘some errors’ and ‘overremoved’ are very vague expressions. Please clarify.
L212: over-filtering? Please elaborate or even better show an example.
L221: What about factors influencing groundwater recharge on different time scales, e.g. long-lasting frost seasons, land-use change, or anthropogenic influences changing the boundary conditions, e.g. pilling, water abstraction?
L255: Is this a typo or do you intentionally use different values than the ones in Bloomfield and Marchant (2013): 1/(2n)?
Table 2: Add abbreviation in section 3.1 or even better, write the words out here. You might consider adding percentage here (in brackets).
L321: Are the terms dry, normal and wet conditions based on the percentiles (L227-229)? How do you define ‘more extreme groundwater levels’?
Figure 3: Why do you show midpoints instead of regions? How is the midpoint defined?
L390-395: 3 cm sounds indeed acceptable but an average absolute SGI error of 0.34...What does this mean for the prediction of droughts using the presented method also in terms of thresholds etc. for stakeholders? Remember: The 2018 drought was an extreme drought.
L404: Please elaborate. Is depth to groundwater or vadose zone thickness not a site-specific characteristic?
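To make the L255 question concrete: in a rank-based normal score transform the choice of the lowest probability directly sets the most extreme SGI that can be obtained. The sketch below compares the two options. It assumes that the 1/n at L255 denotes the lowest of n equally spaced probabilities running up to 1 - 1/n, which is my reading and not confirmed by the manuscript; the function name and the head values are hypothetical.

```python
from statistics import NormalDist


def normal_scores(values, p_min):
    """Rank-based normal scores with probabilities equally spaced
    from p_min to 1 - p_min (ties are not treated specially)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    step = (1.0 - 2.0 * p_min) / (n - 1)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order):
        scores[idx] = NormalDist().inv_cdf(p_min + rank * step)
    return scores


# Hypothetical heads (m) for one calendar month over 30 years; all values distinct.
heads = [10.0 + 0.1 * ((7 * k) % 30) for k in range(30)]
n = len(heads)

sgi_half = normal_scores(heads, p_min=1.0 / (2 * n))  # 1/(2n)
sgi_full = normal_scores(heads, p_min=1.0 / n)        # 1/n, as I read L255

print(f"lowest SGI with 1/(2n): {min(sgi_half):.2f}")
print(f"lowest SGI with 1/n:    {min(sgi_full):.2f}")
```

With n = 30 the lowest attainable score is roughly -2.1 for 1/(2n) and roughly -1.8 for 1/n, so the same driest month can end up in a different drought class depending on this choice. The same reasoning shows that the record length n bounds the attainable extremes, which is relevant to the L254 question on shorter series.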
The manuscript “Spatiotemporal development of the 2018–2019 groundwater drought in the Netherlands: a data-based approach” by Brakkee et al. presents an application of time series modeling to simulate groundwater levels (GWL) in order to study groundwater droughts, and I found the manuscript interesting to read. The authors deal with the common problem of irregular time steps between GWL observations. Time series modeling is applied to obtain GWL time series that can be used to study the droughts of 2018 and 2019 in the Netherlands. I need to acknowledge here that my response may not be entirely impartial, as I am one of the authors of the Pastas software, and I apologize in advance for any self-referencing. However, I wanted to provide the Authors with some suggestions to further strengthen the acceptability of the reported results, and improve the manuscript regarding its reproducibility. I will restrict myself to the time series modeling approach, as I think others have already provided excellent reviews of the manuscript in its entirety.
Effect of use of linear model
An important assumption underlying this study is that a linear recharge model can be used to accurately simulate the effects of precipitation and evaporation on the GWL. Previous studies have shown (e.g., Berendrecht et al., 2005; Peterson and Western, 2014; Collenteur et al., 2020) that this assumption may not always be valid, particularly during drought periods when non-linear unsaturated zone processes become more important (e.g., evaporation limited by the availability of soil moisture). This is particularly important because these drought events are the periods of interest in this study. This model deficiency may partly explain the large RMSE and ME values in Table 3 of the manuscript and the results shown in Figure 8. The linear model could still be an appropriate choice here, but a justification of this assumption is required in my opinion. The impact of this assumption on the estimated SGI values and the results in general could also be discussed in the discussion section (e.g., in lines 450-455).
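The following toy example illustrates the concern. It contrasts the evaporative loss in a linear formulation of the form R = P - f·E (which I assume is what the parameter “f” in the manuscript refers to) with a simple soil bucket in which actual evaporation scales with the remaining storage. The bucket is a deliberately crude, hypothetical construction and is not one of the non-linear models of Berendrecht et al. (2005) or Collenteur et al. (2020); all names and numbers are assumptions for illustration only.

```python
def linear_evaporation(evap, f=1.0):
    """Evaporative loss in a linear recharge model R = P - f * E."""
    return [f * e for e in evap]


def limited_evaporation(prec, evap, capacity=100.0, storage=50.0):
    """Toy soil bucket (mm): actual evaporation scales with relative storage.

    Water above `capacity` is assumed to drain away immediately.
    """
    actual = []
    for p, e in zip(prec, evap):
        storage = min(capacity, storage + p)
        loss = min(storage, e * storage / capacity)
        storage -= loss
        actual.append(loss)
    return actual


# Hypothetical 60-day dry spell: no rain, potential evaporation of 3 mm/day.
prec = [0.0] * 60
evap = [3.0] * 60

linear = linear_evaporation(evap)
limited = limited_evaporation(prec, evap)

print(f"cumulative loss, linear model: {sum(linear):.0f} mm")
print(f"cumulative loss, toy bucket:   {sum(limited):.0f} mm")
```

In the linear case the full potential evaporation keeps being subtracted throughout the dry spell, whereas the storage-limited loss declines as the bucket empties and ends up well below the linear total. A comparable effect in the real system would lead a linear model to simulate heads that are too low during prolonged droughts.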
Uncertainty in time series modelling and its impact on the SGI
Time series models are used to obtain regular GWL time series, comparable to the approach presented by Marchant and Bloomfield (2018). In that study the uncertainty of the simulated GWL was also quantified and used to compute the uncertainty of the SGI values. I think it would be interesting to do this in this study as well (or discuss why this is not done), given that, despite generally good fits, the simulated GWL time series and thus the SGI may still have considerable uncertainties.
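One simple way to obtain a first impression of this uncertainty is a Monte Carlo experiment: perturb the simulated heads with random errors, recompute the SGI for each draw, and report the spread for the month of interest. The sketch below is not the method of Marchant and Bloomfield (2018). It assumes independent Gaussian errors with a constant standard deviation, which is a strong simplification, and all function names and values are hypothetical.

```python
import random
from statistics import NormalDist


def normal_scores(values):
    """Rank-based normal scores using probabilities (rank + 0.5) / n."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order):
        scores[idx] = NormalDist().inv_cdf((rank + 0.5) / n)
    return scores


def sgi_interval(simulated, sigma, target, draws=500, seed=1):
    """Approximate 95 % interval of the SGI at position `target`.

    Assumes independent Gaussian head errors with standard deviation `sigma` (m).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        perturbed = [h + rng.gauss(0.0, sigma) for h in simulated]
        samples.append(normal_scores(perturbed)[target])
    samples.sort()
    lower = samples[int(0.025 * draws)]
    upper = samples[int(0.975 * draws) - 1]
    return lower, upper


# Hypothetical simulated heads (m) for one calendar month over 30 years.
simulated = [10.0 + 0.05 * k for k in range(30)]
low, high = sgi_interval(simulated, sigma=0.03, target=0)
print(f"SGI of the driest year: between {low:.2f} and {high:.2f}")
```

Because the residuals of time series models are usually autocorrelated, a proper analysis would need to account for that; the point here is only that an uncertainty band on the SGI can be reported alongside the point estimate.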
Reproducibility of results
Some of the claims made in the manuscript directly depend on how the time series modeling was done, which is very briefly described in sections 3.1 and 3.2. From the information contained in the manuscript it is not possible to reproduce the results from this study or to verify any of these claims. I therefore think some work is required to improve the reproducibility of the presented work. This could be a much more detailed description of the modeling process (e.g., settings, model structure, calibration settings, software versions), but perhaps an easier way to do this would be to upload all scripts and data (if allowed) to an online repository (e.g., Zenodo) and assign a DOI. This would enable other researchers to build upon this work more easily and make it a more valuable contribution.
Specific Line Comments: Please find a few specific line comments below.
L168. I would kindly ask the Authors to change “PASTAS” to “Pastas” throughout the manuscript.
L168-170. I think it would be good to rephrase this sentence, because it reads as if this was the goal of developing Pastas, which is incorrect. The use of (Python) scripting is what allows the models to be applied in larger workflows.
L170. It is unclear from the manuscript or the reference what “the basic settings of Pastas” are. It should either be described in the manuscript, or the scripts and data can be provided in an Appendix or external repository. Sharing the scripts would help in improving the reproducibility, without requiring the Authors to go into details about the modeling in the manuscript itself and increasing its length.
L172. Pastas has multiple non-linear recharge models, based on Berendrecht et al. (2005) and Collenteur et al. (2020). It is not clear which method has been tried here, but this would be valuable information. Moreover, the statement is not supported by any data presented in this manuscript, so it is hard to verify such a general statement. Since the non-linear models have more parameters to fit the model to the data, compared to the linear model, I find the finding somewhat surprising. However, it could be the case that the linear model really does work better, perhaps because evaporation of groundwater occurs at most monitoring wells (which would be an interesting finding in itself).
L173-174. “Small minority” could be quantified (e.g., XX number of wells). Also, the current phrasing in this sentence seems to suggest that a non-linear model was used for some locations, but in that case no parameter “f” would be available for some models (L.185).
L462. The fact that the drought was overestimated may also result from the use of a linear model, where evaporation is not limited by soil water availability and may ultimately lead to a decrease in simulated GWL.
References
Berendrecht, W. L., Heemink, A. W., van Geer, F. C., and Gehrels, J. C.: A non-linear state space approach to model groundwater fluctuations, Advances in Water Resources, 29, 959–973, https://doi.org/10.1016/j.advwatres.2005.08.009,
Collenteur, R. A., Bakker, M., Caljé, R., Klop, S. A., and Schaars, F.: Pastas: Open Source Software for the Analysis of Groundwater Time Series, Groundwater, 57, 877-885, https://doi.org/10.1111/gwat.12925, 2019.
Collenteur, R., Bakker, M., Klammler, G., and Birk, S.: Estimating groundwater recharge from groundwater levels using non-linear transfer function noise models and comparison to lysimeter data, Hydrol. Earth Syst. Sci. Discuss. [preprint], https://doi.org/10.5194/hess-2020-392, in review, 2020.
Marchant, B. and Bloomfield, J.: Spatio-temporal modelling of the status of groundwater droughts, J. Hydrol., 564, 397–413, 2018
Peterson, T. J. and Western, A. W.: Nonlinear time-series modeling of unconfined groundwater head, Water Resources Research, 50, 8330–8355, https://doi.org/10.1002/2013WR014800, 2014.