The paper documents the development of a 1 km x 1 km gridded dataset of precipitation and maximum and minimum temperature. The authors apply the methods of Serrano-Notivoli (2017, 2019), and while there are no methodological innovations, the work itself is not trivial. The dataset is, in my opinion, an important contribution to regional climatology and is potentially useful for subsequent applications, from trend assessment to hydrological modeling, and as such it is worth publishing in ESSD.
I have, however, some concerns about the way the paper presents the application of the Serrano-Notivoli (2017, 2019) framework and the related uncertainties. In general, the authors focus more on presenting the derived dataset than on the derivation itself, which I believe is the opposite of what ESSD requires. Details are given below. I therefore suggest reconsidering the paper after major revision.
Comments:
1. Methods l. 101 and further - the authors use 10 neighboring stations to estimate the central value. It is not clear whether this also applies to the boundary stations. In addition, as the data series are not available for the whole 1950-2018 period, it seems that the set of 10 nearest stations changes over time, implying that the GLM or GLMM models should also change. This should be stated explicitly since it affects the uncertainty of the estimated values. It should be further discussed in section 5.
It is also not clear what the basis for selecting 10 stations was and how this choice affects the results.
l. 105 the authors speak here about wet probability, but at this point in the paper it is not at all clear how they obtain it.
l. 107 what do you mean by "internal coherence"?
l. 110 standard deviation of what?
l. 113 all suspects were removed?
l. 115-116 "the RVs ... were then calculated ..." - from the text above it seems that the RVs must already have been calculated beforehand in order to identify suspects. If this is the case, I believe it would be better to begin with the GLM(M) model, which would also clarify the QC procedure.
It is also not clearly presented what was done to obtain station time series before the start of the measurements and after their end. Does it mean that the model for a location changed when a nearby station appeared? This should be clearly described and also discussed later in sect. 5, since it affects the uncertainty of the estimates, which in principle would differ across years.
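To illustrate the concern about the changing neighbor set, here is a minimal sketch (not the authors' code). The function name `nearest_available`, the station coordinates and the availability flags are all hypothetical, and plain Euclidean distance in projected coordinates is assumed for brevity. It only shows that the stations feeding the model for a given location can differ from day to day once a nearby series starts or ends.

```python
import math

def nearest_available(target, coords, available, k=10):
    """Indices of the k stations closest to `target` that report on a given day.

    coords    : list of (x, y) positions in km (hypothetical projected coordinates)
    available : list of booleans, True if the station has an observation that day
    """
    candidates = [
        (math.dist(coords[target], coords[i]), i)
        for i in range(len(coords))
        if i != target and available[i]
    ]
    candidates.sort()  # nearest first; ties broken by station index
    return [i for _, i in candidates[:k]]

# Hypothetical network of five stations on a line, 1 km apart
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
day_a = [True, True, True, True, True]    # all stations report
day_b = [True, False, True, True, True]   # station 1 has stopped reporting

neighbours_a = nearest_available(0, coords, day_a, k=2)
neighbours_b = nearest_available(0, coords, day_b, k=2)
print(neighbours_a, neighbours_b)
```

If the neighbor sets differ between days, as they do here, the fitted model and its uncertainty are not directly comparable across the record, which is why an explicit statement in the manuscript would help.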
2. Model description
The key part of the procedure is the GLM and GLMM models; however, we do not learn much about the choice of the explanatory variables, nor are any model assumptions discussed. I believe it is very important to reveal how the model was set up, how the variable selection was done and how the uncertainty was estimated.
l. 117-118 the authors state that precipitation and temperature were used as dependent variables and lat, lon, alt and distance from the coast as independent variables - is it really like this? Please give a precise description of what you have done.
3. Results - there should be less information on Slovenian climate and more information on model selection, uncertainty assessment etc.
4. Discussion - limitations and uncertainties of the dataset should be discussed in detail.
Minor comments:
l. 58 Another source of discrepancies is that the gridded data represent a spatial average rather than a point value, leading to smoothing of extreme values.
l. 140 Please consider adding information on the overall number of missing values.
Fig. 3 - it is not entirely clear what is on the y-axis. Is it the total number of removed days for a year and all stations?
l. 143 "The majority of the data was removed ..." vs l. 145 "... only 1.26% ... were removed" - please revise the sentence to be clear.
l. 163-164 would it not be better to set up some wet-day threshold?
l. 169 - 173 I was not able to understand what is described here and what precisely is represented by Fig. 4.
Fig. 7 perhaps it would be better to use lines instead of dots.
The database also seems very useful for dendroecology. Did the authors consider providing a tool or web application for downloading climatic data for selected grid points only (perhaps for specific time periods) in a more widely used format?
In this paper the authors present a high-resolution (1 x 1 km) dataset of daily precipitation and temperature for Slovenia for the 1950-2018 period. This dataset is of high interest, as it fills a gap in the climatology of Slovenia, a region with complex climate and terrain, as mentioned in the manuscript.
The authors used a methodology previously designed by two of them, which has already been tested in Spain and other regions. This methodology is based on estimating daily data of the variable of interest by using GLMs/GLMMs, daily data from the 10 closest weather stations and geographic information, such as latitude, longitude, altitude and distance from the coast. The main advantage of this methodology is that all available data can be used, even if weather stations cover short periods.
The paper is well written and the use of climate terminology is adequate.
I also consider that the paper is well organized into sections, and the authors provided different verification tests of the variables, both in time and space.
I would like to make some general comments on this paper.
- While the authors used a method based on the spatial structure of climate data, other methods rely on the temporal structure of climate data to estimate missing data.
My first comment is related to this point: I would like to know the authors' opinion on the strengths and drawbacks of the methodology they used compared with methods based on the temporal structure of the climate data.
- Also regarding the methodology: if I have understood correctly, the authors first estimated the missing data at the weather stations and, in another iteration of the algorithm, estimated climate data at each 1 x 1 km grid cell. If this is correct, the authors used estimated data as if they were observed data. Is this correct? If so, what is the possible impact of this on the obtained results in the opinion of the authors?
- Slovenia is a complex territory, with high-elevation areas. Unfortunately, only a small number of stations are available above 1000 m. While some of the validation tests are shown in terms of elevation, some discussion of the possible impact of this on the obtained dataset is missing.
- The station network changes throughout the period, with an initial increase in the number of stations followed by a steady decrease after the 1980s. This continuous change in the number of true observations can have a great impact on the quality of the obtained results. While validation of the results is presented by altitude and by month, why did the authors not provide a validation by decade (for example)? I think it would be really interesting to assess the impact of changes in the station network on the obtained results.
- One of the key points of the methodology used is the estimation of the uncertainty of each estimated value. The authors should provide more information on how the uncertainty was calculated, and some kind of validation of the estimated uncertainty could also be provided. Maybe a comparison between the uncertainty of the estimated data and the MAE? A temporal perspective on the estimated uncertainty would also be of high interest for evaluating the impact of changes in the station network on the uncertainty of the obtained data.
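The decade-wise validation and the comparison between stated uncertainty and MAE suggested in the last two points could be as simple as the following sketch. It is only an illustration under assumed inputs: the function name `mae_by_decade` and the record layout `(year, observed, estimated, stated_uncertainty)` are hypothetical and do not come from the manuscript, and the numbers are made up.

```python
from collections import defaultdict

def mae_by_decade(records):
    """Mean absolute error and mean stated uncertainty per decade.

    records : iterable of (year, observed, estimated, stated_uncertainty)
    returns : {decade_start: (mae, mean_uncertainty)}
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # |error| sum, uncertainty sum, count
    for year, obs, est, unc in records:
        acc = sums[year // 10 * 10]            # e.g. 1958 -> 1950
        acc[0] += abs(obs - est)
        acc[1] += unc
        acc[2] += 1
    return {d: (a[0] / a[2], a[1] / a[2]) for d, a in sorted(sums.items())}

# Made-up cross-validation records, for illustration only
records = [
    (1951, 10.0, 12.0, 1.0),
    (1958, 5.0, 4.0, 3.0),
    (1985, 0.0, 0.5, 0.5),
]
summary = mae_by_decade(records)
for decade, (mae, mean_unc) in summary.items():
    print(decade, mae, mean_unc)
```

A decade in which the MAE is clearly larger than the mean stated uncertainty would indicate that the uncertainty is underestimated for that part of the record.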
I also have some specific comments.
Lines 81-82. “Although some data series begin before 1950, we decided to limit the research to the years 1950 to 2018, when the station network remained stable over time and space”. I don’t fully understand this sentence. When the authors say that the station network remained stable over time and space, what exactly are they referring to? Figure 2 clearly shows that the number of stations decreased steadily from a maximum slightly above 100 to only 20 in 2018.
- In Figure 9 and Figure 11 the authors provide the estimated uncertainty for some derived indices of temperature and rainfall. While the methodology section mentions how the uncertainty of each estimated temperature and rainfall value was obtained, how did the authors derive the uncertainty of the indices?
I would like to make some comments/suggestions regarding some of the thresholds used in temperature quality control.
- “For temperature we used also five criteria: (1) internal coherence;” I assume the authors mean the coherence between Tmax and Tmin when they write “internal coherence”. This should be clarified in the manuscript. Just out of curiosity, is this coherence test based on Tmax > Tmin or Tmax ≥ Tmin?
- The following two comments are more suggestions for subsequent work or an update of the database than requests to modify this manuscript.
I think the thresholds used to classify a daily temperature value as “out of range” could be improved, especially those used for minimum temperature.
In the introduction, the authors say: “Moreover, temperature ranges from -35 to +40 °C (Bertalanič et al., 2006) show the very extreme character of the seasons”. Then, in the explanation of the quality control, they say: “(3) removal of those days out of range considering maximum temperature (TMAX) ≥ 50 °C or TMAX ≤ –30 °C and minimum temperature (TMIN ≥ 40 °C or TMIN ≤ –35 °C)”
If temperature ranges from -35 to +40 °C, treating Tmin = -35 °C as an out-of-range value could imply deleting real extreme values. On the other hand, I consider the Tmin ≥ 40 °C threshold too high.
- “;(4) removal of all days in a month with a standard deviation equal to zero (suspect repeated values in the series);” With this criterion, the authors only detect repeated values when the whole month has the same value. This criterion is as valid as others, but I think it could easily be made more restrictive. Many different thresholds are used in the literature, but when temperature data are provided with decimal precision, 10 consecutive days showing the same temperature are very unlikely. I would suggest that the authors use 7-10 consecutive days as a new threshold.
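A run-length check of the kind suggested above could look like the following sketch. It is an assumption-laden illustration, not the authors' QC: the function name `flag_repeated_runs`, the `min_run` parameter and the convention that `None` marks a missing day are all hypothetical.

```python
def flag_repeated_runs(values, min_run=7):
    """Flag every value that belongs to a run of at least `min_run`
    identical consecutive values. Runs of missing days (None) are not flagged."""
    flags = [False] * len(values)
    start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the series or where the value changes
        if i == len(values) or values[i] != values[start]:
            if values[start] is not None and i - start >= min_run:
                for j in range(start, i):
                    flags[j] = True
            start = i
    return flags

# Made-up daily temperatures: 3 equal days, then 7 equal days, then one more value
series = [1.2] * 3 + [4.5] * 7 + [3.0]
flags = flag_repeated_runs(series, min_run=7)
print(sum(flags), "days flagged")
```

Unlike the monthly zero-standard-deviation test, this flags a suspicious run regardless of where it falls relative to month boundaries.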
Table 5.- December is missing.
Figure 1.- As the authors mention the complexity of the terrain in Slovenia, they could show a digital elevation model in this figure to help readers better understand this complexity. The length of the data record of each meteorological station could also be shown in this figure.