Based upon the reviews, the article needs major revision. The authors are kindly invited to follow the reviewers' indications when preparing the revised manuscript or, where they disagree with a comment, to explain in detail the reasons why.
Suggestions for revision or reasons for rejection - Process-based modelling to evaluate simulated groundwater levels and frequencies in a Chalk catchment in Southwest England.
Brenner et al.
This is a revised version of a manuscript detailing the simulation of groundwater levels in a karst environment in the southwest of England. I described in my review of the original manuscript how this topic is highly relevant to the journal and timely, given the increasing need to simulate and forecast groundwater levels from limited datasets. The manuscript is suitably concise and the description is clear. I fully agree with the authors’ statement in their Abstract that “specialised modelling approaches are required that balance model complexity and data availability”. The authors assess whether they have achieved this balance both by exploring the identifiability of the parameters within their model (using the Shuffled Complex Evolution Metropolis algorithm; SCEM) and by comparing model performance metrics for calibration and validation datasets (i.e. a split-sample test). They conclude that their modelling exercise was a success because their analyses suggest that all of the parameters are identifiable and the differences between the calibration and validation metrics take values they consider to be small.
Whilst I fully endorse the authors’ general approach to assessing the performance of their model, I have severe concerns about the exact way in which it has been implemented. I do not believe that the posterior distributions of the parameters yielded by the SCEM accurately reflect the uncertainty of these parameters. Furthermore, I do not believe that the comparison between performance metrics for calibration and validation is particularly meaningful. For these reasons I do not recommend that the manuscript be accepted for publication.
I first detail my concerns about the analyses of parameter identifiability. Looking at Figure 5, it is apparent that, according to the SCEM, when the model is calibrated using only discharge data the Kc parameter (for example) is almost perfectly identifiable. This posterior distribution indicates that the parameter definitely has a value less than 1. However, when all of the calibration data are used, the parameter definitely has a value greater than 9. This is a clear contradiction, and at least one of these two posterior distributions must be incorrect. Similar contradictions are evident for all of the other parameters except those related to the groundwater level in a specific borehole.
Furthermore, the theoretical justification for the authors’ choice of the Kling-Gupta efficiency (KGE) as the objective function within the SCEM is rather weak. The formal theory of Markov chain Monte Carlo methods such as the SCEM requires that the objective function be a likelihood (i.e. the probability that the data are realised from the proposed model). The authors indicate that the KGE can be treated as an ‘informal’ likelihood function and refer to a paper by Smith, Beven and Tawn. That paper does discuss informal likelihood functions and describes sufficient conditions for them to satisfy the most fundamental axioms of a probability; as far as I can see, however, it does not explicitly mention the KGE. The starting point for satisfying the axioms of probability is that the informal likelihood function can be written as an Lp-norm, and it is not immediately clear to me that the KGE can be. I am therefore unclear about the relevance of the Smith et al. paper to the authors’ study, and I am not convinced that the KGE satisfies the fundamental axioms of a probability. I would have thought these axioms were a necessary requirement for a function to be treated as a likelihood.
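To make the structure of the score concrete: the KGE combines three components – the linear correlation r, the ratio of standard deviations α, and the ratio of means β – into a Euclidean distance from the ideal point (1, 1, 1). A minimal sketch (the function name `kge` is mine, not the authors'):

```python
import numpy as np

def kge(sim, obs):
    """Kling-Gupta efficiency:
    KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2),
    where r is the linear correlation between simulated and observed
    series, alpha = std(sim)/std(obs) and beta = mean(sim)/mean(obs).
    A perfect simulation gives KGE = 1."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)
```

As the sketch shows, the score aggregates correlation, variance and mean ratios rather than pointwise residuals, so rewriting it as an Lp-norm of model errors is, at the very least, non-trivial.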
The authors do not provide any calibration diagnostics which might indicate that the SCEM has converged to a stable posterior distribution.
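By way of illustration, one standard diagnostic that could be reported is the Gelman-Rubin potential scale reduction factor, computed across the SCEM's parallel chains for each parameter. A minimal sketch, assuming `chains` is an (m, n) array of m parallel chains of n samples for a single scalar parameter (the function name is mine):

```python
import numpy as np

def gelman_rubin(chains):
    """Gelman-Rubin potential scale reduction factor (R-hat) for one
    scalar parameter sampled by m parallel chains of length n.
    Values close to 1 are consistent with convergence; values well
    above 1 indicate the chains have not mixed."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)           # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # mean within-chain variance
    var_hat = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_hat / W)
```

Reporting such a statistic (or trace plots) for every parameter would at least demonstrate that the sampler had reached a stable posterior before the distributions in Figure 5 were drawn.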
The authors do not conduct any validation tests which might indicate that the posterior distributions reflect the uncertainty of the model parameters.
I similarly have a number of concerns about the authors’ split-sample tests. First, the authors conclude that decreases in model performance upon validation of 11% and 21% are sufficiently small to indicate robust model performance, and they refer to other studies where similar decreases were observed. However, the judgement that 21% is ‘sufficiently small’ is entirely subjective. The expected decrease will be a complex function of the number of observations and of the seasonality, variability and autocorrelation of the data; the comparison with other studies is therefore irrelevant. That said, if I were to compare these values with the results of modelling exercises I have previously undertaken, I would consider 21% a relatively large decrease.
In their ‘Responses to comments’ document the authors describe how the KGE objective function “was chosen by trial and error comparing the simulation performances during calibration and validation obtained different objective functions (RMSE and other)”. This ‘trial and error’ approach concerns me greatly. The validation data should not be involved in the model calibration in any way – this includes decisions about how the model is calibrated and which objective function is used. In my opinion, the use of the validation data in this manner invalidates the authors’ split-sample tests. Given infinite patience, the authors will almost inevitably find a calibration set-up that yields results they find pleasing; however, such a set-up is likely to be honed to the particular characteristics of the data they have used and is likely to perform less well as other data become available.

Suggestions for revision or reasons for rejection - The manuscript from Brenner et al. deals with the modelling of fractured and karstified aquifers in England. I found the topic well developed and of interest to an international audience. The contribution is well organised and contains interesting data and discussion. Some minor comments are detailed below:
- Introduction: the literature on this topic is wider than described in the text. Regarding potential future changes in groundwater dynamics (lines 14-15, p. 2), several additional examples can be found in the recent literature; please add references to recent papers.
- Results: what are the "hardly identifiable parameters" (line 22, p. 7)? Please list the parameters that are not well identifiable and discuss this limitation in the Discussion section.
- Discussion: on the same topic, line 38, p. 8: "all model parameters are identifiable" – this does not appear to be supported by the figure.
- Lines 26-27, p. 9: the sentence starting with "This is obvious" is not clear to me; please revise and explain it better.
- Conclusions: this chapter needs revision; at present it reads more like an abstract than a conclusion. Please identify the main findings and the main limitations of your study and, if possible, list them clearly and concisely as "take-home messages".

Minor revisions are required to your article. When preparing the revised version, please take into due account all comments and suggestions from the reviewers. In case you disagree with any of them, please indicate the reasons why. I look forward to receiving your revised paper.
Minor revisions are required before acceptance.
Most of the corrections concern the references. Please strictly follow the citation guidelines and, in particular, list the references in the text in chronological order.
Some additional references are also suggested in the attached file.
Comments to the Author: PDF 