The authors present a novel method for emulating ocean-induced sub-ice-shelf melt rates based on image segmentation and an autoencoder network, which takes the nearby ocean state and ice shelf cavity geometry as inputs. The methodology shows promising results in emulating the melt rate produced by a full ocean model (NEMO) and appears to provide superior predictions compared to the "medium-range complexity" PICO and PLUME parameterizations. A separate, idealized analysis indicates that this conclusion is appropriate, since the ML-based methodology reproduces what we would expect from theory despite being quite different (e.g. in geometry and thermohaline forcing) from the training and validation data. Additionally, in the final discussion, the authors consider potential avenues by which this approach could serve as a baseline for developing more advanced sub-ice-shelf melt rate parameterizations.
The paper provides a valuable contribution to the field of ice-ocean modelling, as it establishes that a neural network architecture such as MELTNET can be used to provide computationally efficient, yet accurate, representations of basal melting to an ice sheet model - a critical external forcing mechanism. Moreover, it will be interesting and valuable to see how this framework can be used to aid our development of meltrate parameterizations in future work. The manuscript is methodologically sound, and it is a pleasure to read. I would like to compliment the authors on making figures that convey the main ideas of the methods (which involve many details) concisely and effectively. I have just a few relatively minor comments and suggestions that I think would help improve the paper.
Specific Comments
Making predictions with an ML surrogate is much cheaper than running an ocean model, but the training is not free and can be quite computationally demanding. The authors allude to this in Line 64:
"Since the computational cost of a machine learning algorithm is insignificant once it has been trained,"
However, I think this should also be mentioned in the abstract and earlier in the introduction, e.g. around Line 39, since model training can be a major computational expense in ML for high-dimensional problems such as those in the geosciences. Moreover, I think the paper would be strengthened by providing some estimate of the computational cost of training and making predictions with MELTNET. This could be as simple as a table with training and validation walltime for each network, along with the hardware used (e.g. was this on a laptop or run in the cloud? how many nodes/cores/threads were used?). Providing these details would help quantify the statement that predictions are almost free, and would help establish to the community that, generally speaking, ML-based emulators are worth pursuing.
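The kind of bookkeeping suggested above is cheap to add. As a minimal sketch (using hypothetical stand-in workloads, not the authors' actual training loop or forward pass), one could record median walltime for training versus prediction and report the ratio:

```python
import time

def timeit(fn, *args, repeats=10):
    """Return the median walltime of fn(*args) over several repeats."""
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - t0)
    times.sort()
    return times[len(times) // 2]

# Hypothetical stand-ins for the real workloads:
def train_network():       # e.g. the full MELTNET training loop
    sum(i * i for i in range(200_000))

def predict_melt_rate():   # e.g. a single MELTNET forward pass
    sum(i * i for i in range(2_000))

t_train = timeit(train_network)
t_pred = timeit(predict_melt_rate)
print(f"train: {t_train:.4f} s, predict: {t_pred:.6f} s, "
      f"ratio: {t_train / t_pred:.0f}x")
```

A two-column table of such numbers per network, plus a one-line hardware description, would fully support the "predictions are almost free" claim.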
Lines 101-105: I have a philosophical disagreement with using the temperature and salinity conditions at the icefront instead of using the open boundary conditions to NEMO. MELTNET is an emulator for NEMO (as far as I understand), and therefore it should not use anything that NEMO produces as an input. Rather, it should be given the same boundary conditions and then bypass NEMO altogether. It is fortunate, and useful to note, that using either of these conditions provides essentially equivalent results, since this presents an exciting opportunity in cases where ample icefront T/S data are available and could be used as inputs to MELTNET. However, in this paper MELTNET is being presented as a NEMO emulator, so I recommend keeping this note, but presenting results based on using the same forcing for NEMO and for MELTNET.
Lines 262-270 and 309-314: The comparison to PICO and PLUME in this paper is entirely appropriate. However, in some sense the comparison is unfair since these models are calibrated by tuning 2 global parameters while MELTNET has many degrees of freedom which are optimized during training. One could argue that to make the comparison as fair as possible PICO and PLUME should have spatially varying parameters, which should be calibrated. PICO and PLUME are not used in this way, so I don't think this should be implemented, but it raises a couple of points that I think are worth discussing.
Some details on MELTNET should be included, such as: the degrees of freedom (i.e. number of nodes) in each layer, the number of layers in each stage of the model, and the cost function that is optimized during training (is it simply a norm of the model/data misfit? is regularization used to penalize large weights?).
The PICO and PLUME parameters are not really optimized, but "hand tuned", so I would suggest changing that wording, especially since MELTNET *is* optimized (trained). Additionally, I think the difference in degrees of freedom is worth mentioning. In a sense, one could argue that the neural network is a way of capturing the additional degrees of freedom that we would want to have in the PICO or PLUME models (for instance, spatially varying parameter fields) but that we don't know how to specify. All in all, I think some discussion of, or hypotheses for, potential reasons why MELTNET outperforms these models would improve the paper.
Minor/Technical Comments and Suggestions
lines 35 and 37: Please fix citations: "e.g." comes after the citation but should come before
line 59: "lower complexity parameterizations", I suggest making the minor clarification that these are ice sheet model parameterizations
Fig 1: I suggest adding a note in the figure caption mentioning that the GAN step is merely a method to generate many realistic T/S profiles for training, but is not necessary for making predictions once MELTNET is trained, with a reference to section 2.3.2 (and possibly Appendix A). The training and prediction stages are clearly delineated in the figure, and your figure caption is well written, but I think adding a note like this will help a reader who is skimming through the figures as quickly as possible (which, of course, will be many people ...).
line 120: I recommend being a bit more specific than "these filters are learned", for instance something like "the weights that make up these filters are learned"
line 148: I recommend referencing Fig B2 before making the parenthetical note comparing swish and ReLU, since I went to Fig B2 looking for a comparison between the two, rather than a description of the normalisation and layers
lines 159-161: For the ocean modellers, could you provide citations for the specific subgrid-scale parameterizations used? E.g. it sounds like the vertical mixing scheme is from (Gaspar et al., 1990, https://doi.org/10.1029/JC095iC09p16179). What is the scheme for generating lateral viscosity coefficients (Smagorinsky, Leith, etc.)? What horizontal and vertical diffusivities are used? I think these details will be nice without being overwhelming.
Line 169: what is the vertical spacing for each of the 45 vertical levels?
Line 208: wouldn't the constraint on ice shelf area be a maximum, rather than a minimum?
Fig 3: Have these ice shelf images been rotated to all be in the same orientation, since you mention that you provide ice shelves in all cardinal directions to MELTNET? If so, you may want to mention that some of these are rotated (e.g. "north isn't always up") in the caption.
Line 242: Please add the year to the citation Boyer et al
Line 250: Do you mean to say "all models *are* judged"?
Line 260: Misspelled "parameterisations"
Line 264: "tunetable" -> "tunable"
Fig 4 caption: Please add the units to the contour intervals in the statement: "contoured from 200 to 800 m at 100 *m* intervals"
Line 291: I would recommend putting this part of the paper (discussing the idealized geometry experiments) in its own subsection or even section. This would help the reader since you are testing a new hypothesis.
Line 351: Another small wording suggestion: "NN emulators *that are* constrained, ..."
This paper proposes a novel method to substitute a physical model that predicts ice-shelf melt rates from geometry, temperature and salinity fields with a deep-learning emulator. The strategy is to use a state-of-the-art ocean model (NEMO) to generate a large variety of input/output data pairs, and to train an Artificial Neural Network (ANN) (two ANNs here). Using ANNs instead of NEMO saves enormous computational time with a moderate loss in accuracy. I have no doubt that the method proposed by the authors will be of high interest to the community, considering the current need for physically complete and computationally efficient ice sheet models -- especially to design the new generation of models for glacier evolution and sea-level rise prediction. Deep-learning surrogate models have already proved their worth in other disciplines (several orders of magnitude speed-ups compared to their instructor models are often reported in the literature). This approach has recently been used in glaciology to emulate ice flow dynamics (Brinkerhoff et al., 2021, JOG; Jouvet et al., 2021, JOG). The application proposed by the authors therefore sounds relevant. I found the paper overall well written and convincing. I have mainly one major comment about the machine learning approach (my first one below), which does not question the overall relevance of the paper. I give below specific and minor comments that I hope will help the authors improve their manuscript.
Main Comments:
As you design an ANN mapping 2D to 2D fields with continuous variables, the most logical and intuitive approach to me would be to use a standard Convolutional Neural Network (CNN) trained as a regression problem with an L1 or L2 loss (e.g. similarly to the CNN I use to learn ice dynamics). You may also have considered a U-Net architecture to better capture underlying multiscales, if any. Therefore, my main point is: \textbf{why do you split this into two networks?} -- a first segmentation/classification network and an auto-encoder (AE) -- I just do not see what this brings except unnecessary complications (and probable loss of information!). Unfortunately, I could not find any line of justification for this choice, namely transforming the problem into a classification one and then recovering the lost information (or 'corrupted', as you term it) with an AE. \textbf{To me, the final paper should either i) try to simplify the approach using a single, simple regression network if this proves to be as efficient, OR ii) clearly justify the choice of going to a more complex network sequence and explain why the simpler approach was unsatisfactory}. In case of i), consider revising the paper title and removing references to segmentation.
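To make the suggested baseline concrete, the regression alternative can be illustrated in miniature. The following sketch (my own toy illustration, not the authors' setup) trains a single convolutional filter by gradient descent on an L2 loss, mapping a synthetic 2D input field to a 2D target field; a real baseline would stack many such filters with nonlinearities:

```python
import numpy as np

def conv2d(x, w):
    """'Valid' 2D cross-correlation of field x with kernel w."""
    kh, kw = w.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

rng = np.random.default_rng(0)
w_true = rng.normal(size=(3, 3))   # the "physics" we try to learn
x = rng.normal(size=(16, 16))      # input field (think: draft / T / S map)
y = conv2d(x, w_true)              # target field (think: melt rate)

w = np.zeros((3, 3))               # learnable filter weights
lr = 1e-3
for _ in range(500):
    err = conv2d(x, w) - y         # residual; L2 loss is 0.5 * sum(err**2)
    grad = np.zeros_like(w)
    for a in range(3):             # gradient of the L2 loss w.r.t. each weight
        for b in range(3):
            grad[a, b] = np.sum(err * x[a:a + err.shape[0], b:b + err.shape[1]])
    w -= lr * grad

print(np.abs(w - w_true).max())    # residual shrinks toward zero: filter recovered
```

If a plain regression network of this kind (suitably deepened) matches the two-stage segmentation+AE pipeline in accuracy, the simpler architecture is clearly preferable.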
The most convincing result to me is the fidelity of the ANN to its instructor, NEMO, and I think it would be good to give clear numbers for this and report them in the abstract and conclusion. You could choose a metric and state how far (in %) the MELTNET solutions are from NEMO. By contrast, I am unsure that the comparisons with other, simpler models should be so elaborate. E.g. Fig. 4 is useful as it shows that the loss in accuracy of MELTNET relative to NEMO is small/negligible compared to the discrepancy between low and high complexity models (PICO vs PLUME). I think that is enough, as I expect the paper to focus mostly on the accuracy of MELTNET in reproducing its instructor model -- the in-between model comparisons being a substantial task to carry out fairly (I don't have the expertise to assess this). From Fig. 4 I take away that comparing MELTNET with other models is roughly the same as comparing NEMO with them, as the two are (hopefully) very close to each other (since the ANN does a very good job). This also means that the rest is a pure comparison of models no longer involving deep learning, and this may go beyond the scope of the paper. In conclusion, I would probably keep the comparison with PICO & PLUME rather concise, and favor MELTNET/NEMO comparisons.

The main point of using deep-learning emulators is the huge computational gain versus a minor loss in accuracy. While you have quantified the accuracy (Fig. 4), it is a pity that you do not do so for computational time. What speed-up do you obtain? I expect several orders of magnitude. Quantifying the computational time is essential for your paper. You may also comment on the fact that ANNs run extremely well on GPUs, giving another important advantage of your method (compared, e.g., to NEMO, which may not benefit from GPUs to the same extent).
I think the paper can be made more efficient by moving the technical machinery in Section 2.3.1 to an appendix. The generation of synthetic geometries is necessary, but of lower interest. Moreover, using a GAN is an elegant strategy, but it is probably nonessential.
I think Section 2.2 should come first for the sake of clarity. It sounds more logical to first describe the physical model, and then the ANN you design to learn from it, since the choice of ANN architecture is motivated by the type of physics being emulated.
Why not use real Antarctic and Greenland topographies to generate the ice shelf geometries? This would avoid the need to generate synthetic geometries.
Minor comments:
For clarity, I think you should call MELTNET something like NEMO-trained MELTNET, or at least include NEMO in the name, since you may train MELTNET with other models.
l76: suggest "the inputs and the resulting NEMO melt rates ..."
l79: fix the typo "paramterisations"
"These filters are learnt ..." not sure this is understandable for who is unfamiliar with ML vocabulary.
l127-128: why not crop the area of interest instead of weighting? In any case, you can normally feed your ANN with different frame dimensions.
l129: You could state clearly that you perform data augmentation.
l169: 2000 m
l273: Figure 2 presents THE main result of the study, ...
Is the AE a U-Net architecture? If yes, you should say so.
l352: you may call it a PINN (Physics-Informed Neural Network).