This manuscript describes the creation and application of a new Python-based software tool, called “SITool”, for evaluating Arctic and Antarctic sea ice in global climate models. The authors use SITool to analyze models from the CMIP6 OMIP in terms of their sea ice concentration, thickness, snow depth and ice drift. The authors find that model biases exceed observational uncertainties and note improved model performance using the JRA-55 atmospheric forcing versus the CORE-II atmospheric forcing. No single model performs best in all metrics, as no link is found between performance in one variable and performance in another.
This manuscript is thorough and well-organized. The figures and tables support the discussion well and the analysis clearly demonstrates the utility of SITool. My main comment is that, while discussion of model ranking is distributed through the manuscript, there is no section devoted to synthesizing the cross-metric analysis and no figure documenting the rankings (mentioned in the Conclusions on Page 20, Line 468). I recommend adding a short paragraph to the Results section summarizing the findings and implications of the cross-metric analysis. While model ranking may not be the primary goal of the tool, it is mentioned often enough in the manuscript to warrant further discussion and context prior to the Conclusions. The authors may consider moving the text on Page 20, Lines 470-473 to the added paragraph and/or adding a table of the best and worst performing models for each metric to the main text or Appendix.
Overall I recommend minor revisions to address the cross-metric analysis and the specific comments below.
Page 1, Line 11: I recommend replacing the phrase “bi-polar” with “Arctic and Antarctic” throughout the manuscript for clarity.
Page 2, Line 49: I recommend expanding briefly on what is meant by “rather limited” to describe which sea ice diagnostics are provided in ESMValTool and which are unique to SITool.
Page 2, Line 52: Can you please clarify what is meant by SITool providing “qualitative” information? The tool seems to be used primarily for calculating model biases and related metrics, which I would consider primarily quantitative.
Page 3, Line 92: Can you please clarify here if the interpolation is a component of the SITool workflow or is a preprocessing step that needs to be completed before using SITool?
Page 4, Line 117: It would be helpful to have a brief sentence explaining why February and September were chosen (for example, why February instead of March).
Page 6, Line 184: Please list here the respective resolutions of CMCC-CM2-HR4 and CMCC-CM2-SR5 or provide a reference to Table 1.
Page 9, Line 259 (and page 14, line 359): Throughout the manuscript, I recommend using “finer” or “higher” spatial resolution rather than “increased”.
Page 10, Figure 2 (as well as Figures 5, 7, 10): It would be helpful to remind the reader in each of these figure captions that lower values indicate better skill.
Page 14, Line 355: “…the ice edge location simulations in the Arctic are much better than that in the Antarctic.” This is an interesting and logical point that you’ve quantified. Perhaps this has also been shown elsewhere? If so, reference(s) would be helpful.
Page 16, Line 420: Can you please clarify what is meant by “different observational references” in this sentence? Different from what?
Page 17, Figure 8 (and page 18, Figure 9): I recommend a new color map for these figures as the chosen color map may present challenges for readers with red-green color blindness.
Page 19, Line 446: On page 6, line 144 the authors write that two observational references are used for each variable, but here the phrase “at least two” is used. Can you please clarify if you mean that SITool is equipped to handle more than two sets of observational references?
Page 21, Line 488: “While it is running, SITool (v1.0) produces ancillary maps and time series that can be consulted by the expert to understand the origin of one particular metric value.” I believe this means that SITool automatically creates the kinds of maps provided in Appendix A, and if that’s true, please reference Appendix A here. It would also be useful to note in Section 2 that SITool automatically outputs the differences (which may be just as useful to some users) in addition to the scaled metrics.
Page 2, Line 60: I recommend rephrasing the grammar of the final sentence to something such as:
“The SITool is written in the open-source language Python and distributed under the Nucleus for European Modelling of the Ocean (NEMO) standard tools. SITool is provided with the reference code and documentation to make sure the final results are traceable and reproducible.”
We thank all three reviewers for their constructive comments on the earlier version of the manuscript. Our reply is attached here.
The manuscript "SITool (v1.0) – a new evaluation tool for large-scale sea ice simulations: application to CMIP6 OMIP" describes a new Python diagnostic tool to evaluate sea ice models in the Arctic and Antarctic over the historical period. Although it is designed primarily for atmospheric reanalysis-forced simulations, as presented in the manuscript, it could be useful in other model frameworks as well. This tool is complementary to other climate model evaluation tools, such as ESMValTool. Comparison with multiple observational datasets allows for evaluation of sea ice concentration, extent, edge location, ice thickness, snow depth, and ice drift. The evaluation of Ocean Model Intercomparison Project runs is used here as an example, but it also provides results on sea ice model performance.
This manuscript is well within the scope of the journal, as it introduces a new tool for evaluating climate model performance on a critical component: sea ice. Consistent, repeatable methods of evaluation like this are greatly needed by the community. It also provides novel results on the impact of the atmospheric forcings on the modeled sea ice (especially that model biases are significantly reduced by using JRA-55). The title and abstract capture the key points well. Methods are generally clearly described, and the code is well-documented and easily accessible. The paper is generally well structured and clearly written, but some figures could be improved for easier interpretation. It would be helpful to be clearer about how the presented results connect with the code and outputs in the published package. I believe that, with the minor suggested edits and a demonstration of code implementation (either by an additional reviewer or by the author, within the repository), this manuscript warrants publishing. Note: this reviewer did not complete a test of the scripts, and it may be useful for this code to be checked and tested by someone who is experienced in working with these output types.
Review of “SITool (v1.0) - a new evaluation tool for large-scale sea ice simulations: application to CMIP6 OMIP” by Xia Lin, François Massonnet, Thierry Fichefet, Martin Vancoppenolle (gmd-2021-99).
This paper introduces an evaluation tool for sea ice simulation and presents its application to CMIP6-OMIP simulations available through ESGF. I think that such a tool will become a valuable asset for the climate/sea ice modeling community and such activities should be strongly encouraged. The calculation methods of the metrics are well described and the evaluation using this tool is well presented. The comparison between OMIP-1 and OMIP-2 simulations, which use different surface atmospheric forcing datasets, is timely and should be highly appreciated. However, I think that some discussion is needed of the proposed method for the evaluation of interannual variability and trend, as commented below.
Metrics are proposed for the monthly mean state, interannual variability, and trend, with each metric using essentially the same calculation method: the difference between the simulation and an observational reference is scaled by the observational uncertainty, which is based on the difference between two observational datasets. For me, applying this method to the monthly mean state was understandable, but it was somewhat difficult to interpret the specific values of the metrics for interannual variability (standard deviation of monthly anomalies) and trend. If I were to evaluate the interannual variability of a simulation, I would like to know the size of the standard deviation of monthly anomalies relative to that of the observational reference. Specifically, I think that the metrics would be easier to interpret if the standard deviation were scaled by that of an observational reference and the range of values obtained by applying different observational references were presented. The same argument would apply to trends, and in this case the signs of the trends could also be evaluated. I would like to ask the authors to explain the background behind the choice of the current method.
I would like to add that it would be useful and clear if the calculation methods are presented using mathematical formulas.
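To make the two formulations discussed above concrete, the following is a minimal Python sketch. It is not SITool's implementation: the function names, the use of a mean absolute difference, the population standard deviation, and all sample values are assumptions made purely for illustration.

```python
from statistics import mean, pstdev


def scaled_error(model, obs_ref, obs_alt):
    """Model error relative to observational uncertainty (assumed form).

    Numerator: mean absolute difference between model and reference.
    Denominator: mean absolute difference between the two observational
    products, used here as a stand-in for observational uncertainty.
    """
    model_error = mean([abs(m - o) for m, o in zip(model, obs_ref)])
    obs_uncertainty = mean([abs(a - o) for a, o in zip(obs_alt, obs_ref)])
    return model_error / obs_uncertainty


def variability_ratio(model_anom, obs_anom):
    """Standard deviation of model anomalies relative to that of one reference."""
    return pstdev(model_anom) / pstdev(obs_anom)


def variability_ratio_range(model_anom, obs_anoms):
    """Range of the ratio obtained with several observational references."""
    ratios = [variability_ratio(model_anom, obs) for obs in obs_anoms]
    return min(ratios), max(ratios)


def least_squares_slope(series):
    """Ordinary least-squares slope against the time index 0, 1, ..., n-1."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = mean(series)
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def trend_ratio(model_series, obs_series):
    """Model trend relative to observed trend; a negative value means opposite signs."""
    return least_squares_slope(model_series) / least_squares_slope(obs_series)


# Hypothetical values, chosen only to exercise the functions.
model = [1.0, 2.0, 3.0]
obs_ref = [1.5, 2.5, 3.5]
obs_alt = [1.75, 2.75, 3.75]
model_anom = [-2.0, 0.0, 2.0]
obs_anom_a = [-1.0, 0.0, 1.0]
obs_anom_b = [-4.0, 0.0, 4.0]

print(scaled_error(model, obs_ref, obs_alt))
print(variability_ratio_range(model_anom, [obs_anom_a, obs_anom_b]))
print(trend_ratio([3.0, 2.0, 1.0], [4.0, 2.0, 0.0]))
```

In the first form, a value near or below 1 would indicate a model error comparable to the spread between observational products. In the ratio forms, a value of 1 would indicate agreement with the chosen reference, and the sign of the trend ratio carries the information on trend direction that the scaled error does not.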
L135, 150, 164: Why is equal weight used for these metrics?
L184: “the influence model resolution” should read “the influence of model resolution”.
L286: “exits” should read “exists”.
L288: “without reduction”… I could not understand the meaning of this phrase in the sentence.
Figure 3: It was difficult for me to distinguish the lines. I would suggest separating the figures for OMIP-1, OMIP-2, and their means, that is, into a total of six figures.