I find that, overall, this paper has been improved substantially since its initial submission and the authors are to be commended for the detail and care that they applied to each of my previous comments. The error analysis especially is much more rigorous and robust and produces quantifiable values that are of much greater use to the research community than what was present in the original draft. I am excited about taking my own personal infrared thermometer (which I normally use in my kitchen to tell me when my skillet will give my steak a good sear) and pointing it at the sky to retrieve the PWV. I find that there are still outstanding issues that should be addressed before this paper reaches final approval, though I feel it is on the path to that destination.
The present work examines two years of handheld infrared thermometer observations and uses PWV calculated from a weighted average of the two nearest radiosonde sites. Remotely sensed PWV observations are available from instruments that are situated more closely to the IR thermometer observing sites. The authors make use of these observations in a secondary role to illustrate the overall annual evolution of PWV over the domain. These results are displayed in Figure B1, a welcome addition to the discussion. However, this figure also raises a significant question: are the radiosonde observations (especially EPZ) biasing the analysis? It seems to me that the most representative observation for the IR thermometer is going to be from the AERONET station: it is not the closest observation point, but it is the most similar in altitude, it is substantially closer than the radiosonde sites, and its observations are much closer in time to the IR observations than the sondes are. Figure B1 indicates that the sondes tend to be more moist than the AERONET, with EPZ consistently more moist than the other observations. How is it known, therefore, that the spatially weighted average of the ABQ and EPZ sondes is the most representative observation and therefore the one around which this work should be built?
The authors do not make primary use of the remotely sensed observations for two reasons: representativeness (due to the altitude difference between SuomiNet and the IR thermometer) and missing data (due to a year-long gap in the AERONET and SuomiNet observations). The first issue is a valid one, but I feel the second point deserves more inspection. After all, the AERONET observations are still available for approximately half the observing period. It may be that two years of weighted-averaged radiosonde data that are hundreds of kilometers and approximately 6 h removed from the IR thermometer observations are better than one year of AERONET observations that are both spatially and temporally closer to the target. However, that point needs to be explicitly argued.
In the end, this work relies on observations that are frequently 6 h old (or 6 h early), at least 100 km away, and much more moist than more local observations. Using the radiosonde observations may be the appropriate course of action, but it needs to be demonstrated that this set of decisions is the correct one.
Finally, as I read through this work again, I am left with one very fundamental question: how good is it? An analysis that shows the relationship of the IR PWV product to some kind of truth (be it the merged sondes, AERONET, etc.) appears to be missing. The figures shown in the present work, such as the relationship between the sky temperature and PWV, are important, but the relationship between the new product and the truth is critical. This could take the form of a scatterplot, a histogram of the differences, a box-and-whisker plot of the differences in various PWV bins, etc., but something should be in there. Crucially, I do not have a sense of how well the product performs as a function of PWV magnitude.
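To be concrete, even a short difference analysis against whatever is adopted as truth would go a long way. A minimal sketch of what I mean follows; the arrays `pwv_ir` and `pwv_truth` are hypothetical matched samples invented for illustration, not the authors' data:

```python
import numpy as np

# Hypothetical matched samples of the IR product and a reference (cm).
pwv_ir = np.array([0.5, 1.1, 2.3, 3.0, 1.8, 0.9])
pwv_truth = np.array([0.6, 1.0, 2.1, 3.4, 1.7, 1.1])

# Bulk skill metrics.
diff = pwv_ir - pwv_truth
bias = diff.mean()
rmse = np.sqrt((diff ** 2).mean())
print(f"bias = {bias:.2f} cm, RMSE = {rmse:.2f} cm")

# Binned differences show how skill varies with PWV magnitude.
bins = np.digitize(pwv_truth, [1.0, 2.0, 3.0])
for b in np.unique(bins):
    sel = bins == b
    print(f"bin {b}: n={sel.sum()}, mean diff = {diff[sel].mean():.2f} cm")
```

Even this level of summary, applied to the real matched dataset, would let readers judge whether the product is usable at the dry and moist ends of the record.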
98. For the observations taken at 2300 UTC, are they matched to temporally averaged radiosonde observations or are they just matched to the nearest sonde time?
112. It is important to emphasize that the determination of clear or cloudy skies is a subjective observation by a human observer.
115. The lack of brightness temperature observations below a given temperature threshold (resulting in NaN values) means that low PWV values cannot be observed with this method. What is the minimum PWV value that can be observed, and what is the seasonal distribution of missing data? This seems like an important issue that end users ought to be aware of. I assume that this is a more frequent occurrence in the high deserts of New Mexico than it is in the environment observed by Mims, and that wintertime values are more likely to be missing, but these points should be made explicit in the text.
Figure 1: This is an extremely minor point, and you can address or ignore as you see fit, but I find figures easier to interpret when grid lines are present.
142. When you say ground temperature, do you specifically mean skin temperature as measured by the IR thermometer?
165. If the SuomiNet and AERONET observations are going to be part of this analysis (even if only in the appendix), their locations should be noted on Fig. 2.
165. Sometimes the text refers to "Figure N" and other times to "Fig. N." This may be a stylistic choice, as the word appears to be spelled out at the start of a sentence but abbreviated elsewhere; please confirm that this matches the journal style.
174-175. I am not seeing where your product appears in Appendix B (unless you only mean the merged sondes). This goes back to the point I made in the major comments above about not really getting a sense of the skill or utility of the product.
203. This seems like a counterintuitive way to approach the exceedance thresholding, as though the most important thing was to preserve 90% of the dataset instead of crafting a representative dataset. If the data are unrepresentative, they should not be used regardless of how many event dates must be removed. At a minimum, it is important to know how many standard deviations that 55% difference represents. (Also note: in the response to the reviewers, the authors stated this was a 75% threshold, so they should verify which value is the correct one.) It is easier to scientifically justify a standard-deviation-based filter than a filter designed to preserve a certain fraction of the total dataset, even if in the end you tune one filter to match the other.
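For reference, a standard-deviation-based filter of the kind I have in mind is only a few lines; the arrays `pwv_ir` and `pwv_sonde` below are hypothetical matched pairs used purely for illustration, and the choice of k = 2 is an assumption:

```python
import numpy as np

# Hypothetical matched pairs of IR-retrieved and sonde PWV (cm).
pwv_ir = np.array([1.2, 0.8, 2.5, 1.9, 0.4, 3.1, 1.1, 2.2])
pwv_sonde = np.array([1.0, 0.9, 2.4, 2.0, 0.5, 1.4, 1.2, 2.1])

# Relative difference, filtered at k standard deviations rather than
# at a threshold tuned to retain a fixed fraction of the dataset.
rel_diff = (pwv_ir - pwv_sonde) / pwv_sonde
k = 2.0
mask = np.abs(rel_diff - rel_diff.mean()) <= k * rel_diff.std()

print(f"kept {mask.sum()} of {mask.size} pairs")
```

The point is that k has a statistical interpretation that a "keep 90% of the data" criterion does not, even if in practice the two filters end up retaining similar samples.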
226. This is close to what I was suggesting when I suggested a Monte Carlo simulation. My thinking was that you could take an IR-observed temperature, randomly perturb it by some value drawn from a Gaussian, and plug that into your tool to obtain a new PWV. Do that a few thousand times, and you will have an estimate of how the instrument uncertainties contribute to uncertainties in PWV. This does not take into account the uncertainties in the model, however, which your approach appears to do.
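For clarity, the scheme I am describing looks roughly like the following. The retrieval function `pwv_from_temp` is a hypothetical stand-in for the authors' fitted sky-temperature-to-PWV relationship, and the 1 K instrument uncertainty is an assumed value:

```python
import numpy as np

rng = np.random.default_rng(0)

def pwv_from_temp(t_sky_c):
    # Hypothetical placeholder for the fitted sky-temperature -> PWV
    # retrieval; a simple exponential stands in for illustration only.
    return 0.1 * np.exp(0.05 * (t_sky_c + 40.0))

t_obs = -20.0       # one IR-observed sky temperature (deg C)
sigma_inst = 1.0    # assumed 1-sigma instrument uncertainty (K)
n_draws = 5000

# Perturb the single observation with Gaussian noise and push every
# draw through the retrieval to build up a PWV distribution.
t_perturbed = rng.normal(t_obs, sigma_inst, n_draws)
pwv_draws = pwv_from_temp(t_perturbed)

print(f"PWV = {pwv_draws.mean():.3f} cm +/- {pwv_draws.std():.3f} cm")
```

The spread of `pwv_draws` then characterizes the instrument-driven uncertainty at that observation, and repeating the exercise across the observed temperature range would show how that uncertainty varies with PWV.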
230. Does this RMSE vary with the magnitude of the signal? Looking at Figure B1, an RMSE of 0.35 cm is very close to the observed value for the winter months. Do you expect that the error bars are very similar throughout the year, or do the larger PWV values in the summer have greater uncertainties associated with them?
285. This seems to imply that it may be possible to derive the appropriate relationships between PWV and the IR temp without needing to take two years of manual observations to generate a testing dataset. Is this true? Or, rather, are there ways to arrive at the needed coefficients using existing data? (Earlier, when I said that I wanted to point my IR thermometer at the sky to get PWV, I meant it.) In all seriousness, you have done a good job demonstrating that the system needs to be trained to specific locations due to the large climatological variability in water vapor content. But are there ways to achieve acceptable results using a priori data? I think this is an important point for the issues raised in the conclusions, as substantial datasets will need to be collected by citizen scientists and school groups just to train the relationships. If an initial model can be implemented immediately from prior observations, NWP, etc., the adoption of such a program will likely increase.
Figure B1. I keep coming back to this figure throughout reading and reviewing this paper. At times I wonder if this figure is important enough that it deserves promotion to the main body of the paper.