I already commented on the original submission because of my interest in this type of work. At that time we were merging two ET products with a somewhat similar approach, hence my interest in learning from this work and providing some feedback. The first results from our work are now under discussion at https://www.hydrol-earth-syst-sci-discuss.net/hess-2017-573/, in case the authors are interested.
I glanced quickly through the new manuscript and the authors' reply to the reviewer comments. The paper is in better shape now and possibly nearly ready for publication. Still, I feel that a few things could be better highlighted in the final manuscript.
(1) This is just an opinion, but I see this more as an initial exercise in merging flux estimates than as an offering of a merged product that the community may feel ready to adopt. The monthly time scale, the period chosen, and the products selected for the merging somewhat constrain the utility of the product. In particular, leaving aside broadly adopted products (e.g., PT-JPL), combining obsolete and newer versions of a product (e.g., GLEAM), and focusing only on the MODIS era (i.e., 2000 onwards) do not help here, even if the choices have been well explained.
(2) The authors limit the time period and exclude products so that no time discontinuities exist. However, spatial discontinuities do not seem to be treated as a problem, as the set of merged products differs between geographical areas. So, for instance, if I take a spatial average over an area containing regions with different merged products, is that a valid estimate? My point is not that spatial discontinuities should also be removed, but that no merging is likely to result in a perfect product. Perhaps more flexibility in the time compositing, allowing a product that merges more data sets and has more extended time coverage, would be of interest.
(3) All bias corrections for ET products that I have seen so far involved some type of climatological component, so I am still a bit puzzled by the annual-value bias correction applied here. I may be missing something, but if you compare the fluxes from all stations with the corresponding product X fluxes, derive a single bias number, and then bias-correct the product X fluxes, it is quite likely that at a large number of locations the resulting fluxes will be unphysical. For instance, at locations with strong seasonal cycles and very low winter values, depending on the sign of the bias correction you may get negative winter fluxes or unrealistically high fluxes. Those fluxes are later merged, so perhaps the weights take care of this and produce realistic merged fluxes. Perhaps the authors want to comment on this.
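To make the concern in point (3) concrete, here is a minimal sketch. It assumes the correction is a single additive offset applied to every month, which may not be exactly what the authors do; the site series, the units, and the bias value are all invented for illustration and are not taken from the manuscript.

```python
import math

def additive_bias_correct(series, bias):
    """Subtract one constant bias from every time step (assumed scheme)."""
    return [value - bias for value in series]

# Hypothetical monthly ET (mm/day), January to December, at a site with a
# strong seasonal cycle and very low winter values (minimum 0.1 in January).
site_et = [1.6 - 1.5 * math.cos(2 * math.pi * m / 12) for m in range(12)]

# Hypothetical bias (product minus towers) pooled over all stations.
pooled_bias = 0.4

corrected = additive_bias_correct(site_et, pooled_bias)

# Months (0 = January) whose corrected flux has become negative.
negative_months = [m for m, value in enumerate(corrected) if value < 0]
print(negative_months)
```

With these invented numbers the uncorrected series is positive throughout, but the corrected series is negative in December, January, and February. A correction with a climatological (e.g., month-dependent) component would not necessarily behave this way.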
(4) The removal of MPI from the merged product, where it originally had a weight close to 0.5, produces nearly statistically identical results, which is a surprise. Figures 3-4 and 11-12 are nearly identical. I have to confess that I do not understand this. Removing what is apparently one of the largest contributors of information to the merging does not seem to affect the skill of the merged product in approaching the tower fluxes.
(5) Points 3 and 4 make me think that a few example time series at a couple of tower locations, displaying the original products to be merged, the bias-corrected products, the resulting merge for different configurations, etc., would have been very illustrative. The boxplots and global maps are great at conveying the big picture, but a few time series would have been very useful to convey a clearer message about the merging methodology.
(6) Perhaps my largest concern comes from glancing through the first column of Figures 3-4 and 11-12 and noticing again how little improvement the merged product seems to offer compared with the simple mean product. The mean improvements in MSE, COR, and MSRD over the different sample selections are close to the zero line. I would speculate (I may be wrong) that if we plotted the differences of the weighted-mean and simple-mean products with respect to common products (e.g., MPI and LandFlux, as in the paper), the patterns in the differences would be quite close. In other words, I think it is a valid question to ask whether all the trouble of merging the products with the weighted methodology is justified, given the close performance of the simple mean. Based on this work and my own research, I am still not convinced. The authors state in the conclusion, “the DOLCE product performs better than any of its six constituent members overall”, but based on the boxplot results it could be that the simple mean also does so. Perhaps this could have been explored a bit more.
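The kind of check I have in mind for point (6) could be sketched as follows. This is only an illustration of the form of the comparison: the tower series, the three products, and the weights are invented, the function names are hypothetical, and the weighting here is a plain normalized weighted average rather than the authors' actual weighting scheme.

```python
def weighted_mean(products, weights):
    """Combine per-product series; weights are normalized by their sum."""
    total = sum(weights)
    n_steps = len(products[0])
    return [
        sum(w * p[t] for w, p in zip(weights, products)) / total
        for t in range(n_steps)
    ]

def mse(estimate, observed):
    """Mean squared error of an estimate against observations."""
    return sum((e - o) ** 2 for e, o in zip(estimate, observed)) / len(observed)

# Hypothetical tower fluxes and three hypothetical products.
tower = [1.0, 2.0, 3.0, 2.0]
products = [
    [1.2, 2.2, 3.2, 2.2],  # biased high by 0.2
    [0.8, 1.8, 2.8, 1.8],  # biased low by 0.2
    [1.5, 2.5, 3.5, 2.5],  # biased high by 0.5
]
weights = [0.45, 0.45, 0.10]  # hypothetical merging weights

simple = weighted_mean(products, [1.0, 1.0, 1.0])  # equal weights
merged = weighted_mean(products, weights)

mse_simple = mse(simple, tower)
mse_merged = mse(merged, tower)
mse_members = [mse(p, tower) for p in products]

print(mse_simple, mse_merged, mse_members)
```

Reporting `mse_simple` next to `mse_merged` and `mse_members` for the real data would show directly whether the simple mean also outperforms each constituent product, and by how much the weighting improves on it. The toy numbers above say nothing about DOLCE itself.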