Uncertainty estimates in regional and global observed temperature changes: A new data set from 1850
1. Introduction
- The historical surface temperature data set HadCRUT [Jones, 1994; Jones and Moberg, 2003] has been extensively used as a source of information on surface temperature trends and variability [Houghton et al., 2001].
- Since the last update, which produced HadCRUT2 [Jones and Moberg, 2003], important improvements have been made in the marine component of the data set [Rayner et al., 2006].
- These include the use of additional observations, the development of comprehensive uncertainty estimates, and technical improvements that enable, for instance, the production of gridded fields at arbitrary resolution.
- These new developments include improvements to: the land station data, the process for blending land data with marine data to give global coverage, and the statistical process of adjusting the variance of the gridded values to allow for varying numbers of contributing observations.
- Results and uncertainties for the new blended, global data set, called HadCRUT3, are presented.
2.1. Station Data
- The land surface component of HadCRUT is derived from a collection of homogenized, quality-controlled, monthly averaged temperatures for 4349 stations.
- This collection has been expanded and improved for use in the new data set.
2.1.1. Additional Stations and Data
- New stations and data were added for Mali, the Democratic Republic of Congo, Switzerland [Begert et al., 2005] and Austria.
- Data for 16 Austrian stations were completely replaced with revised values.
- A total of 29 Mali series were affected: 5 had partial new data, 8 had completely new data, and 16 were new stations.
- As well as the new stations discussed above, additional monthly data have been obtained for stations in Antarctica [Turner et al., 2005], while additional data for many stations have been added from the National Climatic Data Centre publication Monthly Climatic Data for the World.
2.1.2. Quality Control
- Much additional quality control has also been undertaken.
- Only a small fraction of the data needed correction, however; of the more than 3.7 million monthly station values, the ERA-40 comparison found about 10 doubtful grid boxes and the visual inspection about 270 monthly outliers.
- These duplicates have arisen where the same station data are assimilated into the archive from two different sources, and the two sources give the same station but with different names and WMO identifiers.
- Where there are insufficient station data to achieve this for the period, normals were derived from WMO values [World Meteorological Organization (WMO), 1996] or inferred from surrounding station values [Jones et al., 1985].
- Figure 1 shows the locations of the stations used, and indicates those where changes have been made. Black circles mark all stations, green circles mark deleted stations, blue circles mark stations added, and red circles mark stations edited.
2.3. Uncertainties
- To use the data for quantitative, statistical analysis, for instance, a detailed comparison with GCM results, the uncertainties of the gridded anomalies are a useful additional field.
- A definitive assessment of the uncertainties is impossible, because it is always possible that some unknown error has contaminated the data, and no quantitative allowance can be made for such unknowns.
- There are, however, several known limitations in the data, and estimates of the likely effects of these limitations can be made (Defense secretary Rumsfeld press conference, June 6, Back to disarmament documentation, June 2002, London, The Acronym Institute (available at www.acronym.org.uk/docs/0206/doc04.htm)).
- This means that uncertainty estimates need to be accompanied by an error model: a precise description of what uncertainties are being estimated.
2.3.1. Station Errors
- The uncertainties in the reported station monthly mean temperatures can be further subdivided.
- The values being gridded are anomalies, calculated by subtracting the station normal from the observed temperature, so errors in the station normals must also be considered.
- So the error in the monthly average will be at most $0.2/\sqrt{60} = 0.03$°C and this will be uncorrelated with the value for any other station or the value for any other month.
- So this does not contribute to the measurement error.
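The arithmetic behind the 0.03°C figure can be checked with a minimal sketch. It assumes that the 0.2 in the formula is the random error of a single reading in °C and that 60 is the number of independent readings in a month; the function name is hypothetical and is not taken from the paper.

```python
import math


def monthly_mean_error(single_reading_error: float, n_readings: int) -> float:
    """Standard error of the mean of n_readings independent readings.

    Assumption: every reading has the same random error and the errors
    are uncorrelated, so the error of the mean shrinks as 1/sqrt(n).
    """
    if n_readings <= 0:
        raise ValueError("n_readings must be positive")
    return single_reading_error / math.sqrt(n_readings)


# Values taken from the formula in the text: 0.2 degC per reading, 60 readings.
error = monthly_mean_error(0.2, 60)
print(f"{error:.3f}")  # 0.026, i.e. at most 0.03 degC after rounding
```

Because the result scales as one over the square root of the number of readings, halving the number of readings would raise the error by a factor of about 1.4 only.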
2.3.2. Sampling Error
- Even if the station temperature anomalies had no error, the mean of the station anomalies in a grid box would not necessarily be equal to the true spatial average temperature anomaly for that grid box.
- This difference is the sampling error; and it will depend on the number of stations in the grid box, on the positions of those stations, and on the actual variability of the climate in the grid box.
- The spatial distribution of sampling error, like the station error, is dominated by the station standard deviations and the number of observations.
- The distribution is very similar to that for the station error.
2.3.4. Combining the Uncertainties
- The total uncertainty value for any grid box can be obtained by adding the station error, sampling error, and bias error estimates for that grid box in quadrature.
- This gives the total uncertainty for each grid box for each month.
- In practice, however, this combined uncertainty is less useful than the individual components.
- The combined effect of grid box sampling errors will be small for any continental-scale or hemispheric-scale average (though the lack of global coverage introduces an additional source of sampling error; this is discussed in section 6.1).
- Combined station errors will be small for large-scale spatial averages, but remain important for averages over long periods of the same small grid box.
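Adding components in quadrature, and the reason the combined value is less useful than its parts, can be illustrated with a short sketch. The function names and the numbers are hypothetical. The second function rests on a simplifying assumption made only for illustration: every grid box has the same uncorrelated error, and the bias error is fully correlated between boxes.

```python
import math


def total_uncertainty(station_err: float, sampling_err: float, bias_err: float) -> float:
    """Combine independent error components for one grid box in quadrature."""
    return math.sqrt(station_err**2 + sampling_err**2 + bias_err**2)


def uncertainty_of_mean(uncorrelated_err: float, bias_err: float, n_boxes: int) -> float:
    """Uncertainty of an equal-weight mean over n_boxes grid boxes.

    Illustrative assumption: the uncorrelated part averages down as
    1/sqrt(n_boxes), while the fully correlated bias part does not shrink.
    """
    if n_boxes <= 0:
        raise ValueError("n_boxes must be positive")
    return math.sqrt(uncorrelated_err**2 / n_boxes + bias_err**2)


# Hypothetical values in degC, chosen only to keep the arithmetic simple.
print(total_uncertainty(0.3, 0.4, 0.0))
print(uncertainty_of_mean(0.5, 0.1, 1))
print(uncertainty_of_mean(0.5, 0.1, 100))
```

Under these assumptions the uncorrelated part dominates a single grid box but becomes minor in a 100-box average, which is why the components need to be tracked separately rather than only as one combined number.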
3. Marine Data
- The marine data used are from the sea surface temperature data set HadSST2 [Rayner et al., 2006].
- Previous versions of HadCRUT use the SST data set MOHSST6 [Parker et al., 1995].
- It has been shown, for example, by Parker et al. [1994], that this is the case, and that marine SST measurements provide more useful data and smaller sampling errors than marine air temperature measurements would.
- Like the land data, the marine data set has known errors: Estimates have been made of the measurement and sampling error, and the uncertainty in the bias corrections.
- Where there are known sources of uncertainty, estimates of the size of those uncertainties have been made.
4. Blending Land and Marine Data
- To make a data set with global coverage the land and marine data must be combined.
- The aim of weighting by area was to place more weight on the more reliable data source where possible.
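- As the land and marine errors are independent, this choice of weighting gives the lowest measurement and sampling error for the blended mean, giving an error in the blended mean of $\epsilon_{\mathrm{blended}} = \sqrt{\epsilon_{\mathrm{sst}}^{2}\,\epsilon_{\mathrm{land}}^{2} / (\epsilon_{\mathrm{sst}}^{2} + \epsilon_{\mathrm{land}}^{2})}$, where $\epsilon_{\mathrm{land}}$ and $\epsilon_{\mathrm{sst}}$ are the measurement and sampling errors of the land and marine values for the grid box.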
- The smaller SST errors mean that the blended temperatures for coastal and island grid boxes are dominated by the SST temperatures.
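A minimal sketch of such a blend follows. It assumes inverse-variance weights, which is the standard minimum-variance choice for two independent estimates; the function name and the example values are hypothetical and are not taken from the paper.

```python
import math


def blend_land_and_sst(t_land: float, err_land: float,
                       t_sst: float, err_sst: float) -> tuple[float, float]:
    """Blend land and marine anomalies for one grid box.

    Each value is weighted by the error variance of the *other* source,
    which is equivalent to inverse-variance weighting.
    Returns (blended anomaly, error of the blended anomaly).
    """
    var_land = err_land**2
    var_sst = err_sst**2
    total = var_land + var_sst
    if total == 0:
        raise ValueError("at least one error must be non-zero")
    blended = (var_sst * t_land + var_land * t_sst) / total
    blended_err = math.sqrt(var_sst * var_land / total)
    return blended, blended_err


# Hypothetical coastal grid box: noisy land value, more precise SST value.
value, err = blend_land_and_sst(t_land=1.0, err_land=0.6, t_sst=0.2, err_sst=0.2)
print(f"{value:.2f} +/- {err:.2f}")
```

With these numbers the blended anomaly lies much closer to the SST value than to the land value, and its error is smaller than either input error, which matches the behaviour described for coastal and island grid boxes.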
5. Variance Adjustment
- Assigning a grid box anomaly simply as the mean of the observational anomalies in that grid box produces a good estimate of the actual temperature anomaly.
- The error estimates for the gridded data have been used to devise a simpler adjustment method applicable to both land and marine data, and the adjustment process has been tested on synthetic data to ensure that it does not introduce biases into the data.
- The previous version of the variance adjusted data set, HadCRUT2v, started in 1870.
- Variance adjustment is successful at the individual grid box scale: Comparison with synthetic data shows that the inflation of the grid box variance caused by the limited number of observations can be removed without introducing biases into the grid box series.
- In particular, global and regional time series should be calculated using unadjusted data.
6. Analyses of the Gridded Data Set
- From the 5° × 5° gridded data set and its comprehensive set of uncertainty estimates it is possible to calculate a large variety of climatologically interesting summary statistics and their uncertainty ranges.
- Of this variety, global and regional temperature time series probably have the widest appeal, so some illustrative examples of these are presented here.
6.1. Hemispheric and Global Time Series
- If the gridded data had complete coverage of the globe or the region to be averaged, then making a time series would be a simple process of averaging the gridded data and making allowances for the relative sizes of the grid boxes and the known uncertainties in the data.
- To estimate the missing data uncertainty of the HadCRUT3 mean for a particular month, the reanalysis data for that calendar month in each of the 50+ years is subsampled to have the same coverage as HadCRUT3, and the difference between the complete average and the subsampled average anomaly is calculated in each of the 50+ cases.
- Similarly, estimates can be made of coverage uncertainties for smoothed annual or decadal averages.
- The grid box sampling and measurement errors are greatly reduced when the gridded data are averaged into large-scale means, so the only other important uncertainty component of global and regional time series is that owing to the biases in the data.
- This is dealt with by making data sets with allowances for bias uncertainties incorporated.
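The subsampling procedure for the missing data uncertainty can be sketched as follows. Several details are assumptions made for illustration: grid boxes are weighted by the cosine of their central latitude, the reanalysis fields are spatially complete, and the spread of the differences is summarised by their sample standard deviation. All names are hypothetical.

```python
import math
import statistics


def weighted_mean(values, weights, mask):
    """Weighted mean over the grid boxes where mask is True."""
    num = sum(v * w for v, w, m in zip(values, weights, mask) if m)
    den = sum(w for w, m in zip(weights, mask) if m)
    if den == 0:
        raise ValueError("mask selects no grid boxes")
    return num / den


def coverage_uncertainty(reanalysis_fields, box_lats_deg, coverage_mask):
    """Spread of (subsampled mean - complete mean) over the reanalysis years.

    reanalysis_fields: one complete anomaly field per year for a fixed
        calendar month, each a list with one value per grid box.
    box_lats_deg: central latitude of each grid box, in degrees.
    coverage_mask: True where the observed data set has a value that month.
    """
    weights = [math.cos(math.radians(lat)) for lat in box_lats_deg]
    everywhere = [True] * len(weights)
    differences = []
    for field in reanalysis_fields:
        complete = weighted_mean(field, weights, everywhere)
        subsampled = weighted_mean(field, weights, coverage_mask)
        differences.append(subsampled - complete)
    return statistics.stdev(differences)


# Tiny hypothetical example: 4 grid boxes, 3 years, one box unobserved.
lats = [-45.0, -15.0, 15.0, 45.0]
fields = [[0.1, 0.2, 0.3, 0.4], [0.0, 0.5, -0.2, 0.1], [0.3, 0.3, 0.3, 0.3]]
print(coverage_uncertainty(fields, lats, [False, True, True, True]))
```

With complete coverage every difference is zero, so the estimated uncertainty is zero; it grows as the mask removes more of the field.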
6.1.1. Global Averages
- The global temperature is calculated as the mean of the Northern and Southern Hemisphere series (to stop the better sampled Northern Hemisphere from dominating the average).
- The monthly averages are dominated by short-term fluctuations in the anomalies; combining the data into annual averages produces a clearer picture, and smoothing the annual averages with a 21-term binomial filter highlights the low-frequency components and shows the importance of the bias uncertainties.
- The dominant bias uncertainties are those due to bucket correction [Rayner et al., 2006] and thermometer exposure changes [Parker, 1994], both of which are large before the 1940s.
- The station, sampling and measurement, and coverage errors depend on the number and distribution of the observations, and these components of the error decrease steadily with time as the number of observations increases.
- The bias uncertainties, however, do not reduce with spatial or temporal averaging, and they are largest in the early twentieth century; so the smoothed annual series, where the uncertainty is dominated by the bias uncertainties, also has its largest uncertainty in this period.
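The two averaging steps described in this section, the hemispheric split and the 21-term binomial filter, can be sketched as below. The cosine-of-latitude area weighting and the treatment of the series ends (only the fully covered interior is returned) are simplifying assumptions, and all function names are hypothetical.

```python
import math


def area_weighted_mean(boxes):
    """Mean of (latitude_deg, anomaly) pairs, weighted by cos(latitude).

    Boxes whose anomaly is None (no data) are skipped.
    """
    num = den = 0.0
    for lat, anomaly in boxes:
        if anomaly is None:
            continue
        weight = math.cos(math.radians(lat))
        num += weight * anomaly
        den += weight
    if den == 0:
        raise ValueError("no data in this region")
    return num / den


def global_mean(boxes):
    """Average the two hemispheric means so neither hemisphere dominates.

    Box centres on a 5-degree grid never sit exactly on the equator,
    so every box belongs to exactly one hemisphere.
    """
    north = area_weighted_mean([b for b in boxes if b[0] > 0])
    south = area_weighted_mean([b for b in boxes if b[0] < 0])
    return 0.5 * (north + south)


def binomial_weights(n_terms=21):
    """Normalised binomial coefficients C(n_terms - 1, k)."""
    n = n_terms - 1
    return [math.comb(n, k) / 2**n for k in range(n_terms)]


def binomial_smooth(series, n_terms=21):
    """Smooth a series, returning only points with a full window."""
    weights = binomial_weights(n_terms)
    half = n_terms // 2
    return [
        sum(w * x for w, x in zip(weights, series[i - half:i + half + 1]))
        for i in range(half, len(series) - half)
    ]


# Hypothetical case: three northern boxes at +1.0, one southern box at 0.0.
boxes = [(2.5, 1.0), (42.5, 1.0), (62.5, 1.0), (-22.5, 0.0)]
print(global_mean(boxes))
```

In the example a plain area-weighted mean over all four boxes would be pulled toward the three northern values, whereas the mean of the hemispheric means gives the single southern box equal standing.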
6.1.2. Hemispheric Averages
- Comparing the smoothed mean temperature time series for the Northern Hemisphere and Southern Hemisphere shows the difference in uncertainties between the two hemispheres.
- The difference in the uncertainty ranges for the two series stems from the very different land/sea ratio of the two hemispheres.
- The Northern Hemisphere has more land, and so a larger station, sampling and measurement error, but it has more observations and so a smaller coverage uncertainty.
- The bias uncertainties are also larger in the Northern Hemisphere both because it has more land (especially in the tropics where the land biases are large), and because the SST bias uncertainties are largest in the Northern Hemisphere western boundary current regions where the SST can be very different from the air temperature [Rayner et al., 2006].
- So the previously observed increase in the interhemispheric difference in the mid twentieth century [see, e.g., Folland et al., 1986; Kerr, 2005] is shown to be significantly outside the uncertainties.
6.2. Differences Between Land and Marine Data
- Comparison of global average time series for land-only and marine-only data demonstrates both a marked agreement in the temperature trends, and a large difference in the uncertainties.
- The black line is the best estimate value; the red band gives the 95% uncertainty range caused by station, sampling, and measurement errors; the green band adds the 95% error range due to limited coverage; and the blue band adds the 95% error range due to bias errors.
- There are much larger uncertainties in the land data because the surface air temperature over land is much more variable than the SST.
- The difference between the land and sea temperatures is not distinguishable from zero until about 1980.
- There are several possible causes for the recent increase.
6.3. Comparison of Global Time Series With Previous Versions
- Figure 13 shows time series of the global average of the land data, the marine data, and the blended data set with their uncertainty ranges, and compares them to the previous versions of each data set.
- The additions and improvements made to the land data do not make any large differences to the global land average, except very early in the record where the uncertainties are large.
- The differences between the old and new marine data series are sometimes outside the error range of the new series.
- For the marine data, climatologies are specified for each grid box, and they are constant in time, so uncertainties in the marine climatology do not contribute directly to uncertainties in changes in marine temperature anomalies.
- Even after removing the constant offset produced by the climatology change, there are still differences between the old and new SST series that are larger than the assessed random and sampling errors.
6.4. Comparison With Central England Temperature
- The Central England Temperature (CET) series is the longest instrumental temperature record in the world [Parker et al., 1992] .
- It records the temperature of a triangular portion of England bounded by London, Herefordshire and Lancashire, and provides mean daily temperature estimates back to 1772.
- Comparing the CET data with the corresponding grid box in CRUTEM3 shows encouraging agreement: Despite being based on largely different observations, the two series agree within their uncertainties.
- The uncertainty varies in time because, unlike the land data, the number of SST observations changes with time.
- For example when looking at paleodata from tree rings near coasts it is probably better to use the land data set CRUTEM3 than the blended data set HadCRUT3.
7. Conclusions
- A new version of the gridded historical surface temperature data set HadCRUT3 has been produced.
- This data set is a collaborative product of scientists at the Met Office Hadley Centre (who provide the marine data), and at the Climatic Research Unit at the University of East Anglia (who provide the land surface data).
- The principal advance over previous versions of the data set [Jones et al., 2001; Jones and Moberg, 2003] is in the provision of a comprehensive set of uncertainties to accompany the gridded temperature anomalies.
- All the gridded data sets, and some time series derived from them, are available from the Web sites http://www.hadobs.org and http://www.cru.uea.ac.uk.
- Many marine observations from the first half of the nineteenth century are known to exist in log books kept in the British Museum and the U.K. National Archive, but these observations have never been digitized.