Uncertainty estimates in regional and global observed temperature changes: A new data set from 1850

doi:10.1029/2005JD006548

Philip Brohan¹, John Kennedy¹, Ian Harris², Simon F. B. Tett³, Philip Jones²

Hadley Centre for Climate Prediction and Research¹, University of East Anglia², University of Reading³

27 Jun 2006, Journal of Geophysical Research (Wiley-Blackwell), Vol. 111, pp. 1-21

TL;DR: HadCRUT3 is a new version of the HadCRUT data set, benefiting from recent improvements to the sea surface temperature data set which forms its marine component, and from improvements to the station records which provide the land data.

Summary

1. Introduction

The historical surface temperature data set HadCRUT [Jones, 1994; Jones and Moberg, 2003] has been extensively used as a source of information on surface temperature trends and variability [Houghton et al., 2001].
Since the last update, which produced HadCRUT2 [Jones and Moberg, 2003], important improvements have been made in the marine component of the data set [Rayner et al., 2006].
These include the use of additional observations, the development of comprehensive uncertainty estimates, and technical improvements that enable, for instance, the production of gridded fields at arbitrary resolution.
This paper describes work to produce a new data set version, HadCRUT3, which extends the advances made to the marine data to the global data set. These new developments include improvements to: the land station data, the process for blending land data with marine data to give global coverage, and the statistical process of adjusting the variance of the gridded values to allow for varying numbers of contributing observations.
Results and uncertainties for the new blended, global data set, called HadCRUT3, are presented.

2.1. Station Data

The land surface component of HadCRUT is derived from a collection of homogenized, quality-controlled, monthly averaged temperatures for 4349 stations.
This collection has been expanded and improved for use in the new data set.

2.1.1. Additional Stations and Data

New stations and data were added for Mali, the Democratic Republic of Congo, Switzerland [Begert et al., 2005] and Austria.
Data for 16 Austrian stations were completely replaced with revised values.
A total of 29 Mali series were affected: 5 had partial new data, 8 had completely new data, and 16 were new stations.
As well as the new stations discussed above, additional monthly data have been obtained for stations in Antarctica [Turner et al., 2005], while additional data for many stations have been added from the National Climatic Data Centre publication Monthly Climatic Data for the World.

2.1.2. Quality Control

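Much additional quality control has also been undertaken.
A comparison [Simmons et al., 2004] of the Climatic Research Unit (CRU) land temperature data with the ERA-40 reanalysis found a few areas where the station data were doubtful, and visual examination of individual station records identified outliers; bad values were either corrected or removed.
Only a small fraction of the data needed correction, however: of the more than 3.7 million monthly station values, the ERA-40 comparison found about 10 doubtful grid boxes and the visual inspection about 270 monthly outliers.
Checking the station data for identical sequences in all possible station pairs turned up 53 stations which were duplicates of others. These duplicates have arisen where the same station data are assimilated into the archive from two different sources, and the two sources give the same station but with different names and WMO identifiers.
The station normals (monthly averages over the normal period 1961–1990) are generated from station data for this period where possible. Where there are insufficient station data to achieve this for the period, normals were derived from WMO values [World Meteorological Organization (WMO), 1996] or inferred from surrounding station values [Jones et al., 1985].
Figure 1 shows the locations of the stations used, and indicates those where changes have been made.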

2.3. Uncertainties

To use the data for quantitative, statistical analysis, for instance, a detailed comparison with GCM results, the uncertainties of the gridded anomalies are a useful additional field.
A definitive assessment of uncertainties is impossible, because it is always possible that some unknown error has contaminated the data, and no quantitative allowance can be made for such unknowns.
There are, however, several known limitations in the data, and estimates of the likely effects of these limitations can be made (Defense secretary Rumsfeld press conference, June 6, Back to disarmament documentation, June 2002, London, The Acronym Institute (available at www.acronym.org.uk/docs/0206/doc04.htm)).
This means that uncertainty estimates need to be accompanied by an error model: a precise description of what uncertainties are being estimated.

2.3.1. Station Errors

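The uncertainties in the reported station monthly mean temperatures can be further subdivided.
The values being gridded are anomalies, calculated by subtracting the station normal from the observed temperature, so errors in the station normals must also be considered.
The random error in a single thermometer reading is about 0.2°C (1σ) [Folland et al., 2001], and the monthly average will be based on at least two readings a day throughout the month, giving 60 or more values contributing to the mean. So the error in the monthly average will be at most $0.2/\sqrt{60} \approx 0.03$°C, and this will be uncorrelated with the value for any other station or the value for any other month.
There will be a difference between the true mean monthly temperature and the average calculated by each station from measurements made less often, but this difference will also be present in the station normal and will cancel in the anomaly. So this does not contribute to the measurement error.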

2.3.2. Sampling Error

Even if the station temperature anomalies had no error, the mean of the station anomalies in a grid box would not necessarily be equal to the true spatial average temperature anomaly for that grid box.
This difference is the sampling error; and it will depend on the number of stations in the grid box, on the positions of those stations, and on the actual variability of the climate in the grid box.
The spatial distribution of sampling error, like the station error, is dominated by the station standard deviations and the number of observations.
The distribution is very similar to that for the station error.
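This summary does not give the sampling-error formula. As an illustration only, the sketch below assumes a commonly used expression for the error of a mean of N correlated point values, SE² = s̄² r̄ (1 − r̄) / (1 + (N − 1) r̄), where s̄ is the mean station standard deviation and r̄ the mean inter-station correlation in the box. The formula, the function name and the example inputs are assumptions. The sketch shows the dependence described above: the error falls as stations are added and grows with the variability of the climate in the box.

```python
import math


def grid_box_sampling_error(mean_station_sd: float, mean_correlation: float, n_stations: int) -> float:
    """Sampling error of a grid-box mean estimated from n_stations point values.

    Assumed form (illustrative, not taken from this text):
        SE^2 = s^2 * r * (1 - r) / (1 + (n - 1) * r)
    s: average station standard deviation; r: average inter-station correlation.
    """
    if n_stations < 1:
        raise ValueError("need at least one station")
    if not 0.0 <= mean_correlation <= 1.0:
        raise ValueError("mean_correlation must lie in [0, 1]")
    r = mean_correlation
    variance = mean_station_sd ** 2 * r * (1.0 - r) / (1.0 + (n_stations - 1) * r)
    return math.sqrt(variance)


# Hypothetical box: station SD 1.5 degC, mean inter-station correlation 0.8.
for n in (1, 2, 5, 10):
    print(n, round(grid_box_sampling_error(1.5, 0.8, n), 3))
```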

2.3.4. Combining the Uncertainties

The total uncertainty value for any grid box can be obtained by adding the station error, sampling error, and bias error estimates for that grid box in quadrature.
This gives the total uncertainty for each grid box for each month.
In practice, however, this combined uncertainty is less useful than the individual components.
The combined effect of grid box sampling errors will be small for any continental-scale or hemispheric-scale average (though the lack of global coverage introduces an additional source of sampling error; this is discussed in section 6.1).
Combined station errors will be small for large-scale spatial averages, but remain important for averages over long periods of the same small grid box.
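Adding in quadrature means taking the square root of the sum of squares, which assumes the components are independent. The minimal sketch below (hypothetical function names and error values) also illustrates why the components behave differently under averaging: errors that are independent between the values being averaged shrink roughly as 1/√N, whereas an error that is fully correlated across them (for example, the same error repeated in every month of one grid box) does not shrink at all.

```python
import math


def quadrature(*components: float) -> float:
    """Combine independent error components: sqrt(e1^2 + e2^2 + ...)."""
    return math.sqrt(sum(e * e for e in components))


def error_of_mean(errors, fully_correlated: bool) -> float:
    """1-sigma error of the mean of len(errors) values with the given individual errors.

    Independent errors partly cancel; fully correlated errors (same sign everywhere) do not.
    """
    n = len(errors)
    if fully_correlated:
        return sum(errors) / n
    return quadrature(*errors) / n


# Hypothetical grid-box errors (degC): station, sampling, bias.
print(round(quadrature(0.3, 0.4, 0.0), 3))  # total for one box

box_errors = [0.5] * 100  # 100 boxes (or months), each with a 0.5 degC error
print(error_of_mean(box_errors, fully_correlated=False))  # independent: shrinks
print(error_of_mean(box_errors, fully_correlated=True))   # correlated: does not shrink
```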

3. Marine Data

The marine data used are from the sea surface temperature data set HadSST2 [Rayner et al., 2006].
Previous versions of HadCRUT used the SST data set MOHSST6 [Parker et al., 1995].
As in those versions, SST anomalies are used in place of marine air temperature anomalies. It has been shown, for example, by Parker et al. [1994], that this substitution is justified, and that marine SST measurements provide more useful data and smaller sampling errors than marine air temperature measurements would.
Like the land data, the marine data set has known errors: estimates have been made of the measurement and sampling error, and of the uncertainty in the bias corrections.
Where there are known sources of uncertainty, estimates of the size of those uncertainties have been made.

4. Blending Land and Marine Data

To make a data set with global coverage, the land and marine data must be combined.
The aim of weighting by area was to place more weight on the more reliable data source where possible.
As the land and marine errors are independent, this choice of weighting gives the lowest measurement and sampling error for the blended mean, giving an error in the blended mean of EQUATION.
The smaller SST errors mean that the blended temperatures for coastal and island grid boxes are dominated by the SST values.
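For two independent estimates, the weighting that minimizes the error of the blend is inverse-variance weighting: each source is weighted in proportion to the other source's error variance. The sketch below assumes that form. The text above does not spell out the weights, and the operational scheme may add constraints. The function name and example numbers are hypothetical. With a small SST error the land weight becomes small, which shows how coastal and island boxes end up dominated by the SST.

```python
import math


def blend_land_marine(land_anom, land_err, sea_anom, sea_err):
    """Blend land and marine anomalies for one grid box with inverse-variance weights.

    Returns (blended anomaly, its 1-sigma error, land weight).
    Assumes the land and marine errors are independent.
    """
    w_land = sea_err ** 2 / (land_err ** 2 + sea_err ** 2)
    w_sea = 1.0 - w_land
    blended = w_land * land_anom + w_sea * sea_anom
    # Error of a weighted sum of independent errors.
    blended_err = math.sqrt((w_land * land_err) ** 2 + (w_sea * sea_err) ** 2)
    return blended, blended_err, w_land


# Hypothetical coastal box: noisy land value, well-sampled sea value (degC).
anom, err, w = blend_land_marine(1.0, 0.6, 0.5, 0.2)
print(round(anom, 3), round(err, 3), round(w, 3))
```

The blended error is smaller than either input error, which is the property the weighting is chosen for.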

5. Variance Adjustment

Assigning a grid box anomaly simply as the mean of the observational anomalies in that grid box produces a good estimate of the actual temperature anomaly, but when only a few observations contribute, their errors inflate the variance of the grid box series.
The error estimates for the gridded data have been used to devise a simpler adjustment method applicable to both land and marine data, and the adjustment process has been tested on synthetic data to ensure that it does not introduce biases into the data.
The previous version of the variance-adjusted data set, HadCRUT2v, started in 1870.
Variance adjustment is successful at the individual grid box scale: comparison with synthetic data shows that the inflation of the grid box variance caused by the limited number of observations can be removed without introducing biases into the grid box series.
The adjusted data are therefore intended for grid box analyses; in particular, global and regional time series should be calculated using unadjusted data.
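The adjustment formula itself is not given in this summary. The toy example below only illustrates the problem being corrected and one generic way to remove it, under stated assumptions. A grid box value equal to the true anomaly plus an independent error has its variance inflated by the error variance. Rescaling the series by √((var_obs − var_err)/var_obs) removes that inflation. The rescaling rule, function name and numbers are hypothetical and are not the HadCRUT3 method. Because this rescaling shrinks every value toward zero, in this toy version it would also damp any average taken over the adjusted values.

```python
import random
import statistics


def rescale_series(series, error_variance):
    """Hypothetical variance adjustment: scale so var(adjusted) = var(series) - error_variance."""
    var_obs = statistics.pvariance(series)
    if var_obs == 0:
        return list(series)
    signal_var = max(var_obs - error_variance, 0.0)
    factor = (signal_var / var_obs) ** 0.5
    return [x * factor for x in series]


random.seed(0)
true_sd, err_sd = 1.0, 0.8  # hypothetical true variability and grid-box error (degC)
truth = [random.gauss(0.0, true_sd) for _ in range(20000)]
observed = [t + random.gauss(0.0, err_sd) for t in truth]  # mean of few observations
adjusted = rescale_series(observed, err_sd ** 2)

print("true variance     ", round(statistics.pvariance(truth), 2))
print("observed variance ", round(statistics.pvariance(observed), 2))  # inflated
print("adjusted variance ", round(statistics.pvariance(adjusted), 2))
```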

6. Analyses of the Gridded Data Set

From the 5° × 5° gridded data set and its comprehensive set of uncertainty estimates, it is possible to calculate a large variety of climatologically interesting summary statistics and their uncertainty ranges.
Of this variety, global and regional temperature time series probably have the widest appeal, so some illustrative examples of these are presented here.

6.1. Hemispheric and Global Time Series

If the gridded data had complete coverage of the globe or the region to be averaged, then making a time series would be a simple process of averaging the gridded data and making allowances for the relative sizes of the grid boxes and the known uncertainties in the data.
Coverage is incomplete, however, so the missing grid boxes add an uncertainty of their own, which is estimated using reanalysis data with complete coverage.
To estimate the missing data uncertainty of the HadCRUT3 mean for a particular month, the reanalysis data for that calendar month in each of the 50+ years are subsampled to have the same coverage as HadCRUT3, and the difference between the complete average and the subsampled average anomaly is calculated in each of the 50+ cases.
Similarly, estimates can be made of coverage uncertainties for smoothed annual or decadal averages.
The grid box sampling and measurement errors are greatly reduced when the gridded data are averaged into large-scale means, so the only other important uncertainty component of global and regional time series is that owing to the biases in the data.
This is dealt with by making data sets with allowances for bias uncertainties incorporated.
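A minimal sketch of the subsampling idea follows. It assumes cos(latitude) area weights for a regular latitude-longitude grid, and it summarizes the differences by their standard deviation. The text does not say how the 50+ differences are combined into an uncertainty, so that step is an assumption. The grid, mask and random fields are synthetic stand-ins for the reanalysis, not real data, and the function names are hypothetical.

```python
import math
import random
import statistics


def area_weighted_mean(field, lats):
    """Mean of a lat x lon grid, weighting boxes by cos(latitude); None marks missing boxes."""
    total = weight_sum = 0.0
    for row, lat in zip(field, lats):
        w = math.cos(math.radians(lat))
        for value in row:
            if value is not None:
                total += w * value
                weight_sum += w
    return total / weight_sum


def coverage_uncertainty(complete_fields, obs_mask, lats):
    """Spread of (complete mean - subsampled mean) across fields with full coverage.

    complete_fields: e.g. one field per year for a given calendar month
    obs_mask: True where the observational data set has a value this month
    """
    diffs = []
    for field in complete_fields:
        sub = [[v if ok else None for v, ok in zip(row, mask_row)]
               for row, mask_row in zip(field, obs_mask)]
        diffs.append(area_weighted_mean(field, lats) - area_weighted_mean(sub, lats))
    return statistics.stdev(diffs)


# Hypothetical 5-degree grid (36 x 72) with the southernmost 12 rows unobserved.
lats = [-87.5 + 5 * i for i in range(36)]
mask = [[i >= 12] * 72 for i in range(36)]
random.seed(1)
fields = [[[random.gauss(0.0, 1.0) for _ in range(72)] for _ in lats] for _ in range(50)]
print(round(coverage_uncertainty(fields, mask, lats), 3))
```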

6.1.1. Global Averages

The global temperature is calculated as the mean of the Northern and Southern Hemisphere series (to stop the better sampled Northern Hemisphere from dominating the average).
The monthly averages are dominated by short-term fluctuations in the anomalies; combining the data into annual averages produces a clearer picture, and smoothing the annual averages with a 21-term binomial filter highlights the low-frequency components and shows the importance of the bias uncertainties.
The dominant bias uncertainties are those due to bucket correction [Rayner et al., 2006] and thermometer exposure changes [Parker, 1994], both of which are large before the 1940s.
The station, sampling and measurement, and coverage errors depend on the number and distribution of the observations, and these components of the error decrease steadily with time as the number of observations increases.
The bias uncertainties, however, do not reduce with spatial or temporal averaging, and they are largest in the early twentieth century, so the smoothed annual series, where the uncertainty is dominated by the bias uncertainties, also has its largest uncertainty in this period.
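The two steps named above can be sketched as follows. The sketch assumes a centred filter whose weights are the binomial coefficients C(20, k)/2^20 for k = 0, ..., 20, and it simply drops the ten points at each end where the window is incomplete; how the series ends are treated is an assumption here. The function names and example series are hypothetical.

```python
from math import comb


def binomial_weights(n_terms: int = 21):
    """Normalised binomial filter weights C(n-1, k) / 2**(n-1), k = 0..n-1."""
    n = n_terms - 1
    return [comb(n, k) / 2 ** n for k in range(n_terms)]


def smooth(series, n_terms: int = 21):
    """Centred binomial smoothing; points within half a window of either end are dropped."""
    weights = binomial_weights(n_terms)
    half = n_terms // 2
    return [
        sum(w * series[i - half + k] for k, w in enumerate(weights))
        for i in range(half, len(series) - half)
    ]


def global_mean(nh, sh):
    """Global value as the plain mean of the two hemispheric values, year by year."""
    return [(a + b) / 2 for a, b in zip(nh, sh)]


# Hypothetical annual hemispheric anomalies (degC) with a linear trend.
nh = [0.01 * year for year in range(40)]
sh = [0.005 * year for year in range(40)]
print([round(x, 3) for x in smooth(global_mean(nh, sh))])
```

Because the weights are symmetric and sum to one, a linear trend passes through the filter unchanged, while year-to-year noise is strongly damped.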

6.1.2. Hemispheric Averages

Comparing the smoothed mean temperature time series for the Northern Hemisphere and Southern Hemisphere shows the difference in uncertainties between the two hemispheres.
The difference in the uncertainty ranges for the two series stems from the very different land/sea ratio of the two hemispheres.
The Northern Hemisphere has more land, and so a larger station, sampling and measurement error, but it has more observations and so a smaller coverage uncertainty.
The bias uncertainties are also larger in the Northern Hemisphere, both because it has more land (especially in the tropics where the land biases are large), and because the SST bias uncertainties are largest in the Northern Hemisphere western boundary current regions where the SST can be very different from the air temperature [Rayner et al., 2006].
The previously observed increase in the interhemispheric difference in the mid twentieth century [see, e.g., Folland et al., 1986; Kerr, 2005] is shown to be significantly outside the uncertainties.

6.2. Differences Between Land and Marine Data

Comparison of global average time series for land-only and marine-only data demonstrates both a marked agreement in the temperature trends and a large difference in the uncertainties.
There are much larger uncertainties in the land data because the surface air temperature over land is much more variable than the SST.
The difference between the land and sea temperatures is not distinguishable from zero until about 1980.
There are several possible causes for the recent increase in this difference.

6.3. Comparison of Global Time Series With Previous Versions

Figure 13 shows time series of the global average of the land data, the marine data, and the blended data set with their uncertainty ranges, and compares them to the previous versions of each data set.
The additions and improvements made to the land data do not make any large differences to the global land average, except very early in the record where the uncertainties are large.
The differences between the old and new marine data series are sometimes outside the error range of the new series.
Part of this difference is a constant offset produced by a change in the marine climatology. For the marine data, climatologies are specified for each grid box and are constant in time, so uncertainties in the marine climatology do not contribute directly to uncertainties in changes in marine temperature anomalies.
Even after removing the constant offset produced by the climatology change, there are still differences between the old and new SST series that are larger than the assessed random and sampling errors.

6.4. Comparison With Central England Temperature

The Central England Temperature (CET) series is the longest instrumental temperature record in the world [Parker et al., 1992].
It records the temperature of a triangular portion of England bounded by London, Herefordshire and Lancashire, and provides mean daily temperature estimates back to 1772.
Comparing the CET data with the corresponding grid box in CRUTEM3 shows encouraging agreement: despite being based on largely different observations, the two series agree within their uncertainties.
For the corresponding HadCRUT3 grid box, the uncertainty varies in time because, unlike the land data, the number of SST observations changes with time.
For example, when looking at paleodata from tree rings near coasts, it is probably better to use the land data set CRUTEM3 than the blended data set HadCRUT3.

7. Conclusions

A new version of the gridded historical surface temperature data set HadCRUT, called HadCRUT3, has been produced.
This data set is a collaborative product of scientists at the Met Office Hadley Centre (who provide the marine data), and at the Climatic Research Unit at the University of East Anglia (who provide the land surface data).
The principal advance over previous versions of the data set [Jones et al., 2001; Jones and Moberg, 2003] is in the provision of a comprehensive set of uncertainties to accompany the gridded temperature anomalies.
All the gridded data sets, and some time series derived from them, are available from the Web sites http://www.hadobs.org and http://www.cru.uea.ac.uk.
Many marine observations from the first half of the nineteenth century are known to exist in log books kept in the British Museum and the U.K. National Archive, but these observations have never been digitized.

Figures (15)

Figure 13. New data set versions and their 95% uncertainty ranges (in blue), compared with the previous version of each data set (in red): (top) land data, (middle) marine data, and (bottom) combined data.

Figure 5. CRUTEM3 station errors (°C) for January 1969.

Figure 6. CRUTEM3 sampling errors (°C) for January 1969.

Figure 9. HadCRUT3 measurement and sampling error (°C) for January 1969.

Figure 12. Global average of land and marine components of HadCRUT3 (°C): (top) land, (middle) sea, and (bottom) difference (land − sea). The black line is the best estimate value; the red band gives the 95% uncertainty range caused by station, sampling, and measurement errors; the green band adds the 95% error range due to limited coverage; and the blue band adds the 95% error range due to bias errors.

Figure 1. Land station coverage. Black circles mark all stations, green circles mark deleted stations, blue circles mark stations added, and red circles mark stations edited. Many station edits are minor changes, involving, for instance, the correction of a single outlier.

Figure 11. HadCRUT3 hemisphere temperature anomaly time series (°C): (top) Northern Hemisphere (NH), (middle) Southern Hemisphere (SH), and (bottom) difference (NH − SH). The black line is the best estimate value; the red band gives the 95% uncertainty range caused by station, sampling, and measurement errors; the green band adds the 95% error range due to limited coverage; and the blue band adds the 95% error range due to bias errors.

Figure 3. CRUTEM3 anomalies (°C) for January 1969 (North America, HadGEM1 model grid (1.875° × 1.25°)).

Figure 2. CRUTEM3 anomalies (°C) for January 1969 (global, 5° × 5°).

Figure 15. HadCRUT3 (for 50–55°N, 0–5°W) comparison with CET (error ranges are 95%).

Figure 10. HadCRUT3 global temperature anomaly time series (°C) at (top) monthly, (center) annual, and (bottom) smoothed annual resolutions. The black line is the best estimate value; the red band gives the 95% uncertainty range caused by station, sampling, and measurement errors; the green band adds the 95% error range due to limited coverage; and the blue band adds the 95% error range due to bias errors.

Figure 8. Land data blending weight for January 1969. (Greater emphasis on the land would give numbers closer to one.)

Figure 7. HadCRUT3 anomalies (°C) for January 1969.

Figure 4. Distribution of station homogeneity adjustments (°C). The solid line is the distribution of the adjustments known to have been made (763 adjustments from Jones et al. [1985, 1986] and Vincent and Gullet [1999]), the dashed line is a hypothesized distribution of the adjustments required, and the dotted line is the difference and so the distribution of homogeneity adjustment error.

Figure 14. CRUTEM3 (for 50–55°N, 0–5°W) comparison with Central England Temperature (CET) (error ranges are 95%).

Edinburgh Research Explorer

Citation for published version:

Brohan, P, Kennedy, JJ, Harris, I, Tett, SFB & Jones, PD 2006, 'Uncertainty estimates in regional and global observed temperature changes: A new data set from 1850', Journal of Geophysical Research, vol. 111, no. D12, D12106, pp. 1-21. https://doi.org/10.1029/2005JD006548

Digital Object Identifier (DOI):

10.1029/2005JD006548

Document Version:

Publisher's PDF, also known as Version of record

Published In:

Journal of Geophysical Research

Publisher Rights Statement:

Published in the Journal of Geophysical Research: Atmospheres by the American Geophysical Union (2006)

P. Brohan, J. J. Kennedy, I. Harris, S. F. B. Tett, and P. D. Jones

Received 2 August 2005; revised 19 December 2005; accepted 14 February 2006; published 24 June 2006.

[1] The historical surface temperature data set HadCRUT provides a record of surface temperature trends and variability since 1850. A new version of this data set, HadCRUT3, has been produced, benefiting from recent improvements to the sea surface temperature data set which forms its marine component, and from improvements to the station records which provide the land data. A comprehensive set of uncertainty estimates has been derived to accompany the data: Estimates of measurement and sampling error, temperature bias effects, and the effect of limited observational coverage on large-scale averages have all been made. Since the mid twentieth century the uncertainties in global and hemispheric mean temperatures are small, and the temperature increase greatly exceeds its uncertainty. In earlier periods the uncertainties are larger, but the temperature increase over the twentieth century is still significantly larger than its uncertainty.

Citation: Brohan, P., J. J. Kennedy, I. Harris, S. F. B. Tett, and P. D. Jones (2006), Uncertainty estimates in regional and global observed temperature changes: A new data set from 1850, J. Geophys. Res., 111, D12106, doi:10.1029/2005JD006548.

1. Introduction

[2] The historical surface temperature data set HadCRUT [Jones, 1994; Jones and Moberg, 2003] has been extensively used as a source of information on surface temperature trends and variability [Houghton et al., 2001]. Since the last update, which produced HadCRUT2 [Jones and Moberg, 2003], important improvements have been made in the marine component of the data set [Rayner et al., 2006]. These include the use of additional observations, the development of comprehensive uncertainty estimates, and technical improvements that enable, for instance, the production of gridded fields at arbitrary resolution.

[3] This paper describes work to produce a new data set version, HadCRUT3, which will extend the advances made to the marine data to the global data set. These new developments include improvements to: the land station data, the process for blending land data with marine data to give global coverage, and the statistical process of adjusting the variance of the gridded values to allow for varying numbers of contributing observations. Results and uncertainties for the new blended, global data set, called HadCRUT3, are presented.

2. Land Surface Data

2.1. Station Data

[4] The land surface component of HadCRUT is derived from a collection of homogenized, quality-controlled, monthly averaged temperatures for 4349 stations. This collection has been expanded and improved for use in the new data set.

2.1.1. Additional Stations and Data

[5] New stations and data were added for Mali, the Democratic Republic of Congo, Switzerland [Begert et al., 2005] and Austria. Data for 16 Austrian stations were completely replaced with revised values. A total of 29 Mali series were affected: 5 had partial new data, 8 had completely new data, and 16 were new stations. Five Swiss stations were updated for the period 1864–2001 [Begert et al., 2005]. Thirty-three Congolese stations were affected: Thirteen were new stations, and 20 were updates to existing stations.

[6] As well as the new stations discussed above, additional monthly data have been obtained for stations in Antarctica [Turner et al., 2005], while additional data for many stations have been added from the National Climatic Data Centre publication Monthly Climatic Data for the World.

2.1.2. Quality Control

[7] Much additional quality control has also been undertaken. A comparison [Simmons et al., 2004] of the Climatic Research Unit (CRU) land temperature data with the ERA-40 reanalysis found a few areas where the station data were doubtful, and this was augmented by visual examination of individual station records looking for outliers. Some bad values were identified and either corrected or removed. Only a small fraction of the data needed correction, however; of the more than 3.7 million monthly station values, the ERA-40 comparison found about 10 doubtful grid boxes and the visual inspection about 270 monthly outliers.

JOURNAL OF GEOPHYSICAL RESEARCH, VOL. 111, D12106, doi:10.1029/2005JD006548, 2006

Hadley Centre for Climate Prediction and Research, Met Office, Exeter, UK.

Climatic Research Unit, School of Environmental Sciences, University of East Anglia, Norwich, UK.

Met Office Hadley Centre (Reading Unit), University of Reading, Reading, UK.

Published in 2006 by the American Geophysical Union.

[8] Checking the station data for identical sequences in all possible station pairs turned up 53 stations which were duplicates of others. These duplicates have arisen where the same station data are assimilated into the archive from two different sources, and the two sources give the same station but with different names and WMO identifiers. The duplicate stations were merged and duplicate temperature data were deleted.
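One way to implement the pairwise check described in [8] is sketched below. It assumes each station is a mapping from (year, month) to value, and it flags a pair when they share at least a minimum number of months and every shared month is identical. The threshold, data layout and function name are hypothetical. Comparing all pairs is quadratic in the number of stations (about 9.5 million pairs for 4349 stations), which is still practical. Hashing whole series would be faster but would only catch exact whole-record copies.

```python
from itertools import combinations


def find_duplicate_pairs(stations, min_overlap=24):
    """Return (id_a, id_b) pairs whose overlapping monthly values are all identical.

    stations: dict mapping station id -> dict mapping (year, month) -> value
    min_overlap: hypothetical minimum number of shared months before a pair is flagged
    """
    pairs = []
    for (id_a, a), (id_b, b) in combinations(stations.items(), 2):
        shared = a.keys() & b.keys()
        if len(shared) >= min_overlap and all(a[k] == b[k] for k in shared):
            pairs.append((id_a, id_b))
    return pairs


# Hypothetical example: station "B" is the same record as "A" under another identifier.
months = [(y, m) for y in range(1970, 1973) for m in range(1, 13)]
a = {k: 10.0 + 0.1 * i for i, k in enumerate(months)}
stations = {"A": a, "B": dict(a), "C": {k: v + 0.5 for k, v in a.items()}}
print(find_duplicate_pairs(stations))
```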

[9] Also the station normals and standard deviations were improved. The station normals (monthly averages over the normal period 1961–1990) are generated from station data for this period where possible. Where there are insufficient station data to achieve this for the period, normals were derived from WMO values [World Meteorological Organization (WMO), 1996] or inferred from surrounding station values [Jones et al., 1985]. For 617 stations, it was possible to replace the additional WMO normals (used by Jones and Moberg [2003]) with normals derived from the station data. This was made possible by relaxing the requirement to have data for 4 years in each of the three decades in 1961–1990 (the requirement now is simply to have at least 15 years of data in this period), so reducing the number of stations using the seemingly less reliable WMO normals. As well as making the normals less uncertain (see the discussion of normal error below), these improved normals mean that the gridded fields of temperature anomalies are much closer to zero over the normal period than was the case for previous versions of the data set. Figure 1 shows the locations of the stations used, and indicates those where changes have been made.
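The normal rule in [9] can be sketched as follows, assuming monthly values keyed by (year, month). The `fallback` argument stands in for a WMO or neighbour-derived normal; like the function names, it is hypothetical.

```python
from statistics import mean

NORMAL_PERIOD = range(1961, 1991)  # 1961-1990 inclusive
MIN_YEARS = 15                     # at least 15 years of data in the period


def station_normal(monthly, month, fallback=None):
    """Normal for one calendar month: station mean if >= MIN_YEARS values, else fallback."""
    values = [monthly[(y, month)] for y in NORMAL_PERIOD if (y, month) in monthly]
    if len(values) >= MIN_YEARS:
        return mean(values)
    return fallback


def station_anomaly(monthly, year, month, fallback=None):
    """Observed temperature minus the station normal, or None if either is missing."""
    normal = station_normal(monthly, month, fallback)
    if normal is None or (year, month) not in monthly:
        return None
    return monthly[(year, month)] - normal


# Hypothetical January record: 20 years at 4.0 degC inside the normal period, then 5.5 in 1995.
jan = {(y, 1): 4.0 for y in range(1961, 1981)}
jan[(1995, 1)] = 5.5
print(station_normal(jan, 1), station_anomaly(jan, 1995, 1))
```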

2.2. Gridding

[10] To interpolate the station data to a regular grid, the methods of Jones and Moberg [2003] are followed. Each grid box value is the mean of all available station anomaly values, except that station outliers in excess of five standard deviations are omitted.
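The gridding rule in [10] amounts to a filtered mean per box. The sketch below assumes that "in excess of five standard deviations" means an anomaly whose magnitude exceeds five times that station's standard deviation. The data layout and names are hypothetical. Boxes with no remaining stations are simply absent, consistent with the removal of infilling described in [11].

```python
from collections import defaultdict
from statistics import mean


def grid_box_means(station_anoms, station_sd, box_of, n_sigma=5.0):
    """Mean station anomaly per grid box for one month, omitting outliers.

    station_anoms: dict station id -> anomaly (degC)
    station_sd:    dict station id -> station standard deviation for that month (degC)
    box_of:        dict station id -> grid box key
    """
    boxes = defaultdict(list)
    for sid, anom in station_anoms.items():
        if abs(anom) > n_sigma * station_sd[sid]:
            continue  # outlier beyond n_sigma standard deviations: omitted
        boxes[box_of[sid]].append(anom)
    return {box: mean(vals) for box, vals in boxes.items()}


anoms = {"s1": 1.0, "s2": 2.0, "s3": 10.0, "s4": -8.0}
sds = {"s1": 1.0, "s2": 1.0, "s3": 1.0, "s4": 1.5}
boxes = {"s1": (28, 35), "s2": (28, 35), "s3": (28, 35), "s4": (10, 4)}
print(grid_box_means(anoms, sds, boxes))
```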

[11] Two changes have been made in the gridding process. The station anomalies can now be gridded to any spatial resolution, instead of being limited to a 5° × 5° resolution; this simplifies comparison of the gridded data with General Circulation Model (GCM) results. Also, previous versions of the data set did some infilling of missing grid box values using data from surrounding grid boxes [Jones et al., 2001]. This is no longer done, allowing the attribution of an uncertainty to each grid box value. The resulting gridded land-only data set has been given the name CRUTEM3. The previous version of this data set, CRUTEM2, started in 1851: In CRUTEM3 the start date has been extended back to 1850 to match the marine data (section 3). Figure 2 shows a gridded field for an example month, at the standard 5° × 5° resolution.
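Gridding at an arbitrary resolution mainly changes how stations are assigned to boxes. The minimal sketch below assumes a regular grid whose rows start at 90°S and whose columns start at 180°W. The grid origin and orientation are assumptions, and real model grids such as HadGEM1's may be offset. The function name is hypothetical.

```python
import math


def box_index(lat, lon, dlat, dlon):
    """(row, col) of the box containing (lat, lon) on a regular dlat x dlon grid.

    Rows count northwards from 90S, columns eastwards from 180W; lon may be -180..360.
    """
    n_rows = math.ceil(180.0 / dlat)
    row = min(int((lat + 90.0) // dlat), n_rows - 1)  # put lat = 90N in the top row
    col = int(((lon + 180.0) % 360.0) // dlon)
    return row, col


# The same point on the standard 5 x 5 degree grid and on a 1.25 (lat) x 1.875 (lon) degree grid.
print(box_index(51.5, -0.1, 5.0, 5.0))
print(box_index(51.5, -0.1, 1.25, 1.875))
```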

[12] For comparison with GCM results, or for regional studies of areas where observations are plentiful, it can be useful to perform the gridding at higher resolution. Figure 3 shows a gridded field for the same example month, at the resolution of the HadGEM1 model [Johns et al., 2004], but only for North America.

2.3. Uncertainties

[13] To use the data for quantitative, statistical analysis, for instance, a detailed comparison with GCM results, the uncertainties of the gridded anomalies are a useful additional field. A definitive assessment of uncertainties is impossible, because it is always possible that some unknown error has contaminated the data, and no quantitative allowance can be made for such unknowns. There are, however, several known limitations in the data, and estimates of the likely effects of these limitations can be made (Defense secretary Rumsfeld press conference, June 6, Back to disarmament documentation, June 2002, London, The Acronym Institute (available at www.acronym.org.uk/docs/0206/doc04.htm)). This means that uncertainty estimates need to be accompanied by an error model: a precise description of what uncertainties are being estimated.

Figure 1. Land station coverage. Black circles mark all stations, green circles mark deleted stations, blue circles mark stations added, and red circles mark stations edited. Many station edits are minor changes, involving, for instance, the correction of a single outlier.

[14] Uncertainties in the land data can be divided into three groups: (1) station error, the uncertainty of individual station anomalies; (2) sampling error, the uncertainty in a grid box mean caused by estimating the mean from a small number of point values; and (3) bias error, the uncertainty in large-scale temperatures caused by systematic changes in measurement methods.

2.3.1. Station Errors

[15] The uncertainties in the reported station monthly mean temperatures can be further subdivided. Suppose

$$T_{\mathrm{actual}} = T_{\mathrm{ob}} + \varepsilon_{\mathrm{ob}} + C_{\mathrm{H}} + \varepsilon_{\mathrm{H}} + \varepsilon_{\mathrm{RC}}, \qquad (1)$$

where $T_{\mathrm{actual}}$ is the actual station mean monthly temperature, $T_{\mathrm{ob}}$ is the reported temperature, $\varepsilon_{\mathrm{ob}}$ is the measurement error, $C_{\mathrm{H}}$ is any homogenization adjustment that may have been applied to the reported temperature and $\varepsilon_{\mathrm{H}}$ is the uncertainty in that adjustment, and $\varepsilon_{\mathrm{RC}}$ is the uncertainty due to inaccurate calculation or misreporting of the station mean temperature.

[16] The values being gridded are anomalies, calculated by subtracting the station normal from the observed temperature, so errors in the station normals must also be considered:

$$A_{\mathrm{actual}} = T_{\mathrm{ob}} - T_N + \epsilon_{\mathrm{ob}} + C_H + \epsilon_H + \epsilon_{\mathrm{RC}} + \epsilon_N \qquad (2)$$

where $A_{\mathrm{actual}}$ is the actual temperature anomaly, $T_N$ is the estimated station normal, and $\epsilon_N$ is the error in $T_N$.

[17] The basic station data include normals and may have had homogenization adjustments applied, so they provide $T_{\mathrm{ob}} + C_H$ and $T_N$; also needed are estimates for $\epsilon_{\mathrm{ob}}$, $\epsilon_H$, $\epsilon_{\mathrm{RC}}$, and $\epsilon_N$.
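Equation (2) separates what the archive supplies, $T_{\mathrm{ob}} + C_H$ and $T_N$, from the four error terms that must be estimated. The Python sketch below illustrates that split under stated assumptions: the central estimate uses only the archived quantities, and a hypothetical Monte Carlo adds one zero-mean Gaussian draw per error term. The function names, the example station values, the independence and Gaussian shape of the terms, and the 0.1°C sizes for $\epsilon_{\mathrm{RC}}$ and $\epsilon_N$ are all invented for illustration; only the 0.03°C and 0.4°C figures come from paragraphs [18] and [23].

```python
import math
import random
import statistics

def best_estimate_anomaly(t_ob_plus_ch: float, t_n: float) -> float:
    """Best estimate of A_actual from the archived quantities (T_ob + C_H) and T_N.

    The error terms eps_ob, eps_H, eps_RC and eps_N are taken to have zero mean,
    so they drop out of the central estimate.
    """
    return t_ob_plus_ch - t_n

def sample_actual_anomalies(t_ob_plus_ch, t_n, sigmas, n=20_000, seed=1):
    """Draw plausible A_actual values by adding one Gaussian draw per error term.

    ASSUMPTION (not stated by equation (2)): the error terms are independent
    and Gaussian with the given 1-sigma sizes in degC.
    """
    rng = random.Random(seed)
    base = best_estimate_anomaly(t_ob_plus_ch, t_n)
    return [base + sum(rng.gauss(0.0, s) for s in sigmas.values()) for _ in range(n)]

# Hypothetical station-month: archived value 3.1 degC, station normal 2.4 degC.
sigmas = {
    "eps_ob": 0.03,  # measurement error, paragraph [18]
    "eps_H": 0.4,    # homogenization adjustment error, paragraph [23]
    "eps_RC": 0.1,   # HYPOTHETICAL placeholder size
    "eps_N": 0.1,    # HYPOTHETICAL placeholder size
}
draws = sample_actual_anomalies(3.1, 2.4, sigmas)
print(round(best_estimate_anomaly(3.1, 2.4), 2))                 # 0.7
print(round(statistics.pstdev(draws), 2))                        # Monte Carlo spread
print(round(math.sqrt(sum(s * s for s in sigmas.values())), 2))  # root-sum-square
```

Under the independence assumption the Monte Carlo spread should agree with the root-sum-square of the four sizes; correlated terms would break that agreement, which is one reason the error model has to state precisely what is being estimated.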

2.3.1.1. Measurement Error ($\epsilon_{\mathrm{ob}}$)

[18] The random error in a single thermometer reading is about 0.2°C (1σ) [Folland et al., 2001]; the monthly average will be based on at least two readings a day throughout the month, giving 60 or more values contributing to the mean. So the error in the monthly average will be at most $0.2/\sqrt{60} \approx 0.03$°C, and this will be uncorrelated with the value for any other station or the value for any other month.
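The arithmetic in this paragraph is the standard error of a mean of independent readings, $\sigma/\sqrt{n}$. A minimal Python check follows, assuming the 60 readings are independent with equal 1σ error; the Monte Carlo part, its seed and the function name are illustrative additions, not part of the paper.

```python
import math
import random
import statistics

READING_SIGMA = 0.2   # 1-sigma error of one thermometer reading, degC (from the text)
N_READINGS = 60       # at least two readings a day through a 30-day month

def monthly_mean_error(sigma: float, n: int) -> float:
    """Standard error of the mean of n independent readings with equal error sigma."""
    return sigma / math.sqrt(n)

print(round(monthly_mean_error(READING_SIGMA, N_READINGS), 3))  # 0.026, quoted as 0.03 degC

# Monte Carlo check: simulate many months, each the mean of 60 noisy readings.
rng = random.Random(42)
month_errors = [
    statistics.fmean(rng.gauss(0.0, READING_SIGMA) for _ in range(N_READINGS))
    for _ in range(5000)
]
print(round(statistics.pstdev(month_errors), 3))  # close to the analytic value
```

Because $1/\sqrt{n}$ shrinks as $n$ grows, stations taking more than two readings a day have smaller errors still, which is why 0.03°C is an upper bound.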

Figure 2. CRUTEM3 anomalies (°C) for January 1969 (global, 5° × 5°).

Figure 3. CRUTEM3 anomalies (°C) for January 1969 (North America, HadGEM1 model grid (1.875° × 1.25°)).


[19] There will be a difference between the true mean monthly temperature (i.e., from 1 min averages) and the average calculated by each station from measurements made less often, but this difference will also be present in the station normal and will cancel in the anomaly. So this does not contribute to the measurement error. If a station changes the way its mean monthly temperature is calculated, it will produce an inhomogeneity in the station temperature series, and uncertainties due to such changes will form part of the homogenization adjustment error.
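The cancellation argument can be checked with a toy series. In this Python sketch, assuming a hypothetical six-month record whose normal is taken over its first four months, a station whose averaging method always reads 0.3°C warm produces exactly the same anomalies as the true series, whereas a method change part-way through leaves a 0.3°C step in the later anomalies: the inhomogeneity described above. The series, the offset, the normal period and the helper name are all invented.

```python
import statistics

def anomalies(series, normal_period):
    """Subtract the mean over `normal_period` (a slice) from every value.

    Results are rounded to 6 decimals only to suppress floating-point noise.
    """
    normal = statistics.fmean(series[normal_period])
    return [round(v - normal, 6) for v in series]

true_means = [1.0, 2.0, 3.0, 2.0, 1.5, 2.5]   # hypothetical true monthly means, degC
normal_period = slice(0, 4)                   # hypothetical normal period

# Case 1: the station's averaging method reads 0.3 degC warm throughout.
# The same offset is in the normal, so it cancels in the anomalies.
steady = [t + 0.3 for t in true_means]
print(anomalies(steady, normal_period) == anomalies(true_means, normal_period))  # True

# Case 2: the method changes after month 4, so only later months carry the offset.
changed = [t + (0.3 if i >= 4 else 0.0) for i, t in enumerate(true_means)]
diff = [round(a - b, 6) for a, b in zip(anomalies(changed, normal_period),
                                        anomalies(true_means, normal_period))]
print(diff)  # [0.0, 0.0, 0.0, 0.0, 0.3, 0.3]: a step, i.e. an inhomogeneity
```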

2.3.1.2. Homogenization Adjustment Error ($\epsilon_H$)

[20] Inhomogeneities are introduced into the station temperature series by such things as changes in the station site, changes in measurement time, or changes in instrumentation. The station data that are used to make HadCRUT have been adjusted to remove these inhomogeneities, but such adjustments are not exact; there are uncertainties associated with them.

[21] For some stations both the adjusted and unadjusted time series are archived at CRU, so the adjustments that have been made are known [Jones et al., 1985, 1986; Vincent and Gullet, 1999], but for most stations only a single series is archived, so any adjustments that might have been made (e.g., by National Met. Services or individual scientists) are unknown.

[22] A histogram of the adjustments applied (where these are known) gives the solid line in Figure 4. Inhomogeneities will come in all sizes, but large inhomogeneities are more likely to be found and adjusted than small ones. So the distribution of adjustments is bimodal, and can be interpreted as a bell-shaped distribution with most of the central, small values missing.
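The selection effect that hollows out the centre of the histogram can be reproduced with synthetic data. This Python sketch draws the adjustments actually needed from a Gaussian with the 0.75°C width hypothesized in paragraph [23], then keeps each one with a probability that rises with its size. The detection function, its 0.5°C scale, the sample size and the seed are hypothetical; they illustrate the mechanism and are not fitted to the real 763 adjustments.

```python
import math
import random

SIGMA_REQUIRED = 0.75   # degC, Gaussian width of the adjustments actually needed ([23])
DETECT_SCALE = 0.5      # degC, HYPOTHETICAL size at which a jump becomes likely to be found

def detection_probability(size: float) -> float:
    """Hypothetical detection model: near zero for tiny jumps, near one for large ones."""
    return 1.0 - math.exp(-(size / DETECT_SCALE) ** 2)

def histogram(values, lo=-2.0, hi=2.0, width=0.25):
    """Count values in equal-width bins on [lo, hi); values outside are ignored."""
    nbins = round((hi - lo) / width)
    counts = [0] * nbins
    for v in values:
        i = math.floor((v - lo) / width)
        if 0 <= i < nbins:
            counts[i] += 1
    return counts

rng = random.Random(0)
required = [rng.gauss(0.0, SIGMA_REQUIRED) for _ in range(100_000)]
detected = [x for x in required if rng.random() < detection_probability(x)]

# Text histogram of the detected adjustments: one '#' per 500 adjustments.
counts = histogram(detected)
for i, c in enumerate(counts):
    print(f"{-2.0 + 0.25 * i:+.2f} {'#' * (c // 500)}")
```

The printed bars show two humps either side of a depleted centre: the detected sample is bimodal even though every adjustment was drawn from a single bell-shaped distribution.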

[23] Hypothesizing that the distribution of adjustments required is Gaussian, with a standard deviation of 0.75°C, gives the dashed line in Figure 4, which matches the number of adjustments made where the adjustments are large but suggests a large number of missing small adjustments. The homogenization uncertainty is then given by this missing component (dotted line in Figure 4), which has a standard deviation of 0.4°C. This uncertainty applies to both adjusted and unadjusted data: the former have an uncertainty on the adjustments made, and the latter may require undetected adjustments.
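The subtraction described in this paragraph can be sketched in Python: bin the known adjustments, scale a 0.75°C Gaussian so that it matches them where adjustments are large, subtract, and take the standard deviation of what is left. Because the real adjustment catalogue is not reproduced here, the "known" adjustments are synthetic, generated by keeping Gaussian draws with a hypothetical size-dependent detection probability (scale 0.5°C); the 1.2°C threshold for "large", the bin width and the seed are also assumptions. The printed standard deviation therefore reflects these invented settings (it should come out near 0.3°C) and is not a reproduction of the paper's 0.4°C.

```python
import math
import random

SIGMA_REQUIRED = 0.75   # degC, hypothesized Gaussian width of the required adjustments
DETECT_SCALE = 0.5      # degC, HYPOTHETICAL detection model used to fake "known" data
LARGE = 1.2             # degC, HYPOTHETICAL |adjustment| above which known ~ required
LO, WIDTH, NBINS = -3.0, 0.1, 60

def gaussian_pdf(x: float, sigma: float) -> float:
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# Synthetic stand-in for the catalogue of known adjustments.
rng = random.Random(1)
known = []
for _ in range(200_000):
    x = rng.gauss(0.0, SIGMA_REQUIRED)
    if rng.random() < 1.0 - math.exp(-(x / DETECT_SCALE) ** 2):
        known.append(x)

centres = [LO + WIDTH * (i + 0.5) for i in range(NBINS)]
known_counts = [0] * NBINS
for x in known:
    i = math.floor((x - LO) / WIDTH)
    if 0 <= i < NBINS:
        known_counts[i] += 1

# Scale the hypothesized Gaussian so it matches the known counts for large adjustments.
shape = [WIDTH * gaussian_pdf(c, SIGMA_REQUIRED) for c in centres]
tail = [i for i, c in enumerate(centres) if abs(c) > LARGE]
scale = sum(known_counts[i] for i in tail) / sum(shape[i] for i in tail)
required_counts = [scale * s for s in shape]

# Missing component = required minus known. Bins may dip slightly below zero through
# sampling noise; they are kept so the noise averages out instead of biasing the spread.
missing = [r - k for r, k in zip(required_counts, known_counts)]
total = sum(missing)
mean = sum(c * m for c, m in zip(centres, missing)) / total
sd = math.sqrt(sum((c - mean) ** 2 * m for c, m in zip(centres, missing)) / total)
print(f"missing component: {total:.0f} adjustments, standard deviation {sd:.2f} degC")
```

The design mirrors the paper's reasoning rather than its data: the large adjustments anchor the scale of the hypothesized distribution, and everything the detection process failed to find becomes the error distribution.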

[24] The distribution of known adjustments is not symmetric; adjustments are more likely to be negative than positive. The most common reason for a station needing adjustment is a site move in the 1940–1960 period. The earlier site tends to have been warmer than the later one, as the move is often to an out-of-town airport. So the adjustments are mainly negative, because the earlier record (in the town or city) needs to be reduced [Jones et al., 1985, 1986]. Although a real effect, this asymmetry is small compared with the typical adjustment and is difficult to quantify, so the homogenization adjustment uncertainties are treated as being symmetric about zero.

[25] The homogenization adjustment applied to a station is usually constant over long periods: the mean time over which an adjustment is applied is nearly 40 years [Jones et al., 1985, 1986; Vincent and Gullet, 1999]. The error in each adjustment will therefore be constant over the same period. This means that the adjustment uncertainty is highly correlated in time: the adjustment uncertainty on a station value will be the same for a decadal average as for an individual monthly value.
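The practical consequence of this temporal correlation shows up when values are averaged. Assuming equal 1σ errors and a uniform pairwise correlation ρ, the error of an n-month mean is $\sigma\sqrt{(1 + (n-1)\rho)/n}$. This Python sketch contrasts the uncorrelated 0.03°C measurement error of paragraph [18] (ρ = 0) with the 0.4°C adjustment error of paragraph [23], idealized as perfectly correlated (ρ = 1) within a single adjustment period; the function name is hypothetical.

```python
import math

MEASUREMENT_SIGMA = 0.03   # degC per monthly value, uncorrelated month to month ([18])
ADJUSTMENT_SIGMA = 0.4     # degC, homogenization adjustment error ([23])

def error_of_average(sigma: float, n_months: int, correlation: float) -> float:
    """1-sigma error of the mean of n equally uncertain values.

    Assumes every pair of values has the same correlation coefficient.
    """
    variance = sigma ** 2 * (1.0 + (n_months - 1) * correlation) / n_months
    return math.sqrt(variance)

for n in (1, 12, 120):
    meas = error_of_average(MEASUREMENT_SIGMA, n, correlation=0.0)
    adj = error_of_average(ADJUSTMENT_SIGMA, n, correlation=1.0)
    print(f"{n:4d} months: measurement {meas:.4f} degC, adjustment {adj:.2f} degC")
```

For a decadal (120-month) mean the measurement term falls to about 0.003°C, while the adjustment term stays at 0.4°C, so averaging in time does nothing to reduce the larger of the two.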

Figure 4. Distribution of station homogeneity adjustments (°C). The solid line is the distribution of the adjustments known to have been made (763 adjustments from Jones et al. [1985, 1986] and Vincent and Gullet [1999]), the dashed line is a hypothesized distribution of the adjustments required, and the dotted line is the difference and so the distribution of homogeneity adjustment error.

