The U.S. National Temperature Index, is it based on data? Or corrections?

By Andy May

The United States has a very dense population of weather stations; data from them is collected and processed by NOAA/NCEI to compute the National Temperature Index. The index is an average temperature for the nation and is used to show if the U.S. is warming. The data is stored by NOAA/NCEI in their GHCN or “Global Historical Climatology Network” database. GHCN-Daily contains the quality-controlled raw data, which is subsequently corrected and then used to populate GHCN-Monthly, a database of monthly averages, both raw and final. I downloaded version 4.0.1 of the GHCN-Monthly database on October 10, 2020. At that time, it had 27,519 stations globally and 12,514 (45%) of them were in the United States, including Alaska and Hawaii. Of the 12,514 U.S. stations, 11,969 of them are in “CONUS,” the conterminous lower 48 states. The current station coverage is shown in Figure 1.

Figure 1. The GHCN weather station coverage in the United States is very good, except for northern Alaska. There are two stations in the western Pacific that are not shown.

We have several questions about the land-based temperature record, which dominates the long-term (~170-year) global surface temperature record. The land-based measurements dominate because sea-surface temperatures are very sparse until around 2004 to 2007, when the ARGO network of floats became complete enough to provide good data. Even in 2007, the sea-surface gridding error was larger than the detected ocean warming.

Ocean Warming

We have estimated that the oceans, which cover 71% of the Earth’s surface, are warming at a rate of 0.4°C per century, based on the least squares linear trend shown in Figure 2. This is a very rough estimate and based only on data from 2004 to 2019 and temperatures from the upper 2,000 meters of the oceans. The data before 2004 is so sparse we didn’t want to use it. The error in this estimate is roughly ±0.26°C, from the surface to 2,000 meters and unknown below that.
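As a minimal sketch of how a least-squares trend of this kind is expressed as a rate per century, the Python below fits a straight line to yearly anomalies. The function name `trend_per_century` is an illustrative assumption, and the input is a synthetic, perfectly linear series, not the JAMSTEC data.

```python
def trend_per_century(years, anomalies):
    """Ordinary least-squares slope of anomalies against years, in degrees C per century."""
    n = len(years)
    if n < 2 or n != len(anomalies):
        raise ValueError("need at least two paired values")
    mean_x = sum(years) / n
    mean_y = sum(anomalies) / n
    # Slope = covariance(x, y) / variance(x)
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, anomalies))
    return 100.0 * sxy / sxx  # degrees per year -> degrees per century


# Synthetic series for illustration only (assumed values, not measurements):
# 0.004 degrees C per year over 2004-2019.
years = list(range(2004, 2020))
anomalies = [0.004 * (y - 2011.5) for y in years]
print(round(trend_per_century(years, anomalies), 3))
```

With only 16 yearly values, a slope computed this way is sensitive to the end points, which is one reason the estimate above is described as very rough.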

Argo measurements of ocean temperature at 2,000 meters are a fairly constant 2.4°C. So, we assumed a temperature of 0.8°C at the average ocean depth of 3,688 meters (12,100 feet) and below. For context, the freezing point of seawater at 2900 PSI (roughly 2,000 meters or 2,000 decibars) is -17°C. The value of 0.8°C is from deep Argo data as described by Gregory Johnson and colleagues (Johnson, Purkey, Zilberman, & Roemmich, 2019). There are very few measurements of deep ocean temperatures and any estimate has considerable possible error (Gasparin, Hamon, Remy, & Traon, 2020). The anomalies in Figure 2 are based on those assumptions. The calculated temperatures were converted to anomalies from the mean of the ocean temperatures from 2004 through 2019. The data used to make Figure 2 is from JAMSTEC. An R program to read the JAMSTEC data and plot it can be downloaded here; the zip file also contains a spreadsheet with more details. Our calculations suggest an overall average 2004-2019 ocean temperature of 4.6°C.

Figure 2. A plot of the global grid of ocean temperatures from JAMSTEC. It is built mostly from ARGO float and Triton buoy data. JAMSTEC is the source of the grid used to compute these anomalies.

Observed ocean warming is not at all alarming and quite linear, showing no sign of acceleration. The oceans contain 99.9% of the thermal energy (“heat”) on the surface of the Earth; the atmosphere contains most of the rest. This makes it hard for Earth’s surface to warm very much, since the oceans act as a thermal regulator. Various calculations and constants regarding the heat stored in the oceans and atmosphere are in a spreadsheet I’ve prepared here. References are in the spreadsheet. The oceans control warming with their high heat capacity, which is the amount of thermal energy required to raise the average ocean temperature one degree. The thermal energy required to raise the temperature of the atmosphere 1,000 degrees C would only raise the average ocean temperature one degree.
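The 1,000-to-1 comparison can be checked with a rough back-of-the-envelope calculation. The Python sketch below uses rounded textbook values for the masses and specific heats; these constants are assumptions for illustration and are not taken from the spreadsheet mentioned above.

```python
# Rounded textbook values (assumptions, not the spreadsheet's numbers).
OCEAN_MASS_KG = 1.4e21          # approximate total mass of the oceans
SEAWATER_CP = 3990.0            # J/(kg K), approximate specific heat of seawater
ATMOSPHERE_MASS_KG = 5.15e18    # approximate total mass of the atmosphere
AIR_CP = 1004.0                 # J/(kg K), specific heat of dry air at constant pressure

# Heat capacity = mass * specific heat: joules needed for a one-degree rise.
ocean_heat_capacity = OCEAN_MASS_KG * SEAWATER_CP
atmosphere_heat_capacity = ATMOSPHERE_MASS_KG * AIR_CP

ratio = ocean_heat_capacity / atmosphere_heat_capacity


def ocean_rise_from_atmosphere_heat(delta_t_atmosphere):
    """Ocean temperature rise (degrees C) if the energy that would warm the
    whole atmosphere by delta_t_atmosphere went into the whole ocean instead."""
    return delta_t_atmosphere / ratio


print(f"ocean/atmosphere heat capacity ratio: {ratio:.0f}")
print(f"ocean rise for 1,000 C of atmospheric heat: {ocean_rise_from_atmosphere_heat(1000.0):.2f} C")
```

With these inputs the ratio comes out at roughly a thousand, consistent with the statement above. It assumes the heat is spread evenly through the full depth of the ocean.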

I only mention this because, while the land-based weather stations provide us with valuable information regarding the weather, they tell us very little about climate change. Longer term changes in climate require much more information than we currently have on ocean warming. That said, let us examine the GHCN data collected in the United States.

The GHCN station data
In the U.S., and in the rest of the world, the land-based weather stations comprise most of the average temperature record in the 19th and 20th centuries. It is important to know how accurate they are, and how large the applied corrections are relative to the observed warming. Lots of work has been done to document problems with the land-based data. Anthony Watts and colleagues documented numerous problems with station siting and equipment in 2011 with their surface stations project. Important information on this study by John Nielsen-Gammon can be seen here and here. The Journal of Geophysical Research paper is here. Many of the radical changes in NOAA’s U.S. temperature index and in the underlying database in the period between 2009 and 2014 are due to the work done by Watts and his colleagues as described by NOAA’s Matthew Menne in his introductory paper on version 2 of the U.S. Historical Climatology Network (USHCN):

“Moreover, there is evidence that a large fraction of HCN sites have poor ratings with respect to the site classification criteria used by the U.S. Climate Reference Network (A. Watts 2008 personal communication; refer also to” (Menne, Williams, & Vose, 2009)

Menne, et al. acknowledged Watts and colleagues in their introductory paper to the revised USHCN network of stations, which suggests that the surface stations project was an important reason for the revision. USHCN was a high-quality subset of the full NOAA Cooperative Observer program (COOP) weather station network. The USHCN stations were chosen based upon their spatial coverage, record length, data completeness and historical stability, according to Matthew Menne. A set of quality control checks and corrections were developed to clean up the selected records and these are described in Matthew Menne and colleagues’ publications. The main paper is cited above in the boxed quote, but he also wrote a paper to describe their Pairwise Homogenization algorithm, abbreviated “PHA” (Menne & Williams, 2009a). Stations with problems were removed from USHCN as they were found and documented by Watts, et al. As a result, the original 1,218 USHCN stations dwindled to ~832 by 2020. The dismantled stations were not replaced; the values were “infilled” statistically using data from neighboring stations.

In early 2014, the USHCN subset was abandoned as the source data for the National Temperature Index and replaced with a gridded instance of GHCN, but the corrections developed for USHCN were kept. They were just applied to all 12,514 U.S. GHCN stations, rather than the smaller 1,218 station (or fewer) USHCN subset.

NOAA appears to contradict this in another web page on GHCN-Daily methods. On this page they say that GHCN-Daily does not contain adjustments for historical station changes or time-of-day bias. But they note that GHCN-Monthly does. Thus, it seems that the corrections are done after extracting the daily data and while building the monthly dataset. NOAA does not tamper with the GHCN-Daily raw data, but when they extract it to build GHCN-Monthly, they apply some dramatic corrections, as we will see. Some NOAA web pages hint that the time-of-day bias corrections have been dropped for later releases of GHCN-Monthly, but most explicitly say they are still being used, so we assume they are still in use. One of the most worrying findings was how often, and how radically, NOAA appears to be changing their “correction” procedures.

The evolving U.S. Temperature Index
The current U.S. “National Temperature Index” draws data from five-kilometer grids of the GHCN-Monthly dataset. The monthly gridded dataset is called nClimGrid, and is a set of map grids, not actual station data. The grids are constructed using “climatologically aided interpolation” (Willmott & Robeson, 1995). The grids are used to populate a monthly average temperature dataset, called nClimDiv. nClimDiv is used to create the index.

Currently, the NOAA base period for nClimDiv, USHCN, and USCRN anomalies is 1981-2010. We constructed our station anomalies, graphed below, using the same base period. We accepted all stations that had at least 12 monthly values during the base period and rejected stations with fewer. This reduced the number of CONUS stations from 11,969 to 9,307. No stations were interpolated or “infilled” in this study.
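The screening rule and anomaly calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the program used for this post: the function name `station_anomalies`, the record layout, and the choice of a separate baseline for each calendar month are all assumptions.

```python
from collections import defaultdict

BASE_START, BASE_END = 1981, 2010   # base period stated in the text
MIN_BASE_MONTHS = 12                # minimum monthly values required in the base period


def station_anomalies(records):
    """records: list of (year, month, temp_c) tuples for ONE station.

    Returns a list of (year, month, anomaly) tuples, or None if the station
    has fewer than MIN_BASE_MONTHS values in the base period (rejected).
    """
    base_values = defaultdict(list)
    for year, month, temp in records:
        if BASE_START <= year <= BASE_END:
            base_values[month].append(temp)

    if sum(len(v) for v in base_values.values()) < MIN_BASE_MONTHS:
        return None  # too little base-period data; no infilling is attempted

    # Assumption: one baseline mean per calendar month, to remove the seasonal cycle.
    baseline = {m: sum(v) / len(v) for m, v in base_values.items()}
    # Months with no base-period value for that calendar month are skipped.
    return [(y, m, t - baseline[m]) for y, m, t in records if m in baseline]


# Hypothetical stations for illustration only.
good = [(1990, m, 10.0) for m in range(1, 13)] + [(2015, 1, 11.5)]
sparse = [(1990, m, 10.0) for m in range(1, 6)]

print(station_anomalies(good)[-1])   # anomaly for January 2015
print(station_anomalies(sparse))     # rejected station
```

A station is either kept with its own measurements or dropped entirely; nothing is borrowed from neighboring stations.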

Some sources have suggested data outside the GHCN-Daily dataset might be used to help build the nClimDiv monthly grids and temperature index, especially some nearby Canadian and Mexican monthly averages. But NOAA/NCEI barely mention this on their website. nClimDiv contains climate data, including precipitation, and a drought index, as well as average monthly temperature. As mentioned above, the same corrections are made to the GHCN station data as were used in the older USHCN dataset. From the NOAA website:

“The first (and most straightforward) improvement to the nClimDiv dataset involves updating the underlying network of stations, which now includes additional station records and contemporary bias adjustments (i.e., those used in the U.S. Historical Climatology Network version 2)” source of quote: here.

Besides the new fully corrected GHCN-Monthly dataset and the smaller USHCN set of corrected station data, there used to be a third dataset, the original NOAA climate divisional dataset. Like GHCN-Daily and nClimDiv, this older database used the entire COOP network of stations. However, the COOP data used in the older Climate Division dataset (called “TCDD” in Fenimore, et al.) was uncorrected. This is explained in a white paper by Chris Fenimore and colleagues (Fenimore, Arndt, Gleason, & Heim, 2011). Further, the data in the older dataset was simply averaged by climate division and state; unlike nClimDiv and USHCN, it was not gridded. There are some new stations in nClimDiv, but most are the same as in TCDD. The major differences between the two datasets are the corrections and the gridding. Data from this earlier database is plotted as a blue line in Figures 6 and 7 below.

The simple averages used to summarize TCDD ignored changes in elevation, station moves and other factors that introduced spurious internal trends (discontinuities) in many areas. The newer nClimDiv monthly database team claims to explicitly account for station density and elevation with their “climatologically aided interpolation” gridding method (Fenimore, Arndt, Gleason, & Heim, 2011). The methodology produces the fully corrected and gridded nClimGrid five-kilometer grid dataset.
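To see why station density matters, the Python sketch below compares a simple station average with an average of grid-cell means. It uses plain binning into latitude-longitude cells, which is a much cruder technique than NOAA's climatologically aided interpolation and is not their method. The function names, the one-degree cell size, and the station values are hypothetical.

```python
import math
from collections import defaultdict


def simple_mean(stations):
    """Average every station value equally, regardless of location."""
    return sum(value for _, _, value in stations) / len(stations)


def gridded_mean(stations, cell_deg=1.0):
    """Average stations within each lat/lon cell first, then average the cells.

    No area weighting by latitude and no elevation adjustment is applied.
    """
    cells = defaultdict(list)
    for lat, lon, value in stations:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].append(value)
    cell_means = [sum(v) / len(v) for v in cells.values()]
    return sum(cell_means) / len(cell_means)


# Hypothetical anomalies (lat, lon, degrees C): three stations clustered in one
# cell and a single station in another cell.
stations = [
    (40.1, -105.2, 1.0),
    (40.2, -105.3, 1.0),
    (40.3, -105.4, 1.0),
    (35.5, -98.5, 0.0),
]

print(simple_mean(stations))    # the cluster counts three times
print(gridded_mean(stations))   # each occupied cell counts once
```

In this example the simple mean is 0.75 and the gridded mean is 0.5, because the three clustered stations are collapsed into one cell before averaging.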

nClimDiv is more useful since the gradients within the United States in temperature, precipitation and drought are more accurate and contain fewer discontinuities. But, as we explained in previous posts, when nClimDiv is reduced to a yearly conterminous U.S. (CONUS) temperature record, it is very similar to the record created by the older, official temperature record called USHCN, when both are gridded the same way. This may be because, while nClimDiv has many more weather stations, the same corrections are applied to them as were applied to the USHCN stations. While USHCN has fewer stations, they are of higher quality and have longer records. The additional nClimDiv stations, when processed the same way as the USHCN stations, do not change things, at least on a national and yearly level. As noted in a previous post, stirring the manure faster, with more powerful computers and billions of dollars, doesn’t really matter for widespread averages.