Updated 08/14/2014

*****************************************
INTRODUCTION
*****************************************

This directory contains the consolidated master monthly database, which is
derived from stage 2. The stage 3 dataset contains the recommended version of
the merge, along with variants that help characterize its uncertainty. A
description of the methodology, along with a description of each variant, can
be found in this directory (merging_methodology.pdf). A manuscript has been
accepted by an open-access, peer-reviewed journal and will be provided once it
is published.

Within these directories, data and code have been packaged as gzip-compressed
tar archives (.tar.gz). To decompress and extract an archive, the following
Linux command will work:

   tar -zxvf name-of-file.tar.gz

Alternatively, if your version of "tar" does not support decompression, first
decompress with gzip and then extract the resulting .tar file:

   gzip -d name-of-file.tar.gz
   tar -xvf name-of-file.tar

On Windows, programs such as 7-Zip can decompress these files
(http://www.7-zip.org/).

*****************************************
CONTENTS OF EACH VARIANT'S DIRECTORY
*****************************************

code/

   This directory contains all the code and input files that were used to
   perform the merge. The code was written in FORTRAN 95 and should be
   compatible with any Fortran 95 compiler. A free compiler can be found here:
   http://www.g95.org/downloads.shtml

   Using g95, the code can be compiled with the following command:

      g95 merge_module.f95 merge_main.f95

   There are four input files that must be present in the working directory,
   or else the program will fail. More information about these files can be
   found in the manuscript.
      config_file.txt:        configuration file in which the user must fill in the required information
      databank_blacklist.txt: list of candidate stations with known metadata and data issues
      databank_sources.txt:   list of all input sources, in priority order
      lookup_IA.txt:          CDF lookup table for the Index of Agreement

plots/

   This directory contains figures that display results for both the merged
   dataset and the withheld bin. Each file is described below:

      merged_locations:          Spatial distribution of all stations in the merged product, stratified by period of record
      merged_stations:           Number of stations over time, compared to GHCNM-v3
      merged_stations-USvsNONUS: Number of stations inside and outside the US, compared to GHCNM-v3
      merged_histogram:          Number of stations stratified by record length, compared to GHCNM-v3
      merged_gridboxes-GLOBE:    Number of global 5 deg x 5 deg gridboxes sampled, compared to GHCNM-v3
      merged_gridboxes-NH:       Number of Northern Hemisphere 5 deg x 5 deg gridboxes sampled, compared to GHCNM-v3
      merged_gridboxes-SH:       Number of Southern Hemisphere 5 deg x 5 deg gridboxes sampled, compared to GHCNM-v3
      merged_YYYY-YYYY.pdf:      Spatial distribution of stations during the period YYYY-YYYY
      merged_anomaly:            Annual anomaly of the merged dataset (base period 1961-1990), compared to GHCNM-v3
      merged_anomaly-DJF:        Dec-Jan-Feb anomaly of the merged dataset (base period 1961-1990), compared to GHCNM-v3
      merged_anomaly-MAM:        Mar-Apr-May anomaly of the merged dataset (base period 1961-1990), compared to GHCNM-v3
      merged_anomaly-JJA:        Jun-Jul-Aug anomaly of the merged dataset (base period 1961-1990), compared to GHCNM-v3
      merged_anomaly-SON:        Sep-Oct-Nov anomaly of the merged dataset (base period 1961-1990), compared to GHCNM-v3
      withheld_locations:        Spatial distribution of all stations in the withheld bin, stratified by period of record
      withheld_flags_XXX:        Spatial distribution of stations withheld under flag XXX (flags 102-107)

results/

   This directory provides ASCII text and binary files of both the
   merged and withheld products. Results are provided in three formats. The
   first is the format defined by the International Surface Temperature
   Initiative (ISTI), which consists of one inventory file plus one data file
   for each station. The second follows the standard GHCN-Monthly version 3
   format, with a single inventory file and a single data file. Finally,
   netCDF files are provided that should be compliant with the Climate and
   Forecast (CF) Metadata Conventions, version 1.6.

   Metadata files generated during the merge are also provided in this
   directory.

The databank follows the version control protocols set out for GHCNM-v3,
namely that the formal designation is

   variant.monthly.stage3.vX.Y.Z[optionally -betan].yyyymmdd

where

   X    = major upgrades of unspecified nature, always accompanied by a
          peer-reviewed manuscript
   Y    = substantial modifications to the databank, including a new set of
          stations or substantive changes to the merge algorithms; accompanied
          by a technical note published on the FTP site
   Z    = minor revisions to both data and processing software, tracked in
          "CHANGELOG_DATABANK"
   yyyy = year in which the update to the databank occurred
   mm   = month in which the update to the databank occurred
   dd   = day on which the update to the databank occurred

ISTI METADATA FORMAT (INVENTORY_monthly_merged_stage3 and INVENTORY_monthly_withheld_stage3)

   Variable      Columns    Type
   --------      -------    ---------
   ID            1-12       Integer
   NAME          14-43      Character
   COUNTRY       45-64      Character
   LATITUDE      66-75      Real
   LONGITUDE     77-86      Real
   ELEVATION     88-95      Real
   START_TMAX    97-100     Integer
   END_TMAX      102-105    Integer
   START_TMIN    107-110    Integer
   END_TMIN      112-115    Integer
   START_TAVG    117-120    Integer
   END_TAVG      122-125    Integer
   ID2           127-146    Character
   EXTRA_INFO    148-158    Character

   Variable Definitions:

   ID: station identifier.
      The type of identifier depends on the file:
         INVENTORY_monthly_merged_stage3 (Recommended Merge): 11-digit station ID, structured similarly to GHCN-Daily
         INVENTORY_monthly_merged_stage3 (Other Variants):    8-digit key produced by the merge
         INVENTORY_monthly_withheld_stage3 (All):             8-digit key produced by the merge
   NAME: station name. Blank = missing.
   COUNTRY: country of origin. Blank = missing.
   LATITUDE: latitude of the station in decimal degrees. -999.9000 = missing.
   LONGITUDE: longitude of the station in decimal degrees. -999.9000 = missing.
   ELEVATION: station elevation in meters. -999.90 = missing.
   START_TMAX: year in which the TMAX data record begins. 9999 = missing. (INVENTORY_monthly_merged_stage3 ONLY)
   END_TMAX: year in which the TMAX data record ends. 9999 = missing. (INVENTORY_monthly_merged_stage3 ONLY)
   START_TMIN: year in which the TMIN data record begins. 9999 = missing. (INVENTORY_monthly_merged_stage3 ONLY)
   END_TMIN: year in which the TMIN data record ends. 9999 = missing. (INVENTORY_monthly_merged_stage3 ONLY)
   START_TAVG: year in which the TAVG data record begins. 9999 = missing. (INVENTORY_monthly_merged_stage3 ONLY)
   END_TAVG: year in which the TAVG data record ends. 9999 = missing. (INVENTORY_monthly_merged_stage3 ONLY)
   ID2: 19-character composite station identifier. The first 2 digits give the
      position in the prioritized source list, followed by an underscore. The
      remaining alphanumeric characters are the ID provided by the stage 2
      source; if the source provided no ID, an incremental number was assigned.
   EXTRA_INFO: extra information, depending on the file:
      INVENTORY_monthly_merged_stage3 (Recommended Merge): 8-digit key produced by the merge, with "REC" appended
      INVENTORY_monthly_merged_stage3 (Other Variants):    blank
      INVENTORY_monthly_withheld_stage3 (All):             flags indicating why the station was withheld
         101 = Missing metadata
         102 = Poor metadata
         103 = No data comparison made; best station does not reach the second metadata threshold
         104 = Data comparison made; metrics insufficient to merge, or unique station
         105 = Merged station has fewer than 12 months of data
         106 = Metadata metric >= 0.90, but data comparisons were so poor that the station became unique
         107 = Blacklist could not resolve candidate station; automatically withheld

ISTI DATA FORMAT (merged and withheld files)

   Variable          Columns    Type
   --------          -------    ---------
   NAME              1-30       Character
   LATITUDE          32-41      Real
   LONGITUDE         43-52      Real
   ELEVATION         54-61      Real
   YEAR              63-66      Integer
   MONTH             67-68      Integer
   DAY               69-70      Integer
   VALUE_TMAX        72-76      Integer
   VALUE_TMIN        78-82      Integer
   VALUE_TAVG        84-88      Integer
   FLAG_STAGE0       90-92      Integer
   FLAG_STAGE1       94-96      Integer
   FLAG_TYPE         98-100     Integer
   FLAG_MOD          102-104    Integer
   FLAG_MODC_TMAX    106-108    Integer
   FLAG_MODC_TMIN    110-112    Integer
   FLAG_MODC_TAVG    114-116    Integer
   FLAG_MOMC_TMAX    118-120    Integer
   FLAG_MOMC_TMIN    122-124    Integer
   FLAG_MOMC_TAVG    126-128    Integer
   FLAG_MOT          130-132    Integer
   TMAX_SRCFLAG      134-141    Character
   TMIN_SRCFLAG      143-150    Character
   TAVG_SRCFLAG      152-159    Character

   Variable Definitions:

   NAME: station name
   LATITUDE: latitude of the station in decimal degrees. -999.9 = missing.
   LONGITUDE: longitude of the station in decimal degrees. -999.9 = missing.
   ELEVATION: station elevation in meters. -999.9 = missing.
   YEAR: 4-digit year of the station record
   MONTH: 2-digit month of the station record
   DAY: 2-digit day of the station record (XX for monthly data)
   VALUE_TMAX: maximum temperature value (missing = -9999). Values are in
      hundredths of a degree Celsius, expressed as whole integers (i.e.,
      divide by 100.0 to obtain degrees Celsius).
   VALUE_TMIN: minimum temperature value (missing = -9999). Values are in
      hundredths of a degree Celsius, expressed as whole integers (i.e.,
      divide by 100.0 to obtain degrees Celsius).
   VALUE_TAVG: average temperature value (missing = -9999). Values are in
      hundredths of a degree Celsius, expressed as whole integers (i.e.,
      divide by 100.0 to obtain degrees Celsius).
   FLAG_STAGE0: data source for Stage 0 files (originates from Stage 2 data)
      101: Paper, NCDC
      102: Paper, JMA
      103: Paper, Australian BOM
      104: Paper, Met Service of New Zealand
      105: Paper, Royal Netherlands Meteorological Institute (KNMI)
      201: Images, University Rovira I Virgili, Centre for Climate Change
      301: Images, Databank Stage 0 FTP Site
      302: Images, EDADS website, NCDC
      303: Images, NOAA Library Website
      999: Missing/Unknown/Not Applicable
   FLAG_STAGE1: data source for Stage 1 files (originates from Stage 2 data)
      100: NCDC International Collection
      101: High Plains Regional Climate Center
      102: NCDC DSI-3200
      103: NCDC DSI-3206
      104: University Rovira I Virgili, Centre for Climate Change
      105: NCDC CDMP Digital Archive
      106: Japan Meteorological Agency
      107: Met Service of New Zealand
      108: European Climate Assessment & Data Project
      109: University of Alabama: Huntsville
      110: Antarctic Meteorological Research Center
      111: Meteo France
      112: National Institute for Space Research (INPE) - Brazil
      113: MeteoSwiss
      114: Nicholas Copernicus Univ IPY Collection
      115: University of Melbourne
      116: UKMET Office
      117: INIA (Instituto Nacional de Investigacion Agropecuaria): Uruguay
      118: Australian BOM
      119: Environment Canada
      120: International Arctic Research Center: Univ of Alaska at Fairbanks
      121: Central Institute for Meteorology and Geodynamics (ZAMG)
      122: National Snow and Ice Data Center (NSIDC)
      123: Instituto Nacional De Meteorologia E Hidrologia: Ecuador
      124: Scientific Committee on Atmospheric Research
      125: Databank Stage 2 Daily Source (converted to
           monthlies)
      126: Databank Stage 3 Daily Source (converted to monthlies)
      127: Databank Stage 4 Daily Source (converted to monthlies)
      128: States of Jersey Meteorological Department
      129: Meteo Russia (RIHMI - WDC)
      130: University of Giessen
      131: CISL Research Data Archive
      132: INTA (National Institute of Agricultural Technology): Argentina
      133: India Meteorological Department
      134: DWD (Deutscher Wetterdienst): Germany
      135: Universidad de la Republica, Montevideo, Uruguay
      136: Norwegian Meteorological Institute
      999: Missing/Unknown/Not Applicable
   FLAG_TYPE: type (originates from Stage 2 data)
      101: Raw
      102: Quality controlled by originator
      103: Homogenized by originator
      999: Missing/Unknown/Not Applicable
   FLAG_MOD: mode of digitization (originates from Stage 2 data)
      101: Keyed, SourceCorp
      102: Keyed, CDMP
      103: Keyed, CDMP Forts Project
      104: Keyed, Local originator
      000: Auto Collect
      999: Missing/Unknown/Not Applicable
   FLAG_MODC_TMAX: mode of DAILY calculation of maximum temperature (originates from Stage 2 data)
      101: Daily value original
      102: Daily value calculated from main standard synoptic observations (00, 06, 12, 18 UTC)
      103: Daily value calculated from main and intermediate synoptic observations (00, 03, 06, 09, 12, 15, 18, 21 UTC)
      104: Daily value calculated from other sub-daily observations (at least 3 obs available)
      105: Daily value calculated from other sub-daily observations (at least 20 obs available)
      999: Missing/Unknown/Not Applicable
   FLAG_MODC_TMIN: mode of DAILY calculation of minimum temperature (originates from Stage 2 data);
      same codes as FLAG_MODC_TMAX
   FLAG_MODC_TAVG: mode of DAILY calculation of average temperature (originates from Stage 2 data);
      same codes as FLAG_MODC_TMAX
   FLAG_MOMC_TMAX: mode of MONTHLY calculation of maximum temperature (originates from Stage 2 data)
      000:     Monthly value original
      001-031: Monthly value calculated by averaging daily values (the number is the count of days available)
      999:     Missing/Unknown/Not Applicable
   FLAG_MOMC_TMIN: mode of MONTHLY calculation of minimum temperature (originates from Stage 2
      data); same codes as FLAG_MOMC_TMAX
   FLAG_MOMC_TAVG: mode of MONTHLY calculation of average temperature (originates from Stage 2 data);
      same codes as FLAG_MOMC_TMAX
   FLAG_MOT: mode of transmission (originates from Stage 2 data)
      101: Mail
      102: E-Mail
      103: FTP
      104: SRRS FTP
      105: NOAA Port
      106: NMHS Web Service
      107: Telephone Modem
      108: Direct Datalogger download/PDA
      109: Other Satellite
      999: Missing/Unknown/Not Applicable
   TMAX_SRCFLAG: 8-character source of the TMAX value (missing = XXXXXXXX). The
      first 2 digits give the position in the prioritized source list. If the
      last 6 characters read "UPDATE", the value was added by the monthly
      near-real-time update system.
   TMIN_SRCFLAG: 8-character source of the TMIN value (same structure as TMAX_SRCFLAG).
   TAVG_SRCFLAG: 8-character source of the TAVG value (same structure as TMAX_SRCFLAG).

GHCN METADATA FORMAT

   Variable     Columns    Type
   --------     -------    ----
   ID           1-11       Integer
   LATITUDE     13-20      Real
   LONGITUDE    22-30      Real
   STNELEV      32-37      Real
   NAME         39-68      Character
   FLAG         70-72      Integer

   Variable Definitions:

   ID: station identifier. The type of identifier depends on the file:
      Merge, Recommended:                       11-digit station ID, structured similarly to GHCN-Daily
      Merge, Other Variants, and All Withheld:  8-digit key produced by the merge, followed by a 3-character code:
         REC for Recommended Merge (withheld only)
         VRX for variant number (e.g., VR1 for variant 1)
   LATITUDE: latitude of the station in decimal degrees. -99.0000 = missing.
   LONGITUDE: longitude of the station in decimal degrees. -999.0000 = missing.
   STNELEV: station elevation in meters. -999.0 = missing.
   NAME: station name
   FLAG: flags indicating why the station was withheld (blank for merged stations)
      101 = Missing metadata
      102 = Poor metadata
      103 = No data comparison made; best station does not reach the second metadata threshold
      104 = Data comparison made; metrics insufficient to merge, or unique station
      105 = Merged station has fewer than 12 months of data
      106 = Metadata metric >= 0.90, but data comparisons were so poor that the station became unique
      107 = Blacklist could not resolve candidate station;
            automatically withheld

GHCN DATA FORMAT

   Variable     Columns    Type
   --------     -------    ----
   ID           1-11       Integer
   YEAR         12-15      Integer
   ELEMENT      16-19      Character
   VALUE1       20-24      Integer
   .            .          .
   .            .          .
   .            .          .
   VALUE12      108-112    Integer

   Variable Definitions:

   ID: station identifier. The type of identifier depends on the file:
      Merge, Recommended:                       11-digit station ID, structured similarly to GHCN-Daily
      Merge, Other Variants, and All Withheld:  8-digit key produced by the merge, followed by a 3-character code:
         REC for Recommended Merge (withheld only)
         VRX for variant number (e.g., VR1 for variant 1)
   YEAR: 4-digit year of the station record
   ELEMENT: element type
      "TAVG" = monthly mean temperature
      "TMAX" = monthly maximum temperature
      "TMIN" = monthly minimum temperature
   VALUE1-VALUE12: monthly values, January through December (missing = -9999).
      Temperature values are in hundredths of a degree Celsius, expressed as
      whole integers (i.e., divide by 100.0 to obtain degrees Celsius).

META COMPARISONS FORMAT (merge_metadata/metadata_comparisons)

   Variable          Columns    Type
   --------          -------    ---------
   ELEMENT_CODE      1-3        Character
   SOURCE_NUMBER     5-6        Integer
   TARGET_ID         8-15       Integer
   CANDIDATE_ID      17-24      Integer
   TARGET_ID2        26-40      Character
   TARGET_NAME       42-61      Character
   CANDIDATE_ID2     63-77      Character
   CANDIDATE_NAME    79-98      Character
   DIST_METRIC       100-109    Real
   HEIGHT_METRIC     111-120    Real
   JI_METRIC         122-131    Real
   YEAR_METRIC_1     133-142    Real
   YEAR_METRIC_2     144-153    Real
   META_METRIC       155-164    Real
   OVERLAP_1         166-169    Integer
   OVERLAP_2         171-174    Integer
   ID_MATCH          176-176    Boolean

   Variable Definitions:

   ELEMENT_CODE: identifies which comparisons are being made
      TXN: TMAX and TMIN comparisons
      TVG: TAVG comparisons only
   SOURCE_NUMBER: position in the source hierarchy
   TARGET_ID: identifier (key) of the target station
   CANDIDATE_ID: identifier (key) of the candidate station
   TARGET_ID2: 19-character composite identifier of the target station. The
      first 2 digits give the position in the prioritized source list,
      followed by an underscore.
      The remaining alphanumeric characters are the ID provided by the stage 2
      source; if the source provided no ID, an incremental number was assigned.
   TARGET_NAME: target station name
   CANDIDATE_ID2: 19-character composite identifier of the candidate station.
      The first 2 digits give the position in the prioritized source list,
      followed by an underscore. The remaining alphanumeric characters are the
      ID provided by the stage 2 source; if the source provided no ID, an
      incremental number was assigned.
   CANDIDATE_NAME: candidate station name
   DIST_METRIC: metric derived from the geographic distance between the target and candidate stations. -9999 = missing.
   HEIGHT_METRIC: metric derived from the elevation difference between the target and candidate stations. -9999 = missing.
   JI_METRIC: metric derived from the Jaccard index between the target and candidate station names. -9999 = missing.
   YEAR_METRIC_1: first year metric between the target and candidate stations (TMAX or TAVG). -9999 = missing.
   YEAR_METRIC_2: second year metric between the target and candidate stations (TMIN or TAVG). -9999 = missing.
   META_METRIC: metadata metric between the target and candidate stations. -9999 = missing.
   OVERLAP_1: first overlap period between the target and candidate stations (TMAX or TAVG).
   OVERLAP_2: second overlap period between the target and candidate stations (TMIN or TAVG).
   ID_MATCH: Boolean indicating whether ID2 matches between the target and candidate stations.
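   All column ranges in these layouts are 1-based and inclusive, so a field in
   columns a-b corresponds to the 0-based, half-open slice line[a-1:b] in most
   programming languages. The Python sketch below reads one record of the META
   COMPARISONS layout that way. It is an illustration only, not part of the
   distributed merge code: META_COMPARISON_COLUMNS is transcribed from the table
   above, -9999 is mapped to None only for the Real (metric) fields where this
   README documents it as missing, and the assumption that ID_MATCH is written
   as a single Fortran-style "T" or "F" is a guess to check against a real file.

```python
# Sketch only (hypothetical helper, not part of the merge code): read one line of
# merge_metadata/metadata_comparisons using the 1-based, inclusive column ranges
# from the layout table. ASSUMPTION: Booleans are written as 'T' or 'F'.
from typing import Dict, List, Tuple, Union

Field = Tuple[str, int, int, str]  # (name, first column, last column, kind)
Value = Union[str, int, float, bool, None]

META_COMPARISON_COLUMNS: List[Field] = [
    ("ELEMENT_CODE", 1, 3, "str"),
    ("SOURCE_NUMBER", 5, 6, "int"),
    ("TARGET_ID", 8, 15, "int"),
    ("CANDIDATE_ID", 17, 24, "int"),
    ("TARGET_ID2", 26, 40, "str"),
    ("TARGET_NAME", 42, 61, "str"),
    ("CANDIDATE_ID2", 63, 77, "str"),
    ("CANDIDATE_NAME", 79, 98, "str"),
    ("DIST_METRIC", 100, 109, "real"),
    ("HEIGHT_METRIC", 111, 120, "real"),
    ("JI_METRIC", 122, 131, "real"),
    ("YEAR_METRIC_1", 133, 142, "real"),
    ("YEAR_METRIC_2", 144, 153, "real"),
    ("META_METRIC", 155, 164, "real"),
    ("OVERLAP_1", 166, 169, "int"),
    ("OVERLAP_2", 171, 174, "int"),
    ("ID_MATCH", 176, 176, "bool"),
]

MISSING = -9999.0  # documented "missing" value for the Real metric fields


def parse_fixed_width(line: str, columns: List[Field]) -> Dict[str, Value]:
    """Split one fixed-width record into a dict of typed values."""
    record: Dict[str, Value] = {}
    for name, first, last, kind in columns:
        # Columns first-last (1-based, inclusive) -> Python slice [first-1:last].
        raw = line[first - 1:last].strip()
        if kind == "str":
            record[name] = raw
        elif kind == "bool":
            # ASSUMPTION: 'T' -> True, 'F' -> False, anything else -> None.
            record[name] = {"T": True, "F": False}.get(raw.upper())
        elif not raw:
            record[name] = None  # blank numeric field, e.g. a truncated line
        elif kind == "int":
            record[name] = int(raw)
        else:
            value = float(raw)
            record[name] = None if value == MISSING else value
    return record
```

   Applying parse_fixed_width to every line of the file should yield one dict
   per comparison. Keeping the layout as data rather than code means the other
   tables in this README could, in principle, be read the same way by
   transcribing their column ranges; that reuse has not been tested against
   the released files.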
DATA COMPARISONS FORMAT (merge_metadata/data_comparisons)

   Variable          Columns    Type
   --------          -------    ---------
   ELEMENT_CODE      1-3        Character
   SOURCE_NUMBER     5-6        Integer
   TARGET_ID         8-15       Integer
   CANDIDATE_ID      17-24      Integer
   TARGET_NAME       26-45      Character
   CANDIDATE_NAME    47-66      Character
   META_METRIC       68-75      Real
   OVERLAP1          77-80      Integer
   OVERLAP2          82-85      Integer
   IA_1              87-94      Real
   H1_1              96-103     Real
   H2_1              105-112    Real
   IA_2              114-121    Real
   H1_2              123-130    Real
   H2_2              132-139    Real
   PST_METRIC_SAME   141-148    Real
   PST_METRIC_UNIQ   150-157    Real
   START_1           159-162    Integer
   END_1             164-167    Integer
   START_2           169-172    Integer
   END_2             174-177    Integer

   Variable Definitions:

   ELEMENT_CODE: identifies which comparisons are being made
      TXN: TMAX and TMIN comparisons
      TVG: TAVG comparisons only
   SOURCE_NUMBER: position in the source hierarchy
   TARGET_ID: identifier (key) of the target station
   CANDIDATE_ID: identifier (key) of the candidate station
   TARGET_NAME: target station name
   CANDIDATE_NAME: candidate station name
   META_METRIC: metadata metric between the target and candidate stations. -9999 = missing.

   The next few variables have two results each. If ELEMENT_CODE = TXN, the
   first is TMAX and the second is TMIN; if ELEMENT_CODE = TVG, both are TAVG
   (duplicated). This keeps the data format consistent.

   OVERLAP1: first overlap period between the target and candidate stations.
   OVERLAP2: second overlap period between the target and candidate stations.
   IA_1: first Index of Agreement calculated between the target and candidate stations. -9999 = missing.
   H1_1: first metric of station match (H1) between the target and candidate stations. -9999 = missing.
   H2_1: first metric of station uniqueness (H2) between the target and candidate stations. -9999 = missing.
   IA_2: second Index of Agreement calculated between the target and candidate stations. -9999 = missing.
   H1_2: second metric of station match (H1) between the target and candidate stations. -9999 = missing.
   H2_2: second metric of station uniqueness (H2) between the target and candidate stations.
      -9999 = missing.
   PST_METRIC_SAME: final posterior metric of station match between the target and candidate stations. -9999 = missing.
   PST_METRIC_UNIQ: final posterior metric of station uniqueness between the target and candidate stations. -9999 = missing.
   START_1: start year of the first overlap period between the target and candidate stations.
   END_1: end year of the first overlap period between the target and candidate stations.
   START_2: start year of the second overlap period between the target and candidate stations.
   END_2: end year of the second overlap period between the target and candidate stations.

MERGING INFO FORMAT (merge_metadata/merging_info)

   Variable          Columns    Type
   --------          -------    ---------
   ELEMENT_CODE      1-3        Character
   SOURCE_NUMBER     5-6        Integer
   TARGET_ID         8-15       Integer
   CANDIDATE_ID      17-24      Integer
   TARGET_NAME       26-55      Character
   CANDIDATE_NAME    57-86      Character
   OVERLAP1          88-92      Integer
   OVERLAP2          94-98      Integer
   META_METRIC       100-109    Real
   PST_METRIC_SAME   111-120    Real
   PST_METRIC_UNIQ   122-131    Real
   MERGE_REASON      133-135    Integer
   HAS_NEW_1         137-137    Boolean
   HAS_NEW_2         139-139    Boolean

   Variable Definitions:

   ELEMENT_CODE: identifies which comparisons are being made
      TXN: TMAX and TMIN comparisons
      TVG: TAVG comparisons only
   SOURCE_NUMBER: position in the source hierarchy
   TARGET_ID: identifier (key) of the target station
   CANDIDATE_ID: identifier (key) of the candidate station
   TARGET_NAME: target station name
   CANDIDATE_NAME: candidate station name

   OVERLAP1 and OVERLAP2 have two results each. If ELEMENT_CODE = TXN, the
   first is TMAX and the second is TMIN; if ELEMENT_CODE = TVG, both are TAVG
   (duplicated). This keeps the data format consistent.

   OVERLAP1: first overlap period between the target and candidate stations.
   OVERLAP2: second overlap period between the target and candidate stations.
   META_METRIC: metadata metric between the target and candidate stations. -9999 = missing.
   PST_METRIC_SAME: final posterior metric of station match between the target and candidate stations. -9999 = missing.
   PST_METRIC_UNIQ: final posterior metric of station uniqueness between the target and candidate stations. -9999 = missing.
   MERGE_REASON: flags indicating why the target and candidate stations were chosen to be merged
      101 = No data comparison made; decision to merge was based on the metadata metric, which passed the threshold
      102 = At least one IA calculated; however, the best metadata comparison was the non-overlap case
      103 = At least one IA calculated; posterior metric of station match passed the "same" threshold
      104 = Stations chosen to be unique, but passed the last-look test, so they were merged
      105 = Stations with non-overlapping data merged because they passed the ID test
      106 = Stations chosen to be unique, but passed the ID test, so they were merged
      107 = Stations with overlapping data merged because they passed the ID test

   HAS_NEW_1 and HAS_NEW_2 have two results each. If ELEMENT_CODE = TXN, the
   first is TMAX and the second is TMIN; if ELEMENT_CODE = TVG, both are TAVG
   (duplicated). This keeps the data format consistent.

   HAS_NEW_1: Boolean indicating whether the first candidate series has new data that can be added to the first target series
   HAS_NEW_2: Boolean indicating whether the second candidate series has new data that can be added to the second target series

*****************************************
E-MAIL
*****************************************

Any questions or inquiries should be sent to general.enquiries@surfacetemperatures.org