Dataset Overview | National Centers for Environmental Information (NCEI)

Estimated nitrate d15N modeled using an ensemble of artificial neural networks (EANNs) on 2019-05-28 (NCEI Accession 0291582)


This dataset contains chemical data collected in the Adriatic Sea, Aegean Sea, Andaman Sea or Burma Sea, Arabian Sea, Arafura Sea, Arctic Ocean, Balearic (or Iberian) Sea, Bali Sea, Baltic Sea, Banda Sea, Barents Sea, Bass Strait, Bay of Bengal, Bering Sea, Bismarck Sea, Black Sea, Celebes Sea (Sulawesi Sea and Mindanao Sea), Ceram Sea or Seram Sea, Coral Sea, East China Sea (Tung Hai), East Siberian Sea, English Channel, Flores Sea, Great Australian Bight, Greenland Sea (including Iceland Sea and North Greenland Sea), Gulf of Aden, Gulf of Bothnia, Gulf of Finland, Gulf of Guinea, Gulf of Oman, Gulf of Riga, Gulf of Thailand, Gulf of Tomini, Halmahera Sea, Indian Ocean, Ionian Sea, Japan Sea, Java Sea, Kara Sea, Kattegat, The Sound, Great Belt, Little Belt, Laccadive Sea, Laptev (or Nordenskjold) Sea, Ligurian Sea, Makassar Strait, Malacca Straits, Mediterranean Sea, Mediterranean Sea - Eastern Basin, Mediterranean Sea - Western Basin, Molucca Sea, Mozambique Channel, North Atlantic Ocean, North Pacific Ocean, North Sea, Norwegian Sea, Persian Gulf (Gulf of Iran), Philippine Sea, Red Sea, Ross Sea, Savu Sea, Sea of Okhotsk, Skagerrak, Solomon Sea, South Atlantic Ocean, South China Sea (Nan Hai), South Pacific Ocean, Southern Ocean, Sulu Sea, Tasman Sea, Timor Sea, Tyrrhenian Sea, White Sea, and Yellow Sea (Hwang Hai) on 2019-05-28. These data include dN15_NO3 and depth. These data were collected by Dario Marconi of Princeton University, Patrick Rafter of University of California-Irvine, and Aaron Bagnell and Timothy DeVries of University of California-Santa Barbara as part of the "Collaborative research: Combining models and observations to constrain the marine iron cycle (Fe Cycle Models and Observations)" project. The Biological and Chemical Oceanography Data Management Office (BCO-DMO) submitted these data to NCEI on 2019-06-17.

The following is the text of the dataset description provided by BCO-DMO:

Estimated nitrate d15N

Dataset Description:
Acquisition Description:
For complete methodology, refer to Rafter et al. (2019). In summary:

Data Compilation: Nitrate d15N observations were compiled from studies dating from 1975 to 2018. This global ocean nitrate d15N database was interpolated using an ensemble of artificial neural networks (EANNs). For the compiled observed global ocean nitrate d15N data, see the related dataset: https://www.bco-dmo.org/dataset/768627

Building the neural network model: We utilize an ensemble of artificial neural networks (EANNs) to interpolate our global ocean nitrate d15N database, producing complete 3D maps of the data. By utilizing an artificial neural network (ANN), a machine learning approach that effectively identifies nonlinear relationships between a target variable (the isotopic dataset) and a set of input features (other available ocean datasets), we can fill holes in our data sampling coverage of nitrate d15N.

Binning target variables (Step 1): We binned the nitrate d15N observations to the World Ocean Atlas 2009 (WOA09) grid with a 1-degree spatial resolution and 33 vertical depth layers (0-5500 m). When binning vertically, we use the depth layer whose value is closest to the observation's sampling depth (e.g. the first depth layer has a value of 0 m, the second of 10 m, and the third of 20 m, so all nitrate isotopic data sampled between 0-5 m fall in the 0 m bin; between 5-15 m they fall in the 10 m bin, etc.). An observation with a sampling depth that lies right at the midpoint between depth layers is binned to the shallower layer. If more than one raw data point falls in a grid cell we take the average of all those points as the value for that grid cell. Certain whole ship tracks of nitrate d15N data were withheld from binning to be used as an independent validation set.
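The binning rule in Step 1 can be sketched as follows. This is an illustrative sketch, not the authors' code: the list of 33 depth levels is assumed to be the standard WOA09 set (only the first three values and the 0-5500 m range are stated above), the cell centers at x.5 degrees are inferred from the bounding box of this record, and the function names are hypothetical.

```python
import math
from collections import defaultdict

# Assumed WOA09 standard depth levels (33 layers, 0-5500 m); verify against the WOA09 files.
WOA09_DEPTHS_M = [0, 10, 20, 30, 50, 75, 100, 125, 150, 200, 250, 300, 400, 500,
                  600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1750,
                  2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500]


def nearest_depth_layer(depth_m):
    """Return the depth layer closest to depth_m; exact midpoints go to the shallower layer."""
    # min() keeps the first minimum it meets, and the list is ordered shallow to deep,
    # so a tie between two layers resolves to the shallower one.
    return min(WOA09_DEPTHS_M, key=lambda layer: abs(layer - depth_m))


def cell_center(coord_deg):
    """Center of the 1-degree cell containing coord_deg (assumes centers at x.5 degrees)."""
    return math.floor(coord_deg) + 0.5


def bin_observations(observations):
    """Average all (lat, lon, depth_m, d15n) observations that share a grid cell."""
    cells = defaultdict(list)
    for lat, lon, depth_m, d15n in observations:
        key = (cell_center(lat), cell_center(lon), nearest_depth_layer(depth_m))
        cells[key].append(d15n)
    return {key: sum(vals) / len(vals) for key, vals in cells.items()}


# Hypothetical observations: two fall in the 0 m bin of one cell, one in the 10 m bin.
example = [(10.2, 20.7, 4.0, 5.0), (10.9, 20.1, 5.0, 7.0), (10.5, 20.5, 15.0, 6.0)]
binned = bin_observations(example)
```

In this example the samples at 4 m and 5 m both land in the 0 m layer (5 m is a midpoint, so it goes to the shallower layer) and are averaged, while the 15 m sample goes to the 10 m layer.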

Obtaining input features (Step 2): Our input dataset contains a set of climatological values for physical and biogeochemical ocean parameters that form a non-linear relationship with the target data. We use six input features. Five are objectively analyzed annual-mean fields for temperature, salinity, nitrate, oxygen, and phosphate taken from the WOA09 ( https://www.nodc.noaa.gov/OC5/WOA09/woa09data.html ) at 1-degree resolution. The sixth is chlorophyll: daily chlorophyll data from MODIS Aqua for the period Jan-1-2003 through Dec-31-2012 are averaged and binned to the WOA09 grid (as described in Step 1) to produce an annual climatological field of chlorophyll values, which we then log transform to reduce their dynamic range.
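As a rough illustration of the chlorophyll preprocessing in Step 2, the sketch below averages the daily values for a single grid cell and log-transforms the mean. The base of the logarithm (base 10 here), the handling of missing or non-positive values, and the function name are assumptions; the description above states only that the climatology is log transformed.

```python
import math


def log_chlorophyll_climatology(daily_values):
    """Average daily chlorophyll values for one grid cell, then log-transform the mean."""
    # Skipping missing (None) and non-positive values is an assumption of this sketch.
    valid = [v for v in daily_values if v is not None and v > 0]
    if not valid:
        return None  # no usable data in this cell
    annual_mean = sum(valid) / len(valid)
    # Base 10 is an assumption; the description only says "log transform".
    return math.log10(annual_mean)


# Hypothetical daily values spanning two orders of magnitude, with one missing day.
feature = log_chlorophyll_climatology([0.01, 0.1, 1.0, None])
```

The transform compresses values that span several orders of magnitude into a range of a few units, which is the reduction in dynamic range the step refers to.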

The choice of these specific input features was dictated by our desire to achieve the best possible R² value on our internal validation sets (Step 4). Additional inputs besides those we included, such as latitude, longitude, silicate, euphotic depth, or sampling depth, either did not improve the R² value on the validation dataset or degraded it, indicating that they are not essential parameters for characterizing this system globally. By opting to use the set of input features that yielded the best results for the global oceans, we potentially overlooked combinations of inputs that perform better at regional scales. However, given the scarcity of d15N data in some regions, it is not possible to determine whether relative model performance in these regions reflects the specific combination of input features or the available d15N data, which may not be representative of the region's climatological state.

Training the ANN (Step 3): The architecture of our ANN consists of a single hidden layer, containing 25 nodes, that connects the biological and physical input features (discussed in Step 2) to the target nitrate isotopic variable (as discussed in Step 1). The role of the hidden layer is to transform input features into new features contained in the nodes. These are given to the output layer to estimate the target variable, introducing nonlinearities via an activation function. The number of nodes in this hidden layer, as well as the number of input features, determines the number of adjustable weights (the free parameters) in the network. For complete information, refer to Rafter et al. (2019).
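To make the link between layer sizes and free parameters concrete, here is a minimal sketch of a network with six inputs, one hidden layer of 25 nodes, and one output. The bias terms, the tanh activation, the random initialization, and all names are assumptions for illustration; refer to Rafter et al. (2019) for the actual configuration and training procedure.

```python
import math
import random

N_INPUTS = 6    # temperature, salinity, nitrate, oxygen, phosphate, log chlorophyll
N_HIDDEN = 25   # nodes in the single hidden layer
N_OUTPUTS = 1   # estimated nitrate d15N


def count_parameters(n_in, n_hidden, n_out):
    """Weights plus one bias per hidden and output node (bias terms are an assumption)."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out


def init_network(n_in, n_hidden, seed=0):
    """Create untrained weights; the uniform initialization is illustrative only."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b_hidden": [0.0] * n_hidden,
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b_out": 0.0,
    }


def forward(net, features):
    """Map input features to hidden-node features (tanh assumed), then to one output."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(net["w_hidden"], net["b_hidden"])
    ]
    return sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"]


n_free = count_parameters(N_INPUTS, N_HIDDEN, N_OUTPUTS)
net = init_network(N_INPUTS, N_HIDDEN)
estimate = forward(net, [0.0] * N_INPUTS)
```

Under these assumptions the network has (6 + 1) × 25 + (25 + 1) × 1 = 201 adjustable parameters; adding an input feature or a hidden node changes this count directly.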

Validating the ANN (Step 4): To ensure good generalization of the trained ANN, we randomly withhold 10% of the d15N data to be used as an internal validation set for each network. These are data that the network never sees, meaning they do not factor into the cost function, so they work as a test of the ANN's ability to generalize. This internal validation set acts as a gatekeeper to prevent poor models from being accepted into the ensemble of trained networks (see Step 5). A second, independent or 'external' validation set, composed of complete ship transects from the high and low latitude ocean, was omitted from binning in Step 1 and used to establish the performance of the entire ensemble. Our rationale for using complete ship transects is the following. If we randomly chose 10% of observations to perform an external validation, this dataset would come from the same cruises as the wider data. In other words, despite being randomly selected, the validating observational dataset would be highly correlated geographically with the training data. Contrast this with validating the EANN results with observations from whole research cruises in unique geographic regions, areas where the model has not "learned" anything about nitrate. We therefore argue that observations from whole ship tracks provide a more difficult test of the model.
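The two kinds of validation set described in Step 4 can be contrasted with a short sketch. The record layout (a dict with a "cruise" key), the cruise labels, and the function names are hypothetical; the sketch only illustrates the difference between a random 10% holdout and withholding whole cruises.

```python
import random


def withhold_cruises(records, external_cruises):
    """Set aside every record from the named cruises as the external validation set."""
    kept = [r for r in records if r["cruise"] not in external_cruises]
    external = [r for r in records if r["cruise"] in external_cruises]
    return kept, external


def random_internal_split(records, fraction=0.10, seed=None):
    """Randomly withhold a fraction of records as an internal validation set."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_val = max(1, round(fraction * len(shuffled)))
    return shuffled[n_val:], shuffled[:n_val]   # (training, validation)


# Hypothetical records from three cruises labeled "A", "B", and "C".
records = [
    {"cruise": "A" if i < 60 else "B" if i < 90 else "C", "d15n": float(i)}
    for i in range(100)
]

# External set: a whole cruise, removed once before any binning or training.
kept, external = withhold_cruises(records, {"C"})

# Internal set: a fresh random 10% of the remaining data for each network.
train, internal_val = random_internal_split(kept, seed=1)
```

The internal validation records share cruises with the training records, whereas the external records come from a cruise the networks never see, which is why the external set is the harder test.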

Forming the Ensemble (Step 5): The ensemble is formed by repeating Steps 3 to 4 (using a different random 10% validation set) until we obtain 25 trained networks for the nitrate d15N dataset. A network is admitted into the ensemble if it yields an R² value greater than 0.81 on the validation dataset. For complete information, refer to Rafter et al. (2019).
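The admission loop in Step 5 might look like the sketch below. Several parts are assumptions: R² is computed here as the coefficient of determination (the description does not define it), a least-squares line fit stands in for ANN training so the example stays self-contained, averaging the members' predictions is one plausible way to combine them, and the attempt cap and all names are hypothetical.

```python
import random


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition of R²)."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Toy stand-in for ANN training: ordinary least-squares line through (x, y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def build_ensemble(xs, ys, train_fn, n_members=25, r2_min=0.81,
                   val_fraction=0.10, seed=0, max_attempts=1000):
    """Repeat train/validate with a new random holdout until n_members models pass."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_val = max(2, round(val_fraction * len(xs)))
    ensemble = []
    for _ in range(max_attempts):   # cap is a safeguard added for this sketch
        if len(ensemble) == n_members:
            break
        rng.shuffle(indices)
        val_idx, train_idx = indices[:n_val], indices[n_val:]
        model = train_fn([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        score = r_squared([ys[i] for i in val_idx], [model(xs[i]) for i in val_idx])
        if score > r2_min:          # gatekeeper: only well-generalizing models are admitted
            ensemble.append(model)
    return ensemble


def ensemble_mean(ensemble, x):
    """Average the members' predictions (combining by the mean is an assumption)."""
    return sum(m(x) for m in ensemble) / len(ensemble)


# Synthetic data for illustration only: y = 2x + 1 plus small noise.
noise = random.Random(42)
xs = [i / 10 for i in range(200)]
ys = [2 * x + 1 + noise.gauss(0, 0.1) for x in xs]
ensemble = build_ensemble(xs, ys, fit_line)
```

Because each attempt draws a different validation set, the admitted members differ slightly from one another, and the spread of their predictions gives a rough sense of the uncertainty of the estimate.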

Dataset Citation

Cite as: Rafter, Patrick; Bagnell, Aaron; DeVries, Timothy; Marconi, Dario (2024). Estimated nitrate d15N modeled using an ensemble of artificial neural networks (EANNs) on 2019-05-28 (NCEI Accession 0291582). [indicate subset used]. NOAA National Centers for Environmental Information. Dataset. https://www.ncei.noaa.gov/archive/accession/0291582. Accessed [date].

Dataset Identifiers

ISO 19115-2 Metadata

gov.noaa.nodc:0291582


Download Data	HTTPS (download) Navigate directly to the URL for data access and direct download. FTP (download) These data are available through the File Transfer Protocol (FTP). FTP is no longer supported by most internet browsers. You may copy and paste the FTP link to the data into an FTP client (e.g., FileZilla or WinSCP).
Distribution Formats	MAT-file TSV
Ordering Instructions	Contact NCEI for other distribution options and instructions.
Distributor	NOAA National Centers for Environmental Information +1-301-713-3277 NCEI.Info@noaa.gov
Dataset Point of Contact	NOAA National Centers for Environmental Information ncei.info@noaa.gov

Time Period	2019-05-28 to 2019-05-28
Spatial Bounding Box Coordinates	West: .5 East: 179.5 South: -78.5 North: 83.5

General Documentation

NCEI Dataset Landing Page
Navigate directly to the URL for a descriptive web page with download links.
Descriptive Information
Navigate directly to the URL for a descriptive web page with download links.

Associated Resources

Biological, chemical, physical, biogeochemical, ecological, environmental and other data collected from around the world during historical and contemporary periods of biological and chemical oceanographic exploration and research managed and submitted by the Biological and Chemical Oceanography Data Management Office (BCO-DMO)
- NCEI Collection
  Navigate directly to the URL for data access and direct download.
Rafter, P., Bagnell, A., Marconi, D., DeVries, T. (2019) Estimated nitrate d15N modeled using an ensemble of artificial neural networks (EANNs). Biological and Chemical Oceanography Data Management Office (BCO-DMO). Dataset version 2019-05-28. https://doi.org/10.1575/1912/bco-dmo.768655.1
- https://doi.org/10.1575/1912/bco-dmo.768655.1 (download)
  originator dataset

gov.noaa.nodc:BCO-DMO

Publication Dates	publication: 2024-04-21
Data Presentation Form	Digital table - digital representation of facts or figures systematically displayed, especially in columns
Dataset Progress Status	Complete - production of the data has been completed Historical archive - data has been stored in an offline storage facility
Data Update Frequency	As needed
Purpose	This dataset is available to the public for a wide variety of uses including scientific research and analysis.
Use Limitations	accessLevel: Public Distribution liability: NOAA and NCEI make no warranty, expressed or implied, regarding these data, nor does the fact of distribution constitute such a warranty. NOAA and NCEI cannot assume liability for any damages caused by any errors or omissions in these data. If appropriate, NCEI can only certify that the data it distributes are an authentic copy of the records that were accepted for inclusion in the NCEI archives.

Cited Authors	Rafter, Patrick (University of California-Irvine); Bagnell, Aaron (University of California-Santa Barbara); DeVries, Timothy (University of California-Santa Barbara); Marconi, Dario (Princeton University)
Principal Investigators	Patrick Rafter, University of California - Irvine; Aaron Bagnell, University of California - Santa Barbara (UCSB); Timothy DeVries, University of California - Santa Barbara (UCSB); Dario Marconi, Princeton University
Contributors	University of California - Santa Barbara (UCSB); Princeton University; University of California - Irvine
Resource Providers	Biological and Chemical Oceanography Data Management Office (BCO-DMO)
Points of Contact	Biological and Chemical Oceanography Data Management Office (BCO-DMO)
Publishers	NOAA National Centers for Environmental Information
Acknowledgments	Funding provided by NSF Division of Ocean Sciences (NSF OCE) Award Number: OCE-1658392 Award URL: http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=1658392

Theme keywords	NODC DATA TYPES THESAURUS DELTA NITROGEN-15 NODC OBSERVATION TYPES THESAURUS chemical WMO_CategoryCode oceanography BCO-DMO Standard Parameters dN15_NO3 depth latitude longitude Global Change Master Directory (GCMD) Science Keywords EARTH SCIENCE > OCEANS > OCEAN CHEMISTRY > NITROGEN Originator Parameter Names d15N d15N_stdev depth latitude longitude
Data Center keywords	NODC COLLECTING INSTITUTION NAMES THESAURUS Princeton University University of California - Irvine University of California - Santa Barbara NODC SUBMITTING INSTITUTION NAMES THESAURUS Biological and Chemical Oceanography Data Management Office Global Change Master Directory (GCMD) Data Center Keywords BCO-DMO > Biological and Chemical Oceanography Data Management Office
Place keywords	NODC SEA AREA NAMES THESAURUS Adriatic Sea Aegean Sea Andaman Sea or Burma Sea Arabian Sea Arafura Sea Arctic Ocean Balearic (or Iberian) Sea Bali Sea Baltic Sea Banda Sea Barents Sea Bass Strait Bay of Bengal Bering Sea Bismarck Sea Black Sea Caspian Sea Celebes Sea (Sulawesi Sea and Mindanao Sea) Ceram Sea or Seram Sea Coral Sea East China Sea (Tung Hai) East Siberian Sea English Channel Flores Sea Great Australian Bight Greenland Sea (including Iceland Sea and North Greenland Sea) Gulf of Aden Gulf of Bothnia Gulf of Finland Gulf of Guinea Gulf of Oman Gulf of Riga Gulf of Thailand Gulf of Tomini Halmahera Sea Indian Ocean Ionian Sea Japan Sea Java Sea Kara Sea Kattegat, The Sound, Great Belt, Little Belt Laccadive Sea Laptev (or Nordenskjold) Sea Ligurian Sea Makassar Strait Malacca Straits Mediterranean Sea Mediterranean Sea - Eastern Basin Mediterranean Sea - Western Basin Molucca Sea Mozambique Channel North Atlantic Ocean North Pacific Ocean North Sea Norwegian Sea Pacific Remote Islands Marine National Monument Papahānaumokuākea Marine National Monument Persian Gulf (Gulf of Iran) Philippine Sea Red Sea Ross Sea Savu Sea Sea of Okhotsk Skagerrak Solomon Sea South Atlantic Ocean South China Sea (Nan Hai) South Pacific Ocean Southern Ocean Sulu Sea Tasman Sea Timor Sea Tyrrhenian Sea White Sea Yellow Sea (Hwang Hai) Global Change Master Directory (GCMD) Location Keywords CONTINENT > ASIA > WESTERN ASIA > BLACK SEA CONTINENT > ASIA > WESTERN ASIA > CASPIAN SEA CONTINENT > EUROPE > EASTERN EUROPE > BLACK SEA OCEAN > ARCTIC OCEAN OCEAN > ARCTIC OCEAN > BARENTS SEA OCEAN > ATLANTIC OCEAN > NORTH ATLANTIC OCEAN OCEAN > ATLANTIC OCEAN > NORTH ATLANTIC OCEAN > BALTIC SEA OCEAN > ATLANTIC OCEAN > NORTH ATLANTIC OCEAN > MEDITERRANEAN SEA OCEAN > ATLANTIC OCEAN > NORTH ATLANTIC OCEAN > MEDITERRANEAN SEA > ADRIATIC SEA OCEAN > ATLANTIC OCEAN > NORTH ATLANTIC OCEAN > NORTH SEA OCEAN > ATLANTIC OCEAN > NORTH ATLANTIC OCEAN > NORWEGIAN SEA OCEAN > ATLANTIC OCEAN > SOUTH ATLANTIC OCEAN OCEAN > INDIAN OCEAN OCEAN > INDIAN OCEAN > ARABIAN SEA OCEAN > INDIAN OCEAN > ARABIAN SEA > PERSIAN GULF OCEAN > INDIAN OCEAN > BAY OF BENGAL OCEAN > INDIAN OCEAN > RED SEA OCEAN > PACIFIC OCEAN > CENTRAL PACIFIC OCEAN > HAWAIIAN ISLANDS OCEAN > PACIFIC OCEAN > NORTH PACIFIC OCEAN OCEAN > PACIFIC OCEAN > NORTH PACIFIC OCEAN > BERING SEA OCEAN > PACIFIC OCEAN > NORTH PACIFIC OCEAN > SEA OF JAPAN OCEAN > PACIFIC OCEAN > NORTH PACIFIC OCEAN > SEA OF OKHOTSK OCEAN > PACIFIC OCEAN > SOUTH PACIFIC OCEAN OCEAN > PACIFIC OCEAN > WESTERN PACIFIC OCEAN > EAST CHINA SEA OCEAN > PACIFIC OCEAN > WESTERN PACIFIC OCEAN > SOUTH CHINA AND EASTERN ARCHIPELAGIC SEAS OCEAN > PACIFIC OCEAN > WESTERN PACIFIC OCEAN > SOUTH CHINA SEA OCEAN > PACIFIC OCEAN > WESTERN PACIFIC OCEAN > YELLOW SEA OCEAN > SOUTHERN OCEAN OCEAN > SOUTHERN OCEAN > ROSS SEA
Project keywords	BCO-DMO Standard Projects Collaborative research: Combining models and observations to constrain the marine iron cycle (Fe Cycle Models and Observations) Provider Funding Award Information Funding provided by NSF Division of Ocean Sciences (NSF OCE) Award Number: OCE-1658392
Keywords	NCEI ACCESSION NUMBER 0291582

Use Constraints	Cite as: Rafter, Patrick; Bagnell, Aaron; DeVries, Timothy; Marconi, Dario (2024). Estimated nitrate d15N modeled using an ensemble of artificial neural networks (EANNs) on 2019-05-28 (NCEI Accession 0291582). [indicate subset used]. NOAA National Centers for Environmental Information. Dataset. https://www.ncei.noaa.gov/archive/accession/0291582. Accessed [date].
Data License	This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License. SPDX License: Creative Commons Attribution 4.0 International (CC-BY-4.0)
Access Constraints	Use liability: NOAA and NCEI cannot provide any warranty as to the accuracy, reliability, or completeness of furnished data. Users assume responsibility to determine the usability of these data. The user is responsible for the results of any application of this data for other than its intended purpose.
Fees	In most cases, electronic downloads of the data are free. However, fees may apply for custom orders, data certifications, copies of analog materials, and data distribution on physical media.

Lineage information for: dataset
Processing Steps	2024-04-21T18:34:41Z - NCEI Accession 0291582 v1.1 was published.
Output Datasets	NCEI Accession 0291582 v1.1 NCEI Accession 0291582 v1.1 (download) published 2024-04-21T18:34:41Z

Last Modified: 2024-05-31T15:15:28Z
For questions about the information on this page, please email: ncei.info@noaa.gov
