This readme.txt file was generated on 20220424 by Xuanting Hao


-------------------
GENERAL INFORMATION
-------------------
Suggested Citation for dataset: Hao, Xuanting; Shen, Lian. (2022). Supporting Data for "A novel machine learning method for accelerated modeling of the downwelling irradiance field in the upper ocean". Retrieved from the Data Repository for the University of Minnesota, https://doi.org/10.13020/742c-c863.

1. Title of Dataset 
Supporting Data for "A novel machine learning method for accelerated modeling of the downwelling irradiance field in the upper ocean"

2. Author Information


  Principal Investigator Contact Information
        Name: Lian Shen
        Institution: Department of Mechanical Engineering and St. Anthony Falls Laboratory, University of Minnesota
        Address: 2 3rd Ave SE, Minneapolis, MN 55414
        Email: shen@umn.edu
        ORCID: 0000-0003-3762-3829

  Associate or Co-investigator Contact Information
        Name: Xuanting Hao
        Institution: Department of Mechanical Engineering and St. Anthony Falls Laboratory, University of Minnesota
        Address: 2 3rd Ave SE, Minneapolis, MN 55414
        Email: haoxx081@umn.edu
        ORCID: 0000-0003-4898-1074


3. Date of data collection: 20220424


4. Overview of the data (abstract): The training data are generated from Monte Carlo simulation of the oceanic irradiance field. They can be used to train a neural network that significantly accelerates the prediction of irradiance in the upper ocean.


--------------------------
SHARING/ACCESS INFORMATION
-------------------------- 


1. Licenses/restrictions placed on the data: Creative Commons Attribution-NonCommercial 3.0 United States (CC BY-NC 3.0 US)

2. Links to publications that cite or use the data: Hao, X., & Shen, L. (2022). A novel machine learning method for accelerated modeling of the downwelling irradiance field in the upper ocean. Geophysical Research Letters, 49, e2022GL097769.
https://doi.org/10.1029/2022GL097769

3. Was data derived from another source?
           If yes, list source(s):

4. Terms of Use: Data Repository for the U of Minnesota (DRUM). By using these files, users agree to the Terms of Use: https://conservancy.umn.edu/pages/drum/policies/#terms-of-use




---------------------
DATA & FILE OVERVIEW
---------------------


1. File List
   A. Filename:   wavelight_6to5.png      
      Short description: Thumbnail image showing a wave field and the irradiance field        

   B. Filename:  mldat3d_clear.tar.gz      
      Short description: Training data for the clear case       
        
   C. Filename:  mldat3d_turbid.tar.gz 
      Short description:  Training data for the turbid case

   D. Filename:  trainingcode.zip
      Short description:  Training code and results (training arguments, pretrained models, and loss histories)


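2. Relationship between files: mldat3d_clear.tar.gz and mldat3d_turbid.tar.gz each contain a folder mldat3d with the training data for the clear and turbid cases, respectively. trainingcode.zip contains the folders case_clear and case_turbid with the code and pretrained results for each case. To retrain or validate, each mldat3d folder is moved into the matching case folder (see item 5 under METHODOLOGICAL INFORMATION).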



--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data: 
   The training data are calculated from raw data generated by Monte Carlo simulation of the irradiance field.

2. Methods for processing the data: 
   Step 1: The irradiance fields in the raw data are transformed into coefficients of the plane response function (PRF; see
           the paper for its definition), which serve as the output of the neural network.
   Step 2: The wave geometry (input) and the PRF coefficients (output) are normalized to form the training data.
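Step 2 does not specify which normalization is used; the transformations actually used are stored in trans(2-5).bson as xtrans and sytrans. Purely as an illustration, the Python sketch below assumes a per-variable z-score (subtract each row's mean and divide by its standard deviation) applied to matrices laid out as in this readme, with one row per variable and one column per sample. The function names fit_zscore, transform, and inverse are hypothetical and are not taken from trainingedrfit3d.jl.

```python
from statistics import fmean, pstdev


def fit_zscore(rows):
    """Per-row (mean, standard deviation); rows are variables, columns are samples."""
    params = []
    for row in rows:
        sd = pstdev(row)
        # Guard against constant rows so that dividing by sd is always safe.
        params.append((fmean(row), sd if sd > 0 else 1.0))
    return params


def transform(rows, params):
    """Normalize each row: (value - mean) / sd."""
    return [[(v - mu) / sd for v in row] for row, (mu, sd) in zip(rows, params)]


def inverse(rows, params):
    """Undo `transform`, recovering dimensional values."""
    return [[v * sd + mu for v in row] for row, (mu, sd) in zip(rows, params)]
```

Under this assumed scheme, a network would be trained on the normalized inputs and outputs, and its predictions would be mapped back to dimensional values with inverse() and the output parameters.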

3. Instrument- or software-specific information needed to interpret the data:
   The *.h5 data files can be read using any programming language that supports the HDF5 library. The *.bson files (pretrained neural network parameters, training arguments, and normalization transformations) and the *.jl source code files can be read using Julia.
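A minimal Python sketch for reading one strain*.h5 file, assuming the third-party h5py package is installed and that the datasets are named x and y as listed under DATA-SPECIFIC INFORMATION. Because the source code is written in Julia, the files were presumably written from Julia, which stores arrays in column-major order; a 3 by 331 Julia matrix then typically appears as 331 by 3 in row-major readers such as h5py. The hypothetical helper to_rows_by_depth therefore transposes when needed so that the result matches the layout described in this readme.

```python
def to_rows_by_depth(matrix, nrows):
    """Return `matrix` as `nrows` rows (one row per variable, one column per depth).

    `matrix` is a list of lists. If it was stored transposed (for example a
    Julia 3 x 331 matrix read back as 331 x 3), it is transposed here.
    """
    if len(matrix) == nrows:
        return [list(row) for row in matrix]
    if matrix and len(matrix[0]) == nrows:
        return [list(col) for col in zip(*matrix)]
    raise ValueError(f"neither dimension of the matrix equals {nrows}")


def read_case(path, nr):
    """Read x (3 rows) and y (3 + nr rows) from one strain*.h5 file.

    Requires the third-party h5py package (an assumption of this sketch).
    """
    import h5py  # imported lazily so to_rows_by_depth works without h5py

    with h5py.File(path, "r") as f:
        x = f["x"][()].tolist()  # [()] reads the whole dataset as a NumPy array
        y = f["y"][()].tolist()
    return to_rows_by_depth(x, 3), to_rows_by_depth(y, 3 + nr)
```

For example, read_case("mldat3d/nr2/strain1.h5", nr=2) would return x with 3 rows and y with 5 rows; the path is assumed from the pattern nr(2-5)/strain(1-2500).h5 and may differ in the actual archive.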

4. Standards and calibration information, if appropriate:


5. Environmental/experimental conditions: 
   To redo the training, follow these steps:
      (1) Extract trainingcode.zip, which yields two folders, case_clear and case_turbid;
      (2) Delete all "args*.bson", "lossconv*.h5", and "model*.bson" files in these two folders;
      (3) Extract mldat3d_clear.tar.gz and move the resulting folder mldat3d into case_clear;
      (4) Extract mldat3d_turbid.tar.gz and move the resulting folder mldat3d into case_turbid;
      (5) Enter the folder case_clear or case_turbid and run "julia trainingedrfit3d.jl nr" at the command line to train the neural network, where nr is 2, 3, 4, or 5 (the number of PRF coefficients).
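Steps (1) to (4) can be scripted. The Python sketch below uses only the standard library and assumes that trainingcode.zip unpacks directly into case_clear/ and case_turbid/ and that each tar.gz unpacks into a top-level folder named mldat3d, as the steps imply. The function stage and its parameters are hypothetical; with retrain=False it skips step (2), so it can also prepare the folders for validation.

```python
import glob
import os
import shutil
import tarfile
import zipfile

# Pretrained outputs removed in step (2) before retraining.
PRETRAINED_PATTERNS = ("args*.bson", "lossconv*.h5", "model*.bson")


def stage(download_dir, work_dir, retrain=True):
    """Lay out case_clear/ and case_turbid/ (each with mldat3d/) under work_dir."""
    # Step (1): unpack the code; assumed to yield case_clear/ and case_turbid/.
    with zipfile.ZipFile(os.path.join(download_dir, "trainingcode.zip")) as z:
        z.extractall(work_dir)
    for case in ("clear", "turbid"):
        case_dir = os.path.join(work_dir, f"case_{case}")
        if retrain:
            # Step (2): only the top level of each case folder is searched.
            for pattern in PRETRAINED_PATTERNS:
                for path in glob.glob(os.path.join(case_dir, pattern)):
                    os.remove(path)
        # Steps (3)-(4): unpack into a scratch folder, then move mldat3d into place.
        scratch = os.path.join(work_dir, f"_extract_{case}")
        archive = os.path.join(download_dir, f"mldat3d_{case}.tar.gz")
        with tarfile.open(archive) as t:
            t.extractall(scratch)
        shutil.move(os.path.join(scratch, "mldat3d"), os.path.join(case_dir, "mldat3d"))
        shutil.rmtree(scratch)
```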

   To validate the pretrained model, follow these steps:
      (1) Extract trainingcode.zip, which yields two folders, case_clear and case_turbid;
      (2) Extract mldat3d_clear.tar.gz and move the resulting folder mldat3d into case_clear;
      (3) Extract mldat3d_turbid.tar.gz and move the resulting folder mldat3d into case_turbid;
      (4) Enter the folder case_clear, comment out line 321 and uncomment line 324 of trainingedrfit3d.jl, and run "julia trainingedrfit3d.jl nr" at the command line to reconstruct irradiance at 4 randomly selected locations, where nr is 2, 3, 4, or 5;
      (5) Enter the folder case_turbid, comment out line 306 and uncomment line 309 of trainingedrfit3d.jl, and run "julia trainingedrfit3d.jl nr" at the command line to reconstruct irradiance at 4 randomly selected locations, where nr is 2, 3, 4, or 5.
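The command "julia trainingedrfit3d.jl nr" can be repeated for every nr with a small driver. This Python sketch assumes julia is on the PATH; the function names are hypothetical, and dry_run=True (the default) only prints the commands. It does not make the line edits required for validation, which must still be done by hand in trainingedrfit3d.jl.

```python
import subprocess

VALID_NR = (2, 3, 4, 5)  # numbers of PRF coefficients covered by this dataset


def commands(nrs=VALID_NR):
    """Build the command line "julia trainingedrfit3d.jl nr" for each nr."""
    for nr in nrs:
        if nr not in VALID_NR:
            raise ValueError(f"nr must be one of {VALID_NR}, got {nr}")
    return [["julia", "trainingedrfit3d.jl", str(nr)] for nr in nrs]


def run_all(case_dir, nrs=VALID_NR, dry_run=True):
    """Run (or only print) the commands inside case_dir, stopping at the first failure."""
    cmds = commands(nrs)
    for cmd in cmds:
        if dry_run:
            print(f"[{case_dir}] " + " ".join(cmd))
        else:
            # check=True raises CalledProcessError if julia exits with an error.
            subprocess.run(cmd, cwd=case_dir, check=True)
    return cmds
```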
   


6. Describe any quality-assurance procedures performed on the data:


7. People involved with sample collection, processing, analysis and/or submission:
   Xuanting Hao (email: haoxx081@umn.edu)



-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: nr(2-5)/strain(1-2500).h5 in mldat3d_clear.tar.gz
-----------------------------------------

1. Number of variables: 2


2. Number of cases/rows: 2500 (each *.h5 file corresponds to one case)


3. Missing data codes: None


4. Variable List
                  

    A. Name: x
       Description: a 3 (row) by 331 (column) matrix. Each column is a dimensional input vector (eta-z, deta/dx, deta/dy) to the neural network at a specific depth, so the three rows hold eta-z, deta/dx, and deta/dy, respectively. Here eta-z (Unit: meter) is the vertical distance from the photon emission location to the plane of interest, and deta/dx and deta/dy are the partial derivatives of the wave surface elevation eta(x,y) with respect to x and y.


    B. Name: y
       Description: a 3+nr (row) by 331 (column) matrix. Each column is a dimensional output vector (xm, ym, Em, C1, ..., C(nr)) of the neural network at a specific depth. Here xm and ym are the horizontal coordinates (Unit: meter) of the peak irradiance, Em is the value of the peak irradiance, and C1 to C(nr) (nr = 2-5) are the coefficients of the PRF.
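To make the row layout of x and y concrete, the hypothetical Python helpers below split a matrix (given as a list of rows, for example as read by any HDF5 reader) into depth columns and attach the row names listed above to each entry. The check that nr lies between 2 and 5 reflects the range stated in this readme.

```python
INPUT_NAMES = ("eta-z", "deta/dx", "deta/dy")  # row names of x


def output_names(nr):
    """Row names of y for a given nr: xm, ym, Em, C1, ..., C(nr)."""
    if not 2 <= nr <= 5:
        raise ValueError(f"nr must be between 2 and 5, got {nr}")
    return ["xm", "ym", "Em"] + [f"C{i}" for i in range(1, nr + 1)]


def columns(rows):
    """Split a matrix given as a list of rows into its columns (one per depth)."""
    return [list(col) for col in zip(*rows)]


def label(column, names):
    """Pair each entry of one column with its row name."""
    if len(column) != len(names):
        raise ValueError(f"expected {len(names)} entries, got {len(column)}")
    return dict(zip(names, column))
```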


-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: trans(2-5).bson in mldat3d_clear.tar.gz
-----------------------------------------

1. Number of variables: 2


2. Number of cases/rows: 4 (each *.bson file corresponds to one case)


3. Missing data codes: None


4. Variable List
                  

    A. Name: xtrans
       Description: transformation for normalizing the input vectors to the neural network. 


    B. Name: sytrans
       Description: transformation for normalizing the output vectors of the neural network.


-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: allstrainnorm(2-5).h5 in mldat3d_clear.tar.gz
-----------------------------------------

1. Number of variables: 2


2. Number of cases/rows: 4 (each *.h5 file corresponds to one case)


3. Missing data codes: None


4. Variable List
                  

    A. Name: x
       Description: a 3 (row) by 427500 (column) matrix. Each column is a normalized (dimensionless) input vector to the neural network.


    B. Name: y
       Description: a 3+nr (row) by 427500 (column) matrix. Each column is a normalized (dimensionless) output vector of the neural network.



-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: nr(2-5)/strain(1-4000).h5 in mldat3d_turbid.tar.gz
-----------------------------------------

1. Number of variables: 2


2. Number of cases/rows: 4000 (each *.h5 file corresponds to one case)


3. Missing data codes: None


4. Variable List
                  

    A. Name: x
       Description: a 3 (row) by 331 (column) matrix. Each column is a dimensional input vector (eta-z, deta/dx, deta/dy) to the neural network at a specific depth, so the three rows hold eta-z, deta/dx, and deta/dy, respectively. Here eta-z (Unit: meter) is the vertical distance from the photon emission location to the plane of interest, and deta/dx and deta/dy are the partial derivatives of the wave surface elevation eta(x,y) with respect to x and y.


    B. Name: y
       Description: a 3+nr (row) by 331 (column) matrix. Each column is a dimensional output vector (xm, ym, Em, C1, ..., C(nr)) of the neural network at a specific depth. Here xm and ym are the horizontal coordinates (Unit: meter) of the peak irradiance, Em is the value of the peak irradiance, and C1 to C(nr) (nr = 2-5) are the coefficients of the PRF.

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: trans(2-5).bson in mldat3d_turbid.tar.gz
-----------------------------------------

1. Number of variables: 2


2. Number of cases/rows: 4 (each *.bson file corresponds to one case)


3. Missing data codes: None


4. Variable List
                  

    A. Name: xtrans
       Description: transformation for normalizing the input vectors to the neural network. 


    B. Name: sytrans
       Description: transformation for normalizing the output vectors of the neural network.


-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: allstrainnorm(2-5).h5 in mldat3d_turbid.tar.gz
-----------------------------------------

1. Number of variables: 2


2. Number of cases/rows: 4 (each *.h5 file corresponds to one case)


3. Missing data codes: None


4. Variable List
                  

    A. Name: x
       Description: a 3 (row) by 1324000 (column) matrix. Each column is a normalized (dimensionless) input vector to the neural network.


    B. Name: y
       Description: a 3+nr (row) by 1324000 (column) matrix. Each column is a normalized (dimensionless) output vector of the neural network.
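For the turbid case, the column count of allstrainnorm(2-5).h5, 1324000, equals 4000 cases times 331 depths. This is consistent with, though not stated by, this readme: the normalized per-case matrices may have been concatenated column by column. The hypothetical Python helpers below show such a concatenation for matrices stored as lists of rows; the ordering of cases within the actual file is not documented and is not implied here.

```python
def hstack(matrices):
    """Concatenate matrices (lists of rows) that share a row count, column by column."""
    if not matrices:
        raise ValueError("no matrices given")
    nrows = len(matrices[0])
    if any(len(m) != nrows for m in matrices):
        raise ValueError("all matrices must have the same number of rows")
    # Row i of the result is row i of every matrix, joined in the given order.
    return [[v for m in matrices for v in m[i]] for i in range(nrows)]


def expected_columns(ncases, ndepths):
    """Number of columns after stacking ncases matrices of ndepths columns each."""
    return ncases * ndepths
```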


-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: case_clear/args(2-5).bson in trainingcode.zip
-----------------------------------------

1. Number of variables: 1


2. Number of cases/rows: 4 (each *.bson file corresponds to one case)


3. Missing data codes: None


4. Variable List
                  

    A. Name: args
       Description: a struct variable containing the arguments for neural network training. 


-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: case_clear/model(2-5).bson in trainingcode.zip
-----------------------------------------

1. Number of variables: 1


2. Number of cases/rows: 4 (each *.bson file corresponds to one case)


3. Missing data codes: None


4. Variable List
                  

    A. Name: m
       Description: parameters of the pretrained neural network. 


-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: case_clear/lossconv(2-5).h5 in trainingcode.zip
-----------------------------------------

1. Number of variables: 1


2. Number of cases/rows: 4 (each *.h5 file corresponds to one case)


3. Missing data codes: None


4. Variable List
                  

    A. Name: lossconv
       Description: a 1D array recording the values of the loss function at each iteration.
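A short Python sketch for inspecting a loss history. It assumes the dataset inside lossconv(2-5).h5 is named lossconv, as listed above, and that the third-party h5py package is installed for read_lossconv; summarize_loss itself uses only built-ins. Iterations are counted from 1 to match Julia's indexing. Both function names are hypothetical.

```python
def summarize_loss(lossconv):
    """Return the number of iterations, the final and best loss, and where the best occurs."""
    if not lossconv:
        raise ValueError("empty loss history")
    # Index of the smallest loss; min() keeps the first one in case of ties.
    best = min(range(len(lossconv)), key=lossconv.__getitem__)
    return {
        "iterations": len(lossconv),
        "final": lossconv[-1],
        "best": lossconv[best],
        "best_iteration": best + 1,  # 1-based, as in Julia
    }


def read_lossconv(path):
    """Read the 1D loss array from one lossconv*.h5 file (requires h5py)."""
    import h5py  # imported lazily so summarize_loss works without h5py

    with h5py.File(path, "r") as f:
        return f["lossconv"][()].tolist()
```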


-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: case_clear/trainingedrfit3d.jl in trainingcode.zip
-----------------------------------------

This is the source code for training the neural network.

1. Number of variables: None


2. Number of cases/rows: None


3. Missing data codes: None


4. Variable List: None


-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: case_turbid/args(2-5).bson in trainingcode.zip
-----------------------------------------

1. Number of variables: 1


2. Number of cases/rows: 4 (each *.bson file corresponds to one case)


3. Missing data codes: None


4. Variable List
                  

    A. Name: args
       Description: a struct variable containing the arguments for neural network training. 


-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: case_turbid/model(2-5).bson in trainingcode.zip
-----------------------------------------

1. Number of variables: 1


2. Number of cases/rows: 4 (each *.bson file corresponds to one case)


3. Missing data codes: None


4. Variable List
                  

    A. Name: m
       Description: parameters of the pretrained neural network. 


-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: case_turbid/lossconv(2-5).h5 in trainingcode.zip
-----------------------------------------

1. Number of variables: 1


2. Number of cases/rows: 4 (each *.h5 file corresponds to one case)


3. Missing data codes: None


4. Variable List
                  

    A. Name: lossconv
       Description: a 1D array recording the values of the loss function at each iteration.


-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: case_turbid/trainingedrfit3d.jl in trainingcode.zip
-----------------------------------------

This is the source code for training the neural network.

1. Number of variables: None


2. Number of cases/rows: None


3. Missing data codes: None


4. Variable List: None