#############################################################################
    Data Repository and Archive for:
#############################################################################

Title:
    Tuning conformational asymmetry in particle-forming diblock copolymer alloys.
Authors:
    Logan J. Case
    Frank S. Bates
    Kevin D. Dorfman
Journal:
    Soft Matter
Year:
    2023
DOI:
    10.1039/d2sm01332k

#############################################################################
    Software
#############################################################################

The C++ version of the PSCF software was used for calculations in this
work.
Github: https://github.com/dmorse/pscfpp.git

The exact source code version used in this project is also included in
the repository, in the directory root/pscfpp_code/.

Analysis of data was done via Jupyter Notebooks hosted on MSI resources.

#############################################################################
    Data Files and Naming Conventions
#############################################################################

To reduce the size of the repository, many of the raw data files
(such as fields and data files output by PSCFpp) have been omitted from this
repository. Instead, the repository contains all files necessary to rerun
the calculations and regenerate the raw data files.
Throughout this repository, wherever raw data were originally stored
(and where they may be regenerated) you will find the following files:

        param files
            Parameter files read by PSCF. These may have different
            prefixes (such as `up_param` or `down_param`),
            but the filename will always end with `param`.
        
        command files
            Command sequences executed by PSCF for the calculation.
            These will always be named `command`.
        
        initial potential fields
            These are chemical potential fields in the symmetry-adapted
            basis format. These are the initial fields used for the
            calculation. They will be named either `in.omega` or `in.bf`
        
        jobscript
            These are shell scripts used to launch a batch job via SLURM
            on the MSI supercomputer system. These will show you the
            command used when running the calculation.
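Because the raw outputs were omitted, it can help to locate every directory that still holds the inputs needed to regenerate them. The sketch below is a minimal illustration that assumes the naming conventions listed above hold throughout the repository: it reports directories containing a `command` file, a file whose name ends in `param`, and an initial field named `in.omega` or `in.bf`. The function name `find_rerunnable_dirs` is hypothetical. It does not launch PSCF; the actual launch command for each calculation is in its jobscript.

```python
from pathlib import Path


def find_rerunnable_dirs(root):
    """Return sorted directories under root that contain the files needed to
    rerun a PSCF calculation: `command`, a file ending in `param`, and an
    initial field file (`in.omega` or `in.bf`)."""
    found = []
    for cmd in Path(root).rglob("command"):
        if not cmd.is_file():
            continue
        d = cmd.parent
        # param files may carry a prefix (up_param, down_param, ...)
        has_param = any(p.is_file() and p.name.endswith("param") for p in d.iterdir())
        has_field = (d / "in.omega").is_file() or (d / "in.bf").is_file()
        if has_param and has_field:
            found.append(d)
    return sorted(found)
```

For example, `find_rerunnable_dirs("chi/chi30_0")` would list the phase sub-directories under that state point, provided each still contains its input files.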
        
Files related to data parsing and data analysis are all included.
A brief overview of relevant files:

        sweepData.csv
            After running calculations, the raw data files output
            by PSCFpp were parsed using a python script (dataCollection.py)
            and the relevant data were placed in a CSV file.
            These files appear one directory above the directory in which
            the calculation was actually run. Wherever this file appears,
            the calculations it summarizes were run in sub-directories
            named for each phase.
            For example, within this repository at [root]/chi/chi30_0/
            you will find a sweepData.csv file, in addition to directories
            named for A15, C14, and other phases.
            The data in [root]/chi/chi30_0/sweepData.csv represents the data
            at each state point converged during the sweeps in the
            [root]/chi/chi30_0/[phase]/ sub-directories.
            These .csv files contain essentially all relevant phase data.
            Column labels for the CSV are listed below, with a description
            of the data contained in that column.
            ---------------
            phase :
                Name of the phase, as used for the directory (e.g. A15, bccAB, althex_i).
            prefix :
                The data in this row was obtained from file [phase]/[prefix].dat relative
                to the directory containing the sweepData.csv file.
            fHelm :
                Helmholtz free energy returned by PSCFpp
            kuhn0, kuhn1, kuhn2 :
                Statistical segment length of monomer A (kuhn0), B (kuhn1) and C (kuhn2).
            chiAB, chiAC, chiBC :
                Flory-Huggins parameter between monomers AB, AC, and BC, respectively.
            Na0, Nb0 :
                The length of the A (Na0) and B (Nb0) blocks in the AB chain.
            Nc1, Nb1 :
                The length of the C (Nc1) and B` (Nb1) blocks in the B`C chain.
            phi0, phi1 :
                The blend fraction of AB (phi0) and B`C (phi1) chains.
                If the calculation is in the Canonical ensemble, this is specified in
                the param file. If it is Grand Canonical, this is returned by PSCFpp
                based on the converged field.
            mu0, mu1, dMu :
                The chemical potential of the AB (mu0) and B`C (mu1) chains,
                as well as the difference between them ( dMu = mu1 - mu0 ).
                For Canonical ensemble calculations, these are the values returned by
                PSCFpp, which have no useful interpretation since the pressure is
                unspecified. dMu values are occasionally used as an estimate when
                starting Grand Canonical calculations.
            system, cellparam0, cellparam1 :
                The crystal system (system) of the unit cell (cubic, hexagonal, etc.)
                and unit cell parameters (cellparam0, cellparam1) from the converged
                solution. Unit cell parameters are as output by PSCFpp. If only one
                unit cell parameter is required (such as in cubic unit cells), the
                last column (cellparam1) is left blank for that row.
            
        *.ipynb
            All data analysis was done in Jupyter Notebooks, hosted on
            the MSI clusters. All of these Notebooks are included in
            the repository and contain analysis code as well as many
            results.
        
        *.pickle
            After computing the common tangents for each canonical
            dataset, the python objects used to organize that data
            (and storing the common tangent) were written out to a
            file via pickle. Loading objects from these pickled files
            meant that the common tangents did not need to be recomputed.
            These files are all generated by and should only be read from
            the Jupyter notebooks used for analysis.
        
        contained_data_files.txt
            This is a text file generated by the dataCollection.py python
            script. It contains a list of the (absolute) path to all
            `sweepData.csv` files. The paths are absolute as stored on MSI,
            thus they will not be strictly accurate on any other system.
            They can be used as a reference to see where calculations were
            run relative to the alloys root directory.
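The sweepData.csv layout and the path list described above can be read with the Python standard library alone. The sketch below is not the dataCollection.py script or the notebook code; it assumes the CSV header uses exactly the column names listed above and that the repository root appears in each MSI path as a directory named `alloys` (an assumption based on the wording above). The helper names `load_sweep_data`, `dmu_consistent`, and `relative_to_root` are hypothetical.

```python
import csv
import math
from collections import defaultdict
from pathlib import PurePosixPath

TEXT_COLUMNS = {"phase", "prefix", "system"}  # every other column is numeric


def load_sweep_data(csv_path):
    """Read one sweepData.csv file into {phase name: [row dict, ...]}.

    Numeric cells become floats; blank cells (e.g. cellparam1 for cubic
    cells) become None.
    """
    by_phase = defaultdict(list)
    with open(csv_path, newline="") as fh:
        for row in csv.DictReader(fh):
            parsed = {}
            for key, value in row.items():
                if key is None:  # surplus cells with no header; ignore them
                    continue
                key = key.strip()
                value = (value or "").strip()
                if key in TEXT_COLUMNS:
                    parsed[key] = value
                else:
                    parsed[key] = float(value) if value else None
            by_phase[parsed["phase"]].append(parsed)
    return dict(by_phase)


def dmu_consistent(rows, tol=1e-6):
    """Check dMu == mu1 - mu0 for every row that has all three values."""
    return all(
        math.isclose(r["dMu"], r["mu1"] - r["mu0"], abs_tol=tol)
        for r in rows
        if None not in (r.get("dMu"), r.get("mu0"), r.get("mu1"))
    )


def relative_to_root(msi_path, root_name="alloys"):
    """Convert an absolute MSI path from contained_data_files.txt into a path
    relative to the repository root (root_name is an assumed directory name)."""
    parts = PurePosixPath(msi_path.strip()).parts
    if root_name not in parts:
        raise ValueError(f"{root_name!r} not found in {msi_path!r}")
    return PurePosixPath(*parts[parts.index(root_name) + 1:])
```

Combining the two, each line of contained_data_files.txt can be mapped with `relative_to_root` and the result passed to `load_sweep_data` relative to a local copy of the repository. The `prefix` column then identifies the omitted `[phase]/[prefix].dat` file that a rerun would regenerate.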
    
In general, scripts used to set up calculations, launch SLURM jobs, and
organize data will not be of much interest.
Many of these setup scripts and template files have been moved from their
original locations into a sub-directory called 'setupScripts/'.
When a directory of this name is seen, the scripts it contains should be
run from one directory up (where you see `setupScripts/`).
These scripts are not very cleanly written, nor very robust.
Generally, directories or files with names such as `startData`, `setup`,
`init`, or similar phrases relate to this setup process.

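Running a script from one directory above its setupScripts/ folder only requires setting the working directory of the child process. The sketch below uses Python's subprocess module and is hedged: `run_setup_script` is a hypothetical helper, and it assumes the script is a Python script (shell scripts in these folders would need a different interpreter).

```python
import subprocess
import sys
from pathlib import Path


def run_setup_script(script):
    """Run a Python script stored in a setupScripts/ directory, using the
    directory one level above setupScripts/ as the working directory."""
    script = Path(script).resolve()
    if script.parent.name != "setupScripts":
        raise ValueError(f"{script} is not inside a setupScripts/ directory")
    run_dir = script.parent.parent  # where `setupScripts/` is seen
    # check=True raises CalledProcessError if the script exits with an error
    return subprocess.run([sys.executable, str(script)], cwd=run_dir, check=True)
```

Setting `cwd` rather than changing the interpreter's own working directory keeps any relative paths inside the script pointing where the script expects, without side effects on the calling notebook.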

#############################################################################
    Repository Structure
#############################################################################

To reduce the size of the repository, many of the raw data files have been
excluded. Instead, the repository contains all files necessary to rerun
the calculations and regenerate the raw data files.

Files generated from analysis of data, or data collections (such as csv files)
are still included. These contain all data relevant for the analysis.

=================================
    root/
=================================

    The root directory contains several files.
    A few of interest are:
    
        `README`
                This file
                
        `phase_diagram_analysis.ipynb`
                Jupyter Notebook in which
                analysis was done to produce the phase diagrams.
                
        `*.json`
                JSON files used to store the many data series
                that make up a phase diagram.
                Several of these are incremental backups of
                each other, e.g. the `phaseDiagSeries*.json` files.
                The most recent version of data for the phase diagram
                in Fig. 4 is found in `phaseDiagSeries_new.json`.
                Data used for Fig. 5 are found in `ac15_phaseDiagSeries.json`.
                
        `Tangent_Figures.pdf`
                A PDF document containing all common tangent figures used during
                analysis of Canonical ensemble data. The figures are organized
                according to conditions at which data was collected, and the figure
                in which the data was used.
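The common tangent construction behind these figures can be illustrated with a lower convex hull: when free energy is plotted against blend fraction for all phases together, a common tangent between two phases appears as a hull edge joining points from different phases. This is a generic sketch (Andrew's monotone chain), not the procedure used in the notebooks. It assumes points are `(phi, f, phase)` tuples from one sweepData.csv at fixed chiN and segment lengths, with f treated as a free energy per unit volume so that the tangent construction applies. With discrete sweep data, the edge end points only approximate the coexisting compositions. All function names are hypothetical.

```python
def _cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points):
    """Lower convex hull of (phi, f, phase) tuples, ordered by increasing phi.

    At repeated phi values only the lowest f is kept.
    """
    pts = []
    for p in sorted(points, key=lambda q: (q[0], q[1])):
        if pts and p[0] == pts[-1][0]:
            continue  # same phi, higher f: cannot lie on the lower hull
        pts.append(p)
    hull = []
    for p in pts:
        # drop the last vertex while it lies on or above the chord hull[-2] -> p
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def coexistence_windows(points):
    """Hull edges joining two different phases, i.e. discrete common tangents."""
    hull = lower_hull(points)
    return [(a, b) for a, b in zip(hull, hull[1:]) if a[2] != b[2]]
```

Hull edges between points of the same phase mark single-phase regions; an edge between, say, an A15 point and a C14 point marks an approximate two-phase window between those blend fractions.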

The root also contains four sub-directories relevant to the
presented results (three others are merely for setup).

=================================
    root/kuhn/
=================================

    This directory contains the results from analyzing
    the impact of chain length and conformational asymmetry
    (Figs. 2 and S2).
    
    Data in this directory is organized in a tiered manner.
    Directories root/kuhn/chi25/ and root/kuhn/chi28/ contain
    the data at chiN=25 and chiN=28, respectively.
    
    Within each of these directories are sub-directories
    a/, c/, and ac/, which identify the statistical segment
    length varied to produce conformational asymmetry.
    Thus, data in a/ vary epsilon_AB, data in c/ vary epsilon_BC,
    and data in ac/ vary both epsilon values.
    
    Each directory chi[25,28]/[a,c,ac]/ contains an init/ directory
    used to sweep upward in statistical segment length to introduce
    conformational asymmetry. They also contain sub-directories
    kuhn100/, kuhn125/, kuhn150/, which contain data at the referenced
    conformational asymmetry for the appropriate chain;
    kuhn100 is conformationally symmetric, kuhn125 and kuhn150 are
    conformational asymmetries of 1.25 and 1.5, respectively.
    
    Finally, each kuhn[100,125,150]/ directory contains an init
    directory used to sweep the chain lengths to acquire starting
    fields for each sweep in phi. They also contain directories
    for each chain length asymmetry, named according to the
    length of the B`C chain (with AB chain fixed at N_AB = 1)
    as `nbc_#_#` where the `#` values symbolize the digits before
    and after a decimal point. Thus, nbc_0_5 contains data for
    N_BC/N_AB=0.5, while nbc_1_2 contains data for N_BC/N_AB=1.2.
    
    The calculations were actually performed in the phase-named sub-directories
    below this, with the `sweepData.csv` files being found in the nbc_#_#
    directories (see entry above on the .csv files).
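The nested naming scheme above can be decoded mechanically, which is convenient when collecting results across many sub-directories. The sketch below assumes paths of the form kuhn/chi[25,28]/[a,c,ac]/kuhn[100,125,150]/nbc_#_#/[phase] and raises ValueError for anything else, including the init/ directories. The function name and returned keys are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Which conformational asymmetry each sub-directory varies (from the text above).
VARIED = {"a": ("epsilon_AB",), "c": ("epsilon_BC",), "ac": ("epsilon_AB", "epsilon_BC")}


def parse_kuhn_path(path):
    """Decode .../kuhn/chi25/ac/kuhn150/nbc_0_5[/phase] into its parameters."""
    parts = PurePosixPath(path).parts
    if "kuhn" not in parts:
        raise ValueError(f"no kuhn/ directory in {path!r}")
    tail = parts[parts.index("kuhn") + 1:]
    if len(tail) < 4:
        raise ValueError(f"path too short: {path!r}")
    chi = re.fullmatch(r"chi(\d+)", tail[0])
    kuhn = re.fullmatch(r"kuhn(\d+)", tail[2])
    nbc = re.fullmatch(r"nbc_(\d+)_(\d+)", tail[3])
    if not (chi and kuhn and nbc) or tail[1] not in VARIED:
        raise ValueError(f"unrecognized layout: {path!r}")
    return {
        "chiN": float(chi.group(1)),
        "varied": VARIED[tail[1]],
        "epsilon": int(kuhn.group(1)) / 100,  # kuhn150 -> 1.5
        "N_BC_over_N_AB": float(f"{nbc.group(1)}.{nbc.group(2)}"),  # nbc_0_5 -> 0.5
        "phase": tail[4] if len(tail) > 4 else None,
    }
```

For example, kuhn/chi28/c/kuhn125/nbc_1_2/ decodes to chiN = 28, epsilon_BC = 1.25, and N_BC/N_AB = 1.2, with no phase because it is the directory holding sweepData.csv.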
    
    
=================================
    root/chi/
=================================

    Canonical ensemble calculations for the primary
    phase diagram (Fig. 4) for a system with N_BC/N_AB = 1.0
    and epsilon_BC = 1.5. This directory contains its own
    README, which can be consulted for details.
                        
=================================
    root/grandcanonical/
=================================

    Grand Canonical ensemble calculations for the primary phase diagram.
    This directory contains its own README file describing its contents.
    Unlike other directory trees in this repository, the grandcanonical/
    directory contains all of its raw data files.
                        
=================================
    root/ac_conf_asym/canonical/
=================================
    
    Canonical ensemble calculations for the case where both chains have
    conformational asymmetry (Fig. 5).